performance 9

1-03-13. Cheaper in Bulk

In the same way that mget allows us to retrieve multiple documents at once, the bulk API allows us to make multiple create, index, update, or delete requests in a single step. This is particularly useful if you need to index a data stream such as log events, which can be queued up and indexed in batches of hundreds or thousands.mget 이 다수의 document를 한번에 가져오는 것과 마찬가지로, bulk API는 다수의 create, index,..

2-4-6. Improving Performance

Phrase and proximity queries are more expensive than simple match queries. Whereas a matchquery just has to look up terms in the inverted index, a match_phrase query has to calculate and compare the positions of multiple possibly repeated terms.phrase와 proximity query는, 단순한 match query에 비해, 더 많은 비용이 든다. match query는 단어를 inverted index에서 찾는 반면에, match_phrase query는 가능한 한 여러 번, 반복해서 단어들의 위치를 계산하고 ..

2-4-7. Finding Associated

As useful as phrase and proximity queries can be, they still have a downside. They are overly strict: all terms must be present for a phrase query to match, even when using slop.phrase와 proximity query는 유용하지만, 단점이 있다. 지나치게 엄격하다. phrase query에 일치하기 위해, 심지어 slop 을 사용할 경우에도, 모든 단어가 반드시 존재해야 한다.The flexibility in word ordering that you gain with slop also comes at a price, because you lose the assoc..

3-4-2. Dictionary Stemmers

Dictionary stemmers work quite differently from algorithmic stemmers. Instead of applying a standard set of rules to each word, they simply look up the word in the dictionary. Theoretically, they could produce much better results than an algorithmic stemmer. A dictionary stemmer should be able to do the following: 사전 형태소 분석기(dictionary stemmers) 는 algorithmic stemmers와 전혀 다르게 동작한다. 각 단어에 규칙의 기준을..

3-4-4. Choosing a Stemmer

The documentation for the stemmer token filter lists multiple stemmers for some languages. For English we have the following:stemmer token filter에 대한 문서에서는, 특정 언어에 대한 여러 가지 형태소 분석기를 나열하고 있다. 예를 들어 영어를 보면,englishThe porter_stem token filter.light_englishThe kstem token filter.minimal_englishThe EnglishMinimalStemmer in Lucene, which removes plurals복수형을 제거하는 Lucene의 English Minimal StemmerlovinsTh..

3-5-1. Pros and Cons of Stopwords

We have more disk space, more RAM, and better compression algorithms than existed back in the day. Excluding the preceding 33 common words from the index will save only about 4MB per million documents. Using stopwords for the sake of reducing index size is no longer a valid reason. (However, there is one caveat to this statement, which we discuss in Stopwords and Phrase Queries.)우리는 과거에 존재했던 것보다..

3-5-3. Stopwords and Performance

The biggest disadvantage of keeping stopwords is that of performance. When Elasticsearch performs a full-text search, it has to calculate the relevance _score on all matching documents in order to return the top 10 matches.불용어를 유지하는 경우의 가장 큰 단점은 성능이다. Elasticsearch가 full-text 검색을 수행할 경우, 상위 10개의 document를 반환하기 위해, 일치하는 모든 document의 relevance _score 를 계산해야 한다.While most words typically occur in m..