Posts tagged 'performance'

performance (9 posts)

1-03-13. Cheaper in Bulk

In the same way that mget allows us to retrieve multiple documents at once, the bulk API allows us to make multiple create, index, update, or delete requests in a single step. This is particularly useful if you need to index a data stream such as log events, which can be queued up and indexed in batches of hundreds or thousands.

2.X/1. Getting Started 2017.10.01
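Here is a rough sketch of one batch on the wire. The bulk request body is newline-delimited JSON: an action/metadata line, followed (for create, index, and update) by a source line, and the whole body must end with a newline. The helper names, the `logs` index, and the sample events are hypothetical. The sketch only builds the bodies and leaves out the HTTP POST to `_bulk`. On a 2.x cluster a `_type` is also needed, either in the URL or in the metadata line, and the optional `doc_type` argument covers that case.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple

Doc = Tuple[str, dict]  # (document id, document source)

def batched(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Queue up incoming docs and yield them in fixed-size batches (hypothetical helper)."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch

def build_bulk_body(index: str, docs: List[Doc], doc_type: Optional[str] = None) -> str:
    """Build an NDJSON body of 'index' actions for one bulk request."""
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index, "_id": doc_id}
        if doc_type is not None:
            meta["_type"] = doc_type  # required in some form on 2.x-era clusters
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(source))           # document source line
    return "\n".join(lines) + "\n"  # bulk bodies must end with a newline

# Five hypothetical log events, sent two per request.
events = [(str(i), {"msg": f"event {i}"}) for i in range(5)]
bodies = [build_bulk_body("logs", b) for b in batched(events, 2)]
print(len(bodies))  # 3
```

In practice the batch size would be in the hundreds or thousands, as the excerpt says. Each body replaces that many individual index requests.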

2-4-6. Improving Performance

Phrase and proximity queries are more expensive than simple match queries. Whereas a match query just has to look up terms in the inverted index, a match_phrase query has to calculate and compare the positions of multiple possibly repeated terms.

2.X/2. Search in Depth 2017.09.24

2-4-7. Finding Associated

As useful as phrase and proximity queries can be, they still have a downside. They are overly strict: all terms must be present for a phrase query to match, even when using slop. The flexibility in word ordering that you gain with slop also comes at a price, because you lose the assoc..

2.X/2. Search in Depth 2017.09.24
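The strictness described above can be shown with a small sketch. It assumes a simplified slop rule loosely modeled on Lucene's sloppy phrase matching: each term's position is shifted back by its offset within the phrase, and the phrase matches when the spread of the shifted positions is at most `slop`. It ignores scoring and Lucene's special handling of repeated terms, and the function name is hypothetical. Under this rule, moving a term one position costs a slop of 1 and swapping two adjacent terms costs 2, but no amount of slop helps when a term is missing.

```python
from itertools import product

def phrase_matches(text, phrase, slop=0):
    """Simplified sloppy-phrase check (an approximation, not Lucene's scorer)."""
    tokens = text.lower().split()
    terms = phrase.lower().split()
    positions = [[p for p, tok in enumerate(tokens) if tok == t] for t in terms]
    # Every term must be present, however large the slop is.
    if any(not ps for ps in positions):
        return False
    for combo in product(*positions):
        # Shift each position back by the term's offset within the phrase.
        adjusted = [p - i for i, p in enumerate(combo)]
        if max(adjusted) - min(adjusted) <= slop:
            return True
    return False

print(phrase_matches("quick brown fox", "quick fox", slop=1))    # True
print(phrase_matches("quick fox", "fox quick", slop=1))          # False
print(phrase_matches("quick fox", "fox quick", slop=2))          # True
print(phrase_matches("quick brown dog", "quick fox", slop=100))  # False
```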

3-4-2. Dictionary Stemmers

Dictionary stemmers work quite differently from algorithmic stemmers. Instead of applying a standard set of rules to each word, they simply look up the word in the dictionary. Theoretically, they could produce much better results than an algorithmic stemmer. A dictionary stemmer should be able to do the following: ..

2.X/3. Dealing with Human Language 2017.09.24
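A minimal contrast between lookup and rules, under loudly stated assumptions. The five-entry `DICTIONARY` and the three suffix rules are hypothetical toys. A real dictionary stemmer such as Hunspell uses a large word list, and real algorithmic stemmers such as Porter have far more careful rules. The lookup handles irregular forms ("feet", "ran") that suffix rules cannot, but it passes through any word missing from the dictionary ("cats"). The toy rules mangle "running" to "runn".

```python
# Hypothetical tiny dictionary; a real one holds many thousands of entries.
DICTIONARY = {
    "feet": "foot",
    "mice": "mouse",
    "running": "run",
    "ran": "run",
    "stories": "story",
}

def dictionary_stem(word):
    """Look the word up; unknown words pass through unchanged."""
    w = word.lower()
    return DICTIONARY.get(w, w)

def naive_rule_stem(word):
    """Hypothetical toy suffix rules for contrast (NOT the Porter algorithm)."""
    w = word.lower()
    for suffix in ("ing", "ies", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return w

for word in ["feet", "ran", "running", "stories", "cats"]:
    print(word, dictionary_stem(word), naive_rule_stem(word))
```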

3-4-4. Choosing a Stemmer

The documentation for the stemmer token filter lists multiple stemmers for some languages. For English we have the following:
english: The porter_stem token filter.
light_english: The kstem token filter.
minimal_english: The EnglishMinimalStemmer in Lucene, which removes plurals.
lovins: Th..

2.X/3. Dealing with Human Language 2017.09.24
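To try one of the English stemmers listed above, one would typically define a `stemmer` token filter with its `language` parameter and reference it from a custom analyzer in the index settings. The sketch below builds such a settings body as a Python dict. The names `my_stemmer` and `my_english` are hypothetical, and the analyzer is deliberately minimal (standard tokenizer plus lowercase). It is not a replica of the built-in `english` analyzer.

```python
import json

def stemmer_settings(language="light_english"):
    """Index-settings body with a custom analyzer using one stemmer variant."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # 'language' selects the variant, e.g. english, light_english, minimal_english
                    "my_stemmer": {"type": "stemmer", "language": language}
                },
                "analyzer": {
                    "my_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_stemmer"],
                    }
                },
            }
        }
    }

print(json.dumps(stemmer_settings("minimal_english"), indent=2))
```

Swapping the `language` value makes it easy to compare how aggressive each variant is on the same text.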

3-5. Stopwords: Performance Versus Precision

Back in the early days of information retrieval, disk space and memory were limited to a tiny fraction of what we are accustomed to today. It was essential to make your index as small as possible. Every kilobyte saved meant a significant improvement in performance. Stemming (see Reducing Words to Their Root Form) was important, not just for making searches broader and increasing retrieval in the..

2.X/3. Dealing with Human Language 2017.09.24

3-5-1. Pros and Cons of Stopwords

We have more disk space, more RAM, and better compression algorithms than existed back in the day. Excluding the preceding 33 common words from the index will save only about 4MB per million documents. Using stopwords for the sake of reducing index size is no longer a valid reason. (However, there is one caveat to this statement, which we discuss in Stopwords and Phrase Queries.)

2.X/3. Dealing with Human Language 2017.09.24

3-5-3. Stopwords and Performance

The biggest disadvantage of keeping stopwords is performance. When Elasticsearch performs a full-text search, it has to calculate the relevance _score on all matching documents in order to return the top 10 matches. While most words typically occur in m..

2.X/3. Dealing with Human Language 2017.09.24
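A toy count makes the cost concrete. It assumes the query terms are combined with OR (the match query's default operator), so every document containing any term must be scored before the top 10 can be chosen. The corpus is hypothetical: 1,000 documents contain "the" and only one contains "fox". Keeping the stopword in the query turns one document to score into 1,001.

```python
from collections import defaultdict

def build_postings(docs):
    """Map term -> set of doc ids (a toy inverted index without positions)."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            postings[term].add(doc_id)
    return postings

def docs_to_score(postings, query):
    """Number of docs an OR-style full-text query must score."""
    matched = set()
    for term in query.lower().split():
        matched |= postings.get(term, set())
    return len(matched)

# Hypothetical corpus: a very common word and one rare word.
corpus = [f"the document number {i}" for i in range(1000)] + ["the quick fox"]
postings = build_postings(corpus)
print(docs_to_score(postings, "the fox"))  # 1001
print(docs_to_score(postings, "fox"))      # 1
```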

7-3-3. Indexing Performance Tips

If you are in an indexing-heavy environment, such as indexing infrastructure logs, you may be willing to sacrifice some search performance for faster indexing rates. In these scenarios, searches tend to be relatively rare and performed by people internal to your organization. They are willing to wait several seconds for a search, as opposed to a consumer facing a search that must return in milli..

2.X/7. Administration Monitoring Deployment 2017.09.23


elasticsearch, definitive guide

