stemming 8

2. Mixing exact search with stemming

When building a search application, stemming is often a must, as it is desirable for a query on skiing to match documents that contain ski or skis. But what if a user wants to search for skiing specifically? The typical way to do this would be to use a multi-field in order to have the same content indexed in two different ways:..
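A minimal sketch of that multi-field setup, written as Python dicts that hold the request bodies. It assumes Elasticsearch 7 or later (typeless mappings). The field name `body`, the sub-field `exact` and the analyzer name `english_exact` are hypothetical. The query relies on the `quote_field_suffix` option of `simple_query_string`, which sends quoted input to the `exact` sub-field while unquoted input goes to the stemmed field.

```python
import json


def build_index_body() -> dict:
    # "body" is analyzed with the built-in "english" analyzer (stemmed);
    # "body.exact" uses a hypothetical custom analyzer that only
    # tokenizes and lowercases, so no stemming happens there.
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "english_exact": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "body": {
                    "type": "text",
                    "analyzer": "english",
                    "fields": {
                        "exact": {"type": "text", "analyzer": "english_exact"}
                    },
                }
            }
        },
    }


def build_query(user_input: str) -> dict:
    # Unquoted terms are searched in the stemmed "body" field; quoted
    # phrases are redirected to "body" + ".exact".
    return {
        "query": {
            "simple_query_string": {
                "fields": ["body"],
                "quote_field_suffix": ".exact",
                "query": user_input,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_query('"skiing"'), indent=2))
```

Sending these bodies (index creation, then a search) is left to whichever client you use. With this setup an unquoted skiing should also match ski and skis, while "skiing" in quotes should only match the exact word.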

1-10-05. Custom Analyzers

While Elasticsearch comes with a number of analyzers available out of the box, the real power comes from the ability to create your own custom analyzers by combining character filters, tokenizers, and token filters in a configuration that suits your particular data.
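To make the order of the three stages concrete, here is a toy model in plain Python: character filters run on the raw string, one tokenizer splits it, then token filters transform the token list. Every function here is a hypothetical stand-in written for illustration; none of it is Elasticsearch code.

```python
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def html_strip(text: str) -> str:
    # Crude character filter: replace anything that looks like a tag with a space.
    return re.sub(r"<[^>]+>", " ", text)


def simple_tokenizer(text: str) -> List[str]:
    # Split on runs of characters that are not word characters or apostrophes.
    return [t for t in re.split(r"[^\w']+", text) if t]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop(stopwords: set) -> TokenFilter:
    # Factory returning a token filter that drops the given stopwords.
    return lambda tokens: [t for t in tokens if t not in stopwords]


def make_analyzer(char_filters: List[CharFilter], tokenizer: Tokenizer,
                  token_filters: List[TokenFilter]) -> Callable[[str], List[str]]:
    def analyze(text: str) -> List[str]:
        for cf in char_filters:      # 1. character filters, in order
            text = cf(text)
        tokens = tokenizer(text)     # 2. exactly one tokenizer
        for tf in token_filters:     # 3. token filters, in order
            tokens = tf(tokens)
        return tokens
    return analyze


analyzer = make_analyzer([html_strip], simple_tokenizer,
                         [lowercase, stop({"the", "a", "and"})])
print(analyzer("<p>The QUICK brown <b>Fox</b></p>"))
```

Filter order matters: `lowercase` runs before `stop`, so "The" becomes "the" and is then removed. In Elasticsearch the same shape is declared under `settings.analysis.analyzer` with `"type": "custom"` and the `char_filter`, `tokenizer` and `filter` keys.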

3-1. Getting Started with Languages

Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box support for many of the world’s most common languages: Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, ..
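A language analyzer is selected per field by name in the mapping. The sketch below, again as Python dicts, keeps a hypothetical `title` field on the default analyzer and adds a sub-field analyzed with a language analyzer; it also builds a body for the `_analyze` API so the emitted terms can be inspected. It assumes Elasticsearch 7 or later, and the field names are illustrative only.

```python
import json


def language_mapping(language: str) -> dict:
    # "title" keeps the default (standard) analyzer; "title.<language>"
    # is analyzed with the built-in language analyzer of that name.
    return {
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "fields": {
                        language: {"type": "text", "analyzer": language}
                    },
                }
            }
        }
    }


def analyze_request(language: str, text: str) -> dict:
    # Body for the _analyze API, used to see which terms an analyzer emits.
    return {"analyzer": language, "text": text}


if __name__ == "__main__":
    print(json.dumps(language_mapping("english"), indent=2))
    print(json.dumps(analyze_request("english", "The quick foxes jumped")))
```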

3-4. Reducing Words to Their Root Form

Most languages of the world are inflected, meaning that words can change their form to express differences in the following:

Number: fox, foxes
Tense: pay, paid, paying
Gender: waiter, waitress
Person: hear, hears
Case: I, me, my
Aspect: ate, eaten
Mood: so be it, were it so

While inflection aids expressivity, it interferes with retrie..
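The retrieval problem can be seen with a deliberately naive suffix stripper. It is a hypothetical toy, not Porter, KStem or any other real stemming algorithm: it conflates regular forms such as fox/foxes and pay/paying, but it cannot relate irregular forms such as paid to pay.

```python
# Toy suffix list, checked in this order; NOT a real stemmer.
SUFFIXES = ("ing", "es", "s")


def naive_stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three characters of stem remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def matches(query: str, document: str) -> bool:
    # A document matches when every stemmed query term is among its stemmed terms.
    doc_terms = {naive_stem(t) for t in document.split()}
    return all(naive_stem(t) in doc_terms for t in query.split())


print(matches("fox", "the quick foxes"))  # regular plural is conflated
print(matches("pay", "he paid"))          # irregular past tense is missed
```

Irregular forms like paid need more than suffix rules, which is one reason the choice of stemmer matters.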

3-4-4. Choosing a Stemmer

The documentation for the stemmer token filter lists multiple stemmers for some languages. For English we have the following:

english: The porter_stem token filter.
light_english: The kstem token filter.
minimal_english: The EnglishMinimalStemmer in Lucene, which removes plurals.
lovins: Th..
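One of those stemmers is picked by configuring a `stemmer` token filter with a `language` value and referencing it from a custom analyzer. The sketch below builds such index settings as a Python dict; the filter name `my_stemmer` and analyzer name `my_english` are hypothetical, and the chain is intentionally short (a full replacement for the built-in english analyzer would also need stopword and possessive handling).

```python
import json


def english_with_stemmer(stemmer_language: str = "light_english") -> dict:
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; "language" selects the stemmer.
                    "my_stemmer": {
                        "type": "stemmer",
                        "language": stemmer_language,
                    }
                },
                "analyzer": {
                    "my_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so the stemmer sees normalized tokens.
                        "filter": ["lowercase", "my_stemmer"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(english_with_stemmer("minimal_english"), indent=2))
```

Swapping the argument between `english`, `light_english` and `minimal_english` is a cheap way to compare how aggressively each one stems your own data.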

3-4-6. Stemming in situ

For the sake of completeness, we will finish this chapter by explaining how to index stemmed words into the same field as unstemmed words. As an example, analyzing the sentence The quick foxes jumped would produce the following terms:

Pos 1: (the) Pos..
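The idea can be simulated in a few lines: every token is emitted twice at the same position, once as-is and once stemmed, and the stemmed copy is dropped when stemming did not change the word. This mimics a chain such as `keyword_repeat`, then a stemmer, then a duplicate-removing filter (for example `unique` with `only_on_same_position`). The `naive_stem` function is a hypothetical toy standing in for a real stemmer.

```python
from typing import List, Tuple


def naive_stem(word: str) -> str:
    # Toy stemmer for illustration only; checks suffixes in this order.
    for suffix in ("es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze_in_situ(text: str) -> List[Tuple[int, List[str]]]:
    positions = []
    for pos, token in enumerate(text.lower().split(), start=1):
        terms = [token]              # unstemmed copy
        stemmed = naive_stem(token)  # stemmed copy at the same position
        if stemmed != token:         # drop the duplicate when nothing changed
            terms.append(stemmed)
        positions.append((pos, terms))
    return positions


for pos, terms in analyze_in_situ("The quick foxes jumped"):
    print(f"Pos {pos}: ({','.join(terms)})")
```

Because both forms share a position, phrase queries still work, and an exact term such as foxes and its stem fox are both searchable in the one field.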

3-6. Synonyms

While stemming helps to broaden the scope of search by simplifying inflected words to their root form, synonyms broaden the scope by relating concepts and ideas. Perhaps no documents match a query for "English queen", but documents that contain "British monarch" would probably be considered a good match.
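A toy model of index-time synonym expansion shows why the "English queen" query can find "British monarch": each token is replaced by every term in its synonym group, so the document ends up containing english and queen as well. The synonym groups below are hypothetical examples, and the matching logic is a simplification, not Elasticsearch's scoring.

```python
from typing import Dict, List, Set

# Hypothetical synonym groups: every term in a group is equivalent.
SYNONYM_GROUPS = [{"british", "english"}, {"queen", "monarch"}]


def build_synonym_map(groups: List[Set[str]]) -> Dict[str, Set[str]]:
    mapping: Dict[str, Set[str]] = {}
    for group in groups:
        for term in group:
            mapping.setdefault(term, set()).update(group)
    return mapping


SYNONYMS = build_synonym_map(SYNONYM_GROUPS)


def expand(text: str) -> List[Set[str]]:
    # One set of terms per position, like synonyms stacked at the same position.
    return [SYNONYMS.get(t, {t}) for t in text.lower().split()]


def matches(query: str, document: str) -> bool:
    doc_terms = set().union(*expand(document))
    return all(t in doc_terms for t in query.lower().split())


print(matches("English queen", "British monarch"))
print(matches("English king", "British monarch"))
```

In Elasticsearch the equivalent groups would be declared in a `synonym` token filter, for example `"synonyms": ["british,english", "queen,monarch"]`.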