1-10-04. Configuring Analyzers

drscg 2017. 9. 30. 17:23

The third important index setting is the analysis section, which is used to configure existing analyzers or to create new custom analyzers specific to your index.

In Analysis and Analyzers, we introduced some of the built-in analyzers, which are used to convert full-text strings into an inverted index, suitable for searching.

The standard analyzer, which is the default analyzer used for full-text fields, is a good choice for most Western languages. It consists of the following:

The standard tokenizer, which splits the input text on word boundaries
The standard token filter, which is intended to tidy up the tokens emitted by the tokenizer (but currently does nothing)
The lowercase token filter, which converts all tokens into lowercase
The stop token filter, which removes stopwords (common words that have little impact on search relevance, such as a, the, and, is)

By default, the stop token filter is disabled. You can enable it by creating a custom analyzer based on the standard analyzer and setting the stopwords parameter, either to an explicit list of stopwords or to the name of a predefined stopwords list for a particular language.

In the following example, we create a new analyzer called the es_std analyzer, which uses the predefined list of Spanish stopwords:

PUT /spanish_docs
{
    "settings": {
        "analysis": {
            "analyzer": {
                "es_std": {
                    "type":      "standard",
                    "stopwords": "_spanish_"
                }
            }
        }
    }
}
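
The stopwords parameter also accepts an explicit list instead of a predefined language list. A minimal sketch, assuming a hypothetical index my_docs and analyzer name my_std (neither appears in the original text), with a short illustrative word list:

```json
PUT /my_docs
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_std": {
                    "type":      "standard",
                    "stopwords": [ "a", "the", "and", "is" ]
                }
            }
        }
    }
}
```

With this setting, only the listed words should be treated as stopwords.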

The es_std analyzer is not global: it exists only in the spanish_docs index where we defined it. To test it with the analyze API, we must specify the index name:

GET /spanish_docs/_analyze
{
  "analyzer": "es_std",
  "text":     "El veloz zorro marrón"
}

The abbreviated results show that the Spanish stopword El has been removed correctly:

{
  "tokens" : [
    { "token" :    "veloz",   "position" : 2 },
    { "token" :    "zorro",   "position" : 3 },
    { "token" :    "marrón",  "position" : 4 }
  ]
}
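
For contrast, a built-in analyzer is not tied to any index, so it can be tested without an index name. A minimal sketch, assuming the same request-body form of the analyze API as in the es_std test; because the stop token filter is disabled by default in the standard analyzer, El would be expected to survive here as the lowercased token el:

```json
GET /_analyze
{
  "analyzer": "standard",
  "text":     "El veloz zorro marrón"
}
```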
