3-3-1. In That Case

drscg 2017. 9. 24. 17:24

The most frequently used token filter is the lowercase filter, which does exactly what you would expect; it transforms each token into its lowercase form:

GET /_analyze?tokenizer=standard&filters=lowercase
The QUICK Brown FOX!

Emitted tokens: the, quick, brown, fox

It doesn’t matter whether users search for fox or FOX, as long as the same analysis process is applied at index time and at search time. The lowercase filter transforms a query for FOX into a query for fox, which is the same token that is stored in the inverted index.
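
To see why the same analysis has to run on both sides, here is a minimal Python sketch of a toy inverted index. It is not how Elasticsearch is implemented: the `analyze`, `build_index`, and `search` names are hypothetical, and the regular expression is only a rough stand-in for the standard tokenizer.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the standard tokenizer plus the lowercase filter."""
    # Assumption: splitting on non-word characters approximates the tokenizer.
    tokens = re.findall(r"\w+", text)
    return [token.lower() for token in tokens]


def build_index(docs):
    """Map each analyzed token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every analyzed query token."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {1: "The QUICK Brown FOX!", 2: "A lazy dog"}
index = build_index(docs)

print(search(index, "FOX"))
print(search(index, "fox"))
```

Both searches find document 1, because the query text goes through the same `analyze` function that produced the indexed tokens. If the query side skipped the lowercasing, the token FOX would not be found in the index.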

To use token filters as part of the analysis process, we can create a custom analyzer:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_lowercaser": {
          "tokenizer": "standard",
          "filter":  [ "lowercase" ]
        }
      }
    }
  }
}

And we can test it out with the analyze API:

GET /my_index/_analyze?analyzer=my_lowercaser
The QUICK Brown FOX!

Emitted tokens: the, quick, brown, fox
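
Conceptually, the custom analyzer defined above is a tokenizer followed by an ordered list of token filters. The Python sketch below mirrors that shape. It is an illustration only: `make_analyzer`, `standard_tokenizer`, and `lowercase_filter` are hypothetical helpers, not Elasticsearch APIs, and the regular expression only approximates the standard tokenizer.

```python
import re


def standard_tokenizer(text):
    # Assumption: a simple word-character split stands in for the real tokenizer.
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    # Token filter: convert every token to lowercase.
    return [token.lower() for token in tokens]


def make_analyzer(tokenizer, filters):
    """Build an analyzer that tokenizes, then applies each filter in order."""
    def analyzer(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyzer


# Mirrors "tokenizer": "standard", "filter": [ "lowercase" ]
my_lowercaser = make_analyzer(standard_tokenizer, [lowercase_filter])

print(my_lowercaser("The QUICK Brown FOX!"))
```

Filters run in the order they are listed, which is why the `filter` setting is an array: a later filter sees the output of the earlier ones.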
