不爲也 比不能也

Normalization 2

3-3. Normalizing Tokens

Breaking text into tokens is only half the job. To make those tokens more easily searchable, they need to go through a normalization process to remove insignificant differences between otherwise identical words, such as uppercase versus lowercase. Perhaps we also need to remove significant differences, to make esta, ésta, and está all searchable as the same word. Would you search for déjà vu, or..

2.X/3. Dealing with Human Language 2017.09.24
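The idea in the excerpt above, removing case differences and optionally folding accents so that esta, ésta, and está match, can be sketched in plain Python. This is only an illustration of the concept, not how Elasticsearch does it internally (there, the `lowercase` and `asciifolding` token filters do this job). The helper name `normalize_token` is hypothetical.

```python
import unicodedata


def normalize_token(token: str) -> str:
    """Lowercase a token and strip combining marks (accents).

    Hypothetical helper for illustration only.
    """
    # Decompose so that accented letters become base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", token.lower())
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into a canonical form.
    return unicodedata.normalize("NFC", stripped)


tokens = ["esta", "ésta", "está", "Esta"]
normalized = [normalize_token(t) for t in tokens]
print(normalized)
```

Note that folding accents this way discards a difference that is meaningful in Spanish, which is exactly the trade-off the excerpt raises.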

3-3-3. Living in a Unicode World

When Elasticsearch compares one token with another, it does so at the byte level. In other words, for two tokens to be considered the same, they need to consist of exactly the same bytes. Unicode, however, allows you to write the same letter in different ways.

2.X/3. Dealing with Human Language 2017.09.24
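To make the byte-level point concrete, here is a small Python sketch (not Elasticsearch code) showing one letter written two ways in Unicode. The variable names are illustrative. The two strings look identical on screen but encode to different UTF-8 bytes, so a byte-wise comparison treats them as different tokens until both are brought to the same normalization form.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by a combining acute accent

composed_bytes = composed.encode("utf-8")      # two bytes: c3 a9
decomposed_bytes = decomposed.encode("utf-8")  # three bytes: 65 cc 81

# Byte-level comparison: the two spellings do not match.
same_bytes = composed_bytes == decomposed_bytes

# After normalizing both to NFC, they are identical.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(same_bytes, same_after_nfc)
```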