Posts tagged 'Normalization'

Normalization 2

Breaking text into tokens is only half the job. To make those tokens more easily searchable, they need to go through a normalization process to remove insignificant differences between otherwise identical words, such as uppercase versus lowercase. Perhaps we also need to remove significant differences, to make esta, ésta, and está all searchable as the same word. Would you search for déjà vu, or..

2.X/3. Dealing with Human Language 2017.09.24
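
The excerpt above describes two kinds of normalization: removing case differences, and optionally removing accents so that esta, ésta, and está match. A minimal Python sketch of that idea follows. It uses only the standard library and is not how Elasticsearch implements it; the function name `fold` is hypothetical and chosen for illustration.

```python
import unicodedata


def fold(token: str) -> str:
    """Lowercase a token and strip combining accent marks (illustrative only)."""
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", token.lower())
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into a canonical form.
    return unicodedata.normalize("NFC", stripped)


for word in ["esta", "ésta", "está", "Está"]:
    print(word, "->", fold(word))
```

Folding this aggressively trades precision for recall: the three Spanish words have different meanings, so whether to strip accents depends on how users are expected to type their queries.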

3-3-3. Living in a Unicode World

When Elasticsearch compares one token with another, it does so at the byte level. In other words, for two tokens to be considered the same, they need to consist of exactly the same bytes. Unicode, however, allows you to write the same letter in different ways..

2.X/3. Dealing with Human Language 2017.09.24
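
To make the byte-level point concrete, here is a small Python sketch showing one letter written two ways in Unicode, and how normalizing both to the same form makes the bytes match. The helper `same_token` is a hypothetical name, and the choice of NFC is an assumption for the example; it only illustrates the comparison problem, not Elasticsearch's internals.

```python
import unicodedata

composed = "\u00e9"     # é as a single precomposed code point
decomposed = "e\u0301"  # e followed by a combining acute accent

# Both render as é, but their UTF-8 bytes differ.
print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'
print(composed == decomposed)      # False


def same_token(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two tokens byte for byte after Unicode normalization."""
    norm_a = unicodedata.normalize(form, a).encode("utf-8")
    norm_b = unicodedata.normalize(form, b).encode("utf-8")
    return norm_a == norm_b


print(same_token(composed, decomposed))  # True
```

Which normalization form is used matters less than applying the same form to both the indexed text and the query text.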



