Analysis 11

2. Mixing exact search with stemming

When building a search application, stemming is often a must, as it is desirable for a query on skiing to match documents that contain ski or skis. But what if a user wants to search for skiing specifically? The typical way to do this is to use a multi-field, so that the same content is indexed in two different ways: ..
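As a rough illustration of the multi-field idea, the sketch below builds an index body as a Python dictionary. It assumes a recent Elasticsearch version with typeless mappings; the field name `body`, the sub-field name `exact`, and the analyzer name `english_exact` are hypothetical names chosen for this example, while `english`, `standard`, and `lowercase` are built-in components.

```python
import json


def build_index_body(field="body", exact_analyzer="english_exact"):
    """Return an index body that indexes one text field in two ways.

    The field and analyzer names are illustrative assumptions.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Unstemmed analyzer: tokenize and lowercase only.
                    exact_analyzer: {
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # Main field: stemmed with the built-in english analyzer.
                    "analyzer": "english",
                    "fields": {
                        # Sub-field: same content, indexed without stemming.
                        "exact": {"type": "text", "analyzer": exact_analyzer}
                    },
                }
            }
        },
    }


index_body = build_index_body()
print(json.dumps(index_body, indent=2))
```

With a mapping like this, a query against `body` would match stemmed forms, while a query against `body.exact` would match only the word as written.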

v5.0-07. Settings changes

From Elasticsearch 5.0 on, all settings are validated before they are applied. Node-level and default index-level settings are validated on node startup; dynamic cluster and index settings are validated before they are updated in or added to the cluster state.

1-05. Searching—The Basic Tools

So far, we have learned how to use Elasticsearch as a simple NoSQL-style distributed document store. We can throw JSON documents at Elasticsearch and retrieve each one by ID. But the real power of Elasticsearch lies in its ability to make sense out of chaos: to turn Big Data into Big Information.

1-06-2. Inverted Index

Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
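The structure described above can be sketched in a few lines of Python. This is a toy model, not how Elasticsearch stores its index: the sample documents are made up, and terms are produced by simply lowercasing and splitting on whitespace, which is an assumption for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive term extraction: lowercase, then split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Hypothetical sample documents keyed by document ID.
docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(f"{term:8} -> {doc_ids}")
```

Looking up a word is then a single dictionary access, which is why this layout suits full-text search.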

1-06-3. Analysis and Analyzers

Analysis is a process that consists of the following: first, tokenizing a block of text into individual terms suitable for use in an inverted index; then, normalizing these terms into a standard form to improve their "searchability," or recall. This job is perfor..
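The two steps can be sketched as follows. This is a minimal model, assuming that tokenizing means extracting runs of word characters and that normalizing means lowercasing; real analyzers apply more elaborate rules.

```python
import re


def tokenize(text):
    """Step 1: split a block of text into individual terms."""
    return re.findall(r"\w+", text)


def normalize(terms):
    """Step 2: bring terms into a standard form (here, lowercase only)."""
    return [term.lower() for term in terms]


def analyze(text):
    """Run both steps in order, as an analyzer would."""
    return normalize(tokenize(text))


print(analyze("The QUICK brown Fox!"))
```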

1-10-04. Configuring Analyzers

The third important index setting is the analysis section, which is used to configure existing analyzers or to create new custom analyzers specific to your index. In Analysis and Analyzers, we introduced some of the built-in analyzers, which are used to convert full-text strings into an inverted..

1-10-05. Custom Analyzers

While Elasticsearch comes with a number of analyzers available out of the box, the real power comes from the ability to create your own custom analyzers by combining character filters, tokenizers, and token filters in a configuration that suits your particular data.
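To make the three building blocks concrete, here is a toy pipeline in plain Python. It only mimics the order in which a custom analyzer applies its parts (character filters, then one tokenizer, then token filters); the individual functions and the stopword list are simplified stand-ins invented for this sketch, not Elasticsearch components.

```python
import re

STOPWORDS = {"the", "a"}  # illustrative stopword list


def strip_html(text):
    """Character filter: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def map_ampersand(text):
    """Character filter: replace '&' with the word 'and'."""
    return text.replace("&", " and ")


def word_tokenizer(text):
    """Tokenizer: split the filtered text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens):
    """Token filter: drop tokens found in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def custom_analyzer(text):
    """Apply character filters, then the tokenizer, then token filters."""
    for char_filter in (strip_html, map_ampersand):
        text = char_filter(text)
    tokens = word_tokenizer(text)
    for token_filter in (lowercase, remove_stopwords):
        tokens = token_filter(tokens)
    return tokens


print(custom_analyzer("The <b>Quick</b> & brown fox"))
```

The order matters: stopword removal runs after lowercasing so that "The" is recognized as a stopword.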

2-2. Full-Text Search

Now that we have covered the simple case of searching for structured data, it is time to explore full-text search: how to search within full-text fields in order to find the most relevant documents. The two most important aspects of full-text search are as follows..

2-2-1. Term-Based Versus Full-Text

While all queries perform some sort of relevance calculation, not all queries have an analysis phase. Besides specialized queries like the bool or function_score queries, which don't operate on text at all, textual queries can be broken down into two families: ..
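The difference between a query with an analysis phase and one without can be sketched with a toy in-memory index. The function names `term_query` and `match_query` are hypothetical helpers that loosely mirror the term-based and full-text families named in the title; the documents and the lowercase-only analyzer are assumptions for the example.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: extract word tokens and lowercase them."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_index(docs):
    """Index every document through the analyzer."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def term_query(index, term):
    """Term-based: look up the exact term, with no analysis of the input."""
    return sorted(index.get(term, set()))


def match_query(index, text):
    """Full-text: analyze the input first, then match any resulting term."""
    doc_ids = set()
    for term in analyze(text):
        doc_ids |= index.get(term, set())
    return sorted(doc_ids)


docs = {1: "Quick brown fox", 2: "Lazy dog"}  # hypothetical documents
index = build_index(docs)

print(term_query(index, "Quick"))       # no analysis: "Quick" is not an indexed term
print(term_query(index, "quick"))       # matches the indexed, lowercased term
print(match_query(index, "Quick DOG"))  # analyzed to ["quick", "dog"] before lookup
```

Because the index holds only lowercased terms, the unanalyzed lookup for "Quick" finds nothing, while the analyzed query finds both documents.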