2-2-2. The match Query

drscg 2017. 9. 30. 01:44

The match query is the go-to query: the first query that you should reach for whenever you need to query any field. It is a high-level full-text query, meaning that it knows how to deal with both full-text fields and exact-value fields.

That said, the main use case for the match query is full-text search. So let's take a look at how full-text search works with a simple example.

Index Some Data

First, we'll create a new index and index some documents using the bulk API:

DELETE /my_index 

PUT /my_index
{ "settings": { "number_of_shards": 1 }} 

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "The quick brown fox" }
{ "index": { "_id": 2 }}
{ "title": "The quick brown fox jumps over the lazy dog" }
{ "index": { "_id": 3 }}
{ "title": "The quick brown fox jumps over the quick dog" }
{ "index": { "_id": 4 }}
{ "title": "Brown fox brown dog" }

	DELETE /my_index: deletes my_index in case it already exists.
	number_of_shards: later, in Relevance Is Broken!, we explain why we created this index with only one primary shard.

A Single-Word Query

Our first example explains what happens when we use the match query to search within a full-text field for a single word:

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "QUICK!"
        }
    }
}

Elasticsearch executes the preceding match query as follows:

1. Check the field type.
   The title field is a full-text (analyzed) string field, which means that the query string should be analyzed too.
2. Analyze the query string.
   The query string QUICK! is passed through the standard analyzer, which results in the single term quick. Because we have just a single term, the match query can be executed as a single low-level term query.
3. Find matching docs.
   The term query looks up quick in the inverted index and retrieves the list of documents that contain that term: in this case, documents 1, 2, and 3.
4. Score each doc.
   The term query calculates the relevance _score for each matching document by combining the term frequency (how often quick appears in the title field of each document), the inverse document frequency (how often quick appears in the title field across all documents in the index; the more often, the lower the weight), and the length of each field (shorter fields are considered more relevant). See What Is Relevance?.
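
The analysis and lookup steps can be imitated outside Elasticsearch with a few lines of Python. This is only a toy sketch: the analyze function is a rough stand-in for the standard analyzer (it splits on non-alphanumeric characters and lowercases), and build_inverted_index and match_single_term are hypothetical helper names, not Elasticsearch APIs.

```python
import re
from collections import defaultdict

# Titles of the four documents indexed earlier (id -> title).
DOCS = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the
    real implementation): split on non-alphanumerics, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_inverted_index(docs):
    """Map each term to the set of document ids whose title contains it."""
    index = defaultdict(set)
    for doc_id, title in docs.items():
        for term in analyze(title):
            index[term].add(doc_id)
    return index


def match_single_term(index, query_string):
    """Analyze the query string and look its single term up in the index."""
    terms = analyze(query_string)
    if len(terms) != 1:
        raise ValueError("this sketch only handles single-term queries")
    return sorted(index.get(terms[0], set()))


index = build_inverted_index(DOCS)
query_terms = analyze("QUICK!")
matching_ids = match_single_term(index, "QUICK!")
print(query_terms, matching_ids)
```

The query string QUICK! and the indexed titles go through the same analysis, which is why the uppercase, punctuated query still finds the lowercase term quick in documents 1, 2, and 3.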

This process gives us the following (abbreviated) results:

"hits": [
 {
    "_id":      "1",
    "_score":   0.5, 
    "_source": {
       "title": "The quick brown fox"
    }
 },
 {
    "_id":      "3",
    "_score":   0.44194174, 
    "_source": {
       "title": "The quick brown fox jumps over the quick dog"
    }
 },
 {
    "_id":      "2",
    "_score":   0.3125, 
    "_source": {
       "title": "The quick brown fox jumps over the lazy dog"
    }
 }
]

	Document 1 is the most relevant because its `title` field is short, which means that `quick` represents a large portion of its content.
	Document 3 is more relevant than document 2 because `quick` appears twice.
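
The scores above can be reproduced by hand. The sketch below assumes the classic Lucene TF/IDF similarity that was the default in the Elasticsearch versions this guide covers (later versions default to BM25 and return different numbers). It also assumes that tf is the square root of the term frequency, idf is 1 + ln(numDocs / (docFreq + 1)), the field-length norm is 1 / sqrt(number of terms), and that the norm is stored with roughly three bits of mantissa. The lossy_norm and classic_score helpers are illustrative names, and lossy_norm only approximates Lucene's real byte encoding.

```python
import math

# Titles of the four documents indexed earlier (id -> title).
TITLES = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox brown dog",
}


def lossy_norm(value):
    """Truncate to 3 mantissa bits after the leading one: a simplified
    approximation of how the field-length norm loses precision when stored."""
    mantissa, exponent = math.frexp(value)  # value = mantissa * 2**exponent
    return math.ldexp(math.floor(mantissa * 16) / 16, exponent)


def classic_score(term, doc_id, titles):
    """Single-term score under the assumed classic TF/IDF formula.
    For one term, the query norm cancels one idf factor, leaving
    tf * idf * field_norm."""
    tokens = {i: title.lower().split() for i, title in titles.items()}
    freq = tokens[doc_id].count(term)
    if freq == 0:
        return 0.0
    doc_freq = sum(1 for toks in tokens.values() if term in toks)
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(len(titles) / (doc_freq + 1))
    field_norm = lossy_norm(1.0 / math.sqrt(len(tokens[doc_id])))
    return tf * idf * field_norm


scores = {doc_id: classic_score("quick", doc_id, TITLES) for doc_id in TITLES}
ranked = sorted(
    (doc_id for doc_id in scores if scores[doc_id] > 0),
    key=lambda doc_id: scores[doc_id],
    reverse=True,
)
for doc_id in ranked:
    print(doc_id, round(scores[doc_id], 8))
```

With four documents and quick present in three of them, idf works out to 1, so the ranking is decided by term frequency and field length alone: the four-term title of document 1 gets a norm of 0.5, while the nine-term titles of documents 2 and 3 get 0.3125, and document 3 gains a factor of sqrt(2) for containing quick twice.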