2-4-1. Phrase Matching

drscg 2017. 9. 24. 21:58

Just as the match query is the go-to query for standard full-text search, the match_phrase query is the one to reach for when you want to find words that are near each other:

GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": "quick brown fox"
        }
    }
}

Like the match query, the match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain every search term in the same positions relative to each other. A query for the phrase quick fox would not match any of our documents, because no document contains the word quick immediately followed by fox.
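The behavior described above can be approximated with a small Python sketch. This is only an illustration, not how Elasticsearch is implemented: the analyze function is a hypothetical stand-in for a real analyzer (it lowercases and splits on non-alphanumeric characters), and the sample title is invented for the example.

```python
import re

def analyze(text):
    """Toy stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def contains_phrase(field_text, phrase):
    """True if the analyzed phrase occurs as a contiguous run of terms
    in the analyzed field text."""
    doc_terms = analyze(field_text)
    phrase_terms = analyze(phrase)
    n = len(phrase_terms)
    if n == 0:
        return False
    # Slide a window of n terms over the document and compare.
    return any(doc_terms[i:i + n] == phrase_terms
               for i in range(len(doc_terms) - n + 1))

title = "The quick brown fox"  # hypothetical document title
print(contains_phrase(title, "quick brown fox"))  # True
print(contains_phrase(title, "quick fox"))        # False
```

The second call returns False because brown sits between quick and fox, which is the same reason the quick fox phrase query finds nothing.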

The match_phrase query can also be written as a match query with type phrase:

"match": {
    "title": {
        "query": "quick brown fox",
        "type":  "phrase"
    }
}

Term Positions

When a string is analyzed, the analyzer returns not only a list of terms, but also the position, or order, of each term in the original string:

GET /_analyze?analyzer=standard
Quick brown fox

This returns the following:

{
   "tokens": [
      {
         "token": "quick",
         "start_offset": 0,
         "end_offset": 5,
         "type": "<ALPHANUM>",
         "position": 1 
      },
      {
         "token": "brown",
         "start_offset": 6,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 2 
      },
      {
         "token": "fox",
         "start_offset": 12,
         "end_offset": 15,
         "type": "<ALPHANUM>",
         "position": 3 
      }
   ]
}

The position field gives the position of each term in the original string.

Positions can be stored in the inverted index, and position-aware queries like the match_phrase query can use them to match only documents that contain all the words in exactly the order specified, with no words in between.
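To make the idea of storing positions concrete, here is a minimal Python sketch of a positional inverted index. It is an assumption-laden toy: tokenization is a plain lowercase whitespace split, positions start at 1 only to mirror the output shown above, and the two documents are invented for the example.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}.

    Tokenization is a lowercase whitespace split, which is a
    simplification of what a real analyzer does.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Positions start at 1 to match the sample analyzer output.
        for position, term in enumerate(text.lower().split(), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index

# Hypothetical documents, keyed by document ID.
docs = {
    1: "Quick brown fox",
    2: "The fox is quick and the fox is brown",
}

index = build_positional_index(docs)
print(index["fox"])    # {1: [3], 2: [2, 7]}
print(index["quick"])  # {1: [1], 2: [4]}
```

With positions recorded per document, a query can tell that all three words occur in both documents but are adjacent and in order only in document 1.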

What Is a Phrase

For a document to be considered a match for the phrase "quick brown fox", the following must be true:

quick, brown, and fox must all appear in the field.
The position of brown must be exactly 1 greater than the position of quick.
The position of fox must be exactly 2 greater than the position of quick.

If any of these conditions is not met, the document is not considered a match.
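The three conditions can be written as a short check over term positions. The sketch below is illustrative only: matches_phrase and the positions dictionaries are hypothetical names, and each dictionary stands for the positions of the terms in one field of one document.

```python
def matches_phrase(positions, phrase_terms):
    """Check the phrase conditions for one field of one document.

    positions maps each term to the list of positions where it occurs.
    """
    # Condition 1: every term of the phrase must appear in the field.
    if not phrase_terms or any(t not in positions for t in phrase_terms):
        return False
    first, rest = phrase_terms[0], phrase_terms[1:]
    # Conditions 2 and 3: the k-th following term must sit exactly
    # k positions after some occurrence of the first term.
    return any(
        all(start + offset in positions[term]
            for offset, term in enumerate(rest, start=1))
        for start in positions[first]
    )

phrase = ["quick", "brown", "fox"]
doc_a = {"quick": [1], "brown": [2], "fox": [3]}  # "quick brown fox"
doc_b = {"quick": [1], "fox": [2], "brown": [3]}  # "quick fox brown"

print(matches_phrase(doc_a, phrase))  # True
print(matches_phrase(doc_b, phrase))  # False
```

doc_b contains all three terms, so it passes the first condition, but brown is 2 positions after quick rather than 1, so it is rejected.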

Internally, the match_phrase query uses the low-level span query family to do position-aware matching. Span queries are term-level queries, so they have no analysis phase; they search for the exact term specified.

Thankfully, most people never need to use the span queries directly, as the match_phrase query is usually good enough. However, certain specialized fields, like patent searches, use these low-level queries to perform very specific, carefully constructed positional searches.
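For readers curious what a hand-written span query looks like, the following Python sketch builds a span_near request body that expresses a similar constraint: the terms in order, with no gaps. It is not presented as what match_phrase generates internally. The helper name span_near_phrase is hypothetical, and because span queries skip analysis, the terms passed in are assumed to be already analyzed (lowercased here).

```python
import json

def span_near_phrase(field, terms):
    """Build a span_near request body matching the terms in order
    with no words between them.

    Span queries are not analyzed, so terms must already be in their
    indexed form (for example, lowercased).
    """
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": 0,         # no intervening positions allowed
                "in_order": True,  # terms must appear in the given order
            }
        }
    }

body = span_near_phrase("title", ["quick", "brown", "fox"])
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a _search request like the ones shown earlier.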
