1-01-10. Full-Text Search

drscg 2017. 10. 1. 12:11

The searches so far have been simple: single names, filtered by age. Let's try a more advanced full-text search, a task that traditional databases would really struggle with.

We are going to search for all employees who enjoy rock climbing:

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}

You can see that we use the same match query as before to search the about field for "rock climbing". We get back two matching documents:

{
   ...
   "hits": {
      "total":      2,
      "max_score":  0.16273327,
      "hits": [
         {
            ...
            "_score":         0.16273327, 
            "_source": {
               "first_name":  "John",
               "last_name":   "Smith",
               "age":         25,
               "about":       "I love to go rock climbing",
               "interests": [ "sports", "music" ]
            }
         },
         {
            ...
            "_score":         0.016878016, 
            "_source": {
               "first_name":  "Jane",
               "last_name":   "Smith",
               "age":         32,
               "about":       "I like to collect rock albums",
               "interests": [ "music" ]
            }
         }
      ]
   }
}
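The same request and response can also be handled outside Sense. Below is a minimal sketch using only Python's standard library. It assumes a node listening on http://localhost:9200 and the megacorp data indexed earlier in the guide. The /index/type/_search path follows the 2.x layout used here. build_search_request and summarize_hits are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node on the default HTTP port.
ES_URL = "http://localhost:9200"


def build_search_request(index, doc_type, query, base_url=ES_URL):
    """Build (but do not send) a GET _search request with a JSON query body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def summarize_hits(response):
    """Reduce a search response to (full name, _score) pairs, in returned order."""
    return [
        (f'{hit["_source"]["first_name"]} {hit["_source"]["last_name"]}', hit["_score"])
        for hit in response["hits"]["hits"]
    ]


# Usage against a live node (not executed here):
#   req = build_search_request("megacorp", "employee",
#                              {"match": {"about": "rock climbing"}})
#   with urllib.request.urlopen(req) as resp:
#       print(summarize_hits(json.load(resp)))
```

Applied to the response shown above, summarize_hits would yield John Smith with 0.16273327 followed by Jane Smith with 0.016878016.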

The relevance scores

By default, Elasticsearch sorts matching results by their relevance score, that is, by how well each document matches the query. The first and highest-scoring result is obvious: John Smith's about field clearly contains "rock climbing".

But why did Jane Smith come back as a result? Her document was returned because the word "rock" appears in her about field. Because her field mentions only "rock" and not "climbing", her _score is lower than John's.
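Jane's result follows from how a match query works. The query string is analyzed into terms ("rock", "climbing"), and by default a document matches if it contains any of them. The sketch below imitates this in plain Python. It uses a deliberately simplified analyzer and a toy TF-IDF score. The function names are hypothetical. The scoring is not Lucene's actual formula, so it does not reproduce the _score values above, only their ordering.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: the two "about" fields from the response above.
DOCS = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
}


def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_or(query, docs):
    """A doc matches if it contains at least one query term (OR semantics)."""
    terms = set(analyze(query))
    return [doc_id for doc_id, text in docs.items() if terms & set(analyze(text))]


def toy_score(query, doc_id, docs):
    """Simplified TF-IDF: sum of sqrt(tf) * idf over query terms found in the doc.

    NOT Lucene's exact scoring formula; it only illustrates the idea that
    matching more (and rarer) query terms yields a higher score.
    """
    n = len(docs)
    tf = Counter(analyze(docs[doc_id]))
    total = 0.0
    for term in set(analyze(query)):
        if tf[term] == 0:
            continue
        df = sum(1 for text in docs.values() if term in analyze(text))
        idf = 1 + math.log(n / (df + 1))  # rarer terms weigh more
        total += math.sqrt(tf[term]) * idf
    return total


def search(query, docs):
    """Return matching doc ids, highest toy score first."""
    hits = match_or(query, docs)
    return sorted(hits, key=lambda d: toy_score(query, d, docs), reverse=True)


if __name__ == "__main__":
    for doc_id in search("rock climbing", DOCS):
        print(doc_id, round(toy_score("rock climbing", doc_id, DOCS), 3))
```

Both documents match because both contain "rock". John ranks first because his document also contains "climbing", which occurs in only one document and therefore carries more weight than "rock".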

This is a good example of how Elasticsearch can search within full-text fields and return the most relevant results first. This concept of relevance is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases, in which a record either matches or it doesn’t.
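To make the contrast concrete, here is a minimal sketch in Python. A plain substring test stands in for a relational `WHERE about LIKE '%...%'` clause, which answers only yes or no. A crude graded score, the fraction of query words present, ranks partial matches below full ones. The rows and helper names are hypothetical, and the graded score only illustrates the idea of relevance. It is not how Elasticsearch computes _score.

```python
# Hypothetical rows mirroring the two documents above.
ROWS = [
    {"first_name": "John", "about": "I love to go rock climbing"},
    {"first_name": "Jane", "about": "I like to collect rock albums"},
]


def like_filter(rows, fragment):
    """Relational-style yes/no test, roughly like about LIKE '%fragment%'."""
    return [row["first_name"] for row in rows if fragment in row["about"]]


def fraction_matched(query, text):
    """Graded signal: share of query words present in the text (0.0 to 1.0)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def graded(rows, query):
    """Every row with some overlap, best match first."""
    scored = [(row["first_name"], fraction_matched(query, row["about"])) for row in rows]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
```

`like_filter(ROWS, "rock climbing")` returns only John and drops Jane entirely. `like_filter(ROWS, "rock")` returns both in storage order, with no indication of which fits better. `graded(ROWS, "rock climbing")` keeps both but places John (1.0) ahead of Jane (0.5).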
