2-3-04. Tuning Best Fields Queries

drscg 2017. 9. 30. 00:48

What would happen if the user had searched instead for "quick pets"? Both documents contain the word quick, but only document 2 contains the word pets. Neither document contains both words in the same field.

A simple dis_max query like the following would choose the single best-matching field and ignore the other:

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ]
        }
    }
}

{
  "hits": [
     {
        "_id": "1",
        "_score": 0.12713557, 
        "_source": {
           "title": "Quick brown rabbits",
           "body": "Brown rabbits are commonly seen."
        }
     },
     {
        "_id": "2",
        "_score": 0.12713557, 
        "_source": {
           "title": "Keeping pets healthy",
           "body": "My quick brown fox eats rabbits on a regular basis."
        }
     }
   ]
}

Note that the scores are exactly the same.

We would probably expect documents that match on both the title field and the body field to rank higher than documents that match on just one field, but this isn't the case. Remember: the dis_max query simply uses the _score from the single best-matching clause.
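
To make the tie concrete, here is a minimal Python sketch of how a plain dis_max combines clause scores. The per-clause scores are made-up illustrative values, not numbers produced by Elasticsearch, and `plain_dis_max_score` is a hypothetical helper, not part of any client library.

```python
def plain_dis_max_score(clause_scores):
    """Plain dis_max: keep only the score of the best-matching clause."""
    matching = [score for score in clause_scores if score > 0]
    return max(matching, default=0.0)


# Hypothetical per-clause scores as (title, body); 0.0 means the clause did not match.
doc1_scores = (0.5, 0.0)  # "quick" matches in the title only
doc2_scores = (0.5, 0.2)  # "pets" matches in the title, "quick" in the body

# The weaker body match of document 2 is discarded, so the documents tie.
print(plain_dis_max_score(doc1_scores))
print(plain_dis_max_score(doc2_scores))
```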

tie_breaker

It is possible, however, to also take the _score from the other matching clauses into account by specifying the tie_breaker parameter:

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ],
            "tie_breaker": 0.3
        }
    }
}

This gives us the following results:

{
  "hits": [
     {
        "_id": "2",
        "_score": 0.14757764, 
        "_source": {
           "title": "Keeping pets healthy",
           "body": "My quick brown fox eats rabbits on a regular basis."
        }
     },
     {
        "_id": "1",
        "_score": 0.124275915, 
        "_source": {
           "title": "Quick brown rabbits",
           "body": "Brown rabbits are commonly seen."
        }
     }
   ]
}

Document 2 is now slightly ahead of document 1.

The tie_breaker parameter makes the dis_max query behave more like a halfway house between dis_max and bool. It changes the score calculation as follows:

1. Take the _score of the best-matching clause.
2. Multiply the score of each of the other matching clauses by the tie_breaker.
3. Add them all together and normalize.
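
The steps above can be sketched in a few lines of Python. This is only an illustration: the clause scores are made-up values, `dis_max_with_tie_breaker` is a hypothetical helper, and the sketch leaves out the normalization mentioned in the last step, so it will not reproduce the exact scores shown in the results above.

```python
def dis_max_with_tie_breaker(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the sum of the other matching clauses."""
    matching = [score for score in clause_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    others = sum(matching) - best  # every matching clause except the best one
    return best + tie_breaker * others


# Hypothetical per-clause scores as (title, body); 0.0 means the clause did not match.
doc1_scores = (0.5, 0.0)  # one matching clause
doc2_scores = (0.5, 0.2)  # two matching clauses

for tie_breaker in (0.0, 0.3, 1.0):
    print(
        tie_breaker,
        dis_max_with_tie_breaker(doc1_scores, tie_breaker),
        dis_max_with_tie_breaker(doc2_scores, tie_breaker),
    )
```

With a tie_breaker of 0 the two documents tie, as in a plain dis_max. Any positive value lets the second matching clause of document 2 lift it above document 1, and a value of 1 simply adds the clause scores together.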

With the tie_breaker, all matching clauses count, but the best-matching clause counts most.

The tie_breaker can be a floating-point value between 0 and 1, where 0 uses just the best-matching clause and 1 counts all matching clauses equally. The exact value can be tuned based on your data and queries, but a reasonable value should be close to zero (for example, 0.1 to 0.4), so that it does not overwhelm the best-matching nature of dis_max.
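
When tuning, it can help to generate the request body from code so that the tie_breaker is the only thing that changes between runs. The following Python sketch builds the same dis_max body as the JSON above. `build_dis_max_query` is a hypothetical helper, and its range check simply mirrors the 0 to 1 range described here rather than any validation performed by Elasticsearch.

```python
import json


def build_dis_max_query(text, fields, tie_breaker=0.0):
    """Build a dis_max request body with one match clause per field."""
    # Assumption of this sketch: reject values outside the range described in the text.
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker should be between 0 and 1")
    return {
        "query": {
            "dis_max": {
                "queries": [{"match": {field: text}} for field in fields],
                "tie_breaker": tie_breaker,
            }
        }
    }


body = build_dis_max_query("Quick pets", ["title", "body"], tie_breaker=0.3)
print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the body of a search request. Trying a few values in the 0.1 to 0.4 range against real queries is one way to settle on a setting.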
