2-6-13. Scoring with Scripts

drscg 2017. 9. 24. 19:08

Finally, if none of the built-in function_score functions suffice, you can implement the logic you need as a script, using the script_score function.

For example, let's say that we want to factor our profit margin into the relevance calculation. In our business, the profit margin depends on three factors:

The price per night of the vacation home.
The user's membership level: some levels get a percentage discount when the price per night is at or above a certain threshold.
The negotiated margin as a percentage of the price per night, after user discounts.

The algorithm that we will use to calculate the profit for each home is as follows:

if (price < threshold) {
  profit = price * margin
} else {
  profit = price * (1 - discount) * margin
}

We probably don't want to use the absolute profit as a score; it would overwhelm other factors such as location, popularity, and features. Instead, we can express the profit as a fraction of our target profit. A profit above our target produces a score greater than 1.0, and a profit below our target produces a score less than 1.0:

if (price < threshold) {
  profit = price * margin
} else {
  profit = price * (1 - discount) * margin
}
return profit / target
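
As a quick sanity check of the formula above, here is a minimal Python sketch. The function name `profit_score` and the sample figures (a threshold of 80, a 10% discount, a 12% margin, and a target profit of 10) are illustrative assumptions, not values taken from a real listing.

```python
def profit_score(price, margin, threshold, discount, target):
    """Return the profit as a fraction of the target profit."""
    if price < threshold:
        # Below the threshold: no membership discount applies.
        profit = price * margin
    else:
        # At or above the threshold: the discount reduces the price first.
        profit = price * (1 - discount) * margin
    return profit / target


# Hypothetical homes, both with a 12% margin.
cheap = profit_score(price=60, margin=0.12, threshold=80, discount=0.1, target=10)
pricey = profit_score(price=100, margin=0.12, threshold=80, discount=0.1, target=10)
print(round(cheap, 2), round(pricey, 2))  # 0.72 1.08
```

The cheaper home earns less than the target, so its score falls below 1.0, while the more expensive home clears the target even after the discount.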

In Elasticsearch 2.x, the default scripting language is Groovy, which for the most part looks a lot like JavaScript. The preceding algorithm as a Groovy script would look like this:

price  = doc['price'].value 
margin = doc['margin'].value 

if (price < threshold) { 
  return price * margin / target
}
return price * (1 - discount) * margin / target

	The `price` and `margin` variables are read from the `price` and `margin` fields of the document.
	The `threshold`, `discount`, and `target` variables are passed in as `params`.

Finally, we can add our script_score function to the list of other functions that we are already using:

GET /_search
{
  "query": {
    "function_score": {
      "functions": [
        { ...location clause... },
        { ...price clause... },
        {
          "script_score": {
            "params": {
              "threshold": 80,
              "discount": 0.1,
              "target": 10
            },
            "script": "price = doc['price'].value; margin = doc['margin'].value; if (price < threshold) { return price * margin / target }; return price * (1 - discount) * margin / target;"
          }
        }
      ]
    }
  }
}

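	The `location` and `price` clauses refer to the example explained in The Closer, The Better.
	By passing these variables as `params`, we can change their values every time we run this query without having to recompile the script.
	JSON cannot include embedded newline characters. Newline characters in the script must either be escaped as `\n` or replaced with semicolons.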
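
One way to avoid escaping the script by hand is to let a JSON library do it. The following is a minimal sketch using Python's standard `json` module; it builds only the script_score function (the location and price clauses are left out), and the variable names are assumptions made for illustration.

```python
import json

# Multi-line Groovy source, kept readable in the client code.
script = """price = doc['price'].value
margin = doc['margin'].value
if (price < threshold) {
  return price * margin / target
}
return price * (1 - discount) * margin / target"""

# Only the script_score function is built here; the other functions
# from the query are intentionally omitted from this sketch.
script_score_function = {
    "script_score": {
        "params": {"threshold": 80, "discount": 0.1, "target": 10},
        "script": script,
    }
}

# json.dumps escapes each newline inside the string as the two characters \n.
body = json.dumps(script_score_function)
print(body)
```

The serialized body contains no raw newline characters, and parsing it back yields the original multi-line script unchanged.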

This query would return the documents that best satisfy the user's requirements for location and price, while still factoring in our need to make a profit.

The script_score function provides enormous flexibility. Within a script, you have access to the fields of the document, to the current _score, and even to the term frequencies, inverse document frequencies, and field length norms (see Text scoring in scripts).

That said, scripts can have a performance impact. If you find that your scripts are not fast enough, you have three options:

Try to precalculate as much information as possible and include it in each document.
Groovy is fast, but not quite as fast as Java. You could reimplement your script as a native script written in Java (see Native Java Scripts).
Use the rescore functionality described in Rescoring Results to apply your script to only the best-scoring documents.
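
The first option can be sketched as follows. The part of the profit formula that does not depend on the user, price * margin, is computed once before indexing, so the script only has to apply the discount and divide by the target. The field name `base_profit`, the helper `with_base_profit`, and the sample documents are hypothetical.

```python
def with_base_profit(doc):
    """Return a copy of the document with a precomputed base_profit field."""
    enriched = dict(doc)
    # price * margin does not depend on the user, so it can be stored once.
    enriched["base_profit"] = doc["price"] * doc["margin"]
    return enriched


# Hypothetical documents about to be indexed.
homes = [
    {"name": "Lakeside cabin", "price": 60, "margin": 0.25},
    {"name": "Beach house", "price": 120, "margin": 0.5},
]
to_index = [with_base_profit(home) for home in homes]
print(to_index[0]["base_profit"], to_index[1]["base_profit"])  # 15.0 60.0
```

The script would then read the stored value instead of multiplying two fields for every document; whether that saves enough time to matter depends on the workload.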
