1-09-3. Search Options

drscg 2017. 9. 30. 17:54

A few optional query-string parameters can influence the search process.

preference

The preference parameter allows you to control which shards or nodes are used to handle the search request. It accepts values such as _primary, _primary_first, _local, _only_node:xyz, _prefer_node:xyz, and _shards:2,3, which are explained in detail on the search preference documentation page.

However, the most generally useful value is an arbitrary string, which avoids the bouncing results problem.

Bouncing Results

Imagine that you are sorting your results by a timestamp field, and two documents have the same timestamp. Because search requests are distributed round-robin among all available shard copies, these two documents may be returned in one order when the request is served by the primary, and in another order when it is served by the replica.

This is known as the bouncing results problem: every time the user refreshes the page, the results appear in a different order. The problem can be avoided by always using the same shards for the same user, which can be done by setting the preference parameter to an arbitrary string such as the user's session ID.
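The effect can be illustrated with a small toy model. This is only a sketch and not how Elasticsearch is implemented: the two in-memory "copies", the pick_copy helper, and the use of a CRC32 hash to map a preference string to a copy are all assumptions made for illustration.

```python
import itertools
import zlib

# Two copies of the same shard. Both hold the same documents, but the two
# documents that share a timestamp sit in a different internal order.
PRIMARY = [("doc_a", 100), ("doc_b", 100), ("doc_c", 90)]
REPLICA = [("doc_b", 100), ("doc_a", 100), ("doc_c", 90)]
COPIES = [PRIMARY, REPLICA]

_round_robin = itertools.cycle(range(len(COPIES)))


def pick_copy(preference=None):
    """Choose a copy: round-robin by default, stable for a given preference."""
    if preference is None:
        return next(_round_robin)
    # Hypothetical stand-in for mapping a preference string to a shard copy.
    return zlib.crc32(preference.encode("utf-8")) % len(COPIES)


def search(preference=None):
    """Return document IDs sorted by timestamp, newest first."""
    copy = COPIES[pick_copy(preference)]
    # sorted() is stable, so tied documents keep the order of the chosen copy.
    ranked = sorted(copy, key=lambda doc: doc[1], reverse=True)
    return [doc_id for doc_id, _ in ranked]


# Without a preference, consecutive requests alternate between the copies,
# so doc_a and doc_b swap places on every refresh.
without_preference = [search() for _ in range(4)]

# With a fixed preference string (for example a session ID), every request
# goes to the same copy and the order stays put.
with_preference = [search(preference="session-abc123") for _ in range(4)]
```

Against a real cluster the same idea would amount to sending the identical preference value with every request from one user, for example by appending preference=<session ID> to the query string.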

timeout

By default, shards process all the data they have before returning a response to the coordinating node, which in turn merges these responses to build the final response.

This means that the time it takes to run a search request is the sum of the time the slowest shard takes to process its data and the time it takes to merge the responses. If one node is having trouble, it can slow down the response to all search requests.
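As a rough worked example of that sum, the sketch below uses made-up per-shard timings in milliseconds. The function name and the numbers are illustrative assumptions, not measurements.

```python
def search_latency_ms(shard_times_ms, merge_time_ms):
    """Shards run in parallel, so the slowest shard sets the pace;
    merging on the coordinating node is then added on top."""
    return max(shard_times_ms) + merge_time_ms


# Three healthy shards: the slowest one takes 15 ms.
healthy = search_latency_ms([12, 15, 11], merge_time_ms=3)

# The same request when one node is struggling: a single slow shard
# dominates the total, no matter how fast the others are.
degraded = search_latency_ms([12, 15, 900], merge_time_ms=3)

print(healthy, degraded)  # 18 903
```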

The timeout parameter tells shards how long they are allowed to process data before returning a response to the coordinating node. If there is not enough time to process all the data, the results for that shard will be partial, or possibly even empty.

The response to a search request indicates whether any shards returned a partial response with the timed_out property:

    ...
    "timed_out":     true,  
    ...

The search request timed out.
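A client would typically pass the timeout as a query-string parameter and then check timed_out before trusting the result set. The following is a minimal sketch using only the standard library. The helper names and the sample response body are assumptions for illustration, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def build_search_path(timeout):
    """Build a search path carrying the timeout query-string parameter."""
    return "/_search?" + urlencode({"timeout": timeout})


def is_partial(response_body):
    """Return True if the response reports that at least one shard timed out."""
    return bool(json.loads(response_body).get("timed_out", False))


path = build_search_path("10ms")

# Hypothetical response body, trimmed to the fields relevant here.
sample_response = '{"took": 10, "timed_out": true, "hits": {"total": 0, "hits": []}}'

if is_partial(sample_response):
    print("Results may be incomplete for", path)
```

Treating a timed-out response as "possibly incomplete" rather than as an error keeps whatever hits did arrive usable.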

It's important to know that the timeout is still a best-effort operation; it's possible for the query to run longer than the allotted timeout. There are two reasons for this behavior:

1. Timeout checks are performed on a per-document basis. However, some query types have a significant amount of work that must be performed before documents are evaluated. This "setup" phase does not consult the timeout, so a very long setup time can push the overall latency well past the timeout.

2. Because the timeout is checked once per document, a very long query can execute on a single document and will not time out until the next document is evaluated. This also means that poorly written scripts (for example, ones with infinite loops) will be allowed to execute forever.
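The second point can be made concrete with a simulated shard that only looks at the clock between documents. This is a toy model with invented per-document costs, not Elasticsearch's actual execution loop, and run_query is a hypothetical name.

```python
def run_query(doc_costs_ms, timeout_ms):
    """Simulate a shard that checks its timeout once before each document.

    Returns (documents_processed, elapsed_ms, timed_out).
    """
    elapsed_ms = 0
    processed = 0
    for cost_ms in doc_costs_ms:
        # The only place the timeout is consulted: between documents.
        if elapsed_ms >= timeout_ms:
            return processed, elapsed_ms, True
        # Work on a single document cannot be interrupted in this model.
        elapsed_ms += cost_ms
        processed += 1
    return processed, elapsed_ms, False


# The second document is very expensive (500 ms). The 20 ms timeout is only
# noticed afterwards, so the shard overshoots by a wide margin.
overshoot = run_query([5, 500, 5, 5], timeout_ms=20)

# With uniformly cheap documents the timeout is never reached.
on_time = run_query([5, 5, 5], timeout_ms=20)

print(overshoot, on_time)  # (2, 505, True) (3, 15, False)
```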

routing

In Routing a Document to a Shard, we explained how a custom routing parameter can be provided at index time to ensure that all related documents, such as the documents belonging to a single user, are stored on a single shard. At search time, instead of searching all the shards of an index, you can specify one or more routing values to limit the search to just those shards:

GET /_search?routing=user_1,user2

This technique comes in handy when designing very large search systems, and we discuss it in detail in Designing for Scale.
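To see why routing values narrow the search, recall that a routing value is hashed and taken modulo the number of primary shards to pick a shard. The sketch below mimics that with CRC32 as a stand-in hash, which is an assumption (Elasticsearch uses its own hash function, so real shard numbers will differ). The function names and the shard count of 5 are likewise illustrative.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """Map a routing value to a shard number: hash(routing) % shard count.

    CRC32 is only a stand-in for the real hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def shards_to_search(routing_values, number_of_primary_shards):
    """Return the distinct shards a search with these routing values touches."""
    return sorted({shard_for(v, number_of_primary_shards) for v in routing_values})


# Two routing values can hit at most two of the five shards,
# instead of the search fanning out to all of them.
targets = shards_to_search(["user_1", "user2"], number_of_primary_shards=5)
print(targets)
```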

search_type

The default search type is query_then_fetch. In some cases, you might want to explicitly set the search_type to dfs_query_then_fetch to improve the accuracy of relevance scoring:

GET /_search?search_type=dfs_query_then_fetch

The dfs_query_then_fetch search type has a prequery phase that fetches the term frequencies from all involved shards to calculate global term frequencies. We discuss this further in Relevance Is Broken!.
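The idea behind the prequery phase can be sketched as a simple aggregation: each shard reports its local term frequencies, and these are summed into global figures before scoring. The per-shard numbers and the function name below are invented for illustration, and the real scoring formula is not shown.

```python
from collections import Counter

# Hypothetical local term frequencies reported by two shards. Each shard,
# looking only at its own data, has a skewed view of how common a term is.
shard_term_frequencies = [
    Counter({"elasticsearch": 40, "shard": 3}),
    Counter({"elasticsearch": 2, "shard": 35}),
]


def global_term_frequencies(per_shard):
    """Sum the local frequencies from every involved shard."""
    total = Counter()
    for frequencies in per_shard:
        total.update(frequencies)
    return total


global_tf = global_term_frequencies(shard_term_frequencies)
print(global_tf["elasticsearch"], global_tf["shard"])  # 42 38
```

With the summed figures, both terms look about equally common, which neither shard would conclude from its local counts alone.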
