6-3-7. Practical Considerations

drscg 2017. 9. 23. 13:04

Parent-child joins can be a useful technique for managing relationships when index-time performance matters more than search-time performance, but they come at a significant cost: parent-child queries can be 5 to 10 times slower than the equivalent nested query.

Global Ordinals and Latency

Parent-child uses global ordinals to speed up joins. Whether the parent-child map uses an in-memory cache or on-disk doc values, global ordinals still need to be rebuilt after any change to the index.

The more parents in a shard, the longer global ordinals take to build. Parent-child is best suited to situations where each parent has many children, rather than situations with many parents and few children.

By default, global ordinals are built lazily: the first parent-child query or aggregation after a refresh triggers the build, which can introduce a significant latency spike for your users. You can use eager_global_ordinals to shift the cost of building global ordinals from query time to refresh time, by mapping the _parent field as follows:

PUT /company
{
  "mappings": {
    "branch": {},
    "employee": {
      "_parent": {
        "type": "branch",
        "fielddata": {
          "loading": "eager_global_ordinals" 
        }
      }
    }
  }
}

With the eager_global_ordinals setting, global ordinals for the _parent field are built before a new segment becomes visible to search.

With many parents, global ordinals can take several seconds to build. In this case, it makes sense to increase the refresh_interval so that refreshes happen less often and global ordinals remain valid for longer. This greatly reduces the CPU cost of rebuilding global ordinals every second.
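As a minimal sketch of the refresh_interval advice, the Python below builds the body for an index settings update and estimates the upper bound on global ordinal rebuilds per minute. The index name `company` comes from the mapping example; the 30s interval and both helper function names are assumptions chosen for illustration, and the rebuild count is only a ceiling, since a rebuild is needed only when the index has actually changed.

```python
import json


def refresh_interval_settings(interval: str) -> dict:
    """Body for an index settings update that changes refresh_interval."""
    return {"index": {"refresh_interval": interval}}


def max_rebuilds_per_minute(interval_seconds: float) -> float:
    """Upper bound on rebuilds per minute: at most one per refresh."""
    return 60 / interval_seconds


# Hypothetical value: refresh every 30 seconds instead of the 1s default.
body = refresh_interval_settings("30s")

# The body would be sent as: PUT /company/_settings
print(json.dumps(body, indent=2))

# Compare the ceiling at the default interval with the ceiling at 30 seconds.
print(max_rebuilds_per_minute(1), "vs", max_rebuilds_per_minute(30))
```

The right interval depends on how stale your search results are allowed to be, so treat 30s as a starting point to measure against rather than a recommendation.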

Multigenerations and Concluding Thoughts

The ability to join multiple generations (see Grandparents and Grandchildren) sounds attractive until you think of the costs involved:

The more joins you have, the worse performance will be.
Each generation of parents needs to have its string _id fields stored in memory, which can consume a lot of RAM.
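To get a feel for the RAM cost, here is a rough back-of-the-envelope sketch in Python. It only multiplies the number of parents by the length of their IDs, per generation. The generation names, document counts, and ID lengths are hypothetical, and the result ignores compression and data-structure overhead, so it is an order-of-magnitude figure rather than a measurement of what Elasticsearch actually allocates.

```python
def raw_id_bytes(num_parents: int, id_length: int) -> int:
    """Uncompressed bytes needed to hold one ASCII _id string per parent."""
    return num_parents * id_length


def total_raw_id_bytes(generations: dict) -> int:
    """Sum the raw _id bytes over every generation that acts as a parent."""
    return sum(raw_id_bytes(count, length) for count, length in generations.values())


# Hypothetical three-level hierarchy: country -> branch -> employee.
# Only the two parent generations contribute: (number of docs, _id length).
long_ids = {"country": (200, 36), "branch": (10_000_000, 36)}   # UUID-style IDs
short_ids = {"country": (200, 2), "branch": (10_000_000, 8)}    # short IDs

for label, generations in (("long", long_ids), ("short", short_ids)):
    mib = total_raw_id_bytes(generations) / 2**20
    print(f"{label} IDs: about {mib:.0f} MiB of raw _id strings")
```

Even under these simplified assumptions, the gap between the two layouts shows why every extra generation and every extra ID character adds up.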

As you consider your relationship schemes and whether parent-child is right for you, keep this advice about parent-child relationships in mind:

Use parent-child relationships sparingly, and only when there are many more children than parents.
Avoid using multiple parent-child joins in a single query.
Avoid scoring by using the has_child filter, or the has_child query with score_mode set to none.
Keep the parent IDs short, so that they compress better in doc values and use less memory when transiently loaded.
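The advice about avoiding scoring can be sketched as a search body. The Python below builds a has_child query with score_mode set to none. The `branch` and `employee` types come from the mapping example; the `name` field, the search text, and the helper function name are assumptions made for illustration.

```python
import json


def has_child_no_score(child_type: str, child_query: dict) -> dict:
    """Search body that matches parents by their children without scoring."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "score_mode": "none",  # skip aggregating child scores
                "query": child_query,
            }
        }
    }


# Hypothetical inner query: branches that employ someone with this name.
body = has_child_no_score("employee", {"match": {"name": "Alice Smith"}})

# The body would be sent as: GET /company/branch/_search
print(json.dumps(body, indent=2))
```

With score_mode set to none, every matching parent is treated the same regardless of how well its children match, which is usually acceptable when the join is only used as a filter.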

Above all: think about the other relationship techniques that we have discussed in this chapter before reaching for parent-child.
