2.X/4. Aggregations 38

4-07. Sorting Multivalue Buckets

Multivalue buckets (terms, histogram, and date_histogram) dynamically produce many buckets. How does Elasticsearch decide the order in which these buckets are presented to the user? By default, buckets are ordered by doc_count in descending order. This is a good default because oft..

2.X/4. Aggregations 2017.09.23

4-07-1. Intrinsic Sorts

These sort modes are intrinsic to the bucket: they operate on data that the bucket generates, such as doc_count. They share the same syntax but differ slightly depending on the bucket being used. Let's perform a terms aggregation but sort by doc_count, in ascending order: ..
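
As a rough sketch of the request this excerpt describes, the snippet below builds a terms aggregation body ordered by ascending doc_count. `_count` is the order key Elasticsearch uses for doc_count; the aggregation name `colors` and the field name `color` are assumptions made for illustration, not taken from the excerpt.

```python
import json


def terms_sorted_by_count(field, direction="asc"):
    """Build a search body with a terms aggregation ordered by doc_count."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "colors": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "order": {"_count": direction},
                }
            }
        },
    }


body = terms_sorted_by_count("color")  # "color" is an assumed field name
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request with any HTTP client.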

2.X/4. Aggregations 2017.09.23

4-07-2. Sorting by a Metric

Often, you'll find yourself wanting to sort based on a metric's calculated value. For our car sales analytics dashboard, we may want to build a bar chart of sales by car color, but order the bars by the average price, ascending. We can do this by adding a metric to our bucket, and ..
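
A minimal sketch of that idea, assuming the documents have `color` and `price` fields: the bucket gets a child `avg` metric, and the bucket's `order` refers to that metric by name. The names `colors` and `avg_price` are illustrative choices.

```python
import json


def terms_sorted_by_metric(bucket_field, metric_field, direction="asc"):
    """Build a terms aggregation whose buckets are ordered by a child avg metric."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "colors": {  # hypothetical bucket name
                "terms": {
                    "field": bucket_field,
                    # order by the metric defined just below, referenced by name
                    "order": {"avg_price": direction},
                },
                "aggs": {
                    "avg_price": {"avg": {"field": metric_field}},
                },
            }
        },
    }


body = terms_sorted_by_metric("color", "price")  # assumed field names
print(json.dumps(body, indent=2))
```

The order key has to match the name of the metric nested under the bucket; here both are `avg_price`.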

2.X/4. Aggregations 2017.09.23

4-07-3. Sorting Based on "Deep" Metrics

In the prior examples, the metric was a direct child of the bucket. An average price was calculated for each term. It is possible to sort on deeper metrics, which are grandchildren or great-grandchildren of the bucket, with some limitations. You can define a path ..
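
The excerpt is cut off before it shows the path syntax, so the following is only a sketch under assumptions. In Elasticsearch, an order path joins nested aggregation names with `>` and selects one value of a multi-value metric with `.`. The helper `agg_path`, the aggregation names, and the `color` and `price` fields are hypothetical.

```python
import json


def agg_path(bucket_names, metric, value=None):
    """Join nested aggregation names with '>' and append '.value' if given."""
    path = ">".join(list(bucket_names) + [metric])
    return f"{path}.{value}" if value else path


# Sort histogram buckets by the price variance of red and green cars.
order_key = agg_path(["red_green_cars"], "stats", "variance")

body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        "colors": {
            "histogram": {
                "field": "price",
                "interval": 20000,
                "order": {order_key: "asc"},
            },
            "aggs": {
                # a filter is a single-bucket aggregation, so the path can pass through it
                "red_green_cars": {
                    "filter": {"terms": {"color": ["red", "green"]}},
                    "aggs": {"stats": {"extended_stats": {"field": "price"}}},
                }
            },
        }
    },
}

print(json.dumps(body, indent=2))
```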

2.X/4. Aggregations 2017.09.23

4-08. Approximate Aggregations

Life is easy if all your data fits on a single machine. Classic algorithms taught in CS201 will be sufficient for all your needs. But if all your data fit on a single machine, there would be no need for distributed software like Elasticsearch at all. Once you start distributing data, algorithms must be chosen carefully. ..

2.X/4. Aggregations 2017.09.23

4-08-1. Finding Distinct Counts

The first approximate aggregation provided by Elasticsearch is the cardinality metric. This provides the cardinality of a field, also called a distinct or unique count. You may be familiar with the SQL version: SELECT COUNT(DIS..
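
To make the idea concrete, here is a small sketch: an exact distinct count over an in-memory list, next to the request body for the cardinality metric. The sample values, the aggregation name `distinct_colors`, and the field `color` are assumptions for illustration.

```python
import json


def exact_distinct_count(values):
    """Exact distinct count; only practical when all values fit in memory."""
    return len(set(values))


def cardinality_body(field):
    """Build a search body that asks Elasticsearch for an approximate distinct count."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "distinct_colors": {  # hypothetical aggregation name
                "cardinality": {"field": field},
            }
        },
    }


sample = ["red", "blue", "red", "green", "blue"]  # made-up sample data
print(exact_distinct_count(sample))
print(json.dumps(cardinality_body("color"), indent=2))
```

The set-based count is exact but needs every value in one place; the cardinality metric trades that exactness for a result that can be computed across distributed data.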

2.X/4. Aggregations 2017.09.23

4-08-2. Calculating Percentiles

The other approximate metric offered by Elasticsearch is the percentiles metric. Percentiles show the point at which a certain percentage of observed values occur. For example, the 95th percentile is the value that is greater than 95% of the data. ..
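
The definition can be illustrated with an exact nearest-rank percentile on a small list, alongside a request body for the percentiles metric. This is a sketch only: nearest-rank is one common exact definition and is not how Elasticsearch computes its approximate result, and the field name `latency` and aggregation name `load_times` are assumed.

```python
import json


def nearest_rank_percentile(values, percent):
    """Exact percentile by the nearest-rank method (percent is an int, 1..100)."""
    ordered = sorted(values)
    # ceil(percent * n / 100) using integer arithmetic to avoid float rounding
    rank = max((percent * len(ordered) + 99) // 100, 1)
    return ordered[rank - 1]


def percentiles_body(field, percents):
    """Build a search body that requests the given percentiles of a field."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "load_times": {  # hypothetical aggregation name
                "percentiles": {"field": field, "percents": list(percents)},
            }
        },
    }


data = list(range(1, 21))  # made-up sample: the integers 1..20
print(nearest_rank_percentile(data, 95))
print(json.dumps(percentiles_body("latency", [50, 95, 99]), indent=2))
```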

2.X/4. Aggregations 2017.09.23

4-09. Significant Terms

The significant_terms (SigTerms) aggregation is rather different from the rest of the aggregations. All the aggregations we have seen so far are essentially simple math operations. By combining the various building blocks, you can build sophisticated aggregations and reports about your data. ..

2.X/4. Aggregations 2017.09.23

4-09-1. significant_terms Demo

Because the significant_terms aggregation works by analyzing statistics, you need a certain minimum amount of data for it to become effective. That means we won't be able to index a small amount of example data for the demo. Instead, we prepared a dataset that contains about 80,000 do..

2.X/4. Aggregations 2017.09.23