2016.11.15 - Translation - A New Way To Ingest - Part 2

drscg 2019. 1. 7. 10:43

This is the second part of a two-part series about ingest nodes, a new feature in Elasticsearch 5.0.

In the first part we talked about what ingest nodes are, and how to configure and use them. In this second part we will focus on how to use ingest nodes as part of a deployment of the Elastic Stack.

A quick word about architecture

Many types of data have to be changed in some way (fields added or removed, text fields parsed, etc.) before they are indexed into Elasticsearch. Until now, the tool of choice for this task has been Logstash. With the ingest node in Elasticsearch, there is now an alternative for some cases: connecting one of the Beats directly to Elasticsearch and doing the transformations in the ingest node.

Screen Shot 2016-11-07 at 18.41.10.png

By default, the ingest functionality is enabled on any node. If you know you’ll have to process a lot of documents you might want to start a dedicated Elasticsearch node that will only be used for running ingest pipelines. To do that, switch off the master and data node types for that node in elasticsearch.yml (node.master: false and node.data: false) and switch off the ingest node type for all other nodes (node.ingest: false).
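
The role split described above can be sketched in a few lines of Python. This is only an illustration: the three setting names (node.master, node.data, node.ingest) come from the text, while the helper functions role_settings and to_yaml are hypothetical names made up for this sketch.

```python
# Sketch: render elasticsearch.yml role settings for each kind of node.
# Only the three setting names come from the text; the helpers are illustrative.

def role_settings(dedicated_ingest: bool) -> dict:
    """Return the node-role settings for one node."""
    if dedicated_ingest:
        # Ingest-only node: no master duties, no data; ingest stays enabled.
        return {"node.master": False, "node.data": False, "node.ingest": True}
    # Every other node: switch the ingest role off (other roles keep defaults).
    return {"node.ingest": False}


def to_yaml(settings: dict) -> str:
    """Format settings as flat elasticsearch.yml lines."""
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


ingest_node_yaml = to_yaml(role_settings(dedicated_ingest=True))
other_node_yaml = to_yaml(role_settings(dedicated_ingest=False))
print(ingest_node_yaml)
print(other_node_yaml)
```

The first printed block would go into elasticsearch.yml on the dedicated ingest node, the second into the same file on all other nodes.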

When consuming data from either Filebeat or Winlogbeat it might also no longer be necessary to use a message queue. Both of these Beats (but not others at this time) can handle backpressure: If Elasticsearch can’t keep up with indexing requests due to a large burst of events, these Beats will fall behind in sending the most recent data but will maintain a pointer to the last successfully indexed position in the files (Filebeat) or the event log (Winlogbeat). Once Elasticsearch has again caught up with the indexing load, the Beats will catch up to the most recent events.
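
The pointer mechanism described above can be illustrated with a toy model. This is not Filebeat's actual implementation; the Shipper class, the batch size and the two receiver functions are assumptions made purely to show the idea of advancing a read position only after a successful acknowledgement.

```python
# Toy model of a shipper that reads lines and only advances its pointer
# after the receiver acknowledges them. Not Filebeat's real code.

class Shipper:
    def __init__(self, lines):
        self.lines = lines
        self.offset = 0  # index of the next line that still has to be sent

    def ship(self, send, batch_size=2):
        """Try to send one batch; advance the offset only on success."""
        batch = self.lines[self.offset:self.offset + batch_size]
        if batch and send(batch):
            self.offset += len(batch)
        return self.offset


log = ["line1", "line2", "line3", "line4"]
indexed = []

def busy(batch):
    return False  # receiver rejects the batch: it cannot keep up

def healthy(batch):
    indexed.extend(batch)
    return True

shipper = Shipper(log)
shipper.ship(busy)      # nothing acknowledged, pointer stays where it was
after_busy = shipper.offset
shipper.ship(healthy)   # first batch accepted
shipper.ship(healthy)   # shipper catches up with the rest
```

Because the pointer never moves past unacknowledged lines, nothing is lost while the receiver is overloaded; the shipper simply resumes from the stored position.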

It might still be a good idea to use a message queue such as Kafka or Redis, for example to decouple Beats and Elasticsearch architecturally or to be able to take Elasticsearch offline, e.g. for an upgrade. Keep in mind, though, that the ingest node is a push-based system: it will not read from a message queue. So when using a queue, it’s necessary to use Logstash to push the data from the queue into Elasticsearch and its ingest node.

Configure Filebeat and Elasticsearch

To deploy the new architecture described above we need to configure three components: Filebeat, Elasticsearch (incl. the ingest node) and Kibana. We’ll continue to use the example of ingesting web access logs from an Apache httpd web server.

We’ll take the configuration of each component in turn:

Elasticsearch

All the configuration was already described in Part 1 of this blog post series, but for reference we’ll include the relevant pieces here as well.

We’ll be using two ingest plugins that need to be installed first:

bin/elasticsearch-plugin install ingest-geoip

bin/elasticsearch-plugin install ingest-user-agent

Having done that successfully, we can create an ingest pipeline for web access logs (the format is that used by Kibana’s Console):

PUT _ingest/pipeline/access_log
{
  "description" : "Ingest pipeline for Combined Log Format",
  "processors" : [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}"]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "formats": [ "dd/MMM/YYYY:HH:mm:ss Z" ]
      }
    },
    {
      "geoip": {
        "field": "clientip"
      }
    },
    {
      "user_agent": {
        "field": "agent"
      }
    }
  ]
}
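
Before sending real traffic, a pipeline like the one above can be tried out with the ingest simulate API (POST _ingest/pipeline/access_log/_simulate). The following sketch only builds and prints a request body for that call; the sample log line and its values are made up, and it is assumed here that a document with just a _source section is enough for a test.

```python
import json

# One sample line in Combined Log Format (made-up values).
sample = ('203.0.113.7 - - [28/Oct/2016:17:05:29 +0200] '
          '"GET /blog/ HTTP/1.1" 200 1234 "-" "Mozilla/5.0"')

# Body for: POST _ingest/pipeline/access_log/_simulate
simulate_body = {"docs": [{"_source": {"message": sample}}]}
print(json.dumps(simulate_body, indent=2))
```

The printed JSON can be pasted into Kibana's Console under the simulate request line to see what the pipeline would produce for that document.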

Note that the order of processors is important, as they are executed one after the other for each incoming document. First we use grok to extract the fields (clientip, timestamp, bytes, etc.), and then we use the appropriate processors (date, geoip, user_agent) on the extracted data.
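
Why the order matters can be shown with a rough Python stand-in for the first two processors. This is a sketch under assumptions: the regular expression only approximates the grok pattern (for example, it drops the surrounding quotes that %{QS} would keep), and the function names grok_step and date_step are hypothetical.

```python
import re
from datetime import datetime

# Rough regex stand-in for the grok pattern (not a full equivalent).
LINE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d+) (?:-|(?P<bytes>\d+)) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def grok_step(doc):
    """Step 1: split the raw message into named fields."""
    doc.update(LINE.match(doc["message"]).groupdict())
    return doc

def date_step(doc):
    """Step 2: parse the extracted timestamp; needs grok_step to run first."""
    doc["@timestamp"] = datetime.strptime(doc["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return doc

doc = {"message": '203.0.113.7 - - [28/Oct/2016:17:05:29 +0200] '
                  '"GET /blog/ HTTP/1.1" 200 1234 "-" "Mozilla/5.0"'}
for step in (grok_step, date_step):   # same order as the pipeline
    doc = step(doc)

# Running date_step first fails: the "timestamp" field does not exist yet.
try:
    date_step({"message": doc["message"]})
    wrong_order_failed = False
except KeyError:
    wrong_order_failed = True
```

The date step depends on a field that only exists after the extraction step has run, which is the same dependency the real date, geoip and user_agent processors have on grok.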

Filebeat

Filebeat collects the log messages line by line from the log file. In the filebeat.yml configuration file, the relevant part looks something like this:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/apache2/access_log

Further down, we can configure the Elasticsearch output:

output.elasticsearch:
  hosts: ["<insert-your-es-host>:<es-port>"]
  template.name: "access_log"
  template.path: <insert-path-to-template>
  pipeline: access_log

template.path has to point to a file containing an Elasticsearch template. Filebeat ships with a file called filebeat.template.json that you can modify. For the example in this blog post, we need to add one field to the properties section:

"geoip": {
  "properties": {
    "location": {
      "type": "geo_point"
    }
  }
}

This will allow us to visualise the locations of IP addresses in our logs on a map in Kibana later.
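
Adding the field by hand is straightforward, but the change can also be sketched programmatically. The snippet below works on a tiny in-memory stand-in rather than the real file; the assumption that fields live under mappings -> _default_ -> properties should be checked against the filebeat.template.json that ships with your Filebeat version.

```python
import json

# Minimal stand-in for a template file; the real filebeat.template.json is
# much larger and its exact layout is an assumption here.
template = {"mappings": {"_default_": {"properties": {"message": {"type": "text"}}}}}

properties = template["mappings"]["_default_"]["properties"]
# Add the field from the text: geoip.location mapped as geo_point.
properties["geoip"] = {"properties": {"location": {"type": "geo_point"}}}

print(json.dumps(template, indent=2))
```

For a real template, the same dictionary update would be applied after loading the file with json.load and before writing it back out.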

Ingest the data

We’re good to go! We can run Filebeat to start indexing logs into our Elasticsearch:

./filebeat -c filebeat.yml -e

Note: -e causes Filebeat to log to stderr instead of a log file and shouldn’t be used in production.

Using the _cat API we can check that the data has been successfully ingested:

GET _cat/indices/filebeat*?v

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   filebeat-2016.10.28 W_p-RZwQRWWn_S49jwaTFw   5   1     300000            0    212.7mb        212.7mb

Use it in Kibana

To use the data in Kibana, we'll configure an index pattern for filebeat-*. This setting is found under Management -> Index Patterns.

Screen Shot 2016-10-28 at 17.05.29.png

To explore the data a bit, we can use Kibana’s Discover view. The search bar allows us to explore website visits by country, requested resource and other criteria.

For example, in this dataset we can search for visits to the blog section of the website from Amsterdam using the query request:blog AND geoip.city_name:"Amsterdam":

Screen Shot 2016-10-28 at 17.23.23.png

In the Visualize section, we can build a pie chart of Top Countries:

Screen Shot 2016-10-28 at 17.32.12.png

We can also build complex visualisations in Timelion. For example, the following query shows the fluctuations in the number of requests from North America and Europe, overlaid with the number of all requests.

.es().bars(10).label(World) .es(q="geoip.continent_name:North America", metric=count).points(10, fill=10).label("North America") .es(q="geoip.continent_name:Europe", metric=count).points(10, fill=10).label(Europe)

Screen Shot 2016-10-28 at 17.40.52.png

A complete dashboard for web traffic might look something like this:

Screen Shot 2016-11-01 at 01.30.13.png

Conclusion

With the ingest node, there is now a way to transform data inside Elasticsearch before indexing it. This is especially useful if only simpler operations are required, while more complex ones can still be performed using Logstash. Because the ingest node is written in Java, the operations it performs are very efficient.

When you don’t need the additional power and flexibility of Logstash filters, this allows you to simplify your architecture for simpler use cases. And with Kibana, which now includes Timelion as one of its built-in features, you have the perfect tool for visualising the data.

Original post: A New Way To Ingest - Part 2
