2017.08.10 - Translation - Sequence IDs: Coming Soon to an Elasticsearch Cluster Near You ...


drscg 2019. 1. 7. 11:48

What If...

"What if" questions are fun. "What if you had a time machine: when would you visit?" "What if you had one wish and it would come true: what would it be?" They're the kind of hypothetical questions you can ask at a dinner party to learn what interests and motivates somebody once the usual barriers are removed. A few years ago at Elastic, we asked ourselves a "what if" that we knew would lead to interesting insights:

"What if we had total ordering on index operations in Elasticsearch? What could we build?"

The answers were far-ranging:

We could build a sort of "Changes API" that could take an operation ID and give you a list of changes to the data since then. Neat!
We could build an incremental reindex by looking for only the index operations that changed!
We could use that incremental reindexing functionality to build entity-centric indices through a filtered/continual reindex!
We could build data-rollups / summarized indices that don't rely on data arriving on-time, in-order, and with globally accurate timestamps!
We could build something like materialized views that get updated as new data/operations arrive!
We could build a way to replay operations between nodes if any are lost to things like network disconnects, which would make recovery much faster!
We could build a way to replay operations between clusters! Cross-datacenter replication!

All of this requires one little barrier to be broken down: adding sequence numbers to index operations. Easy: we just need to add a counter to every operation in the primary! So easy that we saw several iterations of community members and employees trying their hand at it. But as we peeled back the layers of the onion, we discovered it was much, much more complicated than it first appeared. Almost 6 years after we first discussed how useful a Changes API would be, we still don't have one. What gives?! The purpose of this blog is to share what happened behind the scenes and give some insight into the answer to this question.

In the last two years, we practically rewrote the replication logic from the ground up. We took the best things from known academic algorithms, while making sure we could still ensure the parallelism that makes Elasticsearch so fast: something many, if not all, traditional consensus algorithms can't do. We collaborated with distributed systems specialists and built a TLA+ specification for our replication model. We added lots of tests and test infrastructure.

This blog is necessarily technical, as we dig into some of the guts of how Elasticsearch does replication. However, we've aimed to make it accessible to a wider audience by explaining, defining, or linking to some terminology, even if you may already understand it. First, let's dive into some of the challenges Elasticsearch deals with.

Challenges

Before we go much further, we have to talk a bit about our replication model and why it matters. In an Elasticsearch data index, data is split up into what are called "shards", which are basically sub-divisions of the index. You may have 5 primary shards (5 sub-divisions of the index), and each primary shard may have any number of copies (called replicas) of that primary. But it's important that there's only 1 "primary shard" (often shortened to "primary") for each sub-division. The primary shard accepts indexing operations (things like "add documents" or "delete a document") first and replicates them to the replicas. It is therefore fairly straightforward to keep incrementing a counter and assign each operation a sequence number before it is forwarded to the replicas. And as long as nobody ever restarts a server, you have a network with 100% uptime, your hardware doesn't fail, you don't have any long Java garbage collection events, and nobody ever upgrades the software, this straightforward approach actually works.

But we live in the real world, and it's when these assumptions break that Elasticsearch can end up in a "failure" mode and a "recovery" process. If a failure affects a node running a primary shard, it may require the primary to step down and a replica to take its place. Since the change can come abruptly, it is possible that some ongoing indexing operations were not fully replicated yet. If you had 2 or more replicas, some of those operations may have reached one and not the other. Even worse, because Elasticsearch indexes documents concurrently (which is part of why Elasticsearch is so fast!), each of those replicas can have a different set of operations that do not exist in the other one. Even if you run with one replica (the default setting in Elasticsearch), there might be trouble. If an old primary copy comes back and is added as a replica, it may contain operations that were never replicated to the new primary. All of these scenarios have one thing in common: the history of operations on shards can diverge upon primary failure, and we need some way to fix it.

Primary Terms & Sequence Numbers

The first step we took was to be able to distinguish between old and new primaries. We have to have a way to identify operations that came from an older primary vs operations that come from a newer primary. On top of that, the entire cluster needs to be in agreement on this, so there's no contention in the event of a problem. This drove us to implement primary terms. These primary terms are incremental and change when a primary is promoted. They're persisted in the cluster state, thus representing a sort of “version” or “generation” of primaries that the cluster is on. With primary terms in place, any collision in the operation history can be resolved by looking at the operations’ primary term. New terms win over older terms. We can even start rejecting operations that are too old and avoid messy situations.
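The "new terms win over older terms" rule can be illustrated with a minimal Python sketch. The `Operation` class and the `resolve` and `accept` helpers are hypothetical names made up for this illustration; they are not Elasticsearch's internal API, and the real implementation is more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    primary_term: int  # term of the primary that issued the operation
    seq_no: int        # sequence number assigned by that primary
    payload: str       # stand-in for the actual index/delete request


def resolve(existing: Operation, incoming: Operation) -> Operation:
    """Pick the winner when two operations claim the same sequence number."""
    if existing.seq_no != incoming.seq_no:
        raise ValueError("operations do not collide")
    # The operation issued under the newer primary term wins.
    return incoming if incoming.primary_term > existing.primary_term else existing


def accept(current_term: int, op: Operation) -> bool:
    """Reject operations stamped with a term older than the one this copy is on."""
    return op.primary_term >= current_term


old = Operation(primary_term=1, seq_no=7, payload="from old primary")
new = Operation(primary_term=2, seq_no=7, payload="from new primary")
print(resolve(old, new).payload)
print(accept(2, old), accept(2, new))
```

Because the term is compared before anything else, a shard copy that already knows about term 2 can refuse late-arriving operations from the deposed term-1 primary outright.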

Once we had the protection of primary terms in place, we added a simple counter and started issuing each operation a sequence number from that counter. These sequence numbers allow us to understand the specific order of index operations that happened on the primary, which we can use for a variety of purposes we'll talk about in the next few sections. You can see the assigned sequence number and the primary term in the response:

$ curl -H 'Content-Type: application/json' -XPOST http://127.0.0.1:9200/foo/doc?pretty -d '{ "bar": "baz" }'
{
  "_index" : "foo",
  "_type" : "doc",
  "_id" : "MlDBm10BditXXu4kjj5E",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 19,
  "_primary_term" : 1
}

Notice the _seq_no and _primary_term fields, which are now returned with the response.
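As a small illustration of how a client might read those two fields, the following Python sketch parses the response shown above with the standard `json` module. Combining the fields into a `(primary_term, seq_no)` tuple is a choice made for this sketch, not something the response format itself prescribes.

```python
import json

# The indexing response from the example above, pasted verbatim.
response_text = """
{
  "_index" : "foo",
  "_type" : "doc",
  "_id" : "MlDBm10BditXXu4kjj5E",
  "_version" : 1,
  "result" : "created",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 19,
  "_primary_term" : 1
}
"""

response = json.loads(response_text)

# Python compares tuples element by element, so putting the term first
# makes any operation from a newer primary sort after older ones.
position = (response["_primary_term"], response["_seq_no"])
print(position)  # (1, 19)
```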

Local and Global Checkpoints

With primary terms and sequence numbers, we had the tools we needed to theoretically be able to detect differences between shards and realign them when a primary fails. An old primary with primary term x can be recovered in two steps: remove the operations with primary term x that don't exist in the new primary's history, then index into the old primary the missing operations that carry a higher primary term.
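The realignment described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: a history is represented as a dict from sequence number to a `(primary_term, payload)` tuple, and `realign` is a hypothetical helper name, not an Elasticsearch function.

```python
def realign(old_history, new_history, old_term):
    """Bring an old primary's history in line with the new primary's.

    Both histories map seq_no -> (primary_term, payload).
    """
    realigned = {}
    for seq_no, op in old_history.items():
        term, _ = op
        # Step 1: drop operations issued under the old term that the
        # new primary never saw.
        if term == old_term and new_history.get(seq_no) != op:
            continue
        realigned[seq_no] = op
    # Step 2: index the operations with a higher term that are missing.
    for seq_no, op in new_history.items():
        if op[0] > old_term:
            realigned[seq_no] = op
    return realigned


old_history = {0: (1, "a"), 1: (1, "b"), 2: (1, "c")}  # seq_no 2 was never replicated
new_history = {0: (1, "a"), 1: (1, "b"), 2: (2, "d"), 3: (2, "e")}

print(realign(old_history, new_history, old_term=1) == new_history)  # True
```

Note that this sketch compares the complete histories, which is exactly the cost the checkpoints discussed in this section are meant to avoid.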

Sadly, comparing histories of millions of operations is just impractical when you're simultaneously indexing hundreds of thousands or millions of events per second. Storage costs become prohibitive, and the computational effort of a straightforward comparison will just take too long. In order to deal with this, Elasticsearch maintains a safety marker called the global checkpoint. The global checkpoint is a sequence number for which we know that the histories of all active shards are aligned at least up to it. Said differently, all operations with a sequence number at or below the global checkpoint are guaranteed to have been processed by all active shards and are equal in their respective histories. This means that after a primary fails, we only need to compare operations above the last known global checkpoint between the new primary and any remaining replicas. When the old primary comes back, we take the global checkpoint it last knew about and compare operations above it with the new primary. This way, we only compare the operations that need comparing rather than the complete history.

Advancing the global checkpoint is the responsibility of the primary shard. It does so by keeping track of completed operations on the replicas. Once it detects that all replicas have advanced beyond a given sequence number, it updates the global checkpoint accordingly. Instead of keeping track of all operations all the time, the shard copies maintain a local variant of the global checkpoint called the local checkpoint. The local checkpoint is a sequence number for which all lower sequence numbers were processed on the shard copy. Whenever the replicas acknowledge (or ack) a write operation to the primary, they also give it an updated local checkpoint. Using the local checkpoints, the primary is able to update the global checkpoint, which is then sent to all shard copies during the next indexing operation.
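The relationship between the two checkpoints can be sketched as follows. This is a minimal Python model, not Elasticsearch's code: the `LocalCheckpointTracker` class, the `global_checkpoint` function, and the use of -1 for "nothing processed yet" are assumptions of this sketch. The local checkpoint advances only over a gap-free prefix of sequence numbers, and the primary takes the minimum of the local checkpoints reported by all copies.

```python
class LocalCheckpointTracker:
    """Tracks the highest seq_no below which every operation was processed."""

    def __init__(self):
        self.checkpoint = -1   # assumption: -1 means no operation processed yet
        self._pending = set()  # processed seq_nos that sit above a gap

    def mark_processed(self, seq_no):
        if seq_no <= self.checkpoint:
            return  # already covered by the checkpoint
        self._pending.add(seq_no)
        # Advance while the next seq_no has been processed (no gap).
        while self.checkpoint + 1 in self._pending:
            self.checkpoint += 1
            self._pending.remove(self.checkpoint)


def global_checkpoint(local_checkpoints):
    """Every copy has processed everything up to the smallest local checkpoint."""
    return min(local_checkpoints.values())


replica = LocalCheckpointTracker()
for seq_no in (0, 1, 3):       # operations arrive concurrently; 2 is still in flight
    replica.mark_processed(seq_no)
print(replica.checkpoint)      # 1, because seq_no 2 is missing
replica.mark_processed(2)
print(replica.checkpoint)      # 3, the gap is closed

print(global_checkpoint({"primary": 3, "replica_a": 1, "replica_b": 2}))  # 1
```

Tracking only a checkpoint plus the few out-of-order sequence numbers above it is what lets a shard copy avoid remembering every operation it has ever processed.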

The following animation shows how sequence numbers and the global and local checkpoints advance under concurrent indexing, in the face of things like lossy networks and sudden failures:

As index operations are sent from the primary to the replicas concurrently, we keep track of the highest sequence number that every replica has acknowledged receipt of and call this the global checkpoint. The primary tells all replicas what the global checkpoint is. Thus, if a primary switches, we need to compare and potentially re-process only the operations since the last global checkpoint rather than all the files on disk.

The global checkpoint has another nice property: it represents the boundary between operations that are guaranteed to stay (they are in the histories of all active shards) and the region that can contain operations that may be rolled back if the primary happened to fail before they were fully replicated and acknowledged to the user. This is a subtle but important property which will be crucial to a future Changes API or Cross-Datacenter Replication feature.

The First Benefit: Faster Recovery

We skipped over how the actual recovery process worked prior to Elasticsearch 6.0. When Elasticsearch recovers a replica after it has been offline, it has to make sure that the replica is identical to the active primary. Inactive shards have synced flush markers that make this validation quick, but shards with active indexing simply have no guarantees. If a shard goes offline while there is still active indexing, the new primary shard then copies Lucene segments (which are files on disk) across the network. This can be a heavy, time-consuming operation if those shards are large. This had to happen because we weren't keeping track of individual write operations (sequence numbers) until 6.0. Behind the scenes, Lucene merges all the adds/updates/deletes into larger segments in a way that you can't recover the individual operations that made up the changes… that is, unless you keep the transaction log (or translog) around for a period of time.

That's what we now do: we keep the translog until it grows "too large" or "too old" to warrant keeping any more of it. If a replica needs to be "brought up to date", we use the last global checkpoint known to that replica and just replay the relevant changes from the primary's translog rather than performing an expensive large file copy. If the primary's translog was "too large" or "too old" to be replayed to the replica, then we fall back to the old file-based recovery.
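The choice between the two recovery paths can be sketched as a small decision function. This is a simplified Python model under assumptions: `plan_recovery` is a hypothetical name, and the primary's retained translog is represented as a plain set of sequence numbers, which is not how Elasticsearch stores it.

```python
def plan_recovery(replica_global_checkpoint, primary_max_seq_no, retained_seq_nos):
    """Decide how to bring a replica up to date.

    retained_seq_nos: sequence numbers still present in the primary's translog.
    Returns ("replay", [seq_nos]) or ("file_copy", []).
    """
    # Everything above the replica's last known global checkpoint must be replayed.
    needed = range(replica_global_checkpoint + 1, primary_max_seq_no + 1)
    if all(seq_no in retained_seq_nos for seq_no in needed):
        return ("replay", list(needed))
    # Part of the required history was already trimmed from the translog,
    # so fall back to copying segment files.
    return ("file_copy", [])


# Translog still holds everything the replica is missing: replay operations 5..9.
print(plan_recovery(4, 9, set(range(3, 10))))

# Translog was trimmed past what the replica needs: fall back to a file copy.
print(plan_recovery(4, 9, {7, 8, 9}))
```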

If you've been operating a large cluster that has real network disconnects, restarts, upgrades, etc., we expect this will make you significantly happier, as you won't be waiting for long periods while shards recover.

Things to Know

As mentioned in the last section, a translog is kept until it's "too large" or "too old" to warrant keeping it. How do we determine what's too large or too old? It's configurable, of course! In 6.0, we're introducing 2 new translog settings:

index.translog.retention.size: defaults to 512mb. If the translog grows past this, we only keep this amount around.
index.translog.retention.age: defaults to 12h. We don't keep translog files past this age.
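As a hedged illustration, the Python sketch below builds a request body that overrides both settings for one index. The setting names and defaults come from the list above; the `retention_settings` helper is hypothetical, and sending the body to the index settings endpoint (`PUT /<index>/_settings`) is an assumption of this sketch, so check the documentation for your version before relying on it.

```python
import json


def retention_settings(size="512mb", age="12h"):
    """Build a settings body; the defaults mirror the 6.0 defaults listed above."""
    return {
        "index.translog.retention.size": size,
        "index.translog.retention.age": age,
    }


# Example: keep more translog around ahead of planned node maintenance.
body = json.dumps(retention_settings(size="1gb", age="24h"), indent=2)
print(body)
```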

These settings are important because they affect how well the new, faster recovery works as well as the disk usage. A larger translog retention size or a longer age means that you'll have a higher chance of recovering with the new, faster recovery rather than relying on the older file-based recovery. However, they come with a cost: they increase disk utilization, and remember that transaction logs are per-shard. As a worked example, if you have 20 indices and each has 5 primary shards, and you're writing lots of data over a 12 hour period, it's possible to have 20 * 5 * 512mb = 50GB of extra disk utilized by the translog until that 12 hour window rolls off. You can tune this up or down on a per-index basis if you have different recovery and sizing needs on different indices. For example, you may want to adjust the translog retention windows if you expect machine or node maintenance.
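The arithmetic in the worked example can be reproduced with a short Python sketch. The `worst_case_translog_mb` function is hypothetical. Its `replicas_per_primary` parameter is an addition of this sketch for estimating clusters where each replica copy also retains its own translog; it defaults to 0 so the result matches the figure in the text, which counts primary shards only.

```python
def worst_case_translog_mb(indices, primaries_per_index,
                           retention_size_mb=512, replicas_per_primary=0):
    """Upper bound on translog disk usage, in MB, if every shard fills its retention size."""
    shard_copies = indices * primaries_per_index * (1 + replicas_per_primary)
    return shard_copies * retention_size_mb


total_mb = worst_case_translog_mb(indices=20, primaries_per_index=5)
print(total_mb, "MB =", total_mb / 1024, "GB")  # 51200 MB = 50.0 GB
```

This is a ceiling rather than a forecast: a shard only reaches the retention size if it actually receives that much data within the retention age.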

Note: prior to 6.0, the translog could also grow to 512MB (by default) under indexing, per the index.translog.flush_threshold_size setting. This means that the new retention policy will not change the storage requirements for active shards. The change does impact shards on which indexing stops. Instead of cleaning up the translog, we now keep it around for another 12 hours.

The Next Benefit: Cross-Datacenter Replication

As mentioned at the beginning of the post, there are lots of wonderful things we could do in Elasticsearch if we had ordered index operations. It took a while, but now we do. Faster recovery is the first use case we decided to build: it allows us to test the waters of the new functionality we added.

But we know that cross-datacenter replication is also a common ask from our enterprise customers, so that's another feature we're going to be adding soon. This requires new APIs to be built, additional monitoring functionality on top of the replication, and yes, a lot more testing and documentation.

There's still more to do

As you can see on the Sequence Numbers GitHub issue, we're off to a good start with the sequence numbers feature, but there is still work to be done! We think the work done so far represents a major step forward, even if it doesn't cover everything we can build on and around sequence numbers. If you're interested in following our work as we continue, feel free to follow that ticket or PRs with the label :Sequence IDs, or just ping us on discuss!

The framework for us to build on is there in 6.0, and we're excited by the next "what if" question we can ask and what kind of answers it may bring. For now, if you want to try out the new sequence numbers recovery, download 6.0.0-beta1 and become a pioneer!

Original: Sequence IDs: Coming Soon to an Elasticsearch Cluster Near You
