7-2-7. Heap: Sizing and Swapping

drscg 2017. 9. 23. 12:12

The default installation of Elasticsearch is configured with a 1 GB heap. For just about every deployment, this number is too small. If you are using the default heap values, your cluster is probably configured incorrectly.

There are two ways to change the heap size in Elasticsearch. The easiest is to set an environment variable called ES_HEAP_SIZE. When the server process starts, it will read this environment variable and set the heap accordingly. As an example, you can set it via the command line as follows:

export ES_HEAP_SIZE=10g

Alternatively, you can pass in the heap size via JVM flags when starting the process, if that is easier for your setup:

ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch

Ensure that the min (Xms) and max (Xmx) sizes are the same to prevent the heap from resizing at runtime, a very costly process.
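
As a quick sanity check before starting a node, the shell function below (a hypothetical helper, not something shipped with Elasticsearch) scans an options string such as ES_JAVA_OPTS and succeeds only when -Xms and -Xmx are both present and equal. It compares the values as text, so 10g and 10240m would be reported as different even though they describe the same size.

```shell
# heap_flags_match: hypothetical helper, not part of Elasticsearch.
# Succeeds (returns 0) only if the options string sets -Xms and -Xmx
# to the same textual value; returns non-zero otherwise.
heap_flags_match() {
  local opts="$1" opt xms="" xmx=""
  for opt in $opts; do              # split the options on whitespace
    case "$opt" in
      -Xms*) xms="${opt#-Xms}" ;;   # if repeated, the last occurrence wins
      -Xmx*) xmx="${opt#-Xmx}" ;;
    esac
  done
  [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}
```

For example, a startup wrapper could run heap_flags_match "$ES_JAVA_OPTS" and print a warning whenever it fails.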

Generally, setting the ES_HEAP_SIZE environment variable is preferred over setting explicit -Xmx and -Xms values.

Give (less than) Half Your Memory to Lucene

A common problem is configuring a heap that is too large. You have a 64 GB machine—and by golly, you want to give Elasticsearch all 64 GB of memory. More is better!

Heap is definitely important to Elasticsearch. It is used by many in-memory data structures to provide fast operation. But with that said, there is another major user of memory that is off heap: Lucene.

Lucene is designed to leverage the underlying OS for caching in-memory data structures. Lucene segments are stored in individual files. Because segments are immutable, these files never change. This makes them very cache friendly, and the underlying OS will happily keep hot segments resident in memory for faster access. These segments include both the inverted index (for full-text search) and doc values (for aggregations).

Lucene’s performance relies on this interaction with the OS. But if you give all available memory to Elasticsearch’s heap, there won’t be any left over for Lucene. This can seriously impact the performance.

The standard recommendation is to give 50% of the available memory to Elasticsearch heap, while leaving the other 50% free. It won’t go unused; Lucene will happily gobble up whatever is left over.

If you are not aggregating on analyzed string fields (i.e., you won’t need fielddata), you can consider lowering the heap even more. The smaller you can make the heap, the better performance you can expect from both Elasticsearch (faster GCs) and Lucene (more memory for caching).

Don’t Cross 32 GB!

There is another reason to not allocate enormous heaps to Elasticsearch. As it turns out, the HotSpot JVM uses a trick to compress object pointers when heaps are less than around 32 GB.

In Java, all objects are allocated on the heap and referenced by a pointer. Ordinary object pointers (OOP) point at these objects, and are traditionally the size of the CPU’s native word: either 32 bits or 64 bits, depending on the processor. The pointer references the exact byte location of the value.

For 32-bit systems, this means the maximum heap size is 4 GB. For 64-bit systems, the heap size can get much larger, but the overhead of 64-bit pointers means there is more wasted space simply because the pointer is larger. And worse than wasted space, the larger pointers eat up more bandwidth when moving values between main memory and various caches (LLC, L1, and so forth).

Java uses a trick called compressed oops to get around this problem. Instead of pointing at exact byte locations in memory, the pointers reference object offsets. This means a 32-bit pointer can reference four billion objects, rather than four billion bytes. Ultimately, this means the heap can grow to around 32 GB of physical size while still using a 32-bit pointer.
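
The ~32 GB figure follows from simple arithmetic. Assuming HotSpot's default 8-byte object alignment (-XX:ObjectAlignmentInBytes=8), every object starts on a multiple of 8 bytes, so a 32-bit compressed pointer can count 8-byte slots instead of individual bytes. The shell arithmetic below is a back-of-the-envelope sketch of that reasoning, not a model of how the JVM actually lays out its heap.

```shell
# Back-of-the-envelope arithmetic for compressed oops.
# Assumption: HotSpot's default 8-byte object alignment.
slots=$(( 1 << 32 ))       # distinct values a 32-bit pointer can hold
align=8                    # bytes between consecutive aligned object starts
gib=$(( 1 << 30 ))         # bytes per GiB

plain_gib=$(( slots / gib ))              # pointer = byte address
compressed_gib=$(( slots * align / gib )) # pointer = 8-byte slot index

echo "plain 32-bit pointers: ${plain_gib} GiB"
echo "compressed oops:       ${compressed_gib} GiB"
```

The first line matches the 4 GB limit of 32-bit systems described above; the second shows why the boundary sits near 32 GB.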

Once you cross that magical ~32 GB boundary, the pointers switch back to ordinary object pointers. The size of each pointer grows, more CPU-memory bandwidth is used, and you effectively lose memory. In fact, it takes until around 40–50 GB of allocated heap before you have the same effective memory as a heap just under 32 GB using compressed oops.

The moral of the story is this: even when you have memory to spare, try to avoid crossing the 32 GB heap boundary. It wastes memory, reduces CPU performance, and makes the GC struggle with large heaps.
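
The two rules so far (about half of RAM for the heap, and stay under the ~32 GB boundary) can be combined into a small calculation. The function below is a hypothetical sketch, not part of Elasticsearch; the 31 GB cap is an assumed safety margin below the boundary, and the input is the machine's total RAM in megabytes.

```shell
# suggest_heap_mb: hypothetical helper, not part of Elasticsearch.
# Input: total RAM in MB. Output: a suggested heap size in MB that
#   1. uses at most half of RAM, leaving the rest to Lucene and the OS cache;
#   2. stays under the ~32 GB compressed-oops boundary (31 GB cap assumed).
suggest_heap_mb() {
  local ram_mb="$1"
  local cap_mb=$(( 31 * 1024 ))     # 31744 MB, an assumed safety margin
  local half_mb=$(( ram_mb / 2 ))   # the 50% rule
  if [ "$half_mb" -gt "$cap_mb" ]; then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Example: a 64 GB machine, printed in the ES_HEAP_SIZE format used earlier.
ram_mb=65536
echo "export ES_HEAP_SIZE=$(suggest_heap_mb "$ram_mb")m"
```

On the 64 GB machine from the opening example this prints export ES_HEAP_SIZE=31744m: half of RAM would be 32768 MB, which the cap pulls back below the boundary.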

Just how far under 32gb should I set the JVM?

Unfortunately, that depends. The exact cutoff varies by JVM and platform. If you want to play it safe, setting the heap to 31gb is likely safe. Alternatively, you can verify the cutoff point for the HotSpot JVM by adding -XX:+PrintFlagsFinal to your JVM options and checking that the value of the UseCompressedOops flag is true. This will let you find the exact cutoff for your platform and JVM.

For example, here we test a Java 1.7 installation on Mac OS X and see that the max heap size is around 32600mb (~31.83gb) before compressed pointers are disabled:

$ JAVA_HOME=`/usr/libexec/java_home -v 1.7` java -Xmx32600m -XX:+PrintFlagsFinal 2> /dev/null | grep UseCompressedOops
     bool UseCompressedOops   := true
$ JAVA_HOME=`/usr/libexec/java_home -v 1.7` java -Xmx32766m -XX:+PrintFlagsFinal 2> /dev/null | grep UseCompressedOops
     bool UseCompressedOops   = false

In contrast, a Java 1.8 installation on the same machine has a max heap size around 32766mb (~31.99gb):

$ JAVA_HOME=`/usr/libexec/java_home -v 1.8` java -Xmx32766m -XX:+PrintFlagsFinal 2> /dev/null | grep UseCompressedOops
     bool UseCompressedOops   := true
$ JAVA_HOME=`/usr/libexec/java_home -v 1.8` java -Xmx32767m -XX:+PrintFlagsFinal 2> /dev/null | grep UseCompressedOops
     bool UseCompressedOops   = false

The moral of the story is that the exact cutoff to leverage compressed oops varies from JVM to JVM, so take caution when taking examples from elsewhere and be sure to check your system with your configuration and JVM.
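
Checking the flag by eye works, but the same test can be scripted. The helper below is a hypothetical sketch that reads -XX:+PrintFlagsFinal output on standard input and succeeds only when UseCompressedOops is true. It takes the value from the fourth column, which assumes the usual layout of type, name, = or :=, value; newer JDKs append extra columns after the value, which this ignores.

```shell
# compressed_oops_enabled: hypothetical helper, not part of the JDK.
# Reads -XX:+PrintFlagsFinal output on stdin; exits 0 only if the
# UseCompressedOops line reports the value "true".
compressed_oops_enabled() {
  # $2 is the flag name, $4 the value after "=" or ":=".
  awk '$2 == "UseCompressedOops" { v = $4 }
       END { exit (v == "true" ? 0 : 1) }'
}
```

For example: java -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null | compressed_oops_enabled && echo "compressed oops enabled". Adding -version lets the JVM print the flags table to standard output and exit; repeating the command with different -Xmx values narrows down the cutoff on your own JVM.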

Beginning with Elasticsearch v2.2.0, the startup log will actually tell you if your JVM is using compressed OOPs or not. You’ll see a log message like:

[2015-12-16 13:53:33,417][INFO ][env] [Illyana Rasputin] heap size [989.8mb], compressed ordinary object pointers [true]

This indicates that compressed object pointers are being used. If they are not, the message will say [false].
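
If you collect startup logs centrally, the value can be pulled out of that line automatically. The sed-based helper below is a hypothetical sketch that assumes the exact wording of the 2.2.0 message shown above; other versions may phrase it differently.

```shell
# oops_from_log: hypothetical helper, not part of Elasticsearch.
# Reads log lines on stdin and prints the bracketed value ("true" or
# "false") from each "compressed ordinary object pointers [...]" message.
oops_from_log() {
  sed -n 's/.*compressed ordinary object pointers \[\([a-z]*\)].*/\1/p'
}
```

For example: grep 'compressed ordinary object pointers' path/to/elasticsearch.log | oops_from_log, where the log path is a placeholder that depends on your path.logs setting and cluster name.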

I Have a Machine with 1 TB RAM!

The 32 GB line is fairly important. So what do you do when your machine has a lot of memory? It is becoming increasingly common to see super-servers with 512–768 GB of RAM.

First, we would recommend avoiding such large machines (see Hardware).

But if you already have the machines, you have two practical options:

Are you doing mostly full-text search? Consider giving 4-32 GB to Elasticsearch and letting Lucene use the rest of memory via the OS filesystem cache. All that memory will cache segments and lead to blisteringly fast full-text search.
Are you doing a lot of sorting/aggregations? Are most of your aggregations on numerics, dates, geo_points and not_analyzed strings? You’re in luck! Give Elasticsearch somewhere from 4-32 GB of memory and leave the rest for the OS to cache doc values in memory.
Are you doing a lot of sorting/aggregations on analyzed strings (e.g., for word tags or SigTerms)? Unfortunately that means you’ll need fielddata, which means you need heap space. Instead of one node with more than 512 GB of RAM, consider running two or more nodes on a single machine. Still adhere to the 50% rule, though. So if your machine has 128 GB of RAM, run two nodes, each with just under 32 GB. This means that less than 64 GB will be used for heaps, and more than 64 GB will be left over for Lucene.
If you choose this option, set cluster.routing.allocation.same_shard.host: true in your config. This will prevent a primary and a replica shard from colocating on the same physical machine (since this would remove the benefits of replica high availability).
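
The arithmetic for the multi-node layout can be sketched the same way. The hypothetical function below gives each node an equal share of half the machine's RAM (in MB), capped at an assumed 31 GB so every node stays under the compressed-oops boundary.

```shell
# per_node_heap_mb: hypothetical helper, not part of Elasticsearch.
# Inputs: total RAM in MB and the number of nodes on the machine.
# Output: heap per node in MB, keeping the total at or below half of RAM
# and each node under an assumed 31 GB cap.
per_node_heap_mb() {
  local ram_mb="$1" nodes="$2"
  local cap_mb=$(( 31 * 1024 ))               # assumed safety margin
  local share_mb=$(( ram_mb / 2 / nodes ))    # 50% rule split across nodes
  if [ "$share_mb" -gt "$cap_mb" ]; then
    share_mb=$cap_mb
  fi
  echo "$share_mb"
}
```

On the 128 GB machine from the example, per_node_heap_mb 131072 2 prints 31744, so the two heaps total 63488 MB and more than 64 GB is left for Lucene. This layout still needs the same_shard.host setting described above.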

Swapping Is the Death of Performance

It should be obvious, but it bears spelling out clearly: swapping main memory to disk will crush server performance. Think about it: an in-memory operation is one that needs to execute quickly.

If memory swaps to disk, a 100-microsecond operation becomes one that takes 10 milliseconds, a 100-fold slowdown. Now repeat that increase in latency for all other 10-microsecond operations. It isn’t difficult to see why swapping is terrible for performance.

The best thing to do is disable swap completely on your system. This can be done temporarily:

sudo swapoff -a

To disable it permanently, you’ll likely need to edit your /etc/fstab. Consult the documentation for your OS.
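
Editing /etc/fstab by hand is usually a matter of commenting out the swap entry. The awk helper below is a hypothetical sketch that reads an fstab on standard input and prints it with every uncommented line whose filesystem type (third field) is swap turned into a comment. It never touches the real file, so you can review the result first.

```shell
# comment_out_swap: hypothetical helper, not a standard tool.
# Reads fstab-formatted text on stdin and writes it to stdout with each
# active swap entry (third field == "swap") prefixed by "#".
comment_out_swap() {
  awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next }
       { print }'
}
```

For example: comment_out_swap < /etc/fstab > /tmp/fstab.new, then diff the two files before copying the new version into place with sudo. Some distributions manage swap in other ways, so check your OS documentation first.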

If disabling swap completely is not an option, you can try to lower swappiness. This value controls how aggressively the OS tries to swap memory. This prevents swapping under normal circumstances, but still allows the OS to swap under emergency memory situations.

For most Linux systems, this is configured using the sysctl value:

vm.swappiness = 1

A swappiness of 1 is better than 0, since on some kernel versions a swappiness of 0 can invoke the OOM-killer.
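
To confirm the setting took effect, the hypothetical helper below reads the current value (by default from /proc/sys/vm/swappiness on Linux) and succeeds only when it is 1 or lower. The optional file argument exists mainly so the check can be tried against a test file.

```shell
# swappiness_ok: hypothetical helper, not a standard tool.
# Succeeds only if the swappiness value in the given file (default: the
# Linux procfs entry) is 1 or lower.
swappiness_ok() {
  local file="${1:-/proc/sys/vm/swappiness}" value=""
  # Accept a value even if the file lacks a trailing newline.
  read -r value < "$file" || [ -n "$value" ] || return 1
  [ "$value" -le 1 ]
}
```

The value itself can be changed at runtime with sudo sysctl -w vm.swappiness=1, and made persistent by adding vm.swappiness = 1 to /etc/sysctl.conf (or a file under /etc/sysctl.d/, depending on the distribution).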

Finally, if neither approach is possible, you should enable mlockall. This allows the JVM to lock its memory and prevent it from being swapped by the OS. In your elasticsearch.yml, set this:

bootstrap.mlockall: true
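
It is worth checking that the lock actually succeeded, because locking can fail at startup (for example when the process is not allowed to lock memory) while the node still starts. In the 2.x releases this guide covers, the nodes info API (GET _nodes/process) reports an mlockall field per node. The helper below is a hypothetical sketch that reads that JSON on standard input and succeeds only if every reported mlockall value is true; it uses grep rather than a JSON parser, so it assumes the field appears as "mlockall" followed by a colon and true or false.

```shell
# mlockall_all_true: hypothetical helper, not part of Elasticsearch.
# Reads nodes-info JSON on stdin; succeeds only if at least one "mlockall"
# field is present and every one of them is true.
mlockall_all_true() {
  local values
  values=$(grep -o '"mlockall"[[:space:]]*:[[:space:]]*[a-z]*' \
           | sed 's/.*:[[:space:]]*//')
  [ -n "$values" ] || return 1                # field not found at all
  ! printf '%s\n' "$values" | grep -qv '^true$'
}
```

For example: curl -s 'localhost:9200/_nodes/process?pretty' | mlockall_all_true || echo "mlockall is not active on every node", assuming a node listening on the default localhost:9200.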
