HeapDumpOnOutOfMemoryError on K8s

There are a few reasons why an OutOfMemoryError might happen in a JVM, and for some of them the JVM can be told to write a heap dump to the file system before it goes down. Nobody wants to hit an OOM in production with nothing to analyse, then have to reconfigure the deployment and hope the worst happens again, this time with a fallback plan in place.

In a JVM this can be configured with:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/heapDumpDirectory
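
Outside Kubernetes the same flags simply go on the java command line; a minimal sketch, assuming a hypothetical app.jar:

java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/heapDumpDirectory \
     -jar app.jar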

but the problem is that every JVM saves its dump under the same or a very similar name, by default java_pid<process_id>.hprof inside /heapDumpDirectory (and in containers the JVM often runs as PID 1, so the names are identical across pods). If several pods write to a shared directory, dumps can overwrite one another, leaving you with damaged heap dump files or even a single file of 0 bytes.

To avoid this we need a unique filename for each pod. That can't be hard-coded in a Kubernetes Deployment, since every pod inherits the same spec, so we have to build the name from environment variables instead.

Each pod has its own internal IP and hostname, and this metadata can be exposed to the container through the Downward API and used to build a dynamic path for the heap dump file.

There is one caveat: the environment variable must be defined before JAVA_TOOL_OPTIONS, because Kubernetes only expands $(VAR) references to variables declared earlier in the list.

For my deployments it looks like this:

          env:
            - name: POD_HOSTNAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: JAVA_TOOL_OPTIONS
              value: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/heapDumpDirectory/$(POD_HOSTNAME).hprof
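
Once a pod is running, it is worth checking that the $(POD_HOSTNAME) reference was actually expanded; a quick sanity check, assuming the image contains printenv and with <pod-name> standing in for one of your pods:

kubectl exec <pod-name> -- printenv JAVA_TOOL_OPTIONS

The output should contain -XX:HeapDumpPath=/heapDumpDirectory/<pod-name>.hprof rather than a literal $(POD_HOSTNAME).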

Make sure that the persistent volume mounted at /heapDumpDirectory is writable by the JVM process!
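
For completeness, here is a minimal sketch of how /heapDumpDirectory could be backed by a persistent volume; the claim name heap-dumps is an assumption, not something from the original deployment:

          # under the same container as the env section above
          volumeMounts:
            - name: heap-dumps
              mountPath: /heapDumpDirectory
      # at the pod spec level, next to containers
      volumes:
        - name: heap-dumps
          persistentVolumeClaim:
            claimName: heap-dumps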