Native memory leak in a cloud environment

A Java project with a published container image that contains intentionally leaky native code, used to observe the symptoms of a native memory leak in Java under podman/docker or Kubernetes.

The native code intentionally “leaks” the provided number of megabytes in a loop. By default the project runs with -XX:NativeMemoryTracking=summary enabled.
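
For orientation, the Java side can be pictured roughly as below. This is a minimal sketch only: it assumes the native library is named “leaky” and exposes a JNI method leak(int megabytes); the class, library and method names are hypothetical, the real ones are in the repository sources.

```java
// Sketch of the Java caller. Library name ("leaky"), class and method names
// are hypothetical; see the actual sources for the real ones.
public class LeakDemo {

    static {
        // Resolved via the native library location passed at startup (see run.sh)
        System.loadLibrary("leaky");
    }

    // Implemented in native code; each call allocates `megabytes` MB with the
    // platform allocator and never frees it.
    private static native void leak(int megabytes);

    public static void main(String[] args) throws InterruptedException {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1024;
        for (int i = 0; i < iterations; i++) {
            leak(1);             // leak 1 MB per iteration
            Thread.sleep(1_000); // pace the loop so the growth is visible in metrics
        }
    }
}
```

Pacing the loop is only there to make the growth easy to follow on a graph; the essential part is that the memory is allocated outside the JVM’s own allocators.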

I wanted to observe how the JVM reports native memory, how the crash happens, and what the pod and JVM metrics look like.

The Java code allocates almost no objects. This is a pure Java project without any frameworks, and it runs as java -jar; the only extra step is that the location of the native library needs to be specified (see run.sh).

When I started the pod in Kubernetes and executed jcmd inside the pod:

kubectl apply -f pod.yaml
watch -d 'kubectl exec -it openjdk -- jcmd 1 VM.native_memory | tail -n 50'

I observed that the memory footprint reported by NMT remains constant for almost all areas: the JVM cannot spot this leak because NMT only tracks the JVM’s own allocations, not memory that an external native library allocates directly from the OS.

(screenshot from terminals)

Yet the memory consumed by the pod keeps increasing:

(Grafana graph of pod memory usage)

The native library is instructed to allocate 1 MB 1024 times inside a pod with a 1 GB memory limit, so the process is eventually terminated by an Out-Of-Memory (OOM) kill, even though the JVM itself is not the root cause.

❯ kgp
NAME      READY   STATUS      RESTARTS   AGE
openjdk   0/1     OOMKilled   0          19m

To identify native code that is leaking memory, we have to compare the memory used by the JVM, usually reported as the “heap” and “non-heap” areas, with the pod’s memory usage.
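
As a rough illustration of the JVM-side figure, the heap and non-heap numbers can also be read in-process through the standard java.lang.management API (a sketch; in this project the same figures come from jcmd and NMT, while the pod-side number comes from the container metrics shown in Grafana):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: print the JVM's own view of heap and non-heap memory.
// Comparing these (and the fuller NMT totals) with the pod's reported memory
// usage reveals memory the JVM does not track, such as allocations made by an
// external native library.
public class JvmMemoryReport {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        long usedMb = (heap.getUsed() + nonHeap.getUsed()) / (1024 * 1024);
        long committedMb = (heap.getCommitted() + nonHeap.getCommitted()) / (1024 * 1024);

        System.out.printf("JVM used: %d MB, committed: %d MB%n", usedMb, committedMb);
    }
}
```

If the pod’s memory usage keeps climbing while these JVM-side numbers (and the NMT totals) stay flat, the growth is coming from native allocations the JVM does not track, which is exactly the pattern in the screenshots above.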