Pod size considerations for JVM


In this post I’ll describe what you want to consider to let the JVM use its own ergonomic configuration, without drastically overriding it, which would require more advanced tuning and more metrics.

Pod sizing for GC

The CPU and memory limits impact how the JVM tunes its own performance characteristics.

Most importantly, they determine which GC will be used and how many threads it will start to clean up memory, which in turn affects how frequent and how long GC pauses are.

The magic numbers

The smaller the number of CPUs and MiB of memory, the less “hardware efficient” the selected GC is. It doesn’t mean that this GC will be less efficient than the non-ergonomic choices; JVM GCs are designed to make the best use of the hardware they run on.

The table below shows the requirements for a GC to be ergonomically enabled:

GC           Memory        CPU     # GC threads
SerialGC     any           any     1
ParallelGC   -             -       linked to the number of CPUs
G1GC         >= 1792 MiB   >= 2    linked to the number of CPUs
Shenandoah   -             -       linked to the number of CPUs
ZGC          -             -       linked to the number of CPUs

Check this article for more opinions on GCs.

The following can be easily demonstrated with:

podman run --memory=1791m --cpus=2 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep -E 'Use.*GC.*ergonomic'
     bool UseSerialGC                              = true                                      {product} {ergonomic}

podman run --memory=1792m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep -E 'Use.*GC.*ergonomic'
     bool UseSerialGC                              = true                                      {product} {ergonomic}

podman run --memory=1792m --cpus=2 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep -E 'Use.*GC.*ergonomic'
     bool UseG1GC                                  = true                                      {product} {ergonomic}

Small pods

1 CPU and under, 3 GiB of memory.

SerialGC is most likely to show the best performance characteristics on pods with 1 processor or less and smaller memory. SerialGC creates only one thread to process the heap, for both the young and the old generation. If a GC ran multiple threads on a single core, they would compete for CPU resources and cause more context switching in the OS kernel, increasing stop-the-world pauses.
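
If you would rather pin SerialGC explicitly than rely on the ergonomics (for example so a CPU or memory limit change doesn’t silently flip the collector), here is a minimal sketch using the same image as earlier; the limits are just examples:

podman run --memory=512m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:+UseSerialGC -XX:+PrintFlagsFinal -version | grep -E 'UseSerialGC|UseG1GC'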

Medium pods

4 CPU and under, 8 GiB of memory.

ParallelGC is a controversial one: it’s suited for medium-size pods, but it’s never enabled by the ergonomics. ZGC and Shenandoah are designed for huge heaps and are also never enabled ergonomically. Ergonomically it’s either SerialGC or G1GC (Java 11 - 21).

ParallelGC has a very simple performance characteristic: it scales vertically with the number of available processors. 2 processors means 2 GC threads, 4 processors means 4 GC threads.
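
You can see this scaling for yourself by checking the GC thread flags at different CPU limits; the exact numbers depend on the JVM build, so treat this as a sketch:

podman run --memory=4g --cpus=2 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal -version | grep -E 'ParallelGCThreads|ConcGCThreads'
podman run --memory=4g --cpus=4 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal -version | grep -E 'ParallelGCThreads|ConcGCThreads'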

ParallelGC makes sense for a few CPU cores (under 4) and heaps under 4 GiB, mostly because it runs fewer threads than G1GC, giving more breathing room to other threads/JVMs running on the same small K8s node.

More often than not, G1GC will be your preferred choice. It puts higher overhead on the machine by running more threads, but it completes GC cycles faster and can avoid some stop-the-world pauses that ParallelGC can’t, making P(99) latency better.
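
If your pod is below the ergonomic G1 thresholds from the table above but you still want G1, you have to enable it explicitly; a minimal sketch, with limits that are only an example:

podman run --memory=1g --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:+UseG1GC -XX:+PrintFlagsFinal -version | grep -E 'UseSerialGC|UseG1GC'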

Big pods

As of Java 17 I would still go for G1GC by default. I frequently run performance tests with ZGC and Shenandoah to see how they have progressed, but so far I’ve seen them deliver slightly (±5%) better P(99) latency at the cost of higher memory and CPU utilisation. Even on pods with 30 CPUs and a 16 GiB heap, G1GC showed better overall performance characteristics than ZGC.

Yes, ZGC has much shorter pauses, but overall memory consumption increased by ~20% and CPU utilisation increased so much that it affected neighbouring pods. Overall I noticed an increase in average latency for all types of requests, but a drop for P(95) and P(99). Performance optimisation is a moving target and you need to decide whether you care more about averages or outliers.
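
If you want to repeat this comparison against your own workload, swapping the collector is a single flag; a sketch (note that not every vendor build ships Shenandoah, and -Xlog:gc just confirms which collector was picked):

podman run --memory=16g --cpus=8 -ti openjdk:21-jdk-slim-buster java -XX:+UseZGC -Xlog:gc -version
podman run --memory=16g --cpus=8 -ti openjdk:21-jdk-slim-buster java -XX:+UseShenandoahGC -Xlog:gc -version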

JVM flags for heap size

This section shows how -XX:MinRAMPercentage and -XX:MaxRAMPercentage impact the heap size depending on the pod’s resources. The two flags are never in effect at the same time; only one of them has an impact, depending on the container’s memory limit.

Both flags are widely considered to be poorly named. *RAMPercentage isn’t actually about RAM, but about the heap area of memory allocated to the JVM, and which of the two flags is used changes depending on how much memory is available in a container.

MinRAMPercentage

The default value of MinRAMPercentage is 50%. The flag is NOT used for all containers; it only applies to pods with small memory limits.
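
You can confirm the defaults for the image used throughout this post with:

podman run --memory=4g --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal -version | grep RAMPercentage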

To clearly demonstrate this, I run java in a container with some informational flags that print the heap size for a given memory limit. I changed the default values of Min and Max to higher numbers to make the output easier to scan visually. Each podman execution has a different memory limit, which is also printed above the output from podman and grep.

echo 200
podman run --memory=200m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap

echo 206
podman run --memory=206m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap

echo 207
podman run --memory=207m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap

echo 208
podman run --memory=208m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap

echo 210
podman run --memory=210m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap

echo 254
podman run --memory=254m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap

echo 256
podman run --memory=256m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap

The output from this script is as below:

bash run.sh
200
    Max. Heap Size (Estimated): 116.00M
206
    Max. Heap Size (Estimated): 119.88M
207
    Max. Heap Size (Estimated): 121.81M
208
    Max. Heap Size (Estimated): 162.44M
210
    Max. Heap Size (Estimated): 162.44M
254
    Max. Heap Size (Estimated): 197.25M
256
    Max. Heap Size (Estimated): 199.19M

For a pod with a 206 MiB memory limit, we got ~120 MiB of heap, which is about 58%, so -XX:MinRAMPercentage=60 was used here.

For a pod with a 208 MiB memory limit, we got ~162 MiB of heap, which is about 78%, so -XX:MaxRAMPercentage=80 was used here.

The JVM switches which flag is used when the memory limit is around ~208 MiB.

Adding 2 MiB of memory (from 206 to 208) increased the heap area by ~43 MiB, and you can clearly see which flag was used at each point.
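
If you want to find the exact crossover for your own JVM version (it may differ between builds), a quick-and-dirty scan in the same style as run.sh:

for mem in 200 202 204 206 208 210 212; do
  echo ${mem}
  podman run --memory=${mem}m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
done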

MaxRAMPercentage

The default value of MaxRAMPercentage (MRP) is 25%. From my experiments it makes sense to keep it at the default for containers with a memory limit under 2 GiB. Beyond that it becomes a scaling exercise to reduce memory waste and shorten GC pauses. Be aware that Microsoft recommends setting it much higher at much earlier stages.

In deployments I monitor, I tune this value based on the size of a container and the duration of GC pauses; heap utilisation is also important. The values I use range between 75% and 15% depending on the characteristics of the functionality, e.g. 75% for JVMs that store a lot of data in memory, such as caches (Infinispan), and 15% for high throughput + low latency (Spring Reactive). This practice helps me reduce waste and shorten GC pauses, which in both kinds of JVMs I try to keep under 10 ms on average.
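
For completeness, here is one way I’d wire such a value into a container without touching the image’s entrypoint, via the standard JAVA_TOOL_OPTIONS environment variable that the JVM picks up; the values are only an example:

podman run --memory=4g --cpus=2 -e JAVA_TOOL_OPTIONS='-XX:MaxRAMPercentage=75' -ti openjdk:21-jdk-slim-buster java -XshowSettings:vm -version | grep Heap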

The bigger the heap, the more objects it stores and the longer the average GC pause can be; also, the bigger the pod, the more user and GC threads it can run, so it’s important to find a balance somewhere. I monitor heap and non-heap usage in a JVM to see how much memory is “wasted” by never being allocated to thread stacks, heap, native code etc. When this waste becomes too high, it’s time to scale down the memory limit of a container or increase the heap size. In one extreme case I lowered the memory limit from 4 GiB to 1 GiB with no impact on the response times or throughput of a pod, by increasing MRP on that service.
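
A sketch of how I’d check where the memory actually goes inside a JVM; app.jar is a placeholder, and Native Memory Tracking has to be enabled at startup for jcmd to report it:

java -XX:NativeMemoryTracking=summary -XX:MaxRAMPercentage=50 -jar app.jar &
jcmd $(pgrep -f app.jar) VM.native_memory summary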

InitialRAMPercentage

The default value of InitialRAMPercentage is around 1.5%, regardless of the JVM memory limit. InitialRAMPercentage has an absolute counterpart, -Xms, where the value is specified in bytes, kilobytes, megabytes or gigabytes instead of as a percentage (and which takes precedence if both are set). For consistency I will use InitialRAMPercentage.

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep InitialRAMPercentage
   double InitialRAMPercentage                     = 1.562500                                  {product} {default}

podman run --memory=16g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep InitialRAMPercentage
   double InitialRAMPercentage                     = 1.562500                                  {product} {default}

podman run --memory=200m --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep InitialRAMPercentage
   double InitialRAMPercentage                     = 1.562500                                  {product} {default}

The value of 1.5625 is actually the result of 100/64, which comes from InitialRAMFraction, just another way to control heap allocation:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep Fraction
    uintx InitialRAMFraction                       = 64                                        {product} {default}

What a great functionality, 3 ways to achieve the same thing ¯\_(ツ)_/¯.

This flag sets initial heap size as a percentage of the total JVM memory.

To demonstrate this flag I have the following code:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

import com.sun.management.OperatingSystemMXBean;

public class GetMemorySizes {

  public static void main(String[] args) {
    // OS-level view: total/free physical memory and the process' committed virtual memory, in MiB
    System.out.println("# --- Memory");
    OperatingSystemMXBean osmxb = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    System.out.println("getTotalMemorySize: " + osmxb.getTotalMemorySize() / (1024 * 1024));
    System.out.println("getFreeMemorySize: " + osmxb.getFreeMemorySize() / (1024 * 1024));
    System.out.println("getCommittedVirtualMemorySize: " + osmxb.getCommittedVirtualMemorySize() / (1024 * 1024));

    // JVM-level view: max, committed, used and initial heap sizes, in MiB
    System.out.println("# --- Heap");
    java.lang.management.MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
    MemoryUsage heapMemoryUsage = memBean.getHeapMemoryUsage();

    System.out.println("HeapMax:\t" + heapMemoryUsage.getMax() / (1024 * 1024));
    System.out.println("HeapCommitted:\t" + heapMemoryUsage.getCommitted() / (1024 * 1024));

    System.out.println("HeapUsed:\t" + heapMemoryUsage.getUsed() / (1024 * 1024));
    System.out.println("HeapInit:\t" + heapMemoryUsage.getInit() / (1024 * 1024));
  }
}

I will execute this code inside a container where IRP (1.5625%) and MRP (25%) have their default values and save the result. For easier maths and a simpler mental model I set the container limit to 4 GiB and CPUs to 1:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java /mnt/GetMemorySizes.java

# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4042
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax:        989
HeapCommitted:  61
HeapUsed:       14
HeapInit:       64

Treat this as our baseline with default JVM configuration.

Now I’ll keep the default value of MRP, but increase IRP to 25% so both are the same:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=25 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4028
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax:        989
HeapCommitted:  989
HeapUsed:       44
HeapInit:       1024

Notice that HeapInit (getInit) is now, as expected, 25% of the total memory of the container.

It’s important to note that InitialRAMPercentage can be set to a higher value than MaxRAMPercentage. That obviously does not make sense, so the initial heap is capped at the value derived from MRP. Do not set IRP to 100% and MRP to 50%; the value of 50% will be used.

I’ll set IRP first to 100% and then to 50%, keeping MRP at 25%, just to show that IRP can be higher than MRP; it won’t throw any error and the output is the same:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=100 -XX:MaxRAMPercentage=25 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax:        989
HeapCommitted:  989
HeapUsed:       44
HeapInit:       1024

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=25 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax:        989
HeapCommitted:  989
HeapUsed:       44
HeapInit:       1024

In fact, the heap numbers didn’t change at all. All this means is that our heap is already initialised to 25% of the container memory, even though hardly any of it is used. You can see that the OS still says

getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 3314

which means the JVM has asked to borrow this memory from the OS and the OS has agreed, but the memory isn’t actually utilised by the JVM yet. We’ll revisit this later.

Let’s run the program again, first with IRP=25% and MRP=50%, and then with both set to 50%:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=50 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4031
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax:        1979
HeapCommitted:  989
HeapUsed:       44
HeapInit:       1024

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax:        1979
HeapCommitted:  1979
HeapUsed:       66
HeapInit:       2048

Observe how HeapCommitted and HeapInit changed between the two runs, while the OS-level .*MemorySize values were practically unchanged.

IRP means that the heap starts at 25% of the JVM memory. This is your committed (aka allocated) memory and it will change during program execution to meet the goals of the GC; if the heap grows, committed memory grows with it. MRP is a request to the OS to reserve that memory for execution: the OS promises the memory will be available when the program needs it. In the context of some cloud environments that can be a lie, but that’s a topic for another time.

This flag helps the JVM get guaranteed memory from the OS on startup. When you allocate an object, the JVM needs memory for it; if the existing committed (allocated) heap is too small, the JVM goes to the OS and requests more memory. In some environments this can be time consuming, so with IRP you can avoid the JVM requesting more memory from the OS each time the heap area needs to grow.
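
If you prefer absolute values, the same starting heap can be expressed with -Xms; a sketch where 1 GiB on a 4 GiB container matches IRP=25:

podman run --memory=4g --cpus=1 -ti openjdk:21-jdk-slim-buster java -Xms1g -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal -version | grep -E ' InitialHeapSize | MaxHeapSize '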

AlwaysPreTouch

Intro

Remember how I promised we would come back to getFreeMemorySize and getCommittedVirtualMemorySize? AlwaysPreTouch commits and reserves the memory: it makes the JVM request, and claim as its own, all the required heap memory on startup. On large heaps this flag might add even a few seconds to startup times.

The default value is false, to save OS memory and let the JVM start a bit faster.

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep PreTouch
     bool AlwaysPreTouch                           = false                                     {product} {default}

Let’s start from a baseline where I have IRP and MRP set to 25% and APT explicitly disabled:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=25 -XX:-AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4028
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax:        989
HeapCommitted:  989
HeapUsed:       44
HeapInit:       1024

Behaviour

In the next program execution with APT enabled you can see how the JVM actually takes away that memory from the OS:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=25 -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 3033
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax:        989
HeapCommitted:  989
HeapUsed:       44
HeapInit:       1024

Notice how getFreeMemorySize dropped from 4028 to 3033. APT tells the JVM to touch every page of the committed heap up front, so the memory counts towards the resident size of the process. It is no longer in promise mode from the OS; it’s actually claimed and owned by the process now.

One more thing to note here: AlwaysPreTouch touches the memory requested by InitialRAMPercentage, not MaxRAMPercentage.

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=50 -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 3032
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax:        1979
HeapCommitted:  989
HeapUsed:       44
HeapInit:       1024

The output above shows that IRP amounts to 1 GiB of committed heap and free physical memory dropped by roughly 1 GiB.

Now, if I make IRP and MRP match at 50%:

podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 2003
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax:        1979
HeapCommitted:  1979
HeapUsed:       66
HeapInit:       2048

Free memory dropped to match the heap size set by IRP.

If you are using AlwaysPreTouch, make sure to set IRP to a sensible value; do not leave it at the default.
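
As a sketch of the combination discussed here, for a long-lived 4 GiB pod I would end up with flags along these lines (the values and app.jar are placeholders, not a recommendation):

java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 -XX:+AlwaysPreTouch -jar app.jar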

Startup times

Let’s look at how AlwaysPreTouch changes startup times depending on IRP. For this demonstration I’m actually going inside a K8s pod and not executing things with podman any more.

I have this (simplified) pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: javas
spec:
  containers:
  - name: jbang
    image: jbangdev/jbang-action
    command: ["/bin/sh"]
    args: ["-c", "while true; do sleep 1;done"]
    volumeMounts:
    - name: javas
      mountPath: /javas
      readOnly: false
    resources:
      requests:
        memory: '4Gi'
        cpu: '4'
      limits:
        memory: '4Gi'
        cpu: '4'
  restartPolicy: Never

I’ll use the very basic time command for that, it should be enough, right?
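
For reference, this is roughly what I run inside the pod for each row of the tables below, shown here for the 50/50/true combination (assuming bash and its time builtin are available in the image):

kubectl exec -ti javas -- bash -c 'time java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 -XX:+AlwaysPreTouch -version'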

Here are the results for 4 GiB pod:

IRP   MRP   AlwaysPreTouch   time
25    25    false            real: 0m0.379s   user: 0m0.876s   sys: 0m0.043s
25    25    true             real: 0m0.433s   user: 0m0.785s   sys: 0m0.299s
50    50    false            real: 0m0.372s   user: 0m0.784s   sys: 0m0.053s
50    50    true             real: 0m0.533s   user: 0m0.859s   sys: 0m0.544s

Despite the measuring methodology being far from scientific (works on my machine), the results are pretty clear. Let’s now increase the pod size to 16 GiB:

IRP   MRP   AlwaysPreTouch   time
25    25    false            real: 0m0.371s   user: 0m0.763s   sys: 0m0.067s
25    25    true             real: 0m0.660s   user: 0m0.845s   sys: 0m1.054s
50    50    false            real: 0m0.402s   user: 0m0.867s   sys: 0m0.086s
50    50    true             real: 0m1.148s   user: 0m1.006s   sys: 0m2.377s
95    95    false            real: 0m0.378s   user: 0m0.801s   sys: 0m0.055s
95    95    true             real: 0m2.108s   user: 0m0.959s   sys: 0m5.043s

sys time is where Linux had to do the work to find and reserve the requested memory for the JVM.

Look at the differences between program executions. Normally this time would be spread over the early lifetime of the JVM, but with +AlwaysPreTouch I am, ahem, investing it in startup. When the heap needs to grow, the memory is already there and the JVM doesn’t need to go to the OS to request more. While this makes sense for some JVMs, it won’t be recommended for most. This flag is useful for JVMs that are rarely recreated. It will be harmful for autoscaled microservices and lambdas, but should be a good investment for monoliths where startup time is already long and you care more about getting low response times earlier.

Experimentation

Let’s keep talking about APT and that committed vs reserved memory :) Let’s cause some trouble.

Note: -Xmx is the absolute counterpart of MaxRAMPercentage: you specify the max heap size in bytes, kilobytes, megabytes or gigabytes instead of as a percentage, just as -Xms is the absolute counterpart of InitialRAMPercentage.

In this example, instead of using MRP, I’ll specify -Xmx. I will also lie to the JVM about containerisation so that it ignores the pod memory limits, just for demo purposes.

The JVM in fact has 2 GiB of memory available, but with -XX:-UseContainerSupport it won’t know that; it will think it has 32 GiB (the host’s memory).

Here’s what’s going to happen:

  1. a podman container with a 2 GiB limit will start a java process
  2. the JVM thinks it has 32 GiB of memory
  3. the JVM is told it can allocate a max of 4 GiB of heap

podman run --memory=2g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -Xmx4g -XX:-UseContainerSupport /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 32045
getFreeMemorySize: 16799
getCommittedVirtualMemorySize: 6971
# --- Heap
HeapMax:        4096
HeapCommitted:  4096
HeapUsed:       27
HeapInit:       4096

Look at this, it has not crashed! The JVM thinks it will be able to allocate 4 GiB of heap while it’s running in a 2 GiB container! Let’s try it with AlwaysPreTouch:

podman run --memory=2g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -Xmx4g -XX:-UseContainerSupport -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java

It crashes the JVM without any output, and when I list all podman containers:

podman ps -a
551ea28d02ab  docker.io/library/openjdk:21-jdk-slim-buster  java -XX:InitialR...  59 seconds ago     Exited (137) 57 seconds ago               magical_saha

I can see that it exited with status code 137, which means the container was OOM-killed. The JVM died with an OOM without allocating lots of objects :)

If I do the same but with a lower Xmx of 1g, the container executes correctly:

podman run --memory=2g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -Xmx1g -XX:-UseContainerSupport -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 32045
getFreeMemorySize: 8674
getCommittedVirtualMemorySize: 4423
# --- Heap
HeapMax:        1024
HeapCommitted:  1024
HeapUsed:       26
HeapInit:       1024

Thanks for staying with me that long!