Pod size considerations for JVM
In this post I’ll describe what you should consider when letting the JVM use its own ergonomic configuration rather than drastically overriding it, which requires more advanced tuning and more metrics.
Pod sizing for GC
The processor and memory limits impact how the JVM tunes its own performance characteristics.
Most importantly, they determine which GC is used and how many threads it starts to clean up memory, which in turn determines how frequent and how long GC pauses are.
The magic numbers
The smaller the number of CPUs and MiB of memory, the less “hardware efficient” the selected GC is. That doesn’t mean the ergonomic choice will be less efficient than a non-ergonomic one; JVM GCs are designed to make the best use of the hardware they run on.
The table below shows the requirements for a GC to be ergonomically enabled:
GC | Memory | CPU | #GC threads |
---|---|---|---|
SerialGC | any | any | 1 |
ParallelGC | - | - | linked to the number of CPUs |
G1GC | >= 1792 MiB | >= 2 | linked to the number of CPUs |
Shenandoah | - | - | linked to the number of CPUs |
ZGC | - | - | linked to the number of CPUs |
Check this article for more opinions on GCs.
The following can be easily demonstrated with:
podman run --memory=1791m --cpus=2 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep -E 'Use.*GC.*ergonomic'
bool UseSerialGC = true {product} {ergonomic}
podman run --memory=1792m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep -E 'Use.*GC.*ergonomic'
bool UseSerialGC = true {product} {ergonomic}
podman run --memory=1792m --cpus=2 -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep -E 'Use.*GC.*ergonomic'
bool UseG1GC = true {product} {ergonomic}
Small pods
1 CPU and under, 3 GiB of memory.
SerialGC is most likely to show the best performance characteristics on pods with 1 processor or less and a small amount of memory. SerialGC creates only one thread to process the heap, for both the new and old generation. If a GC ran multiple threads on a single core, they would compete for CPU resources and cause more context switching in the OS kernel, increasing stop-the-world pauses.
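If ergonomics pick a different collector for your pod size and you want to force the serial collector anyway, you can set it explicitly. A minimal sketch; the image and limits are illustrative:
podman run --memory=512m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:+UseSerialGC -XX:+PrintFlagsFinal -version | grep UseSerialGC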
Medium pods
4 CPU and under, 8 GiB of memory.
ParallelGC is a controversial one: it’s well suited for medium-size pods, but it’s never enabled by the ergonomics. ZGC and Shenandoah are designed for huge heaps and are also never enabled ergonomically. The ergonomic choice is either SerialGC or G1GC (Java 11 - 21).
ParallelGC has a very simple performance characteristic: it scales vertically with the number of available processors - 2 processors, 2 threads; 4 processors, 4 threads.
ParallelGC makes sense for a few CPU cores (under 4) and heaps under 4 GiB, mostly because it runs fewer threads than G1GC, giving more air to other threads/JVMs running on the same small K8s node.
More often than not, G1GC will be your preferred choice. It has higher overhead on the machine by running more threads, but it completes GC cycles faster and can avoid some Stop The World pauses that ParallelGC can’t, making P(99) latency better.
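Since neither collector will ever be chosen for you here, comparing them means opting in explicitly. A sketch; the image, limits and thread count are illustrative:
podman run --memory=2g --cpus=2 -ti openjdk:21-jdk-slim-buster java -XX:+UseParallelGC -XX:ParallelGCThreads=2 -XX:+PrintFlagsFinal -version | grep -E 'UseParallelGC|ParallelGCThreads'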
Big pods
As of Java 17 I would still go for G1GC by default. I frequently run performance tests with ZGC and Shenandoah to see how they have progressed, but so far I’ve seen them deliver only slightly (± 5%) better P(99) latency at higher memory and CPU utilisation. Even on pods with 30 CPUs and a 16 GiB heap, G1GC showed better overall performance characteristics than ZGC.
Yes, ZGC has much shorter pauses, but overall memory consumption increased by ~20% and CPU utilisation increased so much that it was affecting neighbouring pods. Overall I noticed an increase in average latency for all types of requests, but a drop for P(95) and P(99). Performance optimisation is a moving target and you need to decide whether you care more about averages or outliers.
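If you want to run this kind of comparison yourself, both collectors have to be enabled explicitly, since ergonomics never select them. A sketch; the image and resource limits are illustrative, and Shenandoah availability depends on the JDK build:
podman run --memory=16g --cpus=8 -ti openjdk:21-jdk-slim-buster java -XX:+UseZGC -Xlog:gc -XshowSettings:vm -version
podman run --memory=16g --cpus=8 -ti openjdk:21-jdk-slim-buster java -XX:+UseShenandoahGC -Xlog:gc -XshowSettings:vm -version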
JVM flags for heap size
This section shows how -XX:MinRAMPercentage and -XX:MaxRAMPercentage impact heap size depending on pod resources. The two flags are never used together; only one of them has an effect at a time.
Both flags are widely considered to be poorly named. *RAMPercentage isn’t actually about RAM, but about the heap area of the memory allocated to the JVM, and which of the flags applies depends on how much memory is available in the container.
MinRAMPercentage
The default value of MinRAMPercentage is 50%. The flag is NOT used for all containers; it only applies to pods with small memory limits.
To demonstrate this clearly, I run java in a container with a few informational flags that print the heap size depending on the memory limit. I changed the default values of Min and Max to higher values to make the output easier to scan visually. Each podman execution has a different memory limit, which is echoed above the output of podman and grep.
echo 200
podman run --memory=200m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
echo 206
podman run --memory=206m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
echo 207
podman run --memory=207m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
echo 208
podman run --memory=208m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
echo 210
podman run --memory=210m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
echo 254
podman run --memory=254m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
echo 256
podman run --memory=256m --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=80 -XX:MinRAMPercentage=60 -XshowSettings:vm -version | grep Heap
The output from this script is shown below:
bash run.sh
200
Max. Heap Size (Estimated): 116.00M
206
Max. Heap Size (Estimated): 119.88M
207
Max. Heap Size (Estimated): 121.81M
208
Max. Heap Size (Estimated): 162.44M
210
Max. Heap Size (Estimated): 162.44M
254
Max. Heap Size (Estimated): 197.25M
256
Max. Heap Size (Estimated): 199.19M
For a pod with a 206 MiB memory limit we got about 119 MiB of heap, which is about 57%, so -XX:MinRAMPercentage=60 was used here.
For a pod with a 208 MiB memory limit we got about 162 MiB of heap, which is about 77%, so -XX:MaxRAMPercentage=80 was used here.
The JVM switches which flag it uses when the memory limit is around ~208 MiB. Adding 2 MiB of memory increased the heap area by ~43 MiB, and you can clearly see which flag was used where.
MaxRAMPercentage
The default value of MaxRAMPercentage (MRP) is 25%. From my experiments it makes sense to keep it at the default for containers with a memory limit under 2 GiB. Beyond that it becomes a scaling exercise to reduce memory waste and GC pauses. Be aware that Microsoft recommends setting it much higher at much earlier stages.
In the deployments I monitor, I tune this value based on the size of the container and the duration of GC pauses; heap utilisation is also important. The values I use range from 15% to 75% depending on the characteristics of the workload. E.g. I set 75% for JVMs that store a lot of data in memory, e.g. caches (Infinispan), and 15% for high-throughput, low-latency services (Spring Reactive). This practice helps me reduce waste and shorten GC pauses, which in both kinds of JVMs I try to keep under 10 ms on average.
The bigger the heap, the more objects it stores and the longer the average GC pause can be; also, the bigger the pod, the more user and GC threads it can run, so it’s important to find a balance. I monitor heap and non-heap usage in a JVM to see how much memory is “wasted” by never being allocated to thread stacks, heap, native code, etc. When that waste becomes too high, it’s time to scale down the memory limit of the container or increase the heap size. In one extreme case I lowered a memory limit from 4 GiB to 1 GiB with no impact on response times or throughput of the pod, by increasing MRP on that service.
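To see what a non-default MRP resolves to without deploying anything, the same -XshowSettings:vm trick from the previous section works. A sketch; the values are illustrative, not a recommendation:
podman run --memory=1g --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:MaxRAMPercentage=75 -XshowSettings:vm -version | grep Heap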
InitialRAMPercentage
The default value of InitialRAMPercentage is around 1.5%, regardless of the JVM memory limit. InitialRAMPercentage has a shorthand counterpart, -Xms, where the value is specified in bytes, kilobytes or megabytes. For consistency I will use InitialRAMPercentage.
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep InitialRAMPercentage
double InitialRAMPercentage = 1.562500 {product} {default}
podman run --memory=16g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep InitialRAMPercentage
double InitialRAMPercentage = 1.562500 {product} {default}
podman run --memory=200m --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep InitialRAMPercentage
double InitialRAMPercentage = 1.562500 {product} {default}
The value of 1.5625 is actually the result of 100/64, where 64 comes from InitialRAMFraction, yet another way to control heap allocation:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep Fraction
uintx InitialRAMFraction = 64 {product} {default}
What a great functionality, 3 ways to achieve the same thing ¯\_(ツ)_/¯.
This flag sets initial heap size as a percentage of the total JVM memory.
To demonstrate this flag I have the following code:
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

import com.sun.management.OperatingSystemMXBean;

public class GetMemorySizes {
    public static void main(String[] args) {
        System.out.println("# --- Memory");
        // OS-level view: total/free physical memory and virtual memory committed to this process (MiB)
        OperatingSystemMXBean osmxb = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        System.out.println("getTotalMemorySize: " + osmxb.getTotalMemorySize() / (1024 * 1024));
        System.out.println("getFreeMemorySize: " + osmxb.getFreeMemorySize() / (1024 * 1024));
        System.out.println("getCommittedVirtualMemorySize: " + osmxb.getCommittedVirtualMemorySize() / (1024 * 1024));

        System.out.println("# --- Heap");
        // JVM-level view: max, committed, used and initial heap sizes (MiB)
        java.lang.management.MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heapMemoryUsage = memBean.getHeapMemoryUsage();
        System.out.println("HeapMax:\t" + heapMemoryUsage.getMax() / (1024 * 1024));
        System.out.println("HeapCommitted:\t" + heapMemoryUsage.getCommitted() / (1024 * 1024));
        System.out.println("HeapUsed:\t" + heapMemoryUsage.getUsed() / (1024 * 1024));
        System.out.println("HeapInit:\t" + heapMemoryUsage.getInit() / (1024 * 1024));
    }
}
I will execute this code inside a container where IRP (1.5625%) and MRP (25%) have their default values and save the result. For easier maths and a simpler mental model I set the container limit to 4 GiB and CPUs to 1:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4042
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax: 989
HeapCommitted: 61
HeapUsed: 14
HeapInit: 64
Treat this as our baseline with default JVM configuration.
Now I’ll keep the default value of MRP, but increase IRP to 25% so both are the same:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=25 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4028
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax: 989
HeapCommitted: 989
HeapUsed: 44
HeapInit: 1024
Notice that getInit is now, as expected, 25% of the total memory of the container.
It’s important to note that InitialRAMPercentage can be set to a value higher than MaxRAMPercentage. That obviously does not make sense, so it will be capped at the value of MRP. Do not set IRP to 100% and MRP to 50%; the value of 50% will be used.
I’ll raise IRP to 100% and then 50% (keeping MRP at 25%) just to show that a higher IRP than MRP won’t throw any error and the output stays the same:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=100 -XX:MaxRAMPercentage=25 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax: 989
HeapCommitted: 989
HeapUsed: 44
HeapInit: 1024
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=25 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax: 989
HeapCommitted: 989
HeapUsed: 44
HeapInit: 1024
In fact, the heap numbers didn’t change at all. All this means is that our heap is already initialised with 25% of the container’s memory, and none of that memory is used. You can see that the OS still reports
getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 3314
which means the JVM has asked the OS for this memory and the OS has agreed to provide it, but it isn’t actually used by the JVM yet. We’ll revisit this later.
Let’s run the program again, first with IRP at 25% and MRP at 50%, and then with both at 50%:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=50 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4031
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax: 1979
HeapCommitted: 989
HeapUsed: 44
HeapInit: 1024
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4027
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax: 1979
HeapCommitted: 1979
HeapUsed: 66
HeapInit: 2048
Observe how HeapCommitted and HeapInit have changed, while the .*MemorySize values have not been impacted.
IRP means that the heap starts at 25% of the JVM memory. This is your committed (aka allocated) memory and it will change during program execution to meet the goals of the GC; if the heap grows, the committed memory grows too. MRP is a request to the OS to reserve that memory for execution: the OS promises that the memory will be available when the program needs it. In the context of some cloud environments that can be a lie, but that’s a topic for another time.
This flag helps the JVM get guaranteed memory from the OS at startup. When you allocate an object the JVM needs memory for it, and if the existing committed (allocated) heap is too small, it will go to the OS and request more. In some environments this can be time consuming, so with IRP you can avoid the JVM requesting more memory from the OS every time the heap area needs to grow.
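One common pattern (a sketch, not a universal recommendation) is to set IRP equal to MRP so the whole heap is committed at startup and never has to grow:
podman run --memory=4g --cpus=1 -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 -XshowSettings:vm -version | grep Heap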
AlwaysPreTouch
Intro
Remember how I promised we would come back to getFreeMemorySize and getCommittedVirtualMemorySize? AlwaysPreTouch will commit and reserve the memory: it makes the JVM request, and claim as its own, all the required heap memory at startup. On large heaps this flag might add even a few seconds to startup times.
The default value is false, to save OS memory and let the JVM start a bit faster.
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:+PrintFlagsFinal | grep PreTouch
bool AlwaysPreTouch = false {product} {default}
Let’s start from a baseline where I have IRP and MRP set to 25% and APT explicitly disabled:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=25 -XX:-AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 4028
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax: 989
HeapCommitted: 989
HeapUsed: 44
HeapInit: 1024
Behaviour
In the next program execution, with APT enabled, you can see how the JVM actually takes that memory away from the OS:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=25 -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 3033
getCommittedVirtualMemorySize: 3314
# --- Heap
HeapMax: 989
HeapCommitted: 989
HeapUsed: 44
HeapInit: 1024
Notice how getFreeMemorySize dropped from 4028 to 3033. APT tells the JVM to touch this memory up front, so it stops being just a reservation in the virtual size of the process; it’s actually claimed and owned by the process now, no longer in “promise mode” from the OS.
One more thing to note here: AlwaysPreTouch touches the memory requested by InitialRAMPercentage, not MaxRAMPercentage.
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=50 -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 3032
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax: 1979
HeapCommitted: 989
HeapUsed: 44
HeapInit: 1024
The output above shows that IRP resolves to 1 GiB, and free physical memory dropped by roughly 1 GiB.
Now if I make IRP and MRP both 50%:
podman run --memory=4g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 4096
getFreeMemorySize: 2003
getCommittedVirtualMemorySize: 4342
# --- Heap
HeapMax: 1979
HeapCommitted: 1979
HeapUsed: 66
HeapInit: 2048
Free memory dropped to match the size requested by IRP.
If you are using AlwaysPreTouch, make sure to set IRP to a sensible value; do not leave it at the default.
Startup times
Let’s look at how AlwaysPreTouch changes startup times depending on IRP. For this demonstration I’m actually going inside a K8s pod rather than executing things with podman.
I have this (simplified) pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: javas
spec:
  containers:
    - name: jbang
      image: jbangdev/jbang-action
      command: ["/bin/sh"]
      args: ["-c", "while true; do sleep 1;done"]
      volumeMounts:
        - name: javas
          mountPath: /javas
          readOnly: false
      resources:
        requests:
          memory: '4Gi'
          cpu: '4'
        limits:
          memory: '4Gi'
          cpu: '4'
  restartPolicy: Never
I’ll use the very basic time command for that; it should be enough, right?
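Concretely, the measurement is of this shape, run inside the pod (e.g. via kubectl exec); a sketch where the flag values correspond to one row of the tables below:
time java -XX:InitialRAMPercentage=25 -XX:MaxRAMPercentage=25 -XX:+AlwaysPreTouch -version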
Here are the results for a 4 GiB pod:
IRP | MRP | AlwaysPreTouch | time |
---|---|---|---|
25 | 25 | false | real:0m0.379s user:0m0.876s sys:0m0.043s |
25 | 25 | true | real:0m0.433s user:0m0.785s sys:0m0.299s |
50 | 50 | false | real:0m0.372s user:0m0.784s sys:0m0.053s |
50 | 50 | true | real:0m0.533s user:0m0.859s sys:0m0.544s |
Despite the measurement methodology being far from scientific (works on my machine), the results are pretty clear. Let’s now increase the pod size to 16 GiB:
IRP | MRP | AlwaysPreTouch | time |
---|---|---|---|
25 | 25 | false | real:0m0.371s user:0m0.763s sys:0m0.067s |
25 | 25 | true | real:0m0.660s user:0m0.845s sys:0m1.054s |
50 | 50 | false | real:0m0.402s user:0m0.867s sys:0m0.086s |
50 | 50 | true | real:0m1.148s user:0m1.006s sys:0m2.377s |
95 | 95 | false | real:0m0.378s user:0m0.801s sys:0m0.055s |
95 | 95 | true | real:0m2.108s user:0m0.959s sys:0m5.043s |
sys time is where Linux had to do the work to find and provide the JVM with the requested memory.
Look at the differences between the executions. Normally this time would be spread over the early lifetime of the JVM, but with +AlwaysPreTouch I am, ehm... investing it into startup. When the heap needs to grow, the memory is already there and the JVM doesn’t need to go to the OS to request more. While this makes sense for some JVMs, it won’t be recommended for most. The flag is useful for JVMs that are rarely recreated. It will be harmful for autoscaled microservices and lambdas, but should be a good investment for monoliths where startup time is already long and you care more about getting lower response times earlier.
Experimentation
Let’s keep talking about APT and that committed vs reserved memory :) Let’s cause some trouble.
Note: -Xmx plays the same role as MaxRAMPercentage, but lets you specify the max heap size in bytes, kilobytes or megabytes instead of a percentage, just as -Xms is the byte-based counterpart of InitialRAMPercentage.
In this example, instead of using MRP, I’ll specify Xmx. I will also lie to the JVM about containerisation so that it ignores the pod memory limits, just for demo purposes.
The JVM in fact has 2 GiB of memory available, but with -XX:-UseContainerSupport it won’t know that; it will think it has 32 GiB (the host OS memory).
Here’s what’s going to happen:
- a podman container with a 2 GiB limit will start a java process
- the JVM thinks it has 32 GiB of memory
- the JVM is told it can allocate a max of 4 GiB of heap
podman run --memory=2g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -Xmx4g -XX:-UseContainerSupport /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 32045
getFreeMemorySize: 16799
getCommittedVirtualMemorySize: 6971
# --- Heap
HeapMax: 4096
HeapCommitted: 4096
HeapUsed: 27
HeapInit: 4096
Look at this, it has not crashed! The JVM thinks it will be able to allocate 4 GiB of heap while it’s running in a 2 GiB container! Let’s try with AlwaysPreTouch:
podman run --memory=2g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -Xmx4g -XX:-UseContainerSupport -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
It crashes the JVM without any output, and when I list all podman containers:
podman ps -a
551ea28d02ab docker.io/library/openjdk:21-jdk-slim-buster java -XX:InitialR... 59 seconds ago Exited (137) 57 seconds ago magical_saha
I can see that it exited with status code 137, which means it was OOM-killed. The JVM died with an OOM without allocating lots of objects :)
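If you want more than the exit code, podman’s inspect output can usually confirm the OOM kill directly. A sketch; .State.OOMKilled is the field I’d check, and the container name comes from the listing above:
podman inspect --format '{{.State.OOMKilled}}' magical_saha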
If I do the same but with a lower Xmx set to 1g, the container executes correctly:
podman run --memory=2g --cpus=1 -v $(pwd):/mnt -ti openjdk:21-jdk-slim-buster java -XX:InitialRAMPercentage=25 -Xmx1g -XX:-UseContainerSupport -XX:+AlwaysPreTouch /mnt/GetMemorySizes.java
# --- Memory
getTotalMemorySize: 32045
getFreeMemorySize: 8674
getCommittedVirtualMemorySize: 4423
# --- Heap
HeapMax: 1024
HeapCommitted: 1024
HeapUsed: 26
HeapInit: 1024
Thanks for staying with me this long!