Keycloak password hashing
This post doesn’t contain full context of the works performed, only benchmarking part
I had to test how number of iterations impacts login request time to KeyCloak and if or how we can improve it. After investigating a few other options I decided to check what’s the difference for password hashing times using default hashing mechanism in KeyCloak. I found and extracted parts of password hashing mechanism from KeyCloak to my repo, developed small parametrised JMH benchmark, comparing:
Hashing iterations:
- 1
- 1000
- 15000
- 27500 (default in KeyCloak at the time of writing)
Password complexity:
- easy: password
- hard: 666 random characters -> p([n:7_FSr0Su:`3g_"1:\FC8z-Q4qEm^gOwTd^HVZS[5P9$08},IZ1>}&=qLQ}#O_PzQd]
aPe_JL~by
4>#?XM?bTn]kPo?GOIy4C2Tv20I<]O|_"z:b(]sBnVO@D]>7wkNH)%[|s3.vzNWayO~\\xD;O.bNA6$5]*iT9y
0<ax$\"v'e~2|o,{'G0TML-#4ZV)3\"
cGHxtdmK6/RtP>+phPN"JjIybFbBP&.pusulTN|iXh:ynpW$qpO=m:|Le’^"TSXsUTg:D!Z->xf/.;.|24$Z5x3OCmmu{Jt-}+<p#{)S<\"\\7e-+
hi4*&62;q+H4xj<)?v+::"E@^#s{+~e
9XG/Nm?7+unDY$CoH(X*&qP’NE02K-{y2}6{/8?\xKYiFK(CUn`c9bvy<bQXr&&!R,‘XBybe2i"9NY;$U’rAl15h8}h7,m’yvr8o{FMIFgYqj6{[V+W#\\1={:I_%TJ!Ooy67:1OZ60Uyr10#A'3>yU!]dDBUXz$+_[r'Lm}spFfs&(2R12]e-Z*&^b]{kEY&iQtX0_{O
"t-JiXBZ{zjDFQA@Da*^76$2Mgf4d*[CiTqm8"oH;TdZA.}&qYr(a7a0;:"Ist]ES|v(HXq8Yn-s8V7\2qhgR6*|O)2}![#Lc@FF5%3(#TC=p(6va4NN
Salt:
- two standard complexity salts
As with password hashing we mostly care about throughput of login requests
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 1)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
Benchmark | (iteration) | (password) | (salt) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|---|---|
testPbkdf2Sha1 | 1 | password | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 151618.151 | 2698.097 | ops/s |
testPbkdf2Sha1 | 1 | password | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 152955.359 | 2803.871 | ops/s |
testPbkdf2Sha1 | 1 | p([n:7_Fsr0Su: | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 90630.841 | 1112.78 | ops/s |
testPbkdf2Sha1 | 1 | p([n:7_Fsr0Su: | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 90028.67 | 1253.793 | ops/s |
testPbkdf2Sha1 | 1000 | password | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 201.355 | 1.239 | ops/s |
testPbkdf2Sha1 | 1000 | password | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 201.209 | 2.37 | ops/s |
testPbkdf2Sha1 | 1000 | p([n:7_Fsr0Su: | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 200.38 | 3.181 | ops/s |
testPbkdf2Sha1 | 1000 | p([n:7_Fsr0Su: | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 200.71 | 1.033 | ops/s |
testPbkdf2Sha1 | 15000 | password | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 13.442 | 0.093 | ops/s |
testPbkdf2Sha1 | 15000 | password | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 13.472 | 0.075 | ops/s |
testPbkdf2Sha1 | 15000 | p([n:7_Fsr0Su: | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 13.466 | 0.123 | ops/s |
testPbkdf2Sha1 | 15000 | p([n:7_Fsr0Su: | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 13.462 | 0.086 | ops/s |
testPbkdf2Sha1 | 27500 | password | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 7.249 | 0.192 | ops/s |
testPbkdf2Sha1 | 27500 | password | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 7.322 | 0.077 | ops/s |
testPbkdf2Sha1 | 27500 | p([n:7_Fsr0Su: | Hy9X2DNQNOVYOictvV0wrA== | thrpt | 15 | 7.317 | 0.044 | ops/s |
testPbkdf2Sha1 | 27500 | p([n:7_Fsr0Su: | rqdzpZGc2XqFgjMeCRVZLg== | thrpt | 15 | 7.292 | 0.037 | ops/s |
The benchmarks results show in the first two rows that different salt very insignificantly changes hashing throughput.
Comparison by password length
1 iteration
For the first password password
we got average (151618.151 + 152955.359) / 2 = 152286.755
hashing throughput and for the second password consisting of 666 characters the throughput on average was (90630.841 + 90028.67) / 290329.7555
. 95 times longer password decreased hashing throughput by 0.52x. Not such significant number, considering most people don’t use password longer than several characters. Even with password managers like KeePassXC I find such password very impractical. It would only(?) double password cracking time for 1 iteration.
27500 iterations
At the time of writing KeyCloak by default uses 27500 iterations. This number is reviewed when faster CPUs/GPUs land on market. KeyCloak team must increase password hashing iterations to prevent quick cracking of leaked hashes.
For standard password
I got (7.249 + 7.322) / 2 = 7.2855
and for 666 chars long password (7.317 + 7.292) / 2 = 7.3045
hashes per second. This difference is within margin of error, so I’m going to attribute it to solar flares messing with my CPU while the test was running.
This benchmark showed that the higher number of iterations, the smaller difference in hashing throughput, so let’s check one more thing. How number of iterations impacts throughput for the same password.
Comparison by password length
This time I will be testing the same password, the same salt, but I want to see how number of hashing iterations impacts throughput:
@Param({"password"})
public String password;
@Param({"Hy9X2DNQNOVYOictvV0wrA=="})
public String salt;
@Param({"1", "10", "1000", "10000", "15000", "20000", "27500", "50000", "100000"})
public int iteration;
I got the following results:
Benchmark | (iteration) | (password) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|---|
testPbkdf2Sha1 | 1 | password | thrpt | 15 | 153590.544 | 1631.720 | ops/s |
testPbkdf2Sha1 | 10 | password | thrpt | 15 | 19548.587 | 156.750 | ops/s |
testPbkdf2Sha1 | 1000 | password | thrpt | 15 | 203.234 | 1.155 | ops/s |
testPbkdf2Sha1 | 10000 | password | thrpt | 15 | 20.411 | 0.138 | ops/s |
testPbkdf2Sha1 | 15000 | password | thrpt | 15 | 13.597 | 0.064 | ops/s |
testPbkdf2Sha1 | 20000 | password | thrpt | 15 | 10.150 | 0.055 | ops/s |
testPbkdf2Sha1 | 27500 | password | thrpt | 15 | 7.399 | 0.068 | ops/s |
testPbkdf2Sha1 | 50000 | password | thrpt | 15 | 4.069 | 0.020 | ops/s |
testPbkdf2Sha1 | 100000 | password | thrpt | 15 | 2.042 | 0.009 | ops/s |
The chart above shows number of hashing iterations vs throughput on a logarithmic scale.
- 1 iteration is 76795x faster than 100000 iterations (2.042 passwords hashed per second).
- 1 iteration is 20758x faster than 27500 iterations.
- 1 iteration is 7.857x faster than 10 iterations.
Password hashing per Garbage Collector
I read a really cool article about updates to ZGC and to Shenandoah in JDK17, so just out of curiosity decided to benchmark how GC will impact password hashing, results are quite interesting.
First, I increased number of forked JVM to perform this test:
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 15, time = 1, timeUnit = TimeUnit.SECONDS)
then, benchmark method has another @Fork
annotation where I specified which GC to use in that fork:
@Fork(jvmArgsAppend = "-XX:+UseShenandoahGC")
and now, since we’re comparing GCs, I think it’s useful to have wider view on their performance and compare all 4 shipped in OpenJDK: G1, ZGC, Parallel and ShenandoahGC. I ended up with 4 methods annotated with different GC enabler. I run all 4 methods on my laptop and server with Ryzen 5 3600. The results turned out to be portable between computers:
Intel i5:
Benchmark | (iterations) | (password) | (salt) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|---|---|
testPbkdf2Sha1G1 | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 7.426 | 0.018 | ops/s |
testPbkdf2Sha1ParallelGC | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 7.527 | 0.023 | ops/s |
testPbkdf2Sha1ShenandoahGC | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 6.938 | 0.025 | ops/s |
testPbkdf2Sha1ZGC | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 7.374 | 0.062 | ops/s |
Ryzen 5 3600:
Benchmark | (iterations) | (password) | (salt) | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|---|---|---|
testPbkdf2Sha1G1 | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 44.726 | 0.476 | ops/s |
testPbkdf2Sha1ParallelGC | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 48.621 | 0.546 | ops/s |
testPbkdf2Sha1ShenandoahGC | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 41.885 | 0.501 | ops/s |
testPbkdf2Sha1ZGC | 27500 | p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN | salt | thrpt | 45 | 44.708 | 0.762 | ops/s |
This is it. This is the discovery, ShenandoahGC managed to be the slowest in this test by 9% than baseline G1. The lesson for today: by playing with GCs you can make things faster, you can make things slower. Always test your scenarios.
Source code for these tests is available here
Benchmarks run on my laptop:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
CPU family: 6
Model: 78
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 3
CPU max MHz: 2800.0000
CPU min MHz: 400.0000
java -version
openjdk version "17" 2021-09-14
OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)