Keycloak password hashing

September 24, 2021

Page content

This post doesn’t contain full context of the works performed, only benchmarking part

I had to test how number of iterations impacts login request time to KeyCloak and if or how we can improve it. After investigating a few other options I decided to check what’s the difference for password hashing times using default hashing mechanism in KeyCloak. I found and extracted parts of password hashing mechanism from KeyCloak to my repo, developed small parametrised JMH benchmark, comparing:

Hashing iterations:

1
1000
15000
27500 (default in KeyCloak at the time of writing)

Password complexity:

easy: password
hard: 666 random characters -> p([n:7_FSr0Su:`3g_"1:\FC8z-Q4qEm^gOwTd^HVZS[5P9$08},IZ1>}&=qLQ}#O_PzQd]aPe_JL~by4>#?XM?bTn]kPo?GOIy4C2Tv20I<]O|_"z:b(]sBnVO@D]>7wkNH)%[|s3.vzNWayO~\\xD;O.bNA6$5]*iT9y0<ax$\"v'e~2|o,{'G0TML-#4ZV)3\"cGHxtdmK6/RtP>+phPN"JjIybFbBP&.pusulTN|iXh:ynpW$qpO=m:|Le’^"TSXsUTg:D!Z->xf/.;.|24$Z5x3OCmmu{Jt-}+<p#{)S<\"\\7e-+hi4*&62;q+H4xj<)?v+::"E@^#s{+~e9XG/Nm?7+unDY$CoH(X*&qP’NE02K-{y2}6{/8?\xKYiFK(CUn`c9bvy<bQXr&&!R,‘XBybe2i"9NY;$U’rAl15h8}h7,m’yvr8o{FMIFgYqj6{[V+W#\\1={:I_%TJ!Ooy67:1OZ60Uyr10#A'3>yU!]dDBUXz$+_[r'Lm}spFfs&(2R12]e-Z*&^b]{kEY&iQtX0_{O"t-JiXBZ{zjDFQA@Da*^76$2Mgf4d*[CiTqm8"oH;TdZA.}&qYr(a7a0;:"Ist]ES|v(HXq8Yn-s8V7\2qhgR6*|O)2}![#Lc@FF5%3(#TC=p(6va4NN

Salt:

two standard complexity salts

As with password hashing we mostly care about throughput of login requests

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 1)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)

Benchmark	(iteration)	(password)	(salt)	Mode	Cnt	Score	Error	Units
testPbkdf2Sha1	1	password	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	151618.151	2698.097	ops/s
testPbkdf2Sha1	1	password	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	152955.359	2803.871	ops/s
testPbkdf2Sha1	1	p([n:7_Fsr0Su:	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	90630.841	1112.78	ops/s
testPbkdf2Sha1	1	p([n:7_Fsr0Su:	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	90028.67	1253.793	ops/s
testPbkdf2Sha1	1000	password	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	201.355	1.239	ops/s
testPbkdf2Sha1	1000	password	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	201.209	2.37	ops/s
testPbkdf2Sha1	1000	p([n:7_Fsr0Su:	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	200.38	3.181	ops/s
testPbkdf2Sha1	1000	p([n:7_Fsr0Su:	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	200.71	1.033	ops/s
testPbkdf2Sha1	15000	password	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	13.442	0.093	ops/s
testPbkdf2Sha1	15000	password	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	13.472	0.075	ops/s
testPbkdf2Sha1	15000	p([n:7_Fsr0Su:	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	13.466	0.123	ops/s
testPbkdf2Sha1	15000	p([n:7_Fsr0Su:	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	13.462	0.086	ops/s
testPbkdf2Sha1	27500	password	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	7.249	0.192	ops/s
testPbkdf2Sha1	27500	password	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	7.322	0.077	ops/s
testPbkdf2Sha1	27500	p([n:7_Fsr0Su:	Hy9X2DNQNOVYOictvV0wrA==	thrpt	15	7.317	0.044	ops/s
testPbkdf2Sha1	27500	p([n:7_Fsr0Su:	rqdzpZGc2XqFgjMeCRVZLg==	thrpt	15	7.292	0.037	ops/s

The benchmarks results show in the first two rows that different salt very insignificantly changes hashing throughput.

Comparison by password length

1 iteration

For the first password password we got average (151618.151 + 152955.359) / 2 = 152286.755 hashing throughput and for the second password consisting of 666 characters the throughput on average was (90630.841 + 90028.67) / 290329.7555. 95 times longer password decreased hashing throughput by 0.52x. Not such significant number, considering most people don’t use password longer than several characters. Even with password managers like KeePassXC I find such password very impractical. It would only(?) double password cracking time for 1 iteration.

27500 iterations

At the time of writing KeyCloak by default uses 27500 iterations. This number is reviewed when faster CPUs/GPUs land on market. KeyCloak team must increase password hashing iterations to prevent quick cracking of leaked hashes.

For standard password I got (7.249 + 7.322) / 2 = 7.2855 and for 666 chars long password (7.317 + 7.292) / 2 = 7.3045 hashes per second. This difference is within margin of error, so I’m going to attribute it to solar flares messing with my CPU while the test was running.

This benchmark showed that the higher number of iterations, the smaller difference in hashing throughput, so let’s check one more thing. How number of iterations impacts throughput for the same password.

Comparison by password length

This time I will be testing the same password, the same salt, but I want to see how number of hashing iterations impacts throughput:

@Param({"password"})
public String password;

@Param({"Hy9X2DNQNOVYOictvV0wrA=="})
public String salt;

@Param({"1", "10", "1000", "10000", "15000", "20000", "27500", "50000", "100000"})
public int iteration;

I got the following results:

Benchmark	(iteration)	(password)	Mode	Cnt	Score	Error	Units
testPbkdf2Sha1	1	password	thrpt	15	153590.544	1631.720	ops/s
testPbkdf2Sha1	10	password	thrpt	15	19548.587	156.750	ops/s
testPbkdf2Sha1	1000	password	thrpt	15	203.234	1.155	ops/s
testPbkdf2Sha1	10000	password	thrpt	15	20.411	0.138	ops/s
testPbkdf2Sha1	15000	password	thrpt	15	13.597	0.064	ops/s
testPbkdf2Sha1	20000	password	thrpt	15	10.150	0.055	ops/s
testPbkdf2Sha1	27500	password	thrpt	15	7.399	0.068	ops/s
testPbkdf2Sha1	50000	password	thrpt	15	4.069	0.020	ops/s
testPbkdf2Sha1	100000	password	thrpt	15	2.042	0.009	ops/s

The chart above shows number of hashing iterations vs throughput on a logarithmic scale.

1 iteration is 76795x faster than 100000 iterations (2.042 passwords hashed per second).
1 iteration is 20758x faster than 27500 iterations.
1 iteration is 7.857x faster than 10 iterations.

Password hashing per Garbage Collector

I read a really cool article about updates to ZGC and to Shenandoah in JDK17, so just out of curiosity decided to benchmark how GC will impact password hashing, results are quite interesting.

First, I increased number of forked JVM to perform this test:

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 15, time = 1, timeUnit = TimeUnit.SECONDS)

then, benchmark method has another @Fork annotation where I specified which GC to use in that fork:

@Fork(jvmArgsAppend = "-XX:+UseShenandoahGC")

and now, since we’re comparing GCs, I think it’s useful to have wider view on their performance and compare all 4 shipped in OpenJDK: G1, ZGC, Parallel and ShenandoahGC. I ended up with 4 methods annotated with different GC enabler. I run all 4 methods on my laptop and server with Ryzen 5 3600. The results turned out to be portable between computers:

Intel i5:

Benchmark	(iterations)	(password)	(salt)	Mode	Cnt	Score	Error	Units
testPbkdf2Sha1G1	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	7.426	0.018	ops/s
testPbkdf2Sha1ParallelGC	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	7.527	0.023	ops/s
testPbkdf2Sha1ShenandoahGC	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	6.938	0.025	ops/s
testPbkdf2Sha1ZGC	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	7.374	0.062	ops/s

Ryzen 5 3600:

Benchmark	(iterations)	(password)	(salt)	Mode	Cnt	Score	Error	Units
testPbkdf2Sha1G1	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	44.726	0.476	ops/s
testPbkdf2Sha1ParallelGC	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	48.621	0.546	ops/s
testPbkdf2Sha1ShenandoahGC	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	41.885	0.501	ops/s
testPbkdf2Sha1ZGC	27500	p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN	salt	thrpt	45	44.708	0.762	ops/s

This is it. This is the discovery, ShenandoahGC managed to be the slowest in this test by 9% than baseline G1. The lesson for today: by playing with GCs you can make things faster, you can make things slower. Always test your scenarios.

Source code for these tests is available here

Benchmarks run on my laptop:

lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
    CPU family:          6
    Model:               78
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            3
    CPU max MHz:         2800.0000
    CPU min MHz:         400.0000

java -version
openjdk version "17" 2021-09-14
OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)