Keycloak password hashing

Page content

This post doesn’t contain full context of the works performed, only benchmarking part

I had to test how number of iterations impacts login request time to KeyCloak and if or how we can improve it. After investigating a few other options I decided to check what’s the difference for password hashing times using default hashing mechanism in KeyCloak. I found and extracted parts of password hashing mechanism from KeyCloak to my repo, developed small parametrised JMH benchmark, comparing:

Hashing iterations:

  • 1
  • 1000
  • 15000
  • 27500 (default in KeyCloak at the time of writing)

Password complexity:

  • easy: password
  • hard: 666 random characters -> p([n:7_FSr0Su:`3g_"1:\FC8z-Q4qEm^gOwTd^HVZS[5P9$08},IZ1>}&=qLQ}#O_PzQd]aPe_JL~by4>#?XM?bTn]kPo?GOIy4C2Tv20I<]O|_"z:b(]sBnVO@D]>7wkNH)%[|s3.vzNWayO~\\xD;O.bNA6$5]*iT9y0<ax$\"v'e~2|o,{'G0TML-#4ZV)3\"cGHxtdmK6/RtP>+phPN"JjIybFbBP&.pusulTN|iXh:ynpW$qpO=m:|Le’^"TSXsUTg:D!Z->xf/.;.|24$Z5x3OCmmu{Jt-}+<p#{)S<\"\\7e-+hi4*&62;q+H4xj<)?v+::"E@^#s{+~e9XG/Nm?7+unDY$CoH(X*&qP’NE02K-{y2}6{/8?\xKYiFK(CUn`c9bvy<bQXr&&!R,‘XBybe2i"9NY;$U’rAl15h8}h7,m’yvr8o{FMIFgYqj6{[V+W#\\1={:I_%TJ!Ooy67:1OZ60Uyr10#A'3>yU!]dDBUXz$+_[r'Lm}spFfs&(2R12]e-Z*&^b]{kEY&iQtX0_{O"t-JiXBZ{zjDFQA@Da*^76$2Mgf4d*[CiTqm8"oH;TdZA.}&qYr(a7a0;:"Ist]ES|v(HXq8Yn-s8V7\2qhgR6*|O)2}![#Lc@FF5%3(#TC=p(6va4NN

Salt:

  • two standard complexity salts

As with password hashing we mostly care about throughput of login requests

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 1)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
Benchmark(iteration)(password)(salt)ModeCntScoreErrorUnits
testPbkdf2Sha11passwordHy9X2DNQNOVYOictvV0wrA==thrpt15151618.1512698.097ops/s
testPbkdf2Sha11passwordrqdzpZGc2XqFgjMeCRVZLg==thrpt15152955.3592803.871ops/s
testPbkdf2Sha11p([n:7_Fsr0Su:Hy9X2DNQNOVYOictvV0wrA==thrpt1590630.8411112.78ops/s
testPbkdf2Sha11p([n:7_Fsr0Su:rqdzpZGc2XqFgjMeCRVZLg==thrpt1590028.671253.793ops/s
testPbkdf2Sha11000passwordHy9X2DNQNOVYOictvV0wrA==thrpt15201.3551.239ops/s
testPbkdf2Sha11000passwordrqdzpZGc2XqFgjMeCRVZLg==thrpt15201.2092.37ops/s
testPbkdf2Sha11000p([n:7_Fsr0Su:Hy9X2DNQNOVYOictvV0wrA==thrpt15200.383.181ops/s
testPbkdf2Sha11000p([n:7_Fsr0Su:rqdzpZGc2XqFgjMeCRVZLg==thrpt15200.711.033ops/s
testPbkdf2Sha115000passwordHy9X2DNQNOVYOictvV0wrA==thrpt1513.4420.093ops/s
testPbkdf2Sha115000passwordrqdzpZGc2XqFgjMeCRVZLg==thrpt1513.4720.075ops/s
testPbkdf2Sha115000p([n:7_Fsr0Su:Hy9X2DNQNOVYOictvV0wrA==thrpt1513.4660.123ops/s
testPbkdf2Sha115000p([n:7_Fsr0Su:rqdzpZGc2XqFgjMeCRVZLg==thrpt1513.4620.086ops/s
testPbkdf2Sha127500passwordHy9X2DNQNOVYOictvV0wrA==thrpt157.2490.192ops/s
testPbkdf2Sha127500passwordrqdzpZGc2XqFgjMeCRVZLg==thrpt157.3220.077ops/s
testPbkdf2Sha127500p([n:7_Fsr0Su:Hy9X2DNQNOVYOictvV0wrA==thrpt157.3170.044ops/s
testPbkdf2Sha127500p([n:7_Fsr0Su:rqdzpZGc2XqFgjMeCRVZLg==thrpt157.2920.037ops/s

The benchmarks results show in the first two rows that different salt very insignificantly changes hashing throughput.

Comparison by password length

1 iteration

For the first password password we got average (151618.151 + 152955.359) / 2 = 152286.755 hashing throughput and for the second password consisting of 666 characters the throughput on average was (90630.841 + 90028.67) / 290329.7555. 95 times longer password decreased hashing throughput by 0.52x. Not such significant number, considering most people don’t use password longer than several characters. Even with password managers like KeePassXC I find such password very impractical. It would only(?) double password cracking time for 1 iteration.

27500 iterations

At the time of writing KeyCloak by default uses 27500 iterations. This number is reviewed when faster CPUs/GPUs land on market. KeyCloak team must increase password hashing iterations to prevent quick cracking of leaked hashes.

For standard password I got (7.249 + 7.322) / 2 = 7.2855 and for 666 chars long password (7.317 + 7.292) / 2 = 7.3045 hashes per second. This difference is within margin of error, so I’m going to attribute it to solar flares messing with my CPU while the test was running.

This benchmark showed that the higher number of iterations, the smaller difference in hashing throughput, so let’s check one more thing. How number of iterations impacts throughput for the same password.

Comparison by password length

This time I will be testing the same password, the same salt, but I want to see how number of hashing iterations impacts throughput:

@Param({"password"})
public String password;

@Param({"Hy9X2DNQNOVYOictvV0wrA=="})
public String salt;

@Param({"1", "10", "1000", "10000", "15000", "20000", "27500", "50000", "100000"})
public int iteration;

I got the following results:

Benchmark(iteration)(password)ModeCntScoreErrorUnits
testPbkdf2Sha11passwordthrpt15153590.5441631.720ops/s
testPbkdf2Sha110passwordthrpt1519548.587156.750ops/s
testPbkdf2Sha11000passwordthrpt15203.2341.155ops/s
testPbkdf2Sha110000passwordthrpt1520.4110.138ops/s
testPbkdf2Sha115000passwordthrpt1513.5970.064ops/s
testPbkdf2Sha120000passwordthrpt1510.1500.055ops/s
testPbkdf2Sha127500passwordthrpt157.3990.068ops/s
testPbkdf2Sha150000passwordthrpt154.0690.020ops/s
testPbkdf2Sha1100000passwordthrpt152.0420.009ops/s

The chart above shows number of hashing iterations vs throughput on a logarithmic scale.

  • 1 iteration is 76795x faster than 100000 iterations (2.042 passwords hashed per second).
  • 1 iteration is 20758x faster than 27500 iterations.
  • 1 iteration is 7.857x faster than 10 iterations.

Password hashing per Garbage Collector

I read a really cool article about updates to ZGC and to Shenandoah in JDK17, so just out of curiosity decided to benchmark how GC will impact password hashing, results are quite interesting.

First, I increased number of forked JVM to perform this test:

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 15, time = 1, timeUnit = TimeUnit.SECONDS)

then, benchmark method has another @Fork annotation where I specified which GC to use in that fork:

@Fork(jvmArgsAppend = "-XX:+UseShenandoahGC")

and now, since we’re comparing GCs, I think it’s useful to have wider view on their performance and compare all 4 shipped in OpenJDK: G1, ZGC, Parallel and ShenandoahGC. I ended up with 4 methods annotated with different GC enabler. I run all 4 methods on my laptop and server with Ryzen 5 3600. The results turned out to be portable between computers:

Intel i5:

Benchmark(iterations)(password)(salt)ModeCntScoreErrorUnits
testPbkdf2Sha1G127500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt457.4260.018ops/s
testPbkdf2Sha1ParallelGC27500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt457.5270.023ops/s
testPbkdf2Sha1ShenandoahGC27500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt456.9380.025ops/s
testPbkdf2Sha1ZGC27500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt457.3740.062ops/s

Ryzen 5 3600:

Benchmark(iterations)(password)(salt)ModeCntScoreErrorUnits
testPbkdf2Sha1G127500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt4544.7260.476ops/s
testPbkdf2Sha1ParallelGC27500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt4548.6210.546ops/s
testPbkdf2Sha1ShenandoahGC27500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt4541.8850.501ops/s
testPbkdf2Sha1ZGC27500p([nO)2}![#Lc@FF5%3(#TC=p(6va4NNsaltthrpt4544.7080.762ops/s

This is it. This is the discovery, ShenandoahGC managed to be the slowest in this test by 9% than baseline G1. The lesson for today: by playing with GCs you can make things faster, you can make things slower. Always test your scenarios.

Source code for these tests is available here

Benchmarks run on my laptop:

lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
    CPU family:          6
    Model:               78
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            3
    CPU max MHz:         2800.0000
    CPU min MHz:         400.0000
java -version
openjdk version "17" 2021-09-14
OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)