Random Notes by agilob

JavaKeycloak password hashing

This post doesn’t contain full context of the works performed, only benchmarking part

I had to test how number of iterations impacts login request time to KeyCloak and if or how we can improve it. After investigating a few other options I decided to check what’s the difference for password hashing times using default hashing mechanism in KeyCloak. I found and extracted parts of password hashing mechanism from KeyCloak to my repo, developed small parametrised JMH benchmark, comparing:

Hashing iterations:

  • 1
  • 1000
  • 15000
  • 27500 (default in KeyCloak at the time of writing)

Password complexity:

  • easy: password
  • hard: 666 random characters -> p([n:7_FSr0Su:`3g_"1:\FC8z-Q4qEm^gOwTd^HVZS[5P9$08},IZ1>}&=qLQ}#O_PzQd]aPe_JL~by4>#?XM?bTn]kPo?GOIy4C2Tv20I<]O|_"z:b(]sBnVO@D]>7wkNH)%[|s3.vzNWayO~\\xD;O.bNA6$5]*iT9y0<ax$\"v'e~2|o,{'G0TML-#4ZV)3\"cGHxtdmK6/RtP>+phPN"JjIybFbBP&.pusulTN|iXh:ynpW$qpO=m:|Le'^"TSXsUTg:D!Z->xf/.;.|24$Z5x3OCmmu{Jt-}+<p#{)S<\"\\7e-+hi4*&62;q+H4xj<)?v+::"E@^#s{+~e9XG/Nm?7+unDY$CoH(X*&qP’NE02K-{y2}6{/8?\xKYiFK(CUn`c9bvy<bQXr&&!R,‘XBybe2i"9NY;$U’rAl15h8}h7,m’yvr8o{FMIFgYqj6{[V+W#\\1={:I_%TJ!Ooy67:1OZ60Uyr10#A'3>yU!]dDBUXz$+_[r'Lm}spFfs&(2R12]e-Z*&^b]{kEY&iQtX0_{O"t-JiXBZ{zjDFQA@Da*^76$2Mgf4d*[CiTqm8"oH;TdZA.}&qYr(a7a0;:"Ist]ES|v(HXq8Yn-s8V7\2qhgR6*|O)2}![#Lc@FF5%3(#TC=p(6va4NN

Salt:

  • two standard complexity salts

As with password hashing we mostly care about throughput of login requests

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 1)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
Benchmark (iteration) (password) (salt) Mode Cnt Score Error Units
testPbkdf2Sha1 1 password Hy9X2DNQNOVYOictvV0wrA== thrpt 15 151618.151 2698.097 ops/s
testPbkdf2Sha1 1 password rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 152955.359 2803.871 ops/s
testPbkdf2Sha1 1 p([n:7_Fsr0Su: Hy9X2DNQNOVYOictvV0wrA== thrpt 15 90630.841 1112.78 ops/s
testPbkdf2Sha1 1 p([n:7_Fsr0Su: rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 90028.67 1253.793 ops/s
testPbkdf2Sha1 1000 password Hy9X2DNQNOVYOictvV0wrA== thrpt 15 201.355 1.239 ops/s
testPbkdf2Sha1 1000 password rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 201.209 2.37 ops/s
testPbkdf2Sha1 1000 p([n:7_Fsr0Su: Hy9X2DNQNOVYOictvV0wrA== thrpt 15 200.38 3.181 ops/s
testPbkdf2Sha1 1000 p([n:7_Fsr0Su: rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 200.71 1.033 ops/s
testPbkdf2Sha1 15000 password Hy9X2DNQNOVYOictvV0wrA== thrpt 15 13.442 0.093 ops/s
testPbkdf2Sha1 15000 password rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 13.472 0.075 ops/s
testPbkdf2Sha1 15000 p([n:7_Fsr0Su: Hy9X2DNQNOVYOictvV0wrA== thrpt 15 13.466 0.123 ops/s
testPbkdf2Sha1 15000 p([n:7_Fsr0Su: rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 13.462 0.086 ops/s
testPbkdf2Sha1 27500 password Hy9X2DNQNOVYOictvV0wrA== thrpt 15 7.249 0.192 ops/s
testPbkdf2Sha1 27500 password rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 7.322 0.077 ops/s
testPbkdf2Sha1 27500 p([n:7_Fsr0Su: Hy9X2DNQNOVYOictvV0wrA== thrpt 15 7.317 0.044 ops/s
testPbkdf2Sha1 27500 p([n:7_Fsr0Su: rqdzpZGc2XqFgjMeCRVZLg== thrpt 15 7.292 0.037 ops/s

The benchmarks results show in the first two rows that different salt very insignificantly changes hashing throughput.

Comparison by password length

1 iteration

For the first password password we got average (151618.151 + 152955.359) / 2 = 152286.755 hashing throughput and for the second password consisting of 666 characters the throughput on average was (90630.841 + 90028.67) / 290329.7555. 95 times longer password decreased hashing throughput by 0.52x. Not such significant number, considering most people don’t use password longer than several characters. Even with password managers like KeePassXC I find such password very impractical. It would only(?) double password cracking time for 1 iteration.

27500 iterations

At the time of writing KeyCloak by default uses 27500 iterations. This number is reviewed when faster CPUs/GPUs land on market. KeyCloak team must increase password hashing iterations to prevent quick cracking of leaked hashes.

For standard password I got (7.249 + 7.322) / 2 = 7.2855 and for 666 chars long password (7.317 + 7.292) / 2 = 7.3045 hashes per second. This difference is within margin of error, so I’m going to attribute it to solar flares messing with my CPU while the test was running.

This benchmark showed that the higher number of iterations, the smaller difference in hashing throughput, so let’s check one more thing. How number of iterations impacts throughput for the same password.

Comparison by password length

This time I will be testing the same password, the same salt, but I want to see how number of hashing iterations impacts throughput:

@Param({"password"})
public String password;

@Param({"Hy9X2DNQNOVYOictvV0wrA=="})
public String salt;

@Param({"1", "10", "1000", "10000", "15000", "20000", "27500", "50000", "100000"})
public int iteration;

I got the following results:

Benchmark (iteration) (password) Mode Cnt Score Error Units
testPbkdf2Sha1 1 password thrpt 15 153590.544 1631.720 ops/s
testPbkdf2Sha1 10 password thrpt 15 19548.587 156.750 ops/s
testPbkdf2Sha1 1000 password thrpt 15 203.234 1.155 ops/s
testPbkdf2Sha1 10000 password thrpt 15 20.411 0.138 ops/s
testPbkdf2Sha1 15000 password thrpt 15 13.597 0.064 ops/s
testPbkdf2Sha1 20000 password thrpt 15 10.150 0.055 ops/s
testPbkdf2Sha1 27500 password thrpt 15 7.399 0.068 ops/s
testPbkdf2Sha1 50000 password thrpt 15 4.069 0.020 ops/s
testPbkdf2Sha1 100000 password thrpt 15 2.042 0.009 ops/s

The chart above shows number of hashing iterations vs throughput on a logarithmic scale.

  • 1 iteration is 76795x faster than 100000 iterations (2.042 passwords hashed per second).
  • 1 iteration is 20758x faster than 27500 iterations.
  • 1 iteration is 7.857x faster than 10 iterations.

Password hashing per Garbage Collector

I read a really cool article about updates to ZGC and to Shenandoah in JDK17, so just out of curiosity decided to benchmark how GC will impact password hashing, results are quite interesting.

First, I increased number of forked JVM to perform this test:

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Fork(value = 3)
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 15, time = 1, timeUnit = TimeUnit.SECONDS)

then, benchmark method has another @Fork annotation where I specified which GC to use in that fork:

@Fork(jvmArgsAppend = "-XX:+UseShenandoahGC")

and now, since we’re comparing GCs, I think it’s useful to have wider view on their performance and compare all 4 shipped in OpenJDK: G1, ZGC, Parallel and ShenandoahGC. I ended up with 4 methods annotated with different GC enabler. I run all 4 methods on my laptop and server with Ryzen 5 3600. The results turned out to be portable between computers:

Intel i5:

Benchmark (iterations) (password) (salt) Mode Cnt Score Error Units
testPbkdf2Sha1G1 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 7.426 0.018 ops/s
testPbkdf2Sha1ParallelGC 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 7.527 0.023 ops/s
testPbkdf2Sha1ShenandoahGC 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 6.938 0.025 ops/s
testPbkdf2Sha1ZGC 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 7.374 0.062 ops/s

Ryzen 5 3600:

Benchmark (iterations) (password) (salt) Mode Cnt Score Error Units
testPbkdf2Sha1G1 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 44.726 0.476 ops/s
testPbkdf2Sha1ParallelGC 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 48.621 0.546 ops/s
testPbkdf2Sha1ShenandoahGC 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 41.885 0.501 ops/s
testPbkdf2Sha1ZGC 27500 p([nO)2}![#Lc@FF5%3(#TC=p(6va4NN salt thrpt 45 44.708 0.762 ops/s

This is it. This is the discovery, ShenandoahGC managed to be the slowest in this test by 9% than baseline G1. The lesson for today: by playing with GCs you can make things faster, you can make things slower. Always test your scenarios.

Source code for these tests is avilable here

Benchmarks run on my laptop:

❯ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
    CPU family:          6
    Model:               78
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            3
    CPU max MHz:         2800.0000
    CPU min MHz:         400.0000
❯ java -version
openjdk version "17" 2021-09-14
OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode, sharing)