JDK-8365493 : Regression on Pet Clinic app with Compact Object Headers
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 26
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: linux
  • CPU: x86_64
  • Submitted: 2025-08-13
  • Updated: 2025-08-24
Description
Observing a regression in requests / second on the Spring Pet Clinic app when load testing it with oha (an HTTP load generator).
The regression ranges from 8% - 10% when running oha remotely from the Pet Clinic app (oha on one machine, the Pet Clinic app on a remote machine, i.e. through the full network stack). The regression is much larger (~30%) when running both oha and Pet Clinic on the same machine but isolating each in its own set of processor ids using numactl.

Steps to reproduce:
1.) Grab a copy of Spring Pet Clinic (https://github.com/spring-projects/spring-petclinic) and follow the instructions there to build it. Instructions on how to launch Pet Clinic can also be found at that URL.
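A typical fetch-and-build sequence, assuming the Maven wrapper that ships with the repository (not spelled out in the original report):
$ git clone https://github.com/spring-projects/spring-petclinic.git
$ cd spring-petclinic
$ ./mvnw -DskipTests package        # produces target/spring-petclinic-*-SNAPSHOT.jar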

2.) Grab a copy of oha, an HTTP load generator (https://sourceforge.net/projects/oha.mirror/).

3.) Start Pet Clinic. The following command line was used to test with -XX:+UseCompactObjectHeaders on a 16-core AMD Rome machine running Linux:
$ JAVA=$HOME/jdks/jdk-26-b10/bin/java
$ numactl --physcpubind=8-15,24-31 ${JAVA} -Xmx16g -Xms16g -Xmn12g -XX:MetaspaceSize=128m -XX:ReservedCodeCacheSize=256m -XX:+UseParallelGC '-Xlog:gc*:file=/tmp/coh-parallel-petclinic-gc.log' -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:+UseCompactObjectHeaders -jar ./target/spring-petclinic-3.5.0-SNAPSHOT.jar
* Note: you may have to adjust the range of processor ids for the machine you are running on.
For a baseline test without compact object headers, change -XX:+UseCompactObjectHeaders to -XX:-UseCompactObjectHeaders
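Not part of the original instructions, but a quick way to confirm which mode a running instance is actually in is to query the live JVM with jcmd (assuming a single Pet Clinic process on the box):
$ jcmd $(pgrep -f spring-petclinic) VM.flags | tr ' ' '\n' | grep -E 'UseCompactObjectHeaders|UseObjectMonitorTable'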

4.) Start the oha HTTP load generator with the following command line:
$ OHA=<path to the oha http load generator>
$ numactl --physcpubind=1-7,17-23 ${OHA} -n 500000 --no-tui http://localhost:8080/vets.html
* Note: you may need to adjust the range of processor ids for the machine you are running on.

The oha http load generator will report statistics when it finishes. In its “Summary” section the last line is “Requests/sec:”

Averaging 5 runs with -UseCompactObjectHeaders, the system under test does 1086 requests / second.
Averaging 5 runs with +UseCompactObjectHeaders, the system under test does 688 requests / second.
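The averages above were taken by hand; a small loop along these lines (not part of the original report, and assuming oha prints the value on the same line as the "Requests/sec:" label) can automate extracting and averaging the number across runs:
$ for i in 1 2 3 4 5; do
    numactl --physcpubind=1-7,17-23 ${OHA} -n 500000 --no-tui http://localhost:8080/vets.html |
      grep 'Requests/sec:' | awk '{print $2}'
  done | awk '{ sum += $1 } END { print "average r/s:", sum / NR }'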
Comments
This is a perf profile of a regression run:

  8.05%  http-nio-8080-e  libjvm.so          [.] ObjectMonitor::try_spin
  6.08%  http-nio-8080-e  libjvm.so          [.] ObjectMonitor::try_lock
  5.59%  http-nio-8080-e  libjvm.so          [.] LightweightSynchronizer::get_or_insert_monitor_from_table
  3.27%  http-nio-8080-e  libjvm.so          [.] LightweightSynchronizer::enter
  2.12%  http-nio-8080-e  libjvm.so          [.] LightweightSynchronizer::inflate_and_enter
  2.01%  http-nio-8080-e  libjvm.so          [.] SharedRuntime::complete_monitor_locking_C
  1.15%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7be54bd3
  1.12%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7bf92c85
  1.05%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7b6d58ca
  1.04%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7bb70e86
  0.80%  http-nio-8080-e  libjvm.so          [.] LightweightSynchronizer::fast_lock_spin_enter
  0.76%  http-nio-8080-e  libjvm.so          [.] AccessInternal::PostRuntimeDispatch<CardTableBarrierSet::AccessBarrier<594020ul, CardTableBarrierSet>, (AccessInterna
  0.53%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7be54f09
  0.53%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7b656b32
  0.50%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7be54338
  0.50%  http-nio-8080-e  [JIT] tid 92363    [.] 0x00007fdd7ba53e9b
  0.50%  http-nio-8080-e  libjvm.so          [.] ObjectMonitor::spin_enter
  0.50%  http-nio-8080-e  libjvm.so          [.] ObjectSynchronizer::FastHashCode
  0.45%  http-nio-8080-e  libjvm.so          [.] LightweightSynchronizer::get_or_insert_monitor

and for comparison a profile from the exact same run on an unaffected x86 machine:

 22.40%  http-nio-8080-e  libjvm.so          [.] ObjectMonitor::try_lock
 16.19%  http-nio-8080-e  libjvm.so          [.] ObjectMonitor::try_spin
  3.12%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567ead5c8
  3.00%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567ead265
  2.81%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567b18a85
  2.79%  http-nio-8080-e  libjvm.so          [.] SpinPause
  2.76%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567b18ddd
  2.53%  http-nio-8080-e  libjvm.so          [.] ObjectMonitor::spin_enter
  2.12%  http-nio-8080-e  libjvm.so          [.] LightweightSynchronizer::inflate_and_enter
  2.06%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f25676df9ca
  1.73%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567b96004
  1.33%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567b1ffe0
  1.00%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567ead5d0
  0.98%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567b18d5f
  0.87%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567b18de5
  0.82%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567ead5b8
  0.59%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f25676576d5
  0.50%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567b18dcd
  0.43%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f25676528f1
  0.39%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f25676573df
  0.31%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f256765754c
  0.29%  http-nio-8080-e  libjvm.so          [.] SharedRuntime::complete_monitor_locking_C
  0.28%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f25676577ac
  0.28%  http-nio-8080-e  [JIT] tid 2410295  [.] 0x00007f2567657632
  0.25%  http-nio-8080-e  libjvm.so          [.] ObjectMonitor::enter_with_contention_mark

The command line in both scenarios was:
${JAVA} -Xmx16g -Xms16g -Xmn12g -XX:MetaspaceSize=128m -XX:ReservedCodeCacheSize=256m -XX:+UseParallelGC '-Xlog:gc*:file=/tmp/coh-parallel-petclinic-gc.log' -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:-UseCompactObjectHeaders -XX:+UnlockDiagnosticVMOptions -XX:+UseObjectMonitorTable -jar ./target/spring-petclinic-3.5.0-SNAPSHOT.jar

What sticks out is that complete_monitor_locking_C is quite a bit less heavy in the unaffected run, and that the OMT code (LightweightSynchronizer::get_or_insert_monitor_from_table) is absent from that profile altogether, even though that run also uses OMT.
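The comment does not say how the profiles were collected; a typical way to gather a comparable profile with Linux perf while the load test is running would be (pid lookup and sampling window are illustrative):
$ jcmd $(pgrep -f spring-petclinic) Compiler.perfmap      # optional: writes /tmp/perf-<pid>.map so [JIT] frames resolve to method names
$ perf record -F 99 -g -p $(pgrep -f spring-petclinic) -- sleep 60
$ perf report --stdio | head -60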
21-08-2025

I've been able to reproduce the regression on an older AMD processor (AWS instance type m5a).
Baseline: 1396 r/s
+COH: 1064 r/s
That looks like a ~30% regression. In fact, I have been able to reproduce this without compact headers by enabling only the object monitor table (-XX:+UseObjectMonitorTable). This mirrors our earlier finding that OMT performs somewhat badly on older AMD processors, and we currently don't have a good explanation for it. The regression doesn't happen on Intel, ARM and newer AMD processors. Maybe [~coleenp] or [~aboldtch] have more ideas? Axel implemented it, and Coleen knows a lot about it too and has experimented with its performance as well.
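For completeness, an OMT-only run of the kind described here is just the baseline command line from the description with the object monitor table enabled explicitly (flags reconstructed from the command lines quoted elsewhere in this report):
$ numactl --physcpubind=8-15,24-31 ${JAVA} -Xmx16g -Xms16g -Xmn12g -XX:MetaspaceSize=128m -XX:ReservedCodeCacheSize=256m -XX:+UseParallelGC -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:-UseCompactObjectHeaders -XX:+UnlockDiagnosticVMOptions -XX:+UseObjectMonitorTable -jar ./target/spring-petclinic-3.5.0-SNAPSHOT.jar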
19-08-2025

Today I tried to reproduce the issue on a Xeon and a Graviton machine; in both cases I could not observe any regression. For example, averaging over 5 runs on the Xeon, I got 724 r/s without COH and 720 r/s with COH. It seems likely that the underlying issue is similar to JDK-8339114. When experimenting with that, we could only see the regression on (oldish) AMD processors, not on Intel and not on ARM. And it was caused by the object monitor table that compact object headers also enable (i.e. it is not directly caused by the smaller headers, but by the new implementation of heavyweight object locking). Tomorrow I will try to get my hands on a machine with an AMD Rome or similar-generation processor and see if I can reproduce it there.
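Not from the report, but one quick way to confirm on a given build that +UseCompactObjectHeaders also turns on the object monitor table is to compare the final flag values:
$ ${JAVA} -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep UseObjectMonitorTable
$ ${JAVA} -XX:+UseCompactObjectHeaders -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep UseObjectMonitorTable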
18-08-2025

Attached a screenshot of gprofng profiles of a baseline run and a +COH run, which suggests the regression is related to locking / monitors with +COH.
13-08-2025