JDK-8366862 : MemorySegment related performance regression in lusearch in JDK21
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang.foreign
  • Affected Version: 19,21.0.6
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2025-09-04
  • Updated: 2025-09-05
Description
The recently accepted OOPSLA paper, Advancing Performance via a Systematic Application of Research and Industrial Best Practice, suggested that there is a large performance regression using G1 between JDK 11 and 21.

The attached plots used the following configuration:
JVM args: -server -XX:+AlwaysPreTouch -Xms${heap}m -Xmx${heap}m -Xlog:gc*:file=lusearch.jdk${JDK}.${heap}.${i}.gc::filesize=0

And invoking DaCapo with: -jar dacapo-23.11-chopin.jar lusearch -s default -n 25
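The two fragments above can be combined into one command line per run. The following is a minimal sketch of that expansion, not the script used for the attached plots. The class name `LusearchCommand`, the helper `build`, and the sample values for heap size, JDK label, and iteration are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LusearchCommand {
    // Hypothetical helper: expands ${heap}, ${JDK} and ${i} from the template in the report.
    static List<String> build(String javaBinary, int heapMb, String jdk, int iteration) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBinary);
        // JVM args as listed in the report.
        cmd.add("-server");
        cmd.add("-XX:+AlwaysPreTouch");
        cmd.add("-Xms" + heapMb + "m");
        cmd.add("-Xmx" + heapMb + "m");
        cmd.add("-Xlog:gc*:file=lusearch.jdk" + jdk + "." + heapMb + "." + iteration
                + ".gc::filesize=0");
        // DaCapo invocation as listed in the report.
        cmd.add("-jar");
        cmd.add("dacapo-23.11-chopin.jar");
        cmd.add("lusearch");
        cmd.add("-s");
        cmd.add("default");
        cmd.add("-n");
        cmd.add("25");
        return cmd;
    }

    public static void main(String[] args) {
        // Sample values (1024 MB heap, JDK label "21", iteration 1) are assumptions.
        System.out.println(String.join(" ", build("java", 1024, "21", 1)));
    }
}
```

The returned list can be passed to `ProcessBuilder` as is; keeping each argument as a separate element avoids shell quoting problems with the `-Xlog` selector.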

Thomas has investigated further, and it does seem to be an issue with Panama/FFI/MemorySegment performance. The issue is not isolated to G1: it can also be observed with Parallel, depending on the -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments setting.

The problem is that from JDK 19 to JDK 21 Lucene selects a code path that uses the MemorySegment API. (It does not when running with JDK22+ for unknown reasons, so performance is back to previous levels.)

Not sure if this is a problem with the MemorySegment API or the Lucene use of the API. At least I would not expect that large of a performance difference.

The Affected Version field is set based on the Lucene versions showing this performance difference.
Comments
>> (It does not when running with JDK22+ for unknown reasons, so performance is back to previous levels)
>
> I was under the impression that Lucene now always uses FFM under the hood.
My verification was based on -XX:+PrintCompilation: with 19 to 21 a lot of `MemorySegment` appears (say: java -XX:+PrintCompilation ... lusearch | grep "MemorySegment"). In 22+ it does not. However, that detection failure may be an application error, or due to the Lucene version not properly detecting newer JDKs. Or maybe 22+ just inlines everything related. Testing with dacapo-23.11-MR2, jdk22+ lusearch performance is the same as with MemorySegments disabled, and that PrintCompilation test does not show any use of jdk.internal.foreign classes. Just to reiterate, I do not think this is a JDK issue.
> FWIW, we also fixed another performance issue specifically with Lucene for JDK 24: https://bugs.openjdk.org/browse/JDK-8335480
>
> Generally speaking though, the FFM API was still in preview in JDK 21, so performance work was still being finalized.
I totally understand, feel free to close this issue as a non-issue for the JDK.
05-09-2025

I think this is a known issue originally reported by [~shade] in https://github.com/dacapobench/dacapobench/issues/264. There are two separate problems: 1. Lucene gates the use of memory segments to certain JDK versions, and memory segment performance on those JDKs is not as good as the old method used by Lucene, and 2. In 23.11 of DaCapo (this is already fixed in the current 23.11-MR2 release), DaCapo didn't reuse IndexSearcher, amplifying this performance pathology. The ultimate conclusion in that issue is that the "regression" exposes thread-local handshake overheads in HotSpot due to the non-idiomatic use of the Lucene API.
05-09-2025

> (It does not when running with JDK22+ for unknown reasons, so performance is back to previous levels)
I was under the impression that Lucene now always uses FFM under the hood. FWIW, we also fixed another performance issue specifically with Lucene for JDK 24: https://bugs.openjdk.org/browse/JDK-8335480
Generally speaking though, the FFM API was still in preview in JDK 21, so performance work was still being finalized.
04-09-2025

Not sure if this is expected when using the new API, so leaving this for further evaluation. Maybe it's the application's fault to automatically use the much slower API.
04-09-2025

The issue also reproduces with Parallel GC here:
default settings:
parallel gc, jdk17: ===== DaCapo 23.11-chopin lusearch PASSED in 2048 msec =====
parallel gc, jdk21: ===== DaCapo 23.11-chopin lusearch PASSED in 3175 msec =====
disabling memory segments:
parallel gc, jdk21: ===== DaCapo 23.11-chopin lusearch PASSED in 2040 msec =====
This is either a benchmark error, or a problem with performance of Panama/memory segments. When running the benchmark on JDK 22+, it does not use memory segments, so performance is good again too.
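For scale, the three timings quoted in this comment work out to roughly a 1.55x slowdown on jdk21 relative to jdk17, and about parity once memory segments are disabled. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and each timing is a single reported number rather than an average.

```java
import java.util.Locale;

public class LusearchSlowdown {
    // Ratio of a measured time to a baseline time; values above 1.0 mean slower.
    static double ratio(long measuredMsec, long baselineMsec) {
        return (double) measuredMsec / baselineMsec;
    }

    public static void main(String[] args) {
        // Timings copied from the comment above (Parallel GC, DaCapo 23.11-chopin lusearch).
        long jdk17 = 2048;
        long jdk21 = 3175;
        long jdk21NoSegments = 2040;

        System.out.println(String.format(Locale.ROOT,
                "jdk21 vs jdk17: %.2fx", ratio(jdk21, jdk17)));
        System.out.println(String.format(Locale.ROOT,
                "jdk21 (memory segments disabled) vs jdk17: %.2fx",
                ratio(jdk21NoSegments, jdk17)));
    }
}
```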
04-09-2025

The problem is that with jdk19+ lusearch uses the new memory segments API by default (https://openjdk.org/jeps/424). If you disable the memory segment API use in lusearch (https://lucene.apache.org/core/9_12_0/core/org/apache/lucene/store/MMapDirectory.html#ENABLE_MEMORY_SEGMENTS_SYSPROP) using -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false, performance is back to normal on 21.
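A switch like this is typically read once as a boolean system property. The sketch below illustrates the general pattern only and is not Lucene's actual implementation. The class `MemorySegmentToggle` and its method are hypothetical, and the default of "enabled when unset" is an assumption based on the statement above that memory segments are used by default.

```java
public class MemorySegmentToggle {
    // Property name as given in the comment above.
    static final String SYSPROP =
            "org.apache.lucene.store.MMapDirectory.enableMemorySegments";

    // Hypothetical gate: an unset property counts as enabled (assumed default).
    static boolean memorySegmentsEnabled() {
        return Boolean.parseBoolean(System.getProperty(SYSPROP, "true"));
    }

    public static void main(String[] args) {
        // Run with -Dorg.apache.lucene.store.MMapDirectory.enableMemorySegments=false
        // to see the gate report "false".
        System.out.println("memory segments enabled: " + memorySegmentsEnabled());
    }
}
```

Because `-D` options are JVM arguments, the flag has to appear before `-jar` on the command line for the property to be visible to the application.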
04-09-2025