JDK-8345323 : Parallel GC does not handle UseLargePages and UseNUMA gracefully
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 24
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • Submitted: 2024-12-02
  • Updated: 2024-12-25
  • Resolved: 2024-12-13
The Version table provides details related to the release that this issue/RFE will be addressed.

  • JDK 24: 24 (Fixed)
  • JDK 25: 25 b03 (Fixed)
Related Reports
Relates :  
Relates :  
Description
Parallel GC does not handle the combination of -XX:+UseLargePages and -XX:+UseNUMA gracefully. A minimal example that triggers the bug:

$ jdk-24/bin/java -server -XX:+UseParallelGC -XX:+UseLargePages -XX:+UseNUMA -Xmx200m -version

If -Xmx is set larger than the memory backed by /proc/sys/vm/nr_hugepages, we get the expected message:

Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve and commit memory using large pages. req_addr: 0x0000000083000000 bytes: 2097152000

But if -Xmx is set smaller than the memory backed by /proc/sys/vm/nr_hugepages, we get:

$ jdk-24/bin/java -server -XX:+UseParallelGC -XX:+UseLargePages -XX:+UseNUMA -Xmx200m -version
Java HotSpot(TM) 64-Bit Server VM warning: UseNUMA is not fully compatible with +UseLargePages, disabling adaptive resizing (-XX:-UseAdaptiveSizePolicy -XX:-UseAdaptiveNUMAChunkSizing)
mbind: Invalid argument
mbind: Invalid argument
mbind: Invalid argument

This error only happens with Parallel; I have also tested Serial, G1 and ZGC, and they are not affected.
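
Which of the two outputs appears depends on how much memory the persistent huge page pool can back: roughly the value in /proc/sys/vm/nr_hugepages multiplied by the Hugepagesize reported in /proc/meminfo. The following is a minimal Python sketch of that comparison, assuming a single default huge page size. The helper names and the sample pool of 512 pages are hypothetical, and this is not how HotSpot itself makes the decision.

```python
import re


def hugepage_size_bytes(meminfo_text: str) -> int:
    """Parse the default huge page size from /proc/meminfo contents."""
    # /proc/meminfo reports it as e.g. "Hugepagesize:       2048 kB"
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("Hugepagesize not found")
    return int(match.group(1)) * 1024


def hugepage_pool_bytes(nr_hugepages: int, page_size: int) -> int:
    """Total memory backed by the persistent huge page pool."""
    return nr_hugepages * page_size


def heap_fits_in_pool(heap_bytes: int, nr_hugepages: int, page_size: int) -> bool:
    """True if a heap of heap_bytes could be committed entirely from the pool."""
    return heap_bytes <= hugepage_pool_bytes(nr_hugepages, page_size)


def read_system_values():
    """Read the live values on a Linux host (not used by the demo below)."""
    with open("/proc/sys/vm/nr_hugepages") as f:
        nr = int(f.read().strip())
    with open("/proc/meminfo") as f:
        size = hugepage_size_bytes(f.read())
    return nr, size


if __name__ == "__main__":
    # Hypothetical sample: 512 huge pages of 2 MiB each, i.e. a 1 GiB pool.
    sample = "HugePages_Total:     512\nHugepagesize:       2048 kB\n"
    size = hugepage_size_bytes(sample)
    print(size)
    print(heap_fits_in_pool(200 * 1024**2, 512, size))  # -Xmx200m
    print(heap_fits_in_pool(2097152000, 512, size))     # bytes from the warning
```

With a 1 GiB pool, the 200 MiB heap fits (the case that ends in the mbind errors), while the 2097152000-byte reservation from the warning does not.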
Comments
Changeset: a9a5f7cb Branch: master Author: Albert Mingkun Yang <ayang@openjdk.org> Date: 2024-12-13 11:43:32 +0000 URL: https://git.openjdk.org/jdk/commit/a9a5f7cb0a75b82d613ecd9018e13e5337e90363
13-12-2024

A pull request was submitted for review. Branch: jdk24 URL: https://git.openjdk.org/jdk/pull/22733 Date: 2024-12-13 12:03:14 +0000
13-12-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/22575 Date: 2024-12-05 12:04:20 +0000
05-12-2024

Actually, the workaround is to set -XX:MinHeapSize=8m or larger. Setting a larger heap (-Xmx) does not affect MinHeapSize, and MinHeapSize is the size used to determine the alignments.
04-12-2024

After a closer look, I believe the issue is not entirely specific to NUMA. The root cause appears to be that we select the wrong page size in `ParallelArguments::initialize_heap_flags_and_sizes`. Before JDK-8333962, the default `MinHeapSize` was 8M, so the calculated page size was 2M. However, after JDK-8333962, the default `MinHeapSize` is 2M (since the default `OldSize` is 0), resulting in a calculated page size of 4K, which is smaller than 2M. Before JDK-8333962, this error could still be triggered by explicitly setting a smaller `OldSize`, such as `OldSize=1m`. This would lead to a smaller `MinHeapSize`, which in turn would result in an incorrect page size. It seems that the NUMA API is the only component we use that complains loudly about the incorrect page size, which is why this issue becomes noticeable only when `-XX:+UseNUMA` is enabled.
04-12-2024
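
The page size selection described in the comment above can be modelled in a few lines. This is a simplified Python sketch, not HotSpot code. It assumes that Parallel requires at least four pages within MinHeapSize (one each for eden, the two survivors and old; treat this constant as an assumption) and that only 4K and 2M pages are available. It picks the largest page size that fits min_pages times into the region and evenly divides it, falling back to the smallest page size otherwise.

```python
K = 1024
M = 1024 * K

# Assumed set of available page sizes on a typical x86_64 Linux host.
PAGE_SIZES = (2 * M, 4 * K)

# Assumption: the minimum heap must hold at least this many pages
# (eden, two survivors, old). The real constant lives in HotSpot.
MIN_PAGES = 4


def page_size_for_region_aligned(region_size, min_pages, page_sizes=PAGE_SIZES):
    """Largest page size p with p * min_pages <= region_size and region_size % p == 0."""
    max_page_size = region_size // min_pages
    for page_size in sorted(page_sizes, reverse=True):
        if page_size <= max_page_size and region_size % page_size == 0:
            return page_size
    # No candidate fits: fall back to the smallest (base) page size.
    return min(page_sizes)


if __name__ == "__main__":
    # Old default MinHeapSize (8M) versus the new default (2M).
    for min_heap in (8 * M, 2 * M):
        size = page_size_for_region_aligned(min_heap, MIN_PAGES)
        print(f"MinHeapSize={min_heap // M}M -> page size {size // K}K")
```

Under these assumptions an 8M MinHeapSize yields 2M pages while a 2M MinHeapSize falls back to 4K, which matches the analysis and is consistent with -XX:MinHeapSize=8m working around the problem.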

Not saying that JDK-8333962 is wrong, but it removed the default OldSize of 4M, which in turn leads to a smaller MinHeapSize. I suppose the correct fix is to harden Parallel to better align and size the different parts when NUMA and LargePages are in play. A temporary fix could be to change the default value of OldSize back to ScaleForWordSize(4*M), because changing the sizing code usually comes with a bug tail, and I am not sure we want to do that right now.
03-12-2024

This is most likely the cause: https://bugs.openjdk.org/browse/JDK-8333962
03-12-2024

Thanks, I can see that we now calculate a smaller minimum heap size than in JDK 23, and that leads to the alignment not being updated correctly. I think the mbind failure occurs because the addresses of the spaces then become "not large page aligned".
03-12-2024
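
The hypothesis in the comment above can be illustrated by checking where space boundaries land when sizes are only rounded to a 4K alignment instead of 2M. This hypothetical Python sketch lays out consecutive spaces from a base address and lists the boundaries that do not start on a 2M large page. The base address and the eden/survivor sizes are invented for illustration, and the sketch does not model how Parallel or the NUMA code actually splits the heap.

```python
K = 1024
M = 1024 * K
LARGE_PAGE = 2 * M  # assumed large page size


def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def space_boundaries(base, sizes, alignment):
    """Start addresses of consecutive spaces, each size rounded up to alignment."""
    starts = []
    addr = base
    for size in sizes:
        starts.append(addr)
        addr += align_up(size, alignment)
    starts.append(addr)  # end of the last space
    return starts


def misaligned(addresses, page_size=LARGE_PAGE):
    """Addresses that do not start on a large page boundary."""
    return [a for a in addresses if a % page_size != 0]


if __name__ == "__main__":
    base = 0x83000000                     # hypothetical heap base (2M aligned)
    sizes = [1536 * K, 256 * K, 256 * K]  # hypothetical eden, from, to sizes
    for alignment in (4 * K, 2 * M):
        bad = misaligned(space_boundaries(base, sizes, alignment))
        print(f"alignment {alignment // K}K: {[hex(a) for a in bad]}")
```

With 4K alignment two of the boundaries fall inside a 2M page; with 2M alignment none do. A range that starts or ends mid-way through a large page is the kind of argument the comment suspects mbind rejects with "Invalid argument".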

Can confirm your observation. My tests show that it was introduced in jdk-24+3.
03-12-2024

This looks to be a regression in JDK 24; at least in my quick local attempts, I did not manage to provoke it with JDK 23.
03-12-2024