JDK-8165153 : Crash in rebuild_cpu_to_node_map
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2016-08-31
  • Updated: 2018-02-08
  • Resolved: 2016-09-07
Fix versions: 8u121 (Fixed), 9 b137 (Fixed)
Running the following command line causes a crash in os::Linux::rebuild_cpu_to_node_map:
 numactl -m 0 -c 0 /localhome/java/jdk9-107/bin/java -XX:+UseNUMA -version

The crash is reproducible starting from JDK 9 build 107.

Relevant pieces of the code:
  size_t cpu_num = os::active_processor_count();
  cpu_to_node()->at_grow(cpu_num - 1);
          for (size_t k = 0; k < BitsPerCLong; k++) {
            if (cpu_map[j] & (1UL << k)) {
              cpu_to_node()->at_put(j * BitsPerCLong + k, i);

On my machine with 32 hardware threads (2 sockets x 8 cores x 2 hyperthreads), cpu_num becomes 16 when numactl -c 0 is used. The array therefore holds 16 entries (indices 0-15), and we write outside of it when k reaches 16.
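The overflow described above can be sketched outside the VM. This is a minimal stand-alone illustration, not HotSpot code: out_of_range_cpus and kBitsPerCLong are hypothetical names, the bitmask walk mirrors the excerpt, and the mask value 0x00ff00ff used in the note below is an assumed layout (node 0 owning CPUs 0-7 and 16-23), not a value taken from the reporter's machine.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Bits in one word of the per-node CPU mask (stand-in for BitsPerCLong).
static const std::size_t kBitsPerCLong = sizeof(unsigned long) * CHAR_BIT;

// Hypothetical helper: walks the mask the same way the excerpt does and
// returns every CPU id that would be written past a table of table_len entries.
std::vector<std::size_t> out_of_range_cpus(const std::vector<unsigned long>& cpu_map,
                                           std::size_t table_len) {
  std::vector<std::size_t> bad;
  for (std::size_t j = 0; j < cpu_map.size(); j++) {
    for (std::size_t k = 0; k < kBitsPerCLong; k++) {
      if (cpu_map[j] & (1UL << k)) {
        std::size_t cpu_id = j * kBitsPerCLong + k;
        if (cpu_id >= table_len) {
          bad.push_back(cpu_id);  // at_put() would be called with this index
        }
      }
    }
  }
  return bad;
}
```

With the assumed mask 0x00ff00ff and a 16-entry table, the first id returned is 16, which matches the failing index described above. With the values from the desktop experiment reported in this issue (cpu_map[0] == 0xff, cpu_num == 5), the helper returns 5, 6 and 7.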

os::active_processor_count() was changed in build 107 with the changeset for:
 JDK-6515172: Runtime.availableProcessors() ignores Linux taskset command

JDK-8147905 verified all uses of processor_count() to see if they should be active_processor_count(), but not the reverse.

Previously os::active_processor_count() reported all the processors online in the hardware; now it reports the number actually available to the VM. We can easily restore the previous behaviour by using processor_count() instead of active_processor_count().

Looking more closely at the code:
  // rebuild_cpu_to_node_map() constructs a table mapping cpud id to node id.
  // The table is later used in get_node_by_cpu().
  void os::Linux::rebuild_cpu_to_node_map() {

it seems evident that this code needs to deal with the physical number of CPUs, not the available number. That is something I should have detected while checking the usage of active_processor_count(). I am very unclear as to what all this NUMA code actually does.
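The difference between the two counts can be observed with plain Linux calls. The sketch below is an approximation under stated assumptions: configured_cpu_count and available_cpu_count are hypothetical names, and mapping them to processor_count() and active_processor_count() respectively is an assumption about how the VM derives those values, not something this report confirms.

```cpp
#include <sched.h>
#include <unistd.h>

// CPUs configured in the system, independent of this process's affinity mask.
// Assumed to correspond roughly to processor_count().
long configured_cpu_count() {
  return sysconf(_SC_NPROCESSORS_CONF);
}

// CPUs this process is allowed to run on, as restricted by taskset or numactl.
// Assumed to correspond roughly to active_processor_count() after JDK-6515172.
// Returns -1 if the affinity mask cannot be read.
int available_cpu_count() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) {
    return -1;
  }
  return CPU_COUNT(&set);
}
```

Run under numactl with a restricted CPU set, the second function would be expected to return the smaller number while the first stays at the hardware total, which is why a table indexed by CPU id should be sized from the first.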

I did an experiment on my machine, which is just a basic 8 core desktop. I ran using:
 numactl -C 1-5 bin/java -XX:+UseNUMA -version

I got the same assert: cpu_num == 5, but cpu_map[0] == 0xff. This is a bit different from what I expected. I didn't think using 1-5 would cause problems, because I thought the map would only include the CPUs allocated to the process, but it looks like it includes all CPUs.

From slowdebug build:
  # Internal Error (src/share/vm/utilities/growableArray.hpp:269), pid=25549, tid=25550
  # assert(0 <= i && i < _len) failed: illegal index

  GrowableArray<int>::at_put (this=0x7f65540111f0, i=16, elem=@0x7f655d56ea7c: 0) at src/share/vm/utilities/growableArray.hpp:269
  os::Linux::rebuild_cpu_to_node_map () at src/os/linux/vm/os_linux.cpp:2864
  os::Linux::libnuma_init () at src/os/linux/vm/os_linux.cpp:2822
  os::init_2 () at src/os/linux/vm/os_linux.cpp:4688
  Threads::create_vm (args=0x7f655d56ee80, canTryAgain=0x7f655d56ed8b) at src/share/vm/runtime/thread.cpp:3459
  JNI_CreateJavaVM_inner (vm=0x7f655d56eed8, penv=0x7f655d56eee0, args=0x7f655d56ee80) at src/share/vm/prims/jni.cpp:3911
  JNI_CreateJavaVM (vm=0x7f655d56eed8, penv=0x7f655d56eee0, args=0x7f655d56ee80) at src/share/vm/prims/jni.cpp:4002
  InitializeJVM (pvm=0x7f655d56eed8, penv=0x7f655d56eee0, ifn=0x7f655d56ef30) at src/java.base/share/native/libjli/java.c:1156
  JavaMain (_args=0x7ffceb48f640) at src/java.base/share/native/libjli/java.c:375
  start_thread (arg=0x7f655d56f700) at pthread_create.c:333
  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

From release build:
  *** Error in `build/linux-x86_64-normal-server-release/jdk/bin/java': free(): invalid pointer: 0x00007fea5800b170 ***