JDK-8198794 : Hotspot crash on Cassandra 3.11.1 startup with libnuma 2.0.3
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 8u152,8u161,8u162
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2018-02-27
  • Updated: 2019-01-14
  • Resolved: 2018-03-16
  • JDK 11: 11 b07 (Fixed)
  • JDK 8: 8u172 (Fixed)
Sub Tasks
JDK-8203243 :  
Description
Hotspot crashes with a segv when starting up Cassandra 3.11.1 on Amazon Linux 2012, kernel 4.4, libnuma 2.0.3. See attached hs_err file. The proximate cause is a null result when fetching numa_nodes_ptr in libnuma_init() in os_linux.cpp. The code went in as part of the fix for JDK-8175813 and the problem first appeared in 8u152, which is not available as an 'Introduced in Version' option. I've used 8u161 instead. The crash doesn't happen on Ubuntu 16.04, kernel 4.13, libnuma 2.0.11.

The Cassandra issue is https://issues.apache.org/jira/browse/CASSANDRA-13781.

Steps to reproduce:

wget https://www-us.apache.org/dist/cassandra/3.11.1/apache-cassandra-3.11.1-bin.tar.gz
tar xf apache-cassandra-3.11.1-bin.tar.gz
JAVA_HOME=jdk1.8.0_152 ./apache-cassandra-3.11.1/bin/cassandra

Comments
Hi Paul. I only saw this bug yesterday, when you updated the priority to P2. For the record, I sent a webrev here to start the discussion on the ML: http://mail.openjdk.java.net/pipermail/hotspot-dev/2018-March/030657.html Thank you.
13-03-2018

Changed to P2 because Hotspot crashes whenever -XX:+UseNUMA is used, regardless of the collector. While strictly speaking there's a workaround (don't use -XX:+UseNUMA), in practice workloads and benchmarks such as SPECjbb won't run well without it on NUMA (multi-socket) systems running older versions of libnuma.
12-03-2018

Here's a possible patch. It uses numa_all_nodes_ptr if numa_nodes_ptr is unavailable.

diff --new-file --unified --recursive --exclude='*.autotools' --exclude='*.cproject' --exclude='*.project' ./hotspot/src/os/linux/vm/os_linux.cpp ./hotspot/src/os/linux/vm/os_linux.cpp
--- ./hotspot/src/os/linux/vm/os_linux.cpp 2018-01-22 18:54:11.000000000 +0000
+++ ./hotspot/src/os/linux/vm/os_linux.cpp 2018-02-27 20:49:21.818298764 +0000
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 1999, 2016, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 1999, 2018, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -2870,8 +2870,10 @@
       if (numa_available() != -1) {
         set_numa_all_nodes((unsigned long*)libnuma_dlsym(handle, "numa_all_nodes"));
-        set_numa_all_nodes_ptr((struct bitmask **)libnuma_dlsym(handle, "numa_all_nodes_ptr"));
-        set_numa_nodes_ptr((struct bitmask **)libnuma_dlsym(handle, "numa_nodes_ptr"));
+        struct bitmask** numa_all_nodes_ptr = (struct bitmask **)libnuma_dlsym(handle, "numa_all_nodes_ptr");
+        set_numa_all_nodes_ptr(numa_all_nodes_ptr);
+        struct bitmask** numa_nodes_ptr = (struct bitmask **)libnuma_dlsym(handle, "numa_nodes_ptr");
+        set_numa_nodes_ptr(numa_nodes_ptr == NULL ? numa_all_nodes_ptr : numa_nodes_ptr);
         // Create an index -> node mapping, since nodes are not always consecutive
         _nindex_to_node = new (ResourceObj::C_HEAP, mtInternal) GrowableArray<int>(0, true);
         rebuild_nindex_to_node_map();
05-03-2018
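
The fallback proposed in the 05-03-2018 patch can be illustrated in isolation. The following is only a minimal sketch, not HotSpot code: the bitmask struct is a stand-in for libnuma's type, and choose_nodes_ptr and first_mask_word are hypothetical helpers introduced for illustration. It assumes, as the report describes for libnuma 2.0.3, that the lookup of "numa_nodes_ptr" yields NULL while "numa_all_nodes_ptr" resolves.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for libnuma's struct bitmask, declared here only so the sketch
// compiles without libnuma headers (an assumption, not the real header).
struct bitmask {
  unsigned long size;
  unsigned long* maskp;
};

// Hypothetical helper: mirrors the ternary in the proposed patch. Prefer the
// result of looking up "numa_nodes_ptr"; if that lookup returned NULL, use
// the result of looking up "numa_all_nodes_ptr" instead.
static bitmask** choose_nodes_ptr(bitmask** numa_nodes_ptr,
                                  bitmask** numa_all_nodes_ptr) {
  return numa_nodes_ptr == NULL ? numa_all_nodes_ptr : numa_nodes_ptr;
}

// Hypothetical helper: reads the first word of the mask through the double
// pointer, returning 0 when any level is NULL instead of dereferencing it.
static unsigned long first_mask_word(bitmask** nodes_ptr) {
  if (nodes_ptr == NULL || *nodes_ptr == NULL || (*nodes_ptr)->maskp == NULL) {
    return 0;
  }
  return (*nodes_ptr)->maskp[0];
}

// Simulates the older-libnuma case: the "numa_nodes_ptr" lookup is NULL,
// while the "numa_all_nodes_ptr" lookup points at a valid bitmask pointer.
static unsigned long demo_old_libnuma() {
  static unsigned long bits = 0x3;          // pretend nodes 0 and 1 are present
  static bitmask all_nodes = { 2, &bits };
  static bitmask* all_nodes_symbol = &all_nodes;
  bitmask** picked = choose_nodes_ptr(NULL, &all_nodes_symbol);
  return first_mask_word(picked);
}
```

In this sketch, calling demo_old_libnuma() exercises the path where only the older symbol is present. The selected pointer is non-NULL, so the later dereference no longer faults.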