Bug ID: JDK-8129978 SIGSEGV when parsing command line options

JDK 9
9 Resolved

Although it takes slightly different debug code to cause Kim's repro cmd line to fail, the underlying failure mode is the same. I'm closing this bug as a duplicate of: JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
25-08-2015
Ran an experiment with two pieces of debug code and my proposed fix for JDK-8049304.

In the jdk repo, I have this piece of debug code (thanks Kim):

    $ hg diff
    diff -r d49f4e34e260 src/java.base/share/classes/sun/misc/VM.java
    --- a/src/java.base/share/classes/sun/misc/VM.java Tue Aug 18 04:29:28 2015 -0700
    +++ b/src/java.base/share/classes/sun/misc/VM.java Fri Aug 21 19:23:38 2015 -0700
    @@ -170,6 +170,7 @@ public class VM {
         //
         // This method is invoked by the Finalizer thread
         public static void awaitBooted() throws InterruptedException {
    +Thread.sleep(1000);
             synchronized (lock) {
                 while (!booted) {
                     lock.wait();

In the hotspot repo, I have this debug code (thanks Kim):

    $ hg diff src/share/vm/runtime/perfMemory.cpp
    diff -r efc17f03e5d4 src/share/vm/runtime/perfMemory.cpp
    --- a/src/share/vm/runtime/perfMemory.cpp Thu Aug 20 10:58:57 2015 -0700
    +++ b/src/share/vm/runtime/perfMemory.cpp Fri Aug 21 19:24:55 2015 -0700
    @@ -78,6 +78,11 @@ void perfMemory_exit() {
       // initialization.
       //
       PerfMemory::destroy();
    +if (UseNewCode2) {
    +tty->print_cr("XXX perfdata destroyed");
    +tty->flush();
    +}
    +os::naked_short_sleep(999);
     }
     void PerfMemory::initialize() {

The -XX:+UseNewCode2 option can be used to enable the extra message. It is not enabled by default because I want to run the fix through Aurora Adhoc RT-SVC nightly testing and I don't want to fail any 'golden' file tests.

In the hotspot repo, I have this key piece of my proposed fix:

    $ hg diff src/share/vm/runtime/perfData.cpp
    diff -r efc17f03e5d4 src/share/vm/runtime/perfData.cpp
    --- a/src/share/vm/runtime/perfData.cpp Thu Aug 20 10:58:57 2015 -0700
    +++ b/src/share/vm/runtime/perfData.cpp Fri Aug 21 19:26:03 2015 -0700
    @@ -1,5 +1,5 @@
     /*
    - * Copyright (c) 2001, 2014, Oracle and/or its affiliates. All rights reserved.
    + * Copyright (c) 2001, 2015, Oracle and/or its affiliates. All rights reserved.
      * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
      *
      * This code is free software; you can redistribute it and/or modify it
    @@ -39,6 +39,7 @@ PerfDataList* PerfDataManager::_all =
     PerfDataList* PerfDataManager::_all = NULL;
     PerfDataList* PerfDataManager::_sampled = NULL;
     PerfDataList* PerfDataManager::_constants = NULL;
    +volatile jint PerfDataManager::_has_PerfData = false;
     /*
      * The jvmstat global and subsystem jvmstat counter name spaces. The top
    @@ -282,6 +283,18 @@ void PerfDataManager::destroy() {
         // destroy already called, or initialization never happened
         return;
    +  // Clear the flag before we free the PerfData counters. Thus begins
    +  // the race between this thread and another thread that has just
    +  // queried PerfDataManager::has_PerfData() and gotten back 'true'.
    +  // The hope is that the other thread will finish its PerfData
    +  // manipulation before we free the memory. The two alternatives
    +  // are 1) leak the PerfData memory or 2) do some form of ordered
    +  // access before every PerfData operation.
    +if (!UseNewCode) {
    +  OrderAccess::release_store(&_has_PerfData, 0);
    +  os::naked_short_sleep(1); // 1ms sleep to let other thread(s) run
    +}
    +
       for (int index = 0; index < _all->length(); index++) {
         PerfData* p = _all->at(index);
         delete p;
    @@ -302,6 +315,7 @@ void PerfDataManager::add_item(PerfData*
       if (_all == NULL) {
         _all = new PerfDataList(100);
    +    OrderAccess::release_store(&_has_PerfData, 1);
       }
       assert(!_all->contains(p->name()), "duplicate name added");

The fix is enabled by default and the -XX:+UseNewCode option can be used to disable the fix.

Here's Kim's test case above with extra options to disable my fix and enable the extra message:

    $ $JAVA_HOME/bin/java -XX:+UseNewCode -XX:+UseNewCode2 -XX:+UseConcMarkSweepGC -XX:CMSYoungGenPerWorker=1 -version
    Error occurred during initialization of VM
    GC triggered before VM initialization completed. Try increasing NewSize, current value 192K.
    XXX perfdata destroyed
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGBUS (0xa) at pc=0xffffffff7d55d818, pid=29903, tid=0x000000000000000b
    #
    # JRE version: (9.0) (build )
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-fastdebug-20150821221801.ddaugher.8049304_for_jdk9-b00, mixed mode, tiered, compressed oops, concurrent mark sweep gc, solaris-sparc)
    # Problematic frame:
    # V [libjvm.so+0x155d818] ObjectMonitor*ObjectSynchronizer::inflate(Thread*,oop)+0x208
    #
    # Core dump will be written. Default location: /export/home/ddaugher/8049304_for_jdk9_hs_rt/hotspot/core or core.29903
    #
    # An error report file with more information is saved as:
    # /export/home/ddaugher/8049304_for_jdk9_hs_rt/hotspot/hs_err_pid29903.log

And again with the fix enabled (the default):

    $ $JAVA_HOME/bin/java -XX:+UseNewCode2 -XX:+UseConcMarkSweepGC -XX:CMSYoungGenPerWorker=1 -version
    Error occurred during initialization of VM
    GC triggered before VM initialization completed. Try increasing NewSize, current value 192K.
    XXX perfdata destroyed

I ran the above in a 1000-iteration loop on a four-way Solaris SPARC machine and didn't see any failures.

At this point, I think I've done due diligence to say that this bug is a duplicate of: JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()

I'll reread this bug Monday and if I don't see anything that I missed, I'll close this one as a duplicate of JDK-8049304.
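
For readers who do not work in HotSpot, the flag-guarded pattern in the perfData.cpp change can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: GuardedCounters, hasPerfData, futileWakeups and incrementIfPresent are hypothetical names, not HotSpot code, and the null check gives this sketch a safety net that the C++ code does not have.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for PerfDataManager; all names are illustrative.
final class GuardedCounters {
    // Plays the role of _has_PerfData: true while the counters may be used.
    private static final AtomicBoolean hasPerfData = new AtomicBoolean(false);
    // Plays the role of one PerfData counter; null models freed memory.
    private static volatile AtomicLong futileWakeups;

    static void create() {
        futileWakeups = new AtomicLong();
        hasPerfData.set(true);   // publish the flag only after the counter exists
    }

    static void destroy() {
        hasPerfData.set(false);  // clear the flag before "freeing" the counter
        futileWakeups = null;    // models the 'delete p' loop
    }

    // Returns true if the increment was applied, false if it was skipped.
    static boolean incrementIfPresent() {
        if (!hasPerfData.get()) {
            return false;        // counters are gone (or never existed)
        }
        // In the C++ code a window remains between the check above and the
        // use below; the local copy and null check are a Java-only safety net.
        AtomicLong counter = futileWakeups;
        if (counter == null) {
            return false;
        }
        counter.incrementAndGet();
        return true;
    }

    // Returns the current count, or -1 once the counter has been destroyed.
    static long value() {
        AtomicLong counter = futileWakeups;
        return counter == null ? -1 : counter.get();
    }
}
```

Calling create(), then incrementIfPresent(), then destroy(), then incrementIfPresent() again shows the second increment being skipped rather than touching freed state.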
22-08-2015
But this one has debugging/repro info! Thanks Kim.
03-07-2015
The earlier bug is assigned to me so I'll take this one also: JDK-8049304 race between VM_Exit and _sync_FutileWakeups->inc()
03-07-2015
The segfault has nothing to do with GC or option checking; those only play a role in tickling a "been there forever" VM shutdown race.

During VM shutdown there is a call to perfMemory_exit, which (conditionally) calls PerfDataManager::destroy. If a still-running thread attempts to touch the performance data between the destruction of the performance data and the actual VM termination, we get a segfault. One place that touches the performance data is Java monitor inflation.

During VM initialization we have some Java threads (such as the Finalizer thread) starting up. They go into a (monitor) wait, waiting for the VM to finish booting up (sun.misc.VM.awaitBooted). So if we have the following sequence of events:

(1) we are in vm_exit_during_initialization, and
(2) we have passed the call to PerfDataManager::destroy, but
(3) the Finalizer reaches the wait and attempts to inflate the monitor

then we crash with a segfault.

The crash can be reliably reproduced by inserting a couple of sleeps, in order to expand the window of vulnerability.

- In sun.misc.VM.awaitBooted(), call Thread.sleep(1000) at the beginning of the function.
- In perfMemory_exit, call os::sleep(Thread::current(), 5000, false) after the (conditional) call to PerfDataManager::destroy.
- Run the modified java application with options that will result in a call to vm_exit_during_initialization. The specific failing invocation recorded in this bug report used something like

  java -XX:+UseConcMarkSweepGC -XX:CMSYoungGenPerWorker=1 -version

  [Though that might not lead to a vm_exit_during_initialization in the future; that's a separate issue.]

The sleep in awaitBooted ensures the Finalizer thread doesn't attempt the wait call and associated monitor inflation until after vm_exit_during_initialization has been called and has passed the call to PerfDataManager::destroy. The sleep in perfMemory_exit gives the sleeping Finalizer time to wake up, try to use the performance data, and crash.
If we comment out the call to PerfMemoryData::destroy then that sequence no longer crashes.

This problem isn't specific to vm_exit_during_initialization; it can arise from any call to os::shutdown that can be invoked concurrently with other threads that might touch the performance data. This includes guarantee/assert checks.

Changing perfMemory_exit to not call PerfDataManager::destroy avoids the problem. That is the only call to PerfDataManager::destroy. However, it isn't correct to simply remove that call and delete the function. We should destroy the performance data in the embedded VM case (e.g. in jni_DestroyJavaVM), to avoid leaking it. So move the call to PerfDataManager::destroy from perfMemory_exit to exit_globals, immediately after its call to perfMemory_exit.
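
The ordering described in this comment can be replayed deterministically in a small Java sketch. It is an analogy, not the HotSpot code path: ShutdownRaceReplay and its fields are hypothetical names, a CountDownLatch replaces the two sleeps, and a NullPointerException stands in for the segfault.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical replay of the shutdown ordering; names are illustrative only.
final class ShutdownRaceReplay {
    // Models the performance data; null models memory that has been freed.
    private static volatile long[] counters;

    // Returns "crash" if the late access hit destroyed data, "ok" otherwise.
    static String replay() {
        counters = new long[1];
        CountDownLatch destroyed = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>("no access");

        Thread finalizer = new Thread(() -> {
            try {
                // Stands in for the sleep added to awaitBooted: do not
                // proceed until the data has been destroyed.
                destroyed.await();
                // Stands in for monitor inflation touching the perf data.
                counters[0]++;
                outcome.set("ok");
            } catch (NullPointerException e) {
                outcome.set("crash");   // the analogue of the SIGSEGV
            } catch (InterruptedException e) {
                outcome.set("interrupted");
            }
        }, "Finalizer-standin");
        finalizer.start();

        counters = null;        // analogue of PerfDataManager::destroy
        destroyed.countDown();  // release the stand-in Finalizer
        try {
            // Analogue of the sleep in perfMemory_exit: keep the process
            // alive long enough for the other thread to run.
            finalizer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
        return outcome.get();
    }
}
```

Because the latch forces the access to happen after the data is cleared, the stand-in Finalizer always takes the failing path, which mirrors how the two sleeps make the real crash reproducible.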
03-07-2015
integration_blocker removal justification: This is not a new issue; it also exists in build 66 and older.
29-06-2015
Seems like a CMS issue. We know the heap has to be "big enough" to get through to a certain point of VM initialization before a GC can be allowed to occur. With those flags it seems we are borderline on that condition and sometimes fail.
27-06-2015
A script that runs HelloWorld in a loop and analyzes the output is attached. Open the script and update the JAVA_HOME variable to point to the Java home under test. Then run the script with one argument, the desired number of iterations, e.g.: ./catch_sigsegv.sh 10000 It prints the current iteration and the number of failed iterations. If the number of failed iterations is not 0, it also creates folders 1, 2, 3, etc. containing the Java output.
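
The attached script itself is not reproduced in this report. A rough Java equivalent of the loop it describes might look like the sketch below. CrashLoop and its methods are hypothetical names, the command to run is taken from the program arguments rather than hard-coded, and saving the output of failed runs into numbered folders is omitted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical loop runner; not the attached catch_sigsegv.sh script.
final class CrashLoop {
    // First line of the HotSpot fatal error banner shown in this report.
    static final String MARKER =
        "A fatal error has been detected by the Java Runtime Environment";

    // True if the captured output contains the fatal error banner.
    static boolean isFatal(String output) {
        return output.contains(MARKER);
    }

    // Runs the command once and returns its combined stdout and stderr.
    static String runOnce(List<String> command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        String out = new String(p.getInputStream().readAllBytes(),
                                StandardCharsets.UTF_8);
        p.waitFor();
        return out;
    }

    // Runs the command repeatedly and returns how many runs hit the banner.
    static int countFailures(List<String> command, int iterations)
            throws IOException, InterruptedException {
        int failures = 0;
        for (int i = 1; i <= iterations; i++) {
            if (isFatal(runOnce(command))) {
                failures++;
            }
            System.out.println("iteration " + i + ", failed " + failures);
        }
        return failures;
    }

    // Usage: java CrashLoop <iterations> <command> [args...]
    public static void main(String[] args) throws Exception {
        int iterations = Integer.parseInt(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        System.out.println("failures: " + countFailures(command, iterations));
    }
}
```

Passing the iteration count followed by the full java command line under test (for example the CMS options quoted in this report) would approximate what the script does.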
26-06-2015
Happens very rarely. I ran HelloWorld.java with the '-XX:+UseConcMarkSweepGC -XX:CMSYoungGenPerWorker=1' options in a loop and looked for a fatal error in the output on JDK 9-b66 fastdebug. The problem seems to appear once per 1000-3000 invocations, and usually the full fatal error message is not printed. But once I was able to get a run where a longer error message was displayed and hs_err_pid32649.log was generated (attached):

    Error occurred during initialization of VM
    GC triggered before VM initialization completed. Try increasing NewSize, current value 192K.
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00007f39154867f3, pid=32649, tid=0x00007f38f77ff700
    #
    # JRE version: (9.0-b66) (build )
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-ea-fastdebug-b66 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # V [libjvm.so+0x10077f3] ObjectSynchronizer::inflate(Thread*, oop)+0x6f3
    #
    # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c" (or dumping to /home/dmitry/bundles/jdk9/b70/fastdebug/bin/core.32649)
    #
    # An error report file with more information is saved as:
    # /home/dmitry/bundles/jdk9/b70/fastdebug/bin/hs_err_pid32649.log
    #
    # If you would like to submit a bug report, please visit:
    # http://bugreport.java.com/bugreport/crash.jsp
    #
    [error occurred during error reporting , id 0xb]
26-06-2015

Duplicate :	JDK-8049304 - race between VM_Exit and _sync_FutileWakeups->inc()
Relates :	JDK-8049304 - race between VM_Exit and _sync_FutileWakeups->inc()