JDK-7161844 : nsk/stress/stack/b4525850 hangs on classloading
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: hs23,hs25,8
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris
  • CPU: sparc
  • Submitted: 2012-04-16
  • Updated: 2014-04-29
  • Resolved: 2014-04-22
Related Reports
Relates :  
Description
nsk/stress/stack/b4525850 hangs when tried to load classes on Solaris SPARC machines in JDK 7u4 b19
Problem wasn't found on linux machines. 

According to jstack all threads except daemons are in a waiting state. The problem is in thread "AWT-EventQueue-0"
that can't lock and waits for it. See jstack_log file:

"AWT-EventQueue-0" prio=3 tid=0x00b2f400 nid=0xf waiting on condition [0xd8b8c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0xdee4c3d0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
	at java.util.concurrent.ConcurrentHashMap$Segment.scanAndLockForPut(ConcurrentHashMap.java:504)
	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:366)
	at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1084)
	at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:462)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:403)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	at nsk.stress.stack.b4525850.b4525850$MyJTextArea.paint(b4525850.java:71)
...

See also jmap_log and pstack_log files.

Also problem appears in "VM JFR Buffer Thread", see jstack_log_JFR_thread file:

"VM JFR Buffer Thread" daemon prio=3 tid=0x00ab1c00 nid=0x7 waiting on condition [0xd967a000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0xd9cd72e8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
	at java.util.concurrent.ConcurrentHashMap$Segment.scanAndLockForPut(ConcurrentHashMap.java:504)
	at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:366)
	at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1084)
	at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:462)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:403)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	at java.lang.Class.getDeclaredConstructors0(Native Method)
...

VM options set during promotion testing: 
-client -Xcomp -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:FlightRecorderOptions=defaultrecording=true,disk=true,dumponexit=true -XX:ReservedCodeCacheSize=256M

Tried to run with and without JFR but got the same results, see jstack_log_without_JFR.
Running with vm options "-client -Xmixed" results the same.

Reproduces on vm-t6320-01b, bur398-217.

Reproduceses on builds: 19, 17, 14. Builds with other numbers have not been tested.

Steps to reproduce:
1. ssh to vm-t6320-01b
2. cd to /import/gtee/pavel/reproduce_b4525850/
3. In folder you can find shell scripts run* to reproduce with appropriate options and builds. You can run them. Test may not hang during the first run.
   File rerun.sh is generated from Aurora. It hangs every run on this machine.
4. Also in subfolder "results" there are folders with jstack and pstack log files for each run I made.

Comments
This is a dup of 7011862. In fact the jfrMsg_lock is not the source for the hang --- the real cause as mentioned in David Holmes's comment is bug 7011862, which lead many test cases hang in same path.
20-03-2014

Thread #20: "VM JFR Buffer Thread" daemon prio=3 tid=0x0000000100f89000 nid=0x14 waiting on condition [0xffffffff6aef5000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000077b04f9e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214) at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290) at java.util.concurrent.ConcurrentHashMap$Segment.scanAndLockForPut(ConcurrentHashMap.java:570) at java.util.concurrent.ConcurrentHashMap$Segment.put(ConcurrentHashMap.java:432) It called Parker::park(0, 0) from java: public native void park(boolean isAbsolute, long time); from Unsafe_Park: JavaThreadParkedState jtps(thread, time != 0); thread->parker()->park(isAbsolute != 0, time); // so isAbsolute is 0, first isAbsolute to Parker::park is false. Thread 20# is the thread "VM JFR Buffer Thread" which processes JFR messages, and after it finished, before exit, notify thread is waiting on : JfrMsg_lock "Thread-1" prio=3 tid=0x000000010132a000 nid=0x1b waiting on condition [0xffffffff6a0fd000] java.lang.Thread.State: RUNNABLE at oracle.jrockit.jfr.VMJFR.rotate(Native Method) at oracle.jrockit.jfr.JFRImpl.started(JFRImpl.java:424) - locked <0x000000077b044ce0> (a java.lang.Object) at oracle.jrockit.jfr.Recording.start(Recording.java:343) at oracle.jrockit.jfr.JFRImpl.cloneRecording(JFRImpl.java:603) at oracle.jrockit.jfr.FlightRecorder.cloneRecording(FlightRecorder.java:236) at oracle.jrockit.jfr.DCmdDump.execute(DCmdDump.java:51) if (time == 0) { status = os::Solaris::cond_wait (_cond, _mutex) ; } else { status = os::Solaris::cond_timedwait (_cond, _mutex, &absTime); } Now, we call lwp_cond_wait and wait for someone else to wake us up, that never happened so Thread #27 (0x1b) waiting on JfrMsg_lock, hang too.
18-03-2014

If this is back to the Stackoverflow issue then it is present in 7, 8 and 9, even if you can not reliably induce it.
14-03-2014

b4470106.java, this hang happens only with 7, not 8 or 9.
14-03-2014

filed new bug https://bugs.openjdk.java.net/browse/JDK-8036576 for Test6792161. The timeout specified in the test is not enough, it took about 432 seconds for it to finish, but it is given 300 seconds.
04-03-2014

The log files and symptoms seemed not match. I have to clear the comments again. I misinterpret the result of Test6792161's result with b4525850. According the logs, I think they are different problems --- we should file different bug for Test6792161 failure though they all hang and timed out. The comment on 2014-03-03 09:08 showed OOM and hang at Thread #17. Thread 17: (state = BLOCKED) - oracle.jrockit.jfr.VMJFR.rotate() @bci=0 (Interpreted frame) - oracle.jrockit.jfr.JFRImpl.started(oracle.jrockit.jfr.Recording) @bci=113, line=424 (Interpreted frame) - oracle.jrockit.jfr.Recording.start() @bci=118, line=343 (Interpreted frame) - oracle.jrockit.jfr.JFRImpl.cloneRecording(oracle.jrockit.jfr.Recording, java.lang.String, boolean) @bci=104, line=603 (Interpreted frame) [2014-02-27T10:09:57.38] Stress time: 60 seconds [2014-02-27T10:09:57.38] Stress iterations factor: 1 [2014-02-27T10:09:57.38] Stress threads factor: 1 [2014-02-27T10:09:57.38] Max memory: 1983250432 [2014-02-27T10:09:57.38] Sleep time: 500 [2014-02-27T10:09:57.38] Iterations: 0 [2014-02-27T10:09:57.38] Number of threads: 4 [2014-02-27T10:09:57.38] Seed: 1393495797383 [2014-02-27T10:09:57.38] Run GC thread: false [2014-02-27T10:09:57.38] Run mem diag thread: false [2014-02-27T10:09:57.38] Run forever: false [2014-02-27T10:10:16.84] Exception in thread "SeedGenerator Thread" java.lang.OutOfMemoryError: Java heap space [2014-02-27T11:00:17.25] # Test level exit status: 151 # Host info: CYGWIN_NT-5.2-WOW64 spb23059 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin Looks Thread #12 is "SeedGenerator Thread" which caused OOM? I am not sure of that: Thread 12: (state = BLOCKED) - java.lang.Object.wait(long) @bci=0 (Interpreted frame) - java.lang.Object.wait() @bci=2, line=503 (Interpreted frame) - sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedByte() @bci=14, line=365 (Interpreted frame) - sun.security.provider.SeedGenerator$ThreadedSeedGenerator.getSeedBytes(byte[]) @bci=11, line=354 (Interpreted frame) - sun.security.provider.SeedGenerator.generateSeed(byte[]) @bci=4, line=139 (Interpreted frame) - sun.security.provider.SecureRandom$SeederHolder.<clinit>() @bci=20, line=186 (Interpreted frame) - sun.security.provider.SecureRandom.engineNextBytes(byte[]) @bci=21, line=203 (Interpreted frame) - java.security.SecureRandom.nextBytes(byte[]) @bci=5, line=455 (Interpreted frame)
04-03-2014

I'm finding the subsequent analysis somewhat confusing. The original stack trace is indicative of StackOverflowError leaving the ReentrantLock used by the ConcurrentHashMap segments in an inconsistent state. If OOME happens where does it happen and what is the new set of stack traces for such a hang?
04-03-2014

Found a problem for above comment. In face -XX:MaxInlineSize is not used in javac, it is only used in run. The problem is not from javac, instead, it could be from the OutOfMemoryError, which lead to starving of system resources. All thelog files so far passed javac but failed in run. I think the testing machine has less disk space during testing.
04-03-2014

This can be reproduced on Linux too. The MaxInlineSize=120 caused the javac timed out. It will increase the inline size threshold, do more inlining with bigger method. This caused more time spent on compilation. In Test6792126.java, the size of bytecodes is not more than 35, the default value. By changing - * @run main/othervm/timeout=300 -Xcomp -XX:MaxInlineSize=120 Test6792161 + * @run main/othervm/timeout=300 -Xcomp -XX:MaxInlineSize=45 Test6792161 It passed: #section:script_messages ----------messages:(5/236)---------- JDK under test: (/scratch/yqi/jdks/8/linux-x64/fastdebug) java version "1.8.0_20-ea-fastdebug" Java(TM) SE Runtime Environment (build 1.8.0_20-ea-fastdebug-b03) Java HotSpot(TM) 64-Bit Server VM (build 25.20-b04-fastdebug, mixed mode) #section:build ----------messages:(3/96)---------- command: build Test6792161 reason: Named class compiled on demand elapsed time (seconds): 2.914 result: Passed. Build successful #section:compile ----------messages:(3/169)---------- command: compile /scratch/yqi/ws/eight/hotspot-rt/test/compiler/6792161/Test6792161.java reason: .class file out of date or does not exist elapsed time (seconds): 2.904 ----------System.out:(0/0)---------- ----------System.err:(2/166)---------- Note: /scratch/yqi/ws/eight/hotspot-rt/test/compiler/6792161/Test6792161.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. result: Passed. Compilation successful #section:main ----------messages:(3/186)---------- command: main -Xcomp -XX:MaxInlineSize=45 Test6792161 reason: User specified action: run main/othervm/timeout=300 -Xcomp -XX:MaxInlineSize=45 Test6792161 elapsed time (seconds): 14.633 ----------System.out:(0/0)---------- ----------System.err:(1/15)---------- STATUS:Passed. result: Passed. Execution successful test result: Passed. Execution successful
03-03-2014

for Test6792161.java, the tests using fastdebug timed out on javac. When compile separately(not using jtreg), it passes with lint warning but I think it is OK. Will work to find more information since product build passed. With changing timetout to 600, it still timed out.
01-03-2014

The problem seems related to framework or jtreg or network. It timed out on javac: compiler/6792161/Test6792161.java Error. Program `/export/yqi/jdks/jdk1.8.0_20/fastdebug/bin/javac' interrupted! (timed out?) This used the most recent 8u20.
01-03-2014