JDK-8077392 : Stream fork/join tasks occasionally fail to complete
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2015-04-10
  • Updated: 2017-08-17
  • Resolved: 2016-04-04
Fix Version
JDK 9 b115 (Fixed)
Sub Tasks
JDK-8152358
Description
java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java

Failed once during the jdk9/b56 same-binaries run. The test has also timed out in other runs on different machines.

#section:build
----------messages:(3/3266)----------
command: build org.openjdk.tests.java.lang.invoke.SerializedLambdaTest org.openjdk.tests.java.lang.invoke.DeserializeMethodTest
...
elapsed time (seconds): 0.003
result: Passed. All files up to date

#section:testng
----------messages:(272/15342)----------
command: testng org.openjdk.tests.java.util.stream.ToArrayOpTest
reason: Assumed action based on file name: run testng org.openjdk.tests.java.util.stream.ToArrayOpTest 
Timeout information:
Running jstack on process 36966
2015-04-06 21:03:15
Full thread dump Java HotSpot(TM) 64-Bit Server VM (1.9.0-ea-b56 mixed mode):

"Attach Listener" #55 daemon prio=9 os_prio=64 tid=0x00000000032f1000 nid=0x50 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"ForkJoinPool.commonPool-worker-28" #51 daemon prio=5 os_prio=64 tid=0x00000000032c1800 nid=0x4c waiting on condition [0xffff80ffa2094000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-3" #50 daemon prio=5 os_prio=64 tid=0x0000000002f6d000 nid=0x4b waiting on condition [0xffff80ffa2195000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-17" #48 daemon prio=5 os_prio=64 tid=0x0000000002bb1800 nid=0x49 waiting on condition [0xffff80ffa2397000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-27" #42 daemon prio=5 os_prio=64 tid=0x00000000026c8800 nid=0x44 waiting on condition [0xffff80ffa289c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-20" #43 daemon prio=5 os_prio=64 tid=0x0000000003ebd800 nid=0x43 waiting on condition [0xffff80ffa299d000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-16" #39 daemon prio=5 os_prio=64 tid=0x0000000002b1f000 nid=0x40 waiting on condition [0xffff80ffa2ca0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-12" #35 daemon prio=5 os_prio=64 tid=0x0000000001c59800 nid=0x3e waiting on condition [0xffff80ffa2ea2000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-19" #34 daemon prio=5 os_prio=64 tid=0x0000000002926800 nid=0x3b waiting on condition [0xffff80ffa31a5000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-1" #32 daemon prio=5 os_prio=64 tid=0x000000000288f000 nid=0x37 waiting on condition [0xffff80ffa35a9000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"ForkJoinPool.commonPool-worker-22" #29 daemon prio=5 os_prio=64 tid=0x00000000019f5800 nid=0x36 waiting on condition [0xffff80ffa36aa000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e000e880> (a java.util.concurrent.ForkJoinPool)
	at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1826)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1695)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

"MainThread" #23 prio=5 os_prio=64 tid=0x000000000090f800 nid=0x30 in Object.wait() [0xffff80ffb3129000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)
	- locked <0x00000000f797c750> (a java.util.stream.Nodes$ToArrayTask$OfLong)
	at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405)
	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
	at java.util.stream.Nodes.flattenLong(Nodes.java:522)
	at java.util.stream.Nodes.collectLong(Nodes.java:405)
	at java.util.stream.LongPipeline.evaluateToNode(LongPipeline.java:140)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:572)
	at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:255)
	at java.util.stream.LongPipeline.toArray(LongPipeline.java:486)
	at org.openjdk.tests.java.util.stream.ToArrayOpTest.lambda$testLongOpsWithFilter$67(ToArrayOpTest.java:313)
	at org.openjdk.tests.java.util.stream.ToArrayOpTest$$Lambda$239/1024538978.apply(Unknown Source)
	at java.util.stream.OpTestCase$BaseTerminalTestScenario.run(OpTestCase.java:404)
	at java.util.stream.OpTestCase$ExerciseDataTerminalBuilder.exercise(OpTestCase.java:528)
	at java.util.stream.OpTestCase.exerciseTerminalOps(OpTestCase.java:565)
	at org.openjdk.tests.java.util.stream.ToArrayOpTest.testLongOpsWithFilter(ToArrayOpTest.java:313)
	at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:502)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
	at org.testng.TestRunner.privateRun(TestRunner.java:767)
	at org.testng.TestRunner.run(TestRunner.java:617)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
	at org.testng.SuiteRunner.run(SuiteRunner.java:240)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1149)
	at org.testng.TestNG.run(TestNG.java:1057)
	at com.sun.javatest.regtest.TestNGAction$TestNGRunner.main(TestNGAction.java:163)
	at com.sun.javatest.regtest.TestNGAction$TestNGRunner.main(TestNGAction.java:147)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:502)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:92)
	at java.lang.Thread.run(Thread.java:745)

"Service Thread" #21 daemon prio=9 os_prio=64 tid=0x00000000006cd800 nid=0x2e runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Sweeper thread" #20 daemon prio=9 os_prio=64 tid=0x00000000006bc800 nid=0x2d runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread14" #19 daemon prio=9 os_prio=64 tid=0x00000000006ba800 nid=0x2c waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread13" #18 daemon prio=9 os_prio=64 tid=0x00000000006b0000 nid=0x2b waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread12" #17 daemon prio=9 os_prio=64 tid=0x000000000068d800 nid=0x2a waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread11" #16 daemon prio=9 os_prio=64 tid=0x000000000068b800 nid=0x29 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread10" #15 daemon prio=9 os_prio=64 tid=0x0000000000689800 nid=0x28 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread9" #14 daemon prio=9 os_prio=64 tid=0x0000000000676800 nid=0x27 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread8" #13 daemon prio=9 os_prio=64 tid=0x0000000000674800 nid=0x26 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread7" #12 daemon prio=9 os_prio=64 tid=0x0000000000671800 nid=0x25 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread6" #11 daemon prio=9 os_prio=64 tid=0x000000000064f000 nid=0x24 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread5" #10 daemon prio=9 os_prio=64 tid=0x0000000000645000 nid=0x23 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread4" #9 daemon prio=9 os_prio=64 tid=0x0000000000610800 nid=0x22 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread3" #8 daemon prio=9 os_prio=64 tid=0x00000000005ed000 nid=0x21 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread2" #7 daemon prio=9 os_prio=64 tid=0x00000000005eb800 nid=0x20 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=64 tid=0x00000000005e8800 nid=0x1f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=64 tid=0x00000000005e2800 nid=0x1e waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=64 tid=0x00000000005e1800 nid=0x1d runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=64 tid=0x00000000005ad000 nid=0x1c in Object.wait() [0xffff80ffb6993000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000e0019d78> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
	- locked <0x00000000e0019d78> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:212)

"Reference Handler" #2 daemon prio=10 os_prio=64 tid=0x00000000005a1000 nid=0x1b in Object.wait() [0xffff80ffb6a94000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:508)
	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	- locked <0x00000000e0016448> (a java.lang.ref.Reference$Lock)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=64 tid=0x0000000000420000 nid=0x2 in Object.wait() [0xffff80ffbf19e000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000e000f390> (a java.lang.Thread)
	at java.lang.Thread.join(Thread.java:1249)
	- locked <0x00000000e000f390> (a java.lang.Thread)
	at java.lang.Thread.join(Thread.java:1323)
	at com.sun.javatest.regtest.agent.MainWrapper.main(MainWrapper.java:69)

"VM Thread" os_prio=64 tid=0x0000000000599800 nid=0x1a runnable 

"ParGC Thread#0" os_prio=64 tid=0x0000000000432800 nid=0x3 runnable 

"ParGC Thread#1" os_prio=64 tid=0x0000000000434000 nid=0x4 runnable 

"ParGC Thread#2" os_prio=64 tid=0x0000000000435800 nid=0x5 runnable 

"ParGC Thread#3" os_prio=64 tid=0x0000000000437000 nid=0x6 runnable 

"ParGC Thread#4" os_prio=64 tid=0x0000000000438800 nid=0x7 runnable 

"ParGC Thread#5" os_prio=64 tid=0x000000000043a800 nid=0x8 runnable 

"ParGC Thread#6" os_prio=64 tid=0x000000000043c000 nid=0x9 runnable 

"ParGC Thread#7" os_prio=64 tid=0x000000000043d800 nid=0xa runnable 

"ParGC Thread#8" os_prio=64 tid=0x000000000043f000 nid=0xb runnable 

"ParGC Thread#9" os_prio=64 tid=0x0000000000440800 nid=0xc runnable 

"ParGC Thread#10" os_prio=64 tid=0x0000000000442000 nid=0xd runnable 

"ParGC Thread#11" os_prio=64 tid=0x0000000000443800 nid=0xe runnable 

"ParGC Thread#12" os_prio=64 tid=0x0000000000445000 nid=0xf runnable 

"ParGC Thread#13" os_prio=64 tid=0x0000000000446800 nid=0x10 runnable 

"ParGC Thread#14" os_prio=64 tid=0x0000000000448000 nid=0x11 runnable 

"ParGC Thread#15" os_prio=64 tid=0x0000000000449800 nid=0x12 runnable 

"ParGC Thread#16" os_prio=64 tid=0x000000000044b000 nid=0x13 runnable 

"ParGC Thread#17" os_prio=64 tid=0x000000000044e000 nid=0x14 runnable 

"ParGC Thread#18" os_prio=64 tid=0x000000000044f800 nid=0x15 runnable 

"ParGC Thread#19" os_prio=64 tid=0x0000000000451000 nid=0x16 runnable 

"ParGC Thread#20" os_prio=64 tid=0x0000000000452800 nid=0x17 runnable 

"ParGC Thread#21" os_prio=64 tid=0x0000000000456000 nid=0x18 runnable 

"ParGC Thread#22" os_prio=64 tid=0x0000000000457800 nid=0x19 runnable 

"VM Periodic Task Thread" os_prio=64 tid=0x00000000006cf800 nid=0x2f waiting on condition 

JNI global references: 599

--- Timeout information end.
elapsed time (seconds): 480.552
Comments
See JDK-8152358 for the latest pre-integration stress test results.
04-04-2016

The fix for this bug is not enough to enable ObjectSynchronizer::quick_enter() on ARM64. See: JDK-8153107 enabling ObjectSynchronizer::quick_enter() on ARM64 causes hangs
31-03-2016

No failures at all at the 24 hour mark...

$ elapsed_times test_run.start doit_loop.copy00.log; grep -v PASS doit_loop.copy0[01].log
test_run.start 0 seconds
doit_loop.copy00.log 1 days 22 seconds
doit_loop.copy00.log:Loop #3134...
doit_loop.copy01.log:Loop #3133...

Update: No failures at all at the 48 hour mark...

$ elapsed_times test_run.start doit_loop.copy00.log; grep -v PASS doit_loop.copy0[01].log
test_run.start 0 seconds
doit_loop.copy00.log 2 days 3 minutes 35 seconds
doit_loop.copy00.log:Loop #6232...
doit_loop.copy01.log:Loop #6231...

Update: No failures at all at the 72 hour mark...

$ elapsed_times test_run.start doit_loop.copy00.log; grep -v PASS doit_loop.copy0[01].log
test_run.start 0 seconds
doit_loop.copy00.log 3 days 6 minutes 29 seconds
doit_loop.copy00.log:Loop #9174...
doit_loop.copy01.log:Loop #9173...

Note: Had to run a "zpool scrub" during this last 24 hour period so the increased I/O bandwidth slowed down the number of possible iterations relative to the previous two 24 hour slices.
24-03-2016

@Vladimir - Thanks! Our discussions on Friday helped clear up the tracing diagnostics which revealed the racing code path. Thanks for your help!

@David - Thanks! The BasicLock is just part of the basic "monitorenter" protocol (pun intended) and we leverage the availability of the _displaced_header field to mark a recursive enter without taking any additional space. This all works well for recursive entries of a Java Monitor until contention happens. We don't have the _recursions field until the Java Monitor is inflated into an ObjectMonitor. So the transition of a Java Monitor from stack lock to ObjectMonitor has these complications where a query like "is my current Java Monitor enter recursive or not?" is answered by different checks depending on whether the Java Monitor is a stack lock or an ObjectMonitor.

Add Biased Locking to the mix and now we have an additional transition: we can go from Biased Lock -> ObjectMonitor; we can also go from Biased Lock -> stack lock -> ObjectMonitor. When we do these transitions, we still have to keep the BasicLock housekeeping up to date because that BasicLock is associated with the monitorenter bytecode that got us here.

We can't use a simple assert() to verify that a NULL displaced header matches the _recursions count. Remember that the displaced header is in the BasicLock which is in the interpreter frame (or the compiled frame). If we have _recursions == 2, then we have an original monitorenter done in one frame and two more monitorenters that are potentially done in the same frame (weird) or in newer frames. In order to sanity check _recursions == 2, we would have to check the current frame and then walk older frames until we find the original BasicLock.
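To make the two recursion indicators concrete, here is a minimal stand-alone sketch (not actual HotSpot source; BasicLock and ObjectMonitor below are reduced stand-in types for illustration only) of how the "is this enter recursive?" question is answered in each state:

#include <cstdint>

// Reduced stand-in types; the real HotSpot classes carry much more state.
struct BasicLock {
  intptr_t displaced_header;   // 0 (NULL) marks a recursive stack-lock enter
};

struct ObjectMonitor {
  void*    owner;
  intptr_t recursions;         // explicit re-entry count, only exists once inflated
};

// mon == nullptr means the Java Monitor is still a stack lock.
bool is_recursive_enter(const BasicLock* lock, const ObjectMonitor* mon) {
  if (mon == nullptr) {
    // stack-locked: recursion was recorded by storing a NULL displaced header
    return lock->displaced_header == 0;
  }
  // inflated: recursion is tracked in the ObjectMonitor itself
  return mon->recursions > 0;
}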
22-03-2016

Note: JBS munges the formatting of this note so I've attached it as eval_note5. This bug has finally been chased to ground. As expected, the bug is a race condition that is only present in certain configurations. This note is an attempt to describe the race and the conditions under which the race can occur. This race is due to a bug in ObjectSynchronizer::quick_enter() which is a C2 function that was added by the "fast enter" bucket for the Contended Locking project. See: JDK-8061553 Contended Locking fast enter bucket so this bug is only present in configurations that use the Server VM (C2); configurations that use the Client VM (C1) will not observe this bug. Secondarily, Biased Locking must be enabled for the race condition to manifest. By default Biased Locking is enabled at 4 seconds so any hangs seen where the VM uptime is less than 4 seconds are not likely to be due to this bug. Lastly, there must be contention on the Java Monitor in question so there must be two or more threads using the Java Monitor that has been observed as "stranded". Here's the conditions above in check list form with a few additional conditions: - Server Compiler/C2 is in use - Biased Locking enabled (VM uptime >= 4 seconds) - Java Monitor contention - Without special options, this hang should only be observed in JDK9-B53 -> JDK9-B63; JDK-8061553 was promoted in JDK9-B53 and the fix to disable it (JDK-8079359) was promoted in JDK9-B64. - So if your hang occurred before JDK9-B53 or in JDK9-B64 or later, then this bug is not likely the cause. If you think you have a hang that is caused by this bug, then use the following diagnostic options: -XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1 The 'VerifyMatch=1' portion of the above diagnostic options will cause output like the following when you've run into this bug: INFO: unexpected locked object: - locked <0xfffffd7be95defe0> (a java.util.stream.Nodes$CollectorTask$OfDouble) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2203), pid=19281, tid=95 # fatal error: exiting JavaThread=0x0000000004278800 unexpectedly owns ObjectMonitor=0x00000000016f2000 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_03_18_18_43-b00) The diagnostic output above shows: - the unexpected locked object (0xfffffd7be95defe0) - the object's type (java.util.stream.Nodes$CollectorTask$OfDouble) - the thread that owns the lock (0x0000000004278800), - and the ObjectMonitor (0x00000000016f2000). Please note that mis-behaving programs that use JNI locking can also run into this diagnostic trap so I recommend careful use of these diagnostic options. Gory Code Details: ## ## JavaThread1 (JT1) - Part 1 ## The first JavaThread (JT1) in the race is executing this code (when the -XX:-UseOptoBiasInlining is specified): src/cpu/x86/vm/macroAssembler_x86.cpp: int MacroAssembler::biased_locking_enter(Register lock_reg, Register obj_reg, Register swap_reg, Register tmp_reg, bool swap_reg_contains_mark, Label& done, Label* slow_case, BiasedLockingCounters* counters) { <snip> movptr(tmp_reg, swap_reg); andptr(tmp_reg, markOopDesc::biased_lock_mask_in_place); cmpptr(tmp_reg, markOopDesc::biased_lock_pattern); jcc(Assembler::notEqual, cas_label); // The bias pattern is present in the object's header. Need to check // whether the bias owner and the epoch are both still current. 
<JT1 has just made it past the above check so the object's header> <has the bias pattern in it> Note: When UseOptoBiasInlining is enabled (the default), biased_locking_enter() is not used and the C2 ideal graph version of the algorithm is used; for this note, -XX:-UseOptoBiasInlining is used because it's easier to explain biased_locking_enter()'s assembly code than the C2 ideal graph code. See PhaseMacroExpand::expand_lock_node() for the C2 ideal graph code. ## ## JavaThread2 (JT2) - Part 1 ## The second JavaThread (JT2) is inflating the JavaMonitor associated with this object so it is here (for example): src/share/vm/runtime/synchronizer.cpp: void ObjectSynchronizer::slow_enter(Handle obj, BasicLock* lock, TRAPS) { <snip> lock->set_displaced_header(markOopDesc::unused_mark()); ObjectSynchronizer::inflate(THREAD, obj(), inflate_cause_monitor_enter)->enter(THREAD); Note: Don't be confused by the call to "lock->set_displaced_header(markOopDesc::unused_mark())" above; that's the BasicLock in JT2's context. JT2 has finished the inflation part using the Biased Locking "CASE: neutral" code: src/share/vm/runtime/synchronizer.cpp ObjectMonitor* ObjectSynchronizer::inflate(Thread * Self, oop object, const InflateCause cause) { <snip> // CASE: neutral // TODO-FIXME: for entry we currently inflate and then try to CAS _owner. // If we know we're inflating for entry it's better to inflate by swinging a // pre-locked objectMonitor pointer into the object header. A successful // CAS inflates the object *and* confers ownership to the inflating thread. // In the current implementation we use a 2-step mechanism where we CAS() // to inflate and then CAS() again to try to swing _owner from NULL to Self. // An inflateTry() method that we could call from fast_enter() and slow_enter() // would be useful. assert(mark->is_neutral(), "invariant"); ObjectMonitor * m = omAlloc(Self); // prepare m for installation - set monitor to initial state m->Recycle(); m->set_header(mark); m->set_owner(NULL); and is now racing with JT1 for ownership of the Java Monitor in ObjectMonitor::enter(). For our failure mode, JT2 loses the race to JT1. ## ## JavaThread1 (JT1) - Part 2 ## src/cpu/x86/vm/macroAssembler_x86.cpp: int MacroAssembler::biased_locking_enter(Register lock_reg, <snip> andptr(tmp_reg, markOopDesc::biased_lock_mask_in_place); cmpptr(tmp_reg, markOopDesc::biased_lock_pattern); jcc(Assembler::notEqual, cas_label); // The bias pattern is present in the object's header. Need to check // whether the bias owner and the epoch are both still current. <Once JT1 has made it past the 'cmpptr' above, the race with JT2's> <inflation of the Java Monitor starts.> if (swap_reg_contains_mark) { null_check_offset = offset(); } load_prototype_header(tmp_reg, obj_reg); orptr(tmp_reg, r15_thread); xorptr(tmp_reg, swap_reg); Register header_reg = tmp_reg; andptr(header_reg, ~((int) markOopDesc::age_mask_in_place)); if (counters != NULL) { cond_inc32(Assembler::zero, ExternalAddress((address) counters->biased_lock_entry_count_addr())); } jcc(Assembler::equal, done); <The above code block is for checking if the object is biased towards> <the current thread. Since this is the first time this thread has locked> <this object it cannot be biased towards this thread so this block does> <not bail to the "done" label (with success).> // At this point we know that the header has the bias pattern and // that we are not the bias owner in the current epoch. 
We need to // figure out more details about the state of the header in order to // know what operations can be legally performed on the object's // header. // If the low three bits in the xor result aren't clear, that means // the prototype header is no longer biased and we have to revoke // the bias on this object. testptr(header_reg, markOopDesc::biased_lock_mask_in_place); jccb(Assembler::notZero, try_revoke_bias); // Biasing is still enabled for this data type. See whether the // epoch of the current bias is still valid, meaning that the epoch // bits of the mark word are equal to the epoch bits of the // prototype header. (Note that the prototype header's epoch bits // only change at a safepoint.) If not, attempt to rebias the object // toward the current thread. Note that we must be absolutely sure // that the current epoch is invalid in order to do this because // otherwise the manipulations it performs on the mark word are // illegal. testptr(header_reg, markOopDesc::epoch_mask_in_place); jccb(Assembler::notZero, try_rebias); <The above two checks are for either revoking an existing bias> <or rebiasing the object towards the current thread. For this> <scenario, our object is biased neutral so those two blocks> <do not matter.> // The epoch of the current bias is still valid but we know nothing // about the owner; it might be set or it might be clear. Try to // acquire the bias of the object using an atomic operation. If this // fails we will go in to the runtime to revoke the object's bias. // Note that we first construct the presumed unbiased header so we // don't accidentally blow away another thread's valid bias. NOT_LP64( movptr(swap_reg, saved_mark_addr); ) andptr(swap_reg, markOopDesc::biased_lock_mask_in_place | markOopDesc::age_mask_in_place | markOopDesc::epoch_mask_in_place); #ifdef _LP64 movptr(tmp_reg, swap_reg); orptr(tmp_reg, r15_thread); #else get_thread(tmp_reg); orptr(tmp_reg, swap_reg); #endif if (os::is_MP()) { lock(); } cmpxchgptr(tmp_reg, mark_addr); // compare tmp_reg and swap_reg // If the biasing toward our thread failed, this means that // another thread succeeded in biasing it toward itself and we // need to revoke that bias. The revocation will occur in the // interpreter runtime in the slow case. 
if (counters != NULL) { cond_inc32(Assembler::zero, ExternalAddress((address) counters->anonymously_biased_lock_entry_count_addr())); } if (slow_case != NULL) { jcc(Assembler::notZero, *slow_case); } jmp(done); <The above block is where the inflation race with JT2 comes into play.> <We (JT1) are trying to bias the object towards ourself with the> <above cmpxchgptr(), but JT2 has updated the object's header> <to refer to an ObjectMonitor, so the tmp_reg value that we are> <comparing against the object's header will no longer match.> <We fail the cmpxchgptr() and jump to the "done" label in the> <caller of biased_locking_enter() which is fast_lock().> <fast_lock() returns to its caller with the ZFlag set to zero which> <results in a call to complete_monitor_enter_C() which results> <in a call to ObjectSynchronizer::quick_enter() where we run> <into our bug.> src/share/vm/runtime/synchronizer.cpp: bool ObjectSynchronizer::quick_enter(oop obj, Thread * Self, BasicLock * Lock) { <snip> <JT1 has reached quick_enter() as a last ditch attempt to> <enter the JavaMonitor without doing expensive stuff like> <thread state transitions...> if (mark->has_monitor()) { <snip> <This check is true because JT2 inflated the Java Monitor> <while JT1 was in biased_locking_enter()...> if (owner == Self) { <snip> <This check is NOT true because the Java Monitor was> <inflated by JT2 and the owner field is NOT set to another> <JavaThread pointer in that case. Inflation can happen due> <to operations other that Java Monitor enter so its not> <appropriate for inflation to set the _owner field.> if (owner == NULL && Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) { <Our thread (JT1) sees the owner field as NULL and we are able> <to make ourself the owner via the cmpxchg_ptr() call. This is all> <good, but the code path that we are on didn't change the BasicLock's> <_displaced_header field value to something other than NULL. This> <means that our JavaThread's FIRST TIME entry of this Java Monitor> <will be seen as a recursive enter which means when we exit the Java> <Monitor we will treat the exit as a no-op. Oops!> assert(m->_recursions == 0, "invariant"); assert(m->_owner == Self, "invariant"); return true; } To recap: - JT1 calls C2 fast_lock() which calls biased_locking_enter() - JT2 inflates the Java Monitor - JT1 bails biased_locking_enter() after making it past the first check which results in an early bail from fast_lock() - JT1 makes a slow path call to complete_monitor_enter_C() - JT1 makes a last ditch call to quick_enter() before doing the real slow path work The early bail code path in biased_locking_enter() and fast_lock() results in the BasicLock's _displaced_header field value remaining NULL which marks this entry as recursive. If JT2's inflation had happened a little earlier, then JT1 would have taken the first bail point in biased_locking_enter() which would have resulted in a regular fast_lock() code path which does initialize the BasicLock's _displaced_header field. The fix for this problem is a 1-liner: --- a/src/share/vm/runtime/synchronizer.cpp +++ b/src/share/vm/runtime/synchronizer.cpp @@ -229,6 +229,9 @@ bool ObjectSynchronizer::quick_enter(oop if (owner == NULL && Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) { + // Make the displaced header non-NULL so this BasicLock is + // not seen as recursive. 
+ Lock->set_displaced_header(markOopDesc::unused_mark()); assert(m->_recursions == 0, "invariant"); assert(m->_owner == Self, "invariant"); return true; So when quick_enter() succeeds at its last ditch optimization, it needs to mark the BasicLock's _displaced_header field with a non-zero value (like the other lock grabbing code paths).
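A minimal stand-alone model of the fixed quick_enter() fast path (not the actual HotSpot source; the types and the unused_mark value below are reduced stand-ins for illustration) looks like this:

#include <atomic>
#include <cstdint>

struct BasicLock     { intptr_t displaced_header = 0; };   // 0 == "recursive" marker
struct ObjectMonitor { std::atomic<void*> owner{nullptr}; };

static const intptr_t unused_mark = 3;  // any non-zero sentinel, standing in for markOopDesc::unused_mark()

// Last-ditch fast path: try to grab the inflated monitor without a state transition.
bool quick_enter_model(ObjectMonitor* m, void* self, BasicLock* lock) {
  void* expected = nullptr;
  if (m->owner.load() == nullptr &&
      m->owner.compare_exchange_strong(expected, self)) {
    // The 1-liner fix: make the displaced header non-NULL so this first-time
    // enter is not later misread as a recursive enter (and the exit skipped).
    lock->displaced_header = unused_mark;
    return true;
  }
  return false;  // fall through to the real slow path
}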
22-03-2016

Congratulations Dan! I have to wonder why _recursions is not used to indicate recursive entry instead of using something so obscure as a NULL displaced header. <sigh> At a minimum I would hope that some assert would check that the two indicators of recursion are in fact in agreement.
22-03-2016

Congratulations on finding the cause!!! You rock!
22-03-2016

In the run I started after dinner on 2016.03.18, instance #2 got a failure in run #78. Interesting diag messages: XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7be95defe0, mon=0x00000000016f2000, dcubed_jme_last_trace_points=0x0000000500001022, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-4 - dcubed_jme_fast_lock_deopt=0, dcubed_jme_fast_lock_obj=0xfffffd7be95defe0, dcubed_jme_fast_lock_mon=0x0000000000000000 XXX-4 - dcubed_jme_quick_enter_deopt=0, dcubed_jme_quick_enter_obj=0xfffffd7be95defe0, dcubed_jme_quick_enter_mon=0x00000000016f2000 XXX-4 - mon[0]={lock=0xfffffd7fc01d8640, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7be95defe0} XXX-7 - thread=0x0000000004278800, obj=0xfffffd7be95defe0, tracePoints=0x62 XXX-7 - monitor=0x00000000016f2000, tracePoints=0x62 dcubed_mon=0x00000000016f2000: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=8, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000004278800, dcubed_omna_target_thread=0x0000000000ab1800 INFO: unexpected locked object: - locked <0xfffffd7be95defe0> (a java.util.stream.Nodes$CollectorTask$OfDouble) INFO: uo_last_trace_points=0x62 INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2203), pid=19281, tid=95 # fatal error: exiting JavaThread=0x0000000004278800 unexpectedly owns ObjectMonitor=0x00000000016f2000 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_03_18_18_43-b00) The interesting part from DCUBED_JME_TRACE is here: dcubed_jme_last_trace_points=0x0000000500001022 // Mark that we came from MacroAssembler::fast_lock(). orptr(tracePoints, 0x00000002); // Record that we didn't take the force slow-path branch orptr(tracePoints, 0x00000020); // Record that we returned failure from fast_lock orptr(tracePoints, 0x00001000); // Record that we called ObjectSynchronizer::quick_enter() jt->add_dcubed_jme_last_trace_points(obj, Lock, 0x000100000000L); // Record that we grabbed the ObjectMonitor with cmpxhg() jt->add_dcubed_jme_last_trace_points(obj, Lock, m, 0x000400000000L); The dcubed_jme_last_trace_points value is much more sane! It shows that C2 fast_lock() failed and the subsequent call to ObjectSynchronizer::quick_enter() worked with a cmpxhg(). Since the following flag is not set: // Record that biased_locking_enter() didn't take the 'DONE' label. orptr(tracePoints, 0x00000040); that shows that we called biased_locking_enter(boxReg, objReg, tmpReg, scrReg, false, DONE_LABEL, NULL, counters); and that code jumped to the DONE_LABEL which is where this flag was set: // Record that we returned failure from fast_lock orptr(tracePoints, 0x00001000); This diag line from DCUBED_UNLOCK_OBJECT_DEBUG in ObjectMonitor::INotify(): XXX-4 - mon[0]={lock=0xfffffd7fc01d8640, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7be95defe0} shows that the displaced mark word (dmw) for our BasicLock is 0x0 which indicates a recursive lock. 
This diag line from DCUBED_UNLOCK_OBJECT_DEBUG in SharedRuntime::trace_fast_unlock(): XXX-7 - thread=0x0000000004278800, obj=0xfffffd7be95defe0, tracePoints=0x62 shows our unlock trace points as: // Mark that we came from MacroAssembler::fast_unlock(). orptr(tracePoints, 0x00002); // Record that we didn't take the force slow-path branch orptr(tracePoints, 0x00020); // Record that biased_locking_exit() didn't take the 'DONE' label. orptr(tracePoints, 0x00040); Since the following flag was not set: // Record that we didn't take the recursive case. orptr(tracePoints, 0x00080); we know that fast_unlock() took the recursive exit case which explains why the Java Monitor was not unlocked.
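For reference, a minimal stand-alone model of the exit side (not the actual HotSpot fast_unlock() code; reduced stand-in types for illustration) shows why a zero displaced header turns the unlock into a no-op and leaves the Java Monitor stranded:

#include <atomic>
#include <cstdint>

struct BasicLock     { intptr_t displaced_header = 0; };
struct ObjectMonitor { std::atomic<void*> owner{nullptr}; };

// Modeled on the first check in the fast-path exit: a zero displaced header
// is read as "this was a recursive enter", so nothing is released.
void fast_unlock_model(ObjectMonitor* m, void* self, BasicLock* lock) {
  if (lock->displaced_header == 0) {
    return;                                             // recursive exit: no-op
  }
  void* expected = self;
  m->owner.compare_exchange_strong(expected, nullptr);  // real release of the monitor
}

// In the failing trace, quick_enter() grabbed m->owner but left
// lock->displaced_header at 0, so this path never releases the monitor.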
19-03-2016

Re: Run with PrintOptoAssembly to see compiled code.

I'll do that first thing next week.

Update: With -XX:-UseOptoBiasInlining in place, I'm back to my original failure mode (see above for details). So I'm holding off on PrintOptoAssembly since it looks like all -XX:-UseOptoBiasInlining is doing for me is keeping C2's complete_monitor_enter_C() from being called without a preceding call to C2's fast_enter().
19-03-2016

Finished my first pass read of PhaseMacroExpand::expand_lock_node() in opto/macro.cpp and I'm starting to wonder if the UseOptoBiasInlining code path generates a C2 ideal graph where the slow path is accidentally executed twice. That would explain what the tracing is showing...

Update: Thought about this more over dinner. It's entirely possible that the UseOptoBiasInlining code might call the slow path without a call to the C2 fast_lock() code first. That would also explain what the tracing code is complaining about: e.g., C2 fast_lock() call on Java Monitor-1 which works followed by a direct C2 complete_monitor_locking_C() call (the slow path) on Java Monitor-2 which works. Since the C2 fast_lock() code was never called for Java Monitor-2, it would look like we had back to back calls to C2 fast_lock() and C2 complete_monitor_locking_C() both of which worked. If my revised theory is correct, then the -XX:-UseOptoBiasInlining test run should still see the original failure mode for this bug which is a JavaThread exiting while owning a Java Monitor.

Update: Revised theory is likely correct. My parallel runs failed with the original failure mode in run #247 for instance #1 and run #386 for instance #2.
19-03-2016

I dumped the entire code block for:

1638 C2 java.util.stream.Nodes$SizedCollectorTask.compute()V (132 bytes) @ 0xfffffd7ff2537b84 [0xfffffd7ff2536a60+0x0000000000001124]

and I didn't find another code path that reached these instructions:

0xfffffd7ff2537b78: leaq 0x0000000000000040(%rsp),%rdx
0xfffffd7ff2537b7d: nop
0xfffffd7ff2537b7f: call 0xfffffd7feab36ce0 [ 0xfffffd7feab36ce0, .-0x7a00e9f ]

I also did not find another call to 0xfffffd7feab36ce0 (~RuntimeStub::_complete_monitor_locking_Java).
18-03-2016

" I'm starting to wonder if the UseOptoBiasInlining code path generates a C2 ideal graph where the slow path is accidentally executed twice." Run with PrintOptoAssembly to see compiled code.
18-03-2016

Okay, this one is fine indeed. Is there another path to:

0xfffffd7ff2537b78: leaq 0x0000000000000040(%rsp),%rdx
18-03-2016

Vladimir, not sure why you think the 'jccb' is jumping too far. Here's the code:

  // DONE_LABEL is a hot target - we'd really like to place it at the
  // start of cache line by padding with NOPs.
  // See the AMD and Intel software optimization manuals for the
  // most efficient "long" NOP encodings.
  // Unfortunately none of our alignment mechanisms suffice.
  bind(DONE_LABEL);

  // At DONE_LABEL the icc ZFlag is set as follows ...
  // Fast_Unlock uses the same protocol.
  // ZFlag == 1 -> Success
  // ZFlag == 0 -> Failure - force control through the slow-path
  }
#ifdef DCUBED_C2_FAST_LOCK_DEBUG
  Label MY_DONE_FAILED;
  // if current state is failure, then there is nothing more to do
  jccb(Assembler::notZero, MY_DONE_FAILED);
  // Mark that this JavaThread's call to MacroAssembler::fast_lock() worked
  movptr(Address(r15_thread, JavaThread::dcubed_C2_fast_lock_result_offset()),
         (int32_t)DCUBED_C2_FAST_LOCK_WORKED);
  bind(MY_DONE_FAILED);
#endif

That 'jccb' is jumping over one 'movptr' instruction. That should not be too far...
18-03-2016

Re: As an experiment, replace all jccb and jmpb in MacroAssembler::fast_lock() and MacroAssembler::fast_unlock().

I've had to do that with some of the experiments that I've been doing. C2 has been kind enough to assert/guarantee for me when I use a 'jccb' or 'jmpb' where a 'jcc' or 'jmp' is needed.
18-03-2016

As an experiment, replace all jccb and jmpb in MacroAssembler::fast_lock() and MacroAssembler::fast_unlock().
18-03-2016

As I suspected, the next instruction is a short branch but the distance is big so the offset is wrong:

0xfffffd7ff2537b65: jne 0xfffffd7ff2537b72 [ 0xfffffd7ff2537b72, .+0xd ]

Is it your new code?:

0xfffffd7ff2537b65: jne 0xfffffd7ff2537b72 // NNN8: jccb(Assembler::notZero, MY_DONE_FAILED);

Use 'jcc' instead of 'jccb'.
18-03-2016

Hi Vladimir, I figured these recent updates would catch your eye. This block of binary code:

0xfffffd7ff2537b52: movq %rax,%r10
0xfffffd7ff2537b55: xorq %rax,%rax
0xfffffd7ff2537b58: lock cmpxchgq %r15,0x000000000000007e(%r10)
0xfffffd7ff2537b5e: movq $0x0000000000000003,(%rbx)
0xfffffd7ff2537b65: jne 0xfffffd7ff2537b72 [ 0xfffffd7ff2537b72, .+0xd ]
0xfffffd7ff2537b67: movq $0x0000000042424242,0x00000000000003e4(%r15)
0xfffffd7ff2537b72: je 0xfffffd7ff253741f [ 0xfffffd7ff253741f, .-0x753 ]
0xfffffd7ff2537b78: leaq 0x0000000000000040(%rsp),%rdx
0xfffffd7ff2537b7d: nop
0xfffffd7ff2537b7f: call 0xfffffd7feab36ce0 [ 0xfffffd7feab36ce0, .-0x7a00e9f ]
0xfffffd7ff2537b84: jmp 0xfffffd7ff253741f [ 0xfffffd7ff253741f, .-0x765 ]

so this instruction happens:

0xfffffd7ff2537b67: movq $0x0000000042424242,0x00000000000003e4(%r15)

This one should happen:

0xfffffd7ff2537b72: je 0xfffffd7ff253741f [ 0xfffffd7ff253741f, .-0x753 ]

but we do these instead:

0xfffffd7ff2537b78: leaq 0x0000000000000040(%rsp),%rdx
0xfffffd7ff2537b7d: nop
0xfffffd7ff2537b7f: call 0xfffffd7feab36ce0 [ 0xfffffd7feab36ce0, .-0x7a00e9f ]

At least that's what the tracing shows.... :-) I'm taking a close look at the UseOptoBiasInlining code and my current run hasn't failed since I added -XX:-UseOptoBiasInlining to the run. However, it hasn't been 72 hours yet so I have to wait while I read C2 ideal graph code...
18-03-2016

Dan, can you point me to the 'je' instruction you are referring to?:

"Of course, I'm having a serious problem believing that the ZFlag value check done by the 'je' instruction is broken, but we keep coming back to that conclusion."

If it is a jccb (short branch), it could be incorrect if the distance is large, so it will jump to the wrong place. UseOptoBiasInlining is on by default and the code generation is in PhaseMacroExpand::expand_lock_node() in opto/macro.cpp.
18-03-2016

One thing that I forgot to make clear in my previous note is that this part of MacroAssembler::fast_lock() didn't show up in the binary code dump:

  // it's stack-locked, biased or neutral
  // TODO: optimize away redundant LDs of obj->mark and improve the markword triage
  // order to reduce the number of conditional branches in the most common cases.
  // Beware -- there's a subtle invariant that fetch of the markword
  // at [FETCH], below, will never observe a biased encoding (*101b).
  // If this invariant is not held we risk exclusion (safety) failure.
  if (UseBiasedLocking && !UseOptoBiasInlining) {
    biased_locking_enter(boxReg, objReg, tmpReg, scrReg, false, DONE_LABEL, NULL, counters);
  }

I took a quick look and UseOptoBiasInlining is enabled by default. Going to have to check my notes to see if I checked out this code before...
18-03-2016

Note: JBS munges the formatting of this note so I've attached it as eval_note2. Modified the DCUBED_JME_DEBUG code in this function: src/share/vm/runtime/thread.cpp: // add Java Monitor Enter trace points from ObjectSynchronizer code void JavaThread::add_dcubed_jme_last_trace_points(oop obj, BasicLock *lock, ObjectMonitor *mon, uint64_t trace_points) { to make this check: + // quick_enter() set this bit in this call: + // Record that we grabbed the ObjectMonitor with cmpxhg() + // jt->add_dcubed_jme_last_trace_points(obj, Lock, m, 0x000400000000L); + // + // MacroAssembler::fast_lock() set this bit before quick_enter() was called: + // Record that we returned success from fast_lock + // orptr(tracePoints, 0x00002000); + guarantee(trace_points != 0x000400000000L || + (_dcubed_jme_last_trace_points & 0x00002000L) == 0, + "fast_lock() and quick_enter() cannot both succeed!"); add_dcubed_jme_last_trace_points() is called with trace_points == 0x000400000000L for this code: src/share/vm/runtime/synchronizer.cpp: bool ObjectSynchronizer::quick_enter(oop obj, Thread * Self, BasicLock * Lock) { if (owner == NULL && Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) { #ifdef DCUBED_JME_TRACE // Record that we grabbed the ObjectMonitor with cmpxhg() jt->add_dcubed_jme_last_trace_points(obj, Lock, m, 0x000400000000L); #endif so the 0x000400000000L flag value records when quick_enter() successfully grabs the inflated ObjectMonitor's _owner field. This code: src/cpu/x86/vm/macroAssembler_x86.cpp void MacroAssembler::fast_lock(Register objReg, Register boxReg, Register tmpReg, Register scrReg, Register cx1Reg, Register cx2Reg, BiasedLockingCounters* counters, RTMLockingCounters* rtm_counters, RTMLockingCounters* stack_rtm_counters, Metadata* method_data, bool use_rtm, bool profile_rtm) { sets the 0x00002000 flag here: // Record that we returned success from fast_lock orptr(tracePoints, 0x00002000); // save the current trace point info for objReg // Note: This trace_fast_lock() causes a crash with slowdebug bits // near the end of the test run in deoptimization code. trace_fast_lock(objReg, scrReg, tracePoints); xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success bind(MY_DONE1); pop(tracePoints); #endif } so the 0x00002000 flag value records that MacroAssembler::fast_lock() has returned success. It should never be possible for both MacroAssembler::fast_lock() and ObjectSynchronizer::quick_enter() to succeed. quick_enter() is only called from SharedRuntime::complete_monitor_locking_C() and complete_monitor_locking_C() is only supposed to be called when MacroAssembler::fast_lock() fails. 
Here's the hs_err_pid stack trace from a failure of the new guarantee(): XXX Here's the dbx stack trace from a failure of the new guarantee(): (dbx) where current thread: t@89 dbx: core file read error: address 0xdf48000000000008 not in data space dbx: attempt to read frame failed -- cannot get return address [1] __lwp_kill(0x59, 0x6, 0xfffffeb4040b70c0, 0xfffffd7fff293e0e, 0xfffffd7fc07de2f0, 0x6), at 0xfffffd7fff29351a [2] _thr_kill(), at 0xfffffd7fff28be13 [3] raise(), at 0xfffffd7fff2381b9 [4] abort(), at 0xfffffd7fff216b80 =>[5] os::abort(dump_core = true, siginfo = <value unavailable>, context = <value unavailable>) (optimized), at 0xfffffd7ffe92b676 (line ~1396) in "os_solaris.cpp" [6] VMError::report_and_die(id = <value unavailable>, message = <value unavailable>, detail_fmt = <value unavailable>, detail_args = <value unavailable>, thread = <value unavailable>, pc = <value unavailable>, siginfo = (nil), context = (nil), filename = 0xfffffd7ffef15aa0 "/work/shared/bug_hunt/8077392_for_jdk9_hs_rt/hotspot/src/share/vm/runtime/thread.cpp", lineno = 4894, size = 0) (optimized), at 0xfffffd7ffebde3e1 (line ~1152) in "vmError.cpp" [7] VMError::report_and_die(thread = <value unavailable>, filename = <value unavailable>, lineno = <value unavailable>, message = <value unavailable>, detail_fmt = <value unavailable>, detail_args = <value unavailable>) (optimized), at 0xfffffd7ffebdd4af (line ~931) in "vmError.cpp" [8] report_vm_error(file = 0xfffffd7ffef15aa0 "/work/shared/bug_hunt/8077392_for_jdk9_hs_rt/hotspot/src/share/vm/runtime/thread.cpp", line = 4894, error_msg = 0xfffffd7ffef15a30 "guarantee(trace_points != 0x000400000000L || (_dcubed_jme_last_trace_points & 0x00002000L) == 0) failed", detail_fmt = 0xfffffd7ffef159f0 "fast_lock() and quick_enter() cannot both succeed!", ...) (optimized), at 0xfffffd7ffe2cd948 (line ~218) in "debug.cpp" [9] JavaThread::add_dcubed_jme_last_trace_points(this = <value unavailable>, obj = <value unavailable>, lock = <value unavailable>, mon = <value unavailable>, trace_points = <value unavailable>) (optimized), at 0xfffffd7ffeb06fb1 (line ~4894) in "thread.cpp" [10] ObjectSynchronizer::quick_enter(obj = 0xfffffd7be71fdf48, Self = 0x1e88800, Lock = 0xfffffd7fc07de860) (optimized), at 0xfffffd7ffeaaec71 (line ~268) in "synchronizer.cpp" [11] SharedRuntime::complete_monitor_locking_C(_obj = 0xfffffd7be71fdf48, lock = 0xfffffd7fc07de860, thread = 0x1e88800) (optimized), at 0xfffffd7ffea1c1c8 (line ~1888) in "sharedRuntime.cpp" [12] 0xfffffd7feab35408(), at 0xfffffd7feab35408 [13] 0xfffffd7feab35408(), at 0xfffffd7feab35408 [14] 0xfffffd7ff252337c(), at 0xfffffd7ff252337c [15] 0x8(), at 0x8 Not sure why frame 12 and 13 are the same address info. 
So let's take a look at the code from frame 12/13 that got us to SharedRuntime::complete_monitor_locking_C(): (dbx) x 0xfffffd7ff2523377,0xfffffd7ff252337c/i 0xfffffd7ff2523377: call 0xfffffd7feab353e0 [ 0xfffffd7feab353e0, .-0x79edf97 ] 0xfffffd7ff252337c: jmp 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb39 ] So frame 14 called 0xfffffd7feab353e0 which is really close to our frame 12/13 address: (dbx) x 0xfffffd7feab353e0,0xfffffd7feab35408/i 0xfffffd7feab353e0: subq $0x0000000000000008,%rsp 0xfffffd7feab353e7: movq %rbp,(%rsp) 0xfffffd7feab353eb: movq %rsp,0x00000000000001d0(%r15) 0xfffffd7feab353f2: movq %rsi,%rdi 0xfffffd7feab353f5: movq %rdx,%rsi 0xfffffd7feab353f8: movq %r15,%rdx 0xfffffd7feab353fb: movq $complete_monitor_locking_C,%r10 0xfffffd7feab35405: call *%r10d 0xfffffd7feab35408: movq $0x0000000000000000,0x00000000000001d0(%r15) so the code from frame 12/13 is pretty much marshalling code for calling complete_monitor_locking_C which has this signature: // Handles the uncommon case in locking, i.e., contention or an inflated lock. JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread)) subq $0x0000000000000008,%rsp // make space on the stack movq %rbp,(%rsp) // save %rbp on the stack movq %rsp,0x00000000000001d0(%r15) // save %rsp in a field in %r15 (thread) movq %rsi,%rdi // guessing this is _obj param movq %rdx,%rsi // guessing this is lock param movq %r15,%rdx // this is thread param movq $complete_monitor_locking_C,%r10 call *%r10d // call complete_monitor_locking_C // zero the field in %r15 (thread) movq $0x0000000000000000,0x00000000000001d0(%r15) So here's the regs from frame 12/13: (dbx) regs current thread: t@89 current frame: [13] r15 0x0000000000000000 r14 0x0000000000000000 r13 0x0000000000000000 r12 0x0000000000000000 r11 0x0000000000000000 r10 0x0000000000000000 r9 0x0000000000000000 r8 0x0000000000000000 rdi 0x0000000000000000 rsi 0x0000000000000000 rbp 0xfffffd7be71fdf48 rbx 0x0000000000000000 rdx 0x0000000000000000 rcx 0x0000000000000000 rax 0x0000000000000000 trapno 0x0000000000000000 err 0x0000000000000000 rip 0xfffffd7feab35408:0xfffffd7feab35408 movq $0x0000000000000000,0x00000000000001d0(%r15) cs 0x0000000000000000 eflags 0x0000000000000000 rsp 0x0000000000000000 ss 0x0000000000000000 fs 0x0000000000000000 gs 0x0000000000000000 es 0x0000000000000000 ds 0x0000000000000000 fsbase 0x0000000000000000 gsbase 0x0000000000000000 (dbx) x 0xfffffd7be71fdf48/X 0xfffffd7be71fdf48: 0x02c6ab82 and here's the regs from frame 14: (dbx) regs current thread: t@89 current frame: [14] r15 0x0000000000000000 r14 0x0000000000000000 r13 0x0000000000000000 r12 0x0000000000000000 r11 0x0000000000000000 r10 0x0000000000000000 r9 0x0000000000000000 r8 0x0000000000000000 rdi 0x0000000000000000 rsi 0x0000000000000000 rbp 0x0000000002c6ab82 rbx 0x0000000000000000 rdx 0x0000000000000000 rcx 0x0000000000000000 rax 0x0000000000000000 trapno 0x0000000000000000 err 0x0000000000000000 rip 0xfffffd7ff252337c:0xfffffd7ff252337c jmp 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb39 ] cs 0x0000000000000000 eflags 0x0000000000000000 rsp 0x0000000000000000 ss 0x0000000000000000 fs 0x0000000000000000 gs 0x0000000000000000 es 0x0000000000000000 ds 0x0000000000000000 fsbase 0x0000000000000000 gsbase 0x0000000000000000 (dbx) x 0x0000000002c6ab82/X dbx: warning: unknown language, 'c' assumed 0x0000000002c6ab82: 0x00000000 The *rbp value of NULL explains why the dbx stack trace stops at frame 14. 
So without a valid frame 15, it's hard to know where we go into the code in frame 14. For now, I'm dumping this big section: (dbx) x 0xfffffd7ff2523200,0xfffffd7ff252337c/i 0xfffffd7ff2523200: popq %rbp 0xfffffd7ff2523201: .byte 0xff [unknown opcode] 0xfffffd7ff2523202: .byte 0xff [unknown opcode] 0xfffffd7ff2523203: decl 0x0000000000000054(%rbx,%rcx,4) 0xfffffd7ff2523207: andb $0x0000000000000018,%al 0xfffffd7ff2523209: movq %r10,(%rsp) 0xfffffd7ff252320d: movq 0x0000000000000030(%rsp),%r10 0xfffffd7ff2523212: movq %r10,0x0000000000000018(%rsp) 0xfffffd7ff2523217: movl %ebx,0x0000000000000010(%rsp) 0xfffffd7ff252321b: movl %r13d,0x0000000000000024(%rsp) 0xfffffd7ff2523220: movl %r8d,0x0000000000000040(%rsp) 0xfffffd7ff2523225: movl %r11d,0x0000000000000044(%rsp) 0xfffffd7ff252322a: nop 0xfffffd7ff252322b: call 0xfffffd7fea847b60 [ 0xfffffd7fea847b60, .-0x7cdb6cb ] 0xfffffd7ff2523230: pushq %rax 0xfffffd7ff2523231: pushq %rdx 0xfffffd7ff2523232: pushq %rcx 0xfffffd7ff2523233: call breakpoint [ 0xfffffd7ffe929af0, .+0xc4068bd ] 0xfffffd7ff2523238: popq %rcx 0xfffffd7ff2523239: popq %rdx 0xfffffd7ff252323a: popq %rax 0xfffffd7ff252323b: movq %r15,%rsi 0xfffffd7ff252323e: movq $g1_wb_pre,%r10 0xfffffd7ff2523248: call *%r10d 0xfffffd7ff252324b: jmp 0xfffffd7ff25225dc [ 0xfffffd7ff25225dc, .-0xc6f ] 0xfffffd7ff2523250: lock cmpxchgq %r10,0x0000000000000000(%rbp) 0xfffffd7ff2523256: leaq 0x0000000000000050(%rsp),%rbx 0xfffffd7ff252325b: pushq %rdx 0xfffffd7ff252325c: xorq %rdx,%rdx 0xfffffd7ff252325f: orq $0x0000000000000002,%rdx 0xfffffd7ff2523263: orq $0x0000000000000020,%rdx 0xfffffd7ff2523267: xorq %r10,%r10 0xfffffd7ff252326a: orq $0x0000000000000040,%rdx 0xfffffd7ff252326e: xorq %r10,%r10 0xfffffd7ff2523271: movq 0x0000000000000000(%rbp),%rax 0xfffffd7ff2523275: testq $0x0000000000000002,%rax 0xfffffd7ff252327b: jne 0xfffffd7ff25232c9 [ 0xfffffd7ff25232c9, .+0x4e ] 0xfffffd7ff252327d: orq $0x0000000000000080,%rdx 0xfffffd7ff2523284: orq $0x0000000000000001,%rax 0xfffffd7ff2523288: movq %rax,(%rbx) 0xfffffd7ff252328b: lock cmpxchgq %rbx,0x0000000000000000(%rbp) 0xfffffd7ff2523291: je 0xfffffd7ff25232e3 [ 0xfffffd7ff25232e3, .+0x52 ] 0xfffffd7ff2523297: orq $0x0000000000000100,%rdx 0xfffffd7ff252329e: subq %rsp,%rax 0xfffffd7ff25232a1: andq $0xfffffffffffff007,%rax 0xfffffd7ff25232a8: movq %rax,(%rbx) 0xfffffd7ff25232ab: je 0xfffffd7ff25232ba [ 0xfffffd7ff25232ba, .+0xf ] 0xfffffd7ff25232ad: orq $0x0000000000000200,%rdx 0xfffffd7ff25232b4: cmpq $0x0000000000000000,%rsp 0xfffffd7ff25232b8: jmp 0xfffffd7ff25232c4 [ 0xfffffd7ff25232c4, .+0xc ] 0xfffffd7ff25232ba: orq $0x0000000000000400,%rdx 0xfffffd7ff25232c1: xorq %rbx,%rbx 0xfffffd7ff25232c4: jmp 0xfffffd7ff25232e3 [ 0xfffffd7ff25232e3, .+0x1f ] 0xfffffd7ff25232c9: orq $0x0000000000000800,%rdx 0xfffffd7ff25232d0: movq %rax,%r10 0xfffffd7ff25232d3: xorq %rax,%rax 0xfffffd7ff25232d6: lock cmpxchgq %r15,0x000000000000007e(%r10) 0xfffffd7ff25232dc: movq $0x0000000000000003,(%rbx) 0xfffffd7ff25232e3: je 0xfffffd7ff2523329 [ 0xfffffd7ff2523329, .+0x46 ] 0xfffffd7ff25232e5: orq $0x0000000000001000,%rdx 0xfffffd7ff25232ec: pushq %rbp 0xfffffd7ff25232ed: pushq %r10 0xfffffd7ff25232ef: pushq %rdx 0xfffffd7ff25232f0: movq %rdx,%rcx 0xfffffd7ff25232f3: movq %r10,%rdx 0xfffffd7ff25232f6: movq %rbp,%rsi 0xfffffd7ff25232f9: movq %r15,%rdi 0xfffffd7ff25232fc: testl $0x000000000000000f,%esp 0xfffffd7ff2523302: je 0xfffffd7ff252331a [ 0xfffffd7ff252331a, .+0x18 ] 0xfffffd7ff2523308: subq $0x0000000000000008,%rsp 0xfffffd7ff252330c: call trace_fast_lock [ 
0xfffffd7ffea1c690, .+0xc4f9384 ] 0xfffffd7ff2523311: addq $0x0000000000000008,%rsp 0xfffffd7ff2523315: jmp 0xfffffd7ff252331f [ 0xfffffd7ff252331f, .+0xa ] 0xfffffd7ff252331a: call trace_fast_lock [ 0xfffffd7ffea1c690, .+0xc4f9376 ] 0xfffffd7ff252331f: popq %rdx 0xfffffd7ff2523320: popq %r10 0xfffffd7ff2523322: popq %rbp 0xfffffd7ff2523323: cmpq $0x0000000000000000,%rsp 0xfffffd7ff2523327: jmp 0xfffffd7ff252336a [ 0xfffffd7ff252336a, .+0x43 ] 0xfffffd7ff2523329: orq $0x0000000000002000,%rdx 0xfffffd7ff2523330: pushq %rbp 0xfffffd7ff2523331: pushq %r10 0xfffffd7ff2523333: pushq %rdx 0xfffffd7ff2523334: movq %rdx,%rcx 0xfffffd7ff2523337: movq %r10,%rdx 0xfffffd7ff252333a: movq %rbp,%rsi 0xfffffd7ff252333d: movq %r15,%rdi 0xfffffd7ff2523340: testl $0x000000000000000f,%esp 0xfffffd7ff2523346: je 0xfffffd7ff252335e [ 0xfffffd7ff252335e, .+0x18 ] 0xfffffd7ff252334c: subq $0x0000000000000008,%rsp 0xfffffd7ff2523350: call trace_fast_lock [ 0xfffffd7ffea1c690, .+0xc4f9340 ] 0xfffffd7ff2523355: addq $0x0000000000000008,%rsp 0xfffffd7ff2523359: jmp 0xfffffd7ff2523363 [ 0xfffffd7ff2523363, .+0xa ] 0xfffffd7ff252335e: call trace_fast_lock [ 0xfffffd7ffea1c690, .+0xc4f9332 ] 0xfffffd7ff2523363: popq %rdx 0xfffffd7ff2523364: popq %r10 0xfffffd7ff2523366: popq %rbp 0xfffffd7ff2523367: xorq %rbx,%rbx 0xfffffd7ff252336a: popq %rdx 0xfffffd7ff252336b: je 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb28 ] 0xfffffd7ff2523371: leaq 0x0000000000000050(%rsp),%rdx 0xfffffd7ff2523376: nop 0xfffffd7ff2523377: call 0xfffffd7feab353e0 [ 0xfffffd7feab353e0, .-0x79edf97 ] 0xfffffd7ff252337c: jmp 0xfffffd7ff2522843 [ 0xfffffd7ff2522843, .-0xb39 ] First thing I've noticed is the trace_fast_lock calls which happens to be some of my tracing code for this bug. I don't quite understand why there are four calls to it, but we'll get there... Let's consider this part of src/cpu/x86/vm/macroAssembler_x86.cpp: fast_lock(): (I've elided some of the code that's not included in the current config, e.g. no DCUBED_OME_DEBUG and no RTM.) #else // _LP64 #ifdef DCUBED_JME_TRACE // Record that we're in the inflated block orptr(tracePoints, 0x00000800); #endif // It's inflated movq(scrReg, tmpReg); xorq(tmpReg, tmpReg); if (os::is_MP()) { lock(); } cmpxchgptr(r15_thread, Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); // Unconditionally set box->_displaced_header = markOopDesc::unused_mark(). // Without cast to int32_t movptr will destroy r10 which is typically obj. movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark())); // Intentional fall-through into DONE_LABEL ... // Propagate ICC.ZF from CAS above into DONE_LABEL. #endif // _LP64 // DONE_LABEL is a hot target - we'd really like to place it at the // start of cache line by padding with NOPs. // See the AMD and Intel software optimization manuals for the // most efficient "long" NOP encodings. // Unfortunately none of our alignment mechanisms suffice. bind(DONE_LABEL); // At DONE_LABEL the icc ZFlag is set as follows ... // Fast_Unlock uses the same protocol. 
// ZFlag == 1 -> Success // ZFlag == 0 -> Failure - force control through the slow-path } #ifdef DCUBED_JME_TRACE Label MY_DONE0, MY_DONE1; // if current state is success, then preserve that jccb(Assembler::zero, MY_DONE0); // Record that we returned failure from fast_lock orptr(tracePoints, 0x00001000); // save the current trace point info for objReg trace_fast_lock(objReg, scrReg, tracePoints); cmpptr(rsp, 0); // set ICC.ZF=0 to indicate failure jmpb(MY_DONE1); bind(MY_DONE0); // Record that we returned success from fast_lock orptr(tracePoints, 0x00002000); // save the current trace point info for objReg // Note: This trace_fast_lock() causes a crash with slowdebug bits // near the end of the test run in deoptimization code. trace_fast_lock(objReg, scrReg, tracePoints); xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success bind(MY_DONE1); pop(tracePoints); #endif } The above macroAssembler_x86.cpp: fast_lock() code maps to this code in memory: (trimming off the addresses in brackets and going wide here for annotations after the instructions) 0xfffffd7ff25232c9: orq $0x0000000000000800,%rdx // orptr(tracePoints, 0x00000800); 0xfffffd7ff25232d0: movq %rax,%r10 // movq(scrReg, tmpReg); 0xfffffd7ff25232d3: xorq %rax,%rax // xorq(tmpReg, tmpReg); // if (os::is_MP()) { // lock(); // } 0xfffffd7ff25232d6: lock cmpxchgq %r15,0x000000000000007e(%r10) // cmpxchgptr(r15_thread, Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); 0xfffffd7ff25232dc: movq $0x0000000000000003,(%rbx) // movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark())); // if the cmpxchgptr worked we branch to MY_DONE0, otherwise... 0xfffffd7ff25232e3: je 0xfffffd7ff2523329 // jccb(Assembler::zero, MY_DONE0); // Record that we returned failure from fast_lock 0xfffffd7ff25232e5: orq $0x0000000000001000,%rdx // orptr(tracePoints, 0x00001000); // begin MacroAssembler::trace_fast_lock(): 0xfffffd7ff25232ec: pushq %rbp // push(objReg); // save/restore across call_VM 0xfffffd7ff25232ed: pushq %r10 // push(omReg); 0xfffffd7ff25232ef: pushq %rdx // push(tracePoints); 0xfffffd7ff25232f0: movq %rdx,%rcx // pass_arg3(this, tracePoints); 0xfffffd7ff25232f3: movq %r10,%rdx // pass_arg2(this, omReg); 0xfffffd7ff25232f6: movq %rbp,%rsi // pass_arg1(this, objReg); 0xfffffd7ff25232f9: movq %r15,%rdi // pass_arg0(this, r15_thread); // begin MacroAssembler::call_VM_leaf_base() 0xfffffd7ff25232fc: testl $0x000000000000000f,%esp 0xfffffd7ff2523302: je 0xfffffd7ff252331a 0xfffffd7ff2523308: subq $0x0000000000000008,%rsp // make stack space for the call 0xfffffd7ff252330c: call trace_fast_lock // make the call 0xfffffd7ff2523311: addq $0x0000000000000008,%rsp // take back the stack space 0xfffffd7ff2523315: jmp 0xfffffd7ff252331f 0xfffffd7ff252331a: call trace_fast_lock // make the call without extra stack space // end MacroAssembler::call_VM_leaf_base() 0xfffffd7ff252331f: popq %rdx // pop(tracePoints); 0xfffffd7ff2523320: popq %r10 // pop(omReg); 0xfffffd7ff2523322: popq %rbp // pop(objReg); // end MacroAssembler::trace_fast_lock() 0xfffffd7ff2523323: cmpq $0x0000000000000000,%rsp // cmpptr(rsp, 0); // set ICC.ZF=0 to indicate failure 0xfffffd7ff2523327: jmp 0xfffffd7ff252336a // jmpb(MY_DONE1); // Record that we returned success from fast_lock 0xfffffd7ff2523329: orq $0x0000000000002000,%rdx // orptr(tracePoints, 0x00002000); // begin MacroAssembler::trace_fast_lock(): 0xfffffd7ff2523330: pushq %rbp // push(objReg); 0xfffffd7ff2523331: pushq %r10 // push(omReg); 0xfffffd7ff2523333: pushq %rdx // push(tracePoints); 
0xfffffd7ff2523334: movq %rdx,%rcx // pass_arg3(this, tracePoints); 0xfffffd7ff2523337: movq %r10,%rdx // pass_arg2(this, omReg); 0xfffffd7ff252333a: movq %rbp,%rsi // pass_arg1(this, objReg); 0xfffffd7ff252333d: movq %r15,%rdi // pass_arg0(this, r15_thread); // begin MacroAssembler::call_VM_leaf_base() 0xfffffd7ff2523340: testl $0x000000000000000f,%esp 0xfffffd7ff2523346: je 0xfffffd7ff252335e 0xfffffd7ff252334c: subq $0x0000000000000008,%rsp // make stack space for the call 0xfffffd7ff2523350: call trace_fast_lock // make the call 0xfffffd7ff2523355: addq $0x0000000000000008,%rsp // take back the stack space 0xfffffd7ff2523359: jmp 0xfffffd7ff2523363 0xfffffd7ff252335e: call trace_fast_lock // make the call without extra stack space // end MacroAssembler::call_VM_leaf_base() 0xfffffd7ff2523363: popq %rdx // pop(tracePoints); 0xfffffd7ff2523364: popq %r10 // pop(omReg); 0xfffffd7ff2523366: popq %rbp // pop(objReg); // end MacroAssembler::trace_fast_lock() 0xfffffd7ff2523367: xorq %rbx,%rbx // xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success 0xfffffd7ff252336a: popq %rdx // pop(tracePoints); // end MacroAssembler::fast_lock() // Now we're in C2 code that checks the results of the // fast_lock() call and calls complete_monitor_locking_C() // if ICC.ZF=0 (failure) 0xfffffd7ff252336b: je 0xfffffd7ff2522843 // if ICC.ZF=1 we are done 0xfffffd7ff2523371: leaq 0x0000000000000050(%rsp),%rdx 0xfffffd7ff2523376: nop 0xfffffd7ff2523377: call 0xfffffd7feab353e0 // calls the code in frame 12/13 that calls complete_monitor_locking_C() 0xfffffd7ff252337c: jmp 0xfffffd7ff2522843 // we are done Rewinding back to the typical trace bits that we see for this failure: dcubed_jme_last_trace_points=0x0000000500002862 There are two bits of particular interest: 0x000000002000 0x000400000000 0x000000002000 marks that fast_lock()'s cmpxchgq worked. This line from memory: 0xfffffd7ff2523329: orq $0x0000000000002000,%rdx // orptr(tracePoints, 0x00002000); 0x000400000000 marks that quick_enter()'s Atomic::cmpxchg_ptr() worked. That's the code called by this line from memory: 0xfffffd7ff2523377: call 0xfffffd7feab353e0 // calls the code in frame 12/13 that calls complete_monitor_locking_C() The setting of 0x000000002000 and the call to complete_monitor_locking_C() are on different code paths and I don't see anything in the "success" code path that could accidentally lead to the "failure" code path. Here's the "success" code path: 0xfffffd7ff2523329: orq $0x0000000000002000,%rdx // orptr(tracePoints, 0x00002000); <code to call trace_fast_lock which stores the code path flags> 0xfffffd7ff2523367: xorq %rbx,%rbx // xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success 0xfffffd7ff252336a: popq %rdx // pop(tracePoints); 0xfffffd7ff252336b: je 0xfffffd7ff2522843 // if ICC.ZF=1 we are done After the 'orq' sets 0x0000000000002000, we have code to call trace_fast_lock which could change ICC.ZF, but "xorq %rbx,%rbx" zeros the rbx register and resets the ICC.ZF=1 state. The "popq %rdx" is housekeeping that does not change the ICC.ZF value so that leads us to "je 0xfffffd7ff2522843" which is the branch around the call to complete_monitor_locking_C() because we are done. So it looks to me like this code path doesn't have any holes that can explain how both 0x000000002000 and 0x000400000000 are set in our tracing flags. Time to see if there's another way for an errant ObjectSynchronizer::quick_enter() call to be made.
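Before moving on, here is the contract the generated code is supposed to implement, written as a minimal C++ sketch (std::atomic stands in for the _owner CAS; none of these names are HotSpot's):

#include <atomic>

struct Thread;   // opaque stand-in for JavaThread

// Stand-in for the ObjectMonitor _owner field.
struct MonitorSketch {
  std::atomic<Thread*> owner{nullptr};
};

// Mirrors the inflated-path cmpxchgq in fast_lock(): succeed only if
// _owner was NULL and we installed ourselves.
bool fast_lock_sketch(MonitorSketch* m, Thread* self) {
  Thread* expected = nullptr;
  return m->owner.compare_exchange_strong(expected, self);
}

// Mirrors the ZFlag dispatch after DONE_LABEL: the slow-path runtime
// call should be reachable only when the fast path reports failure.
void monitorenter_sketch(MonitorSketch* m, Thread* self,
                         void (*slow_path)(MonitorSketch*, Thread*)) {
  if (fast_lock_sketch(m, self)) {
    return;            // ZFlag == 1: success, branch around the call
  }
  slow_path(m, self);  // ZFlag == 0: call complete_monitor_locking_C()
}

The trace bits above say we took the "return" arm and the "slow_path" arm for the same monitorenter, which is exactly what this sketch says cannot happen.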
18-03-2016

Note: JBS munges the formatting of this note so I've attached it as eval_note4. Extracting the debug/trace code under DCUBED_C2_FAST_LOCK_DEBUG so there is a simpler case that can be discussed with other folks. The key piece is here: src/share/vm/runtime/sharedRuntime.cpp > @@ -1880,6 +1880,39 @@ JRT_END > > // Handles the uncommon case in locking, i.e., contention or an inflated lock. > JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread)) > +#ifdef DCUBED_C2_FAST_LOCK_DEBUG > +#if 0 > +{ > + int dcubed_C2_fast_lock_result = thread->dcubed_C2_fast_lock_result(); > + static FILE * fp = NULL; > + ThreadCritical tc; > + > + if (fp == NULL) { > + fp = fopen("dcubed.debug.out", "w"); > + guarantee(fp != NULL, "cannot create dcubed.debug.out"); > + } > + fprintf(fp, "0x%x\n", dcubed_C2_fast_lock_result); > + fflush(fp); > +} > +#endif > + // SharedRuntime::complete_monitor_locking_C() is only supposed to be > + // called when MacroAssembler::fast_lock() fails. > + if (thread->dcubed_C2_fast_lock_worked()) { > + tty->print_cr("WARNING: errant call to " > + "SharedRuntime::complete_monitor_locking_C() after " > + "MacroAssembler::fast_lock() worked: _obj=" INTPTR_FORMAT > + ", lock=" INTPTR_FORMAT ", thread=" INTPTR_FORMAT > + ", dcubed_C2_fast_lock_result=0x%x", > + _obj, lock, thread, thread->dcubed_C2_fast_lock_result()); > + if (VerifyC2FastLockAndCompleteMLCMatch) { > + fatal("SharedRuntime::complete_monitor_locking_C() should not be " > + "called since MacroAssembler::fast_lock() worked."); > + } > + if (FixC2FastLockAndCompleteMLCMatch) { > + return; > + } > + } > +#endif > // Disable ObjectSynchronizer::quick_enter() in default config > // until JDK-8077392 is resolved. > if ((SyncFlags & 256) != 0 && !SafepointSynchronize::is_synchronizing()) { The "#if 0" part of the code was used to validate the JavaThread::_dcubed_C2_fast_lock_result values seen during a single run of the failing test. In src/share/vm/runtime/thread.hpp: #ifdef DCUBED_C2_FAST_LOCK_DEBUG #define DCUBED_C2_FAST_LOCK_CALLED 0xC2464CC2 /* C2 'F' 'L' C2 */ #define DCUBED_C2_FAST_LOCK_WORKED 0x42424242 #define DCUBED_C2_SYNC_METHOD_CALLED 0xC2534DC2 /* C2 'S' 'M' C2 */ $ sort dcubed.debug.out | uniq -c 232 0x42424242 4172 0xc2464cc2 so most of the slow-path calls saw the expected DCUBED_C2_FAST_LOCK_CALLED (0xC2464CC2), but a couple of hundred saw DCUBED_C2_FAST_LOCK_WORKED (0x42424242). This is much, much higher than the 2-3 failures over 72 hours that is normal for this bug so something is not quite right with the new DCUBED_C2_FAST_LOCK_DEBUG code. Here's the hs_err_pid (doit.copyB2.hs_err_pid.0) snippets from a failure of the new fatal(): # Internal Error (sharedRuntime.cpp:1909), pid=20200, tid=106 # fatal error: SharedRuntime::complete_monitor_locking_C() should not be called since MacroAssembler::fast_lock() worked. 
# # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_03_16_19_15-b00) <snip> --------------- T H R E A D --------------- Current thread (0x0000000003eee000): JavaThread "ForkJoinPool.commonPool-worker-10" daemon [_thread_in_Java, id=106, stack(0xfffffd7fbf5ce000,0xfffffd7fbf6ce000)] Stack: [0xfffffd7fbf5ce000,0xfffffd7fbf6ce000], sp=0xfffffd7fbf6caa30, free space=1010k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x13d72f9] void VMError::report(outputStream*,bool)+0xd59 V [libjvm.so+0x13d8bb6] void VMError::report_and_die(int,const char*,const char*,__va_list_element*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned long)+0x596 V [libjvm.so+0x13d85bf] void VMError::report_and_die(Thread*,const char*,int,const char*,const char*,__va_list_element*)+0x3f V [libjvm.so+0xacd40b] void report_fatal(const char*,int,const char*,...)+0xdb V [libjvm.so+0x12180f5] void SharedRuntime::complete_monitor_locking_C(oopDesc*,BasicLock*,JavaThread*)+0x545 v ~RuntimeStub::_complete_monitor_locking_Java J 1638 C2 java.util.stream.Nodes$SizedCollectorTask.compute()V (132 bytes) @ 0xfffffd7ff2537b84 [0xfffffd7ff2536a60+0x0000000000001124] J 1229 C2 java.util.concurrent.CountedCompleter.exec()Z (6 bytes) @ 0xfffffd7ff24638ac [0xfffffd7ff2463860+0x000000000000004c] J 1509 C2 java.util.concurrent.ForkJoinPool$WorkQueue.localPopAndExec()V (115 bytes) @ 0xfffffd7ff24d52d8 [0xfffffd7ff24d5120+0x00000000000001b8] J 1636% C2 java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V (139 bytes) @ 0xfffffd7ff25349dc [0xfffffd7ff25344a0+0x000000000000053c] j java.util.concurrent.ForkJoinWorkerThread.run()V+24 v ~StubRoutines::call_stub V [libjvm.so+0xd0f4bf] void JavaCalls::call_helper(JavaValue*,const methodHandle&,JavaCallArguments*,Thread*)+0x42f V [libjvm.so+0xd0e036] void JavaCalls::call_virtual(JavaValue*,KlassHandle,Symbol*,Symbol*,JavaCallArguments*,Thread*)+0x296 V [libjvm.so+0xd0e278] void JavaCalls::call_virtual(JavaValue*,Handle,KlassHandle,Symbol*,Symbol*,Thread*)+0x68 V [libjvm.so+0xdecb7e] void thread_entry(JavaThread*,Thread*)+0xbe V [libjvm.so+0x12f93f1] void JavaThread::thread_main_inner()+0xf1 V [libjvm.so+0x12f92e2] void JavaThread::run()+0x232 V [libjvm.so+0x1125cb0] java_start+0x230 C [libc.so.1+0xdd9db] _thr_setup+0x5b C [libc.so.1+0xddc10] _lwp_start+0x0 C 0x0000000000000000 Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::_complete_monitor_locking_Java J 1638 C2 java.util.stream.Nodes$SizedCollectorTask.compute()V (132 bytes) @ 0xfffffd7ff2537b84 [0xfffffd7ff2536a60+0x0000000000001124] J 1229 C2 java.util.concurrent.CountedCompleter.exec()Z (6 bytes) @ 0xfffffd7ff24638ac [0xfffffd7ff2463860+0x000000000000004c] J 1509 C2 java.util.concurrent.ForkJoinPool$WorkQueue.localPopAndExec()V (115 bytes) @ 0xfffffd7ff24d52d8 [0xfffffd7ff24d5120+0x00000000000001b8] J 1636% C2 java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V (139 bytes) @ 0xfffffd7ff25349dc [0xfffffd7ff25344a0+0x000000000000053c] j java.util.concurrent.ForkJoinWorkerThread.run()V+24 v ~StubRoutines::call_stub Here's the dbx stack trace (doit.copyB2.threads.log.0) from a failure of the new guarantee(): THREAD t@106 t@106(l@106) stopped in __lwp_kill at 0xfffffd7fff29351a 0xfffffd7fff29351a: __lwp_kill+0x000a: jae __lwp_kill+0x18 [ 0xfffffd7fff293528, .+0xe ] current thread: t@106 [1] __lwp_kill(0x6a, 0x6, 0xfffffeb43c74fbc0, 0xfffffd7fff293e0e, 
0xfffffd7fbf6cd010, 0x6), at 0xfffffd7fff29351a [2] _thr_kill(), at 0xfffffd7fff28be13 [3] raise(), at 0xfffffd7fff2381b9 [4] abort(), at 0xfffffd7fff216b80 =>[5] os::abort(dump_core = true, siginfo = <value unavailable>, context = <value unavailable>) (optimized), at 0xfffffd7ffe9274d6 (line ~1396) in "os_solaris.cpp" [6] VMError::report_and_die(id = <value unavailable>, message = <value unavailable>, detail_fmt = <value unavailable>, detail_args = <value unavailable>, thread = <value unavailable>, pc = <value unavailable>, siginfo = (nil), context = (nil), filename = 0xfffffd7ffeee9860 "/work/shared/bug_hunt/8077392_for_jdk9_hs_rt/hotspot/src/share/vm/runtime/sharedRuntime.cpp", lineno = 1909, size = 0) (optimized), at 0xfffffd7ffebd94f1 (line ~1152) in "vmError.cpp" [7] VMError::report_and_die(thread = <value unavailable>, filename = <value unavailable>, lineno = <value unavailable>, message = <value unavailable>, detail_fmt = <value unavailable>, detail_args = <value unavailable>) (optimized), at 0xfffffd7ffebd85bf (line ~931) in "vmError.cpp" [8] report_fatal(file = 0xfffffd7ffeee9860 "/work/shared/bug_hunt/8077392_for_jdk9_hs_rt/hotspot/src/share/vm/runtime/sharedRuntime.cpp", line = 1909, detail_fmt = 0xfffffd7ffeee97f0 "SharedRuntime::complete_monitor_locking_C() should notbe called since MacroAssembler::fast_lock() worked.", ...) (optimized), at 0xfffffd7ffe2cd40b (line ~227) in "debug.cpp" [9] SharedRuntime::complete_monitor_locking_C(_obj = 0xfffffd7bf79a9d60, lock = 0xfffffd7fbf6cd530, thread = 0x3eee000) (optimized), at 0xfffffd7ffea180f5 (line ~1909) in "sharedRuntime.cpp" [10] 0xfffffd7feab36d08(), at 0xfffffd7feab36d08 [11] 0xfffffd7feab36d08(), at 0xfffffd7feab36d08 [12] 0xfffffd7ff2537b84(), at 0xfffffd7ff2537b84 Current function is Parker::park (optimized) 228 static int cond_wait(cond_t *cv, mutex_t *mx) { return _cond_wait(cv, mx); } Not sure why frame 10 and 11 are the same address info. So let's take a look at the code from frame 10/11 that got us to SharedRuntime::complete_monitor_locking_C(): (dbx) x 0xfffffd7ff2537b7f,0xfffffd7ff2537b84/i 0xfffffd7ff2537b7f: call 0xfffffd7feab36ce0 [ 0xfffffd7feab36ce0, .-0x7a00e9f ] 0xfffffd7ff2537b84: jmp 0xfffffd7ff253741f [ 0xfffffd7ff253741f, .-0x765 ] So frame 12 called 0xfffffd7feab36ce0 which is really close to our frame 10/11 address: (dbx) x 0xfffffd7feab36ce0,0xfffffd7feab36d08/i 0xfffffd7feab36ce0: subq $0x0000000000000008,%rsp 0xfffffd7feab36ce7: movq %rbp,(%rsp) 0xfffffd7feab36ceb: movq %rsp,0x00000000000001d0(%r15) 0xfffffd7feab36cf2: movq %rsi,%rdi 0xfffffd7feab36cf5: movq %rdx,%rsi 0xfffffd7feab36cf8: movq %r15,%rdx 0xfffffd7feab36cfb: movq $complete_monitor_locking_C,%r10 0xfffffd7feab36d05: call *%r10d 0xfffffd7feab36d08: movq $0x0000000000000000,0x00000000000001d0(%r15) so the code from frame 10/11 is pretty much marshalling code for calling complete_monitor_locking_C which has this signature: // Handles the uncommon case in locking, i.e., contention or an inflated lock. 
JRT_BLOCK_ENTRY(void, SharedRuntime::complete_monitor_locking_C(oopDesc* _obj, BasicLock* lock, JavaThread* thread)) subq $0x0000000000000008,%rsp // make space on the stack movq %rbp,(%rsp) // save %rbp on the stack movq %rsp,0x00000000000001d0(%r15) // save %rsp in a field in %r15 (thread) movq %rsi,%rdi // guessing this is _obj param movq %rdx,%rsi // guessing this is lock param movq %r15,%rdx // this is thread param movq $complete_monitor_locking_C,%r10 call *%r10d // call complete_monitor_locking_C // zero the field in %r15 (thread) movq $0x0000000000000000,0x00000000000001d0(%r15) So here's the regs from frame 10: (dbx) regs current thread: t@106 current frame: [10] r15 0x0000000003eee000 r14 0x00000000000003e8 r13 0xfffffd7bf9100870 r12 0xfffffd7bc0000000 r11 0x0000000000000000 r10 0x0000000000000000 r9 0x0000000000000000 r8 0x0000000000000000 rdi 0x0000000000000000 rsi 0x0000000000000000 rbp 0xfffffd7fbf6cd4d0 rbx 0xfffffd7bafab90e8 rdx 0x0000000000000000 rcx 0x0000000000000000 rax 0x0000000000000000 trapno 0x0000000000000000 err 0x0000000000000000 rip 0xfffffd7feab36d08:0xfffffd7feab36d08 movq $0x0000000000000000,0x00000000000001d0(%r15) cs 0x0000000000000000 eflags 0x0000000000000000 rsp 0xfffffd7fbf6cd4a0 ss 0x0000000000000000 fs 0x0000000000000000 gs 0x0000000000000000 es 0x0000000000000000 ds 0x0000000000000000 fsbase 0x0000000000000000 gsbase 0x0000000000000000 (dbx) x 0xfffffd7fbf6cd4d0/X 0xfffffd7fbf6cd4d0: 0xf79a9d60 Here's the regs from frame 11: (dbx) regs current thread: t@106 current frame: [11] r15 0x0000000000000000 r14 0x0000000000000000 r13 0x0000000000000000 r12 0x0000000000000000 r11 0x0000000000000000 r10 0x0000000000000000 r9 0x0000000000000000 r8 0x0000000000000000 rdi 0x0000000000000000 rsi 0x0000000000000000 rbp 0xfffffd7bf79a9d60 rbx 0x0000000000000000 rdx 0x0000000000000000 rcx 0x0000000000000000 rax 0x0000000000000000 trapno 0x0000000000000000 err 0x0000000000000000 rip 0xfffffd7feab36d08:0xfffffd7feab36d08 movq $0x0000000000000000,0x00000000000001d0(%r15) cs 0x0000000000000000 eflags 0x0000000000000000 rsp 0x0000000000000000 ss 0x0000000000000000 fs 0x0000000000000000 gs 0x0000000000000000 es 0x0000000000000000 ds 0x0000000000000000 fsbase 0x0000000000000000 gsbase 0x0000000000000000 (dbx) x 0xfffffd7bf79a9d60/X 0xfffffd7bf79a9d60: 0x03a3e402 Here's the regs from frame 12: (dbx) regs current thread: t@106 current frame: [12] r15 0x0000000000000000 r14 0x0000000000000000 r13 0x0000000000000000 r12 0x0000000000000000 r11 0x0000000000000000 r10 0x0000000000000000 r9 0x0000000000000000 r8 0x0000000000000000 rdi 0x0000000000000000 rsi 0x0000000000000000 rbp 0x0000000003a3e402 rbx 0x0000000000000000 rdx 0x0000000000000000 rcx 0x0000000000000000 rax 0x0000000000000000 trapno 0x0000000000000000 err 0x0000000000000000 rip 0xfffffd7ff2537b84:0xfffffd7ff2537b84 jmp 0xfffffd7ff253741f [ 0xfffffd7ff253741f, .-0x765 ] cs 0x0000000000000000 eflags 0x0000000000000000 rsp 0x0000000000000000 ss 0x0000000000000000 fs 0x0000000000000000 gs 0x0000000000000000 es 0x0000000000000000 ds 0x0000000000000000 fsbase 0x0000000000000000 gsbase 0x0000000000000000 (dbx) x 0x0000000003a3e402/X 0x0000000003a3e402: 0x00000000 The *rbp value of NULL explains why the dbx stack trace stops at frame 12. So without a valid frame 13, it's hard to know where we go into the code in frame 12. 
For now, I'm dumping this big section: (dbx) x 0xfffffd7ff2537b00,0xfffffd7ff2537b84/i 0xfffffd7ff2537b00: pushq %rax 0xfffffd7ff2537b01: pushq %rdx 0xfffffd7ff2537b02: pushq %rcx 0xfffffd7ff2537b03: call breakpoint [ 0xfffffd7ffe925950, .+0xc3ede4d ] 0xfffffd7ff2537b08: popq %rcx 0xfffffd7ff2537b09: popq %rdx 0xfffffd7ff2537b0a: popq %rax 0xfffffd7ff2537b0b: lock cmpxchgq %r10,0x0000000000000000(%rbp) 0xfffffd7ff2537b11: leaq 0x0000000000000040(%rsp),%rbx 0xfffffd7ff2537b16: movq $0xffffffffc2464cc2,0x00000000000003e4(%r15) 0xfffffd7ff2537b21: movq 0x0000000000000000(%rbp),%rax 0xfffffd7ff2537b25: testq $0x0000000000000002,%rax 0xfffffd7ff2537b2b: jne 0xfffffd7ff2537b52 [ 0xfffffd7ff2537b52, .+0x27 ] 0xfffffd7ff2537b2d: orq $0x0000000000000001,%rax 0xfffffd7ff2537b31: movq %rax,(%rbx) 0xfffffd7ff2537b34: lock cmpxchgq %rbx,0x0000000000000000(%rbp) 0xfffffd7ff2537b3a: je 0xfffffd7ff2537b65 [ 0xfffffd7ff2537b65, .+0x2b ] 0xfffffd7ff2537b40: subq %rsp,%rax 0xfffffd7ff2537b43: andq $0xfffffffffffff007,%rax 0xfffffd7ff2537b4a: movq %rax,(%rbx) 0xfffffd7ff2537b4d: jmp 0xfffffd7ff2537b65 [ 0xfffffd7ff2537b65, .+0x18 ] 0xfffffd7ff2537b52: movq %rax,%r10 0xfffffd7ff2537b55: xorq %rax,%rax 0xfffffd7ff2537b58: lock cmpxchgq %r15,0x000000000000007e(%r10) 0xfffffd7ff2537b5e: movq $0x0000000000000003,(%rbx) 0xfffffd7ff2537b65: jne 0xfffffd7ff2537b72 [ 0xfffffd7ff2537b72, .+0xd ] 0xfffffd7ff2537b67: movq $0x0000000042424242,0x00000000000003e4(%r15) 0xfffffd7ff2537b72: je 0xfffffd7ff253741f [ 0xfffffd7ff253741f, .-0x753 ] 0xfffffd7ff2537b78: leaq 0x0000000000000040(%rsp),%rdx 0xfffffd7ff2537b7d: nop 0xfffffd7ff2537b7f: call 0xfffffd7feab36ce0 [ 0xfffffd7feab36ce0, .-0x7a00e9f ] 0xfffffd7ff2537b84: jmp 0xfffffd7ff253741f [ 0xfffffd7ff253741f, .-0x765 ] Start Update: These frames from the hs_err_pid stack trace are relevant: V [libjvm.so+0x12180f5] void SharedRuntime::complete_monitor_locking_C(oopDesc*,BasicLock*,JavaThread*)+0x545 v ~RuntimeStub::_complete_monitor_locking_Java J 1638 C2 java.util.stream.Nodes$SizedCollectorTask.compute()V (132 bytes) @ 0xfffffd7ff2537b84 [0xfffffd7ff2536a60+0x0000000000001124] The code for java.util.stream.Nodes$SizedCollectorTask.compute()V includes the "x 0xfffffd7ff2537b00,0xfffffd7ff2537b84/i" code block from above. In the code block from "x 0xfffffd7ff2536a60,0xfffffd7ff2537b84/i", I found a branch to "0xfffffd7ff2537b11: leaq 0x0000000000000040(%rsp),%rbx" which is the start of the parameter setup for calling fast_lock(). I did not find a branch to: "0xfffffd7ff2537b0b: lock cmpxchgq %r10,0x0000000000000000(%rbp)" so I'm not sure if it belongs to the strange "breakpoint" block or not. End Update I _think_ the "call breakpoint" section at the top is some barrier code emitted by C2 just in case the previous generated code block runs off the end. 
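Going back to the frame 10/11 marshalling stub for a moment: my reading (an assumption on my part) is that the store to 0x1d0(%r15) publishes the caller's SP into a last-Java-frame style anchor in the current JavaThread before the native call, and the final store clears it afterwards. A rough C++ rendering of that marshalling, with hypothetical names:

// Stand-in for the thread field at offset 0x1d0 of %r15.
struct ThreadAnchorSketch {
  void* last_java_sp = nullptr;
};

// obj, lock and self arrive in %rsi, %rdx and %r15 respectively.
void call_complete_monitor_locking_sketch(
    void* obj, void* lock, ThreadAnchorSketch* self, void* caller_sp,
    void (*complete_monitor_locking_C)(void*, void*, ThreadAnchorSketch*)) {
  self->last_java_sp = caller_sp;               // movq %rsp,0x1d0(%r15)
  complete_monitor_locking_C(obj, lock, self);  // call *%r10
  self->last_java_sp = nullptr;                 // movq $0,0x1d0(%r15)
}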
Here's the original src/cpu/x86/vm/macroAssembler_x86.cpp: fast_lock() with most comments and optional code not included in the current config elided and the DCUBED_C2_FAST_LOCK_DEBUG addition marked by "NNNN" line numbers: 1658 // obj: object to lock 1659 // box: on-stack box address (displaced header location) - KILLED 1660 // rax,: tmp -- KILLED 1661 // scr: tmp -- KILLED 1662 void MacroAssembler::fast_lock(Register objReg, Register boxReg, Register tmpReg, 1663 Register scrReg, Register cx1Reg, Register cx2Reg, 1664 BiasedLockingCounters* counters, 1665 RTMLockingCounters* rtm_counters, 1666 RTMLockingCounters* stack_rtm_counters, 1667 Metadata* method_data, 1668 bool use_rtm, bool profile_rtm) { : NNN1 #ifdef DCUBED_C2_FAST_LOCK_DEBUG NNN2 // Mark that this JavaThread called MacroAssembler::fast_lock() NNN3 movptr(Address(r15_thread, JavaThread::dcubed_C2_fast_lock_result_offset()), (int32_t)DCUBED_C2_FAST_LOCK_CALLED); NNN4 #endif : 1727 movptr(tmpReg, Address(objReg, 0)); // [FETCH] 1728 testptr(tmpReg, markOopDesc::monitor_value); // inflated vs stack-locked|neutral|biased 1729 jccb(Assembler::notZero, IsInflated); : 1732 orptr (tmpReg, markOopDesc::unlocked_value); 1733 movptr(Address(boxReg, 0), tmpReg); // Anticipate successful CAS 1734 if (os::is_MP()) { 1735 lock(); 1736 } 1737 cmpxchgptr(boxReg, Address(objReg, 0)); // Updates tmpReg : 1742 jcc(Assembler::equal, DONE_LABEL); // Success : 1747 subptr(tmpReg, rsp); : 1749 andptr(tmpReg, (int32_t) (NOT_LP64(0xFFFFF003) LP64_ONLY(7 - os::vm_page_size())) ); 1750 movptr(Address(boxReg, 0), tmpReg); : 1755 jmp(DONE_LABEL); 1756 1757 bind(IsInflated); : 1876 movq(scrReg, tmpReg); 1877 xorq(tmpReg, tmpReg); 1878 1879 if (os::is_MP()) { 1880 lock(); 1881 } 1882 cmpxchgptr(r15_thread, Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); : 1885 movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark())); : 1897 bind(DONE_LABEL); : NNN5 #ifdef DCUBED_C2_FAST_LOCK_DEBUG NNN6 Label MY_DONE_FAILED; NNN7 // if current state is failure, then there is nothing more to do NNN8 jccb(Assembler::notZero, MY_DONE_FAILED); NNN9 // Mark that this JavaThread's call to MacroAssembler::fast_lock() worked NN10 movptr(Address(r15_thread, JavaThread::dcubed_C2_fast_lock_result_offset()), (int32_t)DCUBED_C2_FAST_LOCK_WORKED); NN11 NN12 bind(MY_DONE_FAILED); NN13 #endif : 1903 } 1904 } The above macroAssembler_x86.cpp: fast_lock() code maps to this code in memory: (trimming off the addresses in brackets and going wide here for annotations after the instructions) (dbx) x 0xfffffd7ff2537b00,0xfffffd7ff2537b84/i // start: C2 barrier/guard code? 0xfffffd7ff2537b00: pushq %rax // save regs for call 0xfffffd7ff2537b01: pushq %rdx // 0xfffffd7ff2537b02: pushq %rcx // 0xfffffd7ff2537b03: call breakpoint // 0xfffffd7ff2537b08: popq %rcx // restore regs after call 0xfffffd7ff2537b09: popq %rdx // 0xfffffd7ff2537b0a: popq %rax // end: C2 barrier/guard code? // compare-and-exchange/CAS: // if ((old = 0x0000000000000000(%rbp)) == %rax) { // 0x0000000000000000(%rbp) = %r10; // } // %rax = old; 0xfffffd7ff2537b0b: lock cmpxchgq %r10,0x0000000000000000(%rbp) // cmpxchgptr(%r10, Address(objReg, 0)); // Where did the above cmpxchgq() come from? What's // in %rax for the comparison with object header? // (%rax is fast_lock's tmpReg param) // What's in %r10 for the assignment to the object // header if the 0x0000000000000000(%rbp) == %rax // compare succeeds? 
// (%r10 is fast_lock's scrReg param) // (fast_lock doesn't expect %rax or %r10 to contain // anything useful on input since they get overwritten) // If the random value in %rax happens to match the // object's header (0x0000000000000000(%rbp)), then // the random value in %r10 will be put into the // object's header. If random value in %r10 happens // to be an ObjectMonitor, then we're going to lock // that ObjectMonitor. // Start update: I found code in // // java.util.stream.Nodes$SizedCollectorTask.compute()V // that jumps to "0xfffffd7ff2537b11: leaq", but I did // not find code that jumps to "0xfffffd7ff2537b0b: // lock cmpxchgq" above. I don't know if the cmpxchgq // above is causing us grief or not. // // End update. 0xfffffd7ff2537b11: leaq 0x0000000000000040(%rsp),%rbx // set boxReg to BasicLock on local stack // START OF: MacroAssembler::fast_lock() // objReg == %rbp, boxReg == %rbx, tmpReg == %rax, // scrReg == %r10 0xfffffd7ff2537b16: movq $0xffffffffc2464cc2,0x00000000000003e4(%r15) // NNN3: set DCUBED_C2_FAST_LOCK_CALLED in // JavaThread::_dcubed_C2_fast_lock_result // to mark that fast_lock() was called 0xfffffd7ff2537b21: movq 0x0000000000000000(%rbp),%rax // 1727: [FETCH] (object header) 0xfffffd7ff2537b25: testq $0x0000000000000002,%rax // 1728: inflated vs stack-locked|neutral|biased 0xfffffd7ff2537b2b: jne 0xfffffd7ff2537b52 // 1729: if (inflated) then jump 0xfffffd7ff2537b2d: orq $0x0000000000000001,%rax // 1732: 'or' in markOopDesc::unlocked_value 0xfffffd7ff2537b31: movq %rax,(%rbx) // 1733: update BasicLock's saved header // compare-and-exchange/CAS: // if ((old = Address(objReg, 0)) == tmpReg) { // Address(objReg, 0) = boxReg; // } // tmpReg = old; 0xfffffd7ff2537b34: lock cmpxchgq %rbx,0x0000000000000000(%rbp) // 173[5,7]: cmpxchgptr(boxReg, Address(objReg, 0)); // if cmpxchgptr worked we are done 0xfffffd7ff2537b3a: je 0xfffffd7ff2537b65 // 1742: jcc(Assembler::equal, DONE_LABEL); // Stack locked by current thread if difference with // current SP is less than one page. 
0xfffffd7ff2537b40: subq %rsp,%rax // 1747: subptr(tmpReg, rsp); // 1749: andptr(tmpReg, (int32_t) (NOT_LP64(0xFFFFF003) 0xfffffd7ff2537b43: andq $0xfffffffffffff007,%rax // LP64_ONLY(7 - os::vm_page_size()))); // always save result in BasicLock's saved header // recursive enter saves a NULL 0xfffffd7ff2537b4a: movq %rax,(%rbx) // 1750: movptr(Address(boxReg, 0), tmpReg); 0xfffffd7ff2537b4d: jmp 0xfffffd7ff2537b65 // 1755: jmp(DONE_LABEL); // 1757: bind(IsInflated); // save ObjectMonitor 0xfffffd7ff2537b52: movq %rax,%r10 // 1876: movq(scrReg, tmpReg); // get ready for NULL _owner CAS 0xfffffd7ff2537b55: xorq %rax,%rax // 1877: xorq(tmpReg, tmpReg); // compare-and-exchange/CAS: // if ((old = Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))) == tmpReg) { // Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG (owner)) = r15_thread; // } // tmpReg = old; 0xfffffd7ff2537b58: lock cmpxchgq %r15,0x000000000000007e(%r10) // 188[0,2]: cmpxchgptr(r15_thread, Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); 0xfffffd7ff2537b5e: movq $0x0000000000000003,(%rbx) // 1885: set BasicLock's saved header to markOopDesc::unused_mark // bind(DONE); 0xfffffd7ff2537b65: jne 0xfffffd7ff2537b72 // NNN8: jccb(Assembler::notZero, MY_DONE_FAILED); 0xfffffd7ff2537b67: movq $0x0000000042424242,0x00000000000003e4(%r15) // NN10: set DCUBED_C2_FAST_LOCK_WORKED in // JavaThread::_dcubed_C2_fast_lock_result // to mark that fast_lock() worked // END OF: MacroAssembler::fast_lock() 0xfffffd7ff2537b72: je 0xfffffd7ff253741f // ZFlag == 1 -> Success // ZFlag == 0 -> Failure - force control through the slow-path 0xfffffd7ff2537b78: leaq 0x0000000000000040(%rsp),%rdx // fetch BasicLock addr on local stack for // complete_monitor_locking_C() call 0xfffffd7ff2537b7d: nop // 0xfffffd7ff2537b7f: call 0xfffffd7feab36ce0 // call complete_monitor_locking_C() 0xfffffd7ff2537b84: jmp 0xfffffd7ff253741f // Start update: The above analysis of the MacroAssembler::fast_lock() code, the prologue code and the epilogue code do not reveal any believable smoking gun for this failure mode. The DCUBED_C2_FAST_LOCK_DEBUG debug/tracing code yields similar results to the DCUBED_JME_TRACE/ DCUBED_JME_DEBUG debug/tracing code: Sometimes the ZFlag check that controls the call to complete_monitor_locking_C() does not work right. We see evidence of calls to complete_monitor_locking_C() when the fast_lock() code has finished with ZFlag == 1. Just to sanity check the DCUBED_C2_FAST_LOCK_DEBUG debug/tracing code, I've done an experiment where the DCUBED_JME_TRACE/DCUBED_JME_DEBUG debug/tracing is also enabled. The key output: WARNING: errant call to SharedRuntime::complete_monitor_locking_C() after MacroAssembler::fast_lock() worked: _obj=0xfffffd7bfa2154a8, lock=0xfffffd7fbf9d07c0, thread=0x0000000000c58800, dcubed_C2_fast_lock_result=0x42424242 INFO: dcubed_jme_last_trace_points=0x0000000000002862 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (sharedRuntime.cpp:1913), pid=18294, tid=103 # fatal error: SharedRuntime::complete_monitor_locking_C() should not be called since MacroAssembler::fast_lock() worked. 
#
# JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_03_18_12_26-b00)

The DCUBED_C2_FAST_LOCK_DEBUG output of:

WARNING: errant call to SharedRuntime::complete_monitor_locking_C() after MacroAssembler::fast_lock() worked: _obj=0xfffffd7bfa2154a8, lock=0xfffffd7fbf9d07c0, thread=0x0000000000c58800, dcubed_C2_fast_lock_result=0x42424242

# Internal Error (sharedRuntime.cpp:1913), pid=18294, tid=103
# fatal error: SharedRuntime::complete_monitor_locking_C() should not be called since MacroAssembler::fast_lock() worked.

shows that SharedRuntime::complete_monitor_locking_C() was called even though MacroAssembler::fast_lock() worked.

The DCUBED_JME_TRACE output of:

INFO: dcubed_jme_last_trace_points=0x0000000000002862

decodes as follows:

// Mark that we came from MacroAssembler::fast_lock().
orptr(tracePoints, 0x00000002);
// Record that we didn't take the force slow-path branch
orptr(tracePoints, 0x00000020);
// Record that biased_locking_enter() didn't take the 'DONE' label.
orptr(tracePoints, 0x00000040);
// Record that we're in the inflated block
orptr(tracePoints, 0x00000800);
// Record that we returned success from fast_lock
orptr(tracePoints, 0x00002000);

And it confirms the fast_lock() code path that we decoded from the binary code in gory detail above. Of course, I'm having a serious problem believing that the ZFlag value check done by the 'je' instruction is broken, but we keep coming back to that conclusion.

End update.
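As an aside, the handshake that the DCUBED_C2_FAST_LOCK_DEBUG checks rely on boils down to the following (a sketch with stand-in types; the constants are the ones quoted from thread.hpp above):

#include <cstdint>
#include <cstdio>

enum : uint32_t {
  DCUBED_C2_FAST_LOCK_CALLED = 0xC2464CC2,  // set on entry to fast_lock()
  DCUBED_C2_FAST_LOCK_WORKED = 0x42424242   // set at DONE_LABEL when ZFlag == 1
};

struct ThreadSketch { uint32_t dcubed_C2_fast_lock_result = 0; };

// Emitted at the top of fast_lock() (NNN3 in the listing above).
void mark_fast_lock_called(ThreadSketch* t) {
  t->dcubed_C2_fast_lock_result = DCUBED_C2_FAST_LOCK_CALLED;
}

// Emitted after DONE_LABEL when the fast path succeeded (NN10 above).
void mark_fast_lock_worked(ThreadSketch* t) {
  t->dcubed_C2_fast_lock_result = DCUBED_C2_FAST_LOCK_WORKED;
}

// The check at the top of complete_monitor_locking_C(): the slow path
// should only ever observe the CALLED value.
void check_slow_path_entry(const ThreadSketch* t) {
  if (t->dcubed_C2_fast_lock_result == DCUBED_C2_FAST_LOCK_WORKED) {
    fprintf(stderr, "errant call: fast_lock() already worked\n");
  }
}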
18-03-2016

Just a quick note to remind myself that I changed the guarantee() in the previous experiment to dump more info and added another guarantee() on the ObjectMonitor::notifyAll() side: $ diff 8077392.diag.diff.txt.7[12] 1522c1522 < @@ -4006,6 +4006,39 @@ public: --- > @@ -4006,6 +4006,42 @@ public: 1556c1556 < + diagnostic(bool, UseNewCode9, false, \ --- > + diagnostic(bool, UseNewCode9e, true, \ 1558a1559,1561 > + diagnostic(bool, UseNewCode9n, true, \ > + "Testing Only: Use the new version while testing") \ > + \ 2488c2491 < @@ -1784,8 +2263,130 @@ void ObjectMonitor::notifyAll(TRAPS) { --- > @@ -1784,8 +2263,136 @@ void ObjectMonitor::notifyAll(TRAPS) { 2604a2608,2613 > +#ifdef DCUBED_JME_TRACE > +guarantee(!UseNewCode9n || > + (jt->dcubed_jme_last_trace_points() & (0x000400000000L | 0x00002000L)) > + != (0x000400000000L | 0x00002000L), > + "notifyAll: fast_lock() and quick_enter() cannot both succeed!"); > +#endif 2619c2628 < @@ -2187,6 +2788,18 @@ inline void ObjectMonitor::AddWaiter(Obj --- > @@ -2187,6 +2794,18 @@ inline void ObjectMonitor::AddWaiter(Obj 4031c4040 < @@ -4701,3 +4744,240 @@ void Threads::verify() { --- > @@ -4701,3 +4744,269 @@ void Threads::verify() { 4171a4181,4209 > + > + // quick_enter() set this bit in this call: > + // Record that we grabbed the ObjectMonitor with cmpxhg() > + // jt->add_dcubed_jme_last_trace_points(obj, Lock, m, 0x000400000000L); > + // > + // MacroAssembler::fast_lock() set this bit before quick_enter() was called: > + // Record that we returned success from fast_lock > + // orptr(tracePoints, 0x00002000); > + if (UseNewCode9e && trace_points == 0x000400000000L && > + (_dcubed_jme_last_trace_points & 0x00002000L) != 0) { > + tty->print_cr("XXX-9 - thread=" INTPTR_FORMAT ", lock=" INTPTR_FORMAT > + ", mon=" INTPTR_FORMAT ", trace_points=" INTPTR_FORMAT, > + this, lock, mon, _dcubed_jme_last_trace_points); > + tty->print_cr("XXX-9 - dcubed_jme_fast_lock_obj=" INTPTR_FORMAT ", " > +#if 0 > + "dcubed_jme_fast_lock_obj_hdr=" INTPTR_FORMAT ", " > + "*dcubed_jme_fast_lock_obj=" INTPTR_FORMAT ", " > +#endif > + "dcubed_jme_fast_lock_mon=" INTPTR_FORMAT, > + _dcubed_jme_fast_lock_obj, > +#if 0 > + _dcubed_jme_fast_lock_obj_hdr, > + (_dcubed_jme_fast_lock_obj == NULL ? -1 > + : *((intptr_t *) _dcubed_jme_fast_lock_obj)), > +#endif > + _dcubed_jme_fast_lock_mon); > + > + fatal("add_dcubed_jme_last_trace_points: fast_lock() and quick_enter() cannot both succeed!"); > + } The guarantee() are enabled by default, but controllable via two new UseNewCode switches: UseNewCode9e - controls the Java Monitor enter guarantee UseNewCode9n - controls the Java Monitor notifyAll guarantee When run with -XX:-UseNewCode9e, the failure mode looks like this: XXX-4 - dcubed_jme_fast_lock_obj=0xfffffd7bfa200408, dcubed_jme_fast_lock_mon=0x0000000001b6e382 XXX-4 - dcubed_jme_quick_enter_obj=0xfffffd7bf9b214c8, dcubed_jme_quick_enter_mon=0x0000000001b58f00 XXX-4 - mon[0]={lock=0xfffffd7fbf5cc6e0, dmw=0x000001535d1c4f4d, elim=false, replaced=false, owner=0xfffffd7bf9b214c8} # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (objectMonitor.cpp:2383), pid=20892, tid=107 # guarantee(!UseNewCode9n || (jt->dcubed_jme_last_trace_points() & (0x000400000 000L | 0x00002000L)) != (0x000400000000L | 0x00002000L)) failed: notifyAll: fast _lock() and quick_enter() cannot both succeed! # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcube d_2016_03_09_13_20-b00) which is very similar to our most recent results. 
The XXX-4 messages show that fast_lock() and quick_enter() both succeeded and both locked different objects (and different ObjectMonitors). When UseNewCode9e is left at the default of true, we get the following: XXX-9 - thread=0x000000000476f000, lock=0xfffffd7fc05dc850, mon=0x000000000435e580, trace_points=0x0000015d00012866 XXX-9 - dcubed_jme_fast_lock_obj=0xfffffd7be378c278, dcubed_jme_fast_lock_mon=0x0000000004363382 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (thread.cpp:4911), pid=17664, tid=91 # fatal error: add_dcubed_jme_last_trace_points: fast_lock() and quick_enter() cannot both succeed! # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcube d_2016_03_09_13_20-b00) The XXX-9 messages are printed on the quick_enter() code path and show that quick_enter() is operating on a different ObjectMonitor than the one that fast_lock() operated on. When UseNewCode9e is left at a default value of true, we have yet to see the ObjectMonitor::notifyAll() guarantee() fire. This strongly indicates that the errant complete_monitor_locking_C()/quick_enter() call is happening before the Object.notifyAll() call and is very likely happening as part of the original monitorenter operation.
10-03-2016

This entry is mostly notes to myself as I try to figure out what is going on with C2's fast_lock() and quick_enter() code. Normal people should probably ignore this entry.

Added a dcubed_jme_fast_lock_mon field to track the ObjectMonitor that is observed by C2's fast_lock(). Added a dcubed_jme_quick_enter_mon field to track the ObjectMonitor that is observed by C2's quick_enter().

Similar dcubed_jme_last_trace_points results:

$ /bin/grep XXX-4 doit.copyA3.log.7 | sed -n 's/.* \(dcubed_jme_last_trace_points=[^,][^,]*\).*/\1/p' | sort | uniq -c | grep '862$'
41294 dcubed_jme_last_trace_points=0x0000000000002862
615 dcubed_jme_last_trace_points=0x0000000500001862
1 dcubed_jme_last_trace_points=0x0000000500002862

Most C2 fast_lock() calls worked (0x2862 pattern). The C2 fast_lock() calls that failed called C2 quick_enter() which worked (0x500001862 pattern). And in one case, we got a C2 fast_lock() that worked followed by a C2 quick_enter() that also worked (pattern 0x500002862).

All of the C2 fast_lock() calls that worked (0x2862 pattern) also have a NULL dcubed_jme_quick_enter_mon value:

$ /bin/grep dcubed_jme_last_trace_points=0x0000000000002862 doit.copyA3.log.7 | grep -v dcubed_jme_quick_enter_mon=0x0000000000000000

When the debug code records the C2 fast_lock() tracing bits and the dcubed_jme_fast_lock_mon field value, it also sets the dcubed_jme_quick_enter_mon field to NULL. A subsequent call to C2 quick_enter() updates the dcubed_jme_quick_enter_mon field and in the case of working C2 fast_lock() calls, there was no C2 quick_enter() call.

All of the C2 fast_lock() calls that failed (0x500001862 pattern) have the same "mon=", "dcubed_jme_fast_lock_mon=" and "dcubed_jme_quick_enter_mon=" values:

$ grep dcubed_jme_last_trace_points=0x0000000500001862 doit.copyA3.log.7 | sed -e 's/.* mon=\([^,][^,]*\), dcubed_jme_fast_lock_mon=\([^,][^,]*\), .* dcubed_jme_quick_enter_mon=\([^,][^,]*\), .*/\1 \2X \3/' -e 's/2X/0/' | awk '{ if ( $1 != $2 ) printf("FAIL: mon=%s != dcubed_jme_fast_lock_mon=%s\n", $1, $2); if ( $1 != $3 ) printf("FAIL: mon=%s != dcubed_jme_quick_enter_mon=%s\n", $1, $3); printf( "checked %s\n", $1); }' | grep FAIL

The "mon=" value is the ObjectMonitor being operated on by an ObjectMonitor::notifyAll() call, the "dcubed_jme_fast_lock_mon=" value is the ObjectMonitor recorded by fast_lock(); note the value reported includes the magic 0x2 value that identifies the object header as an ObjectMonitor so I've filtered that out. The "dcubed_jme_quick_enter_mon=" value is the ObjectMonitor value recorded by quick_enter(). For all of these entries, the values all match, which is what we expect.
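In code form, the comparison that pipeline performs is roughly this (a sketch; 0x2 is the monitor tag that fast_lock() records in the header value, which the sed above strips):

#include <cstdint>

// A header word that refers to an ObjectMonitor carries the 0x2 tag
// (markOopDesc::monitor_value), so strip it before comparing.
bool monitors_match(uint64_t notify_mon,         // the "mon=" value
                    uint64_t fast_lock_mon,      // tagged value from fast_lock()
                    uint64_t quick_enter_mon) {  // value from quick_enter()
  const uint64_t kMonitorTag = 0x2;
  return (fast_lock_mon & ~kMonitorTag) == notify_mon &&
         quick_enter_mon == notify_mon;
}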
Here's the usual snippet from one of last night's failures: XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7bf3cf0a08, mon=0x0000000002994080, dcubed_jme_fast_lock_mon=0x0000000002994202, dcubed_jme_last_trace_points=0x0000000500002862, dcubed_jme_quick_enter_mon=0x0000000002994080, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-4 - mon[0]={lock=0xfffffd7fbf6cd550, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7bf3cf0a08} XXX-7 - thread=0x0000000000c5c000, obj=0xfffffd7bf3cf0a08, tracePoints=0x62 XXX-7 - monitor=0x0000000002994080, tracePoints=0x62 dcubed_mon=0x0000000002994080: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=33, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000000c5c000, dcubed_omna_target_thread=0x0000000000aab800 INFO: unexpected locked object: - locked <0xfffffd7bf3cf0a08> (a java.util.stream.SliceOps$SliceTask) INFO: uo_last_trace_points=0x62 INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2203), pid=5411, tid=106 # fatal error: exiting JavaThread=0x0000000000c5c000 unexpectedly owns ObjectMonitor=0x0000000002994080 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_02_10_18_42-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2016_02_10_18_42-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) Here's the key line from the above snippet: XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7bf3cf0a08, mon=0x0000000002994080, dcubed_jme_fast_lock_mon=0x0000000002994202, dcubed_jme_last_trace_points=0x0000000500002862, dcubed_jme_quick_enter_mon=0x0000000002994080, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 These three values should be the same: mon=0x0000000002994080 dcubed_jme_fast_lock_mon=0x0000000002994202 dcubed_jme_quick_enter_mon=0x0000000002994080 and these bits: dcubed_jme_last_trace_points=0x0000000500002862 tell me that C2 fast_lock() succeeded on the ObjectMonitor at 0x0000000002994202 and C2 quick_enter() succeeded on the ObjectMonitor at 0x0000000002994080. The mon= line tells me that ObjectMonitor::notifyAll() is operating on the same ObjectMonitor that C2 quick_enter() successfully entered. Next step is to dig into how/where C2 fast_enter() is called so I can look at the success and failure detection along with the subsequent call to C2 quick_enter().
11-02-2016

Added end-to-end tracing in Java Monitor Enter (JME) code paths (interpreter, C2, ObjectMonitor::enter(), ObjectMonitor::EnterI(), ObjectSynchronizer::quick_enter(), ObjectSynchronizer::fast_enter() and ObjectSynchronizer::slow_enter()). The only piece that is missing is C1, but this bug has never been seen on C1 so I'll leave that code alone for now. Last night's over night run had three instances of our failure mode. Here's snippets from the first: XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7bf36a71a8, mon=0x00000000021bb880, dcubed_jme_last_trace_points=0x0000000500002862, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-4 - mon[0]={lock=0xfffffd7fc02d94b0, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7bf36a71a8} XXX-7 - thread=0x0000000002e5e800, obj=0xfffffd7bf36a71a8, tracePoints=0x62 XXX-7 - monitor=0x00000000021bb880, tracePoints=0x62 INFO: Deflate: InCirc=4352 InUse=4 Scavenged=6 ForceMonitorScavenge=0 : pop=4318 free=3232 dcubed_mon=0x00000000021bb880: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=24, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000002e5e800, dcubed_omna_target_thread=0x0000000000abb800 INFO: unexpected locked object: - locked <0xfffffd7bf36a71a8> (a java.util.stream.SliceOps$SliceTask) INFO: uo_last_trace_points=0x62 INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2203), pid=28837, tid=94 # fatal error: exiting JavaThread=0x0000000002e5e800 unexpectedly owns ObjectMonitor=0x00000000021bb880 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_02_09_16_09-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2016_02_09_16_09-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) The new tracing flag value for the first failure is: dcubed_jme_last_trace_points=0x0000000500002862 Right to left, these bits translate as: 0x000000002 - // Mark that we came from MacroAssembler::fast_lock(). orptr(tracePoints, 0x00000002); 0x000000020 - // Record that we didn't take the force slow-path branch orptr(tracePoints, 0x00000020); 0x000000040 - // Record that biased_locking_enter() didn't take the 'DONE' label. orptr(tracePoints, 0x00000040); 0x000000800 - // Record that we're in the inflated block orptr(tracePoints, 0x00000800); 0x000002000 - // Record that we returned success from fast_lock orptr(tracePoints, 0x00002000); 0x100000000 - // Record that we called ObjectSynchronizer::quick_enter() jt->add_dcubed_jme_last_trace_points(Lock, 0x000100000000L); 0x400000000 - // Record that we grabbed the ObjectMonitor with cmpxhg() jt->add_dcubed_jme_last_trace_points(m, 0x000400000000L); Update: While re-reading the above, it looks like success was reported by both fast_lock() and quick_enter(). That shouldn't be possible. Looks like I need to investigate the new tracing more. Update 2: Double checked the tracing code and it appears to be working correctly. 
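Since I keep decoding these masks by hand, here's a small throwaway decoder for my own use (a sketch; the table only covers the trace points quoted in these notes and the strings are mine):

#include <cstdint>
#include <cstdio>

// Trace-point bits copied from the DCUBED_JME_TRACE instrumentation notes.
static const struct { uint64_t bit; const char* what; } kTracePoints[] = {
  { 0x0000000002ULL, "came from MacroAssembler::fast_lock()" },
  { 0x0000000004ULL, "called ObjectMonitor::enter()" },
  { 0x0000000020ULL, "did not take the force slow-path branch" },
  { 0x0000000040ULL, "biased_locking_enter() did not take DONE" },
  { 0x0000000800ULL, "in the inflated block" },
  { 0x0000001000ULL, "fast_lock() returned failure" },
  { 0x0000002000ULL, "fast_lock() returned success" },
  { 0x0100000000ULL, "called ObjectSynchronizer::quick_enter()" },
  { 0x0200000000ULL, "quick_enter(): recursive enter" },
  { 0x0400000000ULL, "quick_enter(): owner cmpxchg worked" },
  { 0x0800000000ULL, "quick_enter() failed" },
};

void decode_trace_points(uint64_t tp) {
  printf("dcubed_jme_last_trace_points=0x%016llx\n", (unsigned long long) tp);
  for (const auto& t : kTracePoints) {
    if (tp & t.bit) {
      printf("  0x%012llx - %s\n", (unsigned long long) t.bit, t.what);
    }
  }
}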
Here's interesting info for the C2 code paths: $ /bin/grep XXX-4 doit.copyA1.log.58 | sed -n 's/.* \(dcubed_jme_last_trace_points=[^,][^,]*\).*/\1/p' | sort | uniq -c | grep 862'$' 41200 dcubed_jme_last_trace_points=0x0000000000002862 648 dcubed_jme_last_trace_points=0x0000000500001862 1 dcubed_jme_last_trace_points=0x0000000500002862 The '0x862' bit pattern is C2's fast_lock(). The 0x2000 means that fast_lock() worked and the '0x1000' means that fast_lock() failed. The 0x500000000 means that quick_enter() worked. We have 41200 instances of C2's fast_lock() working without a followup call to quick_enter(). We have 648 instances of fast_lock() failing with a followup call to quick_enter() that worked. We have _one_ instance of a fast_lock() call working followed by a quick_enter() call that also worked. Going to take a quick look at the quick_enter() code path, but I suspect that it will take us down the recursive enter rabbit hole. Update 3: Taking a closer look at the tracing in fast_lock() and in quick_enter(). Here's some fast_lock() code: #ifdef DCUBED_JME_TRACE // Record that we're in the inflated block orptr(tracePoints, 0x00000800); #endif // It's inflated movq(scrReg, tmpReg); xorq(tmpReg, tmpReg); if (os::is_MP()) { lock(); } cmpxchgptr(r15_thread, Address(scrReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); // Unconditionally set box->_displaced_header = markOopDesc::unused_mark(). // Without cast to int32_t movptr will destroy r10 which is typically obj. movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark())); // Intentional fall-through into DONE_LABEL ... // Propagate ICC.ZF from CAS above into DONE_LABEL. So the cmpxchgptr() is trying to set the _owner field to the current thread. If that operation succeeded, then fast_lock() returns success. Otherwise it returns failure. // DONE_LABEL is a hot target - we'd really like to place it at the // start of cache line by padding with NOPs. // See the AMD and Intel software optimization manuals for the // most efficient "long" NOP encodings. // Unfortunately none of our alignment mechanisms suffice. bind(DONE_LABEL); // At DONE_LABEL the icc ZFlag is set as follows ... // Fast_Unlock uses the same protocol. // ZFlag == 1 -> Success // ZFlag == 0 -> Failure - force control through the slow-path } #ifdef DCUBED_JME_TRACE Label MY_DONE0, MY_DONE1; // if current state is success, then preserve that jccb(Assembler::zero, MY_DONE0); // Record that we returned failure from fast_lock orptr(tracePoints, 0x00001000); // save the current trace point info for objReg trace_fast_lock(objReg, tracePoints); cmpptr(rsp, 0); // set ICC.ZF=0 to indicate failure jmpb(MY_DONE1); bind(MY_DONE0); // Record that we returned success from fast_lock orptr(tracePoints, 0x00002000); // save the current trace point info for objReg // Note: This trace_fast_lock() causes a crash with slowdebug bits // near the end of the test run in deoptimization code. trace_fast_lock(objReg, tracePoints); xorptr(boxReg, boxReg); // set ICC.ZF=1 to indicate success bind(MY_DONE1); pop(tracePoints); #endif According to the tracing (0x00002000), the cmpxchgptr() call worked so the current thread should be in the _owner field and success is returned. Also note this unconditional code after the cmpxchgptr() call: // Unconditionally set box->_displaced_header = markOopDesc::unused_mark(). // Without cast to int32_t movptr will destroy r10 which is typically obj. 
movptr(Address(boxReg, 0), (int32_t)intptr_t(markOopDesc::unused_mark())); The above code puts markOopDesc::unused_mark() which is 0x3 into the displaced mark word, but what ObjectMonitor::notifyAll() sees later is zero. Ouch. Back to the _owner field mystery. However, something went wrong because we called quick_enter() and it's code looks like this: bool ObjectSynchronizer::quick_enter(oop obj, Thread * Self, BasicLock * Lock) { assert(!SafepointSynchronize::is_at_safepoint(), "invariant"); #ifndef DCUBED_JME_TRACE assert(Self->is_Java_thread(), "invariant"); #else guarantee(Self->is_Java_thread(), "invariant"); JavaThread * jt = (JavaThread *) Self; // Record that we called ObjectSynchronizer::quick_enter() jt->add_dcubed_jme_last_trace_points(Lock, 0x000100000000L); #endif assert(((JavaThread *) Self)->thread_state() == _thread_in_Java, "invariant"); No_Safepoint_Verifier nsv; if (obj == NULL) return false; // Need to throw NPE const markOop mark = obj->mark(); if (mark->has_monitor()) { ObjectMonitor * const m = mark->monitor(); assert(m->object() == obj, "invariant"); Thread * const owner = (Thread *) m->_owner; // Lock contention and Transactional Lock Elision (TLE) diagnostics // and observability // Case: light contention possibly amenable to TLE // Case: TLE inimical operations such as nested/recursive synchronization if (owner == Self) { #ifdef DCUBED_JME_TRACE // Record that we have a recursive ObjectMonitor enter jt->add_dcubed_jme_last_trace_points(m, 0x000200000000L); #endif #ifdef DCUBED_BL_DEBUG m->inc_dcubed_bl_enter_cnt(); #endif m->_recursions++; return true; } if (owner == NULL && Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) { #ifdef DCUBED_JME_TRACE // Record that we grabbed the ObjectMonitor with cmpxhg() jt->add_dcubed_jme_last_trace_points(m, 0x000400000000L); #endif #ifdef DCUBED_BL_DEBUG m->inc_dcubed_bl_enter_cnt(); #endif assert(m->_recursions == 0, "invariant"); assert(m->_owner == Self, "invariant"); return true; } #ifdef DCUBED_JME_TRACE // Record that ObjectSynchronizer::quick_enter() for ObjectMonitor failed jt->add_dcubed_jme_last_trace_points(m, 0x000800000000L); return false; #endif } We know from the tracing value, only two flags are set in quick_enter(): the initial one (0x000100000000L) and the one (0x000400000000L) that shows another cmpxchg_ptr() call worked. Hey now this is just strange. fast_lock() made a successful cmpxchgptr() call so the _owner field should have already been the current thread AND we should have taken the recursive enter and set that tracing flag (0x000200000000L), but we didn't. quick_enter() just set a NULL _owner field to the current thread via cmpxchg_ptr() and returned true. How did quick_enter() see a NULL _owner field after it was already set by fast_lock()? fast_lock() set the displaced mark word to 0x3 and quick_enter() doesn't touch it so how did it get to be NULL when ObjectMonitor::notifyAll() is called? The second failure sighting has the same dcubed_jme_last_trace_points as the first so I won't repeat the decode. 
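Before moving on, restating the first-failure contradiction as plain code (a sketch using std::atomic as a stand-in for the _owner field and Atomic::cmpxchg_ptr; not HotSpot code):

#include <atomic>
#include <cassert>

struct Thread;   // opaque stand-in for JavaThread

std::atomic<Thread*> owner{nullptr};  // stand-in for ObjectMonitor::_owner

// What both fast_lock()'s cmpxchgptr() and quick_enter()'s
// Atomic::cmpxchg_ptr() effectively do: install self iff _owner is NULL.
bool try_own(Thread* self) {
  Thread* expected = nullptr;
  return owner.compare_exchange_strong(expected, self);
}

void demonstrate(Thread* self) {
  bool fast_lock_ok   = try_own(self);  // models fast_lock()'s CAS
  bool quick_enter_ok = try_own(self);  // models quick_enter()'s CAS
  // With nothing in between clearing _owner, the second CAS must fail,
  // yet the trace bits say both succeeded.
  assert(!(fast_lock_ok && quick_enter_ok));
}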
The third failure sighting is different than the first two: XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7bfdfa6c08, mon=0x0000000002167300, dcubed_jme_last_trace_points=0x0000015d00012866, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-4 - mon[0]={lock=0xfffffd7fc0de45d0, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7bfdfa6c08} XXX-7 - thread=0x000000000193d800, obj=0xfffffd7bfdfa6c08, tracePoints=0x62 XXX-7 - monitor=0x0000000002167300, tracePoints=0x62 dcubed_mon=0x0000000002167300: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=8, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x000000000193d800, dcubed_omna_target_thread=0x0000000000ac2800 INFO: unexpected locked object: - locked <0xfffffd7bfdfa6c08> (a java.util.stream.Nodes$ToArrayTask$OfLong) INFO: uo_last_trace_points=0x62 INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2203), pid=7501, tid=83 # fatal error: exiting JavaThread=0x000000000193d800 unexpectedly owns ObjectMonitor=0x0000000002167300 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_02_09_16_09-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2016_02_09_16_09-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) The new tracing flag value for the third failure is: dcubed_jme_last_trace_points=0x0000015d00012866 Right to left, these bits translate as: 0x00000000002 - // Mark that we came from MacroAssembler::fast_lock(). orptr(tracePoints, 0x00000002); 0x00000000004 - // Record that we have called ObjectMonitor::enter() jt->add_dcubed_jme_last_trace_points(this, 0x00000004); 0x00000000020 - // Record that we didn't take the force slow-path branch orptr(tracePoints, 0x00000020); 0x00000000040 - // Record that biased_locking_enter() didn't take the 'DONE' label. orptr(tracePoints, 0x00000040); 0x00000000800 - // Record that we're in the inflated block orptr(tracePoints, 0x00000800); 0x00000002000 - // Record that we returned success from fast_lock orptr(tracePoints, 0x00002000); 0x00000010000 - // Record that a simple _owner cmpxchg worked jt->add_dcubed_jme_last_trace_points(this, 0x00010000); 0x00100000000 - // Record that we called ObjectSynchronizer::quick_enter() jt->add_dcubed_jme_last_trace_points(Lock, 0x000100000000L); 0x00400000000 - // Record that we grabbed the ObjectMonitor with cmpxhg() jt->add_dcubed_jme_last_trace_points(m, 0x000400000000L); 0x00800000000 - // Record that ObjectSynchronizer::quick_enter() for ObjectMonitor failed jt->add_dcubed_jme_last_trace_points(m, 0x000800000000L); 0x01000000000 - // Record that we called ObjectSynchronizer::fast_enter() jt->add_dcubed_jme_last_trace_points(lock, 0x001000000000L); 0x04000000000 - // Record that bias was not revoked (!safepoint) jt->add_dcubed_jme_last_trace_points(lock, 0x004000000000L); 0x10000000000 - // Record that we called ObjectSynchronizer::slow_enter() jt->add_dcubed_jme_last_trace_points(lock, 0x010000000000L); This last trace is strange; there are a couple of points that indicate that the Java Monitor was successfully grabbed and that shouldn't be the case. 
Will analyze and add another note. Update: The first failure in this set of notes is giving me plenty to investigate so I won't be chasing down this one (for now).
11-02-2016

A one-line change relative to the previous experiment:

diff -r 9e7d1e562f69 src/share/vm/runtime/basicLock.hpp
--- a/src/share/vm/runtime/basicLock.hpp
+++ b/src/share/vm/runtime/basicLock.hpp
> @@ -59,6 +69,7 @@ class BasicObjectLock VALUE_OBJ_CLASS_SP
> private:
> BasicLock _lock; // the lock, must be double word aligned
> oop _obj; // object holds the lock;
> +int _dcubed_dummy_space;
>
> public:
> // Manipulation

Doing two parallel product bits runs and so far 300+ iterations without a failure. Gonna take a closer look at how the interpreted frame is copied over to the compiler frame while this experiment is chunking away...

Update: Instance #1 failed in the usual way at iteration #375 and instance #2 failed in the usual way at iteration #1317. The next logical step would be to move the dummy field to the end of BasicLock, but that would require enabling more code which would muddy the waters. Time to return to code path analysis.
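For context on why a one-word pad is an interesting probe: BasicObjectLock is the element type used for the monitor slots in interpreter frames, so growing it changes the stride of the monitor block that gets copied around. A size sketch with simplified stand-in types (not the real classes):

#include <cstdio>

// Simplified stand-ins that mirror the shape of BasicLock/BasicObjectLock.
struct BasicLockSketch       { void* displaced_header; };
struct BasicObjectLockSketch {
  BasicLockSketch lock;
  void*           obj;
  int             dcubed_dummy_space;  // the experimental pad from the diff above
};

int main() {
  // On LP64, the pad (plus tail alignment) grows each monitor slot in this
  // sketch from 16 to 24 bytes, which shifts anything that indexes the
  // monitor block by element.
  printf("sizeof(BasicObjectLockSketch) = %zu\n", sizeof(BasicObjectLockSketch));
  return 0;
}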
05-02-2016

Back to feeling the love with this bug! Did an experiment over lunch with the #ifdef DCUBED_BOLN_DEBUG code disabled. Did two parallel product bits runs and got two failures: - instance #1 failed in run #63 XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7bf43df540, mon=0x00000000035f7b80, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-4 - mon[0]={lock=0xfffffd7fc0fe6710, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7bf43df540} XXX-7 - thread=0x0000000002697000, obj=0xfffffd7bf43df540, tracePoints=0x62 XXX-7 - monitor=0x00000000035f7b80, tracePoints=0x62 dcubed_mon=0x00000000035f7b80: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=18, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000002697000, dcubed_omna_target_thread=0x0000000000aba800 INFO: unexpected locked object: - locked <0xfffffd7bf43df540> (a java.util.stream.SliceOps$SliceTask) INFO: uo_last_trace_points=0x62 INFO: uo_bo_locks[0] = {bo_lock=0xfffffd7fc0fe64b8, obj=0xfffffd7bc0000000, trace_points=0xfe1} INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2143), pid=28972, tid=81 # fatal error: exiting JavaThread=0x0000000002697000 unexpectedly owns ObjectMonitor=0x00000000035f7b80 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_02_04_12_06-b00) - instance #2 failed in run #11 XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7be40a9ce0, mon=0x00000000018dcd00, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-4 - mon[0]={lock=0xfffffd7fc0ee56e0, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7be40a9ce0} XXX-7 - thread=0x0000000001cea000, obj=0xfffffd7be40a9ce0, tracePoints=0x62 XXX-7 - monitor=0x00000000018dcd00, tracePoints=0x62 dcubed_mon=0x00000000018dcd00: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=26, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000001cea000, dcubed_omna_target_thread=0x0000000000a51000 INFO: unexpected locked object: - locked <0xfffffd7be40a9ce0> (a java.util.stream.Nodes$CollectorTask$OfInt) INFO: uo_last_trace_points=0x62 INFO: uo_bo_locks[0] = {bo_lock=0xfffffd7fc0ee5340, obj=0x000000000081dd90, trace_points=0xfe1} INFO: uo_bo_locks[1] = {bo_lock=0xfffffd7fc0ee53d0, obj=0x0000000000000000, trace_points=0xfe1} INFO: uo_bo_locks[2] = {bo_lock=0xfffffd7fc0ee53e0, obj=0xfffffd7f00000052, trace_points=0xfe1} INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2143), pid=9211, tid=82 # fatal error: exiting JavaThread=0x0000000001cea000 unexpectedly owns ObjectMonitor=0x00000000018dcd00 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_02_04_12_06-b00) It's almost like this bug missed me! 
Next experiment will be to add an unused word to the end of BasicObjectLock (not BasicLock) to see whether the extra space is what made the bug stop reproducing. Can't add a dummy word to the end of BasicLock without reenabling more logic because that would put the dummy word between the displaced mark word and the object ref. There is code that assumes those two fields are adjacent... sigh...
04-02-2016

Added more debug and tracing code to track the code that is setting the displaced_header field in BasicLock. New code is in both assembly and C++ code paths for the interpreter, C1 and C2. This latest addition has made the failure stop reproducing. Originally I added the code under the (huge) DCUBED_UNLOCK_OBJECT_DEBUG #ifdef because it is part of the overall chase for where we lose an object unlock. I've moved most of the new additions to DCUBED_BOLN_DEBUG (BasicObjectLock NULL debugging). There are a few size related changes that need to be made independent of this bug fix so those aren't #ifdef'ed. Verifying that the bug reproduces again.
04-02-2016

Adding more tracing code to ObjectMonitor::notifyAll() in order to verify how many Java Monitors the target thread has locked. Got a couple of failures in my Friday night testing: XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': obj=0xfffffd7be9911e18, mon=0x0000000001eb2a80, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-4 - mon[0]={lock=0xfffffd7bafdfe750, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7be9911e18} XXX-7 - thread=0x0000000000db9000, obj=0xfffffd7be9911e18, tracePoints=0x62 XXX-7 - monitor=0x0000000001eb2a80, tracePoints=0x62 dcubed_mon=0x0000000001eb2a80: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=13, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000000db9000, dcubed_omna_target_thread=0x0000000000aca800 INFO: unexpected locked object: - locked <0xfffffd7be9911e18> (a java.util.stream.Nodes$ToArrayTask$OfDouble) INFO: uo_last_trace_points=0x62 INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2134), pid=16792, tid=109 # fatal error: exiting JavaThread=0x0000000000db9000 unexpectedly owns ObjectMonitor=0x0000000001eb2a80 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2016_01_29_13_53-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2016_01_29_13_53-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) Some quick notes for what I can decipher off the top of my head: - The guarantee() failure shows the target thread tried to exit the VM while it owned the java.util.stream.Nodes$ToArrayTask$OfDouble monitor. - The XXX-4 line shows that we executed ObjectMonitor::notifyAll() in java/util/concurrent/ForkJoinTask.setCompletion() on our target object from a compiled frame; the obj=0xfffffd7be9911e18 output is new for that debug line - The second XXX-4 line which is also new: XXX-4 - mon[0]={lock=0xfffffd7bafdfe750, dmw=0x0000000000000000, elim=false, replaced=false, owner=0xfffffd7be9911e18} shows that our thread has one Java Monitor locked in this frame and the displaced mark word is zero. The fact that I can see dmw=0x0000000000000000 from ObjectMonitor::notifyAll() means that the errant setting of the displaced mark word happened between the Java Monitor enter and the Object.notifyAll() call. This rules out a problem in the Java Monitor exit path (even though it is the exit path that is messed up by the errant displaced mark word value).
01-02-2016

Now that's a depressing thought! I'm currently mulling over how to instrument a sanity check that verifies the number of lock entries a thread has for our target Java monitor. If we execute a double-enter due to an error in the compiled code, then we should have two BasicObjectLocks on the Java thread's stack.
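For reference, here is a minimal Java sketch (names invented for illustration) of what a legitimate double-enter looks like at the source level: nested synchronized blocks on the same object compile to two monitorenter/monitorexit pairs, and in an interpreted frame each enter gets its own BasicObjectLock slot. The suspicion above is that compiled code might produce the second enter without any such source-level nesting.

public class DoubleEnterExample {
    private final Object lock = new Object();

    void doubleEnter() {
        synchronized (lock) {          // first monitorenter: lock slot #1
            synchronized (lock) {      // second, recursive monitorenter: lock slot #2
                // While we are here, an interpreted frame holds two
                // BasicObjectLock entries that both point at 'lock'.
            }                          // inner monitorexit
        }                              // outer monitorexit
    }

    public static void main(String[] args) {
        new DoubleEnterExample().doubleEnter();
    }
}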
22-01-2016

Does your logging give any indication that we actually did a recursive lock, even if the code doesn't perform it? I'm wondering if the bug may be at the "front-end" and we lock twice due to an incorrect action of the compiled code.
22-01-2016

I'm shaking out the new tracing code that I've added to InterpreterMacroAssembler::unlock_object() and MacroAssembler::fast_unlock(). I wasn't expecting to get a hit so soon since I just started testing this morning. Here's some debug output: XXX-4 - 'java/util/concurrent/ForkJoinTask.setCompletion': mon=0x00000000014c4200, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX-7 - thread=0x00000000012cd000, obj=0xfffffd7bfd7f72d8, tracePoints=0x62 XXX-7 - monitor=0x00000000014c4200, tracePoints=0x62 dcubed_mon=0x00000000014c4200: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=9, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x00000000012cd000, dcubed_omna_target_th read=0x0000000000ab2800 INFO: unexpected locked object: - locked <0xfffffd7bfd7f72d8> (a java.util.stream.Nodes$CollectorTask$OfLong) INFO: uo_last_trace_points=0x62 INFO: uo_bo_locks[0] = {bo_lock=0xfffffd7fc0de44d0, obj=0xfffffd7b40006970, trace_points=0xfe1} INFO: unlock_object_trace_points=0xffffffff INFO: thread->dcubed_unlock_object_last_trace_points=0x62 INFO: mid->dcubed_uo_last_trace_points=0x62 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2134), pid=18544, tid=83 # fatal error: exiting JavaThread=0x00000000012cd000 unexpectedly owns ObjectMonitor=0x00000000014c4200 Some quick notes for what I can decipher off the top of my head: - The guarantee() failure shows the target thread tried to exit the VM while it owned the java.util.stream.Nodes$CollectorTask$OfLong monitor. - The XXX-4 line shows that we executed ObjectMonitor::notifyAll() in java/util/concurrent/ForkJoinTask.setCompletion() on our target object from a compiled frame - The XXX-7 line shows us the code path through MacroAssembler::fast_unlock() was recorded with these trace points: 0x62 which decode as follows: - // Mark that we came from MacroAssembler::fast_unlock(). orptr(tracePoints, 0x00002); - // Record that we didn't take the force slow-path branch orptr(tracePoints, 0x00020); - // Record that biased_locking_exit() didn't take the 'DONE' label. orptr(tracePoints, 0x00040); The lack of any more flags shows that we bailed at this point in the code: cmpptr(Address(boxReg, 0), (int32_t)NULL_WORD); // Examine the displaced header // DCUBED - We're jumping to DONE_LABEL with icc.ZFlag==1 which indicates // DCUBED - success so we do nothing for exiting a recursive stack lock. // DCUBED - That sounds good, but is there a race with the inflation // DCUBED - code where the displaced header is temporarily set to NULL? jcc (Assembler::zero, DONE_LABEL); // 0 indicates recursive stack-lock #ifdef DCUBED_UNLOCK_OBJECT_DEBUG // Record that we didn't take the recursive case. orptr(tracePoints, 0x00080); #endif The problem with bailing out here where this code thinks we have a recursive stack lock is that we just came from an ObjectMonitor::notifyAll() where we had to be inflated to do all the notification administration stuff. This should not have been possible which is why this bug is a race... Update: That last comment probably isn't clear. The code above the 0x00080 is the normal place for exiting a recursive lock. In the failing test program, the stranded lock is not used in a recursive way so we should not be taking that code path with this particular lock.
16-01-2016

In the spirit of double-checking what you "know", I've added "guarantee(false)" calls controlled by UseNewCode[2-5] options to generate stack traces so I can verify that the call stacks are what I expected.

UseNewCode2 - combined with a check in ObjectMonitor::notifyAll() for a compiled java/util/concurrent/ForkJoinTask.setCompletion() frame; the hs_err_pid stack is more complete than the dbx thread dump, dbx appears to get confused by compiled frames below the one that called into the VM, but it all looks pretty much as expected

UseNewCode3 - combined with a check in ObjectMonitor::notifyAll() for an interpreted java/util/concurrent/ForkJoinTask.setCompletion() frame; the hs_err_pid stack and dbx thread dump show similar stacks and all looks pretty much as expected

UseNewCode4 - combined with a check in InterpreterRuntime::trace_unlock_object() for a compiled java/util/concurrent/ForkJoinTask.setCompletion() frame; unlike the ObjectMonitor::notifyAll() experiment (UseNewCode2), this trap was never hit by our test program.

UseNewCode5 - combined with a check in InterpreterRuntime::trace_unlock_object() for an interpreted java/util/concurrent/ForkJoinTask.setCompletion() frame; the hs_err_pid stack and dbx thread dump show similar stacks and all looks pretty much as expected

Way back in the notes is some analysis of the C2 compiled code for java/util/concurrent/ForkJoinTask.setCompletion(). It's pointed out in that analysis that there is no C2 code for the following line: synchronized (this) { notifyAll(); } and the explanation is that we hit an uncommon trap which results in that line of Java code being executed by the interpreter. Based on the results of my UseNewCode4 experiment, it looks like that statement is not necessarily correct in all cases. Next step is to find the other places where unlock_object() is implemented and add tracking like I did in InterpreterMacroAssembler::unlock_object(). It could be that I've been looking in the wrong places for this bug...
03-12-2015

Did an experiment late on Wed 11.25 and let it run through Thanksgiving. Here's the second instance's results: XXX - 'java/util/concurrent/ForkJoinTask.setCompletion': mon=0x000000000093d080, is_entry=0, is_java=1, is_interpreted=1, is_compiled=0, bci=40 XXX - bcp=0xfffffd7bafac6168 XXX bcp[0] = b6 XXX bcp[1] = 05 XXX bcp[2] = 00 XXX bcp[3] = 2d XXX bcp[4] = c3 INFO: deflate_idle_monitors(): mon=0x000000000093d200 INFO: deflate_idle_monitors(): mon=0x000000000093d380 INFO: deflate_idle_monitors(): mon=0x000000000093d500 INFO: deflate_idle_monitors(): mon=0x000000000093d680 INFO: deflate_idle_monitors(): mon=0x000000000093d800 INFO: deflate_idle_monitors(): mon=0x000000000093d980 INFO: deflate_idle_monitors(): mon=0x000000000093db00 INFO: deflate_idle_monitors(): mon=0x000000000093dc80 INFO: deflate_idle_monitors(): mon=0x000000000093de00 INFO: deflate_idle_monitors(): mon=0x000000000093df80 INFO: deflate_idle_monitors(): mon=0x000000000093e100 INFO: deflate_idle_monitors(): mon=0x000000000093e280 INFO: deflate_idle_monitors(): mon=0x000000000093e400 INFO: deflate_idle_monitors(): mon=0x000000000093e580 INFO: deflate_idle_monitors(): mon=0x000000000093e700 INFO: deflate_idle_monitors(): mon=0x000000000093e880 INFO: deflate_idle_monitors(): mon=0x000000000093ea00 INFO: deflate_idle_monitors(): mon=0x000000000093eb80 INFO: deflate_idle_monitors(): mon=0x000000000093ed00 INFO: deflate_idle_monitors(): mon=0x000000000093ee80 INFO: deflate_idle_monitors(): mon=0x000000000093f000 INFO: deflate_idle_monitors(): mon=0x000000000093f900 INFO: Deflate: InCirc=128 InUse=4 Scavenged=22 ForceMonitorScavenge=0 : pop=127 free=22 INFO: Deflate: InCirc=128 InUse=4 Scavenged=0 ForceMonitorScavenge=0 : pop=127 free=22 dcubed_mon=0x000000000093d080: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=1, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x00000000026df000, dcubed_omna_target_thread=0x0000000000ab2800 INFO: unexpected locked object: - locked <0xfffffd7bfd004228> (a java.util.stream.Nodes$SizedCollectorTask$OfRef) INFO: uo_bo_locks[0] = {bo_lock=0xfffffd7fc03d9ef0, obj=0xfffffd7ffe8e6388, trace_points=0xfe} INFO: uo_bo_locks[1] = {bo_lock=0xfffffd7fc03d9f00, obj=0xfffffd7ffef25c00, trace_points=0xce} INFO: uo_bo_locks[2] = {bo_lock=0xfffffd7fc03d9fa8, obj=0x00000000009d1cd0, trace_points=0xce} INFO: uo_bo_locks[3] = {bo_lock=0xfffffd7fc03da0a0, obj=0xfffffd7fc03da8b0, trace_points=0xfe} INFO: uo_bo_locks[4] = {bo_lock=0xfffffd7fc03da120, obj=0x000a7d656678303d, trace_points=0xce} INFO: uo_bo_locks[5] = {bo_lock=0xfffffd7fc03da2e0, obj=0xfffffd7ffe4b8d4c, trace_points=0xfe} INFO: uo_bo_locks[6] = {bo_lock=0xfffffd7fc03da530, obj=0x0000000000000000, trace_points=0xc6} INFO: uo_bo_locks[7] = {bo_lock=0xfffffd7fc03da5b8, obj=0x0000000000000000, trace_points=0xce} INFO: unlock_object_trace_point=0xffffffff # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2128), pid=24692, tid=93 # fatal error: exiting JavaThread=0x00000000026df000 unexpectedly owns ObjectMonitor=0x000000000093d080 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internaldcubed_2015_11_25_13_05-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_11_25_13_05-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) Observations about the second instance: - this 
failure caught an interpreted frame in notifyAll()/INotify() tracing: 'java/util/concurrent/ForkJoinTask.setCompletion': mon=0x000000000093d080, is_entry=0, is_java=1, is_interpreted=1, is_compiled=0, bci=40 - the bytecodes are as we expect (current and next two) - the target monitor (mon=0x000000000093d080) does not appear in the deflate_idle_monitors() output list because we don't have one in between notifyAll() and thread exit - our object <0xfffffd7bfd004228> (a java.util.stream.Nodes$SizedCollectorTask$OfRef) does not appear in the uo_bo_locks list so again it looks like we never called 'monitorexit' on that object - deoptimization of java/util/concurrent/ForkJoinTask.setCompletion shows up in the hs_err_pid log for this instance which is likely why we see the frame as 'interpreted' Here's the first instance's results: XXX - 'java/util/concurrent/ForkJoinTask.setCompletion': mon=0x00000000030a6680, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX - bcp=0xfffffd7bafac6168 XXX bcp[0] = b6 XXX bcp[1] = 05 XXX bcp[2] = 00 XXX bcp[3] = 2d XXX bcp[4] = c3 dcubed_mon=0x00000000030a6680: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=10, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x00000000033cb800, dcubed_omna_target_thread=0x0000000000aaa800 INFO: unexpected locked object: - locked <0xfffffd7be4e35e48> (a java.util.stream.Nodes$CollectorTask$OfInt) INFO: uo_bo_locks[0] = {bo_lock=0xfffffd7fbf5cc410, obj=0xfffffd7baf42db00, trace_points=0xfe} INFO: uo_bo_locks[1] = {bo_lock=0xfffffd7fbf5cc630, obj=0xfffffd7fbf5cc728, trace_points=0xc6} INFO: uo_bo_locks[2] = {bo_lock=0xfffffd7fbf5cc6b8, obj=0xfffffd7fbf5cc788, trace_points=0xce} INFO: unlock_object_trace_point=0xffffffff # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2128), pid=26883, tid=107 # fatal error: exiting JavaThread=0x00000000033cb800 unexpectedly owns ObjectMonitor=0x00000000030a6680 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_11_25_13_05-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_11_25_13_05-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) Observations about the first instance: - this failure caught a compiled frame in notifyAll()/INotify() tracing: 'java/util/concurrent/ForkJoinTask.setCompletion': mon=0x00000000030a6680, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 - the bytecodes are as we expect (current and next two) - the target monitor (mon=0x000000000093d080) does not appear in the deflate_idle_monitors() output list - our object <0xfffffd7be4e35e48> (a java.util.stream.Nodes$CollectorTask$OfInt) does not appear in the uo_bo_locks list so again it looks like we never called 'monitorexit' on that object - deoptimization of java/util/concurrent/ForkJoinTask.setCompletion does NOT show up in the hs_err_pid log for this instance which is likely why we see the frame as 'compiled' - our target thread (JavaThread=0x00000000033cb800) does show as deoptimizing java.util.concurrent.ForkJoinPool.runW orker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V
02-12-2015

Continuing down the path of figuring out why the 'monitorexit' bytecode does not appear to be executed very rarely... Here's the 'javap -v -l -p' output for ava/util/concurrent/ForkJoinTask.setCompletion() private int setCompletion(int); descriptor: (I)I flags: ACC_PRIVATE Code: stack=7, locals=5, args_size=2 0: aload_0 1: getfield #2 // Field status:I 4: dup 5: istore_2 6: ifge 11 9: iload_2 10: ireturn 11: getstatic #3 // Field U:Lsun/misc/Unsafe; 14: aload_0 15: getstatic #4 // Field STATUS:J 18: iload_2 19: iload_2 20: iload_1 21: ior 22: invokevirtual #5 // Method sun/misc/Unsafe.compareAndSwapInt:(Ljava/lang/Object;JII)Z 25: ifeq 0 28: iload_2 29: bipush 16 31: iushr 32: ifeq 55 35: aload_0 36: dup 37: astore_3 38: monitorenter 39: aload_0 40: invokevirtual #6 // Method java/lang/Object.notifyAll:()V 43: aload_3 44: monitorexit 45: goto 55 48: astore 4 50: aload_3 51: monitorexit 52: aload 4 54: athrow 55: iload_1 56: ireturn Exception table: from to target type 39 45 48 any 48 52 48 any LineNumberTable: line 260: 0 line 261: 9 line 262: 11 line 263: 28 line 264: 35 line 265: 55 StackMapTable: number_of_entries = 4 frame_type = 0 /* same */ frame_type = 252 /* append */ offset_delta = 10 locals = [ int ] frame_type = 255 /* full_frame */ offset_delta = 36 locals = [ class java/util/concurrent/ForkJoinTask, int, int, class java/lang/Object ] stack = [ class java/lang/Throwable ] frame_type = 250 /* chop */ offset_delta = 6 So for this line of code: synchronized (this) { notifyAll(); } these are the bytecodes: 35: aload_0 36: dup 37: astore_3 38: monitorenter 39: aload_0 40: invokevirtual #6 // Method java/lang/Object.notifyAll:()V 43: aload_3 44: monitorexit 45: goto 55 48: astore 4 50: aload_3 51: monitorexit Based on various debug sessions, it looks like java/lang/Object.notifyAll() is working as expected (ObjectMonitor::notifyAll() and ObjectMonitor::INotify()) so I've started layering in my next debug hooks there. 
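To make that control flow easier to follow, here is the same method read back into source form. This is my reconstruction from the javap listing above, not a verbatim copy of the JDK sources, so treat the exact shape as approximate; it is a fragment of ForkJoinTask and refers to its status, U and STATUS fields rather than standing alone.

// Reconstructed from the bytecode above; not copied from the JDK sources.
private int setCompletion(int completion) {
    for (int s;;) {
        if ((s = status) < 0)                                        // bci 0-10: already completed
            return s;
        if (U.compareAndSwapInt(this, STATUS, s, s | completion)) {  // bci 11-25: CAS, retry on failure
            if ((s >>> 16) != 0)                                     // bci 28-32: someone is waiting
                synchronized (this) {                                // bci 38: monitorenter
                    notifyAll();                                     // bci 40: invokevirtual
                }                                                    // bci 44 (or 51 on the exception path): monitorexit
            return completion;                                       // bci 55-56
        }
    }
}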
Some preliminary output: XXX - 'java/util/concurrent/ForkJoinTask.setCompletion': mon=0x00000000013a7400, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX - bcp=0xfffffd7bafac0750 XXX bcp[0] = b6 XXX bcp[1] = 05 XXX bcp[2] = 00 XXX bcp[3] = 2d XXX bcp[4] = c3 INFO: Deflate: InCirc=5120 InUse=4 Scavenged=1 ForceMonitorScavenge=0 : pop=5080 free=4032 dcubed_mon=0x00000000013a7400: dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=5, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000001022000, dcubed_omna_target_thread=0x0000000000ab3800 INFO: unexpected locked object: - locked <0xfffffd7bea622080> (a java.util.stream.Nodes$ToArrayTask$OfDouble) INFO: uo_bo_locks[0] = {bo_lock=0xfffffd7fbfbd2008, obj=0x0000000000419d60, trace_points=0xce} INFO: uo_bo_locks[1] = {bo_lock=0xfffffd7fbfbd2018, obj=0x0000000000419d60, trace_points=0xce} INFO: uo_bo_locks[2] = {bo_lock=0xfffffd7fbfbd20c0, obj=0xfffffd7fbfbd21d0, trace_points=0xce} INFO: uo_bo_locks[3] = {bo_lock=0xfffffd7fbfbd21b8, obj=0xfffffd7fbfbd2100, trace_points=0xfe} INFO: uo_bo_locks[4] = {bo_lock=0xfffffd7fbfbd2238, obj=0x303d646165726874, trace_points=0xce} INFO: uo_bo_locks[5] = {bo_lock=0xfffffd7fbfbd2630, obj=0x0000000000000000, trace_points=0xc6} INFO: uo_bo_locks[6] = {bo_lock=0xfffffd7fbfbd26b8, obj=0x0000000000000000, trace_points=0xce} INFO: unlock_object_trace_point=0xffffffff # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2126), pid=29966, tid=101 # fatal error: exiting JavaThread=0x0000000001022000 unexpectedly owns ObjectMonitor=0x00000000013a7400 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_11_25_13_05-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_11_25_13_05-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) Very similar story as before: - exiting thread 0x0000000001022000 owns an ObjectMonitor (0x00000000013a7400) associated with object 0xfffffd7bea622080 (a java.util.stream.Nodes$ToArrayTask$OfDouble) - the exiting thread called 'monitorexit' on 7 objects and none of those objects match the stranded object (0xfffffd7bea622080) New info this round: XXX - 'java/util/concurrent/ForkJoinTask.setCompletion': mon=0x00000000013a7400, is_entry=0, is_java=1, is_interpreted=0, is_compiled=1, bci=40 XXX - bcp=0xfffffd7bafac0750 XXX bcp[0] = b6 XXX bcp[1] = 05 XXX bcp[2] = 00 XXX bcp[3] = 2d XXX bcp[4] = c3 So the most recent call to ava/util/concurrent/ForkJoinTask.setCompletion() that made a notifyAll() call on our ObjectMonitor (0x00000000013a7400) came from a compiled Java frame executing at BCI == 40. The bytecode at that spot look like (with annotations) XXX bcp[0] = b6 // invokevirtual bytecode XXX bcp[1] = 05 // invokevirtual (cont) XXX bcp[2] = 00 // invokevirtual (cont) XXX bcp[3] = 2d // aload_3 XXX bcp[4] = c3 // monitorexit One interesting thing is that javaVFrame::is_compiled() still returns true/1 for our frame. Another interesting thing is that this message: INFO: Deflate: InCirc=5120 InUse=4 Scavenged=1 ForceMonitorScavenge=0 : pop=5080 free=4032 happened after notifyAll() was called and before our JavaThread tried to exit with the stranded ObjectMonitor.
01-12-2015

More debugging and fixing of the DCUBED_UNLOCK_OBJECT_DEBUG code. Here's what a current experiment with DCUBED_UNLOCK_OBJECT_DEBUG and the revamped DCUBED_OMN_DEBUG enabled. The fix for "fast notify" refactored the ObjectMonitor::notify() and ObjectMonitor::notifyAll() code to share a common ObjectMonitor::INotify() function. I merged DCUBED_OMN_DEBUG and DCUBED_OMNA_DEBUG code under the single DCUBED_OMN_DEBUG option. Here's a sample: INFO: Deflate: InCirc=5248 InUse=4 Scavenged=1170 ForceMonitorScavenge=0 : pop=5207 free=3503 dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=3, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000001880000, dcubed_omna_target_thread=0x0000000000ab2800 INFO: unexpected locked object: - locked <0xfffffd7bf251d320> (a java.util.stream.Nodes$ToArrayTask$OfRef) INFO: uo_bo_locks[0] = {bo_lock=0xfffffd7fbf8cf470, obj=0xfffffd7bf3b106f8, trace_points=0xfe} INFO: uo_bo_locks[1] = {bo_lock=0xfffffd7fbf8cf730, obj=0x0000000000000000, trace_points=0xc6} INFO: uo_bo_locks[2] = {bo_lock=0xfffffd7fbf8cf7b8, obj=0x0000000000000000, trace_points=0xce} INFO: unlock_object_trace_point=0xffffffff # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2123), pid=11719, tid=104 # fatal error: exiting JavaThread=0x0000000001880000 unexpectedly owns ObjectMonitor=0x0000000003b7c700 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcube d_2015_11_20_10_26-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_11_20_1 0_26-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) Again, the target thread (JavaThread=0x0000000001880000) only exited three Java Monitors and none matches the stranded object (0xfffffd7bf251d320). Two of the three BasicObjectLock entries have a zero for the object ref. Looks like those objects have been GC'ed. Update: Just realized that I never talked about the revamped DCUBED_OMN_DEBUG code and the resulting output: dcubed_omna_ticket=3, dcubed_omna_call_path=0x35, dcubed_omna_loop_cnt=1, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x0000000001880000, dcubed_omna_target_thread=0x0000000000ab2800 What the above debug info means is that the target thread (0x0000000001880000) looped one time in notifyAll() and notified one thread: 0x0000000000ab2800. Since the target thread never called 'monitorexit', the notified thread (0x0000000000ab2800) would be the hung thread in the original sighting of this bug.
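To tie this back to the original hang symptom: a thread that has been notified still has to re-acquire the monitor before its Object.wait() can return, so if the notifier's monitorexit is effectively lost, the notified thread stays blocked. Here is a small stand-alone Java demo of that shape; it deliberately simulates the lost unlock by simply continuing to hold the lock (all names are invented, and the real bug obviously cannot be reproduced from pure Java like this):

public class StrandedNotifyDemo {
    public static void main(String[] args) throws InterruptedException {
        final Object task = new Object();

        Thread waiter = new Thread(() -> {
            synchronized (task) {
                try {
                    task.wait();   // parks here until notified, then must re-acquire 'task'
                } catch (InterruptedException ignored) { }
            }
            System.out.println("waiter got the monitor back and finished wait()");
        }, "waiter");
        waiter.start();
        Thread.sleep(500);         // crude: give the waiter time to reach wait()

        synchronized (task) {
            task.notifyAll();      // analogous to setCompletion()'s notifyAll()
            Thread.sleep(2000);    // keep holding the monitor, as if the monitorexit were lost
            // The notified waiter cannot make progress; it is typically BLOCKED re-entering 'task'.
            System.out.println("waiter state while we still own the monitor: " + waiter.getState());
        }
        waiter.join();             // once we finally release, the waiter completes normally
    }
}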
25-11-2015

Based on earlier analysis we know that java/util/concurrent/ForkJoinTask.setCompletion was last compiled by C2 and that the critical line of Java code: synchronized (this) { notifyAll(); } is not included in that compilation and instead calls a DeoptBlob. This should switch the thread from executing C2 compiled code to executing interpreter code. I thought of a way to record trace points in InterpreterMacroAssembler::unlock_object() and ran another parallel experiment with those bits. Here's the new debug output for instance #1: INFO: unexpected locked object: - locked <0xfffffd7bfd029128> (a java.util.stream.Nodes$SizedCollectorTask$OfRef) INFO: unlock_object_trace_point=0xffffffff INFO: uo_objs[0] = {obj=0xfffffd7bfe320638, trace_points=0x2} INFO: uo_objs[1] = {obj=0xfffffd7c00092e68, trace_points=0x2} INFO: uo_objs[2] = {obj=0xfffffd7c00001c08, trace_points=0x2} INFO: uo_objs[3] = {obj=0xfffffd7bfdbf7648, trace_points=0x2} INFO: uo_objs[4] = {obj=0xfffffd7bfe329050, trace_points=0x2} INFO: uo_objs[5] = {obj=0xfffffd7bfe3294d0, trace_points=0x2} INFO: uo_objs[6] = {obj=0xfffffd7bfd04c5f8, trace_points=0x2} # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2108), pid=24950, tid=90 # fatal error: exiting JavaThread=0x000000000422a000 unexpectedly owns ObjectMonitor=0x0000000000940a00 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_11_12_08_20-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_11_12_08_20-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) What the above output tells me is that the thread that is exiting while owning the lock for 0xfffffd7bfd029128 has called the interpreter's unlock_object() code on seven Java objects and none of those objects is the one in question. Here's the new debug output for instance #2: INFO: unexpected locked object: - locked <0xfffffd7bfd1209b8> (a java.util.stream.ReduceOps$ReduceTask) INFO: unlock_object_trace_point=0xffffffff INFO: uo_objs[0] = {obj=0xfffffd7bfe325cf0, trace_points=0x2} INFO: uo_objs[1] = {obj=0xfffffd7c00092e68, trace_points=0x2} INFO: uo_objs[2] = {obj=0xfffffd7c00001c08, trace_points=0x2} INFO: uo_objs[3] = {obj=0xfffffd7bfd90e688, trace_points=0x2} INFO: uo_objs[4] = {obj=0xfffffd7bfe32ef10, trace_points=0x2} INFO: uo_objs[5] = {obj=0xfffffd7bfd022f78, trace_points=0x2} # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2108), pid=21896, tid=92 # fatal error: exiting JavaThread=0x00000000019c3800 unexpectedly owns ObjectMonitor=0x000000000093a980 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_11_12_08_20-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_11_12_08_20-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) What the above output tells me is that the thread that is exiting while owning the lock for 0xfffffd7bfd1209b8 has called the interpreter's unlock_object() code on six Java objects and none of those objects is the one in question. The 'trace_points=0x2' output is telling me two things: 1) All of these unlocks took the 'done' branch out of biased_locking_exit() 2) The flag setting code at the bottom of unlock_object() doesn't work the way I thought it did. void InterpreterMacroAssembler::unlock_object(Register lock_reg) { assert(lock_reg == LP64_ONLY(c_rarg1) NOT_LP64(rdx), "The argument is only for looks. 
It must be c_rarg1"); #ifdef DCUBED_UNLOCK_OBJECT_DEBUG // Reset any trace_points associated with lock_reg. lock_reg is saved // and restored in trace_unlock_object(). Other regs have to be saved // and restored depending on the code that follows. trace_unlock_object(lock_reg, 0x0000); #endif if (UseHeavyMonitors) { #ifdef DCUBED_UNLOCK_OBJECT_DEBUG // Record that we took the UseHeavyMonitors branch. trace_unlock_object(lock_reg, 0x0001); push(lock_reg); // preserve lock_reg across call_VM() #endif call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::monitorexit), lock_reg); #ifdef DCUBED_UNLOCK_OBJECT_DEBUG pop(lock_reg); // because we need it at the bottom of this function #endif } else { Label done; #ifdef DCUBED_UNLOCK_OBJECT_DEBUG // Record that we didn't take the UseHeavyMonitors branch. trace_unlock_object(lock_reg, 0x0002); #endif const Register swap_reg = rax; // Must use rax for cmpxchg instruction const Register header_reg = LP64_ONLY(c_rarg2) NOT_LP64(rbx); // Will conta in the old oopMark const Register obj_reg = LP64_ONLY(c_rarg3) NOT_LP64(rcx); // Will conta in the oop save_bcp(); // Save in case of exception // Convert from BasicObjectLock structure to object and BasicLock // structure Store the BasicLock address into %rax lea(swap_reg, Address(lock_reg, BasicObjectLock::lock_offset_in_bytes())); // Load oop into obj_reg(%c_rarg3) movptr(obj_reg, Address(lock_reg, BasicObjectLock::obj_offset_in_bytes())); // Free entry movptr(Address(lock_reg, BasicObjectLock::obj_offset_in_bytes()), (int32_t)NULL_WORD); if (UseBiasedLocking) { biased_locking_exit(obj_reg, header_reg, done); } #ifdef DCUBED_UNLOCK_OBJECT_DEBUG // Record that biased_locking_exit() didn't take the 'done' label. // Have to save more regs that are used later. push(header_reg); push(obj_reg); push(swap_reg); trace_unlock_object(lock_reg, 0x0004); pop(swap_reg); pop(obj_reg); pop(header_reg); #endif <snip> bind(done); restore_bcp(); } #ifdef DCUBED_UNLOCK_OBJECT_DEBUG_XXX // Record that we made it to the bottom of this function. trace_unlock_object(lock_reg, 0x0040); #endif The 0x0040 flag never got set for the objects that we did record so that trace_unlock_object() call isn't working quite right. Crap it just occurred to me that when biased_locking_exit() takes the 'done' label branch, it hasn't restored the lock_reg because I never told that the lock_reg needed to be restored... sigh... Update: That last paragraph explains why the debug output isn't 0x42 (Hitchhiker fan anyone?) which is what I was expecting. I still don't know why monitor exit was never called for the Java object in question...
13-11-2015

While checking out Thread-2's enter-notifyAll-exit code path, I got suspicious of MacroAssembler::biased_locking_exit() since it can cause the exit code path to terminate without doing anything when it detects that Biased Locking is in use. Used this code to make sure that Thread-2's view of memory was up to date:

$ diff 8077392.diag.diff.txt.35 8077392.diag.diff.txt.36
4c4,18
< @@ -1571,6 +1571,9 @@ void MacroAssembler::rtm_inflated_lockin
---
> @@ -1263,6 +1263,13 @@ void MacroAssembler::biased_locking_exit
>  // a higher level. Second, if the bias was revoked while we held the
>  // lock, the object could not be rebiased toward another thread, so
>  // the bias bit would be clear.
> + if (os::is_MP() && (SyncFlags & 256) != 0) {
> +   // Memory barrier/fence
> +   // Instead of MFENCE we use a dummy locked add of 0 to the top-of-stack.
> +   // This is faster on Nehalem and AMD Shanghai/Barcelona.
> +   // See https://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences
> +   lock(); addl(Address(rsp, 0), 0);
> + }
>  movptr(temp_reg, Address(obj_reg, oopDesc::mark_offset_in_bytes()));
>  andptr(temp_reg, markOopDesc::biased_lock_mask_in_place);
>  cmpptr(temp_reg, markOopDesc::biased_lock_pattern);

Did a parallel test run with these bits and the failure reproduced in the same way as previous runs.
05-11-2015

Here's the latest set of relative diagnostics changes that I put in place for test runs from 2015.10.30 -> 2015.11.02: $ diff 8077392.diag.diff.txt.3[23] 935c936,938 < assert(false, "Non-balanced monitor enter/exit! Likely JNI locking"); --- > - assert(false, "Non-balanced monitor enter/exit! Likely JNI locking"); > +// assert(false, "Non-balanced monitor enter/exit! Likely JNI locking"); > +fatal("Non-balanced monitor enter/exit! Likely JNI locking"); 955c958 < @@ -942,10 +1089,16 @@ void NOINLINE ObjectMonitor::exit(bool n --- > @@ -942,10 +1090,17 @@ void NOINLINE ObjectMonitor::exit(bool n 962a966 > +guarantee(THREAD == _owner, "invariant"); 972c976 < @@ -961,6 +1114,10 @@ void NOINLINE ObjectMonitor::exit(bool n --- > @@ -961,7 +1116,12 @@ void NOINLINE ObjectMonitor::exit(bool n 980a985 > +guarantee(THREAD != _owner, "invariant"); 990a997 > +guarantee(THREAD != _owner, "invariant"); 1010a1018 > +guarantee(THREAD != _owner, "invariant"); 1021a1031 > +guarantee(THREAD != _owner, "invariant"); 1034c1045 < @@ -1144,13 +1322,23 @@ void NOINLINE ObjectMonitor::exit(bool n --- > @@ -1144,13 +1328,24 @@ void NOINLINE ObjectMonitor::exit(bool n 1040a1052 > +guarantee(THREAD != _owner, "invariant"); 1068c1080 < @@ -1211,15 +1402,32 @@ void NOINLINE ObjectMonitor::exit(bool n --- > @@ -1211,15 +1409,34 @@ void NOINLINE ObjectMonitor::exit(bool n 1090a1103 > +guarantee(THREAD != _owner, "invariant"); 1097a1111 > +guarantee(THREAD != _owner, "invariant"); 1920c1934 < @@ -444,8 +605,15 @@ ObjectLocker::~ObjectLocker() { --- > @@ -444,15 +605,82 @@ ObjectLocker::~ObjectLocker() { 1929a1944 > - assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now"); 1933c1948,1982 < assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now"); --- > +// assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now"); > + > +// BEGIN - verifying lock/monitor ownership after revoke_and_rebias() > +// > +{ > +guarantee(!obj->mark()->has_bias_pattern(), "biases should be revoked by now"); > +// revoke_and_rebias() will change the biased lock into a stack lock, > +// but the stack lock could be inflated by another thread by the time > +// we get here. > +guarantee(obj->mark()->has_locker() || obj->mark()->has_monitor(), "must be either stack locked or inflated lock"); > +BasicLock * xxx_locker = NULL; > +if (obj->mark()->has_locker()) { > +xxx_locker = obj->mark()->locker(); > +} > +ObjectMonitor * xxx_monitor = NULL; > +if (obj->mark()->has_monitor()) { > +if (xxx_locker != NULL) { > +// since the has_locker() check, the stack lock has been inflated so > +// we can't trust the locker() return value > +xxx_locker = NULL; > +} > +xxx_monitor = obj->mark()->monitor(); > +} > +// If we have a non-NULL xxx_locker, then it should be owned by the > +// calling thread. Even if the stack-lock was inflated by the time > +// we got here. > +guarantee(xxx_locker == NULL || THREAD->is_lock_owned((address)xxx_locker), "BasicLock should be owned by calling thread"); > +// The owner() of the ObjectMonitor can be a BasicLock or the calling > +// thread. IIRC, when the stack lock is inflated by another thread, > +// the ObjectMonitor::_owner is set to the BasicLock. 
> +guarantee(xxx_monitor == NULL || THREAD->is_lock_owned((address)xxx_monitor->owner()) || THREAD == xxx_monitor->owner(), "ObjectMonitor should be owned by calling thread"); > +} > +// > +// END - verifying lock/monitor ownership after revoke_and_rebias() > + 1936c1985 < @@ -453,6 +621,12 @@ int ObjectSynchronizer::wait(Handle obj, --- > TEVENT(wait - throw IAX); 1945a1995,2014 > + > +// BEGIN - verifying monitor ownership after inflate() > +// > +{ > +guarantee(!obj->mark()->has_bias_pattern(), "biases should still be revoked by now"); > +// has_locker is a superset of has_monitor() because lock_mask > +// is a superset of monitor_value so we can't guarantee() that > +// we don't have a stock lock: > +// guarantee(!obj->mark()->has_locker(), "stack lock should be inflated by now"); > +// The stack lock should be inflated now; either by another thread > +// or by our call to inflate() above. > +guarantee(obj->mark()->has_monitor(), "must be inflated lock"); > +// The owner() of the ObjectMonitor can be a BasicLock or the calling > +// thread. IIRC, when the stack lock is inflated by another thread, > +// the ObjectMonitor::_owner is set to the BasicLock. > +guarantee(THREAD->is_lock_owned((address)monitor->owner()) || THREAD == monitor->owner(), "ObjectMonitor should be owned by calling thread"); > +} > +// > +// END - verifying monitor ownership after inflate() > + The purpose of the above diff is to simply record the paranoid guarantee() calls that I added during this round. Both parallel runs failed over the weekend (no surprise): $ tail doit_loop.copy66.log Loop #1628...PASS Loop #1629...PASS Loop #1630...PASS Loop #1631...PASS Loop #1632...PASS Loop #1633...PASS Loop #1634...PASS Loop #1635...PASS Loop #1636...PASS Loop #1637...FAIL $ tail doit_loop.copy67.log Loop #11246...PASS Loop #11247...PASS Loop #11248...PASS Loop #11249...PASS Loop #11250...PASS Loop #11251...PASS Loop #11252...PASS Loop #11253...PASS Loop #11254...PASS Loop #11255...FAIL The purpose of this info is to give some idea about how long repro'ing this failure mode can take. Here's where instance #1 failed: config java.util.stream.LoggingTestCase.before(): success Revoking bias by walking my own stack (dcubed_bl_caller_id=7, dcubed_bl_call_path=600800): Revoking bias of object 0xfffffd7be34098c0, mark 0x0000000000ab8005, type java.util.stream.Nodes$CollectorTask$OfInt, prototype header 0x0000000000000005, allow rebias 0, requesting thread 0x0000000000ab8000 Revoking bias of object biased toward live thread (0x0000000000ab8000) Revoked bias of currently-locked object INFO: unexpected locked object: - locked <0xfffffd7be34098c0> (a java.util.stream.Nodes$CollectorTask$OfInt) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2089), pid=14579, tid=111 # fatal error: exiting JavaThread=0x0000000002ad0000 unexpectedly owns ObjectMonitor=0x0000000002a69300 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_10_30_15_31-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_10_30_15_31-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) The important fact about the above output is the fact none of the new guarantee() calls detected a problem (not really a surprise). 
Here's where instance #2 failed: config java.util.stream.LoggingTestCase.before(): success Revoking bias by walking my own stack (dcubed_bl_caller_id=7, dcubed_bl_call_path=600800): Revoking bias of object 0xfffffd7be373ba88, mark 0x0000000000ab1005, type java.util.stream.Nodes$CollectorTask$OfInt, prototype header 0x0000000000000005, allow rebias 0, requesting thread 0x0000000000ab1000 Revoking bias of object biased toward live thread (0x0000000000ab1000) Revoked bias of currently-locked object INFO: unexpected locked object: - locked <0xfffffd7be373ba88> (a java.util.stream.Nodes$CollectorTask$OfInt) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2089), pid=28243, tid=108 # fatal error: exiting JavaThread=0x00000000025c9000 unexpectedly owns ObjectMonitor=0x00000000062f2300 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_10_30_15_31-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_10_30_15_31-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) Again, the important fact about the above output is the fact none of the new guarantee() calls detected a problem (not really a surprise). I have one last thing to sanity check about when a thread inflates a Java monitor for another thread and then we move on to Thread-2's enter-notifyAll-exit code paths.
02-11-2015

Not quite. The current thread does the revoke, finishes its Object.wait() call and another thread (Thread-B) enters the monitor, does a notifyAll(), and somehow still owns the monitor when Thread-B exits. I'm walking the code paths (again) to see if I can spot the race... I'll have more detailed notes when I'm done...
30-10-2015

I did not pick up on the fact that it was the current thread that owned the lock and then got the assertion failure on termination. :( That leads me to ask: how much intervening execution is there between the bias revocation and the termination?
30-10-2015

Agreed about the block to which the comment you quoted applies. I have similar concerns documented in my "Biased Locking Decision Points" write up. However, for this particular failure mode, the calling thread happens to be the thread towards which the lock is biased so the thread is walking its own stack. Of course, I have more notes about what another thread can be doing to this object while this thread is trying to change from a Biased Lock into a stack lock. However, I still have not yet nailed down a viable sequence of exactly what goes wrong. Yesterday, I successfully ported my diagnostics code from June 2015 to the current baseline. Some things have changed, but less than I feared. Leveraging off the new diag code, I made one small change to add more context to the current investigation: $ diff 8077392.diag.diff.txt.30 8077392.diag.diff.txt.31523c523,525 < @@ -611,23 +695,47 @@ BiasedLocking::Condition BiasedLocking:: --- > @@ -609,25 +693,53 @@ BiasedLocking::Condition BiasedLocking:: > // stale epoch. > ResourceMark rm; 524a527 > +#ifndef DCUBED_BL_DEBUG 525a529,531 > +#else > +tty->print_cr("Revoking bias by walking my own stack (dcubed_bl_caller_id=%d):", dcubed_bl_caller_id); > +#endif That now leads to debug output that looks like this: Revoking bias by walking my own stack (dcubed_bl_caller_id=7): Revoking bias of object 0xfffffd7be4f17eb8, mark 0x0000000000ab2005, type java.util.stream.Nodes$CollectorTask$OfInt, prototype header 0x0000000000000005, allow rebias 0, requesting thread 0x0000000000ab2000 Revoking bias of object biased toward live thread (0x0000000000ab2000) Revoked bias of currently-locked object INFO: unexpected locked object: - locked <0xfffffd7be4f17eb8> (a java.util.stream.Nodes$CollectorTask$OfInt) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:2035), pid=20341, tid=94 # fatal error: exiting JavaThread=0x00000000017e9000 unexpectedly owns ObjectMonitor=0x0000000003454180 Which means that revoke_and_rebias() was called from here: src/share/vm/runtime/synchronizer.cpp: // ----------------------------------------------------------------------------- // Wait/Notify/NotifyAll // NOTE: must use heavy weight monitor to handle wait() int ObjectSynchronizer::wait(Handle obj, jlong millis, TRAPS) { #ifdef DCUBED_BL_DEBUG int dcubed_bl_call_path = 0; #endif if (UseBiasedLocking) { #ifndef DCUBED_BL_DEBUG BiasedLocking::revoke_and_rebias(obj, false, THREAD); #else BiasedLocking::revoke_and_rebias(obj, false, &dcubed_bl_call_path, 7, THREAD); #endif assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now"); } if (millis < 0) { TEVENT(wait - throw IAX); THROW_MSG_0(vmSymbols::java_lang_IllegalArgumentException(), "timeout value is negative"); } ObjectMonitor* monitor = ObjectSynchronizer::inflate(THREAD, obj()); #ifdef DCUBED_BL_DEBUG // this is the path that the hanging thread takes so don't // potentially overwrite the path taken by the other thread //monitor->set_dcubed_bl_debug_info(dcubed_bl_call_path, 7, THREAD); monitor->inc_dcubed_bl_wait_cnt(); #endif DTRACE_MONITOR_WAIT_PROBE(monitor, obj(), THREAD, millis); monitor->wait(millis, true, THREAD); // This dummy call is in place to get around dtrace bug 6254741. Once // that's fixed we can uncomment the following line, remove the call // and change this function back into a "void" func. 
// DTRACE_MONITOR_PROBE(waited, monitor, obj(), THREAD); return dtrace_waited_probe(monitor, obj, THREAD); } As you can see from the comments I added for the diag code, I've been down this route before. Comforting and yet I have to wonder what I missed before...
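For the Java-level context of that ObjectSynchronizer::wait() call: the waiting side in this scenario is ForkJoinTask's internal wait, which pairs with the setCompletion() notifyAll() analyzed elsewhere in these notes. The following is a rough paraphrase from memory rather than a quote of the JDK sources, so field and constant names (U, STATUS, SIGNAL) should be treated as approximations; SIGNAL is the 1 << 16 bit that setCompletion()'s 's >>> 16' test looks for.

// Rough paraphrase of the waiting side (ForkJoinTask's internal wait), from memory;
// not copied from the JDK sources, and it relies on ForkJoinTask's own fields.
final void internalWait(long timeout) {
    int s;
    if ((s = status) >= 0 &&                                   // not yet completed
        U.compareAndSwapInt(this, STATUS, s, s | SIGNAL)) {    // ask the completer to notify us
        synchronized (this) {                                  // same monitor that setCompletion() enters
            if (status >= 0)
                try { wait(timeout); } catch (InterruptedException ie) { }  // -> ObjectSynchronizer::wait()
            else
                notifyAll();                                   // already done; wake any other waiters
        }
    }
}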
29-10-2015

I'm certainly not ramped up on the details at the moment but this rings alarm bells: // Check to see whether it currently owns the lock and, if so, // write down the needed displaced headers to the thread's stack. how can we write to another thread's stack, if that other thread might terminate at any time? We're racing with that other thread, both regarding termination and the unlock of this monitor that has to happen before termination.
29-10-2015

Thanks for reading these rambling notes of mine... :-) That's my current theory. I reread my "Biased Locking Decision Points" write up yesterday and I had previously identified some code that is suspicious, but when I looked at the code yesterday I couldn't come up with a sequence of events that would cause this specific issue. I'm adding back my tracing/debug code so that I can find the exact point of the overlap/confusion.
28-10-2015

Based on what you have now found: are we racing the bias-revocation with the unlocking in the other thread? I.e., the lock owner does an unlock, but the non-safepoint bias-revocation then restores the mark/header which shows the lock is still owned, and so we hit the assert when the thread exits.
28-10-2015

I'm having trouble mapping the latest diagnostic output with the earlier note that identified code so here's the most recent diagnostic output, the code that generated each message and some analysis of the code. > Revoking bias by walking my own stack: src/share/vm/runtime/biasedLocking.cpp BiasedLocking::Condition BiasedLocking::revoke_and_rebias(Handle obj, bool attempt_rebias, TRAPS) { assert(!SafepointSynchronize::is_at_safepoint(), "must not be called while at safepoint"); <snip> HeuristicsResult heuristics = update_heuristics(obj(), attempt_rebias); if (heuristics == HR_NOT_BIASED) { return NOT_BIASED; } else if (heuristics == HR_SINGLE_REVOKE) { Klass *k = obj->klass(); markOop prototype_header = k->prototype_header(); if (mark->biased_locker() == THREAD && prototype_header->bias_epoch() == mark->bias_epoch()) { // A thread is trying to revoke the bias of an object biased // toward it, again likely due to an identity hash code // computation. We can again avoid a safepoint in this case // since we are only going to walk our own stack. There are no // races with revocations occurring in other threads because we // reach no safepoints in the revocation path. // Also check the epoch because even if threads match, another thread // can come in with a CAS to steal the bias of an object that has a // stale epoch. ResourceMark rm; if (TraceBiasedLocking) { tty->print_cr("Revoking bias by walking my own stack:"); } BiasedLocking::Condition cond = revoke_bias(obj(), false, false, (JavaThread*) THREAD); ((JavaThread*) THREAD)->set_cached_monitor_info(NULL); assert(cond == BIAS_REVOKED, "why not?"); return cond; } else { So based on this message, we are in revoke_and_rebias(), we are not at a safepoint and we're going to revoke the bias of an object toward us which is supposed to okay since we're doing it by walking our own stack. > Revoking bias of object 0xfffffd7bfd561308, mark 0x0000000000a79905, type java.util.stream.Nodes$CollectorTask$OfLong, prototype header 0x0000000000000105, allow rebias 0, requesting thread 0x0000000000a79800 src/share/vm/runtime/biasedLocking.cpp: static BiasedLocking::Condition revoke_bias(oop obj, bool allow_rebias, bool is_bulk, JavaThread* requesting_thread) { <snip> if (TraceBiasedLocking && (Verbose || !is_bulk)) { ResourceMark rm; tty->print_cr("Revoking bias of object " INTPTR_FORMAT ", mark " INTPTR_FORMAT ", type %s, prototype header " INTPTR_FORMAT ", allow rebias %d, requesting thread " INTPTR_FORMAT, p2i((void *)obj), (intptr_t) mark, obj->klass()->external_name(), (intptr_t) obj->klass()->prototype_header(), (allow_rebias ? 1 : 0), (intptr_t) requesting_thread); } This call to revoke_bias() made it past the "if (!mark->has_bias_pattern())" check so the object is still biased. In this call to revoke_bias(), the allow_bias parameter is false. We also know that the is_bulk parameter is false since Verbose is not true and we printed this output. So this is not a bulk revoke and we aren't allowed to rebias. That might narrow down the possible code paths. 
> Revoking bias of object biased toward live thread (0x0000000000a79800) src/share/vm/runtime/biasedLocking.cpp: static BiasedLocking::Condition revoke_bias(oop obj, bool allow_rebias, bool is_bulk, JavaThread* requesting_thread) { <snip> if (!thread_is_alive) { if (allow_rebias) { obj->set_mark(biased_prototype); } else { obj->set_mark(unbiased_prototype); } if (TraceBiasedLocking && (Verbose || !is_bulk)) { tty->print_cr(" Revoked bias of object biased toward dead thread (" PTR_FORMAT ")", p2i(biased_thread)); } return BiasedLocking::BIAS_REVOKED; } if (TraceBiasedLocking && (Verbose || !is_bulk)) { tty->print_cr(" Revoking bias of object biased toward live thread (" PTR_FORMAT ")", p2i(biased_thread)); } This is a new diagnostic added after the "if (!thread_is_alive)" check so we have a definitive value for biased_thread. > Revoked bias of currently-locked object src/share/vm/runtime/biasedLocking.cpp: static BiasedLocking::Condition revoke_bias(oop obj, bool allow_rebias, bool is_bulk, JavaThread* requesting_thread) { <snip> // Thread owning bias is alive. // Check to see whether it currently owns the lock and, if so, // write down the needed displaced headers to the thread's stack. // Otherwise, restore the object's header either to the unlocked // or unbiased state. GrowableArray<MonitorInfo*>* cached_monitor_info = get_or_compute_monitor_info(biased_thread); BasicLock* highest_lock = NULL; for (int i = 0; i < cached_monitor_info->length(); i++) { MonitorInfo* mon_info = cached_monitor_info->at(i); if (mon_info->owner() == obj) { if (TraceBiasedLocking && Verbose) { tty->print_cr(" mon_info->owner (" PTR_FORMAT ") == obj (" PTR_FORMAT ")", p2i((void *) mon_info->owner()), p2i((void *) obj)); } // Assume recursive case and fix up highest lock later markOop mark = markOopDesc::encode((BasicLock*) NULL); highest_lock = mon_info->lock(); highest_lock->set_displaced_header(mark); } else { if (TraceBiasedLocking && Verbose) { tty->print_cr(" mon_info->owner (" PTR_FORMAT ") != obj (" PTR_FORMAT ")", p2i((void *) mon_info->owner()), p2i((void *) obj)); } } } if (highest_lock != NULL) { // Fix up highest lock to contain displaced header and point // object at it highest_lock->set_displaced_header(unbiased_prototype); // Reset object header to point to displaced mark. // Must release storing the lock address for platforms without TSO // ordering (e.g. ppc). obj->release_set_mark(markOopDesc::encode(highest_lock)); assert(!obj->mark()->has_bias_pattern(), "illegal mark state: stack lock used bias bit"); if (TraceBiasedLocking && (Verbose || !is_bulk)) { tty->print_cr(" Revoked bias of currently-locked object"); } } else { if (TraceBiasedLocking && (Verbose || !is_bulk)) { tty->print_cr(" Revoked bias of currently-unlocked object"); } if (allow_rebias) { obj->set_mark(biased_prototype); } else { // Store the unlocked value into the object's header. obj->set_mark(unbiased_prototype); } } The above code unbiases the lock by making it into a stack lock. The new owner is the JavaThread towards which the lock was biased. 
> INFO: unexpected locked object: - locked <0xfffffd7bfd561308> (a java.util.stream.Nodes$CollectorTask$OfLong) > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (synchronizer.cpp:1705), pid=6334, tid=84 > # fatal error: exiting JavaThread=0x00000000012c4000 unexpectedly owns ObjectMonitor=0x0000000002990480 The INFO line and the "fatal error" output combine to show us that JavaThread=0x00000000012c4000 tried to exit while owning our ObjectMonitor=0x0000000002990480 which is associated with the object: 0xfffffd7bfd561308. We need to figure out the origin of the revoke_and_rebias() call so I need to dig up the original biased locking diagnostics that I created for this bug back in 2015.06.
27-10-2015

Here's the debug output, the code that generated each message and some analysis of the code. > Revoking bias with potentially per-thread safepoint: src/share/vm/runtime/biasedLocking.cpp class VM_RevokeBias : public VM_Operation { <snip> virtual void doit() { if (_obj != NULL) { if (TraceBiasedLocking) { tty->print_cr("Revoking bias with potentially per-thread safepoint:"); } _status_code = revoke_bias((*_obj)(), false, false, _requesting_thread); clean_up_cached_monitor_info(); return; } else { Since we're in the "if (_obj != NULL) {" branch we know that we were handed a single object instead of an array of _objs. Note: not sure why the trace output talks about "per-thread safepoint". We don't have those in HotSpot (yet?). This is VM_RevokeBias::doit() so we're at a safepoint when this message is printed. > (Skipping revocation of object of type java.util.stream.Nodes$CollectorTask$OfDouble because it's no longer biased) src/share/vm/runtime/biasedLocking.cpp: static BiasedLocking::Condition revoke_bias(oop obj, bool allow_rebias, bool is_ bulk, JavaThread* requesting_thread) { markOop mark = obj->mark(); if (!mark->has_bias_pattern()) { if (TraceBiasedLocking) { ResourceMark rm; tty->print_cr(" (Skipping revocation of object of type %s because it's no longer biased)", obj->klass()->external_name()); } return BiasedLocking::NOT_BIASED; } So the "Skipping revocation..." message means that revoke_bias() was called on an object reference and by the time we got here, the object was no longer biased so there's nothing left to do. > Revoking bias by walking my own stack: src/share/vm/runtime/biasedLocking.cpp BiasedLocking::Condition BiasedLocking::revoke_and_rebias(Handle obj, bool attem pt_rebias, TRAPS) { assert(!SafepointSynchronize::is_at_safepoint(), "must not be called while at safepoint"); <snip> HeuristicsResult heuristics = update_heuristics(obj(), attempt_rebias); if (heuristics == HR_NOT_BIASED) { return NOT_BIASED; } else if (heuristics == HR_SINGLE_REVOKE) { Klass *k = obj->klass(); markOop prototype_header = k->prototype_header(); if (mark->biased_locker() == THREAD && prototype_header->bias_epoch() == mark->bias_epoch()) { // A thread is trying to revoke the bias of an object biased // toward it, again likely due to an identity hash code // computation. We can again avoid a safepoint in this case // since we are only going to walk our own stack. There are no // races with revocations occurring in other threads because we // reach no safepoints in the revocation path. // Also check the epoch because even if threads match, another thread // can come in with a CAS to steal the bias of an object that has a // stale epoch. ResourceMark rm; if (TraceBiasedLocking) { tty->print_cr("Revoking bias by walking my own stack:"); } BiasedLocking::Condition cond = revoke_bias(obj(), false, false, (JavaThread*) THREAD); ((JavaThread*) THREAD)->set_cached_monitor_info(NULL); assert(cond == BIAS_REVOKED, "why not?"); return cond; } else { So based on this message, we are in revoke_and_rebias(), we are not at a safepoint and we're going to revoke the bias of an object toward us which is supposed to okay since we're doing it by walking our own stack. 
> Revoking bias of object 0xfffffd7be942c080 , mark 0x0000000000abd005 , type java.util.stream.Nodes$ToArrayTask$OfDouble , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x0000000000abd000

src/share/vm/runtime/biasedLocking.cpp:

static BiasedLocking::Condition revoke_bias(oop obj, bool allow_rebias, bool is_bulk, JavaThread* requesting_thread) {
<snip>
  if (TraceBiasedLocking && (Verbose || !is_bulk)) {
    ResourceMark rm;
    tty->print_cr("Revoking bias of object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s , prototype header " INTPTR_FORMAT " , allow rebias %d , requesting thread " INTPTR_FORMAT,
                  p2i((void *)obj), (intptr_t) mark, obj->klass()->external_name(),
                  (intptr_t) obj->klass()->prototype_header(), (allow_rebias ? 1 : 0), (intptr_t) requesting_thread);
  }

This call to revoke_bias() made it past the "if (!mark->has_bias_pattern())" check so the object is still biased. In this call to revoke_bias(), the allow_rebias parameter is false. We also know that the is_bulk parameter is false since Verbose is not true and we printed this output. So this is not a bulk revoke and we aren't allowed to rebias. That might narrow down the possible code paths.

> Revoked bias of currently-locked object

src/share/vm/runtime/biasedLocking.cpp:

static BiasedLocking::Condition revoke_bias(oop obj, bool allow_rebias, bool is_bulk, JavaThread* requesting_thread) {
<snip>
  if (highest_lock != NULL) {
    // Fix up highest lock to contain displaced header and point
    // object at it
    highest_lock->set_displaced_header(unbiased_prototype);
    // Reset object header to point to displaced mark.
    // Must release storing the lock address for platforms without TSO
    // ordering (e.g. ppc).
    obj->release_set_mark(markOopDesc::encode(highest_lock));
    assert(!obj->mark()->has_bias_pattern(), "illegal mark state: stack lock used bias bit");
    if (TraceBiasedLocking && (Verbose || !is_bulk)) {
      tty->print_cr("  Revoked bias of currently-locked object");
    }
  } else {

The above code unbiases the lock by making it into a stack lock. The new owner is the JavaThread towards which the lock was biased.

> INFO: unexpected locked object: - locked <0xfffffd7be942c080> (a java.util.stream.Nodes$ToArrayTask$OfDouble)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (synchronizer.cpp:1705), pid=9760, tid=103
> # fatal error: exiting JavaThread=0x0000000001a09000 unexpectedly owns ObjectMonitor=0x000000000162b600

The INFO line and the "fatal error" output combine to show us that JavaThread=0x0000000001a09000 tried to exit while owning our ObjectMonitor=0x000000000162b600. The existing diagnostics show us that the biased lock was owned and that we converted the biased lock into a stack lock owned by biased_thread, but we don't have any diagnostic output definitively showing which thread is biased_thread in that revoke_bias() call. Gonna have to fix that.
Update: Here's the new tracing code:

$ hg diff src/share/vm/runtime/biasedLocking.cpp
diff -r 6f0961ba54bb src/share/vm/runtime/biasedLocking.cpp
--- a/src/share/vm/runtime/biasedLocking.cpp    Wed Oct 21 19:10:21 2015 +0000
+++ b/src/share/vm/runtime/biasedLocking.cpp    Mon Oct 26 15:57:14 2015 -0600
@@ -199,9 +199,15 @@ static BiasedLocking::Condition revoke_b
       obj->set_mark(unbiased_prototype);
     }
     if (TraceBiasedLocking && (Verbose || !is_bulk)) {
-      tty->print_cr("  Revoked bias of object biased toward dead thread");
+      tty->print_cr("  Revoked bias of object biased toward dead thread ("
+                    PTR_FORMAT ")", p2i(biased_thread));
     }
     return BiasedLocking::BIAS_REVOKED;
+  }
+
+  if (TraceBiasedLocking && (Verbose || !is_bulk)) {
+    tty->print_cr("  Revoking bias of object biased toward live thread ("
+                  PTR_FORMAT ")", p2i(biased_thread));
   }
 
   // Thread owning bias is alive.

Here's the revised output snippet:

INFO: Deflate: InCirc=3072 InUse=4 Scavenged=2 ForceMonitorScavenge=0 : pop=3048 free=1653
Revoking bias with potentially per-thread safepoint:
 (Skipping revocation of object of type java.util.stream.Nodes$CollectorTask$OfInt because it's no longer biased)
test org.openjdk.tests.java.util.stream.ToArrayOpTest.testIntOpsWithFilter("array:2x[0..100]", IntTestData[array:2x[0..100]]): success
config java.util.stream.LoggingTestCase.after(org.testng.internal.TestResult@264eab1c): success
config java.util.stream.LoggingTestCase.before(): success
Revoking bias by walking my own stack:
Revoking bias of object 0xfffffd7be4932260 , mark 0x0000000000aae105 , type java.util.stream.Nodes$CollectorTask$OfInt , prototype header 0x0000000000000105 , allow rebias 0 , requesting thread 0x0000000000aae000
 Revoking bias of object biased toward live thread (0x0000000000aae000)
 Revoked bias of currently-locked object
INFO: unexpected locked object: - locked <0xfffffd7be4932260> (a java.util.stream.Nodes$CollectorTask$OfInt)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (synchronizer.cpp:1705), pid=20849, tid=91
# fatal error: exiting JavaThread=0x000000000116b800 unexpectedly owns ObjectMonitor=0x000000000220a800

Update: Need more details when this tracing line is output:

 (Skipping revocation of object of type java.util.stream.Nodes$CollectorTask$OfInt because it's no longer biased)

Here's the updated tracing code:

$ hg diff
diff -r 6f0961ba54bb src/share/vm/runtime/biasedLocking.cpp
--- a/src/share/vm/runtime/biasedLocking.cpp    Wed Oct 21 19:10:21 2015 +0000
+++ b/src/share/vm/runtime/biasedLocking.cpp    Mon Oct 26 17:43:24 2015 -0600
@@ -150,8 +150,12 @@ static BiasedLocking::Condition revoke_b
   if (!mark->has_bias_pattern()) {
     if (TraceBiasedLocking) {
       ResourceMark rm;
-      tty->print_cr("  (Skipping revocation of object of type %s because it's no longer biased)",
-                    obj->klass()->external_name());
+      tty->print_cr("  (Skipping revocation of object " INTPTR_FORMAT
+                    ", mark " INTPTR_FORMAT ", type %s, requesting thread "
+                    INTPTR_FORMAT " because it's no longer biased)",
+                    p2i((void *)obj), (intptr_t) mark,
+                    obj->klass()->external_name(),
+                    (intptr_t) requesting_thread);
     }
     return BiasedLocking::NOT_BIASED;
   }
@@ -162,8 +166,13 @@ static BiasedLocking::Condition revoke_b
 
   if (TraceBiasedLocking && (Verbose || !is_bulk)) {
     ResourceMark rm;
-    tty->print_cr("Revoking bias of object " INTPTR_FORMAT " , mark " INTPTR_FORMAT " , type %s , prototype header " INTPTR_FORMAT " , allow rebias %d , requesting thread " INTPTR_FORMAT,
-                  p2i((void *)obj), (intptr_t) mark, obj->klass()->external_name(), (intptr_t) obj->klass()->prototype_header(), (allow_rebias ? 1 : 0), (intptr_t) requesting_thread);
+    tty->print_cr("Revoking bias of object " INTPTR_FORMAT ", mark "
+                  INTPTR_FORMAT ", type %s, prototype header " INTPTR_FORMAT
+                  ", allow rebias %d, requesting thread " INTPTR_FORMAT,
+                  p2i((void *)obj), (intptr_t) mark,
+                  obj->klass()->external_name(),
+                  (intptr_t) obj->klass()->prototype_header(),
+                  (allow_rebias ? 1 : 0), (intptr_t) requesting_thread);
   }
 
   JavaThread* biased_thread = mark->biased_locker();
@@ -199,9 +208,15 @@ static BiasedLocking::Condition revoke_b
       obj->set_mark(unbiased_prototype);
     }
     if (TraceBiasedLocking && (Verbose || !is_bulk)) {
-      tty->print_cr("  Revoked bias of object biased toward dead thread");
+      tty->print_cr("  Revoked bias of object biased toward dead thread ("
+                    PTR_FORMAT ")", p2i(biased_thread));
     }
     return BiasedLocking::BIAS_REVOKED;
+  }
+
+  if (TraceBiasedLocking && (Verbose || !is_bulk)) {
+    tty->print_cr("  Revoking bias of object biased toward live thread ("
+                  PTR_FORMAT ")", p2i(biased_thread));
   }
 
   // Thread owning bias is alive.

Update: Here's the revised output snippet:

INFO: Deflate: InCirc=2816 InUse=4 Scavenged=1 ForceMonitorScavenge=0 : pop=2794 free=952
Revoking bias with potentially per-thread safepoint:
 (Skipping revocation of object 0xfffffd7bfd552258, mark 0x0000000002990682, type java.util.stream.Nodes$ToArrayTask$OfLong, requesting thread 0x00000000012c4000 because it's no longer biased)
test org.openjdk.tests.java.util.stream.ToArrayOpTest.testLongOpsWithFilter("SpinedList:reverse 0..100", java.util.stream.TestData$AbstractTestData$LongTestData@6d187690): success
config java.util.stream.LoggingTestCase.after(org.testng.internal.TestResult@26b5f079): success
config java.util.stream.LoggingTestCase.before(): success
Revoking bias by walking my own stack:
Revoking bias of object 0xfffffd7bfd561308, mark 0x0000000000a79905, type java.util.stream.Nodes$CollectorTask$OfLong, prototype header 0x0000000000000105, allow rebias 0, requesting thread 0x0000000000a79800
 Revoking bias of object biased toward live thread (0x0000000000a79800)
 Revoked bias of currently-locked object
INFO: unexpected locked object: - locked <0xfffffd7bfd561308> (a java.util.stream.Nodes$CollectorTask$OfLong)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (synchronizer.cpp:1705), pid=6334, tid=84
# fatal error: exiting JavaThread=0x00000000012c4000 unexpectedly owns ObjectMonitor=0x0000000002990480
27-10-2015

The "potential per-thread safepoint" log message was added so we could gather statistics on when we might be able to use per-thread safepoints. The answer was in generlal, rarely, and so we never continued with the per-thread safepoint work.
24-10-2015

Looking at doit.copy40.log.178 instead of the grep output shows more interesting messages:

Revoking bias with potentially per-thread safepoint:
 (Skipping revocation of object of type java.util.stream.Nodes$CollectorTask$OfDouble because it's no longer biased)
Revoking bias by walking my own stack:
Revoking bias of object 0xfffffd7be942c080 , mark 0x0000000000abd005 , type java.util.stream.Nodes$ToArrayTask$OfDouble , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x0000000000abd000
 Revoked bias of currently-locked object
INFO: unexpected locked object: - locked <0xfffffd7be942c080> (a java.util.stream.Nodes$ToArrayTask$OfDouble)

The "INFO" line above comes from the JavaThread=0x0000000001a09000 that is about to exit with an unexpected locked object. Got the JavaThread address from the hs_err_pid file.

These three lines indicate that the JavaThread (0x0000000000abd000) that caused the revocation found a reference to the object locked by JavaThread=0x0000000001a09000 on its own stack:

Revoking bias by walking my own stack:
Revoking bias of object 0xfffffd7be942c080 , mark 0x0000000000abd005 , type java.util.stream.Nodes$ToArrayTask$OfDouble , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x0000000000abd000
 Revoked bias of currently-locked object

The last line in this group has me very interested. That's the code that I want to take a closer look at...

One more thing. This line:

Revoking bias with potentially per-thread safepoint:

means that I need to figure out if we're at a safepoint or not.
23-10-2015

The latest experiment:

$ diff doit.ksh.save.03 doit.ksh
30a31
> DIAG_OPTIONS="$DIAG_OPTIONS -XX:+TraceBiasedLocking"
62,63d62
< -XX:+UnlockDiagnosticVMOptions \
< -XX:LogEventsBufferEntries=100 \

Definitely have some biased locking operations happening on the class for which we have an abandoned lock:

$ grep 'java.util.stream.Nodes$ToArrayTask$OfDouble' doit.copy40.log.178
Revoking bias of object 0xfffffd7be9403240 , mark 0x0000000000abd005 , type java.util.stream.Nodes$ToArrayTask$OfDouble , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x0000000000abd000
 (Skipping revocation of object of type java.util.stream.Nodes$ToArrayTask$OfDouble because it's no longer biased)
Revoking bias of object 0xfffffd7be941c148 , mark 0x0000000000abd005 , type java.util.stream.Nodes$ToArrayTask$OfDouble , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x0000000000abd000
Revoking bias of object 0xfffffd7be942c080 , mark 0x0000000000abd005 , type java.util.stream.Nodes$ToArrayTask$OfDouble , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x0000000000abd000
INFO: unexpected locked object: - locked <0xfffffd7be942c080> (a java.util.stream.Nodes$ToArrayTask$OfDouble)

What's interesting with this log output is that the target thread:

# Internal Error (synchronizer.cpp:1705), pid=9760, tid=103
# fatal error: exiting JavaThread=0x0000000001a09000 unexpectedly owns ObjectMonitor=0x000000000162b600

does not appear in any of the biased locking messages that showed up. Taking a closer look at these two messages:

Revoking bias of object 0xfffffd7be942c080 , mark 0x0000000000abd005 , type java.util.stream.Nodes$ToArrayTask$OfDouble , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x0000000000abd000
INFO: unexpected locked object: - locked <0xfffffd7be942c080> (a java.util.stream.Nodes$ToArrayTask$OfDouble)

Our locked object 0xfffffd7be942c080, owned by JavaThread=0x0000000001a09000, was just bias revoked by requesting thread 0x0000000000abd000. Smoking gun?

Update: from the hs_err_pid file:

0x0000000000abd000 JavaThread "MainThread" [_thread_blocked, id=80, stack(0xfffffd7fc0fe8000,0xfffffd7fc10e8000)]
22-10-2015

Resynced my local repos for this bug to this fix:

Changeset: d83a5e8e97aa
Author:    ctornqvi
Date:      2015-10-21 09:47 -0700
URL:       http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/d83a5e8e97aa

8140243: [TESTBUG] Exclude compiler/jvmci/compilerToVM/GetConstantPoolTest.java
Reviewed-by: gtriantafill, kvn

Ran another experiment yesterday with "fast enter" enabled and increased the event buffers from the default 10 to 100.

$ diff doit.ksh.save.02 doit.ksh
30c30
< #DIAG_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:SyncFlags=256 -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1"
---
> DIAG_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:SyncFlags=256 -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1"
33c33
< DIAG_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1"
---
> #DIAG_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1"
61a62,63
> -XX:+UnlockDiagnosticVMOptions \
> -XX:LogEventsBufferEntries=100 \

Instance #1 failed here:

Loop #359...PASS
Loop #360...FAIL

Instance #2 failed here:

Loop #1368...PASS
Loop #1369...FAIL

I was hoping to see deoptimization of java/util/concurrent/ForkJoinTask.setCompletion but that did not happen. Again, this was the last deopt event:

instance #1:
Event: 6.574 Thread 0x00000000010f8800 Uncommon trap: reason=unstable_if action=reinterpret pc=0xfffffd7ff24ea6d8 method=java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V @ 129

instance #2:
Event: 18.986 Thread 0x00000000050e5000 Uncommon trap: reason=unstable_if action=reinterpret pc=0xfffffd7ff25184ac method=java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V @ 129

And here's the last events from the event log:

instance #1:
Event: 4.567 Executing VM operation: RevokeBias
Event: 4.567 Executing VM operation: RevokeBias done
Event: 6.574 Thread 0x00000000010f8800 Uncommon trap: trap_request=0xffffff5d fr.pc=0xfffffd7ff24ea6d8 relative=0x00000000000009d8
Event: 6.574 Thread 0x00000000010f8800 DEOPT PACKING pc=0xfffffd7ff24ea6d8 sp=0xfffffd7fbf5cc720
Event: 6.574 Thread 0x00000000010f8800 DEOPT UNPACKING pc=0xfffffd7fea847be9 sp=0xfffffd7fbf5cc718 mode 2

instance #2:
Event: 16.980 Executing VM operation: RevokeBias
Event: 16.981 Executing VM operation: RevokeBias done
Event: 18.986 Thread 0x00000000050e5000 Uncommon trap: trap_request=0xffffff5d fr.pc=0xfffffd7ff25184ac relative=0x0000000000000bec
Event: 18.986 Thread 0x00000000050e5000 DEOPT PACKING pc=0xfffffd7ff25184ac sp=0xfffffd7fc00d7690
Event: 18.987 Thread 0x00000000050e5000 DEOPT UNPACKING pc=0xfffffd7fea847be9 sp=0xfffffd7fc00d7698 mode 2

The increased size of the event logs let me see that there were a lot of biased locking operations. The next experiment is to see if I can get more biased locking diagnostics.
22-10-2015

Interesting things from doit.copy11.hs_err_pid.844:

# Internal Error (synchronizer.cpp:1707), pid=1252, tid=94
# fatal error: exiting JavaThread=0x0000000004a0a800 unexpectedly owns ObjectMonitor=0x0000000001886000
#
# JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_09_30_16_33-b00)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_09_30_16_33-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64)

--------------- T H R E A D ---------------

Current thread (0x0000000004a0a800): JavaThread "ForkJoinPool.commonPool-worker-5" daemon [_thread_in_vm, id=94, stack(0xfffffd7fc03da000,0xfffffd7fc04da000)]

Stack: [0xfffffd7fc03da000,0xfffffd7fc04da000], sp=0xfffffd7fc04d6e60, free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x1341ae4] void VMError::report(outputStream*,bool)+0xcf4
V [libjvm.so+0x1343246] void VMError::report_and_die(int,const char*,const char*,__va_list_element*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned long)+0x596
V [libjvm.so+0x1342c4f] void VMError::report_and_die(Thread*,const char*,int,const char*,const char*,__va_list_element*)+0x3f
V [libjvm.so+0xab12fb] void report_fatal(const char*,int,const char*,...)+0xdb
V [libjvm.so+0x124a1b8] void ReleaseJavaMonitorsClosure::do_monitor(ObjectMonitor*)+0xf8
V [libjvm.so+0x1249c12] void ObjectSynchronizer::release_monitors_owned_by_thread(Thread*)+0xc2
V [libjvm.so+0x1294a99] void JavaThread::exit(bool,JavaThread::ExitType)+0x4d9
V [libjvm.so+0x1294427] void JavaThread::thread_main_inner()+0x217
V [libjvm.so+0x12941f2] void JavaThread::run()+0x232
V [libjvm.so+0x10c4b40] java_start+0x230
C [libc.so.1+0xdd9db] _thr_setup+0x5b
C [libc.so.1+0xddc10] _lwp_start+0x0
C 0x0000000000000000

--------------- P R O C E S S ---------------

Java Threads: ( => current thread )
=>0x0000000004a0a800 JavaThread "ForkJoinPool.commonPool-worker-5" daemon [_thread_in_vm, id=94, stack(0xfffffd7fc03da000,0xfffffd7fc04da000)]

Deoptimization events (10 events):
Event: 13.772 Thread 0x0000000004a0a800 Uncommon trap: reason=unstable_if action=reinterpret pc=0xfffffd7ff2754364 method=java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V @ 47

Events (10 events):
Event: 13.772 Thread 0x0000000004a0a800 Uncommon trap: trap_request=0xffffff5d fr.pc=0xfffffd7ff2754364
Event: 13.772 Thread 0x0000000004a0a800 DEOPT PACKING pc=0xfffffd7ff2754364 sp=0xfffffd7fc04d9540
Event: 13.772 Thread 0x0000000004a0a800 DEOPT UNPACKING pc=0xfffffd7feaa47be9 sp=0xfffffd7fc04d9530 mode

So here's the code we just deoptimized:

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java:

    /**
     * Top-level runloop for workers, called by ForkJoinWorkerThread.run.
     */
    final void runWorker(WorkQueue w) {
        w.growArray();                   // allocate queue
        int seed = w.hint;               // initially holds randomization hint
        int r = (seed == 0) ? 1 : seed;  // avoid 0 for xorShift
        for (ForkJoinTask<?> t;;) {
            if ((t = scan(w, r)) != null)
                w.runTask(t);
            else if (!awaitWork(w, r))
                break;
            r ^= r << 13; r ^= r >>> 17; r ^= r << 5; // xorshift
        }
    }

However, if we stop compiling this code, we no longer hang:

jdk/src/java.base/share/classes/java/util/concurrent/ForkJoinTask.java:

    /**
     * Marks completion and wakes up threads waiting to join this
     * task.
     *
     * @param completion one of NORMAL, CANCELLED, EXCEPTIONAL
     * @return completion status on exit
     */
    private int setCompletion(int completion) {
        for (int s;;) {
            if ((s = status) < 0)
                return s;
            if (U.compareAndSwapInt(this, STATUS, s, s | completion)) {
                if ((s >>> 16) != 0)
                    synchronized (this) { notifyAll(); }
                return completion;
            }
        }
    }

I just went back to the -XX:CompileCommand=exclude,foo experiments and it doesn't look like we ever tried to exclude compiling java.util.concurrent.ForkJoinPool.runWorker. Need to resync the repos and see if the bug still repros after the big JDK9-hs-comp push.
21-10-2015

I ran two parallel test jobs using locally built bits on the Solaris X64 server in my lab (same machine as before). To avoid overwriting the sample logs in the kit, I named these runs "copy20" and "copy21". This run did not include the -XX:SyncFlags=256 option which disables the "fast enter" optimization: $ diff doit.ksh.save.01 doit.ksh 30c30 < DIAG_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:SyncFlags=256 -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1" --- > #DIAG_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:SyncFlags=256 -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1" 31a32,34 > # Use these settings to test the baseline: > DIAG_OPTIONS="-XX:+UnlockExperimentalVMOptions -XX:SyncKnobs=ReportSettings=1:Verbose=1:ExitRelease=1:VerifyMatch=1" I let these two instances run for 20+ days: $ elapsed_times doit.ksh doit_loop.copy20.log doit.ksh 0 seconds doit_loop.copy20.log 20 days 45 minutes 17 seconds $ elapsed_times doit.ksh doit_loop.copy21.log doit.ksh 0 seconds doit_loop.copy21.log 20 days 45 minutes 40 seconds And each instance ran for more than 42K iterations without a failure: $ tail doit_loop.copy20.log Loop #42643...PASS Loop #42644...PASS Loop #42645...PASS Loop #42646...PASS Loop #42647...PASS Loop #42648...PASS Loop #42649...PASS Loop #42650...PASS Loop #42651...PASS Loop #42652... $ tail doit_loop.copy21.log Loop #42648...PASS Loop #42649...PASS Loop #42650...PASS Loop #42651...PASS Loop #42652...PASS Loop #42653...PASS Loop #42654...PASS Loop #42655...PASS Loop #42656...PASS Loop #42657...
21-10-2015

Used the 8077392_repro.zip kit to retest RT_Baseline as of 2015-09-30 where the last push was for this bug fix: Changeset: 983c56341c80 Author: brutisso Date: 2015-09-30 09:07 +0200 URL: http://hg.openjdk.java.net/jdk9/hs-rt/hotspot/rev/983c56341c80 8134953: Make the GC ID available in a central place Reviewed-by: pliden, jmasa Here's the changeset IDs for that JPRT job: $ cat SourceTips.txt .:34280222936a jdk:8a9a7b1a3210 jaxp:497bc2654e11 corba:ca8a17195884 jaxws:bdb954839363 closed:57176e80ab18 hotspot:983c56341c80 nashorn:678db05f13ba langtools:8e76163b3f3a jdk/src/closed:59bd18af2265 jdk/make/closed:54d0705354f2 jdk/test/closed:de2be51ab426 hotspot/src/closed:3329566526f7 hotspot/make/closed:d70cd66cf2f4 hotspot/test/closed:5524c847f372 I ran two parallel test jobs using locally built bits on the Solaris X64 server in my lab (same machine as before). To avoid overwriting the sample logs in the kit, I named these runs "copy10" and "copy11". $ tail -3 doit_loop.copy10.log Loop #539...PASS Loop #540...PASS Loop #541...FAIL $ tail -3 doit_loop.copy11.log Loop #842...PASS Loop #843...PASS Loop #844...FAIL Here's copy10's assertion failure: INFO: unexpected locked object: - locked <0xfffffd7bfc15fd48> (a java.util.stream.Nodes$CollectorTask$OfLong) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:1707), pid=16287, tid=88 # fatal error: exiting JavaThread=0x0000000003906000 unexpectedly owns ObjectMonitor=0x000000000230f600 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_09_30_16_33-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_09_30_16_33-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) # Core dump will be written. Default location: /work/shared/bug_hunt/8077392_for_jdk9_hs_rt/8077392_repro/core or core.16287 # # An error report file with more information is saved as: # /work/shared/bug_hunt/8077392_for_jdk9_hs_rt/8077392_repro/hs_err_pid16287.log # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # doit.ksh[45]: 16287 Abort(coredump) + status=134 + echo status=134 status=134 + [ 134 = 95 ] + exit 134 and here's copy11's assertion failure: INFO: unexpected locked object: - locked <0xfffffd7bea17aed0> (a java.util.stream.Nodes$ToArrayTask$OfDouble) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:1707), pid=1252, tid=94 # fatal error: exiting JavaThread=0x0000000004a0a800 unexpectedly owns ObjectMonitor=0x0000000001886000 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-dcubed_2015_09_30_16_33-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-dcubed_2015_09_30_16_33-b00, mixed mode, tiered, compressed oops, g1 gc, solaris-amd64) # Core dump will be written. 
Default location: /work/shared/bug_hunt/8077392_for_jdk9_hs_rt/8077392_repro/core or core.1252 # # An error report file with more information is saved as: # /work/shared/bug_hunt/8077392_for_jdk9_hs_rt/8077392_repro/hs_err_pid1252.log # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # doit.ksh[45]: 1252 Abort(coredump) + status=134 + echo status=134 status=134 + [ 134 = 95 ] + exit 134 As is normal for this bug, the classes being locked by the two test runs are different: copy10: INFO: unexpected locked object: - locked <0xfffffd7bfc15fd48> (a java.util.stream.Nodes$CollectorTask$OfLong) copy11: INFO: unexpected locked object: - locked <0xfffffd7bea17aed0> (a java.util.stream.Nodes$ToArrayTask$OfDouble) Here is copy10's crashing thread stack: --------------- T H R E A D --------------- Current thread (0x0000000003906000): JavaThread "ForkJoinPool.commonPool-worker -8" daemon [_thread_in_vm, id=88, stack(0xfffffd7fc09e0000,0xfffffd7fc0ae0000)] Stack: [0xfffffd7fc09e0000,0xfffffd7fc0ae0000], sp=0xfffffd7fc0add0e0, free sp ace=1012k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1341ae4] void VMError::report(outputStream*,bool)+0xcf4 V [libjvm.so+0x1343246] void VMError::report_and_die(int,const char*,const cha r*,__va_list_element*,Thread*,unsigned char*,void*,void*,const char*,int,unsigne d long)+0x596 V [libjvm.so+0x1342c4f] void VMError::report_and_die(Thread*,const char*,int,c onst char*,const char*,__va_list_element*)+0x3f V [libjvm.so+0xab12fb] void report_fatal(const char*,int,const char*,...)+0xdb V [libjvm.so+0x124a1b8] void ReleaseJavaMonitorsClosure::do_monitor(ObjectMoni tor*)+0xf8 V [libjvm.so+0x1249bf1] void ObjectSynchronizer::release_monitors_owned_by_thr ead(Thread*)+0xa1 V [libjvm.so+0x1294a99] void JavaThread::exit(bool,JavaThread::ExitType)+0x4d9 V [libjvm.so+0x1294427] void JavaThread::thread_main_inner()+0x217 V [libjvm.so+0x12941f2] void JavaThread::run()+0x232 V [libjvm.so+0x10c4b40] java_start+0x230 C [libc.so.1+0xdd9db] _thr_setup+0x5b C [libc.so.1+0xddc10] _lwp_start+0x0 C 0x0000000000000000 Here is copy11's crashing thread stack: --------------- T H R E A D --------------- Current thread (0x0000000004a0a800): JavaThread "ForkJoinPool.commonPool-worker -5" daemon [_thread_in_vm, id=94, stack(0xfffffd7fc03da000,0xfffffd7fc04da000)] Stack: [0xfffffd7fc03da000,0xfffffd7fc04da000], sp=0xfffffd7fc04d6e60, free sp ace=1011k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1341ae4] void VMError::report(outputStream*,bool)+0xcf4 V [libjvm.so+0x1343246] void VMError::report_and_die(int,const char*,const cha r*,__va_list_element*,Thread*,unsigned char*,void*,void*,const char*,int,unsigne d long)+0x596 V [libjvm.so+0x1342c4f] void VMError::report_and_die(Thread*,const char*,int,c onst char*,const char*,__va_list_element*)+0x3f V [libjvm.so+0xab12fb] void report_fatal(const char*,int,const char*,...)+0xdb V [libjvm.so+0x124a1b8] void ReleaseJavaMonitorsClosure::do_monitor(ObjectMoni tor*)+0xf8 V [libjvm.so+0x1249c12] void ObjectSynchronizer::release_monitors_owned_by_thr ead(Thread*)+0xc2 V [libjvm.so+0x1294a99] void JavaThread::exit(bool,JavaThread::ExitType)+0x4d9 V [libjvm.so+0x1294427] void JavaThread::thread_main_inner()+0x217 V [libjvm.so+0x12941f2] void JavaThread::run()+0x232 V [libjvm.so+0x10c4b40] java_start+0x230 C [libc.so.1+0xdd9db] _thr_setup+0x5b C [libc.so.1+0xddc10] _lwp_start+0x0 C 0x0000000000000000
01-10-2015

Back to you, Dan. I did not and will not have time for this in the near future.
14-08-2015

C1 generated code has code corresponding to line 272. C2 code has uncommon trap instead (so it is deoptimized when we hit that code):

  0xfffffd7ff5e87cc3: shr    $0x10,%r10d        ;*iushr
                                                ; - java.util.concurrent.ForkJoinTask::setCompletion@31 (line 271)
  0xfffffd7ff5e87cc7: test   %r10d,%r10d
  0xfffffd7ff5e87cca: jne    0xfffffd7ff5e87d1f ;*ifeq
                                                ; - java.util.concurrent.ForkJoinTask::setCompletion@32 (line 271)

  0xfffffd7ff5e87d1f: mov    $0xffffff5d,%esi
  0xfffffd7ff5e87d24: mov    %r8,%rbp
  0xfffffd7ff5e87d27: mov    %edx,(%rsp)
  0xfffffd7ff5e87d2a: mov    %r10d,0x4(%rsp)
  0xfffffd7ff5e87d2f: callq  0xfffffd7fee2479e0 ; OopMap{rbp=Oop off=212}
                                                ;*ifeq
                                                ; - java.util.concurrent.ForkJoinTask::setCompletion@32 (line 271)
                                                ; {runtime_call UncommonTrapBlob}
16-07-2015

I have a repro kit ready for someone on the Compiler team to check out. See the attached 8077392_repro.zip file.

$ more READ_ME.repro /dev/null
::::::::::::::
READ_ME.repro
::::::::::::::

Repro kit for the following bug:

    JDK-8077392 Stream fork/join tasks occasionally fail to complete
    https://bugs.openjdk.java.net/browse/JDK-8077392

This repro kit depends on code from the following fix:

    JDK-8130448 thread dump improvements, comment additions, new diagnostics inspired by 8077392
    https://bugs.openjdk.java.net/browse/JDK-8130448

READ_ME.repro
    This file.

do_jtreg.ksh
    This script was used in my environment to create the initial successful run of the following test:

        jdk/test/java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java

    Because the hang failure is very intermittent, it was useful to get a baseline (passing) run from which a script could be created. The JAVA_HOME, JTREG and TEST_DIR variables in do_jtreg.ksh will need to be tuned up for your environment.

doit.ksh
    This script is used to run the test with a minimal amount of infrastructure overhead. This script does depend on do_jtreg.ksh having been run successfully at least one time to generate the .class files. The JAVA_HOME and JTREG variables in doit.ksh need to be tuned up for your environment. The DIAG_OPTIONS variable can be changed to use different diagnostic options from JDK-8130448. The current DIAG_OPTIONS setting calls fatal() on an unmatched Java monitor at JavaThread exit.

    Arguments to the script are passed to the "java" cmd that is used to run the test, e.g.:

        $ ksh doit.ksh -server

    The doit.ksh script was created from the "rerun" section of the .jtr file for the test. A "tail -25" is good for getting most if not all of the "rerun" section so it can be adapted into a usable script.

doit_loop.ksh
    This script is used for running one or more copies of doit.ksh in a loop, e.g.:

        $ ksh doit_loop.ksh doit.copy1 -server > doit_loop.copy1.log 2>&1
        $ ksh doit_loop.ksh doit.copy2 -server > doit_loop.copy2.log 2>&1

    launches two copies of the test. The first copy invokes doit.ksh with the output going into doit.copy1.log and the second invokes doit.ksh with the output going into doit.copy2.log. If a failure occurs, then artifacts are saved like this:

        doit.copy1.core.17
        doit.copy1.hs_err_pid.17
        doit.copy1.log.17

    So run #17 crashed and we have a core file, an hs_err_pid file and the test run's output.

hsdis-amd64.so
    The hsdis binary for Solaris X64 machines.

The remaining artifacts are examples:

doit.log
    Output from a run of the doit.ksh script.

doit.copy1.hs_err_pid.144
doit.copy1.log.144
doit_loop.copy1.log
    Output from the copy1 run of doit_loop.ksh:

        $ ksh doit_loop.ksh doit.copy1 -server -XX:-TieredCompilation \
            -XX:CompileCommand=print,java/util/concurrent/ForkJoinTask.setCompletion \
            > doit_loop.copy1.log 2>&1

doit.copy2.hs_err_pid.30
doit.copy2.log.30
doit_loop.copy2.log
    Output from the copy2 run of doit_loop.ksh:

        $ ksh doit_loop.ksh doit.copy2 -server -XX:-TieredCompilation \
            -XX:CompileCommand=print,java/util/concurrent/ForkJoinTask.setCompletion \
            > doit_loop.copy2.log 2>&1

JTwork/
    The JTwork directory from a run of the do_jtreg.ksh script.
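For context, here is a minimal Java sketch of the kind of work the jtreg ToArrayOpTest exercises. This is illustrative only (the class name, range size, and loop structure are made up; it is not the test source or part of the repro kit): each iteration runs a parallel-stream toArray(), which forks stream fork/join tasks in the common pool and joins them from the calling thread.

import java.util.stream.IntStream;

// Illustrative stand-alone loop (hypothetical class, not part of the repro kit):
// repeatedly run a parallel-stream toArray(), which forks and joins fork/join
// tasks (e.g. Nodes$ToArrayTask/CollectorTask) in the ForkJoinPool common pool.
// When the hang reproduces, the joining thread is the one left blocked.
public class ToArrayLoopSketch {
    public static void main(String[] args) {
        for (int i = 1; ; i++) {
            Integer[] a = IntStream.range(0, 200)
                                   .parallel()
                                   .boxed()
                                   .toArray(Integer[]::new);
            if (a.length != 200) {
                throw new AssertionError("Loop #" + i + "...FAIL: length=" + a.length);
            }
            if (i % 100 == 0) {
                System.out.println("Loop #" + i + "...PASS");
            }
        }
    }
}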
16-07-2015

Added logs from 2015.07.03 and 2015.07.04 failures with these options:

-XX:CompileCommand=print,java/util/concurrent/ForkJoinTask.setCompletion
-XX:+UnlockDiagnosticVMOptions -XX:+GuaranteeOnMonitorMismatch
-XX:+JavaThreadExitReleasesMonitors -XX:+VerboseStackTrace

-rw-r--r--   1 dcubed   green      32005 Jul  3 17:08 doit.extract_03_0_0.hs_err_pid.log.96
-rw-r--r--   1 dcubed   green     225070 Jul  3 17:11 doit.extract_03_0_0.log.96
-rw-r--r--   1 dcubed   green      31274 Jul  4 02:44 doit.extract_03_1_0.hs_err_pid.log.2328
-rw-r--r--   1 dcubed   green      63190 Jul  4 02:47 doit.extract_03_1_0.log.2328

The hs_err_pid files show the GuaranteeOnMonitorMismatch failures along with info about the failing thread. The event dumps in the hs_err_pid files show the last events for the failing thread. The test run output logs show the "-XX:CompileCommand=print,..." output combined with the regular test output.
07-07-2015

Added logs from 2015.07.06 failures with these options:

-XX:-TieredCompilation
-XX:CompileCommand=print,java/util/concurrent/ForkJoinTask.setCompletion
-XX:+UnlockDiagnosticVMOptions -XX:+GuaranteeOnMonitorMismatch
-XX:+JavaThreadExitReleasesMonitors -XX:+VerboseStackTrace

-rw-r--r--   1 dcubed   green      29428 Jul  6 16:40 doit.extract_03_1_1.hs_err_pid.39
-rw-r--r--   1 dcubed   green      56036 Jul  6 16:42 doit.extract_03_1_1.log.39
-rw-r--r--   1 dcubed   green      29844 Jul  6 18:51 doit.extract_03_0_1.hs_err_pid.499
-rw-r--r--   1 dcubed   green      52585 Jul  6 18:54 doit.extract_03_0_1.log.499

The hs_err_pid files show the GuaranteeOnMonitorMismatch failures along with info about the failing thread. The event dumps in the hs_err_pid files show the last events for the failing thread. The test run output logs show the "-XX:CompileCommand=print,..." output combined with the regular test output. Due to the '-XX:-TieredCompilation' option, we only show code generation for C2 (as expected).

The code dump only shows code for lines 268, 270 and 271:

$ grep 'line ' doit.extract_03_1_1.log.39 | sort | uniq -c
   1    ; - java.util.concurrent.ForkJoinTask::setCompletion@-1 (line 268)
   1    ; - java.util.concurrent.ForkJoinTask::setCompletion@1 (line 268)
   1    ; - java.util.concurrent.ForkJoinTask::setCompletion@22 (line 270)
   3    ; - java.util.concurrent.ForkJoinTask::setCompletion@25 (line 270)
   1    ; - java.util.concurrent.ForkJoinTask::setCompletion@31 (line 271)
   3    ; - java.util.concurrent.ForkJoinTask::setCompletion@32 (line 271)
   2    ; - java.util.concurrent.ForkJoinTask::setCompletion@6 (line 268)

The code dump does not include anything for this line:

272                 synchronized (this) { notifyAll(); }

which is the code where we lose the "monitor exit". Don't know if the code dumper is broken or if there is something about the code that prevents it from being dumped.
07-07-2015

If disabling tiered didn't help then the analysis regarding the C1->C2 dance isn't hitting the key part. :( What is observed regarding deopt etc when only C1 or C2 are involved? This is looking like a compiler issue regardless :) And to me this highlights we need a better way to validate what the compiler is doing.
03-07-2015

The -XX:-TieredCompilation didn't help:

- instance #0 crashed at iteration 155:
  guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x000000000090e800 unexpectedly owns ObjectMonitor=0x0000000000da2680
- instance #1 crashed at iteration 91:
  guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x0000000000c75800 unexpectedly owns ObjectMonitor=0x0000000000c1b180
03-07-2015

Did a longer experiment with the following options:

-XX:CompileCommand=exclude,"java/util/concurrent/ForkJoinTask.setCompletion"

- instance #0 ran for 3319 runs without a hang
- instance #1 ran for 3320 runs without a hang

Next experiment is -XX:-TieredCompilation.
03-07-2015

So this should disappear with tiered compilation disabled.
02-07-2015

Spent some time crawling through doit-setCompletion.PrintAssembly. Looks like there is a C1 version of ForkJoinTask.setCompletion() and then there is a C2 version of ForkJoinTask.setCompletion() and then we go back to a C1 version of ForkJoinTask.setCompletion(). It looks like the C2 version of ForkJoinTask.setCompletion() is incomplete since there doesn't seem to be any code for lines 272 or 273.

My guess here:

- the VM compiles up a C1 version of ForkJoinTask.setCompletion()
- the VM decides to optimize it more and starts to compile up a C2 version of ForkJoinTask.setCompletion()
- the C2 hits an uncommon trap and we deopt
- the VM compiles up a C1 version of ForkJoinTask.setCompletion()

In the hs_err_pid files for the !GuaranteeOnMonitorMismatch crashes, I tend to see events like these for the exiting JavaThread:

Deoptimization events (10 events):
<snip>
Event: 22.518 Thread 0x0000000000c1d800 Uncommon trap: reason=unstable_if action=reinterpret pc=0xfffffd7ff5efb854 method=java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V @ 47

Internal exceptions (10 events):
<snip>
Event: 7.477 Thread 0x0000000000c1d800 Implicit null exception at 0xfffffd7ff5f2e1a1 to 0xfffffd7ff5f2e4f7
<snip>

Events (10 events):
<snip>
Event: 22.518 Thread 0x0000000000c1d800 Uncommon trap: trap_request=0xffffff5d fr.pc=0xfffffd7ff5efb854
Event: 22.518 Thread 0x0000000000c1d800 DEOPT PACKING pc=0xfffffd7ff5efb854 sp=0xfffffd7fbf4058b0
Event: 22.518 Thread 0x0000000000c1d800 DEOPT UNPACKING pc=0xfffffd7fee247a69 sp=0xfffffd7fbf405890 mode 2
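As an aside, the "unstable_if" uncommon traps in these event logs are C2's handling of a branch that profiling said is (almost) never taken: the rare branch is compiled as a trap, and the first time it is actually taken the compiled frame is deoptimized and execution falls back to the interpreter or C1, which is the dance guessed at above. A minimal Java sketch of that shape follows; the method and class names are hypothetical, not JDK source:

// Hypothetical example of a branch shape that C2 typically compiles with an
// uncommon trap. While profiling, (s >>> 16) is always zero, so the true branch
// is never taken and C2 can emit a trap instead of code for it. The first time
// the branch is taken at runtime, the frame deopts and the event log records
// "Uncommon trap: reason=unstable_if".
class UnstableIfSketch {
    static int hot(int s) {
        if ((s >>> 16) != 0) {     // rarely true during profiling: compiled as a trap
            return slow(s);        // taking this path later forces deoptimization
        }
        return s + 1;
    }

    static int slow(int s) {
        return -s;
    }
}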
02-07-2015

This is the version of ForkJoinTask.java that I've been using:

$ hg log src/java.base/share/classes/java/util/concurrent/ForkJoinTask.java | head -5
changeset:   10593:7af64e3e095d
user:        dl
date:        Fri Sep 05 10:54:28 2014 +0200
summary:     8056248: Improve ForkJoin thread throttling

And here is the code for setCompletion():

src/java.base/share/classes/java/util/concurrent/ForkJoinTask.java:

   259     /**
   260      * Marks completion and wakes up threads waiting to join this
   261      * task.
   262      *
   263      * @param completion one of NORMAL, CANCELLED, EXCEPTIONAL
   264      * @return completion status on exit
   265      */
   266     private int setCompletion(int completion) {
   267         for (int s;;) {
   268             if ((s = status) < 0)
   269                 return s;
   270             if (U.compareAndSwapInt(this, STATUS, s, s | completion)) {
   271                 if ((s >>> 16) != 0)
   272                     synchronized (this) { notifyAll(); }
   273                 return completion;
   274             }
   275         }
   276     }

See doit-setCompletion.PrintAssembly for a test run log file with these options:

-XX:CompileCommand=print,java/util/concurrent/ForkJoinTask.setCompletion

I stripped out the "^config " and "^test " lines to get rid of most of the test output:

$ wc -l doit-setCompletion.log doit-setCompletion.PrintAssembly
    4698 doit-setCompletion.log
     912 doit-setCompletion.PrintAssembly
    5610 total

Just a few thousand lines of test output.
01-07-2015

java.util.concurrent.ForkJoinTask.externalAwaitDone() is the method where the MainThread gets stuck. Looking at that code, there are several methods that use the same lock. Here are the results of -XX:CompileCommand=exclude,foo experiments:

-XX:CompileCommand=exclude,"java/util/concurrent/ForkJoinTask.setCompletion"
- no hangs in 2 X 1050 iterations

-XX:CompileCommand=exclude,"java/util/concurrent/ForkJoinTask.internalWait"
- parallel instance #0 crashed in run #6:
  guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x0000000002650000 unexpectedly owns ObjectMonitor=0x000000000264df80
- parallel instance #1 stopped at run #18 with no failures

-XX:CompileCommand=exclude,"java/util/concurrent/ForkJoinTask.externalAwaitDone"
- parallel instance #1 crashed in run #70:
  guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x000000000307b000 unexpectedly owns ObjectMonitor=0x00000000027b9b80
- parallel instance #0 stopped at run #207 with no failures

-XX:CompileCommand=exclude,"java/util/concurrent/ForkJoinTask.externalInterruptibleAwaitDone"
- parallel instance #0 crashed in run #43:
  guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x0000000001286800 unexpectedly owns ObjectMonitor=0x0000000001123200
- parallel instance #1 stopped at run #210 with no failures

-XX:CompileCommand=exclude,"java/util/concurrent/ForkJoinTask.get"
- parallel instance #0 crashed in run #248:
  guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x0000000002cb8000 unexpectedly owns ObjectMonitor=0x00000000022eca00
- parallel instance #1 crashed in run #331:
  guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x0000000000c1d800 unexpectedly owns ObjectMonitor=0x0000000000f60f80

Looks like this method is where we go next: java/util/concurrent/ForkJoinTask.setCompletion
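To make the hang mechanism concrete, here is a minimal sketch of the two-sided handshake these methods implement. This is not the ForkJoinTask source; the class and method names are hypothetical, the completer side is modeled on the setCompletion() pattern quoted in the 02-07-2015 entry above, and the awaiter side only loosely on externalAwaitDone(). The key point: the completer CASes the status and then does synchronized(this){ notifyAll(); }, while the awaiter re-checks the status under the same lock and wait()s, so a lost monitorexit on the completer side leaves the awaiter blocked trying to re-acquire the monitor.

import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the completer/awaiter handshake (hypothetical class, not the
// java.util.concurrent.ForkJoinTask source). A negative status means "done".
class CompletionHandshakeSketch {
    private final AtomicInteger status = new AtomicInteger(0);

    // Completer side, modeled on the setCompletion() pattern: CAS the status,
    // then notify waiters under the object's monitor. The synchronized block's
    // monitorexit is exactly the operation that goes missing in this bug.
    void complete(int completion) {            // completion is expected to be negative
        for (;;) {
            int s = status.get();
            if (s < 0) {
                return;                        // already completed
            }
            if (status.compareAndSet(s, s | completion)) {
                synchronized (this) {
                    notifyAll();
                }
                return;
            }
        }
    }

    // Awaiter side, loosely modeled on externalAwaitDone(): re-check the status
    // under the lock, then wait(). After notifyAll(), wait() must re-acquire the
    // monitor before it can return, so a lost monitorexit leaves this thread
    // BLOCKED ("waiting to relock") even though it was notified.
    int awaitDone() throws InterruptedException {
        int s;
        while ((s = status.get()) >= 0) {
            synchronized (this) {
                if (status.get() >= 0) {
                    wait();
                }
            }
        }
        return s;
    }
}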
01-07-2015

Added a bunch of tracing and diagnostics to the various pieces of the Java monitor subsystem in order to figure what code paths are being used so they can be examined in more detail. Squirreled away information in both the JavaThread and the ObjectMonitor as needed. Tried to squirrel away info in Object, but that was a disaster; too much code knows about the size of Object or more accurately where "things" are in Object. I organized the debug code using #ifdef's. The remainder of this note is just a reminder to myself about the different pieces of the tracing. $ hg diff src/share/vm/utilities/globalDefinitions.hpp diff -r 56e01852fed6 src/share/vm/utilities/globalDefinitions.hpp --- a/src/share/vm/utilities/globalDefinitions.hpp Mon Apr 27 09:02:41 2015 -0700 +++ b/src/share/vm/utilities/globalDefinitions.hpp Tue Jun 30 08:35:14 2015 -0600 @@ -24,6 +24,37 @@ #ifndef SHARE_VM_UTILITIES_GLOBALDEFINITIONS_HPP #define SHARE_VM_UTILITIES_GLOBALDEFINITIONS_HPP + +// #ifndef DCUBED_COMMON_DEBUG // common debug code shared by all the API specific pieces, e.g., dcubed_ticket +// #define DCUBED_COMMON_DEBUG +// #endif +// #ifndef DCUBED_BL_DEBUG // Biased Locking code path tracing and debugging +// #define DCUBED_BL_DEBUG +// #endif +// #ifndef DCUBED_BOLX_DEBUG // BasicObjectLock and BasicLock code path tracing and debugging +// #define DCUBED_BOLX_DEBUG +// #endif +// #ifndef DCUBED_OME_DEBUG // ObjectMonitor::enter() code path tracing and debugging +// #define DCUBED_OME_DEBUG +// #endif +// #ifndef DCUBED_OMEI_DEBUG // ObjectMonitor::EnterI() code path tracing and debugging +// #define DCUBED_OMEI_DEBUG +// #endif +// #ifndef DCUBED_OMN_DEBUG // ObjectMonitor::notify() code path tracing and debugging +// #define DCUBED_OMN_DEBUG +// #endif +// #ifndef DCUBED_OMNA_DEBUG // ObjectMonitor::notifyAll() code path tracing and debugging +// #define DCUBED_OMNA_DEBUG +// #endif +// #ifndef DCUBED_OMREI_DEBUG // ObjectMonitor::ReenterI() code path tracing and debugging +// #define DCUBED_OMREI_DEBUG +// #endif +// #ifndef DCUBED_OMW_DEBUG // ObjectMonitor::wait() code path tracing and debugging +// #define DCUBED_OMW_DEBUG +// #endif +// #ifndef DCUBED_OMX_DEBUG // ObjectMonitor::exit() code path tracing and debugging +// #define DCUBED_OMX_DEBUG +// #endif #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS The tracing and debugging I've implemented so far is by no means complete. My tactic was to put in the basics for each API and then add/squirrel away more as I saw which code paths were interesting. 
Here's a sample blurb of the tracing output just before hitting the GuaranteeOnMonitorMismatch crash: config java.util.stream.LoggingTestCase.before(): success XXX - JavaThread end: name='ForkJoinPool.commonPool-worker-13', addr=0x00000000011f6800 dcubed_bolx=0xfffffd7fbfa0b2d0, dcubed_bolx_blx=0xfffffd7fbfa0b2d0, dcubed_bolx_call_path=1e7d, dcubed_bolx_ticket=80, m->dcubed_bolx_basic_lock=0xfffffd7fbf4056f0, m->dcubed_bolx_ticket=71, m->dcubed_bolx_thread=0x0000000001321000 dcubed_ome_ticket=45, dcubed_ome_call_path=0x8, dcubed_ome_loop_cnt=0, dcubed_ome_thread=0x0000000000d98800, dcubed_ome_C2_ticket=69, dcubed_ome_C2_thread=0x0000000001321000 dcubed_omei_ticket=0, dcubed_omei_call_path=0x0, dcubed_omei_loop_cnt=0, dcubed_omei_thread=0x0000000000000000 dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 dcubed_omna_ticket=78, dcubed_omna_call_path=0xd, dcubed_omna_loop_cnt=2, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x00000000011f6800, dcubed_omna_target_thread=0x00000000008c3000 dcubed_omrei_ticket=75, dcubed_omrei_call_path=0x23, dcubed_omrei_loop_cnt=1, dcubed_omrei_thread=0x00000000008c3000 dcubed_omw_ticket=77, dcubed_omw_call_path=0x3, dcubed_omw_thread=0x00000000008c3000 dcubed_omx_ticket=76, dcubed_omx_call_path=0x30, dcubed_omx_loop_cnt=1, dcubed_omx_thread=0x00000000008c3000, dcubed_omx_C2_thread=0x00000000008c3000 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:1978), pid=980, tid=0x0000000000000044 # guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x000000000 11f6800 unexpectedly owns ObjectMonitor=0x00000000012d4480 # And here's what the debug info tells me: > config java.util.stream.LoggingTestCase.before(): success This is the last successful test case, but that varies because this bug hits in different places. XXX - JavaThread end: name='ForkJoinPool.commonPool-worker-13', addr=0x00000000011f6800 This thread started its exit and that's when the guarantee() blew. All thread starts and ends are tracked. > dcubed_omei_ticket=0, dcubed_omei_call_path=0x0, dcubed_omei_loop_cnt=0, dcubed_omei_thread=0x0000000000000000 > dcubed_omn_ticket=0, dcubed_omn_call_path=0x0, dcubed_omn_eq=0, dcubed_omn_calling_thread=0x0000000000000000, dcubed_omn_target_thread=0x0000000000000000 These two APIs have no traces in this run. > dcubed_bolx=0xfffffd7fbfa0b2d0, dcubed_bolx_blx=0xfffffd7fbfa0b2d0, dcubed_bolx_call_path=1e7d, dcubed_bolx_ticket=80, m->dcubed_bolx_basic_lock=0xfffffd7fbf4056f0, m->dcubed_bolx_ticket=71, m->dcubed_bolx_thread=0x0000000001321000 This trace tells me that the interpreter code for monitor exit was last used by ticket #71 for the target monitor which is before the transactions below. The most recent ticket for _this thread_ is ticket #80 which means that this thread did a monitor operation of some sort after it failed to exit the target monitor. dcubed_ome_ticket=45, dcubed_ome_call_path=0x8, dcubed_ome_loop_cnt=0, dcubed_ome_thread=0x0000000000d98800, dcubed_ome_C2_ticket=69, dcubed_ome_C2_thread=0x0000000001321000 This trace tells me that ObjectMonitor::enter() (the slow path) was last used for the target monitor by ticket #45 which is well before the transactions below. The C2 portion of the trace tells me that the C2 portion of the monitor-enter code path was last used for the target monitor by ticket #69. That is very likely the "enter" that matches up the "notify all" below. 
If I had added interpreter level tracing for the "enter" code path, that might have given me a later ticket number, but that's not essential. dcubed_omrei_ticket=75, dcubed_omrei_call_path=0x23, dcubed_omrei_loop_cnt=1, dcubed_omrei_thread=0x00000000008c3000 dcubed_omx_ticket=76, dcubed_omx_call_path=0x30, dcubed_omx_loop_cnt=1, dcubed_omx_thread=0x00000000008c3000, dcubed_omx_C2_thread=0x00000000008c3000 dcubed_omw_ticket=77, dcubed_omw_call_path=0x3, dcubed_omw_thread=0x00000000008c3000 The previous three traces tell me that the main thread (0x00000000008c3000) reentered the monitor (after being notified) and then wait'ed on the monitor. The "exit" trace has an earlier ticket # than the wait ticket because an "exit" is part of the "wait"... dcubed_omna_ticket=78, dcubed_omna_call_path=0xd, dcubed_omna_loop_cnt=2, dcubed_omna_eq=1, dcubed_omna_calling_thread=0x00000000011f6800, dcubed_omna_target_thread=0x00000000008c3000 This is the last transaction before the target JavaThread tries to exit and we crash. This trace tells me that our target thread did a notifyAll on the main thread (0x00000000008c3000). The "dcubed_omna_eq=1" tells me that a thread is queued up to "enter" the monitor. The lack of any other exit trace information for the crashing thread (0x00000000011f6800) strongly indicates that the Java "monitor exit" was lost.
30-06-2015

I ran an over night experiment with two parallel runs with the new options enabled. The experiment ran for ~12.5 hours and I saw 6 of the new assertion failures: $ grep -c FAIL doit_loop.diag_09_0.log doit_loop.diag_09_1.log doit_loop.diag_09_0.log:2 doit_loop.diag_09_1.log:4 $ tail doit_loop.diag_09_0.log Loop #794...PASS Loop #795...PASS Loop #796...PASS Loop #797...PASS Loop #798...PASS Loop #799...PASS Loop #800...PASS Loop #801...PASS Loop #802...PASS Loop #803... $ tail doit_loop.diag_09_1.log Loop #746...PASS Loop #747...PASS Loop #748...PASS Loop #749...PASS Loop #750...PASS Loop #751...PASS Loop #752...PASS Loop #753...PASS Loop #754...PASS Loop #755...
18-06-2015

As part of the hunt for this bug, I've added a couple of new options: $ hg diff src/share/vm/runtime/globals.hpp diff -r 56e01852fed6 src/share/vm/runtime/globals.hpp --- a/src/share/vm/runtime/globals.hpp Mon Apr 27 09:02:41 2015 -0700 +++ b/src/share/vm/runtime/globals.hpp Thu Jun 18 08:27:26 2015 -0600 @@ -583,6 +583,9 @@ class CommandLineFlags { product(bool, JavaMonitorsInStackTrace, true, \ "Print information about Java monitor locks when the stacks are" \ "dumped") \ + \ + diagnostic(bool, VerboseStackTrace, false, \ + "Print extra information when the stacks are dumped") \ \ product_pd(bool, UseLargePages, \ "Use large page memory") \ @@ -2609,6 +2612,12 @@ class CommandLineFlags { product(bool, DisplayVMOutputToStdout, false, \ "If DisplayVMOutput is true, display all VM output to stdout") \ \ + diagnostic(bool, GuaranteeOnMonitorMismatch, false, \ + "Guarantee on monitor mismatch detected at JavaThread exit") \ + \ + product(bool, JavaThreadExitReleasesMonitors, false, \ + "JavaThread exit() releases monitors owned by thread") \ + \ product(bool, UseHeavyMonitors, false, \ "use heavyweight instead of lightweight Java monitors") \ \ VerboseStackTrace (default false) enables the extra information in stack traces that David C added via: JDK-8069412 Locks need better debug-printing support I've made a few additional tweaks to David's work in this area. JavaThreadExitReleasesMonitors (default false) builds on the logic that supports the JNIDetachReleasesMonitors option, but the logic is applied to all exiting JavaThreads and not just detaching JNI threads. This option provides a work around for any future "lost exit" bugs. GuaranteeOnMonitorMismatch (diagnostic and default false) causes a guarantee() failure and a resulting crash when a JavaThread attempts to exit when holding a Java monitor. This last option is provided to catch cases where a Java monitor has a "lost exit", but another thread doesn't attempt to enter the monitor again so no failure/hang is seen. Here's some example output for a run of the test associated with this bug with the JavaThreadExitReleasesMonitors and GuaranteeOnMonitorMismatch options enabled: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (synchronizer.cpp:1835), pid=23425, tid=0x000000000000003d # guarantee(!GuaranteeOnMonitorMismatch) failed: exiting JavaThread=0x00000000 00d4f800 unexpectedly owns ObjectMonitor=0x0000000001346400 # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-2015 0428030352.sspitsyn.8073705-b00) # Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-8077392_diag_dcubed-product mixed mode solaris-amd64 compressed oops) # Core dump will be written. /work/shared/bugs/8077392/core or core.23425 # # An error report file with more information is saved as: # /work/shared/bugs/8077392/hs_err_pid23425.log # # If you would like to submit a bug report, please visit: # http://bugreport.java.com/bugreport/crash.jsp # doit.ksh[11]: 23425 Abort(coredump) + status=134 + echo status=134 status=134 + [ 134 = 95 ] + exit 134 The support for these new options will likely be spun off into a separate bug to enable easier backporting.
18-06-2015

I've done a detailed crawl through the Biased Locking implementation and ran across some code that made me suspicious: src/share/vm/runtime/biasedLocking.cpp: 531 BiasedLocking::Condition BiasedLocking::revoke_and_rebias(Handle obj, bo ol attempt_rebias, TRAPS) { <snip> 593 HeuristicsResult heuristics = update_heuristics(obj(), attempt_rebias) ; 594 if (heuristics == HR_NOT_BIASED) { 595 return NOT_BIASED; 596 } else if (heuristics == HR_SINGLE_REVOKE) { 597 Klass *k = obj->klass(); 598 markOop prototype_header = k->prototype_header(); 599 if (!UseNewCode2 && mark->biased_locker() == THREAD && 600 prototype_header->bias_epoch() == mark->bias_epoch()) { 601 // A thread is trying to revoke the bias of an object biased 602 // toward it, again likely due to an identity hash code 603 // computation. We can again avoid a safepoint in this case 604 // since we are only going to walk our own stack. There are no 605 // races with revocations occurring in other threads because we 606 // reach no safepoints in the revocation path. 607 // Also check the epoch because even if threads match, another thr ead 608 // can come in with a CAS to steal the bias of an object that has a 609 // stale epoch. 610 ResourceMark rm; 611 if (TraceBiasedLocking) { 612 tty->print_cr("Revoking bias by walking my own stack:"); 613 } 614 BiasedLocking::Condition cond = revoke_bias(obj(), false, false, ( JavaThread*) THREAD); 615 ((JavaThread*) THREAD)->set_cached_monitor_info(NULL); 616 assert(cond == BIAS_REVOKED, "why not?"); 617 return cond; 618 } else { The following line in the comment caught my eye: 603 .... We can again avoid a safepoint in this case 604 // since we are only going to walk our own stack. The problem I have with this comment is that this call: 614 BiasedLocking::Condition cond = revoke_bias(obj(), false, false, ( JavaThread*) THREAD); makes changes to the obj()'s markOop and I'm not convinced that is safe in all fourteen of the calls to BiasedLocking::revoke_and_rebias(). I made the following debugging change: $ hg diff src/share/vm/runtime/biasedLocking.cpp diff -r 56e01852fed6 src/share/vm/runtime/biasedLocking.cpp --- a/src/share/vm/runtime/biasedLocking.cpp Mon Apr 27 09:02:41 2015 -0700 +++ b/src/share/vm/runtime/biasedLocking.cpp Fri Jun 12 11:12:26 2015 -0600 @@ -596,7 +596,7 @@ BiasedLocking::Condition BiasedLocking:: } else if (heuristics == HR_SINGLE_REVOKE) { Klass *k = obj->klass(); markOop prototype_header = k->prototype_header(); - if (mark->biased_locker() == THREAD && + if (!UseNewCode2 && mark->biased_locker() == THREAD && prototype_header->bias_epoch() == mark->bias_epoch()) { // A thread is trying to revoke the bias of an object biased // toward it, again likely due to an identity hash code and ran an experiment where the 'UseNewCode2' was _not_ set in two parallel runs. The hang reproduced in run #165 in the second instance; the first instance passed 318 runs without a hang before I stopped the experiment. I ran an experiment with '-XX:+UseNewCode2' specified for two parallel runs. Each instance in that experiment is approaching 3000 runs without a hang. I'm allowing the experiment to continue while I run further experiments to determine the exact scenario that makes the above block of code unsafe. Update: Parallel run #1 finished with 7944 iterations without a hang. Parallel run #2 finished with 7946 iterations without a hang. 
I had re-enabled my jct-tools/JTREG mirror update and when it switched from 4.1-B10 -> 4.1-B12 there was a hiccup due to an incompatible change that killed the parallel runs. No worries. These runs executed for more than 2X the 3000 runs that Amy was using to determine reproducibility. Now if I could just come up with an explainable scenario all would be good.
17-06-2015

Sorry nothing specific - this code just looks wrong if not executed at a safepoint.
17-06-2015

We're on the same page here. I've been analyzing and experimenting with this code path for a few days now. Haven't been able to nail down a scenario where the mark word update gets messed up which is bugging me to no end (pun intended)... Do you have some suggestions for "other non-safepoint updates to the mark word" that I should examine in closer detail? Maybe you've thought of one that I haven't...
15-06-2015

The second part of that caught my eye:

 604       // since we are only going to walk our own stack. There are no
 605       // races with revocations occurring in other threads because we
 606       // reach no safepoints in the revocation path.

There may not be any races with revocations, but are there races with other non-safepoint updates to the mark word? If so, then revoke_bias seems broken because it assumes the mark word may have changed since the decision to call revoke_bias was made, but that subsequent changes are not possible - i.e., it seems to assume being called at a safepoint (which seems to be true for all call paths except this one).
15-06-2015

After reading through the Java monitor code for the umpteenth time, I kicked off an experiment with these bits:

java version "1.9.0-internal"
Java(TM) SE Runtime Environment (build 1.9.0-internal-20150428030352.sspitsyn.8073705-b00)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-20150428030352.sspitsyn.8073705-b00, mixed mode)

and this option: -XX:-UseBiasedLocking. The first instance ran for 1000 iterations without a hang and the second instance ran for 1000 iterations without a hang. Time to take a much closer look at how the "fast enter" bucket interacts with BiasedLocking...
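For reference, a Java-level sketch of the pattern biased locking optimizes and the kinds of events that force a revocation. This is illustrative only (hypothetical class; the actual revocation machinery lives in the VM, not in Java code): repeated uncontended locking by one thread is the case the bias helps, and contention from another thread or an identity hash code request is what drives the revoke_and_rebias() path under suspicion. With -XX:-UseBiasedLocking, none of this applies, which is why that experiment is interesting.

// Illustrative only (hypothetical class). With -XX:+UseBiasedLocking, the repeated
// uncontended synchronized blocks below are the case the bias optimizes. A later
// lock attempt from another thread, or an identity hash code request (the hash is
// stored in the mark word), forces any remaining bias to be revoked.
class BiasedLockingSketch {
    static final Object lock = new Object();
    static int counter;

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1_000_000; i++) {
            synchronized (lock) {          // same thread, no contention: biasable
                counter++;
            }
        }
        Thread other = new Thread(() -> {
            synchronized (lock) {          // contention from another thread triggers revocation
                counter++;
            }
        });
        other.start();
        other.join();
        System.identityHashCode(lock);     // identity hash also revokes any remaining bias
    }
}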
19-05-2015

Checking out the debugging code added by David C via:

    JDK-8069412 Locks need better debug-printing support

It looks like the new debug-printing does not cover the case where the lock was entered and then Object.wait() was called. Checking it out more closely...

Reading JDK-8069412 led me back to the bug that motivated the new debug-printing support:

    JDK-8066576 Lock still held

Very interesting reading and still unresolved...

Update: JDK-8066576 mentions using -XX:-EliminateNestedLocks as a workaround for that bug. While working on the new debug-printing code, I launched an experiment using -XX:-EliminateNestedLocks in two parallel runs. The second instance hung in run #247 and the first instance passed 521 runs before being stopped. So this bug is not an instance of having a nested lock eliminated in an unbalanced fashion.

Update: I also tried an experiment with both -XX:-EliminateLocks and -XX:-EliminateNestedLocks specified in two parallel runs. The first instance hung in run #180 and the second instance hung in run #367. So this bug is not an instance of having a lock coarsened.
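For context, the locking shapes these optimizations target look roughly like this at the Java level (illustrative only; exactly which transformation each flag gates is a C2 implementation detail):

public class LockShapes {
    private static final Object LOCK = new Object();
    private static int counter;

    static void nested() {
        synchronized (LOCK) {
            synchronized (LOCK) {          // redundant nested enter/exit on the same lock
                counter++;
            }
        }
    }

    static void adjacent() {
        synchronized (LOCK) { counter++; } // back-to-back regions on the same lock are the
        synchronized (LOCK) { counter++; } // classic candidate for being coarsened into one
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            nested();
            adjacent();
        }
        System.out.println(counter);
    }
}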
08-05-2015

I tweaked the updated debug-printing logic a bit more and added some additional tracing for "thread start" and "thread end" code paths. I ran an experiment with two parallel runs. The first instance hung in run #156 with this "MainThread" stack trace snippet: "MainThread" #23 prio=5 os_prio=64 tid=0x00000000008a8800 nid=0x30 in Object.wait() [0xfffffd7fc1073000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <no locals available> at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334) - waiting to relock <0xfffffd7e7463f108> (a java.util.stream.Nodes$CollectorTask$OfLong) - lockbits= monitor(0x000000000130d902)={count=0x0000000000000000,waiters=0x0000000000000001,recursions=0x0000000000000000,owner=0x0000000001437000} at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405) at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) So the monitor that "MainThread" is waiting for is owned by a JavaThread with the address of: owner=0x0000000001437000. Here's a grep for that address: $ grep 0x0000000001437000 doit.prod_0_0_12.log.17696 XXX - JavaThread start: name='ForkJoinPool.commonPool-worker-14', addr=0x0000000001437000 XXX - JavaThread end: name='ForkJoinPool.commonPool-worker-14', addr=0x0000000001437000 - lockbits= monitor(0x000000000130d902)={count=0x0000000000000000,waiters=0x0000000000000001,recursions=0x0000000000000000,owner=0x0000000001437000} The second instance hung in run #848 with this "MainThread" stack trace snippet: "MainThread" #23 prio=5 os_prio=64 tid=0x00000000008c1000 nid=0x30 in Object.wait() [0xfffffd7fc1073000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <no locals available> at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334) - waiting to relock <0xfffffd7e75b51d88> (a java.util.stream.Nodes$SizedCollectorTask$OfInt) - lockbits= monitor(0x00000000018c1102)={count=0x0000000000000000,waiters=0x0000000000000001,recursions=0x0000000000000000,owner=0x0000000000dcd000} at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405) at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) So the monitor that "MainThread" is waiting for is owned by a JavaThread with the address of: owner=0x0000000000dcd000. Here's a grep for that address: $ grep 0x0000000000dcd000 doit.prod_0_1_12.log.10706 XXX - JavaThread start: name='ForkJoinPool.commonPool-worker-27', addr=0x0000000000dcd000 XXX - JavaThread end: name='ForkJoinPool.commonPool-worker-27', addr=0x0000000000dcd000 - lockbits= monitor(0x00000000018c1102)={count=0x0000000000000000,waiters=0x0000000000000001,recursions=0x0000000000000000,owner=0x0000000000dcd000} So in both hangs, the Java monitor is owned by a JavaThread that has exited the building.
08-05-2015

I've made a few more debug-printing info fixes along the same lines as:

    JDK-8069412 Locks need better debug-printing support

With that new info, the hung "MainThread" looks like this:

"MainThread" #23 prio=5 os_prio=64 tid=0x00000000008af000 nid=0x30 in Object.wait() [0xfffffd7fc1073000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <no locals available>
	at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334)
	- waiting to relock <0xfffffd7e72dccef8> (a java.util.stream.Nodes$ToArrayTask$OfDouble)
	- lockbits= locked(0x00000000029f0f82)->monitor={count=0x0000000000000000,waiters=0x0000000000000001,recursions=0x0000000000000000,owner=0x0000000002b2e000}
	at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405)
	at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)

This line:

	- waiting on <no locals available>

is new and reports that we couldn't find a local in the vframe for Object.wait(); that's because Object.wait() is native.

This line is changed:

	- waiting to relock <0xfffffd7e72dccef8> (a java.util.stream.Nodes$ToArrayTask$OfDouble)

It would previously have reported "locked" because the old stack trace code did not properly recognize monitors that were released by a wait() call in a lower-numbered vframe.

In this line:

	- lockbits= locked(0x00000000029f0f82)->monitor={count=0x0000000000000000,waiters=0x0000000000000001,recursions=0x0000000000000000,owner=0x0000000002b2e000}

the "locked(XXX)" portion is wrong. It should be "monitor(XXX)" since what we have is an inflated monitor. Just a small bug in Dave C's original code for monitor debug-printing.

The "owner=0x0000000002b2e000" value refers to a JavaThread that is no longer running, or somehow the owner field got set to garbage... This is why the thread is stuck.
07-05-2015

More on the local object reporting in the stack trace... I wrote a test that:

- In the worker thread, calls thisObject.myWaitFunc(), where myWaitFunc() synchronizes on 'this' and calls wait().
- In the main() thread, delays until the worker is waiting, synchronizes on the object, calls thisObject.notify() and then sleeps (while still holding the monitor).

The purpose of the test is to cause the worker thread to enter the BLOCKED state after being notified in wait() so that I can get a stack trace. External to the program, I send SIGQUIT to get a stack trace. (A sketch of the test is shown after the snippets below.)

Here's the snippet from an -Xint run:

"Thread-0" #6 prio=5 os_prio=64 tid=0x0000000000609000 nid=0x20 in Object.wait() [0xfffffd7fc1e37000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0xfffffd7e6a2af730> (a WaitOnSynchronizedThis)
	at java.lang.Object.wait(Object.java:508)
	at WaitOnSynchronizedThis.myWaitFunc(WaitOnSynchronizedThis.java:52)
	- locked <0xfffffd7e6a2af730> (a WaitOnSynchronizedThis)
	at WaitOnSynchronizedThis$1.run(WaitOnSynchronizedThis.java:13)

Here's the snippet from an -Xcomp run:

"Thread-0" #22 prio=5 os_prio=64 tid=0x0000000000872000 nid=0x30 in Object.wait() [0xfffffd7fc0e17000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:508)
	at WaitOnSynchronizedThis.myWaitFunc(WaitOnSynchronizedThis.java:52)
	- locked <0xfffffd7e6a2b6ec0> (a WaitOnSynchronizedThis)
	at WaitOnSynchronizedThis$1.run(WaitOnSynchronizedThis.java:13)

With -Xint, even after being notified in the wait() call, we can still report what kind of object was being wait()'ed for. -Xcomp loses that information.
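A rough reconstruction of the test described above (the original WaitOnSynchronizedThis.java is not attached here, so names and line numbers will not match the snippets exactly):

public class WaitOnSynchronizedThis {
    public static void main(String[] args) throws InterruptedException {
        WaitOnSynchronizedThis thisObject = new WaitOnSynchronizedThis();

        Thread worker = new Thread(new Runnable() {
            public void run() {
                thisObject.myWaitFunc();
            }
        });
        worker.start();

        // Crude delay until the worker is parked in wait().
        while (worker.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }

        synchronized (thisObject) {
            thisObject.notify();
            // Keep holding the monitor so the notified worker stays BLOCKED trying to
            // re-enter it; send SIGQUIT now to capture the stack trace.
            Thread.sleep(60_000);
        }
    }

    synchronized void myWaitFunc() {
        try {
            wait();
        } catch (InterruptedException ie) {
            // ignore for this test
        }
    }
}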
07-05-2015

While working on the new debug-printing code, I noticed a couple of things. Here's a snippet from our hung thread: "MainThread" #23 prio=5 os_prio=64 tid=0x000000000090f800 nid=0x30 in Object.wait() [0xffff80ffb3129000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.Object.wait(Native Method) at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334) - locked <0x00000000f797c750> (a java.util.stream.Nodes$ToArrayTask$OfLong) at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405) and here's a snippet from the regular Java main() thread: "main" #1 prio=5 os_prio=64 tid=0x0000000000420000 nid=0x2 in Object.wait() [0xffff80ffbf19e000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000e000f390> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1249) - locked <0x00000000e000f390> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1323) There are a couple of things to notice: 1) the Thread states are different: java.lang.Thread.State: BLOCKED (on object monitor) java.lang.Thread.State: WAITING (on object monitor) What the "BLOCKED" value for "MainThread" means is that thread has been notified via Object.notify() so it is trying to reacquire the Java monitor. 2) the local object in the Object.wait() frame isn't reported: Here's the "main" thread reporting that it is waiting on java.lang.Thread: at java.lang.Object.wait(Native Method) - waiting on <0x00000000e000f390> (a java.lang.Thread) at java.lang.Thread.join(Thread.java:1249) and here's what "MainThread" reports: at java.lang.Object.wait(Native Method) at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334) This _could_ be because the Java monitor has been notified. More investigation is needed.
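For context, the wait/notify handshake that externalAwaitDone() participates in looks roughly like the following simplified sketch (paraphrased, not the literal JDK source): the external caller waits on the task object itself, and the worker that completes the task notifies on it.

// Simplified sketch of the handshake; field names and details are approximate.
class TaskLike {
    volatile int status;                 // >= 0 means "not yet done"

    int awaitDoneExternally() throws InterruptedException {
        while (status >= 0) {
            synchronized (this) {        // the frame seen as externalAwaitDone() in the dumps
                if (status >= 0) {
                    wait();              // WAITING here; BLOCKED once notified but unable to re-enter
                }
            }
        }
        return status;
    }

    void markCompleted() {
        status = -1;                     // record completion first...
        synchronized (this) {
            notifyAll();                 // ...then wake any external waiters
        }
    }
}

So a "MainThread" stuck in the BLOCKED state here means the notify side has run (or is running) but the waiter cannot get the task's monitor back.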
07-05-2015

Forgot to record some test results from Friday (05.01). In this experiment, QuickEnterBailPoint == 0 (the default value) and I added the following options:

    -XX:+UnlockExperimentalVMOptions -XX:SyncFlags=1

The '-XX:SyncFlags=1' option causes all park() operations to use a small timer value, so a lost unpark() would only cause a delay rather than a hang; it is a good way to check for lost unpark() operations. For this experiment, the first instance passed 389 iterations without a hang and the second instance hung during iteration #13. Since the hang still reproduces with timed parks, this strongly indicates that the hang is _not_ due to a lost unpark() in the successor algorithm.
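The same defensive idea can be expressed at the Java level with LockSupport: a timed park turns a lost wakeup into a bounded delay instead of a permanent hang. This is only an analogue of what SyncFlags=1 does inside the VM:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Analogue only: polling with a timed park means a lost unpark() cannot strand the waiter.
final class TimedParkExample {
    static volatile boolean done;

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            while (!done) {
                LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(1));  // re-check every ~1 ms
            }
        });
        waiter.start();
        Thread.sleep(50);
        done = true;                   // even if the unpark below were "lost", the waiter still exits
        LockSupport.unpark(waiter);
        waiter.join();
    }
}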
06-05-2015

Ran another experiment using the next bucket in the Contended Locking project (thanks to Mary for suggesting the experiment). The "fast wait/notify" bucket completes the picture with the earlier "fast enter" and "fast exit" buckets. In this experiment, the project bits were rebased to JDK9-B60 + RT_Baseline as of Friday (04.17) morning (8042901 changeset), just after the "fast exit" bucket was pushed. The QuickEnterBailPoint debug option is not present in these bits, so it was not used in the experiment. For this experiment, the first instance hung during iteration #54 and the second instance passed 1100 iterations without a hang. This indicates that the "fast wait/notify" bucket does not contain any code that alleviates the race present in the "fast enter" bucket.
06-05-2015

@Amy - Thanks for the additional info. Right now I have good reproducibility of this bug on my Solaris X64 server and I'm looking for the root cause.
04-05-2015

@David - The "hack" isn't in the fix for 8061553. It was in the diagnostic code that I temporarily added under the QuickEnterBailPoint option. In this code blob:

+      // detect fast ownership grab but use slow path with this bail point
+      if (QuickEnterBailPoint == 3) {
+        m->set_owner(NULL);  // drop ownership
+        return false;
+      }

the set_owner() call isn't the right way to safely drop ownership. However, I never saw a hang with QuickEnterBailPoint == 3, so I got away with the hack for the purposes of my diagnostic.
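As an aside, the general point (dropping lock ownership with a plain store provides no ordering or visibility guarantees) can be illustrated with a toy Java spinlock; this is only an analogue of the VM-level concern, not the HotSpot code:

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Toy spinlock: acquire with a CAS, release with a releasing store so that writes done
// inside the critical section are published to the next owner.
final class ToySpinLock {
    private Thread owner;                       // accessed only through OWNER
    private static final VarHandle OWNER;
    static {
        try {
            OWNER = MethodHandles.lookup().findVarHandle(ToySpinLock.class, "owner", Thread.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void lock() {
        Thread self = Thread.currentThread();
        while (!OWNER.compareAndSet(this, null, self)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        // A plain store of null here could be reordered with earlier writes and gives the
        // next owner no visibility guarantee -- the Java-level analogue of why a bare
        // set_owner(NULL) is not a safe way to drop ownership.
        OWNER.setRelease(this, null);
    }
}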
04-05-2015

To make sure that the "backout 8061553" build keeps working fine, we did more testing. After the earlier testing (results mentioned at http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2015-April/014554.html):

* Latest dev build: reproduced the issue; the test timed out 4 times, at runs #48 #143 #1877 #2231, in a total of 3000 runs.
* Backout of the 8061553 changeset from the above build: 3000 runs all pass.

We tested again using the above "backout 8061553 changeset" build, running continually for 6 days: 20,000 runs completed on Linux x64 and 19,019 on Solaris x64. All passed; no failure found.
04-05-2015

Picking up on: "that likely means that the hack that I used to release the lock (without a memory barrier) is bogus" - notwithstanding that the "hack" doesn't seem to be to blame, I cannot see this "hack" in the fix for 8061553?? What was being referred to?
04-05-2015

The fourth experiment has QuickEnterBailPoint == 3, which means bail out when we've successfully grabbed an unowned lock. If this experiment does not hang, then this optimized lock grab code path is the problem; if this experiment hangs, then that likely means that the hack that I used to release the lock (without a memory barrier) is bogus :-)

$ hg diff src/share/vm/runtime/synchronizer.cpp
diff -r 56e01852fed6 src/share/vm/runtime/synchronizer.cpp
--- a/src/share/vm/runtime/synchronizer.cpp    Mon Apr 27 09:02:41 2015 -0700
+++ b/src/share/vm/runtime/synchronizer.cpp    Fri May 01 09:41:42 2015 -0600
<snip>
@@ -179,6 +183,11 @@ bool ObjectSynchronizer::quick_enter(oop
         Atomic::cmpxchg_ptr(Self, &(m->_owner), NULL) == NULL) {
       assert(m->_recursions == 0, "invariant");
       assert(m->_owner == Self, "invariant");
+      // detect fast ownership grab but use slow path with this bail point
+      if (QuickEnterBailPoint == 3) {
+        m->set_owner(NULL);  // drop ownership
+        return false;
+      }
       return true;
     }
   }

Update: QuickEnterBailPoint == 3 did not hang in either instance. The first instance passed 402 runs and the second instance passed 401 runs. This tells me that the optimized lock grab code is the problem.
01-05-2015

This morning's hang generated with locally built bits looks more consistent. Here's what the SIGQUIT thread dump shows: "MainThread" #23 prio=5 os_prio=64 tid=0x00000000008ca000 nid=0x30 in Object.wai t() [0xfffffd7fc1073000] java.lang.Thread.State: BLOCKED (on object monitor) at java.lang.Object.wait(Native Method) at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java :334) - locked <0xfffffd7e740f7ad0> (a java.util.stream.Nodes$CollectorTask$Of Double) at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405) at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at java.util.stream.Nodes.collectDouble(Nodes.java:442) at java.util.stream.DoublePipeline.evaluateToNode(DoublePipeline.java:13 9) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:564) at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipelin e.java:255) at java.util.stream.DoublePipeline.toArray(DoublePipeline.java:508) at org.openjdk.tests.java.util.stream.ToArrayOpTest.lambda$testDoubleOps WithFilter$87(ToArrayOpTest.java:386) at org.openjdk.tests.java.util.stream.ToArrayOpTest$$Lambda$144/92481751 4.apply(Unknown Source) at java.util.stream.OpTestCase$BaseTerminalTestScenario.run(OpTestCase.j ava:404) at java.util.stream.OpTestCase$ExerciseDataTerminalBuilder.exercise(OpTe stCase.java:528) at java.util.stream.OpTestCase.exerciseTerminalOps(OpTestCase.java:565) at org.openjdk.tests.java.util.stream.ToArrayOpTest.testDoubleOpsWithFil ter(ToArrayOpTest.java:386) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocat ionHelper.java:84) at org.testng.internal.Invoker.invokeMethod(Invoker.java:714) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWork er.java:127) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111) at org.testng.TestRunner.privateRun(TestRunner.java:767) at org.testng.TestRunner.run(TestRunner.java:617) at org.testng.SuiteRunner.runTest(SuiteRunner.java:334) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291) at org.testng.SuiteRunner.run(SuiteRunner.java:240) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224) at org.testng.TestNG.runSuitesLocally(TestNG.java:1149) at org.testng.TestNG.run(TestNG.java:1057) at com.sun.javatest.regtest.TestNGAction$TestNGRunner.main(TestNGAction. java:161) at com.sun.javatest.regtest.TestNGAction$TestNGRunner.main(TestNGAction. java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. 
java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java: 94) at java.lang.Thread.run(Thread.java:745) Here's what "jstack -F -m -l" reports for the same thread: ----------------- t@48 ----------------- 0xfffffd7fff2935ea ___lwp_cond_wait + 0xa 0xfffffd7ffe893ec9 void os::PlatformEvent::park() + 0x79 0xfffffd7ffe86cd51 void ObjectMonitor::wait(long,bool,Thread*) + 0x331 0xfffffd7ffea1c957 int ObjectSynchronizer::wait(Handle,long,Thread*) + 0x10 7 0xfffffd7ffe53d7a7 JVM_MonitorWait + 0x227 0xfffffd7ff5ed22c0 <Unknown compiled code> 0xfffffd7ff5f0e3a4 * java.util.concurrent.ForkJoinTask.externalAwaitDone() bci:86 line:334 (Compiled frame) * java.util.concurrent.ForkJoinTask.doInvoke() bci:46 line:405 (Compiled frame) Locked ownable synchronizers: - None Here's what dbx reports: THREAD t@48 t@48(l@48) stopped in ___lwp_cond_wait at 0xfffffd7fff2935ea 0xfffffd7fff2935ea: ___lwp_cond_wait+0x000a: jae ___lwp_cond_wait+0x18 [ 0xfffffd7fff2935f8, .+0xe ] current thread: t@48 [1] ___lwp_cond_wait(0x8c6848, 0x8c6830, 0x0, 0x0, 0xbcdc80, 0xfffffd7ffefa51f 8), at 0xfffffd7fff2935ea [2] __lwp_cond_wait(), at 0xfffffd7fff27873c =>[3] os::PlatformEvent::park(this = <value unavailable>) (optimized), at 0xffff fd7ffe893ec9 (line ~223) in "os_solaris.hpp" [4] ObjectMonitor::wait(this = 0x140b580, millis = 0, interruptible = true, __ the_thread__ = 0x8ca000) (optimized), at 0xfffffd7ffe86cd51 (line ~1513) in "obj ectMonitor.cpp" [5] ObjectSynchronizer::wait(obj = CLASS, millis = 0, __the_thread__ = 0x8ca00 0) (optimized), at 0xfffffd7ffea1c957 (line ~414) in "synchronizer.cpp" [6] JVM_MonitorWait(env = <value unavailable>, handle = <value unavailable>, m s = 0) (optimized), at 0xfffffd7ffe53d7a7 (line ~571) in "jvm.cpp" [7] 0xfffffd7ff5ed22c0(), at 0xfffffd7ff5ed22c0 [8] 0xfffffd7ff5ed22c0(), at 0xfffffd7ff5ed22c0 [9] 0xfffffd7ff5f0e3a4(), at 0xfffffd7ff5f0e3a4 Current function is Parker::park (optimized) 223 static int cond_wait(cond_t *cv, mutex_t *mx) { return _cond_wait(cv, mx); } These three views of the hung thread are pretty consistent with each other.
01-05-2015

For the fastdebug hang that I saw with these bits: java version "1.9.0-ea" Java(TM) SE Runtime Environment (build 1.9.0-ea-b61) Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-fastdebug-b61, mixed mode) I just realized I didn't record the info about the hanging thread. Here's what the SIGQUIT thread dump shows: "MainThread" #23 prio=5 os_prio=64 tid=0x0000000000bf1800 nid=0x30 in Object.wait() [0xfffffd7fbfe42000] java.lang.Thread.State: BLOCKED (on object monitor) JavaThread state: _thread_blocked Thread: 0x0000000000bf1800 [0x30] State: _at_safepoint _has_called_back 0 _at_poll_safepoint 0 JavaThread state: _thread_blocked at java.lang.Object.wait(Native Method) at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334) - locked <0xfffffd7e6aad3040> (a java.util.stream.Nodes$SizedCollectorTask$OfLong) at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405) at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) at java.util.stream.Nodes.collectLong(Nodes.java:400) at java.util.stream.LongPipeline.evaluateToNode(LongPipeline.java:140) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:564) at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:255) at java.util.stream.LongPipeline.toArray(LongPipeline.java:486) at org.openjdk.tests.java.util.stream.ToArrayOpTest.lambda$testLongDistinctAndSortedPermutations$72(ToArrayOpTest.java:327) at org.openjdk.tests.java.util.stream.ToArrayOpTest$$Lambda$231/1303100427.apply(Unknown Source) at java.util.stream.OpTestCase$BaseTerminalTestScenario.run(OpTestCase.java:404) at java.util.stream.OpTestCase$ExerciseDataTerminalBuilder.exercise(OpTestCase.java:528) at java.util.stream.OpTestCase.exerciseTerminalOps(OpTestCase.java:565) at org.openjdk.tests.java.util.stream.ToArrayOpTest.testLongDistinctAndSortedPermutations(ToArrayOpTest.java:327) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:84) at org.testng.internal.Invoker.invokeMethod(Invoker.java:714) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111) at org.testng.TestRunner.privateRun(TestRunner.java:767) at org.testng.TestRunner.run(TestRunner.java:617) at org.testng.SuiteRunner.runTest(SuiteRunner.java:334) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291) at org.testng.SuiteRunner.run(SuiteRunner.java:240) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224) at org.testng.TestNG.runSuitesLocally(TestNG.java:1149) at org.testng.TestNG.run(TestNG.java:1057) at com.sun.javatest.regtest.TestNGAction$TestNGRunner.main(TestNGAction.java:161) at com.sun.javatest.regtest.TestNGAction$TestNGRunner.main(TestNGAction.java:145) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:502) at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94) at java.lang.Thread.run(Thread.java:745) Here's what "jstack -F -m -l" reports for the same thread: ----------------- t@48 ----------------- 0xfffffd7fff2935ea ___lwp_cond_wait + 0xa 0xfffffd7ffe0f6ac9 void os::PlatformEvent::park() + 0xd9 0xfffffd7ffe0c2754 void ObjectMonitor::wait(long,bool,Thread*) + 0x354 0xfffffd7ffe3149b3 int ObjectSynchronizer::wait(Handle,long,Thread*) + 0x3a 3 0xfffffd7ffdccc9d2 JVM_MonitorWait + 0x532 0xfffffd7ff4eeed67 <Unknown compiled code> 0xfffffd7ff4f2d9cc * java.util.concurrent.ForkJoinTask.externalAwaitDone() bci:86 line:334 (Compiled frame) * java.util.concurrent.ForkJoinTask.doInvoke() bci:46 line:405 (Compiled frame) Locked ownable synchronizers: - None Here's what dbx reports: THREAD t@48 t@48(l@48) stopped in ___lwp_cond_wait at 0xfffffd7fff2935ea 0xfffffd7fff2935ea: ___lwp_cond_wait+0x000a: jae ___lwp_cond_wait+0x18 [ 0xfffffd7fff2935f8, .+0xe ] current thread: t@48 =>[1] ___lwp_cond_wait(0xbf2150, 0xbf2138, 0x0, 0x0, 0x58, 0x1400000), at 0xffff fd7fff2935ea [2] __lwp_cond_wait(), at 0xfffffd7fff27873c [3] MutableSpace::object_iterate(), at 0xfffffd7ffe0f6ac9 [4] MethodHandles::init_MemberName(), at 0xfffffd7ffe0c2754 [5] SharedRuntime::generate_native_wrapper(), at 0xfffffd7ffe3149b3 [6] jni_CallCharMethodV(), at 0xfffffd7ffdccc9d2 [7] 0xfffffd7ff4eeed67(), at 0xfffffd7ff4eeed67 [8] 0xfffffd7ff4eeed67(), at 0xfffffd7ff4eeed67 [9] 0xfffffd7ff4f2d9cc(), at 0xfffffd7ff4f2d9cc [10] 0xfffffd7ff504decc(), at 0xfffffd7ff504decc Update: I'm not convinced that the stack trace from 'dbx' is correct.
01-05-2015

The third experiment has QuickEnterBailPoint == 2, which means bail out when we detect a recursive Java Monitor enter. If this experiment does not hang, then that means there is a problem with this optimized code path for a recursive enter; if this experiment does hang, the problem is with the next optimization.

$ hg diff src/share/vm/runtime/synchronizer.cpp
diff -r 56e01852fed6 src/share/vm/runtime/synchronizer.cpp
--- a/src/share/vm/runtime/synchronizer.cpp    Mon Apr 27 09:02:41 2015 -0700
+++ b/src/share/vm/runtime/synchronizer.cpp    Fri May 01 09:41:42 2015 -0600
<snip>
@@ -171,6 +173,8 @@ bool ObjectSynchronizer::quick_enter(oop
   // Case: TLE inimical operations such as nested/recursive synchronization
   if (owner == Self) {
+    // detect recursion but use slow path with this bail point
+    if (QuickEnterBailPoint == 2) return false;
     m->_recursions++;
     return true;
   }

Update: QuickEnterBailPoint == 2 hung in run #8 in the first instance. The second instance was stopped after run #50 without any hangs. This tells me that the recursive Java Monitor enter code path is not the problem.
01-05-2015

The second experiment has QuickEnterBailPoint == 1, which means bail out right after the asserts in ObjectSynchronizer::quick_enter(). If this experiment hangs, then the problem is in the infrastructure that was added to allow quick_enter() to be called.

$ hg diff src/share/vm/runtime/synchronizer.cpp
diff -r 56e01852fed6 src/share/vm/runtime/synchronizer.cpp
--- a/src/share/vm/runtime/synchronizer.cpp    Mon Apr 27 09:02:41 2015 -0700
+++ b/src/share/vm/runtime/synchronizer.cpp    Thu Apr 30 16:54:38 2015 -0600
@@ -157,6 +157,8 @@ bool ObjectSynchronizer::quick_enter(oop
   assert(Self->is_Java_thread(), "invariant");
   assert(((JavaThread *) Self)->thread_state() == _thread_in_Java, "invariant");
   No_Safepoint_Verifier nsv;
+  // infrastructure changes only with this bail point:
+  if (QuickEnterBailPoint == 1) return false;
   if (obj == NULL) return false;  // Need to throw NPE
   const markOop mark = obj->mark();

Update: QuickEnterBailPoint == 1 did not hang in either instance. The first instance passed 1101 runs and the second instance passed 1099 runs. This tells me that the infrastructure added to allow quick_enter() to be called is very likely not the culprit.
01-05-2015

None of the captured artifacts from my product bits and fastdebug bits failures have revealed anything new. I did a local build on mt-haku in order to generate full-debug/jvmg bits and started two parallel runs. One instance ran for 1126 runs and saw one assertion crash unrelated to this bug. The other instance ran for 1130 runs and saw no failures at all. I let that experiment run for about 48 hours.

The changes in JDK-8061553 are small, so now I'm adding a temporary debug flag that lets me characterize the hang by bailing out of the optimization at different points. The first experiment has QuickEnterBailPoint == 0, which means no bail-outs at all; it will tell me whether my locally built bits can reproduce this failure.

Update: QuickEnterBailPoint == 0 hung in run #134 in the second instance. The first instance was stopped in run #660 without any hangs. This tells me that the addition of the option-checking code did not break reproducibility.
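The general shape of the bail-point technique, expressed as a Java sketch rather than the actual HotSpot code (the flag, the toy lock, and the stage boundaries here are illustrative only):

import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: an integer flag disables each stage of a fast path independently,
// falling back to the slow path, so a hang can be attributed to a specific stage by
// re-running the workload with different flag values.
final class QuickEnterSketch {
    static final int BAIL_POINT = Integer.getInteger("bailPoint", 0);
    final AtomicReference<Thread> owner = new AtomicReference<>();
    int recursions;

    boolean quickEnter() {                                 // returning false means "use the slow path"
        if (BAIL_POINT == 1) return false;                 // stage 1: infrastructure only
        Thread self = Thread.currentThread();

        if (owner.get() == self) {
            if (BAIL_POINT == 2) return false;             // stage 2: recursive-enter shortcut
            recursions++;
            return true;
        }

        if (BAIL_POINT == 3) return false;                 // stage 3: grab-unowned-lock shortcut
        return owner.compareAndSet(null, self);
    }
}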
30-04-2015

Forgot to include version info for my repro runs on mt-haku.

Product bits:

java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b61)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b61, mixed mode)

Fastdebug bits:

java version "1.9.0-ea"
Java(TM) SE Runtime Environment (build 1.9.0-ea-b61)
Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-fastdebug-b61, mixed mode)

Note: the above is a fastdebug JVM dropped into the regular JDK.
28-04-2015

I was able to reproduce this hang on my local Solaris X64 server (mt-haku). I used this script to run the test once:

::::::::::::::
do_jtreg.ksh
::::::::::::::
set -x
JAVA_HOME="/work/local/jdk/1.9.0_8077392_exp/solaris-x64"
export JAVA_HOME
JTREG="/java/re/jtreg/4.1/promoted/latest/binaries/jtreg/bin/jtreg"
TEST_DIR="/work/shared/mirrors/src_clones/jdk9/dev_baseline/jdk/test"
TEST_PATH="$TEST_DIR/java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java"
$JTREG \
    -jdk $JAVA_HOME \
    -vmoptions:"-showversion $@" \
    "$TEST_PATH"

Running the test a second time seems to recognize that the objects were built in the first run, so a second jtreg run is a bit faster than the first. The ToArrayOpTest.jtr has a repro section that I extracted into another script:

::::::::::::::
doit.ksh
::::::::::::::
set -x
#JAVA_HOME="$1"
JAVA_HOME="/work/local/jdk/1.9.0_8077392_exp/solaris-x64"
export JAVA_HOME
CLASSPATH="/java/re/jtreg/4.1/promoted/latest/binaries/jtreg/lib/javatest.jar:/java/re/jtreg/4.1/promoted/latest/binaries/jtreg/lib/jtreg.jar:/work/shared/bugs/8077392/JTwork/classes/java/util/stream/test:/work/shared/mirrors/src_clones/jdk9/dev_baseline/jdk/test/java/util/stream/test"
export CLASSPATH
$JAVA_HOME/bin/java \
    -Dtest.vm.opts=-showversion \
    -Dtest.jdk=/work/local/jdk/1.9.0_8077392_exp/solaris-x64 \
    -Dtest.timeout.factor=1.0 \
    -Dtest.src.path=/work/shared/mirrors/src_clones/jdk9/dev_baseline/jdk/test/java/util/stream/test:/work/shared/mirrors/src_clones/jdk9/dev_baseline/jdk/test/java/util/stream/bootlib \
    -Dtest.compiler.opts= \
    -Dcompile.jdk=/work/local/jdk/1.9.0_8077392_exp/solaris-x64 \
    -Dtest.classes=/work/shared/bugs/8077392/JTwork/classes/java/util/stream/test \
    -Dtest.class.path=/work/shared/bugs/8077392/JTwork/classes/java/util/stream/test:/work/shared/bugs/8077392/JTwork/classes/java/util/stream/bootlib \
    -Dtest.java.opts= \
    -Dtest.src=/work/shared/mirrors/src_clones/jdk9/dev_baseline/jdk/test/java/util/stream/test \
    -Dtest.tool.vm.opts=-J-showversion \
    -Xbootclasspath/a:/work/shared/bugs/8077392/JTwork/classes/java/util/stream/bootlib:/java/re/jtreg/4.1/promoted/latest/binaries/jtreg/lib/testng.jar \
    -showversion \
    $@ \
    com.sun.javatest.regtest.MainWrapper /work/shared/bugs/8077392/JTwork/classes/java/util/stream/test/org.openjdk.tests.java.util.stream.ToArrayOpTest.jta java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java org.openjdk.tests.java.util.stream.ToArrayOpTest
status="$?"
echo "status=$status"
if [ "$status" = 95 ]; then
    status=0
fi
exit "$status"

My primary reason for a doit.ksh script is to be able to pass different flags to the invocation easily. My secondary reason is to remove jtreg from the equation so that I have fewer 'java' processes running, which also means that I get more iterations in each loop due to lower overhead.

::::::::::::::
doit_loop.ksh
::::::::::::::
OUT_BASE="$1"
shift 1
LOG="$OUT_BASE.log"
count=0
while true; do
    echo "Loop #$count...\c"
    ksh doit.ksh "$@" > "$LOG" 2>&1
    status="$?"
    if [ "$status" = 0 ]; then
        echo "PASS"
    else
        echo "FAIL"
        mv "$LOG" "$LOG.$count"
        if [ -f core ]; then
            mv core "$OUT_BASE.core.$count"
        fi
        mv hs_err_pid* "$OUT_BASE.hs_err_pid.$count" > /dev/null 2>&1
    fi
    count=$(($count + 1))
done

And that's the looper script; it allows me to use a different log file per parallel invocation. I did notice that the testng-results.xml file is touched in each run, but I don't see any problems with running two invocations in the same directory (yet?).
With the above scripts, I got a product bits hang in run #139:

$ tail -5 doit_loop.prod.log
Loop #135...PASS
Loop #136...PASS
Loop #137...PASS
Loop #138...PASS
Loop #139...

and a fastdebug bits hang in run #723:

$ tail -5 doit_loop.fast.log
Loop #719...PASS
Loop #720...PASS
Loop #721...PASS
Loop #722...PASS
Loop #723...

I've captured core files using gcore, truss output to see which threads are still active, and jstack -F -m -l output, and I sent each hung VM a SIGQUIT to get the usual Java stack trace. Analyzing the artifacts to see if I see anything interesting...
28-04-2015

On 4/23/15 7:01 AM, Amy Lu wrote: > On 4/23/15 8:09 PM, Daniel D. Daugherty wrote: >> I have a similar machine in my lab in Colorado so I'll initially >> investigate the failure there. Can you attach a zip archive of >> a standalone copy of the ToArrayOpTest test with a script to run >> it to JDK-8077392? That would help me get up and running on this >> issue more quickly. > > Hi, Dan > > Thanks for looking into this issue. > > ToArrayOpTest is a jtreg/testng tests with dependency on other test libraries. > The easiest way to run is just run it by jtreg, and this is the way I used for this testing. It is possible to run this test directly by testng but we initially try to simulate the normal run and see whether it is reproducible. > > The script used for this testing: > > #!/bin/sh > SANDBOX=/scratch/aurora/sandbox_keepme > TESTBASE=$SANDBOX/testbase/jdk/test > TEST_JAVA=$SANDBOX/jdk > > JT_HOME=$SANDBOX/jtreg > JT_JAVA=$SANDBOX/stable_jdk/bin/java > export JT_HOME JT_JAVA > > run(){ > RST=$SANDBOX/results/$1 > > $SANDBOX/jtreg/bin/jtreg \ > -agentvm -a -ea -esa -v:fail,error,time -retain:fail,error -ignore:quiet \ > -timeoutFactor:10 \ > -J-Xmx512m -vmoption:-Xmx512m \ > -w:$RST -r:$RST \ > -exclude:$TESTBASE/ProblemList.txt -exclude:$TESTBASE/closed/ProblemList.txt \ > -jdk:$TEST_JAVA \ > $TESTBASE/java/util/stream/test/org/openjdk/tests/java/util/stream/ToArrayOpTest.java > > } > > n=3000 > i=1 > while [ $i -le $n ] > do > echo "======= $i =======" > run $i > i=`expr $i + 1` > done > > Not sure whether this is what you want ... That is _exactly_ what I want! Thanks!! > You could also use the machines (*.us) we do this testing as everything already hosted there. Please let me know if you'd like to use them. Will try with my own H/W first and then let you know if I need your H/W. Since a similar bug happens on MacOS X, I suspect that this is not too Xeon model specific so I should be able to get some hangs in my lab... Dan > > Thanks, > Amy > > >
23-04-2015

On 4/23/15 6:31 AM, Paul Sandoz wrote: > On Apr 23, 2015, at 2:09 PM, Daniel D. Daugherty <daniel.daugherty@oracle.com> wrote: >>> The testing was done on two machines, Linux and Solaris, and got similar results. Before drill down to b52/b53, we actually also tested b55, b59 and both could reproduce the issue. >> Glad that it reproduces on Solaris X64 since that's what I'm >> running on my big server in my lab. Does this reproduce on >> Solaris SPARC or MacOS X? > Such failures have been observed on Macs too. See attachments of duplicate issue: > > https://bugs.openjdk.java.net/browse/JDK-8076626 > > Paul. So pretty much all X86/X64 except for Win*. Interesting. If it also happens on Solaris SPARC it might be something on the new fast enter algorithm instead of the X86/X64 specific code... Dunno... yet... Dan
23-04-2015

On 4/23/15 1:45 AM, Paul Sandoz wrote: > Hi Dan, > > On Apr 22, 2015, at 10:43 PM, "Daniel D. Daugherty" <daniel.daugherty@oracle.com> wrote: > >> Paul, >> >> Thanks for letting me know about this potential issue with JDK-8061553. >> >> Is there an open JBS for this issue? > See: > > https://bugs.openjdk.java.net/browse/JDK-8077392 Thanks for the bug ref. I'm planning to assign the bug to myself and move it to 'hotspot/runtime'. Please let me know if that messes up the way your team manages bugs... >> If not can you file one with >> a reproducible test case and some instructions on how to run it? > See Amy's email on this thread for details. It does not appear hard to reproduce but it is time consuming to do so. That's pretty typical for Java monitor races so I'm rather used to that. > It takes many executions of a parallel stream pipeline for a failure to be observed. So far i have tried and failed to reproduce reliably with a simpler test case that runs in a shorter time frame. The other Fork/Join hang that you mentioned: http://cs.oswego.edu/pipermail/concurrency-interest/2015-April/014240.html Do you know if there is a stack trace for it? > Alexey suggests two things: > > 1) running the tests with a fast debug build and see if an assert fails; and Yup. That's in the plan... > 2) developing some jcstress test for wait/notify. Don't know what 'jcstress' is, but I'm guessing we have a new stress test harness... We have a Doug Lea program called CallTimerGrid that we use for stress testing and I've been using it to stress test each of the Contended Locking buckets as they get ready for review. JDK-8061553 passed those runs (2-3 hours for product bits and 60-72 hours for fastdebug bits) so I'm guessing that CallTimerGrid does not easily tickle this issue. Dan > > Paul.
23-04-2015

On 4/22/15 11:19 PM, Amy Lu wrote:
> Here I'm providing test results in details.

Thanks for the details!

> We picked up Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, 32 processors machine, one stream test ToArrayOpTest for this testing. Normally this test takes ~22 seconds to complete. We used longer enough timeout so believe the "timeout" show in the testing is a real hang.

I have a similar machine in my lab in Colorado so I'll initially investigate the failure there. Can you attach a zip archive of a standalone copy of the ToArrayOpTest test with a script to run it to JDK-8077392? That would help me get up and running on this issue more quickly.

> * JDK9/b52: 3000 runs all pass.
> * JDK9/b53: Reproduced the issue, test timed out 4 times at run #596 #978 #988 #1290 in total 1568 runs.
>
> From the changesets that were integrated into b53 we identified JDK-8061553 as a possible cause, and tested the latest dev build:
>
> * Latest dev build: Reproduced the issue, test timed out 4 times at run #48 #143 #1877 #2231 in total 3000 runs
> * Backout 8061553 changeset from above build: 3000 runs all pass.
>
> The testing was done on two machines, Linux and Solaris, and got similar results. Before drill down to b52/b53, we actually also tested b55, b59 and both could reproduce the issue.

Glad that it reproduces on Solaris X64 since that's what I'm running on my big server in my lab. Does this reproduce on Solaris SPARC or MacOS X? It's OK if you haven't tried it there since I think I have enough info to get started.

Dan

> Thanks,
> Amy
>
> On 4/23/15 12:31 AM, Paul Sandoz wrote:
>> Hi,
>>
>> Amy and I think we have identified an issue in hotspot that
<snip>
23-04-2015

On 4/22/15 10:31 AM, Paul Sandoz wrote: > Hi, > > Amy and I think we have identified an issue in hotspot that only very occasionally results in non-termination of parallel stream execution. Specifically non-termination of stream fork/join tasks. Such failures, when running jtreg stream tests, manifest themselves as timeouts with jstack trace output like the following: > > "MainThread" #23 prio=5 os_prio=0 tid=0x00007f10a4183800 nid=0x5a6e in Object.wait() [0x00007f103e2a0000] > java.lang.Thread.State: BLOCKED (on object monitor) > at java.lang.Object.wait(Native Method) > at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:334) > - locked <0x00000000fc1c1aa8> (a java.util.stream.Nodes$SizedCollectorTask$OfRef) > at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405) > at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) > at java.util.stream.Nodes.collect(Nodes.java:325) > at java.util.stream.ReferencePipeline.evaluateToNode(ReferencePipeline.java:109) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:564) > at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:255) > at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438) > at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:444) > at java.util.stream.StreamTestScenario$12._run(StreamTestScenario.java:144) > at java.util.stream.StreamTestScenario.run(StreamTestScenario.java:220) > at java.util.stream.OpTestCase$ExerciseDataStreamBuilder.exercise(OpTestCase.java:349) > at java.util.stream.OpTestCase.exerciseOpsMulti(OpTestCase.java:114) > at java.util.stream.OpTestCase.exerciseOpsInt(OpTestCase.java:136) > at org.openjdk.tests.java.util.stream.MapOpTest.testOps(MapOpTest.java:74) > > i.e. a main f/j task is waiting for decedents to complete. > > Amy has been doing a lot of testing (since the failure happens very occasionally) and can provide more details on that and the results. I will provide some specific details below. > > By a process of elimination we could reproduce the failure in JDK 9 b53 but not in b52. From the changesets that were integrated into b53 we identified JDK-8061553 as a possible cause: > > Contended Locking fast enter bucket > https://bugs.openjdk.java.net/browse/JDK-8061553 > http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/30137e7eef47 > > We tested with a latest dev build with (naturally) and without that changeset. So far we can reproduce the issue with the former, but not with the latter. > > This indicates the changeset for JDK-8061553 is the likely cause, however i really don't know why this would be the case. Expert advice very much appreciated! Very nice job in narrowing this down. I concur that it is very likely that JDK-8061553: - directly introduced the hang as part of the optimization or - exposed a pre-existing problem due to the optimization Dan > > -- > > Separately there is another issue with Fork/Join: > > http://cs.oswego.edu/pipermail/concurrency-interest/2015-April/014240.html > > At the moment i don't think the two are connected (the latter issue has been present since 8u40), but perhaps there is a combination of factors here. So we will also run some tests with a workaround: > > http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/ForkJoinPool.java?r1=1.240&r2=1.241 > > just to rule this out. > > Paul. >
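For reference, the Java-level call pattern in these failures can be exercised standalone with a loop like the following (only an illustration of the pipeline shape; as noted in the thread above, simpler standalone cases have not reproduced the hang reliably):

import java.util.stream.IntStream;

public class ParallelToArrayLoop {
    public static void main(String[] args) {
        for (long iter = 0; ; iter++) {
            // A parallel pipeline ending in toArray() submits a fork/join task tree to the
            // common pool and blocks in ForkJoinTask.externalAwaitDone() until the root completes.
            int[] a = IntStream.range(0, 1_000_000).parallel().map(x -> x * 2).toArray();
            if (iter % 1_000 == 0) {
                System.out.println("iteration " + iter + ", length " + a.length);
            }
        }
    }
}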
23-04-2015

This issue and JDK-8076626 (stream StreamSpliteratorTest.java timeout) appear to be showing similar underlying problems. In both cases the output from jstack shows that a root fork/join task (initiated via an invoke) is not terminating, implying that child (CountedCompleter) tasks are not correctly reporting to parent tasks that they have completed. The fact that there

- are two different tasks failing,
- on different data, and
- sporadically

indicates there might be a race condition lurking somewhere, but there are not enough samples to draw strong conclusions.
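For readers less familiar with how that completion reporting works: a CountedCompleter parent completes only after all of its children have reported in via tryComplete(), along these lines (a generic sketch of the pattern from the CountedCompleter documentation, not the java.util.stream internals):

import java.util.concurrent.CountedCompleter;
import java.util.function.IntConsumer;

// Generic divide-and-conquer CountedCompleter: each task forks two children and then calls
// tryComplete(); a parent's pending count reaches zero (and it completes) only after both
// children have reported. A child that never reports strands the root, which is exactly the
// symptom seen in the jstack output above.
class ForEachTask extends CountedCompleter<Void> {
    final int[] array; final IntConsumer op; final int lo, hi;

    ForEachTask(CountedCompleter<?> parent, int[] array, IntConsumer op, int lo, int hi) {
        super(parent); this.array = array; this.op = op; this.lo = lo; this.hi = hi;
    }

    public void compute() {
        if (hi - lo >= 2) {
            int mid = (lo + hi) >>> 1;
            setPendingCount(2);                              // expect two children to report
            new ForEachTask(this, array, op, mid, hi).fork();
            new ForEachTask(this, array, op, lo, mid).fork();
        } else if (hi > lo) {
            op.accept(array[lo]);                            // leaf work
        }
        tryComplete();   // decrement this/ancestor pending counts; completes the root when all report
    }
}

The root would be run with something like new ForEachTask(null, data, op, 0, data.length).invoke(), which is the invoke() frame visible in the hung stack traces.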
14-04-2015