JDK-8152358 : code and comment cleanups found during the hunt for 8077392
Type: Sub-task
Component: hotspot
Sub-Component: runtime
Affected Version: 9
Priority: P3
Status: Resolved
Resolution: Fixed
OS: generic
CPU: generic
Submitted: 2016-03-21
Updated: 2016-04-21
Resolved: 2016-04-04
The hunt for JDK-8077392 was long, and a number of
code and comment cleanups were made along the way.
This bug addresses only the code and comment cleanups
and should follow the fix for JDK-8077392.
Comments
Grabbed the JPRT job that I based the last code reviews on:
$ /work/shared/bug_hunt/8152358_for_jdk9_hs_rt/8077392_repro/jdk/1.9.0/solaris-x64/bin/java -version
java version "9-internal"
Java(TM) SE Runtime Environment (build 9-internal+0-2016-03-28-043904.iklam.ul)
Java HotSpot(TM) 64-Bit Server VM (build 9-internal+0-2016-03-28-043904.iklam.ul, mixed mode)
and ran a dual-instance parallel run for just over 24 hours:
$ elapsed_times doit.ksh doit_loop.copy1?.log
doit.ksh 0 seconds
doit_loop.copy10.log 1 days 45 minutes 43 seconds
doit_loop.copy11.log 5 seconds
$ grep -v PASS doit_loop.copy1?.log
doit_loop.copy10.log:Loop #1033...FAIL
doit_loop.copy10.log:Loop #1121...FAIL
doit_loop.copy10.log:Loop #1815...FAIL
doit_loop.copy10.log:Loop #2175...FAIL
doit_loop.copy10.log:Loop #3534...
doit_loop.copy11.log:Loop #91...FAIL
doit_loop.copy11.log:Loop #307...FAIL
doit_loop.copy11.log:Loop #1181...FAIL
doit_loop.copy11.log:Loop #1689...FAIL
doit_loop.copy11.log:Loop #2164...FAIL
doit_loop.copy11.log:Loop #2175...FAIL
doit_loop.copy11.log:Loop #2633...FAIL
doit_loop.copy11.log:Loop #3516...
Got a total of 11 failures (4 in instance #10, and 7 in instance #11).
$ grep 'fatal error:' *.hs_err_pid.*
doit.copy10.hs_err_pid.1033:# fatal error: exiting JavaThread=0x0000000001532800 unexpectedly owns ObjectMonitor=0x0000000002440280
doit.copy10.hs_err_pid.1121:# fatal error: exiting JavaThread=0x0000000001472000 unexpectedly owns ObjectMonitor=0x0000000001c36a80
doit.copy10.hs_err_pid.1815:# fatal error: exiting JavaThread=0x000000000129a000 unexpectedly owns ObjectMonitor=0x0000000002ef6300
doit.copy11.hs_err_pid.1181:# fatal error: exiting JavaThread=0x00000000012e8000 unexpectedly owns ObjectMonitor=0x00000000024b2680
doit.copy11.hs_err_pid.1689:# fatal error: exiting JavaThread=0x0000000001659800 unexpectedly owns ObjectMonitor=0x0000000001bde280
doit.copy11.hs_err_pid.307:# fatal error: exiting JavaThread=0x000000000194c000 unexpectedly owns ObjectMonitor=0x0000000002066480
doit.copy11.hs_err_pid.91:# fatal error: exiting JavaThread=0x0000000001266800 unexpectedly owns ObjectMonitor=0x00000000030ff800
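For context, each fatal error above comes from a check at JavaThread exit that
finds the exiting thread still owning an ObjectMonitor. The following is a
minimal, self-contained sketch of that kind of check; the type names and list
handling are simplified stand-ins for illustration, not the actual HotSpot code:

#include <cstdio>
#include <cstdlib>
#include <vector>

// Simplified stand-ins for the VM types involved (illustrative only).
struct JavaThread { int id; };

struct ObjectMonitor {
  JavaThread* owner;  // thread that currently owns this monitor, or nullptr
};

// Sketch of a thread-exit sweep: any monitor still owned by the exiting
// thread means an unlock was missed somewhere, so the VM stops hard
// (in the real VM, fatal() aborts and writes an hs_err_pid file).
void release_monitors_owned_by_thread(JavaThread* self,
                                      const std::vector<ObjectMonitor*>& in_use_list) {
  for (ObjectMonitor* mon : in_use_list) {
    if (mon->owner == self) {
      std::fprintf(stderr,
                   "# fatal error: exiting JavaThread=%p unexpectedly owns "
                   "ObjectMonitor=%p\n", (void*)self, (void*)mon);
      std::abort();
    }
  }
}

int main() {
  JavaThread t{1};
  ObjectMonitor leaked{&t};  // simulate a monitor the thread never exited
  std::vector<ObjectMonitor*> in_use{&leaked};
  release_monitors_owned_by_thread(&t, in_use);  // hits the fatal path
  return 0;
}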
All of the failures are instances of JDK-8077392. So the failure mode
reproduces fairly easily in 24 hours without the fix, and does not
reproduce in 72 hours of stress with the fix in place.
Note: it looks like the triple-instance configuration (2 product, 1 fastdebug)
changed the timing enough to keep the failure from reproducing.
05-04-2016
72 hour stress test results with the fix in place show no failures:
$ grep -v PASS doit_loop*.copy0[789].log
doit_loop.copy07.log:INFO: limit of 259200 seconds has been reached.
doit_loop.copy08.log:INFO: limit of 259200 seconds has been reached.
doit_loop_fastdebug.copy09.log:INFO: limit of 259200 seconds has been reached.
$ for file in doit_loop*.copy0[789].log; do echo file=$file; tail -2 $file; done
file=doit_loop.copy07.log
Loop #6948...PASS
INFO: limit of 259200 seconds has been reached.
file=doit_loop.copy08.log
Loop #6947...PASS
INFO: limit of 259200 seconds has been reached.
file=doit_loop_fastdebug.copy09.log
Loop #2773...PASS
INFO: limit of 259200 seconds has been reached.
04-04-2016
Rebased the repo to this post-Jigsaw M3 changeset:
$ hg log -r tip
changeset: 10650:5518d3ad6fa7
tag: tip
user: iklam
date: Wed Mar 23 09:00:22 2016 -0700
summary: 8152380: Shared symbol table should never use alternate hashcode
Temporarily added the following code around the one-line fix:
$ diff src/share/vm/runtime/synchronizer.cpp.cr1{,.debug}
234a235,237
> if (!UseNewCode) {
> // The fix is enabled by default.
> // Disable the fix with -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode
235a239
> }
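Read together, the two hunks wrap the one-line fix in a diagnostic-flag guard.
A sketch of how the guarded region reads once the diff is applied (the fix
itself is not reproduced in this entry, so a placeholder comment stands in for it):

  if (!UseNewCode) {
    // The fix is enabled by default.
    // Disable the fix with -XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode
    /* ... the one-line fix for JDK-8077392 (original synchronizer.cpp line 235) ... */
  }

UseNewCode is a pre-existing HotSpot diagnostic flag intended for exactly this
kind of testing toggle, so no new option had to be added for the stress runs.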
By default, the fix is enabled; with the
-XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode
options specified, the fix is disabled. Did a 24 hour parallel run with
three instances (two product and one fastdebug):
$ grep -v PASS doit_loop*.log
doit_loop.copy04.log:INFO: limit of 86400 seconds has been reached.
doit_loop.copy05.log:INFO: limit of 86400 seconds has been reached.
doit_loop_fastdebug.copy06.log:INFO: limit of 86400 seconds has been reached.
$ tail -2 doit_loop.copy04.log; tail -2 doit_loop.copy05.log; tail -2 doit_loop_fastdebug.copy06.log
Loop #2313...PASS
INFO: limit of 86400 seconds has been reached.
Loop #2315...PASS
INFO: limit of 86400 seconds has been reached.
Loop #923...PASS
INFO: limit of 86400 seconds has been reached.
No failures in 24 hours. That has happened before with two parallel
instances, so it's not too much of a surprise.
Also note that the stress kit had to be rebuilt due to Jigsaw M3
and the code base has been updated to post-Jigsaw M3. Lots
of possibilities for why we saw no failures in 24 hours.