Bug ID: JDK-8024919 G1: SPECjbb2013 crashes due to a broken object reference

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: hs25

Priority: P2
Status: Closed
Resolution: Fixed

Submitted: 2013-09-17
Updated: 2014-05-28
Resolved: 2013-10-31

Versions (Unresolved/Resolved/Fixed)

The Version table provides details about the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

JDK 7	JDK 8	Other
7u60 Fixed	8 Fixed	hs25 Fixed

Related Reports

Duplicate :	JDK-8027216 - Crash in Symbol::as_klass_external_name on OSX
Relates :	JDK-8027751 - C1 crashes in Weblogic with G1 enabled
Relates :	JDK-8044090 - C1: Old value instead of new one is passed to post-barrier in UnsafeGetAndSetObject

Description

While running some benchmarks for a G1 fix I've gotten several crashes in Java code. The crashes seem to come right after encountering an unexpected ClassCastException. I believe the cause of the exception is that the object reference we're trying to cast is a dead pointer into the heap: sometimes it points to an object of a completely wrong type, and sometimes it points into the middle of another object.

The main two stack traces I get are:
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::throw_class_cast_exception Runtime1 stub
J 5336 C1 org.spec.jbb.infra.txinjector.Driver$TokenPopulateTask.run()V (48 bytes) @ 0x00007ffb897e87e3 [0x00007ffb897e8140+0x6a3]
J 6363 C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007ffb8a038238 [0x00007ffb8a037ce0+0x558]
J 4817 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 0x00007ffb897c9d4c [0x00007ffb897c9c40+0x10c]
J 2907 C1 java.lang.Thread.run()V (17 bytes) @ 0x00007ffb8996f40c [0x00007ffb8996f2c0+0x14c]
v  ~StubRoutines::call_stub

and: (where the top frame was deopted: reason=class_check action=maybe_recompile pc=0x00007fda217d6dc0 method=org.spec.jbb.infra.txinjector.Driver$ProbeTask.getRunTask()Ljava/lang/Runnable; @ 14)

j  org.spec.jbb.infra.txinjector.Driver$ProbeTask.getRunTask()Ljava/lang/Runnable;+14
J 4283 C2 org.spec.jbb.core.scheduler.SingleLoopTimerScheduler$TimerTask.fire(J)V (48 bytes) @ 0x00007fda21d8e0b0 [0x00007fda21d8da00+0x6b0]
j  org.spec.jbb.core.scheduler.SingleLoopTimerScheduler$TimerTask.run()V+15
J 4431 C1 java.lang.Thread.run()V (17 bytes) @ 0x00007fda21a793cc [0x00007fda21a79280+0x14c]
v  ~StubRoutines::call_stub

I've also seen this crash in the GC code, also due to a broken object reference.

I can't reproduce this with hs25-b43 so I believe this is a fairly recent regression.

I've not been able to run jbb2013 with +Verify{Before,After}GC: the benchmark refuses to ramp up because it thinks the machine is too slow.

ILW=HMM => P2

Comments

C1's getAndSetObject() intrinsic has a bad post barrier. See the attached patch.
30-10-2013
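The getAndSetObject() diagnosis above, together with the AtomicReference-to-AtomicInteger reference reported in this thread, suggests the kind of workload that exercises the affected path. The following is a minimal Java sketch of such a workload, not a confirmed reproducer. It assumes that AtomicReference.getAndSet is compiled down to the getAndSetObject() intrinsic on the JVM under test. The class name GetAndSetStress and its parameters are hypothetical and do not come from the report or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stress sketch: long-lived AtomicReference holders repeatedly
// receive freshly allocated AtomicInteger values through getAndSet().
class GetAndSetStress {

    // Returns the number of inconsistent values read back.
    // On a healthy VM this is expected to be 0.
    static long run(int holders, int rounds) {
        List<AtomicReference<Object>> refs = new ArrayList<>(holders);
        for (int i = 0; i < holders; i++) {
            refs.add(new AtomicReference<Object>(new AtomicInteger(i)));
        }
        long bad = 0;
        for (int r = 1; r <= rounds; r++) {
            for (int i = 0; i < holders; i++) {
                // Store a new (young) object into a holder that has
                // probably been promoted by now, and fetch the old value.
                Object old = refs.get(i).getAndSet(new AtomicInteger(i + r));
                // The previous round stored i + (r - 1) in this holder.
                if (!(old instanceof AtomicInteger)
                        || ((AtomicInteger) old).get() != i + r - 1) {
                    bad++;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        long bad = run(10_000, 2_000);
        System.out.println("inconsistent values: " + bad);
    }
}
```

Running this with -XX:+UseG1GC, as in the report, keeps old-to-young stores frequent. Whether it triggers the failure on an affected build is untested, since the original crash needed hours of SPECjbb2013 to appear.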
I've caught a VerifyBeforeGC heap verification failure after a lot of short iterations:

453.610: [GC pause (G1 Evacuation Pause) (young)453.610: [G1 Parallel Verification
----------
Missing rem set entry:
Field 0x00000006713639d4 of obj 0x00000006713639c8, in region 1811:(O)[0x0000000671300000,0x0000000671400000,0x0000000671400000]
java.util.concurrent.atomic.AtomicReference - klass: 'java/util/concurrent/atomic/AtomicReference'
points to obj 0x0000000670ab95b8 in region 1802:(S)[0x0000000670a00000,0x0000000670b00000,0x0000000670b00000]
java.util.concurrent.atomic.AtomicInteger - klass: 'java/util/concurrent/atomic/AtomicInteger'
Obj head CTE = -1, field CTE = -1.
----------
==============================================================================
Unexpected Error
------------------------------------------------------------------------------
Internal Error at g1CollectedHeap.cpp:3344, pid=9185, tid=140169591535360
guarantee(!_failures) failed: no failures!

I had to cut out a lot of the G1 verification code to get it fast enough to tickle the bug, leaving only the remembered set verification. I have a core file and the process running in +ShowMessageBoxOnError on the affected machine.
29-10-2013
There was a sighting of the same problem with SPECjbb2005 on OSX: JDK-8027216
24-10-2013
I did see the failure 2-3 weeks ago, not sure with which sources though. But if it reproduces with hotspot-rt it narrows it down. Also a couple of weeks ago I tried running jbb13 with a fixed injection rate with Verify{Before|After}GC and it didn't fail. So clearly some weird race there. Anyways, currently it's been running for 4 days without the crash on sc14ia18 with hotspot-comp. Let's see what your experiments end up with. Unfortunately it takes a lot of time to even reproduce this problem.
23-10-2013
I could still reproduce this on the machine I originally saw the issue on, at revision 996d1f2f056f (from hotspot-rt, because of another bug). Have you tried running a product build on the hosts I've mentioned? I'm currently running with a build off the tip of hotspot-comp and I will reopen this bug if I can still reproduce it.
23-10-2013
Not reproducible in 3+ days of running SPECjbb2013. This was done on a similar machine, with the same flags, running the hotspot-comp repo. Closing as CNR.
23-10-2013
8-defer-request: Although ILW=HMM=>P2, the benchmark is not part of the release criteria. This crash occurs infrequently and only after 3+ hours of running SPECjbb2013. Requesting deferral for JDK8 update.
21-10-2013
I've run an automatic bisect script with HG to help me figure out the changeset that causes the crash to appear. The script ran 10 iterations of specjbb2013 to determine if a revision was broken or not. The output of hg in the end was:

The first bad revision is:
changeset: 5021:d90d1b96b65b
parent: 5015:e84845884c85
user: kvn
date: Fri Jul 26 12:37:39 2013 -0700
summary: 8008938: TieredCompilation should be default

This leads me to believe that this is a compiler (possibly related to C1?) issue and not a GC bug.

Steps to reproduce:
Get a specjbb2013 bundle (or use the one in refworkload)
Run with "-d64 -XX:+UseG1GC -XX:+PrintGC -Xms16g", 10 runs should suffice to reproduce the crash.

I've only tried to reproduce this on x86 but I've seen it both on an AMD box running Solaris 11 and on an Intel box running Oracle Linux 6.
19-09-2013
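The bisect procedure described in the first comment (run the benchmark a fixed number of times per revision and mark the revision bad if any run fails) can be sketched as a small Java helper. This is not the script used in the report, which was not attached. The class name BisectCheck and its arguments are hypothetical. It assumes the helper is driven by hg bisect --command, where exit status 0 marks a revision good, 127 aborts the bisection, and other non-zero values (except 125, which skips) mark it bad.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.function.IntSupplier;

// Hypothetical helper: runs a command several times and reports one verdict.
class BisectCheck {

    // Returns 0 (good) if every iteration exits with 0,
    // or 1 (bad) as soon as one iteration fails.
    static int verdict(int iterations, IntSupplier runOnce) {
        for (int i = 0; i < iterations; i++) {
            if (runOnce.getAsInt() != 0) {
                return 1;
            }
        }
        return 0;
    }

    // Runs the command once and returns its exit code.
    // A command that cannot be started is treated as a failed run.
    static int runCommand(String[] command) {
        try {
            Process p = new ProcessBuilder(command).inheritIO().start();
            return p.waitFor();
        } catch (IOException e) {
            return 1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 1;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: BisectCheck <iterations> <command...>");
            System.exit(127); // assumed to abort the bisection
        }
        int iterations = Integer.parseInt(args[0]);
        String[] command = Arrays.copyOfRange(args, 1, args.length);
        System.exit(verdict(iterations, () -> runCommand(command)));
    }
}
```

With 10 iterations per revision, as in the comment, an intermittent crash still has a chance of being missed, so a "good" verdict is weaker evidence than a "bad" one. Building the JVM at each revision before running the benchmark is left out of this sketch.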