JDK-8024919 : G1: SPECjbb2013 crashes due to a broken object reference

Details
Type: Bug
Submit Date: 2013-09-17
Status: Closed
Updated Date: 2014-05-28
Project Name: JDK
Resolved Date: 2013-10-31
Component: hotspot
OS:
Sub-Component: compiler
CPU:
Priority: P2
Resolution: Fixed
Affected Versions: hs25
Fixed Versions: hs25 (b57)

Description
While running some benchmarks for a G1 fix, I've hit several crashes in Java code. The crashes seem to come right after an unexpected ClassCastException. I believe the exception occurs because the object reference being cast is a dead pointer into the heap: sometimes it points to an object of a completely wrong type, and sometimes it points into the middle of another object.

The two main stack traces I get are:
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::throw_class_cast_exception Runtime1 stub
J 5336 C1 org.spec.jbb.infra.txinjector.Driver$TokenPopulateTask.run()V (48 bytes) @ 0x00007ffb897e87e3 [0x00007ffb897e8140+0x6a3]
J 6363 C2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (225 bytes) @ 0x00007ffb8a038238 [0x00007ffb8a037ce0+0x558]
J 4817 C1 java.util.concurrent.ThreadPoolExecutor$Worker.run()V (9 bytes) @ 0x00007ffb897c9d4c [0x00007ffb897c9c40+0x10c]
J 2907 C1 java.lang.Thread.run()V (17 bytes) @ 0x00007ffb8996f40c [0x00007ffb8996f2c0+0x14c]
v  ~StubRoutines::call_stub

and the following, where the top frame was deoptimized (reason=class_check action=maybe_recompile pc=0x00007fda217d6dc0 method=org.spec.jbb.infra.txinjector.Driver$ProbeTask.getRunTask()Ljava/lang/Runnable; @ 14):

j  org.spec.jbb.infra.txinjector.Driver$ProbeTask.getRunTask()Ljava/lang/Runnable;+14
J 4283 C2 org.spec.jbb.core.scheduler.SingleLoopTimerScheduler$TimerTask.fire(J)V (48 bytes) @ 0x00007fda21d8e0b0 [0x00007fda21d8da00+0x6b0]
j  org.spec.jbb.core.scheduler.SingleLoopTimerScheduler$TimerTask.run()V+15
J 4431 C1 java.lang.Thread.run()V (17 bytes) @ 0x00007fda21a793cc [0x00007fda21a79280+0x14c]
v  ~StubRoutines::call_stub

I've also seen this crash in the GC code, also due to a broken object reference.

I can't reproduce this with hs25-b43, so I believe this is a fairly recent regression.

I haven't been able to run jbb2013 with +Verify{Before,After}GC, because with verification enabled the benchmark decides the machine is too slow and refuses to ramp up.

ILW=HMM => P2
                                    

Comments
I ran an automated bisect script with hg to find the changeset that introduced the crash.
The script ran 10 iterations of SPECjbb2013 to decide whether a revision was broken.
hg's final output was:
The first bad revision is:
changeset:   5021:d90d1b96b65b
parent:      5015:e84845884c85
user:        kvn
date:        Fri Jul 26 12:37:39 2013 -0700
summary:     8008938: TieredCompilation should be default
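The bisect script itself isn't attached. For context, `hg bisect --command CMD` runs CMD at each candidate changeset and reads its exit status: 0 marks the revision good, 125 skips it, 127 aborts the bisection, and any other non-zero status marks it bad. The driver below is a hypothetical sketch of the "10 iterations" check under that convention: it assumes the revision has already been built, takes the benchmark command line as its arguments, and calls the revision bad if any run fails. The class name `BisectCheck` and its methods are illustrative, not the original script.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver for: hg bisect --command "java BisectCheck <benchmark cmd...>"
// Runs the benchmark several times and maps the outcome onto Mercurial's
// exit-status convention for automatic bisection.
class BisectCheck {
    static final int GOOD = 0;   // hg marks the revision good
    static final int BAD = 1;    // any non-zero status other than 125/127 marks it bad
    static final int SKIP = 125; // hg skips the revision

    // The crash is intermittent, so a revision counts as good only if
    // every run exited cleanly; a single failure makes it bad.
    static int verdict(int[] exitCodes) {
        for (int code : exitCodes) {
            if (code != 0) {
                return BAD;
            }
        }
        return GOOD;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: java BisectCheck <benchmark command...>");
            System.exit(SKIP); // misconfigured: skip rather than wrongly mark the revision bad
        }
        List<String> command = Arrays.asList(args);
        int runs = 10; // the number of iterations used in the bisect above
        int[] codes = new int[runs];
        for (int i = 0; i < runs; i++) {
            Process p = new ProcessBuilder(command).inheritIO().start();
            codes[i] = p.waitFor();
            if (codes[i] != 0) {
                break; // one failure is enough; remaining slots stay 0
            }
        }
        System.exit(verdict(codes));
    }
}
```

Stopping at the first failure keeps bad revisions cheap to test, while good revisions always pay for all 10 runs; with a bug this rare, a "good" verdict is only probabilistic.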

This leads me to believe that this is a compiler issue (possibly related to C1?) and not a GC bug.

Steps to reproduce:
Get a specjbb2013 bundle (or use the one in refworkload)
Run with "-d64 -XX:+UseG1GC -XX:+PrintGC -Xms16g"; 10 runs should be enough to reproduce the crash.
I've only tried to reproduce this on x86, but I've seen it both on an AMD box running Solaris 11 and on an Intel box running Oracle Linux 6.

                                     
2013-09-19
8-defer-request: While the ILW is HMM => P2, the benchmark is not part of the release criteria. This crash occurs infrequently and only after 3+ hours of running SPECjbb2013. Requesting deferral to a JDK 8 update.
                                     
2013-10-16
Not reproducible after 3+ days of running SPECjbb2013. This was done on a similar machine, with the same flags, running the hotspot-comp repo. Closing as CNR (cannot reproduce).
                                     
2013-10-23
I could still reproduce this on the machine where I originally saw the issue, at revision 996d1f2f056f (from hotspot-rt, because of another bug).
Have you tried running a product build on the hosts I mentioned?
I'm currently running a build from the tip of hotspot-comp and will reopen this bug if I can still reproduce it.
                                     
2013-10-23
I did see the failure two or three weeks ago, though I'm not sure which sources I was using. If it reproduces with hotspot-rt, that narrows it down.
Also, a couple of weeks ago I tried running jbb13 with a fixed injection rate and Verify{Before|After}GC, and it didn't fail. So there is clearly some weird race there.

Anyway, it has now been running for 4 days on sc14ia18 with hotspot-comp without crashing. Let's see what your experiments turn up. Unfortunately, it takes a lot of time even to reproduce this problem.
                                     
2013-10-23
There was a sighting of the same problem with SPECjbb2005 on OS X: JDK-8027216
                                     
2013-10-24
I've caught a VerifyBeforeGC heap verification failure after a lot of short iterations:
453.610: [GC pause (G1 Evacuation Pause) (young)453.610: [G1 Parallel Verification
----------
Missing rem set entry:
Field 0x00000006713639d4 of obj 0x00000006713639c8, in region 1811:(O)[0x0000000671300000,0x0000000671400000,0x0000000671400000]
java.util.concurrent.atomic.AtomicReference
 - klass: 'java/util/concurrent/atomic/AtomicReference'
points to obj 0x0000000670ab95b8 in region 1802:(S)[0x0000000670a00000,0x0000000670b00000,0x0000000670b00000]
java.util.concurrent.atomic.AtomicInteger
 - klass: 'java/util/concurrent/atomic/AtomicInteger'
Obj head CTE = -1, field CTE = -1.
----------
==============================================================================
Unexpected Error
------------------------------------------------------------------------------
Internal Error at g1CollectedHeap.cpp:3344, pid=9185, tid=140169591535360
guarantee(!_failures) failed: no failures!

I had to cut out a lot of the G1 verification code to get it fast enough to tickle the bug, leaving only the remembered set verification.
I have a core file and the process running in +ShowMessageBoxOnError on the affected machine.
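The verification failure can be read against a simplified model of G1's post-write barrier. When a reference store creates a pointer from one heap region to another, the barrier dirties the card covering the updated field, so that the pointer is later added to the remembered set of the region it points into. Here the AtomicReference field at 0x00000006713639d4 (old region 1811) points into survivor region 1802, yet both the object-header card and the field card read -1, which is the clean value in HotSpot's card table. That suggests the barrier for that store either did not run or dirtied the wrong card. The sketch below is an illustrative model, not HotSpot code: it assumes 512-byte cards (HotSpot's card size) and 1 MB regions (inferred from the region bounds printed in the log), omits the dirty-card queue and concurrent refinement, and uses a hypothetical heap base and hypothetical class and method names.

```java
import java.util.Arrays;

// Simplified, illustrative model of a G1-style post-write barrier.
// Not HotSpot code; class, field, and method names are hypothetical.
class PostBarrierModel {
    static final int LOG_CARD_SIZE = 9;    // 512-byte cards
    static final int LOG_REGION_SIZE = 20; // 1 MB regions, as in the log above
    static final byte CLEAN = -1;          // clean card value ("CTE = -1")
    static final byte DIRTY = 0;

    final long heapBase; // hypothetical start of the modeled heap
    final byte[] cards;  // one entry per 512-byte card

    PostBarrierModel(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.cards = new byte[(int) (heapBytes >>> LOG_CARD_SIZE)];
        Arrays.fill(cards, CLEAN);
    }

    int cardIndex(long addr) {
        return (int) ((addr - heapBase) >>> LOG_CARD_SIZE);
    }

    // Called after storing newVal into the reference field at fieldAddr.
    void postBarrier(long fieldAddr, long newVal) {
        if (newVal == 0) return;                                      // null store: nothing to remember
        if (((fieldAddr ^ newVal) >>> LOG_REGION_SIZE) == 0) return; // same region: not cross-region
        int idx = cardIndex(fieldAddr);
        if (cards[idx] == DIRTY) return;                              // already pending refinement
        cards[idx] = DIRTY; // real G1 would also enqueue the card for refinement here
    }

    byte cardFor(long addr) {
        return cards[cardIndex(addr)];
    }
}
```

With a modeled heap starting at 0x670000000, `postBarrier(0x6713639d4L, 0x670ab95b8L)` dirties the field's card because the two addresses lie in different 1 MB regions. A store that bypasses the barrier leaves that card at -1, which matches what the verifier reported.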
                                     
2013-10-29
C1's getAndSetObject() intrinsic has a bad post barrier. See the attached patch.
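On JDK 8, `AtomicReference.getAndSet` delegates to `sun.misc.Unsafe.getAndSetObject`, the operation whose C1 intrinsic is identified here; with TieredCompilation on by default (the changeset the bisect found), hot callers are compiled by C1 before C2. This fits the verifier output, where the broken pointer is an AtomicReference field referring to an AtomicInteger. The snippet below is a hypothetical stress pattern that exercises that path by swapping freshly allocated objects into a long-lived AtomicReference. It only illustrates the kind of store involved; it is not a confirmed reproducer, since the original crash took hours of SPECjbb2013 to appear.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stress pattern (not SPECjbb2013 code). Once the long-lived
// holder has been promoted to an old region by earlier GCs, each freshly
// allocated AtomicInteger is young, so every getAndSet stores an
// old-to-young pointer that the post-write barrier must record.
// Flags from the report: -XX:+UseG1GC -XX:+PrintGC
class GetAndSetStress {
    static final AtomicReference<AtomicInteger> holder =
            new AtomicReference<>(new AtomicInteger(0));

    // Swaps in `iterations` new objects and returns the last value swapped out.
    static int run(int iterations) {
        int last = 0;
        for (int i = 1; i <= iterations; i++) {
            // On JDK 8 this call is implemented with Unsafe.getAndSetObject.
            AtomicInteger previous = holder.getAndSet(new AtomicInteger(i));
            last = previous.get();
        }
        return last;
    }

    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;
        int last = run(iterations);
        // On an intact heap, last == iterations - 1 and the holder's value == iterations.
        System.out.println("last swapped-out value: " + last);
        System.out.println("current value: " + holder.get().get());
    }
}
```

If the barrier for such a store is lost, the young AtomicInteger can be collected or moved while the old holder still refers to it, which would produce the dead-pointer symptoms described in the report.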
                                     
2013-10-30
URL:   http://hg.openjdk.java.net/hsx/hotspot-comp/hotspot/rev/946a8294ab15
User:  iveresov
Date:  2013-10-31 21:45:11 +0000

                                     
2013-10-31
URL:   http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/946a8294ab15
User:  amurillo
Date:  2013-11-01 20:52:37 +0000

                                     
2013-11-01