JDK-6888316 : G1: has_aborted() || _cm->region_stack_empty() fails
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2009-10-05
  • Updated: 2011-12-22
  • Resolved: 2009-11-11
  • JDK 6: 6u21 (Fixed)
  • JDK 7: 7 (Fixed)
  • Other: hs16 (Fixed)
Description
During testing we've come across this assertion failure. Poonam hit it while looking at another bug (CR 6847956).

------------------------------------------------------------------------------

#  Internal Error (concurrentMark.cpp:3492), pid=14287, tid=73
#  Error: guarantee(has_aborted() || _cm->region_stack_empty(),"only way to exit the loop")

  [5] VMError::report_and_die(0xffffffff7e7562e8, 0x0, 0x1, 0xffffffff7e5b1e37, 0xffffffff7e760fd1,0xffffffff7e73df20), at 0xffffffff7e42fd64
  [6] report_fatal(0xffffffff7e4bf4b9, 0xda4, 0xffffffff7e4bf528, 0xffffffffffc1f758, 0x3e0884, 0x3e0800), at 0xffffffff7e009384
  [7] CMTask::drain_region_stack(0x104a2f8d0, 0x1, 0x0, 0x0, 0xffffffff7dfd8270, 0x1), at 0xffffffff7dfd86a4
  [8] CMTask::do_marking_step(0x1001f8cd0, 0x104a2f8d0, 0x2000, 0xffffffff7e4bfd08, 0xffffffff7e6ee000, 0xffffffff7e7798f0), at 0xffffffff7dfd8ba4
  [9] CMConcurrentMarkingTask::work(0xffffffff705ff570, 0x5, 0x106679000, 0x1001f8cd0, 0xffffffff7e73605c, 0xffffffff7dfda6e8), at 0xffffffff7dfdaa8c
  [10] GangWorker::loop(0x106679000, 0x6, 0xffffffff7e438980, 0x1022a2ff0, 0x1, 0x5), at 0xffffffff7e438a00
  [11] java_start(0x106679000, 0x67a24, 0x37cf, 0xffffffff7e536cb9, 0xffffffff7e6ee000, 0x106163ae0), at 0xffffffff7e30c928

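From the disassembly, it looks like the guarantee was violated because the region stack was not empty: has_aborted (%l3) was 0, and the value loaded from the ConcurrentMark object (%i1) was 1.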

(dbx) x 0xffffffff7dfd86a4-40/20i
0xffffffff7dfd867c: drain_region_stack+0x03ec:  ldub     [%i0 + 300], %l3    //i0=CMTask* , l3=has_aborted
0xffffffff7dfd8680: drain_region_stack+0x03f0:  ldx      [%i0 + 24], %o0      //ConcurrentMark*
0xffffffff7dfd8684: drain_region_stack+0x03f4:  cmp      %l3, 0                    // l3=0
0xffffffff7dfd8688: drain_region_stack+0x03f8:  bne,pn   %icc,drain_region_stack+0x428  ! 0xffffffff7dfd86b8
0xffffffff7dfd868c: drain_region_stack+0x03fc:  nop
0xffffffff7dfd8690: drain_region_stack+0x0400:  ld       [%o0 + 484], %i1 
0xffffffff7dfd8694: drain_region_stack+0x0404:  cmp      %i1, 0             // i1=1
0xffffffff7dfd8698: drain_region_stack+0x0408:  be,pn    %icc,drain_region_stack+0x428  ! 0xffffffff7dfd86b8
0xffffffff7dfd869c: drain_region_stack+0x040c:  mov      3492, %o1
0xffffffff7dfd86a0: drain_region_stack+0x0410:  add      %l0, -82, %o2
0xffffffff7dfd86a4: drain_region_stack+0x0414:  call     report_fatal   ! 0xffffffff7e009360

Core and logs in /usr/de119005/gctest/drain_stack_failure on v4v-t5220c-sca11.sfbay.

------------------------------------------------------------------------------

I don't think the bug that caused 6847956 could also be causing this, so I opened a separate CR.

Comments
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/4c3458a31e17
07-10-2009

EVALUATION I'm convinced that, when there is more than one marking thread, the guarantee is bogus. The guarantee asserts that a marking thread can only leave the loop if it has aborted or the region stack is empty. However, the first condition tests the local abort flag (i.e., whether the thread itself is aborting the marking step), not the global abort flag (which causes all the marking threads to abort). Given this, here's a plausible scenario that can cause the guarantee to fire (here "region subset" stands for what we push on the region stack, to differentiate it from actual heap regions):
  • thread A is scanning region subset RS
  • thread B notices that the region stack is not empty and tries to pop an entry
  • thread C notices that the region stack is not empty and tries to pop an entry
  • thread B succeeds in popping the last entry from the region stack and starts scanning it
  • thread A decides to abort the region subset iteration (say, it times out) and pushes the remainder on the region stack
  • thread C reaches the guarantee and finds that it has not decided to abort, but also that the region stack is not empty (as A has just pushed a region subset on it)
I can't really think of another guarantee that would be useful and would also make sense. I think we should just remove it.
05-10-2009
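
The interleaving described in the evaluation above can be replayed deterministically in a small single-threaded sketch. This is not HotSpot code: RegionStack, MarkingTask, guarantee_holds and replay_interleaving are hypothetical stand-ins, and the step where thread C's pop fails before it checks the guarantee is an assumption about how C leaves its loop. It only illustrates why a per-thread abort flag combined with a shared stack cannot support the guarantee.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the shared region stack; not HotSpot code.
struct RegionStack {
    std::vector<int> entries;  // each int names one "region subset"

    bool is_empty() const { return entries.empty(); }
    void push(int region_subset) { entries.push_back(region_subset); }

    // Returns true and stores the popped entry in `out`, or false if empty.
    bool pop(int& out) {
        if (entries.empty()) return false;
        out = entries.back();
        entries.pop_back();
        return true;
    }
};

// Hypothetical stand-in for a marking task with its *local* abort flag.
struct MarkingTask {
    bool has_aborted;
    MarkingTask() : has_aborted(false) {}
};

// The condition the guarantee checked, as seen by one task.
bool guarantee_holds(const MarkingTask& task, const RegionStack& stack) {
    return task.has_aborted || stack.is_empty();
}

// Replays the interleaving in a fixed order and reports whether the
// guarantee would hold for thread C at the point where it checks it.
bool replay_interleaving() {
    RegionStack stack;
    MarkingTask a, b, c;

    const int rs = 1;    // region subset that thread A is scanning
    stack.push(2);       // the last entry left on the shared stack

    // B and C both notice a non-empty stack and decide to pop.
    const bool b_wants_pop = !stack.is_empty();
    const bool c_wants_pop = !stack.is_empty();

    int b_entry = 0;
    int c_entry = 0;
    const bool b_popped = b_wants_pop && stack.pop(b_entry);  // B wins
    // Assumption: C's pop then fails, so C leaves its drain loop.
    const bool c_popped = c_wants_pop && stack.pop(c_entry);

    // A gives up on its region subset (say, a timeout) and pushes the rest.
    a.has_aborted = true;
    stack.push(rs);

    // Sanity checks on the replay itself.
    assert(b_popped && !c_popped);
    assert(!b.has_aborted && !c.has_aborted);

    // C now evaluates the guarantee with its own flag and the shared stack.
    return guarantee_holds(c, stack);
}
```

In this replay, replay_interleaving() returns false: C has not set its own abort flag, yet the stack is non-empty because of A's push. The condition can only be relied on when a single thread touches the stack.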

EVALUATION From John Cuthbertson:
(01:16:50 PM) John Cuthbertson: I think one thread has to be scanning (the last) region when it fails and another thread has to be attempting to pop from the region stack before the other region scan fails.
(01:17:16 PM) John Cuthbertson: I think that's the only condition that could cause the guarantee to trip.
05-10-2009