Bug ID: JDK-6888316 G1: has_aborted() || _cm->region_stack_empty() fails

Details
Type:
Bug
Submit Date:
2009-10-05
Status:
Resolved
Updated Date:
2011-12-22
Project Name:
JDK
Resolved Date:
2009-11-11
Component:
hotspot
OS:
generic
Sub-Component:
gc
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs17
Fixed Versions:
hs17 (b04)

Related Reports
Backport:
Backport:
Backport:
Relates:
Relates:

Description
During testing we've come across this assertion failure. Poonam hit it while looking at another bug (CR 6847956).

------------------------------------------------------------------------------

#  Internal Error (concurrentMark.cpp:3492), pid=14287, tid=73
#  Error: guarantee(has_aborted() || _cm->region_stack_empty(),"only way to exit the loop")

  [5] VMError::report_and_die(0xffffffff7e7562e8, 0x0, 0x1, 0xffffffff7e5b1e37, 0xffffffff7e760fd1,0xffffffff7e73df20), at 0xffffffff7e42fd64
  [6] report_fatal(0xffffffff7e4bf4b9, 0xda4, 0xffffffff7e4bf528, 0xffffffffffc1f758, 0x3e0884, 0x3e0800), at 0xffffffff7e009384
  [7] CMTask::drain_region_stack(0x104a2f8d0, 0x1, 0x0, 0x0, 0xffffffff7dfd8270, 0x1), at 0xffffffff7dfd86a4
  [8] CMTask::do_marking_step(0x1001f8cd0, 0x104a2f8d0, 0x2000, 0xffffffff7e4bfd08, 0xffffffff7e6ee000, 0xffffffff7e7798f0), at 0xffffffff7dfd8ba4
  [9] CMConcurrentMarkingTask::work(0xffffffff705ff570, 0x5, 0x106679000, 0x1001f8cd0, 0xffffffff7e73605c, 0xffffffff7dfda6e8), at 0xffffffff7dfdaa8c
  [10] GangWorker::loop(0x106679000, 0x6, 0xffffffff7e438980, 0x1022a2ff0, 0x1, 0x5), at 0xffffffff7e438a00
  [11] java_start(0x106679000, 0x67a24, 0x37cf, 0xffffffff7e536cb9, 0xffffffff7e6ee000, 0x106163ae0), at 0xffffffff7e30c928

From the disassembly, it looks like the guarantee was violated because the region stack was not empty.

(dbx) x 0xffffffff7dfd86a4-40/20i
0xffffffff7dfd867c: drain_region_stack+0x03ec:  ldub     [%i0 + 300], %l3    //i0=CMTask* , l3=has_aborted
0xffffffff7dfd8680: drain_region_stack+0x03f0:  ldx      [%i0 + 24], %o0      //ConcurrentMark*
0xffffffff7dfd8684: drain_region_stack+0x03f4:  cmp      %l3, 0                    // l3=0
0xffffffff7dfd8688: drain_region_stack+0x03f8:  bne,pn   %icc,drain_region_stack+0x428  ! 0xffffffff7dfd86b8
0xffffffff7dfd868c: drain_region_stack+0x03fc:  nop
0xffffffff7dfd8690: drain_region_stack+0x0400:  ld       [%o0 + 484], %i1 
0xffffffff7dfd8694: drain_region_stack+0x0404:  cmp      %i1, 0             // i1=1
0xffffffff7dfd8698: drain_region_stack+0x0408:  be,pn    %icc,drain_region_stack+0x428  ! 0xffffffff7dfd86b8
0xffffffff7dfd869c: drain_region_stack+0x040c:  mov      3492, %o1
0xffffffff7dfd86a0: drain_region_stack+0x0410:  add      %l0, -82, %o2
0xffffffff7dfd86a4: drain_region_stack+0x0414:  call     report_fatal   ! 0xffffffff7e009360
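Read back into C++, the instruction sequence above corresponds roughly to the check sketched below. This is only an illustration: the struct and field names are hypothetical, and treating the word at offset 484 as the region stack's entry count is an assumption based on the annotations in the dump, not on the HotSpot sources.

```cpp
// Hypothetical stand-ins for the HotSpot types. Only the two fields that the
// disassembly reads are modeled; real layouts and names may differ.
struct ConcurrentMarkSketch {
    int region_stack_index;    // assumed: the 32-bit word loaded from [%o0 + 484]
};

struct CMTaskSketch {
    ConcurrentMarkSketch* cm;  // the pointer loaded from [%i0 + 24]
    bool has_aborted;          // the byte loaded from [%i0 + 300]
};

// Returns true when the guarantee would fire, i.e. when neither branch to
// drain_region_stack+0x428 is taken and control reaches the report_fatal call.
bool guarantee_would_fire(const CMTaskSketch& task) {
    if (task.has_aborted) {
        return false;          // bne,pn taken: local abort flag is set
    }
    if (task.cm->region_stack_index == 0) {
        return false;          // be,pn taken: region stack is empty
    }
    return true;               // falls through to report_fatal
}
```

With the values noted in the dump (l3=0, i1=1), neither branch is taken, which matches the conclusion that the stack was not empty.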

Core and logs in /usr/de119005/gctest/drain_stack_failure on v4v-t5220c-sca11.sfbay.

------------------------------------------------------------------------------

I don't think the bug that caused 6847956 could also be causing this, so I opened a separate CR.

                                    

Comments
EVALUATION

From John Cuthbertson:

(01:16:50 PM) John Cuthbertson: I think one thread has to be scanning (the last) region when it fails, and another thread has to be attempting to pop from the region stack before the other region scan fails.
(01:17:16 PM) John Cuthbertson: I think that's the only condition that could cause the guarantee to trip.
                                     
2009-10-05
EVALUATION

I'm convinced that, when there's more than one marking thread, the guarantee is bogus.

Basically, the guarantee asserts that we never have a marking thread that has not aborted while the region stack is non-empty. However, the first condition tests the local abort flag (i.e., whether the thread itself is aborting the marking step), not the global abort flag (which causes all the marking threads to abort). Given this, here's a plausible scenario that can cause the guarantee to fire:

(here "region subset" stands for what we push on the region stack, to differentiate from actual heap regions)

thread A is scanning region subset RS
thread B notices that the region stack is not empty and tries to pop an entry
thread C notices that the region stack is not empty and tries to pop an entry
thread B succeeds in popping the last entry from the region stack and starts scanning it
thread A decides to abort the region subset iteration (say, it times out) and pushes the remainder on the region stack
thread C reaches the guarantee and finds that it has not yet decided to abort, but also that the region stack is not empty (as A just pushed a region subset on it)
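
The interleaving above can be replayed deterministically on a single thread to see why the condition fails for C. The sketch below is a simplified model, not HotSpot code: `RegionStackSketch`, `TaskSketch`, `Outcome`, and `replay_scenario` are hypothetical names, and region subsets are represented by plain integers.

```cpp
#include <vector>

// Hypothetical model of the shared region stack; each int is one region subset.
struct RegionStackSketch {
    std::vector<int> entries;
    bool empty() const { return entries.empty(); }
    void push(int rs) { entries.push_back(rs); }
    bool pop(int& out) {
        if (entries.empty()) return false;
        out = entries.back();
        entries.pop_back();
        return true;
    }
};

// Hypothetical model of a marking task; the flag is the local abort flag only.
struct TaskSketch {
    bool has_aborted = false;
};

// The condition the guarantee checks, evaluated for a single task.
bool guarantee_holds(const TaskSketch& t, const RegionStackSketch& s) {
    return t.has_aborted || s.empty();
}

struct Outcome {
    bool b_popped;               // did B obtain an entry?
    bool c_popped;               // did C obtain an entry?
    bool guarantee_holds_for_c;  // result of the check as seen by C
};

// Replays the scenario one step at a time so the result is deterministic.
Outcome replay_scenario() {
    RegionStackSketch stack;
    TaskSketch a, c;
    stack.push(1);                    // one entry is left on the stack
    const int remainder_of_rs = 2;    // what A pushes back when it aborts

    bool b_saw_work = !stack.empty(); // B notices the stack is not empty
    bool c_saw_work = !stack.empty(); // C notices the stack is not empty

    int popped = 0;
    bool b_popped = b_saw_work && stack.pop(popped); // B takes the last entry
    bool c_popped = c_saw_work && stack.pop(popped); // C's pop finds nothing

    a.has_aborted = true;             // A aborts locally (say, it times out)...
    stack.push(remainder_of_rs);      // ...and pushes the remainder of RS

    // C has not aborted, yet the stack is non-empty again.
    return Outcome{b_popped, c_popped, guarantee_holds(c, stack)};
}
```

In this model only A's local flag is set, so the check evaluated on behalf of C sees `has_aborted == false` together with a non-empty stack.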

I can't really think of another guarantee that would be useful and would also make sense. I think we should just remove it.
                                     
2009-10-05
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/4c3458a31e17
                                     
2009-10-07


