Bug ID: JDK-8140588 Internal Error: gc/g1/ptrQueue.hpp:126 assert(_index == _sz) failed: invariant: queues are empty when activated

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 9

Priority: P2
Status: Resolved
Resolution: Fixed

Submitted: 2015-10-27
Updated: 2018-06-21
Resolved: 2017-01-17

Versions (Unresolved/Resolved/Fixed)

JDK 10: 10 (Fixed)
JDK 9: 9 b156 (Fixed)

Related Reports

Relates :	JDK-6793828 - G1: invariant: queues are empty when activated
Relates :	JDK-8188056 - G1/SATB in progress checked twice in C1
Relates :	JDK-8017065 - C2 allows safepoint checks to leak into G1 pre-barriers

Description

jdk9 b89 PIT
runtime/Metaspace/FragmentMetaspace.java failed on win32 (seen once)

#
#  Internal Error (C:\jprt\T\P1\002332.amurillo\s\hotspot\src\share\vm\gc/g1/ptrQueue.hpp:126), pid=220080, tid=184968
#  assert(_index == _sz) failed: invariant: queues are empty when activated.
#
# JRE version: Java(TM) SE Runtime Environment (9.0) (build 1.9.0-internal-fastdebug-20151027002332.amurillo.jdk9-hs-2015-10--b00)
# Java VM: Java HotSpot(TM) Server VM (1.9.0-internal-20151027002332.amurillo.jdk9-hs-2015-10--b00, compiled mode, tiered, g1 gc, windows-x86)
# Core dump will be written. Default location: C:\Users\aurora\sandbox\results\workDir\runtime\Metaspace\FragmentMetaspace\hs_err_pid220080.mdmp
#

---------------  S U M M A R Y ------------

Command Line: -Dtest.src=C:\Users\aurora\CommonData\j2se_hotspot\hotspot\test\runtime\Metaspace -Dtest.src.path=C:\Users\aurora\CommonData\j2se_hotspot\hotspot\test\runtime\Metaspace;C:\Users\aurora\CommonData\j2se_hotspot\hotspot\test\runtime\testlibrary -Dtest.classes=C:\Users\aurora\sandbox\results\workDir\classes\runtime\Metaspace -Dtest.class.path=C:\Users\aurora\sandbox\results\workDir\classes\runtime\Metaspace;C:\Users\aurora\sandbox\results\workDir\classes\runtime\testlibrary -Dtest.vm.opts= -Dtest.tool.vm.opts= -Dtest.compiler.opts= -Dtest.java.opts=-Xcomp -server -Xcomp -Dtest.jdk=c:\users\aurora\CommonData\jdk -Dcompile.jdk=c:\users\aurora\CommonData\jdk -Dtest.timeout.factor=3.0 -Dtest.nativepath=C:\users\aurora\sandbox\JTREG_NATIVEPATH_LIBRARY_PREPARED -Xcomp -Xcomp -Djava.library.path=C:\users\aurora\sandbox\JTREG_NATIVEPATH_LIBRARY_PREPARED -Xmx300m com.sun.javatest.regtest.agent.MainWrapper C:\Users\aurora\sandbox\results\workDir\classes\runtime\Metaspace\FragmentMetaspace.jta

Host: busgo3007, Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, 2 cores, 15G,  Windows Server 2012 R2 , 64 bit Build 9600 (6.3.9600.17415)
Time: Tue Oct 27 03:34:11 2015 Eastern Daylight Time elapsed time: 52 seconds (0d 0h 0m 52s)

---------------  T H R E A D  ---------------

Current thread (0x258ccc00):  VMThread [stack: 0x25b00000,0x25b50000] [id=184968]

Stack: [0x25b00000,0x25b50000],  sp=0x25b4ee4c,  free space=315k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x2e0415]
V  [jvm.dll+0x2e08f7]
V  [jvm.dll+0x2d4239]
V  [jvm.dll+0x475d7e]
V  [jvm.dll+0x428596]
V  [jvm.dll+0x477ff6]
V  [jvm.dll+0x2a9003]
V  [jvm.dll+0x2a6f20]
V  [jvm.dll+0x2a77d5]
V  [jvm.dll+0x2a7d94]
V  [jvm.dll+0x317bc1]
C  [msvcr120.dll+0x2c01d]
C  [msvcr120.dll+0x2c001]
C  [KERNEL32.DLL+0x17c04]
C  [ntdll.dll+0x5ad1f]
C  [ntdll.dll+0x5acea]

VM_Operation (0x271cf0dc): G1IncCollectionPause, mode: safepoint, requested by thread 0x26b25000

Comments

Bug found by nightly testing. Verified by a passing nightly run.
26-07-2017
This email from Per Liden explains the issue and the plan to resolve it:

Following up on yesterday's discussion. The potential C1 issue I was trying to describe over the phone is this:

In LIRGenerator::G1SATBCardTableModRef_pre_barrier(), we generate code which essentially does this:

    if (satb_active) {
      goto G1PreBarrierStub
    }

The generation of that pre-barrier stub looks like this:

    void G1PreBarrierStub::emit_code(LIR_Assembler* ce) {
      // At this point we know that marking is in progress.
      // If do_load() is true then we have to emit the
      // load of the previous value; otherwise it has already
      // been loaded into _pre_val.

      __ bind(_entry);
      assert(pre_val()->is_register(), "Precondition.");

      Register pre_val_reg = pre_val()->as_register();

      if (do_load()) {
        ce->mem2reg(addr(), pre_val(), T_OBJECT, patch_code(), info(), false /*wide*/, false /*unaligned*/);
      }

      __ cmpptr(pre_val_reg, (int32_t) NULL_WORD);
      __ jcc(Assembler::equal, _continuation);
      ce->store_parameter(pre_val()->as_register(), 0);
      __ call(RuntimeAddress(Runtime1::entry_for(Runtime1::g1_pre_barrier_slow_id)));
      __ jmp(_continuation);
    }

The potential problem I see here is that mem2reg() can in turn generate a PatchingStub. If a PatchingStub is generated here, then mem2reg becomes a point where we could safepoint, and in the following scenario we could end up enqueuing oops to an inactive SATB queue.

1) Assume concurrent mark is running and SATB queues are active.
2) A mutator applies a pre-barrier. The SATB queue is active, so it starts executing the associated G1PreBarrierStub.
3) Assume a PatchingStub was installed in the mem2reg() instruction location, so the mutator will now start to execute that PatchingStub. That PatchingStub will in turn enter the VM, which means we could safepoint here.
4) Assume G1 just issued a Remark VM operation when the mutator tried to execute its PatchingStub. The Remark operation will drain the SATB queues and make them inactive while the mutator is in the middle of a pre-barrier.
5) After the Remark operation finishes, the mutator returns from the PatchingStub and continues executing the remaining part of the G1PreBarrierStub, which will then enqueue an oop to an SATB queue which is now inactive.
6) In the next concurrent mark cycle, we assert because the queues should be empty when they are activated.

One way of solving this would be to just re-check the queue active state after returning from the PatchingStub. On x86 it would look something like this (code compiles but is completely untested):

    --- a/src/cpu/x86/vm/c1_Runtime1_x86.cpp
    +++ b/src/cpu/x86/vm/c1_Runtime1_x86.cpp
    @@ -1623,6 +1623,8 @@
       NOT_LP64(__ get_thread(thread);)

    +  Address queue_active(thread, in_bytes(JavaThread::satb_mark_queue_offset() +
    +                                        SATBMarkQueue::byte_offset_of_active()));
       Address queue_index(thread, in_bytes(JavaThread::satb_mark_queue_offset() +
                                            SATBMarkQueue::byte_offset_of_index()));
       Address buffer(thread, in_bytes(JavaThread::satb_mark_queue_offset() +
    @@ -1631,6 +1633,11 @@
       Label done;
       Label runtime;

    +  // Is queue active?
    +  __ movbool(tmp, queue_active);
    +  __ testbool(tmp);
    +  __ jcc(Assembler::zero, done);
    +
       // Can we store original value in the thread's buffer?
       __ movptr(tmp, queue_index);

cheers,
Per
13-01-2017
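The six-step scenario in the email above can be modelled outside the VM. The following is a minimal single-threaded sketch, not HotSpot code: ToySatbQueue, pre_barrier and next_cycle_activates_cleanly are hypothetical names, the "PatchingStub" is reduced to a callback that may run a Remark-like operation, and the failing assert is replaced by a boolean result so the outcome can be observed without aborting.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a per-thread SATB queue (not HotSpot's PtrQueue).
struct ToySatbQueue {
    bool active = false;
    std::vector<const void*> entries;

    // Mirrors the spirit of "queues are empty when activated":
    // returns false (instead of asserting) if entries were left behind.
    bool activate() {
        if (!entries.empty()) return false;
        active = true;
        return true;
    }

    // Models what the email says Remark does: drain, then deactivate.
    void drain_and_deactivate() {
        entries.clear();
        active = false;
    }
};

// Toy pre-barrier. `patching_stub` stands for the point where the mutator
// enters the VM and a safepoint operation may run (an assumption of this model).
void pre_barrier(ToySatbQueue& q, const void* pre_val,
                 const std::function<void()>& patching_stub,
                 bool recheck_active) {
    if (!q.active) return;                    // inline "is marking active?" check
    patching_stub();                          // possible safepoint (steps 3 and 4)
    if (pre_val == nullptr) return;           // nothing to record
    if (recheck_active && !q.active) return;  // the proposed re-check
    q.entries.push_back(pre_val);             // step 5: enqueue the old value
}

// Runs steps 1 to 6 and reports whether the next cycle finds an empty queue.
bool next_cycle_activates_cleanly(bool recheck_active) {
    ToySatbQueue q;
    bool first = q.activate();                // step 1: marking starts
    assert(first);
    (void)first;
    int old_value = 0;                        // stands in for the overwritten oop
    pre_barrier(q, &old_value,
                [&q] { q.drain_and_deactivate(); },  // Remark at the safepoint
                recheck_active);
    return q.activate();                      // step 6: next marking cycle
}
```

In this model, next_cycle_activates_cleanly(false) returns false, because the stale entry from step 5 is still in the queue, while next_cycle_activates_cleanly(true) returns true. That is only meant to illustrate why re-checking the active flag after the possible safepoint removes the stale entry; it says nothing about the real generated code.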
[~pliden] suggested that the C1 pre-barrier code may be responsible for this behavior: after checking whether marking is active, but before loading the old value from memory, there can be a safepoint for code patching. This safepoint may abort marking. When the pre-barrier resumes, it unconditionally adds an entry to the SATB queue, which leaves the queue in a state where the assert fails. This is somewhat reproducible with -XX:+PatchALot.
19-12-2016
Issue cannot be reproduced anymore.
24-05-2016
I reproduced the crash with tracing code that collects the addresses in C2-compiled code where a safepoint poll is taken. I see only 3 such places during the failing execution, and all 3 are safepoint polls on return, where clearly no G1 barrier code is involved. So this issue is not similar to the previous ones. Reassigning to gc for further investigation.
26-11-2015
Another more recent occurrence.
28-10-2015
The same problem already occurred once in JDK-6793828, where it was a compiler problem.
28-10-2015