JDK-8292946 : GC lock/jni/jnilock001 test failed "assert(gch->gc_cause() == GCCause::_scavenge_alot || !gch->incremental_collection_failed()) failed: Twice in a row"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 20
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2022-08-25
  • Updated: 2023-12-08
  • Resolved: 2022-09-05
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 17 JDK 20
11.0.23-oracleFixed 17.0.11-oracleFixed 20 b14Fixed
Related Reports
Relates :  
Sub Tasks
JDK-8292994 :  
Description
The following test failed in the JDK20 CI:

vmTestbase/gc/lock/jni/jnilock001/TestDescription.java

Here's a snippet from the log file:

#section:main
----------messages:(4/454)----------
command: main -Xbootclasspath/a:. -Xlog:gc=debug:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI gc.lock.LockerTest -lockers jni -t 1
reason: User specified action: run main/othervm/native -Xbootclasspath/a:. -Xlog:gc=debug:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI gc.lock.LockerTest -lockers jni -t 1 
Mode: othervm [/othervm specified]
elapsed time (seconds): 65.634
----------configuration:(0/0)----------
----------System.out:(35/2309)----------
Stress time: 120 seconds
Stress iterations factor: 1
Stress threads factor: 1
Stress runs factor: 1
Max memory: 1003487232
Sleep time: 500
Iterations: 0
Number of threads: 1
Run GC thread: false
Run mem diag thread: false
Run forever: false
Starting Thread[#19,gc.lock.LockerTest$Worker@1b49a530,5,MainThreadGroup]
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/defNewGeneration.cpp:870
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S71798/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/069b35e7-3997-4f1d-ac34-bb6183f4f9f3/runs/d9bc802f-606a-4d6b-8b3c-69684b064d27/workspace/open/src/hotspot/share/gc/serial/defNewGeneration.cpp:870), pid=2094368, tid=2094375
#  assert(gch->gc_cause() == GCCause::_scavenge_alot || !gch->incremental_collection_failed()) failed: Twice in a row
#
# JRE version: Java(TM) SE Runtime Environment (20.0+13) (fastdebug build 20-ea+13-791)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-ea+13-791, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, serial gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xbc9064]  DefNewGeneration::gc_epilogue(bool)+0x354
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S71916/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/4de9c880-c5ab-4d25-86b5-695f0037ced5/runs/2c6be331-cadb-4277-8ed2-792d881c0901/testoutput/test-support/jtreg_open_test_hotspot_jtreg_vmTestbase_vm_gc_locker/scratch/1/core.2094368)
#
Unsupported internal testing APIs have been used.

# An error report file with more information is saved as:
# /opt/mach5/mesos/work_dir/slaves/0c72054a-24ab-4dbb-944f-97f9341a1b96-S71916/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/4de9c880-c5ab-4d25-86b5-695f0037ced5/runs/2c6be331-cadb-4277-8ed2-792d881c0901/testoutput/test-support/jtreg_open_test_hotspot_jtreg_vmTestbase_vm_gc_locker/scratch/1/hs_err_pid2094368.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
----------System.err:(0/0)----------
----------rerun:(40/7816)*----------


Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x00007fa93c1511d0):  VMThread "VM Thread" [stack: 0x00007fa942149000,0x00007fa942249000] [id=2094375]

Stack: [0x00007fa942149000,0x00007fa942249000],  sp=0x00007fa9422475c0,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xbc9064]  DefNewGeneration::gc_epilogue(bool)+0x354  (defNewGeneration.cpp:870)
V  [libjvm.so+0xedde63]  GenCollectedHeap::gc_epilogue(bool)+0x33  (genCollectedHeap.cpp:1171)
V  [libjvm.so+0xee201c]  GenCollectedHeap::do_collection(bool, bool, unsigned long, bool, GenCollectedHeap::GenerationType)+0x8ec  (genCollectedHeap.cpp:577)
V  [libjvm.so+0xee2e84]  GenCollectedHeap::do_full_collection(bool, GenCollectedHeap::GenerationType)+0x24  (genCollectedHeap.cpp:871)
V  [libjvm.so+0xec9991]  VM_GenCollectFull::doit()+0x51  (gcVMOperations.cpp:192)
V  [libjvm.so+0x1b784e2]  VM_Operation::evaluate()+0x182  (vmOperations.cpp:71)
V  [libjvm.so+0x1b9d2ba]  VMThread::evaluate_operation(VM_Operation*)+0x18a  (vmThread.cpp:282)
V  [libjvm.so+0x1b9eb5f]  VMThread::inner_execute(VM_Operation*)+0x40f  (vmThread.cpp:429)
V  [libjvm.so+0x1b9ed35]  VMThread::loop()+0xc5  (vmThread.cpp:496)
V  [libjvm.so+0x1b9ee50]  VMThread::run()+0xb0  (vmThread.cpp:175)
V  [libjvm.so+0x1a82a10]  Thread::call_run()+0x100  (thread.cpp:224)
V  [libjvm.so+0x17488e3]  thread_native_entry(Thread*)+0x103  (os_linux.cpp:707)
Comments
Fix request [11u] I backport this for parity with 11.0.23-oracle. No risk, only a test change. Only remove a assert. Test passes. SAP nightly testing passed.
06-12-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/2338 Date: 2023-12-05 05:44:57 +0000
05-12-2023

Fix request [17u] I backport this for parity with 17.0.11-oracle. No risk, only a test change. -- Actually removes an assertion. Clean backport. Test passes. SAP nightly testing passed.
04-12-2023

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1994 Date: 2023-11-29 14:14:32 +0000
29-11-2023

Changeset: 9833c025 Author: Albert Mingkun Yang <ayang@openjdk.org> Date: 2022-09-05 13:35:36 +0000 URL: https://git.openjdk.org/jdk/commit/9833c025fd7daf6bb1be81d93148a4204a9f184c
05-09-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10096 Date: 2022-08-31 09:12:22 +0000
31-08-2022

I added `_wb_younggc` to the assert as suggested by Thomas, but the failure persisted. After taking a deep look, I think the relevant code is: ``` GenCollectedHeap::do_full_collection(...) { // after inlining do_collection { ... if (!do_full_collection) { <- (1) ... gc_epilogue(...) // contains the failed assertion } ... } if (gc_cause() == GCCause::_gc_locker && incremental_collection_failed()) { <- (2) // run Full GC } } ``` (1) and (2) checks if a Full GC is needed after finishing a Young GC, and they can give different results. When the crash occurs, (1) == true, i.e. `do_full_collection` == false, and (2) == true. IOW, (1) decides a Full GC is not needed but (2) disagrees. `GenCollectedHeap::do_full_collection` is fairly convoluted --- not obvious how to modify its logic while keeping all its callers happy. OTOH, the value of the crashing assertion is low, as it essentially checks a failed Young GC (not initiated by `_scavenge_alot` or `_wb_young_gc`) should trigger a Full GC, which is exactly what (2) captures. Therefore, the best fix, IMO, is to just remove this problematic (incorrect) assertion. (PS: If my analysis is correct, this crash doesn't affect release builds, just to be explicit.)
29-08-2022

The first jdk-20+13-791-tier6 sighting is in a test task with these JVM args: -XX:+UseSerialGC -Xshare:off -showversion
26-08-2022

Some initial debugging: GC cause is _wb_young_gc This may be just two _wb_young_gc's in sequence (the code which both fail due to heap exhaustion; with the given parameters GenCollectedHeap::do_collection (this=this@entry=0x7fa93c046a40, full=full@entry=true, clear_all_soft_refs=clear_all_soft_refs@entry=false, size=size@entry=0, is_tlab=is_tlab@entry=false, max_generation=max_generation@entry=GenCollectedHeap::YoungGen because bool GenCollectedHeap::should_do_full_collection(size_t size, bool full, bool is_tlab, GenCollectedHeap::GenerationType max_gen) const { return max_gen == OldGen && _old_gen->should_collect(full, size, is_tlab); } returns false with max_gen == YoungGen. The iteration in the test code to cause the gcs: public static void engageGC(long testMemory) { final int YOUNG_GC_ITERATIONS = 100; final long memChunk = testMemory / YOUNG_GC_ITERATIONS; int iteration = 0; Object referenceArray[] = new Object[YOUNG_GC_ITERATIONS]; while (iteration < YOUNG_GC_ITERATIONS) { referenceArray[iteration++] = byteArrayProducer.create(memChunk); WhiteBox.getWhiteBox().youngGC(); } WhiteBox.getWhiteBox().fullGC(); } may be simple enough for the compiler to remove the allocation although there is that WB method in it. The assert looks as follows: assert(gch->gc_cause() == GCCause::_scavenge_alot || !gch->incremental_collection_failed(), "Twice in a row"); in the debugger gch->incremental_collection_failed() is true. Given that there is the explicit exception for GCCause::_scavenge_alot already here, this issue is known and should/could be extended to _wb_younggc.
26-08-2022

Also the test method should be fixed so that the allocations can not be compiled away
26-08-2022

The PR has this comment from 2022.08.07: https://github.com/openjdk/jdk/pull/9658#issuecomment-1207420760 > Tested for all GC types - G1GC, SerialGC, ParallelGC, ZGC. However, there were commits to the PR after 2022.08.07 so it's not clear what tests were run on the final version.
25-08-2022

The jdk-20+13-791-tier5 build-ID lists 20 changesets, but only one sticks out: JDK-8289764 gc/lock tests failed with "OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects" because it made minor changes to the failing test. I'm not saying that JDK-8289764 caused theses new test failures. However, it does give the investigator a place to start poking around.
25-08-2022