JDK-8328879 : G1: Some gtests modify global state crashing the JVM during GC after JDK-8289822
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 23
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-03-25
  • Updated: 2024-06-24
  • Resolved: 2024-04-12
The Version table provides details of the release in which this issue/RFE will be addressed.

JDK 23
23 b19   Fixed
Related Reports
Relates :  
Description
This is triggered when running make test TEST="gtest:all" with an aarch64 OpenJDK slowdebug build:

......

[----------] 1 test from G1CommittedRegionMapTest
[ RUN      ] G1CommittedRegionMapTest.serial
[       OK ] G1CommittedRegionMapTest.serial (6 ms)
[----------] 1 test from G1CommittedRegionMapTest (6 ms total)

[----------] 3 tests from G1ServiceThread
[ RUN      ] G1ServiceThread.test_add_vm
[       OK ] G1ServiceThread.test_add_vm (1000 ms)
[ RUN      ] G1ServiceThread.test_add_while_waiting_vm
assert failed: assert(limit == bottom) failed: the region limit should be at bottomassert failed: assert(limit == bottom) failed: the region limit should be at bottomassert failed: assert(limit == bottom) failed: the region limit should be at bottom[thread 293938 also had an error][thread 294262 also had an error]

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/fyang/jdk/src/hotspot/share/gc/g1/g1ConcurrentMark.cpp:1931), pid=293932, tid=294261
#  assert(limit == bottom) failed: the region limit should be at bottom
#
# JRE version: OpenJDK Runtime Environment (23.0) (slowdebug build 23-internal-adhoc.fyang.jdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 23-internal-adhoc.fyang.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/fyang/jdk/build/linux-aarch64-server-slowdebug/test-support/gtest_all_server/core.293932)
#
# An error report file with more information is saved as:
# /home/fyang/jdk/build/linux-aarch64-server-slowdebug/test-support/gtest_all_server/hs_err_pid293932.log
[       OK ] G1ServiceThread.test_add_while_waiting_vm (1000 ms)
[ RUN      ] G1ServiceThread.test_add_run_once_vm
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
/usr/bin/bash: line 1: 293932 Aborted                 (core dumped) /home/fyang/jdk/build/linux-aarch64-server-slowdebug/images/test/hotspot/gtest/server/gtestLauncher -jdk /home/fyang/jdk/build/linux-aarch64-server-slowdebug/images/jdk --gtest_output=xml:/home/fyang/jdk/build/linux-aarch64-server-slowdebug/test-results/gtest_all_server/gtest.xml --gtest_catch_exceptions=0 > >(/usr/bin/tee /home/fyang/jdk/build/linux-aarch64-server-slowdebug/test-results/gtest_all_server/gtest.txt)
Finished running test 'gtest:all/server'
Test report is stored in build/linux-aarch64-server-slowdebug/test-results/gtest_all_server

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
   gtest:all/server                                      0     0     0     0
==============================
TEST SUCCESS

Finished building target 'test' in configuration 'linux-aarch64-server-slowdebug'
Comments
Changeset: 2c45eca1
Author: Thomas Schatzl <tschatzl@openjdk.org>
Date: 2024-04-12 07:22:06 +0000
URL: https://git.openjdk.org/jdk/commit/2c45eca15943826cb6bfbdf6e6fd88abc196e8f7
12-04-2024

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/18691
Date: 2024-04-09 12:38:55 +0000
09-04-2024

Besides suppressing GCs, tests that modify global state and do not restore it should use TEST_OTHER_VM. But we don't want lots of tests using that, as starting up a new VM for a single gtest case is expensive.
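For a test that touches only a small, known piece of global state, a cheaper option than TEST_OTHER_VM may be to save that state on entry and write it back on exit. The following is a minimal sketch of this save/restore pattern in plain C++, assuming the state can be copied as a simple array. `g_region_tams`, `ScopedTableRestorer` and the test body are hypothetical names, not HotSpot APIs. Restoring on exit does not help if a GC runs while the state is modified, so GCs would still need to be suppressed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a piece of global GC state, e.g. a per-region
// TAMS table. This is NOT the actual HotSpot data structure.
std::vector<std::size_t> g_region_tams(8, 0);

// Copies the table on construction and writes the copy back on destruction,
// so a test that scribbles on the table leaves it as it found it.
class ScopedTableRestorer {
public:
  explicit ScopedTableRestorer(std::vector<std::size_t>& table)
      : _table(table), _saved(table) {}
  ~ScopedTableRestorer() { _table = _saved; }

  ScopedTableRestorer(const ScopedTableRestorer&) = delete;
  ScopedTableRestorer& operator=(const ScopedTableRestorer&) = delete;

private:
  std::vector<std::size_t>& _table;
  std::vector<std::size_t> _saved;
};

// Hypothetical test body: it overwrites global state with dummy values,
// but the guard restores the original contents when the function returns.
void test_body_that_modifies_global_state() {
  ScopedTableRestorer restorer(g_region_tams);
  for (std::size_t i = 0; i < g_region_tams.size(); i++) {
    g_region_tams[i] = i * 0x400000;  // dummy values written by the test
  }
}
```

Because the restore happens in a destructor, it also runs if the test body returns early.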
27-03-2024

We can't easily move the TAMS update in HeapRegion::hr_clear() somewhere else. When doing hr_clear(), we really want to notify concurrent marking that the TAMS should be reset too. Extracting this out would mean carefully analyzing the places where heap regions are newly allocated and adding that notification at the correct place(s). The reason is that concurrent mark does not take a "proper" snapshot of the heap regions at the start of marking. Instead, it walks the global HeapRegion* array directly, which can be modified concurrently (when allocating humongous regions). One alternative would be to also take a snapshot of the "current" HeapRegion array at concurrent start. Another would be to stop G1CMTask from using any HeapRegion* (for _curr_region). Using indexes for the current region should be fine, with the TAMS array storing whether each region is part of the snapshot or not.
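The index-based alternative could look roughly like the following model. Marking keeps a region index instead of a HeapRegion*, and the TAMS table records, per index, whether the region belonged to the snapshot taken at concurrent start. `TamsEntry`, `TamsTable` and `next_region_to_scan` are hypothetical and do not reflect the actual G1 code or the fix that was eventually committed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-region entry, indexed by region index.
struct TamsEntry {
  std::uintptr_t tams;  // top-at-mark-start
  bool in_snapshot;     // was the region part of the snapshot at concurrent start?
};

class TamsTable {
public:
  explicit TamsTable(std::size_t num_regions)
      : _entries(num_regions, TamsEntry{0, false}) {}

  // At concurrent start: record TAMS (the region's current top) for each
  // region in use and mark it as part of the snapshot.
  void record_at_concurrent_start(std::size_t idx, std::uintptr_t top) {
    _entries[idx] = TamsEntry{top, true};
  }

  // Analogue of the hr_clear() notification: a region freed (and possibly
  // reallocated) after concurrent start gets TAMS == bottom and is dropped
  // from the snapshot.
  void clear(std::size_t idx, std::uintptr_t bottom) {
    _entries[idx] = TamsEntry{bottom, false};
  }

  bool should_scan(std::size_t idx) const { return _entries[idx].in_snapshot; }
  std::uintptr_t tams(std::size_t idx) const { return _entries[idx].tams; }

private:
  std::vector<TamsEntry> _entries;
};

// Hypothetical marking-task cursor working on region indexes rather than
// HeapRegion pointers. Returns the next index to scan, or SIZE_MAX if none.
std::size_t next_region_to_scan(const TamsTable& table, std::size_t start,
                                std::size_t num_regions) {
  for (std::size_t idx = start; idx < num_regions; idx++) {
    if (table.should_scan(idx)) {
      return idx;
    }
  }
  return SIZE_MAX;
}
```

In this model, a concurrently reallocated region is never dereferenced through a stale pointer. Its index simply reports "not in snapshot".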
26-03-2024

I think this is just the gtests interfering with global heap data structures. After the changes, the FreeRegionList gtest (test_freeRegionlist.cpp) is no longer self-contained: when initializing its dummy HeapRegions, it modifies the global TAMS array instead of HeapRegion-local TAMS values. Since it does not restore the old state either, this trips up the periodic GC that is, for some reason, started (shouldn't it be disabled by default?). Note that other gtests also modify global state, e.g. the one in test_heapRegion.cpp that marks random bits in the mark bitmap (and does not restore them either), so having any kind of GC during gtests is just not safe. Maybe the tests can be rewritten to inhibit GCs and/or to restore state properly.
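The interference can be illustrated with a toy model. When the TAMS lives in a global table indexed by region number, any object claiming index i writes to the same slot as the real region i. `g_tams`, `ModelRegion` and `tams_is_at_bottom` are hypothetical simplifications, not the real HeapRegion or G1ConcurrentMark code. The last function mirrors the "region limit should be at bottom" assert only loosely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical global, index-addressed TAMS table shared by every region
// object that uses a given index. NOT the actual HotSpot data structure.
std::vector<std::uint64_t> g_tams;

struct ModelRegion {
  std::size_t index;     // region index, used to address the global table
  std::uint64_t bottom;  // start address of the region
  // Clearing a region resets its TAMS slot in the *global* table.
  void clear() const { g_tams[index] = bottom; }
};

// Loose analogue of the failing check: a free region's TAMS should be at bottom.
bool tams_is_at_bottom(const ModelRegion& r) {
  return g_tams[r.index] == r.bottom;
}
```

A test-local dummy region that reuses index 1 with a different bottom and calls clear() leaves the real region 1 failing this check until the slot is restored.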
26-03-2024

TAMSes for the first few regions are weird :)

Heap Regions: E=young(eden), S=young(survivor), O=old, HS=humongous(starts), HC=humongous(continues), CS=collection set, F=free, TAMS=top-at-mark-start, PB=parsable bottom
| 0|0x000000060a800000, ...| 0%| F| |TAMS 0x0000000000000000| PB 0x000000060a800000| Untracked | 0
| 1|0x000000060ac00000, ...| 0%| F| |TAMS 0x0000000000400000| PB 0x000000060ac00000| Untracked | 0
| 2|0x000000060b000000, ...| 0%| F| |TAMS 0x0000000000800000| PB 0x000000060b000000| Untracked | 0
| 3|0x000000060b400000, ...| 0%| F| |TAMS 0x0000000000c00000| PB 0x000000060b400000| Untracked | 0
| 4|0x000000060b800000, ...| 0%| F| |TAMS 0x0000000001000000| PB 0x000000060b800000| Untracked | 0
| 5|0x000000060bc00000, ...| 0%| F| |TAMS 0x000000060bc00000| PB 0x000000060bc00000| Untracked | 0
| 6|0x000000060c000000, ...| 0%| F| |TAMS 0x000000060c000000| PB 0x000000060c000000| Untracked | 0

That is, the TAMS of regions 0-4 is not equal to their bottom. It appears to be missing the heap base offset (0x000000060a800000). Note that the PBs are correct.
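The offset pattern can be confirmed with simple arithmetic. Consecutive region bottoms are 0x400000 (4 MB) apart, so region i's bottom is 0x000000060a800000 + i * 0x400000. The TAMS reported for regions 0-4 equals exactly that bottom minus the heap base. The small check below uses only the values from the region printout; the constant and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Values taken from the region printout.
constexpr std::uint64_t kHeapBase   = 0x000000060a800000ULL;  // bottom of region 0
constexpr std::uint64_t kRegionSize = 0x400000ULL;            // distance between consecutive bottoms (4 MB)
constexpr std::uint64_t kReportedTams[] = {
    0x0000000000000000ULL, 0x0000000000400000ULL, 0x0000000000800000ULL,
    0x0000000000c00000ULL, 0x0000000001000000ULL, 0x000000060bc00000ULL,
    0x000000060c000000ULL};

// Expected bottom address of region i.
constexpr std::uint64_t bottom_of(std::size_t i) { return kHeapBase + i * kRegionSize; }

// True if the reported TAMS is not the region's bottom but equals bottom minus
// the heap base, i.e. it looks like a heap-relative offset, not an address.
constexpr bool missing_heap_base(std::size_t i) {
  return kReportedTams[i] != bottom_of(i) &&
         kReportedTams[i] == bottom_of(i) - kHeapBase;
}
```

For regions 5 and 6, the reported TAMS equals the computed bottom, which matches the printout.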
26-03-2024