JDK-8210557 : G1 next bitmap verification at the end of concurrent mark sometimes fails
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 11.0.14-oracle,12
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2018-09-10
  • Updated: 2022-08-26
  • Resolved: 2018-09-21
JDK 12
12 b13 Fixed
Description
Test TestEagerReclaimHumongousRegionsClearMarkBits might fail with 
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=t:/workspace/open/src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp:385
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (t:/workspace/open/src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp:385), pid=69516, tid=60924
#  assert(!G1VerifyBitmaps || _cm->next_mark_bitmap_is_clear()) failed: Next mark bitmap must be clear
#
# JRE version: Java(TM) SE Runtime Environment (12.0+10) (fastdebug build 12-ea+10)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 12-ea+10, mixed mode, tiered, compressed oops, g1 gc, windows-amd64)
# Core dump will be written. Default location: T:\\testoutput\\jtreg\\JTwork\\scratch\\0\\hs_err_pid69516.mdmp
#
# An error report file with more information is saved as:
# T:\\testoutput\\jtreg\\JTwork\\scratch\\0\\hs_err_pid69516.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
[32.733s][info][gc] GC(124) Pause Full (G1 Evacuation Pause) 128M->28M(128M) 233.705ms
[32.908s][info][gc] GC(126) Pause Young (Normal) (G1 Evacuation Pause) 30M->30M(128M) 106.408ms
[33.017s][info][gc] GC(127) Pause Young (Normal) (G1 Evacuation Pause) 31M->31M(128M) 105.130ms
[34.704s][info][gc] GC(128) Pause Young (Normal) (G1 Evacuation Pause) 32M->32M(128M) 1679.785ms

The failure is very intermittent: it occurred in less than 0.5% of runs, but I have seen it twice during JDK 12 b10 testing.
Comments
URL: http://hg.openjdk.java.net/jdk/jdk/rev/b177af763b82 User: tschatzl Date: 2018-09-21 13:11:40 +0000
21-09-2018

Changed priority to reflect that this is a bad assert that is annoying in testing.
20-09-2018

This seems to be an issue introduced with parallel full gc, but it may be older: parallel full gc uses the next mark bitmap for its own marking. So if the concurrent mark thread is aborted, it might not get scheduled between the initial clear of the bitmap before setting has_aborted and the later mark phase. Roughly, what happens is:

0) Concurrent start young gc.
1) The concurrent mark thread is started (e.g. runs until just before printing "Concurrent Cycle").
2) The concurrent mark thread is descheduled.
3) A full GC starts and clears the next bitmap during G1ConcurrentMark::concurrent_cycle_abort(); it sets the _has_aborted flag in ConcurrentMark.
4) The full GC starts marking (creating liveness information in the next bitmap).
...
X) The concurrent mark thread verifies the next mark bitmap.
X+1) The full GC clears the next mark bitmap.

If X) happened before 4) or after X+1), everything would be fine. Particularly on a highly loaded machine it is somewhat likely that the verification falls into this window. I think the bitmap verification and the assert are placed wrongly, or at least the timing should somehow be changed. Since asserts are not active in product builds, this problem has no impact there.
20-09-2018

Unlike in JDK-8054808, the wrongly marked objects are not necessarily humongous objects: running more tests with -XX:+G1VerifyBitmaps yields a few more tests that assert at the same location; when also printing the addresses of these marks, they are not necessarily region aligned.
19-09-2018

In both instances the concurrent cycle has been interrupted by a full gc right after a Concurrent start pause that has had an evacuation failure. E.g.:

[31.382s][info][gc] GC(121) Pause Young (Normal) (G1 Evacuation Pause) 124M->125M(128M) 573.374ms
[31.391s][info][gc] GC(46) Pause Cleanup 126M->126M(128M) 2.287ms
[31.395s][info][gc] GC(46) Concurrent Cycle 25597.589ms
[31.957s][info][gc] GC(122) To-space exhausted
[31.957s][info][gc] GC(122) Pause Young (Normal) (G1 Evacuation Pause) 126M->127M(128M) 558.479ms
[32.491s][info][gc] GC(123) To-space exhausted
[32.491s][info][gc] GC(123) Pause Young (Concurrent Start) (G1 Evacuation Pause) 128M->128M(128M) 532.913ms
[32.502s][info][gc] GC(125) Concurrent Cycle
(... assertion failure text...)
[32.733s][info][gc] GC(124) Pause Full (G1 Evacuation Pause) 128M->28M(128M) 233.705ms
[32.908s][info][gc] GC(126) Pause Young (Normal) (G1 Evacuation Pause) 30M->30M(128M) 106.408ms
[33.017s][info][gc] GC(127) Pause Young (Normal) (G1 Evacuation Pause) 31M->31M(128M) 105.130ms
[34.704s][info][gc] GC(128) Pause Young (Normal) (G1 Evacuation Pause) 32M->32M(128M) 1679.785ms

Second failure log snippet:

[47.294s][info][gc] GC(120) Pause Young (Normal) (G1 Evacuation Pause) 125M->125M(128M) 832.514ms
[47.300s][info][gc] GC(49) Pause Cleanup 125M->125M(128M) 0.985ms
[47.301s][info][gc] GC(49) Concurrent Cycle 38359.958ms
[48.026s][info][gc] GC(121) Pause Young (Normal) (G1 Evacuation Pause) 126M->126M(128M) 723.725ms
[48.802s][info][gc] GC(122) To-space exhausted
[48.802s][info][gc] GC(122) Pause Young (Concurrent Start) (G1 Evacuation Pause) 127M->128M(128M) 755.778ms
[48.812s][info][gc] GC(124) Concurrent Cycle
(...assertion failure text...)
[48.999s][info][gc] GC(123) Pause Full (G1 Evacuation Pause) 128M->27M(128M) 190.863ms
[49.133s][info][gc] GC(125) Pause Young (Normal) (G1 Evacuation Pause) 29M->29M(128M) 129.298ms
18-09-2018