JDK-8241804 : Dacapo24H.java SIGSEGV in G1ParScanThreadState::copy_to_survivor_space
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 15
  • Priority: P2
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: linux
  • CPU: x86_64
  • Submitted: 2020-03-29
  • Updated: 2020-07-22
  • Resolved: 2020-07-22
JDK 15
15: Resolved
Description
The following test failed in the JDK15 CI:

applications/dacapo/Dacapo24H.java

Here's a snippet from the log file:

Stress process main method is started.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f02c546c425, pid=3725, tid=3806
#
# JRE version: Java(TM) SE Runtime Environment (15.0+16) (build 15-ea+16-676)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (15-ea+16-676, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x664425]  G1ParScanThreadState::copy_to_survivor_space(G1HeapRegionAttr, oopDesc*, markWord)+0x1b5
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /opt/mach5/mesos/work_dir/slaves/b0d836b1-c68c-4dbd-8b78-5085890ddd4c-S1433/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/045c0e6f-7879-4e7f-9a90-441eb0221a09/runs/456ad909-e950-4cf8-ad8b-713a2fe1965e/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_dacapo_Dacapo24H_java/scratch/0/core.3725)
#

Here's the crashing thread's stack:

---------------  T H R E A D  ---------------

Current thread (0x00007f0248004800):  GCTaskThread "GC Thread#3" [stack: 0x00007f02508e7000,0x00007f02509e7000] [id=3806]

Stack: [0x00007f02508e7000,0x00007f02509e7000],  sp=0x00007f02509e59f0,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x664425]  G1ParScanThreadState::copy_to_survivor_space(G1HeapRegionAttr, oopDesc*, markWord)+0x1b5
V  [libjvm.so+0x661b2b]  G1ParScanThreadState::trim_queue_partially()+0x44b
V  [libjvm.so+0x678357]  G1ScanHRForRegionClosure::scan_heap_roots(HeapRegion*)+0x517
V  [libjvm.so+0x672772]  G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases)+0x282
V  [libjvm.so+0x621c93]  G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x43
V  [libjvm.so+0x6228f8]  G1EvacuateRegionsBaseTask::work(unsigned int)+0x98
V  [libjvm.so+0xd1557d]  GangWorker::loop()+0x4d
V  [libjvm.so+0xc8138d]  Thread::call_run()+0x10d
V  [libjvm.so+0xad97d7]  thread_native_entry(Thread*)+0xe7


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000000
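
The numbers in the thread section above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the helper names `stack_free_kib` and `fault_address` are made up for illustration, and the sketch assumes the stack grows downward, so that free space is `sp` minus the lower stack bound. Under that assumption the result agrees with the logged "free space=1018k", and together with si_addr being 0 this points toward a null dereference rather than a stack overflow.

```python
import re

# Lines copied from the log excerpt above.
STACK_LINE = ("Stack: [0x00007f02508e7000,0x00007f02509e7000],  "
              "sp=0x00007f02509e59f0,  free space=1018k")
SIGINFO_LINE = ("siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), "
                "si_addr: 0x0000000000000000")


def stack_free_kib(stack_line):
    """Free stack space in KiB, assuming a downward-growing stack."""
    match = re.search(
        r"\[(0x[0-9a-fA-F]+),(0x[0-9a-fA-F]+)\],\s+sp=(0x[0-9a-fA-F]+)",
        stack_line,
    )
    if match is None:
        raise ValueError("not an hs_err 'Stack:' line")
    low, high, sp = (int(group, 16) for group in match.groups())
    if not low <= sp <= high:
        raise ValueError("sp lies outside the reported stack bounds")
    # Everything between the lower bound and sp is still unused.
    return (sp - low) // 1024


def fault_address(siginfo_line):
    """Faulting address (si_addr) from an hs_err 'siginfo:' line."""
    match = re.search(r"si_addr:\s*(0x[0-9a-fA-F]+)", siginfo_line)
    if match is None:
        raise ValueError("no si_addr field found")
    return int(match.group(1), 16)


print(stack_free_kib(STACK_LINE))        # 1018
print(hex(fault_address(SIGINFO_LINE)))  # 0x0
```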

Comments
This has not reproduced for months now. From the available stack trace it looks like JDK-8249192, but we can't verify that because the artifacts are missing. Reopen if it reproduces.
22-07-2020

I think my suggestion is a decent start: look a few stack frames down and see if the code is scanning roots from some specific sub-system of the JVM. That usually brings you much nearer to the bug than the GC copying/marking loop. The inner copying/marking loops usually don't change, so it's much more likely to be something else.
29-06-2020

[~dcubed] [~dholmes] The top frame is just a GC function that will crash if an oop is broken. It's very unlikely that this is the source of the bug, and by grouping these crashes together, you run the risk of conflating the issues. It's usually better to look at the frames below and see what the GC tries to scan. Often you find some runtime code that is broken and needs attention:

Original bug (looks like a GC issue):
V  [libjvm.so+0x661b2b]  G1ParScanThreadState::trim_queue_partially()+0x44b
V  [libjvm.so+0x678357]  G1ScanHRForRegionClosure::scan_heap_roots(HeapRegion*)+0x517
V  [libjvm.so+0x672772]  G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int,

Broken oop in CLDs (Runtime issue?):
V  [libjvm.so+0xb9842f]  void G1ParCopyClosure<(G1Barrier)1, (G1Mark)0>::do_oop_work<oop>(oop*)+0xef
V  [libjvm.so+0x894863]  ClassLoaderData::ChunkedHandleList::oops_do(OopClosure*)+0x93
V  [libjvm.so+0xb5f6a3]  G1CLDScanClosure::do_cld(ClassLoaderData*)+0x33
V  [libjvm.so+0x89e6fa]  ClassLoaderDataGraph::roots_cld_do(CLDClosure*, CLDClosure*)+0x4a

Broken Handle (Runtime issue?):
V  [libjvm.so+0x6a2ccc]  G1ParCopyClosure<(G1Barrier)0, (G1Mark)1>::do_oop(oopDesc**)+0x15c
V  [libjvm.so+0x6fda3d]  HandleArea::oops_do(OopClosure*)+0x5d
V  [libjvm.so+0xceb348]  JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x58
V  [libjvm.so+0xcf19e7]  Threads::possibly_parallel_oops_do(bool, OopClosure*, CodeBlobClosure*)+0x187
V  [libjvm.so+0x6a47f3]  G1RootProcessor::process_java_roots(G1RootClosures*, G1GCPhaseTimes*, unsigned int)+0x73

I think these kinds of issues need to be split up into separate bugs, so that they are all investigated separately.
29-06-2020
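
The triage idea in the comment above (skip the inner GC copy frames and look at what was being scanned) could be sketched roughly as follows. This is only an illustration, not an existing HotSpot tool: the function name `first_non_gc_frame` and the `GC_INNER_PREFIXES` list are assumptions chosen for this example, and a real list of "inner GC" symbols would need to be tuned by someone who knows the collector.

```python
import re

# Matches one hs_err "Native frames" line, for example:
#   V  [libjvm.so+0x664425]  Foo::bar(int)+0x1b5
# and captures the symbol without the trailing "+0x..." offset.
FRAME_RE = re.compile(
    r"^[A-Za-z]{1,2}\s+\[[^\]]+\]\s+(?P<sym>.+?)(?:\+0x[0-9a-fA-F]+)?$"
)

# Assumption: symbols with these prefixes belong to the generic G1
# copy/scan loop and say little about where the bad oop came from.
GC_INNER_PREFIXES = (
    "G1ParScanThreadState::",
    "G1ParCopyClosure<",
    "void G1ParCopyClosure<",
)


def first_non_gc_frame(stack_text):
    """Return the first frame symbol that is not inner GC machinery."""
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line (header, blank line, ...)
        symbol = match.group("sym")
        if not symbol.startswith(GC_INNER_PREFIXES):
            return symbol
    return None


# Top frames of the CLD sighting reported in this bug.
CLD_TRACE = """\
V  [libjvm.so+0xb64a34]  G1ParScanThreadState::copy_to_survivor_space(G1HeapRegionAttr, oop, markWord)+0x314
V  [libjvm.so+0xb9842f]  void G1ParCopyClosure<(G1Barrier)1, (G1Mark)0>::do_oop_work<oop>(oop*)+0xef
V  [libjvm.so+0x894863]  ClassLoaderData::ChunkedHandleList::oops_do(OopClosure*)+0x93
V  [libjvm.so+0xb5f6a3]  G1CLDScanClosure::do_cld(ClassLoaderData*)+0x33
"""

# Prints the ChunkedHandleList frame, i.e. the CLD handle scan.
print(first_non_gc_frame(CLD_TRACE))
```

Grouping sightings by this first non-GC frame, instead of by the shared top frame, would keep the remembered-set, CLD, and Handle crashes in separate buckets.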

Filed and closed: JDK-8248446 (appears to have been a machine issue)
28-06-2020

I can file a new bug too. It is always difficult to characterise these kinds of GC-detected failures. If the GC finds a bad oop there's nothing to go on to tell you where the bad oop came from, only where it was found. And you can't tell if the problem is with the regular code using the oop, or with the synchronization/coordination between that code and the GC code. You would think that by now we would have a relatively simple step-by-step procedure to try and diagnose these kinds of crashes. :(
28-06-2020

[~stefank] - I suspect that you meant to include [~dholmes] in the comment above rather than me twice since the CLD failure sighting was added by David. I'll move my sighting to a new bug.

I've moved my sighting to:

JDK-8248442 Kitchensink24HStress.java SIGSEGV in G1ParScanThreadState::copy_to_survivor_space due to bad Handle

and deleted the two entries from this bug.
28-06-2020

Similar failure mode in recent CI testing:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f6d99622a34, pid=3789, tid=3792
#
# JRE version: Java(TM) SE Runtime Environment (15.0+27) (fastdebug build 15-ea+27-1314)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 15-ea+27-1314, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xb64a34]  G1ParScanThreadState::copy_to_survivor_space(G1HeapRegionAttr, oop, markWord)+0x314
#

---------------  T H R E A D  ---------------

Current thread (0x00007f6d94071500):  GCTaskThread "GC Thread#0" [stack: 0x00007f6d69d1a000,0x00007f6d69e1a000] [id=3792]

Stack: [0x00007f6d69d1a000,0x00007f6d69e1a000],  sp=0x00007f6d69e188e0,  free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xb64a34]  G1ParScanThreadState::copy_to_survivor_space(G1HeapRegionAttr, oop, markWord)+0x314
V  [libjvm.so+0xb9842f]  void G1ParCopyClosure<(G1Barrier)1, (G1Mark)0>::do_oop_work<oop>(oop*)+0xef
V  [libjvm.so+0x894863]  ClassLoaderData::ChunkedHandleList::oops_do(OopClosure*)+0x93
V  [libjvm.so+0xb5f6a3]  G1CLDScanClosure::do_cld(ClassLoaderData*)+0x33
V  [libjvm.so+0x89e6fa]  ClassLoaderDataGraph::roots_cld_do(CLDClosure*, CLDClosure*)+0x4a
V  [libjvm.so+0xb9a85e]  G1RootProcessor::process_java_roots(G1RootClosures*, G1GCPhaseTimes*, unsigned int)+0xce
V  [libjvm.so+0xb9ae84]  G1RootProcessor::evacuate_roots(G1ParScanThreadState*, unsigned int)+0x64
V  [libjvm.so+0xae3cf2]  G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x22
V  [libjvm.so+0xae489a]  G1EvacuateRegionsBaseTask::work(unsigned int)+0x8a
V  [libjvm.so+0x1800a14]  GangWorker::run_task(WorkData)+0x84
V  [libjvm.so+0x1800b3e]  GangWorker::loop()+0x2e
V  [libjvm.so+0x16c3240]  Thread::call_run()+0x100
V  [libjvm.so+0x13c6346]  thread_native_entry(Thread*)+0x116
08-06-2020

Similar failure mode in JDK-8246219, though this could just be the result of finding a bad oop.
08-06-2020