JDK-8327647 : Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 22
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: aarch64
  • Submitted: 2024-03-08
  • Updated: 2024-06-21
  • Resolved: 2024-04-30
JDK 23
  • Fixed: 23 b21
Description
#                                                                               
# A fatal error has been detected by the Java Runtime Environment:              
#                                                                               
#  SIGSEGV (0xb) at pc=0x0000ffff49ca8078, pid=1675873, tid=1675969             
#                                                                               
# JRE version: OpenJDK Runtime Environment (22.0+36) (build 22+36-2370)         
# Java VM: OpenJDK 64-Bit Server VM (22+36-2370, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:                                                            
# V  [libjvm.so+0xae8078]  markWord::displaced_mark_helper() const+0x18         
#                                                                               
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/sunguoyun/vm-infra/SPECjvm2008/core.1675873)
#                                                                               
# If you would like to submit a bug report, please visit:                       
#   https://bugreport.java.com/bugreport/crash.jsp                              
#                                                                               
                                                                                
---------------  S U M M A R Y ------------                                     
                                                                                
Command Line: SPECjvm2008.jar -ikv -ict -coe -ops 1 -crf 0 sunflow              
                                                                                
Host: AArch64, 64 cores, 254G, Kylin V10                                        
Time: Wed Mar  6 15:59:12 2024 CST elapsed time: 0.639457 seconds (0d 0h 0m 0s)
Comments
A regression test was not integrated as part of the fix. Presumably it is hard to create a reliable jtreg regression test for this, so I am applying the noreg-hard label.
21-06-2024

Changeset: 9ce21d13 Author: Matias Saavedra Silva <matsaave@openjdk.org> Date: 2024-04-30 16:02:55 +0000 URL: https://git.openjdk.org/jdk/commit/9ce21d1382a4f5ad601a7ee610bab64a9c575302
30-04-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/18477 Date: 2024-03-25 19:41:02 +0000
11-04-2024

[~dcubed] Yeah, it's a typo. I meant JDK-8301996: Move field resolution information out of the cpCache.
20-03-2024

[~fyang] In this comment: https://bugs.openjdk.org/browse/JDK-8327647?focusedId=14658524&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14658524 you wrote "after changes for JDK-8327647". However, JDK-8327647 is this bug, so I suspect you meant another bug.
19-03-2024

There are a couple of other places that use load_field_entry (in JVMTI) that might not have the LoadLoad membar. I think the membar should be added inside the load_field_entry() function itself. It's redundant in the case where we resolve the field entry, but performance isn't critical.
19-03-2024
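A minimal C++ sketch of the placement question raised here, assuming relaxed std::atomic loads plus std::atomic_thread_fence(std::memory_order_acquire) as a stand-in for the interpreter's LoadLoad membar. Entry, publish, load_entry and load_entry_fence_too_late are hypothetical names, not HotSpot code; the real interpreter emits these loads as generated assembly, and its "resolved" marker and entry layout differ.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical entry: both fields are atomics so that even the misordered
// reader below is well-defined C++ (it can merely return stale data).
struct Entry {
  std::atomic<int> resolved{0};  // marker, published last
  std::atomic<int> offset{0};    // payload, published first
};

// Writer: store the payload first, then the marker with release semantics.
void publish(Entry& e, int off) {
  e.offset.store(off, std::memory_order_relaxed);
  e.resolved.store(1, std::memory_order_release);
}

// Misplaced barrier: the fence comes after the payload load, so it is too
// late to keep the offset load from being satisfied before the marker load.
bool load_entry_fence_too_late(const Entry& e, int* off) {
  int r = e.resolved.load(std::memory_order_relaxed);
  *off = e.offset.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);  // too late for *off
  return r != 0;
}

// Barrier inside the shared accessor, between the marker load and the
// payload load, so every caller gets the ordering without adding its own.
bool load_entry(const Entry& e, int* off) {
  int r = e.resolved.load(std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_acquire);  // LoadLoad (and LoadStore)
  *off = e.offset.load(std::memory_order_relaxed);
  return r != 0;
}
```

Run single-threaded, both readers return the published offset. The difference only shows up when another thread publishes concurrently: if load_entry observes the marker, it is guaranteed to see the offset published with it, whereas load_entry_fence_too_late may report "resolved" together with a stale offset.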

It does trigger on my Kunpeng aarch64 server, with a probability of 1/4000. JDK-8248219 was resolved about four years ago; here is what I recall about that issue. As Andrew mentioned in the earlier discussion [1], all the ConstantPoolCacheEntry::set_XXX methods in oops/cpCache.cpp use Atomic::release_store, so normally every use site of the CpCache should be paired with a load that has acquire semantics. That is why, when resolving JDK-8248219, we placed a LoadLoad membar before those use sites to prevent reordering of subsequent CpCache loads. But there seems to be no such guarantee after the changes for JDK-8327647. The new code uses TemplateTable::load_resolved_field_entry() and InterpreterMacroAssembler::load_field_entry() to load from the CpCache, and these emit regular loads without such memory semantics. Moreover, the original LoadLoad membar added by JDK-8248219 now comes after TemplateTable::load_resolved_field_entry() and load_field_entry(), so it doesn't seem to work either. That's just my understanding of the code, and I'm not sure I recall it correctly. Hope that helps :-) [1] https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2020-June/040322.html
19-03-2024
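To illustrate the release/acquire pairing described above, here is a minimal C++ sketch, assuming std::atomic release and acquire operations as stand-ins for HotSpot's Atomic::release_store on the writer side and an acquire load on the reader side. FieldEntry, resolve and read_offset are hypothetical names; the real field-entry layout and the interpreter's generated loads differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a resolved field entry in the CpCache.
// Names and layout are illustrative only, not HotSpot's.
struct FieldEntry {
  int field_offset = 0;                   // plain payload written by the resolver
  int flags = 0;                          // plain payload written by the resolver
  std::atomic<std::uint8_t> put_code{0};  // 0 means "not yet resolved"
};

// Writer side: fill in the payload first, then publish it with a release
// store (the role Atomic::release_store plays in the set_XXX methods).
void resolve(FieldEntry& e, int offset, int flags, std::uint8_t code) {
  e.field_offset = offset;
  e.flags = flags;
  e.put_code.store(code, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store above, so once
// a non-zero code is observed, the payload load that follows cannot return
// a value from before the entry was populated.
// Returns -1 while the entry is still unresolved (the slow path in a VM).
int read_offset(const FieldEntry& e) {
  if (e.put_code.load(std::memory_order_acquire) == 0) {
    return -1;
  }
  return e.field_offset;
}
```

The key point is that read_offset touches field_offset only after an acquire load has observed a non-zero put_code. If that load were relaxed instead, the payload read would no longer be ordered after it, which corresponds to the regular, unordered loads this comment attributes to load_resolved_field_entry().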

[~sguoyun][~fyang] Could you explain in detail what the load ordering issue is? From the previous comments, it seems that the following bytecode could be loaded before the cache entry is loaded, so the entry being read could be invalid. For getfield and putfield, resolve_cache_and_index_for_field has a load-acquire that matches the store-release performed when the entry is populated during resolution, before the next bytecode is dispatched. In fast_storefield, the membar is located after load_resolved_field_entry(), which may be reading stale data. Would it be best to place a membar inside load_field_entry() instead? That way, threads would always read the resolved entry. fast_xaccess and jvmti_post_field_access don't use the membar the same way either. Also, how did you find out this was the issue?
15-03-2024

If this was introduced by JDK-8301996, jdk11u is not affected.
13-03-2024

Perhaps @Fei Yang knows how to reproduce this problem, because he solved a load reordering issue: https://bugs.openjdk.org/browse/JDK-8248219. The host on which I reproduced it: AArch64, 64 cores, 254G, Kylin V10
13-03-2024

I can't reproduce this failure on aarch64 (Kunpeng 920) after running it 10k times. java -version:
openjdk version "11.0.22.19" 2024-01-16
OpenJDK Runtime Environment (Alibaba Dragonwell Extended Edition)-11.0.22.19+7-ga (build 11.0.22.19+7)
OpenJDK 64-Bit Server VM (Alibaba Dragonwell Extended Edition)-11.0.22.19+7-ga (build 11.0.22.19+7, mixed mode)
13-03-2024

Since this is a LoadLoad barrier that's in the wrong place, and all architectures reorder loads...
12-03-2024

RISC-V should have the same issue.
12-03-2024

The suggested patch is in the template interpreter, which suggests the issue is independent of the GC algorithm, so I'm removing the tag. I'm also moving this back to the runtime team, since the interpreter is their area of expertise. If the fix is correct, probably more than just aarch64 is affected by this issue.
12-03-2024

With https://github.com/sunny868/jdk/commit/ca40f11558bfdf36a9b5a91a35eb24e17ebe78c2 applied, sunflow was run 2000 times without reproducing the `SIGSEGV` issue.
12-03-2024
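A back-of-the-envelope check on how much 2000 clean runs say, assuming independent runs and the roughly 1/1000 per-run crash rate reported on aarch64 when the issue was filed (the 1/4000 rate seen on a Kunpeng server is shown for comparison). The quantity is the chance of seeing no crash in n runs if the bug were still present at per-run rate p:

```latex
\begin{aligned}
P(\text{no crash in } n \text{ runs}) &= (1-p)^{n} \approx e^{-np} \\
p = \tfrac{1}{1000},\; n = 2000:&\quad (0.999)^{2000} \approx e^{-2} \approx 0.135 \\
p = \tfrac{1}{4000},\; n = 2000:&\quad (0.99975)^{2000} \approx e^{-0.5} \approx 0.61
\end{aligned}
```

Under these assumptions, 2000 clean runs would still occur about 13.5% of the time at the 1/1000 rate, and about 61% of the time at the 1/4000 rate.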

Here's the crashing thread's stack trace:
---------------  T H R E A D  ---------------
Current thread (0x0000fffed80340c0):  WorkerThread "GC Thread#40" [id=1675969, stack(0x0000fffd86f90000,0x0000fffd87190000) (2048K)]
Stack: [0x0000fffd86f90000,0x0000fffd87190000],  sp=0x0000fffd8718e630,  free space=2041k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xae8078]  markWord::displaced_mark_helper() const+0x18
V  [libjvm.so+0x74b70c]  G1ParEvacuateFollowersClosure::do_void()+0x4c
V  [libjvm.so+0x74bc44]  G1EvacuateRegionsTask::evacuate_live_objects(G1ParScanThreadState*, unsigned int)+0x74
V  [libjvm.so+0x74995c]  G1EvacuateRegionsBaseTask::work(unsigned int)+0x9c
V  [libjvm.so+0xdae1e8]  WorkerThread::run()+0x98
V  [libjvm.so+0xd09ff8]  Thread::call_run()+0xa8
V  [libjvm.so+0xb93268]  thread_native_entry(Thread*)+0xd8
C  [libpthread.so.0+0x88cc]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000003f0e38e4
I'm moving this bug from hotspot/runtime -> hotspot/gc for initial triage.
09-03-2024

This issue was first discovered on LoongArch64 and can also be reproduced on aarch64 with a probability of 1/1000. Possibly related to JDK-8301996. Test JDK: https://download.java.net/java/GA/jdk22/830ec9fcccef480bb3e73fb7ecafe059/36/GPL/openjdk-22_linux-aarch64_bin.tar.gz
08-03-2024