JDK-8264782 : applications/jcstress/seqcst.java SIGILL in ObjectSynchronizer::quick_enter
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17
  • Priority: P3
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: linux
  • CPU: x86_64
  • Submitted: 2021-04-06
  • Updated: 2022-05-10
  • Resolved: 2021-05-20
Description
The following test failed in the JDK17 CI:

applications/jcstress/seqcst.java

The crash isn't showing up in the .log file. The output is elided because it is so big, so I'm guessing the crash happened in that region.

Here are snippets from the hs_err_pid file:

#  SIGILL (0x4) at pc=0x00007f154f417c80, pid=8695, tid=8744
#
# JRE version: Java(TM) SE Runtime Environment (17.0+17) (fastdebug build 17-ea+17-LTS-1368)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 17-ea+17-LTS-1368, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x17ecc80]  ObjectSynchronizer::quick_enter(oop, JavaThread*, BasicLock*)+0x1a0

---------------  S U M M A R Y ------------

Command Line: -XX:+UnlockDiagnosticVMOptions -XX:MaxRAMFraction=8 -XX:MinRAMFraction=8 -XX:CICompilerCount=4 -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -XX:G1ConcRefinementThreads=4 -XX:+WhiteBoxAPI -Xbootclasspath/a:/opt/mach5/mesos/work_dir/slaves/e8f948fe-dc79-4c12-82c8-0e7ba4ac7993-S53/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/8d02d52f-8562-4bfc-babe-b3407d0380dd/runs/d61c94d9-e2a8-4d92-a7ff-d7cabb5e2304/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/scratch/0/whitebox18325503424106751384.jar -Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/e8f948fe-dc79-4c12-82c8-0e7ba4ac7993-S53/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/8d02d52f-8562-4bfc-babe-b3407d0380dd/runs/d61c94d9-e2a8-4d92-a7ff-d7cabb5e2304/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/scratch/0 -Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/e8f948fe-dc79-4c12-82c8-0e7ba4ac7993-S53/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/8d02d52f-8562-4bfc-babe-b3407d0380dd/runs/d61c94d9-e2a8-4d92-a7ff-d7cabb5e2304/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/scratch/0 -XX:MaxRAMPercentage=6 -Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/e8f948fe-dc79-4c12-82c8-0e7ba4ac7993-S53/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/8d02d52f-8562-4bfc-babe-b3407d0380dd/runs/d61c94d9-e2a8-4d92-a7ff-d7cabb5e2304/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/tmp -XX:-UseBiasedLocking org.openjdk.jcstress.ForkedMain 127.0.0.1 45917 fork-token-417

Host: <host>, AMD EPYC 7742 64-Core Processor, 8 cores, 30G, Oracle Linux Server release 7.9
Time: Tue Apr  6 06:51:10 2021 UTC elapsed time: 6.893491 seconds (0d 0h 0m 6s)

---------------  T H R E A D  ---------------

Current thread (0x00007f1548552360):  JavaThread "jcstress-worker-3" daemon [_thread_in_Java, id=8744, stack(0x00007f151d9f7000,0x00007f151daf8000)]

Stack: [0x00007f151d9f7000,0x00007f151daf8000],  sp=0x00007f151daf65a0,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x17ecc80]  ObjectSynchronizer::quick_enter(oop, JavaThread*, BasicLock*)+0x1a0
V  [libjvm.so+0x16cc685]  SharedRuntime::monitor_enter_helper(oopDesc*, BasicLock*, JavaThread*)+0x55
V  [libjvm.so+0x16cc9c0]  SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)+0x20

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::_complete_monitor_locking_Java
J 792% c2 org.openjdk.jcstress.tests.seqcst.sync.L1_L2_L1__S2__S1_S2_Test_jcstress.actor2()Lorg/openjdk/jcstress/util/Counter; (113 bytes) @ 0x00007f15393385fc [0x00007f1539337a40+0x0000000000000bbc]

[error occurred during error reporting (printing Java stack), id 0xb, SIGSEGV (0xb) at pc=0x00007f154e805e1e]


siginfo: si_signo: 4 (SIGILL), si_code: 2 (ILL_ILLOPN), si_addr: 0x00007f154f417c80
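
As a sanity check, the faulting pc and the "libjvm.so+0x17ecc80" offset in the problematic frame are consistent with each other. The following is a minimal, illustrative C++ sketch of that arithmetic; the libjvm.so load base is not printed in the snippet above, it is only implied by the two values:

#include <cstdint>
#include <cstdio>

int main() {
  // Values copied from the hs_err snippet above.
  const uintptr_t crash_pc     = 0x00007f154f417c80;  // SIGILL pc / si_addr
  const uintptr_t frame_offset = 0x17ecc80;            // "libjvm.so+0x17ecc80"

  // Implied libjvm.so load base (not printed in the snippet itself).
  const uintptr_t libjvm_base = crash_pc - frame_offset;
  printf("implied libjvm.so base: 0x%lx\n", (unsigned long)libjvm_base);  // 0x7f154dc2b000
  return 0;
}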

Comments
The test machine for the original failure sighting is no longer available in Mach5. A search of Mach5 turns up only a single instance of this failure mode, so I'm closing this bug as "Cannot Reproduce".
20-05-2021

ILW = HLM = P3
13-04-2021

Looks like stack and PC are both messed up (from kjdb):

Threads
Current Thread
Thread 0x0(0x7f5f304a60c0) "[" SP=0x7f5f146f4540
Stack: [0x7f5f146f4540 to 0x7f5f146f4600) (192b)
signal=4

Disassembly:
Instruction operands are shown in Intel order, so expect destination first, then source.
Integer arguments usually passed in left-to-right order in RDI, RSI, RDX, RCX, R8, R9
0x7f5f39a95060: 0x00 0x48 0x8d                               ADD [AL+141], CL
0x7f5f39a95063: 0x15 0x40 0x69 0x2b 0x00                     ADC EAX, 0x2b6940
0x7f5f39a95068: 0xbe 0x80 0x00 0x00 0x00                     MOV ESI, 0x80
0x7f5f39a9506d: 0x48 0x8d 0x3d 0x7c 0x69 0x2b 0x00           (REX.W) LEA RDI, (cannot decode r/m size in 'm')
0x7f5f39a95074: 0x48 0x8b 0x00                               (REX.W) MOV RAX, [RAX]
0x7f5f39a95077: 0xc6 0x00 0x58                               MOV [AL], 0x58
0x7f5f39a9507a: 0x31 0xc0                                    XOR EAX, EAX
0x7f5f39a9507c: 0xe8 0x1f 0xb3 0x23 0xff                     CALL 0x7f6038cd03a0  # vvvvv 0xff23b324
0x7f5f39a95081: 0xe8 0x8a 0x75 0xd5 0xff                     CALL 0x7f60397ec610  # vvvvv 0xffd5758f
0x7f5f39a95086: 0x41 0xc7 0x86 0xd8 0x03 0x00 0x00 0x0b 0x00 0x00 0x00  (REX.B) MOV [R14D+984], 0xb
0x7f5f39a95091: 0xf0 0x83 0x04 0x24 0x00                     (LOCK) ADD (SP, SP, 1), 0x0
0x7f5f39a95096: 0x49 0x8b 0x86 0x28 0x01 0x00 0x00           (REX.WB) MOV R8, [R14+296]
0x7f5f39a9509d: 0xa8 0x01                                    TEST AL, 0x1
0x7f5f39a9509f: 0x74 0x08                                    JE 0x7f5f39a950a9  # vvvvv 0xa

Stack search in: [ jbs - bugdb - google ]
grok 0   SP=0x7f5f146f4540 FP=0x7f5f146f4600 PC=0x7f5f39a95080
         libjvm.so+0x1843080 ObjectSynchronizer::deflate_idle_monitors()+0x600
         FAULT SIGILL si_code: ILL_ILLOPN
grok 1   PC=0x7f5f39970a44 libjvm.so+0x171ea44 SharedRuntime::monitor_enter_helper(oopDesc*, BasicLock*, JavaThread*)+0x54
grok 2   PC=0x7f5f39970ee1 libjvm.so+0x171eee1 SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)+0x41
grok (finished hs_err native frames, showing Java frames...)
grok 3   RuntimeStub::_complete_monitor_locking_Java+0x0
grok 4   org.openjdk.jcstress.tests.seqcst.sync.L1_S2__L1_S1__S1_L2_Test_jcstress.actor3()Lorg/openjdk/jcstress/util/Counter;+0x0 (org/openjdk/jcstress/tests/seqcst/sync/L1_S2__L1_S1__S1_L2_Test_jcstress.java) compileID=598% compileType=c2
grok 5   org.openjdk.jcstress.tests.seqcst.sync.L1_S2__L1_S1__S1_L2_Test_jcstress$$Lambda$45+0x4 (org/openjdk/jcstress/tests/seqcst/sync/L1_S2__L1_S1__S1_L2_Test_jcstress.java)
grok 6   java.util.concurrent.FutureTask.run()V+0x27 (java/util/concurrent/FutureTask.java)
grok 7   java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+0x5c (java/util/concurrent/ThreadPoolExecutor.java)
grok 8   java.util.concurrent.ThreadPoolExecutor$Worker.run()V+0x5 (java/util/concurrent/ThreadPoolExecutor.java)
grok 9   java.lang.Thread.run()V+0xb (java/lang/Thread.java)
grok 10  StubRoutines::call_stub+0x0
11-04-2021

This stack trace doesn't make a lot of sense:

V  [libjvm.so+0x1843080]  ObjectSynchronizer::deflate_idle_monitors()+0x600
V  [libjvm.so+0x171ea44]  SharedRuntime::monitor_enter_helper(oopDesc*, BasicLock*, JavaThread*)+0x54
V  [libjvm.so+0x171eee1]  SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)+0x41

ObjectSynchronizer::deflate_idle_monitors() is only called from MonitorDeflationThread::monitor_deflation_thread_entry() by the dedicated MonitorDeflationThread and from ObjectSynchronizer::do_final_audit_and_print_stats() by the VMThread. This crashing thread is a java.util.concurrent.ThreadPoolExecutor$Worker(). (A minimal sketch of those two call paths follows this comment.)
09-04-2021
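
To make the point in the preceding comment concrete, here is a minimal, self-contained C++ sketch (not the actual HotSpot sources; the bodies are stand-ins) of the only two call paths into ObjectSynchronizer::deflate_idle_monitors() named there, which is why a ThreadPoolExecutor worker thread should never be executing that frame:

#include <cstdio>

// Stand-in for the real ObjectSynchronizer; only the calls relevant here.
struct ObjectSynchronizer {
  static void deflate_idle_monitors() {
    puts("deflating idle monitors");
  }
  // In the real VM this runs on the VMThread during shutdown.
  static void do_final_audit_and_print_stats() {
    deflate_idle_monitors();
  }
};

// Stand-in for the dedicated MonitorDeflationThread: in the real VM its entry
// point loops, waiting between deflation passes; a single pass is shown here.
struct MonitorDeflationThread {
  static void monitor_deflation_thread_entry() {
    ObjectSynchronizer::deflate_idle_monitors();
  }
};

int main() {
  MonitorDeflationThread::monitor_deflation_thread_entry();  // dedicated deflation thread
  ObjectSynchronizer::do_final_audit_and_print_stats();      // VMThread at exit
  return 0;
}

Any other thread (such as the crashing java.util.concurrent.ThreadPoolExecutor$Worker) appearing in deflate_idle_monitors() points to a bogus frame rather than a real call.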

There is a similar crash in a non-CI mach5 run. Different machine:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00007f5f39a95080, pid=11930, tid=11953
#
# JRE version: Java(TM) SE Runtime Environment (16.0.1+9) (fastdebug build 16.0.1+9-24)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16.0.1+9-24, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x1843080]  ObjectSynchronizer::deflate_idle_monitors()+0x600
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to /opt/mach5/mesos/work_dir/slaves/983c483a-6907-44e0-ad29-98c7183575e2-S77136/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/0bd83985-3eee-46e1-9bec-0ae823ebdce3/runs/a3e658aa-e1fc-447e-a085-7af219c7ba8f/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/scratch/0/core.11930)
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

---------------  S U M M A R Y ------------

Command Line: -XX:+UnlockDiagnosticVMOptions -XX:MaxRAMFraction=8 -XX:MinRAMFraction=8 -XX:CICompilerCount=4 -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -XX:G1ConcRefinementThreads=4 -XX:+WhiteBoxAPI -Xbootclasspath/a:/opt/mach5/mesos/work_dir/slaves/983c483a-6907-44e0-ad29-98c7183575e2-S77136/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/0bd83985-3eee-46e1-9bec-0ae823ebdce3/runs/a3e658aa-e1fc-447e-a085-7af219c7ba8f/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/scratch/0/whitebox1298983278210782411.jar -Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/983c483a-6907-44e0-ad29-98c7183575e2-S77136/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/0bd83985-3eee-46e1-9bec-0ae823ebdce3/runs/a3e658aa-e1fc-447e-a085-7af219c7ba8f/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/scratch/0 -Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/983c483a-6907-44e0-ad29-98c7183575e2-S77136/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/0bd83985-3eee-46e1-9bec-0ae823ebdce3/runs/a3e658aa-e1fc-447e-a085-7af219c7ba8f/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/scratch/0 -XX:MaxRAMPercentage=6 -Djava.io.tmpdir=/opt/mach5/mesos/work_dir/slaves/983c483a-6907-44e0-ad29-98c7183575e2-S77136/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/0bd83985-3eee-46e1-9bec-0ae823ebdce3/runs/a3e658aa-e1fc-447e-a085-7af219c7ba8f/testoutput/test-support/jtreg_open_test_hotspot_jtreg_jcstress_part1/tmp org.openjdk.jcstress.ForkedMain 127.0.0.1 44897 fork-token-65

Host: AMD EPYC 7742 64-Core Processor, 8 cores, 30G, Oracle Linux Server release 7.9
Time: Tue Mar 30 07:44:06 2021 UTC elapsed time: 1.996638 seconds (0d 0h 0m 1s)

---------------  T H R E A D  ---------------

Current thread (0x00007f5f304a60c0):
[error occurred during error reporting (printing current thread), id 0xe0000000, Internal Error (/opt/mach5/mesos/work_dir/slaves/e8f948fe-dc79-4c12-82c8-0e7ba4ac7993-S71/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/f52f3c08-f0cb-44e2-bff7-66d96c898011/runs/746754f8-0b76-4eae-a572-bcd4aa72f598/workspace/open/src/hotspot/share/classfile/javaClasses.inline.hpp:59)]

Stack: [0x00007f5f145f5000,0x00007f5f146f6000],  sp=0x00007f5f146f4540,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x1843080]  ObjectSynchronizer::deflate_idle_monitors()+0x600
V  [libjvm.so+0x171ea44]  SharedRuntime::monitor_enter_helper(oopDesc*, BasicLock*, JavaThread*)+0x54
V  [libjvm.so+0x171eee1]  SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)+0x41

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v  ~RuntimeStub::_complete_monitor_locking_Java
J 598% c2 org.openjdk.jcstress.tests.seqcst.sync.L1_S2__L1_S1__S1_L2_Test_jcstress.actor3()Lorg/openjdk/jcstress/util/Counter; (113 bytes) @ 0x00007f5f21314820 [0x00007f5f21313f20+0x0000000000000900]
j  org.openjdk.jcstress.tests.seqcst.sync.L1_S2__L1_S1__S1_L2_Test_jcstress$$Lambda$45+0x000000080100ed78.call()Ljava/lang/Object;+4
j  java.util.concurrent.FutureTask.run()V+39 java.base@16.0.1
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@16.0.1
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@16.0.1
j  java.lang.Thread.run()V+11 java.base@16.0.1
v  ~StubRoutines::call_stub

siginfo: si_signo: 4 (SIGILL), si_code: 2 (ILL_ILLOPN), si_addr: 0x00007f5f39a95080
09-04-2021

Yes, it does not look like a HotSpot issue but more like an OS/kernel/hardware/... bug. I was just wondering if you guys observed similar failures or if I'm missing anything in that synchronization code that could cause/explain this.
08-04-2021

[~thartmann] it doesn't look like a runtime issue either! A corrupt pc? :)
07-04-2021

I've checked the core file and we are crashing because of an invalid pc = 0x00007f154f417c80 that points in-between instructions:

0x7f154f417c70: mov    -0x98(%rbp),%r15
0x7f154f417c77: lea    -0x50(%rbp),%rdi
0x7f154f417c7b: mov    %rdi,-0x98(%rbp)
0x7f154f417c82: xor    $0x2,%r15

Not sure how this happened. We are in ObjectSynchronizer::quick_enter:

Stack: [0x00007f151d9f7000,0x00007f151daf8000],  sp=0x00007f151daf65a0,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x17ecc80]  ObjectSynchronizer::quick_enter(oop, JavaThread*, BasicLock*)+0x1a0
V  [libjvm.so+0x16cc685]  SharedRuntime::monitor_enter_helper(oopDesc*, BasicLock*, JavaThread*)+0x55
V  [libjvm.so+0x16cc9c0]  SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)+0x20

The return address in the caller (SharedRuntime::monitor_enter_helper at libjvm.so+0x16cc685) is valid:

0x7f154f2f7680: callq  0x7f154f417ae0
0x7f154f2f7685: cmpb   $0x0,(%r12)

And the code in ObjectSynchronizer::quick_enter (starting at 0x7f154f417ae0) looks good as well. This does not look like a compiler issue. I'm moving this to hotspot/runtime for further investigation. (A short sketch of the pc arithmetic follows this comment.)
07-04-2021
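
For reference, a minimal C++ sketch of the observations in the comment above, using the addresses from the core file (illustrative only, not HotSpot code): the caller's callq target 0x7f154f417ae0 is quick_enter's entry, the faulting pc is exactly entry + 0x1a0 (matching the "+0x1a0" in the symbolized frame), and that pc falls inside the 7-byte instruction starting at 0x7f154f417c7b rather than on an instruction boundary.

#include <cstdint>

// Addresses taken verbatim from the comment above.
constexpr uint64_t quick_enter_entry = 0x7f154f417ae0;      // callq target in monitor_enter_helper
constexpr uint64_t faulting_pc       = 0x00007f154f417c80;  // SIGILL pc / si_addr
constexpr uint64_t insn_before       = 0x7f154f417c7b;      // mov %rdi,-0x98(%rbp)
constexpr uint64_t insn_after        = 0x7f154f417c82;      // xor $0x2,%r15

// The pc matches the symbolized frame offset quick_enter(...)+0x1a0 ...
static_assert(faulting_pc - quick_enter_entry == 0x1a0, "pc == quick_enter entry + 0x1a0");

// ... but it is not on an instruction boundary: it lands 5 bytes into the
// 7-byte instruction at 0x...c7b, i.e. strictly between two decoded instructions.
static_assert(insn_before < faulting_pc && faulting_pc < insn_after, "pc is mid-instruction");

int main() { return 0; }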