JDK-8227605 : Kitchensink fails "assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 11,12,13
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2019-07-11
  • Updated: 2020-04-27
  • Resolved: 2019-07-31
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 13 JDK 14 Other
11.0.6Fixed 13 b32Fixed 14Fixed openjdk8u262Fixed
Related Reports
Relates :  
Description
The following test crashed in the JDK13 CI:

applications/kitchensink/Kitchensink.java

Here's a snippet from the log file:

[stress.process.out] # A fatal error has been detected by the Java Runtime Environment:
[stress.process.out] #
[stress.process.out] #  Internal Error (open/src/hotspot/share/jfr/recorder/checkpoint/types/traceid/jfrTraceId.inline.hpp:80), pid=32110, tid=548
[stress.process.out] #  assert((((klass)->trace_id() & (JfrTraceIdEpoch::leakp_in_use_this_epoch_bit())) != 0)) failed: invariant
[stress.process.out] #
[stress.process.out] # JRE version: Java(TM) SE Runtime Environment (13.0) (fastdebug build 13-ea+0-1449)
[stress.process.out] # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 13-ea+0-1449, compiled mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
[stress.process.out] # Problematic frame:
[stress.process.out] # V  [libjvm.so+0xe809c0]  JfrStackTrace::record_safe(JavaThread*, int, bool)+0xd30
[stress.process.out] #
[stress.process.out] # Core dump will be written. Default location: /scratch/opt/mach5/mesos/work_dir/slaves/dd0bab7c-a530-4905-bb6b-daa7153eb2c1-S223/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/21fbedd3-1217-44a1-a223-5f70460c420d/runs/773623ba-d13d-4fc5-a9ed-80922b73ffa0/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_kitchensink_Kitchensink_java/scratch/0/core.32110
[stress.process.out] #
[stress.process.out] Unsupported internal testing APIs have been used.
[stress.process.out] 
[stress.process.out] # An error report file with more information is saved as:
[stress.process.out] # /scratch/opt/mach5/mesos/work_dir/slaves/dd0bab7c-a530-4905-bb6b-daa7153eb2c1-S223/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/21fbedd3-1217-44a1-a223-5f70460c420d/runs/773623ba-d13d-4fc5-a9ed-80922b73ffa0/testoutput/test-support/jtreg_closed_test_hotspot_jtreg_applications_kitchensink_Kitchensink_java/scratch/0/hs_err_pid32110.log
[stress.process.out] #
[stress.process.out] # If you would like to submit a bug report, please visit:
[stress.process.out] #   http://bugreport.java.com/bugreport/crash.jsp
[stress.process.out] #
[picker.process.out] [2019-07-11T19:54:55.481267Z] Gathering output for process 662
Comments
Replacing jdk8u-fix-request with link to JDK-8239140
17-02-2020

RFC: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-January/011063.html
30-01-2020

I'll run it through SAP's test queue and will sponsor it.
01-10-2019

Fix Request This patch applies cleanly to 11u.
19-09-2019

Fix Request: Should be fixed in order to not lose tag information in JFR and also in order to have Kitchensink not trap on this assertion during testing. Risk is low (restore previous behaviour). Testing: jdk_jfr tests, Kitchensink (short) Reviewed by: dholmes, dcubed, egahlin
31-07-2019

Fix request approved.
31-07-2019

URL: https://hg.openjdk.java.net/jdk/jdk13/rev/5f3b05a36d61 User: mgronlun Date: 2019-07-31 14:15:18 +0000
31-07-2019

Markus and I discussed this and we believe the missing CAS is the reason the assert fails. Reassigning to Markus for now as he like to take wider view at the mechanism.
29-07-2019

When threads come out of a safepoint, caused by a chunk rotation, and previous epoch bit is to be cleared, it could read 10 and that we try to set it to 00 (using a mask). At the same time, an allocation thread is tagging the class in the current epoch. That is, trying to set 01. When the allocation thread sees 00, it will trigger the assert. as it expects 01. I have created a reproducer that will trigger the assertion within 5 seconds by calling into native hammering on a traceid by using the same macros as used by the sampler and artifacts clearer. If I change so that the clearer uses CAS, so it it always succeeds in writing 00, and change the sampler thread so it sets both epoch bits, the problem disappear in the reproducer. Another way to fix it is to capture the stack trace under sampler lock, but it will increase contention.
23-07-2019

I think I know what goes wrong, will try to implement a fix.
23-07-2019

ILW = HLH = P2
11-07-2019

Attached the hs_err_pid as test-support_jtreg_closed_test_hotspot_jtreg_applications_kitchensink_Kitchensink_java_applications_kitchensink_Kitchensink_hs_err_pid32110.log Here's the crashing thread's stack trace: --------------- T H R E A D --------------- Current thread (0x00007f987451b800): JavaThread "Thread-19" [_thread_in_vm, id=548, stack(0x00007f982c5d2000,0x00007f982c6d3000)] Stack: [0x00007f982c5d2000,0x00007f982c6d3000], sp=0x00007f982c6cfcc0, free space=1015k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xe809c0] JfrStackTrace::record_safe(JavaThread*, int, bool)+0xd30 V [libjvm.so+0x13b5686] ObjectSampler::sample(HeapWordImpl**, unsigned long, JavaThread*)+0x2a6 V [libjvm.so+0x5c7b96] AllocTracer::send_allocation_in_new_tlab(Klass*, HeapWordImpl**, unsigned long, unsigned long, Thread*)+0x36 V [libjvm.so+0x12a91e7] MemAllocator::Allocation::notify_allocation_jfr_sampler()+0x77 V [libjvm.so+0x12aabd9] MemAllocator::allocate() const+0x1a9 V [libjvm.so+0xdb99cd] InstanceKlass::allocate_instance(Thread*)+0x9d V [libjvm.so+0x15453a4] OptoRuntime::new_instance_C(Klass*, JavaThread*)+0x6a4 Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) v ~RuntimeStub::_new_instance_Java J 18195 c2 spec.jbb.Warehouse.retrieveStock(I)Lspec/jbb/Stock; (17 bytes) @ 0x00007f9919239160 [0x00007f9919239040+0x0000000000000120] J 20071 c2 spec.jbb.DeliveryTransaction.preprocess()Z (482 bytes) @ 0x00007f99193d5038 [0x00007f99193d44c0+0x0000000000000b78] J 20214 c2 spec.jbb.DeliveryTransaction.process()Z (84 bytes) @ 0x00007f9919430040 [0x00007f991942fe40+0x0000000000000200] J 20088 c2 spec.jbb.TransactionManager.goManual(ILspec/jbb/TimerData;)J (146 bytes) @ 0x00007f99193e07d4 [0x00007f99193e03e0+0x00000000000003f4] J 20405% c2 spec.jbb.TransactionManager.go()V (694 bytes) @ 0x00007f9919499edc [0x00007f9919499ba0+0x000000000000033c] J 19701 c1 spec.jbb.JBBmain.run()V (114 bytes) @ 0x00007f9912e605f4 [0x00007f9912e5e580+0x0000000000002074] J 19068 c1 java.lang.Thread.run()V java.base@13-ea (17 bytes) @ 0x00007f9912c45304 [0x00007f9912c451e0+0x0000000000000124] v ~StubRoutines::call_stub
11-07-2019