JDK-8199882 : compiler/uncommontrap/TestDeoptOOM.java failed w/ fatal error: ExceptionMark constructor expects no pending exceptions
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-03-20
  • Updated: 2018-06-27
  • Resolved: 2018-06-06
JDK 11: 11 b17 (Fixed)
Description
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/exceptions.cpp:484
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/mach5/mesos/work_dir/cf88bcc3-69c5-4720-92b9-df6e59171b59/workspace/open/src/hotspot/share/utilities/exceptions.cpp:484), pid=43032, tid=11
#  fatal error: ExceptionMark constructor expects no pending exceptions
#
# JRE version: Java(TM) SE Runtime Environment (11.0) (fastdebug build 11-internal+0-2018-03-20-0603290.vm-sqe-notificationswwgrp.pit180315)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 11-internal+0-2018-03-20-0603290.vm-sqe-notificationswwgrp.pit180315, compiled mode, tiered, compressed oops, g1 gc, solaris-sparc)
# Core dump will be written. Default location: /scratch/opt/mach5/mesos/work_dir/87b54363-625e-48c3-974e-406a876fcc2d/testoutput/jtreg/JTwork/scratch/0/core or core.43032
#
# An error report file with more information is saved as:
# /scratch/opt/mach5/mesos/work_dir/87b54363-625e-48c3-974e-406a876fcc2d/testoutput/jtreg/JTwork/scratch/0/hs_err_pid43032.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Comments
Proceeding with option 3. Testing is problematic. Need to force a thread dump very early in VM startup.
31-05-2018

Ah, option 4 means that AOS should only be accessed when it's loaded (thanks for the clarification). java_util_concurrent_locks_AbstractOwnableSynchronizer::initialize is a misnomer: all it does is load the AOS class and compute the offset of the exclusiveOwnerThread field. I think preloading AOS should be adequate for option 3.
30-05-2018

[~mchung] Note that option 4 doesn't have to deal with OOME; it only needs to deal with the AOS class possibly not being loaded. So this:

if (_blocker_object != NULL &&
    _blocker_object->is_a(SystemDictionary::abstract_ownable_synchronizer_klass())) {
  _blocker_object_owner = java_util_concurrent_locks_AbstractOwnableSynchronizer::get_owner_threadObj(_blocker_object);
}

becomes:

if (_blocker_object != NULL &&
+   AOSClassIsInitialized() &&
    _blocker_object->is_a(SystemDictionary::abstract_ownable_synchronizer_klass())) {
  _blocker_object_owner = java_util_concurrent_locks_AbstractOwnableSynchronizer::get_owner_threadObj(_blocker_object);
}

where AOSClassIsInitialized() has to be implemented by some means. The current cached abstract_ownable_synchronizer_klass() won't work because we'd be removing the code that sets it. So we'd need to query the SystemDictionary more directly.

Option 3 is also slightly more complicated than first thought because SystemDictionary::initialize_preloaded_classes doesn't actually initialize any of the classes. It really means process_preloaded_classes. But we should be able to inject AOS initialization somewhere. Thanks.
30-05-2018
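
The option 4 guard discussed in the comment above can be sketched outside HotSpot. This is a toy model under stated assumptions, not VM code: `ToyDictionary`, `ToyObject`, `kAosName` and `blocker_owner` are hypothetical names, the dictionary is just a set of class names, and the `is_a` check is reduced to a boolean flag. It only illustrates the ordering of the checks: if the class is not loaded, the instance test is never reached.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for "which classes has the dictionary loaded?".
struct ToyDictionary {
    std::set<std::string> loaded;
    bool is_loaded(const std::string& name) const { return loaded.count(name) != 0; }
};

// Hypothetical stand-in for a lock object a thread may be blocked on.
struct ToyObject {
    bool is_aos_instance;      // models is_a(abstract_ownable_synchronizer_klass())
    std::string owner_thread;  // models the exclusiveOwnerThread field
};

static const char* const kAosName =
    "java/util/concurrent/locks/AbstractOwnableSynchronizer";

// Returns the owner thread name, or "" when there is nothing to report.
inline std::string blocker_owner(const ToyDictionary& sd, const ToyObject* blocker) {
    if (blocker != nullptr &&
        sd.is_loaded(kAosName) &&      // added guard: no class, so no instances
        blocker->is_aos_instance) {
        return blocker->owner_thread;
    }
    return "";
}
```

Because the guard only queries state and never loads a class, this path cannot raise an exception, which is the property option 4 relies on.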

I think option 3 would be the best and simplest solution. The main reason why AbstractOwnableSynchronizer was lazily loaded in the fix for JDK-8154589 was to support mix-n-match of hotspot VM running on older JDK releases (hotspot-express like model). This is no longer the case. I also think that loading AbstractOwnableSynchronizer would have negligible startup impact. Option 4 should work too but adding the guard to prepare for OOME would make the code cumbersome. The use of AOS is solely for thread dump support (via SIGBREAK and java.lang.management.ThreadMXBean) to get the owner of AbstractOwnableSynchronizer. The current implementation calls is_a(SystemDictionary::abstract_ownable_synchronizer_klass()) to ensure that the lock object is an instance of AbstractOwnableSynchronizer before calling get_owner_threadObj function.
29-05-2018

[~mchung] Mandy: I'd appreciate your opinion here given your involvement in the original code. (I'd need you to review option 4 regardless :) ). Thanks.
13-05-2018

There are 4 ways to fix this:
1. Clear the pending exception and issue a warning so the fact it occurred is not lost. This is somewhat crude, but simple.
2. Fix all the code involved to be exception aware and to propagate the exception back to a level where it can be safely thrown. The implications of this are unclear without attempting it. But if we are in the service thread then its exception reporting may not be adequate in any case.
3. Avoid the problem by preloading and initializing the AbstractOwnableSynchronizer class. This is quite simple, but needs to be examined for any startup impact. (Though discussions with Claes suggest it is unlikely to be observable.)
4. Fix the code that uses AbstractOwnableSynchronizer to anticipate that the class may not be loaded. If the class is not loaded there cannot possibly be any AOS instances to report in the lock or stack dump. This requires the ability to query the SD to ask if the class is loaded, but otherwise is quite simple and could lead to the removal of the code for lazily initializing the class.
13-05-2018

I inadvertently caused this in the fix for JDK-8154589. I didn't recognize then that simply punting the exception back to the caller wasn't going to work.
09-05-2018

This logic was initially added by JDK-5086470, but no exception checking was performed.
09-05-2018

Or we stop lazily loading AbstractOwnableSynchronizer and do it at VM initialization time. That's much simpler/cleaner provided there is no startup impact.
09-05-2018

That sounds very reasonable to me.
22-03-2018

All of VM_PrintThreads, VM_ThreadDump, and VM_FindDeadLocks can leave an exception pending due to the call to: java_util_concurrent_locks_AbstractOwnableSynchronizer::initialize(jt); Looking at the use sites for those VM ops, the sites are generally completely oblivious to the potential for exceptions to be left pending at: VMThread::execute(&op); They don't expect them and they don't check for them, but continue doing whatever it is they were doing. In some cases the caller will immediately detect and throw the exception. In some cases another action may encounter an exception and cause an early return (through CHECK macro), whilst in the current problematic case the exception would be left pending for a later ExceptionMark to trip over. I think the best solution would be to issue a warning if the exception is pending and then clear it, within doit_prologue, and return false so the op itself does not execute.
22-03-2018
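
The "warn, clear, and return false" proposal from the comment above can be illustrated with a small stand-alone model. This is a sketch, not the actual fix: `ToyThread`, `toy_initialize_aos` and `toy_doit_prologue` are hypothetical names, the pending exception is modelled as a string, and the warning goes into a vector instead of the VM's warning channel.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a thread's pending-exception slot.
struct ToyThread {
    std::string pending_exception;  // empty means nothing is pending
    bool has_pending_exception() const { return !pending_exception.empty(); }
    void clear_pending_exception() { pending_exception.clear(); }
};

// Stand-in for the class-loading call made in the prologue; it leaves an
// exception pending when told to fail.
inline void toy_initialize_aos(ToyThread& t, bool fail) {
    if (fail) {
        t.pending_exception = "java.lang.OutOfMemoryError: Java heap space";
    }
}

// Proposed prologue behaviour: record a warning, clear the exception, and
// return false so that the operation itself is not executed.
inline bool toy_doit_prologue(ToyThread& t, bool fail,
                              std::vector<std::string>& warnings) {
    toy_initialize_aos(t, fail);
    if (t.has_pending_exception()) {
        warnings.push_back("thread dump skipped: " + t.pending_exception);
        t.clear_pending_exception();
        return false;
    }
    return true;
}
```

The point of the design is that the caller of the operation never sees a pending exception, so code that is oblivious to exceptions stays safe, while the warning keeps the failure visible.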

Yes, the simple scenario that you've described will trigger the crash. And yes, we should probably fix VM_PrintThreads. I'll leave this to the runtime team to fix, you guys know that code better.
21-03-2018

Moving to hotspot/runtime (I can take care of the fix though).
21-03-2018

I'm tempted to fix VM_PrintThreads::doit_prologue(). It can't do classloading and completely ignore the possibility of exceptions occurring.
21-03-2018

Simple scenario:
- send SIGBREAK to do a thread dump
- PrintThreads triggers OOME
- send attach command
- AttachListener::init encounters pending OOME at ExceptionMark constructor.
21-03-2018
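
The scenario in the comment above can be modelled with a toy thread object. This is a minimal sketch, not HotSpot code: `ToyThread`, `ToyExceptionMark`, `print_threads_prologue` and `attach_listener_init` are hypothetical stand-ins, and where the real ExceptionMark constructor reports a fatal error, the sketch throws a C++ exception so the failure can be observed.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a thread's pending-exception slot.
struct ToyThread {
    std::string pending_exception;  // empty means nothing is pending
    bool has_pending_exception() const { return !pending_exception.empty(); }
};

// Stand-in for ExceptionMark: the constructor insists that nothing is pending.
struct ToyExceptionMark {
    explicit ToyExceptionMark(const ToyThread& t) {
        if (t.has_pending_exception()) {
            throw std::logic_error(
                "ExceptionMark constructor expects no pending exceptions");
        }
    }
};

// Thread dump prologue: class loading fails under heap exhaustion and the
// exception is left pending, because nothing checks or clears it.
inline void print_threads_prologue(ToyThread& t, bool heap_exhausted) {
    if (heap_exhausted) {
        t.pending_exception = "java.lang.OutOfMemoryError: Java heap space";
    }
}

// Attach listener start-up: trips over whatever the earlier step left behind.
inline bool attach_listener_init(ToyThread& t) {
    try {
        ToyExceptionMark em(t);
        return true;   // initialization may proceed
    } catch (const std::logic_error&) {
        return false;  // in the real VM this is the fatal error
    }
}
```

Calling `print_threads_prologue(t, true)` and then `attach_listener_init(t)` on the same `ToyThread` reproduces the ordering problem: the second call fails even though it did nothing wrong itself.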

Edit: never mind, I missed the prologue part. That is interesting. I don't think VM_operation code is supposed to allow exceptions to escape either. I don't think most code expects VM op code to execute Java code or be able to encounter exceptions! BTW, this seems like a runtime issue. :)
21-03-2018

I think this should be fixed by always executing the HAS_PENDING_EXCEPTION check that is now only used for signals != SIGBREAK: http://cr.openjdk.java.net/~thartmann/8199882/webrev.00/
21-03-2018

I was able to reproduce this by making the following modification to the VM code:

diff -r 1708db7f94c6 src/hotspot/share/classfile/javaClasses.cpp
--- a/src/hotspot/share/classfile/javaClasses.cpp Wed Mar 21 08:18:54 2018 +0100
+++ b/src/hotspot/share/classfile/javaClasses.cpp Wed Mar 21 13:51:00 2018 +0100
@@ -4372,6 +4372,9 @@
   if (_owner_offset != 0) return;
   SystemDictionary::load_abstract_ownable_synchronizer_klass(CHECK);
+  // Let's assume above fails
+  Exceptions::_throw_oop(THREAD_AND_LOCATION, Universe::out_of_memory_error_java_heap());
+
   InstanceKlass* k = SystemDictionary::abstract_ownable_synchronizer_klass();
   compute_offset(_owner_offset, k, "exclusiveOwnerThread", vmSymbols::thread_signature());

And setting the test timeout to 10 seconds. Here is the output with -Xlog:attach=Trace:

OOM caught in m1
OOM caught in m2_1
OOM caught in m3_1
Timeout refired 100 times
[120.180s][trace][attach] Failed to find attach file: .attach_pid41549, trying alternate
Exception in VM (AttachListener::init) :
java.lang.OutOfMemoryError: Java heap space
[125.564s][trace][attach] Failed to find attach file: .attach_pid41549, trying alternate
Exception in VM (AttachListener::init) :
java.lang.OutOfMemoryError: Java heap space
[127.025s][trace][attach] Failed to find attach file: .attach_pid41549, trying alternate
Exception in VM (AttachListener::init) :
java.lang.OutOfMemoryError: Java heap space
[128.105s][trace][attach] Failed to find attach file: .attach_pid41549, trying alternate
Exception in VM (AttachListener::init) :
java.lang.OutOfMemoryError: Java heap space
[129.187s][trace][attach] Failed to find attach file: .attach_pid41549, trying alternate
Exception in VM (AttachListener::init) :
java.lang.OutOfMemoryError: Java heap space
[130.685s][trace][attach] Failed to find attach file: .attach_pid41549, trying alternate
[130.685s][debug][attach] Failed to find attach file: /tmp/.attach_pid41549
JNI global refs: 6, weak refs: 0
Heap
garbage-first heap total 131072K, used 130313K [0x00000007b8000000, 0x00000007c0000000)
region size 1024K, 0 young (0K), 0 survivors (0K)
Metaspace used 6681K, capacity 6766K, committed 6912K, reserved 1056768K
class space used 438K, capacity 462K, committed 512K, reserved 1048576K
[134.372s][trace][attach] Failed to find attach file: .attach_pid41549, trying alternate
java.lang.OutOfMemoryError
{0x00000007b81503e8} - klass: 'java/lang/OutOfMemoryError'
 - ---- fields (total size 5 words):
 - private transient 'backtrace' 'Ljava/lang/Object;' @12 NULL (0)
 - private 'detailMessage' 'Ljava/lang/String;' @16 "Java heap space"{0x00000007b80616a0} (f700c2d4)
 - private 'cause' 'Ljava/lang/Throwable;' @20 NULL (0)
 - private 'stackTrace' '[Ljava/lang/StackTraceElement;' @24 NULL (0)
 - private strict 'suppressedExceptions' 'Ljava/util/List;' @28 NULL (0)
 - private transient 'depth' 'I' @32 0
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/exceptions.cpp:484
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/scratch/opt/mach5/mesos/work_dir/7f6a1e03-5442-4acb-a822-2fad50d7fe7f/workspace/open/src/hotspot/share/utilities/exceptions.cpp:484), pid=41549, tid=11
# fatal error: ExceptionMark constructor expects no pending exceptions

After multiple failed attempts to initialize the AttachListener, the .attach_pid file in /tmp/ is not found (probably because the /tmp/ directory was cleared by the OS or some other application). As a result, the signal handling code (see signal_thread_entry()) assumes that the SIGBREAK was not sent to start the AttachListener and prints stack traces. The VM_PrintThreads code then fails with an OOME when loading a Klass:

VMThread::execute() ->
VM_PrintThreads::doit_prologue() ->
java_util_concurrent_locks_AbstractOwnableSynchronizer::initialize() ->
SystemDictionary::load_abstract_ownable_synchronizer_klass(CHECK);

That's why thread printing is skipped in the log file. The pending OOME is *not* cleared and the next attempt to start the attach listener via SIGBREAK fails.
21-03-2018

This is very intermittent: I've tried to reproduce this by executing 100 runs of the same test/configuration on Solaris Sparc - no luck. Looking at the log file, I think the following happens:
- The machine is very slow and the test times out after 1200000 ms
- jtreg then tries to gather additional timeout information by attaching jstack and this triggers initialization of the AttachListener
- Since the test deliberately fills up the Java heap, most of the attempts to initialize fail with:
jib > Exception in VM (AttachListener::init) :
jib > java.lang.OutOfMemoryError: Java heap space
- It then seems that at least some of the AttachListener code is executed because the VM starts printing JNI and heap information:
jib > JNI global refs: 6, weak refs: 0
jib >
jib > Heap
jib > garbage-first heap total 131072K, used 130318K [0x00000007b8000000, 0x00000007c0000000)
jib > region size 1024K, 0 young (0K), 0 survivors (0K)
jib > Metaspace used 6694K, capacity 6768K, committed 6912K, reserved 1056768K
jib > class space used 438K, capacity 464K, committed 512K, reserved 1048576K
- Since the thread dumping is missing, I would assume that it fails with an OOME. But I'm not sure yet why that code is even executed when the AttachListener is not yet initialized.
- The next attempt to initialize the AttachListener thread fails because there is a pending OOME

I don't think this is related to my fix for JDK-8198826 but was probably triggered by the -XX:+VerifyStack option that I've added to the test (because the test is slower now).

ILW = Assert in attach listener initialization code because of pending OOME, single test in hs-tier7 but very intermittent, no workaround = MMH = P3
21-03-2018

attached hs_err file
20-03-2018