JDK-8163511 : Allocation of compile task fails with assert: "Leaking compilation tasks?"
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8,9,10,11,12,13
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • CPU: generic
  • Submitted: 2016-08-09
  • Updated: 2020-04-22
  • Resolved: 2019-03-07
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 13
11.0.5Fixed 13 b12Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
Weblogic testrun failed on i586 with assert:

#  Internal Error (/scratch/opt/jprt/T/P1/205047.iggy/s/hotspot/src/share/vm/compiler/compileTask.cpp:51), pid=7798, tid=9411
#  assert(WhiteBoxAPI || _num_allocated_tasks < 10000) failed: Leaking compilation tasks?

no aot-compiled code was used.
Comments
Attached patch is a work in progress jdk8 patch. The test was tweaked so the failure reproduces. Backporting the fix to 8u is complicated. In particular, turning a weak reference to a strong reference from compiler threads doesn't seem to be possible with 8. Given the complexity of the backport, I'm putting it on hold, at least for now.
27-08-2019

Fix request This fixes a leak-like-scenario (see above). The original fix does not apply cleanly because _failure_reason handling is not in 11. The provided regression test passes with the backported fix, and fails without. Tier1 testing passes fine. See review thread for the backport: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2019-August/001553.html
06-08-2019

I've attached a simple regression test (Test.java) that reproduces the issue with a low compile threshold. Classes are not unloaded because methods are stuck in the compile queue and as a result the queue keeps filling up with stale methods.
22-02-2019

Yes. Actually, this was an attempt to form a negative interrogative sentence [1], implying that of course we should :) [1] https://www.thefreedictionary.com/Negative-Interrogative-Sentences.htm
22-02-2019

"shouldn't" --> "should"
21-02-2019

I've checked in more detail and the duplicate methods in the queue come from the same class/method loaded by different class loaders. This is because the test suite runs multiple tests at the same time in the same VM by loading them with different class loaders and executing them in different threads. Many of these tests use the same framework and therefore the same methods are enqueued for compilation again and again (up to 100x). The bug shows up more often when running with -XX:-TieredCompilation which is due to a lower number of compiler threads and a different threshold policy that does not remove stale (unused) methods from the compile queue. But the same problem can and does happen with TieredCompilation enabled. My current hypothesis is that this is just due to the test overloading the VM. The compiler threads are unable to keep up with the high number of compilation requests. I'm actually wondering how class unloading works with that many methods being referenced from compile tasks in the queue. After a test has been executed, the class loader becomes dead and therefore all test klasses/methods should be unloaded. Seems like the compile tasks keep these klasses alive through global handles (CompileTask::initialize -> JNIHandles::make_global). I wonder if we shouldn't just remove these methods from the queue on unloading.
21-02-2019

I've added some debug code: http://cr.openjdk.java.net/~thartmann/8163511/webrev.00/ I'm able to reproduce the problem with -XX:-TieredCompilation and a single C2 compiler thread. When we fail, the compile queue contains 9998 elements and the compiler thread already works for 1293 ms on compiling a method that was enqueued for 853653 ms. Therefore, I don't think there is a deadlock or endless loop but the compile queue was just filling up because C2 didn't catch up with compilation. Looking at the printed compile queue, I can see duplicate entries though. Will investigate more.
18-02-2019

Yes, that's what I plan to implement.
18-02-2019

Maybe we should add some basic compile queue information to the hs_err-files, and running times on the active compilation tasks would give an indication if any task isn't making progress.
18-02-2019

Okay, thanks. I'll take a look.
18-02-2019

Yes, I don't have time to look on it now. If someone can volunteer to look on this it would be nice. Snippets from my observation in previous comments: 1. Assert hit when we have > 10K compilation tasks in queue. 2. There are a lot of compilation requests > 160K. And C2 may not keep with requests so queue is growing. 3. But most concerning is I see it that almost all threads (including all compiler threads) are blocked (at least when hs_err is dumped). Note, the thread which holds CodeCache_lock and Compile_lock is also blocked. So it could be some synchronization bottleneck.
15-02-2019

This happens frequently in RunThese30M too.
15-02-2019

Updated ILW = Assert in compile broker (should not affect product), frequent in CI testing (generates noise), no workaround = MMH = P3
15-02-2019

We see this a lot in the Valhalla CI and also quite frequently in the jdk CI. Raising priority, targeting for JDK 13 and removing starter label. Vladimir, please un/re-assign if you are not planning to work on this.
15-02-2019

Is this really a starter bug?
16-01-2019

I attached hs_err_pid29462.log from last failure. There are 10 C2 threads. VM state:not at safepoint (normal execution) VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event]) [0x000000010012c840] CodeCache_lock - owner thread: 0x0000000119847000 [0x000000010012ede0] CompileTaskAlloc_lock - owner thread: 0x0000000119009800 [0x000000010012f320] Compile_lock - owner thread: 0x0000000119847000 [0x000000010012f5c0] MethodCompileQueue_lock - owner thread: 0x0000000119009800
15-08-2018

I think a variation of this issue was just tickled in an internal test: Crash: Internal Error ...compileTask.cpp...assert(WhiteBoxAPI || UseJVMCICompiler || _num_allocated_tasks < ...) failed: Leaking compilation tasks?
15-08-2018

Hmm, it is not safepoint: VM state:not at safepoint (normal execution) But almost all threads (including all compiler threads) are blocked. Note, the thread which holds CodeCache_lock and Compile_lock is also blocked: 0x00000001189c0000 JavaThread "/opt/mach5/mesos/work_dir/jib-master/install/com/oracle/jpg/bigapps/jck/9.0/jck-9.0-20170922.165600-1.tar.gz/JCK-runtime-9/classes javasoft.sqe.tests.api.java.util.Collections.synchronizd.set.SynchronizedSetParallelStream" daemon [_thread_blocked, id=18345, stack(0xffffffff03700000,0xffffffff03800000)]
12-07-2018

Recent compilations: Event: 1563.436 Thread 0x0000000100879000 nmethod 158734 0xffffffff6980eb90 code [0xffffffff6980ed80, 0xffffffff6980f0d0] Event: 1563.437 Thread 0x0000000100879000 159473 % 4 javasoft.sqe.tests.api.java.util.stream.testdata.TestDataBase::getDataTagCloud @ 794 (895 bytes) Event: 1563.951 Thread 0x00000001018e9000 nmethod 158300 0xffffffff69885f10 code [0xffffffff698862e0, 0xffffffff69887a30] Event: 1563.951 Thread 0x00000001018e9000 160226 % 4 javasoft.sqe.tests.api.java.util.stream.ASDFactory$StreamProducer::lambda$callProduce$0 @ 19 (74 bytes) Event: 1565.684 Thread 0x00000001018e2000 nmethod 159472 0xffffffff69883f90 code [0xffffffff69884260, 0xffffffff69884d28] Event: 1565.738 Thread 0x00000001018e2000 158733 ! 4 com.oracle.tck.lib.autd2.AUTD2Utils::iterateThroughProcessorsUntilAllAreDone (207 bytes) Event: 1565.738 Thread 0x00000001018b7000 nmethod 159362 0xffffffff69ad4b90 code [0xffffffff69ad4e60, 0xffffffff69ad5db8] Event: 1565.738 Thread 0x00000001018b7000 159069 4 com.oracle.tck.lib.autd2.AUTD2Utils$ProcessorsDependenciesSorter::simulateRun (111 bytes) Event: 1566.398 Thread 0x0000000101899800 nmethod 158418% 0xffffffff62a84d10 code [0xffffffff62a86280, 0xffffffff62a953b0] Event: 1566.407 Thread 0x0000000101899800 160117 3 javasoft.sqe.tests.api.java.util.stream.testdata.TestDataBase::getDataTagCloud (895 bytes) Note we have 160K compilation requests and that into 25min of execution. It could be reality that 10 C2 threads can't keep up with requests. And this is normal execution, not -Xcomp.
12-07-2018

It is T5: CPU:total 63 (initial active 63) v9, popc, vis1, vis2, blk_init, fmaf, hpc, ima, aes, des, cam, md5, sha1, sha256, sha512, mpmul, mont, pause, cbcond, crc32c, vis3b, adi, sparc5, mwait, xmpmul, xmont, pause_nsec, vamask, *idiv, *rdpc, *ld, *ind_br, *blk_zeroing
12-07-2018

CodeCache has enough space CodeHeap 'non-profiled nmethods': size=118784Kb used=12110Kb max_used=12433Kb free=106673Kb bounds [0xffffffff69400000, 0xffffffff6a040000, 0xffffffff70800000] CodeHeap 'profiled nmethods': size=118784Kb used=10492Kb max_used=10492Kb free=108291Kb bounds [0xffffffff62000000, 0xffffffff62aa0000, 0xffffffff69400000] CodeHeap 'non-nmethods': size=8192Kb used=5745Kb max_used=5773Kb free=2446Kb bounds [0xffffffff61800000, 0xffffffff61dc0000, 0xffffffff62000000] total_blobs=12457 nmethods=3984 adapters=1796 compilation: enabled stopped_count=0, restarted_count=0 full_count=0
12-07-2018

We have 5 C1 and 10 C2 threads. And all doing compilation: Threads with active compile tasks: C2 CompilerThread0159473 % 4 javasoft.sqe.tests.api.java.util.stream.testdata.TestDataBase::getDataTagCloud @ 794 (895 bytes) C1 CompilerThread0158879 ! 2 com.oracle.tck.lib.autd2.AUTD2Utils::iterateThroughProcessorsUntilAllAreDone (207 bytes) C1 CompilerThread1159526 2 javasoft.sqe.tests.api.java.util.stream.testdata.TestDataBase::getDataTagCloud (895 bytes) C1 CompilerThread2160100 3 jdk.internal.module.Checks::requireModuleName (137 bytes) C1 CompilerThread3160132 % 3 jdk.internal.module.Checks::requireTypeName @ 33 (178 bytes) C1 CompilerThread4160117 3 javasoft.sqe.tests.api.java.util.stream.testdata.TestDataBase::getDataTagCloud (895 bytes) C2 CompilerThread1159069 4 com.oracle.tck.lib.autd2.AUTD2Utils$ProcessorsDependenciesSorter::simulateRun (111 bytes) C2 CompilerThread2158733 ! 4 com.oracle.tck.lib.autd2.AUTD2Utils::iterateThroughProcessorsUntilAllAreDone (207 bytes) C2 CompilerThread3154787 4 javasoft.sqe.tests.api.java.util.stream.ASDFactory$StreamProducer::lambda$callProduce$0 (74 bytes) C2 CompilerThread4159873 4 java.util.EnumSet::copyOf (84 bytes) C2 CompilerThread5160226 % 4 javasoft.sqe.tests.api.java.util.stream.ASDFactory$StreamProducer::lambda$callProduce$0 @ 19 (74 bytes) C2 CompilerThread6157286 4 javasoft.sqe.tests.api.java.util.stream.testdata.TestDataBase::getDataTagCloud (895 bytes) C2 CompilerThread7159123 4 com.oracle.tck.lib.autd2.AUTD2Utils$ProcessorsDependenciesSorter::doSort (147 bytes) C2 CompilerThread8152003 4 javasoft.sqe.tests.api.java.util.stream.ASDFactory$StreamProducer::lambda$callProduce$0 (74 bytes) C2 CompilerThread9159846 4 java.lang.invoke.StringConcatFactory::doStringConcat (526 bytes) We do observed in other tests compiler threads are too slow on SPARC for some reasons. So it could be really we hit 10000 waiting compilation tasks in such case.
12-07-2018

I attached new hs_err file JTwork_applications_runthese_RunThese30M_hs_err_pid2113.log which have all information this time. Again compilation locks are hold by 2 threads: VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event]) [0x000000010012c760] CodeCache_lock - owner thread: 0x00000001189c0000 [0x000000010012ed00] CompileTaskAlloc_lock - owner thread: 0x000000011879f800 [0x000000010012f240] Compile_lock - owner thread: 0x00000001189c0000 [0x000000010012f4e0] MethodCompileQueue_lock - owner thread: 0x000000011879f800 0x000000011879f800 is Java thread which requested compilation and hit assert: Current thread (0x000000011879f800): JavaThread "/opt/mach5/mesos/work_dir/jib-master/install/com/oracle/jpg/bigapps/jck/9.0/jck-9.0-20170922.165600-1.tar.gz/JCK-runtime-9/classes javasoft.sqe.tests.api.java.util.stream.IntStream.IterateFinite" daemon [_thread_in_vm, id=17975, stack(0xffffffff14200000,0xffffffff14300000)] Stack: [0xffffffff14200000,0xffffffff14300000], sp=0xffffffff142f0c50, free space=963k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x20274e8] void VMError::report_and_die(int,const char*,const char*,void*,Thread*,unsigned char*,void*,void*,const char*,int,unsigned long)+0xa28 V [libjvm.so+0x2026a4c] void VMError::report_and_die(Thread*,void*,const char*,int,const char*,const char*,void*)+0x3c V [libjvm.so+0xe71760] void report_vm_error(const char*,int,const char*,const char*,...)+0x80 V [libjvm.so+0xd84a7c] CompileTask*CompileTask::allocate()+0x19c V [libjvm.so+0xd7b940] CompileTask*CompileBroker::create_compile_task(CompileQueue*,int,const methodHandle&,int,int,const 0x00000001189c0000 is an other Java thread: 0x00000001189c0000 JavaThread "/opt/mach5/mesos/work_dir/jib-master/install/com/oracle/jpg/bigapps/jck/9.0/jck-9.0-20170922.165600-1.tar.gz/JCK-runtime-9/classes javasoft.sqe.tests.api.java.util.Collections.synchronizd.set.SynchronizedSetParallelStream" daemon [_thread_blocked, id=18345, stack(0xffffffff03700000,0xffffffff03800000)]
12-07-2018

I looked on test output in artifactory and nothing related to compilation (like codecahce full or something else). hs_err file shows a lot (167244 count) of compilation requests: Event: 1653.356 Thread 0x00000001029e8800 167244 ! 4 javasoft.sqe.tests.api.java.util.stream.Stream.StreamAssert::checkElementsRegardingOrder (89 bytes) Also it seems it is tier 4, C2, compilation request. Number of compiler threads is set to 15: intx CICompilerCount = 15 {product} {ergonomic} Number of compiler threads to run With ratio 2:1 it means we should have 10 C2 and 5 C1. Based on hs_err file all compilers threads are alive. So it is not regression from dynamic compiler threads changes. One thing I noticed is that all compiler threads are blocked when VM crashed with CodeCache_lock and Compile_lock locks held by 0x0000000123370800 thread: VM state:not at safepoint (normal execution) VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event]) [0x000000010012b5e0] CodeCache_lock - owner thread: 0x0000000123370800 [0x000000010012db80] CompileTaskAlloc_lock - owner thread: 0x000000011f6f7800 [0x000000010012e0c0] Compile_lock - owner thread: 0x0000000123370800 [0x000000010012e360] MethodCompileQueue_lock - owner thread: 0x000000011f6f7800 I can't find such thread in the threads list because list printing error: [error occurred during error reporting (printing all threads), id 0xe0000000] Someone holding lock which may prevent compilations.
30-05-2018

The only thing I got from log file is your job name 2018-05-29-0408293.David.Holmes.jdk-jep181. I don't know how to navigate to test failure from log link :(
30-05-2018

[~dholmes] David, please, attach whole hs_err file to bug report. It could happen in VM with small number of compiler threads which can't catch up with compilation requests.
29-05-2018

P4 seems low for a crash like this.
29-05-2018

Does it make a sense to stop running RunThese on SPARC until this issue is fixed?
26-04-2018

Also happening on Solaris/sparc, so I am removing the "i586" reference from the synopsis. I also removed reference to "WebLogic" from synopsis because the crash happens with other test cases. # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/scratch/opt/mach5/mesos/work_dir/ead3677b-23e4-48c2-96be-104d8fdeebf8/workspace/open/src/hotspot/share/compiler/compileTask.cpp:53), pid=52256, tid=354 # assert(WhiteBoxAPI || UseJVMCICompiler || _num_allocated_tasks < 10000) failed: Leaking compilation tasks? # # JRE version: Java(TM) SE Runtime Environment (10.0) (fastdebug build 10-internal+0-2017-10-25-2125148.jesper.wilhelmsson.hs) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 10-internal+0-2017-10-25-2125148.jesper.wilhelmsson.hs, compiled mode, tiered, compressed oops, g1 gc, solaris-sparc) # Core dump will be written. Default location: /scratch/opt/mach5/mesos/work_dir/a3c80198-e8e4-40bb-8d6a-b11be58fd9dc/testoutput/tonga/mach5-one.SunOS.sparcv9/thread_large/core or core.52256
25-10-2017

Here is the another example of same failure. Test creates about 10000 threads and fails. Link: http://aurora.us.oracle.com/functional/faces/RunDetails.xhtml?names=2333014.rbt-leonid.mesnik-jdk9-20170708-0032-42261-804#vm.runtime%20%5B(!exclude)&&(!quarantine)&&(!desktop)%5D%20(tonga)_runtime/threads/thread_large RULE "runtime/threads/thread_large" Crash Internal Error ...compileTask.cpp...assert(WhiteBoxAPI || _num_allocated_tasks < ...) failed: Leaking compilation tasks?
11-07-2017

Hi Rahul, I think this issue is related to the compile queue overloaded on machines with a large number of processors but with a small number of compiler threads. If you have some time, could you please try to reproduce this issue? Maybe adding some statements to print the compile queue's length would be helpful to understand how much the queue is overloaded. We could then adjust the thresholds of 10'000 (or remove the assert). Thank you! Best regards, Zoltan
21-02-2017

9-defer-request: Assert failing because of a predefined threshold value is reached. The threshold was empirically set, that is, reaching it does not imply a correctness problem.
31-08-2016

9-defer-SQE-ok: agreed on deferral reason
31-08-2016

This problem does not seem to be AOT-related.
16-08-2016

ILW=failing assert but not crash,one occurrence,running with more w/ more compiler threads could help=MLM=P4
10-08-2016

The VM is running an application w/ ~80 threads. The VM is running with lowered compilation thresholds (CompileThreshold=100) but only with 2 C2 compiler threads. The likely problem here is that the compilation queue is filled up as application threads produce compilation tasks faster than compiler threads can consume them.
10-08-2016