JDK-8146795 : CallTimerGrid module hangs Kitchensink's stress process
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 8,9
  • Priority: P3
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2015-11-06
  • Updated: 2019-01-15
  • Resolved: 2019-01-03
Description
Here is the output of the stress process:
[stress.process.err] Stress module expected time before shutdown for: Exception: 5m
[stress.process.err] Stress module expected time before shutdown for: LockDeflation: 5m
[stress.process.err] Stress module expected time before shutdown for: LpHello: 5m
[stress.process.err] Stress module expected time before shutdown for: Jstat: Immediately
[stress.process.err] Stress module expected time before shutdown for: MBean: 5m
[stress.process.err] Stress module expected time before shutdown for: Reference: Unknown duration
[stress.process.err] Stress module expected time before shutdown for: SpecJbb2005: 5m
[stress.process.err] Stress module expected time before shutdown for: SpecJvm2008Batch: Unknown duration
[stress.process.err] Stress module expected time before shutdown for: SpecJvm2008Derby: Unknown duration
[stress.process.err] Stress module expected time before shutdown for: ThreadDumper: 5m
[stress.process.err] Stress module expected time before shutdown for: Jni: 5m
[stress.process.err] Stress module expected time before shutdown for: Jrcmd: Immediately
[stress.process.err] Stress module expected time before shutdown for: CallTimerGrid: Unknown duration
[stress.process.err] Stress module expected time before shutdown for: Jfr: 10m
[stress.process.err] Stress module expected time before shutdown for: NMT: Immediately
[stress.process.err] Stress module expected time before shutdown for: MemoryPressure: 9m
[stress.process.err] Stress module expected time before shutdown for: [ResourceManagement: 5m
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:10 PST 2015]: Jstat
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:10 PST 2015]: MBean
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:10 PST 2015]: ThreadDumper
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:10 PST 2015]: MemoryPressure
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:11 PST 2015]: NMT
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:11 PST 2015]: Jrcmd
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:15 PST 2015]: [ResourceManagement
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:37 PST 2015]: Jfr
[stress.process.err] Stress module finished at [Tue Nov 03 03:32:45 PST 2015]: Reference
[stress.process.err] Stress module finished at [Tue Nov 03 03:34:02 PST 2015]: SpecJvm2008Batch
[stress.process.err] Stress module finished at [Tue Nov 03 03:34:17 PST 2015]: SpecJbb2005
[stress.process.err] Stress module finished at [Tue Nov 03 03:34:30 PST 2015]: LpHello
[stress.process.err] Stress module finished at [Tue Nov 03 03:35:23 PST 2015]: Exception
[stress.process.err] Stress module finished at [Tue Nov 03 03:35:23 PST 2015]: LockDeflation
[stress.process.err] Stress module finished at [Tue Nov 03 03:35:24 PST 2015]: Jni
[stress.process.err] Stress module finished at [Tue Nov 03 03:37:24 PST 2015]: SpecJvm2008Derby

All modules except CallTimerGrid (CTG) finished quickly after the command to stop, but the CTG module did not finish, which caused the stress process to be killed by the timeout.
Comments
CallTimerGrid is no longer used in Kitchensink.
03-01-2019

I'm changing the Fix Version from '9' to '10'. Please see the detailed comment that I added above:

Daniel Daugherty added a comment - 2016-05-24 14:22

In short, this is a new stress test that was added in JDK9. The older versions of the test hung all the time when Karen and I used it. The newer version hangs less often, but still hangs.

Also note that a couple of comments of mine are still not addressed:

Daniel Daugherty added a comment - 2016-03-22 21:57
@Boris or @Anton - is there a bundle somewhere that I can download to try and reproduce the CallTimerGrid hang?

The response to the above was to tell me how to run BigApps. Not what I asked for. I want a standalone version of CallTimerGrid. So I asked the following the next day:

Daniel Daugherty added a comment - 2016-03-23 12:24
Does this hang reproduce _outside_ of the BigApps framework?

Absolutely no response to the above. Two months later I wrote the big comment referenced above where I downgraded the priority from P1 -> P3.
15-09-2016

Just for the record, JDK-8144013 has been fixed. There are some failures described above that were caused by JDK-8144013. They are the ones on AARCH64. You may also see references to dladdr, which is the same bug, and also only on AARCH64. However, this CR is still turning up on other platforms.
26-08-2016

This bug was originally filed on 2015.11.06 with a priority of P3. Christian changed the priority from P3 -> P1 on 2015.12.11, but there is no corresponding ILW note. However, looking at the notes, it's clear that CallTimerGrid was causing Kitchensink's stress process to hang until CallTimerGrid was disabled sometime after Christian asked for it to be disabled on 2015.12.17. The last hang sighting recorded in this bug is using JDK9-b99 on 2015.12.24. I can see why this bug was a P1 when the Kitchensink stress process was hanging frequently, but now that CallTimerGrid is disabled, I do not think that this bug should be a P1 anymore.

CallTimerGrid is a new stress test in JDK9 (and JDK8 for some reason). Karen and I have used older versions of CallTimerGrid for years to stress test the Java monitor subsystem. In my and Karen's use of CallTimerGrid, the application finished, but always hung on exit. We believed that hang to be due to an incomplete attempt at disabling the GUI interface, but we were never motivated enough to find the root cause of the on-exit hang.

The version of CallTimerGrid that was integrated into Kitchensink is newer and also has the GUI disabled. It appears that this version works better than the old version that Karen and I used to run since it does not hang on-exit all the time. However, it's entirely possible that this version of CallTimerGrid is not yet ready for daily use in the Kitchensink stress module. The hangs may be related to product bugs or bugs in CallTimerGrid itself. We don't know.

I'm lowering the priority of this bug from P1 -> P3 since CallTimerGrid is a new module relative to nightly Kitchensink testing.
24-05-2016

Does this hang reproduce _outside_ of the BigApps framework?
23-03-2016

@Boris or @Anton - is there a bundle somewhere that I can download to try and reproduce the CallTimerGrid hang?
23-03-2016

> The missing piece seems to be: who owns the dl-addr lock? Are we executing some kind
> of onLoad hook that is getting us into a deadlock?

There are a lot of JFR event recordings in progress. For a hint at who might be holding the dladdr lock, look at JDK-8144013. JFR appears to have suspended a thread that is in dladdr, presumably holding the dladdr lock. This then causes other threads to block in dladdr. I suspect if you reviewed all the threads, you'll find one that looks similar to what is in JDK-8144013, which has the following backtrace:

0x0000007fb77aada8 __GI___sigsuspend + 0x74
0x0000007fb742360c _ZL10SR_handleriP9siginfo_tP8ucontext + 0x10c
0x0000007fb794c510 ????????
0x0000007fb7873c28 __GI__dl_addr + 0x38
0x0000007fb7423398 _ZN2os16address_is_in_vmEPh + 0x30
0x0000007fb742d8b8 _ZN2os13current_frameEv + 0x128
0x0000007fb74311e4 _ZN2os16get_native_stackEPPhii + 0x28
0x0000007fb6ef678c _ZN14ArrayAllocatorImL10MemoryType7EE10reallocateEm + 0x1e0
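
For illustration only, here is a minimal C++ sketch of the deadlock pattern described above. This is hypothetical code, not Kitchensink or HotSpot sources, and the names suspend_handler and victim are made up: if a thread is "suspended" by a signal while it is inside dladdr(), it keeps holding glibc's dynamic-loader lock, and any other thread that then calls dladdr() blocks behind it. Whether the sketch actually hangs is timing dependent.

// Sketch only; compile with: g++ sketch.cpp -pthread -ldl
#include <dlfcn.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <cstdio>

// Stand-in for HotSpot's SR_handler: park the thread inside the signal handler,
// exactly where it happened to be when the "suspend" signal arrived.
static void suspend_handler(int) {
  sigset_t none;
  sigemptyset(&none);
  sigsuspend(&none);  // never "resumed" in this sketch, so the thread stays parked
}

// Victim thread: hammers dladdr(), repeatedly taking glibc's loader lock.
static void* victim(void*) {
  Dl_info info;
  for (;;) {
    dladdr(reinterpret_cast<void*>(&victim), &info);
  }
  return nullptr;
}

int main() {
  struct sigaction sa = {};
  sa.sa_handler = suspend_handler;
  sigaction(SIGUSR1, &sa, nullptr);

  pthread_t t;
  pthread_create(&t, nullptr, victim, nullptr);
  usleep(100 * 1000);        // give the victim time to start calling dladdr()

  pthread_kill(t, SIGUSR1);  // "suspend" it, possibly while it holds the loader lock

  // If the victim was parked inside dladdr(), this call blocks forever on the
  // loader lock, which is the same shape as the hang described in the comment above.
  Dl_info info;
  if (dladdr(reinterpret_cast<void*>(&suspend_handler), &info) != 0) {
    std::printf("did not hang this time: %s\n", info.dli_fname);
  }
  return 0;
}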
10-02-2016

Java stacks are of no use here; we need to see the full native stack to see if there is any hint as to who owns the dl-addr lock.
18-01-2016

What happened to finding out who owned the dl-addr lock?
12-01-2016

Boris: look for a thread that has some dl-related native frames on its stack.
22-12-2015

[~dholmes] David, could you advise how to do that?
21-12-2015

Boris - there is insufficient information to fully determine the underlying cause. As I said we need to see who owns the dl-addr lock.
18-12-2015

[~aaivanov] Can we get the CallTimerGrid module removed from the Kitchensink configuration until this issue has been resolved?
17-12-2015

JVM_SetThreadPriority grabs the Threads_lock. Looking at the gdb file, the system is trying to get to a safepoint:

Thread 43 (LWP 2683):
#0  0x0000007fa46ca3d8 in __pthread_cond_wait (cond=0x7f9c5beb58, mutex=0x7f9c5beb28) at pthread_cond_wait.c:187
#1  0x0000007fa4208558 in os::PlatformEvent::park() () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#2  0x0000007fa41d0dc4 in Monitor::IWait(Thread*, long) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#3  0x0000007fa41d1a74 in Monitor::wait(bool, long, bool) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#4  0x0000007fa438cad4 in VMThread::execute(VM_Operation*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#5  0x0000007fa3ce9598 in BiasedLocking::revoke_and_rebias(Handle, bool, Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#6  0x0000007fa4307f70 in ObjectSynchronizer::fast_enter(Handle, BasicLock*, bool, Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#7  0x0000007fa42ad170 in SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#8  0x0000007fa0803584 in ?? ()
#9  0x0000000000000009 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

In addition, a lot of threads are blocked in the system libraries during initialization:

Thread 167 (LWP 365):
#0  0x0000007fa46cd228 in __lll_lock_wait (futex=futex@entry=0x7fa47159b8 <_rtld_global+2440>, private=0) at ../nptl/sysdeps/unix/sysv/linux/lowlevellock.c:46
#1  0x0000007fa46c80e8 in __GI___pthread_mutex_lock (mutex=0x7fa47159b8 <_rtld_global+2440>) at pthread_mutex_lock.c:114
#2  0x0000007fa4639c20 in __GI__dl_addr (address=0x7fa4018a04 <ThreadLocalTraceBuffer::initialize(unsigned long, Thread*)+272>, info=<optimized out>, mapp=<optimized out>, symbolp=<optimized out>) at dl-addr.c:131
#3  0x0000007fa41ffb4c in os::address_is_in_vm(unsigned char*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#4  0x0000007fa420a430 in os::get_sender_for_C_frame(frame*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#5  0x0000007fa420dad0 in os::get_native_stack(unsigned char**, int, int) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#6  0x0000007fa4018a04 in ThreadLocalTraceBuffer::initialize(unsigned long, Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#7  0x0000007fa3ff822c in JfrBuffers::thread_local_buffer(Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#8  0x0000007fa433b744 in JavaThread::run() () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#9  0x0000007fa4200fb0 in java_start(Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so

This thread is similarly blocked:

Thread 141 (LWP 31196):
#0  0x0000007fa46cd228 in __lll_lock_wait (futex=futex@entry=0x7fa47159b8 <_rtld_global+2440>, private=0) at ../nptl/sysdeps/unix/sysv/linux/lowlevellock.c:46
#1  0x0000007fa46c80e8 in __GI___pthread_mutex_lock (mutex=0x7fa47159b8 <_rtld_global+2440>) at pthread_mutex_lock.c:114
#2  0x0000007fa4639c20 in __GI__dl_addr (address=0x7fa420da24 <os::get_native_stack(unsigned char**, int, int)+40>, info=<optimized out>, mapp=<optimized out>, symbolp=<optimized out>) at dl-addr.c:131
#3  0x0000007fa41ffb4c in os::address_is_in_vm(unsigned char*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#4  0x0000007fa420a0f8 in os::current_frame() () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#5  0x0000007fa420da24 in os::get_native_stack(unsigned char**, int, int) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#6  0x0000007fa3e3c4b8 in CHeapObj<(MemoryType)6>::operator new(unsigned long) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#7  0x0000007fa3e56f84 in CompiledIC::compute_monomorphic_entry(methodHandle, KlassHandle, bool, bool, CompiledICInfo&, Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#8  0x0000007fa42b0188 in SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#9  0x0000007fa42b0378 in SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#10 0x0000007fa07c3700 in ?? ()

And it in turn is, I think, blocking these threads:

Thread 143 (LWP 31448):
#0  0x0000007fa46ca3d8 in __pthread_cond_wait (cond=0x3c2f8858, mutex=0x3c2f8828) at pthread_cond_wait.c:187
#1  0x0000007fa4208558 in os::PlatformEvent::park() () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#2  0x0000007fa41d0974 in Monitor::ILock(Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#3  0x0000007fa41d10dc in Monitor::lock(Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#4  0x0000007fa42b003c in SharedRuntime::handle_ic_miss_helper(JavaThread*, Thread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#5  0x0000007fa42b0378 in SharedRuntime::handle_wrong_method_ic_miss(JavaThread*) () from /export/local/aurora/CommonData/jdk/lib/aarch64/server/libjvm.so
#6  0x0000007fa07c3700 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The missing piece seems to be: who owns the dl-addr lock? Are we executing some kind of onLoad hook that is getting us into a deadlock? There are a lot of JFR event recordings in progress.
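
As a rough illustration of why so many threads pile up on the same lock in these stacks, here is a hypothetical C++ sketch of the call shape; the real os::get_native_stack and os::address_is_in_vm in HotSpot differ in detail. The point is only that every frame inspected during a native stack walk leads to a dladdr() call, and dladdr() serializes all callers on glibc's loader lock.

// Hypothetical sketch, not the HotSpot sources; compile with: g++ sketch.cpp -ldl
#include <dlfcn.h>
#include <cstdio>
#include <cstring>

// Stand-in for os::address_is_in_vm(): "is this PC inside libjvm.so?"
// Each call takes glibc's dynamic-loader lock inside dladdr().
static bool address_is_in_vm(void* pc) {
  Dl_info info;
  if (dladdr(pc, &info) == 0 || info.dli_fname == nullptr) {
    return false;
  }
  return std::strstr(info.dli_fname, "libjvm") != nullptr;
}

// Stand-in for os::get_native_stack(): the real code walks frame pointers; the
// point here is only that each candidate frame triggers another dladdr() call.
static int get_native_stack(void** frames, int max_depth) {
  void* pc = __builtin_return_address(0);  // GCC/Clang builtin; sketch only
  int depth = 0;
  if (pc != nullptr && depth < max_depth && address_is_in_vm(pc)) {
    frames[depth++] = pc;  // real code would continue to the caller's caller, etc.
  }
  return depth;
}

int main() {
  void* frames[16];
  int n = get_native_stack(frames, 16);
  std::printf("captured %d frame(s) that looked like libjvm.so code\n", n);
  return 0;
}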
14-12-2015

If the same hang reproduces in JDK8, then this hang has nothing to do with the Contended Locking project since those changes are not in JDK8.

Re: pre-integration testing. Based on the sightings in this bug and JDK-8144013 "Kitchensink process hangs", it looks like the hang is intermittent, so a single execution per platform may not have run into this issue.
01-12-2015

CallTimerGrid was integrated into BigApps using this bug:
JDK-8073322 JEP-JDK-8046133: Test task: integrate CallTimerGrid to Kitchensink
https://bugs.openjdk.java.net/browse/JDK-8073322

And was recently enabled in BigApps using this bug:
INTJDK-7621462 CallTimerGrid module is enabled in Kitchensink
https://bugs.openjdk.java.net/browse/INTJDK-7621462

Neither of these bugs has any information about what platforms were tested using CallTimerGrid prior to integration or being enabled. Can someone fill in the info about the testing done with this version of CallTimerGrid?

Update: I'm trying to figure out if behavior has changed since CallTimerGrid was integrated into Kitchensink via JDK-8073322 (2015.03.19) and when it was enabled for regular execution via INTJDK-7621462 (2015.11.02). If pre-integration testing for JDK-8073322 showed that CallTimerGrid was stable back in 2015.03, then something changed between then and when CallTimerGrid was enabled for regular execution on 2015.11.02.
25-11-2015

It is possible that OOME could cause a hang - it depends. Where are the sources for these tests?
19-11-2015

This could possibly be related to the problem Dan has been trying to track down in JDK-8077392.
15-11-2015

We need to see the full stack, including native frames, to see what the thread is blocked on. Thread.setPriority is not a blocking operation, so the blocking must be incidental. It could also be that the compiled frame information is not correct.
15-11-2015