JDK-5088398 : java.lang.instrument TCK test deadlock (test11)
  • Type: Bug
  • Component: core-svc
  • Sub-Component: tools
  • Affected Version: 5.0
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2004-08-18
  • Updated: 2014-10-27
  • Resolved: 2011-03-08
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7
7 b26Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
Cheryl wrote:

>One of my tests is intermittently (once every 10 or 20 times, maybe)
>deadlocking.  The thread dump is below.  Can one of you investigate
>and/or is this not interesting until I get a more final build
>and/or is this most likely a problem in the test itself?
>What this test does is register 4 trivial (no-op) transformers.
>I'm not seeing deadlock in any of my other tests.
>
>thanks,
>cheryl
>
>===================================================
>***JPLIS ASSERTION FAILED***: "test == expected" at Reentrancy.c line: 80
>***JPLIS ASSERTION FAILED***: "test == expected" at Reentrancy.c line: 80
>Full thread dump Java HotSpot(TM) Client VM (20030619211432.rfield.hotspot
>mixed mode):
>
>"Low Memory Detector" daemon prio=5 tid=0x009bad70 nid=0x3f0 waiting on
>condition [0..0]
>
>"CompilerThread0" daemon prio=5 tid=0x009b9a10 nid=0x7f0 waiting on
>condition [0..1822fa90]
>
>"Signal Dispatcher" daemon prio=10 tid=0x009b4008 nid=0x774 waiting on
>condition [0..0]
>
>"Finalizer" daemon prio=9 tid=0x0093e848 nid=0x7fc in Object.wait()
>[1816f000..1816fd94]
>        at java.lang.Object.wait(Native Method)
>        - waiting on <0x100a0198> (a java.lang.ref.ReferenceQueue$Lock)
>        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
>        - locked <0x100a0198> (a java.lang.ref.ReferenceQueue$Lock)
>        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
>        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>
>"Reference Handler" daemon prio=10 tid=0x0093d3c0 nid=0x810 waiting for
>monitor entry [1812f000..1812fd94]
>        at
>javasoft.sqe.tests.api.java.lang.instrument.ClassFileTransformer.Test11Agent
>.transform(Test11Agent.java:60)
>        at
>com.sun.lang.instrument.TransformerManager.transform(TransformerManager.java
>:127)
>        at
>com.sun.lang.instrument.InstrumentationImpl.transform(InstrumentationImpl.ja
>va:158)
>        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:122)
>
>"main" prio=5 tid=0x00235928 nid=0x6e4 waiting for monitor entry
>[6f000..6fbfc]
>        at java.lang.ClassLoader.findBootstrapClass(Native Method)
>        at java.lang.ClassLoader.findBootstrapClass0(ClassLoader.java:709)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:284)
>        - locked <0x100a0450> (a sun.misc.Launcher$ExtClassLoader)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:282)
>        - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:274)
>        - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:302)
>        - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
>
>"VM Thread" prio=5 tid=0x0097de88 nid=0x730 runnable
>
>"VM Periodic Task Thread" prio=10 tid=0x009af818 nid=0x7e4 waiting on
>condition
>
>"Suspend Checker Thread" prio=10 tid=0x009a6618 nid=0x7f8 runnable
>
>
>Found one Java-level deadlock:
>=============================
>"Reference Handler":
>  waiting to lock monitor 0x0093deac (object 0x100a0220, a
>sun.misc.Launcher$AppClassLoader),
>  which is held by "main"
>"main":
>  waiting to lock monitor 0x0093de8c (object 0x14013f18, a [[I),
>  which is held by "Reference Handler"
>
>Java stack information for the threads listed above:
>===================================================
>"Reference Handler":
>        at
>javasoft.sqe.tests.api.java.lang.instrument.ClassFileTransformer.Test11Agent
>.transform(Test11Agent.java:60)
>        at
>com.sun.lang.instrument.TransformerManager.transform(TransformerManager.java
>:127)
>        at
>com.sun.lang.instrument.InstrumentationImpl.transform(InstrumentationImpl.ja
>va:158)
>        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:122)
>"main":
>        at java.lang.ClassLoader.findBootstrapClass(Native Method)
>        at java.lang.ClassLoader.findBootstrapClass0(ClassLoader.java:709)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:284)
>        - locked <0x100a0450> (a sun.misc.Launcher$ExtClassLoader)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:282)
>        - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:274)
>        - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:302)
>        - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
>
>Found 1 deadlock.
>
>  
>



Assuming the line numbers haven't changed, the "Reference Handler" 
thread starts in
ReferenceHandler.run and at this point holds no locks, it references a 
class ("Cleaner")
which kicks off the JPLIS code.  Unless I'm missing something, the only 
lock being
grabbed by the JPLIS code (native and Java) is on the TransformerManager 
which is
not at play here.

On the "main" thread, regular class loading has locked the AppClassLoader.
It is not clear to me why the native code of class loader is waiting to 
lock an instance of "int[][]"
an examination of the code doesn't show anything at the top level.
However that doesn't seem strange, what does seem strange to me is that 
the TCK code is
locking the same array. 

Cheryl, what locks do you grab in this test?
In particular, do you grab any locks on a two dimensional array of int, 
or on the class loader?

-Robert



----------------------------------------------------------


"Reference Handler":
 waiting to lock monitor 0x0093deac (object 0x100a0220, a
                                               
sun.misc.Launcher$AppClassLoader),
 which is held by "main"

    "Reference Handler":
     [3]      at
    javasoft.sqe.tests.api.java.lang.instrument.ClassFileTransformer
                        .Test11Agent.transform(Test11Agent.java:60)
     [2]      at com.sun.lang.instrument.TransformerManager
                        .transform(TransformerManager.java :127)
     [1]      at com.sun.lang.instrument.InstrumentationImpl
                        .transform(InstrumentationImpl.java:158)
     [0]      at
    java.lang.ref.Reference$ReferenceHandler.run(Reference.java:122)

> Assuming the line numbers haven't changed, the "Reference Handler" 
> thread starts in
> ReferenceHandler.run and at this point holds no locks, it references a 
> class ("Cleaner")
> which kicks off the JPLIS code.  Unless I'm missing something, the 
> only lock being
> grabbed by the JPLIS code (native and Java) is on the 
> TransformerManager which is
> not at play here. 

Cheryl wrote that the test doesn't explicitly grab any locks.  So what 
Jeff postulated must
be true:

    The deadlock detector says that this thread holds the [[I lock, which is
    why I am concluding that the "invisible to the java thread dumper"
    native code frames must be doing the acquisition of this lock.

It seems to me that this could be occurring between [0] and [1] or after 
[3].  But I believe
the below assumes you think this is happenning between [0] and [1]?  
That makes sense
to me.

    The java transformer then tries to run, which causes an implicit
    load of a class
    that invokes the AppClassLoader. This tries to get AppClassLoader lock.

Cheryl, is line 60 referencing a class?  This would means we are doing a 
nested load.
Which I guess is normally working as long as another thread isn't in the 
middle of a class
load at the same time.

"main":
 waiting to lock monitor 0x0093de8c (object 0x14013f18, a [[I),
 which is held by "Reference Handler"

    "main":
           at java.lang.ClassLoader.findBootstrapClass(Native Method)
           at
    java.lang.ClassLoader.findBootstrapClass0(ClassLoader.java:709)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:284)
           - locked <0x100a0450> (a sun.misc.Launcher$ExtClassLoader)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:282)
           - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
           at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:274)
           - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader)
           at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
           at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:302)
           - locked <0x100a0220> (a sun.misc.Launcher$AppClassLoader) 

> On the "main" thread, regular class loading has locked the 
> AppClassLoader. 

I'm pretty sure what is happenning is that the main class is being 
loaded, and that is
setting off a cascade of other loads.  The only object lock grabs I saw 
in the VM
(the SDK native layer is thin and I didn't see any lock grabs at all) is 
of the class
loader, the constant pool, and the instance being created.  So one 
possibility is that
somewhere in that sequence a int[][] is being created -- but I can't 
figure out why
the other thread would be locking that object.   The other possibility 
is that this is
some internal lock that I missed.

I guess we should continue to try to figure out what is actually 
happenning, but it does
raise a spec question -- should we say that an agent should assure that 
it's transformer
code should (directly and indirectly) use only loaded classes?  
Otherwise we are putting
the class loading mechanism in a situation it would normally never 
encounter.  Designing
the JPLIS code so it does not reenter (as it is designed) does nothing 
for this case.

-Robert


----------------------------------------------------------


INTERVENING INFO IN ATTACHMENTS

----------------------------------------------------------


Hey, our latest testing with 5.0 betas (not sure exactly which, I can
find out) is surfacing this deadlock again-- interleaved acquisition =
of
an AppClassLoader and an [[I. Any ideas? Did this ever get run down l=
ast
fall?
=20
jeff

----------------------------------------------------------



We?re testing this with the JDK 5.0 Beta 3 Build 60 snapshot build.  This definitely does appear to be a timing issue when one thread loads a class from within the transform method at the same time as any other thread is loading a class from the same classloader.  It appears that the native code is holding a lock (on [[I) when it calls the transformers or else this issue would show up with any classloading from multiple threads and not just when the transformer hook is used.  I?ve attached a test case that reproduces the problem (more details below).  It shouldn?t require any changes to the specification, just a bug fix in the JVM so it doesn?t hold this lock when calling the transformers.  This is a very important bug to us because it means that using the BCI hook can deadlock arbitrary threads in our customers applications and that is not something we are willing to do so we will not be able to use this feature until this bug is fixed.  You stated that it is too late to fix this in Tiger.  Did you mean it is too late to make specification changes or too late to do any bug fixing?  If it is too late to do any bug fixing, would it be a candidate for a 5.0_01 patch release?  Would it help if I file this through the standard developer connection website as well?  Please let me know if there is any additional information I can provide or testing I can do to help get this resolved faster.  Thanks for any help or information you can provide!

 

            Daryl

 

Test case details: The test case is in the attached zip file.  The test is a very simple scenario of loading classes from multiple threads (one from within a transformer) and doing that repeatedly to try and hit the timing hole.  The problem doesn?t happen every single time I run the test but I was able to reproduce it at least one out of every two or three runs on my Windows 2000 machine.  I?ve included the source files, the jar files and a couple of batch scripts to build and run it respectively.  The batch scripts are very simple, mainly wrappers around the javac, jar and java tools so they should be easy to replicate if you?re using a different platform.  For output it prints out a period each iteration so you can see it making progress.  If it completes 1000 iterations it prints out a success message and stops but if it deadlocks before then you can tell because it stops printing the periods at which point you can get a thread dump that reports the deadlock ? I?ve also attached output and a thread dump from one run where it deadlocked.

----------------------------------------------------------

ZIP in attachments

----------------------------------------------------------

Yes, I?m very confident that the [[I lock is not in the agent for a couple of reasons:

1)       The source code of the transformer involved does not acquire any locks (and definitely not on any int[][] objects)

I?ve included the source code and you can verify this yourself

2)       The thread that is waiting to acquire the [[I lock has not yet reached the transformer (and is in a native method)

In the sample output I sent you, the ?TestThread? is waiting to acquire that lock and the top of its stack trace is:

        at java.lang.ClassLoader.findBootstrapClass(Native Method)

        at java.lang.ClassLoader.findBootstrapClass0(ClassLoader.java:891)

        at java.lang.ClassLoader.loadClass(ClassLoader.java:301)

 

Does this help?  Will the bug be visible via the developer connection?  If so, can you send me the bug number once you file it so we can keep track of it?  Thanks!

 

            Daryl

Comments
SUGGESTED FIX See the attached 5088398-webrev-cr0 for the proposed new test.
29-02-2008

EVALUATION Karen fixed the bootstrap class loader to work in parallel via: 4997893 4/5 Investigate allowing bootstrap loader to work in parallel with that fix in place the deadlock no longer reproduces. I'm planning to use this bug as the vehicle for getting the test program into the INSTRUMENT_REGRESSION test suite.
29-02-2008

PUBLIC COMMENTS java.lang.instrument deadlocks on the "int[][]" lock in a TCK test
22-09-2004

EVALUATION The issue is that the CFLH will always be called with the event thread holding the lock for the defining loader of the class to be transformed. If the event handler provokes a class load with a loader that is a child of that defining loader then the class loaders will be locked in the wrong order and hence can deadlock with class loading that occurs through normal delegation. There is a potential spec issue here - at a minimum the JVMTI and javadoc should state that the lock for the defining class loader is held. Note that the issue cannot arise if the ClassFileTransformer implementation is on the boot class path and does not attempt to load classes using another class loader. ###@###.### 2004-09-21
21-09-2004