JDK-6497639 : Profiling Swing application caused JVM crash

Fixed Versions: hs12 (b02)

This is copy of original report available here: http://www.netbeans.org/issues/show_bug.cgi?id=88776

I tried to attach to a Swing application several times, and each time
profiling ended after a few seconds with a Java error.

Both the IDE and the application run on JDK 1.6 b104, Linux i586.

Application's log:

# An unexpected error has been detected by Java Runtime Environment:
#  Internal Error (53484152454432554E54494D450E435050020F), pid=14600,
# Java VM: Java HotSpot(TM) Client VM (1.6.0-rc-b104 mixed mode, sharing)
# An error report file with more information is saved as hs_err_pid14600.log
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

Here is an e-mail from Tomas on 2007.04.11 on how to reproduce
this failure mode:

===== begin e-mail extract =====

> Can you point me at the appropriate bits?
Ok, here is the test case:
1) download latest NetBeans IDE from http://smetiste.czech.sun.com/builds/netbeans/6.0/latest/symlinks/component/ide-en.zip
2) unzip ide-en.zip
3) run NetBeans using netbeans/bin/netbeans
(if you need to specify JDK, use --jdkhome switch)
4) run Java2Demo.jar in separate terminal
5) back in NetBeans go to Profile -> Attach Profile, click Attach
6) In Attach Wizard select 'Application' as Target Type and 'Attach Invocation' must be 'Dynamic (JDK 1.6)', click 'Next', click 'Finish'
7) Select Java2Demo in 'Select Process' panel and click 'Ok'
8) wait for JVM crash

Below is the console output from Java2Demo.jar:

[th125165@dhcp-eprg05-75-104 Java2D]$ /usr/java/jdk1.6.0/bin/java -jar Java2Demo.jar
Profiler Agent: JNI On Load Initializing...
Profiler Agent: JNI OnLoad Initialized succesfully
Profiler Agent: Waiting for connection on port 5140 (Protocol version: 6)
Profiler Agent: Established local connection with the tool
cache_loade_classes, classes 2223
Retransform called
Retransform end
# An unexpected error has been detected by Java Runtime Environment:
#  Internal Error (53484152454432554E54494D450E435050020F), pid=11055, tid=3050302384
# Java VM: Java HotSpot(TM) Client VM (1.6.0-b105 mixed mode, sharing)
# An error report file with more information is saved as /tmp/hs_err_pid11055.log
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

===== end e-mail extract =====

I've attached the ide-en.zip file that I used during this work:

-rw-rw-r--   1 dcubed   green    73933339 May 15  2007 ide-en.zip

I downloaded a NetBeans 6.0 release where NetBeans->Help->About
shows the following:

Product Version: NetBeans IDE 6.0 (Build 200711261600)
Java: 1.6.0_04; Java HotSpot(TM) Client VM 10.0-b19
System: SunOS version 5.10 running on sparc; ISO646-US; en (nb)
Userdir: /home/dcubed/.netbeans/6.0

This version has a slightly different crash than the others
that Tomas reported or that I've seen during my testing:

# An unexpected error has been detected by Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0xfc400740, pid=16936, tid=11
# Java VM: Java HotSpot(TM) Client VM (10.0-b19 mixed mode, sharing solaris-sparc)
# Problematic frame:
# v  ~BufferBlob::StubRoutines (1)

Here is a snippet of the crashing thread's stack trace:

  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [8] 0xfc400740(0x1, 0x154400, 0x154400, 0x154d30, 0x55f3fc, 0xfee2e000), at 0xfc400740
  [9] ObjectSynchronizer::fast_enter(0xfee3b339, 0xfb67f840, 0x154d30, 0xfb67f4f4, 0xfb67f854, 0x154400), at 0xfec9e8f4
  [10] ObjectLocker::ObjectLocker(0xfb67f838, 0xfb67f830, 0x154400, 0x1, 0x154d30, 0x154d30), at 0xfec9bf40
  [11] constantPoolOopDesc::klass_at_impl(0xfb67f8ac, 0x358, 0x154400, 0x23, 0xd852d918, 0x14), at 0xfe8e02c0
  [12] methodOopDesc::fast_exception_handler_bci_for(0x154d2c, 0xfb67f98c, 0x26, 0x154400, 0x154d28, 0x142), at 0xfe9473ac
  [13] InterpreterRuntime::exception_handler_for_exception(0x153528, 0x26, 0x154d18, 0x1544e4, 0x154400, 0x154d14), at 0xfe946c94
  [14] 0xfc40b698(0xd04b6590, 0x0, 0x1442c, 0xfc416be0, 0x2a880, 0xfb67fa60), at 0xfc40b698
  [15] 0xfc405a10(0xd04b6590, 0xd8c150b8, 0xfb67fbb4, 0xfc416a88, 0x30c8c, 0xfb67fae0), at 0xfc405a10
  [16] 0xfc405f20(0xfb67ffa0, 0xfee4242c, 0x14428, 0xfc416eb0, 0xd04c3738, 0xfb67fb58), at 0xfc405f20
  [17] 0xfc40021c(0xfb67fc40, 0xfb67fe98, 0xa, 0xd8c16730, 0xfc40bee0, 0xfb67fd98), at 0xfc40021c
  [18] JavaCalls::call_helper(0xfb67fe90, 0x154d10, 0xfb67fd90, 0x154400, 0x154d00, 0xfb67fc20), at 0xfe8deedc

I've attached the following:


This crash does not reproduce when my fixes for this bug (6497639)
and 6599425 are applied.



The fix for this bug (6497639) has been forward ported to
Dolphin/JDK7-B22, 160_10-baseline, and Dolphin/JDK7 main/baseline.
See 6497639-webrev-cr0-160_10.tgz and 6497639-webrev-cr0-170_main.tgz.

Please see the attached 6497639-webrev-cr0.tgz file for
the proposed fix. This proposed fix is relative to 1.6.0_04.

It looks like there are/were "just" four layers to this onion:

    layer 1 - 6530811 fixed by Jon M. (6u4-B02, Dolphin-B17)
    layer 2 - iterate over entire shared read-write space in GC
              phase-3 if a RedefineClasses() call has been made
    layer 3 - OopMapCache problem covered by 6599425
    layer 4 - bad oop in ConstantPoolCacheEntry caused by weak
              references to shared ConstantPools

The bad oop in the ConstantPoolCacheEntry appears to be
caused by a premature removal of the PreviousVersionNode
from the instanceKlass. The "old" ConstantPool is still
out there and valid, but is no longer present on the
previous version info list. It looks like the weak
reference to the ConstantPool is being collected even
though the ConstantPool is still out there. This looks
like another bad interaction between RedefineClasses()
and sharing.

Weak references are only kept alive when the underlying
object is found to be alive by another path. Shared
objects are a bit different. They are marked, but their
marks are cleared in GC phase-3 so they look like they
were never marked. I think this is what allows the weak
references to be collected. Once the weak reference is
gone, the PreviousVersionNode is deleted and we can no
longer update the ConstantPoolCache as needed when stuff
is redefined. This allows an oop to go bad in the cache.
If we're lucky we crash the next time GC processes the
ConstantPoolCacheEntry. If we're not lucky, we just
silently corrupt some memory because we're treating it
as an oop and it might be something else now.
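
The suspected sequence above can be illustrated with a toy model. This
is only a sketch of the hypothesis, not HotSpot code: the class names
(WeakMarkSketch, Obj, WeakRef) and the ordering of "clear shared marks"
before "examine weak references" are assumptions made for illustration.

```java
/** Toy model only; none of these names are HotSpot classes. */
public class WeakMarkSketch {
    /** A heap object with a GC mark bit. */
    static final class Obj {
        final String name;
        final boolean shared;   // true if the object lives in the shared space
        boolean marked;
        Obj(String name, boolean shared) {
            this.name = name;
            this.shared = shared;
        }
    }

    /** A weak reference: cleared when its referent looks unmarked. */
    static final class WeakRef {
        Obj referent;
        WeakRef(Obj referent) { this.referent = referent; }
    }

    /**
     * Runs one simplified collection cycle over a single object and
     * returns true if a weak reference to it survives.
     */
    static boolean weakRefSurvives(Obj target, boolean clearSharedMarksFirst) {
        WeakRef ref = new WeakRef(target);
        target.marked = true;              // object found alive by another path
        if (clearSharedMarksFirst && target.shared) {
            target.marked = false;         // shared marks reset: looks never marked
        }
        if (!ref.referent.marked) {
            ref.referent = null;           // referent looks dead, weak ref is cleared
        }
        return ref.referent != null;
    }

    public static void main(String[] args) {
        // Ordinary object: the mark is still set, so the weak ref survives.
        System.out.println(weakRefSurvives(new Obj("ordinary pool", false), true));
        // Shared object: the mark was reset, so the weak ref is cleared.
        System.out.println(weakRefSurvives(new Obj("shared pool", true), true));
    }
}
```

In this model the shared object is alive the whole time, yet its weak
reference is cleared, which corresponds to the PreviousVersionNode
being deleted while the "old" ConstantPool is still in use.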

I've completely rewritten the experimental fix for the
crash in comment #4. I've opted to try a sweep of the
shared read-write space when a RedefineClasses() call
has been made. This is the only way to be sure.

The experimental fix that I added to fix the crash in comment #4
was insufficient. While walking the previous version info, I was
only finding the EMCP methods; the obsolete methods were still
getting missed. I changed the algorithm to walk the stacks of the
JavaThreads instead and that caught both EMCP and obsolete methods.
During testing I did find one method in the HandleArea, but not on
the stack so I added logic to walk the JavaThread's HandleArea also.
Further testing revealed a missed ConstantPoolOop. While chasing
missed methodOops, I remember seeing some PreviousVersionNodes with
a live _prev_constant_pool, but an empty _prev_EMCP_methods. That
meant that all the methods had been GC'ed, but the constant pool
was still hanging around for some reason.

The bad OopMapCache failure mode is now covered by the following bug:

    6599425 4/3 OopMapCache::lookup() can cause later crash or assert() failure

I forgot to update this bug with the results of the investigation
into the crash reported by Tomas in comment #4. This investigation
was done in mid-July. The methodOop for "java.lang.Object.wait(J)V"
isn't found because it is now in the "previous version" area. So
it's not in the normal methodOop array or a constantPool cache; we
have a weak ref to it and that's it. As an experiment, I added code
to walk through the "previous version" weak refs of the accessible
instanceKlasses via the SystemDictionary and applied
RecursiveAdjustSharedObjectClosure to the previous version oops.

That experiment allowed the methodOop for "java.lang.Object.wait(J)V"
to be found and we made it past the crash in comment #4. However, we
hit another crash in what appears to be the next layer of the onion.

The crash reported by Tomas in comment #4 occurs because
"java.lang.Object.wait(J)V" is redefined when sharing is
enabled. When sharing is enabled, the oop is marked in
phase 1 by SystemDictionary::shared_oops_do() code path
via constantPoolCacheKlass::oop_follow_contents(). The
mark is reinitialized in phase 3 by the
SystemDictionary::shared_oops_do() code path via
instanceKlass::oop_oop_iterate_v() and a whole series of
oop_oop_iterate() calls that eventually finds the method
oop in an objArrayKlass::oop_oop_iterate_v() call.

After "java.lang.Object.wait(J)V" is redefined, the oop
is marked in phase 1 by Threads::oops_do() via an
nmethod::oops_do() call. The mark is not reinitialized
in phase 3 and that is what causes the assertion to fail
in phase 4. The big question is why isn't the mark
reinitialized in phase 3...
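
The phase 1 / phase 3 / phase 4 interaction described above can be
sketched as a small simulation. This is a toy model under stated
assumptions, not the HotSpot mark-compact code: SharedMarkSketch,
MethodObj and staleMarks() are hypothetical names, and "phase 3 resets
only what its walk visits" is a simplification of the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; none of these names are HotSpot classes. */
public class SharedMarkSketch {
    /** Stand-in for a method object living in the shared space. */
    static final class MethodObj {
        final String name;
        boolean marked;
        MethodObj(String name) { this.name = name; }
    }

    /**
     * Phase 1 marks every object in the shared space (via some root).
     * Phase 3 resets the mark only on the objects its walk visits.
     * Returns the names still marked when phase 4 would begin; a
     * non-empty result models the failing assertion.
     */
    static List<String> staleMarks(List<MethodObj> sharedSpace,
                                   List<MethodObj> visitedInPhase3) {
        for (MethodObj m : sharedSpace) {
            m.marked = true;
        }
        for (MethodObj m : visitedInPhase3) {
            m.marked = false;
        }
        List<String> stale = new ArrayList<>();
        for (MethodObj m : sharedSpace) {
            if (m.marked) {
                stale.add(m.name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        MethodObj waitMethod = new MethodObj("java.lang.Object.wait(J)V");
        MethodObj hashMethod = new MethodObj("java.lang.Object.hashCode()I");
        List<MethodObj> shared = Arrays.asList(waitMethod, hashMethod);

        // Before redefinition: the phase 3 walk reaches every shared object.
        System.out.println(staleMarks(shared, shared));

        // After redefinition: the old wait(J)V is reachable only from a
        // thread's nmethod, so the phase 3 walk no longer visits it.
        System.out.println(staleMarks(shared, Arrays.asList(hashMethod)));
    }
}
```

Passing the whole shared list as the phase 3 set models a sweep of the
entire shared read-write space, which leaves no stale marks no matter
how each object was reached in phase 1.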

I've been able to consistently reproduce the crash in comment #6.
I've tracked this particular failure mode to a bad oop in
SystemDictionary::_int_mirror. Note that spelling. For a while I
was chasing the bug by watching SystemDictionary::_int_klass.
So while I know that I have a bad oop, I have no idea (yet) how
it went bad.

Since there is more than one failure mode, I can't quite change
the status to "cause known" yet...

Update: the failure mode in comment #6 is being tracked by 6530811.

Running the profiled application with -Xshare:off seems to be a workaround.
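
For the test case in this report, that would mean starting the demo
with "java -Xshare:off -jar Java2Demo.jar". A quick way to check
whether a given JVM is running with sharing is sketched below. It
assumes that the java.vm.info system property carries the same
"mixed mode, sharing" text that appears in the crash banners above;
SharingCheck and reportsSharing() are hypothetical names.

```java
/** Minimal sketch: report whether the running JVM mentions sharing. */
public class SharingCheck {
    /**
     * Returns true if a VM info string such as "mixed mode, sharing"
     * mentions sharing. The string format is an assumption based on
     * the crash banners in this report.
     */
    static boolean reportsSharing(String vmInfo) {
        return vmInfo != null && vmInfo.contains("sharing");
    }

    public static void main(String[] args) {
        String vmInfo = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + vmInfo);
        System.out.println("sharing reported: " + reportsSharing(vmInfo));
    }
}
```

If the assumption holds, running this class with and without
-Xshare:off should show whether the workaround is actually in effect
for the profiled application's JVM.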
