JDK-6497639 : Profiling Swing application caused JVM crash
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jvmti
  • Affected Version: 6
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2006-11-27
  • Updated: 2010-09-29
  • Resolved: 2008-04-18
Fixed in:
  • JDK 6: 6u6
  • JDK 7: 7
  • Other: hs10
This is a copy of the original report available here: http://www.netbeans.org/issues/show_bug.cgi?id=88776

I tried to attach to a Swing application several times, and every time it ended
with a Java error after a few seconds of profiling.

Both the IDE and the application run on JDK 1.6 b104, Linux i586.

Application's log:

# An unexpected error has been detected by Java Runtime Environment:
#  Internal Error (53484152454432554E54494D450E435050020F), pid=14600,
# Java VM: Java HotSpot(TM) Client VM (1.6.0-rc-b104 mixed mode, sharing)
# An error report file with more information is saved as hs_err_pid14600.log
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
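
The hexadecimal ID in the "Internal Error" line can be turned back into a source location. The sketch below assumes the encoding used by older HotSpot builds, which this report does not state: every byte except the last two is a file-name character stored with 0x20 subtracted, and the last two bytes are a big-endian line number. The class and method names are hypothetical.

```java
public class HotSpotErrorId {
    /**
     * Decodes an old-style HotSpot "Internal Error (<hex>)" ID.
     * Assumption: all bytes except the last two are file-name characters
     * stored as (char - 0x20); the last two bytes are a big-endian line number.
     */
    static String decode(String hex) {
        if (hex.length() % 2 != 0 || hex.length() < 6) {
            throw new IllegalArgumentException("unexpected error ID length: " + hex.length());
        }
        int byteCount = hex.length() / 2;
        StringBuilder file = new StringBuilder();
        for (int i = 0; i < byteCount - 2; i++) {
            int b = Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            file.append((char) (b + 0x20)); // undo the assumed 0x20 offset
        }
        int line = Integer.parseInt(hex.substring(hex.length() - 4), 16);
        return file + ":" + line;
    }

    public static void main(String[] args) {
        // Error ID copied from the crash log above.
        System.out.println(decode("53484152454432554E54494D450E435050020F"));
        // prints "sharedRuntime.cpp:527"
    }
}
```

If the assumption holds, the failing check is in sharedRuntime.cpp of the b104 sources. The line number is only meaningful against that exact build.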
Here is an e-mail from Tomas on 2007.04.11 on how to reproduce
this failure mode:

===== begin e-mail extract =====

> Can you point me at the appropriate bits?
Ok, here is the test case:
1) download latest NetBeans IDE from http://smetiste.czech.sun.com/builds/netbeans/6.0/latest/symlinks/component/ide-en.zip
2) unzip ide-en.zip
3) run NetBeans using netbeans/bin/netbeans
(if you need to specify JDK, use --jdkhome switch)
4) run Java2Demo.jar in separate terminal
5) back in NetBeans go to Profile -> Attach Profile, click Attach
6) In Attach Wizard select 'Application' as Target Type and 'Attach Invocation' must be 'Dynamic (JDK 1.6)', click 'Next', click 'Finish'
7) Select Java2Demo in 'Select Process' panel and click 'Ok'
8) wait for JVM crash

Below is the console output from Java2Demo.jar

[th125165@dhcp-eprg05-75-104 Java2D]$ /usr/java/jdk1.6.0/bin/java -jar Java2Demo.jar
Profiler Agent: JNI On Load Initializing...
Profiler Agent: JNI OnLoad Initialized succesfully
Profiler Agent: Waiting for connection on port 5140 (Protocol version: 6)
Profiler Agent: Established local connection with the tool
cache_loade_classes, classes 2223
Retransform called
Retransform end
# An unexpected error has been detected by Java Runtime Environment:
#  Internal Error (53484152454432554E54494D450E435050020F), pid=11055, tid=3050302384
# Java VM: Java HotSpot(TM) Client VM (1.6.0-b105 mixed mode, sharing)
# An error report file with more information is saved as /tmp/hs_err_pid11055.log
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

===== end e-mail extract =====

I've attached the ide-en.zip file that I used during this work:

-rw-rw-r--   1 dcubed   green    73933339 May 15  2007 ide-en.zip
I downloaded a NetBeans 6.0 release where NetBeans->Help->About
shows the following:

Product Version: NetBeans IDE 6.0 (Build 200711261600)
Java: 1.6.0_04; Java HotSpot(TM) Client VM 10.0-b19
System: SunOS version 5.10 running on sparc; ISO646-US; en (nb)
Userdir: /home/dcubed/.netbeans/6.0

This version has a slightly different crash than the others
that Tomas reported or that I've seen during my testing:

# An unexpected error has been detected by Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0xfc400740, pid=16936, tid=11
# Java VM: Java HotSpot(TM) Client VM (10.0-b19 mixed mode, sharing solaris-sparc)
# Problematic frame:
# v  ~BufferBlob::StubRoutines (1)

Here is a snippet of the crashing thread's stack trace:

  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [8] 0xfc400740(0x1, 0x154400, 0x154400, 0x154d30, 0x55f3fc, 0xfee2e000), at 0xfc400740
  [9] ObjectSynchronizer::fast_enter(0xfee3b339, 0xfb67f840, 0x154d30, 0xfb67f4f4, 0xfb67f854, 0x154400), at 0xfec9e8f4
  [10] ObjectLocker::ObjectLocker(0xfb67f838, 0xfb67f830, 0x154400, 0x1, 0x154d30, 0x154d30), at 0xfec9bf40
  [11] constantPoolOopDesc::klass_at_impl(0xfb67f8ac, 0x358, 0x154400, 0x23, 0xd852d918, 0x14), at 0xfe8e02c0
  [12] methodOopDesc::fast_exception_handler_bci_for(0x154d2c, 0xfb67f98c, 0x26, 0x154400, 0x154d28, 0x142), at 0xfe9473ac
  [13] InterpreterRuntime::exception_handler_for_exception(0x153528, 0x26, 0x154d18, 0x1544e4, 0x154400, 0x154d14), at 0xfe946c94
  [14] 0xfc40b698(0xd04b6590, 0x0, 0x1442c, 0xfc416be0, 0x2a880, 0xfb67fa60), at 0xfc40b698
  [15] 0xfc405a10(0xd04b6590, 0xd8c150b8, 0xfb67fbb4, 0xfc416a88, 0x30c8c, 0xfb67fae0), at 0xfc405a10
  [16] 0xfc405f20(0xfb67ffa0, 0xfee4242c, 0x14428, 0xfc416eb0, 0xd04c3738, 0xfb67fb58), at 0xfc405f20
  [17] 0xfc40021c(0xfb67fc40, 0xfb67fe98, 0xa, 0xd8c16730, 0xfc40bee0, 0xfb67fd98), at 0xfc40021c
  [18] JavaCalls::call_helper(0xfb67fe90, 0x154d10, 0xfb67fd90, 0x154400, 0x154d00, 0xfb67fc20), at 0xfe8deedc

I've attached the following:


This crash does not reproduce when my fixes for this bug (6497639)
and 6599425 are applied.

SUGGESTED FIX The fix for this bug (6497639) has been forward ported to Dolphin/JDK7-B22, 160_10-baseline, and Dolphin/JDK7 main/baseline. See 6497639-webrev-cr0-160_10.tgz and 6497639-webrev-cr0-170_main.tgz.

SUGGESTED FIX Please see the attached 6497639-webrev-cr0.tgz file for the proposed fix. This proposed fix is relative to 1.6.0_04.

EVALUATION It looks like there are/were "just" four layers to this onion:
  layer 1 - 6530811, fixed by Jon M. (6u4-B02, Dolphin-B17)
  layer 2 - iterate over the entire shared read-write space in GC phase-3 if a RedefineClasses() call has been made
  layer 3 - OopMapCache problem covered by 6599425
  layer 4 - bad oop in ConstantPoolCacheEntry caused by weak references to shared ConstantPools

EVALUATION The bad oop in the ConstantPoolCacheEntry appears to be caused by a premature removal of the PreviousVersionNode from the instanceKlass. The "old" ConstantPool is still out there and valid, but is no longer present on the previous version info list. It looks like the weak reference to the ConstantPool is being collected even though the ConstantPool is still out there. This looks like another bad interaction between RedefineClasses() and sharing. Weak references are only kept alive when the underlying object is found to be alive by another path. Shared objects are a bit different. They are marked, but their marks are cleared in GC phase-3 so they look like they were never marked. I think this is what allows the weak references to be collected. Once the weak reference is gone, the PreviousVersionNode is deleted and we can no longer update the ConstantPoolCache as needed when stuff is redefined. This allows an oop to go bad in the cache. If we're lucky we crash the next time GC processes the ConstantPoolCacheEntry. If we're not lucky, we just silently corrupt some memory because we're treating it as an oop and it might be something else now.
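
The chain of events described above can be illustrated with a toy Java model. It is not HotSpot code: OldPool, PreviousVersions, and simulateBadCollection() are hypothetical names, and the explicit WeakReference.clear() call only stands in for a collector that wrongly concludes the referent is dead.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class PreviousVersionSketch {
    /** Hypothetical stand-in for an old ConstantPool and one cached entry. */
    static final class OldPool {
        String cachedMethod = "wait_v1";
    }

    /** Hypothetical stand-in for the previous version list of one class. */
    static final class PreviousVersions {
        private final List<WeakReference<OldPool>> nodes = new ArrayList<>();

        void add(OldPool pool) {
            nodes.add(new WeakReference<>(pool));
        }

        /** Stands in for a GC that wrongly concludes every referent is dead. */
        void simulateBadCollection() {
            for (WeakReference<OldPool> ref : nodes) {
                ref.clear();
            }
        }

        /** Deletes nodes whose weak reference was cleared; returns nodes left. */
        int prune() {
            nodes.removeIf(ref -> ref.get() == null);
            return nodes.size();
        }

        /** Patches every old pool still on the list; returns how many were patched. */
        int redefine(String newMethod) {
            int patched = 0;
            for (WeakReference<OldPool> ref : nodes) {
                OldPool pool = ref.get();
                if (pool != null) {
                    pool.cachedMethod = newMethod;
                    patched++;
                }
            }
            return patched;
        }
    }

    public static void main(String[] args) {
        OldPool stillInUse = new OldPool();   // strong reference: the pool is alive
        PreviousVersions versions = new PreviousVersions();
        versions.add(stillInUse);

        versions.simulateBadCollection();     // weak reference cleared anyway
        System.out.println("nodes left: " + versions.prune());
        System.out.println("patched: " + versions.redefine("wait_v2"));
        // The pool is still in use, but the update never reached it.
        System.out.println("cached entry: " + stillInUse.cachedMethod);
    }
}
```

In this model the old pool keeps its stale "wait_v1" entry after the redefinition, which corresponds to the oop going bad in the ConstantPoolCacheEntry.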

EVALUATION I've completely rewritten the experimental fix for the crash in comment #4. I've opted to try a sweep of the shared read-write space when a RedefineClasses() call has been made. This is the only way to be sure.

EVALUATION The experimental fix that I added to fix the crash in comment #4 was insufficient. While walking the previous version info, I was only finding the EMCP methods; the obsolete methods were still getting missed. I changed the algorithm to walk the stacks of the JavaThreads instead and that caught both EMCP and obsolete methods. During testing I did find one method in the HandleArea, but not on the stack so I added logic to walk the JavaThread's HandleArea also. Further testing revealed a missed ConstantPoolOop. While chasing missed methodOops, I remember seeing some PreviousVersionNodes with a live _prev_constant_pool, but an empty _prev_EMCP_methods. That meant that all the methods had been GC'ed, but the constantpool was still hanging around for some reason.

EVALUATION The bad OopMapCache failure mode is now covered by the following bug: 6599425 4/3 OopMapCache::lookup() can cause later crash or assert() failure

EVALUATION I forgot to update this bug with the results of the investigation into the crash reported by Tomas in comment #4. This investigation was done in mid-July. The methodOop for "java.lang.Object.wait(J)V" isn't found because it is now in the "previous version" area. So it's not in the normal methodOop array or a constantPool cache; we have a weak ref to it and that's it. As an experiment, I added code to walk through the "previous version" weak refs of the accessible instanceKlasses via the SystemDictionary and applied RecursiveAdjustSharedObjectClosure to the previous version oops. That experiment allowed the methodOop for "java.lang.Object.wait(J)V" to be found and we made it past the crash in comment #4. However, we hit another crash in what appears to be the next layer of the onion.

EVALUATION The crash reported by Tomas in comment #4 occurs because "java.lang.Object.wait(J)V" is redefined when sharing is enabled. When sharing is enabled, the oop is marked in phase 1 by the SystemDictionary::shared_oops_do() code path via constantPoolCacheKlass::oop_follow_contents(). The mark is reinitialized in phase 3 by the SystemDictionary::shared_oops_do() code path via instanceKlass::oop_oop_iterate_v() and a whole series of oop_oop_iterate() calls that eventually find the method oop in an objArrayKlass::oop_oop_iterate_v() call. After "java.lang.Object.wait(J)V" is redefined, the oop is marked in phase 1 by Threads::oops_do() via an nmethod::oops_do() call. The mark is not reinitialized in phase 3 and that is what causes the assertion to fail in phase 4. The big question is why isn't the mark reinitialized in phase 3...
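
The asymmetry between the two marking paths can be shown with a toy Java model. It is a simplification under stated assumptions, not HotSpot code: SharedOop, dictionaryPath, and threadPath are hypothetical names, phase 1 marks from both root sets, and phase 3 resets marks only along the dictionary walk.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedMarkSketch {
    /** Hypothetical stand-in for a method oop living in the shared space. */
    static final class SharedOop {
        final String name;
        boolean marked;
        SharedOop(String name) { this.name = name; }
    }

    /** Oops reached by the shared dictionary walk (visited in phases 1 and 3). */
    final List<SharedOop> dictionaryPath = new ArrayList<>();
    /** Oops reached only from thread roots such as compiled code (phase 1 only). */
    final List<SharedOop> threadPath = new ArrayList<>();

    /** Models redefinition: the old method leaves the normal dictionary path. */
    void redefine(SharedOop oop) {
        dictionaryPath.remove(oop);
        threadPath.add(oop);
    }

    /** Runs the modeled phases; returns names of oops still marked in phase 4. */
    List<String> collect() {
        // Phase 1: mark everything reachable from either root set.
        for (SharedOop o : dictionaryPath) o.marked = true;
        for (SharedOop o : threadPath) o.marked = true;
        // Phase 3: only the dictionary walk reinitializes marks in this model.
        for (SharedOop o : dictionaryPath) o.marked = false;
        // Phase 4: any shared oop that is still marked would trip the assertion.
        List<String> stale = new ArrayList<>();
        for (SharedOop o : dictionaryPath) if (o.marked) stale.add(o.name);
        for (SharedOop o : threadPath) if (o.marked) stale.add(o.name);
        return stale;
    }

    public static void main(String[] args) {
        SharedMarkSketch heap = new SharedMarkSketch();
        SharedOop wait = new SharedOop("java.lang.Object.wait(J)V");
        heap.dictionaryPath.add(wait);

        System.out.println("before redefine: " + heap.collect());
        heap.redefine(wait);
        System.out.println("after redefine:  " + heap.collect());
    }
}
```

Before the redefinition no mark survives to phase 4. Afterwards the old method is marked through the thread path only, so nothing resets its mark.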

EVALUATION I've been able to consistently reproduce the crash in comment #6. I've tracked this particular failure mode to a bad oop in SystemDictionary::_int_mirror. Note that spelling. For a while I was chasing the bug by watching SystemDictionary::_int_klass. So while I know that I have a bad oop, I have no idea (yet) how it went bad. Since there is more than one failure mode, I can't quite change the status to "cause known" yet... Update: the failure mode in comment #6 is being tracked by 6530811.

WORK AROUND Running the profiled application with -Xshare:off seems to be a workaround.
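
For the Java2Demo test case, this means starting the target with "java -Xshare:off -jar Java2Demo.jar" before attaching the profiler. As a minimal sketch, the target can also report which -Xshare mode it was started with. The SharingCheck class and shareMode() helper are hypothetical names. The code inspects only the JVM input arguments and assumes that the last -Xshare option wins when several are given.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class SharingCheck {
    /**
     * Returns the value of the last -Xshare:<mode> option in the list,
     * or "default" if the option is not present at all.
     */
    static String shareMode(List<String> jvmArgs) {
        String mode = "default";
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xshare:")) {
                mode = arg.substring("-Xshare:".length());
            }
        }
        return mode;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-Xshare mode: " + shareMode(jvmArgs));
    }
}
```

A result of "default" means only that no explicit option was passed. Whether sharing is then active depends on the VM and version.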