JDK-4738326 : VM crashes while running an application
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.3.1_04,1.4.2
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,windows_2000
  • CPU: generic,x86
  • Submitted: 2002-08-28
  • Updated: 2012-10-08
  • Resolved: 2003-01-09
The Version table lists the releases in which this issue/RFE is or will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

  • Other: 1.3.1_08 (08), Fixed
  • Other: 1.4.1_03, Fixed
  • Other: 1.4.2, Fixed
Description
We have a relatively large Java application that uses Java 2D and
CORBA (OpenORB 1.1.0). There are two CORBA connections: one to a CORBA server
that serves data from a database, and one to a C++ CORBA server
that runs on a computational server. Some of the results from the
computational server can get large (about 10 MB, about every minute). The application may talk to the computational server for a long time. Lately, the Java application crashes after running for some time, between 20 minutes and 12 hours.

The problem occurs on Win2k with 1.3.1 and can be reproduced with 1.3.1_04.
We removed the -Dsun.java2d.noddraw=true flag, then ran the test case with the
"-XX:+ShowMessageBoxOnError" flag and got the attached stack dumps.

We have access to the customer's test application. Please contact me for setup instructions.

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: 1.3.1_08 1.4.1_03 mantis-beta
FIXED IN: 1.3.1_08 1.4.1_03 mantis-beta
INTEGRATED IN: 1.3.1_08 1.4.1_03 mantis-beta
14-06-2004

WORK AROUND

No workaround except the following trivial one: don't call clone() on objects that have a finalize method (override).

==========================================================================

Works with IBM's 1.3.0 JRE. The customer can't move to our JDK 1.4: "We use Sitraka's JClass 4.5.1 components and they do not seem compliant with Java 1.4."
11-06-2004

PUBLIC COMMENTS

Bug in JVM_Clone()
10-06-2004

SUGGESTED FIX

###@###.### 2002-12-17:

card-mark before potential java up-call for finalizer registration; diffs for 1.4.2 below:

------- jvm.cpp -------
428c428
< }
---
> }
442a443,448
> // Store check (mark entire object and let gc sort it out)
> BarrierSet* bs = Universe::heap()->barrier_set();
> bs->write_region(MemRegion((HeapWord*)new_obj, size));
>
> // Caution: this involves a java upcall, so the clone should be
> // "gc-robust" by this stage.
448,455d453
< // Store check (mark entire object and let gc sort it out)
< BarrierSet* bs = Universe::heap()->barrier_set();
< bs->write_region(MemRegion((HeapWord*)new_obj, size));
<
< /*
< assert(bs->kind() == BarrierSet::CCardTableModRef, "Wrong barrier set kind");
< ((CCardTableModRefBS*)bs)->inline_write_region(MemRegion((HeapWord*)new_obj, size));
< */

###@###.### 2002-12-21:

Fix put back to main/baseline, will be in mantis b12:

Original workspace: neeraja:/net/spot/archive02/ysr/clone
Parent workspace: /net/jano/export/disk05/hotspot/ws/main/baseline
Submitter: ysr
imgr data: /net/balvenie.sfbay/export/imgr_home/archive/main/baseline/2002/20021220112621.ysr.clone

Fixed: 4738326 VM crashes while running an application

Webrev: http://analemma.sfbay/net/spot/archive02/ysr/clone/webrev

The bug was in JVM_Clone: we first allocate storage for the clone, then do a raw copy and then, because the storage might be in old gen, do a card-mark to take care of intergenerational refs (or in the case of CMS, to make the clone "grey", so it's rescanned if appropriate). If the clone's class has a finalize method (override), we need to register the clone with the finalizer. Unfortunately the last step listed above was being done between the copying and the card-marking, and because finalizer registration involves executing Java code, a GC could sneak in. If the clone were in the old gen and held ref(s) to young gen object(s), this could cause the ref(s) to become dangling pointer(s).

The debugging was done by R. Anupam (JPSE India), and the fix is with his assistance. The fix has been sent to the customer for verification.

Note that the bug has been present in all versions of HotSpot since at least Ladybird. JPSE will fix in update releases on the anvil.

Reviewed by: Fred, Ken, Jane, John C, Jon, Ross
Risk of fix: low
Benefits: fixes escalated bug; removes one source of instability, heap corruption, hard-to-identify crashes
Approved by: the server committee

Other testing: (sparc,solaris,fastdebug,c1)
 . refworkload (default)
 . vtest
 . quicklook -full (c1)

passed linux i486 product SPECjvm98 GeoMean 36.54 57.84
passed linux i486 product1 SPECjvm98 GeoMean 44.86 50.05
passed linux i486 productcore SPECjvm98 GeoMean 98.83 98.83
passed linux ia64 productcore SPECjvm98 GeoMean 26.13 26.13
passed solaris i486 product SPECjvm98 GeoMean 37.45 55.26
passed solaris i486 product1 SPECjvm98 GeoMean 44.24 48.82
passed solaris i486 productcore SPECjvm98 GeoMean 98.75 98.75
passed solaris sparc product SPECjvm98 GeoMean 23.13 35.26
passed solaris sparc product1 SPECjvm98 GeoMean 23.84 26.27
passed solaris sparc productcore SPECjvm98 GeoMean 39.03 39.03
passed solaris sparcv9 product SPECjvm98 GeoMean 23.96 35.57
passed solaris sparcv9 productcore SPECjvm98 GeoMean 39.92 39.92
passed windows i486 compiler2 SPECjvm98 GeoMean 49.56 102.43
passed windows i486 compiler1 SPECjvm98 GeoMean 61.37 73.99
passed windows i486 core SPECjvm98 GeoMean 130.66 130.66
passed windows ia64 core SPECjvm98 GeoMean 16.05 16.05

END COMMENT

update: src/share/vm/prims/jvm.cpp
update: src/share/vm/runtime/globals.hpp

------------------------------------
Diffs for 1.3.1 are:

--- jvm.cpp Fri Jan 3 09:36:16 2003
***************
*** 315,327 ****
  oop(new_obj)->set_mark();
  }
  if (klass->has_finalizer() && RegisterFinalizers) {
  assert(obj->is_instance(), "should be instanceOop");
  new_obj = (oop*)instanceKlass::register_finalizer(instanceOop(new_obj), CHECK_0);
  }
- // Store check (mark entire object and let gc sort it out)
- RememberedSet::record_array_store(new_obj, new_obj+size);
  return JNIHandles::make_local(env, oop(new_obj));
  JVM_END
--- 315,328 ----
  oop(new_obj)->set_mark();
  }
+ // Store check (mark entire object and let gc sort it out)
+ RememberedSet::record_array_store(new_obj, new_obj+size);
+
  if (klass->has_finalizer() && RegisterFinalizers) {
  assert(obj->is_instance(), "should be instanceOop");
  new_obj = (oop*)instanceKlass::register_finalizer(instanceOop(new_obj), CHECK_0);
  }
  return JNIHandles::make_local(env, oop(new_obj));
  JVM_END

###@###.### 2003-01-03
03-01-2003
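
The ordering problem described in the suggested fix can be pictured with a small toy model. This is only a sketch under stated assumptions, not HotSpot code: ToyHeap, mark_region, scavenge and clone_keeps_young_refs_alive are hypothetical names, the card size of 4 slots is arbitrary, and the finalizer-registration upcall is reduced to "a scavenge happens here".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these names exist in HotSpot.
struct ToyHeap {
    static constexpr std::size_t kSlotsPerCard = 4;  // illustrative card size

    std::vector<int> old_slots;     // old-gen slots: young object id, or -1 for none
    std::vector<bool> dirty_cards;  // one dirty flag per card of old-gen slots
    std::vector<bool> young_live;   // young objects found reachable by the last scavenge

    ToyHeap(std::size_t old_size, std::size_t young_count)
        : old_slots(old_size, -1),
          dirty_cards((old_size + kSlotsPerCard - 1) / kSlotsPerCard, false),
          young_live(young_count, false) {}

    // Card-mark every card overlapping the slot range [begin, end).
    void mark_region(std::size_t begin, std::size_t end) {
        assert(begin <= end && end <= old_slots.size());
        for (std::size_t i = begin; i < end; ++i) {
            dirty_cards[i / kSlotsPerCard] = true;
        }
    }

    // Young collection: old-gen slots are scanned only if their card is dirty.
    void scavenge() {
        young_live.assign(young_live.size(), false);
        for (std::size_t i = 0; i < old_slots.size(); ++i) {
            if (dirty_cards[i / kSlotsPerCard] && old_slots[i] >= 0) {
                young_live[static_cast<std::size_t>(old_slots[i])] = true;
            }
        }
    }
};

// Simulates a clone placed in old gen whose fields refer to young objects.
// Returns true if both young objects survive a GC that runs during the
// (simulated) finalizer-registration upcall.
bool clone_keeps_young_refs_alive(bool mark_before_upcall) {
    ToyHeap heap(8, 2);
    const std::size_t begin = 4, end = 6;  // the clone occupies old-gen slots 4 and 5

    // Raw copy: the clone now refers to young objects 0 and 1, no barrier yet.
    heap.old_slots[4] = 0;
    heap.old_slots[5] = 1;

    if (mark_before_upcall) {
        heap.mark_region(begin, end);  // fixed order: card-mark first
        heap.scavenge();               // GC during finalizer registration
    } else {
        heap.scavenge();               // buggy order: GC sees clean cards
        heap.mark_region(begin, end);  // too late, the refs were already missed
    }
    return heap.young_live[0] && heap.young_live[1];
}
```

In this model, calling clone_keeps_young_refs_alive(false) mirrors the old ordering (the scavenge never sees the clone's refs), while clone_keeps_young_refs_alive(true) mirrors the ordering after the fix.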

EVALUATION

JPSE team will handle this bug.
###@###.### 2002-08-29

--------------------------------------------
###@###.### 2002-09-26

Got proper dumps with 1.3.1_05. The failure is occurring on the VM thread while it is doing a GC operation. It happens in the old generation, when doing the scavenge for RememberedSet. This could be because of write barriers not being generated properly, a problem when a GC happened earlier (purely a GC bug), or an OopMap wrongly generated by c1.

Here is the VM thread stack trace. Ignore frame 00: it is wrong because the method call in frame 01 got a bad value for Klass from Klass* klass = blueprint(); and windbg just matched the pointer to the nearest symbol in jvm.pdb (not_ok_msg).

00 jvm!InstructionFilter::not_ok_msg()+0xa
01 jvm!oopDesc::size()+0x1c
02 jvm!oopDesc::copy_to_survivor_space(oopDesc** from = 184489c8)+0x12
03 jvm!Scavenge::scavenge_tenured_oop(oopDesc** p = 184489c8)+0x30
04 jvm!instanceKlass::oop_scavenge_contents(oopDesc* obj = 18448998, oopDesc** bottom = 18448800, oopDesc** top = 18448c00)+0x100
05 jvm!oopDesc::scavenge_contents(oopDesc** bottom = 18448800, oopDesc** top = 18448c00)+0x3e
06 jvm!RememberedSet::scavenge_contents(OldSpace* sp = 0023aa50)+0x341
07 jvm!OldSpace::scavenge_recorded_stores()+0xf
08 jvm!OneSpaceOldGeneration::scavenge_recorded_stores()+0x12
09 jvm!Scavenge::invoke_at_safepoint(int size_to_be_allocated = 0x104, long deferred = 0, long* notify_ref_lock = 3df1fad8)+0x4e5
0a jvm!VM_Scavenge::doit()+0x1b
0b jvm!VM_Operation::evaluate()+0x33
0c jvm!VMThread::evaluate_operation(VM_Operation* op = 3df1faac)+0x19
0d jvm!VMThread::loop()+0x1b5
0e jvm!VMThread::run()+0x6d
0f jvm!_start(Thread* thread = 009398f8)+0x1a

We need to verify the state of the heap before and after scavenge. I need output from a run with these flags:

-XX:+VerifyBeforeScavenge -XX:+VerifyAfterScavenge

These flags won't work with a product build. They work with either a debug or an optimized build.
I have done an optimized build and put it at this location:

/net/jlab113.india/export/jpse/ar118872/binaries/jvm.dll

--------------------------------------------------------------
###@###.### 2002-11-13

Looking at the dumps and getting opinions from Ken & Ramakrishna, it's established that this is a missing card mark problem during a young gen pointer store into an old gen object. The card mark is missed in compiled code, as the problem doesn't reproduce with -Xint. It's also been observed that:

1. Most of the threads have the same stack state in all the crash instances.
2. The old gen object (unmarked card) is always of the same class, sun.awt.windows.WGraphics, whereas only the referred young gen object varies.

So a suggested workaround is to run with -XX:-UseCardMarks until we come up with the final solution. -UseCardMarks should do away with the use of the card table during scavenge and thus allow the application to run without the crash, though it will slow down the GC.

###@###.### 2002-11-22

-------------------------------------

Analysis right now is based on narrowing down to the wrongly compiled method. Comparing outputs from 2 different runs with the scavenge flags:

1. Both had two Rectangle type references inside WGraphics pointing into young gen, one of which was reported as missing a card mark (probably both are).
2. The Rectangle type references are 24 bytes apart in young gen in both runs. Probably they were set into the old gen object in the same compiled method. Looking backwards in the PrintCompilation log, plus the customer's source file where they set 2 new Rectangle references in a Graphics object, should help narrow it down.
3. It's also been seen that the first run had 3 young gen refs, AffineTransform (unmarked), Rectangle, Rectangle (all towards the end of young gen), whereas the second one had Rectangle (unmarked), Rectangle. In the second case AffineTransform lies in old space near the referring object (WGraphics).

State of heap in 1st run is:

Heap
 new generation total 2304K, used 2303K [0x10010000, 0x10290000)
  eden space 2048K, 100% used [0x10010000,0x10210000,0x10210000)
  from space 256K, 99% used [0x10250000,0x1028ff98,0x10290000)
  to space 256K, 0% used [0x10210000,0x10210000,0x10250000)
 tenured generation total 198304K, used 155237K [0x10290000, 0x35810000)
  old space 198304K, 78% used [0x10290000,0x19a294e0,0x1c438000)
 permanent generation total 15360K, used 15125K [0x35810000, 0x37810000)
  old space 15360K, 98% used [0x35810000,0x366d5490,0x36710000)

- the referred object (0x19a29438->0x102535c0): a 'java/awt/Rectangle'
  'java/awt/geom/AffineTransform' (195f0dd8)
  'java/awt/Rectangle' (102535c0) unmarked
  'java/awt/Rectangle' (102535d8)

State of heap in 2nd run:

Heap
 new generation total 2304K, used 2304K [0x10010000, 0x10290000)
  eden space 2048K, 100% used [0x10010000,0x10210000,0x10210000)
  from space 256K, 100% used [0x10250000,0x10290000,0x10290000)
  to space 256K, 0% used [0x10210000,0x10210000,0x10250000)
 tenured generation total 153284K, used 111484K [0x10290000, 0x35810000)
  old space 153284K, 72% used [0x10290000,0x16f6f3d0,0x19841000)
 permanent generation total 15104K, used 15079K [0x35810000, 0x37810000)
  old space 15104K, 99% used [0x35810000,0x366c9c60,0x366d0000)

- the referred object (0x16f6f30c->0x1028ff88): a 'java/awt/geom/AffineTransform'
  'java/awt/geom/AffineTransform' (1028ff88)
  'java/awt/Rectangle' (1028ffc8)
  'java/awt/Rectangle' (1028ffe0)

After looking at the customer's sources I am sending them some methods for disabling compilation.

###@###.### 2002-11-22
22-11-2002
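
The condition diagnosed in the evaluation, an old gen object holding a young gen reference on a card that was never marked, can be expressed as a simple check over a toy heap. The sketch below is an assumption-laden illustration and not the VM's verification code: ToyOldGen and find_unmarked_young_refs are hypothetical names, references are modelled as integer ids, and the card size is a free parameter.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy structure, not a HotSpot type.
struct ToyOldGen {
    std::vector<int> slots;         // young object id held by each slot, or -1 for none
    std::vector<bool> dirty_cards;  // one dirty flag per card
    std::size_t slots_per_card;     // illustrative card size, in slots
};

// Returns the indices of old-gen slots that refer to a young object while
// sitting on a clean card, i.e. the stores a card-table scavenge would miss.
std::vector<std::size_t> find_unmarked_young_refs(const ToyOldGen& gen) {
    std::vector<std::size_t> missing;
    for (std::size_t i = 0; i < gen.slots.size(); ++i) {
        const bool holds_young_ref = gen.slots[i] >= 0;
        const bool card_is_dirty = gen.dirty_cards[i / gen.slots_per_card];
        if (holds_young_ref && !card_is_dirty) {
            missing.push_back(i);
        }
    }
    return missing;
}
```

For a generation with slots {-1, 3, -1, -1, 7, -1, -1, -1}, cards {clean, dirty} and 4 slots per card, only slot 1 is reported: slot 4 also holds a young reference, but its card is dirty and so a scavenge would scan it.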