JDK-6415406 : ATG client crash with fastdebug build at methodOopDesc::bci_from
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 5.0,5.0u6,5.0u24-rev,6
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS:
    generic,linux,linux_redhat_4.0,solaris,windows_xp,windows_2008,windows_vista
  • CPU: generic,x86,sparc
  • Submitted: 2006-04-19
  • Updated: 2012-02-01
  • Resolved: 2006-07-20
JDK 6
6 b92: Fixed
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Description
What failed : Test Application's client process.

HW : 2x900MHz PIII w/ RHEL 4
Flags : -server  -Xincgc  
Logs : /bt/atgrun.1098.-server 
Hostname : jtg-linux7.sfbay   please contact submitter for root password.
Build : 20060413162810.dh198349.rt_b81_merge-debug 

We were able to reproduce this failure in less than a day.
The live process is still available for someone to examine.

Error message :
------------------------------------------------------------------------------

Unexpected Error
------------------------------------------------------------------------------
Internal Error at methodOop.cpp:151, pid=22594, tid=2865122224

Do you want to debug the problem?

To debug, run 'gdb /proc/22594/exe 22594'; then switch to thread -1429845072
Enter 'yes' to launch gdb automatically (PATH must include gdb)
Otherwise, press RETURN to abort...
=============================================================================

backtrace :

(gdb) thread 53
[Switching to thread 53 (Thread -1429845072 (LWP 22657))]#0  0x001847a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x001847a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0022ad56 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x0022ab53 in sleep () from /lib/tls/libc.so.6
#3  0x06767f64 in os::message_box () from /usr/j2se/jre/lib/i386/server/libjvm.so
#4  0x068de5b6 in VMError::show_message_box ()
   from /usr/j2se/jre/lib/i386/server/libjvm.so
#5  0x068de10e in VMError::report_and_die ()
   from /usr/j2se/jre/lib/i386/server/libjvm.so
#6  0x063fdfe5 in report_assertion_failure ()
   from /usr/j2se/jre/lib/i386/server/libjvm.so
#7  0x06724615 in methodOopDesc::bci_from ()
   from /usr/j2se/jre/lib/i386/server/libjvm.so
#8  0x064d99a6 in InterpreterRuntime::note_trap ()
   from /usr/j2se/jre/lib/i386/server/libjvm.so
#9  0x064da143 in InterpreterRuntime::create_exception ()
   from /usr/j2se/jre/lib/i386/server/libjvm.so
#10 0xb4f2a098 in ?? ()
#11 0x081e0800 in ?? ()
#12 0x06985c40 in IndexSetIterator::_second_bit ()
   from /usr/j2se/jre/lib/i386/server/libjvm.so
#13 0x00000000 in ?? ()

The following is an earlier crash on this machine with the same application, build, and flags.

Error message :
------------------------------------------------------------------------------

Unexpected Error
------------------------------------------------------------------------------
Internal Error at concurrentMarkSweepGeneration.cpp:4091, pid=22848, tid=2890685360

Do you want to debug the problem?

To debug, run 'gdb /proc/22848/exe 22848'; then switch to thread -1404281936
Enter 'yes' to launch gdb automatically (PATH must include gdb)
Otherwise, press RETURN to abort...
==============================================================================

(gdb) thread 111
[Switching to thread 111 (Thread -1404281936 (LWP 22852))]#0  0x001847a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) where
#0  0x001847a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0022ad56 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x0022ab53 in sleep () from /lib/tls/libc.so.6
#3  0x06767f64 in os::message_box () from /usr/j2se/jre/lib/i386/server/libjvm.so
#4  0x068de5b6 in VMError::show_message_box () from /usr/j2se/jre/lib/i386/server/libjvm.so
#5  0x068de10e in VMError::report_and_die () from /usr/j2se/jre/lib/i386/server/libjvm.so
#6  0x063fdfe5 in report_assertion_failure () from /usr/j2se/jre/lib/i386/server/libjvm.so
#7  0x063d04e2 in CMSCollector::sample_eden () from /usr/j2se/jre/lib/i386/server/libjvm.so
#8  0x063daf92 in ScanMarkedObjectsAgainCarefullyClosure::do_object_careful_m () from /usr/j2se/jre/lib/i386/server/libjvm.so
#9  0x0639ae6d in CompactibleFreeListSpace::object_iterate_careful_m () from /usr/j2se/jre/lib/i386/server/libjvm.so
#10 0x063d11c5 in CMSCollector::preclean_mod_union_table () from /usr/j2se/jre/lib/i386/server/libjvm.so
#11 0x063d08d9 in CMSCollector::preclean_work () from /usr/j2se/jre/lib/i386/server/libjvm.so
#12 0x063cfe43 in CMSCollector::preclean () from /usr/j2se/jre/lib/i386/server/libjvm.so
#13 0x063c94d2 in CMSCollector::collect_in_background () from /usr/j2se/jre/lib/i386/server/libjvm.so
#14 0x063e4ea0 in ConcurrentMarkSweepThread::run () from /usr/j2se/jre/lib/i386/server/libjvm.so
#15 0x067691a2 in java_start () from /usr/j2se/jre/lib/i386/server/libjvm.so
#16 0x003d73ae in start_thread () from /lib/tls/libpthread.so.0
#17 0x00267aee in clone () from /lib/tls/libc.so.6

Comments
WORK AROUND Do not use iCMS; use regular CMS instead.
18-10-2006

SUGGESTED FIX If and when this is backported to 5.0uX and 1.4.2_XX please also make sure to backport the fix for 6472335, otherwise you are apt to have issues.
25-09-2006

EVALUATION If and when this is backported to 5.0uX and 1.4.2_XX please also make sure to backport the fix for 6472335, otherwise you are apt to have issues.
25-09-2006

SUGGESTED FIX The following fix was putback to gc_baseline for b92 by Andrey:

Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
  (jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.east/prt-workspaces/20060713062300.ap159146.gc_baseline_sync/workspace
  (prt-web:/net/prt-web.east/prt-workspaces/20060713062300.ap159146.gc_baseline_sync/workspace)
User: ap159146

Comment:
---------------------------------------------------------
Job ID: 20060713062300.ap159146.gc_baseline_sync
Original workspace: spb-east:/scratch/users/ap159146/gc_baseline_sync
Submitter: ap159146
Archived data: /net/prt-data.east/archives/main/gc_baseline/2006/20060713062300.ap159146.gc_baseline_sync/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060713062300.ap159146.gc_baseline_sync/workspace/webrevs/webrev-2006.07.13/index.html

Fixed 6415406: ATG client crash with fastdebug build at methodOopDesc::bci_from
Fixed 6399567: iCMS: JumbleGC002 and LoadUnloadGC2 intermittently crash in nightly testing w/-server

webrev at: http://jruntime.east/~ap159146/6415406_6399567_upd_upd/

Reviewed by: Y. Srinivas Ramakrishna, John Coomes
Approved by: David Cox (Low Risk)
Fix verified (y/n): y
Verification testing:
- atg client on linux for more than three days with -Xincgc;
Other testing:
- PRT, GCOld, refworkload, runThese;

Details:
While debugging, I found that the 'eden_end' value passed to ContiguousSpace::par_allocate_impl(...) was less than top(). The 'end_val' passed to par_allocate_impl(...) was stale because of thread preemption: by the time pointer_delta(...) was evaluated, the actual values of _soft_end and _top were far from the "cached" end_val passed in as the parameter.

I also found a race on updating _soft_end in DefNewGeneration::allocate(...). The method attempts an allocation; if that fails, a new allocation limit is calculated from the current _top and the size of the allocation request, so allocation_limit_reached(...) can return either the stop limit or the end of the eden space. Suppose two threads attempt to allocate N bytes (top + N < stop limit) and M bytes (top + M > stop limit), where top + N < soft end and top + M < soft end. If the second thread wins the race, soft end becomes equal to the end of the eden space and the allocation of M bytes can succeed, so the top pointer can exceed the stop limit. The first thread then changes soft end to the stop limit, and we have the case where top points beyond the end.

About the fix. In general, the fix for the first problem looks like this:

    was:
        end_val = end();
        top_val = top();
    becomes:
        top_val = top();
        end_val = end();

The rule is that top() must be read before end(), because top() can never be greater than end(). If an update of _soft_end occurs between end_val = end(); and top_val = top(); then top() can also grow up to the new end(), and the condition top_val > end_val becomes true. To enforce the load order, I placed OrderAccess::loadload() after the read of top(). The fix is low-risk: it does not change the previous semantics, it only makes the code MT-safe.

Andrey

Files:
update: src/share/vm/memory/collectorPolicy.cpp
update: src/share/vm/memory/collectorPolicy.hpp
update: src/share/vm/memory/concurrentMarkSweepGeneration.cpp
update: src/share/vm/memory/defNewGeneration.cpp
update: src/share/vm/memory/defNewGeneration.inline.hpp
update: src/share/vm/memory/space.cpp
update: src/share/vm/memory/space.hpp

Examined files: 3865

Contents Summary:
      7 update
   3858 no action (unchanged)
14-07-2006
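
NOTE (illustration only, not the putback code): the read-ordering part of the fix described in the putback comment can be sketched in standalone C++ as follows. BumpSpace and its par_allocate are hypothetical names, not HotSpot's ContiguousSpace; an acquire load of top stands in for the OrderAccess::loadload() barrier, and the defensive check for end_val < top_val is an assumption added for this sketch. It covers only the first problem (a stale end value), not the race on updating _soft_end.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an eden space with a movable soft limit.
struct BumpSpace {
    std::atomic<std::uintptr_t> top;       // next free address
    std::atomic<std::uintptr_t> soft_end;  // current allocation limit; may be moved by other threads

    BumpSpace(std::uintptr_t bottom, std::uintptr_t limit)
        : top(bottom), soft_end(limit) {}

    // Returns the start of the allocated block, or 0 if the request does not fit.
    std::uintptr_t par_allocate(std::size_t size) {
        for (;;) {
            // Read top FIRST. The acquire load keeps the following load of
            // soft_end from being reordered ahead of it (the loadload() role).
            std::uintptr_t top_val = top.load(std::memory_order_acquire);
            std::uintptr_t end_val = soft_end.load(std::memory_order_acquire);

            // Assumption for this sketch: treat an inconsistent snapshot
            // (limit below top) as "does not fit" instead of computing a
            // wrapped-around size.
            if (end_val < top_val || end_val - top_val < size) {
                return 0;
            }

            // Bump the pointer; retry if another thread moved top first.
            std::uintptr_t new_top = top_val + size;
            if (top.compare_exchange_weak(top_val, new_top,
                                          std::memory_order_acq_rel)) {
                return top_val;
            }
        }
    }
};
```

With the loads in the opposite order, a thread preempted between them could compare a fresh top against an old, smaller limit, which is the stale end_val situation the comment describes.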

EVALUATION MT-unsafeness in slow path allocation in Eden in the presence of iCMS; see comments section.
17-06-2006