JDK-4770828 : Loop related Hotspot crash
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 1.4.1,1.4.1_01,1.4.2
  • Priority: P1
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_2.4,solaris_2.6,solaris_8
  • CPU: generic,sparc
  • Submitted: 2002-10-29
  • Updated: 2004-04-16
  • Resolved: 2002-12-11
Release   Version    Build   Status
Other     1.3.1_12   12      Fixed
Other     1.4.1_03           Fixed
Other     1.4.2              Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
The test case from HP (see below) reveals a HotSpot server VM crash related to loops. The crash can be avoided by setting LoopUnrollLimit=0. The test case consists of a huge jar. I will work with HP to try to obtain the source and produce a smaller test case. The test doesn't fail with the client compiler.

The crash dump (reproducible with the latest mantis build: 1.4.2-beta-b04):
#
# HotSpot Virtual Machine Error, Internal Error
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.2-beta-b04 mixed mode)
#
# Error ID: 53484152454432554E54494D450E435050018D 01
#
# Problematic Thread: prio=5 tid=0x0002deb8 nid=0x1 runnable 
#

TEST CASE TO REPRODUCE:

0. cd /net/knight1/export/tmp/hp_loop_bug/

1. Use ksh or bash

2. "source setenv.sh"

3. run myant

4. after a couple of minutes a crash will happen

====================================

HP's customer is not comfortable giving the source to Sun. So, I won't be able to deliver a micro test case.

###@###.### 2002-11-01

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.3.1_12, 1.4.1_03, generic, mantis-beta
FIXED IN: 1.3.1_12, 1.4.1_03, mantis-beta
INTEGRATED IN: 1.3.1_12, 1.4.1_03, mantis-b10, mantis-beta
14-06-2004

SUGGESTED FIX

------- memnode.cpp -------
152a153,204
> //---------------------------cast_is_loop_invariant----------------------------
> // A helper function for Ideal_DU_postCCP to check if a CastPP in a counted
> // loop is loop invariant. Make a quick traversal of Phi and CastPP nodes
> // looking to see if they are a closed group within the loop.
> static bool cast_is_loop_invariant(Node* cast, Node* adr) {
>   // The idea is that the phi-nest must boil down to only CastPP nodes
>   // with the same data input as the original cast. This implies that any path
>   // into the loop already includes such a CastPP, and so the original cast,
>   // whatever its input, must be covered by an equivalent cast, with an earlier
>   // control input.
>   ResourceMark rm;
>   Unique_Node_List closure;
>   closure.push(cast);
>   closure.push(adr);
>
>   Unique_Node_List worklist;
>   worklist.push(adr->in(LoopNode::LoopBackControl));
>
>   int op = cast->Opcode();
>   Node* input = cast->in(1);
>
>   // Begin recursive walk of phi nodes.
>   while( worklist.size() ){
>     // Take a node off the worklist
>     Node *n = worklist.pop();
>     if( !closure.member(n) ){
>       // Add it to the closure.
>       closure.push(n);
>       // Make a sanity check to ensure we don't waste too much time here.
>       if( closure.size() > 10) return false;
>       // This node is OK if:
>       // - it is a cast identical (based on opcode and input) to the original one
>       // - or it is a phi node (then we add its inputs to the worklist)
>       // Otherwise, the node is not OK, and we presume the cast is not invariant
>       if( n->Opcode() == op ) {
>         if( n->in(1) != input ){
>           return false;
>         }
>       } else if( n->Opcode() == Op_Phi ) {
>         for( uint i = 1; i < n->req(); i++ ) {
>           worklist.push(n->in(i));
>         }
>       } else {
>         return false;
>       }
>     }
>   }
>
>   // Quit when the worklist is empty, and we've found no offending nodes.
>   return true;
> }
>
169a222
>   Node *elided_cast = NULL;
184a238,241
>       // Remember the cast that we've peeked though. If we peek
>       // through more than one, then we end up remembering the highest
>       // one, that is, if in a loop, the one closest to the top.
>       elided_cast = adr;
196c253
<       // We can float above a Phi to some dominating point
---
>       // Attempt to float above a Phi to some dominating point.
198,199c255,262
<         adr = adr->in(1);
<         continue;
---
>         // If we've already peeked through a CastPP (which could have set the
>         // control), we can't float above a Phi, because the ignored CastPP
>         // may not be loop invariant.
>         if( elided_cast == NULL ||
>             cast_is_loop_invariant(elided_cast, adr)) {
>           adr = adr->in(1);
>           continue;
>         }

------- includeDB_compiler2 -------
791a792
> memnode.cpp                             loopnode.hpp

###@###.### 2002-11-26
26-11-2002
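
The worklist walk in the suggested fix can be tried outside the VM on a toy graph. The following is a minimal sketch, not HotSpot code: ToyNode, Op, and toy_cast_is_loop_invariant are hypothetical stand-ins for Node, the opcode constants, and cast_is_loop_invariant, and the back-edge input index of 2 is an assumption about the value of LoopNode::LoopBackControl.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy stand-ins for C2 IR nodes. These types are hypothetical, not HotSpot's.
enum class Op { Phi, CastPP, Other };

struct ToyNode {
    Op op;
    std::vector<const ToyNode*> in;  // in[0] plays the role of the control slot
};

// Same shape as the suggested helper: starting from the phi's back-edge
// input, every node reached must be either a Phi (whose data inputs are then
// visited) or a cast with the same opcode and data input as the original.
// back_edge_idx = 2 is an assumed value for LoopNode::LoopBackControl.
bool toy_cast_is_loop_invariant(const ToyNode* cast, const ToyNode* phi,
                                std::size_t back_edge_idx = 2,
                                std::size_t max_closure = 10) {
    assert(cast->in.size() > 1 && phi->in.size() > back_edge_idx);
    std::unordered_set<const ToyNode*> closure{cast, phi};
    std::vector<const ToyNode*> worklist{phi->in[back_edge_idx]};
    const ToyNode* input = cast->in[1];

    while (!worklist.empty()) {
        const ToyNode* n = worklist.back();
        worklist.pop_back();
        if (!closure.insert(n).second) continue;         // already visited
        if (closure.size() > max_closure) return false;  // walk got too large
        if (n->op == cast->op) {
            // A cast is acceptable only if it wraps the same data input.
            if (n->in.size() < 2 || n->in[1] != input) return false;
        } else if (n->op == Op::Phi) {
            // Skip slot 0 (control) and visit the data inputs.
            for (std::size_t i = 1; i < n->in.size(); ++i)
                worklist.push_back(n->in[i]);
        } else {
            return false;  // any other producer: presume not invariant
        }
    }
    return true;  // worklist drained without finding an offending node
}
```

With a phi whose back edge is a second CastPP of the same input, the sketch reports the cast as invariant; replacing that back edge with a cast of a different input, or with any non-cast, non-phi node, makes it report false, which is the case where the patch refuses to float the load above the Phi.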

EVALUATION

Loop unrolling is missing a loop-carried dependency between a test against null and a load.

###@###.### 2002-11-07

In the program provided, unrolling a loop causes bad code to be generated because a loop-carried dependency is lost. The load at the top of a loop is mistakenly moved outside a loop iteration because the PostCCP pass presumes a CastPP to be loop invariant. When the loop is unrolled, there is no control to pin the load in the current iteration, allowing it to float above the previous iteration's load, on which it depends.

In Ideal_DU_postCCP, the server compiler removes CastPP nodes and adjusts the control input of memnodes. If a CastPP is elided, C2 needs to be vigilant about not moving the associated load out of a [counted] loop unless it can be determined that the CastPP is loop invariant.

We could simply skip the loop-invariance check and disallow all CastPP-covered loads from floating out of a loop. However, this causes a few loops in Spec to suffer code degradation. The invariance check allows Spec code to remain unchanged, and during a CTW test of rt.jar, only 5 loads in 3 loops failed this additional check.

###@###.### 2002-11-22
22-11-2002
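
The failing customer code was not available, so the following is only a hypothetical sketch of the kind of loop the evaluation describes: each iteration tests a pointer against null and then loads through it, and the address for the next iteration's load comes from the current one. Link, sum_chain, and sum_chain_unrolled are made-up names for illustration. The hand-unrolled version shows the ordering a correct unrolling has to keep.

```cpp
#include <cassert>

// Hypothetical linked structure; not taken from the failing program.
struct Link {
    int value;
    const Link* next;
};

// Each iteration tests p against null, then loads through p. The pointer
// used in iteration k+1 is produced by the load of p->next in iteration k.
int sum_chain(const Link* p) {
    int total = 0;
    while (p != nullptr) {
        total += p->value;
        p = p->next;
    }
    return total;
}

// Hand-unrolled by two. The second copy of the body keeps its own null test,
// so its loads cannot run before the first copy has produced the new p.
int sum_chain_unrolled(const Link* p) {
    int total = 0;
    while (p != nullptr) {
        total += p->value;
        p = p->next;
        if (p == nullptr) break;  // the test that pins the second set of loads
        assert(p != nullptr);     // invariant the second copy relies on
        total += p->value;
        p = p->next;
    }
    return total;
}
```

If the inner null test were dropped, or the second pair of loads were allowed to move above the first, the unrolled loop would read through a stale or null pointer on chains of odd length. That is the same kind of ordering constraint the evaluation says was lost when the load's control was removed.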

WORK AROUND

Create a file named .hotspot_compiler in the current directory, and put the following line in the file:

exclude weblogic/xml/babel/scanner/Trie get

###@###.### 2002-11-05
05-11-2002