JDK-4985384 : Java 1.4.2_03 C2 Compiler Crash with SIGBUS on Solaris
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 1.4.2_03
  • Priority: P1
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2004-01-29
  • Updated: 2004-04-07
  • Resolved: 2004-03-08
Other
1.4.2_05 05 Fixed
Description
C2 Compiler crashes repeatedly with SIGBUS with the following stack:

*>   t@11  b l@11   _start()   signal SIGBUS in  build_loop_late()


=>[1] PhaseIdealLoop::build_loop_late(0x1012b8f88, 0xffffffff12efdd50, 0x0, 0xffffffff12efdbb0, 0x0, 0x10108ae11), at 0xffffffff7cd926f4
  [2] PhaseIdealLoop::PhaseIdealLoop(0x0, 0xffffffff12efe370, 0x0, 0x7800, 0x7b68, 0x0), at 0xffffffff7ce6b824
  [3] Compile::Optimize(0xffffffff12efeca0, 0xffffffff12efe8d0, 0xffffffff12efeb20, 0x1, 0x0, 0xffffffff12efeb29), at 0xffffffff7ce8ef88
  [4] Compile::Compile(0xffffffff12efe8d0, 0xffffffff12efe908, 0x0, 0x1017d6028, 0x226, 0x1), at 0xffffffff7cfd14b4
  [5] C2Compiler::compile_method(0x1008b1160, 0xffffffff12effb50, 0x0, 0x1017d6028, 0x226, 0x0), at 0xffffffff7cfa7928
  [6] CompileBroker::invoke_compiler_on_method(0x1, 0x226, 0xffffffff7d360ac8, 0x64, 0xffffffff7d35bce0, 0x1008b34d8), at 0xffffffff7ce8d4b0
  [7] CompileBroker::compiler_thread_loop(0xffffffff7d3610e0, 0x1008b33c0, 0x100fd5fc0, 0x100fd5f80, 0xffffffff7ced2b04, 0x0), at 0xffffffff7cf134f8
  [8] JavaThread::thread_main_inner(0x1008b33c0, 0x40, 0x0, 0x0, 0x0, 0x0), at 0xffffffff7ced2b2c
  [9] _start(0x1008b33c0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffffff7cecd694

Running with -XX:+ShowMessageBoxOnError causes no hs_err*log and no core file to be generated
(see hotspot/src/share/vm/runtime/os.cpp, void os::handle_unexpected_exception), so we
had to attach dbx to the crashing process and set a breakpoint on os::abort().

The pattern and call stack are reproducible.

Solaris: Solaris 9 12/03 s9s_u5wos_08b SPARC
Box: 

SunOS suncc13 5.9 Generic_112233-11 sun4u sparc SUNW,Sun-Fire-V240

suncc13:/ # psrinfo -v
Status of virtual processor 0 as of: 01/29/2004 15:47:01
  on-line since 01/28/2004 17:30:31.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 01/29/2004 15:47:01
  on-line since 01/28/2004 17:30:30.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.

Memory size: 4096 Megabytes

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.4.2_05 generic tiger-beta2
FIXED IN: 1.4.2_05 tiger-beta2
INTEGRATED IN: 1.4.2_05 tiger-b42 tiger-beta2
14-06-2004

EVALUATION

###@###.### 2004-02-02

Thank you, Matthias.

The code is fine. But according to the registers, %l1 holds the value 0x00000001018f7849, which is not aligned to 8 bytes. That is why the 'ldx' raises SIGBUS. %l1 is loaded from PhaseTransform::_nodes[], which the HotSpot server compiler uses internally to keep pointers to control and loop node descriptors. To distinguish control nodes from loop nodes, we set the lowest bit of the pointer to 1 for control nodes. The assembler code below corresponds to the method PhaseIdealLoop::get_loop():

  IdealLoopTree *get_loop( Node *n ) const {
    // Dead nodes have no loop, so return the top level loop instead
    if (!has_node(n))  return _ltree_root;
    assert(!has_ctrl(n), "");
    return (IdealLoopTree*)_nodes[n->_idx];
  }
  ...
  // Set/get control node out.  Set lower bit to distinguish from IdealLoopTree
  bool has_ctrl( Node *n ) const { return ((intptr_t)_nodes[n->_idx]) & 1; }

The assert guards get_loop() against being called on control nodes, but asserts are absent from the product build of the JVM. So please run a debug build to see whether this assert fires. I would also like to have the core file so that I can investigate it myself. I will add this evaluation to the bug report.

Thanks,
Vladimir

Matthias Schmidt wrote:
>
> Hi Rob,
>
> can you check which C++ source line translates to this operation:
>
> 0xffffffff7cd926f4: build_loop_late+0x0414:     ldx     [%l1 + 0x10], %g2
>
> giving these registers:
>
> (dbx) regs
> current thread: t@11
> current frame:  [1]
> g0-g1    0x0000000000000000 0x0000000000007800
> g2-g3    0x0000000101f08160 0x0000000000005b88
> g4-g5    0x0000000000000b71 0x0000000000002000
> g6-g7    0x0000000000000000 0xffffffff7e102800
> o0-o1    0x0000000000000001 0x0000000000000000
> o2-o3    0x00000001019b3448 0xffffffff12efdbb0
> o4-o5    0x0000000000000000 0x0000000101c1d3e0
> o6-o7    0xffffffff12efd251 0xffffffff7cd92644
> l0-l1    0xffffffff7cd94cd0 0x00000001018f7849
> l2-l3    0x00000001018e3900 0xffffffff12efdce8
> l4-l5    0x00000001019b3410 0xffffffff12efdbd0
> l6-l7    0xffffffff12efdd08 0xffffffff7d3774c0
> i0-i1    0x00000001019b34c8 0xffffffff12efdd50
> i2-i3    0x0000000000000000 0xffffffff12efdbb0
> i4-i5    0x0000000000000000 0x0000000101f32381
> i6-i7    0xffffffff12efd301 0xffffffff7ce6b824
> y        0x0000000040000000
> ccr      0x0000000000000099
> pc       0xffffffff7cd926f4:build_loop_late+0x414       ldx     [%l1 + 0x10], %g2
> npc      0xffffffff7cd92700:build_loop_late+0x420       add     %l1, 0x30, %o0
>

###@###.### 2004-02-20

There is an optimization that moves nodes out of a loop when they are used only outside it. The difficulty is keeping Load nodes from moving back inside the loop. A control edge to a node outside the loop is put on the loads (the fix for 4641526 in 2002) to stop them from returning to the loop during GVN optimization. Unfortunately, this (our 4985384) breaks the rule that an IfNode should have only 2 outs: IfTrueNode and IfFalseNode. Additional outs on an IfNode prevent a dead IfNode from being removed, since it still has out edges after IfTrue and IfFalse are gone. This leads to an incorrect graph and the problem we see. The code generator also expects only 2 outs on an IfNode. It treats additional outs as an optimization failure and marks the method as not compilable any more.
11-06-2004
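
The low-bit tagging and the alignment fault described in the evaluation above can be illustrated with a small stand-alone sketch. This is not HotSpot code: ToyNode, ToyLoopTree, tag_ctrl, untag_ctrl, has_ctrl_tag and ldx_aligned are hypothetical names chosen for illustration. Only the %l1 value and the 0x10 offset are taken from the register dump.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the compiler's node and loop-tree types.
struct ToyNode     { int idx; };
struct ToyLoopTree { int depth; };

// A table slot holds either a plain ToyLoopTree* (low bit clear)
// or a ToyNode* whose lowest bit has been set to mark "control".
inline void* tag_ctrl(ToyNode* ctrl) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(ctrl);
    assert((bits & 1u) == 0);  // real pointers to these types are at least 4-byte aligned
    return reinterpret_cast<void*>(bits | 1u);
}

inline bool has_ctrl_tag(const void* slot) {
    return (reinterpret_cast<std::uintptr_t>(slot) & 1u) != 0;
}

inline ToyNode* untag_ctrl(void* slot) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(slot);
    return reinterpret_cast<ToyNode*>(bits & ~static_cast<std::uintptr_t>(1));
}

// True when an 8-byte load (such as SPARC ldx) from base + offset is 8-byte aligned.
inline bool ldx_aligned(std::uint64_t base, std::uint64_t offset) {
    return ((base + offset) & 7u) == 0;
}

// Round trip: a tagged control slot is recognised and can be untagged again,
// while an untagged loop-tree slot is not mistaken for a control slot.
inline bool tag_round_trip_ok() {
    ToyNode ctrl{7};
    ToyLoopTree loop{1};
    void* ctrl_slot = tag_ctrl(&ctrl);
    void* loop_slot = &loop;
    return has_ctrl_tag(ctrl_slot)
        && !has_ctrl_tag(loop_slot)
        && untag_ctrl(ctrl_slot) == &ctrl;
}
```

With the dump's value, ldx_aligned(0x00000001018f7849, 0x10) is false because the tagged value is odd, which matches the SIGBUS on the ldx. A slot with the tag bit set is being dereferenced as if it were an untagged loop-tree pointer, the case the assert in get_loop() is meant to catch in debug builds.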

SUGGESTED FIX

###@###.### 2004-02-20

Don't use IfNode as a control edge for LoadNode.

###@###.### 2004-03-01

In the general case we have the same problem with any MultiNode, not just IfNode. Don't use a MultiNode as the control node when setting the control edge for a LoadNode.

Fix in src/share/vm/opto/loopopts.cpp:

642a643,659
> //------------------------------set_ctrl_for_Load------------------------------
> /*
>   A control edge to a CFG node outside of the loop is put on LoadNode clones
>   (fix for 4641526) to force them to not combine and return back inside the loop
>   during GVN optimization. But MultiNode should have only ProjNode as outs.
>   For example, IfNode should have only 2 outs: IfTrueNode and IfFalseNode.
>   Additional outs in IfNode don't allow to remove a dead IfNode (4985384).
>   Also final_graph_reshaping() checks IfNode explicitly for 2 outs.
> */
> inline static Node *set_ctrl_for_Load( Node *n, Node *ctrl ) {
>   if (ctrl != NULL && ctrl->is_Multi())
>     ctrl = ctrl->in(0);  // Use MultiNode's control edge instead
>   n->set_req(0, ctrl);
>   return ctrl;
> }
>
>
806c823,824
<         if( x->is_Load() ) x->set_req(0,u_ctrl); // force new use (outside of loops)
---
>         if( x->is_Load() )
>           u_ctrl = set_ctrl_for_Load(x, u_ctrl);
814c832,833
<         if( x->is_Load() ) x->set_req(0,u_ctrl); // force new use (outside of loops)
---
>         if( x->is_Load() )
>           u_ctrl = set_ctrl_for_Load(x, u_ctrl);
11-06-2004
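
To make the effect of the patch concrete, here is a minimal toy model of the out-edge count on an If node before and after the change. It is a sketch under stated assumptions, not the HotSpot graph classes: ToyNode, ToyGraph, set_ctrl_for_load, if_outs_with_old_code and if_outs_with_fix are hypothetical names, and only the control input (slot 0) and the list of users are modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy node (hypothetical): tracks one control input and the nodes that use it.
struct ToyNode {
    bool is_multi = false;        // stands in for Node::is_Multi()
    ToyNode* ctrl = nullptr;      // stands in for in(0)
    std::vector<ToyNode*> outs;   // users of this node

    // Stands in for set_req(0, c): rewire the control input and keep outs in sync.
    void set_ctrl(ToyNode* c) {
        if (ctrl != nullptr) {
            std::vector<ToyNode*>& o = ctrl->outs;
            o.erase(std::remove(o.begin(), o.end(), this), o.end());
        }
        ctrl = c;
        if (c != nullptr) c->outs.push_back(this);
    }
};

// Same shape as set_ctrl_for_Load in the patch: never hang a load on a multi node.
inline ToyNode* set_ctrl_for_load(ToyNode* load, ToyNode* ctrl) {
    if (ctrl != nullptr && ctrl->is_multi)
        ctrl = ctrl->ctrl;        // use the multi node's own control input instead
    load->set_ctrl(ctrl);
    return ctrl;
}

// region -> if_node -> { if_true, if_false }, plus a load that needs a control edge.
struct ToyGraph {
    ToyNode region, if_node, if_true, if_false, load;
    ToyGraph() {
        if_node.is_multi = true;
        if_node.set_ctrl(&region);
        if_true.set_ctrl(&if_node);
        if_false.set_ctrl(&if_node);
    }
};

// Old behaviour: the load is attached directly to the If node.
inline std::size_t if_outs_with_old_code() {
    ToyGraph g;
    g.load.set_ctrl(&g.if_node);
    return g.if_node.outs.size();
}

// Patched behaviour: the load is attached to the If node's control input.
inline std::size_t if_outs_with_fix() {
    ToyGraph g;
    ToyNode* used = set_ctrl_for_load(&g.load, &g.if_node);
    assert(used == &g.region);
    return g.if_node.outs.size();
}
```

In this model the old code leaves the If node with three outs (both projections plus the load), while the patched helper leaves it with the two projections only, which is the invariant the evaluation says dead-node removal and the code generator rely on.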