Bug ID: JDK-4985384 Java 1.4.2_03 C2 Compiler Crash with SIGBUS on Solaris

Details
Type:
Bug
Submit Date:
2004-01-29
Status:
Resolved
Updated Date:
2004-04-07
Project Name:
JDK
Resolved Date:
2004-03-08
Component:
hotspot
OS:
generic
Sub-Component:
compiler
CPU:
generic
Priority:
P1
Resolution:
Fixed
Affected Versions:
1.4.2_03
Fixed Versions:
1.4.2_05 (05)

Related Reports
Backport:
Relates:
Relates:

Sub Tasks

Description
C2 Compiler crashes repeatedly with SIGBUS with the following stack:

*>   t@11  b l@11   _start()   signal SIGBUS in  build_loop_late()


=>[1] PhaseIdealLoop::build_loop_late(0x1012b8f88, 0xffffffff12efdd50, 0x0, 0xffffffff12efdbb0, 0x0, 0x10108ae11), at 0xffffffff7cd926f4
  [2] PhaseIdealLoop::PhaseIdealLoop(0x0, 0xffffffff12efe370, 0x0, 0x7800, 0x7b68, 0x0), at 0xffffffff7ce6b824
  [3] Compile::Optimize(0xffffffff12efeca0, 0xffffffff12efe8d0, 0xffffffff12efeb20, 0x1, 0x0, 0xffffffff12efeb29), at 0xffffffff7ce8ef88
  [4] Compile::Compile(0xffffffff12efe8d0, 0xffffffff12efe908, 0x0, 0x1017d6028, 0x226, 0x1), at 0xffffffff7cfd14b4
  [5] C2Compiler::compile_method(0x1008b1160, 0xffffffff12effb50, 0x0, 0x1017d6028, 0x226, 0x0), at 0xffffffff7cfa7928
  [6] CompileBroker::invoke_compiler_on_method(0x1, 0x226, 0xffffffff7d360ac8, 0x64, 0xffffffff7d35bce0, 0x1008b34d8), at 0xffffffff7ce8d4b0
  [7] CompileBroker::compiler_thread_loop(0xffffffff7d3610e0, 0x1008b33c0, 0x100fd5fc0, 0x100fd5f80, 0xffffffff7ced2b04, 0x0), at 0xffffffff7cf134f8
  [8] JavaThread::thread_main_inner(0x1008b33c0, 0x40, 0x0, 0x0, 0x0, 0x0), at 0xffffffff7ced2b2c
  [9] _start(0x1008b33c0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffffff7cecd694

Running with -XX:+ShowMessageBoxOnError causes no hs_err*log and no core file to be generated
(see hotspot/src/share/vm/runtime/os.cpp, void os::handle_unexpected_exception), so we
had to attach dbx to the crashing process and set a breakpoint on os::abort().

The pattern and call stack are reproducible.

Solaris: Solaris 9 12/03 s9s_u5wos_08b SPARC
Box: 

SunOS suncc13 5.9 Generic_112233-11 sun4u sparc SUNW,Sun-Fire-V240

suncc13:/ # psrinfo -v
Status of virtual processor 0 as of: 01/29/2004 15:47:01
  on-line since 01/28/2004 17:30:31.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.
Status of virtual processor 1 as of: 01/29/2004 15:47:01
  on-line since 01/28/2004 17:30:30.
  The sparcv9 processor operates at 1002 MHz,
        and has a sparcv9 floating point processor.

Memory size: 4096 Megabytes

                                    

Comments
SUGGESTED FIX


###@###.### 2004-02-20

Don't use IfNode as a control edge for LoadNode.

###@###.### 2004-03-01

In the general case, the same problem occurs with any MultiNode, not just IfNode.
Don't use a MultiNode as the control node when setting the control edge for a LoadNode.

Fix in src/share/vm/opto/loopopts.cpp:

642a643,659
> //------------------------------set_ctrl_for_Load------------------------------
> /*
>  A control edge to a CFG node outside of the loop is put on LoadNode clones
>  (fix for 4641526) to force them to not combine and return back inside the loop 
>  during GVN optimization. But MultiNode should have only ProjNode as outs.
>  For example, IfNode should have only 2 outs: IfTrueNode and IfFalseNode.
>  Additional outs in IfNode don't allow to remove a dead IfNode (4985384). 
>  Also final_graph_reshaping() checks IfNode explicitly for 2 outs. 
> */
> inline static Node *set_ctrl_for_Load( Node *n, Node *ctrl ) {
>   if (ctrl != NULL && ctrl->is_Multi())
>     ctrl = ctrl->in(0); // Use MultiNode's control edge instead
>   n->set_req(0, ctrl);
>   return ctrl;
> }
> 
> 
806c823,824
<             if( x->is_Load() ) x->set_req(0,u_ctrl); // force new use (outside of loops)
---
>             if( x->is_Load() ) 
>               u_ctrl = set_ctrl_for_Load(x, u_ctrl);
814c832,833
<             if( x->is_Load() ) x->set_req(0,u_ctrl); // force new use (outside of loops)
---
>             if( x->is_Load() ) 
>               u_ctrl = set_ctrl_for_Load(x, u_ctrl);
                                     
2004-06-11
EVALUATION


###@###.### 2004-02-02

Thank you, Matthias

The code is fine. But according to the registers, %l1 holds the value
0x00000001018f7849, which is not aligned to 8 bytes.
This is why 'ldx' raises SIGBUS.

%l1 is loaded from PhaseTransform::_nodes[], which the hotspot server
compiler uses internally to keep pointers to control and loop
node descriptors. To distinguish control nodes from loop nodes, the
lowest bit of the pointer is set to 1 for control nodes.
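
The arithmetic behind this can be checked directly against the register dump.
The following is only an illustrative sketch, not HotSpot code: the helper names
(is_aligned8, has_ctrl_tag, strip_tag) are made up for this example, and the only
values taken from the report are the contents of %l1 and the 0x10 offset of the
faulting 'ldx'.

```cpp
#include <cstdint>

// Value of %l1 reported by dbx at the faulting instruction.
constexpr std::uint64_t kL1 = 0x00000001018f7849ULL;

// A 64-bit 'ldx' needs an address that is a multiple of 8.
constexpr bool is_aligned8(std::uint64_t addr) { return (addr & 7) == 0; }

// The lowest bit marks the entry as a control node (hypothetical helper name).
constexpr bool has_ctrl_tag(std::uint64_t entry) { return (entry & 1) != 0; }

// Clearing the tag bit recovers the untagged pointer value.
constexpr std::uint64_t strip_tag(std::uint64_t entry) {
  return entry & ~std::uint64_t{1};
}

// Effective address of 'ldx [%l1 + 0x10], %g2'.
constexpr std::uint64_t kEffective = kL1 + 0x10;

static_assert(has_ctrl_tag(kL1), "%l1 carries the control-node tag bit");
static_assert(!is_aligned8(kEffective), "tagged pointer + 0x10 is misaligned");
static_assert(is_aligned8(strip_tag(kL1) + 0x10), "untagged address is aligned");
```

In other words, the value in %l1 looks like a tagged control-node entry that was
used as if it were a plain loop-tree pointer.
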

The assembler code below corresponds to the method PhaseIdealLoop::get_loop():

  IdealLoopTree *get_loop( Node *n ) const {
    // Dead nodes have no loop, so return the top level loop instead
    if (!has_node(n))  return _ltree_root;
    assert(!has_ctrl(n), "");
    return (IdealLoopTree*)_nodes[n->_idx];
  }
...
  // Set/get control node out.  Set lower bit to distinguish from IdealLoopTree
  bool has_ctrl( Node *n ) const { return ((intptr_t)_nodes[n->_idx]) & 1; }

The assert guards against accessing control nodes, but it is absent
in the product version of the JVM.
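
As a rough model of why the missing assert matters, the sketch below keeps
either a loop pointer or a tagged control pointer in each slot of a table.
Everything here (NodeTable, LoopInfo, CtrlNode, get_loop_checked) is a
hypothetical stand-in, not the HotSpot data structures, and the checked
accessor is not what the actual fix does; it only shows the tag test that the
product build skips.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct LoopInfo { int depth; };  // stand-in for IdealLoopTree
struct CtrlNode { int idx; };    // stand-in for a control Node

// Each slot holds a LoopInfo* as-is, or a CtrlNode* with bit 0 set.
class NodeTable {
 public:
  explicit NodeTable(std::size_t n) : slots_(n, 0) {}

  void set_loop(std::size_t i, LoopInfo* l) {
    slots_[i] = reinterpret_cast<std::uintptr_t>(l);
  }
  void set_ctrl(std::size_t i, CtrlNode* c) {
    slots_[i] = reinterpret_cast<std::uintptr_t>(c) | 1;  // tag as control
  }
  bool has_ctrl(std::size_t i) const { return (slots_[i] & 1) != 0; }

  // Refuses to reinterpret a tagged entry; an unchecked cast would hand
  // back an odd, misaligned address instead.
  LoopInfo* get_loop_checked(std::size_t i) const {
    if (has_ctrl(i)) return nullptr;
    return reinterpret_cast<LoopInfo*>(slots_[i]);
  }
  // Strips the tag bit before converting back to a pointer.
  CtrlNode* get_ctrl(std::size_t i) const {
    return reinterpret_cast<CtrlNode*>(slots_[i] & ~std::uintptr_t{1});
  }

 private:
  std::vector<std::uintptr_t> slots_;
};
```
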

So please run a debug build to catch the situation in which this assert fires.
I would also like to have the core file to investigate it myself.

I will add this evaluation to the bug report.

Thanks,
Vladimir


Matthias Schmidt wrote:
> 
> Hi Rob,
> 
> can you check which C++ source line translates to this operation:
> 
> 0xffffffff7cd926f4: build_loop_late+0x0414:     ldx     [%l1 + 0x10], %g2
> 
> giving these registers:
> 
> (dbx) regs
> current thread: t@11
> current frame:  [1]
> g0-g1    0x0000000000000000 0x0000000000007800
> g2-g3    0x0000000101f08160 0x0000000000005b88
> g4-g5    0x0000000000000b71 0x0000000000002000
> g6-g7    0x0000000000000000 0xffffffff7e102800
> o0-o1    0x0000000000000001 0x0000000000000000
> o2-o3    0x00000001019b3448 0xffffffff12efdbb0
> o4-o5    0x0000000000000000 0x0000000101c1d3e0
> o6-o7    0xffffffff12efd251 0xffffffff7cd92644
> l0-l1    0xffffffff7cd94cd0 0x00000001018f7849
> l2-l3    0x00000001018e3900 0xffffffff12efdce8
> l4-l5    0x00000001019b3410 0xffffffff12efdbd0
> l6-l7    0xffffffff12efdd08 0xffffffff7d3774c0
> i0-i1    0x00000001019b34c8 0xffffffff12efdd50
> i2-i3    0x0000000000000000 0xffffffff12efdbb0
> i4-i5    0x0000000000000000 0x0000000101f32381
> i6-i7    0xffffffff12efd301 0xffffffff7ce6b824
> y        0x0000000040000000
> ccr      0x0000000000000099
> pc       0xffffffff7cd926f4:build_loop_late+0x414       ldx     [%l1 + 0x10], %g2
> npc      0xffffffff7cd92700:build_loop_late+0x420       add     %l1, 0x30, %o0
> 


###@###.### 2004-02-20

There is an optimization that moves nodes outside of a loop when
they are used only outside it. The difficulty is keeping
Load nodes from moving back inside the loop.
A control edge to a node outside of the loop is put on
loads (the fix for 4641526 in 2002) to prevent them from
returning to the loop during GVN optimization.
Unfortunately (our 4985384), this violates the rule that an
IfNode should have only 2 outs: IfTrueNode and IfFalseNode.
Additional outs on an IfNode prevent a dead IfNode from being removed,
since it still has out edges after IfTrue and IfFalse are gone.
This leads to an incorrect graph and the problem we see.
The code generator also expects only 2 outs on an IfNode.
It treats additional outs as an optimization failure
and marks the method as not compilable.
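
The effect of redirecting the control edge can be illustrated on a toy graph.
This is a simplified sketch under stated assumptions: ToyNode is an invented
stand-in for the C2 Node class, it tracks only the control input and the list
of users, and set_ctrl_for_load merely mirrors the logic of set_ctrl_for_Load
from the suggested fix.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for an ideal-graph node; not the HotSpot Node class.
struct ToyNode {
  bool is_multi = false;       // models Node::is_Multi()
  ToyNode* ctrl = nullptr;     // models in(0), the control input
  std::vector<ToyNode*> outs;  // nodes that use this node as control

  // Re-points the control input and keeps both out lists consistent.
  void set_ctrl(ToyNode* c) {
    if (ctrl != nullptr) {
      std::vector<ToyNode*>& o = ctrl->outs;
      o.erase(std::remove(o.begin(), o.end(), this), o.end());
    }
    ctrl = c;
    if (c != nullptr) c->outs.push_back(this);
  }
};

// Same decision as the patch: never hang a load directly off a multi node;
// use the multi node's own control input instead.
inline ToyNode* set_ctrl_for_load(ToyNode* load, ToyNode* ctrl) {
  if (ctrl != nullptr && ctrl->is_multi) ctrl = ctrl->ctrl;
  load->set_ctrl(ctrl);
  return ctrl;
}
```

With an if-like node whose only users are its two projections, calling
set_ctrl_for_load with that node as the requested control attaches the load to
the node's control input, so the if-like node keeps exactly two outs.
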

                                     
2004-06-11
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_05
generic
tiger-beta2

FIXED IN:
1.4.2_05
tiger-beta2

INTEGRATED IN:
1.4.2_05
tiger-b42
tiger-beta2


                                     
2004-06-14


