JDK-4436388 : VMark intermittent failure on Solaris Sparc with Merlin b58 -client flag
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 1.4.0
  • Priority: P1
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: sparc
  • Submitted: 2001-04-10
  • Updated: 2001-04-10
  • Resolved: 2001-04-10
Related Reports
Duplicate :  
Description
With Merlin b58 and the -client flag, VMARK failed intermittently on machine jtgb4u2b
(Solaris 2.6 Sparc):
[Thu Apr 05 18:45:05 PDT 2001] Error handling packet from jtgb4u2b. (java.lang.ArithmeticException: / by zero)
Log files are available under /net/jtgb4u4c.eng/export/sail14/bigapps_log/solaris/merlin_b58_client

Coleen Phillimore's evaluation:
This failure happens an hour after the tests are started so they don't really
pass.  If you put a bit of run.vmark.out where it shows the arithmetic
exception that would be enough distinguishing information.  I don't know how
to debug this further myself but it's probably a client compiler problem.


june.zhong@eng 2001-04-09

Comments
EVALUATION

ArithmeticException: / by zero is a known issue, same as bug 4427606. This bug will be closed as a duplicate of 4427606.

Thomas Rodriguez evaluation:

So we've tracked down the source of the bug. Here's a little snippet of assembly which shows the problem:

  0xf9cf5778: nop                      ;*ldiv
                                       ; - java.text.DigitList::set@89
  0xf9cf577c: orcc %o1, %o0, %g0
  0xf9cf5780: be %icc, f9cf59e0        ; {safepoint}
  0xf9cf5784: nop                      ;*ldiv
                                       ; - java.text.DigitList::set@89

This is the code emitted by explicit_div_by_zero_check(). The "be" instruction is marked as a safepoint so if we safepoint there, the condition codes will get destroyed. It used to be that the illegal_instruction_handler for sparc preserved the condition codes but it was believed that we didn't need to anymore and the code was removed, which uncovered this problem with lrem/ldiv. Since they weren't preserved, it was somewhat random whether it threw a div by zero error or not. It could also cause a crash since the oopmap for this safepoint probably isn't correct which is most likely why one of the bugs was a crash on linux.

The fix suggested by srdjan is to make sure that none of the LIR branches are safepoints. The smallest fix for this for merlin_beta is to just swallow the CodeEmitInfos inside LIR_List::branch. For beta refresh, we'll clean up the LIR interface to reflect these changes.

I've tested with moes program and the problem isn't reproducible. I also tested Bug4407042.java and didn't see failures but I didn't always see failures without the fix.

Please review the webrev at: http://javaweb/~never/webrev/divbyzero

gary.collins@East 2001-04-10
10-04-2001
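
The failure mode described in the evaluation can be illustrated with a small Java model. This is only a sketch, not HotSpot code: the class DivByZeroCheckSketch, its zeroFlag field, and the methods orcc, branchIfEqual and wouldThrow are all hypothetical names made up for illustration. It also assumes that %o0 and %o1 hold the two 32-bit halves of the long divisor, which the snippet suggests but does not state. The model shows why the outcome becomes unpredictable once a safepoint between the "orcc" and the "be" overwrites the condition codes.

```java
// Illustrative model only; none of these names exist in HotSpot.
class DivByZeroCheckSketch {
    // Stand-in for the zero bit of the integer condition codes.
    static boolean zeroFlag;

    // Models "orcc %o1, %o0, %g0": OR the two halves of the divisor and
    // set the zero flag when the result is zero. The result is discarded.
    static void orcc(int hi, int lo) {
        zeroFlag = (hi | lo) == 0;
    }

    // Models "be": the branch to the exception path is taken when the
    // zero flag is set.
    static boolean branchIfEqual() {
        return zeroFlag;
    }

    // Returns true if the modeled sequence would take the "/ by zero" path.
    // safepointHits models a safepoint landing on the "be" instruction;
    // clobberedValue is whatever the handler happened to leave in the flag
    // (an assumption: the real value is effectively arbitrary).
    static boolean wouldThrow(long divisor, boolean safepointHits,
                              boolean clobberedValue) {
        orcc((int) (divisor >>> 32), (int) divisor);
        if (safepointHits) {
            // Condition codes are not preserved across the safepoint.
            zeroFlag = clobberedValue;
        }
        return branchIfEqual();
    }

    public static void main(String[] args) {
        System.out.println("divisor 5, no safepoint:        " + wouldThrow(5L, false, false));
        System.out.println("divisor 0, no safepoint:        " + wouldThrow(0L, false, false));
        System.out.println("divisor 5, flag clobbered set:  " + wouldThrow(5L, true, true));
        System.out.println("divisor 0, flag clobbered clear: " + wouldThrow(0L, true, false));
    }
}
```

In this model a non-zero divisor can take the exception path and a zero divisor can miss it, depending only on what the safepoint leaves in the flag. That matches the "somewhat random" behaviour in the evaluation, and it is why keeping the branch from being a safepoint removes the problem.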