Bug ID: JDK-6360541 5.0u6 assert gcm.cpp:1306 assert(bs->has_valid

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 5.0u6

Priority: P2
Status: Resolved
Resolution: Fixed
OS: linux_redhat_3.0
CPU: other

Submitted: 2005-12-08
Updated: 2010-04-02
Resolved: 2006-04-04

Other: 5.0u8 (Fixed)
JDK 6: 6 b79 (Fixed)

Both the 5.0u6 product and fastdebug builds failed while compiling the method:
oracle/jdbc/driver/NumberCommonAccessor  getBigDecimal
In product mode, it failed in Output:
const StartNode *start = entry->_nodes[0]->is_Start();
This returns NULL, and the code that follows, which uses this start node, fails.
In fastdebug, it failed at gcm.cpp:1306:
assert(bs->has_valid_counts(), "Bad goto frequency/count assignment");

java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Server VM (build 1.5-internal-debug, mixed mode)

SUGGESTED FIX http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20060323113135.never.6360541/workspace/webrevs/webrev-2006.03.23/index.html
24-03-2006
EVALUATION I've induced a crash similar to the originally reported crash by allowing the interpreter invocation count to overflow, which results in the start block being in the wrong place, so this really does appear to be the problem.
01-02-2006
EVALUATION The educated guess is that Block::is_uncommon() is moving the start block because of the negative frequency. That can be easily changed.
19-01-2006
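The guess in this evaluation can be illustrated with a toy model. The sketch below is not HotSpot code: `ToyBlock`, `toy_is_uncommon`, `toy_layout`, and `toy_cfg` are hypothetical names, the threshold and the frequencies of the non-start blocks are made-up values, and the real layout pass is more involved. It only shows that a "move uncommon blocks to the end" pass will move a block whose frequency is negative, and that pinning the start block avoids this.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for a CFG block; NOT HotSpot's Block class.
struct ToyBlock {
    std::string name;
    float freq;
    bool is_start;
};

// Hypothetical "uncommon" test: any block below the threshold qualifies.
// A negative frequency always qualifies.
bool toy_is_uncommon(const ToyBlock& b, float threshold) {
    return b.freq < threshold;
}

// Moves uncommon blocks to the end of the list, keeping relative order.
// Index 0 (the root) never moves. With pin_start set, the start block
// is never moved either, whatever its frequency is.
std::vector<ToyBlock> toy_layout(std::vector<ToyBlock> blocks,
                                 float threshold, bool pin_start) {
    if (blocks.size() < 2) return blocks;
    std::stable_partition(blocks.begin() + 1, blocks.end(),
        [&](const ToyBlock& b) {
            return (pin_start && b.is_start) || !toy_is_uncommon(b, threshold);
        });
    return blocks;
}

// Made-up four-block CFG; only the root/start frequency is a parameter.
std::vector<ToyBlock> toy_cfg(float start_freq) {
    return {
        {"root",  start_freq, false},
        {"start", start_freq, true},
        {"loop",  10.0f,      false},
        {"exit",  0.5f,       false},
    };
}
```

With `toy_cfg(-862164.562f)` and `pin_start` set to false, the start block ends up last, so a lookup that assumes it sits at index 1 finds a different block. With `pin_start` set to true, or with a positive frequency, the start block stays at index 1.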
EVALUATION I took a look at the core file from the report and I think I know at least part of what's going on. I found the PhaseCFG* at 0x6cb0f910, and from there found that the frequency and count of the _broot are large and negative.

    (gdb) x/2f 0x40+0x64ba0648
    0x64ba0688: -862164.562 -862164.562

Estimate_Block_Frequency starts off with this:

    int cnts = C->method() ? C->method()->interpreter_invocation_count() : 1;
    if( cnts == 0 ) cnts = 1;
    float f = (float)cnts/(float)FreqCountInvocations;
    _broot->_freq = f;
    _broot->_cnt = f;

The ciMethod for the compile is 0x0bd18100, which gives a methodOop at 0x6fa40488.

    (gdb) x/20 0x6fa40488
    0x6fa40488: 0x00000001 0x6d610ce0 0x6fa38428 0x6fa30df8
    0x6fa40498: 0x6fb67418 0xb2eab203 0xc1000000 0x0000001b

The interpreter invocation count is at offset 20, which is 0xb2eab203 or -1293241853. -1293241853/FreqCountInvocations, that is -1293241853/(float)1500, == -862161.23733333335, which is close to -862164.562. Assuming the interpreter was still running, it's possible the count changed before the core was actually dumped. Certainly the code doesn't guard against the possibility that interpreter_invocation_count overflows.

I just reproduced this in mustang with /net/smite.never/iic.java.

    2% iic::large_method @ 18216 (18364 bytes)
    # To suppress the following error report, specify this argument
    # after -XX: or in .hotspotrc: SuppressErrorAt=/gcm.cpp:1360]
    #
    # An unexpected error has been detected by Java Runtime Environment:
    #
    # Internal Error (/BUILD_AREA/jdk6.0/hotspot/src/share/vm/opto/gcm.cpp, 1360 [ Patched ]), pid=17288, tid=10
    #
    # Java VM: Java HotSpot(TM) Server VM (1.6.0-rc-fastdebug-b67-debug mixed mode)
    #
    # Error: assert(bs->has_valid_counts(),"Bad goto frequency/count assignment")
    # An error report file with more information is saved as hs_err_pid17288.log
    #
    # If you would like to submit a bug report, please visit:
    # http://java.sun.com/webapps/bugreport/crash.jsp
    #
    Current thread is 10
    Dumping core ...
    Abort (core dumped)
    smite ~ %

It takes about an hour when run with -XX:+PrintCompilation -XX:+PrintOptoAssembly -XX:+LogCompilation in fastdebug, and it doesn't seem to crash in product, so it may not be the root cause of the escalation.

Further investigation of the core file seems to indicate that the block containing the StartOSRNode has been scheduled later in the block list than expected. This causes the following piece of code to return NULL for is_Start because it's looking at the wrong block:

    Block *entry = _cfg->_blocks[1];
    Block *broot = _cfg->_broot;
    const StartNode *start = entry->_nodes[0]->is_Start();

_cfg->_blocks[1] isn't the successor of broot, so it can't find the start node. I do think that the negative frequencies are the source of this, since we attempt to move low frequency blocks to the end of the list so they are out of line. The block containing the Start node probably shouldn't ever be moved. It wouldn't hurt to use broot->succ[0]->_nodes[0]->is_Start to find the start node too. Additionally, ciMethod::interpreter_invocation_count should probably be modified to only return positive values to avoid the sign problems with the frequencies.
19-01-2006
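The arithmetic in the evaluation above can be checked with a small standalone sketch. It is not HotSpot code: `root_frequency` copies the quoted lines from Estimate_Block_Frequency, `kFreqCountInvocations` is assumed to be 1500 based on the division shown there, and `positive_invocation_count` is a hypothetical guard of the kind suggested for ciMethod::interpreter_invocation_count. Saturating at the maximum is one possible choice, not necessarily what the actual fix does.

```cpp
#include <cstdint>
#include <limits>

// Assumed value, taken from the division by 1500 in the evaluation.
static const int kFreqCountInvocations = 1500;

// Mirrors the quoted computation: zero is bumped to one, but nothing
// guards against a negative (wrapped) count.
float root_frequency(std::int32_t cnts) {
    if (cnts == 0) cnts = 1;
    return (float)cnts / (float)kFreqCountInvocations;
}

// Hypothetical guard: treat a counter that has wrapped to a negative
// value as saturated, so callers only ever see non-negative counts.
std::int32_t positive_invocation_count(std::int32_t raw) {
    return raw < 0 ? std::numeric_limits<std::int32_t>::max() : raw;
}

// The raw word read from the core file (0xb2eab203), viewed as a
// signed 32-bit integer.
static const std::int32_t kObservedCount = -1293241853;
```

`root_frequency(kObservedCount)` is negative, while `root_frequency(positive_invocation_count(kObservedCount))` is positive. Converting -1293241853 to float first rounds it to -1293241856, and -1293241856/1500 is about -862161.2373, which is consistent with the digits quoted in the evaluation.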
EVALUATION The compilationPolicy issue has been opened under a separate bug, 6372116. The underlying cause of the failure is not yet understood.
12-01-2006
EVALUATION The failure is happening on an OSR compile of a huge method, which has more than 17000 bytes of bytecodes. Our compilation policy avoids compiling such methods on entry, but not for OSR. It is easy to fix compilationPolicy.cpp to not do such OSR compiles. Such a solution will help the customer, but just makes the underlying bug latent. If the customer is willing to supply more information, we can better attack the base issue.
11-01-2006
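The policy gap described in this evaluation can be sketched as follows. This is a simplified model, not the contents of compilationPolicy.cpp: the function names, the `CompileKind` enum, and the limit value `kToyHugeMethodLimit` are all assumptions made for illustration. The 18364-byte size used in the usage note comes from the reproduction log in the later evaluation.

```cpp
// Hypothetical size limit, chosen only for illustration.
static const int kToyHugeMethodLimit = 8000;

enum class CompileKind { Entry, Osr };

// Policy as described in the evaluation: the size check is applied to
// compiles on method entry, but OSR compiles are let through.
bool should_compile_as_described(int code_size, CompileKind kind) {
    if (kind == CompileKind::Entry && code_size > kToyHugeMethodLimit) {
        return false;
    }
    return true;
}

// Direction suggested in the evaluation: apply the same size check to
// OSR compiles as well, so the kind of compile no longer matters.
bool should_compile_with_osr_check(int code_size, CompileKind /*kind*/) {
    return code_size <= kToyHugeMethodLimit;
}
```

For an 18364-byte method, `should_compile_as_described` rejects the entry compile but accepts the OSR compile, while `should_compile_with_osr_check` rejects both. As the evaluation notes, this only avoids the failing compile and leaves the underlying bug latent.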