Bug ID: JDK-6732194 Data corruption dependent on -server/-client/-Xbatch

Details
Type: Bug
Submit Date: 2008-07-31
Status: Closed
Updated Date: 2011-03-07
Project Name: JDK
Resolved Date: 2011-03-07
Component: hotspot
OS: linux,solaris_10
Sub-Component: compiler
CPU: x86
Priority: P3
Resolution: Fixed
Affected Versions: 6,6u10
Fixed Versions: hs14 (b04)

Description
FULL PRODUCT VERSION :
java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux 2.6.24-19-generic (Ubuntu 8.04) i686, also reported on Debian Lenny amd64.

A DESCRIPTION OF THE PROBLEM :
Downstream bug report: https://bugzilla.wikimedia.org/show_bug.cgi?id=14610

Data corruption is observed in our application when the JVM is run with -server, but not with -client or "-server -Xbatch". I suspect it's due to background compilation in the region of com.fluendo.jheora.Decode.ExtractToken(). It's a heisenbug: attempting to instrument this function with debugging statements changed the behaviour of the function in odd ways.

Reported to be a regression from 5.0; I haven't confirmed this personally.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
* svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/cortado
* cd cortado
* ant applet-ovt
(alternatively get a jar from http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/OggHandler/cortado-ovt-stripped-wm_r31776.jar)
* wget http://pozimski.eu/itheora/data/bd1_pp.ogg

appletviewer test file test.html (adjust the .jar version number if necessary):

<html>
 <head>
 </head>
 <body>
   <applet code="com.fluendo.player.Cortado.class"
           archive="output/dist/applet/cortado-ovt-debug-wm_r36880.jar"
           width="384" height="288">
     <param name="url" value="bd1_pp.ogg"/>
     <param name="local" value="true"/>
     <param name="duration" value="224"/>
     <param name="keepAspect" value="true"/>
     <param name="video" value="true"/>
     <param name="audio" value="false"/>
     <param name="debug" value="1"/>
   </applet>
 </body>
</html>

Then:
* appletviewer -J-Xbatch -J-server test.html
* appletviewer -J-server test.html
* appletviewer -J-client test.html


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A video will play. Apparently some sort of German-speaking pirate.
ACTUAL -
With -J-server, it does a frame or two before the bug kicks in. Then you get corruption of the video frame, with blocks of colour appearing, and finally the corrupted token stream causes an ArrayIndexOutOfBoundsException in the application code.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Details may vary, depending on the exact contents of the garbage returned by ExtractToken() post-bug.

java.lang.ArrayIndexOutOfBoundsException: 66
	at com.fluendo.jheora.DCTDecode.ExpandToken(DCTDecode.java:542)
	at com.fluendo.jheora.Decode.unpackAndExpandToken(Decode.java:460)
	at com.fluendo.jheora.Decode.unPackVideo(Decode.java:603)
	at com.fluendo.jheora.Decode.loadAndDecode(Decode.java:655)
	at com.fluendo.jheora.State.decodePacketin(State.java:74)
	at com.fluendo.plugin.TheoraDec$2.chainFunc(TheoraDec.java:212)
	at com.fluendo.jst.Pad.chain(Pad.java:257)
	at com.fluendo.jst.Pad.push(Pad.java:271)
	at com.fluendo.plugin.Queue$1.taskFunc(Queue.java:135)
	at com.fluendo.jst.Pad.run(Pad.java:339)
	at java.lang.Thread.run(Thread.java:619)


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
Sorry, I haven't been able to isolate this any further.
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Use -Xbatch.
Customer later clarified that -Xbatch workaround is not really sufficient as it's quite invasive.  A workaround that can be used in the HTML tag or some other simple approach is needed.

                                    

Comments
EVALUATION

I can reliably reproduce this on Solaris as well, going back to at least build 49 of JDK 1.6.0.  The range check failure in ExpandToken appears to be caused by the arguments passed in or by some other external state, since excluding ExpandToken from compilation still shows a failure.  Increasing the number of compiler threads seems to encourage the failure as well.  I'm currently testing with:

appletviewer -J-server -J-XX:CompileOnly=com/fluendo/jheora -J-XX:CICompilerCount=4 -J-XX:+PrintCompilation -J-XX:-PrintInlining -J-XX:CompileCommand=exclude,com/fluendo/jheora/DCTDecode.ExpandToken test.html
                                     
2008-07-31
EVALUATION

I've constructed a test case that shows the problem with 1.7, though the same test case doesn't fail with 1.6, so there may be multiple bugs, or the bug may just manifest differently.  It appears to be a bug in the register allocator: it clones a compare instruction, and when it's fixing up the inputs of the clone it grabs the wrong input, so that instead of comparing a load with 1 it compares 1 with 1.
                                     
2008-08-05
WORK AROUND

In FrArray.java, in getNextBBit, use NextBit = (byte) (NextBit ^ 1) instead of NextBit = (byte) (NextBit == 1 ? 0 : 1).  This avoids the pattern that needs to rematerialize the compare.  It's also more efficient.  The compiler can't do this for you since it doesn't know that NextBit is always either 1 or 0.
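
To make the workaround concrete, here is a minimal stand-alone sketch of the two toggle forms. The class and method names (ToggleBit, toggleWithCompare, toggleWithXor) are hypothetical and are not taken from FrArray.java; only the two expressions come from the comment above. It shows that the forms agree when the value is 0 or 1 and diverge otherwise, which is why the compiler cannot make this substitution on its own.

```java
public class ToggleBit {
    // Original form quoted in the report: a compare followed by a conditional select.
    static byte toggleWithCompare(byte nextBit) {
        return (byte) (nextBit == 1 ? 0 : 1);
    }

    // Suggested workaround: XOR with 1, which involves no compare at all.
    static byte toggleWithXor(byte nextBit) {
        return (byte) (nextBit ^ 1);
    }

    public static void main(String[] args) {
        // For the only values the decoder is said to use (0 and 1), both forms agree.
        for (byte b = 0; b <= 1; b++) {
            System.out.println(b + " -> " + toggleWithCompare(b) + " / " + toggleWithXor(b));
        }
        // For any other value they differ, so the rewrite is only valid
        // if the caller guarantees the value is always 0 or 1.
        byte other = 2;
        System.out.println(other + " -> " + toggleWithCompare(other) + " / " + toggleWithXor(other));
    }
}
```

The last line prints different results for the two forms (1 versus 3), so the substitution is safe only where the 0-or-1 invariant is known to hold.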
                                     
2008-08-05
EVALUATION

While rematerializing a compI_eReg_imm, the wrong input is hooked into the clone, so the calculation goes wrong.  Before:

 103    compI_eReg_imm  === _  107  [[ 102 ]] #1
 401    loadConI        ===  18  [[ 102 ]] #1
 450    loadConI0       ===  18  [[ 451  102 ]] #0
 451    MachProj        ===  450  [[]] #1 !orig=[106]
 102    cmovI_reg       === _  103  401  450  [[ 99  255 ]] eq !jvms: FrArray::getNextBBit @ bci:21 FrArray::test2 @ bci:29

after:

 501    MachSpillCopy   === _  107  [[ 103 ]] 
 103    compI_eReg_imm  === _  501  [[]] #1
 401    loadConI        ===  18  [[ 102  502 ]] #1
 450    loadConI0       ===  18  [[ 451  102 ]] #0
 451    MachProj        ===  450  [[]] #1 !orig=[106]
 502    compI_eReg_imm  === _  401  [[ 102 ]] #1 !orig=103
 102    cmovI_reg       === _  502  401  450  [[ 99  255 ]] eq !jvms: FrArray::getNextBBit @ bci:21 FrArray::test2 @ bci:29


501 was produced for use by 502 because 107 is part of a multidef LRG, so it needs a new LRG.  Because walkThru is true in the call to split_Rematerialize, we search to find the original 107 and find the node to use from its LRG, thus defeating the whole purpose of creating 501.  This is pretty much the same bug as 6207830, though in that case I assume that walkThru was passed as false.  I've got a hack fix for this, but I think that to make it work correctly split_Rematerialize needs to be substantially rearranged.  I'm going to do some more tests to make sure this is the only issue.
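
The visible effect of the miswired clone can be modelled in plain Java. This is only an illustrative simulation, not HotSpot code: the class WrongCompareInput and its methods are hypothetical, and the select logic is taken from the NextBit == 1 ? 0 : 1 expression in the workaround comment. In the "after" dump the cloned compare (502) reads the constant 1 (node 401) instead of the loaded value, so the "eq" condition always holds and the select always yields 0.

```java
import java.util.function.IntUnaryOperator;

public class WrongCompareInput {
    // Intended behaviour: compare the loaded value with 1, then select 0 or 1.
    static int selectIntended(int loaded) {
        return (loaded == 1) ? 0 : 1;
    }

    // Model of the miswired clone: the compare reads a constant 1
    // (standing in for node 401) instead of the loaded value.
    static int selectMiswired(int loaded) {
        int wrongInput = 1; // the loaded value is ignored, as in the bad clone
        return (wrongInput == 1) ? 0 : 1;
    }

    // Applies the toggle repeatedly and records each resulting bit.
    static String trace(IntUnaryOperator step, int start, int count) {
        StringBuilder sb = new StringBuilder();
        int bit = start;
        for (int i = 0; i < count; i++) {
            bit = step.applyAsInt(bit);
            sb.append(bit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("intended: " + trace(WrongCompareInput::selectIntended, 0, 6));
        System.out.println("miswired: " + trace(WrongCompareInput::selectMiswired, 0, 6));
    }
}
```

In this model the intended toggle alternates between 1 and 0, while the miswired one stays at 0, which is consistent with a bit reader that starts returning garbage once the compiled code takes over.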
                                     
2008-08-05
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/ea18057223c4
                                     
2008-08-19
EVALUATION

In most cases using a multidef LRG doesn't cause problems, but cases where it would definitely cause problems can be detected by code like this:

diff --git a/src/share/vm/opto/reg_split.cpp b/src/share/vm/opto/reg_split.cpp                                                       
--- a/src/share/vm/opto/reg_split.cpp                                                                                                
+++ b/src/share/vm/opto/reg_split.cpp                                                                                                
@@ -318,6 +318,27 @@ Node *PhaseChaitin::split_Rematerialize(
         }                                                                                                                           
                                                                                                                                     
         if (lidx < _maxlrg && lrgs(lidx).is_multidef()) {                                                                           
+#ifdef ASSERT                                                                                                                       
+          int defidx = 0;                                                                                                           
+          for( uint i = 0; i < b->_nodes.size(); i++ ) {                                                                            
+            if (b->_nodes[i] == def)  {                                                                                             
+              defidx = i;                                                                                                           
+              break;                                                                                                                
+            }                                                                                                                       
+          }                                                                                                                         
+          for (uint i = defidx; i < insidx; i++) {                                                                                  
+            if (n2lidx(b->_nodes[i]) == lidx) {                                                                                     
+              in->dump();                                                                                                           
+              spill->dump();                                                                                                        
+              b->_nodes[i]->dump();                                                                                                 
+              b->dump();                                                                                                            
+              C->method()->print();                                                                                                 
+              C->set_print_assembly(true);                                                                                          
+              break;                                                                                                                
+            }                                                                                                                       
+          }                                                                                                                         
+#endif                                                                                                                              
+                                                                                                                                    
           // walkThru found a multidef LRG, which is unsafe to use, so                                                              
           // just keep the original def used in the clone.                                                                          
           in = spill->in(i);
                                     
2008-10-28


