JDK-8068881 : SIGBUS in C2 compiled method weblogic.wsee.jaxws.framework.jaxrpc.EnvironmentFactory$SimulatedWsdlDefinitions.
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u40
  • Priority: P1
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2015-01-13
  • Updated: 2017-12-21
  • Resolved: 2015-01-19
JDK 7         JDK 8        JDK 9         Other
7u121 Fixed   8u60 Fixed   9 b49 Fixed   openjdk7u Fixed
Description
JDK-8047383 was fixed in 8u40b13, but the issue is still reproducible.

The head of the hs_err file is:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xffffffff712ea184, pid=4315, tid=72
#
# JRE version: Java(TM) SE Runtime Environment (8.0_40) (build 1.8.0_40-internal-20150109165513.amurillo.hs25-40-b24-snap-b00)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.40-b24 compiled mode solaris-sparc )
# Problematic frame:
# J 87270 C2 weblogic.wsee.jaxws.framework.jaxrpc.EnvironmentFactory$SimulatedWsdlDefinitions.<init>(Lweblogic/wsee/jaxws/framework/jaxrpc/EnvironmentFactory;Lcom/sun/xml/ws/api/model/wsdl/WSDLModel;)V (313 bytes) @ 0xffffffff712ea184 [0xffffffff712e90a0+0x10e4]
#
# Core dump written. Default location: /export/local/aurora/sandbox/results/weblogic/core or core.4315
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x0000000101a37800):  JavaThread "[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon [_thread_in_Java, id=72, stack(0xffffffff2ff00000,0xffffffff30000000)]

siginfo: si_signo: 10 (SIGBUS), si_code: 1 (BUS_ADRALN), si_addr: 0xfffffff9adb9deac

Registers:
 G1=0xfffffff9adc72650 G2=0x0000000101a37800 G3=0x0000000040000000 G4=0xffffffff4f91f2a8
 G5=0xffffffff4f801388 G6=0x0000000000000000 G7=0xffffffff7b213a00 Y=0x0000000000000000
 O0=0xfffffff9adc72678 O1=0xfffffff9adb9da78 O2=0xfffffff9adc71c50 O3=0xffffffff23ebf988
 O4=0xfffffff9adc72318 O5=0xff7fffff6c000000 O6=0xffffffff2fffb3f1 O7=0xffffffff712ea17c
 L0=0xfffffff9adc714c8 L1=0xfffffff9adc71310 L2=0xfffffff9adc72640 L3=0xfffffff9adb9ead8
 L4=0xfffffff9adc72678 L5=0x0000000000000005 L6=0xffffffff23dd8d48 L7=0xfffffff9adc726c0
 I0=0xfffffff9adb9def0 I1=0xfffffff9adc71310 I2=0xfffffff9adc71ce8 I3=0xffffffff23ecaa18
 I4=0xffffffff23ec8ff8 I5=0xffffffff4f912e58 I6=0xffffffff2fffb4c1 I7=0xffffffff712dfb54
 PC=0xffffffff712ea184 nPC=0xffffffff712ea188


Top of Stack: (sp=0xffffffff2fffbbf0)
0xffffffff2fffbbf0:   fffffff9adc714c8 fffffff9adc71310
0xffffffff2fffbc00:   fffffff9adc72640 fffffff9adb9ead8
0xffffffff2fffbc10:   fffffff9adc72678 0000000000000005
0xffffffff2fffbc20:   ffffffff23dd8d48 fffffff9adc726c0
0xffffffff2fffbc30:   fffffff9adb9def0 fffffff9adc71310
0xffffffff2fffbc40:   fffffff9adc71ce8 ffffffff23ecaa18
0xffffffff2fffbc50:   ffffffff23ec8ff8 ffffffff4f912e58
0xffffffff2fffbc60:   ffffffff2fffb4c1 ffffffff712dfb54
0xffffffff2fffbc70:   fffffff9adb9b128 fffffff9adb9da78
0xffffffff2fffbc80:   fffffff9adb9dec8 0000000000000000
0xffffffff2fffbc90:   0000000000000000 3f60115555555555
0xffffffff2fffbca0:   41a354a000000000 41ae000000000000
0xffffffff2fffbcb0:   0000000000000000 0000000000000000
0xffffffff2fffbcc0:   ffffffff23e29088 ffffffff23dd8d48
0xffffffff2fffbcd0:   fffffffffffffff8 fffffff9adc714c8
0xffffffff2fffbce0:   ff7fffff6c000000 ffffffff23e29088 

Instructions: (pc=0xffffffff712ea184)
0xffffffff712ea164:   c0 72 20 48 e2 72 20 10 e2 72 20 20 e2 72 20 50
0xffffffff712ea174:   a8 10 00 08 90 10 00 14 7e cb 1f d9 01 00 00 00
0xffffffff712ea184:   ee 5e 3f bc ab 35 30 09 e6 75 20 18 c0 2d c0 15
0xffffffff712ea194:   c2 58 a0 60 ea 58 a0 70 ae 00 60 40 3a ed c7 b5 

Register to memory mapping:

G1=0xfffffff9adc72650 is an oop

EOF

Comments
Igor Ignatyev added a comment - 2015-01-20 21:28
Since it's a regression and a crash in WebLogic, SQE is OK with taking it into 8u40.
21-01-2015

I went through a couple more prototypes and settled on having a separate pass that is done after post-alloc copy removal. That makes it much easier, because post-alloc removes copies, which means the tracking arrays would need to be adjusted all the time: the nodes that use the lrgs we track can go away, as can the nodes around them, nodes may be removed so that the MachMerge is no longer required, etc. It's quite an error-prone process with questionable computational benefits. I also switched to using a pointer to the first use instead of an offset, since finding the node requires much less work on average (the blocks are usually quite small) than adjusting offsets in rather large arrays of max_reg size. So here is the best solution I have so far: http://cr.openjdk.java.net/~iveresov/8068881/webrev.01/
18-01-2015

I was more concerned about data colocation when several arrays are used. But, as you said, it is not critical for small arrays. So leave it as it is.
17-01-2015

Actually, I found a flaw in the algorithm. A single use_index is not enough: it must maintain a list of the indices of all nodes that share the same use, kept up to the point where the new use is found. I'll fix that tomorrow. Example:
Node1(use_node_1_lrg_1) Node2(use_node_1_lrg_1) Node3(use_node_2_lrg_2)
Here Node1 and Node2 both need to be fixed up (not just Node2, as would happen with the current implementation).
17-01-2015
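The flaw described in the comment above can be illustrated with a toy model. This is only a sketch, not the HotSpot code: the function names and the (node, lrg, def) tuples are hypothetical, and it assumes that Node3 reads a second def of the same live range as Node1 and Node2 (the comment writes lrg_2 there). It contrasts remembering a single most recent user per live range with remembering every user of the current def.

```python
def fix_ups_single_index(uses):
    """Flawed variant: remember only the most recent user of each live range."""
    last = {}  # lrg -> (def, most recent user of that def)
    out = []
    for node, lrg, d in uses:
        if lrg in last and last[lrg][0] != d:
            # A different def of the same lrg: only one earlier user is known.
            out.append(last[lrg][1])
        last[lrg] = (d, node)
    return out


def fix_ups_with_list(uses):
    """Remember every user of the current def of each live range."""
    users = {}  # lrg -> (def, all users of that def so far)
    out = []
    for node, lrg, d in uses:
        if lrg in users and users[lrg][0] == d:
            users[lrg][1].append(node)
            continue
        if lrg in users:
            # A different def of the same lrg: all earlier users need rewiring.
            out.extend(users[lrg][1])
        users[lrg] = (d, [node])
    return out


# Assumption: Node3 uses a second def of the same live range.
USES = [
    ("Node1", "lrg_1", "def_1"),
    ("Node2", "lrg_1", "def_1"),
    ("Node3", "lrg_1", "def_2"),
]

print(fix_ups_single_index(USES))  # ['Node2']
print(fix_ups_with_list(USES))     # ['Node1', 'Node2']
```

With the list, both earlier users are reported, which matches the expectation stated in the comment that Node1 and Node2 both need to be fixed up.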

Anyway, I don't really have strong feelings either way. Perhaps an array of structs is easier to comprehend when reading the code. I can make the change if that is your preference.
17-01-2015

Having an array of nodes for defs is a bit easier than offsets, since there can be multiple edges per node pointing to a single def. We could also encode it as int pairs, but that would require an indirection. We could have a single array with a struct { int idx, Node* def }. But I don't think it makes any difference at all: the arrays are tiny (physical regs + spills), so in reality it's not much more than 1k elements or so on average.
17-01-2015

Can we have one array with two ints per element?
17-01-2015

Prototype looks good!
17-01-2015

So, I settled on the following solution: I'd like to introduce a new node type to merge the inputs and make all the users in the block use it. Here is why:
- A block needs to have uniform inputs for multidef lrgs (otherwise the scheduler may be confused).
- It's impossible to choose one of the defs to unify the inputs. Consider having multiple different phis as inputs. Killing any one of them could improperly collapse the graph.
The prototype: http://cr.openjdk.java.net/~iveresov/8068881/webrev/
17-01-2015

Avoiding phi removal is problematic, since it seems to be illegal to have nodes refer to multiple different inputs that are a part of the same multidef lrg. This breaks Scheduling::verify_good_schedule(). That's probably why this fix was introduced: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/diff/2f644f85485d/src/share/vm/opto/postaloc.cpp
16-01-2015

The cause of the problem is that post-alloc copy removal kills a phi that is part of a multidef lrg. After that, the spillcopy that is one of the phi's inputs goes away, and that destroys the lrg's value on one of the paths. As far as I can see, removing phis at that point may be a bad idea. I was hoping that the data-flow analysis in post-alloc would detect such a problem (a conflict of defs); however, in this particular case it loses track of the phi (because of a back edge) and then uses a use (further down the flow) to assume that the register is live. In that case the use is of another node (a copy), and the traversal forgets about the phi. The answer obtained by the data-flow is correct, but only as long as the phi is there. Perhaps we should just avoid removing phis.
16-01-2015
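The cascade described in the comment above (the phi goes away, then the spillcopy feeding it goes away) can be pictured with a small reachability model. This is a hypothetical toy graph, not C2's data structures: the node names and the sweep_dead helper are made up for illustration, and "copy removal" is modelled simply as rewiring the use to read one def directly.

```python
def sweep_dead(inputs, roots):
    """Return the set of nodes reachable from `roots` through input edges."""
    live, work = set(), list(roots)
    while work:
        n = work.pop()
        if n not in live:
            live.add(n)
            work.extend(inputs.get(n, ()))
    return live


# Hypothetical graph: the phi merges two defs of one multidef live range.
inputs = {
    "use": ["phi"],
    "phi": ["def_on_path_A", "spillcopy_on_path_B"],
}
before = sweep_dead(inputs, ["use"])

# Toy stand-in for copy removal: the use bypasses the phi and reads one def.
inputs_after = dict(inputs, use=["def_on_path_A"])
after = sweep_dead(inputs_after, ["use"])

# Nodes that were kept alive only through the phi.
print(sorted(before - after))  # ['phi', 'spillcopy_on_path_B']
```

In this model nothing keeps the spillcopy on path B alive once the phi is bypassed, so the register is never written on that path, which is the kind of lost value the comment describes.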

See asm-weblogic-replay.pdf for visualization of the problem, assembler dump in asm-weblogic-replay.txt
14-01-2015

Replay command: /ldmpool/export/iggy/2015-01-09-165513.amurillo.hs25-40-b24-snapshot-HACK/bin/java -server -XX:+PrintCompilation -XX:ReplayDataFile=replay_pid2a26_compid87214.log -XX:+ReplayIgnoreInitErrors -XX:+ReplayCompiles -XX:+PrintAssembly -Djava.net.preferIPv6Addresses=false -Xmx65280M -XX:-PrintVMOptions -XX:+DisplayVMOutputToStderr -Xloggc:/ldmpool/export/iggy/ute/results/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+DisableExplicitGC -XX:+PrintFlagsFinal -da:weblogic.management.provider.ManagementService -cp /ldmpool/export/iggy/2015-01-09-165513.amurillo.hs25-40-b24-snapshot/lib/tools.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/utils/config/10.3/config-launch.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/weblogic/server/lib/weblogic_sp.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/weblogic/server/lib/weblogic.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/modules/features/weblogic.server.modules_10.3.1.0.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/weblogic/server/lib/webservices.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/modules/org.apache.ant_1.7.0/lib/ant-all.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/modules/net.sf.antcontrib_1.0.0.0_1-0b2/lib/ant-contrib.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/modules/features/com.bea.core.apache.commons.logging_1.1.0.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/modules/features/com.bea.core.apache.commons.logging.api_1.1.0.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/modules/features/com.bea.core.apache.logging_1.0.0.0.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/MedRec/domain/medrec/lib/log4j-1.2.13.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/MedRec/domain/medrec/lib/wllog4j.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/weblogic/samples/server/medrec/dist/modules/medrec-data-import.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/weblogic/samples/server/medrec/dist/modules/medrec-facade.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/samples/server/medrec/modules/medrec/domain/target/medrec-domain.jar:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/samples/server/medrec/modules/physician/domain/target/physician-domain.jar -Dweblogic.Name=MedRecServer -Djava.security.policy=/ldmpool/export/iggy/ute/results/weblogic/config/weblogic.policy -Dweblogic.management.username=weblogic -Dweblogic.management.password=welcome1 -Dlog4j.configuration=file:/ldmpool/export/iggy/ute/local/testbase/bigapps/Weblogic/Weblogic+medrec/log5j.properties weblogic.Server
14-01-2015

Analysis of the same type of crash from earlier (see asm-weblogic2.txt and weblogic-stripped-cfg3.pdf). In this case the constant table base uses 3 registers, but the symptoms are the same:
B17: (B21, B24) -> (B18, B25, B63)
<snip>
0xffffffff707bf884: cwbe %o0, 0x0, 0xffffffff707bfc24 ; B25
0xffffffff707bf888: mov %i5, %l6 ; <--- this needs to go up ^^^^
14-01-2015

ILW=Crash, bigapps, none=HHH=P1
14-01-2015

It is kind of an RA bug, yet again. See weblogic-stripped-cfg2.pdf.
B16: (B20, B23) -> (B17, B24, B62)
<snip>
0xffffffff712e9af4: cwbe %o0, 0x0, 0xffffffff712e9ed4 ; B24
0xffffffff712e9af8: mov %i5, %i0
It feels like these two have been swapped. A shorten branches/delay slot filling problem?
13-01-2015

Indeed a similar-looking problem with the constant table base corruption:
ffffffff712ea174 a8 10 00 08 mov %o0, %l4
ffffffff712ea178 90 10 00 14 mov %l4, %o0
ffffffff712ea17c 7e cb 1f d9 call 0xffffffff6c5b20e0
ffffffff712ea180 01 00 00 00 nop
---------------
ffffffff712ea184 ee 5e 3f bc ldx [ %i0 + -68 ], %l7
ffffffff712ea188 ab 35 30 09 srlx %l4, 9, %l5
ffffffff712ea18c e6 75 20 18 stx %l3, [ %l4 + 0x18 ]
ffffffff712ea190 c0 2d c0 15 clrb [ %l7 + %l5 ]
ffffffff712ea194 c2 58 a0 60 ldx [ %g2 + 0x60 ], %g1
ffffffff712ea198 ea 58 a0 70 ldx [ %g2 + 0x70 ], %l5
ffffffff712ea19c ae 00 60 40 add %g1, 0x40, %l7
ffffffff712ea1a0 3a ed c7 b5 unknown
With i0=0xfffffff9adb9def0, which looks like a heap address.
13-01-2015
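The disassembly of the faulting instruction can be cross-checked against the hs_err data. The following is a minimal sketch, assuming the standard SPARC V9 format-3 field layout (op, rd, op3, rs1, i, simm13); the helper names are illustrative. It decodes the word at the faulting pc (ee 5e 3f bc), adds the decoded offset to the I0 value from the register dump, and compares the result with si_addr.

```python
def decode_sparc_format3_imm(word):
    """Split a 32-bit SPARC format-3 instruction that uses an immediate operand."""
    op = (word >> 30) & 0x3
    rd = (word >> 25) & 0x1F
    op3 = (word >> 19) & 0x3F
    rs1 = (word >> 14) & 0x1F
    i = (word >> 13) & 0x1
    simm13 = word & 0x1FFF
    if simm13 & 0x1000:        # sign-extend the 13-bit immediate
        simm13 -= 0x2000
    return op, rd, op3, rs1, i, simm13


def reg_name(n):
    """Registers 0-7 are %g, 8-15 %o, 16-23 %l, 24-31 %i."""
    return "%" + "goli"[n >> 3] + str(n & 7)


FAULT_WORD = 0xEE5E3FBC          # bytes at pc=0xffffffff712ea184
I0 = 0xFFFFFFF9ADB9DEF0          # I0 from the register dump
SI_ADDR = 0xFFFFFFF9ADB9DEAC     # si_addr from the siginfo line

op, rd, op3, rs1, i, simm13 = decode_sparc_format3_imm(FAULT_WORD)
effective = (I0 + simm13) & 0xFFFFFFFFFFFFFFFF

print(op, hex(op3), reg_name(rs1), simm13, reg_name(rd))
print(hex(effective), effective == SI_ADDR, effective % 8)
```

Under these assumptions the word decodes to an ldx (op3 0x0b) from %i0 - 68 into %l7, and the computed address equals si_addr. That address is only 4-byte aligned, while ldx loads 8 bytes, which is consistent with the BUS_ADRALN code in the siginfo line.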