JDK-8189064 : Crash with compiler/codegen/*Vect.java on Solaris-sparc
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 10
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris
  • CPU: sparc
  • Submitted: 2017-10-09
  • Updated: 2020-09-01
  • Resolved: 2017-10-27
JDK 10: 10 b33 (Fixed)
Description
The following tests failed in hs-nightly 2017-10-06 and 2017-10-05

compiler/codegen/TestBooleanVect.java
compiler/codegen/TestByteFloatVect.java
compiler/codegen/TestByteIntVect.java
compiler/codegen/TestByteShortVect.java
compiler/codegen/TestByteVect.java
compiler/codegen/TestShortFloatVect.java
compiler/codegen/TestShortIntVect.java

----------System.out:(35/1976)----------
Testing Boolean vectors
Warmup
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0xffffffff6aecf514, pid=33002, tid=71
#
# JRE version: Java(TM) SE Runtime Environment (10.0) (fastdebug build 10-internal+0-2017-10-07-0300098.jesper.wilhelmsson.hs)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 10-internal+0-2017-10-07-0300098.jesper.wilhelmsson.hs, mixed mode, compressed oops, g1 gc, solaris-sparc)
# Problematic frame:
# J 453% c2 compiler.codegen.TestBooleanVect.test()I (5156 bytes) @ 0xffffffff6aecf514 [0xffffffff6aeceea0+0x0000000000000674]
#
# Core dump will be written. Default location: /scratch/opt/mach5/mesos/work_dir/e865ca56-11f0-408c-a274-1f8285a45d26/testoutput/jtreg/JTwork/scratch/9/core or core.33002
#
# An error report file with more information is saved as:
# /scratch/opt/mach5/mesos/work_dir/e865ca56-11f0-408c-a274-1f8285a45d26/testoutput/jtreg/JTwork/scratch/9/hs_err_pid33002.log
Compiled method (c2)   16224  453 %           compiler.codegen.TestBooleanVect::test @ 22 (5156 bytes)
 total in heap  [0xffffffff6aecec90,0xffffffff6aed3920] = 19600
 relocation     [0xffffffff6aecee10,0xffffffff6aecee68] = 88
 constants      [0xffffffff6aecee80,0xffffffff6aeceea0] = 32
 main code      [0xffffffff6aeceea0,0xffffffff6aed06e0] = 6208
 stub code      [0xffffffff6aed06e0,0xffffffff6aed0730] = 80
 oops           [0xffffffff6aed0730,0xffffffff6aed0738] = 8
 metadata       [0xffffffff6aed0738,0xffffffff6aed0870] = 312
 scopes data    [0xffffffff6aed0870,0xffffffff6aed10b0] = 2112
 scopes pcs     [0xffffffff6aed10b0,0xffffffff6aed3900] = 10320
 dependencies   [0xffffffff6aed3900,0xffffffff6aed3908] = 8
 nul chk table  [0xffffffff6aed3908,0xffffffff6aed3920] = 24
Could not load hsdis-sparcv9.so; library not loadable; PrintAssembly is disabled
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 71
Dumping core ...
----------System.err:(0/0)----------

Comments
Great! The fix looks reasonable to me and the test results look good as well. I've reviewed the changes: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-October/027370.html
27-10-2017

Evaluation.
The test() method inlines several methods with loops, some of which are not vectorized, for example test_cp_oppos(). The new code from JDK-8187601 triggers another round of loopopts to try to unroll that loop more. But that also triggers a second round of loop vectorization. To avoid a second vectorization we have the cl->is_vectorized_loop() check in SuperWord::transform_loop():
http://hg.openjdk.java.net/jdk10/hs/file/c6d2381c6932/src/hotspot/share/opto/superword.cpp#l146
Unfortunately cl->mark_loop_vectorized() is called in SuperWord::output() only under several conditions:
http://hg.openjdk.java.net/jdk10/hs/file/c6d2381c6932/src/hotspot/share/opto/superword.cpp#l2442
and one of them (comparing the vector length with the unroll count) is not true on SPARC because it has very small vectors (8 bytes):
http://hg.openjdk.java.net/jdk10/hs/file/c6d2381c6932/src/hotspot/share/opto/superword.cpp#l2423
As a result, cl->mark_loop_vectorized() is not called.
The fix is simply to call cl->mark_loop_vectorized() whenever vectors are generated. I also modified the JDK-8187601 changes to trigger loop optimization only when the main loop is not vectorized:
diff -r c6d2381c6932 src/hotspot/share/opto/superword.cpp
--- a/src/hotspot/share/opto/superword.cpp
+++ b/src/hotspot/share/opto/superword.cpp
@@ -2168,10 +2168,12 @@
   CountedLoopNode *cl = lpt()->_head->as_CountedLoop();
   Compile* C = _phase->C;
   if (_packset.length() == 0) {
-    // Instigate more unrolling for optimization when vectorization fails.
-    C->set_major_progress();
-    cl->set_notpassed_slp();
-    cl->mark_do_unroll_only();
+    if (cl->is_main_loop()) {
+      // Instigate more unrolling for optimization when vectorization fails.
+      C->set_major_progress();
+      cl->set_notpassed_slp();
+      cl->mark_do_unroll_only();
+    }
     return;
   }
 
@@ -2417,6 +2419,9 @@
   }//for (int i = 0; i < _block.length(); i++)
 
   C->set_max_vector_size(max_vlen_in_bytes);
+  if (max_vlen_in_bytes > 0) {
+    cl->mark_loop_vectorized();
+  }
 
   if (SuperWordLoopUnrollAnalysis) {
     if (cl->has_passed_slp()) {
@@ -2439,7 +2444,6 @@
   }
 
   if (do_reserve_copy()) {
-    cl->mark_loop_vectorized();
     if (can_process_post_loop) {
       // Now create the difference of trip and limit and use it as our mask index.
       // Note: We limited the unroll of the vectorized loop so that
27-10-2017
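
The flag interaction described in the evaluation above can be modeled as a toy state machine. This is a minimal sketch in Java, not HotSpot code: the class LoopFlagsModel, the method vectorizationPasses, and its parameters are hypothetical names, and the single boolean unrollCheckPasses stands in for the vector-length versus unroll-count comparison in SuperWord::output().

```java
// Toy model of the loop flags from the evaluation; not HotSpot code.
final class LoopFlagsModel {
    /**
     * Counts how often a loop with vectorizable stores gets vectorized over
     * the given number of loop-optimization rounds.
     *
     * @param fixed             true models the fix: mark the loop whenever vectors are generated
     * @param unrollCheckPasses stand-in for the vector-length vs unroll-count check
     *                          (reported as not true on SPARC with 8-byte vectors)
     * @param rounds            number of loopopts rounds that reach SuperWord
     */
    static int vectorizationPasses(boolean fixed, boolean unrollCheckPasses, int rounds) {
        boolean markedVectorized = false; // models cl->is_vectorized_loop()
        int passes = 0;
        for (int round = 0; round < rounds; round++) {
            if (markedVectorized) {
                continue; // models the guard in SuperWord::transform_loop()
            }
            passes++; // vectors are generated in this round
            if (fixed || unrollCheckPasses) {
                markedVectorized = true; // models cl->mark_loop_vectorized()
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println("before fix, SPARC-like: " + vectorizationPasses(false, false, 2));
        System.out.println("before fix, check passes: " + vectorizationPasses(false, true, 2));
        System.out.println("after fix, SPARC-like: " + vectorizationPasses(true, false, 2));
    }
}
```

In this model only the combination "before fix" plus a failing unroll check lets the second round vectorize the loop again, which matches the SPARC-only nature of the report.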

Note, on x86 we have only one vectorization since both arrays are vectorized at the same time.
26-10-2017

During the first attempt we vectorize only one array access because we can't align arrays of two different types: http://hg.openjdk.java.net/jdk10/hs/file/c6d2381c6932/src/hotspot/share/opto/superword.cpp#l626 We need to prevent the second vectorization.
26-10-2017
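
A small arithmetic check illustrates why both stores in test_ci_unaln() (a[i+5] on a byte[] and b[i] on a float[]) cannot be 8-byte aligned at the same loop index. This is a sketch under stated assumptions, not compiler code: it assumes a 16-byte array header (suggested by the 0x10 displacement in the reported assembly) and array base addresses that are themselves 8-byte aligned. The class and method names are hypothetical.

```java
// Alignment arithmetic for the two stores in test_ci_unaln(); illustrative only.
final class CoAlignment {
    static final int ARRAY_HEADER_BYTES = 16; // assumption: taken from the 0x10 offset in the assembly
    static final int VECTOR_BYTES = 8;        // SPARC vector size named in the evaluation

    // Offset of a[i + 5] from the (8-byte aligned) start of the byte[] object.
    static boolean byteStoreAligned(int i) {
        return (ARRAY_HEADER_BYTES + (i + 5)) % VECTOR_BYTES == 0;
    }

    // Offset of b[i] from the (8-byte aligned) start of the float[] object.
    static boolean floatStoreAligned(int i) {
        return (ARRAY_HEADER_BYTES + 4L * i) % VECTOR_BYTES == 0;
    }

    // Returns the first index below limit where both stores are aligned, or -1 if none exists.
    static int firstCommonAlignedIndex(int limit) {
        for (int i = 0; i < limit; i++) {
            if (byteStoreAligned(i) && floatStoreAligned(i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first common aligned index: " + firstCommonAlignedIndex(997));
    }
}
```

Under these assumptions the byte store is aligned only when i is 3 modulo 8 (an odd index), while the float store is aligned only at even indices, so no pre-loop adjustment can align both.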

Yes, it does vectorization twice, which is a no-no on SPARC, where unaligned access is not allowed. First it vectorizes the byte[] stores and then the float[] stores:
memory slices:
0
 1359 Phi === 1356 757 442 [[ 1293 ]] #memory Memory: @float[int:>=0]:exact+any *, idx=7; !orig=1089,912,811,[475],[473],[378] !jvms: test3::test_ci_unaln @ bci:10 test3::test @ bci:34
 442 StoreF === 1356 797 440 407 [[ 779 1359 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=819 !jvms: test3::test_ci_unaln @ bci:21 test3::test @ bci:34
1
 1360 Phi === 1356 761 1672 [[ 1670 ]] #memory Memory: @byte[int:>=0]:exact+any *, idx=6; !orig=1088,913,810,[380] !jvms: test3::test_ci_unaln @ bci:10 test3::test @ bci:34
 1672 StoreVector === 1356 1670 1060 1669 [[ 1360 781 ]] @byte[int:>=0]:NotNull:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=[1059],[895],[801],[406],[820] !jvms: test3::test_ci_unaln @ bci:16 test3::test @ bci:34
26-10-2017

This problem reminds me of JDK-8078497, which I fixed some years ago. Looking at the -XX:+TraceSuperWord and -XX:+TraceLoopOpts output, it seems that we are vectorizing the same loop twice:
After filter_packs
packset
Pack: 0
 align: 0 1175 StoreB === 1201 1204 1176 259 [[ 1173 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=1035,850,281,[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
 align: 1 1173 StoreB === 1201 1175 1174 259 [[ 1171 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=1033,281,[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
 align: 2 1171 StoreB === 1201 1173 1172 259 [[ 1169 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=850,281,[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
 align: 3 1169 StoreB === 1201 1171 1170 259 [[ 1035 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=281,[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
 align: 4 1035 StoreB === 1201 1169 1036 259 [[ 1033 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=850,281,[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
 align: 5 1033 StoreB === 1201 1035 1034 259 [[ 850 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=281,[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
 align: 6 850 StoreB === 1201 1033 851 259 [[ 833 281 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=281,[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
 align: 7 281 StoreB === 1201 850 279 259 [[ 1204 829 ]] @byte[int:>=0]:exact+any *, idx=6; Memory: @byte[int:>=0]:NotNull:exact+any *, idx=6; !orig=[612],[336],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:16 TestByteFloatVect::test @ bci:45
SuperWord::output Loop: N1201/N329 counted [int,int),+8 (1025 iters) main has_sfpt
[...]
After filter_packs
packset
Pack: 0
 align: 0 1165 StoreF === 1201 1205 1166 282 [[ 1161 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=1026,846,317,[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
 align: 4 1161 StoreF === 1201 1165 1162 282 [[ 1157 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=1022,317,[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
Pack: 1
 align: 0 1157 StoreF === 1201 1161 1158 282 [[ 1153 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=846,317,[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
 align: 4 1153 StoreF === 1201 1157 1154 282 [[ 1026 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=317,[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
Pack: 2
 align: 0 1026 StoreF === 1201 1153 1027 282 [[ 1022 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=846,317,[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
 align: 4 1022 StoreF === 1201 1026 1023 282 [[ 846 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=317,[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
Pack: 3
 align: 0 846 StoreF === 1201 1022 847 282 [[ 834 317 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=317,[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
 align: 4 317 StoreF === 1201 846 315 282 [[ 827 1205 ]] @float[int:>=0]:exact+any *, idx=7; Memory: @float[int:>=0]:NotNull:exact+any *, idx=7; !orig=[617],[337],[334],[370],[401] !jvms: TestByteFloatVect::test_ci_unaln @ bci:21 TestByteFloatVect::test @ bci:45
SuperWord::output Loop: N1201/N329 counted [int,int),+8 (1025 iters) main has_sfpt
Could it be that we are adjusting the pre-loop twice (once to align StoreB and then again to align StoreF in the main loop)? This would mean that the vectorized StoreB ends up unaligned, which is consistent with what we see in the assembly code.
26-10-2017
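
The shape of the two packsets in the trace above (one pack of eight StoreB nodes, then four packs of two StoreF nodes, for a loop with stride +8) follows from dividing the 8-byte SPARC vector size by the element size. The sketch below only reproduces that arithmetic; the class and method names are hypothetical and nothing here is taken from the SuperWord implementation.

```java
// Lane and pack counts for an 8-byte vector; illustrative arithmetic only.
final class PackLanes {
    // Number of elements of the given size that fit into one vector.
    static int lanes(int vectorBytes, int elementBytes) {
        return vectorBytes / elementBytes;
    }

    // Number of full packs formed from the unrolled copies of a single store.
    static int packs(int unrolledCopies, int lanesPerPack) {
        return unrolledCopies / lanesPerPack;
    }

    public static void main(String[] args) {
        int vectorBytes = 8;    // SPARC vector size named in the evaluation
        int unrolledCopies = 8; // loop stride +8 in the trace
        int byteLanes = lanes(vectorBytes, Byte.BYTES);
        int floatLanes = lanes(vectorBytes, Float.BYTES);
        System.out.println("byte:  " + packs(unrolledCopies, byteLanes) + " pack(s) of " + byteLanes);
        System.out.println("float: " + packs(unrolledCopies, floatLanes) + " pack(s) of " + floatLanes);
    }
}
```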

This 8189064 task is taking me more time than planned, and as I understand it, it is currently the only integration blocker. Yes, 8187601 only exposed a hidden issue. But since this is blocking integration, may I propose reverting 8187601 to clear the integration blocker for now, as I am not able to find a fix for 8189064 immediately?
26-10-2017

Hmm, it looks like a bad type is used for vectors when additional unrolling happens: std %f12, [ %l2 + 0x15 ] ;*bastore. It seems it thinks it is a byte store, so the offset is odd (0x15). The problem could have existed before JDK-8187601: we usually did not unroll after vectorization, especially on SPARC. So rolling back JDK-8187601 will simply hide the problem. I would suggest using TestByteVect.java for the investigation instead, to have only one type of arrays/vectors.
25-10-2017

[Attachment : hs_err_pid3590.log]
BUS error due to a store to an unaligned address?
............
>>>>>> # SIGBUS (0xa) at pc=0xffffffff739e0e54, pid=3590, tid=2
............
# J 33% c2 TestByteFloatVect.test()V (44 bytes) @ 0xffffffff739e0e54 [0xffffffff739e0be0+0x0000000000000274]
............
L2=0x0000000200c11b4c is pointing into object: 0x0000000200c11b48
[B
{0x0000000200c11b48} - klass: {type array byte}
 - length: 997
 - 0: 0
.............
;; B37: # B37 B38 <- B36 B37 Loop: B37-B37 inner main of N199 Freq: 1.0725e+08
0xffffffff739e0e38: signx %o0, %l2
0xffffffff739e0e3c: sllx %l2, 2, %o4
0xffffffff739e0e40: add %l5, %o4, %o4
0xffffffff739e0e44: std %f8, [ %o4 + 0x10 ]
0xffffffff739e0e48: std %f8, [ %o4 + 0x18 ]
0xffffffff739e0e4c: add %g4, %l2, %l2
0xffffffff739e0e50: std %f8, [ %o4 + 0x20 ] ;*fastore {reexecute=0 rethrow=0 return_oop=0}
 ; - TestByteFloatVect::test_ci_unaln@21 (line 29)
 ; - TestByteFloatVect::test@34 (line 16)
>>>>>>>> 0xffffffff739e0e54: std %f12, [ %l2 + 0x15 ] ;*bastore {reexecute=0 rethrow=0 return_oop=0}
 ; - TestByteFloatVect::test_ci_unaln@16 (line 28)
 ; - TestByteFloatVect::test@34 (line 16)
.............
25-10-2017
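
The unaligned-store suspicion in the comment above can be checked directly from the reported register value. The sketch below adds the 0x15 displacement of the faulting std instruction to the L2 value from the hs_err excerpt and tests the result against the 8-byte alignment that a SPARC doubleword store requires. The class and method names are hypothetical; the numeric inputs are copied from the excerpt.

```java
// Effective-address check for the faulting "std %f12, [ %l2 + 0x15 ]"; illustrative only.
final class FaultingStore {
    // Effective address of a register-plus-displacement memory operand.
    static long effectiveAddress(long base, long displacement) {
        return base + displacement;
    }

    // Distance in bytes past the previous boundary of the given access size (0 means aligned).
    static long misalignment(long address, int accessBytes) {
        return address % accessBytes;
    }

    public static void main(String[] args) {
        long l2 = 0x0000000200c11b4cL; // L2 value from the hs_err excerpt
        long address = effectiveAddress(l2, 0x15);
        System.out.println("effective address: 0x" + Long.toHexString(address));
        System.out.println("misaligned by " + misalignment(address, 8) + " byte(s) for an 8-byte store");
    }
}
```

The effective address is odd, so it cannot satisfy the alignment of a doubleword store, which is consistent with the SIGBUS.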

Found that the basic effect of the 8187601 fix is that the number of times the loop gets unrolled increased, and this seems to have triggered the crash on solaris-sparc. Reduced test sample:
----------------------------------------------
public class TestByteFloatVect {
    private static final int ARRLEN = 997;
    private static final int ITERS = 11000;

    public static void main(String args[]) {
        test();
    }

    static void test() {
        byte[] a1 = new byte[ARRLEN];
        byte[] a2 = new byte[ARRLEN];
        float[] b1 = new float[ARRLEN];
        for (int i=0; i<ITERS; i++) {
            test_cp_oppos(a1, a2);
            test_ci_unaln(a1, b1);
        }
    }

    static void test_cp_oppos(byte[] a, byte[] b) {
        int limit = a.length-1;
        for (int i = 0; i < a.length; i+=1) {
            a[i] = b[limit-i];
        }
    }

    static void test_ci_unaln(byte[] a, float[] b) {
        for (int i = 0; i < a.length-5; i+=1) {
            a[i+5] = 99;
            b[i] = 77.0f;
        }
    }
}
----------------------------------------------
e.g.:
$ ....build/solaris-sparcv9-debug/jdk/bin/java -XX:-CreateCoredumpOnCrash -XX:-TransmitErrorReport -XX:CompileThreshold=1 -Xbatch -XX:-TieredCompilation -XX:CompileOnly=TestByteFloatVect.test,TestByteFloatVect.test_cp_oppos,TestByteFloatVect.test_ci_unaln -XX:LoopMaxUnroll=7 TestByteFloatVect
<NO FAILURE>
$ ......build/solaris-sparcv9-debug/jdk/bin/java -XX:-CreateCoredumpOnCrash -XX:-TransmitErrorReport -XX:CompileThreshold=1 -Xbatch -XX:-TieredCompilation -XX:CompileOnly=TestByteFloatVect.test,TestByteFloatVect.test_cp_oppos,TestByteFloatVect.test_ci_unaln -XX:LoopMaxUnroll=8 TestByteFloatVect
<FAILURE - SIGBUS crash in compiled code only for solaris-sparc>
25-10-2017

-- Reproduced the reported 8189064 crash with a build of the latest sources on a solaris-sparc test machine:
$ /scratch/rvraghav/8189064/build/jdk10-hs/build/solaris-sparcv9-debug/jdk/bin/javac TestByteFloatVect.java
$ /scratch/rvraghav/8189064/build/jdk10-hs/build/solaris-sparcv9-debug/jdk/bin/java -XX:CompileThreshold=100 -Xbatch -XX:-TieredCompilation -XX:-OptimizeFill TestByteFloatVect
<FAILURE - SIGBUS crash>
-- Confirmed that the same 8189064 failure does NOT happen with a test build of the latest sources after removing only the 8187601 changeset (http://hg.openjdk.java.net/jdk10/hs/rev/d78db2ebce5e pushed on 05Oct2017).
-- Work in progress on a correct fix.
23-10-2017

attached hs_err files.
13-10-2017

attached hs_err files
13-10-2017

Attached log and jtr files extracted from this task's artifacts:
http://java.se.oracle.com/mdash/jobs/mach5-j-jdk10-hs-nightly-all-2017-10-06-13-20171007-0301-2750/tasks/mach5-j-jdk10-hs-nightly-all-2017-10-06-13-20171007-0301-2750-tier3-comp-jdk_open_test_hotspot_jtreg_hotspot_compiler-solaris-sparcv9-debug-42/results
For other files, download:
https://java.se.oracle.com/artifactory/trixie-results/PERSONAL/2017.10/mach5-j-jdk10-hs-nightly-all-2017-10-06-13-20171007-0301-2750/mach5-j-jdk10-hs-nightly-all-2017-10-06-13-20171007-0301-2750-tier3-comp-jdk_open_test_hotspot_jtreg_hotspot_compiler-solaris-sparcv9-debug-42.tar.gz
It contains a bunch of core files, so it takes a LOT of disk space to extract.
13-10-2017

I suspect JDK-8187601 is causing this since it was pushed Oct 5.
13-10-2017

The task shows only empty pages :( If you can, please attach at least one hs_err file to this bug report.
13-10-2017

ILW = SIGBUS crash; with 7 compiler/codegen/*Vect.java only for Solaris-Sparc; no workaround = HLH = P2
12-10-2017