JDK-8003878 : compiler/7196199 test failed on OS X since 8b54, jdk7u12b01
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs24,7u40,8
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • OS: os_x
  • CPU: x86
  • Submitted: 2012-11-21
  • Updated: 2014-03-12
  • Resolved: 2013-01-24
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7 JDK 8 Other
7u40Fixed 8Fixed hs24Fixed
Related Reports
Relates :  
Description
$ jtreg -testjdk:jdk/8/latest/binaries/macosx-x64 -va -vmoptions:'-server' compiler/7196199
--------------------------------------------------
TEST: compiler/7196199/Test7196199.java
JDK under test: (jdk/8/latest/binaries/macosx-x64)
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b65)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b09, mixed mode)

ACTION: build -- Passed. All files up to date
REASON: Named class compiled on demand
TIME:   0.0050 seconds
messages:
command: build Test7196199
reason: Named class compiled on demand
elapsed time (seconds): 0.0050

ACTION: main -- Failed. Execution failed: Execution failed
REASON: User specified action: run main/othervm/timeout=400 -Xmx32m -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -XX:CompileCommand=exclude,Test7196199.test -XX:+SafepointALot -XX:GuaranteedSafepointInterval=100 Test7196199 
TIME:   6.854 seconds
messages:
command: main -Xmx32m -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -XX:CompileCommand=exclude,Test7196199.test -XX:+SafepointALot -XX:GuaranteedSafepointInterval=100 Test7196199
reason: User specified action: run main/othervm/timeout=400 -Xmx32m -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -XX:CompileCommand=exclude,Test7196199.test -XX:+SafepointALot -XX:GuaranteedSafepointInterval=100 Test7196199 
elapsed time (seconds): 6.854
STDOUT:
CompilerOracle: exclude Test7196199.test
Warmup
Verification
  test_incrc
### Excluding compile: static Test7196199::test
  test_incrv
  test_addc
  test_addv
STDERR:
test_addv: [8] = 154610.0 != 230000.0
test_addv: [9] = 164610.0 != 240000.0
test_addv: [10] = 174610.0 != 250000.0
test_addv: [11] = 184610.0 != 260000.0
test_addv: [16] = 234610.0 != 310000.0
test_addv: [17] = 244610.0 != 320000.0
test_addv: [18] = 254610.0 != 330000.0
test_addv: [19] = 264610.0 != 340000.0
test_addv: [24] = 314610.0 != 390000.0
test_addv: [25] = 324610.0 != 400000.0
test_addv: [26] = 334610.0 != 410000.0
test_addv: [27] = 344610.0 != 420000.0
test_addv: [32] = 394610.0 != 470000.0
test_addv: [33] = 404610.0 != 480000.0
test_addv: [34] = 414610.0 != 490000.0
test_addv: [35] = 424610.0 != 500000.0
test_addv: [40] = 474610.0 != 550000.0
test_addv: [41] = 484610.0 != 560000.0
test_addv: [42] = 494610.0 != 570000.0
test_addv: [43] = 504610.0 != 580000.0
test_addv: [48] = 554610.0 != 630000.0
test_addv: [49] = 564610.0 != 640000.0
test_addv: [50] = 574610.0 != 650000.0
test_addv: [51] = 584610.0 != 660000.0
test_addv: [56] = 634610.0 != 710000.0
test_addv: [57] = 644610.0 != 720000.0
test_addv: [58] = 654610.0 != 730000.0
test_addv: [59] = 664610.0 != 740000.0
test_addv: [64] = 714610.0 != 790000.0
test_addv: [65] = 724610.0 != 800000.0
test_addv: [66] = 734610.0 != 810000.0
test_addv: [67] = 744610.0 != 820000.0
test_addv: [72] = 794610.0 != 870000.0
test_addv: [73] = 804610.0 != 880000.0
test_addv: [74] = 814610.0 != 890000.0
test_addv: [75] = 824610.0 != 900000.0
test_addv: [80] = 874610.0 != 950000.0
test_addv: [81] = 884610.0 != 960000.0
test_addv: [82] = 894610.0 != 970000.0
test_addv: [83] = 904610.0 != 980000.0
test_addv: [88] = 954610.0 != 1030000.0
test_addv: [89] = 964610.0 != 1040000.0
test_addv: [90] = 974610.0 != 1050000.0
test_addv: [91] = 984610.0 != 1060000.0
FAILED: 44 errors

TEST RESULT: Failed. Execution failed: Execution failed
--------------------------------------------------
Test results: failed: 1

Comments
I still think there is something wrong in OSX with restoring upper 128 bit of YMM registers when returning from signal handler. You are correct about calling sequence during polling page signal processing. Here is calling stack: #0 SharedRuntime::get_poll_stub (pc=0x1058fff8e "?\005l ?H??\020'") at sharedRuntime.cpp:533 #1 0x0000000100b1c0dc in JVM_handle_bsd_signal (sig=10, info=0x102300280, ucVoid=0x1023002e8, abort_if_unrecognized=1) at os_bsd_x86.cpp:505 #2 0x0000000100b13f28 in signalHandler (sig=10, info=0x102300280, uc=0x1023002e8) at os_bsd.cpp:2873 #3 <signal handler called> #4 0x00000001058fff8e in ?? () The frame #4 is compiled frame with vectorized loop and the address is safepoint at the end of outer loop: 0ca B12: # B3 B13 <- B11 B10 B3 Freq: 402780 0ca addq RCX, #1 # long 0ce testl rax, [rip + #offset_to_poll_page] # Safepoint: poll for GC # Test7196199::test_incrc @ bci:36 L[0]=RSI L[1]=RCX L[2]=_ L[3]=_ # OopMap{rsi=Oop off=206} 0d4 cmpq RCX, #10000 0db jl B3 P=1.000000 C=116.000000 Here is java method which is tested: static void test_incrc(float[] a0) { // Non-counted loop with safepoint. for (long l = 0; l < SFP_ITERS; l++) { // Counted and vectorized loop. for (int i = 0; i < a0.length; i+=1) { a0[i] += VALUE; } } } Note, I found that saved ucontext looks correct. Unfortunately neither installed latest gdb (GNU gdb 6.3.50-20050815 (Apple version gdb-1822) (Sun Aug 5 03:00:42 UTC 2012) or include files (from latest Xcode 4.5.2) used for VM compilation does not show upper 128 bit of YMM registers. But include file /usr/include/mach/i386/_structs.h has ucontext structure __darwin_x86_avx_state64 which shows where ymm's upper bits are saved. It looks like OSX uses it because I see saved upper bits in corresponding locations. (How VM can use __darwin_x86_avx_state64? I did not find any llvm's flags: -mavx is not recognized and -mtune=native foes not help). I did not ignore SIGBUS so during execution I hit it in the compiled code: (gdb) b JVM_handle_bsd_signal (gdb) run -XX:+PrintOptoAssembly -XX:CICompilerCount=1 -XX:-BlockLayoutRotateLoops -XX:-BlockLayoutByFrequency -XX:+PrintCompilation -Xmx32m -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:-TieredCompilation -XX:CompileCommand=exclude,Test7196199.test -XX:+SafepointALot -XX:GuaranteedSafepointInterval=100 Test7196199 078 B7: # B7 B8 <- B6 B7 Loop: B7-B7 inner pre of N225 Freq: 797349 078 vaddss XMM3, XMM0, [RSI + #16 + R9 << #2] 07f movss [RSI + #16 + R9 << #2], XMM3 # float 086 incl R9 # int 089 cmpl R9, RBX 08c jl,s B7 # loop end P=0.500000 C=11264.000000 08c 08e B8: # B10 B9 <- B7 Freq: 398675 08e cmpl R9, R8 091 jge,s B10 P=0.000001 C=-1.000000 091 093 B9: # B9 B10 <- B8 B9 Loop: B9-B9 inner main of N90 Freq: 3.91112e+07 093 vmovdqu XMM2,[RSI + #16 + R9 << #2] ! load vector (32 bytes) 09a vaddps XMM2,XMM2,XMM1 ! add packed8F 09e vmovdqu [RSI + #16 + R9 << #2],XMM2 ! store vector (32 bytes) 0a5 addl R9, #8 # int 0a9 cmpl R9, R8 0ac jl,s B9 # loop end P=0.989807 C=11264.000000 0ac 0ae B10: # B12 B11 <- B8 B9 Freq: 398675 0ae cmpl R9, R11 0b1 jge,s B12 P=0.500000 C=-1.000000 nop # 1 bytes pad for loops and calls 0b4 B11: # B11 B12 <- B10 B11 Loop: B11-B11 inner post of N225 Freq: 398675 0b4 vaddss XMM2, XMM0, [RSI + #16 + R9 << #2] 0bb movss [RSI + #16 + R9 << #2], XMM2 # float 0c2 incl R9 # int 0c5 cmpl R9, R11 0c8 jl,s B11 # loop end P=0.500000 C=11264.000000 0c8 0ca B12: # B3 B13 <- B11 B10 B3 Freq: 402780 0ca addq RCX, #1 # long 0ce testl rax, [rip + #offset_to_poll_page] # Safepoint: poll for GC # Test7196199::test_incrc @ bci:36 L[0]=RSI L[1]=RCX L[2]=_ L[3]=_ # OopMap{rsi=Oop off=206} 0d4 cmpq RCX, #10000 0db jl B3 P=1.000000 C=116.000000 0db Program received signal SIGBUS, Bus error. [Switching to process 57796 thread 0x1203] 0x00000001050fff8e in ?? () 0x01050fff8e corresponds to SFP instruction at 0ce. (gdb) bt #0 0x00000001050fff8e in ?? () #1 0x0000000105006414 in ?? () #2 0x0000000105000681 in ?? () #3 0x00000001007b6e14 in JavaCalls::call_helper (result=0x102300c58, ... (gdb) info all-reg rax 0x104c00ba0 4374662048 rbx 0x4 4 rcx 0x10fa 4346 rdx 0x105006298 4378878616 rsi 0x7ff615cc0 34349341888 rdi 0x0 0 rbp 0x102300420 0x102300420 rsp 0x1023003a0 0x1023003a0 r8 0x5a 90 r9 0x61 97 r10 0x60 96 r11 0x61 97 r12 0x0 0 r13 0x1023003d8 4331668440 r14 0x102300468 4331668584 r15 0x104000800 4362078208 rip 0x1050fff8e 0x1050fff8e eflags 0x10206 66054 ... xmm0 { v4_float = {0, 0, 0, 15}, v2_double = {0, 5.4241445145035159e-315}, v16_int8 = {0 <repeats 12 times>, 65, 112, 0, 0}, v8_int16 = {0, 0, 0, 0, 0, 0, 16752, 0}, v4_int32 = {0, 0, 0, 1097859072}, v2_int64 = {0, 1097859072}, uint128 = 123424475185152 } (raw 0x00007041000000000000000000000000) xmm1 { v4_float = {15, 15, 15, 15}, v2_double = {16777220.08984375, 16777220.08984375}, v16_int8 = {65, 112, 0, 0, 65, 112, 0, 0, 65, 112, 0, 0, 65, 112, 0, 0}, v8_int16 = {16752, 0, 16752, 0, 16752, 0, 16752, 0}, v4_int32 = {1097859072, 1097859072, 1097859072, 1097859072}, v2_int64 = {4715268810954768384, 4715268810954768384}, uint128 = 0x41700000417000004170000041700000 } (raw 0x00007041000070410000704100007041) xmm2 { v4_float = {0, 0, 0, 13715190}, v2_double = {0, 6.2431013556031348e-315}, v16_int8 = {0 <repeats 12 times>, 75, 81, 70, -10}, v8_int16 = {0, 0, 0, 0, 0, 0, 19281, 18166}, v4_int32 = {0, 0, 0, 1263617782}, v2_int64 = {0, 1263617782}, uint128 = 17745960764264415232 } (raw 0xf646514b000000000000000000000000) xmm3 { v4_float = {0, 0, 0, 13715190}, v2_double = {0, 6.2431013556031348e-315}, v16_int8 = {0 <repeats 12 times>, 75, 81, 70, -10}, v8_int16 = {0, 0, 0, 0, 0, 0, 19281, 18166}, v4_int32 = {0, 0, 0, 1263617782}, v2_int64 = {0, 1263617782}, uint128 = 17745960764264415232 } (raw 0xf646514b000000000000000000000000) As you can see xmm1 has correct repeated value 15.f which is 0x41700000 in hex. Unfortunately, as I said, gdb does not show whole YMMs. Continue execution and hit breakpoint: (gdb) signal SIGBUS Continuing with signal SIGBUS. Breakpoint 1, JVM_handle_bsd_signal (sig=10, info=0x102300280, ucVoid=0x1023002e8, abort_if_unrecognized=1) at os_bsd_x86.cpp:400 400 ucontext_t* uc = (ucontext_t*) ucVoid; (gdb) next Current language: auto; currently c++ 402 Thread* t = ThreadLocalStorage::get_thread_slow(); (gdb) bt #0 JVM_handle_bsd_signal (sig=10, info=0x102300280, ucVoid=0x1023002e8, abort_if_unrecognized=1) at os_bsd_x86.cpp:402 #1 0x0000000100b13ee8 in signalHandler (sig=10, info=0x102300280, uc=0x1023002e8) at os_bsd.cpp:2873 #2 <signal handler called> #3 0x00000001050fff8e in ?? () #4 0x0000000105006414 in ?? () #5 0x0000000105000681 in ?? () #6 0x00000001007b6e14 in JavaCalls::call_helper (result=0x102300c58, m=0x1023009f8, args=0x1023008c8, __the_thread__=0x104000800) at javaCalls.cpp:402 (gdb) print *info $1 = { si_signo = 10, si_errno = 0, si_code = 2, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x1020f2000, si_value = { sival_int = 0, sival_ptr = 0x0 }, si_band = 0, __pad = {4331668384, 0, 0, 0, 0, 0, 0} } (gdb) print *uc $2 = { uc_onstack = 0, uc_sigmask = 4, uc_stack = { ss_sp = 0x1022ffe58, ss_size = 0, ss_flags = 0 }, uc_link = 0x0, uc_mcsize = 1032, uc_mcontext = 0x1022ffe78, __mcontext_data = { ... (gdb) ptype ucontext_t type = struct __darwin_ucontext { int uc_onstack; __darwin_sigset_t uc_sigmask; __darwin_sigaltstack uc_stack; __darwin_ucontext *uc_link; __darwin_size_t uc_mcsize; __darwin_mcontext64 *uc_mcontext; __darwin_mcontext64 __mcontext_data; } I have no clue what is __mcontext_data but we use uc_mcontext for pc search (uc_mcontext->__ss.rip). (gdb) print *uc->uc_mcontext $3 = { __es = { __trapno = 14, __cpu = 0, __err = 4, __faultvaddr = 4329512960 }, __ss = { __rax = 4374662048, __rbx = 4, __rcx = 4346, __rdx = 4378878616, __rdi = 0, __rsi = 34349341888, __rbp = 4331668512, __rsp = 4331668384, __r8 = 90, __r9 = 97, __r10 = 96, __r11 = 97, __r12 = 0, __r13 = 4331668440, __r14 = 4331668584, __r15 = 4362078208, __rip = 4379901838, __rflags = 66054, __cs = 43, __fs = 0, __gs = 0 }, __fs = { ... __fpu_xmm0 = { __xmm_reg = "\000\000pA", '\0' <repeats 11 times> }, __fpu_xmm1 = { __xmm_reg = "\000\000pA\000\000pA\000\000pA\000\000pA" }, __fpu_xmm2 = { __xmm_reg = "?FQK", '\0' <repeats 11 times> }, __fpu_xmm3 = { __xmm_reg = "?FQK", '\0' <repeats 11 times> }, (gdb) print &uc->uc_mcontext.__fs.__fpu_xmm0 $8 = (__darwin_xmm_reg *) 0x1022fffd8 (gdb) x/8xg 0x1022fffd8 0x1022fffd8: 0x0000000041700000 0x0000000000000000 0x1022fffe8: 0x4170000041700000 0x4170000041700000 0x1022ffff8: 0x000000004b5146f6 0x0000000000000000 0x102300008: 0x000000004b5146f6 0x0000000000000000 So it looks like saved low 128 bits matching xmm contents during safepoint (see above). To see saved hi 128 bits we use __darwin_x86_avx_state64 knowledge: _STRUCT_XMM_REG __fpu_xmm15; /* XMM 15 */ char __fpu_rsrv4[6*16]; /* reserved */ int __fpu_reserved1; char __avx_reserved1[64]; _STRUCT_XMM_REG __fpu_ymmh0; /* YMMH 0 */ _STRUCT_XMM_REG __fpu_ymmh1; /* YMMH 1 */ _STRUCT_XMM_REG __fpu_ymmh2; /* YMMH 2 */ _STRUCT_XMM_REG __fpu_ymmh3; /* YMMH 3 */ (gdb) print &uc->uc_mcontext.__fs.__fpu_reserved1 $6 = (int *) 0x102300138 (gdb) x/8xg 0x102300138 + 4 + 64 0x10230017c: 0x0000000000000000 0x0000000000000000 0x10230018c: 0x4170000041700000 0x4170000041700000 0x10230019c: 0x0000000000000000 0x0000000000000000 0x1023001ac: 0x0000000000000000 0x0000000000000000 Which is what we expect. YMM0,YMM2,YMM3 have 0s there and YMM1 has 15.f (0x41700000). Next thing I did is short cut signal processing in JVM_handle_bsd_signal. Instead of executing safepoint blob I set next instruction after safepoint and return from signal handle: if ((sig == SIGSEGV || sig == SIGBUS) && os::is_poll_address((address)info->si_addr)) { stub = SharedRuntime::get_poll_stub(pc); + if (stub != NULL & UseNewCode) { + uc->context_pc += 6; + return true; + } So the OS should restore YMM on return and continue execution of compiled frame. But I still see failure with it. I am thinking to disable using whole YMM register on OS X (set default value MaxVectorSize=16) as short term solution because I don't understand what is wrong. And we are almost at the point of final fixes for jdk7 update and jdk8 before code freeze.
25-01-2013

test failed since 7u12b01 (passed with jdk7u11b03); since jdk8b54 (passed with jdk8b53)
23-11-2012

I modified RegisterSaver::save_live_registers and RegisterSaver::restore_live_registers in sharedRuntime_x86_64.cpp to preserve upper 128 bits in YMM registers always when UseAVX > 0. The test still fails on OS X.
22-11-2012

It could be OS X problem since the test passed on Solaris and Linux. The failure means that upper 128 bits in YMM registers are zeroed(not preserved) during signal processing (safepoint in compiled code). My fix for 7196199 saved/restored upper 128 bits in our safepoint handle blob and it solved problem for Solaris and Linux.
21-11-2012