JDK-8249218 : AArch64: Test7196199.java FAILED: 5 errors
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 15,16
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: linux
  • CPU: aarch64
  • Submitted: 2020-07-11
  • Updated: 2020-12-18
  • Resolved: 2020-12-18
  • Resolved in release: JDK 16
Related Reports
Relates :  
Relates :  
Description
The following test failed in the JDK15 CI:

compiler/runtime/Test7196199.java

Here's a snippet from the log file:

#section:main
----------messages:(4/641)----------
command: main -Xmx128m -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+SafepointALot -XX:GuaranteedSafepointInterval=100 -XX:CompileCommand=exclude,compiler.runtime.Test7196199::test compiler.runtime.Test7196199
reason: User specified action: run main/othervm/timeout=400 -Xmx128m -Xbatch -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+SafepointALot -XX:GuaranteedSafepointInterval=100 -XX:CompileCommand=exclude,compiler.runtime.Test7196199::test compiler.runtime.Test7196199 
Mode: othervm [/othervm specified]
elapsed time (seconds): 20.131
----------configuration:(0/0)----------
----------System.out:(7/128)----------
CompileCommand: exclude compiler/runtime/Test7196199.test
Warmup
Verification
  test_incrc
  test_incrv
  test_addc
  test_addv
----------System.err:(6/211)----------
test_addv: [0] = 29985.0 != 150000.0
test_addv: [1] = 39985.0 != 160000.0
test_addv: [94] = 969970.0 != 1090000.0
test_addv: [95] = 979970.0 != 1100000.0
test_addv: [96] = 989970.0 != 1110000.0
FAILED: 5 errors
----------rerun:(47/5612)*----------

The test task description is:

Run test open/test/hotspot/jtreg/:tier1_compiler_3 with linux-aarch64-debug with -XX:+CreateCoredumpOnCrash -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation #tier2-comp

I don't know if any of the above options will help with reproducibility.
Comments
Duplicate of JDK-8248408.
18-12-2020

The floating-point registers aren't restored at signal handler return on our test machines, so any code that changes an FP register during JVM_handle_linux_signal is going to cause problems. -Xlog:safepoint=debug is a perfect example.
10-12-2020

I don't see a big difference between the code generated for the "v" methods and the "c" methods, except for which NEON registers are used.
10-12-2020

The bad test_addv elements at the ends don't look like valid intermediate values. They differ by 10000, not "l" (lowercase "L"), the outer loop variable. If these are final values and not intermediate values, then GC corruption of the array at a safepoint seems less likely. One possibility that could explain these values is corruption of the incoming "b" value, which is duplicated into a different vector register for the inner vector operations. Corruption of only the register containing "b" with a value like 0.0 could explain the weird end values, and explain why only the vectorized loops see a problem.
10-12-2020
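One reading of "They differ by 10000" is that bad test_addv elements at adjacent indices are 10000 apart. The sketch below checks that reading against the first failure log in this report. The class and method names are hypothetical and are not part of Test7196199.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Test7196199: measures the spacing between
// bad test_addv elements that sit at consecutive array indices.
public class AdjacentSteps {
    // Indices and observed values copied from the first test_addv failure log.
    static final int[] INDEX = {0, 1, 94, 95, 96};
    static final double[] OBSERVED = {29985.0, 39985.0, 969970.0, 979970.0, 989970.0};

    // Returns observed[k] - observed[k - 1] for every pair whose indices are adjacent.
    static List<Double> adjacentSteps(int[] index, double[] observed) {
        List<Double> steps = new ArrayList<>();
        for (int k = 1; k < index.length; k++) {
            if (index[k] == index[k - 1] + 1) {
                steps.add(observed[k] - observed[k - 1]);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // Pairs (0,1), (94,95) and (95,96) qualify; (1,94) does not.
        System.out.println(adjacentSteps(INDEX, OBSERVED)); // [10000.0, 10000.0, 10000.0]
    }
}
```

The expected values in the same log (150000.0, 160000.0, ...) are also 10000 apart. The bad elements therefore keep the correct spacing within each group and are shifted by some amount.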

Since this is aarch64, I guess we can't rule out missing memory barriers that allow the compiled code to see old values if the array was copied during a safepoint.
04-12-2020

The failures so far:
  • test_addv: [0] = 29985.0 != 150000.0, [1] = 39985.0 != 160000.0, [94] = 969970.0 != 1090000.0, [95] = 979970.0 != 1100000.0, [96] = 989970.0 != 1110000.0
  • test_incrv: [0] = 18255.0 != 150000.0, [1] = 18255.0 != 150000.0, [94] = 18240.0 != 150000.0, [95] = 18240.0 != 150000.0, [96] = 18240.0 != 150000.0
  • test_addv: [0] = 97200.0 != 150000.0, [1] = 107200.0 != 160000.0, [94] = 1037200.0 != 1090000.0, [95] = 1047200.0 != 1100000.0, [96] = 1057200.0 != 1110000.0
  • test_addv: [0] = 74835.0 != 150000.0, [1] = 84835.0 != 160000.0, [2] = 94835.0 != 170000.0, [3] = 104835.0 != 180000.0, [96] = 1034820.0 != 1110000.0
There's always a "good" section of exactly 92 elements in the middle. This does not look like a bug in the safepoint register save/restore or in the generated JIT code. To me it looks like something went wrong with a bulk memory copy; the test isn't doing one, so that leaves GC.
04-12-2020
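The "good section of exactly 92" observation can be checked mechanically from the failing indices. The sketch below assumes the failure list in the comment above splits into four runs, with a new run starting each time the index restarts at [0]. That grouping is an inference, and the class and method names are hypothetical.

```java
// Hypothetical helper: finds the largest block of passing elements between the
// failing indices of each run. Grouping the failure list into four runs
// (a new run starts whenever the index restarts at 0) is an assumption.
public class GoodMiddle {
    static final int[][] BAD = {
        {0, 1, 94, 95, 96}, // test_addv
        {0, 1, 94, 95, 96}, // test_incrv
        {0, 1, 94, 95, 96}, // test_addv
        {0, 1, 2, 3, 96},   // test_addv
    };

    // Largest count of indices strictly between two consecutive failing indices.
    // Assumes `bad` is sorted in ascending order.
    static int largestGoodGap(int[] bad) {
        int best = 0;
        for (int k = 1; k < bad.length; k++) {
            best = Math.max(best, bad[k] - bad[k - 1] - 1);
        }
        return best;
    }

    public static void main(String[] args) {
        for (int[] run : BAD) {
            System.out.println(largestGoodGap(run)); // prints 92 for each run
        }
    }
}
```

The split between the front and back groups varies from run to run (2 + 3 or 4 + 1 bad elements). The size of the middle block does not vary.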

It is interesting that the failures all involve vector elements 0..3 and 94..96, and that those two groups are always off by a constant amount.
04-12-2020
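The "off by a constant amount" observation can be checked from the logged values. This is a minimal sketch using only the two test_addv runs above that both report indices 0, 1, 94, 95 and 96. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: computes expected - observed for the logged test_addv
// elements at indices 0, 1, 94, 95, 96 in the two runs that share those indices.
public class GroupOffsets {
    static final double[] EXPECTED = {150000.0, 160000.0, 1090000.0, 1100000.0, 1110000.0};
    static final double[][] OBSERVED = {
        {29985.0, 39985.0, 969970.0, 979970.0, 989970.0},
        {97200.0, 107200.0, 1037200.0, 1047200.0, 1057200.0},
    };

    // Element-wise expected[k] - observed[k].
    static double[] offsets(double[] observed, double[] expected) {
        double[] out = new double[observed.length];
        for (int k = 0; k < observed.length; k++) {
            out[k] = expected[k] - observed[k];
        }
        return out;
    }

    public static void main(String[] args) {
        for (double[] run : OBSERVED) {
            System.out.println(Arrays.toString(offsets(run, EXPECTED)));
        }
        // [120015.0, 120015.0, 120030.0, 120030.0, 120030.0]
        // [52800.0, 52800.0, 52800.0, 52800.0, 52800.0]
    }
}
```

In both runs the offset is the same for every element of the front group (0, 1) and for every element of the back group (94..96). In the second run the two groups also share one offset.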

I ran the test 15000+ times with the same jdk-16+20-1058 JDK as the most recent failure, but could not reproduce.
02-12-2020

No luck reproducing so far. Test history shows 1 failure in the last 6873 attempts on this platform.
01-12-2020
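To put the reproduction attempts in context, the test history gives a rate of 1 failure in 6873 attempts. The 02-12-2020 comment reports 15000+ runs without a failure. This is a rough sketch of how likely that outcome is, assuming every run fails independently at that fixed rate. Real CI failures need not behave that way. The class and method names are hypothetical.

```java
// Hypothetical helper: probability of seeing zero failures in `runs`
// independent attempts when each attempt fails with probability `rate`.
// Independence and a fixed per-run rate are assumptions.
public class ReproOdds {
    static double probNoFailure(double rate, int runs) {
        return Math.pow(1.0 - rate, runs);
    }

    public static void main(String[] args) {
        double rate = 1.0 / 6873; // 1 failure in 6873 attempts (test history)
        // Roughly 0.11: about a 1-in-9 chance of a clean streak of 15000 runs.
        System.out.println(probNoFailure(rate, 15000));
    }
}
```

A rate estimated from a single observed failure is very loose. A clean streak of this length is only weak evidence that the failure depends on something outside the test itself.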

Unable to reproduce (on the 16 track). There does not seem to be any other instance of this failure over the last two months, except for the test runs (made by ~rhen) during work on JDK-8212107.
29-09-2020