JDK-8319708 : Assertion 'fsetenv didn't work' in jdk tier4 tests after 8295159 on Linux aarch64 RHEL9.3
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 22
  • Priority: P3
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: linux
  • CPU: aarch64
  • Submitted: 2023-11-08
  • Updated: 2023-11-20
  • Resolved: 2023-11-15
Description
In the jdk tier4 tests we run into the following assertion:

#  Internal Error (/openjdk/linuxaarch64/jdk-dev/src/hotspot/os/linux/os_linux.cpp:1850), pid=1794541, tid=1794637
#  assert(IEEE_subnormal_handling_OK()) failed: fsetenv didn't work

This assertion shows up on RHEL 9.3 Linux aarch64 after the change 8295159 added the IEEE check.
On SLES15 or Ubuntu 22.04 Linux aarch64 systems the issue is not seen.

The errors all occur during execution of the javax/sound jtreg tests.
Maybe a lib is loaded there (or shortly before, in tier4), outside of hotspot os::dll_load, that messes up the IEEE conformance?
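
For readers unfamiliar with what such a check looks for, here is a minimal sketch of a subnormal-handling test. It is not HotSpot's IEEE_subnormal_handling_OK(); the names subnormals_preserved and check_fp_environment are hypothetical. The assumption is that a broken environment shows up as subnormal results being flushed to zero, for example when flush-to-zero is enabled in the FPCR.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper (NOT the HotSpot implementation): returns true if
// halving the smallest normal double yields a nonzero subnormal that can be
// doubled back exactly, i.e. results are not being flushed to zero.
bool subnormals_preserved() {
    // volatile keeps the compiler from folding the arithmetic at compile
    // time, so the operations run under the current FP environment.
    volatile double smallest_normal = std::numeric_limits<double>::min();
    volatile double half = smallest_normal / 2.0;  // subnormal under IEEE 754
    volatile double back = half * 2.0;             // exact if nothing was flushed
    return half > 0.0 && back == smallest_normal;
}

// Hypothetical guard in the spirit of the failing assertion.
void check_fp_environment() {
    assert(subnormals_preserved() && "subnormals are being flushed to zero");
}
```

Calling check_fp_environment() right after a suspicious library load would narrow down which load changed the mode, assuming the breakage is of the flush-to-zero kind.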


Comments
> Seems the update of libgcc_s-11 by libgcc-11.4.1-2.1.el9.aarch64 fixed the issue; after the update no assertions were seen any more.
Aha! I can't see any relevant bug fixes in that particular version of libgcc, but I don't know which older version of libgcc was installed before it. Do you?
If this bug was caused by the test program being started in a broken floating-point environment, then the fix for https://bugs.openjdk.org/browse/JDK-8319973 will probably fix it. I say "probably" because I don't know all the myriad ways in which the starting environment might be broken. The fix for 8319973 sets the rounding modes correctly.
20-11-2023

Seems the update of libgcc_s-11 by libgcc-11.4.1-2.1.el9.aarch64 fixed the issue; after the update no assertions were seen any more. Unfortunately, the shared-library output and the dll events in the hs_err file are currently somewhat limited and do not show the real reason why, or by which lib exactly, the FP environment was manipulated. There is probably room for improvement.
20-11-2023

I do not find the ones you listed. However, libm and libgcc_s-11 show up in the hs_err file, and those contain the pattern you grepped for. There is one interesting point: libgcc_s-11 seems to have been updated in the meantime (compared to the hs_err file from some days ago). I have to check when and why exactly. I think the libgcc_s-11 is some local stuff.
15-11-2023

I just did this:
  find /lib -type f -follow -name '*.so.*' | while read i; do echo -n "$i "; objdump -D "$i" | grep -c 'mrs.*fpcr'; done | tee libraries.txt
  grep -v ' 0$' libraries.txt
and got
  /lib/libavutil.so.56 4
  /lib/libavutil.so.56.70.100 4
  /lib/libavutil.so.58.2.100 4
  /lib/libavutil.so.58 4
  /lib/libblkid.so.1.1.0 5
  ...
It'd be interesting to compare that with the libs in the hs_err file.
15-11-2023
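
The comparison suggested in the comment above could be sketched as follows. This is only an illustration: the function names are hypothetical, the input is assumed to be the "path count" lines from libraries.txt, and the set of loaded library paths is assumed to have been extracted from the hs_err file by some other means. Libraries are matched by file name, because the same library may appear under /lib in one list and under another directory in the other.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Strip the directory part so that the same file name matches
// regardless of the directory it was found in.
static std::string base_name(const std::string& path) {
    std::string::size_type slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Parse "path count" lines as written by the find/objdump loop.
// Assumes paths contain no whitespace.
std::map<std::string, int> parse_fpcr_counts(std::istream& in) {
    std::map<std::string, int> counts;
    std::string path;
    int count = 0;
    while (in >> path >> count) {
        counts[base_name(path)] = count;
    }
    return counts;
}

// Return the file names of loaded libraries that contain at least one
// 'mrs ... fpcr' instruction according to the parsed counts.
std::vector<std::string> fpcr_suspects(
        const std::map<std::string, int>& counts,
        const std::set<std::string>& loaded_paths) {
    std::vector<std::string> suspects;
    std::set<std::string> seen;  // avoid reporting a file name twice
    for (const std::string& path : loaded_paths) {
        const std::string name = base_name(path);
        std::map<std::string, int>::const_iterator it = counts.find(name);
        if (it != counts.end() && it->second > 0 && seen.insert(name).second) {
            suspects.push_back(name);
        }
    }
    return suspects;
}
```

A hit only means the library reads the FPCR somewhere; as noted elsewhere in this thread, libc and libm do so too and restore it afterwards, so each hit still needs manual inspection.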

Try this:
  objdump -D /lib/libc.so.6 | grep 'mrs.*fpcr'
Some libraries like libc and libm do this, but they are careful to preserve the result.
15-11-2023

Hi [~aph], your patch did not cause any issues in our build/test CI. (As stated before, unfortunately the assertion had already disappeared after the build system move.)
15-11-2023

> We'll reopen if it ever is reproduced.
Yes, that's fine, thanks for looking into it. Btw., would there be an easy way to go through the libs in the hs_err file produced by the asserts some days ago, when we saw it, and identify the lib(s) changing the fenv? For example, with some objdump or strings call on the lib to get a hint which one it might be? The fact that we always saw it in the javax/sound jtreg tests points a bit in the direction of those libs, or maybe the dependencies of those libs.
15-11-2023

We'll reopen if it ever is reproduced.
15-11-2023

Ha! Well, it did reveal a bug, so it's all good. Hopefully we'll never hear from this one again.
14-11-2023

Thanks Andrew, I added the PR to our build/test queue. Unfortunately our build system was moved a few days ago and now the mentioned error does not show up any more - very strange ...
14-11-2023

Here you are: https://github.com/openjdk/jdk/pull/16637
13-11-2023

> How about I write a patch to do that, and you test it in your infra?
Thanks [~aph], sounds like a plan. Easiest for me for testing would be a PR link. Btw., I would still prefer a bit of additional tracing/logging when the IEEE check fails; the asserts alone will not add much transparency in the product-binary case. I created JDK-8319927: Add some logging after 8295159 for this.
10-11-2023

I just ran tier4 on aarch64/RHEL 9.3, and all was well. So it must be something in your environment. Here's a thought: x86 sets the floating-point control register in the call stub, but aarch64 does not. I think aarch64 should, because today if aarch64 is called directly from the invocation interface with FP in a non-default mode we'll not be spec-correct. How about I write a patch to do that, and you test it in your infra?
10-11-2023

Yes, we observed this when running tests with fastdebug.
10-11-2023

fastdebug, yeah?
09-11-2023

OK, I'll run the tier4 too.
09-11-2023

Hi [~aph], some tests that trigger the assert:
  javax/sound/sampled/DataLine/DataLine_ArrayIndexOutOfBounds.java
  javax/sound/sampled/Lines/16and32KHz/Has16and32KHz.java
  javax/sound/sampled/Lines/SourceDataLineDefaultBufferSizeCrash.java
Please be aware that we see the issue only when the whole jdk tier4 is executed in our central test landscape. Running a single test on RHEL 9.3 aarch64 might not show the issue. I'll add an IEEE check call just *before* fegetenv to see if the env is already messed up (for example by a problematic lib). In such a case, the current fegetenv/fesetenv will not help us much (we might instead do fegetenv at the beginning of the JVM run and store this environment for later reset).
09-11-2023
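
The "fegetenv at the beginning, reset later" idea from the comment above could look roughly like this. It is a sketch under assumptions, not the HotSpot code: all names are hypothetical, and it only helps if the environment is still intact when it is captured. If the process was already started in a broken floating-point environment, the broken state is what gets restored.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical globals holding the environment seen at startup.
static std::fenv_t g_startup_env;
static bool g_startup_env_valid = false;

// Call as early as possible, before any third-party library is loaded.
bool capture_startup_fenv() {
    g_startup_env_valid = (std::fegetenv(&g_startup_env) == 0);
    return g_startup_env_valid;
}

// Restore the captured environment; returns false if nothing was captured
// or if fesetenv reports an error.
bool restore_startup_fenv() {
    return g_startup_env_valid && std::fesetenv(&g_startup_env) == 0;
}

// Run an operation that may clobber the FP environment (for example a
// library load), then put the startup environment back.
void with_restored_fenv(void (*risky_operation)()) {
    risky_operation();
    bool restored = restore_startup_fenv();
    assert(restored);
    (void)restored;  // silence unused-variable warnings when NDEBUG is set
}
```

An alternative that does not depend on the startup state would be to reset to the standard default with std::fesetenv(FE_DFL_ENV).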

Aha! I think we may have detected a real bug that breaks Java. Excellent that we've found it. It's probably best if I take this one. I'll need at least the name of the failing test.
09-11-2023