JDK-8258396 : SIGILL in jdk.jfr.internal.PlatformRecorder.rotateDisk()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 11.0.8
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2020-12-15
  • Updated: 2021-07-27
  • Resolved: 2020-12-21
Version table
  • JDK 11: 11.0.11-oracle (Fixed)
  • JDK 13: 13.0.6 (Fixed)
  • JDK 15: 15.0.4 (Fixed)
  • JDK 16: 16 (Fixed)
  • JDK 17: 17 b03 (Fixed)
  • Other: openjdk8u292 (Fixed)
Description
We are seeing intermittent crashes at a customer site when JFR is rotating chunks.

{noformat}
A fatal error has been detected by the Java Runtime Environment:
SIGILL (0x4) at pc=0x00007fa665cd4e5e, pid=1, tid=376
JRE version: OpenJDK Runtime Environment Zulu11.41+23-CA (11.0.8+10) (build 11.0.8+10-LTS)
Java VM: OpenJDK 64-Bit Server VM Zulu11.41+23-CA (11.0.8+10-LTS, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
Problematic frame:
V  [libjvm.so+0x8c9e5e]
Core dump will be written. Default location: //core
An error report file with more information is saved as:
/tmp/hs_err_pid1.log
{noformat}

Thanks to @evergizova, the culprit was identified as an erroneous memcpy in JfrStorage::flush_regular() or JfrStorage::flush_large(), in combination with musl libc, which inserts special traps for cases where the memcpy src and dst regions overlap (https://git.2f30.org/fortify-headers/file/include/string.h.html#l39).
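
The overlap condition that such a trap guards against can be sketched roughly as follows. This is an illustration only: regions_overlap is a hypothetical helper, not code taken from the linked header or from HotSpot, and treating dst == src as harmless is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of HotSpot or musl): returns true when
// [dst, dst + n) and [src, src + n) overlap without being identical.
// dst == src is deliberately not reported as an overlap (an assumption
// made for this sketch).
inline bool regions_overlap(const void* dst, const void* src, std::size_t n) {
  const std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst);
  const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src);
  return (d < s && d + n > s) || (s < d && s + n > d);
}
```

With dst at offset 0 and src at offset N of the same buffer, the predicate is true whenever the copy length exceeds N, which matches the crash condition described in this report.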

The problem boils down to the fact that, for a non-empty buffer, JfrStorage::flush_regular_buffer() resets cur.pos() to the start offset while cur_pos stays at start offset + N. The subsequent memcpy(cur.pos(), cur_pos, used) then has overlapping src and dst regions (given that used > N), and on Alpine Linux (musl libc) SIGILL is raised.
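
A minimal sketch of the reset-then-copy pattern, using memmove so that the overlapping case is well defined. ToyBuffer and reset_and_migrate are hypothetical names invented for this illustration and do not correspond to the actual HotSpot types or functions.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a thread-local JFR buffer; not the HotSpot type.
struct ToyBuffer {
  unsigned char data[32];
  std::size_t pos;  // offset of the first uncommitted byte
};

// Mimics the flush-then-migrate step: the "flush" resets pos to the start
// offset, then the `used` uncommitted bytes that began at the old position
// are moved to the new one. When used > old offset the two ranges overlap,
// so memmove is required; memcpy would be undefined behaviour here.
inline void reset_and_migrate(ToyBuffer& buf, std::size_t used) {
  const unsigned char* const old_pos = buf.data + buf.pos;  // plays the role of cur_pos
  buf.pos = 0;                                              // the "flush" resets pos
  std::memmove(buf.data + buf.pos, old_pos, used);
}
```

For example, with pos at offset 4 and 12 uncommitted bytes, the source range [4, 16) and the destination range [0, 12) overlap, and memmove still shifts the data correctly.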
Comments
Fix request (13u): Requesting a backport to 13u; the issue is present there too. The patch applies cleanly. Tested with tier1 and jdk/jfr tests.
20-01-2021

Fix request for JDK 16 retroactively approved.
18-01-2021

[16] Fix Request: Please consider this fix for backporting. It prevents a SIGILL crash on musl libc based systems when using JFR. The fix is trivial: changing memcpy to memmove to account for possibly overlapping memory regions (when 'resetting' the local buffer, the data is shifted from pos N to pos 0). The fix applies cleanly.
18-01-2021

Changeset: e85892bf Author: Jaroslav Bachorik <jbachorik@openjdk.org> Date: 2021-01-15 15:12:03 +0000 URL: https://git.openjdk.java.net/jdk/commit/e85892bf
16-01-2021

Only P1 and P2 bugs with approval can be fixed in RDP2: http://openjdk.java.net/jeps/3#rdp-2 You need to change the priority to P2 and add a label and comment for the JDK 16 fix request, similar to what is done for 15u. I will approve after that.
16-01-2021

[~jbachorik] and [~mgronlun] - This is a P3 bug and has been integrated after RDP2 which has limited rules for integration. See https://openjdk.java.net/jeps/3#Fix-Request-Process [~kvn] should be able to help with figuring out how to get retroactive approval for a P3 fix.
15-01-2021

Hi Jaroslav, no, it's all good I think. The PR has the "ready" label. So, just comment "/integrate" on the PR and you're done.
14-01-2021

Hi Christoph, I have created a backport for JDK16 - https://github.com/openjdk/jdk16/pull/111 and changed the fix version. I will need a proper approval on that PR. It is disallowed to push directly to the openjdk/jdk repo and the backport must go through the PR. What about 15u? Would the 15u maintainer mind approving that backport as well?
14-01-2021

Jaroslav, you can still push this to jdk16. That's allowed as per RDP rules (https://openjdk.java.net/jeps/3). JDK 16 is in RDP phase one, so P3 bugfixes can still be done. How to do it is described here (Skara backport process): https://wiki.openjdk.java.net/display/SKARA/Backports#Backports-CLI - you will need to run the commands manually though, as git backport and backporting by comment in git don't work yet. But you should be able to do it without involving any other OpenJDK committer, as it should be a clean backport. And please change the version of JDK-8259607 from 16.0.1 to 16 so the skara update bot can pick it up :)
12-01-2021

Chris, no problem there. Should this be labelled with 'critical' request so it can get to 16.0? Or is 16.0 already considered to be 16u?
12-01-2021

Can consideration be given to also porting this issue to 16u? (The fixVersion indicates that it missed the cut-off?)
08-01-2021

Ah, I see. I got confused by the rampdown critical request process. Since this issue was spotted quite late in the dev cycle of the current updates and it leads to a reliable crash under given conditions, I thought it would be better to get it in now than to wait another 3 months. I will adjust the labels. Actually, it is already in JDK 16 as the push was done before the cut-off (it seems).
06-01-2021

Is this a regression in 8u282? It doesn't immediately seem a candidate for a critical fix and perhaps should be jdk8u-fix-request instead?
22-12-2020

[~jbachorik], I think you mean jdk8u-fix-request as well as jdk15u-fix-request and not -critical-request. The naming of these labels is a bit confusing, as critical doesn't stand for the criticality of the issue here but rather for whether it should still be included in the rampdown releases (e.g. the January updates this time). On the other hand, I guess it would be nice if you could backport this fix to JDK16.
21-12-2020

[15u critical] Fix Request: Please consider this fix for backporting. It prevents a SIGILL crash on musl libc based systems when using JFR. The fix is trivial: changing memcpy to memmove to account for possibly overlapping memory regions (when 'resetting' the local buffer, the data is shifted from pos N to pos 0). The fix applies cleanly.
21-12-2020

Approving for 11.0.11 (Push to jdk11u-dev repo) as we're already in rampdown for 11.0.10. Changed flag to jdk11u-fix-request accordingly.
21-12-2020

[8u critical] Fix Request: Please consider this fix for backporting. It prevents a SIGILL crash on musl libc based systems when using JFR. The fix is trivial: changing memcpy to memmove to account for possibly overlapping memory regions (when 'resetting' the local buffer, the data is shifted from pos N to pos 0). The fix applies cleanly.
21-12-2020

[11u critical] Fix Request: Please consider this fix for backporting. It prevents a SIGILL crash on musl libc based systems when using JFR. The fix is trivial: changing memcpy to memmove to account for possibly overlapping memory regions (when 'resetting' the local buffer, the data is shifted from pos N to pos 0). The fix applies cleanly.
21-12-2020

Changeset: a06cea50 Author: Jaroslav Bachorik <jbachorik@openjdk.org> Date: 2020-12-21 11:43:13 +0000 URL: https://git.openjdk.java.net/jdk/commit/a06cea50
21-12-2020