JDK-8289477 : Memory corruption with CPU_ALLOC, CPU_FREE on muslc
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11,17,18,19,20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • Submitted: 2022-06-29
  • Updated: 2022-07-27
  • Resolved: 2022-06-30
Version table:
  • JDK 11: 11.0.17-oracle (Fixed)
  • JDK 17: 17.0.5-oracle (Fixed)
  • JDK 19: 19 (Fixed)
  • JDK 20: 20 b05 (Fixed)
Description
On Alpine, I see:

```
 stdout: 
[[0.001s][trace][os] active_processor_count: using dynamic path (forced) - configured processors: 16
[0.001s][trace][os] active_processor_count: sched_getaffinity processor count: 16
NMT Block at 0x00007fc5a35db9f0, corruption at: 0x00007fc5a35db9f0: 
0x00007fc5a35db970:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35db980:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35db990:   f8 a1 c8 1b ea 55 00 00 00 00 00 00 00 c0 00 00
0x00007fc5a35db9a0:   00 a4 c8 1b ea 55 00 00 01 00 00 00 00 c0 00 00
0x00007fc5a35db9b0:   d8 a3 c8 1b ea 55 00 00 1d 00 00 00 00 a0 00 00
0x00007fc5a35db9c0:   2d 63 70 00 00 00 00 00 08 00 00 00 00 61 01 00
0x00007fc5a35db9d0:   2d 76 65 72 73 69 6f 6e 00 00 00 00 00 82 02 00
0x00007fc5a35db9e0:   2d 73 65 72 76 65 72 00 00 00 00 00 00 83 03 00
0x00007fc5a35db9f0:   2d 63 6c 69 65 6e 74 00 00 00 00 00 00 84 04 00
0x00007fc5a35dba00:   ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba10:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba20:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba30:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba40:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba50:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00007fc5a35dba60:   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/mallocTracker.cpp:151
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/ubuntu/client_home/workspace/build-user-branch-linux_alpine_x86_64/SapMachine/src/hotspot/share/services/mallocTracker.cpp:151), pid=219496, tid=219512
#  fatal error: NMT corruption: Block at 0x00007fc5a35db9f0: header canary broken
#
# JRE version:  (20.0) (fastdebug build )
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 20-snapshotbeta-2022-06-29, mixed mode, sharing, tiered, unknown gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/ubuntu/client_home/workspace/build-user-branch-linux_alpine_x86_64/test_report_hotspot/JTwork/scratch/8/core.219496)
#
```

Reason:

In `os::Linux::active_processor_count()`, we use the CPU_xxx macros to manage sets of CPU information.

muslc defines those macros to call `calloc(3)` and `free(3)`:

```
#define CPU_ALLOC(n) ((cpu_set_t *)calloc(1,CPU_ALLOC_SIZE(n)))
#define CPU_FREE(set) free(set)
```

whereas glibc uses intermediate functions:

```
#define __CPU_ALLOC(count) __sched_cpualloc (count)
#define __CPU_FREE(cpuset) __sched_cpufree (cpuset)
```

which in the end also allocate from the C heap, but those calls are not inlined.

So, on muslc the macros expand to direct calls to `calloc()` and `free()`. Because the call site is inside the `os::Linux` namespace, `free()` resolves to `os::free()`. We have no wrapper in `os` for calloc, though, so `calloc()` calls straight into muslc.

That means we have raw `::calloc()` -> `os::free()`, which is unbalanced. Raw `::calloc()` does not write the header `os::free()` expects. If NMT is on, we now assert, because NMT does not find its header in `os::free()`.

This is easy to reproduce: start a VM on Alpine with NMT on (or use a debug VM) and pass `-XX:+UnlockDiagnosticVMOptions -XX:+UseCpuAllocPath`.

The position of the musl devs is that "calloc" and "free" are reserved identifiers in C and should not be reused [1]. I think they are right. The way we reuse well-known C and POSIX symbol names in the os namespace has bitten me in the past in similar cases.

[1] https://www.openwall.com/lists/musl/2022/06/29/3
Comments
Changeset: 0526402a Author: Thomas Stuefe <stuefe@openjdk.org> Date: 2022-07-06 10:15:38 +0000 URL: https://git.openjdk.org/jdk/commit/0526402a023d5725bf32ef6587001ad05e28c10f
06-07-2022

Fix Request (11, 17): I'd like to backport this fix, since the bug can lead to memory corruption (up to JDK 17) or to asserts/guarantees (in JDK 18 and later) on Alpine if we run with more than 1024 CPUs or with -XX:+UseCpuAllocPath. The fix is minimal and safe, and it applies cleanly to JDK 11, 17 and 19.
06-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk11u-dev/pull/1201 Date: 2022-07-06 06:37:05 +0000
06-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/531 Date: 2022-07-06 06:36:50 +0000
06-07-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk19/pull/112 Date: 2022-07-06 06:35:59 +0000
06-07-2022

Changeset: da6d1fc0 Author: Thomas Stuefe <stuefe@openjdk.org> Date: 2022-06-30 06:19:25 +0000 URL: https://git.openjdk.org/jdk/commit/da6d1fc0e0aeb1fdb504aced4b0dba0290ec240f
30-06-2022

This will lead to memory corruption in JDK 11 and 17 if NMT is switched on or we run a debug VM. In later JVMs, it triggers the assert shown above (which is a guarantee, so it also fires in release builds).
30-06-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/9328 Date: 2022-06-29 16:58:57 +0000
29-06-2022