JDK-8331253 : 16 bits is not enough for nmethod::_skipped_instructions_size field
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 23
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2024-04-29
  • Updated: 2024-05-09
  • Resolved: 2024-05-02
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 23
23 b22Fixed
Related Reports
Relates :  
Relates :  
Description
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (c:\sb\prod\1714223866\workspace\open\src\hotspot\share\utilities/checkedCast.hpp:41), pid=26580, tid=25932
#  assert(static_cast<T1>(result) == thing) failed: must be

---------------  T H R E A D  ---------------

Current thread (0x000001a4c11efd70):  JavaThread "C2 CompilerThread0" daemon [_thread_in_vm, id=25932, stack(0x000000627c200000,0x000000627c300000) (1024K)]


Current CompileTask:
C2:21418 2591    b        sun.security.util.KnownOIDs::<clinit> (6091 bytes)

Stack: [0x000000627c200000,0x000000627c300000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xc792b1]  os::win32::platform_print_native_stack+0x101  (os_windows_x86.cpp:235)
V  [jvm.dll+0xf3962b]  VMError::report+0x149b  (vmError.cpp:1010)
V  [jvm.dll+0xf3bcce]  VMError::report_and_die+0x80e  (vmError.cpp:1845)
V  [jvm.dll+0xf3c3f4]  VMError::report_and_die+0x64  (vmError.cpp:1610)
V  [jvm.dll+0x53dbcb]  report_vm_error+0x5b  (debug.cpp:193)
V  [jvm.dll+0xc1f5e5]  nmethod::init_defaults+0x135  (nmethod.cpp:1203)
V  [jvm.dll+0xc1ad68]  nmethod::nmethod+0xf8  (nmethod.cpp:1379)
V  [jvm.dll+0xc21bdd]  nmethod::new_nmethod+0x31d  (nmethod.cpp:1121)
V  [jvm.dll+0x42bb07]  ciEnv::register_method+0x537  (ciEnv.cpp:1130)
V  [jvm.dll+0xc86df4]  PhaseOutput::install_code+0x1e4  (output.cpp:3443)
V  [jvm.dll+0xc86b73]  PhaseOutput::install+0x183  (output.cpp:3388)
V  [jvm.dll+0x4c9dfd]  Compile::Code_Gen+0x3bd  (compile.cpp:3038)
V  [jvm.dll+0x4c874b]  Compile::Compile+0x134b  (compile.cpp:897)
V  [jvm.dll+0x3d8019]  C2Compiler::compile_method+0x179  (c2compiler.cpp:145)
V  [jvm.dll+0x4e57ab]  CompileBroker::invoke_compiler_on_method+0x73b  (compileBroker.cpp:2307)
V  [jvm.dll+0x4e2e2b]  CompileBroker::compiler_thread_loop+0x33b  (compileBroker.cpp:1964)
V  [jvm.dll+0x7e4c06]  JavaThread::thread_main_inner+0x266  (javaThread.cpp:761)
V  [jvm.dll+0xe9cb67]  Thread::call_run+0x1b7  (thread.cpp:226)
V  [jvm.dll+0xc778b6]  thread_native_entry+0xd6  (os_windows.cpp:552)
C  [ucrtbase.dll+0x2268a]  (no source info available)
C  [KERNEL32.DLL+0x17ac4]  (no source info available)
C  [ntdll.dll+0x5a4e1]  (no source info available)

Comments
Changeset: 3383ad63 Author: Vladimir Kozlov <kvn@openjdk.org> Date: 2024-05-02 14:41:30 +0000 URL: https://git.openjdk.org/jdk/commit/3383ad6397d5a2d8fb232ffd3e29a54e0b37b686
02-05-2024

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/19029 Date: 2024-05-01 03:31:41 +0000
01-05-2024

I am fine with current header size. This change will not increase it - we have some room.
30-04-2024

I replaced it with size of code (32 bits) in instructions section without stubs at the end of section. You don't need to use Assembler::InlineSkippedInstructionsCounter in GC barriers stubs after that.
30-04-2024

If we don't care about losing a little accuracy, we could change this field to "_skipped_instructions_size_in_words" and increase its useful range by a factor of 8, or whatever scaling factor we choose. As far as I can tell, this value is only ever compared to 3 different values: 0, InlineSmallCode, and InlineSmallCode/4, so if we really wanted to save bits, we could encode that information in 3 bits.
30-04-2024

I recorded the size of stubs in instructions section (barriers and other) and it matched (slightly bigger) skipped_instructions_size. Looks like original ZGC also generates barriers stubs but does not record them in skipped_instructions_size. This shows that there is no double counting and I simply need to change _skipped_instructions_size field back to 32-bits. G1: main code = 3215696 (72.892624%) stubs code = 46522 (1.446716%) skipped insts = 237744 (7.393236%) ZGC: main code = 3237080 (75.567032%) stubs code = 810577 (25.040375%) skipped insts = 44432 (1.372595%) GenZGC: main code = 4034704 (78.238518%) stubs code = 1356703 (33.625839%) skipped insts = 1074611 (26.634197%)
30-04-2024

I checked a simple example on the late G1 barrier expansion prototype and could not find any double-counting issue.
30-04-2024

I am fine with converting _skipped_instructions_size field back to `int` (32-bits) type. My main concern is that it is calculated correctly because it is used in inlining decisions.
30-04-2024

The JEP 475 (late barrier expansion for G1) prototype [1] also suffers from this issue (_skipped_instructions_size overflow) when rebased on top of JDK-8329433. As ZGC, the G1 prototype marks barriers as "inline-skipped", and G1 barriers are large (particularly the post-barrier). Making _skipped_instructions_size int again solves the issue. I also need to investigate whether there is double counting. [1] https://github.com/robcasloz/jdk/tree/g1-late-barrier-expansion
30-04-2024

Generational ZGC uses both store barriers and load barriers, and we count the stubs as well, and they don’t have any sharing; 1 stub per barrier. It sounds a bit fishy though. Looks like total code size did not increase as much as the discrepancy of skipped instructions. If the stubs don’t fit at the end, we have to grow the nmethod. Then we will emit instructions again. I can imagine that messing with the skippped bytes accounting.
30-04-2024

I finally was able reproduce t105.java failure with latest changes: # Internal Error (\\workspace\\open\\src\\hotspot\\share\\code\\nmethod.cpp:1256), pid=29304, tid=13100 # assert(static_cast<int>(_skipped_instructions_size) == (code_buffer->total_skipped_instructions_size())) failed: failed: 3044 != 68580 As I suspected it is _skipped_instructions_size field value 68580 does not fit into 16 bits. I am still investigating why GenZGC reports such big _skipped_instructions_size.
29-04-2024

I am thinking it is some kind of double count because "main code" increased by 500K but skipped intstrs by 900K.
29-04-2024

Is it possible it just double count? I see that `z_*_barrier()` calls `Z*BarrierStubC2::create()`. Are you count instructions in stub too? Also do you generate separate stub for each barrier? No sharing?
29-04-2024

This was tested on linux-x64. In my changes for JDK-8329433 I changed nmethod::_skipped_instructions_size field to uint16 assuming that noop instructions and barriers should not take a lot of instructions. I never expected that Generation ZGC barriers take 25% of code. I am fixing this bug anyway because there are failures without GenZGC. The question is: should I file a bug on GenZGC to investigate why GenZGC barriers in C2 are so huge?
29-04-2024

[~eosterlund] and [~stefank] do you know why Generational ZGC barriers takes so much instructions than normal ZGC? Even in optimized VM. I based this on skipped instructions value which is calculated for ZGC barriers by InlineSkippedInstructionsCounter: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L61 I modified PrintNMethodStatistics output to print skipped instruction size vs code size and got next with optimized VM running Leyden's JavacBench: -XX:+UseZGC -XX:-ZGenerational main code = 3128768 (75.934044%) skipped insts = 42016 (1.342893%) -XX:+UseZGC -XX:+ZGenerational main code = 3624456 (77.640076%) skipped insts = 952072 (26.267998%)
29-04-2024

No luck to reproduce so far. I pushed JDK-8331087 changes which uses macro to report more information about cast failures in nmethod.cpp.
29-04-2024

ILW = Assert because nmethod metadata data does not fit into field (truncation), intermittent, no workaround = HLH = P2
29-04-2024

Happens only on Windows with Generational ZGC and -Xcomp. The suspect is uint16_t _skipped_instructions_size; field. KnownOIDs is huge enum class so may be Generational ZGC generates a lot of barriers code which adds to skipped_instructions_size (): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1496 I was not able to trigger this <clinit> compilation locally on linux box to reproduce the failure: Current CompileTask: C2:20679 2591 b sun.security.util.KnownOIDs::<clinit> (6091 bytes)
29-04-2024

Regression after JDK-8329433
29-04-2024