JDK-8344355 : Register corruption in MacroAssembler::lookup_secondary_supers_table_var: x86-64 only
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 24
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86_64
  • Submitted: 2024-11-17
  • Updated: 2024-11-28
  • Resolved: 2024-11-27
  • Fixed in: JDK 24 (24 b26)
Description
One of my x86_64 machines recently started to fail building openjdk, while my other x86_64 machines do not have this problem, despite running the same OS and build environment - Xubuntu 24.04.1 LTS.

What's special about the failing machine?  It's very old - MacBookAir3,1 from 2010 with 4GB of RAM.

Build log snippet follows:

Compiling up to 4 files for COMPILE_CREATE_SYMBOLS
An exception has occurred in the compiler (24-internal). Please file a bug against the Java compiler via the Java bug reporting page (https://bugreport.java.com) after checking the Bug Database (https://bugs.java.com) for duplicates. Include your program, the following diagnostic, and the parameters passed to the Java compiler in your report. Thank you.
java.lang.BootstrapMethodError: bootstrap method initialization exception
	at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:187)
	at java.base/java.lang.invoke.CallSite.makeSite(CallSite.java:316)
	at java.base/java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:277)
	at java.base/java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:267)
	at java.base/java.util.Comparator.comparing(Comparator.java:472)
	at jdk.compiler/com.sun.tools.javac.comp.Check.methodsGroupedByName(Check.java:2883)
	at jdk.compiler/com.sun.tools.javac.comp.Check.checkPotentiallyAmbiguousOverloads(Check.java:2755)
	at jdk.compiler/com.sun.tools.javac.comp.Attr.attribClassBody(Attr.java:5600)
	at jdk.compiler/com.sun.tools.javac.comp.Attr.attribClass(Attr.java:5515)
	at jdk.compiler/com.sun.tools.javac.comp.Attr.attribClass(Attr.java:5339)
	at jdk.compiler/com.sun.tools.javac.comp.Attr.attrib(Attr.java:5276)
	at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.attribute(JavaCompiler.java:1355)
	at jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:977)
	at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:319)
	at jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:178)
	at jdk.compiler/com.sun.tools.javac.Main.compile(Main.java:66)
	at jdk.compiler/com.sun.tools.javac.Main.main(Main.java:52)
Caused by: java.lang.ClassCastException: class jdk.internal.classfile.impl.AbstractPoolEntry$StringEntryImpl cannot be cast to class java.lang.classfile.constantpool.LoadableConstantEntry (jdk.internal.classfile.impl.AbstractPoolEntry$StringEntryImpl and java.lang.classfile.constantpool.LoadableConstantEntry are in module java.base of loader 'bootstrap')
	at java.base/jdk.internal.classfile.impl.DirectCodeBuilder.ldc(DirectCodeBuilder.java:1688)
	at java.base/java.lang.classfile.CodeBuilder.ldc(CodeBuilder.java:2085)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory$4.accept(InnerClassLambdaMetafactory.java:438)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory$4.accept(InnerClassLambdaMetafactory.java:432)
	at java.base/jdk.internal.classfile.impl.DirectCodeBuilder.build(DirectCodeBuilder.java:92)
	at java.base/jdk.internal.classfile.impl.DirectMethodBuilder.withCode(DirectMethodBuilder.java:118)
	at java.base/jdk.internal.classfile.impl.DirectMethodBuilder.withCode(DirectMethodBuilder.java:125)
	at java.base/jdk.internal.classfile.impl.Util$1WithCodeMethodHandler.accept(Util.java:78)
	at java.base/jdk.internal.classfile.impl.Util$1WithCodeMethodHandler.accept(Util.java:75)
	at java.base/jdk.internal.classfile.impl.DirectMethodBuilder.run(DirectMethodBuilder.java:139)
	at java.base/jdk.internal.classfile.impl.DirectClassBuilder.withMethod(DirectClassBuilder.java:117)
	at java.base/java.lang.classfile.ClassBuilder.withMethod(ClassBuilder.java:258)
	at java.base/java.lang.classfile.ClassBuilder.withMethodBody(ClassBuilder.java:277)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory.generateSerializationFriendlyMethods(InnerClassLambdaMetafactory.java:431)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory$1.accept(InnerClassLambdaMetafactory.java:343)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory$1.accept(InnerClassLambdaMetafactory.java:310)
	at java.base/jdk.internal.classfile.impl.ClassFileImpl.build(ClassFileImpl.java:145)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory.generateInnerClass(InnerClassLambdaMetafactory.java:310)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory.spinInnerClass(InnerClassLambdaMetafactory.java:282)
	at java.base/java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(InnerClassLambdaMetafactory.java:214)
	at java.base/java.lang.invoke.LambdaMetafactory.altMetafactory(LambdaMetafactory.java:546)
	at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:143)
	... 16 more
printing javac parameters to: /home/martin/ws/jdk/build/linux-x86_64-server-release/support/javatmp/javac.20241116_200517.args
gmake[3]: *** [Gendata.gmk:64: /home/martin/ws/jdk/build/linux-x86_64-server-release/buildtools/create_symbols_javac/_the.COMPILE_CREATE_SYMBOLS_batch] Error 4
gmake[2]: *** [make/Main.gmk:139: jdk.compiler-gendata] Error 2

Comments
Changeset: eb0d1ce9 Branch: master Author: Andrew Haley <aph@openjdk.org> Date: 2024-11-27 10:27:58 +0000 URL: https://git.openjdk.org/jdk/commit/eb0d1ce9487df000b4675901cc0d18f6a1c86348
27-11-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/22365 Date: 2024-11-25 15:27:22 +0000
25-11-2024

Cloud providers are understandably focused on utilization, efficiency and scalability, and that leads to a machine monoculture that is great for production reliability, but not great for testing products designed to be run on all the world's computers. Especially products that contain lots of assembly language.
25-11-2024

@martin for better or worse QA is nearly all in the cloud these days and there is no option to have old hardware. What little old hardware we did have has all expired now. @aph thanks for the quick find and fix.
25-11-2024

Well, you definitely won the Gold Star Award this time. Thank you.
22-11-2024

> One interesting/worrying thing is that we don't do any testing, as far as I can see, on pre-SSE4.2 machines.
Agreed. I'm currently just a 1% openjdk hobbyist, not doing continuous integration QA. If I were running the QA department, I would only reluctantly decommission old test machines that might find exotic bugs.
22-11-2024

I tested the patch below on my MacBookAir3,1. Success!
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 08e089455f2..10cb1059c99 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -5130,7 +5130,7 @@ void MacroAssembler::lookup_secondary_supers_table_var(Register r_sub_klass,
   jccb(Assembler::equal, L_success);
 
   // Restore slot to its true value
-  xorl(slot, (u1)(Klass::SECONDARY_SUPERS_TABLE_SIZE - 1)); // slot ^ 63 === 63 - slot (mod 64)
+  movb(slot, Address(r_super_klass, Klass::hash_slot_offset()));
 
   // Linear probe. Rotate the bitmap so that the next bit to test is
   // in Bit 1.
22-11-2024

One interesting/worrying thing is that we don't do any testing, as far as I can see, on pre-SSE4.2 machines.
22-11-2024

Found it! It's good-old-fashioned register corruption. Try this:
diff --git a/src/hotspot/cpu/x86/macroAssembler_x86.cpp b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
index 08e089455f2..79e3ade1ba4 100644
--- a/src/hotspot/cpu/x86/macroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/macroAssembler_x86.cpp
@@ -5130,7 +5143,7 @@ void MacroAssembler::lookup_secondary_supers_table_var(Register r_sub_klass,
   jccb(Assembler::equal, L_success);
 
   // Restore slot to its true value
-  xorl(slot, (u1)(Klass::SECONDARY_SUPERS_TABLE_SIZE - 1)); // slot ^ 63 === 63 - slot (mod 64)
+  movb(slot, Address(r_super_klass, Klass::hash_slot_offset()));
 
   // Linear probe. Rotate the bitmap so that the next bit to test is
   // in Bit 1.
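The comment on the removed line relies on the identity slot ^ 63 == 63 - slot for 6-bit values, so applying the same xor twice gets the original slot back, but only if nothing else has written the register in between. A minimal sketch of that reasoning in plain Java (this is not HotSpot code): `SlotRestore`, `restoreByXor` and `restoreByReload` are illustrative names, the table size of 64 is taken from the "mod 64" in the diff comment, and the clobbering value 17 is a hypothetical stand-in for whatever overwrote the register on the affected machine.

```java
public class SlotRestore {
    // Assumed to mirror Klass::SECONDARY_SUPERS_TABLE_SIZE (64, per the "mod 64" comment).
    static final int TABLE_SIZE = 64;

    // Restore by repeating the xor; correct only if 'reg' still holds slot ^ 63.
    static int restoreByXor(int reg) {
        return reg ^ (TABLE_SIZE - 1);
    }

    // Restore by re-reading the authoritative value, which is what the movb in the patch does.
    static int restoreByReload(int trueSlot) {
        return trueSlot;
    }

    public static void main(String[] args) {
        // The identity behind the original code: slot ^ 63 == 63 - slot for 0 <= slot < 64.
        for (int slot = 0; slot < TABLE_SIZE; slot++) {
            int flipped = slot ^ (TABLE_SIZE - 1);
            if (flipped != (TABLE_SIZE - 1) - slot) throw new AssertionError("identity broken");
            if (restoreByXor(flipped) != slot) throw new AssertionError("xor round trip broken");
        }

        int slot = 5;
        int reg = slot ^ (TABLE_SIZE - 1); // register now holds the flipped slot
        reg = 17;                          // hypothetical clobber between the two xors
        System.out.println("xor restore after clobber: " + restoreByXor(reg));
        System.out.println("reload restore:            " + restoreByReload(slot));
    }
}
```

The reload costs a memory access, but it does not depend on the register having survived the intervening code.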
22-11-2024

Yes, the failing machine is my only x86 machine for which `grep -wc popcnt /proc/cpuinfo` reports 0
```
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz
stepping : 10
```
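For anyone who wants to run the same check from Java rather than grep, here is a rough sketch. `CpuFlagCheck` and `hasFlag` are hypothetical names, and unlike `grep -wc` the sketch only looks at lines beginning with `flags`, which is where Linux on x86 lists CPU features.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CpuFlagCheck {
    // True if any "flags" line of cpuinfo-style text lists 'flag' as a whole word.
    static boolean hasFlag(String cpuinfo, String flag) {
        return cpuinfo.lines()
                .filter(line -> line.startsWith("flags"))
                .map(line -> line.substring(line.indexOf(':') + 1).trim())
                .anyMatch(flags -> Arrays.asList(flags.split("\\s+")).contains(flag));
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Path.of("/proc/cpuinfo");
        if (!Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo is not readable on this system");
            return;
        }
        System.out.println("popcnt present: " + hasFlag(Files.readString(cpuinfo), "popcnt"));
    }
}
```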
22-11-2024

I have reproduced the bug.
22-11-2024

> @aph could you take a look please.
I'm on it. I've a hunch that it has to do with running on a pre-SSE4.2 machine. @martin, please try `cat /proc/cpuinfo | grep popcnt`
22-11-2024

Thanks for confirming [~martin]. @aph could you take a look please.
22-11-2024

[~dholmes] Indeed, -XX:-UseSecondarySupersTable makes the problem go away. Experimental patch to unconditionally disable the feature on x86 works:
--- a/src/hotspot/cpu/x86/vm_version_x86.hpp
+++ b/src/hotspot/cpu/x86/vm_version_x86.hpp
@@ -827,7 +827,7 @@ class VM_Version : public Abstract_VM_Version {
 
   // x86_64 supports secondary supers table
   constexpr static bool supports_secondary_supers_table() {
-    return LP64_ONLY(true) NOT_LP64(false); // not implemented on x86_32
+    return false;
   }
 
   constexpr static bool supports_stack_watermark_barrier() {
21-11-2024

[~martin] can you try running with `-XX:-UseSecondarySupersTable`?
21-11-2024

[~martin] sorry user error.
21-11-2024

[~dholmes] I have no experience reading these log entries, BUT I see many references to ClassCastException in the attachment Xlogexceptions:
java.lang.BootstrapMethodError: bootstrap method initialization exception
...
Caused by: java.lang.ClassCastException: ...
38052:[6.334s][info ][exceptions] Found matching handler for exception of type "java.lang.ClassCastException" in method "invoke" at BCI: 964
20-11-2024

Just to be clear was the exception log from a failing execution? There is no ClassCastException in it.
20-11-2024

I tried adding various JIT flags to that javac command line, without any obvious effect except for execution time:
-Xint
-XX:TieredStopAtLevel=1
-XX:-TieredCompilation
-server
Is there any way to completely suppress interpreter execution?
20-11-2024

[~dholmes] See attached file Xlogexceptions. TIL about make-support/failure-logs - that's a very nice facility! I added -J-Xlog:exceptions=debug to the javac command line from buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch.cmdline and was surprised to see that it worked well. I didn't draw any conclusions from the output, except for added evidence that there was no OOME swallowed.
20-11-2024

[~martin] can you take the failing build command from the log and just add the -Xlog:exceptions=debug to that and run it manually? I tried running in a constrained memory environment but was unable to induce a failure.
20-11-2024

Runtime Triage: ILW -> MLM -> P4
I: exception, not a crash -> M
L: not on major platforms, only reproducible on old platform -> L
W: M (use supported build platforms)
19-11-2024

A snippet from my openjdk infrastructure for git bisect. These configure flags made no difference:
--with-jvm-features=compiler2,-compiler1
--with-jvm-features=compiler1,-compiler2

openjdk:bisect-run() {
  local -r repo=${BISECT_REPO-"$HOME/ws/jdk"}
  local -r major=$(openjdk:source-tree-major-version "$repo")
  local -r boot_major=$(( major - 1 ))
  local -r boot=${BISECT_BOOT_JDK-"$HOME/jdk/jdk${boot_major}"}
  (
    cd "$repo"
    rm -rf "$repo/build"
    declare -ar configure=( bash ./configure --with-boot-jdk="$boot" )
    verbose "${configure[@]}"
    verbose make
  )
}
19-11-2024

[~dholmes] I don't know (or remember?) how to modify the build process to add java flags. I tried JDK_JAVA_OPTIONS='-Xlog:exceptions=debug' make but that made no difference. I doubt that memory constraints alone are causing the problem, because git bisect found a hotspot commit and because I have successfully built openjdk on even smaller machines, like an Orange Pi with 2GB of RAM.
19-11-2024

I tried and failed to repro the build failure on modern hardware by disabling C2, or to un-repro this on my old hardware by disabling C1 at configure-time. I'm not planning any further attempts. I suggest hotspot testers try to run tests that explicitly force use of various JIT compilation strategies to try to repro. If y'all cannot repro on your own machines, I offer guest ssh access on the failing machine for your debugging pleasure.
19-11-2024

Okay I will examine the changes from JDK-8331341. My suspicion is that we get an OOM somewhere and it manifests as a failed check rather than an OOME. [~martin] Can you run with -Xlog:exceptions=debug please and see if there is an OOM condition reported?
19-11-2024

A few comments:
- javac is not using the Classfile API directly. In this case, it is just indy-calling the LambdaMetafactory, which then uses the Classfile API to generate the class. As long as the indy call is correct, it is not javac's responsibility to make sure LambdaMetafactory and the underlying levels work correctly.
- this case is, I think, even easier: although there may be a relation to the indy call, the error is:
class jdk.internal.classfile.impl.AbstractPoolEntry$StringEntryImpl cannot be cast to class java.lang.classfile.constantpool.LoadableConstantEntry (jdk.internal.classfile.impl.AbstractPoolEntry$StringEntryImpl and java.lang.classfile.constantpool.LoadableConstantEntry are in module java.base of loader 'bootstrap')
StringEntryImpl is, most definitely, a subtype of LoadableConstantEntry: StringEntryImpl implements StringEntry, which extends ConstantValueEntry, which extends LoadableConstantEntry. It is unclear why the VM throws an exception saying it cannot cast StringEntryImpl to LoadableConstantEntry, despite the clear subtyping hierarchy, and how javac could affect that, unless there's a reason to believe javac generates a wrong class hierarchy. That would, almost surely, be widely reproducible and quickly caught, as javac is mostly deterministic.
Given Martin's findings, re-assigning to hotspot/runtime.
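To make the subtyping argument concrete, here is a small self-contained sketch with stand-in types that copy only the chain described above (StringEntryImpl implements StringEntry, which extends ConstantValueEntry, which extends LoadableConstantEntry). These are local illustrative declarations, not the real java.lang.classfile types, and `HierarchySketch` and `castSucceeds` are hypothetical names. On a correctly working VM a cast along this chain cannot fail.

```java
public class HierarchySketch {
    // Stand-ins that mirror only the inheritance chain quoted in the comment above.
    interface LoadableConstantEntry {}
    interface ConstantValueEntry extends LoadableConstantEntry {}
    interface StringEntry extends ConstantValueEntry {}
    static final class StringEntryImpl implements StringEntry {}

    // Performs the same kind of checked cast that failed in the report.
    static boolean castSucceeds(Object o) {
        try {
            LoadableConstantEntry entry = (LoadableConstantEntry) o;
            return entry != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("assignable: "
                + LoadableConstantEntry.class.isAssignableFrom(StringEntryImpl.class));
        System.out.println("cast succeeds: " + castSucceeds(new StringEntryImpl()));
    }
}
```

A ClassCastException in this situation therefore points at the VM's subtype check rather than at the class hierarchy.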
18-11-2024

I tried to revert 8331341 from master tip for testing: git revert ead0116f2624e0e34529e47e4f509142d588b994 but there were conflicts and I gave up.
18-11-2024

My git-bisect finally finished and claims the culprit is: 8331341: secondary_super_cache does not scale well: C1 and interpreter and that's plausible because 8331341 seems to be about optimizing subtype checking and my cpu and memory constrained machine will have hotspot making different optimization decisions. I'll do some more testing to verify.
18-11-2024

Interesting. I had assumed that the `var` would ensure the compiler uses a type that would not require any cast. EDIT: Ah! That was somewhat naive on my part. The compiler doesn't care about the local or the var. It has to ensure the return value of maybeClone is type-compatible with the parameter to ldcOpcode - hence the cast.
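A small sketch of why `var` does not avoid the cast: a generic method's return type is erased, so the caller has to check the value against the inferred type, which matches the checkcast before astore_2 in the javap listing in this thread. `ErasureCast` and `passThrough` are hypothetical names, and the unchecked cast is used here only to force the call-site check to fail on purpose.

```java
public class ErasureCast {
    // Generic helper whose erased return type is Object (a stand-in for a call like maybeClone).
    @SuppressWarnings("unchecked")
    static <T> T passThrough(Object value) {
        return (T) value; // unchecked: erasure leaves no runtime check inside this method
    }

    static int length(String s) {
        return s.length();
    }

    // Returns true if using the result as a String fails in the caller.
    static boolean failsInCaller(Object value) {
        try {
            // 'var' is inferred as String, so the caller still needs a checked cast to String.
            var s = ErasureCast.<String>passThrough(value);
            return length(s) < 0;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("String argument fails:  " + failsInCaller("abc"));
        System.out.println("Integer argument fails: " + failsInCaller(42));
    }
}
```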
18-11-2024

[~dholmes] This maybeClone is a generic call. Here's the javap output:
  public java.lang.classfile.CodeBuilder ldc(java.lang.classfile.constantpool.LoadableConstantEntry);
    descriptor: (Ljava/lang/classfile/constantpool/LoadableConstantEntry;)Ljava/lang/classfile/CodeBuilder;
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=3, locals=3, args_size=2
         0: aload_0
         1: getfield      #316  // Field constantPool:Ljdk/internal/classfile/impl/SplitConstantPool;
         4: aload_1
         5: invokestatic  #480  // Method jdk/internal/classfile/impl/AbstractPoolEntry.maybeClone:(Ljava/lang/classfile/constantpool/ConstantPoolBuilder;Ljava/lang/classfile/constantpool/PoolEntry;)Ljava/lang/classfile/constantpool/PoolEntry;
         8: checkcast     #486  // class java/lang/classfile/constantpool/LoadableConstantEntry
        11: astore_2
        12: aload_0
        13: aload_2
        14: invokestatic  #751  // Method jdk/internal/classfile/impl/BytecodeHelpers.ldcOpcode:(Ljava/lang/classfile/constantpool/LoadableConstantEntry;)Ljava/lang/classfile/Opcode;
        17: aload_2
        18: invokevirtual #501  // Method writeDirectLoadConstant:(Ljava/lang/classfile/Opcode;Ljava/lang/classfile/constantpool/LoadableConstantEntry;)V
        21: aload_0
        22: areturn
      LineNumberTable:
        line 1688: 0
        line 1689: 12
        line 1690: 21
      LocalVariableTable:
        Start  Length  Slot  Name    Signature
            0      23     0  this    Ljdk/internal/classfile/impl/DirectCodeBuilder;
            0      23     1  entry   Ljava/lang/classfile/constantpool/LoadableConstantEntry;
           12      11     2  direct  Ljava/lang/classfile/constantpool/LoadableConstantEntry;
18-11-2024

[~liach] how does this line:
at java.base/jdk.internal.classfile.impl.DirectCodeBuilder.ldc(DirectCodeBuilder.java:1688)
var direct = AbstractPoolEntry.maybeClone(constantPool, entry);
even allow for the possibility of a ClassCastException! ???
18-11-2024

I've moved this to javac. The underlying exception is:
Caused by: java.lang.ClassCastException: class jdk.internal.classfile.impl.AbstractPoolEntry$StringEntryImpl cannot be cast to class java.lang.classfile.constantpool.LoadableConstantEntry (jdk.internal.classfile.impl.AbstractPoolEntry$StringEntryImpl and java.lang.classfile.constantpool.LoadableConstantEntry are in module java.base of loader 'bootstrap')
which suggests an issue with javac's use of the new Classfile API.
18-11-2024

Hmm, this is weird - as a ClassFile API dev, I am 100% sure StringEntry and its sole implementation StringEntryImpl are subtypes of LoadableConstantEntry. Don't know what goes wrong here.
18-11-2024

David: Sorry to file this against you, but you seem to be very good at fixing bugs with "bootstrap method initialization exception"! Yes, the build message says to file the bug against javac - you will know if that's right! I could git-bisect, but it would be very slow - how interested is openjdk in hunting this down?
17-11-2024

I talked myself into starting a git bisect. With luck, we will have the culprit commit within a day or two.
17-11-2024