JDK-8351833 : Unexpected increase in live nodes when splitting Phis through MergeMems in PhiNode::Ideal
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 25
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: x86_64, aarch64
  • Submitted: 2025-03-12
  • Updated: 2025-05-15
  • Resolved: 2025-04-15
Fixed in: JDK 25 b19
Description
# Failure analysis

After the changes for JDK-8333393, we apply a Phi idealization that splits Phis through MergeMems much more frequently. This idealization internally applies further idealizations to the new Phi nodes generated along the way. In certain cases, these internal idealizations result in a large increase in live nodes within a single iteration of the main IGVN loop in PhaseIterGVN::optimize. In particular, when we are close to MaxNodeLimit (80 000 by default), it can happen that we go from below MaxNodeLimit - NodeLimitFudgeFactor * 2 (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the PhaseIterGVN::optimize loop does not trigger as expected, and we instead hit an assert during node creation once we surpass MaxNodeLimit live nodes.
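As an illustration of the arithmetic, here is a minimal standalone sketch (not HotSpot source; the constants are the defaults quoted above, and the bailout predicate only mirrors the idea of the node count check described here):

```cpp
#include <cstdio>

// Defaults as stated in the failure analysis above.
static const int MaxNodeLimit         = 80000;
static const int NodeLimitFudgeFactor = 2000;

// Sketch of the bailout predicate: bail out of IGVN if fewer than 'margin'
// nodes of headroom remain below MaxNodeLimit.
static bool should_bail_out(int live_nodes, int margin) {
  return live_nodes + margin > MaxNodeLimit;
}

int main() {
  int live_nodes = 75900;                    // just below the 76 000 bailout threshold
  int margin     = NodeLimitFudgeFactor * 2; // 4 000 nodes of headroom

  // Top of the IGVN loop: 75 900 + 4 000 <= 80 000, so no bailout is taken.
  std::printf("bailout taken: %s\n", should_bail_out(live_nodes, margin) ? "yes" : "no");

  // A single iteration that splits Phis through MergeMems can now create far
  // more than 4 000 nodes, so the live node count passes MaxNodeLimit before
  // the bailout is evaluated again, and the assert in node creation fires.
  live_nodes += 5000;
  std::printf("live nodes after one iteration: %d (limit %d)\n", live_nodes, MaxNodeLimit);
  return 0;
}
```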

# Original description

Since 11 March 2025, we have seen a number of these assertion failures on x86_64 and aarch64, triggered by the jtreg test javax/xml/crypto/dsig/GenerationTests.java:


#
#  Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-linux_x86_64-dbg/jdk/src/hotspot/share/opto/node.cpp:78), pid=70273, tid=70300
#  assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit
#
 
V  [libjvm.so+0x157d75b]  Node::verify_construction()+0x12b  (node.cpp:78)
V  [libjvm.so+0x158caa7]  Node::clone() const+0x277  (node.cpp:520)
V  [libjvm.so+0x91a191]  PhiNode::split_out_instance(TypePtr const*, PhaseIterGVN*) const+0xda1  (cfgnode.cpp:1092)
V  [libjvm.so+0x924e0b]  PhiNode::Ideal(PhaseGVN*, bool)+0x279b  (cfgnode.cpp:2592)
V  [libjvm.so+0x1682e5d]  PhaseIterGVN::transform_old(Node*)+0xbd  (phaseX.cpp:668)
V  [libjvm.so+0x926398]  PhiNode::Ideal(PhaseGVN*, bool)+0x3d28  (cfgnode.cpp:2579)
V  [libjvm.so+0x1682e5d]  PhaseIterGVN::transform_old(Node*)+0xbd  (phaseX.cpp:668)
V  [libjvm.so+0x926398]  PhiNode::Ideal(PhaseGVN*, bool)+0x3d28  (cfgnode.cpp:2579)
V  [libjvm.so+0x1682e5d]  PhaseIterGVN::transform_old(Node*)+0xbd  (phaseX.cpp:668)
V  [libjvm.so+0x1678834]  PhaseIterGVN::optimize()+0x94  (phaseX.cpp:1046)
V  [libjvm.so+0xa81132]  Compile::Optimize()+0xad2  (compile.cpp:2335)
V  [libjvm.so+0xa8430f]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1dff  (compile.cpp:852)
V  [libjvm.so+0x8c3e40]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x430  (c2compiler.cpp:141)
V  [libjvm.so+0xa91e4c]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xbfc  (compileBroker.cpp:2331)
V  [libjvm.so+0xa92d88]  CompileBroker::compiler_thread_loop()+0x598  (compileBroker.cpp:1975)
V  [libjvm.so+0xf7ecef]  JavaThread::thread_main_inner()+0x12f  (javaThread.cpp:776)
V  [libjvm.so+0x19473b6]  Thread::call_run()+0xb6  (thread.cpp:231)
V  [libjvm.so+0x15f95b8]  thread_native_entry(Thread*)+0x128  (os_linux.cpp:877)
 
Comments
Changeset: 24be888d Branch: master Author: Daniel Lundén <dlunden@openjdk.org> Date: 2025-04-15 08:58:02 +0000 URL: https://git.openjdk.org/jdk/commit/24be888d655a5227cfb9fc22f36d6ba30d732b8d
15-04-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/24325 Date: 2025-03-31 12:25:48 +0000
31-03-2025

We have now also seen the issue once in the jtreg test java/lang/invoke/VarHandles/VarHandleTestByteArrayAsInt (javax/xml/crypto/dsig/GenerationTests.java shows it more often, but it is not the only one). The issue occurred on macOS aarch64.

#  Internal Error (/priv/jenkins/client-home/workspace/openjdk-jdk-dev-macos_aarch64-dbg/jdk/src/hotspot/share/opto/node.cpp:78), pid=2941, tid=26115
#  assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x11af4d8]  VMError::report(outputStream*, bool)+0x1b00  (node.cpp:78)
V  [libjvm.dylib+0x11b2bac]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x55c
V  [libjvm.dylib+0x586130]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0xe22810]  Node::verify_construction()+0x1c0
V  [libjvm.dylib+0xe25f9c]  Node::clone() const+0x438
V  [libjvm.dylib+0x3dc168]  PhiNode::split_out_instance(TypePtr const*, PhaseIterGVN*) const+0x650
V  [libjvm.dylib+0x3e071c]  PhiNode::Ideal(PhaseGVN*, bool)+0x14b4
V  [libjvm.dylib+0xed56b8]  PhaseIterGVN::transform_old(Node*)+0x164
V  [libjvm.dylib+0x3e0b48]  PhiNode::Ideal(PhaseGVN*, bool)+0x18e0
V  [libjvm.dylib+0xed56b8]  PhaseIterGVN::transform_old(Node*)+0x164
V  [libjvm.dylib+0xed4b9c]  PhaseIterGVN::optimize()+0xf4
V  [libjvm.dylib+0x4db220]  Compile::Optimize()+0x824
V  [libjvm.dylib+0x4d95e4]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1548
V  [libjvm.dylib+0x39e5f8]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x204
V  [libjvm.dylib+0x4f9400]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x954
V  [libjvm.dylib+0x4f8758]  CompileBroker::compiler_thread_loop()+0x390
V  [libjvm.dylib+0x8d8b5c]  JavaThread::thread_main_inner()+0x1d0
V  [libjvm.dylib+0x10fb928]  Thread::call_run()+0xf0
V  [libjvm.dylib+0xe7d364]  thread_native_entry(Thread*)+0x138
C  [libsystem_pthread.dylib+0x6f94]  _pthread_start+0x88
18-03-2025

I believe the (at least short-term) solution here is to adjust the bailout node count in IGVN from

if (C->check_node_count(NodeLimitFudgeFactor * 2, "Out of nodes")) {

to something like

if (C->check_node_count(NodeLimitFudgeFactor * 4, "Out of nodes")) {

The problem now is that we add more than NodeLimitFudgeFactor * 2 nodes in a single round of IGVN, which triggers the assert rather than the intended bailout above. The more difficult question is whether or not it is warranted to hit this bailout more often because of JDK-8333393.

I attached a plot showing how the number of live nodes and the size of the IGVN worklist change within the specific IGVN round that causes the present issue. I set MaxNodeLimit to 100 000 so that we do not hit the bailout, and plot the behavior both before ("old") and after ("new") the changes for JDK-8333393. Before the changes, both the number of nodes and the worklist size decrease pretty much monotonically and the IGVN round finishes quickly. After the changes, the additionally enabled idealizations result in a lot more work and a lot more fluctuation. However, in the end, the number of live nodes is at a reasonable level.
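For reference, with the defaults quoted in the failure analysis (MaxNodeLimit = 80 000, NodeLimitFudgeFactor = 2 000), this change would move the bailout threshold from 80 000 - 2 * 2 000 = 76 000 down to 80 000 - 4 * 2 000 = 72 000 live nodes, so a single IGVN iteration would then have to add more than 8 000 nodes (rather than 4 000) to slip past the bailout and reach the assert.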
17-03-2025

We are seeing this in our CI with other tests now:

Test: java/lang/invoke/VarHandles/VarHandleTestByteArrayAsInt.java

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/d40cae26-725f-4453-971b-121149cfc772-S5/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/c4682e2b-fe91-42b8-9c34-28c5a935b36a/runs/8c5d1ec0-cadd-4a56-ad5f-725b2f7a7ccf/workspace/open/src/hotspot/share/opto/node.cpp:78), pid=7775, tid=37891
#  assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded limit

---------------  T H R E A D  ---------------

Current thread (0x0000000148023a10):  JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=37891, stack(0x000000016d660000,0x000000016d863000) (2060K)]

Current CompileTask:
C2:14165 1373       VarHandleTestByteArrayAsInt::testArrayReadWrite (1858 bytes)

Stack: [0x000000016d660000,0x000000016d863000],  sp=0x000000016d85ed90,  free space=2043k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.dylib+0x11bfbc0]  VMError::report(outputStream*, bool)+0x1b00  (node.cpp:78)
V  [libjvm.dylib+0x11c32a4]  VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void const*, void const*, char const*, int, unsigned long)+0x56c
V  [libjvm.dylib+0x587928]  print_error_for_unit_test(char const*, char const*, char*)+0x0
V  [libjvm.dylib+0xe2ee7c]  Node::verify_construction()+0x1c0
V  [libjvm.dylib+0xe32620]  Node::clone() const+0x448
V  [libjvm.dylib+0x3dbe34]  PhiNode::split_out_instance(TypePtr const*, PhaseIterGVN*) const+0x658
V  [libjvm.dylib+0x3e03f0]  PhiNode::Ideal(PhaseGVN*, bool)+0x1494
V  [libjvm.dylib+0xee3298]  PhaseIterGVN::transform_old(Node*)+0x164
V  [libjvm.dylib+0x3e081c]  PhiNode::Ideal(PhaseGVN*, bool)+0x18c0
V  [libjvm.dylib+0xee3298]  PhaseIterGVN::transform_old(Node*)+0x164
V  [libjvm.dylib+0xee2770]  PhaseIterGVN::optimize()+0xf8
V  [libjvm.dylib+0x4dcb64]  Compile::Optimize()+0x824
V  [libjvm.dylib+0x4daf18]  Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x154c
V  [libjvm.dylib+0x39db20]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x21c
V  [libjvm.dylib+0x4fad9c]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928
V  [libjvm.dylib+0x4fa114]  CompileBroker::compiler_thread_loop()+0x398
V  [libjvm.dylib+0x8dd640]  JavaThread::thread_main_inner()+0x1c8
V  [libjvm.dylib+0x110ac78]  Thread::call_run()+0xf4
V  [libjvm.dylib+0xe8a194]  thread_native_entry(Thread*)+0x138
C  [libsystem_pthread.dylib+0x7240]  _pthread_start+0x94

Lock stack of current Java thread (top to bottom):
17-03-2025

[~dlunden] Glad to hear you can reproduce it.

> AFAICS we don't run these JDK tests on a debug build.

We see the issue only in the fastdebug build test runs, so that explains why you did not see it at first.
16-03-2025

Interestingly, it seems this is a very localized node count peak. Printing the ideal graph at level 3, the node count remains stable after all major and minor phases (and maxes out after escape analysis at around 15 000 nodes). Within a single round of IGVN, we go from around 15 000 nodes initially up to the 80 000 node peak and then back down to fewer than 15 000 nodes at the end. I'm wondering if this is an IGVN worklist ordering issue. A guess: perhaps, after splitting Phis through MergeMems, we need to ensure that the newly introduced Phi nodes are at the top of the worklist so that they can immediately be merged with existing identical Phi nodes. To be continued.
14-03-2025

I managed to reproduce the failure on my local machine with the replay file. The failure is indeed due to JDK-8333393; more details below. For the problematic compilation com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator::reset, we (barely) hit the MaxNodeLimit that is set to 80 000 by default:

Before JDK-8333393: 15 851 nodes at peak
After JDK-8333393 with replay file (and MaxNodeLimit adjusted to 160 000): 82 154 nodes at peak
After JDK-8333393 without replay file: 61 059 nodes at peak

So, the node count is clearly excessive even without the replay file. I'll next have a look at the ideal graph shape that results in this blowup of the number of nodes.
14-03-2025

[~thartmann] AFAICS we don't run these JDK tests on a debug build.
14-03-2025

[~dholmes] Ah right, that explains it.
14-03-2025

Thanks for reporting, Matthias, and thanks for the ping, Christian. I'll have a look at this. I'm quite confident that the termination condition in the JDK-8333393 changeset is correct, but it is still quite likely that we now create a lot more nodes than before in certain cases.
13-03-2025

ILW = Assert during C2 compilation because node limit is reached (should be harmless in product), intermittent with single test, disable compilation of affected method = MLM = P4
13-03-2025

By the way, the Compiler Memory Statistic shows some info about #nodes; maybe this gives a bit more insight? I am not sure what these numbers looked like before the recent changes, so maybe they are just normal...

Compiler Memory Statistic, 10 most expensive compilations:

ctyp total ra node comp type states reglive regsplit superword cienv ha other #nodes result limit time id thread method
c2 247105040 51712800 139661904 54485528 622936 0 0 0 0 621872 0 0 7861 ok 1073741824 181.367 9161 0x00007f60d93da750 com/sun/org/apache/xerces/internal/impl/XMLNSDocumentScannerImpl::reset((Lcom/sun/org/apache/xerces/internal/xni/parser/XMLComponentManager;)V)
c2 232436064 47357592 129413720 54419944 622936 0 0 0 0 621872 0 0 7851 ok 1073741824 184.381 9162 0x00007f61131629e0 com/sun/org/apache/xerces/internal/impl/XMLDocumentScannerImpl::reset((Lcom/sun/org/apache/xerces/internal/xni/parser/XMLComponentManager;)V)
c2 191645592 5449608 155076912 30201624 361032 0 0 0 0 556416 0 0 76818 err 1073741824 141.145 8418 0x00007f61b02311a0 com/sun/org/apache/xerces/internal/impl/XMLEntityManager::reset((Lcom/sun/org/apache/xerces/internal/xni/parser/XMLComponentManager;)V)
c2 61587824 7776352 34633224 18316096 328304 108384 0 0 0 425464 0 0 5224 ok 1073741824 163.601 9273 0x00007f61b02311a0 com/sun/org/apache/xerces/internal/impl/dtd/XMLDTDProcessor::reset((Lcom/sun/org/apache/xerces/internal/xni/parser/XMLComponentManager;)V)
c2 57427144 7544352 31589440 17300248 361032 108384 0 0 0 523688 0 0 4921 ok 1073741824 148.212 9163 0x00007f61b02311a0 com/sun/org/apache/xerces/internal/impl/XMLDocumentFragmentScannerImpl::reset((Lcom/sun/org/apache/xerces/internal/xni/parser/XMLComponentManager;)V)
c2 53500464 16932816 23920160 9363104 1146744 42928 0 0 0 2094712 0 0 14882 ok 1073741824 102.511 8733 0x00007f61130185b0 sun/net/httpserver/ServerImpl$Exchange::run(()V)
c2 50876872 6564616 26197272 17318288 328304 108384 0 0 0 360008 0 0 4486 ok 1073741824 56.913 7105 0x00007f61b02311a0 com/sun/org/apache/xerces/internal/impl/XMLScanner::reset((Lcom/sun/org/apache/xerces/internal/xni/parser/XMLComponentManager;)V)
c2 50311064 18387368 21138560 8286224 1277656 42928 0 0 0 1178328 0 0 13933 ok 1073741824 158.180 9259 0x00007f61b02311a0 com/sun/org/apache/xerces/internal/parsers/XML11Configuration::<init>((Lcom/sun/org/apache/xerces/internal/util/SymbolTable;Lcom/sun/org/apache/xerces/internal/xni/grammars/XMLGrammarPool;Lcom/sun/org/apache/xerces/internal/xni/parser/XMLComponentManager;Lcom/sun/org/apache/xerces/internal/utils/XMLSecurityPropertyManager;Ljdk/xml/internal/XMLSecurityManager;)V)
c2 49414528 25705816 15507544 7469888 361032 42928 0 0 0 327320 0 0 15096 ok 1073741824 125.686 9019 0x00007f61125159f0 sun/security/ec/ECOperations::setSum((Lsun/security/ec/point/ProjectivePoint$Mutable;Lsun/security/ec/point/AffinePoint;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;)V)
c2 48303592 24218328 15294512 8092200 328304 42928 0 0 0 327320 0 0 14548 ok 1073741824 9.549 1084 0x00007f6110fff250 sun/security/ec/ECOperations::setSum((Lsun/security/ec/point/ProjectivePoint$Mutable;Lsun/security/ec/point/ProjectivePoint$Mutable;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;Lsun/security/util/math/MutableIntegerModuloP;)V)

Total: 10 (C1: 0, C2: 10)
12-03-2025

Could be related to JDK-8333393, which went in two days ago. It changed `PhiNode::Ideal()` to split more Phis through MergeMems. There is a check to avoid endless splitting. Maybe we are missing a case and still repeat the splitting over and over again (also pinging [~dlunden], who worked on JDK-8333393).
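To make the rewrite concrete, here is a toy, self-contained sketch (not HotSpot code; ToyMergeMem and split_phi_through_mergemem are invented names for illustration only) of what splitting a Phi through MergeMems means: a Phi whose inputs are MergeMems over the same memory slices becomes a MergeMem whose slices are new per-slice Phis, which temporarily adds one Phi per slice to the graph.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Toy model of the graph rewrite:
//
//   Phi(MergeMem(a1, b1), MergeMem(a2, b2))
//     ==>  MergeMem(Phi(a1, a2), Phi(b1, b2))
//
// The rewrite preserves semantics but creates one MergeMem plus one Phi per
// memory slice before the original nodes die, which is how repeated
// application can inflate the live node count within a single IGVN round.
struct ToyMergeMem {
  std::vector<std::string> slices;  // one memory state per slice
};

// Split a "Phi" of two ToyMergeMems into a ToyMergeMem of per-slice "Phis".
// Assumes both inputs merge the same number of slices.
ToyMergeMem split_phi_through_mergemem(const ToyMergeMem& in1, const ToyMergeMem& in2) {
  ToyMergeMem out;
  for (size_t i = 0; i < in1.slices.size(); i++) {
    out.slices.push_back("Phi(" + in1.slices[i] + ", " + in2.slices[i] + ")");
  }
  return out;
}

int main() {
  ToyMergeMem m1{{"a1", "b1"}};
  ToyMergeMem m2{{"a2", "b2"}};
  ToyMergeMem result = split_phi_through_mergemem(m1, m2);
  for (const std::string& s : result.slices) {
    std::printf("slice: %s\n", s.c_str());  // prints Phi(a1, a2) and Phi(b1, b2)
  }
  return 0;
}
```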
12-03-2025

I added a replay file replay_pid31890.log from the assert on Linux x86_64 (an Intel(R) Xeon(R) Platinum 8260M machine).
12-03-2025

[~chagedorn], [~thartmann] maybe it is related to the HotSpot changes (especially the C2 changes?) from 10 March; could you maybe look into this?
12-03-2025

We run fastdebug binaries; the whole tier2 suite is executed with vmoption -Xmx768m. I see -conc:8 in one of the failing runs, but this might differ a bit from test machine to test machine. The hardware is, for example (we fail on more machines):

Linux aarch64: CPU: total 16 (initial active 16) 0x41:0x3:0xd0c:1, fp, asimd, evtstrm, aes, pmull, sha1, sha256, crc32, lse, dcpop
Linux x86_64: Intel(R) Xeon(R) Platinum 8260M CPU at 2.40GHz, 8 cores
macOS aarch64: CPU: total 8 (initial active 8) 0x61:0x0:0xda33d83d:0, fp, asimd, aes, pmull, sha1, sha256, crc32, lse, sha3, sha512; machdep.cpu.brand_string: Apple M2
12-03-2025

[~mbaesken] We don't see these failures in our testing. How do you run the test and on what hardware?
12-03-2025