JDK-8292301 : [REDO v2] C2 crash when allocating array of size too large
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,18,19,20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2022-08-12
  • Updated: 2024-07-09
  • Resolved: 2022-09-28
JDK 17: 17.0.7-oracle (Fixed)
JDK 20: 20 b18 (Fixed)
Description
The original fix caused problems and was backed out (backout bug JDK-8279204). The redo, JDK-8279219, got backed out as well (backout bug JDK-8292260). We'll need another redo (v2).

This bug should redo the fix and also address the problems found with JDK-8278413:
- JDK-8279021
- JDK-8279062
- JDK-8279125

As well as JDK-8291665, JDK-8291919 and JDK-8288184.
Comments
A pull request was submitted for review. URL: https://git.openjdk.org/jdk17u-dev/pull/1204 Date: 2023-03-16 08:16:50 +0000
16-03-2023

Fix request [17u] I am backporting this for parity with 17.0.7-oracle. I delayed it to 17.0.8 to gather a bit more testing of this second redo before bringing it into 17. This fixes a real problem in C2, and Oracle fixed it, so we should take it too. I would say the risk is above medium; the fact that fixing the bug failed twice shows this. I had to resolve three places, all straightforward adaptations. Tests pass. SAP nightly testing passed.
16-03-2023

Changeset: 1ea0d6b4 Author: Roland Westrelin <roland@openjdk.org> Date: 2022-09-28 07:16:59 +0000 URL: https://git.openjdk.org/jdk/commit/1ea0d6b424c263590fd145913280a180d7ce5fe1
28-09-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10038 Date: 2022-08-26 08:35:14 +0000
26-08-2022

[~xliu] It reproduces with no extra flag. I use a jdk-11.0.16-ga build. JDK-8288184 is not the same as JDK-8291665. The fix I pasted above is for JDK-8291665. I added a fix for JDK-8288184 in the comments of that one. I plan to include both fixes together with the redo.
19-08-2022

Is JDK-8288184 the same as this issue? Currently, it's closed as a dup. I tried your reproducer but I can't trigger it. Here is my trace:

    5104 462 b 4 TestNewArrayOutsideLoopValidLengthTestInLoop::test1 (70 bytes)
    Loop: N0/N0 has_sfpt
    Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    Loop: N0/N0 has_sfpt
    Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    Predicate IC Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    Predicate IC Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    Loop: N0/N0 has_sfpt
    Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    Unswitch 1 Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    Loop: N0/N0 has_sfpt
    Loop: N419/N418 limit_check profile_predicated predicated sfpts={ 416 }
    Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    Loop: N0/N0 has_sfpt
    Loop: N419/N418 limit_check profile_predicated predicated sfpts={ 416 }
    Loop: N358/N341 limit_check profile_predicated predicated sfpts={ 332 }
    PredicatesOff
    Loop: N0/N0 has_sfpt
    Loop: N419/N418 sfpts={ 416 }
    Loop: N358/N341 sfpts={ 332 }

I do see that 215 CmpU and 408 CmpU have the same code shape as in JDK-8291665, but it looks like we still need a Split-If after Unswitch 1. Do you have some flags to trigger it?
19-08-2022

Attached test case can be used to reproduce the issue. The issue doesn't occur with the current development version, not because the bug is not there but because some unrelated change must cause the graph to be slightly different, so split if doesn't trigger. Tentative fix is:

diff --git a/src/hotspot/share/opto/loopopts.cpp b/src/hotspot/share/opto/loopopts.cpp
index 45662010b70..4266eb5a269 100644
--- a/src/hotspot/share/opto/loopopts.cpp
+++ b/src/hotspot/share/opto/loopopts.cpp
@@ -1742,7 +1742,8 @@ void PhaseIdealLoop::clone_loop_handle_data_uses(Node* old, Node_List &old_new,
       // in the loop to break the loop, then test is again outside of the
       // loop to determine which way the loop exited.
       // Loop predicate If node connects to Bool node through Opaque1 node.
-      if (use->is_If() || use->is_CMove() || C->is_predicate_opaq(use) || use->Opcode() == Op_Opaque4) {
+      if (use->is_If() || use->is_CMove() || C->is_predicate_opaq(use) || use->Opcode() == Op_Opaque4 ||
+          (use->Opcode() == Op_AllocateArray && use->in(AllocateNode::ValidLengthTest) == old)) {
         // Since this code is highly unlikely, we lazily build the worklist
         // of such Nodes to go split.
         if (!split_if_set) {
@@ -2240,9 +2241,10 @@ void PhaseIdealLoop::clone_loop( IdealLoopTree *loop, Node_List &old_new, int dd
   if (split_if_set) {
     while (split_if_set->size()) {
       Node *iff = split_if_set->pop();
-      if (iff->in(1)->is_Phi()) {
-        Node *b = clone_iff(iff->in(1)->as_Phi(), loop);
-        _igvn.replace_input_of(iff, 1, b);
+      uint input = iff->Opcode() == Op_AllocateArray ? AllocateNode::ValidLengthTest : 1;
+      if (iff->in(input)->is_Phi()) {
+        Node *b = clone_iff(iff->in(input)->as_Phi(), loop);
+        _igvn.replace_input_of(iff, input, b);
       }
     }
   }
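
For readers less familiar with what the ValidLengthTest input guards, here is a minimal Java-level sketch. It is not the attached test case and is not claimed to trigger the crash. The class name, method name and code shape are assumptions for illustration, loosely following the "new array outside loop, valid length test in loop" naming of the test mentioned elsewhere in this thread. It only shows a length chosen on different loop exit paths and then used by an allocation after the loop.

```java
public class ValidLengthSketch {
    // Hypothetical shape: the length is decided inside the loop,
    // the array allocation happens after the loop has exited.
    static int allocateAfterLoop(int[] candidates) {
        int len = 0;
        for (int c : candidates) {
            if (c < 0) {
                len = c;   // invalid length leaves the loop early
                break;
            }
            len = c;       // valid length, keep iterating
        }
        try {
            // The JVM must check that len is a valid array length here.
            return new int[len].length;
        } catch (NegativeArraySizeException e) {
            return -1;     // invalid length was rejected at allocation
        }
    }

    public static void main(String[] args) {
        System.out.println(allocateAfterLoop(new int[] {4, 8}));
        System.out.println(allocateAfterLoop(new int[] {4, -3, 8}));
    }
}
```

With these inputs the first call allocates an array of length 8 and the second call takes the NegativeArraySizeException path. Whether C2 actually builds a Phi on the length test for such code depends on the loop optimizations applied, which this sketch does not control.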
18-08-2022

[~xliu] Yes. I'm working on a stripped down test case. I'll attach it to the bug once I have one.
17-08-2022

Noting that JDK-8288184 has a reproducer that fails on all versions (if that's the same issue).
17-08-2022

I see. Actually, neither 865 nor 855 belongs to the loop. That is because they sit in blocks that end in 'break' statements, and those blocks won't go back to the loop header in the DFS.
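
A small Java sketch of that situation, with hypothetical names (this is not the SSLEngineInputRecord code): the allocation is written inside the while statement, but its block ends in 'break', so no path leads from it back to the loop header and it is an exit path rather than part of the loop body.

```java
public class BreakBlockSketch {
    // Hypothetical method, for illustration only.
    static int firstLargeRecord(int[] sizes, int limit) {
        int copied = 0;
        int i = 0;
        while (i < sizes.length) {
            int size = sizes[i];
            if (size > limit) {
                // This block ends in 'break': control never returns to the
                // loop header from here, so the allocation below is on an
                // exit path and not inside the loop proper.
                byte[] buf = new byte[size];
                copied = buf.length;
                break;
            }
            i++; // only this path reaches the back edge
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(firstLargeRecord(new int[] {1, 2, 50, 3}, 10)); // prints 50
    }
}
```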
17-08-2022

Hi @Roland, did you get a chance to take a look at the replay file? If we use -XX:+TraceLoopOpts -XX:+Verbose, we can dump the initial loop body before doing anything. The LoopNode was originally Region 1076.

    Loop: N0/N0 has_call has_sfpt body={ }
    Loop: N2469/N1477 limit_check profile_predicated predicated has_call has_sfpt body={ 1516 1791 1513 1788 1490 1774 685 1110 1514 1954 2117 2240 1789 1494 1776 1100 1502 1782 1961 2121 2243 665 1096 1487 1773 1101 1503 1783 1963 2123 1493 1775 659 1090 1491 655 1087 1955 2118 1079 1482 1969 2126 1497 1778 12 87 1481 656 1088 666 1097 1327 1654 1881 1325 1653 1880 684 2241 1109 1966 2124 1507 1786 1967 2125 2244 1777 1958 1484 1772 1956 2119 2242 1781 1962 2122 1957 2120 662 654 1086 1488 1091 1492 2009 1607 1847 1288 1608 1505 1784 2238 2318 2185 1879 2050 2186 1623 1620 1077 859 1083 1298 650 1300 1625 452 668 19 59 861 453 672 673 862 863 1107 1302 678 462 683 872 463 687 2245 873 1332 1883 906 1658 1657 1333 1330 467 908 877 468 909 878 469 910 879 470 911 880 952 893 926 930 934 938 942 944 946 948 950 645 860 1082 1299 649 1223 1485 664 653 651 1081 1495 1483 1080 1085 1084 1489 1089 1093 1092 1499 679 1628 1630 67 5 1099 864 1106 1303 677 1227 1510 1105 2049 1878 1652 1323 1584 466 907 1337 876 831 1336 1239 921 465 2051 1331 1659 875 830 1882 1238 464 905 874 829 1655 1328 1329 1237 1626 670 1095 1094 1960 1501 1780 1858 1779 1498 1500 1582 835 1953 1768 1477 2469 1769 1480 1078 2011 647 2114 1949 1766 1475 1075 1621 1 619 1624 1622 1479 1243 834 1340 1242 833 1339 1241 832 1338 1240 828 1968 1790 1656 1515 1112 1326 1236 827 1512 1108 1324 1235 1226 1511 1583 817 1504 1102 1965 1785 1506 1103 1627 1629 1225 816 1496 1098 1301 1224 1222 1486 1581 }

I don't understand why 865 and 855 are not part of this loop. Those two array allocations belong to the while loop:
https://github.com/corretto/corretto-11/blob/develop/src/java.base/share/classes/sun/security/ssl/SSLEngineInputRecord.java#L283
https://github.com/corretto/corretto-11/blob/develop/src/java.base/share/classes/sun/security/ssl/SSLEngineInputRecord.java#L313
16-08-2022

One lead is the replay file from https://bugs.openjdk.org/browse/JDK-8291665?focusedCommentId=14515910&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14515910. I can only trigger this issue by using the replay file with jdk11.0.16. The crash happens in split_if during LoopOpts: it has difficulty handling the phi node which attempts to merge the 'ValidLengthTest' results. Disabling either SplitIfBlocks or LoopUnswitching will avoid the crash. One thing I don't quite understand is why 865 AllocateArray, which comes from bci@590 (https://github.com/corretto/corretto-11/blob/develop/src/java.base/share/classes/sun/security/ssl/SSLEngineInputRecord.java#L313), is not part of the loop body.
15-08-2022

ILW = same as JDK-8278413 = P3
13-08-2022

JDK-8292260 also backs out the fix for JDK-8284369 in the removed test compiler/allocation/TestFailedAllocationBadGraph.java. In the redo (v2) we need that fix in the restored test.
12-08-2022