JDK-8294677 : chunklevel::MAX_CHUNK_WORD_SIZE too small for some applications
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • OS: generic
  • CPU: generic
  • Submitted: 2022-09-23
  • Updated: 2023-01-14
Description
A DESCRIPTION OF THE PROBLEM :
Compilation fails if a test has lots of function blocks when running on JDK 17; on JDK 8 it works.

REGRESSION : Last worked in version 8

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-17.jdk/Contents/Home
export PATH=${JAVA_HOME}/bin:${PATH}

git clone https://github.com/delta-io/delta.git
cd delta
git checkout 6a30e958de4322100b2ccfa13fa29ae155369a07
build/sbt clean  "core/testOnly *.DeltaErrorsSuite"

ACTUAL -
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (metaspaceArena.cpp:93), pid=45644, tid=6147
#  guarantee(requested_word_size <= chunklevel::MAX_CHUNK_WORD_SIZE) failed: Requested size too large (528698) - max allowed size per allocation is 524288.
#
# JRE version: Java(TM) SE Runtime Environment (17.0+35) (build 17+35-LTS-2724)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17+35-LTS-2724, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/yumwang/opensource/delta/core/hs_err_pid45644.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
Exception in thread "Thread-9" java.io.EOFException
        at java.base/java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:3192)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1693)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:514)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:472)
        at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:839)
        at org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:828)
        at java.base/java.lang.Thread.run(Thread.java:833)

---------- BEGIN SOURCE ----------
https://github.com/delta-io/delta/blob/2499f5408c63de39914a789cf8bb57137224fb3a/core/src/test/scala/org/apache/spark/sql/delta/DeltaErrorsSuite.scala#L146
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
https://github.com/delta-io/delta/pull/1391/files#diff-7fb91cd0f8dfcccf78ab6ff32c97945a15df962a6572f6236575addc3067d824R146

FREQUENCY : always



Comments
As a test, I have a VM with the root chunk size increased to 32MB. Such a change would be minimally invasive, and it fixes the problem here with Delta.

However, having understood more about the problem, I think StackMapTables can get really big for very large and inefficiently generated methods. Therefore another option - possibly in addition to increasing the root chunk size - would be to allow side allocations via malloc. I dislike this on principle, since it makes the allocator and things like "Metaspace::contains()" more complex. Therefore let's see if the increased root chunk size already does the trick.

A little mental calculation: a method with 64k of bytecodes, consisting almost exclusively of stores into the local var array, having a stack map table entry for every bytecode (I am still unsure about the "block" term in the class file spec), and each entry being expressed in full, could come to a stack map table size of several GB. 64k bytecodes -> 64k-ish stack map table entries, each repeatedly describing an ever-growing var array of max 64k size... Of course, such a class would be absurd, and if it were loadable, it would eat up tons of memory.
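A rough back-of-the-envelope version of that mental calculation (a sketch only; the ~2 bytes per entry is an assumption based on the 1-3 byte verification_type_info encoding in JVMS 4.7.4, and the class name is made up for illustration):
```
public class StackMapSizeEstimate {
    public static void main(String[] args) {
        // Hypothetical worst case from the comment above: a 64k-bytecode method,
        // almost all stores into the local var array, with one full stack map
        // table entry per bytecode.
        long entries      = 64 * 1024;   // roughly one stack map entry per bytecode
        long avgLocals    = entries / 2; // locals grow from 0 towards 64k, so ~32k on average
        long bytesPerInfo = 2;           // verification_type_info is 1-3 bytes (JVMS 4.7.4)

        long estimatedBytes = entries * avgLocals * bytesPerInfo;
        System.out.printf("rough StackMapTable size: ~%.1f GiB%n",
                estimatedBytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```
With these numbers the estimate comes out at roughly 4 GiB, which matches the "several GB" figure above.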
14-01-2023

The generated code contains two functions, both of which seem, from the bytecode, to be very large switch constructs. They both scrape at the very limit of the method size (37k resp. 56k bytecodes). They accumulate an ever-growing local var array, in the thousands range. I'm pretty sure this is unintended. E.g.:
```
grep store DeltaErrorsSuiteBase-javap.txt
143: astore 7
148: astore 8
164: astore 9
201: astore 6
244: astore 11
249: astore 12
265: astore 13
302: astore 10
383: astore 14
390: astore 16
395: astore 17
411: astore 18
448: astore 15
...
37041: astore_w 1148
37084: astore_w 1145
37134: astore_w 1150
37157: astore_w 1151
37183: astore_w 1152
37226: astore_w 1149
37311: astore_w 1153
37322: astore_w 1155
37329: astore_w 1156
37355: astore_w 1157
37398: astore_w 1154
37448: astore_w 1159
37455: astore_w 1160
37481: astore_w 1161
37524: astore_w 1158
37574: astore_w 1163
37597: astore_w 1164
37623: astore_w 1165
37666: astore_w 1162
```
Also not optimal is the way the stackmap table is presented. There are a lot of function blocks, hence a lot of stackmap entries (>1000). All of them are full entries (https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7.4). Unfortunately, since the operand stack seems to never be empty or only one element deep, none of the "same_frame_*" presentations can be used, which means that in each StackMapTable entry the whole insane local var array is repeated.
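To illustrate that encoding constraint outside of the generated Delta code, here is a minimal hand-written sketch (class and method names are made up): each conditional expression branches while the first call argument is already on the operand stack, so with two stack items at the merge point javac cannot use same_frame or same_locals_1_stack_item_frame and falls back to full_frame entries that repeat all live locals.
```
// Compile with javac, then run: javap -v FullFrameDemo
// and look for "full_frame" entries in the StackMapTable of demo().
public class FullFrameDemo {
    static String tag(boolean flag, String s) { return flag + s; }

    static String demo(boolean flag) {
        // Each statement adds a live local, and each ?: branches while the
        // first argument to tag() is already on the operand stack, forcing
        // full_frame entries that repeat the whole growing local var array.
        String v1 = tag(flag, flag ? "a" : "b");
        String v2 = tag(flag, flag ? v1 : "b");
        String v3 = tag(flag, flag ? v2 : "b");
        String v4 = tag(flag, flag ? v3 : "b");
        return v1 + v2 + v3 + v4;
    }
}
```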
14-01-2023

Seems rare, but another occurrence was found here: https://github.com/rakudo/rakudo/issues/4952 . In both cases (here and there), gigantic generated classes with insane stack maps are involved. In the case of Delta, I boiled it down to a single class, "org/apache/spark/sql/delta/DeltaErrorsSuiteBase.class", which is 6.8MB and about 10x the size of any other large class. There is one StackMapTable with ~1600 entries, and the entries are very large themselves. See class and javap output attached. I'll check what the best way to deal with this is.
14-01-2023

This issue is a direct consequence of JDK-8251158, which limited the maximum allocation size in Metaspace to MAX_CHUNK_WORD_SIZE (currently 4 MB, i.e. 524288 words). It seems that some class files can have larger stackmap tables (see attached hs_err files). The problem is easily reproducible (see CUSTOMER SUBMITTED WORKAROUND in the initial report):
```
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-17.jdk/Contents/Home
export PATH=${JAVA_HOME}/bin:${PATH}

git clone https://github.com/delta-io/delta.git
cd delta
git checkout 6a30e958de4322100b2ccfa13fa29ae155369a07
build/sbt clean "core/testOnly *.DeltaErrorsSuite"
```
It looks like we have to increase MAX_CHUNK_WORD_SIZE and/or make it configurable through a command-line option to avoid such issues in the future.
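Purely to put the numbers from the guarantee message side by side (a sketch; the 8-byte word size assumes a 64-bit VM, and the values are copied from the hs_err header in the initial report):
```
public class MetaspaceAllocationCheck {
    public static void main(String[] args) {
        final long BYTES_PER_WORD      = 8;       // word size on a 64-bit VM
        final long MAX_CHUNK_WORD_SIZE = 524_288; // current root chunk / max allocation size in words
        final long REQUESTED_WORDS     = 528_698; // size requested for the oversized StackMapTable

        System.out.printf("limit:     %d words = %.2f MiB%n", MAX_CHUNK_WORD_SIZE,
                MAX_CHUNK_WORD_SIZE * BYTES_PER_WORD / (1024.0 * 1024.0));
        System.out.printf("requested: %d words = %.2f MiB%n", REQUESTED_WORDS,
                REQUESTED_WORDS * BYTES_PER_WORD / (1024.0 * 1024.0));
        System.out.println("over the limit by " + (REQUESTED_WORDS - MAX_CHUNK_WORD_SIZE) + " words");
    }
}
```
The request exceeds the limit by only about 4400 words (~34 KB) in this case, so even a modest increase of the root chunk size would cover this particular class.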
13-01-2023

ILW = HLM = P3
25-10-2022