JDK-8144990 : java/util/concurrent/forkjoin/FJExceptionTableLeak.java: OOM with Xcomp,G1GC
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 9
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: x86_64,aarch64
  • Submitted: 2015-12-09
  • Updated: 2017-03-21
  • Resolved: 2016-01-28
Fix Version: JDK 9 b104 (Fixed)
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
java/util/concurrent/forkjoin/FJExceptionTableLeak.java fails with an OutOfMemoryError on the latest nightly builds.

X64 platform results:

java -Xmx2200k -XX:-UseCompressedOops FJExceptionTableLeak                         : Passed
java -Xcomp -XX:+UseSerialGC -Xmx2200k -XX:-UseCompressedOops FJExceptionTableLeak : Passed
java -Xcomp -Xmx2200k -XX:-UseCompressedOops FJExceptionTableLeak                  : OOM

The test fails starting with JDK 9 b72, the build in which G1 became the default collector.

java/util/concurrent/forkjoin/FJExceptionTableLeak.java: Exception java.lang.OutOfMemoryError: Java heap space
Comments
I expedited this fix ahead of other pending jsr166 changes.
28-01-2016

@martinb the changes in your webrev look ok to me. Consider it reviewed.
28-01-2016

Please send an appropriate review request as outlined at http://openjdk.java.net/contribute/ to the relevant lists. I would think core-libs@openjdk.java.net and hotspot-[gc-]dev@openjdk.java.net would be appropriate here.
28-01-2016

I intend to commit http://cr.openjdk.java.net/~martin/webrevs/openjdk9/jsr166-jdk9-integration/FJExceptionTableLeak/ soon. Would someone like to be a Reviewer?
28-01-2016

In JDK 9 build 102 on Linux x64, we are now seeing this test fail about 6% of the time; it is also failing frequently on Windows x64.
27-01-2016

After updating to jdk-9-ea+100, I see this test failing even with default VM options. I don't like the relentless creep of VM memory requirements, but I will increase Xmx to 8m, hopefully keeping this test passing as is for a few more years. And just in case, we will limit common pool parallelism as well.
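(For illustration, the parallelism cap mentioned above could be applied like this, assuming the standard java.util.concurrent.ForkJoinPool.common.parallelism system property; the values shown are illustrative, not the ones actually committed:)

# Cap the common pool at 3 worker threads and raise the heap to 8m.
java -Djava.util.concurrent.ForkJoinPool.common.parallelism=3 -Xmx8m FJExceptionTableLeak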
16-01-2016

On another note: please ask such questions about general GC behavior in the appropriate forums, e.g. the hotspot-gc-dev/hotspot-gc-use mailing lists. I only stumbled across these particular questions by coincidence. There is a much better chance of getting answers from the source for such questions there.
23-12-2015

Martin Buchholz: "The question of what is the minimum Xmx setting that a test is allowed to use should be answered jdk-wide - it should support all the standard GCs. And standard GCs should strive hard not to allow their minimum Xmx requirements to grow."

That question cannot be answered in general, because it obviously depends on how much long-lived data the test uses at any point in time, which changes with different JDKs and different options. The Java heap memory limitations of the GCs have been very stable. The minimum memory requirements of the GCs (e.g. G1) have not changed (apart from bugs like the above that now ignores "odd" heap sizes). And by the way, there has always been a hard limit of a 2M heap for any GC as far as I remember - obviously that 2200k in the test came from the original developer trying to optimize for something.
23-12-2015

Martin Buchholz: "I can see my jdk9 rounds up to nearest 2MB whether or not I ask for large pages - is that a bug?"

This is not a bug but expected behavior (that may have been fixed some time ago). Several internal data structures constrain the heap size to multiples of particular values. In this case it is the card table, which is required to be a multiple of the OS (small) page size, 4k on x86. Every byte of the card table represents information about 512 bytes of Java heap memory, and since we (as mentioned, since some time ago) require these internal data structures to cover the Java heap exactly, this makes a minimum heap-alignment increment of 2M = 4k * 512 (on x86). If you ask for large pages, the heap will always be aligned to the large page size, which coincidentally is also (typically) 2M on x86. In previous versions this requirement may not have been enforced properly, potentially causing undefined GC behavior. So no, this is not a bug, but again a hard limitation of the implementation.
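(As a quick sanity check of the 2M figure, a sketch using the 512-bytes-per-card-byte value from the explanation above; getconf PAGESIZE is the standard way to query the OS page size on Linux:)

# minimum heap-size granularity = OS page size * heap bytes covered per card-table byte
PAGE_SIZE=$(getconf PAGESIZE)                # typically 4096 on x86 Linux
BYTES_PER_CARD_BYTE=512                      # from the explanation above
echo $(( PAGE_SIZE * BYTES_PER_CARD_BYTE ))  # prints 2097152, i.e. 2M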
23-12-2015

Maybe the amount of heap the test allocates depends on the number of threads in the system, i.e. it scales with the number of threads. If so, the test needs to make sure it runs with an appropriate number of threads itself, in whatever way.
23-12-2015

Analyzing the heap dump generated with -XX:HeapDumpOnOutOfMemoryError shows that the live set size of that test at that point is ~3.3M. This is not a supported operating environment for G1. I also tried 9b72 with Serial, Parallel and CMS, and they also fail when run with -Xmx4M. E.g.:

#!/bin/bash
test() {
  rm -rf JT*
  $JT_HOME/bin/jtreg \
    -agentvm -a -ea -esa -v:fail,error,time -retain:fail,error -ignore:quiet -timeoutFactor:4 \
    -vmoption:-Xmx4M -XX:+UseSerialGC -javaoptions:-d64 \
    -vmoption:-XX:+HeapDumpOnOutOfMemoryError -vmoption:-XX:HeapDumpPath=<path-to-dump-to> \
    -vmoption:-XX:+PrintGCDetails \
    -nr \
    -jdk:$JT_JDK \
    <path-to-test>/java/util/concurrent/forkjoin/FJExceptionTableLeak.java
}

JT_JDK=<path-to-9b72>
JT_JAVA=$JT_JDK/bin/java
JT_HOME=<path-to-jtreg>
export JT_JAVA JT_HOME JT_JDK

for i in `seq 1 1`
do
  test
done

It is very unlikely that this is a GC bug; it is a test configuration bug. Serial GC fails with a live heap size of 3.5M (at -Xmx4M). The test passes beginning with -Xmx22M on any collector (on my dual-socket machine with 40 threads).
23-12-2015

Moved back to core-libs/java.util.concurrent because, after analysis, this seems to be a test bug.
23-12-2015

One reason for the problem may be that the generations in G1 are required to be a multiple of the region size (minimum 1M). Given that we typically have at least eden, survivor and old generations, at a 4M heap size this leaves about 1M for any longer-lived objects. In some degraded mode, I think G1 can run with eden and old only. So if your application has more than 3M of longer-lived objects - which might easily happen due to recent changes to the JDK (jigsaw, or one of the additional switches like -Xcomp that may generate additional java.lang.Class instances, or -XX:-UseCompressedOops on 64 bit, which causes larger instances) - you are out of luck. Particularly with jigsaw we recently had quite a few tests start failing because they assumed that a particular small heap size would be "enough", and this seems to be another one. Serial and the other collectors, I think, may shrink generations to smaller values than 1M. This restriction of region sizes in G1 is intentional and by design.
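(For illustration, one way to see the region size G1 actually chooses for such a small heap; G1HeapRegionSize and PrintFlagsFinal are standard HotSpot flags, and the 4m value is just an example:)

# With a 4M heap, G1 reports its minimum region size.
java -XX:+UseG1GC -Xmx4m -XX:+PrintFlagsFinal -version | grep G1HeapRegionSize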
23-12-2015

I tried java -Xcomp -Xmx2200k -XX:-UseCompressedOops FJExceptionTableLeak with latest linux-x64 jdk-9+95 and it passes!
15-12-2015

You're welcome :-)
15-12-2015

Christian, I wrote that comment, but I wasn't talking about the VM; I was talking about the actual Java library bug that this is a regression test for. I do have vague plans to use a different strategy for leak tests that doesn't require detecting OOME - leak detection testing is a hard problem. For strange VM flag rotations, I would maintain exclude lists; you can't avoid it, e.g. some tests definitely fail with -XX:+DisableExplicitGC. But thank you for having VM flag rotation!
15-12-2015

Well, it all depends on how testing is done. Oracle does flag rotations to increase the chance that different flag combinations work. The test is run with:

 * @run main/othervm -Xmx2200k FJExceptionTableLeak

so nothing special. There is a comment saying:

// This test was observed to fail with jdk7 -Xmx2200k,
// using STEPS = 220 and TASKS_PER_STEP = 100

The JDK has changed a lot since 7 and could simply allocate more objects (think lambdas). Also, there are a couple of command-line flags which affect heap memory usage - most notably ObjectAlignmentInBytes, but I'm sure there are others. A flag rotation with ObjectAlignmentInBytes might bump you over the limit. In my opinion, all tests that define a very small maximum heap will fail sooner or later.
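(For illustration, one way such a rotation changes the footprint; ObjectAlignmentInBytes is a standard 64-bit HotSpot flag, and the value 16 is just an example:)

# Doubling the default 8-byte object alignment pads every object to a 16-byte
# boundary, inflating live heap usage and possibly pushing a 2200k test over the edge.
java -XX:ObjectAlignmentInBytes=16 -Xmx2200k -XX:+PrintFlagsFinal -version \
  | grep -E 'ObjectAlignmentInBytes|MaxHeapSize'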
14-12-2015

Even though this test asks for 2200k, it normally gets 4MB because heap comes in increments of 2MB. It is certainly possible that e.g. on ARM you actually get a smaller heap, in which case it's reasonable to bump up the -Xmx flag to a value that will work on ARM. A trivial java process should be able to start up with 4MB (or even 2MB!) of heap. Maybe ensuring that should be somebody's job?
14-12-2015

On the ARM architecture it fails in every build from b72 without any additional options. On other architectures it fails only in -Xcomp mode, so I am moving the bug to the compiler queue.
14-12-2015

I still have not seen this test fail without the eeevil -XX:+AggressiveOpts flag. Perhaps there's something else you're doing differently, like running a debug JDK?
12-12-2015

./java -XX:-UseCompressedOops -Xmx2200k FJExceptionTableLeak
Exception in thread "ForkJoinPool-1-worker-0" Exception in thread "ForkJoinPool-1-worker-3" java.lang.OutOfMemoryError: Java heap space

whereas

./java -XX:-UseCompressedOops -XX:+UseSerialGC -Xmx2200k FJExceptionTableLeak

passes.

./java -XX:+PrintFlagsFinal -Xmx2200k -version | grep MaxHeapSize
    size_t MaxHeapSize := 4194304 {product}

The jtreg run that failed had options: -Xcomp -server -XX:MaxRAMFraction=8 -XX:+CreateCoredumpOnCrash -ea -esa -XX:-TieredCompilation -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -XX:+IgnoreUnrecognizedVMOptions -XX:+AggressiveOpts -XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedOops -Xmx2200k

Martin Buchholz: "If the minimum alignment for a 4k page size platform is 2MB then we should always have at least 4MB in the heap, no?"

Yes, if you ask for a size between 2M and 4M, you will always get 4M.
11-12-2015

I have yet again tried and failed to get FJExceptionTableLeak to fail. Jamsheed, why don't you provide an actual failing jtreg command? If the minimum alignment for a 4k page size platform is 2MB, then we should always have at least 4MB in the heap, no?
11-12-2015

OK, this is not a bug; it is the minimum upward alignment of the heap for a 4k page size platform. This machine should reproduce the issue. Can you please try -Xmx2200k -XX:-UseCompressedOops with both Serial GC and G1 GC?
11-12-2015

I can see my jdk9 rounds up to nearest 2MB whether or not I ask for large pages - is that a bug? How do I get 4k pages?

$ (for k in 1200 2200; do for b in - +; do a=(java -XX:${b}UseLargePages -Xmx${k}k -XX:+PrintFlagsFinal); echo "${a[@]}"; "${a[@]}" |& grep MaxHeapSize; done; done)
java -XX:-UseLargePages -Xmx1200k -XX:+PrintFlagsFinal
    size_t MaxHeapSize := 2097152 {product}
java -XX:+UseLargePages -Xmx1200k -XX:+PrintFlagsFinal
    size_t MaxHeapSize := 2097152 {product}
java -XX:-UseLargePages -Xmx2200k -XX:+PrintFlagsFinal
    size_t MaxHeapSize := 4194304 {product}
java -XX:+UseLargePages -Xmx2200k -XX:+PrintFlagsFinal
    size_t MaxHeapSize := 4194304 {product}
11-12-2015

You can check what MaxHeapSize is chosen on your platform using: java -Xmx2200k -XX:+PrintFlagsFinal 2>&1 | grep MaxHeapSize. If it is more than 4M, you will need to find another machine to reproduce this issue.
11-12-2015

Jamsheed, you keep providing instructions that are insufficiently clear. I have no idea how to "make sure that internal GC alignment dont allocate more than 4M"! It's also not clear how to query or change the "page size of platform" on linux-x64. Hotspot should try hard not to use 4M page sizes when the user asked for a heap size smaller than that!
11-12-2015

Set -Xmx2200k, and also make sure that internal heap alignment doesn't allocate more than 4M (it depends on the page size of the platform you run on).
11-12-2015

Please provide platform and exact command you used to reproduce this failure. Please compare and contrast with the following, where I succeeded with -XX:-UseCompressedOops:

.../4.1-b12/bin/jtreg -agentvm -verbose:nopass,fail,error -vmoption:-enablesystemassertions -automatic -ignore:quiet -compilejdk:/home/martin/ws/jdk9/build/linux-x86_64-normal-server-release/images/jdk -testjdk:/home/martin/ws/jdk9/build/linux-x86_64-normal-server-release/images/jdk -XX:-UseCompressedOops ./java/util/concurrent/forkjoin/FJExceptionTableLeak.java

Test results: passed: 1
11-12-2015

Yes, there was one difference: UseCompressedOops was false (the issue is seen only on 64-bit platforms). But Serial GC was using less than 4M with this option false.
11-12-2015

I agree this is not a duplicate. Our tests should run with "sane" VM flags, and G1 is the default, hence sane by definition. BUT I don't see any failures running with the default flags, and I checked that G1 is indeed turned on by default. Are there other environmental differences? The question of what is the minimum Xmx setting that a test is allowed to use should be answered jdk-wide - it should support all the standard GCs. And standard GCs should strive hard not to allow their minimum Xmx requirements to grow.
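(For reference, one way to confirm which collector a build enables by default; PrintFlagsFinal is a standard HotSpot flag, and a plain "=" rather than ":=" in its output marks a default value:)

java -XX:+PrintFlagsFinal -version | grep UseG1GC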
11-12-2015

I don't consider this a duplicate of JDK-814436, as that is not a test issue. The current problem is a test issue insofar as the test should either work with G1 or exclude its use.
11-12-2015