JDK-8001590 : NPG: Footprint increase when running Jetty
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs25
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2012-10-25
  • Updated: 2015-12-24
  • Resolved: 2013-03-15
  • Fix Version: hs25 (Fixed)
Description
When doing VmRSS footprint measurements for Jetty runs, the footprint increased by about 10% between b57 and b58. NPG was introduced in b58.

(refworkload output; the columns are presumably: samples, mean RSS in KB, stddev, and for b58 the %diff vs b57, p-value, and significance)

b57:
  footprint3_real          10    15459.14    452.20
    jetty                  10    25644.40   1444.98             0.50

b58:
   footprint3_real          10    16383.54      4.43   -5.98 0.000          Yes
     jetty                  10    28253.60     14.26  -10.17 0.000          Yes
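The numbers above are VmRSS samples. As an illustrative sketch (not part of refworkload; the class and method names here are invented), this is roughly how a Linux process's resident set size can be sampled from /proc:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: read a process's resident set size (VmRSS, in kB)
// from /proc/self/status, the metric the footprint numbers above are based on.
public class RssSample {
    // Extract the kB value from a "VmRSS:    12345 kB" line; -1 if absent.
    static long parseVmRssKb(Iterable<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                return Long.parseLong(line.replaceAll("[^0-9]", ""));
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path status = Path.of("/proc/self/status"); // Linux-only
        if (Files.exists(status)) {
            System.out.println("VmRSS = " + parseVmRssKb(Files.readAllLines(status)) + " kB");
        }
    }
}
```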
Comments
My measurements (with doit.sh from above), specifically j2se.options="-client\ -Xmx128m\ -XX:+UseSerialGC". I cannot explain why b81 shows a big improvement, even though the fixes in JDK-8008290 are supposed to have been committed after 03/14/2013. This is contrary to what I saw with the linux-amd64 build.

sh doit.sh 2>&1 | tee test.client-i586.log
tclsh merge.tcl < test.client-i586.log

     Samples  size      stddev  date
b55  6        46640.00   65.87  09/06/2012
b56  6        47407.33   92.38  09/13/2012
b57  6        47341.33  313.18  09/20/2012
b58  6        50073.33  110.23  09/27/2012
b59  6        50044.67   59.73  10/04/2012
b60  6        50112.00  152.76  10/11/2012
b61  6        50000.00   79.36  10/18/2012
b62  6        49970.67   74.00  10/25/2012
b63  6        50100.00   89.30  11/01/2012
b64  6        50123.33   56.31  11/08/2012
b65  6        50092.00   25.17  11/15/2012
b66  6        50065.33   93.11  11/29/2012
b67  6        49868.67  116.25  12/06/2012
b68  6        50070.00  100.07  12/13/2012
b69  6        50242.00   73.88  12/20/2012
b70  6        50285.33   83.82  12/27/2012
b71  6        50322.00   98.07  01/03/2013
b72  6        50370.67   38.75  01/10/2013
b73  6        50439.33   71.57  01/16/2013
b74  6        50554.00   61.49  01/24/2013
b75  6        50773.33   72.60  01/31/2013
b76  6        50703.33  104.53  02/07/2013
b77  6        50641.33   71.67  02/14/2013
b78  6        61908.67   93.14  02/21/2013
b79  6        61853.33  118.41  02/28/2013
b80  6        61992.00   84.82  03/07/2013
b81  6        48108.00  237.33  03/14/2013  <<< ???
b82  6        51452.00  213.77  03/21/2013

===========================merge.tcl==========================
puts "     Samples size stddev date"
set data [read stdin]
regsub -all "VERSION: " $data \uffff data
foreach n [split $data \uffff] {
    if {[regexp {^b[0-9][0-9]} $n version]} {
        set file /java/re/jdk/8/promoted/all/$version/binaries/linux-i586/bin/java
        set date [file mtime $file]
        set date [clock format $date -format %D]
        regsub {^(b[0-9][0-9]).*footprint3_real} $n "\\1" n
        regsub "\n.*" $n "" n
        puts "$n $date"
    }
}
==============================================================
25-03-2013

I am re-running with i586/client and simpler VM parameters now. Here's my script:
==================================
for i in {55..82}; do
  echo j2se.home=/java/re/jdk/8/promoted/all/b${i}/binaries/linux-i586/ > test.prop
  cat >> test.prop <<EOF
j2se.options="-client\ -Xmx128m\ -XX:+UseSerialGC"
benchmarks=footprint3_real
applications=jetty
resultsdir=results.test
iterations=6
component=j2se
footprint3.applications=jetty
footprint3.debug
footprint3.mode=elapsed
jetty.home=/home/iklam/jdk/tools/refworkload/jetty/jetty-distribution-8.0.0.v20110901
EOF
  echo VERSION: b${i}
  rm -rf results.test && sh run test.prop
done
==================================
25-03-2013

We actually knew about the b78 increase before it happened. It was first introduced and discovered in the profiles repo, which was merged into b78. It is covered by JDK-8008290, which was fixed in b82. However, based on your results it looks like there may be other increases that came over as part of the profiles repo merge, although potentially some other change that went into b78 could also have caused this remaining part of the increase.
25-03-2013

Note: my numbers above contain both JDK and HotSpot changes, so some regression may have been introduced at the JDK level. But it would be good to look at the big difference between b77 and b78 to see what caused it. Also, I enabled NMT tracking (-XX:NativeMemoryTracking=detail) in the above runs. I will remove this option and re-run the tests (takes a couple of hours).
25-03-2013

Thanks. This is very useful data. Embedded will have a look at why the fixes in b82 did not get us back down to pre-b78 numbers. It looks like about 5700k of the b57/b82 difference is due to this, which still leaves about a 3000k increase from b57 unaccounted for. Are there still more NPG footprint fixes in the pipeline, or are they all in place now?
25-03-2013

Out of curiosity, I have tested jetty with the promoted builds from the past 6 months -- from /java/re, linux-amd64. b57 is, according to Stefan, the last version that did not have NPG. b78 has a big regression; it is fixed somewhat in b82, but we are still much worse than b57 or b58.
==========================================================================
     Samples  size       stddev  date
b55  6         96644.67   27.18  09/06/2012
b56  6         97334.00   26.50  09/13/2012
b57  6         97326.67   73.13  09/20/2012  << Last w/o NPG
b58  6        103212.00   70.70  09/27/2012
b59  6        103220.67   46.61  10/03/2012
b60  6        103187.33  102.68  10/11/2012
b61  6        100036.00   79.44  10/18/2012
b62  6        100020.67   78.77  10/25/2012
b63  6        100323.33  135.85  11/01/2012
b64  6        100297.33   76.72  11/08/2012
b65  6        101616.67   70.63  11/15/2012
b66  6        101445.33   71.63  11/29/2012
b67  6        101135.33   80.26  12/06/2012
b68  6        101596.00   89.66  12/13/2012
b69  6        101644.00   72.62  12/20/2012
b70  6        101716.00  106.19  12/27/2012
b71  6        101852.00  119.71  01/03/2013
b72  6        101844.67  144.12  01/10/2013
b73  6        102121.33   83.20  01/16/2013
b74  6        102473.33   61.43  01/24/2013
b75  6        101633.33   61.54  01/31/2013
b76  6        101760.00   65.73  02/07/2013
b77  6        101188.00   45.75  02/14/2013
b78  6        117358.67  154.79  02/21/2013  << big regression
b79  6        117236.67   58.97  02/28/2013
b80  6        117454.00  198.63  03/07/2013
b81  6        117308.67   30.61  03/14/2013
b82  6        106822.00  280.24  03/21/2013
==========================================================================
25-03-2013

The version of Jetty mentioned in Stefan's comment @ 2013-01-14 03:35 can be found here: http://archive.eclipse.org/jetty/8.0.0.v20110901/dist/
23-03-2013

These are the current numbers I get when I run the jetty benchmark on 32-bit client HotSpot VMs. JVM flags in all runs: -client -Xms128m

  b57    | b81    | Extra JVM flags
  47004  | 47568  | -XX:-StackTraceInThrowable
  46796  | 46516  | (none)
  40204  | 40140  | -Xint -XX:-StackTraceInThrowable
  39976  | 40404  | -Xint
  46968  | 47596  | -XX:-TieredCompilation -XX:-StackTraceInThrowable
  46804  | 47016  | -XX:-TieredCompilation
  47040  | 47332  | -XX:+TieredCompilation -XX:-StackTraceInThrowable
  47072  | 47292  | -XX:+TieredCompilation

The runs have been done with the same JDK build (JDK8-b57), but with either the b57 HotSpot or the b81 HotSpot. The latest JDK shows a regression by itself; by swapping only the HotSpot into the b57 JDK, those JDK-level regressions are kept out of the comparison.
15-03-2013

I can reproduce this when running refworkload locally, with a refworkload.props properties file containing:
---
j2se.java=/localhome/java/changed.jdk-8-ea-bin-b57/bin/java
j2se.home=/localhome/java/changed.jdk-8-ea-bin-b57
j2se.options="<JVM flags>"
benchmarks=footprint3_real
applications=jetty
resultsdir=results
iterations=1
component=j2se
footprint3.applications=jetty
footprint3.debug
footprint3.mode=elapsed
jetty.home=/localhome/tests/jetty-distribution-8.0.0.v20110901
---
To run: sh run refworkload.props
I had to download jetty-distribution-8.0.0.v20110901 manually.
15-03-2013

A fix for JDK-8008549 is currently out for review.
26-02-2013

The following was done in order to track down the bug:

1. Traced calls to malloc and free using DTrace on Solaris. Found four frames that allocated more (or did not release as much) data compared to b57:
   - inflateInit2: 1 231 068 bytes
   - ConstantPool* ConstantPool::allocate(ClassLoader*, int, Thread*): 192 480 bytes
   - updatewindow: 163 840 bytes
   - void* GenericGrowableArray::raw_allocated(int): 38 796 bytes

2. We identified two of the leaks pretty quickly:
   - ConstantPool::allocate: each ConstantPool now has an additional Monitor
   - GenericGrowableArray::raw_allocated: each Java thread now has an additional GrowableArray of 300 pointers

3. The stack traces for the leaks originating from inflateInit2 and updatewindow show that they both originated from the same code, java.util.zip.Inflater:
     inflateInit2: malloc <- inflateInit2_ <- Java_java_util_zip_Inflater_init <- ... (Java frames)
     updatewindow: malloc <- updatewindow <- inflate <- Java_java_util_zip_Inflater_inflateBytes <- ... (Java frames)

4. We identified that the reason for the memory regression was that the finalize method of java.util.zip.Inflater was not being run in hs25-b17 during young GCs. We could verify this by doing a full GC via "jcmd GC.run", which ran the finalizers, and the regression went away.

5. The finalize methods on java.util.zip.Inflater weren't executed because hs25-b17 marked many more of the Inflater objects as live during a young GC.

6. After instrumenting the GC code to output data so that we could do a reverse pointer analysis (see which objects on the heap referred to each other), we could see that many more java.util.zip.Inflater objects were being kept alive in hs25-b17 because they were referenced by class loaders.

7. The class loaders were being kept alive because ClassLoaderData had been created for them, and a class loader with ClassLoaderData will not die during a young GC.

8. The ClassLoaderData was created because a call to SystemDictionary::find(...) called SystemDictionary::register_loader(...), which unconditionally created ClassLoaderData for the class loader.

Stefan has created (and verified) a fix for the problem in step 8, but we still have to discuss it some more to make sure that the fix will not introduce any unexpected behaviour.
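For illustration (a hedged sketch, not part of the fix): each java.util.zip.Inflater owns native zlib state, historically released only when finalize() called end(). Calling end() explicitly frees that native memory deterministically, without depending on the finalization that step 4 shows was not happening during young GCs:

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: Inflater owns native zlib buffers (allocated by inflateInit2 and
// updatewindow). end() releases them deterministically, instead of waiting
// for finalize() to run after a GC.
public class InflaterLifetime {
    static String roundTrip(String text) throws DataFormatException {
        byte[] input = text.getBytes();
        byte[] compressed = new byte[input.length + 64];

        Deflater def = new Deflater();
        def.setInput(input);
        def.finish();
        int clen = def.deflate(compressed);
        def.end(); // free native deflate state eagerly

        Inflater inf = new Inflater();
        inf.setInput(compressed, 0, clen);
        byte[] out = new byte[input.length];
        int olen = inf.inflate(out);
        inf.end(); // free native inflate state eagerly, no finalization needed
        return new String(out, 0, olen);
    }

    public static void main(String[] args) throws DataFormatException {
        System.out.println(roundTrip("hello hello hello"));
    }
}
```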
19-02-2013

There is a comment above dated 2013-01-15 14:57 that talks about an 8-10mb footprint increase between 7u and 8. All but 2mb of this increase is due to a bug that currently is only in the profiles repo (although will probably end up in jdk8 within a couple of weeks). It is caused by JDK-8008290, and has also been reported and described in JDK-8008213. If in a couple of weeks you start seeing this problem in jdk8 builds, you can avoid it by using an older rt.jar.
15-02-2013

The test should (for now) be run with the following flags (embedded has asked to test with -Xmx128m and -client): -Xint -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xmx128m -XX:+UseSerialGC -XX:-StackTraceInThrowable -client
07-02-2013

There is one bug you have to fix in refworkload to be able to run jetty: 1. Open the file "applications/jetty/env" 2. On line 29, add a " as the last character of the line (the string is not closed)
07-02-2013

I've been testing: I've run the test with http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot @ 3990, which fixes the annotation increase and removes one word per class, with this command line:

-Xint -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xmn128m -Xmx256m -Xms256m -XX:-UseTLAB -XX:+UseSerialGC -XX:-StackTraceInThrowable

I'm turning off the collection of backtraces since we know they require more memory post-NPG; doing that helps us detect other differences. If I put this HotSpot into the pre-NPG build JDK8-b57, the memory difference is less than 1%. However, running with a later JDK8 build, the difference is about 2.5%, so that regression doesn't seem to be related to NPG.

So, without accounting for the extra memory usage introduced by the backtraces, I don't see any substantial memory regression anymore. Note that we don't have any concrete plans to fix the backtrace extra memory usage. If you want me to investigate this any more, you'll have to provide concrete log files with memory usage numbers, JVM args, the machine used, and preferably GC logs. Thanks.
18-01-2013

Well, I get 23000 constMethods when running Jetty, which would give 184000 bytes extra. These are my latest numbers:

JDK8-b57 with pre-NPG hotspot:   89924 KB
JDK8-b57 with latest hotspot-gc: 90172 KB

JVM args: -Xint -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xmn128m -Xmx256m -Xms256m -XX:-UseTLAB -XX:+UseSerialGC -XX:-StackTraceInThrowable

So, it does explain some of the difference I'm seeing.
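The arithmetic above can be checked directly; a minimal sketch (names invented here) assuming one extra 8-byte word per ConstMethod on a 64-bit VM:

```java
// Sketch of the arithmetic above: one extra (assumed 8-byte, 64-bit VM) word
// per ConstMethod, times the ~23000 ConstMethods observed in the Jetty run.
public class ConstMethodOverhead {
    static long extraBytes(long constMethodCount, long wordBytes) {
        return constMethodCount * wordBytes;
    }

    public static void main(String[] args) {
        System.out.println(extraBytes(23000, 8) + " bytes extra"); // 184000 bytes extra
    }
}
```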
18-01-2013

Never mind: the extra word isn't in Array<> objects, just in ConstMethod objects.
18-01-2013

I just noticed that the types inherited from MetaspaceObj have a null first word for no good reason. There's a comment in allocation.hpp that this can happen with ValueObj, but frankly it's 2013 and I didn't believe it. All the Array<> types inherit from MetaspaceObj, so this could be a big waste of space. MetaspaceObj is supposed to encapsulate the metaspace new and delete operators, but I think these are best moved to the subtypes that inherit from MetaspaceObj. Maybe a macro like VALUE_OBJ_SPEC would be good. Types like ConstMethod cannot inherit from Metadata because they cannot have vtables: vtables are patched in CDS, and ConstMethod is in the read-only section. So are the Array<> types. Stupid gcc compiler.
18-01-2013

Hi Stefan, You said "I've been comparing b57 (pre NPG) vs b71, and I don't see a 10% regression anymore. However, I see a Rss difference of about 2MB, with my flags." Since you are seeing the footprint increase, I wasn't sure what the 10% reference was to, so I assumed it was performance. Yes, we can use a different bug to address any increases that may have occurred since NPG. I just thought it important to note that what Roman is currently seeing in his measurements with a current hotspot is much bigger than what was measured by Gary at the time this bug was filed. You might see the same when you are investigating. Roman and Gary are going to look into this a bit more to get a better idea of when the extra increase was introduced, and make sure the way Jetty was run by Gary for the filing of this bug is the same as the way it is being run by Roman for embedded testing. I'll send links via email. I don't think internal wiki links belong in CRs.
16-01-2013

Chris, I didn't mention anything about a _performance_ regression. Could you please add links with results showing the memory increases. Note that the footprint tool is extremely sensitive to when the measurement is taken. If you shift the point at when a GC occurs the numbers will be completely different. I'll run with the latest build to see if I can reproduce what you are seeing. All, Can we please only discuss the memory regressions introduced by the permgen removal, and create other CRs for other changes?
16-01-2013

The same behavior holds for the embedded builds (compared JDK7u6-b16 client vs JRE8-b36 client, and JDK7u6-b16 client vs JRE8-Compact1 minimal).
16-01-2013

After looking at the latest SQE measurements for Jetty, comparing 7u to the latest 8, I see that (on all platforms) the 7u footprint comes in close to the 25mb figure given in the description of this bug for the pre-NPG measurement done with b57. The current HS measurements are all 8-10mb bigger than this 25mb value, depending on the platform. It used to be 3mb bigger immediately post-NPG (b58). So this points to additional recent changes adding to our footprint problems.
15-01-2013

I'm not sure what you meant by "I don't see a 10% regression anymore". 10% was the footprint increase, not a performance regression. The increase was about 3mb, and it seems you are accounting for most of that. However... our latest measurements are showing an 8-10mb increase over 7u (I believe 7u6). I'm not sure if we've compared to 7u before. I know I haven't compared the latest HS to b58. This could be due to baggage added before NPG, or possibly even more footprint increase added recently.
15-01-2013

Part of this regression comes from the identified issue in JDK-8005994.
14-01-2013

I've been able to get stable results by running with JVM args:

j2se.options="-Xint\ -XX:+PrintGCDetails\ -XX:+PrintHeapAtGC\ -Xmn128m\ -Xmx256m\ -Xms256m\ -XX:NativeMemoryTracking=detail\ -XX:-UseTLAB\ -XX:+UseSerialGC"

I've been comparing b57 (pre NPG) vs b71, and I don't see a 10% regression anymore. However, I see an RSS difference of about 2MB with my flags. The most prominent difference between these runs is the amount of objects allocated in the young gen. There are a couple of reasons for that:

1) 1.5M - The Throwable backtraces are bigger post NPG, because we don't compress the Method pointers and we've added the mirrors arrays. This could be found by identifying allocated long arrays ([J) of size 34 words that weren't present in the pre-NPG runs. See the NPG code:
     typeArrayOop methods = oopFactory::new_metaDataArray(trace_chunk_size, CHECK); // trace_chunk_size = 32, [J header size: 2
2) 64K - What we previously called the System Object Arrays ([[I) and allocated in the permgen are now allocated as ordinary Java object arrays in the young gen.
3) 32K - java.lang.Class objects are now 2 words bigger, because we don't compress the three metadata fields anymore. We have around 2000 classes during the run.
4) 32K - The InstanceKlass::_init_lock allocates 2 extra words per loaded class.

Another significant memory regression comes from a bug that Ioi Lam recently found: we unnecessarily allocate arrays of AnnotationArrays* when the class doesn't have annotations. Fixing that reduced the memory overhead by around 0.5M.
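The per-chunk cost in item 1 can be sketched numerically (an illustrative calculation, assuming a 64-bit VM with uncompressed 8-byte words; the class and method names are invented here):

```java
// Sketch of item 1 above: a backtrace chunk is a long[] of
// trace_chunk_size = 32 elements plus the quoted 2-word [J header,
// i.e. 34 words; at 8 bytes per word that is 272 bytes per chunk.
public class BacktraceChunkMath {
    static int chunkBytes(int traceChunkSize, int headerWords, int wordBytes) {
        return (traceChunkSize + headerWords) * wordBytes;
    }

    public static void main(String[] args) {
        System.out.println(chunkBytes(32, 2, 8) + " bytes per backtrace chunk");
    }
}
```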
14-01-2013

Mail thread on this CR
09-01-2013

I believe Jon M is tackling footprint tuning for NPG.
06-11-2012