JDK-8175116 : jtreg agent VMs use more virtual address space in langtools/test :tier1 runs
  • Type: Bug
  • Component: tools
  • Sub-Component: javac
  • Affected Version: 9,10
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2017-02-17
  • Updated: 2019-09-13
  • Resolved: 2017-03-31
Fixed in: JDK 10 (10), JDK 9 (9 b164)
Description
The changes made under JDK-8166002 emulated the old client build in the win 32 environment with reduced address space. Before that change was made, several very stable langtools regression tests had high frequency intermittent failures, requiring updates to the tests to pass reliably again (JDK-8168499, JDK-8163999, JDK-8165193).

After JDK-8166002 was integrated, the test updates were reverted (JDK-8173305) and the failures were seen again, requiring the test updates to be reapplied (JDK-8174059).

Therefore, at least in some cases, the previous virtual address space behavior of the client compiler is not fully replicated.

From analysis, it was concluded that this is not an emulated-client issue.
This behavior has been observed since 9 b113, when JDK-8152733 was integrated.
Comments
Fix approved
30-03-2017

Fix Request: Removing unnecessary code that is causing objects to linger in memory. The risk is low. Tests pass with this change. Proposed patch (by Claes Redestad, reviewed by Jan Lahoda): https://bugs.openjdk.java.net/secure/attachment/67384/8175116
29-03-2017

Proposed patch (by Claes Redestad) attached.
29-03-2017

[~redestad] Thanks for your explanation. I agree that the call of ensureReadable should be removed, but I still see it as something of a "canary in the coal mine" indicator that a seemingly innocuous call like ensureReadable can cause such big undocumented side effects. Now, we can separately discuss whether that is a problem in the immediate implementation of ensureReadable or a problem deeper down in the responsiveness of GC, but I still think there's an issue there.
21-03-2017

[~jlahoda] I agree with your general approach to look at the Big Picture before doing too many point fixes that work around the problem. I would add an item (d) to your list: d) investigate whether it is possible to proactively close the URLClassLoader early (i.e. as soon as possible) if it is determined that there are no processors to be found and run. In other words, from an AP point of view there are two distinct issues in this general thread: 1. opening processor class loaders "unnecessarily" when we know they will not be required ... this is a simple performance issue. 2. not closing processor class loaders in a timely fashion once they have been opened ... this leads to a GC problem.
21-03-2017

To see if the suggested changes fix the precise problem that was observed, annotation processing should be disabled for the set of tests (as has already been done for TreePosTest) and then the /othervm workaround in the test tags should be removed.
21-03-2017

I was looking at CheckAttributedTree, and the reason why it works is that it initializes the annotation processing machinery only once, i.e. creates the URLClassLoader only once, avoiding this problem. This is not quite right, but OK for most tests. I note that this is similar to adding -proc:none to this test, which was done. What I think should be done:
a) apply Claes' patch (as javac indeed seems to be doing something it does not need to do). But this does not solve the real problem of not releasing the resources in time, so in principle some other change could soon break this again.
b) add (or at least very seriously consider adding) JavacTask.close, to be called when the task is not needed anymore, and use it where appropriate.
c) convert tests to ComboTestHelper (or use "-proc:none") where appropriate (this is mostly orthogonal to the above, and to some degree independent of this bug).
The main question is when to do these - I don't think we can realistically do b) for 9, but I think we should try with a). One question, though: do the tests fail now? If not, then nothing really wrong will happen if a) is in some 9 update, right? Comments?
21-03-2017

With no weak references to it, then once an object becomes garbage it can be GCd more or less immediately; a WeakReference'd object however, once there's no longer any strong reference to it, will first have to be processed by the GC and put on a reference queue, which then needs to be processed (in java; for this specific instance see java.lang.reflect.WeakPairMap) before it can ultimately be GCd on the next cycle or STW pause - this means there's an added latency in how promptly each object is finalized, sometimes many times longer than for a corresponding strongly referenced object. The evidence suggests this was what happened here, and that it's enough to blow up the amount of native memory retained at any particular time in a test like TreePosTest by a significant factor.
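The extra hop through a reference queue can be demonstrated with a small standalone program (a minimal sketch, not the javac code path; the class name `WeakRefLatency` is invented for illustration):

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakRefLatency {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object payload = new Object();
        WeakReference<Object> ref = new WeakReference<>(payload, queue);

        payload = null; // drop the only strong reference
        // The GC must first clear the referent and enqueue the WeakReference;
        // only after the queue is drained (a separate, later step) can any
        // cleanup tied to the reference actually run. That extra hop is the
        // added latency described above.
        Reference<?> enqueued = null;
        for (int i = 0; i < 50 && enqueued == null; i++) {
            System.gc();
            enqueued = queue.remove(100); // wait up to 100 ms per attempt
        }
        System.out.println(enqueued == ref);
    }
}
```

A strongly unreachable object without a registered reference needs no such queue processing, which is why the weakly referenced unnamed modules (and the loaders and JarFiles they keep alive) are finalized later than their strongly referenced counterparts.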
20-03-2017

Can you explain why weak references have such an adverse effect on GC? In general, a call of ensureReadable seems very innocuous, and so it seems "wrong" that it should have such an impact. I agree that it is unnecessary in this situation, because we've given reflective code readability for free, but are we really going to tell people that they shouldn't use ensureReadable because it might clog up garbage collection?
20-03-2017

[~jjg], not sure which GC characteristics you're referring to, but the significant regression in GC characteristics I'm seeing is directly induced by the weak references added to Module::reflectivelyReads by ensureReadable: Remove ensureReadable and GC behavior normalizes to a pattern that's actually somewhat better than on 8 (since actual GC changes seem to improve how promptly we clear things up). There are of course other changes playing along here, e.g., some of the native memory JarFile allocates can't be free'd eagerly while the JarFile is still in use, but since a lot of the handling in this area has moved from native to java it's not clear if there's even been a net increase in retained native memory per JarFile compared to 8.
20-03-2017

@Maurizio, the problem is not so much closing the file manager as not creating unnecessary URLClassLoaders to see if anno processing is required. Or if they are created, not closing them.
20-03-2017

While getting rid of an arguably minor product regression is important, let's stop barking up BaseFileManager's ensureReadable for a second and not forget there is a significant regression in the GC characteristics, such that we are now having to visit code and work around the degraded GC performance. Yes, we can make our tests go faster and not generate so much garbage in the first place, but I regard this issue as symptomatic of what other users might experience in real-world use.
20-03-2017

While optimizing test resource usage to improve stability is important, let's stop barking up the TreePosTest for a second and not forget that at the root here there is an actual (but arguably minor) product regression that we should fix at the earliest convenience. Perhaps split this into subtasks - one to examine getting rid of ensureReadable (to be deferred to 10 at the langtools team's discretion), and another to go over relevant tests to ensure sufficient resource cleanup?
20-03-2017

Similar tests executing many invocations of the compiler now use the so-called 'combo tests' API. This API attempts to clean up after the test by closing the (shared) file manager. An example of a test that runs javac on every source, but using the combo test API, is: test/tools/javac/failover/CheckAttributedTree.java If that test has better memory characteristics, perhaps a short-term fix would be to convert TreePosTest to use the combo test API?
20-03-2017

As stated before, with Claes's javac fix, maximum virtual space usage for an agent VM is 1 GB (with -Xmx768M); without the fix, agent VMs reach the 2 GB limit easily.
17-03-2017

I agree with Jon that the main problem here is that javac invocations through JavacTask that don't call generate() won't clean up their resources (because the compilation can continue at any time until the task is GCed). I think we should make JavacTask Closeable and start to close the task. I don't think, however, that's something we could do for 9. So, while I think we should remove the ensureReadable, we might get even better results by using -proc:none in critical places (as that avoids creating the garbage altogether rather than simplifying GC of it).
17-03-2017

[~jjg] I believe there is more than one test issue in javac! Fixing TreePosTest alone doesn't fully solve the problem.
17-03-2017

Most uses of javac perform full compilations, and a full compilation will ensure resources are closed. This test is unusual for performing thousands of partial compilations, just parsing a source file in each case, in order to perform some validation on the AST. There are various derivative improvements that could be made here. It might be possible to automatically close the classloader earlier if it is not being used. It might be desirable to make the JavacTask Closeable and instruct folk to .close() it, possibly in a try-with-resources, to hide the specific details of what needs to be closed internally.
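A minimal sketch of what a well-behaved partial compilation could look like today: JavacTask itself has no close() (that is only a proposal above), but the shared file manager can already be closed deterministically in a try-with-resources, and -proc:none sidesteps processor-loader creation. The class name and temp-file setup are invented for this example:

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class ParseOnly {
    public static void main(String[] args) throws Exception {
        // A throwaway source file to parse (stand-in for a test input).
        Path src = Files.createTempFile("Hello", ".java");
        Files.write(src, "class Hello {}".getBytes());

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // Closing the file manager when done releases resources that a
        // parse-only (partial) compilation would otherwise leave to the GC.
        try (StandardJavaFileManager fm =
                 compiler.getStandardFileManager(null, null, null)) {
            Iterable<? extends JavaFileObject> units =
                fm.getJavaFileObjects(src.toFile());
            JavacTask task = (JavacTask) compiler.getTask(
                null, fm, null,
                List.of("-proc:none"), // skip annotation-processor discovery
                null, units);
            int parsed = 0;
            for (CompilationUnitTree unit : task.parse()) parsed++;
            System.out.println(parsed + " unit(s) parsed");
        }
    }
}
```

With a Closeable JavacTask, the try-with-resources would wrap the task itself rather than only the file manager.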
17-03-2017

It seems that disabling annotation processing for the JavacTasks indeed does get rid of all the URLClassLoaders - so that should avoid the test issues. (To diagnose, I'm simply adding a System.gc() at the end of the test and inspecting heap dumps generated with -XX:+HeapDumpBeforeFullGC.) I do still think it's important to examine and possibly remove ensureReadable (or at the very least document when and for whom it's useful), but it might very well be the case that this test is/was an extreme corner case and that for any real-world usage of javac it's not really an issue.
17-03-2017

JDK-8176900 has now been fixed. That should fix the gross leak of classloaders in TreePosTest. We should probably still fix the ensureReadable call if we believe it is no longer required, but I'd like to see if the TreePosTest fix fully solves the general issue first.
16-03-2017

I have a suspicion I know what the problem is, and how to fix it, in TreePosTest at least. I can do some ad hoc testing of my own, but what would folk recommend as the best way to verify the behavior has been fixed, i.e. how are you observing the problem now?
16-03-2017

I will investigate TreePosTest.
16-03-2017

I agree with Claes' comment. I mistakenly thought that this must be a jtreg bug too, until the extent of the scope of TreePosTest was fully understood. This bug should probably move back to tools/javac.
16-03-2017

For comparison purposes I ran TreePosTest on 8 and the same pattern exists there: lots of jtreg jar files are being opened, around 11k JarFile objects are still alive when I take a heap dump (it's possible that G1 is better at clearing these out since it can do it concurrently). So we can at least conclude that this is not new behavior. The earlier comments about jtreg itself creating too many class loaders can likely be ignored: this was still me guessing at what was happening from looking at a heap dump of what I thought was many jtreg tests run back to back, but when in reality it was mostly the result of running TreePosTest.
16-03-2017

OK, I apologize for coming somewhat late to this thread. While removing the now-unnecessary ensureReadable call may provide a workaround to the problem, I would like to understand more of root causes. I've seen comments made about jtreg creating too many class loaders, causing jtreg jar files to be opened too much. That sounds really strange, because the jtreg jar files should only be opened once for the agent, and then child loaders should be created for the test classes only, without causing javatest.jar and jtreg.jar to be reopened. If we are seeing jtreg jar files being opened thousands of times, that is a bug that should be filed and fixed. I've also seen comments about the javac test TreePosTest using too many resources. If it is opening resources that should be closed, that is a bug that should be filed and fixed.
16-03-2017

Jamsheed writes: "with patch maximum virtual space usage : 1 GB (with Xmx 768M) without patch agentvms reach 2GB limit easily." I think this verifies that the root cause for the increased native memory retention is due to the use of BaseFileManager::ensureReadable that was added with the initial jigsaw integration.
16-03-2017

Sure, I imagine the ideal solution would be to have a means to explicitly close out all resources associated with each compilation as they complete, but I'm not sure this is possible with the current API? However, if it's the case that the ensureReadable method in BaseFileManager is not really needed, it might be a sufficient fix to avoid this resource exhaustion issue seen here. To summarize: The root problem is the reliance in the test on GC to clear up the class loaders and related native resources (such as jar files) as we progress through the test, and it seems a perfect storm of things (jigsaw, JDK-8152733, ..) has led to these tests failing. Experimentally removing ensureReadable gives empirical evidence that the read edges added to Module::reflectivelyReads from the module of the file manager to each respective unnamed modules are keeping the latter alive somewhat longer than otherwise (they do get finalized and collected on a full GC, whereas if there's no weak reference they'll not even survive young/concurrent GC cycles).
15-03-2017

@Claes, it would be reasonable to review TreePosTest to ensure that it does not require monotonically increasing resources. Each iteration within the test should be self-contained, and should allow its resources to be reclaimed when completed. You are addressing the wrong problem by looking to adjust BaseFileManager. That's like licking the icing off the cake but still leaves the calorific insides.
15-03-2017

So it seems tools/javac/tree/TreePosTest.java is compiling every test in langtools/test - seems like a steadily increasing burden! As every compilation unit creates a URLClassLoader, opens every jar on the jtreg classpath as it runs and no-one is there to explicitly close things out as the compilation task is done, we accumulate native memory and lean on the GC to finalize the class loaders and close the JarFiles. Since the initial jigsaw implementation in 9+111, the URLClassLoader and its respective unnamed module are kept alive for a longer time due to BaseFileManager::getClassLoader reflectively adding read edges from BaseFileManager's module (jdk.compiler) to each unnamed module. I've checked with Alan, and we both believe this is unnecessary and should be removed. By experimentally removing the ensureReadable call around the URLClassLoader creation at line 194 in BaseFileManager.java we avoid adding any unnecessary(?) weak references to each unnamed module, which means the class loaders and the JarFiles they are holding open are GCd and finalized quicker (when I collect a heap dump using -XX:+HeapDumpBeforeFullGC the number of live JarFiles has dropped from 25k+ to 1800). I believe this should avoid this issue entirely, and might also improve performance of javac somewhat when compiling many small tasks in a single process.
15-03-2017

It might be worth giving this patch a try (only the changes in Inflater.c and zip_util.c): http://cr.openjdk.java.net/%7Eakasko/jdk9/8133206/webrev.00/ It is supposed to reduce the native memory usage for decompression, which might help the memory balance in this use scenario. This is for https://bugs.openjdk.java.net/browse/JDK-8133206
14-03-2017

[~jcm] thanks, that's helpful!
14-03-2017

[~redestad] running the single test tools/javac/tree/TreePosTest.java shows many (~8000+) JarFile instances created
14-03-2017

jtreg opens 4 jar files (jtreg.jar, javatest.jar, asmtools.jar, jh.jar) and in one heap dump the indication is that each of these have been opened more than 6000 times, indicating these instances are simply the result of running a large amount of tests in a single agent VM. Are you sure a similar pattern isn't seen in other test suites, and that this is simply a case of running too many tests in a single agent with no cleanup guarantees? There's some added heap overhead due to the fact that there are java.lang.reflect.Module::reflectivelyReads mappings from jdk.compiler to each unnamed module (one per URLClassLoader), but these are weak references which shouldn't affect lifecycles (much) and although there's no indication of a leak here it's not obvious what's adding these mappings. Another possibly related find is that JarFileSystem.isMultiReleaseJar is inefficiently implemented and possibly incorrect (only checks for existence of the Multi-Release attribute, ignoring its value). I think it should share checking logic with JarFile, which would bypass Manifest creation as a side-effect. The current impl means that any use of JarFileSystem eagerly creates (and throws away) java.util.jar.Manifest objects since 9-b100 (JDK-8144355), which might be cause for some overhead since there are explicit uses of JarFileSystem in certain cases (there are 12k java.util.jar.Manifest instances in a heap with 24K JarFiles).
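A sketch of what a value-respecting Multi-Release check might look like (illustrative only - `MultiReleaseCheck` and its helpers are invented names, not the JarFileSystem code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class MultiReleaseCheck {
    // The attribute's *value* must be "true"; mere presence of the
    // "Multi-Release" key in the manifest is not enough.
    static boolean isMultiRelease(JarFile jf) throws IOException {
        Manifest m = jf.getManifest();
        return m != null && Boolean.parseBoolean(
            m.getMainAttributes().getValue(Attributes.Name.MULTI_RELEASE));
    }

    // Build a throwaway jar whose manifest carries the given attribute value.
    static Path makeJar(String value) throws IOException {
        Manifest m = new Manifest();
        m.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        m.getMainAttributes().put(Attributes.Name.MULTI_RELEASE, value);
        Path p = Files.createTempFile("mr", ".jar");
        new JarOutputStream(Files.newOutputStream(p), m).close();
        return p;
    }

    public static void main(String[] args) throws IOException {
        try (JarFile yes = new JarFile(makeJar("true").toFile());
             JarFile no  = new JarFile(makeJar("false").toFile())) {
            System.out.println(isMultiRelease(yes) + " " + isMultiRelease(no));
        }
    }
}
```

Sharing the checking logic with JarFile, as suggested above, would additionally avoid materializing a Manifest object at all during the check.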
13-03-2017

Maybe langtools/test is creating many JarFile instances but not closing them as part of testing? Growth is seen mainly in javac tests.
13-03-2017

The method URLClassLoader.close was only added in JDK 7. I don't know offhand if jtreg has a direct or reflective call to that method as it would not have always been present in the range of releases jtreg has covered.
10-03-2017

From /scratch/HeapDumps-8175116/HighMem it can be seen that there are several thousand URLClassLoader instances that retain JarLoaders/JarFiles to several jtreg jar files. Since the move of ZIP CEN into java code ZipFile$Source (zsrc) can be seen to retain several hundreds of K's, depending on the actual jar from jtreg. It can also be seen that the JarFile has not been closed. It appears that jtreg is not closing the URLClassLoader instance, even though it is no longer reachable. Eagerly closing out the URLClassLoader will help GC.
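Eager closing can be illustrated with a small standalone sketch (invented names; this is not jtreg's loader code): opening the loader in a try-with-resources releases its JarFiles deterministically instead of leaning on GC and finalization.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class EagerClose {
    public static void main(String[] args) throws Exception {
        // Build a throwaway jar with one resource entry.
        Path jar = Files.createTempFile("demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("hello.txt"));
            out.write("hi".getBytes());
        }
        URL[] urls = { jar.toUri().toURL() };
        // try-with-resources closes the loader and the JarFile it holds open,
        // rather than waiting for the loader to be GCd and finalized.
        try (URLClassLoader loader = new URLClassLoader(urls)) {
            System.out.println(loader.findResource("hello.txt") != null);
        }
        Files.delete(jar); // on Windows this only succeeds because the jar is closed
    }
}
```

URLClassLoader.close() has been available since JDK 7, so the question for jtreg is whether it can call it (directly or reflectively) once a loader is no longer needed.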
10-03-2017

@Chris, the tests in question were changed back to use /othervm which is a workaround for the failures. In other words, the tests for the last few days are not configured to trigger the problematic condition.
09-03-2017

As of JDK 9 b158 (JDK-8173140), the zlib library has been upgraded to 1.2.11, so there are more recent changes that could potentially affect memory usage (some specific zlib changes can be seen at http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/b035f72cddfc#l34.7 ). It is probably worth doing a fresh analysis of memory usage.
09-03-2017

Does the "virtual address space" include the memory allocated at the C level via malloc? The j.u.z.ZipFile change (JDK-8145260) indeed has the side effect of a java heap usage "increase". In the old implementation the zip/jar file is handled and held at the native level: all those zip/jar tables and entries are held in native memory via (m)alloc/free, and j.u.z.ZipFile accesses those data/tables via JNI. The new implementation moved everything into Java, with the consequence of holding all those tables/entries in the java heap, which becomes "visible" and is counted as part of the java heap memory usage. I guess this might be the trigger of "uses more virtual address space ..." if the test case holds hundreds of thousands of zip/jar files open during testing/compiling. ZipFile$ZipFileInputStream::close() no longer needs to free any native memory and itself does not hold any real memory chunk, so there is no need for explicit "freeing". There is a reported "native memory usage" increase in the zip/inflater area triggered by the zlib 1.2.5->1.2.8 upgrade (JDK-8133206; there is a workaround, but not applied yet), but it got into jdk9 much earlier.
07-03-2017

[~redestad] Thank you; this is not an emulated-client-only issue, and I have updated the bug description now. [~psandoz] As I commented earlier, there were ~50k jtreg (asmtools.jar, jtreg.jar, jh..) java.util.jar.JarFile instances in the Java heap, and 26k+ zip!zcalloc allocations (each allocation with size ~32k), contributing most of the heap usage.
07-03-2017

Ok, sorry - I interpreted this as if the emulated settings had been applied to 9-b113 etc. JDK-8152733 addressed an issue that came into 9-b108, and while it sure looks like a smoking gun I think it only accidentally triggers this issue by changing the GC pressure and always reading the manifest JarEntry (which, since it is guarded by a try-with-resources, superficially seems fine). A much larger rewrite of JarFile/ZipFile was introduced in 9-b99 (JDK-8145260), which replaced the native zip implementation with a java-based counterpart, and tracing the steps it seems there's no equivalent of the explicit freeing that was previously done in ZipFile$ZipFileInputStream::close(), which means the memory touched by input streams will be held onto until the JarFile is GCd and then finalized. [~sherman], can you take a look at this?
07-03-2017

I too am skeptical that the cause is directly related to JDK-8152733, which is associated with efficient parsing of the jar manifest; it should not impact the processing of jar files beyond that, i.e. the memory impact should not be that extensive (although I do not know how many jars are loaded by the test). We need more specific data on Java object allocations and GC rates.
06-03-2017

JDK-8152733 resolved various ~10-15% startup and footprint regressions by avoiding creation of the jar Manifests to determine if a jar is a multi-release jar. It seems highly unlikely that this has any direct relation to the issues you see here, but my guess is this may interplay indirectly: the JarFile/JarEntry code before JDK-8152733 did *a lot* more java heap allocations on jar loads, which could have a marked impact on the rate of GC events, and the higher GC rate would mean that java objects holding on to native memory in one way or another would get cleared quickly enough (https://bugs.openjdk.java.net/browse/JDK-8026710 comes to mind).
03-03-2017

On 3/2/2017 9:59 PM, Vladimir Kozlov wrote:
> Thanks. It means the only thing left are changes in JDK then.
>
> Vladimir
>
> On 3/2/17 8:16 AM, Jamsheed C m wrote:
>> normally i download src too with test jdk (from
>> jre.us.oracle.com/java/re/jdk/9/promoted/ea/). but as many promotion
>> don't have any changes in langtool i reused 9b116 langtool/test for
>> these runs.
>>
>> so answer is all 9b111, 9b112, 9b113 jdks are tested with 9b116
>> langtool/test
>> Best Regards,
>> Jamsheed
>>
>>>
>>> I attached report mail which lists changes in b113.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 3/2/17 7:45 AM, Jamsheed C m wrote:
>>>> Vladimir, Thank you for the input!
>>>>
>>>> some more observations,
>>>>
>>>> 9b111, 9b112 - issue not observed
>>>>
>>>> 9b113 - issue is observed.
>>>>
>>>> There is no much changes went in 9b113, no hotspot changes in this build
>>>>
>>>> Best Regards,
>>>>
>>>> Jamsheed
03-03-2017

It was JDK-8152733 : Avoid creating Manifest when checking for Multi-Release attribute. I am moving this bug to core libs.
03-03-2017

This behavior is seen after changeset http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/faf6d930aef4; backing out this changeset shows normal memory usage. [~redestad] Claes, could you please have a look at this issue?
03-03-2017

No, I believe the main catch is that the problem is observed in the 9b125 client VM as well. The finalization queue is empty when the problem occurs, and a full GC hadn't happened for quite some time. When I trigger a full GC externally, all the native allocations get freed up. The problem was always there, it seems; it just started pressing the limit recently.
01-03-2017

Are there significant differences in the GC/finalization implementation in the old C1 win32 compiler configuration and the current approximate C1 configuration now being used?
01-03-2017

Can this be a jtreg issue? I see there are ~50k jtreg (asmtools, jtreg, jh..) java.util.jar.JarFile instances, and many are in closed state.
01-03-2017

We may hit OOM because a lot more jtreg tests are running in the same VM (not using @run main/othervm) and memory is not cleaned up as fast as before. Note, Jamsheed ran tests with the Client VM (from b125) and it still hit OOM - so it has nothing to do with the JIT or JVM.
28-02-2017

Heap stats when the process was using ~2 GB of virtual space:

************************************************************************************************************************
NT HEAP STATS BELOW
************************************************************************************************************************
NtGlobalFlag enables following debugging aids for new heaps: stack back traces
LFH Key                   : 0x01528fcc
Termination on corruption : ENABLED
 Heap     Flags    Reserv  Commit  Virt    Free   List   UCR   Virt  Lock  Fast
                   (k)     (k)     (k)     (k)    length blocks cont. heap
-----------------------------------------------------------------------------
03710000 08000002 1074012  912340 1073588  13439   1476   265     0   575  LFH
    Virtual address fragmentation  15 % (265 uncommited ranges)
00a70000 08001002     484      64      60     23      4     1     0     0  LFH
-----------------------------------------------------------------------------

21k+ zip!zcalloc allocations (each allocation with size ~32k), contributing most of the heap usage. Native call stack:
7143ed63: msvcr120!malloc+0x00000049
71a8986f: zip!zcalloc+0x00000010

I issued a full GC using jcmd to the process; this creates a fully fragmented heap with the stats below:

************************************************************************************************************************
NT HEAP STATS BELOW
************************************************************************************************************************
NtGlobalFlag enables following debugging aids for new heaps: stack back traces
LFH Key                   : 0x01528fcc
Termination on corruption : ENABLED
 Heap     Flags    Reserv  Commit  Virt    Free   List   UCR   Virt  Lock  Fast
                   (k)     (k)     (k)     (k)    length blocks cont. heap
-----------------------------------------------------------------------------
03710000 08000002 1074012  204436 1073588 141326   1259   505     0   576  LFH
    External fragmentation  69 % (1259 free blocks)
    Virtual address fragmentation  80 % (505 uncommited ranges)
00a70000 08001002     484      64      60     23      4     1     0     0  LFH
-----------------------------------------------------------------------------

Almost all zcalloc allocations get freed up:
7143ed63: msvcr120!malloc+0x00000049
71a8986f: zip!zcalloc+0x00000010
but there is no virtual space reclaim (in 64k allocation granularity terms).
25-02-2017

[~darcy] Joe Darcy, from the above stats it is clear that most of the heap is held by zip!zcalloc; even cleanup doesn't reclaim virtual space, due to high fragmentation. Should it be analyzed by some other team?
24-02-2017

The test in question ran extremely reliably for years on win32 and other platforms until the client VM was replaced on win32. The problem persists if the test is run on the modified replacement VM config. I don't think it is reasonable to just dismiss this situation as a test bug.
24-02-2017

I see zip!zcalloc using most of the memory; can this be closed as a test bug?
24-02-2017

If the VM alone were allocating memory in these test runs, this could be a memory leak in the VM. I had a look at the NMT logs: no leak in NMT-tracked memory! NMT doesn't explain all the allocation that happens in the agent VM in the failure scenario. Now checking who is allocating and holding so much memory.
22-02-2017

[~kvn] Vladimir,
> You said you reproduced this problem with 9b125 client build. Do you mean you tested it with Client VM?
Yes, I tested with Client VM 9b125 EA bits (log snippet in next comment).
> And do agentvms also run in Client emulation mode? Maybe they use flags which cancel emulation mode?
Yes, it runs in client emulation mode (verified from logs).
22-02-2017

[~jcm] Jamsheed, please clarify. You said you reproduced this problem with the 9b125 client build. Do you mean you tested it with the Client VM? And do the agentvms also run in Client emulation mode? Maybe they use flags which cancel emulation mode?
21-02-2017

Changing the title, as this is not an emulated-client-only issue.
21-02-2017

I could reproduce this issue with the 9b125 client build. I closely monitored memory usage of the agent VMs, and 1-2 agent VMs go up to 2 GB in most runs! This seems to be a test config issue rather than a "more virtual address space usage" issue.
20-02-2017

I'd recommend we give this issue the same priority as that of the original emulated client behavior change (JDK-8166002): Initial ILW = Crash due to OOM in compiler, rare/win32 only, no workaround = HLH = P2
17-02-2017

Thank you, Joe. We will look at the data from JDK-8174059.
17-02-2017

Joe, can you give us an example we can work with?
17-02-2017