JDK-8059066 : CardTableModRefBS might commit the same page twice
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 8u40,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_11
  • CPU: sparc
  • Submitted: 2014-09-24
  • Updated: 2015-09-27
  • Resolved: 2014-12-08
Fixed in: JDK 9 (9 b44)
Description
#section:main
----------messages:(3/315)----------
command: main -Xmx8m -XX:-CMSYield -XX:-CMSPrecleanRefLists1 -XX:CMSInitiatingOccupancyFraction=0 BubbleUpRef 16000 50 10000
reason: User specified action: run main/othervm -Xmx8m -XX:-CMSYield -XX:-CMSPrecleanRefLists1 -XX:CMSInitiatingOccupancyFraction=0 BubbleUpRef 16000 50 10000 
elapsed time (seconds): 3.988
----------System.out:(20/1031)----------
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/virtualMemoryTracker.cpp:87
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/opt/jprt/T/P1/125225.staffan/s/hotspot/src/share/vm/services/virtualMemoryTracker.cpp:87), pid=2361, tid=2
#  assert(rgn->contain_region(addr, size)) failed: Must cover this region
#
# JRE version:  (9.0) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.9.0-internal-fastdebug-201409231252.staffan.jdk9-hs-rt-b00 mixed mode solaris-sparc compressed oops)
# Core dump written. Default location: /export/local/aurora/sandbox/results/workDir/closed/gc/4950157/BubbleUpRef/core or core.2361
#
# An error report file with more information is saved as:
# /export/local/aurora/sandbox/results/workDir/closed/gc/4950157/BubbleUpRef/hs_err_pid2361.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
Current thread is 2
Dumping core ...

Comments
I have to check but I think NMT only asserts if the regions overlap but don't have the same bounds. I think we still want to assert for this because it seems like an error.
15-12-2014

Coleen: Since this is a GC bug (the problem is in the card table) the review request was sent to hotspot-gc-dev. Yes, the fix fixed the assertion for this particular case (the card table committing the same page twice). However, the assert about pages being committed twice is still present in the NMT code. Regarding the impact, the failure reported in this case *only* affects tests because the assert is too strict (for every OS we support, committing the same page twice is a no-op). Even though the assert is too strict, I still believe it is valuable because if we are committing the same page twice, then we most likely don't know what we are doing :) Of course, if we add code that utilizes the fact that committing the same page twice is a no-op, then we need to remove this assert.
15-12-2014

ILW = HLM => P3
I: High, asserts.
L: Low, the user must run with a *very* small Java heap (4 or 8 MB) and use NMT. NMT will only assert in a fastdebug build; I don't know what NMT would report in a product build.
W: Medium, disable NMT or increase the heap to 12 MB (8 MB for all non-SPARC CPUs).
12-12-2014

I didn't see the code review. Did the push fix the NMT assertion? I assumed it would. I think the impact is not low and this is not a P5. We need to be able to test the new NMT rewrite with confidence and without hitting this assert. The assert also fires when other things go wrong, so it seriously harms testing of this feature. Also, an assert (crash) should never be low impact in my opinion.
12-12-2014

Since this has been pushed, the ILW can be re-evaluated:
- Impact: Low, the failure reported by NMT is benign and will not have any impact on the production JVM. The only impact for this particular scenario is that NMT asserts (therefore nightly testing is affected).
- Likelihood: Low, the user must run with a *very* small Java heap (4 or 8 MB) and use NMT. NMT will only assert in a fastdebug build; I don't know what NMT would report in a product build.
- Workaround: Medium, disable NMT or increase the heap to 12 MB (8 MB for all non-SPARC CPUs).
ILW = LLM => P5
12-12-2014

It was -Xprintflags weirdness that made me think -XX:-UseLargePages didn't turn off. nvm.
10-10-2014

The bug happens whenever the heap gets set to 8 MB and the smallest page size on the system is larger than 4 kB. If you are running on a system where the JVM selects 2 MB as the large page size, you will end up with a heap of 8 MB when running with -XX:+UseLargePages. If the machine also has 8 kB as the minimum page size, you will run into the bug.

It does not matter whether you use large pages or not. The only thing that matters is whether the JVM decides to create an 8 MB heap and the system has a minimum page size larger than 4 kB. Compare this to the machine I was running on, where the JVM picked 4 MB large pages. If I used -XX:+UseLargePages on that particular machine, I would get a 16 MB heap and the issue would not occur.

Large pages should be turned off when using -XX:-UseLargePages, even on Solaris. Please file a bug (including steps to reproduce) if they are not.
08-10-2014

It's the opposite, right? Only happens with -XX:+UseLargePages (which is default on this solaris machine I used, and seemed to not turn off with -XX:-UseLargePages)
07-10-2014

The reason it only happens with -XX:-UseLargePages is because with -XX:+UseLargePages the VM will use 4MB pages for the heap and we end up with 16MB heap, even though we specified -Xmx8m.
07-10-2014

The issue is that the card table, for some reason, commits overlapping memory when -Xmx8m is specified (as Coleen stated):

  -bash-4.1$ bin/jdk9-b27-patch/bin/java -XX:NativeMemoryTracking=detail -XX:-UseLargePages -Xmx8m -version
  Committing 0xffffffff7da02000 with size 8192
  Committing 0xffffffff7da00000 with size 16384
  java version "1.9.0-internal-fastdebug"
  Java(TM) SE Runtime Environment (build 1.9.0-internal-fastdebug-201410071142.ehelin.hs-gc-b00)
  Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-internal-fastdebug-201410071142.ehelin.hs-gc-b00, mixed mode)

As can be seen above, the bug is also present in jdk9-b27, but the VM only fails when running with jdk9-b28:

  -bash-4.1$ bin/jdk9-b28/jdk1.9.0/fastdebug/bin/java -XX:NativeMemoryTracking=detail -XX:-UseLargePages -Xmx8m -version
  # To suppress the following error report, specify this argument
  # after -XX: or in .hotspotrc: SuppressErrorAt=/virtualMemoryTracker.cpp:87
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  # Internal Error (/repool/java_re/builds/workspace/9-2-build-solaris-sparcv9/jdk9/1220/hotspot/src/share/vm/services/virtualMemoryTracker.cpp:87), pid=27883, tid=2
  # assert(rgn->contain_region(addr, size)) failed: Must cover this region

This is most likely (still have to confirm) because NMT 2.0 was committed between b27 and b28, see JDK-8046598. NMT 2.0 simply discovers the bug, it does not cause it (neither does the previously linked change). This bug seems to have been around for quite some time...

The issue only happens with -Xmx8m; it works fine with -Xmx4m, -Xmx16m etc. The bug probably has to do with CardTableModRefBS doing some align_size_up with os::vm_page_size. The bug only happens with -XX:-UseLargePages, -Xmx8m and when running on a CPU where the smallest page size is 8k.
07-10-2014
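The two "Committing" lines in the comment above can be checked by hand. The following is only an illustrative sketch, not HotSpot code: the helper names are made up for this example, and the 8 KB page size is an assumption taken from the comment's statement that the smallest page size on the affected CPU is 8k.

```python
from collections import Counter

# Assumption: 8 KB smallest page size, as described for the affected SPARC machine.
PAGE_SIZE = 8 * 1024


def pages(addr, size, page_size=PAGE_SIZE):
    """Return the page-aligned start addresses touched by [addr, addr + size)."""
    first = addr - addr % page_size
    last_byte = addr + size - 1
    last = last_byte - last_byte % page_size
    return list(range(first, last + page_size, page_size))


def doubly_committed(commits, page_size=PAGE_SIZE):
    """Return the sorted start addresses of pages covered by more than one commit."""
    counts = Counter(
        page for addr, size in commits for page in pages(addr, size, page_size)
    )
    return sorted(page for page, n in counts.items() if n > 1)


# The two commits printed by the patched jdk9-b27 build in the comment above.
commits = [
    (0xFFFFFFFF7DA02000, 8192),
    (0xFFFFFFFF7DA00000, 16384),
]

dup = doubly_committed(commits)
print([hex(page) for page in dup])
```

Under these assumptions the second commit covers two 8 KB pages and the first commit covers the upper one of them again, so exactly one page is committed twice, which matches the title of this report.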

The linked change causes these assertions. I don't know this code or why. The NMT code tries to track the committed region within a reserved region, and there is already a committed region, at +0x2000, that overlaps with the one it is trying to commit:

  committed region 0xffffffff7d400000 0x4000
  already commit 0xffffffff7d402000 0x4000

I don't know if NMT should handle overlapping commits or if the GC code is in error for having these overlapping commits (only on Solaris with large pages). I'm going to reassign to GC.

As an aside, it seems that -XX:-UseLargePages on Solaris with -Xprintflags claims that large pages are still on.
01-10-2014
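The difference between "one region covers the other" and "the regions merely overlap" is what the failing assert hinges on. The sketch below illustrates that distinction with the two ranges quoted in the comment above. It is not NMT's implementation: the function names are hypothetical, and since the comment does not make clear which range was recorded first, both orders are checked.

```python
def contains(base, size, addr, length):
    """True if [addr, addr + length) lies entirely inside [base, base + size)."""
    return base <= addr and addr + length <= base + size


def overlaps(base, size, addr, length):
    """True if the two half-open ranges share at least one byte."""
    return addr < base + size and base < addr + length


def overlap_bytes(base, size, addr, length):
    """Number of bytes shared by the two ranges (0 if they are disjoint)."""
    return max(0, min(base + size, addr + length) - max(base, addr))


# The two ranges quoted in the comment above.
a = (0xFFFFFFFF7D400000, 0x4000)
b = (0xFFFFFFFF7D402000, 0x4000)

print(overlaps(*a, *b))            # the ranges share memory
print(contains(*a, *b))            # but a does not cover b
print(contains(*b, *a))            # and b does not cover a
print(hex(overlap_bytes(*a, *b)))  # size of the shared part
```

With these inputs the ranges overlap by 0x2000 bytes while neither covers the other, so a containment check of the kind named in the assert message ("Must cover this region") cannot hold in either order. Two commits with identical bounds would pass the same check.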

This may be an interaction of the code cache change and NMT. No, maybe not. Definitely not. It reproduces on solaris/sparc with: java -XX:NativeMemoryTracking=detail -Xmx8m -version
25-09-2014