JDK-8170812 : Metaspace corruption caused by incorrect memory size for MethodCounters
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: linux
  • CPU: x86
  • Submitted: 2016-12-06
  • Updated: 2017-04-13
  • Resolved: 2017-04-07
Fix versions:
  • JDK 10: 10 (Fixed)
  • JDK 9: 9 b165 (Fixed)
Description
/home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/jdk/bin/jlink -J-XX:+UseSerialGC -J-Xms32M -J-Xmx512M -J-XX:TieredStopAtLevel=1 --module-path /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jmods --endian little --release-info /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/jdk/release --order-resources=**module-info.class,/java.base/java/**,/java.base/jdk/**,/java.base/sun/**,/java.base/com/**,/jdk.localedata/**   --add-modules java.activation,java.annotations.common,java.base,java.compact1,java.compact2,java.compact3,java.compiler,java.corba,java.datatransfer,java.desktop,java.httpclient,java.instrument,java.logging,java.management,java.naming,java.prefs,java.rmi,java.scripting,java.se,java.se.ee,java.security.jgss,java.security.sasl,java.smartcardio,java.sql,java.sql.rowset,java.transaction,java.xml,java.xml.bind,java.xml.crypto,java.xml.ws,jdk.accessibility,jdk.attach,jdk.charsets,jdk.compiler,jdk.crypto.ec,jdk.crypto.pkcs11,jdk.desktop,jdk.dynalink,jdk.editpad,jdk.httpserver,jdk.internal.ed,jdk.internal.le,jdk.internal.opt,jdk.jartool,jdk.javadoc,jdk.jcmd,jdk.jconsole,jdk.jdeps,jdk.jdi,jdk.jdwp.agent,jdk.jlink,jdk.jshell,jdk.jsobject,jdk.jstatd,jdk.jvmstat,jdk.localedata,jdk.management,jdk.naming.dns,jdk.naming.rmi,jdk.net,jdk.pack200,jdk.policytool,jdk.rmic,jdk.scripting.nashorn,jdk.scripting.nashorn.shell,jdk.sctp,jdk.security.auth,jdk.security.jgss,jdk.unsupported,jdk.vm.ci,jdk.xml.bind,jdk.xml.dom,jdk.xml.ws,jdk.zipfs \
>     --keep-packaged-modules /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jdk/jmods \
>     --output /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jdk
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/os_linux_zero.cpp:260
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/os_cpu/linux_zero/vm/os_linux_zero.cpp:260), pid=19938, tid=19940
#  fatal error: 
#
#    /--------------------\
#    | segmentation fault |
#    \---\ /--------------/
#        /
#    [-]        |\_/|    
#    (+)=C      |o o|__  
#    | |        =-*-=__\ 
#    OOO        c_c_(___)
#
# JRE version: OpenJDK Runtime Environment (9.0) (fastdebug build 9-internal+0-2016-12-06-171656.sgehwolf.openjdk9-hs)
# Java VM: OpenJDK 64-Bit Zero VM (fastdebug 9-internal+0-2016-12-06-171656.sgehwolf.openjdk9-hs, interpreted mode, serial gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %P %I" (or dumping to /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/core.19938)
#
# An error report file with more information is saved as:
# /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hs_err_pid19938.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Current thread is 19940
Dumping core ...
Aborted (core dumped)
Comments
Fix request:
Importance: The bug makes the JVM very unstable if sizeof(MethodCounters) is not a multiple of wordSize, because the metadata of the memory allocator gets corrupted.
Risk: Very low. The proposed fix has no effect if sizeof(MethodCounters) is already a multiple of wordSize. If it is not a multiple, the fix averts inevitable memory corruption.
Test coverage: Build Zero with bootcycle-images.
05-04-2017

I believe the above analysis to be incorrect. The actual bug is that the metaspace allocation size for MethodCounters is wrong: its size in words is calculated as sizeof(MethodCounters) / wordSize, which rounds down whenever sizeof(MethodCounters) is not a whole number of words, so the allocation is smaller than the object. The obvious fix is attached.
05-04-2017

With the work-around patch a simple build completes, but a bootcycle-image build fails on:

/home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jdk/bin/javac -g -implicit:none -d /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/bootcycle-build/hotspot/variant-zero/tools/jvmti @/home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/bootcycle-build/hotspot/variant-zero/tools/jvmti/_the.BUILD_JVMTI_TOOLS_batch.tmp

with:

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/metaspace.cpp:999
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/memory/metaspace.cpp:999), pid=30534, tid=30535
#  assert(_virtual_space.committed_size() == _virtual_space.actual_committed_size()) failed: The committed memory doesn't match the expanded memory.
#
# JRE version: OpenJDK Runtime Environment (9.0) (fastdebug build 9-internal+0-2016-12-15-152444.sgehwolf.openjdk9-hs)
# Java VM: OpenJDK 64-Bit Zero VM (fastdebug 9-internal+0-2016-12-15-152444.sgehwolf.openjdk9-hs, interpreted mode, serial gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %P %I" (or dumping to /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/core.30534)
#
# An error report file with more information is saved as:
# /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hs_err_pid30534.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
15-12-2016

Proposed work-around fix.
15-12-2016

Current workaround is: Set "InitialBootClassLoaderMetaspaceSize=8388608". That's twice the default size. With that the reproducer works successfully.
14-12-2016

I've used this gdb script to get to the interesting case where memory gets overwritten:

$ cat debug_script.gdb
set breakpoint pending on
set pagination off
handle SIGSEGV nostop noprint pass
break src/share/vm/oops/instanceKlass.hpp:377
run
while true
  if $in_scope("klass")
    # klass
    if klass->_name != 0
      if strcmp(klass->_name->as_C_string(), "sun/util/calendar/BaseCalendar") == 0
        loop_break
      end
    end
    continue
  else
    # this_klass
    if _class_name != 0
      if strcmp(_class_name->as_C_string(), "sun/util/calendar/BaseCalendar") == 0
        p _class_name->as_C_string()
        set $addr = &_methods->_length
        watch *(int*)$addr
        delete 1
        break share/vm/memory/metaspace.cpp:3553
        loop_break
      end
    end
    continue
  end
end
# Continue 1184 times for the Metaspace::allocate break point to hit
# the interesting case.
continue
continue 1183
break src/share/vm/oops/methodCounters.hpp:99
continue

Run the above as:

$ rm -rf /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jdk
$ gdb -x debug_script.gdb --args /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/jdk/bin/jlink -J-XX:+UseSerialGC -J-Xms32M -J-Xmx512M -J-XX:TieredStopAtLevel=1 --module-path /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jmods --endian little --release-info /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/jdk/release --order-resources=**module-info.class,/java.base/java/**,/java.base/jdk/**,/java.base/sun/**,/java.base/com/**,/jdk.localedata/** --add-modules java.activation,java.annotations.common,java.base,java.compact1,java.compact2,java.compact3,java.compiler,java.corba,java.datatransfer,java.desktop,java.httpclient,java.instrument,java.logging,java.management,java.naming,java.prefs,java.rmi,java.scripting,java.se,java.se.ee,java.security.jgss,java.security.sasl,java.smartcardio,java.sql,java.sql.rowset,java.transaction,java.xml,java.xml.bind,java.xml.crypto,java.xml.ws,jdk.accessibility,jdk.attach,jdk.charsets,jdk.compiler,jdk.crypto.ec,jdk.crypto.pkcs11,jdk.desktop,jdk.dynalink,jdk.editpad,jdk.httpserver,jdk.internal.ed,jdk.internal.le,jdk.internal.opt,jdk.jartool,jdk.javadoc,jdk.jcmd,jdk.jconsole,jdk.jdeps,jdk.jdi,jdk.jdwp.agent,jdk.jlink,jdk.jshell,jdk.jsobject,jdk.jstatd,jdk.jvmstat,jdk.localedata,jdk.management,jdk.naming.dns,jdk.naming.rmi,jdk.net,jdk.pack200,jdk.policytool,jdk.rmic,jdk.scripting.nashorn,jdk.scripting.nashorn.shell,jdk.sctp,jdk.security.auth,jdk.security.jgss,jdk.unsupported,jdk.vm.ci,jdk.xml.bind,jdk.xml.dom,jdk.xml.ws,jdk.zipfs --keep-packaged-modules /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jdk/jmods --output /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/build/linux-x86_64-normal-zero-fastdebug/images/jdk

Memory gets overwritten in line 101 of src/share/vm/oops/methodCounters.hpp (the constructor of MethodCounters). It just so happens that the address of that instance's _backedge_mask is the same as $addr as used in our watchpoint. And indeed, "right_n_bits(Arguments::scaled_freq_log(Tier0BackedgeNotifyFreqLog, scale)) << InvocationCounter::count_shift" evaluates to 8184. The question is why the memory overlaps.
13-12-2016

It looks like a stack overrun. When the methods array is set, the length is correctly set to 20. But then it changes to 8184, which is wrong.

(gdb) p _methods->_length
$5 = 20
(gdb) p &_methods->_length
$6 = (int *) 0x7ffff438d128
(gdb) watch *(int *) 0x7ffff438d128
Hardware watchpoint 2: *(int *) 0x7ffff438d128
(gdb) continue
Continuing.

Thread 2 "jlink" hit Hardware watchpoint 2: *(int *) 0x7ffff438d128

Old value = 20
New value = 8184
MethodCounters::allocate (mh=..., __the_thread__=__the_thread__@entry=0x7ffff0016700) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/oops/methodCounters.cpp:30
30 return new(loader_data, size(), false, MetaspaceObj::MethodCountersType, THREAD) MethodCounters(mh);
(gdb) bt
#0 MethodCounters::allocate (mh=..., __the_thread__=__the_thread__@entry=0x7ffff0016700) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/oops/methodCounters.cpp:30
#1 0x00007ffff6a000fc in Method::build_method_counters (m=m@entry=0x7fffcd049f60, __the_thread__=__the_thread__@entry=0x7ffff0016700) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/oops/method.cpp:433
#2 0x00007ffff67aad11 in InterpreterRuntime::build_method_counters (thread=0x7ffff0016700, m=0x7fffcd049f60) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/interpreter/interpreterRuntime.cpp:1032
#3 0x00007ffff6333b3b in BytecodeInterpreter::run (istate=istate@entry=0x7ffff7fcaa78) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/interpreter/bytecodeInterpreter.cpp:2702
[...]
13-12-2016

The issue seems to come from an allocation failure, which triggers a GC, which calls InstanceKlass::clean_weak_instanceklass_links. That in turn gets a method pointer from the methods array in clean_method_data. When accessing index 20, the returned Method* is 0x9, causing the segfault. The class name is: sun/util/calendar/BaseCalendar. The backtrace looks like this:

#16 0x00007ffff6791038 in Method::method_data (this=<optimized out>) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/oops/method.hpp:330
#17 InstanceKlass::clean_method_data (is_alive=0x7ffff71b6f08 <MarkSweep::is_alive>, this=0x7fffcd02f7d8) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/oops/instanceKlass.cpp:1946
#18 InstanceKlass::clean_weak_instanceklass_links (this=this@entry=0x7fffcd02f7d8, is_alive=is_alive@entry=0x7ffff71b6f08 <MarkSweep::is_alive>) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/oops/instanceKlass.cpp:1918
#19 0x00007ffff6973fa5 in Klass::clean_weak_klass_links (is_alive=0x7ffff71b6f08 <MarkSweep::is_alive>, clean_alive_klasses=clean_alive_klasses@entry=true) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/oops/klass.cpp:433
#20 0x00007ffff670da47 in GenMarkSweep::mark_sweep_phase1 (clear_all_softrefs=clear_all_softrefs@entry=false) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/serial/genMarkSweep.cpp:232
#21 0x00007ffff6710d5c in GenMarkSweep::invoke_at_safepoint (rp=0x7ffff00d0d80, clear_all_softrefs=clear_all_softrefs@entry=false) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/serial/genMarkSweep.cpp:90
#22 0x00007ffff6c2ef38 in TenuredGeneration::collect (this=0x7ffff002ddb0, full=<optimized out>, clear_all_soft_refs=<optimized out>, size=<optimized out>, is_tlab=<optimized out>) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/serial/tenuredGeneration.cpp:181
#23 0x00007ffff670659f in GenCollectedHeap::collect_generation (this=this@entry=0x7ffff00275c0, gen=0x7ffff002ddb0, full=full@entry=false, size=size@entry=0, is_tlab=is_tlab@entry=false, run_verification=run_verification@entry=true, clear_soft_refs=false, restore_marks_for_biased_locking=true) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/shared/genCollectedHeap.cpp:383
#24 0x00007ffff6708b85 in GenCollectedHeap::do_collection (this=this@entry=0x7ffff00275c0, full=full@entry=false, clear_all_soft_refs=clear_all_soft_refs@entry=false, size=0, size@entry=5, is_tlab=is_tlab@entry=false, max_generation=max_generation@entry=GenCollectedHeap::OldGen) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/shared/genCollectedHeap.cpp:501
#25 0x00007ffff64ca334 in GenCollectorPolicy::satisfy_failed_allocation (this=0x7ffff0027430, size=5, is_tlab=<optimized out>) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/shared/collectorPolicy.cpp:742
#26 0x00007ffff6703184 in GenCollectedHeap::satisfy_failed_allocation (this=this@entry=0x7ffff00275c0, size=<optimized out>, is_tlab=<optimized out>) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/shared/genCollectedHeap.cpp:552
#27 0x00007ffff6cb59f3 in VM_GenCollectForAllocation::doit (this=0x7ffff7f112c0) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/gc/shared/vmGCOperations.cpp:163
#28 0x00007ffff6ceb4a1 in VM_Operation::evaluate (this=this@entry=0x7ffff7f112c0) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/runtime/vm_operations.cpp:67
#29 0x00007ffff6ce6428 in VMThread::evaluate_operation (op=0x7ffff7f112c0, this=0x7ffff00d3ba0) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/runtime/vmThread.cpp:348
#30 0x00007ffff6ce8ace in VMThread::loop (this=this@entry=0x7ffff00d3ba0) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/runtime/vmThread.cpp:470
#31 0x00007ffff6ce9390 in VMThread::run (this=0x7ffff00d3ba0) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/share/vm/runtime/vmThread.cpp:262
#32 0x00007ffff6aab1c2 in thread_native_entry (thread=0x7ffff00d3ba0) at /home/sgehwolf/Documents/openjdk/upstream-sources/openjdk9-hs/hotspot/src/os/linux/vm/os_linux.cpp:679
#33 0x00007ffff79ae5ca in start_thread () from /lib64/libpthread.so.0
#34 0x00007ffff72d60ed in clone () from /lib64/libc.so.6
12-12-2016