Bug ID: JDK-8209971 TestOptionsWithRanges.java crashes in CDS mode with G1UpdateBufferSize=1

JDK 12
12 b11 Fixed

URL: http://hg.openjdk.java.net/jdk/jdk/rev/d7dcaacb95dd User: jiangli Date: 2018-09-07 19:21:45 +0000
07-09-2018
After some investigation, I think the safest and least disruptive change is to call MetaspaceShared::fixup_mapped_heap_regions() in SystemDictionary::resolve_preloaded_classes(), after resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, CHECK) but before Object_klass()->constants()->restore_unshareable_info(CHECK). At that point, SystemDictionary::Object_klass() has already been set up by the resolve_wk_klasses_through() call. This makes sure the archived java heap regions are fixed up before anyone tries to access them.

Note that the archived java heap regions (or any other heap regions) are not accessed until the first time we restore an archived java object. The fix makes sure the mapped archived heap regions are fixed up before Object_klass()->constants()->restore_unshareable_info(CHECK), which restores the constant pool resolved_references array for the Object klass.

To complete the fix, we also need to postpone the mirror restoration for Object until after the archived heap regions are fixed up. That can be done by reusing the mirror fixup process used by normal class loading, which postpones mirror restoration/creation until after java.lang.Class is loaded. That guarantees the MetaspaceShared::fixup_mapped_heap_regions() call happens before any of the archived java mirrors is accessed or restored.

The mirror_fixup process was already fully supported and tested for archived mirrors when the feature was first implemented. However, it was not needed for archived mirrors at the time, because field offsets can be loaded from the archive, so there was no need to postpone mirror processing until java.lang.Class was loaded.
--- a/src/hotspot/share/classfile/systemDictionary.cpp
+++ b/src/hotspot/share/classfile/systemDictionary.cpp
@@ -2028,6 +2028,7 @@
 #if INCLUDE_CDS
   if (UseSharedSpaces) {
     resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, CHECK);
+    MetaspaceShared::fixup_mapped_heap_regions();
     // Initialize the constant pool for the Object_class
     Object_klass()->constants()->restore_unshareable_info(CHECK);
     resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Class_klass), scan, CHECK);
--- a/src/hotspot/share/memory/universe.cpp
+++ b/src/hotspot/share/memory/universe.cpp
@@ -332,7 +332,6 @@
                                                          SystemDictionary::Cloneable_klass(), "u3");
       assert(_the_array_interfaces_array->at(1) ==
                                                          SystemDictionary::Serializable_klass(), "u3");
-      MetaspaceShared::fixup_mapped_heap_regions();
     } else
 #endif
--- a/src/hotspot/share/classfile/javaClasses.cpp
+++ b/src/hotspot/share/classfile/javaClasses.cpp
@@ -1213,6 +1213,13 @@
 bool java_lang_Class::restore_archived_mirror(Klass *k,
                                               Handle class_loader, Handle module,
                                               Handle protection_domain, TRAPS) {
+
+  if (!SystemDictionary::Class_klass_loaded()) {
+    assert(fixup_mirror_list() != NULL, "fixup_mirror_list not initialized");
+    fixup_mirror_list()->push(k);
+    return true;
+  }
+
   oop m = MetaspaceShared::materialize_archived_object(k->archived_java_mirror_raw_narrow());
 
   if (m == NULL) {
@@ -1225,10 +1232,6 @@
   assert(MetaspaceShared::is_archive_object(m), "must be archived mirror object");
   Handle mirror(THREAD, m);
 
-  // The java.lang.Class field offsets were archived and reloaded from archive.
-  // No need to put classes on the fixup_mirror_list before java.lang.Class
-  // is loaded.
-
   if (!k->is_array_klass()) {
     // - local static final fields with initial values were initialized at dump time
06-09-2018
Some analysis of this specific crash: The crash happens when the VM calls AccessInternal::PostRuntimeDispatch::oop_access_barrier() while restoring an archived mirror object for a well-known class that's being resolved (in SystemDictionary::resolve_preloaded_classes). The first 40 classes are resolved (and their mirrors restored) successfully without crashing. The crash only happens when the 41st class is restored, where the GC code path is different and triggers HeapRegion::block_size(). A naive fix could place the MetaspaceShared::fixup_mapped_heap_regions() call after Object_klass()->constants()->restore_unshareable_info(CHECK), which would also avoid the crash. However, that relies on internal knowledge of the underlying GC implementation, which is subject to change. The fix proposed above is the safest and most appropriate solution.
05-09-2018
With the above changes, I've also tested and verified the archived region fixup by doing an explicit 2-HeapWord fill, which uses the following code in the 'else' case of CollectedHeap::fill_with_object_impl():

  ObjAllocator allocator(SystemDictionary::Object_klass(), words);
  allocator.initialize(start);
05-09-2018
I'm reassigning the bug since [~iklam] is busy with the Lambda subgraphs archiving work and this bug needs to be fixed before the default CDS archive integration.
04-09-2018
In both crashes found by Jiangli, the VM options are "-XX:+UseG1GC -XX:G1UpdateBufferSize=1".
28-08-2018
In the crash scenario, the open archive region is mapped at 0x00000007ffb00000. However, due to HeapRegion::GrainBytes, the actual G1 region that is used starts at 0x00000007ffa00000. As a result, we need to fill the space [0x00000007ffa00000, 0x00000007ffb00000). This is done inside MetaspaceShared::fixup_mapped_heap_regions.

fixup_mapped_heap_regions calls G1CollectedHeap::heap()->fill_archive_regions(), which calls CollectedHeap::fill_with_objects, which fills the range with either int arrays or java.lang.Objects. The trouble is that SystemDictionary::Object_klass() is initialized inside SystemDictionary::initialize, so to be safe, MetaspaceShared::fixup_mapped_heap_regions is called only afterwards.

As a result, in the middle of SystemDictionary::initialize, the open archive region is not yet fully initialized. The crash happens when the barrier code calls HeapRegion::block_size(addr==0x00000007ffa00000) while that address has not yet been filled by CollectedHeap::fill_with_objects.

Here's a hack that works around the crash, but it might itself crash if SystemDictionary::Object_klass() is used too early.

diff -r dda0f219dafa src/hotspot/share/memory/universe.cpp
--- a/src/hotspot/share/memory/universe.cpp Thu Aug 23 21:16:45 2018 -0700
+++ b/src/hotspot/share/memory/universe.cpp Mon Aug 27 19:02:04 2018 -0700
@@ -362,7 +362,9 @@
   }
   vmSymbols::initialize(CHECK);
-
+  if (UseSharedSpaces) {
+    MetaspaceShared::fixup_mapped_heap_regions();
+  }
   SystemDictionary::initialize(CHECK);
   Klass* ok = SystemDictionary::Object_klass();
@@ -377,7 +379,6 @@
                                                          SystemDictionary::Cloneable_klass(), "u3");
       assert(_the_array_interfaces_array->at(1) ==
                                                          SystemDictionary::Serializable_klass(), "u3");
-      MetaspaceShared::fixup_mapped_heap_regions();
     } else
 #endif
28-08-2018
Update: I can reproduce the problem 100% of the time on the same host that executed the job "jianzhou-jdk_default_archive-20180823-0438-35786" (Oracle engineers, see next comment), using the same JDK binary:

$ ./jdk-12/fastdebug/bin/java -Xshare:dump -Xmx128m
$ ./jdk-12/fastdebug/bin/java -Xshare:on -Xlog:cds -XX:+UseG1GC -XX:G1UpdateBufferSize=1 -version
...
# assert(!is_null(v)) failed: narrow klass value can never be zero
...

The crash is somewhat dependent on the number of logical CPUs and concurrent threads. On a bigger machine with 32 logical cores, I can also reproduce the problem 100% of the time with

java -Xshare:on -XX:+UseG1GC -XX:G1UpdateBufferSize=1 -Xlog:cds -Xlog:gc+region=trace -Xmx15174M -XX:ParallelGCThreads=8 -version
28-08-2018
According to one of the crashes (from the job "jianzhou-jdk_default_archive-20180823-0438-35786"), the crash happens in the following location:

# assert(!is_null(v)) failed: narrow klass value can never be zero
#
# JRE version: (12.0) (fastdebug build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 12-internal+0-2018-08-23-0437139.jiangli.zhou.jdkdefaultarchive, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Core dump will be written. Default location: .../core.16808

Command Line: -XX:+UseG1GC -XX:G1UpdateBufferSize=1 optionsvalidation.JVMStartup

Host: xxx.us.oracle.com, Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz, 8 cores, 59G, Oracle Linux Server release 7.1
Time: Wed Aug 22 22:21:22 2018 PDT elapsed time: 0 seconds (0d 0h 0m 0s)

Stack: [0x00007f4dd4605000,0x00007f4dd4706000], sp=0x00007f4dd4703b80, free space=1018k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x18b3087] VMError::report_and_die(int, char const, char const, __va_list_tag, Thread, unsigned char, void, void, char const, int, unsigned long)+0x2c7
V [libjvm.so+0x18b3eef] VMError::report_and_die(Thread, void, char const, int, char const, char const, __va_list_tag)+0x2f
V [libjvm.so+0xb52640] report_vm_error(char const, int, char const, char const, ...)+0x100
V [libjvm.so+0x6e2c17] Klass::decode_klass_not_null(unsigned int)+0xb7
V [libjvm.so+0xe2fb47] HeapRegion::block_size(HeapWord const) const+0x347
V [libjvm.so+0xd694f8] bool HeapRegion::oops_on_card_seq_iterate_careful<false, G1ConcurrentRefineOopClosure>(MemRegion, G1ConcurrentRefineOopClosure)+0x228
V [libjvm.so+0xd5eb39] G1RemSet::refine_card_concurrently(signed char, unsigned int)+0x1b9
V [libjvm.so+0xc2b1ce] DirtyCardQueueSet::apply_closure_to_buffer(CardTableEntryClosure, BufferNode, bool, unsigned int) [clone .part.46]+0x7e
V [libjvm.so+0xc2b7ef] DirtyCardQueueSet::mut_process_buffer(BufferNode)+0x4f
V [libjvm.so+0x15ef390] PtrQueueSet::process_or_enqueue_complete_buffer(BufferNode)+0x250
V [libjvm.so+0x15ef9c7] PtrQueue::handle_zero_index()+0x257
V [libjvm.so+0x15effb8] PtrQueue::enqueue_known_active(void)+0x28
V [libjvm.so+0x95bc82] AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<573558ul, G1BarrierSet>, (AccessInternal::BarrierType)1, 573558ul>::oop_access_barrier(oop, long, oop)+0x322
V [libjvm.so+0x95edf9] oopDesc::obj_field_put(int, oop)+0xa9
V [libjvm.so+0xed7ef9] java_lang_Class::set_init_lock(oop, oop)+0x49
V [libjvm.so+0xef3f1c] java_lang_Class::restore_archived_mirror(Klass, Handle, Handle, Handle, Thread)+0x54c
V [libjvm.so+0x11f6d3c] Klass::restore_unshareable_info(ClassLoaderData, Handle, Thread)+0x8bc
V [libjvm.so+0xe92dc1] InstanceKlass::restore_unshareable_info(ClassLoaderData, Handle, Thread)+0x51
V [libjvm.so+0x178230b] SystemDictionary::load_shared_class(InstanceKlass, Handle, Handle, Thread)+0x2bb
V [libjvm.so+0x1782a87] SystemDictionary::load_instance_class(Symbol, Handle, Thread)+0x3f7
V [libjvm.so+0x178120c] SystemDictionary::resolve_instance_class_or_null(Symbol, Handle, Handle, Thread)+0xb4c
V [libjvm.so+0x1781606] SystemDictionary::resolve_instance_class_or_null_helper(Symbol, Handle, Handle, Thread)+0xa6
V [libjvm.so+0x178346c] SystemDictionary::resolve_wk_klass(SystemDictionary::WKID, int, Thread)+0x17c
V [libjvm.so+0x1783591] SystemDictionary::resolve_wk_klasses_until(SystemDictionary::WKID, SystemDictionary::WKID&, Thread)+0x61
V [libjvm.so+0x17838b8] SystemDictionary::resolve_preloaded_classes(Thread)+0x238
V [libjvm.so+0x1783b8a] SystemDictionary::initialize(Thread)+0x21a
V [libjvm.so+0x1835d18] Universe::genesis(Thread)+0x448
V [libjvm.so+0x1836acc] universe2_init()+0x2c
V [libjvm.so+0xe81838] init_globals()+0xa8
V [libjvm.so+0x17e7eb5] Threads::create_vm(JavaVMInitArgs, bool)+0x2c5
V [libjvm.so+0xff89ca] JNI_CreateJavaVM+0x6a
C [libjli.so+0x3ff6] JavaMain+0x86

This happens when the first few classes are being resolved during bootstrap:

void SystemDictionary::resolve_preloaded_classes(TRAPS) {
  assert(WK_KLASS(Object_klass) == NULL, "preloaded classes should only be initialized once");

  // Create the ModuleEntry for java.base. This call needs to be done here,
  // after vmSymbols::initialize() is called but before any classes are pre-loaded.
  ClassLoader::classLoader_init2(CHECK);

  // Preload commonly used klasses
  WKID scan = FIRST_WKID;
  // first do Object, then String, Class
#if INCLUDE_CDS
  if (UseSharedSpaces) {
    resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Object_klass), scan, CHECK);
    // Initialize the constant pool for the Object_class
    Object_klass()->constants()->restore_unshareable_info(CHECK);
    resolve_wk_klasses_through(WK_KLASS_ENUM_NAME(Class_klass), scan, CHECK); <<<<<<< CRASH
  } else
#endif

From the crash log, the shared archive heap regions have been successfully mapped:

7ffb00000-7ffb48000 rw-p 011a2000 fc:00 6297851 /.../fastdebug/lib/server/classes.jsa
7ffc00000-7ffc6b000 rw-p 01137000 fc:00 6297851 /.../fastdebug/lib/server/classes.jsa
28-08-2018

Blocks :	JDK-8202951 - Implementation of JEP JDK-8204247: Include default CDS (Class Data Sharing) archive in JDK binary
Relates :	JDK-8202951 - Implementation of JEP JDK-8204247: Include default CDS (Class Data Sharing) archive in JDK binary