Bug ID: JDK-8147481 VM crash fatal error: corrupted C heap (runtime.threads.Threads2a.Threads2a)

Likely a Windows internal bug in HeapWalk/HeapValidate. Not a HotSpot bug and no indication of actual heap error.
02-02-2016
With the above patch, I reproduced on my laptop after about 3000 iterations. From the values in the tail of the saved[] array, it seems like HeapWalk suddenly jumps backwards and lands on a free block. This strange behavior cannot be explained by the actual memory contents of the heap. Also, running "!heap -triage" in WinDBG shows no heap errors.
02-02-2016
I found that the output of "!heap -a nnnnnnn" in windbg.exe does not contain the block reported by os::check_heap, so I am adding a log of all the blocks seen by os::check_heap, and try to get another crash: ~/jdk/bug8147481/hotspot$ hg diff diff -r 09636a2eaeff src/os/windows/vm/os_windows.cpp --- a/src/os/windows/vm/os_windows.cpp Thu Jan 14 15:45:31 2016 -0500 +++ b/src/os/windows/vm/os_windows.cpp Mon Jan 25 09:26:53 2016 -0800 @@ -5225,6 +5225,9 @@ static int mallocDebugIntervalCounter = 0; static int mallocDebugCounter = 0; bool os::check_heap(bool force) { +#define SAVE_COUNT 100000 + static PROCESS_HEAP_ENTRY saved[SAVE_COUNT]; + if (++mallocDebugCounter < MallocVerifyStart && !force) return true; if (++mallocDebugIntervalCounter >= MallocVerifyInterval \|\| force) { // Note: HeapValidate executes two hardware breakpoints when it finds something @@ -5242,9 +5245,14 @@ // or some other special heap flag has been set that prevents // locking. We don't try to walk a heap we can't lock. if (HeapLock(heap) != 0) { + memset(saved, 0, sizeof(saved)); PROCESS_HEAP_ENTRY phe; phe.lpData = NULL; + int saved_index = 0; while (HeapWalk(heap, &phe) != 0) { + if (saved_index < SAVE_COUNT) { + saved[saved_index++] = phe; + } if ((phe.wFlags & PROCESS_HEAP_ENTRY_BUSY) && !HeapValidate(heap, 0, phe.lpData)) { tty->print_cr("C heap has been corrupted (time: %d allocations)", mallocDebugCounter);
26-01-2016
I found that once Visual Studio, Gflags.exe and/or windbg.exe are running on my laptop, it's impossible to reproduce the crash (no crash for 4 days). A semi-reliable way to reproduce it is to reboot the laptop, run only CMD.exe and start the test loop. Afterwards, I reproduced 3 times again (7 loops, 168 loops, and 329 loops).
26-01-2016
After I enabled 'page heap' for java.exe with gflags, the following command ran for more than 16 hours without any issue. jdk\bin\java -Xmixed -XX:MaxRAMFraction=8 -XX:-CreateCoredumpOnCrash -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxRAMFraction=8 -XX:MallocVerifyInterval=1 -XX:+UseOSErrorReporting -XX:+SuppressFatalErrorMessage -cp classes/ runtime.threads.Threads2a.Threads2a I will disable page heap and run the tests again. So far, this looks like a problem not with heap corruption, but our implementation of os::check_heap.
20-01-2016
The above crash seems very intriguing. This happened when the JVM is still in "single threaded" mode when it's generating the interpreter thread. What's more weird is: C heap has been corrupted (time: 1564 allocations) corrupted block near address 0x4ac6e0, length 26 Note that the string "D:\ade\bugs\8147481_c_heap" is exactly 26 bytes, EXCLUDING the trailing zero. Does this mean someone has allocated a string that's one-byte too short? Incidentally, that's the current directory. This block of memory doesn't seem to be allocated via os::malloc, or else it would have memory guard starting at 0x00000000004AC6C0 (see hotspot/src/share/vm/memory/guardedMemory.hpp). V [jvm.dll+0x47225a] report_fatal+0x7a V [jvm.dll+0x8ee4b3] os::check_heap+0x113 V [jvm.dll+0x8e6af0] os::malloc+0x130 V [jvm.dll+0x8e69b8] os::malloc+0xa8 V [jvm.dll+0x8e588c] os::_strdup+0x2c V [jvm.dll+0x3e0091] CodeStrings::add_comment+0x81 V [jvm.dll+0x806fdb] MacroAssembler::call_VM_base+0x23b V [jvm.dll+0x6047db] InterpreterMacroAssembler::call_VM_base+0x11b V [jvm.dll+0x807224] MacroAssembler::call_VM_helper+0xd4 V [jvm.dll+0x806c9b] MacroAssembler::call_VM+0xbb V [jvm.dll+0xa45bd9] TemplateTable::resolve_cache_and_index+0x139 V [jvm.dll+0xa3ea15] TemplateTable::getfield_or_static+0x85 V [jvm.dll+0xa3382d] Template::generate+0x2d V [jvm.dll+0xa2b9a9] TemplateInterpreterGenerator::generate_and_dispatch+0x179 V [jvm.dll+0xa31d51] TemplateInterpreterGenerator::set_vtos_entry_points+0x1f1 V [jvm.dll+0xa2bcfb] TemplateInterpreterGenerator::set_entry_points+0x30b V [jvm.dll+0xa2bf13] TemplateInterpreterGenerator::set_entry_points_for_all_bytes+0x73 V [jvm.dll+0xa2b792] TemplateInterpreterGenerator::generate_all+0x1342 V [jvm.dll+0xa2a43d] TemplateInterpreterGenerator::TemplateInterpreterGenerator+0x3d V [jvm.dll+0xa29ddf] TemplateInterpreter::initialize+0x17f V [jvm.dll+0x60b9e9] interpreter_init+0x9 V [jvm.dll+0x5f5942] init_globals+0xb2 V [jvm.dll+0xa4bf56] Threads::create_vm+0x246 V [jvm.dll+0x690da7] JNI_CreateJavaVM_inner+0xd7 V [jvm.dll+0x693eef] JNI_CreateJavaVM+0x1f C [java.exe+0x2dc8] C [java.exe+0xd87b] C [java.exe+0xda22] <actual memory from the mdmp> 0x00000000004AC6C0 50 51 41 00 00 00 00 00 01 00 00 00 00 00 00 00 PQA............. 0x00000000004AC6D0 00 00 00 00 00 00 00 00 fb e2 b8 6d e5 00 00 95 ........��m��... 0x00000000004AC6E0 44 3a 5c 61 64 65 5c 62 75 67 73 5c 38 31 34 37 D:\ade\bugs\8147 0x00000000004AC6F0 34 38 31 5f 63 5f 68 65 61 70 00 00 00 00 00 00 481_c_heap...... 0x00000000004AC700 00 00 00 00 00 00 00 00 e6 e2 b8 6d e5 00 00 90 ........��m��... 0x00000000004AC710 c8 4e 46 00 00 00 00 00 40 c7 4a 00 00 00 00 00 ��NF.....@��J..... 0x00000000004AC720 20 6b 41 00 00 00 00 00 01 00 00 00 00 00 00 00 kA............. <would be this with memory guard added vis os::malloc> 0x00000000004AC6C0 ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba 0x00000000004AC6D0 1a 00 00 00 00 00 00 00 <size = 26> 0x00000000004AC6D8 00 00 00 00 00 00 00 00 <tag, unused> Strange thing is, when I load up hs_err_pid62060.mdmp in windbg.exe, and run "!heap -a", it shows that 04AC6E0 is NOT a busy block. So, is HeapWalk returning an bad value inside os::check_heap?? 00000000004ac500: 00430 . 00050 [100] 00000000004ac550: 00050 . 00800 [101] - busy (7f0) Internal 00000000004acd50: 00800 . 00420 [101] - busy (410) 00000000004ad170: 00420 . 00480 [101] - busy (478) 00000000004ad5f0: 00480 . 08000 [101] - busy (7ff0) Internal 00000000004b55f0: 08000 . 08000 [101] - busy (7ff0) Internal
20-01-2016
Since the above crash didn't run any Java code, it seems like the actual Java main class is irrelevant, so I reran my test as: i=0; while ./jdk/bin/java -XX:MaxRAMFraction=8 -XX:+CreateCoredumpOnCrash -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxRAMFraction=8 -XX:MallocVerifyInterval=1 -Xint -cp classes/ -XX:+UseOSErrorReporting -version; do i=$(expr $i + 1); echo $i; done But this did not reproduce the crash after 5 hours (4744 iterations).
20-01-2016
I am happy to report that on my laptop, I can reproduce the crash with the following command, on the 58-th iteration (23 minutes) using 9-internal+0-2016-01-14-205852.cphillim.jdk9-temp-new-sym: i=0; while ./jdk/bin/java -Xmixed -XX:MaxRAMFraction=8 -XX:+CreateCoredumpOnCrash -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxRAMFraction=8 -XX:MallocVerifyInterval=1 -cp classes/ runtime.threads.Threads2a.Threads2a; do i=$(expr $i + 1); echo $i; done 56 1000 Diff: 0 Diff: 0.0 1000 Startup time: 17128 57 C heap has been corrupted (time: 10014 allocations) corrupted block near address 0x417fb0, length 1144 # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=\os_windows.cpp:5253 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (C:\jprt\T\P1\205852.cphillim\s\hotspot\src\os\windows\vm\os_windows.cpp:5253), pid=60472, tid=56660 # fatal error: corrupted C heap # # JRE version: Java(TM) SE Runtime Environment (9.0) (build 9-internal+0-2016-01-14-205852.cphillim.jdk9-temp-new-sym) # Java VM: Java HotSpot(TM) 64-Bit Server VM (9-internal+0-2016-01-14-205852.cphillim.jdk9-temp-new-sym, mixed mode, tiered, compressed oops, parallel gc, windows-amd64) # Core dump will be written. Default location: D:\ade\bugs\8147481_c_heap\hs_err_pid60472.mdmp # # An error report file with more information is saved as: # D:\ade\bugs\8147481_c_heap\hs_err_pid60472.log Similar to several crashes identified in JDK-8143249, the crashing thread is a compiler thread. However, it's not yet clear whether the corruption was introduced by the compiler, or the compiler thread died because of corruption introduced elsewhere. I will now repeat the test with compiled disabled. ======================== V [jvm.dll+008f6440] = 008f5340+ 256 bool os::platform_print_native_stack(class outputStream ,void const ,char ,int) 00000000088f6340 V [jvm.dll+00a9e00f] = 00a9c550+2751 void VMError::report(class outputStream ,bool) 0000000008a9d550 V [jvm.dll+00a9ef17] = 00a9daa0+1143 void VMError::report_and_die(int,char const ,char const ,char ,class Thread ,unsigned char ,void ,void ,char const ,int,unsigned __int64) V [jvm.dll+00a9f50d] = 00a9e4b0+ 93 void VMError::report_and_die(class Thread ,char const ,int,char const ,char const ,char ) V [jvm.dll+0047225a] = 004711e0+ 122 void report_fatal(char const ,int,char const ,...) 00000000084721e0 V [jvm.dll+008ee4b3] = 008ed3a0+ 275 bool os::check_heap(bool) 00000000088ee3a0 V [jvm.dll+008e6af0] = 008e59c0+ 304 void os::malloc(unsigned __int64,enum MemoryType,class NativeCallStack const &) 00000000088e69c0 V [jvm.dll+001ebe5d] = 001eae40+ 29 char * AllocateHeap(unsigned __int64,enum MemoryType,class NativeCallStack const &,enum AllocFailStrategy::AllocFailEnum) 00000000081ebe40 V [jvm.dll+001ebf72] = 001eaec0+ 178 char * AllocateHeap(unsigned __int64,enum MemoryType,enum AllocFailStrategy::AllocFailEnum) 00000000081ebec0 V [jvm.dll+008e03e8] = 008df2b0+ 312 class ImmutableOopMapSet * ImmutableOopMapSet::build_from(class OopMapSet const ) 00000000088e02b0 V [jvm.dll+003de302] = 003dd170+ 402 CodeBlob::CodeBlob(char const ,class CodeBuffer ,int,int,int,int,class OopMapSet ) 00000000083de170 V [jvm.dll+003df199] = 003de070+ 297 class RuntimeStub * RuntimeStub::new_runtime_stub(char const ,class CodeBuffer ,int,int,class OopMapSet ,bool) 00000000083df070 V [jvm.dll+0031d1b6] = 0031bed0+ 742 void Runtime1::generate_blob_for(class BufferBlob ,enum Runtime1::StubID) 000000000831ced0 V [jvm.dll+0031d3ea] = 0031c3c0+ 42 void Runtime1::initialize(class BufferBlob ) 000000000831d3c0 V [jvm.dll+002822b8] = 00281240+ 120 void Compiler::init_c1_runtime(void) 0000000008282240 V [jvm.dll+0028232a] = 002812e0+ 74 virtual void Compiler::initialize(void) 00000000082822e0 V [jvm.dll+00421de0] = 00420c80+ 352 bool CompileBroker::init_compiler_runtime(void) 0000000008421c80 V [jvm.dll+00421407] = 004201e0+ 551 void CompileBroker::compiler_thread_loop(void) 00000000084211e0 V [jvm.dll+00a53f38] = 00a52d90+ 424 void JavaThread::thread_main_inner(void) 0000000008a53d90 V [jvm.dll+00a52d35] = 00a51b40+ 501 virtual void JavaThread::run(void) 0000000008a52b40 V [jvm.dll+008f1ce6] = 008f0c00+ 230 unsigned int java_start(class Thread ) 00000000088f1c00
19-01-2016
Coleen: It was Christian's contention that this bug and the other that looked just like it that we marked it as duplicate were because of the hypervisor. I reproduced on my own laptop with Windows 7 running on bare metal -- i.e., not in a hypervisor.
19-01-2016
Looks similar to JDK-8143249 "fatal error: corrupted C heap" with '-XX:MallocVerifyInterval=1' option on win-64" Thus, it seems that it's not a new problem.
19-01-2016

Duplicate :	JDK-8069282 - Corrupted C heap in jit/graph/cgt2
Relates :	JDK-8148813 - Windows os::check_heap needs more information
Relates :	JDK-8163146 - Remove os::check_heap on Windows
Relates :	JDK-8144947 - Crash with memory stomping error Trailer guard is BROKEN
Relates :	JDK-8143249 - "fatal error: corrupted C heap" with '-XX:MallocVerifyInterval=1' option on win-64
Relates :	JDK-8028749 - java -version crashes with 'fatal error: heap walk aborted with error 1'
Relates :	JDK-8069282 - Corrupted C heap in jit/graph/cgt2
Relates :	JDK-8159065 - Corrupted c heap with runtime/threads/ThreadInterruptedNotStarted