JDK-7197666 : java -d64 -version core dumps in a box with lots of memory
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs23, hs24, hs25, 7u7
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_11
  • CPU: x86
  • Submitted: 2012-09-11
  • Updated: 2013-09-12
  • Resolved: 2013-04-08
JDK 7        JDK 8      Other
7u40 Fixed   8 Fixed    hs24 Fixed
Description
System description:
X4800 oaf657 128 x 2267MHz 262135 Megabytes

oaf657# java -d64 -version 2>/dev/stdout
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /HUDSON/workspace/jdk7u7-2-build-solaris-amd64-product/jdk7u7/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /export/bench/autobench/5.x/Results/hs_err_pid7393.log

oaf657# java -version 2>/dev/stdout
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)

oaf657# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 274,2        4K     218G     218G

oaf657# echo ::memstat |mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    4236512             16548    6%
ZFS File Data              269796              1053    0%
Anon                     48217526            188349   72%
Exec and libs               54653               213    0%
Page cache                   6824                26    0%
Free (cachelist)            32705               127    0%
Free (freelist)          14288517             55814   21%
Total                    67106533            262134
Using the 1.6.0 release installed on the box, the problem does not occur:

oaf657# ./java -d64 -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode) 

I then installed 1.7.0_05, and it fails:

oaf657# ./java -version
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b06)
Java HotSpot(TM) Server VM (build 23.1-b03, mixed mode)

oaf657# ./java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /export/HUDSON/workspace/jdk7u5-2-build-solaris-amd64-product/jdk7u5/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/7u5/solaris-amd64/bin/hs_err_pid1353.log

I'm copying in older updates; I'll update this bug once I have more information.
The problem starts in 7u4.

7u3:

oaf657# ./java -version
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
Java HotSpot(TM) Server VM (build 22.1-b02, mixed mode)

oaf657# ./java -d64 -version
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)

7u4:

oaf657# ./java -version
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
Java HotSpot(TM) Server VM (build 23.0-b21, mixed mode)

oaf657# ./java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /export/HUDSON/workspace/jdk7u4-2-build-solaris-amd64-product/jdk7u4/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/7u4/solaris-amd64/bin/hs_err_pid2894.log
So far I haven't found this problem anywhere else.

oaf657# echo ::memstat |mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    3938332             15384    6%
ZFS File Data              237030               925    0%
Anon                        54886               214    0%
Exec and libs                1215                 4    0%
Page cache                   4649                18    0%
Free (cachelist)           120686               471    0%
Free (freelist)          62749735            245116   94%
Total                    67106533            262134

oaf657# ./java -d64 -version
java version "1.7.0_03-fastdebug"
Java(TM) SE Runtime Environment (build 1.7.0_03-fastdebug-b04)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02-fastdebug, mixed mode)

oaf657# ./java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /export/HUDSON/workspace/jdk7u4-2-build-solaris-amd64-product/jdk7u4/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/7u4/solaris-amd64/fastdebug/bin/hs_err_pid2616.log

I've attached the log file.
I've just reproduced the same problem on another X4800:

X4800 oaf690 128 x 2267MHz 524279 Megabytes

oaf690# java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /HUDSON/workspace/jdk7u7-2-build-solaris-amd64-product/jdk7u7/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/hs_err_pid20757.log

Both boxes are running s11u1_24a.

Comments
RULE nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/doubleAllocationTest_proxy_custom_array MemError Could not reserve enough space for ...
27-03-2013

Using DTrace I can see that it is actually the Java heap that gets mapped at 0x0000000010000000:

oaf690# dtrace -n 'pid$target::mmap:return /arg1 == 0x10000000/ { ustack(); }' -c 'java -d64 -version'
dtrace: description 'pid$target::mmap:return ' matched 2 probes
dtrace: buffer size lowered to 2m
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /HUDSON/workspace/jdk7u17-2-build-solaris-amd64-product/jdk7u17/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/hs_err_pid11051.log
dtrace: pid 11051 has exited
CPU     ID                    FUNCTION:NAME
 90  83063                    mmap:return
              libc.so.1`mmap+0x10
              libjvm.so`__1cRReservedHeapSpace2t5B6MLLLLpc_v_+0xcc
              libjvm.so`__1cUParallelScavengeHeapKinitialize6M_i_+0x206
              libjvm.so`__1cIUniversePinitialize_heap6F_i_+0x1a9
              libjvm.so`__1cNuniverse_init6F_i_+0xd0
              libjvm.so`__1cMinit_globals6F_i_+0x82
              libjvm.so`__1cHThreadsJcreate_vm6FpnOJavaVMInitArgs_pb_i_+0x1a4
              libjvm.so`JNI_CreateJavaVM+0x6c
              libjli.so`JavaMain+0x144
              libc.so.1`_thrp_setup+0xa5
              libc.so.1`_lwp_start

 90  83063                    mmap:return
              libc.so.1`mmap+0x10
              libjvm.so`__1cOPSVirtualSpaceJexpand_by6ML_b_+0x43
              libjvm.so`__1cIPSOldGen2t5B6MnNReservedSpace_LLLLpkci_v_+0x105
              libjvm.so`__1cJPSPermGen2t5B6MnNReservedSpace_LLLLpkci_v_+0x50
              libjvm.so`__1cUParallelScavengeHeapKinitialize6M_i_+0x6af
              libjvm.so`__1cIUniversePinitialize_heap6F_i_+0x1a9
              libjvm.so`__1cNuniverse_init6F_i_+0xd0
              libjvm.so`__1cMinit_globals6F_i_+0x82
              libjvm.so`__1cHThreadsJcreate_vm6FpnOJavaVMInitArgs_pb_i_+0x1a4
              libjvm.so`JNI_CreateJavaVM+0x6c
              libjli.so`JavaMain+0x144
              libc.so.1`_thrp_setup+0xa5
              libc.so.1`_lwp_start

Now that I know what to look for I can see it in the hs_err file as well:

 PSPermGen       total 22528K, used 766K [0x0000000010000000, 0x0000000011600000, 0x0000000015200000)
  object space 22528K, 3% used [0x0000000010000000,0x00000000100bfb48,0x0000000011600000)

PermGen starts at 0x0000000010000000. This also explains why this only happens on Solaris x64: the Java heap address comes from HeapBaseMinAddress, which is a platform-dependent flag, and Solaris x64 is the only platform that starts at 256M:

src/closed/os_cpu/linux_arm/vm/globals_linux_arm.hpp
23:define_pd_global(uintx,HeapBaseMinAddress, 2*G);
src/closed/os_cpu/linux_ppc/vm/globals_linux_ppc.hpp
23:define_pd_global(uintx,HeapBaseMinAddress, 2*G);
src/os_cpu/bsd_x86/vm/globals_bsd_x86.hpp
50:define_pd_global(uintx, HeapBaseMinAddress, 2*G);
src/os_cpu/bsd_zero/vm/globals_bsd_zero.hpp
45:define_pd_global(uintx, HeapBaseMinAddress, 2*G);
src/os_cpu/linux_sparc/vm/globals_linux_sparc.hpp
37:define_pd_global(uintx, HeapBaseMinAddress, CONST64(4)*G);
src/os_cpu/linux_x86/vm/globals_linux_x86.hpp
48:define_pd_global(uintx,HeapBaseMinAddress, 2*G);
src/os_cpu/linux_zero/vm/globals_linux_zero.hpp
45:define_pd_global(uintx, HeapBaseMinAddress, 2*G);
src/os_cpu/solaris_sparc/vm/globals_solaris_sparc.hpp
38:define_pd_global(uintx, HeapBaseMinAddress, CONST64(4)*G);
40:define_pd_global(uintx, HeapBaseMinAddress, 2*G);
src/os_cpu/solaris_x86/vm/globals_solaris_x86.hpp
47:define_pd_global(uintx,HeapBaseMinAddress, 256*M);
src/os_cpu/windows_x86/vm/globals_windows_x86.hpp
49:define_pd_global(uintx, HeapBaseMinAddress, 2*G);
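If the Solaris x64 default is what puts the heap in the C heap's way, a plausible (though untested here) experiment is to raise the flag to the 2*G value the other platforms use and see whether the failure goes away:

oaf690# java -d64 -XX:HeapBaseMinAddress=2g -version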
22-03-2013

I thought this mapping was the CDS archive, but it's not: CDS is not on by default for 64-bit.
21-03-2013

Jorge helped me reproduce the problem on oaf657. Thanks! I get similar results to what David saw before: we can start with 78 GC threads but get an OOME with 79 GC threads. I got some pmap output, attached as pmap-oaf657.txt. From that we can see that something has been mapped at address 0x0000000010000000:

0000000010000000       2048    2048    2048       - 2M rw---  [ anon ]

Since the C heap starts at 0x0000000000411000, this means we have only 251 MB of C heap even though the machine has 255 GB of physical memory. So, I think it is safe to say that we allocate a bit too much memory for each GC thread, but the main problem is that the Solaris malloc can only handle consecutive memory, and something maps a region at a fairly low address. I'll see if I can figure out what maps the memory at 0x0000000010000000. If it wasn't for 5 anonymous mappings, there would be a lot of address space for the C heap to use:

000000000FFD3000        168     168     168       - 4K rw---  [ heap ]
0000000010000000       2048    2048    2048       - 2M rw---  [ anon ]
0000000010200000      20480       -       -       - -  rw---  [ anon ]
0000000015200000    1396736       -       -       - -  rw---  [ anon ]
000000055C600000       2048    2048    2048       - 2M rw---  [ anon ]
000000055C800000     698368       -       -       - -  rw---  [ anon ]
FFFF80FF5EFE8000         84      84      84       - 4K rw--R  [ stack tid=84 ]
20-03-2013

pmap output from oaf657
20-03-2013

I have looked a little more at this. Both hs_err files attached here report:

# Native memory allocation (malloc) failed to allocate 1048576 bytes

But there is actually lots of free memory available:

hs_err_pid2616.log: Memory: 4k page, physical 268426132k(249042936k free)
hs_err_pid2709.log: Memory: 4k page, physical 268426132k(34314680k free)

So, this might be a bug/limitation in the Solaris malloc. One limitation in Solaris malloc that I know of is that it can only use one chunk of consecutive memory. If anything is mapped close to where the C heap starts there will be very little C heap to allocate from. That might be the case here, but it is difficult to know without having pmap output from the run. Is there any way I can get access to a machine where this is reproducible? I have not been able to reproduce this on any of the machines I have tried.
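To make the suspected limitation concrete, here is a minimal sketch of my own (assuming a classic sbrk-based malloc, as on Solaris): it plants a small anonymous mapping just above the current break and then counts how much malloc can still hand out. On allocators that fall back to mmap for large requests (e.g. glibc), the cap will not be observed.

// Minimal sketch, assuming a classic sbrk-based malloc: an anonymous
// mapping placed just above the break caps everything malloc can ever
// return, no matter how much physical memory is free.
#include <cstdio>
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

int main() {
  // Plant a single page 64 MB above the current break.
  char* brk_now = static_cast<char*>(sbrk(0));
  void* wall = mmap(brk_now + 64 * 1024 * 1024, 4096,
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0);
  if (wall == MAP_FAILED) { perror("mmap"); return 1; }

  // Allocate 1 MB at a time, as the GC thread setup does, until malloc
  // fails (capped at 8 GB so the loop terminates on mmap-based allocators).
  size_t total = 0;
  while (total < (size_t)8 << 30 && malloc(1048576) != NULL)
    total += 1048576;
  printf("malloc handed out %zu MB before failing\n", total / (1024 * 1024));
  return 0;
}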
20-03-2013

PUBLIC COMMENTS I did some tracing with the fastdebug build, looking at TraceGCTaskManager/Threads/Queue. It shows that all the worker threads get created and started okay, but the memory allocation failures in taskqueue.hpp follow shortly after. I cannot see anything in the source to indicate a potential difference between GCTaskThreads and other threads, so at this stage I have to assume it is something related to the task queue that is somehow 64-bit and x86 specific.
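A back-of-the-envelope check (my assumption: TASKQUEUE_SIZE is 1<<17 on LP64 builds, as in taskqueue.hpp of this vintage, and the element type E is a pointer-sized queue entry) shows why each of these allocations is exactly 1048576 bytes:

// Hedged arithmetic sketch: the size of one GC task queue's backing array.
// Assumes TASKQUEUE_SIZE = 1 << 17 on LP64, per taskqueue.hpp of this era.
#include <cstdio>

int main() {
  const size_t queue_entries = (size_t)1 << 17;   // TASKQUEUE_SIZE on LP64
  const size_t entry_bytes   = sizeof(void*);     // element type E, 8 bytes
  // 131072 entries * 8 bytes = 1048576 bytes -- the exact malloc that fails.
  printf("per-queue allocation: %zu bytes\n", queue_entries * entry_bytes);
  return 0;
}

Multiplied across tens of GC worker threads, these queues alone consume tens of megabytes of C heap.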
20-09-2012

PUBLIC COMMENTS No problem creating 80 GC threads on a sparcv9 S11 system (different S11 version):

~ > /java/re/jdk/7u7/latest/binaries/solaris-sparcv9/bin/java -d64 -XX:ParallelGCThreads=80 -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)

~ > cat /etc/release
  Oracle Solaris 11 11/11 SPARC
  Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
  Assembled 18 October 2011
19-09-2012

EVALUATION The system has 128 processors, which causes the GC to try to create a large number of parallel GC threads. Using -XX:ParallelGCThreads=N I can set N to a maximum of 78, after which the error appears. We should be able to create far more threads than that, but there may be some system configuration setting affecting the default stack size or placement that results in the virtual address space filling up. Need to try this on other large machines using a different OS and/or OS version.

Further, with 78 GC threads I was able to create another 325 Java threads before the C heap was exhausted. With 1 GC thread I was able to create 29850 Java threads. This would seem to indicate that there is a problem with the way the GC threads are defined.
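For context, a sketch of the default sizing heuristic (my understanding of HotSpot's calculation in this era: all CPUs up to 8, plus 5/8 of the CPUs beyond 8):

// Hedged sketch of HotSpot's default ParallelGCThreads heuristic:
// use every CPU up to 8, then 5/8 of the remaining CPUs.
#include <cstdio>

unsigned default_parallel_gc_threads(unsigned ncpus) {
  const unsigned switch_pt = 8;
  if (ncpus <= switch_pt) return ncpus;
  return switch_pt + (ncpus - switch_pt) * 5 / 8;
}

int main() {
  // A 128-way X4800 gets 83 GC threads by default -- above the
  // 78-thread ceiling observed before the allocation failure.
  printf("%u\n", default_parallel_gc_threads(128));
  return 0;
}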
19-09-2012

WORK AROUND Reduce the number of GC threads, e.g. using -XX:ParallelGCThreads=N with N < 79. But there may still be a limit on the number of application threads that can be created.
19-09-2012

EVALUATION The VM is aborting during a simple "java -version", which means the failure is completely unreasonable. However, I am unable to duplicate this problem on another Solaris 11 x86 machine. I am wondering whether this may be specific to a particular drop of Solaris 11 (tested Oracle Solaris 11 11/11 X86) or peculiar to a given kind of hardware. Can you duplicate the issue on any other systems? Also, can you try running the fastdebug VM on the problematic system?
19-09-2012

EVALUATION The VM will always abort if a native memory allocation request fails, as is the case here. The question to answer is whether the ~1MB allocation was unreasonable or not under the operating conditions. The hs_err log may shed some light, but bugster is not letting me access the attachment.
12-09-2012