JDK-7197666 : java -d64 -version core dumps in a box with lots of memory

Details
Type:
Bug
Submit Date:
2012-09-11
Status:
Resolved
Updated Date:
2013-05-28
Project Name:
JDK
Resolved Date:
2013-04-08
Component:
hotspot
OS:
solaris_11
Sub-Component:
gc
CPU:
x86
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs23,hs24,hs25,7u7
Fixed Versions:
hs25 (b28)

Description
System description:
X4800 oaf657 128 x 2267MHz 262135 Megabytes

oaf657# java -d64 -version 2>/dev/stdout
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /HUDSON/workspace/jdk7u7-2-build-solaris-amd64-product/jdk7u7/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /export/bench/autobench/5.x/Results/hs_err_pid7393.log

oaf657# java -version 2>/dev/stdout
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) Server VM (build 23.3-b01, mixed mode)

oaf657# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rpool/swap 274,2        4K     218G     218G

oaf657# echo ::memstat |mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    4236512             16548    6%
ZFS File Data              269796              1053    0%
Anon                     48217526            188349   72%
Exec and libs               54653               213    0%
Page cache                   6824                26    0%
Free (cachelist)            32705               127    0%
Free (freelist)          14288517             55814   21%
Total                    67106533            262134
Using the 1.6.0 release installed on the box, the problem does not occur:

oaf657# ./java -d64 -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode) 

However, with 1.7.0_05 installed, it fails:

oaf657# ./java -version
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b06)
Java HotSpot(TM) Server VM (build 23.1-b03, mixed mode)

oaf657# ./java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /export/HUDSON/workspace/jdk7u5-2-build-solaris-amd64-product/jdk7u5/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/7u5/solaris-amd64/bin/hs_err_pid1353.log

I'm copying over older updates and will update this bug once I have more information.
The problem starts in 7u4.

7u3:

oaf657# ./java -version
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
Java HotSpot(TM) Server VM (build 22.1-b02, mixed mode)

oaf657# ./java -d64 -version
java version "1.7.0_03"
Java(TM) SE Runtime Environment (build 1.7.0_03-b04)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)

7u4:

oaf657# ./java -version
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
Java HotSpot(TM) Server VM (build 23.0-b21, mixed mode)

oaf657# ./java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /export/HUDSON/workspace/jdk7u4-2-build-solaris-amd64-product/jdk7u4/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/7u4/solaris-amd64/bin/hs_err_pid2894.log
So far I haven't found this problem anywhere else.

oaf657# echo ::memstat |mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    3938332             15384    6%
ZFS File Data              237030               925    0%
Anon                        54886               214    0%
Exec and libs                1215                 4    0%
Page cache                   4649                18    0%
Free (cachelist)           120686               471    0%
Free (freelist)          62749735            245116   94%
Total                    67106533            262134

oaf657# ./java -d64 -version
java version "1.7.0_03-fastdebug"
Java(TM) SE Runtime Environment (build 1.7.0_03-fastdebug-b04)
Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02-fastdebug, mixed mode)

oaf657# ./java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /export/HUDSON/workspace/jdk7u4-2-build-solaris-amd64-product/jdk7u4/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/7u4/solaris-amd64/fastdebug/bin/hs_err_pid2616.log

I've attached the log file.
I've just reproduced the same problem in another x4800:

X4800 oaf690 128 x 2267MHz 524279 Megabytes

oaf690# java -d64 -version
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /HUDSON/workspace/jdk7u7-2-build-solaris-amd64-product/jdk7u7/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/hs_err_pid20757.log

Both boxes are running s11u1_24a.

                                    

Comments
EVALUATION

The VM will always abort if a native memory allocation request fails, as is the case here. The question to answer is whether the ~1 MB allocation was unreasonable under the operating conditions. The hs_err log may shed some light, but Bugster is not letting me access the attachment.
                                     
2012-09-12
EVALUATION

The VM is aborting while doing a simple "java -version", which means the failure is completely unreasonable.

However, I am unable to duplicate this problem on another Solaris 11 x86 machine. I am wondering whether this may be specific to a particular drop of Solaris 11 (tested Oracle Solaris 11 11/11 x86) or peculiar to a given kind of hardware. Can you duplicate the issue on any other systems?

Also, can you try running the fastdebug VM on the problematic system?
                                     
2012-09-19
WORK AROUND

Reduce the number of GC threads, e.g. using -XX:ParallelGCThreads=N with N < 79.

But there may still be a limit on the number of application threads that can be created.
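A concrete invocation of the workaround (the thread count is the threshold observed on this particular box and may differ elsewhere):

```shell
# Cap parallel GC threads below the observed failure threshold
# (78 worked, 79 failed on the machine in this report).
java -d64 -XX:ParallelGCThreads=78 -version
```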
                                     
2012-09-19
EVALUATION

The system has 128 processors, which causes the GC to try to create a large number of parallel GC threads. Using -XX:ParallelGCThreads=N I can set N to a maximum of 78, after which the error appears.

Now we should be able to create a lot more threads than that, but there may be some system configuration setting that is affecting the default stack size or placement, that results in the virtual address space filling up. Need to try this on other large machines using different OS and/or OS version.

Further with 78 GC threads I was able to create another 325 Java threads before C-Heap was exhausted. Then with 1 GC thread I was able to create 29850 Java threads. This would seem to indicate that there is a problem with the way the GC threads are defined.
                                     
2012-09-19
PUBLIC COMMENTS

No problem creating 80 GC threads on a sparcv9 S11 system (different S11 version)

 ~ > /java/re/jdk/7u7/latest/binaries/solaris-sparcv9/bin/java -d64 -XX:ParallelGCThreads=80 -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b10)
Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode)
 ~ > cat /etc/release
                          Oracle Solaris 11 11/11 SPARC
  Copyright (c) 1983, 2011, Oracle and/or its affiliates.  All rights reserved.
                            Assembled 18 October 2011
                                     
2012-09-19
PUBLIC COMMENTS

I did some tracing with the fastdebug build, looking at TraceGCTaskManager/Threads/Queue. It shows that all the worker threads get created and started okay, but the memory allocation failure follows shortly after, in taskqueue.hpp.

I cannot see anything in the source to indicate a potential difference between GCTaskThreads and other threads. So at this stage I have to assume it is something related to the task queue that is somehow specific to 64-bit x86.
                                     
2012-09-20
I have looked a little more at this. Both hs_err files attached here report:

# Native memory allocation (malloc) failed to allocate 1048576 bytes

But there is actually lots of free memory available:

hs_err_pid2616.log: Memory: 4k page, physical 268426132k(249042936k free)
hs_err_pid2709.log: Memory: 4k page, physical 268426132k(34314680k free)

So, this might be a bug or limitation in the Solaris malloc. One limitation in Solaris malloc that I know of is that it can only use one contiguous chunk of memory. If anything is mapped close to where the C heap starts, there will be very little C heap to allocate from. That might be the case here, but it is difficult to know without pmap output from the run.

Is there any way I can get access to a machine where this is reproducible? I have not been able to reproduce this on any of the machines I have tried.
                                     
2013-03-20
pmap output from oaf657
                                     
2013-03-20
Jorge helped me reproduce the problem on oaf657. Thanks!

I get results similar to David's earlier ones. We can start with 78 GC threads but get an OOME with 79 GC threads. I got some pmap output, attached as pmap-oaf657.txt.

From that we can see that something has been mapped at address 0x0000000010000000:

0000000010000000       2048       2048       2048          -   2M rw---    [ anon ]

Since the C heap starts at 0x0000000000411000, this means we only have about 251 MB of C heap even though the machine has 255 GB of physical memory.
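The arithmetic can be checked directly in a shell (addresses taken from the pmap output above):

```shell
# Address space available to the brk-based C heap below the 256 MB mapping:
printf '%d MB\n' $(( (0x10000000 - 0x411000) / 1024 / 1024 ))
# prints "251 MB"
```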

So, I think it is safe to say that we allocate a bit too much memory per GC thread, but the main problem is that Solaris malloc can only handle contiguous memory and something maps a region at a fairly low address.

I'll see if I can figure out who maps the memory at 0x0000000010000000. 

If it weren't for these five anonymous mappings, there would be a lot of address space for the C heap to use:

000000000FFD3000        168        168        168          -   4K rw---    [ heap ]
0000000010000000       2048       2048       2048          -   2M rw---    [ anon ]
0000000010200000      20480          -          -          -    - rw---    [ anon ]
0000000015200000    1396736          -          -          -    - rw---    [ anon ]
000000055C600000       2048       2048       2048          -   2M rw---    [ anon ]
000000055C800000     698368          -          -          -    - rw---    [ anon ]
FFFF80FF5EFE8000         84         84         84          -   4K rw--R    [ stack tid=84 ]

                                     
2013-03-20
I thought this was the CDS archive, but it is not: the archive is not mapped because CDS is off by default for 64-bit.
                                     
2013-03-21
Using DTrace I can see that it is actually the Java heap that gets mapped at 0x0000000010000000:

oaf690# dtrace -n 'pid$target::mmap:return /arg1 == 0x10000000/ { ustack(); }' -c 'java -d64 -version'
dtrace: description 'pid$target::mmap:return ' matched 2 probes
dtrace: buffer size lowered to 2m
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1048576 bytes for E in /HUDSON/workspace/jdk7u17-2-build-solaris-amd64-product/jdk7u17/hotspot/src/share/vm/utilities/taskqueue.hpp
# An error report file with more information is saved as:
# /root/hs_err_pid11051.log
dtrace: pid 11051 has exited
 CPU     ID                    FUNCTION:NAME
  90  83063                      mmap:return 
              libc.so.1`mmap+0x10
              libjvm.so`__1cRReservedHeapSpace2t5B6MLLLLpc_v_+0xcc
              libjvm.so`__1cUParallelScavengeHeapKinitialize6M_i_+0x206
              libjvm.so`__1cIUniversePinitialize_heap6F_i_+0x1a9
              libjvm.so`__1cNuniverse_init6F_i_+0xd0
              libjvm.so`__1cMinit_globals6F_i_+0x82
              libjvm.so`__1cHThreadsJcreate_vm6FpnOJavaVMInitArgs_pb_i_+0x1a4
              libjvm.so`JNI_CreateJavaVM+0x6c
              libjli.so`JavaMain+0x144
              libc.so.1`_thrp_setup+0xa5
              libc.so.1`_lwp_start

  90  83063                      mmap:return 
              libc.so.1`mmap+0x10
              libjvm.so`__1cOPSVirtualSpaceJexpand_by6ML_b_+0x43
              libjvm.so`__1cIPSOldGen2t5B6MnNReservedSpace_LLLLpkci_v_+0x105
              libjvm.so`__1cJPSPermGen2t5B6MnNReservedSpace_LLLLpkci_v_+0x50
              libjvm.so`__1cUParallelScavengeHeapKinitialize6M_i_+0x6af
              libjvm.so`__1cIUniversePinitialize_heap6F_i_+0x1a9
              libjvm.so`__1cNuniverse_init6F_i_+0xd0
              libjvm.so`__1cMinit_globals6F_i_+0x82
              libjvm.so`__1cHThreadsJcreate_vm6FpnOJavaVMInitArgs_pb_i_+0x1a4
              libjvm.so`JNI_CreateJavaVM+0x6c
              libjli.so`JavaMain+0x144
              libc.so.1`_thrp_setup+0xa5
              libc.so.1`_lwp_start


Now that I know what to look for I can see it in the hs_err file as well:

 PSPermGen       total 22528K, used 766K [0x0000000010000000, 0x0000000011600000, 0x0000000015200000)
  object space 22528K, 3% used [0x0000000010000000,0x00000000100bfb48,0x0000000011600000)

PermGen starts at 0x0000000010000000.

This also explains why this happens only on Solaris x64. The Java heap base address is taken from HeapBaseMinAddress, a platform-dependent flag, and Solaris x64 is the only platform where it starts at 256M:

src/closed/os_cpu/linux_arm/vm/globals_linux_arm.hpp
23:define_pd_global(uintx,HeapBaseMinAddress,       2*G);

src/closed/os_cpu/linux_ppc/vm/globals_linux_ppc.hpp
23:define_pd_global(uintx,HeapBaseMinAddress,       2*G);

src/os_cpu/bsd_x86/vm/globals_bsd_x86.hpp
50:define_pd_global(uintx, HeapBaseMinAddress,      2*G);

src/os_cpu/bsd_zero/vm/globals_bsd_zero.hpp
45:define_pd_global(uintx, HeapBaseMinAddress,      2*G);

src/os_cpu/linux_sparc/vm/globals_linux_sparc.hpp
37:define_pd_global(uintx, HeapBaseMinAddress,      CONST64(4)*G);

src/os_cpu/linux_x86/vm/globals_linux_x86.hpp
48:define_pd_global(uintx,HeapBaseMinAddress,       2*G);

src/os_cpu/linux_zero/vm/globals_linux_zero.hpp
45:define_pd_global(uintx, HeapBaseMinAddress,      2*G);

src/os_cpu/solaris_sparc/vm/globals_solaris_sparc.hpp
38:define_pd_global(uintx, HeapBaseMinAddress,      CONST64(4)*G);
40:define_pd_global(uintx, HeapBaseMinAddress,      2*G);

src/os_cpu/solaris_x86/vm/globals_solaris_x86.hpp
47:define_pd_global(uintx,HeapBaseMinAddress,       256*M);

src/os_cpu/windows_x86/vm/globals_windows_x86.hpp
49:define_pd_global(uintx, HeapBaseMinAddress,      2*G);
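Since HeapBaseMinAddress is a product flag, a plausible mitigation (not verified on the failing box) is to raise it on the command line so the heap mapping lands well above the C heap, matching the other platforms' defaults:

```shell
# Hedged sketch: lift the Java heap base to 2 GB (the default on most other
# platforms) so the heap mapping no longer caps the brk-based C heap.
java -d64 -XX:HeapBaseMinAddress=2g -version
```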

                                     
2013-03-22
Twelve except* tests failed in PIT on 2013-03-23 because of this issue:

RULE nsk/stress/except/except001 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except002 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except003 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except004 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except005 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except006 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except007 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except008 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except009 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except010 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except011 Crash Out of Memory Error ...allocation.i... 
RULE nsk/stress/except/except012 Crash Out of Memory Error ...allocation.i... 

http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=190312.JAVASE.PIT.VM-292#nsk.stress%20[!desktop]%20%28tonga%29_nsk/stress/except/except001

It failed on 18 March as well:
http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=187233.JAVASE.PROMOTION.VM-471&show-limit=2000&filter=
                                     
2013-03-25
Test gc/startup_warnings/TestG1.java failed in PIT of hs25-b24.
Run details: http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=190544.JAVASE.PIT.VM-274#JT_HS%20%28jtreg%29_gc/startup_warnings/TestG1.java

RULE gc/startup_warnings/TestG1.java Crash Out of Memory Error ...allocation.inline.hpp...

#  Out of Memory Error (/opt/jprt/T/P1/144305.hseigel/s/src/share/vm/memory/allocation.inline.hpp:61), pid=8572, tid=2
                                     
2013-03-25
RULE nsk/monitoring/ThreadMXBean/GetThreadAllocatedBytes/doubleAllocationTest_proxy_custom_array MemError Could not reserve enough space for ... 
                                     
2013-03-27
URL:   http://hg.openjdk.java.net/hsx/hotspot-gc/hotspot/rev/83f27710f5f7
User:  brutisso
Date:  2013-04-08 08:33:26 +0000

                                     
2013-04-08
URL:   http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/83f27710f5f7
User:  amurillo
Date:  2013-04-12 10:29:37 +0000

                                     
2013-04-12


