JDK-6750401 : SSL stress test with GF leads to 32 bit max process size in less than 5 minutes, with PKCS11 provider
  • Type: Bug
  • Component: security-libs
  • Sub-Component: javax.crypto:pkcs11
  • Affected Version: 6,6u14
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,solaris_10
  • CPU: generic,sparc
  • Submitted: 2008-09-19
  • Updated: 2011-04-18
  • Resolved: 2011-04-18
Release   Version      Status
Other     5.0u24-rev   Fixed
Other     5.0u25       Fixed
JDK 6     6u18         Resolved
JDK 7     7 b44        Fixed
Description
Please see issue : https://glassfish.dev.java.net/issues/show_bug.cgi?id=5250 for a detailed discussion
I am copying over the info from the jdk6.dev.java.net issue tracker.  The URL for 5250 is in the first Description entry.  The attachment mentioned has been added to this bug.

---begin---

Please see also Issue 5250 for Glassfish for more background information...

A JVM crash is observed during a simple load test against Glassfish v2 ur b40
running on any of the following JDKs: jdk1.5.0_15, jdk1.6.0_06, jdk1.6.0_10rc
(build28).

When the index.html page is simply requested repeatedly on the SSL-enabled
listener of the Glassfish server, the process size (as reported by prstat)
increases steadily and reaches close to 4 GB (3.8 GB) within 5 minutes.

hs_err_pidxxx.log files and core files are available and can be regenerated as needed.

Most of the logs report one of the following two errors:

#
# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xff390858, pid=9667, tid=201
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode solaris-sparc)
# Problematic frame:
# C  [libc_psr.so.1+0x858]  memcpy+0x450
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#


or 


#
# An unexpected error has been detected by Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 140 bytes for CHeapObj-new. Out of swap space?
#
#  Internal Error (allocation.inline.hpp:42), pid=9244, tid=118
#  Error: CHeapObj-new
#
# Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode solaris-sparc)
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

The crashes, though, always occur when the process is close to the maximum
memory size of a 32-bit process.

T2000 machines were used to run Glassfish and the load generator (Grinder). Both
machines are within the same subnet.

Running the same test against the HTTP listener (non-SSL) doesn't show any
leaks: the process size is very stable (no growth) and the process doesn't crash.

We've already involved the Glassfish team (see issue 5250) and after
investigation they recommended filing a bug against the JVM since all error
messages seem to point to a C heap corruption.

This is an urgent issue for the OpenSSO team.

Thanks,
N.

------- Additional comments from nphilipp Fri Aug 22 19:16:07 +0000 2008 -------

Created an attachment (id=6)
Tar file of all the hs_err_pid***.log files generated during testing

------- Additional comments from nphilipp Fri Aug 22 19:18:06 +0000 2008 -------

I also created a bug report earlier at bugreport.sun.com , just FYI.

Review ID: 1324946

---end---

I'm not sure what bug was filed at bugreport.sun.com.

Comments
EVALUATION The main problem was that 128 threads were creating objects quickly while only one Finalizer thread was running. As a result, a large number of Java objects backed by native heap memory were waiting for finalization, and that memory was effectively "leaking". In addition, the JSSE ciphers never called Cipher.doFinal(), so the Cipher objects had to cancel their PKCS11-backed cipher operations, which was also expensive and added to the finalizer backlog. We converted the P11Keys to use WeakReferences and added the doFinal() calls in JSSE and in P11SecretKeyFactory to remove the final cancellation, and that took care of the finalization backlog. A small amount of leaking still shows up and is under investigation. I put in some code to trace the PKCS11 native malloc/free calls, and they all match up. Wherever this new leak is, it is not from the PKCS11 native calls.
26-11-2008
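
For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of releasing a native-backed resource through a WeakReference and a ReferenceQueue instead of a finalizer. It is not the actual P11Key change: the class NativeHandleCleanup, the HandleRef type, the liveHandles counter, and drainQueue() are all hypothetical names, and the "native free" is simulated by decrementing a counter.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; names do not come from the JDK sources.
public class NativeHandleCleanup {
    // Stand-in for the number of native allocations still outstanding.
    static final AtomicInteger liveHandles = new AtomicInteger();

    // The garbage collector enqueues a HandleRef here once its owner is unreachable.
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    // Keeps the HandleRef objects themselves strongly reachable until released.
    static final Set<HandleRef> REFS = Collections.synchronizedSet(new HashSet<HandleRef>());

    static final class HandleRef extends WeakReference<Object> {
        private final long handle; // assumed native handle value

        HandleRef(Object owner, long handle) {
            super(owner, QUEUE);
            this.handle = handle;
            REFS.add(this);
            liveHandles.incrementAndGet();
        }

        // Releases the simulated native resource at most once.
        void release() {
            if (REFS.remove(this)) {
                liveHandles.decrementAndGet(); // a real provider would free 'handle' here
            }
        }
    }

    // Called from ordinary worker threads (for example, whenever a new key is
    // created), so cleanup does not depend on the single Finalizer thread.
    static int drainQueue() {
        int released = 0;
        HandleRef ref;
        while ((ref = (HandleRef) QUEUE.poll()) != null) {
            ref.release();
            released++;
        }
        return released;
    }

    public static void main(String[] args) {
        Object owner = new Object();
        HandleRef ref = new HandleRef(owner, 42L);
        // Simulate the collector discovering that 'owner' is unreachable.
        ref.enqueue();
        int released = drainQueue();
        System.out.println("released=" + released + " live=" + liveHandles.get());
    }
}
```

The design point is that any thread can drain the queue, so a burst of allocations from many threads is matched by cleanup work on those same threads rather than queuing behind one finalizer.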

EVALUATION Much of the IAIK code does not check whether malloc() returned NULL, which will lead to SIGSEGVs when memory runs out. This needs to be fixed, and will be a fair amount of work. The first example, hs_err_pid6293.log, clearly shows a null pointer being dereferenced following a call to wrapper_PKCS11_C_1EncryptUpdate, resulting in a SIGSEGV. This is not the root cause of the memory leak, but it is definitely a problem that needs to be fixed.
21-10-2008