JDK-7133330 : LDAP/SSL Performance degradation in JDK 1.6.0_29/30
  • Type: Bug
  • Component: security-libs
  • Sub-Component: javax.net.ssl
  • Affected Version: 6u30,7
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_10,windows_7
  • CPU: x86,sparc
  • Submitted: 2012-01-25
  • Updated: 2012-04-12
  • Resolved: 2012-04-12
Related Reports
Duplicate : JDK-7157903
Description
I am making minor corrections to the original submission.  I confirmed with the submitter that he meant to say that this last worked in 6u27; in a couple of places he had written 6u29.  Marking the regression as 6u27.


FULL PRODUCT VERSION :
java version "1.6.0_30"
Java(TM) SE Runtime Environment (build 1.6.0_30-b12)
Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
This seems to affect us on every OS we have tested it on.  Here are the specific operating systems we have tried:
Windows 7:
Microsoft Windows [Version 6.1.7601]

Red Hat Linux:
Linux <deleted> 2.6.18-274.17.1.el5 #1 SMP Wed Jan 4 22:45:44 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

Mac:
Darwin <deleted> 11.2.0 Darwin Kernel Version 11.2.0: Tue Aug  9 20:54:00 PDT 2011; root:xnu-1699.24.8~1/RELEASE_X86_64 x86_64 i386

EXTRA RELEVANT SYSTEM CONFIGURATION :
This problem occurs on a plain installation of the 1.6.0_29 or 1.6.0_30 version of the JDK.

A DESCRIPTION OF THE PROBLEM :
We have a library which many of our Java applications use to connect via LDAP to retrieve information from Active Directory.  Using any JDK after 1.6.0_27 causes a severe performance hit on every query against Active Directory.  In JDK versions up to 1.6.0_27 these queries would take a few milliseconds at most, but they now take 200 ms or more per query.  The slowness appears to occur during the reads and writes to the secure socket.

Here is a small snippet of code we use to initialize the InitialDirContext which we use to query ADS:

Hashtable<String, String> env = new Hashtable<String, String>();
env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
env.put("com.sun.jndi.ldap.connect.timeout", "5000");
env.put(Context.PROVIDER_URL, "...");
env.put(Context.SECURITY_AUTHENTICATION, "simple");
env.put(Context.SECURITY_PRINCIPAL, "...");
env.put(Context.SECURITY_CREDENTIALS, "...");
env.put(Context.SECURITY_PROTOCOL, "ssl");
return new InitialDirContext(env);

Note: If we set the security protocol to "plain" the connection is as fast as it used to be, though insecure.
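
A minimal, self-contained timing sketch based on the snippet above can make the per-query latency visible. The provider URL, principal, credentials, base DN, and attribute names below are hypothetical placeholders, not values from the submitter's environment.

import java.util.Hashtable;

import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapSslTiming {
    public static void main(String[] args) throws Exception {
        // Same environment as the snippet above, with hypothetical values
        // filled in for the placeholders.
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put("com.sun.jndi.ldap.connect.timeout", "5000");
        env.put(Context.PROVIDER_URL, "ldap://ads.example.com:636");   // hypothetical
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "user@example.com");       // hypothetical
        env.put(Context.SECURITY_CREDENTIALS, "secret");               // hypothetical
        env.put(Context.SECURITY_PROTOCOL, "ssl");

        DirContext ctx = new InitialDirContext(env);
        try {
            // Time a handful of attribute lookups to expose per-query latency.
            for (int i = 0; i < 10; i++) {
                long start = System.nanoTime();
                Attributes attrs = ctx.getAttributes(
                        "CN=Some User,CN=Users,DC=example,DC=com",     // hypothetical DN
                        new String[] { "cn", "mail" });
                long elapsedMs = (System.nanoTime() - start) / 1000000L;
                System.out.println("query " + i + ": " + elapsedMs + " ms, cn=" + attrs.get("cn"));
            }
        } finally {
            ctx.close();
        }
    }
}

On an affected JDK a loop like this would be expected to report roughly 200 ms per lookup over SSL, versus a few milliseconds on 6u27 or with a plain connection.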

If there is any more information which I can provide to help you debug this issue please let me know.

REGRESSION.  Last worked in version 6u27

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Initialize DirContext to connect to LDAP (see description for the properties we are setting for the connection)
2. Query for attributes

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Querying an LDAP connection which uses the SSL protocol should take a few milliseconds at most.
ACTUAL -
During one test case the queries we run against LDAP are taking somewhere around 200 ms per query.  This causes a common use case for the library we use to connect to LDAP to take around 50 seconds to return when it would take around one second with any JDK before 1.6.0_29.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
There are no error messages

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
We can supply this if necessary but the library that generates this issue contains several thousand lines of code so it's too large to paste here.
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
We have two potential workarounds, though neither is a good fix for this issue:
1.  Use a plain, unencrypted connection to LDAP.
2.  Copy the jre/lib/jsse.jar from the 1.6.0_27 JDK into the 1.6.0_30 installation; this appears to fix the issue.

Comments
EVALUATION Turned out that this is most likely a duplicate of 7157903. Nagle's algorithm at the kernel level was being applied to the small packet, and this was hurting performance. The fix just went into 7u6/8 and will be pushed into earlier releases soon. We hope to hear from the submitter whether the proposed fix indeed resolves the problem.
12-04-2012
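
As background on the Nagle interaction described in the evaluation above, the sketch below only illustrates how Nagle's algorithm is disabled on a plain java.net.Socket via TCP_NODELAY; it is not the actual 7u6 fix (the JNDI LDAP provider does not expose its socket directly), and the host, port, and byte sizes are illustrative stand-ins for the split records seen in the debug logs.

import java.io.OutputStream;
import java.net.Socket;

public class NagleIllustration {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket("ldap.example.com", 636); // hypothetical endpoint
        try {
            // Disable Nagle's algorithm so a small write (such as the 37-byte
            // leading record produced by the 1/n-1 CBC split) is sent
            // immediately instead of being held back to coalesce with later data.
            s.setTcpNoDelay(true);

            OutputStream out = s.getOutputStream();
            out.write(new byte[37]);  // small leading packet, as in the split
            out.write(new byte[261]); // remainder of the record
            out.flush();
        } finally {
            s.close();
        }
    }
}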

EVALUATION Microsoft has been notified and is investigating. They don't have any other reported instances of this, but they may have other LDAP latency issues that could be related. No MS bug number has been created yet, but the Oracle support case number is in the comments.
23-02-2012

EVALUATION We will try to notify Microsoft regarding this issue.
16-02-2012

EVALUATION
$ java -Djavax.net.debug=all -agentlib:hprof=cpu=times,monitor=y,file=hprof_true.txt XXX
$ java -Djavax.net.debug=all -agentlib:hprof=cpu=times,monitor=y,file=hprof_false.txt -Djsse.enableCBCProtection=false XXX
16-02-2012

EVALUATION Customer was able to add timestamping to the debug log information as described in http://sim.ivi.co/2009/05/these-days-i-was-asked-about-strange.html and this helped immensely. Customer was able to supply much debug information:

====
I modified the library to always connect to the same IP so that the results should be the same across multiple runs. I also added the timestamp logging as you requested and ran the test four times. Please see the zip for the files containing the results. Please note that this is still multithreaded, but since the thread ID is logged on each line that seems like it might be less of an issue. If you would like me to continue working on making this single-threaded, please let me know.

Here are the contents of the zip:
* 6u30-CBC-true.txt, 6u30-CBC-false.txt: Runs with the CBC flag, with thread ID and timestamp logging
* 6u30-CBC-true-hprof.txt, 6u30-CBC-false-hprof.txt: Same as the previous test but with the "-agentlib:hprof=cpu=times" flag set
* hprof_true.txt, hprof_false.txt: Output from the tests with the "-agentlib:hprof=cpu=times" flag set
====

The Responsible Engineer looked at the debug information and responded:

====
Thanks for the help. The log with timestamp and thread ID is helpful. By comparing the two logs, 6u30-CBC-false and 6u30-CBC-true, we found that:
1. There is no visible performance degradation in the local TLS operations, including encryption/decryption and writing network bytes.
2. There is a significant performance degradation while reading network bytes from the server. With no CBC protection, the read completes immediately after the request is sent (that is, after the local write). However, with CBC protection, the read may take 100 to 200 milliseconds after the request is sent.

This indicates to me that the server is able to handle the split records (the approach used for CBC protection), but it may not be able to handle them efficiently. Can you debug the server and analyze the performance bottleneck? Or you may want to contact the vendor of ADS with a query.
====

I added the following:

====
Thread 1 is doing all the writing, and Thread 11 is doing all the reading. Fortunately, the interaction between the two handshakes didn't continue through the whole log, so there was no need to make the app single-threaded. Assuming that your app follows the standard request/response pattern, and that there are not multiple requests in flight at any one time, I'd say I'm seeing the same things as the Responsible Engineer. (This seems to be the case when looking at the actual raw data below.)

Here are some representative samples, using:

% grep "Raw [write|read]" 6u30-CBC-true.txt

No Protection (false):
======================
<Thread Id: 1><Timestamp: 1329322130291> [Raw write]: length = 245
<Thread Id: 11><Timestamp: 1329322130291> [Raw read]: length = 5
<Thread Id: 11><Timestamp: 1329322130291> [Raw read]: length = 368
<Thread Id: 1><Timestamp: 1329322130291> [Raw write]: length = 245
<Thread Id: 11><Timestamp: 1329322130301> [Raw read]: length = 5
<Thread Id: 11><Timestamp: 1329322130301> [Raw read]: length = 368

Protection (true):
==================
<Thread Id: 1><Timestamp: 1329322239224> [Raw write]: length = 37
<Thread Id: 1><Timestamp: 1329322239224> [Raw write]: length = 261
<Thread Id: 11><Timestamp: 1329322239427> [Raw read]: length = 5
<Thread Id: 11><Timestamp: 1329322239442> [Raw read]: length = 752
<Thread Id: 1><Timestamp: 1329322239442> [Raw write]: length = 37
<Thread Id: 1><Timestamp: 1329322239442> [Raw write]: length = 261
<Thread Id: 11><Timestamp: 1329322239645> [Raw read]: length = 5
<Thread Id: 11><Timestamp: 1329322239645> [Raw read]: length = 416

I'm also seeing responses in the 200 ms range with protection enabled.

> This indicates to me that the server is able to handle the split
> records (the approach used for CBC protection), but it may not be
> able to handle them efficiently.

I had the same idea before seeing this data; I'm thinking that the server is just not happy with the format, but will still process it. I did a side-by-side diff (with highlighting) between the two files (6u30-CBC-true.txt / 6u30-CBC-false.txt), and I did not see any difference in the inbound application data for approximately the last 20 records at the end of the files. I think the inbound data is the same in both cases.
====

This strongly indicates that the problem is with the server's ability to handle split packets, and we are thus closing the bug.
16-02-2012
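
The timestamping approach mentioned in the evaluation above can be sketched roughly as follows, assuming the debug output is produced through the standard streams' println calls; which stream JSSE writes to varies by JDK release, so both are wrapped here, and some debug lines written via other print methods may not be prefixed.

import java.io.PrintStream;

public class TimestampedDebug {

    // Wrap a stream so that each println'd line carries a thread ID and a
    // millisecond timestamp, mirroring the 6u30-CBC-*.txt log format above.
    static PrintStream stamped(final PrintStream delegate) {
        return new PrintStream(delegate, true) {
            @Override
            public void println(String line) {
                super.println("<Thread Id: " + Thread.currentThread().getId()
                        + "><Timestamp: " + System.currentTimeMillis() + "> " + line);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        System.setOut(stamped(System.out));
        System.setErr(stamped(System.err));

        // ... run the LDAP test here with -Djavax.net.debug=all on the
        //     command line; the debug lines then carry the prefixes ...
    }
}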

EVALUATION We received some preliminary information from the submitter, but it contained two interleaved connections, which made analysis difficult. Asked the submitter for debug information from a single connection. Some additional details:

========================
I've been unable to get any kind of delay like what you've been seeing in either the JDK 6u30 or JDK 8 code, in which the changed code is almost identical. I'm really wondering if the Active Directory Server is having problems parsing the split.

I'm using the SSLSocketClientOptTunneling program (see my home directory/tmp), which connects to a web server and does a "G" + "ET / HTTP/1.0<cr><lf>etc...". When writing out the packets in repeated runs of the app, I'm getting values which seem strongly dependent on the load of the systems. Basically, I'm getting some results which take longer for the second packet, some which are shorter, and many in the same range +/- 5%. For example (in nanos):

Time to write: 170965
Time to write: 167338

If there were a big problem, the second value would be consistently and substantially bigger.
========================
15-02-2012

EVALUATION An email was sent to the user asking for more information. Portions are reproduced here:

Can you please do us a favor and try setting the following System Property before running your test? Please *ONLY* use JDK 6u30, do *NOT* use 6u29.

% java -Djsse.enableCBCProtection=false YourApp

If you have the ability to easily test, can you also run with 7u2 and let us know if you see similar degradation? If not, don't worry about it.

If this property makes a difference, we'd really like to get the debug output for both the successful and failed runs. Can you please add the debugging system property invocation and capture the debugging output?

% java \
    -Djavax.net.debug=all -Djsse.enableCBCProtection=false \
    YourApp 2>&1 | tee CBC-false

% java \
    -Djavax.net.debug=all -Djsse.enableCBCProtection=true \
    YourApp 2>&1 | tee CBC-true

We made some changes to the way data is output that would add a small amount of overhead, but there should be nothing on the order of what you're seeing. In your report you also said that this last worked in 6u29; I'm assuming that's a typo?

Regarding the forum thread https://forums.oracle.com/forums/thread.jspa?threadID=2332936&tstart=0 user9158206's post makes it clear that the LDAP response in 6u27 vs. 6u29/30 is different. We've seen a lot of issues where server code does not properly accept the split packets, what we are calling "1/n-1 intolerance". I wonder if the server's read of the split packet and subsequent response is causing the degradation and/or errors that may not be noticed. It's very interesting that the exact same data written would cause such a difference in response size:

Thread-42, READ: TLSv1 Application Data, length = 16368
Thread-42, READ: TLSv1 Application Data, length = 16368
Thread-42, READ: TLSv1 Application Data, length = *5696*

to:

Thread-270, READ: TLSv1 Application Data, length = 16368
Thread-270, READ: TLSv1 Application Data, length = 16368
Thread-270, READ: TLSv1 Application Data, length = *11920*

That is why we'd like to get the debug output above.

A couple more bits of debug information might help. What is the LDAP server you are connecting to? Is it also an MS Active Directory domain, as the user described? Do you know if it, and the machine itself, is up to date with the January 2012 Patch Tuesday? Do you have other non-Java MS applications that are also seeing this degradation?
14-02-2012
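
For completeness, the jsse.enableCBCProtection toggle exercised in the email above can also be set programmatically rather than on the command line. Whether the property is read only once when the JSSE record layer initializes is an assumption here, so the -D form used in the email remains the safer way to flip it.

public class CbcProtectionToggle {
    public static void main(String[] args) throws Exception {
        // Equivalent in intent to: java -Djsse.enableCBCProtection=false YourApp
        // Assumption: the property appears to be consulted when the JSSE
        // record layer initializes, so set it before the JVM opens its first
        // SSL/TLS connection.
        System.setProperty("jsse.enableCBCProtection", "false");

        // ... build the JNDI environment and run the LDAP queries here ...
    }
}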