JDK-8245551 : Distributed TLS Sessions
  • Type: JEP
  • Component: security-libs
  • Sub-Component: javax.net.ssl
  • Priority: P2
  • Status: Draft
  • Resolution: Unresolved
  • Submitted: 2020-05-21
  • Updated: 2020-11-30
Sub Tasks
JDK-8245576 :  
JDK-8255222 :  
Description
Summary
-------
Improve the scalability of the TLS implementation by adding support for efficiently distributing and resuming TLS sessions across clusters of computers.

Goals
-----
- Improve the scalability and throughput of the TLS implementation.

Success Metrics
---------------
- Improve the performance of the SunJSSE provider for a multi-node cluster by 20%.
- Improve the performance of the SunJSSE provider for a single-node server by 5%.

Motivation
----------
Negotiating session parameters for TLS (in a full handshake) is expensive.  Since clients frequently reconnect to the same server, TLS already supports efficiently reusing session credentials from a previous session between the same client/server.  We wish to extend this benefit to reusing session credentials from a previous connection between the same client _and an entire cluster_, which will decrease server costs and increase application responsiveness.  

Description
----------
In order to increase capacity (the number of concurrent users) and reliability, an application can be deployed on a cluster of servers, where network connections and traffic to the application are distributed across the cluster.  The servers could be located in different locations, on different networks, or use different cloud VMs, containers, or other kinds of nodes.  Distributed computation improves overall performance and reliability by decreasing the burden and dependency on an individual server in the system.  Ideally, any server can be unplugged at runtime for replacement or upgrading, and new servers can be plugged in to extend the capacity.

A TLS connection is established via TLS handshaking.  For an initial connection, the client and server negotiate the security parameters and then establish the security channel.  The negotiation process of the security parameters is called a _full handshake_.  Since many cryptographic operations are involved, the full handshake is costly.  Fortunately, the negotiated parameters, which are also called _session data_, can be retained and reused for subsequent connections.  The process of reusing the negotiated parameters is called an _abbreviated handshake_, or _session resumption_.  Per [this research](https://blog.cloudflare.com/tls-session-resumption-full-speed-and-secure/), the overall cost of session resumption is 50% less than the full handshake, and the CPU cost is almost negligible (less than 5%) compared to the full handshake.
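
For reference, when a single server retains session data locally, it is held in the server-side session cache of the `SSLContext`, which the standard `javax.net.ssl` API lets an application size and expire. The following is a minimal sketch of that existing single-node configuration, not part of this proposal. The class and method names (`SessionCacheSketch`, `newContext`) are hypothetical, and passing `null` to `init` is assumed to be acceptable for illustration only.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;
import java.security.GeneralSecurityException;

/** Hypothetical helper: configures the local (per-JVM) server session cache. */
final class SessionCacheSketch {

    /** Returns a TLS context whose server-side session cache uses the given limits. */
    static SSLContext newContext(int cacheSize, int timeoutSeconds) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            // null arguments select the provider's default key managers,
            // trust managers, and source of randomness.
            ctx.init(null, null, null);

            // This cache is local to one JVM; other cluster nodes do not see it.
            SSLSessionContext serverSessions = ctx.getServerSessionContext();
            serverSessions.setSessionCacheSize(cacheSize);     // max cached sessions
            serverSessions.setSessionTimeout(timeoutSeconds);  // lifetime in seconds
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS context unavailable", e);
        }
    }
}
```

Because this cache lives inside one JVM, a client that reconnects to a different node finds no cached session there and falls back to a full handshake.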

We wish to extend the benefit of session resumption from connections between the same client and server to connections between the same client _and an entire cluster_.

### Define a more distribution-friendly session ticket protection scheme
In order to resume the session, the negotiated parameters must be stored somewhere, such as in the server's cache or in a protected session ticket.  A _session ticket_ is a block of data that is generated and protected by the server, but is not cached on the server side.  The negotiated parameters could be encapsulated and encrypted in the session ticket and delivered to the client for session resumption.  The client will send back the exact session ticket in its session resumption request.  The server retrieves the negotiated parameters by decapsulating and decrypting the received session ticket.
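
The encapsulate-and-encrypt round trip described above can be sketched with authenticated encryption. This is an illustration only: the ticket layout (`IV || ciphertext+tag`), the choice of AES-GCM, and the names `TicketSketch`, `newKey`, `seal`, and `open` are assumptions made for this example, not the SunJSSE ticket format.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;

/** Hypothetical illustration only; not the SunJSSE ticket format. */
final class TicketSketch {
    private static final int IV_LEN = 12;     // 96-bit GCM nonce
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a fresh 256-bit AES key for protecting tickets. */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts session data into an opaque ticket: IV || ciphertext+tag. */
    static byte[] seal(SecretKey key, byte[] sessionData) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);  // a fresh nonce for every ticket
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(sessionData);
            return ByteBuffer.allocate(IV_LEN + sealed.length).put(iv).put(sealed).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recovers the session data, or returns empty if the ticket is malformed or tampered with. */
    static Optional<byte[]> open(SecretKey key, byte[] ticket) {
        if (ticket.length < IV_LEN + TAG_BITS / 8) {
            return Optional.empty();
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, ticket, 0, IV_LEN));
            return Optional.of(cipher.doFinal(ticket, IV_LEN, ticket.length - IV_LEN));
        } catch (GeneralSecurityException e) {
            return Optional.empty();  // wrong key or modified ticket
        }
    }
}
```

In this sketch the server keeps only the key. The client holds the ticket and returns it unchanged, and a ticket that fails authentication is simply treated as absent, so the server would fall back to a full handshake.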

To support distributed session resumption, a session ticket that is generated and protected in one server node must be usable for session resumption on other server nodes in the distributed system.  Each node should use the same session ticket structure, and share the secrets that are used to protect session tickets.

The session ticket processes are defined in [RFC 5077](https://tools.ietf.org/html/rfc5077) for TLS 1.2 and prior versions, and [RFC 8446](https://tools.ietf.org/html/rfc8446) for TLS 1.3.  However, the RFCs do not define how to construct and protect the session ticket.  Currently, a session ticket generated in the JDK can be used only with the server that generated it.  We wish to make this mechanism more distribution-friendly to improve the scalability and responsiveness of applications.

A session ticket protection scheme will be designed and implemented in the SunJSSE provider.  The scheme will support key generation, key rotation and key synchronization across clusters of computers. By using the new session ticket protection scheme, the SunJSSE provider will be updated to support distributed session resumption.
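
One way to picture key rotation and synchronization is a small key ring that every node maintains identically. This is a sketch under stated assumptions, not the scheme this JEP will define: tickets carry a 4-byte key ID, each node keeps the two most recent keys, and raw key bytes arrive over some secure synchronization channel that is not shown here. The names `TicketKeyRing`, `install`, `seal`, and `open` are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Optional;

/** Hypothetical per-node key ring; ticket layout: keyId(4) || IV(12) || ciphertext+tag. */
final class TicketKeyRing {
    private static final int ID_LEN = 4, IV_LEN = 12, TAG_BITS = 128, MAX_KEYS = 2;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Insertion-ordered, so the first entry is always the oldest key.
    private final LinkedHashMap<Integer, SecretKey> keys = new LinkedHashMap<>();
    private Integer currentId;  // key used for newly issued tickets

    /** Installs a key delivered by the (assumed) synchronization channel and makes it current. */
    synchronized void install(int keyId, byte[] rawKey) {
        keys.put(keyId, new SecretKeySpec(rawKey, "AES"));
        currentId = keyId;
        while (keys.size() > MAX_KEYS) {  // retire the oldest key
            keys.remove(keys.keySet().iterator().next());
        }
    }

    /** Protects session data under the current key. */
    synchronized byte[] seal(byte[] sessionData) {
        if (currentId == null) throw new IllegalStateException("no key installed");
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, keys.get(currentId),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(sessionData);
            return ByteBuffer.allocate(ID_LEN + IV_LEN + sealed.length)
                    .putInt(currentId).put(iv).put(sealed).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Opens a ticket issued by any node that shares a key still on this ring. */
    synchronized Optional<byte[]> open(byte[] ticket) {
        if (ticket.length < ID_LEN + IV_LEN + TAG_BITS / 8) return Optional.empty();
        SecretKey key = keys.get(ByteBuffer.wrap(ticket).getInt());
        if (key == null) return Optional.empty();  // key unknown or already retired
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, ticket, ID_LEN, IV_LEN));
            int offset = ID_LEN + IV_LEN;
            return Optional.of(cipher.doFinal(ticket, offset, ticket.length - offset));
        } catch (GeneralSecurityException e) {
            return Optional.empty();
        }
    }
}
```

Keeping the previous key alongside the current one lets tickets issued just before a rotation remain usable, while a node that joins the cluster only needs the keys currently on the ring to accept tickets issued elsewhere.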

Testing
------

Testing will cover the following areas:

- Verifying that there is no compatibility impact.
- Verifying that there is no interoperability impact.
- Verifying that the performance is improved.
- Verifying that the session tickets generated and protected in one server node can be used for session resumption in other nodes in a distributed system.
- Verifying that the secret keys used to protect the session ticket can be rotated and synchronized.
- Verifying that a new server node inserted into the distributed system can be automatically synchronized, thus making it possible to plug in new server nodes as needed.


Dependencies
------------
This is an improvement of the TLS 1.3 implementation, [JEP 332](http://openjdk.java.net/jeps/332).
Comments
> I don't think you want to distribute application data across servers - that would be a difficult task (we had to implement it in Jetty for distributed HTTP session and there are a large number of issues - from serialization to versioning to classloading, etc.).
>
> Therefore it should be enough to clarify that putValue() has a local semantic only (not a distributed one).

It's a good point. I thought about this approach while we were working on the stateless server. Some applications may work if the session is distributed but the value bound to the session is not; others may not. I think it is still error-prone and needs careful application design and implementation. Problems may arise when moving from a stateful server to a stateless server without carefully checking the application logic and the code that calls putValue(). In the long run, we'd like to have a robust solution.

I can see the impact on Jetty. I will remove the deprecation part of the JEP and add a known issue to the documentation for the putValue() API, noting that the bound value will not be distributed. To be safe, using putValue() with a stateless server is not recommended.
02-11-2020

> With this update, I hope to have a consistent Session ID across the full handshake and session resumption. Then, if an application has a solution for distributing the custom values assigned by putValue() (such as a distributed cache), those values could be bound to an ID in the application layer.

Exactly, so you should not worry about deprecating putValue().

Your initial explanation of the previous comment is not clear to me, as perhaps we don't agree about "session data". To me, "session data" is internal TLS protocol information that applications never see and that is only necessary for the implementation to perform session resumption. Then, there is "application data" that applications attach to the SSLSession using putValue(). You only need to distribute "session data", not "application data".

A client that performs session resumption will send back to the server the "session data" so that the TLS handshake can be the quick one. The server application (on a possibly different server) will still see a TLS handshake, and it will run the same application logic it ran on the previous server, which will result in storing "application data" in the session, as it was done in the previous server.

I don't think you want to distribute application data across servers - that would be a difficult task (we had to implement it in Jetty for distributed HTTP session and there are a large number of issues - from serialization to versioning to classloading, etc.).

Therefore it should be enough to clarify that putValue() has a local semantic only (not a distributed one). The application logic that uses putValue() is naturally performed in each server because the same application is deployed in each server.

Take for example the case where a client connects to a server for an HTTP request, but then again with the same "session data" for a WebSocket upgrade, but lands on a different server. The server application logic on the second server will be different and I don't want to carry over any "application data" that was set on the first server.
28-10-2020

> Why do you have to wrap user-defined values into the session ticket?

If the server is stateless (see JDK-8223922), it does not cache the session data. The session data is wrapped into a session ticket and sent to the client. When the connection is closed, the session data is discarded, and the server no longer knows anything about the session.

SSLSession.putValue() binds application data to the session (the data is stored in the session data). For a stateless server, no session state remains on the server side after the connection is closed, so a value assigned in the previous connection may not be available once the initial connection is closed. During session resumption, the client sends back the session ticket. The server unwraps the session ticket and retrieves the session data for session resumption.

We thought about wrapping the values set by putValue() into the session ticket. But that does not sound like a good idea, because putValue() could be called after the session ticket has been generated and delivered. We don't really know when applications call the method. That makes it a pretty tough problem to continue supporting mutable SSLSession methods like putValue() without compatibility issues when the server is stateless.

There are multiple approaches to supporting distributed sessions; using distributed caches is one of them. In this JEP, we are thinking of using the session ticket defined in TLS 1.3, so that applications can benefit without updating their source code. The server will then be stateless for better performance, and we will have to consider the compatibility issues of mutable SSLSession methods like putValue(). If one server node calls putValue() after the session ticket has been generated and delivered, another server node that receives the session ticket for session resumption cannot get the value assigned by that putValue() call.

It's fine to use putValue() if the server is not configured to be stateless (a system property controls the behavior; see JDK-8223922), but the method is no longer reliable for a stateless server. It becomes error-prone with the introduction of the stateless server. I was wondering whether we should deprecate this method sooner rather than later, so that applications can consider the impact earlier as well. It is also a signal for applications to consider the side effects of the stateless server. Deprecation does not mean we will remove the method in the near future; it could stay for years if removal would have a significant impact. Applications can still use these methods in stateful servers. That said, I would like to have an update so as to benefit from the stateless server sooner.

With this update, I hope to have a consistent Session ID across the full handshake and session resumption. Then, if an application has a solution for distributing the custom values assigned by putValue() (such as a distributed cache), those values could be bound to an ID in the application layer.
28-10-2020

> After handshaking, the value assigned with putValue() cannot be wrapped in the session ticket any longer.

Why do you have to wrap user-defined values into the session ticket? If the implementation uses putValue() for some data, then it would be better to have explicit APIs (maybe private to the implementation) for this, and avoid deprecating putValue(). What am I missing here?
27-10-2020

> Can you detail why SSLSession attributes accessed via putValue() etc. cannot stay local attributes and avoid deprecating those methods?

When the handshake completes, the session ticket will already have been delivered to the client side. After handshaking, a value assigned with putValue() can no longer be wrapped in the session ticket. When the server is configured to be stateless, the value is lost when the connection is closed and cannot be recovered for session resumption. As before, if the server is not stateless, SSLSession.putValue() will continue to work, because the local state can be used.

Applications may be able to set the value during handshaking rather than after the handshake. However, at the API level, we don't know how this method is used in various applications, which makes it error-prone. As a long-term solution, we would like to make SSLSession immutable, and either delegate the mutable functions to the application or move them to SSLParameters.

> Also consider that SSLSession is very unfriendly to be used as a key in map data structures as its session id may change while still representing the same session.

The issue is on our radar. I will see if it can be addressed in the JEP implementation.
23-10-2020

> Fortunately, the use of putValue() and the other three related methods (getValue(), removeValue(), and getValueNames()) are believed to be uncommon

In Jetty we use `putValue()` to cache TLS information that needs to be computed to comply with the Servlet specification. Failing to cache this information led to a huge performance impact, for example: https://github.com/eclipse/jetty.project/issues/4923.

Reading the issue description, it seems to me that what needs to be pluggable (local vs distributed) is a _store_ for the cryptographic data required to resume the TLS session, and that could be entirely an implementation detail.

Can you detail why SSLSession attributes accessed via putValue() etc. cannot stay local attributes and avoid deprecating those methods?

Also consider that SSLSession is very unfriendly to be used as a key in map data structures as its session id may change while still representing the same session (see this discussion: https://mail.openjdk.java.net/pipermail/security-dev/2018-August/017993.html).

We have implemented a similar use case in Jetty, for distributed HTTP sessions. The `HttpSession` APIs did not require changes or deprecations, and all the work was done under the covers by the cache layer and store layer for just the (internal) data that needs to be distributed.

Thanks!
23-10-2020