JDK-8220166 : Performance regression in deserialization (4-6% in SPECjbb)
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.io:serialization
  • Affected Version: 9,11,12,13
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-03-05
  • Updated: 2019-07-31
  • Resolved: 2019-05-17
JDK 11: 11.0.4 (Fixed)
Related Reports
Relates :  
Description
This is noticeable on machines with many cores/HW threads, such as 2-socket Xeon Platinum 8176 (112) or 2-socket ThunderX2 (224). It may not be apparent on small systems.

We (primarily Andrey Sudnik) tracked down a performance regression introduced in JDK 9 to the addition of ObjectInputFilter support. The regression occurs even when filtering is not enabled.

The issue is the new code in the ObjectInputStream constructor:

    serialFilter = ObjectInputFilter.Config.getSerialFilter();

    public static ObjectInputFilter getSerialFilter() {
        synchronized (serialFilterLock) {
            return serialFilter;
        }
    }

There is a global lock (serialFilterLock) that is held very briefly as part of creating ObjectInputStreams. In SPECjbb this happens when every transaction is received. The filter is never set in SPECjbb.

My theory:
On a single-core machine, there would essentially never be contention on that lock, so the fastest lock path will be taken. But with more and more cores, at some point there will be contention and the lock gets inflated. Once that happens, even if contention is unlikely, the lock takes a slower "fast-path", which in turn increases the chance of contention. So there's a performance cliff at some point.

I believe on large-core machines such as ThunderX2 or Xeon Platinum we are falling over that cliff.
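
To probe this theory outside SPECjbb, something like the following could be used. It is only a rough sketch: the class name DeserializeContentionSketch and its methods are made up for illustration, it is not a proper benchmark harness, and it is not guaranteed to reproduce the regression on any particular machine. It just has many threads construct ObjectInputStreams concurrently, so each construction goes through ObjectInputFilter.Config.getSerialFilter().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stress sketch; names are illustrative, not from the JDK or SPECjbb.
final class DeserializeContentionSketch {

    // Serialize one small object once; every worker deserializes the same bytes.
    static byte[] payload() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject("transaction");
        }
        return bytes.toByteArray();
    }

    // Starts `threads` workers; each creates `perThread` ObjectInputStreams.
    // Returns the total number of objects read across all workers.
    static long run(int threads, int perThread) throws Exception {
        final byte[] data = payload();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> {
                long read = 0;
                for (int i = 0; i < perThread; i++) {
                    // Each constructor call fetches the process-wide serial filter.
                    try (ObjectInputStream in =
                             new ObjectInputStream(new ByteArrayInputStream(data))) {
                        in.readObject();
                        read++;
                    }
                }
                return read;
            };
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        long total = run(threads, 100_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(total + " objects on " + threads + " threads in " + elapsedMs + " ms");
    }
}
```

Comparing elapsed time per object at low and high thread counts, before and after the change, would be one way to look for the cliff. A real measurement would need warm-up and a harness such as JMH.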

Suggested fix:
I don't think that getSerialFilter() needs to be synchronized after all:
-	setSerialFilter() ensures that the value of serialFilter is either null or non-null, and it can only be set once.
-	There are not multiple states that need to be kept synchronized. 
-	An ObjectInputStream created around the time of the setSerialFilter call may or may not see the new value of the serialFilter (with or without current synchronization). If it was important that a new ObjectInputStream would never see a stale null value for serialFilter, a lock would have to be used at a higher level. 
-	I think making the serialFilter field volatile should be sufficient, so the readers will eventually see a new value.
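
The shape of the suggestion above can be sketched roughly as follows. This is not the actual patch: FilterConfigSketch and its nested Filter interface are hypothetical stand-ins for ObjectInputFilter.Config and ObjectInputFilter, and the real setSerialFilter() does more (for example, permission checks). The sketch assumes the writer keeps its lock so the set-once check stays atomic, while the reader becomes a plain volatile read.

```java
import java.util.Objects;

// Hypothetical stand-in for ObjectInputFilter.Config; not the JDK source.
final class FilterConfigSketch {

    // Hypothetical filter type, used only for this illustration.
    interface Filter {}

    private static final Object serialFilterLock = new Object();

    // volatile: a write in setSerialFilter is visible to later lock-free reads.
    private static volatile Filter serialFilter;

    // Reader path: no monitor is acquired, so concurrent callers never contend.
    static Filter getSerialFilter() {
        return serialFilter;
    }

    // Writer path: still locked so that the check-then-set is atomic
    // and the filter can be set at most once.
    static void setSerialFilter(Filter filter) {
        Objects.requireNonNull(filter, "filter");
        synchronized (serialFilterLock) {
            if (serialFilter != null) {
                throw new IllegalStateException("Serial filter can only be set once");
            }
            serialFilter = filter;
        }
    }
}
```

As with the synchronized version, a reader racing with setSerialFilter() may still observe null; the volatile field only guarantees that reads after the write completes see the new value.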

Comments
Fix Request (11u, 12u) Backporting this near-trivial patch rectifies the performance regression since 9. Patch applies cleanly to 11u and 12u, passes tier1 tests.
17-05-2019

Introduced by JDK-8155760.
15-05-2019