JDK-8324181 : enhance compressed streams
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 23
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2024-01-18
  • Updated: 2024-03-15
  • Version: Other / tbd (Unresolved)
Description
As observed by JDK-8293170, the nmethod DebugInfo metadata (the “scopes” section) contributes a significant portion (10-20%) of overall nmethod size. This data is compressed (from uint tokens to bytes) using UNSIGNED5, which is adapted (in a modified form) from Pack200.
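
For readers unfamiliar with UNSIGNED5, here is a minimal sketch of the classic Pack200 form of the code (B = 5, H = 64). It is not HotSpot's modified variant, whose byte assignments differ. Values below L = 192 take a single byte, and larger values spill into continuation bytes, up to five bytes for a full 32-bit value. The names `write_unsigned5` and `read_unsigned5` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Classic Pack200 UNSIGNED5 parameters (a sketch, not HotSpot's variant):
// at most kB bytes per value, kH continuation byte values, and
// kL = 256 - kH terminal byte values.
constexpr int kB = 5;
constexpr uint32_t kH = 64;
constexpr uint32_t kL = 256 - kH;  // 192

// Hypothetical writer: append the UNSIGNED5 encoding of x to out.
void write_unsigned5(std::vector<uint8_t>& out, uint32_t x) {
  for (int i = 0; i < kB; i++) {
    if (x < kL || i == kB - 1) {
      out.push_back(static_cast<uint8_t>(x));  // terminal byte
      return;
    }
    x -= kL;
    out.push_back(static_cast<uint8_t>(kL + x % kH));  // continuation byte
    x /= kH;
  }
}

// Hypothetical reader: decode one value starting at pos and advance pos.
uint32_t read_unsigned5(const std::vector<uint8_t>& in, size_t& pos) {
  uint32_t sum = 0;
  uint32_t scale = 1;
  for (int i = 0; i < kB; i++) {
    assert(pos < in.size());
    uint32_t b = in[pos++];
    sum += b * scale;
    if (b < kL) break;  // a byte below kL ends the value
    scale *= kH;
  }
  return sum;
}
```

In this scheme a zero token still costs one full byte. That per-token cost is the redundancy the rest of this proposal targets.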

But about 50% of that data (counted in the original uint tokens) is zero, which means that about half of the bytes in the scopes metadata encode zeroes. This adds up to megabytes of footprint. (It also makes the code cache less dense; see JDK-7072317.)

We are beginning to store this metadata in Leyden CDS archives as well, so wasted metadata space will also translate into larger archive files and longer load times, in addition to a larger dynamic memory footprint.

By using well-known, very simple compression techniques, we can reduce the size of these sections by about 36%. Specifically, zero suppression is known to be a “sweet spot” for fast online compression, IF the data is known to be rich in zeroes, which is the case here. Both ZFS and Cap'n Proto use it. We should arrange for our UNSIGNED5 streams to (optionally) suppress zeroes as well, and enable that option for DebugInfo.
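
One well-known form of zero suppression is Cap'n Proto's "packing" scheme. Each group of 8 bytes is replaced by a tag byte whose bits mark the nonzero bytes, followed by only those nonzero bytes. The C++ sketch below is a simplified version: it omits Cap'n Proto's run-length cases for all-zero and all-nonzero words. It is not necessarily the encoding used in PR #17474, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compressor: for each group of up to 8 input bytes, emit a
// tag byte (bit i set iff byte i of the group is nonzero), then only the
// nonzero bytes of that group.
std::vector<uint8_t> zs_compress(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  for (size_t base = 0; base < in.size(); base += 8) {
    size_t tag_pos = out.size();
    out.push_back(0);  // placeholder, patched once the group is scanned
    uint8_t tag = 0;
    for (size_t i = 0; i < 8 && base + i < in.size(); i++) {
      if (in[base + i] != 0) {
        tag |= static_cast<uint8_t>(1u << i);
        out.push_back(in[base + i]);
      }
    }
    out[tag_pos] = tag;
  }
  return out;
}

// Hypothetical decompressor: n is the original length, needed because a
// tag alone cannot tell trailing zeroes from a short final group.
std::vector<uint8_t> zs_decompress(const std::vector<uint8_t>& in, size_t n) {
  std::vector<uint8_t> out;
  out.reserve(n);
  size_t pos = 0;
  while (out.size() < n) {
    assert(pos < in.size());
    uint8_t tag = in[pos++];
    for (size_t i = 0; i < 8 && out.size() < n; i++) {
      if (tag & (1u << i)) {
        assert(pos < in.size());
        out.push_back(in[pos++]);  // stored nonzero byte
      } else {
        out.push_back(0);          // suppressed zero
      }
    }
  }
  return out;
}
```

When half of the input bytes are zero, each full 8-byte group shrinks to 5 bytes: one tag plus four payload bytes. When the input has few zeroes, the tag bytes are pure overhead, which is consistent with making suppression an option enabled only for zero-rich streams.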

After looking further at the uses of UNSIGNED5, I also found that some uint tokens come in pairs for which it is usually profitable to concatenate the bits of both tokens into a single (longer) token. This happens when the first token is very short (1-5 bits) and the second token is also well-behaved. This “int pairing” technique generalizes the well-known tactic of “low bit tagging”, seen in HotSpot, where a larger number (like a pointer) is paired with a small one by placing the smaller one in a fixed number of low-order bits of a word, with the larger number occupying the remaining bits.
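
A minimal sketch of int pairing, assuming the first token is known to fit in k low-order bits. The helper names `pair_ints` and `unpair_ints` are hypothetical and are not taken from the prototype. The gain comes from the variable-length coding: in classic Pack200 UNSIGNED5, any value below 192 fits in one byte. Pairing a 2-bit token 1 with a second token 40 therefore gives (40 << 2) | 1 = 161, which takes one byte instead of two.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: merge a short first token (which must fit in the
// low k bits) with a second token into one longer token, in the spirit of
// low-bit tagging. The report cites first tokens of 1-5 bits.
uint32_t pair_ints(uint32_t first, uint32_t second, int k) {
  assert(k >= 1 && k <= 5);
  assert(first < (1u << k));             // first must fit in k bits
  assert(second <= (0xFFFFFFFFu >> k));  // second must survive the shift
  return (second << k) | first;
}

// Hypothetical inverse: split a paired token back into (first, second).
std::pair<uint32_t, uint32_t> unpair_ints(uint32_t token, int k) {
  assert(k >= 1 && k <= 5);
  uint32_t first = token & ((1u << k) - 1);  // low k bits
  uint32_t second = token >> k;              // remaining high bits
  return {first, second};
}
```

Pairing loses when the shifted second token pushes the combined value into a longer encoding than the two tokens would need separately. That is why it pays off only when the second token is also well-behaved.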

Adding the int-pairing capability to the existing UNSIGNED5 streams (on top of the UNSIGNED5 coding itself) shaves 6% off the DebugInfo size, contributing to the overall 36% reduction together with zero suppression. It also shaves 12% off FieldInfo storage and a nice 29% off LineNumberTable storage. It would probably help other kinds of streams as well (OopMap or Dependencies, maybe RelocInfo or PCDesc tables).

For more details, see the prototype (code and comments) in PR #17474.

https://github.com/openjdk/jdk/pull/17474