The JVM uses the UNSIGNED5 format (from Pack200) for storing streams of debug and register liveness information. The format is implemented inside CompressedStream. It should be brought out into its own separate header file, so it can be maintained and upgraded separately, and possibly used for additional purposes.
When it goes in its own header, it needs a reasonable "kit" of API points:
- a reader and a writer (templates to avoid implementation coupling)
- a writer which can also expand its output buffer (template again; used by `CompressedWriteStream`)
- a function (constant-foldable) which computes the byte-length of a given value to encode
- another function (constant-foldable) which, given a byte-length, reports the largest encodable value of that length
- maximum byte-length and encoded-value (constants)
- a function or two to test correctness or feasibility of encoding, plus a debug.cpp function to display compressed data
- some miscellaneous functions to encode and decode signed numbers (s4 <-> u4)
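To make the list above concrete, here is a minimal sketch of what such a kit could look like. It assumes the classic Pack200 parameters currently used by CompressedStream (64 "high" byte codes, 192 "low" byte codes, at most 5 bytes per value). The namespace `u5_sketch` and all function names are hypothetical and are not taken from the proof of concept. The reader and writer are templates over a byte source or sink, so they do not depend on any particular stream class.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace u5_sketch {  // hypothetical name, for illustration only

constexpr int      lg_H    = 6;
constexpr uint32_t H       = 1u << lg_H;  // 64 high codes: "more bytes follow"
constexpr uint32_t L       = 256 - H;     // 192 low codes: "last byte"
constexpr int      MAX_LEN = 5;           // maximum encoded byte-length
constexpr uint32_t MAX_VALUE = UINT32_MAX;

// Largest value that fits in `len` bytes (1 <= len <= MAX_LEN).
constexpr uint32_t max_value_for_length(int len) {
  if (len >= MAX_LEN) return MAX_VALUE;  // the 5th byte may be any of 256 codes
  uint64_t count = 0, h = 1;
  for (int i = 0; i < len; i++) { count += L * h; h *= H; }
  return (uint32_t)(count - 1);
}

// Number of bytes needed to encode `value`.
constexpr int encoded_length(uint32_t value) {
  int len = 1;
  while (len < MAX_LEN && value > max_value_for_length(len)) len++;
  return len;
}

static_assert(max_value_for_length(1) == 191, "one byte holds 0..191");
static_assert(max_value_for_length(2) == 12479, "two bytes hold up to 12479");
static_assert(encoded_length(MAX_VALUE) == MAX_LEN, "u4 max needs 5 bytes");

// Writer: `put` is any callable taking one uint8_t. A growable buffer
// (as CompressedWriteStream needs) can be supplied as the sink.
template <typename Sink>
void write_uint(uint32_t value, Sink&& put) {
  uint32_t sum = value;
  for (int i = 0; ; i++) {
    if (sum < L || i == MAX_LEN - 1) {  // low code, or forced 5th byte
      put((uint8_t)sum);
      return;
    }
    sum -= L;
    put((uint8_t)(L + (sum % H)));      // high code carrying 6 bits
    sum >>= lg_H;
  }
}

// Reader: `get` is any callable returning the next uint8_t.
template <typename Source>
uint32_t read_uint(Source&& get) {
  uint32_t sum = get();
  if (sum < L) return sum;              // common case: a single byte
  for (int i = 1, shift = lg_H; ; i++, shift += lg_H) {
    uint32_t b = get();
    sum += b << shift;                  // sum += b[i] * 64^i
    if (b < L || i == MAX_LEN - 1) return sum;
  }
}

// Signed <-> unsigned mapping (s4 <-> u4) that keeps small magnitudes small.
constexpr uint32_t encode_sign(int32_t v) {
  return ((uint32_t)v << 1) ^ (v < 0 ? UINT32_MAX : 0u);
}
constexpr int32_t decode_sign(uint32_t u) {
  return (int32_t)((u >> 1) ^ (0u - (u & 1)));
}

}  // namespace u5_sketch
```

With a `std::vector<uint8_t>` as the sink, `write_uint(12480u, put)` appends three bytes, and reading them back through `read_uint` yields 12480 again. A real header would also need the correctness checks and the debug printer listed above, which this sketch omits.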
A gtest is needed for these API points.
- - - -
A future cleanup of field metadata (probably necessary for Valhalla and/or Leyden) should use UNSIGNED5 format instead of the fixed-sized arrays of u2 elements, which are bursting at the seams with no incremental fix in sight.
Similarly, relocation information could be simplified if structured as a compact stream of 32-bit tokens.
Any stream-oriented data structure that favors smaller values (such as Pack200 value bands) benefits from a varint format like UNSIGNED5.
Proof of concept: https://github.com/rose00/jdk/commit/compressed-stream