String performance is critical for a wide range of Java applications, and therefore we cannot afford regressions there. The suggested JDK/VM feature improves the footprint of Strings consisting only of Latin1 characters, i.e. those representable by a single byte, hereafter called "compressible" Strings. Strings that contain at least one non-Latin1 character still need 2-byte characters, hereafter called "non-compressible" Strings.
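The distinction can be illustrated with a simple check: a String is compressible exactly when every one of its chars fits in a single byte. The helper below is hypothetical, for illustration only, and not part of any proposed API.

```java
public class CompressibilityCheck {
    // Hypothetical helper: true if every char is Latin1 (<= 0xFF),
    // i.e. the String could be stored with one byte per character.
    static boolean isCompressible(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isCompressible("plain ASCII"));     // true
        System.out.println(isCompressible("caf\u00e9"));       // true: é is U+00E9
        System.out.println(isCompressible("\u043c\u0438\u0440")); // false: Cyrillic
    }
}
```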
The requirements are:
- **Non-compressible Strings:** The implementation should not significantly regress either the footprint or the throughput of non-compressible Strings. Proposed implementations may need to store additional metadata within a String instance, as well as add a substantial amount of new code in String, with at least a few checks on the hot paths. Therefore, we can expect minor throughput degradation, as well as minor memory footprint degradation in some corner cases. These degradations should be justified by improvements in the frequent cases.
- **Compressible Strings:** Compressible Strings should improve footprint substantially, and should not regress performance at all. In other words, whatever implementation penalty the dual String representation incurs, compressible Strings should bring improvements that offset that penalty.
1. The performance baseline is the JDK 9 build into which the changes are supposed to be integrated. Score and footprint targets are set for all platforms and all compilers. Interpreted mode is exempt from the score targets, but not from the footprint ones. Score is defined as the speed metric for a particular workload. Footprint is measured by counting String instances and their footprint as Java objects.<br>
2. Targeted microbenchmarks working with non-compressible Strings should show:<br>
**Base goal**: less than 5% score regression, and less than 8 bytes per String instance footprint regression.<br>
**Stretch goal**: no score regression, no footprint regression.<br>
3. Targeted microbenchmarks working with compressible Strings should show:<br>
**Base goal**: no score regression, and 40% footprint improvement with Strings larger than 32 characters.<br>
**Stretch goal**: 20% score improvement, 45% footprint improvement with Strings larger than 32 characters.<br>
4. Large workloads working with realistic data should show:<br>
**Base goal**: no score regressions, and at least 5% footprint improvement.<br>
**Stretch goal**: 10% score improvement, with at least 5% footprint improvement.<br>
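The 40-45% footprint targets for longer Strings can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a typical 64-bit HotSpot layout with compressed oops (12-byte object headers, 16-byte array headers, 8-byte alignment), a legacy String with a char[] value and an int hash field, and a compact String that adds a one-byte coder field and stores Latin1 data in a byte[]; actual layouts vary by VM configuration.

```java
public class FootprintEstimate {
    // Assumed HotSpot layout: 12-byte object header, 4-byte compressed oops,
    // 16-byte array header, all sizes rounded up to 8-byte alignment.
    static long align(long bytes) { return (bytes + 7) & ~7L; }

    // Legacy String: header + char[] ref + int hash, plus a char[] payload.
    static long utf16Size(int chars) {
        return align(12 + 4 + 4) + align(16 + 2L * chars);
    }

    // Compact String: adds a byte coder field; Latin1 payload is a byte[].
    static long latin1Size(int chars) {
        return align(12 + 4 + 4 + 1) + align(16 + (long) chars);
    }

    public static void main(String[] args) {
        int n = 64; // a String larger than 32 characters
        long before = utf16Size(n);   // 24 + 144 = 168 bytes
        long after  = latin1Size(n);  // 24 +  80 = 104 bytes
        System.out.printf("%d -> %d bytes, %.0f%% smaller%n",
                before, after, 100.0 * (before - after) / before);
    }
}
```

Under these assumptions a 64-character compressible String shrinks by roughly 38%, and the saving approaches 50% as the String grows, which is broadly consistent with the 40-45% targets above.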
The performance work on this JEP can be separated into five steps:
**Step 1.** Digest and analyze the current collection of heap dumps from realistic applications and estimate the footprint improvements on that data.
*Suggested target date: Nov 25, 2014*
**Step 2.** Create the targeted microbenchmarks for all public String and StringBuilder/StringBuffer API methods. These microbenchmarks should:
- Measure Strings method performance and footprint on compressible and non-compressible Strings
- Measure the performance in interpreted, C1, C2 without intrinsics, and C2 with intrinsics compile modes
- Measure the performance in different VM modes: 32-bit, 64-bit without compressed oops, 64-bit with compressed oops and default object alignment, 64-bit with compressed oops and 16-byte object alignment
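The compile and VM modes above map onto HotSpot command-line flags roughly as follows. The flag names are standard HotSpot options, but `benchmarks.jar` is a placeholder name, intrinsic IDs are JDK-version-specific, and 32-bit mode simply means running a 32-bit JDK build.

```shell
# Compiler modes
java -Xint                   -jar benchmarks.jar  # interpreted only
java -XX:TieredStopAtLevel=1 -jar benchmarks.jar  # C1 only
java -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions \
     -XX:DisableIntrinsic=_compareTo -jar benchmarks.jar  # C2, intrinsic disabled
java -XX:-TieredCompilation  -jar benchmarks.jar  # C2 with intrinsics

# VM modes (on a 64-bit JDK; 32-bit mode needs a 32-bit JDK build)
java -XX:-UseCompressedOops  -jar benchmarks.jar  # no compressed oops
java -XX:+UseCompressedOops  -jar benchmarks.jar  # compressed oops, default alignment
java -XX:+UseCompressedOops -XX:ObjectAlignmentInBytes=16 -jar benchmarks.jar
```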
Microbenchmarks should be developed by (or under the guidance of) the Performance team. They should be developed with JMH, so that they can be transferred to the microbenchmark corpus for regular performance testing. Regular performance testing is not in the scope of this JEP.
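The shape of such a targeted microbenchmark can be sketched as follows. This is a self-contained stand-in for illustration only; the real benchmarks should use JMH, which properly handles warmup, forking, and dead-code elimination, and the class and method names here are hypothetical.

```java
public class StringCharAtBench {
    static final String LATIN1 = "a".repeat(64);       // compressible payload
    static final String UTF16  = "\u0440".repeat(64);  // non-compressible (Cyrillic)

    // Sums charAt() results over many iterations; the sum is returned
    // and printed so the compiler cannot eliminate the work.
    static long sumChars(String s, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            for (int j = 0; j < s.length(); j++) {
                sum += s.charAt(j);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int iters = 100_000;
        long t0 = System.nanoTime();
        long s1 = sumChars(LATIN1, iters);
        long t1 = System.nanoTime();
        long s2 = sumChars(UTF16, iters);
        long t2 = System.nanoTime();
        System.out.printf("latin1: %d ms (sum=%d)%n", (t1 - t0) / 1_000_000, s1);
        System.out.printf("utf16 : %d ms (sum=%d)%n", (t2 - t1) / 1_000_000, s2);
    }
}
```

Comparing the two timings across the compile and VM modes listed above gives the per-method, per-representation data the goals in items 2 and 3 call for.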
*Suggested target date for developing benchmarks: Dec 5, 2014*
**Step 3.** Analyze the microbenchmarks, identify the bottlenecks in all cases and configurations, provide ideas for improvements, and prioritize them against the goals. Tune the proposed implementation in Java and the VM to achieve the best benchmark scores. Once the microbenchmark goals are met, proceed to the next step.
*Suggested target date for first iteration: Dec 23, 2014* (expect three more iterations afterwards, not counting the iterations for concrete tests)
**Step 4.** Make confirmation runs on large workloads, and see whether the targets are met there. If the targets on large workloads are not met, figure out which microbenchmarks best describe the large workload use cases, set more aggressive goals for those microbenchmarks, and go back to (Step 3). Large workloads include:
- Possibly additional internal application benchmarks as suggested by Oracle product teams
*Suggested target date for complete performance run: Feb 17, 2015*
**Step 5.** Write up the publishable performance report outlining the rationale and performance data backing up the change.
*Suggested target date for complete report: Mar 3, 2015*