JDK version | LiveNodeCountInliningCutoff | RSS (MB) | Total runtime | Compilation time | Application time
7u80 | 20'000 (default) | 163 | 60s | 11s | 49s
8u60 | 20'000 | 522 | 166s | 127s | 39s
8u60 | 40'000 (default) | 976 | 414s | 371s | 43s
In total, the VM's memory usage (RSS) for the application considered increases roughly sixfold from 7u80 to 8u60, from 163 MB to 976 MB with the respective default settings. JDK9 is similar to JDK8 and is also affected by this problem.
A number of issues have targeted reducing the VM's memory usage (JDK-8011858, JDK-8137160, JDK-8129847). The patches for the first two bugs result in a slight reduction of memory usage; the patch for JDK-8129847 reduces memory usage by 20-30%. However, the VM's memory usage should be reduced further.
The goal of this enhancement is to further reduce the memory usage of the compiler. This issue is meant to investigate the following ways in which the compiler's memory usage can be reduced.
(1) Change arrays directly addressed with node IDs (the _idx field of every compiler node) to use hash tables instead. This change should target arrays with a high impact on the compiler's memory usage.
(2) For compilations with a large number of nodes, introduce an additional chunk size (in addition to the existing sizes tiny, init, medium, size, non_pool_size). The new chunk size should be larger than the existing chunk sizes and should allow the reuse of large memory chunks that are currently allocated with the operating system's memory allocator.
(3) Incremental (or post-parse) inlining in C2 produces lots of dead nodes (observed on Octane/Nashorn). Multiple PhaseRenumberLive passes during incremental inlining can help further reduce peak memory usage in that scenario. Since the pass can be expensive, it can be triggered when the gap between unique and live node counts becomes too large and performed with PhaseIdealLoop (see Compile::inline_incrementally).
(4) PhaseRemoveUseless and PhaseIterGVN are performed too frequently (that problem is targeted by JDK-8059241).
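The trigger described in (3) can be sketched as follows. This is a toy model only: ToyNode, ToyCompile, should_renumber, renumber_live and the 0.5 threshold are hypothetical names and values chosen for illustration, not HotSpot code. The real PhaseRenumberLive has more compiler state to keep consistent than this sketch shows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy node: only the fields needed for the illustration.
struct ToyNode {
  uint32_t idx;   // plays the role of Node::_idx
  bool live;      // whether the node is still in use
};

// Hypothetical toy compilation state; not the HotSpot Compile class.
struct ToyCompile {
  std::vector<ToyNode> nodes;  // every node created so far, live or dead
  uint32_t unique = 0;         // next node ID to hand out

  uint32_t add_node() {
    nodes.push_back({unique, true});
    return unique++;
  }

  void kill(uint32_t idx) {
    for (ToyNode& n : nodes) {
      if (n.idx == idx) n.live = false;
    }
  }

  uint32_t live_count() const {
    uint32_t count = 0;
    for (const ToyNode& n : nodes) {
      if (n.live) ++count;
    }
    return count;
  }

  // Assumed trigger: renumber once the share of dead IDs exceeds a threshold.
  bool should_renumber(double max_dead_fraction) const {
    if (unique == 0) return false;
    double dead = static_cast<double>(unique - live_count());
    return dead / unique > max_dead_fraction;
  }

  // Drop dead nodes and give the live ones dense IDs 0..live-1, so that
  // tables sized by unique shrink back to the live node count.
  void renumber_live() {
    std::vector<ToyNode> compacted;
    for (const ToyNode& n : nodes) {
      if (n.live) {
        compacted.push_back({static_cast<uint32_t>(compacted.size()), true});
      }
    }
    nodes.swap(compacted);
    unique = static_cast<uint32_t>(nodes.size());
  }
};
```

Checking the gap before each pass keeps the cost of renumbering bounded: the pass runs only when enough dead IDs have accumulated to make the compaction worthwhile.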
Here are some notes related to (1):
Code locations that use directly-referenced arrays:
- PhaseIdealLoop::Dominators -- allocates dfsorder and ntarjan arrays of size unique();
- PhaseIdealLoop::dom_depth and PhaseIdealLoop::_idom -- proportional to unique();
- PhaseCFG::global_code_motion -- recalc_pressure_nodes -- could be large, but size not necessarily proportional to unique();
- PhaseChaitin::stretch_base_pointer_live_ranges -- derived_base_map is allocated with malloc, size proportional to unique();
- PhaseIdealLoop::_preorders -- size proportional to unique();
- PhaseRegAlloc::_node_regs -- size proportional to unique();
- Scheduling::_node_bundling_base, _node_latency, _uses, _current_latency -- size most likely proportional to unique();
- Compile::fill_buffer -- allocates node_offsets array of size unique(), used only in fastdebug.
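The trade-off behind (1) for arrays like the ones listed above can be illustrated with a minimal sketch. DenseNodeMap and SparseNodeMap are hypothetical types built on the C++ standard library for illustration; HotSpot would use its own array and hash table classes instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a table directly addressed by node ID (_idx).
// Its footprint is proportional to unique(), however few entries are set.
struct DenseNodeMap {
  std::vector<uint32_t> slots;
  explicit DenseNodeMap(uint32_t unique) : slots(unique, 0) {}
  void set(uint32_t idx, uint32_t value) {
    assert(idx < slots.size());
    slots[idx] = value;
  }
  uint32_t get(uint32_t idx) const {
    assert(idx < slots.size());
    return slots[idx];
  }
  std::size_t slot_count() const { return slots.size(); }
};

// Hypothetical hash-based replacement: the footprint follows the number of
// entries actually stored, at the cost of hashing on every access.
struct SparseNodeMap {
  std::unordered_map<uint32_t, uint32_t> slots;
  void set(uint32_t idx, uint32_t value) { slots[idx] = value; }
  uint32_t get(uint32_t idx) const {
    auto it = slots.find(idx);
    return it == slots.end() ? 0 : it->second;  // 0 means "no entry"
  }
  std::size_t slot_count() const { return slots.size(); }
};
```

With 100'000 node IDs and a single entry stored, the dense variant holds 100'000 slots while the hash-based one holds a single entry. The hash table only pays off for sparsely populated tables, which is why (1) should target the arrays with the highest impact first.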
Data structures that use directly-referenced arrays:
- GrowableArray -- example usages: ConnectionGraph::nodes, DepGraph::_map, Compile::_node_note_array, LiveRangeMap::_names, LiveRangeMap::_uf_map, PhaseCFG::_node_latency
- Node_Array -- example usages: ConnectionGraph::_node_map, Matcher::_old2new_map (only debug), Matcher::_new2old_map (only debug), PhaseTransform::_nodes, Type_Array::_types
- Node_List -- example usages: Invariance::_old_new, PhaseCFG::schedule_local, Scheduling::_scheduled, Scheduling::_available
- Block_Array -- used in PhaseCFG::_node_to_block_mapping
- VectorSet -- uses _idx for checks -- already compressed, but it could perhaps be optimized further.