JDK-8221249 : x86: make r12_heapbase register available when compressed oops do not require it
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 14
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • CPU: x86
  • Submitted: 2019-03-21
  • Updated: 2022-02-07
  • Resolved: 2022-02-07
Description
JDK-8217909 untied the %r12 (heap_base) register when CompressedOops are disabled. We can improve on that by also untying the heap_base register when zero-based or 32-bit compressed oops are enabled, like this:

diff -r 7a9a828195c7 src/hotspot/cpu/x86/x86_64.ad
--- a/src/hotspot/cpu/x86/x86_64.ad     Mon Mar 11 18:44:40 2019 +0100
+++ b/src/hotspot/cpu/x86/x86_64.ad     Thu Mar 21 12:44:49 2019 +0100
@@ -352,11 +352,12 @@
 RegMask _STACK_OR_PTR_REG_mask;
 RegMask _STACK_OR_LONG_REG_mask;
 RegMask _STACK_OR_INT_REG_mask;
 
 static bool need_r12_heapbase() {
-  return UseCompressedOops || UseCompressedClassPointers;
+  return (UseCompressedOops && (Universe::narrow_oop_base() != NULL)) ||
+         (UseCompressedClassPointers && (Universe::narrow_klass_base() != NULL));
 }
 
 void reg_mask_init() {
   // _ALL_REG_mask is generated by adlc from the all_reg register class below.
   // We derive a number of subsets from it.
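
To illustrate why a null base makes the register unnecessary, here is a minimal standalone sketch of the two predicates from the diff above. It is not HotSpot code: NarrowPtrConfig, need_heapbase_old, need_heapbase_new and decode_oop are hypothetical names, and the decode formula (base plus shifted narrow value, with narrow 0 meaning null) is a simplified model of compressed oops decoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the compressed-pointer settings; not HotSpot's types.
struct NarrowPtrConfig {
  bool     use_compressed_oops;
  bool     use_compressed_class_pointers;
  uint64_t narrow_oop_base;    // 0 models a NULL base (zero-based or 32-bit mode)
  uint64_t narrow_klass_base;  // 0 models a NULL base
  int      narrow_oop_shift;   // 0 in 32-bit mode, 3 for 8-byte object alignment
};

// Predicate before the patch: any compressed mode reserves the register.
inline bool need_heapbase_old(const NarrowPtrConfig& c) {
  return c.use_compressed_oops || c.use_compressed_class_pointers;
}

// Predicate after the patch: reserve the register only for a non-null base.
inline bool need_heapbase_new(const NarrowPtrConfig& c) {
  return (c.use_compressed_oops && c.narrow_oop_base != 0) ||
         (c.use_compressed_class_pointers && c.narrow_klass_base != 0);
}

// Simplified decode: a narrow value of 0 is null, otherwise base + (narrow << shift).
// With a zero base, the addition disappears and no base register is involved.
inline uint64_t decode_oop(const NarrowPtrConfig& c, uint32_t narrow) {
  assert(c.narrow_oop_shift >= 0 && c.narrow_oop_shift <= 3);
  if (narrow == 0) {
    return 0;
  }
  return c.narrow_oop_base + (static_cast<uint64_t>(narrow) << c.narrow_oop_shift);
}
```

In this model, a zero-based configuration still satisfies the old predicate but not the new one, which is the case the patch targets.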

Comments
I have rebased the patch to the current state of JDK, cleaned it up, and passed tier1. Then I ran an extended battery of benchmarks with -Xmx1g -XX:+UseParallelGC. Some of them show regressions, most are neutral, and the improvements are either small (within 0.5%) or unstable and weird (no r12 uses on hot paths). I attribute the regressions to the r12-as-zero-register match rules being effectively disabled. Therefore, I would like to table this patch, declare this experiment a dead end, and close the issue as WNF. The current patch is attached to this issue, in case we would like to revisit this in the future.
27-10-2021

Okay, this is convincing. Agree.
10-07-2020

reinit_heapbase() is complicated for initial code that does not have the heap available yet. Yes, the concern about having a zero register handy is legitimate. That said, I figured out that my last perf runs were not using 32-bit coops, and that is why I had not seen any improvement. In a better experiment, running SPECjvm2008 with -Xmx1g -XX:+UseParallelGC, there seem to be statistically significant improvements: Compiler.sunflow and MpegAudio improve by about +1.5-1.9%, and ScimarkSparse.small improves by about +9.8%. Apparently, having an additional general-purpose register on the rather tight x86_64 pays off.
09-07-2020

I don't think that complicating reinit_heapbase() (for base == 0) and no longer keeping 0 in R12 will improve the generated code.
07-07-2020

An even better patch, made possible by JDK-8241825, which untied the dependency on compressed klass ptrs. This passes tier1 tests: https://cr.openjdk.java.net/~shade/8221249/webrev.02/
07-07-2020

Please note, post Rampdown Phase One of JDK 14, changing the Fix Version here to 15.
13-12-2019

Better patch: http://cr.openjdk.java.net/~shade/8221249/webrev.01/ Eyeballed the assembly with 32-bit compressed oops, and %r12 is indeed available now. Unfortunately, performance tests do not show measurable improvements with this patch yet.
27-06-2019

There are paths that seem to trust that heap_base is in %r12, even if it is null. Most of them are asserts. But there is also a creepy MacroAssembler::reinit_heapbase():

void MacroAssembler::reinit_heapbase() {
  if (UseCompressedOops || UseCompressedClassPointers) {
    if (Universe::heap() != NULL) {
      if (Universe::narrow_oop_base() == NULL) {
        MacroAssembler::xorptr(r12_heapbase, r12_heapbase);
      } else {
        mov64(r12_heapbase, (int64_t)Universe::narrow_ptrs_base());
      }
    } else {
      // Oopsies, heap is not yet there? Emit the store to %r12 from the global addr.
      movptr(r12_heapbase, ExternalAddress((address)Universe::narrow_ptrs_base_addr()));
    }
  }
}
21-03-2019