JDK-8076985 : Allocation path: biased locking + compressed oops code quality
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2015-04-05
  • Updated: 2021-03-19
  • Resolved: 2020-07-02
JDK 16
16 b05: Fixed
Description
Comparing the allocation code paths in plain 64-bit mode and in 64-bit mode with compressed oops reveals some interesting codegen issues, caused by the interaction of the BiasedLocking prototype header mechanism with compressed oops.

For example, "new Object()" yields this allocation sequence:

; load and unpack metadata for java.lang.Object
mov    $0x200001d5,%r11d
movabs $0x0,%r10              
lea    (%r10,%r11,8),%r10

; get prototype mark word, and store it into object
mov    0xa8(%r10),%r10
mov    %r10,(%rax)

;  store class word
movl   $0x200001d5,0x8(%rax)

Running with either -XX:-UseCompressedOops or -XX:-UseBiasedLocking improves allocation performance by around 7%, apparently because the decoding step is stripped away.
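
The decoding step in the sequence above can be illustrated with a minimal sketch. It only mirrors the arithmetic of the mov/movabs/lea instructions (base + narrow * 8) using the constants from the listing; the class and method names are hypothetical and this is not HotSpot code.

```java
// Sketch only: reproduces the address arithmetic of the listing, not HotSpot internals.
final class NarrowKlassDecode {
    // Hypothetical helper: base + (narrow << shift), treating narrow as unsigned 32-bit.
    static long decode(long base, int narrow, int shift) {
        return base + (Integer.toUnsignedLong(narrow) << shift);
    }

    public static void main(String[] args) {
        // Constants taken from the listing: base 0x0, narrow klass 0x200001d5, scale 8 (shift 3).
        long klass = decode(0x0L, 0x200001d5, 3);
        // 0xa8 is the displacement the listing uses to load the prototype mark word.
        long prototypeSlot = klass + 0xa8;
        System.out.printf("klass = 0x%X%n", klass);
        System.out.printf("prototype mark word slot = 0x%X%n", prototypeSlot);
    }
}
```

Under these assumptions the decoded address is 0x100000EA8, which is the constant that variant a) in this report proposes to materialize directly.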

It does not seem simple or sane to rework the biased locking machinery to avoid polling the prototype headers during allocation, so we may want to just improve the generated code quality there. Indeed, -XX:-UseCompressedOops alone gives a good boost on the targeted microbenchmark. There are two things we might try to improve the code quality here:

a) Since we know the narrow class address statically, we might as well unpack it statically, and store it right away, e.g.:

; get prototype mark word, and store it into object
movabs $0x100000EA8,%r10
mov    0xa8(%r10),%r10
mov    %r10,(%rax)

;  store class word
movl   $0x200001d5,0x8(%rax)

b) Keep the narrow class constant, but generate better code:

; get prototype mark word, and store it into object
mov    $0x200001d5,%r11d
mov    0xa8(%r12,%r11,8),%r10
mov    %r10,(%rax)

;  store class word
movl   %r11d,0x8(%rax)
Comments
URL: https://hg.openjdk.java.net/jdk/jdk/rev/4d1c4400c75d User: kvn Date: 2020-07-02 19:53:02 +0000
02-07-2020

Performance testing shows no significant difference, except for some OpenCrypto micros, which seem unstable: I reran them and the results were different. Tier testing also passed.
01-07-2020

Note: my code examples come from PrintOptoAssembly, which prints pseudo-assembler. The class pointer values are actually Handle pointers; that is why they don't match the klass pointer range.
01-07-2020

The problem with the code is that the base for compressed klasses is not 0:

Narrow klass base: 0x0000000800000000, Narrow klass shift: 0, Narrow klass range: 0x40000000

So I will go with the a) version.
01-07-2020
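
One way to read the non-zero base problem, as a minimal sketch: with the reported base of 0x0000000800000000 and shift 0, using the narrow klass value directly as an address gives the wrong location, and the base is too large to fold into a signed 32-bit x86-64 displacement. The narrow klass value below is hypothetical, the offset 192 is taken from the PrintOptoAssembly dumps in this report, and the class and method names are illustrative only.

```java
// Sketch only: address arithmetic under the values reported in the comment, not HotSpot code.
final class KlassBaseFolding {
    static final long NARROW_KLASS_BASE = 0x0000000800000000L; // from the comment
    static final int NARROW_KLASS_SHIFT = 0;                   // from the comment
    static final long PROTOTYPE_HEADER_OFFSET = 192;           // "#192" in the dumps

    // Full decode: base + (narrow << shift), narrow treated as unsigned 32-bit.
    static long decode(int narrow) {
        return NARROW_KLASS_BASE + (Integer.toUnsignedLong(narrow) << NARROW_KLASS_SHIFT);
    }

    // x86-64 memory operands carry at most a signed 32-bit displacement.
    static boolean fitsInDisp32(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int narrow = 0x1000; // hypothetical narrow klass value within the reported range
        long withoutBase = Integer.toUnsignedLong(narrow) + PROTOTYPE_HEADER_OFFSET;
        long withBase = decode(narrow) + PROTOTYPE_HEADER_OFFSET;
        System.out.printf("address without base: 0x%X%n", withoutBase);
        System.out.printf("address with base:    0x%X%n", withBase);
        System.out.println("base + offset fits in disp32: "
                + fitsInDisp32(NARROW_KLASS_BASE + PROTOTYPE_HEADER_OFFSET));
    }
}
```

Under these assumptions the two addresses differ by the full base, which is consistent with preferring the statically decoded pointer of variant a).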

I tried to add addressing modes for DecodeNKlass, but the VM crashed in some compiled methods. In the places where it crashed, the code is similar to the above:

33c movl R8, narrowklass: precise klass java/lang/String: 0x00007f74c4006380:Constant:exact * # compressed klass ptr
342 movq R11, [R8 + #192 (32-bit)] # long
349 movq [RAX], R11 # long
34c movl [RAX + #8 (8-bit)], narrowklass: precise klass java/lang/String: 0x00007f74c4006380:Constant:exact * # compressed klass p
01-07-2020

By using the functionality added by JDK-8155729 and a simple change:

src/hotspot/cpu/x86/x86_64.ad Tue Jun 30 19:18:02 2020 -0700
@@ -1649,7 +1649,7 @@
   // or condisider the following:
   // Prefer ConNKlass+DecodeNKlass over ConP in simple compressed klass mode.
   //return CompressedKlassPointers::base() == NULL;
-  return true;
+  return false;
 }

The generated code becomes Aleksey's a) version:

038 movq R10, precise klass java/lang/Object: 0x00007fa1ec2b86c0:Constant:exact * # ptr
042 movq R10, [R10 + #192 (32-bit)] # long
049 movq [RAX], R10 # long
04c movl [RAX + #8 (8-bit)], narrowklass: precise klass java/lang/Object: 0x00007fa1ec2b86c0:Constant:exact * # compressed klass p
01-07-2020

I think the issue here is that C2 does not define an address expression for compressed klass pointers. It has one only for compressed oops: http://hg.openjdk.java.net/jdk/jdk/file/de6ad5f86276/src/hotspot/cpu/x86/x86_64.ad#l3864
01-07-2020

[~dholmes] Thank you for the clarification. Hmm. Maybe we should not concentrate on biased locking here, but rather fix the bad pattern shown by Aleksey:

lea    (%r10,%r11,8),%r10
mov    0xa8(%r10),%r10

should be

mov    0xa8(%r10,%r11,8),%r10

And we should also reuse the class pointer:

mov    $0x200001d5,%r11d
movl   %r11d,0x8(%rax)

These are general optimizations and are not related to biased locking.
01-07-2020

[~kvn] Biased-locking was disabled by default and deprecated in 15, but not yet removed. That will depend on the feedback we get from disabling it - and so far we have seen some performance issues reported with it turned off.
30-06-2020

[~shade] Biased Locking was removed in JDK 15 (JDK-8231264). Can you close this RFE?
30-06-2020

Benchmark, including the properly formatted assembly dump: http://cr.openjdk.java.net/~shade/8076985/AllocateCodegen.java
Executable JAR: http://cr.openjdk.java.net/~shade/8076985/benchmarks.jar
05-04-2015