JDK-8028337 : Checkcast-arraycopy stub for G1 is very slow
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs25
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2013-11-14
  • Updated: 2021-07-02
Other: tbd (Unresolved)
Description
Profiles from refworkload/wls_webapp_atomic indicate that the checkcast_arraycopy stub is very slow with G1 compared to Parallel GC, which spends almost no time in it. In detail, from a VTune trace:

3.698 weblogic.servlet.internal.ServletRequestImpl.getParameter(java.lang.String)
  ->1.755    checkcast_arraycopy
       ->1.722    BarrierSet::static_write_ref_array_post(HeapWord*, unsigned long)
       ->0.652    BarrierSet::static_write_ref_array_pre(HeapWord*, unsigned long)
       ->0.202    G1SATBCardTableModRefBS::write_ref_array_pre(unsigned*, int, bool)

That is, G1 calls helper methods for the barriers, whereas with Parallel GC these are inlined.

One idea to mitigate this is to move some checks inline into the checkcast_arraycopy stub: for the pre-barrier, the marking-active check, and for the post-barrier, a young-gen check.
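
The idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the attached diff and not HotSpot code: every name (ToyGcState, ToyRef, toy_checkcast_arraycopy, the toy_*_slow helpers) is hypothetical, and the GC state is reduced to two booleans. It shows the two inline filters guarding the out-of-line helper calls, so the helpers are only reached when marking is active or the destination is not young.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: these types and names are hypothetical, not HotSpot APIs.
struct ToyGcState {
    bool marking_active = false;  // would be true while concurrent marking runs
    bool dest_is_young = false;   // would be true if the destination is in a young region
    int pre_barrier_calls = 0;    // counts out-of-line pre-barrier calls
    int post_barrier_calls = 0;   // counts out-of-line post-barrier calls
};

struct ToyRef {
    int klass_id;  // stand-in for the class of the referenced object
};

// Stand-ins for the out-of-line helpers that show up in the profile.
void toy_pre_barrier_slow(ToyGcState& gc, const ToyRef* dst, std::size_t count) {
    (void)dst;
    (void)count;
    ++gc.pre_barrier_calls;
}

void toy_post_barrier_slow(ToyGcState& gc, const ToyRef* dst, std::size_t count) {
    (void)dst;
    (void)count;
    ++gc.post_barrier_calls;
}

// Copies elements until one fails the type check; returns how many were copied.
std::size_t toy_checkcast_arraycopy(ToyGcState& gc, const ToyRef* src, ToyRef* dst,
                                    std::size_t count, int expected_klass_id) {
    assert(src != nullptr && dst != nullptr);
    if (count == 0) {
        return 0;
    }

    // Inline filter 1: the pre-barrier is only needed while marking is active.
    if (gc.marking_active) {
        toy_pre_barrier_slow(gc, dst, count);
    }

    std::size_t copied = 0;
    while (copied < count && src[copied].klass_id == expected_klass_id) {
        dst[copied] = src[copied];
        ++copied;
    }

    // Inline filter 2: skip the post-barrier for a young destination
    // or when nothing was stored.
    if (copied > 0 && !gc.dest_is_young) {
        toy_post_barrier_slow(gc, dst, copied);
    }
    return copied;
}
```

In this model, a copy into a young destination while marking is inactive makes no helper calls at all, which is the case the inline checks are meant to make cheap.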
Comments
Did runs with Thomas's prototype. In the profile, the branches for BarrierSet::static_write_ref_array_post and BarrierSet::static_write_ref_array_pre no longer showed up, and the cost in the profile went down. However, this did not increase the overall score for runs with the TLAB size increased via -XX:TLABSize=4m and with -XX:-StackTraceInThrowable.
18-11-2013

Attached a diff that implements the above ideas (for x86-64 only). It could probably be made faster with JDK-7163196.
14-11-2013

ILW => MLH => P4
Impact: Medium, only a problem in certain benchmarks.
Likelihood: Low, probably not a very common use case.
Workaround: High, no workaround.
14-11-2013