JDK-8232222 : Set state to 'linked' when an archived boot class is restored at runtime
  • Type: Sub-task
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 14
  • Priority: P4
  • Status: Closed
  • Resolution: Won't Fix
  • Submitted: 2019-10-14
  • Updated: 2024-01-10
  • Resolved: 2024-01-10
Description
When linking a class, InstanceKlass::link_class_impl() first links all super classes and super interfaces of the current class. For the current class, it then verifies and rewrites the bytecode, links methods, initializes the itable and vtable, and sets the current class to 'linked' state.

When loading an archived class at runtime, SystemDictionary::load_shared_class makes sure the super types (all super classes and super interfaces) in the class hierarchy are loaded first. If not, the archived class is not used. The archived class is restored when 'loading' from the archive. At the end of the restoration, all methods are linked. As bytecode verification and rewriting are done at CDS dump time, runtime does not redo the operations for an archived class.  

If we make sure the itable and vtable are properly initialized (not needed for classes loaded by the NULL class loader) and SystemDictionaryShared::check_verification_constraints is performed for an archived class during restoration, then the archived class (from builtin loaders) is effectively in 'linked' state.

For all archived classes loaded by the builtin loaders, we can safely set the archived class to 'linked' state at the end of restoration. As a result, we can skip the work of iterating over the super types in InstanceKlass::link_class_impl() in those cases.
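The idea can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyKlass, State, restore_archived and link_class are hypothetical names invented for illustration and do not correspond to HotSpot's actual data structures or to the proposed patch. It shows that once a class is marked 'linked' at restore time, a later link request returns immediately instead of walking the super types.

```cpp
#include <cassert>
#include <vector>

// Toy model only. ToyKlass, State, restore_archived and link_class are
// hypothetical names for illustration; they are not HotSpot's data structures.
enum class State { Allocated, Loaded, Linked, Initialized };

struct ToyKlass {
    State state = State::Allocated;
    std::vector<ToyKlass*> supers;  // super class and super interfaces
    bool builtin_loader = true;     // defined by a builtin loader
};

// Models restoring an archived class. The caller must have restored all
// super types first, mirroring the precondition described above.
void restore_archived(ToyKlass& k, bool set_linked_on_restore) {
    for (const ToyKlass* s : k.supers) {
        assert(s->state >= State::Loaded);
    }
    k.state = State::Loaded;
    if (set_linked_on_restore && k.builtin_loader) {
        k.state = State::Linked;  // the shortcut proposed in this issue
    }
}

// Models linking: link the super types first, then the class itself.
// Returns how many classes had to be processed.
int link_class(ToyKlass& k) {
    if (k.state >= State::Linked) return 0;  // nothing left to do
    int processed = 1;
    for (ToyKlass* s : k.supers) processed += link_class(*s);
    // Verification, rewriting, method linking and vtable/itable setup elided.
    k.state = State::Linked;
    return processed;
}
```

In this model, restoring a class with two super types and then calling link_class processes three classes without the shortcut and none with it. The real saving depends on what link_class_impl() does per class, which the model does not attempt to capture.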

Here is the 'before' and 'after' comparison when running HelloWorld (1000 runs) for the change on top of JDK 11:

before
---------
 Performance counter stats for 'bin/java -cp hw.jar -Xshare:auto HelloWorld' (1000 runs):

             69.99 msec task-clock:u              #    1.125 CPUs utilized            ( +-  0.33% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
             4,488      page-faults:u             # 64596.865 M/sec                   ( +-  0.03% )
        90,755,863      cycles:u                  # 1306159.241 GHz                   ( +-  0.05% )
        96,734,939      instructions:u            #    1.07  insn per cycle           ( +-  0.01% )
        17,956,529      branches:u                # 258430532.850 M/sec               ( +-  0.01% )
           573,094      branch-misses:u           #    3.19% of all branches          ( +-  0.09% )

          0.062232 +- 0.000249 seconds time elapsed  ( +-  0.40% )


after
------
 Performance counter stats for 'bin/java -cp hw.jar -Xshare:auto HelloWorld' (1000 runs):

             69.61 msec task-clock:u              #    1.125 CPUs utilized            ( +-  0.34% )
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
             4,489      page-faults:u             # 64941.193 M/sec                   ( +-  0.03% )
        89,888,015      cycles:u                  # 1300369.120 GHz                   ( +-  0.03% )
        95,082,578      instructions:u            #    1.06  insn per cycle           ( +-  0.01% )
        17,580,311      branches:u                # 254326380.673 M/sec               ( +-  0.01% )
           568,132      branch-misses:u           #    3.23% of all branches          ( +-  0.02% )

          0.061886 +- 0.000251 seconds time elapsed  ( +-  0.41% )
 
The change saves more than 1.5M executed instructions for HelloWorld (96,734,939 before vs. 95,082,578 after). Perf also shows a saving in CPU cycles (90,755,863 before vs. 89,888,015 after).
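For reference, the deltas can be recomputed from the counter values in the perf output above. The snippet below is only a convenience sketch; the constant and function names are hypothetical, and the numbers are copied from the 'before' and 'after' runs.

```cpp
#include <cstdint>

// Counter values copied from the 'before' and 'after' perf output above.
constexpr std::int64_t kInstructionsBefore = 96734939;
constexpr std::int64_t kInstructionsAfter  = 95082578;
constexpr std::int64_t kCyclesBefore       = 90755863;
constexpr std::int64_t kCyclesAfter        = 89888015;

// Absolute saving between two counter readings.
constexpr std::int64_t saved(std::int64_t before, std::int64_t after) {
    return before - after;
}

// Saving as a percentage of the 'before' reading.
constexpr double saved_percent(std::int64_t before, std::int64_t after) {
    return 100.0 * static_cast<double>(before - after) /
           static_cast<double>(before);
}

// Checked at compile time.
static_assert(saved(kInstructionsBefore, kInstructionsAfter) == 1652361,
              "instruction delta");
static_assert(saved(kCyclesBefore, kCyclesAfter) == 867848, "cycle delta");
```

That is roughly 1.7% fewer instructions and just under 1% fewer cycles for this run.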

A more important motivation for this change is to lay a foundation for future optimizations that support pre-resolving constant pool references (which in turn can help generate better-optimized AOT code) and pre-initializing classes, and preserving those states at CDS dump time. The JVM spec requires the ordering of loading, verifying, linking/preparing, and initializing, and we seek a solution that is spec compliant. Being able to place an archived class in 'linked' state during restoration would, in the future, allow it to be placed in 'initialized' state at restore time in cases where that is suitable. That would solve some of the prerequisites for pre-resolving CP references to fields and methods.
Comments
Runtime Triage: This is not on our current list of priorities. We will consider this feature if we receive additional customer requirements.
10-01-2024

The RFE will address the archived boot class only.
27-05-2020

Here is the data measured with the latest jdk/jdk repo when running HelloWorld using the default CDS:

after
-------
 Performance counter stats for 'bin/java -cp /usr/local/google/home/jianglizhou/benchmarks/hw.jar HelloWorld' (500 runs):

             94.71 msec task-clock:u              #    1.538 CPUs utilized            ( +-  0.13% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             5,223      page-faults:u             # 55429.166 M/sec                   ( +-  0.05% )
       110,968,720      cycles:u                  # 1177761.835 GHz                   ( +-  0.07% )
       110,466,048      instructions:u            #    1.00  insn per cycle           ( +-  0.04% )
        21,998,543      branches:u                # 233480610.656 M/sec               ( +-  0.04% )
           738,663      branch-misses:u           #    3.36% of all branches          ( +-  0.07% )

         0.0615654 +- 0.0000990 seconds time elapsed  ( +-  0.16% )

before
--------
 Performance counter stats for 'bin/java -cp /usr/local/google/home/jianglizhou/benchmarks/hw.jar HelloWorld' (500 runs):

             95.23 msec task-clock:u              #    1.536 CPUs utilized            ( +-  0.15% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             5,223      page-faults:u             # 55138.867 M/sec                   ( +-  0.05% )
       112,201,042      cycles:u                  # 1184504.901 GHz                   ( +-  0.06% )
       111,995,915      instructions:u            #    1.00  insn per cycle           ( +-  0.04% )
        22,376,401      branches:u                # 236227370.825 M/sec               ( +-  0.04% )
           743,765      branch-misses:u           #    3.32% of all branches          ( +-  0.07% )

          0.061993 +- 0.000110 seconds time elapsed  ( +-  0.18% )
28-11-2019

Proposed change: http://cr.openjdk.java.net/~jiangli/8232222/webrev.00/
28-11-2019

For static dumping, because we can resolve constraint classes and complete verification at dump time for all classes loaded by the builtin loaders, no runtime SystemDictionaryShared::check_verification_constraints is needed for those archived classes (loaded by the builtin loaders). We can set the 'linked' state for application classes loaded by the system class loader as well. Updated the description.
15-10-2019