JDK-8250607 : C2: Filter type in PhiNode::Value() for induction variables of trip-counted integer loops
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 16
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-07-27
  • Updated: 2020-12-07
  • Resolved: 2020-11-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 16
16 b25Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Description
As part of a previous fix [1] for JDK-8248552, the following change was tested which allows the already narrowed type of induction variable phis before pre/main/post loops to be kept afterwards in pre/main/post loops:

--- old/src/hotspot/share/opto/cfgnode.cpp	2020-07-15 11:49:41.047714560 +0200
+++ new/src/hotspot/share/opto/cfgnode.cpp	2020-07-15 11:49:40.611529333 +0200
@@ -1080,9 +1080,9 @@
           if (bt != BoolTest::ne) {
             if (stride_t->_hi < 0) {          // Down-counter loop
               swap(lo, hi);
-              return TypeInt::make(MIN2(lo->_lo, hi->_lo) , hi->_hi, 3);
+             return TypeInt::make(MIN2(lo->_lo, hi->_lo) , hi->_hi, 3)->filter_speculative(_type);
             } else if (stride_t->_lo >= 0) {
-                return TypeInt::make(lo->_lo, MAX2(lo->_hi, hi->_hi), 3);
+               return TypeInt::make(lo->_lo, MAX2(lo->_hi, hi->_hi), 3)->filter_speculative(_type);
             }

The type of iv phis before creating pre/main/post loops is currently lost afterwards. It should be beneficial to do the above type narrowing, especially when this type information can be propagated in the loop.

However, when doing performance evaluation, the micros open crypto benchmark openjdk.bench.javax.crypto.small.SecureRandomBench.nextBytes (located at test/micro/org/openjdk/bench/javax/crypto/small/SecureRandomBench.java) results in a repeatable regression of 1-2% with these two settings:
- algorithm=SHA1PRNG-dataSize:64-provider:-shared:false
- algorithm=SHA1PRNG-dataSize:64-provider:-shared:true

This enhancement should investigate further and show that this change could be beneficial. It should also tackle the two regressions above and try to fix those based on the suggested fix for PhiNode::Value(). Maybe there is another issue which blocks further optimizations compared to mainline.


[1] http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/
Comments
These changes were rolled back by JDK-8257561 because they prevents vectorization in Reduction tests we have: With JDK-8250607 changes: $ java -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:CompileThresholdScaling=0.1 -XX:+TraceNewVectors RedTest_int | grep "new Vector node" |wc 0 0 0 Without: $ java -Xbatch -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:CompileThresholdScaling=0.1 -XX:+TraceNewVectors RedTest_int | grep "new Vector node" |wc 50 1560 12836 Running before JDK-8250607 (JDK 16 b24) and after (b25) all compiler/loopopts/superword/ tests gives: $ grep "new Vector node" vect_b24_sw.log | wc 748 19501 167606 $ grep "new Vector node" vect_b25_sw.log | wc 385 9880 86386
03-12-2020

Changeset: 70c7b1d9 Author: Roland Westrelin <roland@openjdk.org> Date: 2020-11-12 14:15:40 +0000 URL: https://github.com/openjdk/jdk/commit/70c7b1d9
12-11-2020

Regarding the performance regression: the top 3 hottest methods (as reported by -prof perfasm) that the change affects are: sun.security.provider.SHA::implCompress0 java.security.SecureRandom::nextBytes sun.security.provider.SHA::implDigest Running the benchmarks 3 times with the change enabled in turn in each of the methods indicates that java.security.SecureRandom::nextBytes would be the one to cause the regression. Now compiling java.security.SecureRandom::nextBytes with the change on and off and comparing the resulting IR graph, I only see one transformation that's applied with the change and not without: Convert ConvI2L(AddI(x, y)) to AddL(ConvI2L(x), ConvI2L(y)) in ConvI2LNode::Ideal() in a couple of post loops. If that transformation is disabled for java.security.SecureRandom::nextBytes and the benchmark is run with the change on and off, we recover most of the performance regression but maybe not quite all of it. On 2 runs: shared:false = 0.43%, shared:true = 0.49% shared:false = 0.68%, shared:true = 0.99% (baseline is run with the change) Anyway, given the regression is small, the change is valuable (I would say it's a bug) and there's no sign from this investigation of some unexpected flaw, I think we should go ahead with the change. (Thanks to [~chagedorn] for running the experiments)
03-11-2020