JDK-8345766 : C2 should emit macro nodes for ModF/ModD instead of calls during parsing
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 11,17,21,24,25
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • CPU: x86_64
  • Submitted: 2024-12-04
  • Updated: 2025-03-11
  • Resolved: 2025-01-21
JDK 25
25 b07 Fixed
Description
ADDITIONAL SYSTEM INFORMATION :
## System / OS / Java Runtime Information 
# Java version
java 23.0.1 2024-10-15
java 21.0.5 2024-10-15 LTS
java 17.0.12 2024-07-16 LTS

# Operating system details
$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ uname -a
Linux shuntian 5.15.0-84-generic #93~20.04.1-Ubuntu SMP Wed Sep 6 16:15:40 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
C2 misses an optimization. The bug affects java 21.0.5 2024-10-15 LTS,
java 23.0.1 2024-10-15, java 17.0.12 2024-07-16 LTS, and
openjdk 11.0.18 2023-01-17. It was not reproduced on GraalVM JDK.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
The following steps show how to reproduce the bug on Java 21.0.5 in an Ubuntu Linux environment.

# Compile
$ javac C.java

# Default or compilation up to level 4
$ time java C
# Output (up to several seconds)
real    0m8.089s
user    0m8.425s
sys     0m0.028s

# Compilation stopped at level 3 (levels 2 and 1 behave similarly)
$ time java -XX:TieredStopAtLevel=3 C
# Output (hundreds of milliseconds)
real    0m0.511s
user    0m0.635s
sys     0m0.028s

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
C2-compiled code should be no slower than C1-compiled code when the method is hot.
ACTUAL -
C2-compiled code is much slower (about 8 s versus about 0.5 s in the runs above).

---------- BEGIN SOURCE ----------
# C.java
public class C {
    private static final double Z = 0.3;

    public static double process(final double x) {
        double w = (double) 0.1;
        double p = 0;
        p = (double) (3.109615012413746E307 % (w % Z));
        p = (double) (7.614949555185036E307 / (x % x));    // <- return value only depends on this line
        return (double) (x * p);
    }

    public static void main(String[] args) {
        int N = 30000000;
        for (int i = 0; i < N; i++) {
            process(1.0E-15d);
        }
    }
}
---------- END SOURCE ----------
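
The arithmetic in the reproducer can be checked in isolation. The sketch below is an illustration added for clarity, not part of the submitted source; the class name `RemainderSemantics` and both method names are hypothetical. It shows that the first remainder expression is dead (its value is overwritten before use) and that, for the argument used in `main`, the method returns `Infinity` with or without it.

```java
public class RemainderSemantics {
    // Same logic as C.process from the report.
    public static double processOriginal(double x) {
        double w = 0.1;
        double p = 3.109615012413746E307 % (w % 0.3); // value is never read
        p = 7.614949555185036E307 / (x % x);
        return x * p;
    }

    // Hypothetical variant with the dead remainder removed by hand.
    public static double processWithoutDeadRemainder(double x) {
        double p = 7.614949555185036E307 / (x % x);
        return x * p;
    }

    public static void main(String[] args) {
        double x = 1.0E-15d;
        System.out.println(x % x);                          // prints 0.0
        System.out.println(processOriginal(x));             // prints Infinity
        System.out.println(processWithoutDeadRemainder(x)); // prints Infinity
    }
}
```

Because both variants return the same value, any time spent on the first `%` expression in the hot loop is pure overhead.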

CUSTOMER SUBMITTED WORKAROUND :
GraalVM at level 4 is fine.

FREQUENCY : always



Comments
Changeset: f54e0bf2
Branch: master
Author: Theo Weidmann <tweidmann@openjdk.org>
Committer: Emanuel Peter <epeter@openjdk.org>
Date: 2025-01-21 09:15:18 +0000
URL: https://git.openjdk.org/jdk/commit/f54e0bf267280c270b0e181289498b28aaf36ee6
21-01-2025

I think the title of the issue was a bit misleading, so I updated it to make things clearer. The cause of this issue is that C2 emits calls to the runtime library immediately during parsing, which prevents any optimization of the remainder operation. The fix for this issue (currently under review) emits specific ModF/ModD nodes, which can be optimized and are converted to runtime calls only after optimizations. I tested the new code sample provided by the submitter with the proposed fix, and C2 now has better performance than C1.
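
To illustrate the difference between emitting a call during parsing and expanding a node late, here is a minimal toy model. It is not HotSpot code: the `Node` types, `fold`, `expand`, and `countCalls` are all hypothetical names invented for this sketch, and the only optimization modeled is folding a remainder of two constants. An opaque call node is left untouched by the optimizer, while a deferred remainder node can disappear before it is ever turned into a call.

```java
public class LateExpansionSketch {
    // Hypothetical toy IR; none of these types exist in HotSpot.
    interface Node {}
    record Const(double value) implements Node {}
    record Param(String name) implements Node {}
    record ModD(Node lhs, Node rhs) implements Node {}     // deferred remainder
    record DremCall(Node lhs, Node rhs) implements Node {} // opaque runtime call

    // Optimization: fold a deferred remainder whose inputs are both constants.
    static Node fold(Node n) {
        if (n instanceof ModD m) {
            Node l = fold(m.lhs());
            Node r = fold(m.rhs());
            if (l instanceof Const a && r instanceof Const b) {
                return new Const(a.value() % b.value());
            }
            return new ModD(l, r);
        }
        return n; // constants, parameters, and opaque calls are left as they are
    }

    // Late expansion: every remainder that survived folding becomes a call.
    static Node expand(Node n) {
        if (n instanceof ModD m) {
            return new DremCall(expand(m.lhs()), expand(m.rhs()));
        }
        return n;
    }

    static int countCalls(Node n) {
        if (n instanceof DremCall c) {
            return 1 + countCalls(c.lhs()) + countCalls(c.rhs());
        }
        return 0;
    }

    // 3.1E307 % (0.1 % 0.3) modeled as calls created during parsing.
    public static int callsWhenEmittedDuringParsing() {
        Node n = new DremCall(new Const(3.1E307), new DremCall(new Const(0.1), new Const(0.3)));
        return countCalls(expand(fold(n)));
    }

    // The same expression modeled as deferred nodes that are expanded late.
    public static int callsWhenExpandedLate() {
        Node n = new ModD(new Const(3.1E307), new ModD(new Const(0.1), new Const(0.3)));
        return countCalls(expand(fold(n)));
    }

    // x % 3.6 with an unknown x cannot be folded, so one call remains.
    public static int callsForNonConstantInput() {
        Node n = new ModD(new Param("x"), new Const(3.6));
        return countCalls(expand(fold(n)));
    }

    public static void main(String[] args) {
        System.out.println(callsWhenEmittedDuringParsing()); // prints 2
        System.out.println(callsWhenExpandedLate());         // prints 0
        System.out.println(callsForNonConstantInput());      // prints 1
    }
}
```

In this model the constant expression costs two calls when they are created up front and none when expansion is deferred, while a remainder with a non-constant input still ends up as one call either way.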
07-01-2025

Mail from Submitter
==============
Regarding the issue I reported before (JDK-8345766), I have some more information to share. I see that the discussion characterizes the issue as 'C2 does not remove useless drem runtime calls', but I think it is also related to how C2 deals with floating-point operations. I include another version in which the method call actually has side effects; the C2 version is still much slower than C1.

```java
public class C2 {
    private static final double Z = 0.3;
    final static int N = 10000000;
    static double res[] = new double[N];

    public static double process(final double x) {
        double w = (double) 0.1;
        double p = 0;
        p = (double) (3.109615012413746E307 % (w % Z));
        p = (double) (7.614949555185036 / (x % 3.6));    // <- return value only depends on this line
        return (double) (x * p);
    }

    public static void main(String[] args) {
        for (int i = 0; i < N; i++) {
            // process(1.0E-15d);
            res[i] = process(i / 10000);
        }
        for (int i = 0; i < N; i += 1000000) {
            System.out.println(res[i]);
        }
    }
}
```
06-01-2025

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/22786 Date: 2024-12-17 09:01:57 +0000
18-12-2024

Most likely the issue is that we emit the runtime calls already during parsing and C2 will not remove them later. We should probably use macro nodes that are only expanded late or special case these runtime calls such that they can be removed.
10-12-2024

The problem is that C2 does not remove useless runtime calls to drem ("call_leaf,runtime drem"). I attached a simplified reproducer:

$ time /oracle/jdks/jdk-24/fastdebug/bin/java -XX:CompileCommand=dontinline,*::test -XX:-TieredCompilation Test.java
CompileCommand: dontinline *.test bool dontinline = true
real    0m12.357s
user    0m13.025s
sys     0m0.054s

$ time /oracle/jdks/jdk-24/fastdebug/bin/java -XX:CompileCommand=dontinline,*::test -XX:TieredStopAtLevel=1 Test.java
CompileCommand: dontinline *.test bool dontinline = true
real    0m1.278s
user    0m2.043s
sys     0m0.105s

C1 code looks like this (constant zero return):
[...]
0x00007010d0f9363b: vxorpd %xmm0,%xmm0,%xmm0
0x00007010d0f9363f: add $0x30,%rsp
0x00007010d0f93643: pop %rbp
0x00007010d0f93644: cmp 0x480(%r15),%rsp ; {poll_return}
0x00007010d0f9364b: ja 0x00007010d0f93652
0x00007010d0f93651: retq
[...]

Whereas C2 code has useless runtime calls:
[...]
01a # MachConstantBaseNode (empty encoding)
01a movsd XMM0, [constant table base + #0] # load from constant table: double=#0.100000
022 movsd XMM1, [constant table base + #8] # load from constant table: double=#42.000000
02a call_leaf,runtime drem
    No JVM State Info
    #
03f movsd XMM1, [constant table base + #16] # load from constant table: double=#43.000000
047 call_leaf,runtime drem
    No JVM State Info
    #
[...]

This seems to be an old issue.

ILW = C2 does not remove useless runtime calls to drem (performance issue), easy to reproduce but old issue, no workaround but remove useless calls from Java code = MLH = P4
10-12-2024

Reply from submitter
================
The GraalVM version I am using is:

java 21.0.5 2024-10-15 LTS
Java(TM) SE Runtime Environment Oracle GraalVM 21.0.5+9.1 (build 21.0.5+9-LTS-jvmci-23.1-b48)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 21.0.5+9.1 (build 21.0.5+9-LTS-jvmci-23.1-b48, mixed mode, sharing)

The command is the same: `time java -XX:TieredStopAtLevel=1,2,3,4 C`. The GraalVM time at level 4 is normal.

cpu cores : 8
10-12-2024

I can reproduce this with the latest JDK 24 (increased N in the code by 10x):

$ time jdk-24_linux-x64_bin/jdk-24/bin/java -XX:TieredStopAtLevel=1 C
real    0m0.853s
user    0m0.836s
sys     0m0.010s

$ time jdk-24_linux-x64_bin/jdk-24/bin/java C
real    0m19.085s
user    0m19.090s
sys     0m0.016s
10-12-2024

Mailed submitter
=========
Hello,

Regarding the reported missed optimization in the C2 compiler with 21.0.5:
1. Could you provide the complete GraalVM version and the command used for the statement "GraalVM for Level 4 are fine."?
2. Could you share the output of the command $ cat /proc/cpuinfo

With JDK 21.0.5 on Ubuntu with cpu cores : 8, we could observe that the C2 compiler is much slower, as shown below.

On Ubuntu
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"

# time java C <<<================
real    0m11.622s
user    0m11.525s
sys     0m0.070s

# time java -XX:TieredStopAtLevel=1 C
real    0m0.389s
user    0m0.366s
sys     0m0.010s

# time java -XX:TieredStopAtLevel=2 C
real    0m0.428s
user    0m0.402s
sys     0m0.021s

# time java -XX:TieredStopAtLevel=3 C
real    0m0.469s
user    0m0.450s
sys     0m0.013s

# time java -XX:TieredStopAtLevel=4 C <<<================
real    0m11.606s
user    0m11.527s
sys     0m0.051s

With Java HotSpot(TM) 64-Bit Server VM (build 23.0.2+7-58, mixed mode, sharing)
==============================================================
$ time java C <<<<< Default
real    0m5.882s
user    0m5.853s
sys     0m0.044s

$ time java -XX:TieredStopAtLevel=1 C
real    0m0.366s
user    0m0.334s
sys     0m0.033s

$ time java -XX:TieredStopAtLevel=2 C
real    0m0.418s
user    0m0.378s
sys     0m0.041s

$ time java -XX:TieredStopAtLevel=3 C
real    0m0.481s
user    0m0.454s
sys     0m0.028s

$ time java -XX:TieredStopAtLevel=4 C <<<<<
real    0m5.794s
user    0m5.781s
sys     0m0.029s
09-12-2024