[Blender.java](https://www.graalvm.org/22.1/examples/java-performance-examples/#sunflow-example) is the kernel of Sunflow. It's basically a microbenchmark for Partial Escape Analysis (PEA): the `Color` object allocated inside the loop is the PEA opportunity.
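For readers unfamiliar with the pattern, here is a minimal sketch of this kind of PEA opportunity. It is a simplified stand-in, not the actual Blender.java: the class `PeaSketch`, the method `collect`, and the loop bounds are hypothetical, and only the `Color` fields and the `% 42` test mirror the benchmark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the Blender loop (not the real source).
final class PeaSketch {
    static final class Color {
        final double r, g, b;
        Color(double r, double g, double b) { this.r = r; this.g = g; this.b = b; }
    }

    // Returns the Color objects that escape the loop.
    static List<Color> collect(int n) {
        List<Color> escaped = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // Allocated on every iteration, with integer-valued double fields.
            Color color = new Color(i, i + 1, i + 2);
            // The object escapes only on this rarely taken branch, so a compiler
            // with PEA can keep r/g/b as scalars on the common path and
            // materialize the object only when the branch is taken.
            if ((color.r + color.g + color.b) % 42 == 0) {
                escaped.add(color);
            }
        }
        return escaped;
    }
}
```

In this sketch the sum is `3 * (i + 1)`, so the branch is taken only when `i + 1` is a multiple of 14. Most iterations therefore never need the heap allocation.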
C2 with PEA makes Blender 38.58% faster. Graal (graalvm-ce-java17-22.3.1) makes it 47.35% faster. In other words, Graal is still 14.3% faster than C2 with PEA. I profiled allocations using async-profiler, and I believe C2 PEA has the same effect on allocation as Graal. It looks like the remaining gap comes from the `drem` operation in this expression: `(color.r + color.g + color.b) % 42 == 0`
In output_c2.html, 66% of CPU time is spent at Blender.initialize@82, which is the `drem` bytecode. Even though Color.r/g/b are all doubles, their values only ever come from integers. In output_graal.html, bytecode @82 accounts for only 4.10%.
I think it's a good opportunity to optimize drem like Graal does.
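To illustrate why integer-valued operands matter here, the sketch below shows one hypothetical fast path for a double remainder. It is not taken from Graal or C2, and I have not confirmed what either compiler actually emits for `drem`. The class `DremSketch`, the method `rem`, and the 2^53 bound are assumptions made for illustration only.

```java
// Hypothetical illustration only: not how Graal or C2 implements drem.
final class DremSketch {
    // Below 2^53 every integer is exactly representable as a double.
    private static final double LIMIT = 0x1p53;

    static double rem(double a, double b) {
        // Restrict to a positive dividend so the sign of a zero result
        // matches what the % operator on doubles would produce.
        if (a > 0 && a < LIMIT && Math.abs(b) < LIMIT) {
            long la = (long) a;
            long lb = (long) b;
            // Both operands are integral and the divisor is non-zero:
            // the integer remainder equals the floating-point remainder.
            if (la == a && lb == b && lb != 0) {
                return (double) (la % lb);
            }
        }
        // General case (fractions, NaN, infinities, zero divisor, negatives).
        return a % b;
    }
}
```

For the benchmark's expression, `DremSketch.rem(color.r + color.g + color.b, 42) == 0` would give the same answer as the original `% 42 == 0` test, because every other input falls back to the ordinary `%` operator.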