Bug ID: JDK-8286823 Default to UseAVX=2 on all Skylake/Cascade Lake CPUs

Versions (Unresolved/Resolved/Fixed)

Other: tbd (Unresolved)

Opening this issue on behalf of Oli Gillespie from the Amazon Corretto team:

The current code already defaults to UseAVX=2 on 'older' Skylake processors, namely those with _stepping < 5. My testing indicates that the same problem affects later processors in this family too, so I have removed the maximum-stepping condition.
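The difference between the existing default and the proposed one can be sketched as a pair of shell functions. This is only an illustration, not the HotSpot code: the function names are hypothetical, the arguments are the decimal family, model and stepping values as printed in /proc/cpuinfo, and the family 6 / model 85 (0x55) match is an assumption about how these server parts identify themselves. Both functions also assume an Intel CPU with AVX-512 support, where the default would otherwise be UseAVX=3.

```shell
# Hypothetical helpers, not HotSpot code.
# Arguments: cpu family, model, stepping (decimal, as in /proc/cpuinfo).

# Assumption: the affected server parts report family 6, model 85 (0x55).
is_skylake_server_model() {
  [ "$1" -eq 6 ] && [ "$2" -eq 85 ]
}

# Existing behaviour: only steppings below 5 are capped at UseAVX=2.
current_default_use_avx() {
  if is_skylake_server_model "$1" "$2" && [ "$3" -lt 5 ]; then
    echo 2
  else
    echo 3
  fi
}

# Proposed behaviour: the stepping check is dropped, so every
# stepping of this model is capped at UseAVX=2.
proposed_default_use_avx() {
  if is_skylake_server_model "$1" "$2"; then
    echo 2
  else
    echo 3
  fi
}
```

For example, `current_default_use_avx 6 85 7` prints 3, while `proposed_default_use_avx 6 85 7` prints 2. An explicit -XX:UseAVX=3 on the command line is a separate matter and is not modelled here.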

The original exclusion was added in https://bugs.openjdk.java.net/browse/JDK-8221092.

A general description of the overall issue is given at https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Downclocking.

According to https://en.wikichip.org/wiki/intel/microarchitectures/cascade_lake#CPUID, stepping values 5..7 indicate Cascade Lake. I have tested on a CPU with stepping=7, and I see CPU frequency reduction from 3.1GHz down to 2.7GHz (~13%) when using -XX:UseAVX=3, along with a corresponding performance reduction.
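When reproducing this on another host, it may help to confirm which UseAVX level the JVM actually selected by default. A minimal sketch, assuming a JDK on the PATH: the helper name effective_use_avx is hypothetical, and the awk field positions assume the usual "type name = value" layout of -XX:+PrintFlagsFinal output.

```shell
# Hypothetical helper: read -XX:+PrintFlagsFinal output on stdin and
# print the value of the UseAVX flag (fields: type, name, "=", value).
effective_use_avx() {
  awk '$2 == "UseAVX" { print $4; exit }'
}

# Usage (requires a JDK on PATH; not run here):
#   java -XX:+PrintFlagsFinal -version | effective_use_avx
# The host's stepping can be read on Linux with:
#   grep -m1 '^stepping' /proc/cpuinfo
```

Running the same pipeline with an explicit -XX:UseAVX=3 or -XX:UseAVX=2 added shows the overridden value instead of the default.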

I first saw this issue in a real production workload, where the main AVX3 instructions being executed were those generated for various flavours of disjoint_arraycopy.

I can reproduce a similar effect using SPECjvm2008's xml.transform benchmark.

```
java --add-opens=java.xml/com.sun.org.apache.xerces.internal.parsers=ALL-UNNAMED \
--add-opens=java.xml/com.sun.org.apache.xerces.internal.util=ALL-UNNAMED \
-jar SPECjvm2008.jar -ikv -ict xml.transform
```

Before the change, or with -XX:UseAVX=3:

```
Valid run!
Score on xml.transform: 776.00 ops/m
```

After the change, or with -XX:UseAVX=2:

```
Valid run!
Score on xml.transform: 894.07 ops/m
```

So, a 15% improvement in this benchmark. Some benchmarks may be negatively affected by this change, but I contend that it is still the right move, for three reasons: the stark difference in this benchmark; the fact that use of AVX3 instructions can affect *all* processes/code on the host because of the downclocking; and the fact that this effect is very hard to root-cause. For example, CPU profiles look very similar before and after, since all code is slowed equally.
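The improvement figure can be checked from the two xml.transform scores reported above. A small sketch, where the helper name pct_change is hypothetical:

```shell
# Hypothetical helper: percent change from a baseline to a new value,
# rounded to one decimal place.
pct_change() {
  LC_ALL=C awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# UseAVX=3 score -> UseAVX=2 score, in ops/m
pct_change 776.00 894.07   # prints 15.2
```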