JDK-8314859 : JNI AttachCurrentThread/DetachCurrentThread experiences performance regression on JDK17
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 17
  • Priority: P3
  • Status: Closed
  • Resolution: Won't Fix
  • OS: generic
  • CPU: generic
  • Submitted: 2023-08-15
  • Updated: 2023-11-10
  • Resolved: 2023-11-10
Related Reports
Relates :  
Relates :  
Description
ADDITIONAL SYSTEM INFORMATION :
-----------------------------------------------------------------
Ubuntu version:
-----------------------------------------------------------------
Linux ip-10-110-25-144 5.4.0-1045-aws #47~18.04.1-Ubuntu SMP Tue Apr 13 15:58:14 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
JDK8 version:
-----------------------------------------------------------------
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (build 1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09)
OpenJDK 64-Bit Server VM (build 25.362-b09, mixed mode)
-----------------------------------------------------------------
JDK17 version:
-----------------------------------------------------------------
openjdk version "17.0.7" 2023-04-18
OpenJDK Runtime Environment (build 17.0.7+7-Ubuntu-0ubuntu118.04)
OpenJDK 64-Bit Server VM (build 17.0.7+7-Ubuntu-0ubuntu118.04, mixed mode, sharing)

A DESCRIPTION OF THE PROBLEM :
Calling AttachCurrentThread/DetachCurrentThread from an unattached native thread experiences up to 100% regression.
```
Time Taken | Mean            | Standard Deviation | p999            | p99             | p95             | p90            
----------------------------------------
jdk8  110039 μs | 10966.776800 ns | 56177.341432 ns | 532334 ns       | 178610 ns       | 9317 ns         | 7873 ns        
jdk17 179558 μs | 17924.003600 ns | 54217.940476 ns | 321488 ns       | 122924 ns       | 18141 ns        | 15983 ns       
----------------------------------------
jdk8  110509 μs | 11013.906400 ns | 55116.644885 ns | 513406 ns       | 97419 ns        | 9828 ns         | 7960 ns        
jdk17 177801 μs | 17749.037200 ns | 55293.117182 ns | 330321 ns       | 111809 ns       | 18062 ns        | 16026 ns       
----------------------------------------
jdk8  109543 μs | 10919.695700 ns | 56184.273630 ns | 543077 ns       | 106345 ns       | 9569 ns         | 7974 ns        
jdk17 179328 μs | 17901.148800 ns | 54683.063365 ns | 346205 ns       | 112247 ns       | 18124 ns        | 16025 ns       
----------------------------------------
jdk8  112088 μs | 11166.931300 ns | 55597.324850 ns | 529521 ns       | 112062 ns       | 10241 ns        | 8825 ns        
jdk17 178398 μs | 17808.946700 ns | 54630.243726 ns | 296359 ns       | 116179 ns       | 18323 ns        | 16121 ns       
----------------------------------------
jdk8  110469 μs | 11013.433300 ns | 54701.167789 ns | 598978 ns       | 96847 ns        | 10273 ns        | 8487 ns        
jdk17 175880 μs | 17555.650300 ns | 53842.205624 ns | 305101 ns       | 114811 ns       | 18063 ns        | 15962 ns       
----------------------------------------
```

Based on profiles taken, it seems to be related to Thread-SMR support, particularly ThreadsSMRSupport::wait_until_not_protected is taking a lot of time.

REGRESSION : Last worked in version 8

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
reproducing repo: https://github.com/alexjinghn/gradleJNI

```
git clone git@github.com:alexjinghn/gradleJNI.git
cd gradleJNI/
benchmark.sh 5 10000
```


EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
performance of these two API should not change.
ACTUAL -
performance regression.

---------- BEGIN SOURCE ----------
reproducing repo: https://github.com/alexjinghn/gradleJNI
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Avoid attaching native threads to JVM as much as possible.

FREQUENCY : always



Comments
It is evident that there have been performance changes in this area, but there seems little scope for making changes related to ThreadSMR or OopStorage. So at this time I am closing this as "will not fix".
10-11-2023

Here's a statistical comparative summary from my basic attach/detach round-trip testing: (Apologies for the formatting but JBS can't do formatting unfortunately.) Version Delta to prev Delta to 8u Delta to 8u best 8u382 9 -47.56% -47.56% -34.05% 10 43.27% -24.88% -5.52% 11 0.95% -24.16% -4.62% 12 -0.99% -24.91% -5.56% 13 6.20% -20.25% 0.30% 14 -3.20% -22.80% -2.91% 15 8.39% -16.33% 5.24% 16 12.74% -5.67% 18.64% 17 -1.00% -6.61% 17.46% 18 3.44% -3.40% 21.50% 19 2.98% -0.52% 25.12% 20 -6.85% -7.33% 16.55% So looking at the averages JDK 17 shows a 6.61% speedup compared to 8u382. But as I stated 8u also showed extreme variance. So looking at 8u's best result JDK 17 shows a 17.46% slowdown.
29-08-2023

The initial slowdown in JDK 10 arose in b36, likely from JDK-8167108 which introduced the Thread-SMR mechanism. The subsequent slowdown in JDK 16 occurred in b11, likely caused by JDK-8244997: Convert the JavaThread::_threadObj oop to use OopStorage
28-08-2023

Raw data - 5 runs each release. Interesting to note that 8u seems to have significant variance. Huge speedup in 9 that we lose a lot 10, then we lose again in 16 to get to current levels. But compared to 8, 17 is not too bad. Running ... openjdk version "1.8.0_382" OpenJDK Runtime Environment (build 1.8.0_382-b05) OpenJDK 64-Bit Server VM (build 25.382-b05, mixed mode) 1000000 iterations in 39672227061ns (39672.227061 ns/op) 1000000 iterations in 28600501237ns (28600.501237 ns/op) 1000000 iterations in 27681855017ns (27681.855017 ns/op) 1000000 iterations in 38784841669ns (38784.841669 ns/op) 1000000 iterations in 39338177243ns (39338.177243 ns/op) Running ... java version "9" Java(TM) SE Runtime Environment (build 9+181) Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode) 1000000 iterations in 18360202638ns (18360.202638 ns/op) 1000000 iterations in 18087048603ns (18087.048603 ns/op) 1000000 iterations in 17869036982ns (17869.036982 ns/op) 1000000 iterations in 18538841961ns (18538.841961 ns/op) 1000000 iterations in 18423336126ns (18423.336126 ns/op) Running ... java version "10" 2018-03-20 Java(TM) SE Runtime Environment 18.3 (build 10+46) Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10+46, mixed mode) 1000000 iterations in 26198910419ns (26198.910419 ns/op) 1000000 iterations in 25325828481ns (25325.828481 ns/op) 1000000 iterations in 26479679938ns (26479.679938 ns/op) 1000000 iterations in 26029168059ns (26029.168059 ns/op) 1000000 iterations in 26741678768ns (26741.678768 ns/op) Running ... java version "11" 2018-09-25 Java(TM) SE Runtime Environment 18.9 (build 11+28) Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode, sharing) 1000000 iterations in 25984372650ns (25984.372650 ns/op) 1000000 iterations in 26520840482ns (26520.840482 ns/op) 1000000 iterations in 26484095471ns (26484.095471 ns/op) 1000000 iterations in 26346530379ns (26346.530379 ns/op) 1000000 iterations in 26685180521ns (26685.180521 ns/op) Running ... java version "12" 2019-03-19 Java(TM) SE Runtime Environment (build 12+33) Java HotSpot(TM) 64-Bit Server VM (build 12+33, mixed mode, sharing) 1000000 iterations in 26168187811ns (26168.187811 ns/op) 1000000 iterations in 26036940685ns (26036.940685 ns/op) 1000000 iterations in 25654727513ns (25654.727513 ns/op) 1000000 iterations in 26565516106ns (26565.516106 ns/op) 1000000 iterations in 26289671954ns (26289.671954 ns/op) Running ... java version "13" 2019-09-17 Java(TM) SE Runtime Environment (build 13+33) Java HotSpot(TM) 64-Bit Server VM (build 13+33, mixed mode, sharing) 1000000 iterations in 28082667722ns (28082.667722 ns/op) 1000000 iterations in 27817695072ns (27817.695072 ns/op) 1000000 iterations in 27023157798ns (27023.157798 ns/op) 1000000 iterations in 28150004146ns (28150.004146 ns/op) 1000000 iterations in 27747709791ns (27747.709791 ns/op) Running ... java version "14" 2020-03-17 Java(TM) SE Runtime Environment (build 14+36-1461) Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing) 1000000 iterations in 26793722966ns (26793.722966 ns/op) 1000000 iterations in 26800345376ns (26800.345376 ns/op) 1000000 iterations in 26714572606ns (26714.572606 ns/op) 1000000 iterations in 27179284683ns (27179.284683 ns/op) 1000000 iterations in 26894600458ns (26894.600458 ns/op) Running ... java version "15" 2020-09-15 Java(TM) SE Runtime Environment (build 15+36-1562) Java HotSpot(TM) 64-Bit Server VM (build 15+36-1562, mixed mode, sharing) 1000000 iterations in 29135142059ns (29135.142059 ns/op) 1000000 iterations in 29102727624ns (29102.727624 ns/op) 1000000 iterations in 28975347722ns (28975.347722 ns/op) 1000000 iterations in 29188575261ns (29188.575261 ns/op) 1000000 iterations in 29257370186ns (29257.370186 ns/op) Running ... java version "16" 2021-03-16 Java(TM) SE Runtime Environment (build 16+36-2231) Java HotSpot(TM) 64-Bit Server VM (build 16+36-2231, mixed mode, sharing) 1000000 iterations in 32977467650ns (32977.467650 ns/op) 1000000 iterations in 33528488053ns (33528.488053 ns/op) 1000000 iterations in 32036169071ns (32036.169071 ns/op) 1000000 iterations in 32443484619ns (32443.484619 ns/op) 1000000 iterations in 33224378727ns (33224.378727 ns/op) Running ... java version "17" 2021-09-14 LTS Java(TM) SE Runtime Environment (build 17+35-LTS-2724) Java HotSpot(TM) 64-Bit Server VM (build 17+35-LTS-2724, mixed mode, sharing) 1000000 iterations in 33484085800ns (33484.085800 ns/op) 1000000 iterations in 32110454974ns (32110.454974 ns/op) 1000000 iterations in 32633310901ns (32633.310901 ns/op) 1000000 iterations in 32145603723ns (32145.603723 ns/op) 1000000 iterations in 32195304191ns (32195.304191 ns/op) Running ... java version "18" 2022-03-22 Java(TM) SE Runtime Environment (build 18+36-2087) Java HotSpot(TM) 64-Bit Server VM (build 18+36-2087, mixed mode, sharing) 1000000 iterations in 32985148030ns (32985.148030 ns/op) 1000000 iterations in 33431451782ns (33431.451782 ns/op) 1000000 iterations in 33394443145ns (33394.443145 ns/op) 1000000 iterations in 34894354537ns (34894.354537 ns/op) 1000000 iterations in 33456532642ns (33456.532642 ns/op) Running ... java version "19" 2022-09-20 Java(TM) SE Runtime Environment (build 19+36-2238) Java HotSpot(TM) 64-Bit Server VM (build 19+36-2238, mixed mode, sharing) 1000000 iterations in 34908270982ns (34908.270982 ns/op) 1000000 iterations in 33923039669ns (33923.039669 ns/op) 1000000 iterations in 35587516908ns (35587.516908 ns/op) 1000000 iterations in 34443303364ns (34443.303364 ns/op) 1000000 iterations in 34312022374ns (34312.022374 ns/op) Running ... java version "20" 2023-03-21 Java(TM) SE Runtime Environment (build 20+36-2344) Java HotSpot(TM) 64-Bit Server VM (build 20+36-2344, mixed mode, sharing) 1000000 iterations in 31977081797ns (31977.081797 ns/op) 1000000 iterations in 32360210378ns (32360.210378 ns/op) 1000000 iterations in 32335724330ns (32335.724330 ns/op) 1000000 iterations in 32539086242ns (32539.086242 ns/op) 1000000 iterations in 32097096341ns (32097.096341 ns/op)
25-08-2023

I created a very basic attach/detach loop to measure a million iterations: static void perf_test() { const int iters = 1000000; int res; struct timespec start; struct timespec end; res = clock_gettime(CLOCK_MONOTONIC, &start); if (res != 0) die("clock_gettime"); for (int i = 0; i < iters; i++) { JNIEnv* env; res = (*jvm)->AttachCurrentThread(jvm, (void**)&env, NULL); if (res != JNI_OK) die("AttachCurrentThread"); if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread"); } res = clock_gettime(CLOCK_MONOTONIC, &end); if (res != 0) die("clock_gettime 2"); long seconds = end.tv_sec - start.tv_sec; long nanos = seconds * 1000 * 1000 * 1000; nanos = nanos - start.tv_nsec + end.tv_nsec; printf("%d iterations in %ldns (%f ns/op)\n", iters, nanos, ((nanos*1.0)/iters)); } There is a slight slowdown in 17 (and mainline) versus 8, but more interestingly there is a huge speedup in Java 9 that we then lose again. I need to gather more data before investigating the details.
24-08-2023

It is quite likely that Thread-SMR had an effect on detach times, but if so there may be little we can do about it. This may be an area where the number of threads still active affects performance.
23-08-2023

Moving it to dev team for analysis.
23-08-2023