Originally reported here:
https://mail.openjdk.java.net/pipermail/shenandoah-dev/2019-October/010772.html
The easiest reproducer is this:
https://cr.openjdk.java.net/~shade/8232575/GZIPUncompressBench.java
https://cr.openjdk.java.net/~shade/8232575/benchmarks.jar
With current jdk/jdk:
Parallel, 1 thread: 111.951 ± 4.628 us/op
Parallel, 16 threads: 149.951 ± 3.555 us/op
Shenandoah, 1 thread: 141.543 ± 0.280 us/op
Shenandoah, 16 threads: 1451.914 ± 51.406 us/op <--- huge scalability bottleneck
This turns out to be contention on the lock that guards a region's transition to the "Pinned" state. Shenandoah can be much smarter about that.
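To illustrate why a lock on the pin/unpin path hurts at 16 threads, and one way it could be avoided, here is a minimal Java sketch that tracks pinning with an atomic counter instead of a lock. This is only an assumption about what "smarter" might look like, not the code in the patch; `PinCountSketch` and `Region` are hypothetical names, and the real Shenandoah region code is C++ inside HotSpot.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: names and structure are hypothetical.
class PinCountSketch {

    // Hypothetical stand-in for a heap region.
    static final class Region {
        // Number of outstanding pins; the region counts as pinned while > 0.
        private final AtomicInteger pinCount = new AtomicInteger();

        // Lock-free pin: a single atomic increment, no shared lock to contend on.
        void pin() {
            pinCount.incrementAndGet();
        }

        // Lock-free unpin: a single atomic decrement.
        void unpin() {
            if (pinCount.decrementAndGet() < 0) {
                throw new IllegalStateException("unpin without matching pin");
            }
        }

        // The collector would consult this before deciding to move the region.
        boolean isPinned() {
            return pinCount.get() > 0;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Region region = new Region();
        Thread[] workers = new Thread[16];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                // Each worker pins and unpins the same region repeatedly.
                for (int n = 0; n < 100_000; n++) {
                    region.pin();
                    region.unpin();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("pinned after run: " + region.isPinned());
    }
}
```

With a counter, concurrent threads pinning the same region only pay for an atomic update, and the "is this region pinned?" question can be answered by reading the counter when the collector actually needs it.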
With this proof-of-concept patch:
https://cr.openjdk.java.net/~shade/8232575/webrev.01/
Shenandoah, 1 thread: 111.997 ± 3.213 us/op <--- matches Parallel
Shenandoah, 16 threads: 151.777 ± 3.511 us/op <--- matches Parallel