Bug ID: JDK-8177704 Regression in SPECjvm2008.sparse-large because of changes to marking cycle in JDK-8017744

Versions (Unresolved/Resolved/Fixed)

JDK 10: Resolved in 10

In 9b116 SPECjvm2008.sparse-large regressed by about 10% compared to 9b115 on x64.

Analysis (see comments) showed that this is caused by a bad interaction between a compiler optimization and the changes to the concurrent marking cycle.

In particular, in 9b116 the work done during marking has been split into two concurrent phases: marking as usual (before the Remark phase) and calculating live data information (after Remark and before Cleanup).

In general, this decreased the length of concurrent cycles significantly. However, this benchmark seems to starve the marking (and control) threads of CPU time. Since there is now additional concurrent work to do between Remark and Cleanup (previously the two occurred basically back-to-back, so for some reason this was "okay"), the concurrent phase gets longer and G1 performance decreases significantly.

This problem goes away with -XX:+UseCountedLoopSafepoints.

Closing as duplicate of the strip mining RFE: the original investigation already showed that the UseCountedLoopSafepoints optimization was in some way responsible for the regression (in conjunction with differences in how G1 operates). JDK-8186027 not only fixes the regression but also gives a nice performance improvement, so closing this out as a duplicate.

07-11-2017
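For readers unfamiliar with the terms, the following is a minimal hand-written sketch of the loop shape that strip mining refers to. It is not the C2 implementation: the class and method names are hypothetical, and the summation kernel is only a stand-in for a hot loop like the one in the benchmark. The assumption illustrated is that a counted loop (int index, constant stride) can be split into an outer loop, where a safepoint poll could sit, and an inner loop of bounded length that runs without one.

```java
import java.util.Arrays;

public class StripMiningSketch {

    // A plain counted loop: int induction variable, constant stride, known bound.
    // With -XX:-UseCountedLoopSafepoints, C2 may compile such a loop without a
    // safepoint poll in the loop body.
    static double plainLoop(double[] a) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Hand-written equivalent of a strip-mined loop (illustration only).
    // The outer loop advances in strips of at most stripLength iterations;
    // the inner loop is the poll-free part. stripLength plays the role that
    // -XX:LoopStripMiningIter plays for the compiler transformation.
    static double stripMinedLoop(double[] a, int stripLength) {
        if (stripLength <= 0) {
            throw new IllegalArgumentException("stripLength must be positive");
        }
        double sum = 0.0;
        for (int start = 0; start < a.length; start += stripLength) {
            // A safepoint poll would conceptually sit here, once per strip.
            int end = start + Math.min(stripLength, a.length - start);
            for (int i = start; i < end; i++) {
                sum += a[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = new double[10_000];
        Arrays.fill(a, 1.0);
        // Both loops visit the elements in the same order, so the sums match.
        System.out.println(plainLoop(a) == stripMinedLoop(a, 1000));
    }
}
```

The point of the shape is the trade-off: polling once per strip keeps the time to reach a safepoint bounded while leaving the inner loop free of per-iteration poll overhead.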

I have run Roland's loop strip mining optimization on scimark.sparse.large as part of that review. The results might be interesting for this bug. Using G1:

-XX:-UseCountedLoopSafepoints: ~86 ops/m
-XX:+UseCountedLoopSafepoints: ~106 ops/m
-XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000: ~111 ops/m

23-10-2017
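To put the ops/m figures in the comment above into relative terms, here is a small sketch that computes the gain over the -XX:-UseCountedLoopSafepoints baseline. The class and method names are hypothetical, and since the reported values are approximate ("~"), the percentages are rough as well.

```java
import java.util.Locale;

public class SparseLargeSpeedup {

    // Relative gain of a measured throughput over a baseline, in percent.
    static double percentGain(double baseline, double measured) {
        return (measured - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 86.0;          // -XX:-UseCountedLoopSafepoints
        double countedSafepoints = 106.0; // -XX:+UseCountedLoopSafepoints
        double stripMining = 111.0;       // ... plus -XX:LoopStripMiningIter=1000

        System.out.printf(Locale.ROOT, "counted loop safepoints: +%.1f%%%n",
                percentGain(baseline, countedSafepoints));
        System.out.printf(Locale.ROOT, "with strip mining:       +%.1f%%%n",
                percentGain(baseline, stripMining));
    }
}
```

With the reported numbers this works out to roughly +23% and +29% over the baseline.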

Another potential option would be to just reclaim these dead humongous objects, either directly in the cleanup/remark pause (JDK-8154528) or at the start of GC. If reclaiming during the pauses is too expensive, another option would be to just format these objects as int arrays, i.e. the is_typeArray() check for dead objects would always be valid then :)

28-03-2017

Duplicate :	JDK-8186027 - C2: loop strip mining
Relates :	JDK-8177703 - Logging for gc+humongous potentially accesses klasses of dead objects
Relates :	JDK-8163579 - Improve adaptive IHOP in situations where G1 always aborts the mixed gc phase
Relates :	JDK-5014723 - implement "strip mining" loop optimization
Relates :	JDK-8153843 - G1CardLiveDataHelper incorrectly sets next_live_bytes on dead humongous regions