JDK-6814552 : par compact - some compilers fail to optimize bitmap code
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: hs14
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2009-03-07
  • Updated: 2010-04-02
  • Resolved: 2009-06-30
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

  • JDK 6: 6u18 (Fixed)
  • JDK 7: 7 (Fixed)
  • Other: hs16 (Fixed)
Related Reports
Relates :  
Description
There is a large performance degradation in the parallel old GC when
running on x86-64 Linux. The degradation can also be seen in Sun's
recently released 6u14.

The problem is the code generated for these functions in
parMarkBitMap.hpp:


inline size_t ParMarkBitMap::bits_to_words(idx_t bits) {
        return bits * obj_granularity();
}

and

inline ParMarkBitMap::idx_t ParMarkBitMap::words_to_bits(size_t words) {
        return words / obj_granularity();
}

In both cases, obj_granularity() returns 1 (on 64-bit VMs). However, gcc
still emits a div instruction when these functions are called from
ParMarkBitMap::live_words_in_range().

See OpenJDK Bugzilla bug https://bugs.openjdk.java.net/show_bug.cgi?id=100006

Comments
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/353ba4575581
14-06-2009

SUGGESTED FIX
diff -r f89cf529c3c7 -r 348e8b681498 src/share/vm/gc_implementation/parallelScavenge/parMarkBitMap.hpp
--- a/src/share/vm/gc_implementation/parallelScavenge/parMarkBitMap.hpp Mon Jun 08 16:14:19 2009 -0700
+++ b/src/share/vm/gc_implementation/parallelScavenge/parMarkBitMap.hpp Sun Jun 07 22:08:24 2009 -0700
@@ -177,6 +177,7 @@
   // are double-word aligned in 32-bit VMs, but not in 64-bit VMs, so the 32-bit
   // granularity is 2, 64-bit is 1.
   static inline size_t obj_granularity() { return size_t(MinObjAlignment); }
+  static inline int obj_granularity_shift() { return LogMinObjAlignment; }
 
   HeapWord* _region_start;
   size_t _region_size;
@@ -299,13 +300,13 @@
 inline size_t ParMarkBitMap::bits_to_words(idx_t bits) {
-  return bits * obj_granularity();
+  return bits << obj_granularity_shift();
 }
 
 inline ParMarkBitMap::idx_t ParMarkBitMap::words_to_bits(size_t words) {
-  return words / obj_granularity();
+  return words >> obj_granularity_shift();
 }
 
 inline size_t ParMarkBitMap::obj_size(idx_t beg_bit, idx_t end_bit) const
10-06-2009
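
The suggested fix works because the granularity is a power of two. For an unsigned value, multiplying or dividing by 2^s gives the same result as shifting left or right by s. The C++ sketch below is illustrative and is not HotSpot code. It checks that equivalence for the two granularities named in the diff's comment: 2 in 32-bit VMs and 1 in 64-bit VMs. BitMapConversions, kLogGranularity and shifts_match_arithmetic are hypothetical names. kLogGranularity stands in for LogMinObjAlignment, and std::size_t stands in for idx_t.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two ParMarkBitMap helpers. kLogGranularity
// plays the role of LogMinObjAlignment (assumed: 0 -> granularity 1,
// 1 -> granularity 2, per the comment in parMarkBitMap.hpp).
template <int kLogGranularity>
struct BitMapConversions {
  static constexpr std::size_t granularity() {
    return std::size_t(1) << kLogGranularity;
  }

  // Original form: multiply / divide by the granularity.
  static constexpr std::size_t bits_to_words_mul(std::size_t bits) {
    return bits * granularity();
  }
  static constexpr std::size_t words_to_bits_div(std::size_t words) {
    return words / granularity();
  }

  // Form used by the fix: shift by log2(granularity). For unsigned values and
  // a power-of-two granularity this matches the multiply/divide form.
  static constexpr std::size_t bits_to_words(std::size_t bits) {
    return bits << kLogGranularity;
  }
  static constexpr std::size_t words_to_bits(std::size_t words) {
    return words >> kLogGranularity;
  }
};

// Runtime cross-check: returns true if the shift forms agree with the
// multiply/divide forms for every value in [0, limit).
template <int kLogGranularity>
bool shifts_match_arithmetic(std::size_t limit) {
  using C = BitMapConversions<kLogGranularity>;
  for (std::size_t v = 0; v < limit; ++v) {
    if (C::bits_to_words(v) != C::bits_to_words_mul(v)) return false;
    if (C::words_to_bits(v) != C::words_to_bits_div(v)) return false;
  }
  return true;
}

// Compile-time spot checks for granularity 1 (64-bit) and 2 (32-bit).
static_assert(BitMapConversions<0>::words_to_bits(7) == 7, "granularity 1");
static_assert(BitMapConversions<1>::words_to_bits(7) == 3, "granularity 2");
static_assert(BitMapConversions<1>::bits_to_words(3) == 6, "granularity 2");
```

For example, assert(shifts_match_arithmetic<1>(1000)) holds. This sketch only demonstrates that the two forms are arithmetically equivalent. It says nothing about what code a particular gcc version emits for them.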

EVALUATION Using a shift instead of a div helps performance, but the more likely cause of the regression is a change to the BitMap class that inadvertently enabled calls to empty verification methods in product builds; see 6849716.
10-06-2009

EVALUATION The g++ compiler failed to optimize away a divide by 1 in code that is important to GC performance. This code has been part of parallel compaction since its initial release in JDK 5 update 6, so the regression may have been caused by changes in gcc. The simple fix is to change the code to use explicit shifts.
07-03-2009