JDK-2141973 : CMS+ParNew: wildly different ParNew pause times depending on heap shape caused by allocation spread
  • Type: Backport
  • Backport of: JDK-6459113
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2006-09-13
  • Updated: 2010-04-02
  • Resolved: 2006-12-14
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6 Other
6u1 b01Fixed hs10Fixed
Description
See parent CR.

Comments
SUGGESTED FIX Event: putback-to Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/1.6/update1/baseline (jano.sfbay:/export/disk05/hotspot/ws/1.6/update1/baseline) Child workspace: /net/prt-web.sfbay/prt-workspaces/20061121150426.ysr.6u1/workspace (prt-web:/net/prt-web.sfbay/prt-workspaces/20061121150426.ysr.6u1/workspace) User: ysr Comment: --------------------------------------------------------- Job ID: 20061121150426.ysr.6u1 Original workspace: neeraja:/net/spot/archive02/ysr/6u1 Submitter: ysr Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/1.6/update1/baseline/2006/20061121150426.ysr.6u1/ Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/1.6/update1/baseline/2006/20061121150426.ysr.6u1/workspace/webrevs/webrev-2006.11.21/index.html Fixed 6433335: ParNewGC times spiking, eventually taking up 20 out of every 30 seconds Fixed 6459113: CMS+ParNew: wildly different ParNew pause times depending on heap shape caused by allocation spread This is a verbatim backport to 6u1 of changes in 7.0 (and already backported to 5u10 and 1.4.2_14). webrev: http://vmsqe.sfbay/net/spot/archive02/ysr/6u1/webrev Reproduced below are the putback comments from the 7.0 putback:- (a) 6433335: ParNew's chunking of the card-table makes use of block-start calls for chunk surgery prior to assignment. These calls are in addition to the usual ones where block-start calls are used to start the scanning of the first card of a range of dirty cards. The logarithmic BOT was expected to keep the cost of such calls, well, logarithmic, in the size of the block being described, i.e. O(log n), n the size of the block. Unfortunately, as implemented, the cost was not asymptotically logarithmic, but rather logarithmic up to n = 163 = 4K cards = 2 MB and linear i.e. O(n) for the balance. The problem in 6433335 was that under certain kinds of heap shapes and object sizes and lifetimes, the CMS heap could end up with large free blocks. The chunking of the card-table would then run into O(n) cost of block-start calls on these large free chunks. There are several orthogonal fixes to this problem, all of which are additive. This putback accomplishes the most major of those, which is to make the BOT asymptotically logarithmic. A consequence of that change, however, is that the "block-split" operation can now become potentially costlier. The previous asymptotic cost of a split was bounded by a constant. For moderate size blocks, the cost was O(p.log s), p+s=n, n the size of the block being split, p the size of the prefix of the split and s the size of the suffix. The actual cost was however dictated by a tight loop in which a BOT entry is first read and tested and then possibly modified, thus making for a large constant in the big-O. We modified the BOT fixup operation so that we do not need to do any BOT reads at all, just writes. This won back any potential loss from the new block-split cost which is asymptotically O(p.log s). This solved the customer's performance problem with no regression in performance elsewhere. (The main concern had been that the increased asymptotic cost of splitting would increase the variance in the ParNew times due to the large hit one might take as a result of splitting a large block to replenish the free lists. Too, the initial scavenges as well as perm gen allocation would be more adversely affected, causing an increase in start-up. Indeed fixing the logBOT problem did bring up these issues, but fixing the BOT fixup won back the loss in performance.) As we mentioned above in passing, there are further possible improvements, including reducing the block-start calls for ParNew chunking. These are deferred for later (under a different bug id we will be opening for the purpose). (b) 6459113: The problem here was that when we expand the old generation we were not taking care to coalesce the newly obtained space to the end of an existing previously co-terminal free block in the heap. Depending on allocation patterns and lifetimes this could potentially cause avoidable fragmentation, defeat CMS's coalition heuristics that try to maintain a large corpus of free storage in the upper part of the generation, and because of the presence of large free blocks lower in the address space, increase block-start costs for ParNew's card-table chunking. This also fixes 6446077 reported by Shutterfly and independently by a Java forum user as well. Thanks to Mark Susko for interfacing w/customer (General Dynamics), to Poonam Bajaj for help with 5uXX instrumentation for customer and verifying fix using test case from Shuttrefly, and to Stephen "cool-as-a-cucumber" Fitch and Mark Susko for keeping a hot escalation cool, allowing us the luxury of exploring all avenues of solution including several false starts at a constant time block split operation using a non-canonical BOT representation which was not to be. Reviewed in 7.0 by: Jon Masamitsu, Tony Printezis; Poonam Bajaj (partial) Rereviewed in 6u1 by: Jon Masamitsu Approved for 6u1 by: Dave Cox Fix Verified: yes Verification Testing: . 6433335: [5uXX-based] testing by the customer [5u10 based] . 6459113: . test case from customer (Cinnober) attached to bug report . [5uXX-based] also independently tested using Shutterfly's test case (by Poonam) Other testing: �� CMS �� VerifyBlockOffsetArray . PRT . refworkload Performance testing: (7.0-based) . refworkload (20% scavenge improvement, 5% bottom-line improvement on Niagara w/CMS) . Alacrity (no change w/CMS or w/PSGC) Files: update: src/share/vm/memory/blockOffsetTable.cpp update: src/share/vm/memory/blockOffsetTable.hpp update: src/share/vm/memory/compactibleFreeListSpace.cpp update: src/share/vm/memory/compactibleFreeListSpace.hpp Examined files: 3392 Contents Summary: 4 update 3388 no action (unchanged)
22-11-2006

EVALUATION see parent CR.
14-09-2006