JDK-6459113 : CMS+ParNew: wildly different ParNew pause times depending on heap shape caused by allocation spread
Type:Bug
Component:hotspot
Sub-Component:gc
Affected Version:5.0u7,6
Priority:P3
Status:Resolved
Resolution:Fixed
OS:generic,solaris_9
CPU:generic,x86
Submitted:2006-08-10
Updated:2010-12-08
Resolved:2006-11-14
EVALUATION
This bug fix should be backported along with that for 6433335.
13-09-2006
SUGGESTED FIX
The following fix has been putback to 7.0 and should be backported to
6u1, 5u10 and as appropriate 1.4.2_XX.
Event: putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
(jano.sfbay:/export/disk05/hotspot/ws/main/gc_baseline)
Child workspace: /net/prt-web.sfbay/prt-workspaces/20060912150719.ysr.bot/workspace
(prt-web:/net/prt-web.sfbay/prt-workspaces/20060912150719.ysr.bot/workspace)
User: ysr
Comment:
---------------------------------------------------------
Job ID: 20060912150719.ysr.bot
Original workspace: neeraja:/net/spot/scratch/ysr/bot
Submitter: ysr
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060912150719.ysr.bot/
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060912150719.ysr.bot/workspace/webrevs/webrev-2006.09.12/index.html
Fixed 6433335: ParNewGC times spiking, eventually taking up 20 out of every 30 seconds
Fixed 6459113: CMS+ParNew: wildly different ParNew pause times depending on heap shape caused by allocation spread
Webrev: http://analemma.sfbay/net/spot/scratch/ysr/bot/webrev
(a) 6433335:
ParNew's chunking of the card-table makes use of block-start
calls for chunk surgery prior to assignment. These calls are
in addition to the usual ones where block-start calls are
used to start the scanning of the first card of a range of
dirty cards. The logarithmic BOT was expected to keep the cost
of such calls, well, logarithmic in the size of the block
being described, i.e. O(log n), where n is the size of the block.
Unfortunately, as implemented, the cost was not
asymptotically logarithmic, but rather logarithmic only up to
n = 16^3 = 4K cards = 2 MB, and linear, i.e. O(n), beyond
that point.
The problem in 6433335 was that under certain kinds of
heap shapes and object sizes and lifetimes, the CMS heap
could end up with large free blocks. The chunking of the
card-table would then run into O(n) cost of block-start
calls on these large free chunks.
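The capped back-skip behaviour can be illustrated with a toy model. This is a deliberately simplified sketch, not HotSpot's actual BOT code: the names (lookup_steps, kDirectLimit, kBase) and the encoding are illustrative assumptions. Entries below kDirectLimit hold a direct offset back to the block start; larger entries encode "skip back kBase^k cards". Capping k at 3 mimics the pre-fix table, whose lookups degrade to O(n) beyond 16^3 cards; allowing larger exponents keeps them logarithmic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy block-offset table with logarithmic back-skips (illustrative only).
static const std::size_t kDirectLimit = 64;  // entries below this: direct offset
static const std::size_t kBase = 16;

// Number of probes needed to find the start of a single block of n cards
// when asking for the block start from its last card.
std::size_t lookup_steps(std::size_t n, unsigned max_power) {
    // Build the table entries for one block covering cards [0, n).
    std::vector<std::size_t> entry(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        if (i < kDirectLimit) { entry[i] = i; continue; }
        // Largest k <= max_power with kBase^k <= i.
        unsigned k = 0;
        std::size_t span = 1;
        while (k < max_power && span * kBase <= i) { span *= kBase; ++k; }
        entry[i] = kDirectLimit + k;         // encodes a back-skip of kBase^k
    }
    // Probe backwards from the last card, counting hops to the block start.
    std::size_t steps = 0, i = n - 1;
    while (entry[i] != 0) {
        if (entry[i] < kDirectLimit) {
            i -= entry[i];                   // direct offset to block start
        } else {
            std::size_t span = 1;
            for (std::size_t k = entry[i] - kDirectLimit; k > 0; --k) span *= kBase;
            i -= span;                       // logarithmic back-skip
        }
        ++steps;
    }
    return steps;
}
```

With skips capped at 16^3, probing across a block of 2^20 cards takes hundreds of hops; with larger powers permitted, a few dozen.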
There are several orthogonal fixes to this problem, all of
which are additive. This putback implements the most significant
of them: making the BOT asymptotically logarithmic.
A consequence of that change, however, is that the "block-split"
operation can become costlier. The previous asymptotic cost of
a split was bounded by a constant; with the new BOT, for
moderate-size blocks, the cost of a split is O(p log s), where
p + s = n, n is the size of the block being split, p the size
of the prefix of the split and s the size of the suffix. The
actual cost, however, was dictated by a tight loop in which a
BOT entry is first read and tested and then possibly modified,
making for a large constant in the big-O. We modified the BOT
fixup operation so that it needs no BOT reads at all, just
writes. This won back any potential loss from the new
block-split cost, which is asymptotically O(p log s).
This solved the customer's performance problem with no regression
in performance elsewhere. (The main concern had been that the
increased asymptotic cost of splitting would increase the
variance in the ParNew times due to the large hit one might take
as a result of splitting a large block to replenish the
free lists. Also, the initial scavenges, as well as perm gen
allocation, would be adversely affected, causing an
increase in start-up time. Indeed, fixing the logarithmic BOT
problem did surface these issues, but the BOT fixup change won
back the loss in performance.)
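The write-only fixup idea can be sketched as follows. This is a toy model, not HotSpot's code: an entry here simply stores the distance back to its block's start, whereas real BOT entries are encoded logarithmically. The point that carries over is that the entries covering the split suffix can be computed and stored unconditionally, with no read-test-modify loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mark cards [start, start+len) as one block. Every entry is written
// directly from the loop index: pure writes, no BOT reads.
void mark_block(std::vector<std::size_t>& bot, std::size_t start, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        bot[start + i] = i;                  // write-only fixup
}

// Split the block covering cards [0, n) into a prefix of p cards and a
// suffix of n - p cards; only the suffix's entries need fixing up.
void split_block(std::vector<std::size_t>& bot, std::size_t n, std::size_t p) {
    mark_block(bot, p, n - p);
}

// Recover the starting card of the block covering `card`.
std::size_t block_start(const std::vector<std::size_t>& bot, std::size_t card) {
    return card - bot[card];
}
```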
As we mentioned above in passing, there are further possible
improvements, including reducing the block-start calls for
ParNew chunking. These are deferred for later (under a different
bug id we will be opening for the purpose).
(b) 6459113:
The problem here was that when we expand the old generation
we were not taking care to coalesce the newly obtained space
with an existing free block that was previously co-terminal
with the heap. Depending on allocation patterns and lifetimes
this could cause avoidable fragmentation,
defeat CMS's coalescing heuristics that try to maintain a large
corpus of free storage in the upper part of the generation,
and, because of the presence of large free blocks lower in
the address space, increase block-start costs for ParNew's
card-table chunking. This also fixes 6446077, reported by
Shutterfly and independently by a Java forum user.
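The expansion fix can be sketched with a toy free-block dictionary. This is illustrative only, under assumed names (ToyOldGen, expand); it is not CMS's dictionary code. On expansion, the new space is merged into a free block that ends exactly at the old heap boundary, instead of being filed as a separate block.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Toy old generation with a map of free blocks (start -> size in words).
struct ToyOldGen {
    std::map<std::size_t, std::size_t> free_blocks;
    std::size_t end;                         // current co-terminal boundary

    explicit ToyOldGen(std::size_t e) : end(e) {}

    // Expand the generation by `delta` words.
    void expand(std::size_t delta) {
        std::size_t old_end = end;
        end += delta;
        // Look for a free block co-terminal with the old heap end.
        for (auto& fb : free_blocks) {
            if (fb.first + fb.second == old_end) {
                fb.second += delta;          // coalesce: grow the existing block
                return;
            }
        }
        free_blocks[old_end] = delta;        // no co-terminal block: file separately
    }
};
```

With the coalescing step, repeated expansions keep one large free block at the top of the generation rather than leaving a trail of fragments.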
Thanks to Mark Susko for interfacing w/customer (General Dynamics),
to Poonam Bajaj for help with 5uXX instrumentation for the customer
and for verifying the fix using the test case from Shutterfly, and to
Stephen "cool-as-a-cucumber" Fitch and Mark Susko for keeping a
hot escalation cool, allowing us the luxury of exploring all
avenues of solution including several false starts at a constant
time block split operation using a non-canonical BOT representation
which was not to be.
Reviewed by: Jon Masamitsu, Tony Printezis; Poonam Bajaj (partial)
Fix Verified: yes
Verification Testing:
. 6433335: testing by the customer
. 6459113: . test case from customer (Cinnober) attached to bug report
. also independently tested using Shutterfly's test case (by Poonam)
Other testing: +/- CMS +/- VerifyBlockOffsetArray
. PRT
. refworkload
Performance testing:
. refworkload (JBB with CMS+ParNew gives a nice 20% ParNew
improvement and a 5% bottom-line improvement on Niagara)
. Alacrity (no change)
Files:
update: src/share/vm/memory/blockOffsetTable.cpp
update: src/share/vm/memory/blockOffsetTable.hpp
update: src/share/vm/memory/compactibleFreeListSpace.cpp
update: src/share/vm/memory/compactibleFreeListSpace.hpp
Examined files: 3876
Contents Summary:
4 update
3872 no action (unchanged)
13-09-2006
EVALUATION
See suggested fix, comments and workaround. We should fix this in 6u1
and backport as appropriate since it can cause performance surprises.
15-08-2006
SUGGESTED FIX
In addition to the fix for 6433335, when the heap expands, the new
block should be coalesced with the "big guy" that we try to maintain
at the end of the heap, rather than adding it independently to
the big chunk dictionary as currently done. This will prevent the
"allocation spread" with the large free block intervening in the
middle.
15-08-2006
WORK AROUND
-XX:-UseParNewGC reduces the slowdown substantially (same as 6433335);
alternatively, set -Xmx and -Xms to the same value to avoid the
"allocation spread" that tickles this behaviour (at least with this
program).
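The two workarounds might look as follows on a launch command. The application name "MyApp" and the heap size "2g" are placeholders; only the flags named in this report are taken from it, and -XX:+UseConcMarkSweepGC is assumed as the way CMS is selected.

```shell
# Workaround 1: disable parallel young-generation collection.
java -XX:+UseConcMarkSweepGC -XX:-UseParNewGC MyApp

# Workaround 2: set -Xms and -Xmx equal so the heap never expands,
# avoiding the "allocation spread".
java -XX:+UseConcMarkSweepGC -Xms2g -Xmx2g MyApp
```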