United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6433335 ParNewGC times spiking, eventually taking up 20 out of every 30 seconds
JDK-6433335 : ParNewGC times spiking, eventually taking up 20 out of every 30 seconds

Submit Date:
Updated Date:
Project Name:
Resolved Date:
Affected Versions:
Fixed Versions:
hs10 (b03)

Related Reports

Sub Tasks

Customer is using an ER release of 5.0U6
The ER is 1.5.0_06-erdist-2006-02-01. The bug addressed was 6367204

The hardware is:
       Dell 4xDual core Xeon
       32GB RAM
       10x RAID 10 HDD
   The OS is Server 2003.
   The java version and configuration:
       JRE v.1.5.0_06-erdist-20060201
       28GB heap
       GC options: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:ParallelGCThreads=7 -XX:NewSize=128M -XX:MaxNewSize=128M
-XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
-XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
-XX:CMSMarkStackSize=8M -XX:CMSMarkStackSizeMax=32M -XX:+UseLargePages

   The application is a custom distributed database server based on
TCP/IP and Sleepycat DBJE

   The symptoms:
       After running smoothly for ~1-4 days straight with
constant but light load, the ParNew GC's jump from ~150 ms every 30
seconds to 5-20 seconds out of every 30 seconds.  The start of the
degenerate ParNew GCs seem to mostly (but not always) coincide with the
start of a new CMS mark phase.  The general pattern is to spend 20-90%
of the time in young GC, which eventually quiesces down to acceptable
levels after ~4 hours of GC pain (frequently to re-start after the next
CMS sweep).
       The load was constant and unvaried from our side, so we
don't see any application-level cause for the degenerate GC performance.

They ran a test with 5.0u8 and the problem seemed to be pushed out.
The time to failure went to 48 hours for the initial 5 second spike and
another day or so to hit the ~20 second spikes.

They were running with large pages so they ran a test without it and with their ER 5.0u6 and the problem seemed to have gone away, but returned many days later.
Turning off large pages seem to have also extended the running but eventually
they still see the problem



Instrumentation so far appears to implicate the presence of
very large free blocks (but the same could happen with very large
objects as well) in the CMS generation affecting adversely the
card scanning times for scavenges presumably on account of the
extremely frequent and slow block_start() calls.

-XX:-UseParNewGC appears to provide a temporary workaround for
this customer for avoiding the long parnew pauses. The downside
is that single-threaded scavenges are slightly longer than
the "good case" when using ParNew. Your milage may vary, but you'll
certainly avoid the long pauses you might otherwise see either
because of very large objects or very large free blocks in the
old generation.

. make BOT lookup asymptotically logarithmic
. speed up splitting operation by avoiding read before write

The following fixes were put back to 7.0 (b2?). They will be ported to
6u1 and 5u10, and as appropriate 1.4.2_XX, for all of which appropriate
subCR's will be created:

Event:            putback-to
Parent workspace: /net/jano.sfbay/export/disk05/hotspot/ws/main/gc_baseline
Child workspace:  /net/prt-web.sfbay/prt-workspaces/20060912150719.ysr.bot/workspace
User:             ysr



Job ID:                 20060912150719.ysr.bot
Original workspace:     neeraja:/net/spot/scratch/ysr/bot
Submitter:              ysr
Archived data:          /net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060912150719.ysr.bot/
Webrev:                 http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/gc_baseline/2006/20060912150719.ysr.bot/workspace/webrevs/webrev-2006.09.12/index.html

Fixed 6433335: ParNewGC times spiking, eventually taking up 20 out of every 30 seconds
Fixed 6459113: CMS+ParNew: wildly different ParNew pause times depending on heap shape caused by allocation spread

  Webrev: http://analemma.sfbay/net/spot/scratch/ysr/bot/webrev

(a) 6433335:
ParNew's chunking of the card-table makes use of block-start
calls for chunk surgery prior to assignment. These calls are
in addition to the usual ones where block-start calls are
used to start the scanning of the first card of a range of
dirty cards. The logarithmic BOT was expected to keep the cost
of such calls, well, logarithmic, in the size of the block
being described, i.e. O(log n), n the size of the block.
Unfortunately, as implemented, the cost was not
asymptotically logarithmic, but rather logarithmic up to
n = 16^3 = 4K cards = 2 MB and linear i.e. O(n) for
the balance.

The problem in 6433335 was that under certain kinds of
heap shapes and object sizes and lifetimes, the CMS heap
could end up with large free blocks. The chunking of the
card-table would then run into O(n) cost of block-start
calls on these large free chunks.

There are several orthogonal fixes to this problem, all of
which are additive. This putback accomplishes the most major
of those, which is to make the BOT asymptotically logarithmic.
A consequence of that change, however, is that the "block-split"
operation can now become potentially costlier. The previous
asymptotic cost of a split was bounded by a constant. For
moderate size blocks, the cost was O(p.log s), p+s=n, n the
size of the block being split, p the size of the prefix of
the split and s the size of the suffix. The actual cost was however
dictated by a tight loop in which a BOT entry is first read
and tested and then possibly modified, thus making for a
large constant in the big-O. We modified the BOT fixup
operation so that we do not need to do any BOT reads at
all, just writes. This won back any potential loss from
the new block-split cost which is asymptotically O(p.log s).

This solved the customer's performance problem with no regression
in performance elsewhere. (The main concern had been that the
increased asymptotic cost of splitting would increase the
variance in the ParNew times due to the large hit one might take
as a result of splitting a large block to replenish the
free lists. Too, the initial scavenges as well as perm gen
allocation would be more adversely affected, causing an
increase in start-up. Indeed fixing the logBOT problem did
bring up these issues, but fixing the BOT fixup won back the
loss in performance.)

As we mentioned above in passing, there are further possible
improvements, including reducing the block-start calls for
ParNew chunking. These are deferred for later (under a different
bug id we will be opening for the purpose).

(b) 6459113:
The problem here was that when we expand the old generation
we were not taking care to coalesce the newly obtained space
to the end of an existing previously co-terminal free block
in the heap. Depending on allocation patterns and lifetimes
this could potentially cause avoidable fragmentation,
defeat CMS's coalition heuristics that try to maintain a large
corpus of free storage in the upper part of the generation,
and because of the presence of large free blocks lower in
the address space, increase block-start costs for ParNew's
card-table chunking. This also fixes 6446077 reported by
Shutterfly and independently by a Java forum user as well.

Thanks to Mark Susko for interfacing w/customer (General Dynamics),
to Poonam Bajaj for help with 5uXX instrumentation for customer
and verifying fix using test case from Shuttrefly, and to
Stephen "cool-as-a-cucumber" Fitch and Mark Susko for keeping a
hot escalation cool, allowing us the luxury of exploring all
avenues of solution including several false starts at a constant
time block split operation using a non-canonical BOT representation
which was not to be.

Reviewed by:  Jon Masamitsu, Tony Printezis; Poonam Bajaj (partial)

Fix Verified: yes

Verification Testing:
  . 6433335: testing by the customer
  . 6459113: . test case from customer (Cinnober) attached to bug report
             . also independently tested using Shutterfly's test case (by Poonam)

Other testing: +/- CMS +/- VerifyBlockOffsetArray
  . PRT
  . refworkload

Performance testing:
  . refworkload (JBB with CMS+ParNew gives a nice 20% ParNew
                 improvement and a 5% bottom-line improvement on Niagara)
  . Alacrity (no change)

update: src/share/vm/memory/blockOffsetTable.cpp
update: src/share/vm/memory/blockOffsetTable.hpp
update: src/share/vm/memory/compactibleFreeListSpace.cpp
update: src/share/vm/memory/compactibleFreeListSpace.hpp

Examined files: 3876

Contents Summary:
       4   update
    3872   no action (unchanged)

See suggested fix section. This bug fix should be back ported along with
that for 6459113.

Hardware and Software, Engineered to Work Together