JDK-4924782 : Using 1.3.1_07/09 young generation collections take too long
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 1.3.1_09
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_8
  • CPU: sparc
  • Submitted: 2003-09-18
  • Updated: 2003-12-04
  • Resolved: 2003-12-04
Related Reports
Duplicate : JDK-4734748
Description
As load climbs, long garbage collection pauses are often observed on production servers, most often in the 10-30 second range but sometimes much higher: customers have seen 180 seconds, and their system administrators report occurrences of 900 seconds.

The hardware setup for the test kit is a Sun Netra:

bash-2.03$ uname -a
SunOS averia 5.8 Generic_108528-23 sun4u sparc SUNW,UltraAX-i2

bash-2.03$ psrinfo -v
Status of processor 0 as of: 09/18/03 14:08:53
  Processor has been on-line since 09/18/03 10:34:37.
  The sparcv9 processor operates at 500 MHz,
        and has a sparcv9 floating point processor.

CPU states: 99.2% idle,  0.0% user,  0.8% kernel,  0.0% iowait,  0.0% swap
Memory: 2048M real, 989M free, 1493M swap in use, 1191M swap free

Running with the following options, the long pause times are seen:
-server -Xmx768m -Xms768m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:SurvivorRatio=2 -XX:TargetSurvivorRatio=90 -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+PrintTenuringDistribution -verbose:gc -XX:+DisableExplicitGC
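
For reference, the "Desired survivor size 60397976 bytes" printed in the log below follows directly from these flags: with NewSize=256m and SurvivorRatio=2, each survivor space is 256m / (2 + 2) = 64m, and TargetSurvivorRatio=90 targets 90% occupancy of that space. A minimal sketch of the arithmetic (variable names are illustrative, not HotSpot source):

public class SurvivorSizeCheck {
    public static void main(String[] args) {
        long newSize = 256L * 1024 * 1024;   // -XX:NewSize=256m
        int survivorRatio = 2;               // -XX:SurvivorRatio=2 (eden = 2 * survivor)
        int targetSurvivorRatio = 90;        // -XX:TargetSurvivorRatio=90

        // New generation = eden + 2 survivor spaces; eden:survivor = survivorRatio:1,
        // so each survivor space is newSize / (survivorRatio + 2).
        long survivorSize = newSize / (survivorRatio + 2);  // 67108864 bytes (64m)

        // HotSpot aims to keep survivor occupancy at targetSurvivorRatio percent.
        long desiredSurvivorSize = survivorSize * targetSurvivorRatio / 100;
        System.out.println(desiredSurvivorSize);  // prints 60397977; the logged
        // 60397976 is presumably the same figure aligned down to an 8-byte boundary
    }
}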

[GC Desired survivor size 60397976 bytes, new threshold 32 (max 32)
- age   1:  4395464 bytes,  4395464 total
- age   2:  4355928 bytes,  8751392 total
- age   3:  5033640 bytes, 13785032 total
- age   4:  4649080 bytes, 18434112 total
- age   5:  6494632 bytes, 24928744 total
 151501K->24344K(720896K), 11.6820641 secs]
[GC
Desired survivor size 60397976 bytes, new threshold 32 (max 32)
- age   1:  3259272 bytes,  3259272 total
- age   2:  3649848 bytes,  6909120 total
- age   3:  4353200 bytes, 11262320 total
- age   4:  5031600 bytes, 16293920 total
- age   5:  4648912 bytes, 20942832 total
- age   6:  6491704 bytes, 27434536 total
 155416K->26791K(720896K), 23.4025614 secs]
[GC
Desired survivor size 60397976 bytes, new threshold 32 (max 32)
- age   1:  1539792 bytes,  1539792 total
- age   2:  2045880 bytes,  3585672 total
- age   3:  3607016 bytes,  7192688 total
- age   4:  4351832 bytes, 11544520 total
- age   5:  5028312 bytes, 16572832 total
- age   6:  4648912 bytes, 21221744 total
- age   7:  6491704 bytes, 27713448 total
 157863K->27063K(720896K), 101.4247403 secs]
[GC
Desired survivor size 60397976 bytes, new threshold 32 (max 32)
- age   1:  1007208 bytes,  1007208 total
- age   2:   663184 bytes,  1670392 total
- age   3:  2026504 bytes,  3696896 total
- age   4:  3600344 bytes,  7297240 total
- age   5:  4351832 bytes, 11649072 total
- age   6:  5005384 bytes, 16654456 total
- age   7:  4641888 bytes, 21296344 total
- age   8:  6439832 bytes, 27736176 total
 158135K->27086K(720896K), 1.3596074 secs]
[GC
Desired survivor size 60397976 bytes, new threshold 32 (max 32)
- age   1:   911200 bytes,   911200 total
- age   2:   182840 bytes,  1094040 total
- age   3:   659800 bytes,  1753840 total
- age   4:  2026504 bytes,  3780344 total
- age   5:  3594128 bytes,  7374472 total
- age   6:  4331816 bytes, 11706288 total
- age   7:  4998288 bytes, 16704576 total
- age   8:  4641584 bytes, 21346160 total
- age   9:  6407320 bytes, 27753480 total
 158158K->27103K(720896K), 1.7949328 secs]
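
For anyone triaging logs of this shape: each record's tail reads occupancy-before->occupancy-after(total heap), followed by the pause in seconds, so the third record above shows a 101-second young collection. A small sketch that extracts the pause times from a -verbose:gc log like this one (the class name and file argument are illustrative):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPauses {
    // Matches record tails like " 151501K->24344K(720896K), 11.6820641 secs]"
    private static final Pattern RECORD =
        Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs");

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = RECORD.matcher(line);
                if (m.find()) {
                    System.out.printf("before=%sK after=%sK pause=%ss%n",
                            m.group(1), m.group(2), m.group(4));
                }
            }
        }
    }
}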

The pstack output, GC logs, and thread dumps are in the attachment.

Comments
EVALUATION

This is JDK 1.3.1, on which I'm not an expert, but I suspect the problem is this thread:

----------------- lwp# 4 / thread# 4 --------------------
fe1175e4 __1cOGenerateOopMapTmerge_state_vectors6MpnNCellTypeState_2_l_ (f411b4, 0, 46173c, f4124c, 4617d4, fee0b4b4) + b0
fe0e8c18 __1cOGenerateOopMapRdo_exception_edge6MpnOBytecodeStream__v_ (ffffffff, fe4cc020, f3d35628, a88, 1, 0) + 2c0
fe12422c __1cOGenerateOopMapJinterp_bb6MpnKBasicBlock__v_ (f3d35afc, 45d73c, 45c07c, fe4cc020, 45d73c, f3d36577) + 224
fe1ea794 __1cOGenerateOopMapLcompute_map6M_v_ (fe4d76c8, fe4e5608, fe4cc020, ffffffff, 1, 1) + 608
fe2105fc __1cTOopMapForCacheEntryLcompute_map6M_v_ (fee0b4b4, fee0b460, 5d918, fee0b, fe4cc020, fee0b3fc) + 8c
fe20f358 __1cQOopMapCacheEntryEfill6MpnNmethodOopDesc_i_v_ (ba66a8, f3d35ab8, 1fd8, fe4cc020, cc1d7d20, 5d7f8) + 324
fe15c300 __1cLOopMapCacheGlookup6MpnNmethodOopDesc_il_pnQOopMapCacheEntry__ (25ab4, ba66a8, 1fd8, 1, f3d35ab8, 1fd8) + 1d0
fe15c0c4 __1cNmethodOopDescImask_for6Mi_pnQOopMapCacheEntry__ (fe4cc020, 1fd8, f3d35ab8, 12, 12, fee0b6d0) + 2c
fe15c658 __1cFframeToops_interpreted_do6MpFppnHoopDesc__vpknLRegisterMap__v_ (fe4d6fec, fee0b7d0, f3d35ab8, fe4cc020, fe0e0b3c, fe4d6e60) + 330
fe139aa8 __1cFframeHoops_do6MpFppnHoopDesc__vpnLRegisterMap__v_ (fee0b7d0, fe0e0b3c, fee0b7e0, fe4cc020, fee0b7d0, fe0e0b3c) + 64
fe17e09c __1cKJavaThreadHoops_do6MpFppnHoopDesc__v_v_ (0, fe0e0b3c, cc1d71c8, f2815c7c, f2815c7c, fe0e0b3c) + 13c
fe1fb8a4 __1cHThreadsHoops_do6FpFppnHoopDesc__v_v_ (fe4cc020, 99bc40, fe0e0b3c, 20, 24fb70, fe0e0b3c) + 34
fe211364 __1cIScavengeTinvoke_at_safepoint6Filrl_v_ (fe53eb70, fe4e2f94, fe4e2f88, 0, fe54528c, fe53eb68) + 70c
fe1963e4 __1cMVM_OperationIevaluate6M_v_ (bd57cb20, fe4e5608, 164, fee0b, fe4cc020, fee0bb34) + 34
fe196488 __1cIVMThreadSevaluate_operation6MpnMVM_Operation__v_ (122d88, bd57cb20, fe4cc020, fe4e2a20, fe4dea2c, fe4cc020) + 3c
fe2d4918 __1cIVMThreadEloop6M_v_ (122d88, bd57cb20, 8, fe4ea534, fe4ea528, fe4ea538) + 2e8
fe2d42a4 __1cIVMThreadDrun6M_v_ (fe4cc020, 122d88, 0, 0, 0, 0) + 7c
fe21450c _start (fe4cc020, ff38f688, 1, 1, ff38e000, 0) + 20
ff37b744 _thread_start (122d88, 0, 0, 0, 0, 0) + 40

which reminds me of bug 4734748: pathologically slow oop map generation. Oop map generation happens during GC, and so shows up as part of the time spent in GC. It doesn't look like there are any good workarounds in 4734748, except to avoid deeply nested try/finally blocks.

Do you know if you are running code generated from JSP pages? (It looks like it from the Java-level stack traces.) Do you have any control over the JSPs, or the compiler that's used to translate them to Java code? There were rumors that some JSP compilers did a better job than others at translating to code that runs quickly.
###@###.### 2003-09-18
----------------------------------------------------------------
Is there a way to reproduce this problem, or someone I can work with who has the problem and would be willing to work with me on this? It does look like there are JSPs involved, which can cause long GC pauses, as tracked by bug 4734748. That would account for a few tens of seconds of pause per GC, but not the 900 seconds reported in the description: that sounds like paging, not oop map generation. Is this really a new bug or a duplicate of 4734748? I'm marking the bug as "incomplete" until we find out.
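
For context on the 4734748 failure mode: JSP compilers of that era often emitted one try/finally region per nested tag, all inside a single large service method, and GenerateOopMap's cost grows badly with the number of exception edges it must analyze. A hand-written illustration of that shape (not actual JSP compiler output; the Tag helper is hypothetical):

import java.io.IOException;
import java.io.Writer;

// Minimal stand-in for a JSP tag handler.
class Tag {
    void start(Writer out) throws IOException { out.write("<tag>"); }
    void end(Writer out) throws IOException { out.write("</tag>"); }
}

public class NestedTagPage {
    // One try/finally per nested tag, as in generated _jspService-style code.
    // Each finally block adds exception edges that the interpreter's oop map
    // generator must walk while the GC safepoint is held.
    public void service(Writer out) throws IOException {
        Tag a = new Tag();
        try {
            a.start(out);
            Tag b = new Tag();
            try {
                b.start(out);
                Tag c = new Tag();
                try {
                    c.start(out);
                    // ... real generated pages nest dozens of levels deep,
                    // multiplying the exception edges to analyze ...
                } finally {
                    c.end(out);
                }
            } finally {
                b.end(out);
            }
        } finally {
            a.end(out);
        }
    }
}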
###@###.### 2003-12-03
----------------------------------------------------------------
Date: Wed, 03 Dec 2003 15:57:25 -0800
From: albert tong-schmidt <###@###.###>
Subject: Re: BugId 4924782 : (P4/S1) Has been Updated - by pbk Escalated Bug.
Sender: ###@###.###
To: ###@###.###

Hello Peter,

You can close this bug as a duplicate of bug 4734748. The customer modified their compiler to be more conservative in creating the try/finally blocks, and the GC performance became much better.

Albert.
----------------------------------------------------------------
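
The customer's fix corresponds to flattening the generated code so tags are closed sequentially rather than in nested finally blocks. A sketch of the more conservative shape, assuming their modified compiler emitted something along these lines (Tag is the same illustrative helper as above):

import java.io.IOException;
import java.io.Writer;

public class FlatTagPage {
    // Closing tags sequentially keeps the number of exception edges per
    // method small, so oop map generation stays cheap during young-generation
    // collections, at the cost of skipping tag cleanup if an earlier write throws.
    public void service(Writer out) throws IOException {
        Tag a = new Tag();
        Tag b = new Tag();
        Tag c = new Tag();
        a.start(out);
        b.start(out);
        c.start(out);
        // ... page body ...
        c.end(out);
        b.end(out);
        a.end(out);
    }
}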