JDK-4734748 : pathologically slow oopmap generation
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 2.0,1.3.1_09,1.4.2,1.4.2_05,5.0
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,solaris_2.5.1,solaris_8,solaris_9,windows_nt
  • CPU: generic,x86,sparc
  • Submitted: 2002-08-21
  • Updated: 2013-11-01
  • Resolved: 2005-08-17
The Version table provides details related to the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Other: 1.4.2_13 (Fixed)
JDK 6: 6 b48 (Fixed)
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Description
An iPlanet app has encountered huge minor pause times (40+sec)
due to oopmap generation by the interpreter.  More generally,
some oopmap generations are observed to take >200ms, which is
more than the entire GC budget for the Nortel telco application.

Oopmaps are needed for interpreted frames.  We have interpreted
frames when the compiler bails out.  Some of the bailing methods
are code generated from JSPs and have heavy nesting of try..finally.
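
For illustration only (this is not the actual iPlanet JSP output), a hypothetical Java sketch of the code shape involved: deeply nested try..finally blocks, which javac of that era typically compiled with jsr/ret subroutines (the rewriting mentioned in the suggested fix below) and which, at generated-method sizes of thousands of bytecodes, exceed the compilers' limits and force interpretation.

// Hypothetical stand-in for a JSP-generated service method; a real one
// nests far deeper and runs to thousands of bytecodes, which is what
// trips the c1/c2 bailouts and leaves the frame interpreted.
public class NestedFinallyExample {
    static int depth;

    static void jspLikeMethod() {
        try {
            depth++;
            try {
                depth++;
                try {
                    depth++;
                    // ... real generated code continues nesting ...
                } finally {
                    depth--;
                }
            } finally {
                depth--;
            }
        } finally {
            depth--;
        }
    }

    public static void main(String[] args) {
        jspLikeMethod();
        System.out.println(depth);  // 0 if every finally ran
    }
}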

Both c1 and c2 are bailing out in this particular case.  Fixing 
compiler bailouts in general is a large project.  One might
automagically refactor large methods instead, which would also
be a large project.  

Oopmaps tell, for a particular bci (bytecode index), whether each slot holds an oop.
The current system caches this information.  Normally one hopes
that methods are compiled, and thus no longer need oopmaps.
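
As a rough conceptual sketch only (invented names; not HotSpot's actual GenerateOopMap or OopMapCache code), the cached information is essentially a per-bci bitmask recording which slots hold oops, with a bounded cache in front of the expensive analysis.

import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model of an oopmap cache (illustrative only).
class ToyOopMapCache {
    // One entry per bci: bit i set means slot i holds an oop at that bci.
    private final Map<Integer, BitSet> cache = new HashMap<Integer, BitSet>();
    private final int maxEntries;

    ToyOopMapCache(int maxEntries) { this.maxEntries = maxEntries; }

    BitSet oopMapAt(int bci) {
        BitSet cached = cache.get(bci);
        if (cached != null) return cached;          // cheap: cache hit
        BitSet generated = generateByDataFlow(bci); // expensive: full analysis
        if (cache.size() < maxEntries) cache.put(bci, generated);
        return generated;
    }

    // Stand-in for the full data flow pass over the method's bytecodes;
    // on the pathological methods this is the step that costs 100s of ms.
    private BitSet generateByDataFlow(int bci) {
        return new BitSet();
    }
}

With methods this large, a GC stack crawl can keep hitting bcis that are not in the cache, so every miss pays the full analysis cost again.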

However, the same methods that cause compiler bailouts are also
causing unexpectedly large oopmap generation times, occasionally
on the order of 100s of msec.  The offending code does fixed-point data 
flow iteration. 
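
In outline (a generic worklist formulation with invented names, not the real GenerateOopMap implementation), the fixed-point iteration looks like the sketch below; the cost grows with the number of basic blocks and with how many times their entry states change before the analysis converges.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Generic worklist / fixed-point data flow sketch (illustrative only).
class FixedPointSketch {
    // Block, entryState, transfer and mergeInto are stand-ins for the
    // per-basic-block oop/non-oop slot states used by oopmap generation.
    interface Block {
        List<Block> successors();
        Object entryState();
        Object transfer(Object in);              // abstractly execute the block
        boolean mergeInto(Object incomingState); // true if entry state changed
    }

    static void iterateToFixedPoint(List<Block> blocks) {
        Deque<Block> worklist = new ArrayDeque<Block>(blocks);
        while (!worklist.isEmpty()) {
            Block b = worklist.poll();
            Object out = b.transfer(b.entryState());
            for (Block succ : b.successors()) {
                // Re-queue a successor whenever its entry state changes; on a
                // huge, heavily nested method this loop runs many times over.
                if (succ.mergeInto(out)) {
                    worklist.add(succ);
                }
            }
        }
    }
}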

Simply growing the cache size doesn't help, because there are a
very large number of possible bci's in these large methods.

Changing the way JSPs generate code may help avoid bailouts, but won't
fix the underlying problem.  

Comments
CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: dragon
03-08-2004

EVALUATION

Email from James McIlree <###@###.###>
To: <###@###.###>
Sent: Monday, August 05, 2002 1:12 AM
Subject: Potential problem for pause-time collectors

> I've spent some time looking at GenerateOopMap performance over the
> weekend. The goal was to track down a performance problem for iPlanet,
> but I've found some interesting data.
>
> I did a fairly quick hack to "CompileTheWorld" to cause it to
> OopMapTheWorld instead, and print out how long it takes to generate an
> OopMap for any given method.
>
> I ran against rt.jar for 1.4.0_02 and sorted the results. Some of the
> more interesting entries:
>
> [0.0101962 sun/io/ByteToCharCp949 : <clinit> (17663)]
> [0.0104368 javax/swing/plaf/basic/BasicLookAndFeel : initComponentDefaults (13763)]
> [0.0112697 sun/text/resources/DateFormatZoneData : getContents (14119)]
> [0.0169934 javax/swing/text/html/parser/Parser : parseAttributeSpecificationList (805)]
> [0.0186694 sun/io/ByteToCharCp933 : <clinit> (32638)]
> [0.0345885 sun/io/CharToByteCp949C : <clinit> (65393)]
> [0.0355946 sun/io/CharToByteCp970 : <clinit> (65393)]
> [0.0359730 sun/io/CharToByteCp933 : <clinit> (65389)]
> [0.0360463 sun/io/CharToByteCp949 : <clinit> (65393)]
> [0.0402428 java/util/HashMap : eq (19)]
> [0.0402560 sun/java2d/pipe/PixelToShapeConverter : fillRect (24)]
> [0.0801238 java/awt/image/Raster : getSample (25)]
> [0.0820165 sun/reflect/MethodAccessorGenerator : emitInvoke (1012)]
> [0.1201715 java/awt/font/TextLayout : equals (29)]
> [0.1203534 java/nio/CharBuffer : hashCode (38)]
> [0.1911322 sun/security/provider/certpath/Builder : targetDistance (741)]
>
> The times were generated on a 450MHz US2.
>
> Note that hitting just *one* targetDistance method during a stack crawl
> will blow CMS's entire time budget for Nortel. A few of the cheaper
> methods can still break the bank if they happen at the same time.
>
> Clearly, pause time oriented collectors are going to need a higher
> performance solution.
>
> James M

Will look into this for post-tiger.
###@###.### 2004-08-02

I am in the process of working on this now (and before last week). I've written a one-pager on this, which I have attached; it outlines some of the alternatives that we've discussed for solving this problem. Because this is an escalation, the idea of saving the oopmap information generated for large methods will be investigated first (it will be easier to backport).

Fixed 4734748: pathologically slow oopmap generation.

The interpreter is modified to write tags to the interpreter stack so that oops can be found without generating oopmaps. See the one-pager at:
http://j2se.east/jruntime/task_templates/Mustang/tagged_stack.1pager.html
Latest webrev:
http://jruntime.east/~coleenp/webrev/tag4

The tagged stack interpreter is controlled by the switch TaggedStackInterpreter. The current plan is for this switch to default to false; it will be documented for users with large JSPs who notice pathological GC slowdowns. Much of the code for the switch is common, so I'm less worried about bit rot under a non-default switch than about users complaining that they're now getting more StackOverflowErrors with mustang.

Inlined functions in interpreter_<cpu>.hpp were added to support tags. Many changes were of the form wordSize => Interpreter::StackElementSize() in places where the assumption was that a Java stack element was a wordSize. Steve Goldman suggested the revised naming for the interpreter expression stack and local items.

The other major changes were to write/check tags in interp_masm_<cpu>.cpp, skip them in interpreterRT_<cpu>.cpp and sharedRuntime_<cpu>.cpp, and to GC them in frame.cpp.

Performance: the Alacrity numbers are mixed but not bad. See:
http://alacrity.sfbay/query/ones.jsp?VersionSelector_baseline=1.6.0%7Cclean-rt-base_124500&VersionSelector=1.6.0%7Ctaggedstack_125110&esum=Summary&pval=0.01&type_baseline=o&type_build=o

The microbenchmark submitted with bug 5049261 shows a speedup with nested try..finally blocks. See attached.
Sample from before:                  [GC 2153K->105K(3520K), 0.0896927 secs]
Sample from tagged stack interpreter: [GC 2153K->105K(3520K), 0.0002645 secs]

Fix checked in 8/8/2005 for mustang. I don't think this will be easy or even possible to backport to earlier releases, because the argument marshalling between the compilers and interpreted code was rewritten in mustang, and this fix relies on the new interface.
03-08-2004
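
A loose conceptual picture of the tagged-stack scheme described in the evaluation above (invented Java names; the real changes are in interp_masm_<cpu>.cpp, frame.cpp and related files and operate on raw machine words): every interpreter stack slot carries a companion tag, so a GC stack walk finds oops by scanning tags instead of generating an oopmap for the current bci.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative model of a tagged interpreter expression stack.
class TaggedStackSketch {
    enum Tag { VALUE, REFERENCE }

    static final class Slot {
        final Tag tag;
        final Object payload;   // primitive value (boxed here) or object reference
        Slot(Tag tag, Object payload) { this.tag = tag; this.payload = payload; }
    }

    private final List<Slot> expressionStack = new ArrayList<Slot>();

    void pushValue(int v)    { expressionStack.add(new Slot(Tag.VALUE, v)); }
    void pushRef(Object ref) { expressionStack.add(new Slot(Tag.REFERENCE, ref)); }

    // Under the tagged scheme, the frame's oop iteration during GC becomes a
    // linear scan of the slots; no data flow analysis is needed.
    void visitOops(Consumer<Object> visitor) {
        for (Slot s : expressionStack) {
            if (s.tag == Tag.REFERENCE) {
                visitor.accept(s.payload);
            }
        }
    }
}

Presumably the extra per-slot space (the wordSize => Interpreter::StackElementSize() changes) is why the evaluation anticipates more StackOverflowErrors when the switch is enabled.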

WORK AROUND A workaround might be to record oopmaps for all the bci's for a given method at once - this will introduce longer pauses at startup, and unknown additional memory consumption, but these could be acceptable under the duress of escalation.
03-08-2004
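
As a sketch of that workaround idea only (invented names, not a real HotSpot patch), the precomputation would amount to running the analysis for every bci once, up front, so no GC pause ever waits on oopmap generation for that method.

import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of the "generate everything up front" workaround (illustrative).
class EagerOopMaps {
    private final Map<Integer, BitSet> allMaps = new HashMap<Integer, BitSet>();

    // Pay the full analysis cost once, at startup or first use of the method,
    // trading longer startup pauses and extra memory for bounded GC pauses.
    EagerOopMaps(int[] bcisOfMethod) {
        for (int bci : bcisOfMethod) {
            allMaps.put(bci, generateByDataFlow(bci));
        }
    }

    BitSet oopMapAt(int bci) {
        return allMaps.get(bci);   // always a hit during a GC stack crawl
    }

    // Stand-in for the expensive fixed-point analysis.
    private BitSet generateByDataFlow(int bci) {
        return new BitSet();
    }
}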

SUGGESTED FIX I suspect that the rewriting associated with jsr-rets is causing slow convergence, because after each rewrite the flow graph is reinitialized. It may be possible to fix the convergence by avoiding some of the reinitialization. A tagged interpreter stack would eliminate the need for oopmaps.
03-08-2004