United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-4734748 pathologically slow oopmap generation
JDK-4734748 : pathologically slow oopmap generation

Details
Type:
Bug
Submit Date:
2002-08-21
Status:
Closed
Updated Date:
2012-10-08
Project Name:
JDK
Resolved Date:
2005-08-17
Component:
hotspot
OS:
solaris_9,solaris_8,windows_nt,solaris_2.5.1,generic
Sub-Component:
runtime
CPU:
x86,sparc,generic
Priority:
P4
Resolution:
Fixed
Affected Versions:
2.0,1.3.1_09,1.4.2,1.4.2_05,5.0
Fixed Versions:

Related Reports
Backport:
Backport:
Duplicate:
Duplicate:
Relates:

Sub Tasks

Description
An iPlanet app has encountered huge minor pause times (40+sec)
due to oopmap generation by the interpreter.  More generally,
some oopmap generations are observed to take >200ms, which is
more than the entire GC budget for the Nortel telco application.

Oopmaps are needed for interpreted frames.  We have interpreted
frames when the compiler bails out.  Some of the bailing methods
are code generated from JSPs, that have heavy nesting of try..finally.

Both c1 and c2 are bailing out in this particular case.  Fixing 
compiler bailouts in general is a large project.  One might
automagically refactor large methods instead, which would also
be a large project.  

Oopmaps tell, for a particular bci, whether each slot holds an oop.
The current system caches this information.  Normally one hopes
that methods are compiled, and thus no longer need oopmaps.

However, the same methods that cause compiler bailouts are also
causing unexpectedly large oopmap generation times, occasionally
on the order of 100s of msec.  The offending code does fixed-point data 
flow iteration. 

Simply growing the cache size doesn't help, because there are a
very large number of possible bci's in these large methods.

Changing the way JSPs generate code may help avoid bailouts, but won't
fix the underlying problem.  

                                    

Comments
SUGGESTED FIX

I suspect that the rewriting associated with jsr-rets
is causing slow convergence, because after each rewrite the flow
graph is reinitialized.  It may be possible to fix the convergence
by avoiding some of the reinitialization.

A tagged interpreter stack would eliminate the need to oopmaps.
                                     
2004-08-03
WORK AROUND

A workaround might be to record oopmaps for all the bci's 
for a given method at once - this will introduce longer pauses at startup,
and unknown additional memory consumption, but these could be acceptable
under the duress of escalation.
                                     
2004-08-03
EVALUATION

Email from James McIlree <###@###.###>
To: <###@###.###>
Sent: Monday, August 05, 2002 1:12 AM
Subject: Potential problem for pause-time collectors


>
> I've spent some time looking at GenerateOopMap
> performance over the weekend. The goal was to track down
> a performance problem for iPlanet, but I've found some
> interesting data.
>
> I did a fairly quick hack to "CompileTheWorld"
> to cause it to OopMapTheWorld instead, and print out
> how long it takes to generate an OopMap for any given
> method.
>
> I ran against rt.jar for 1.4.0_02 and sorted
> the results. Some of the more interesting entries:
>
> [0.0101962 sun/io/ByteToCharCp949 : <clinit> (17663)]
> [0.0104368 javax/swing/plaf/basic/BasicLookAndFeel : initComponentDefaults (13763)]
> [0.0112697 sun/text/resources/DateFormatZoneData : getContents (14119)]
> [0.0169934 javax/swing/text/html/parser/Parser : parseAttributeSpecificationList (805)]
> [0.0186694 sun/io/ByteToCharCp933 : <clinit> (32638)]
> [0.0345885 sun/io/CharToByteCp949C : <clinit> (65393)]
> [0.0355946 sun/io/CharToByteCp970 : <clinit> (65393)]
> [0.0359730 sun/io/CharToByteCp933 : <clinit> (65389)]
> [0.0360463 sun/io/CharToByteCp949 : <clinit> (65393)]
> [0.0402428 java/util/HashMap : eq (19)]
> [0.0402560 sun/java2d/pipe/PixelToShapeConverter : fillRect (24)]
> [0.0801238 java/awt/image/Raster : getSample (25)]
> [0.0820165 sun/reflect/MethodAccessorGenerator : emitInvoke (1012)]
> [0.1201715 java/awt/font/TextLayout : equals (29)]
> [0.1203534 java/nio/CharBuffer : hashCode (38)]
> [0.1911322 sun/security/provider/certpath/Builder : targetDistance (741)]
>
> The times were generated on a 450MHz US2.
>
> Note that hitting just *one* targetDistance method during
> a stack crawl will blow CMS's entire time budget for Nortel. A few
> of the cheaper methods can still break the bank if they happen at
> the same time.
>
> Clearly, pause time oriented collectors are going to need
> a higher performance solution.
>
> James M


Will look into this for post-tiger.



###@###.### 2004-08-02

I am in the process of working on this now (and before last week).  I've written
a 1 pager on this that I have attached.  The one pager outlines some of the 
alternatives that we've discussed for solving this problem.

Because this is an escalation, the idea of saving the oopmap information generated for large methods will be investigated first (it'll be easier to backport).


Fixed 4734748: Pathologically slow oopmap generation

The interpreter is modified to write tags to the interpreter stack in order
to find oopmaps.  See one pager at:

http://j2se.east/jruntime/task_templates/Mustang/tagged_stack.1pager.html

Latest Webrev:

http://jruntime.east/~coleenp/webrev/tag4

The tagged stack interpreter is on the switch TaggedStackInterpreter.  The
current plan is to have the default value of this switch be false, and will
be documented for users with large JSP who notice pathological GC slowdowns.
Much of the code for the switch is common, so I'm less worried about bit rot
under a non-default switch than users complaining that they're now getting
more StackOverflowErrors with mustang.

Inlined functions in interpreter_<cpu>.hpp were added to support tags.
A lot of changes were of the form:
  wordsize => Interpreter::StackElementSize()
in places where the assumption was that a java stack element was a wordSize.
Steve Goldman suggested the revised naming for the interpreter expression
stack and local items.

The other major changes were to write/check tags in interp_masm_<cpu>.cpp,
skip them in interpreterRT_<cpu>.cpp and sharedRuntime_<cpu>.cpp,
and to gc them in frame.cpp.

Performance: The Alacrity numbers are mixed but not bad.  See:

http://alacrity.sfbay/query/ones.jsp?VersionSelector_baseline=1.6.0%7Cclean-rt-base_124500&VersionSelector=1.6.0%7Ctaggedstack_125110&esum=Summary&pval=0.01&type_baseline=o&type_build=o

The microbenchmark submitted with bug 5049261 shows speedup with nested try
finally blocks.  See attached.

sample from before:
[GC 2153K->105K(3520K), 0.0896927 secs]
sample from tagged stack interpreter:
[GC 2153K->105K(3520K), 0.0002645 secs]

Fix checked in 8/8/2005 for mustang.  I don't think that this will be easy or even possible to backport to earlier releases because the argument marshalling between the compilers and interpreted code was rewritten in mustang, and this relies on the new interface.
                                     
2004-08-03
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
dragon


                                     
2004-08-03



Hardware and Software, Engineered to Work Together