JDK-6332491 : Performance regression in b54 on x86
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 6
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_9
  • CPU: sparc
  • Submitted: 2005-10-04
  • Updated: 2013-11-01
  • Resolved: 2005-12-07
JDK 6
  6 b63: Fixed
Description
There is a performance regression in b54 on x86 (32-bit VM on a multi-CPU Opteron)
after Coleen's putback 20050919182406.coleenp.pt.rme.ws.
The test was provided by Doug Lea; it is attached.

Use the flag -XX:ProfileMaturityPercentage=0 to work around
bug 6329104.

prt-solamd64-q1-4% psrinfo -v
Status of virtual processor 0 as of: 10/04/2005 10:33:51
  on-line since 09/14/2005 20:52:20.
  The i386 processor operates at 2589 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 10/04/2005 10:33:51
  on-line since 09/14/2005 20:52:23.
  The i386 processor operates at 2589 MHz,
        and has an i387 compatible floating point processor.

prt-solamd64-q1-4% uname -a
SunOS prt-solamd64-q1-4 5.10 Generic i86pc i386 i86pc

prt-solamd64-q1-4% /tmp/kvn/jdk1.6.0/bin/java -version
java version "1.6.0-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-ea-b12)
Java HotSpot(TM) Server VM (build 20050916202013.coleenp.rt_merge, mixed mode)
prt-solamd64-q1-4% time /tmp/kvn/jdk1.6.0/bin/java -XX:ProfileMaturityPercentage=0 CollectionLoops RWCollection
Class: RWCollection threads: 2 size: 10000 ins: 100 rem: 1 ops: 100000
Threads: 1      :    18847 ns per op     1.884758981s run time
Threads: 1      :    18745 ns per op     1.874591785s run time
Threads: 1      :    19904 ns per op     1.990489363s run time
Threads: 1      :    18669 ns per op     1.866918584s run time
Threads: 2      :    19770 ns per op     3.954020314s run time
Threads: 2      :    19783 ns per op     3.956649671s run time
16.0u 1.0s 0:16 104% 0+0k 0+0io 0pf+0w

prt-solamd64-q1-4% /tmp/kvn/jdk1.6.0/bin/java -version
java version "1.6.0-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-ea-b12)
Java HotSpot(TM) Server VM (build 20050919182406.coleenp.pt.rme.ws, mixed mode)
prt-solamd64-q1-4% time /tmp/kvn/jdk1.6.0/bin/java -XX:ProfileMaturityPercentage=0 CollectionLoops RWCollection
Class: RWCollection threads: 2 size: 10000 ins: 100 rem: 1 ops: 100000
Threads: 1      :    31216 ns per op     3.121675978s run time
Threads: 1      :    32242 ns per op     3.224239121s run time
Threads: 1      :    32717 ns per op     3.271775494s run time
Threads: 1      :    31701 ns per op     3.170120089s run time
Threads: 2      :    35219 ns per op     7.043972725s run time
Threads: 2      :    34752 ns per op     6.950433048s run time
27.0u 1.0s 0:27 101% 0+0k 0+0io 0pf+0w
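
To put a number on the regression, the ns-per-op figures from the two runs above can be averaged and compared. The following is only a small sketch; the class and method names (RegressionRatio, slowdown) are made up for illustration and are not part of the attached test.

```java
import java.util.Arrays;

public class RegressionRatio {
    // ns-per-op values copied from the two runs above.
    static final double[] BEFORE_1T = {18847, 18745, 19904, 18669};
    static final double[] AFTER_1T  = {31216, 32242, 32717, 31701};
    static final double[] BEFORE_2T = {19770, 19783};
    static final double[] AFTER_2T  = {35219, 34752};

    // Arithmetic mean of the samples (NaN for an empty array).
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Ratio of mean time after the putback to mean time before it.
    static double slowdown(double[] before, double[] after) {
        return mean(after) / mean(before);
    }

    public static void main(String[] args) {
        System.out.printf("1 thread:  %.2fx slower%n", slowdown(BEFORE_1T, AFTER_1T));
        System.out.printf("2 threads: %.2fx slower%n", slowdown(BEFORE_2T, AFTER_2T));
    }
}
```

With these samples the slowdown comes out at roughly 1.7x for both thread counts, which matches the user time reported by time(1) going from 16.0u to 27.0u.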

Comments
SUGGESTED FIX
The solution was suggested by Tom: emit an uncommon trap if the allocation bytecode has not actually been executed in the interpreter yet. This solution works well because it barely increases the number of uncommon traps, does not regress performance, and applies to all classes, including preinitialized ones. I also added target_bci to the branch log information to distinguish forward from backward branches.
Webrev: http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2005/20051123111515.kvn.6332491/workspace/webrevs/webrev-2005.11.23/index.html
24-11-2005
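
The decision described in the suggested fix can be modelled in a few lines of Java. This is a toy model only, not HotSpot code: the types NewSite and Plan, the two methods, the trap name TRAP_NEVER_EXECUTED and the order of the checks are all assumptions made for illustration. It only contrasts what the compiler emits at a 'new' bytecode (code='187' in the logs) before and after the change.

```java
public class AllocationTrapSketch {
    // What the compiler emits at a 'new' bytecode (hypothetical names).
    enum Plan { TRAP_UNINITIALIZED, TRAP_NEVER_EXECUTED, FULL_ALLOCATION }

    // Hypothetical summary of what is known about one allocation site.
    static final class NewSite {
        final boolean classInitialized;      // e.g. preinitialized at VM startup
        final boolean executedInInterpreter; // profile shows the bytecode was reached

        NewSite(boolean classInitialized, boolean executedInInterpreter) {
            this.classInitialized = classInitialized;
            this.executedInInterpreter = executedInInterpreter;
        }
    }

    // Behaviour described in the evaluation: only class initialization matters.
    static Plan planBeforeFix(NewSite s) {
        return s.classInitialized ? Plan.FULL_ALLOCATION : Plan.TRAP_UNINITIALIZED;
    }

    // Behaviour with the suggested fix: also trap if the site was never executed.
    static Plan planAfterFix(NewSite s) {
        if (!s.classInitialized) return Plan.TRAP_UNINITIALIZED;
        if (!s.executedInInterpreter) return Plan.TRAP_NEVER_EXECUTED;
        return Plan.FULL_ALLOCATION;
    }

    public static void main(String[] args) {
        // The case from this bug: class preinitialized, throw path never reached.
        NewSite site = new NewSite(true, false);
        System.out.println("before fix: " + planBeforeFix(site));
        System.out.println("after fix:  " + planAfterFix(site));
    }
}
```

In this model the preinitialized, never-executed site gets the full allocation and throw code before the fix and a trap after it, which is the case the regression hinges on.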

EVALUATION
Coleen is right: preinitialization of the IllegalMonitorStateException class causes this test regression. The test calls ReentrantReadWriteLock.ReadLock.unlock() in RWCollection.contains(). Through inlining, unlock() calls ReentrantReadWriteLock$Sync.tryReleaseShared(), which can throw IllegalMonitorStateException:

    protected final boolean tryReleaseShared(int unused) {
        HoldCounter rh = cachedHoldCounter;
        Thread current = Thread.currentThread();
        if (rh == null || rh.tid != current.getId())
            rh = readHolds.get();
        if (rh.tryDecrement() <= 0)
            throw new IllegalMonitorStateException();
        for (;;) {
            int c = getState();
            int nextc = c - SHARED_UNIT;
            if (compareAndSetState(c, nextc))
                return nextc == 0;
        }
    }

If IllegalMonitorStateException is not initialized, we generate an 'uninitialized' uncommon trap and the final code is fast:

    <bc code='157' bci='40'/>
    <branch taken='10000' not_taken='0' cnt='6.66667' prob='always'/>
    <bc code='187' bci='43'/>
    <klass id='574' name='java/lang/IllegalMonitorStateException' flags='1'/>
    <uncommon_trap bci='43' reason='uninitialized' action='reinterpret' klass='574'/>

If IllegalMonitorStateException is preinitialized, we generate the full allocation and throw code:

    <bc code='157' bci='40'/>
    <branch taken='10000' not_taken='0' cnt='6.66667' prob='always'/>
    <bc code='187' bci='43'/>
    <klass id='503' name='java/lang/OutOfMemoryError' flags='1'/>
    <dependency type='leaf_type' ctxk='503'/>
    <bc code='183' bci='47'/>
    <klass id='574' name='java/lang/IllegalMonitorStateException' flags='1'/>
    <method id='575' holder='574' name='&lt;init&gt;' return='466' flags='1' bytes='5' iicount='1'/>
    <call method='575' count='0' prof_factor='0.670167' inline='1'/>
    <inline_fail reason='exception method'/>

What is interesting here is that, according to the profiling data, this part of the code is never executed: not_taken='0' and call method='575' count='0'.

In other cases we generate an 'unreached' uncommon trap:

    <bc code='165' bci='14'/>
    <branch taken='7888' not_taken='0' cnt='5.25867' prob='always'/>
    <uncommon_trap bci='14' reason='unreached' action='reinterpret' comment='taken always'/>

The difference: code='157' is 'ifgt' and code='165' is 'if_acmpeq'. C2 is known to generate the 'unreached' uncommon trap only for the pointer comparisons ifnull, ifnonnull, if_acmpeq and if_acmpne (see do_if() in parse2.cpp):

    if (!stopped() && seems_never_taken(prob) && c->Opcode() == Op_CmpP) {
        repush_if_args(this, a, b);
        uncommon_trap(Deoptimization::Reason_unreached,
                      Deoptimization::Action_reinterpret,
                      NULL, "taken never");
    }

The reason, I believe, is to avoid a large number of uncommon traps and recompilations caused by executing code that was unreachable in the interpreter. But I don't have data on it.
17-11-2005
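
For readers without the attachment, the call pattern from the evaluation above can be sketched as follows. The source of RWCollection is not reproduced in this report, so the class below (RWCollectionSketch) is only an assumed shape: a list guarded by a ReentrantReadWriteLock whose contains() takes and releases the read lock. The ReadLock.unlock() call is the one that reaches the never-taken throw of IllegalMonitorStateException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Assumed shape of the test collection; not the attached RWCollection source.
public class RWCollectionSketch<E> {
    private final List<E> items = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public boolean add(E e) {
        lock.writeLock().lock();
        try {
            return items.add(e);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean contains(Object o) {
        lock.readLock().lock();
        try {
            return items.contains(o);
        } finally {
            // Hot path: unlock() can throw IllegalMonitorStateException,
            // but never does while lock/unlock calls are balanced.
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RWCollectionSketch<Integer> c = new RWCollectionSketch<>();
        for (int i = 0; i < 100; i++) {
            c.add(i);
        }
        int hits = 0;
        // Enough iterations for contains() to become hot and get compiled.
        for (int i = 0; i < 100_000; i++) {
            if (c.contains(i % 200)) {
                hits++;
            }
        }
        System.out.println("hits: " + hits);
    }
}
```

Running a loop like this with -XX:+LogCompilation on a diagnostic-enabled VM is one way to look for the uncommon_trap entries quoted in the evaluation, though the exact log contents depend on the build.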

EVALUATION
This is only a partial evaluation. I see a performance decrease on x86 with b12 only because the class IllegalMonitorStateException is preinitialized in thread.cpp. If I comment out this line in create_vm(), the decrease goes away:

    initialize_class(vmSymbolHandles::java_lang_IllegalMonitorStateException(), CHECK_0);

This is with an Oct 28th rt_baseline copied into a b12 JDK. I ran with LogCompilation and got reams of output, but I think the initialization of this class must change some compilation decisions. See the attached hotspot.ill.log (preinitialized) vs. hotspot.ill2.log (not initialized). With b54, there is no decrease in performance for my change. I want to reassign this to the compiler group because I'm really stuck. Maybe the hotspot.log files mean something. Feel free to send it back.
31-10-2005