While benchmarking some library code, we noticed that,
on multi-processor runs with -server, performance of the benchmarked code
was occasionally a factor of 3 slower, as measured by wall clock or "user" CPU time.
True of releases from 1.5.0 through 6.0-b53.
Seen on solaris-sparc, solaris-i586 opteron, and linux-amd64 MP
To see the effect, run the code below using a loop like
while true; do time jver 6.0 java -server ListBash 900 1000; done
and observe the occasional (10% ?) factor of 3 increase in cpu time
-----------------------------------------------------------
import java.util.*;
public class ListBash {
static final Random rnd = new Random();
static int numItr;
static int listSize;
public static void main(String[] args) throws Throwable {
numItr = Integer.parseInt(args[0]);
listSize = Integer.parseInt(args[1]);
oneRun();
oneRun();
oneRun();
oneRun();
}
static void oneRun() {
long startTime = System.nanoTime();
for (int i=0; i<numItr; i++)
elementLoop();
long elapsed = System.nanoTime() - startTime;
System.out.println("Time: " + (elapsed/1000000000.0) + "s");
}
static void elementLoop() {
List<Integer> s1 = new ArrayList();
AddRandoms(s1, listSize);
List<Integer> s2 = new ArrayList();
AddRandoms(s2, listSize);
clone(s1);
}
// Done inefficiently so as to exercise toArray
static List<Integer> clone(List<Integer> s) {
List a = Arrays.asList(s.toArray());
List<Integer> clone = new ArrayList();
clone.addAll(a);
if (!s.equals(clone))
fail("List not equal to copy.");
if (!s.containsAll(clone))
fail("List does not contain copy.");
return clone;
}
static void AddRandoms(List<Integer> s, int n) {
for (int i=0; i<n; i++) {
int r = rnd.nextInt() % n;
Integer e = new Integer(r < 0 ? -r : r);
int preSize = s.size();
if (!s.add(e))
fail ("Add failed.");
int postSize = s.size();
if (postSize-preSize != 1)
fail ("Add didn't increase size by 1.");
}
}
static void fail(String s) {
System.out.println(s);
System.exit(1);
}
}
-----------------------------------------------------------
Here is an excerpt from output:
Time: 1.211352022s
Time: 1.121192678s
Time: 0.995122533s
Time: 1.009511974s
jver 6.0 java -server ListBash 900 1000 4.14s user 0.24s system 94% cpu 4.627 total
Time: 3.664018005s
Time: 3.466817495s
Time: 3.606883598s
Time: 3.362159027s
jver 6.0 java -server ListBash 900 1000 13.47s user 0.10s system 94% cpu 14.409 total
Time: 1.144527022s
Time: 1.075664236s
Time: 0.987576517s
Time: 0.979692477s
jver 6.0 java -server ListBash 900 1000 4.11s user 0.22s system 97% cpu 4.439 total
Here's a backport of the test source, which can be used to demonstrate that
the variable performance effect does *NOT* appear to occur with 1.4.2.
--------------------------------------------------------------------
import java.util.*;
public class ListBash {
static final Random rnd = new Random();
static int numItr;
static int listSize;
public static void main(String[] args) throws Throwable {
numItr = Integer.parseInt(args[0]);
listSize = Integer.parseInt(args[1]);
oneRun();
oneRun();
oneRun();
oneRun();
}
static void oneRun() {
long startTime = System.currentTimeMillis();
for (int i=0; i<numItr; i++)
elementLoop();
long elapsed = System.currentTimeMillis() - startTime;
System.out.println("Time: " + (elapsed/1000.0) + "s");
}
static void elementLoop() {
List s1 = new ArrayList();
AddRandoms(s1, listSize);
clone(s1);
}
// Done inefficiently so as to exercise toArray
static List clone(List s) {
List a = Arrays.asList(s.toArray());
List clone = new ArrayList();
clone.addAll(a);
if (!s.equals(clone))
fail("List not equal to copy.");
if (!s.containsAll(clone))
fail("List does not contain copy.");
return clone;
}
static void AddRandoms(List s, int n) {
for (int i=0; i<n; i++) {
int r = rnd.nextInt() % n;
Integer e = new Integer(r < 0 ? -r : r);
int preSize = s.size();
if (!s.add(e))
fail ("Add failed.");
int postSize = s.size();
if (postSize-preSize != 1)
fail ("Add didn't increase size by 1.");
}
}
static void fail(String s) {
System.out.println(s);
System.exit(1);
}
}
Doug Lea writes:
---------------------------------------------------------
I do think that whatever is causing this has been getting
worse in recent builds. If more data would be more helpful,
unpack the attached zip. About half the time I run things
using b54 on solaris-amd I get about factor of 2 slowdown.
% java CollectionLoops RWCollection
[results omitted]
Possibilities that occur to me but I don't know how to check include
* Have inlining thresholds changed lately?
* Are there new races with or stalls versus class resolution or
loading that cause devirtualization to give up?
---------------------------------------------------------
(Doug's test is available as an attachment)
I can reproduce the performance loss Doug reports.
I tried running benchmarks as follows, (varying the build number):
(PATH=/java/re/jdk/6.0/promoted/all/b12/binaries/solaris-amd64/bin:$PATH; java -version; repeat 30 time java CollectionLoops RWCollection > /dev/null)
and get results dramatically worse in b54 than in b12. Not only is the incidence
of "slow" results greater, but the "fast" results are slower in b54 than in b12,
indicating two different causes of slowdown. This is observed on Solaris 10 and 11.
Here are the results from one particular solaris10-amd64 machine:
Bottom line:
b54: 895.99s user 6.89s system 127% cpu 11:48.20 total
b12: 682.12s user 39.92s system 108% cpu 11:02.88 total
This is a serious mustang performance regression, esp. for user cpu time.
(mb29450@mho) ~/src/toy/6329104/3oct05 $ (PATH=/java/re/jdk/6.0/promoted/all/b12/binaries/solaris-amd64/bin:$PATH; java -version; repeat 30 time java CollectionLoops RWCollection > /dev/null)
java version "1.6.0-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-ea-b12)
Java HotSpot(TM) Server VM (build 1.6.0-ea-b12, mixed mode)
java CollectionLoops RWCollection > /dev/null 18.40s user 1.24s system 110% cpu 17.826 total
java CollectionLoops RWCollection > /dev/null 18.32s user 1.30s system 111% cpu 17.518 total
java CollectionLoops RWCollection > /dev/null 18.22s user 1.26s system 111% cpu 17.519 total
java CollectionLoops RWCollection > /dev/null 18.08s user 1.16s system 111% cpu 17.310 total
java CollectionLoops RWCollection > /dev/null 18.07s user 1.31s system 111% cpu 17.408 total
java CollectionLoops RWCollection > /dev/null 45.78s user 1.59s system 104% cpu 45.130 total
java CollectionLoops RWCollection > /dev/null 18.06s user 1.16s system 109% cpu 17.501 total
java CollectionLoops RWCollection > /dev/null 30.19s user 1.52s system 107% cpu 29.429 total
java CollectionLoops RWCollection > /dev/null 29.99s user 1.55s system 108% cpu 29.171 total
java CollectionLoops RWCollection > /dev/null 18.20s user 1.26s system 111% cpu 17.480 total
java CollectionLoops RWCollection > /dev/null 18.15s user 1.24s system 111% cpu 17.338 total
java CollectionLoops RWCollection > /dev/null 18.08s user 1.28s system 111% cpu 17.429 total
java CollectionLoops RWCollection > /dev/null 18.01s user 1.23s system 110% cpu 17.360 total
java CollectionLoops RWCollection > /dev/null 19.02s user 1.27s system 111% cpu 18.279 total
java CollectionLoops RWCollection > /dev/null 17.96s user 1.16s system 110% cpu 17.348 total
java CollectionLoops RWCollection > /dev/null 17.89s user 1.31s system 109% cpu 17.489 total
java CollectionLoops RWCollection > /dev/null 18.10s user 1.25s system 111% cpu 17.330 total
java CollectionLoops RWCollection > /dev/null 18.16s user 1.32s system 111% cpu 17.448 total
java CollectionLoops RWCollection > /dev/null 30.02s user 1.45s system 108% cpu 29.129 total
java CollectionLoops RWCollection > /dev/null 51.06s user 1.51s system 104% cpu 50.511 total
java CollectionLoops RWCollection > /dev/null 18.06s user 1.28s system 111% cpu 17.318 total
java CollectionLoops RWCollection > /dev/null 30.24s user 1.49s system 109% cpu 28.919 total
java CollectionLoops RWCollection > /dev/null 18.02s user 1.25s system 110% cpu 17.369 total
java CollectionLoops RWCollection > /dev/null 18.18s user 1.28s system 111% cpu 17.399 total
java CollectionLoops RWCollection > /dev/null 34.30s user 1.52s system 106% cpu 33.568 total
java CollectionLoops RWCollection > /dev/null 18.21s user 1.24s system 111% cpu 17.508 total
java CollectionLoops RWCollection > /dev/null 18.20s user 1.28s system 111% cpu 17.469 total
java CollectionLoops RWCollection > /dev/null 18.83s user 1.32s system 111% cpu 18.048 total
java CollectionLoops RWCollection > /dev/null 30.08s user 1.58s system 106% cpu 29.608 total
java CollectionLoops RWCollection > /dev/null 18.17s user 1.26s system 111% cpu 17.449 total
(; PATH=/java/re/jdk/6.0/promoted/all/b12/binaries/solaris-amd64/bin:$PATH ; ) 682.12s user 39.92s system 108% cpu 11:02.88 total
(mb29450@mho) ~/src/toy/6329104/3oct05 $ (PATH=/java/re/jdk/6.0/promoted/all/b54/binaries/solaris-amd64/bin:$PATH; java -version; repeat 30 time java CollectionLoops RWCollection > /dev/null)
java version "1.6.0-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-ea-b54)
Java HotSpot(TM) Server VM (build 1.6.0-ea-b54, mixed mode)
java CollectionLoops RWCollection > /dev/null 23.06s user 0.24s system 125% cpu 18.592 total
java CollectionLoops RWCollection > /dev/null 23.28s user 0.23s system 125% cpu 18.689 total
java CollectionLoops RWCollection > /dev/null 23.35s user 0.22s system 124% cpu 18.879 total
java CollectionLoops RWCollection > /dev/null 29.55s user 0.23s system 129% cpu 23.007 total
java CollectionLoops RWCollection > /dev/null 23.00s user 0.22s system 124% cpu 18.659 total
java CollectionLoops RWCollection > /dev/null 43.36s user 0.23s system 129% cpu 33.588 total
java CollectionLoops RWCollection > /dev/null 22.99s user 0.23s system 124% cpu 18.668 total
java CollectionLoops RWCollection > /dev/null 23.08s user 0.23s system 125% cpu 18.548 total
java CollectionLoops RWCollection > /dev/null 22.21s user 0.23s system 126% cpu 17.700 total
java CollectionLoops RWCollection > /dev/null 29.65s user 0.23s system 130% cpu 22.927 total
java CollectionLoops RWCollection > /dev/null 43.17s user 0.23s system 129% cpu 33.528 total
java CollectionLoops RWCollection > /dev/null 23.16s user 0.22s system 124% cpu 18.719 total
java CollectionLoops RWCollection > /dev/null 23.44s user 0.23s system 123% cpu 19.109 total
java CollectionLoops RWCollection > /dev/null 44.02s user 0.23s system 129% cpu 34.048 total
java CollectionLoops RWCollection > /dev/null 22.36s user 0.23s system 127% cpu 17.670 total
java CollectionLoops RWCollection > /dev/null 29.10s user 0.23s system 127% cpu 23.037 total
java CollectionLoops RWCollection > /dev/null 42.00s user 0.23s system 129% cpu 32.599 total
java CollectionLoops RWCollection > /dev/null 22.78s user 0.22s system 124% cpu 18.468 total
java CollectionLoops RWCollection > /dev/null 41.38s user 0.22s system 129% cpu 32.238 total
java CollectionLoops RWCollection > /dev/null 41.43s user 0.23s system 129% cpu 32.210 total
java CollectionLoops RWCollection > /dev/null 23.15s user 0.23s system 124% cpu 18.729 total
java CollectionLoops RWCollection > /dev/null 42.79s user 0.23s system 130% cpu 33.057 total
java CollectionLoops RWCollection > /dev/null 43.39s user 0.23s system 129% cpu 33.628 total
java CollectionLoops RWCollection > /dev/null 23.07s user 0.22s system 125% cpu 18.579 total
java CollectionLoops RWCollection > /dev/null 29.65s user 0.22s system 129% cpu 23.037 total
java CollectionLoops RWCollection > /dev/null 23.46s user 0.23s system 126% cpu 18.720 total
java CollectionLoops RWCollection > /dev/null 23.53s user 0.23s system 126% cpu 18.827 total
java CollectionLoops RWCollection > /dev/null 29.93s user 0.23s system 129% cpu 23.258 total
java CollectionLoops RWCollection > /dev/null 37.22s user 0.22s system 126% cpu 29.529 total
java CollectionLoops RWCollection > /dev/null 23.36s user 0.23s system 126% cpu 18.670 total
(; PATH=/java/re/jdk/6.0/promoted/all/b54/binaries/solaris-amd64/bin:$PATH ; ) 895.99s user 6.89s system 127% cpu 11:48.20 total