while looking at scaling on a batoka system [ a 4 socket victoria falls ] with 256 cpu's, it
was noticed that using the XMLtest benchmark that the machine did'nt scale particulary well
with high thread count inside the jvm
Thr Tx tx/thr
8 213 26.63
16 399 24.94
32 797 24.91
64 1534 23.97
128 2295 17.93
192 2318 12.07
256 2327 9.09
Note: this is number of worker threads, in the 256 thread case, the jvm actually has 439 threads.
looking at the 128 thread case, the vmstat data collected shows and interesting picture
vmstat 2
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s2 s3 s4 in sy cs us sy id
1 0 0 234985728 127954760 0 60 0 0 0 0 0 0 0 0 0 1468 2891 608 50 0 50
1 0 0 234973128 127583760 0 64 0 0 0 0 0 0 0 0 0 1569 3463 756 50 0 50
1 0 0 234966536 127218768 0 62 0 0 0 0 0 0 0 0 0 1636 3273 898 49 0 50
1 0 0 234964864 126857880 0 58 0 0 0 0 0 0 0 0 3 1464 3192 544 50 0 50
1 0 0 234962352 126495056 0 68 0 0 0 0 0 0 0 0 0 1783 3671 1191 49 0 51
1 0 0 234962320 126134448 0 73 0 0 0 0 0 0 0 0 0 1754 3548 1141 49 0 51
1 0 0 234962296 125487384 0 309 0 0 0 0 0 0 0 0 0 2918 7845 3473 48 1 51
0 0 0 234962264 124434512 1 477 26 0 0 0 0 0 0 0 0 5554 23202 8747 44 1 56
1 0 0 234962216 124107552 0 5 0 0 0 0 0 0 0 0 0 1959 15389 1787 49 0 50
1 0 0 234962208 124107544 0 3 0 0 0 0 0 0 0 0 1 2224 17998 2456 49 0 51
1 0 0 234956024 123931288 7 86 0 0 0 0 0 0 0 0 0 5040 201428 9515 45 1 55
0 0 0 234955800 123931008 1 44 0 0 0 0 0 0 0 0 0 8439 112558 15681 37 1 62
for the start of the benchmark all is well, with 50% of the machine consumed, then
we start generating large number of syscalls. Digging deeper...
libc.so.1`lwp_yield+0x4
libjvm.so`__1cJStealTaskFdo_it6MpnNGCTaskManager_I_v_+0x374
libjvm.so`__1cMGCTaskThreadDrun6M_v_+0x1e8
libjvm.so`java_start+0x17c
libc.so.1`_lwp_start
3462 7858959
captured from a dtrace profile, sampling at 10 second intervals, shows our jvm
calling yield some 7 million times some 700k/second ! this was an extreme example with the rate normally around 200k/second
if we measure the 64 thread case, virtually no calls of this stack are observed
this was using:-
java version "1.6.0_05-p"
Java(TM) SE Runtime Environment (build 1.6.0_05-p-b01)
OpenJDK Server VM (build 12.0-b02, mixed mode)
running the benchmark with the following verbose GC options -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
generated this data...
Buildfile: /andyb/XMLTest.v1/Java/src/xmltest.xml
check_env:
run-xmltest:
[java] XMLTest started with iniFile = /andyb/XMLTest.v1/ini/unix/xml-JAXB.ini
[java] trans = SAXUsage, val = 0.0
[java] trans = DOMUsage, val = 0.0
[java] trans = JAXBUsage, val = 100.0
[java] trans = PullParserUsage, val = 0.0
[java] trans = SAXFIUsage, val = 0.0
[java] Build Time per Transaction(Seconds) = 0.070179
[java] Walk Time per Transaction(Seconds) = 2.1188919E-6
[java] Modify Time per Transaction(Seconds) = 2.6189066E-6
[java] Serialize Time per Transaction(Seconds) = 0.0
[java] Response Time per Transaction(Seconds) = 0.0701878
[java] Total Transactions = 1093968
[java] Steady state time = 600
[java] Transactions per second = 1823.28
[java] Detailed transaction counts:
[java] Thread 0 = 8456
[java] Thread 1 = 8671
[java] Thread 2 = 8411
elided uninteresting data
[java] Thread 126 = 8439
[java] Thread 127 = 8610
[java] Output can be found in ..//..//results/XMLMarkReportSummaryjava_20080331_030217
BUILD SUCCESSFUL
Total time: 15 minutes 47 seconds
Heap
PSYoungGen total 308224K, used 31705K [0xe6400000, 0xfbc00000, 0xfbc00000)
eden space 264192K, 12% used [0xe6400000,0xe82f66f8,0xf6600000)
from space 44032K, 0% used [0xf9100000,0xf9100000,0xfbc00000)
to space 44032K, 0% used [0xf6600000,0xf6600000,0xf9100000)
PSOldGen total 700416K, used 0K [0xbb800000, 0xe6400000, 0xe6400000)
object space 700416K, 0% used [0xbb800000,0xbb800000,0xe6400000)
PSPermGen total 16384K, used 4821K [0xb7800000, 0xb8800000, 0xbb800000)
object space 16384K, 29% used [0xb7800000,0xb7cb5788,0xb8800000)