JDK-6690796 : excessive calls to yield hurt jvm scalability
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 6u5p
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: solaris_10
  • CPU: sparc
  • Submitted: 2008-04-18
  • Updated: 2010-08-26
  • Resolved: 2010-08-26
Related Reports
Duplicate :  
Description
while looking at scaling on a batoka system [ a 4 socket victoria falls ] with 256 cpu's, it
was noticed that using the XMLtest benchmark that the machine did'nt scale particulary well
with high thread count inside the jvm

Thr     Tx      tx/thr
8	213	26.63
16	399	24.94
32	797	24.91
64	1534	23.97
128	2295	17.93
192	2318	12.07
256	2327	9.09

Note: this is number of worker threads, in the 256 thread case, the jvm actually has 439 threads.

looking at the 128 thread case, the vmstat data collected shows and interesting picture

vmstat 2

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 s2 s3 s4   in   sy   cs us sy id
 1 0 0 234985728 127954760 0 60 0 0 0 0  0  0  0  0  0 1468 2891  608 50  0 50
 1 0 0 234973128 127583760 0 64 0 0 0 0  0  0  0  0  0 1569 3463  756 50  0 50
 1 0 0 234966536 127218768 0 62 0 0 0 0  0  0  0  0  0 1636 3273  898 49  0 50
 1 0 0 234964864 126857880 0 58 0 0 0 0  0  0  0  0  3 1464 3192  544 50  0 50
 1 0 0 234962352 126495056 0 68 0 0 0 0  0  0  0  0  0 1783 3671 1191 49  0 51
 1 0 0 234962320 126134448 0 73 0 0 0 0  0  0  0  0  0 1754 3548 1141 49  0 51
 1 0 0 234962296 125487384 0 309 0 0 0 0 0  0  0  0  0 2918 7845 3473 48  1 51
 0 0 0 234962264 124434512 1 477 26 0 0 0 0 0  0  0  0 5554 23202 8747 44 1 56
 1 0 0 234962216 124107552 0 5 0 0 0  0  0  0  0  0  0 1959 15389 1787 49 0 50
 1 0 0 234962208 124107544 0 3 0 0 0  0  0  0  0  0  1 2224 17998 2456 49 0 51
 1 0 0 234956024 123931288 7 86 0 0 0 0  0  0  0  0  0 5040 201428 9515 45 1 55
 0 0 0 234955800 123931008 1 44 0 0 0 0  0  0  0  0  0 8439 112558 15681 37 1 62

for the start of the benchmark all is well, with 50% of the machine consumed, then
we start generating large number of syscalls. Digging deeper...

              libc.so.1`lwp_yield+0x4
              libjvm.so`__1cJStealTaskFdo_it6MpnNGCTaskManager_I_v_+0x374
              libjvm.so`__1cMGCTaskThreadDrun6M_v_+0x1e8
              libjvm.so`java_start+0x17c
              libc.so.1`_lwp_start
     3462          7858959

captured from a dtrace profile, sampling at 10 second intervals, shows our jvm
calling yield some 7 million times some 700k/second ! this was an extreme example with the rate normally around 200k/second

if we measure the 64 thread case, virtually no calls of this stack are observed

this was using:-
java version "1.6.0_05-p"
Java(TM) SE Runtime Environment (build 1.6.0_05-p-b01)
OpenJDK Server VM (build 12.0-b02, mixed mode)

running the benchmark with the following verbose GC options -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

generated this data...

Buildfile: /andyb/XMLTest.v1/Java/src/xmltest.xml

check_env:

run-xmltest:
     [java] XMLTest started with iniFile = /andyb/XMLTest.v1/ini/unix/xml-JAXB.ini
     [java] trans = SAXUsage, val = 0.0
     [java] trans = DOMUsage, val = 0.0
     [java] trans = JAXBUsage, val = 100.0
     [java] trans = PullParserUsage, val = 0.0
     [java] trans = SAXFIUsage, val = 0.0
     [java] Build Time per Transaction(Seconds) = 0.070179
     [java] Walk Time per Transaction(Seconds) = 2.1188919E-6
     [java] Modify Time per Transaction(Seconds) = 2.6189066E-6
     [java] Serialize Time per Transaction(Seconds) = 0.0
     [java] Response Time per Transaction(Seconds) = 0.0701878

     [java] Total Transactions = 1093968
     [java] Steady state time = 600
     [java] Transactions per second = 1823.28

     [java] Detailed transaction counts:
     [java]     Thread 0 = 8456
     [java]     Thread 1 = 8671
     [java]     Thread 2 = 8411

elided uninteresting data

     [java]     Thread 126 = 8439
     [java]     Thread 127 = 8610
     [java] Output can be found in ..//..//results/XMLMarkReportSummaryjava_20080331_030217



BUILD SUCCESSFUL
Total time: 15 minutes 47 seconds
Heap
 PSYoungGen      total 308224K, used 31705K [0xe6400000, 0xfbc00000, 0xfbc00000)
  eden space 264192K, 12% used [0xe6400000,0xe82f66f8,0xf6600000)
  from space 44032K, 0% used [0xf9100000,0xf9100000,0xfbc00000)
  to   space 44032K, 0% used [0xf6600000,0xf6600000,0xf9100000)
 PSOldGen        total 700416K, used 0K [0xbb800000, 0xe6400000, 0xe6400000)
  object space 700416K, 0% used [0xbb800000,0xbb800000,0xe6400000)
 PSPermGen       total 16384K, used 4821K [0xb7800000, 0xb8800000, 0xbb800000)
  object space 16384K, 29% used [0xb7800000,0xb7cb5788,0xb8800000)

Comments
EVALUATION The code for 6690928 to reduce GC's use of yields has been pushed into hsx15. Closing as a duplicate of 6690928. If the real problem is that the number of GC threads is too high, see 6593758.
26-08-2010