In WS6.0 with 1.4.2_09 an HashMap is loaded with 200 items at startup (in syncronized code) and then accessed by multiple threads with high ratio (formatCell is called 600 times/sec) in unsyncronized code. See source code in http://balabiot.italy/calls/37570730_borsait/0304/src/FieldFormatter.java After a few week it happens that some threads use all the cpu running get(): http://balabiot.italy/calls/37570730_borsait/1004/prstat.out: PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID 4384 www 2911M 2604M cpu0 20 10 0:16:51 27% webservd/112 4384 www 2911M 2604M run 20 10 0:16:51 24% webservd/157 4384 www 2911M 2604M run 21 10 0:44:56 23% webservd/111 4384 www 2911M 2604M sleep 29 10 0:00:02 1.3% webservd/177 We have asked to collect when it happens prstat, multiple pstack, pmap and a gcore and then we extracted via SA scripts java dump, heap stats, class histogram, java flags. No truss available. /net/balabiot.italy/calls/37570730_borsait/0304 /net/balabiot.italy/calls/37570730_borsait/1004 http://balabiot.italy/calls/37570730_borsait/1004/core.4384.mwsbit03.jstack Deadlock Detection: No deadlocks found. .... Thread t@111: (state = IN_JAVA, current Java SP = null ) - java.util.HashMap.get(java.lang.Object) @bci=72, line=325, pc=0xf8d32788, methodOop=0xf0063b10 (Compiled frame; information may be imprecise) - com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8, methodOop=0xf0b0dcc0 (Compiled frame) ... Thread t@112: (state = IN_JAVA, current Java SP = null ) - java.util.HashMap.get(java.lang.Object) @bci=6, line=317, pc=0xf8d3264c, methodOop=0xf0063b10 (Compiled frame; information may be imprecise) - com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8, methodOop=0xf0b0dcc0 (Compiled frame) ... Thread t@157: (state = IN_JAVA, current Java SP = null ) - java.util.HashMap.get(java.lang.Object) @bci=6, line=317, pc=0xf8d3264c, methodOop=0xf0063b10 (Compiled frame; information may be imprecise) - com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) @bci=125, line=248, pc=0xf8903ac8, methodOop=0xf0b0dcc0 (Compiled frame) The program counter are: Thread t@111: pc=0xf8d32788 Thread t@112: pc=0xf8d3264c Thread t@157: pc=0xf8d3264c We see a couple of threads (112 e 157) running the same optimized native code, and exactly using the same amount of cpu time From the sequence of 3 pstack, and looking at the first arguments only we see all the threads are running the same code with different arguments (e.g:f8d32790) pstack1.out thread# 111: f8d32648 (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0) pstack2.out thread# 111: f8d32790 (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0) pstack3.out thread# 111: f8d3264c (4055b100, 4054a990, 4055b100, 72127600, 0, 4055b0d0) pstack1.out thread# 112: f8d32790 (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180) pstack2.out thread# 112: f8d3264c (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180) pstack3.out thread# 112: f8d3264c (4e8401b0, 4e6bb890, 4e8401b0, 7212bd40, 0, 4e840180) pstack1.out thread# 157: f8d32790 (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418) pstack2.out thread# 157: f8d3279c (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418) pstack3.out thread# 157: f8d32658 (4e19a448, 4e11aef0, 4e19a448, 7162c638, 0, 4e19a418) In the previous set of data (http://balabiot.italy/calls/37570730_borsait/0304) 2 threads where using all the cpu running HashMap.get() and eq() which is in get implementation Total: 69 processes, 474 lwps, load averages: 2.47, 2.77, 3.00 PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID 8090 www 2501M 2385M run 0 10 64:43:10 48% webservd/139 8090 www 2501M 2385M cpu1 0 10 0:55:06 48% webservd/109 4037 bitsys02 2848K 1704K sleep 59 0 0:00:03 0.2% top/1 ----------------- t@139 ----------------- 0xf8a1e8d0 0xf8a1e8d0 * java.util.HashMap.get(java.lang.Object) bci:72 line:325 methodOop:0xf0063b10 (Compiled frame; information may be imprecise) 0xf88652c8 0xf88652c8 * com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) bci:125 line:248 methodOop:0xf0b10c80 (Compiled frame) ... ----------------- t@109 ----------------- 0xf8a1e8c8 0xf8a1e8c8 * java.util.HashMap.eq(java.lang.Object, java.lang.Object) bci:7 line:274 methodOop:0xf0063dd8 (Compiled frame; information may be imprecise) 0xf88652c8 0xf88652c8 * com.inferentiadnm.display.dao.FieldFormatter.formatCell(java.lang.String, java.lang.String, java.util.Locale) bci:125 line:248 methodOop:0xf0b10c80 (Compiled frame) ... http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack1_dem.out ----------------- lwp# 109 / thread# 109 -------------------- f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68) ----------------- lwp# 139 / thread# 139 -------------------- f8a1e8d4 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0) http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack2_dem.out ----------------- lwp# 109 / thread# 109 -------------------- f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68) ----------------- lwp# 139 / thread# 139 -------------------- f8a1e8c8 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0) http://balabiot.italy/calls/37570730_borsait/0304/trace/pstack3_dem.out ----------------- lwp# 109 / thread# 109 -------------------- f8a1e884 ???????? (f0009018, 93ecd750, fffbe4e7, 71483fe8, f0592648, 331d1c68) ----------------- lwp# 139 / thread# 139 -------------------- f8a1e8d4 ???????? (f0009018, 73f42118, fffbe4e7, 7180c7f8, f059b9f8, 36b199a0)
|