The core file and error logs were provided by the customer and can be found at: /net/cores.central/cores/dir33/831492/CBOE_NanoTimeProblem2.tar
Here is the problem description:
Sun case ID 65579208 showed the following anomaly in the GC clock values -
Application time: 0.0000000 seconds
{Heap before GC invocations=3265 (full 357):
par new generation total 305088K, used 298999K [0x2fa00000,
0x43f00000, 0x43f00000)
eden space 277376K, 99% used [0x2fa00000, 0x408df3c0, 0x408e0000)
from space 27712K, 78% used [0x423f0000, 0x4390eac0, 0x43f00000)
to space 27712K, 0% used [0x408e0000, 0x408e0000, 0x423f0000)
concurrent mark-sweep generation total 2739200K, used 1378661K
[0x43f00000, 0xeb200000, 0xeb200000)
concurrent-mark-sweep perm gen total 131072K, used 38148K
[0xeb200000, 0xf3200000, 0xfb200000)
4381582.442: [GC 4381582.442: [ParNew (promotion failed)
Desired survivor size 14188544 bytes, new threshold 4 (max 4)
- age 1: 7509784 bytes, 7509784 total
- age 2: 5357184 bytes, 12866968 total
: 298999K->298999K(305088K), 0.0000000 secs]4381582.442: [CMS:
1384483K->1108874K(2739200K), 0.0000000 secs]
1677661K->1108874K(3044288K), 0.0000000 secs]
Heap after GC invocations=3266 (full 357):
par new generation total 305088K, used 0K [0x2fa00000,
0x43f00000, 0x43f00000)
eden space 277376K, 0% used [0x2fa00000, 0x2fa00000, 0x408e0000)
from space 27712K, 0% used [0x423f0000, 0x423f0000, 0x43f00000)
to space 27712K, 0% used [0x408e0000, 0x408e0000, 0x423f0000)
concurrent mark-sweep generation total 2739200K, used 1108874K
[0x43f00000, 0xeb200000, 0xeb200000)
concurrent-mark-sweep perm gen total 131072K, used 38137K
[0xeb200000, 0xf3200000, 0xfb200000)
}
Total time for which application threads were stopped: 0.0000000
seconds
We are also seeing these promotion failures at the following times:
9:52
10:37
11:21
11:47
12:14
The old generation is not full. We believe there is code in the VM that handles nanosecond clock readings that come back lower than a previous reading, and we suspect this code could cause other problems in the VM as well.
Here is the top part of the jmap output; note the negative Size values on several lines.
Object Histogram:
Size Count Class description
-------------------------------------------------------
-58345980960 455548 java.lang.Object[]
843420192 12868 org.apache.activemq.command.DataStructure[]
-8264799760 55510 java.util.Hashtable$Entry[]
216903376 275293 byte[]
36410080 239540 com.cboe.infrastructureServices.persistenceService.DBAdapter
30714744 1279781 java.lang.String
24927368 283704 java.util.HashMap$Entry[]
22629552 128577 com.cboe.businessServices.marketMakerQuoteService.QuoteImpl
18012576 750524 java.util.HashMap$Entry
16388288 292648 com.cboe.businessServices.marketMakerQuoteService.QuoteSideImpl
11930688 745668 java.lang.Integer
11297600 282440 java.util.HashMap
9521920 297560 java.util.TreeMap$Entry
8667760 108347 com.cboe.ORBInfra.ORB.DelegateImpl
8239344 343306 java.util.ArrayList
6199664 387479 com.cboe.ORBInfra.CDR.IntHolder
6198840 258285 com.cboe.idl.cmiUtil.PriceStruct
5984320 37402 com.cboe.businessServices.orderBookService.OrderBookPriceItem
5426064 339129 com.cboe.domain.util.PriceSqlType
Grepping the application stdout file for 'Application time:' proved interesting:
>>
>> Application time: 3.8622900 seconds
>> Application time: 3.6311665 seconds
>> Application time: 2.9104706 seconds
>> Application time: 2.2640438 seconds
>> Application time: 2.8882385 seconds
>> Application time: 3.3494021 seconds
>> Application time: 2.3039980 seconds
>> Application time: 4352582.5295995 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>> Application time: 0.0000000 seconds
>>
>> There was one very large value followed by 0.0000000s readings.
>> So, it certainly looks as though the values were
>> calculated by the cached nanosecond time value: when
>> the value doubled we got a large time, but since the
>> value never changed again, all subsequent times showed
>> up as 0.00000s.
>
>That's exactly why the gc times are zero as well.
>4352582.5295995-4352582.5295995==0.0 always and forever.
>
>The gc times won't reflect anything realistic until a
>subsequent call to gethrtime() exceeds the current
>cached maximum ever seen.
We need to look into this as soon as possible.