Bug ID: JDK-8043607 Add a GC id as a log decoration similar to PrintGCTimeStamps

Type: Enhancement
Component: hotspot
Sub-Component: gc
Affected Version: 9

Priority: P4
Status: Resolved
Resolution: Fixed

Submitted: 2014-05-21
Updated: 2015-01-21
Resolved: 2014-06-19

JDK 8	JDK 9
8u40Fixed	9 b23Fixed

With event based tracing we introduced a GC id field on all GC events. This id is a constantly increasing number associated with each GC. I figured we can use the this GC id as a decoration for the HotSpot GC logging similarly to what we do with PrintGCTimeStamps and PrintGCDateStamps.

A nice side effect of this is that it will make it easy to map an event in the event tracing to an entry in the GC log. Currently this is impossible to do deterministically since the logging and event tracing use different timing.

I propose to add a flag called PrintGCID. Here's an example log entry for GC number 4 running with -XX:+PrintGCID:

#4: [GC pause  (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420 secs]

It is a decoration similar to the timestamps. Here's what it looks like if you add -XX:+PrintGCTimeStamps:

3,921: #4: [GC pause  (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420 secs]

And here's what it will look like if you add -XX:+PrintGCDateStamps:

2014-05-09T13:03:59.384+0200: 3,921: #4: [GC pause  (G1 Evacuation Pause) (young) 185M->212M(502M), 0,2373420 secs]


Here's a little longer example running ParallelGC:

#0: [GC (Allocation Failure)  129024K->2980K(493056K), 0,1050244 secs]
#1: [GC (Allocation Failure)  132004K->3731K(622080K), 0,0389474 secs]
#2: [GC (Allocation Failure)  261779K->27153K(622080K), 0,2057294 secs]
#3: [GC (Allocation Failure)  285201K->224350K(880128K), 0,2410144 secs]
#4: [Full GC (Ergonomics)  224350K->219198K(1154560K), 2,7104481 secs]
#5: [GC (Allocation Failure)  735294K->638073K(863744K), 0,7707819 secs]

Note that for the concurrent GCs this gives a good way of seeing the concurrent action. Here is a G1 example:

#0: [GC pause  (G1 Evacuation Pause) (young) 24M->2881K(502M), 0,0383157 secs]
#1: [GC pause  (G1 Evacuation Pause) (young) 48M->4745K(502M), 0,0412220 secs]
#2: [GC pause  (G1 Evacuation Pause) (young) 136M->9618K(502M), 0,0426144 secs]
#3: [GC pause  (G1 Evacuation Pause) (young) 248M->143M(502M), 0,1467633 secs]
#4: [GC pause  (G1 Evacuation Pause) (young) 210M->199M(502M), 0,1627465 secs]
#5: [GC pause  (G1 Evacuation Pause) (young) 213M->235M(502M), 0,2376728 secs]
#6: [GC pause  (G1 Evacuation Pause) (young) (initial-mark) 255M->269M(1004M), 0,2623779 secs]
#7: [GC concurrent-root-region-scan-start]
#7: [GC concurrent-root-region-scan-end, 0,0158488 secs]
#7: [GC concurrent-mark-start]
#8: [GC pause  (G1 Evacuation Pause) (young) 315M->330M(2008M), 0,3007405 secs]
#9: [GC pause  (G1 Evacuation Pause) (young) 422M->418M(3212M), 0,4063937 secs]
#10: [GC pause  (G1 Evacuation Pause) (young) 564M->572M(4176M), 0,7500609 secs]
#11: [GC pause  (G1 Evacuation Pause) (young) 760M->756M(4946M), 1,5464884 secs]
#7: [GC concurrent-mark-end, 3,9464599 secs]
#7: [GC remark, 0,0240462 secs]
#7: [GC cleanup 801M->800M(4946M), 0,0108146 secs]
#7: [GC concurrent-cleanup-start]
#7: [GC concurrent-cleanup-end, 0,0000228 secs]
#12: [GC pause  (G1 Evacuation Pause) (young) 977M->978M(5562M), 2,2464423 secs]

Notice how the complete concurrent cycle is decorated with id #7. That makes it possible to follow the concurrent work more easily and grep it out etc.


Here's a similar thing for CMS where the concurrent cycle #22 has a young GC #23 in the middle of it:

#21: [GC (Allocation Failure)  1621551K->1107977K(2395632K), 1,4558792 secs]
#22: [GC (CMS Initial Mark)  1336515K(2452584K), 0,1107938 secs]
#23: [GC (Allocation Failure)  1768969K->1184500K(2452584K), 0,7909961 secs]
#22: [GC (CMS Final Remark)  1304871K(2452584K), 0,2353840 secs]

The CMS looging is somewhat special. It uses the GCTraceTime class a lot for sub part and thus we get timestamps in the middle of other logging. For this change I propose to add the GC id whenever we add a GC time stamp. That means that the GC id will also sometime be included inside other logging. It would be possible to leave this out, but it would require even more parameters to the GCTraceTime constructor. Leaving it in may also make parsing by tools easier since the decorations always look the same. This is mainly a problem when running with PrintGCDetails. Here's an exampel what a CMS cycle with PrintGCDetails might look like: 2.688: #15: [GC (CMS Initial Mark) [1 CMS-initial-mark: 52632K(87424K)] 57681K(126848K), 0.0124742 secs] [Times: user=0.03 sys=0.00 real=0.01 secs] 2.701: #15: [CMS-concurrent-mark-start] 3.052: #15: [CMS-concurrent-mark: 0.352/0.352 secs] [Times: user=1.29 sys=0.01 real=0.35 secs] 3.052: #15: [CMS-concurrent-preclean-start] 3.059: #15: [CMS-concurrent-preclean: 0.006/0.007 secs] [Times: user=0.02 sys=0.00 real=0.01 secs] 3.059: #15: [CMS-concurrent-abortable-preclean-start] 3.560: #16: [GC (Allocation Failure) 3.561: #17: [ParNew: 39424K->4352K(39424K), 0.2217483 secs] 92056K->68572K(126848K), 0.2226810 secs] [Times: user=0.69 sys=0.00 real=0.22 secs] 4.275: #15: [CMS-concurrent-abortable-preclean: 0.992/1.216 secs] [Times: user=4.54 sys=0.02 real=1.21 secs] 4.276: #15: [GC (CMS Final Remark) [YG occupancy: 22985 K (39424 K)]4.277: #15: [checkpointRootsFinalWork4.277: #15: [Rescan (parallel) , 0.0329553 secs]4.310: #15: [refProcessingWork4.310: #15: [weak refs processing, 0.0000447 secs]4.310: #15: [class unloading, 0.0194019 secs]4.329: #15: [scrub symbol table, 0.0011832 secs]4.330: #15: [scrub string table, 0.0004165 secs], 0.0211528 secs], 0.0657054 secs][1 CMS-remark: 64220K(87424K)] 87206K(126848K), 0.0657690 secs] [Times: user=0.16 sys=0.00 real=0.06 secs] 4.353: #15: [CMS-concurrent-sweep-start] Also, ones the unified logging JEP is implemented we probably want to change this logging completely.

21-05-2014