JDK-8233111 : Epoch shift synchronization point for Compiler threads
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 11,14
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-10-29
  • Updated: 2022-05-14
  • Resolved: 2019-12-21
  • Fixed in JDK 14: 14 b29
  • Fixed in JDK 15: 15
Description
JFR artifact tagging is a function of the current epoch, and the transition between epochs happens during a safepoint.

This works well for threads that respect safepoints. However, threads that run _thread_in_native and write events that refer to an artifact, for example a Method, race against the epoch shift. If such a thread loses the race, the artifact is tagged in the wrong epoch and does not become visible to the event referencing it. This situation mainly applies to Compiler Threads (i.e. JavaThreads running _thread_in_native).

With JFR Event Streaming, the events must continuously be fully parsable as a unit, and an artifact tag race will therefore cause problems for the parser.
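
The failure mode described above can be sketched with a toy model. This is not HotSpot code: ToyRecorder, Artifact, Chunk and second_chunk_parsable are hypothetical names, and the model assumes one tag bit per epoch and one chunk written per epoch shift. It only illustrates how an artifact tagged before a shift and an event committed after it end up in different chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a JFR artifact such as a Method.
struct Artifact {
  std::uint64_t id;
  std::uint8_t tag_bits;  // bit 0: tagged in epoch 0, bit 1: tagged in epoch 1
};

// Hypothetical chunk: artifacts serialized for one epoch plus the events of that epoch.
struct Chunk {
  std::vector<std::uint64_t> constant_pool;
  std::vector<std::uint64_t> event_refs;

  // A chunk is parsable as a unit only if every event reference resolves locally.
  bool parsable() const {
    return std::all_of(event_refs.begin(), event_refs.end(), [this](std::uint64_t ref) {
      return std::find(constant_pool.begin(), constant_pool.end(), ref) != constant_pool.end();
    });
  }
};

class ToyRecorder {
 public:
  // Tag the artifact with the bit of the current epoch.
  void tag(Artifact& a) const {
    a.tag_bits = static_cast<std::uint8_t>(a.tag_bits | (1u << epoch_));
  }

  // Commit an event that refers to the artifact; it lands in the current epoch.
  void commit_event(const Artifact& a) { pending_[epoch_].push_back(a.id); }

  // Shift the epoch, then serialize everything that belongs to the previous one.
  Chunk rotate(std::vector<Artifact*>& artifacts) {
    const unsigned previous = epoch_;
    epoch_ ^= 1u;
    const unsigned bit = 1u << previous;
    Chunk chunk;
    for (Artifact* a : artifacts) {
      if (a->tag_bits & bit) {
        chunk.constant_pool.push_back(a->id);
        a->tag_bits = static_cast<std::uint8_t>(a->tag_bits & ~bit);  // clear the tag
      }
    }
    chunk.event_refs.swap(pending_[previous]);
    return chunk;
  }

 private:
  unsigned epoch_ = 0;
  std::vector<std::uint64_t> pending_[2];
};

// Returns whether the chunk containing the event can be parsed on its own.
bool second_chunk_parsable(bool shift_between_tag_and_commit) {
  ToyRecorder recorder;
  Artifact method{42, 0};
  std::vector<Artifact*> artifacts{&method};

  recorder.tag(method);
  assert(method.tag_bits == 1);  // tagged in epoch 0
  if (shift_between_tag_and_commit) {
    recorder.rotate(artifacts);  // the artifact is written out here, without its event
  }
  recorder.commit_event(method);
  return recorder.rotate(artifacts).parsable();
}
```

In this model, second_chunk_parsable(false) yields a chunk whose event resolves against its own constant pool, while second_chunk_parsable(true) yields a chunk whose event refers to an artifact that was already written out with the previous epoch.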
Comments
The problem is that for threads that do not respect safepoints, an artifact can be tagged in the current epoch, but the event referencing the tagged artifact may not get committed until after the events have been processed by the JfrRecorderThread for that specific epoch. Instead, the event will be serialized as part of the next epoch, but it refers to an artifact tagged in the previous epoch.

There is still a potential race here, as you point out, and the patch does not fully resolve this but instead makes it very unlikely. The problem was scoped to only Compiler Threads tagging Methods as part of their event fields, and these events do not have stack traces.

IIRC, the original had the call to _checkpoint_manager.begin_epoch_shift(); inside pre_safepoint_write(), which, now looking back at it, seems to be a better way to accomplish this. But I remember there was some problem with that approach at the time. Of course, it is also not foolproof but only makes it even more unlikely. One could have Compiler threads transition to _thread_in_VM before writing JFR events, but that would introduce some overhead to the Compiler threads.
08-07-2021

Hi, I may not fully understand the implementation of this patch, but I have a question about the current implementation. If I understand correctly, in order to ensure that the artifact's epoch is correct, we need to ensure that no epoch shift occurs between writing the event's artifact field and committing the event. The current implementation does not seem to guarantee this, because the epoch shift may still occur after JfrTraceIdEpoch::is_synchronizing() returns false.
08-07-2021
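
The window raised in the question above can be illustrated with a toy check-then-act model. This is a sketch under stated assumptions, not the actual patch: ToyEpoch, WriteResult, write_event and write_is_consistent are hypothetical, and the window callback stands in for whatever another thread might do between the check and the commit.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical epoch state, loosely modeled on the names used in this report.
struct ToyEpoch {
  std::atomic<bool> synchronizing{false};
  std::atomic<unsigned> epoch{0};

  void begin_epoch_shift() { synchronizing.store(true); }

  void end_epoch_shift() {
    assert(synchronizing.load());  // a shift must have been started first
    epoch.fetch_xor(1u);           // flip between epoch 0 and epoch 1
    synchronizing.store(false);
  }
};

struct WriteResult {
  unsigned tag_epoch;     // epoch observed when the artifact was tagged
  unsigned commit_epoch;  // epoch observed when the event was committed
  bool consistent() const { return tag_epoch == commit_epoch; }
};

// Hypothetical writer: wait until no shift is in progress, then tag, then commit.
// Nothing prevents a shift from starting after the check has passed.
WriteResult write_event(ToyEpoch& e, const std::function<void()>& window) {
  while (e.synchronizing.load()) {
    // spin until the shift in progress has finished
  }
  WriteResult r{};
  r.tag_epoch = e.epoch.load();
  window();  // another thread may run here
  r.commit_epoch = e.epoch.load();
  return r;
}

// Runs one write, optionally letting a full epoch shift happen inside the window.
bool write_is_consistent(bool shift_in_window) {
  ToyEpoch e;
  auto window = [&e, shift_in_window] {
    if (shift_in_window) {
      e.begin_epoch_shift();
      e.end_epoch_shift();
    }
  };
  return write_event(e, window).consistent();
}
```

In this model the check narrows the window but does not close it: a shift that begins after the flag was read as false still separates the tag from the commit.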

URL: https://hg.openjdk.java.net/jdk/jdk14/rev/3b2174ed0eb1
User: mgronlun
Date: 2019-12-21 13:01:56 +0000
21-12-2019