JDK-8295223 : JFR: At most one native periodic event thread at a time
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 11,17
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-10-12
  • Updated: 2025-01-17
  • Resolved: 2022-10-13
Fix Version
JDK 20: 20 b20, Fixed
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Description
Today two threads can call into JVM.emitEvent(long, long, long) at the same time. This can happen if a recording is started or stopped at the same time as a periodic event is emitted.

It's unlikely that they will emit the same event, since start/stop triggers events with the period "beginChunk", "endChunk" or "everyChunk", while periodic events have an interval, for example "1 s". Still, they might access the same data structure, for example Performance Counters on Windows.

This is very likely the fix for the "exitValue = -1073741819" error seen when using JFR on Windows, so we may want to consider a backport to JDK 11 and JDK 17.
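The exit value is easier to recognize in hexadecimal. The sketch below is only an illustration (the class and method names are made up for this note): it reinterprets the signed 32-bit exit value as an unsigned hex code. The result, 0xC0000005, is the Windows status code for an access violation.

```java
public class ExitValueDecoder {

    // Format a signed 32-bit process exit value as an unsigned,
    // zero-padded hexadecimal status code.
    static String toHexStatus(int exitValue) {
        return String.format("0x%08X", exitValue);
    }

    public static void main(String[] args) {
        int exitValue = -1073741819; // value reported in the failing test logs
        System.out.println(toHexStatus(exitValue)); // prints 0xC0000005
    }
}
```

Seeing an access violation code as the raw process exit value, rather than in a crash log, suggests the process was terminated without the JVM's own error reporting running.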
Comments
A note about this bug acting as a solution to strange JFR issues reported on Windows, with "exitValue = -1073741819" and no associated hs_err<pid>.log files.

    // os/windows/os_perf_windows.cpp

    static int ensure_current_process_query_index(ProcessQueryP query) {
      assert(query != nullptr, "invariant");
      const int previous_query_idx = query->process_idx;
      if (previous_query_idx == 0) {
        return previous_query_idx;
      }
      const int current_query_idx = current_process_query_index(previous_query_idx);
      if (current_query_idx == OS_ERR || current_query_idx >= query->set.size) {
        return OS_ERR;
      }
      if (current_query_idx == previous_query_idx) {
        return previous_query_idx;
      }
      assert(current_query_idx >= 0 && current_query_idx < query->set.size, "out of bounds!");
      // If the kernel PDH process list changed (because another java process terminated),
      // we need to shift down our query using the current_query_idx. Before we start using
      // the new query, we deallocate the previous, now stale query (using close_query()).
      while (current_query_idx < query->set.size - 1) {
        const int new_size = --query->set.size;
        close_query(&query->set.queries[new_size]); // <-- close_query() deallocates allocated counters and query
        // At this point the queries[new_size] is de-allocated (see below)
      }
      assert(current_query_idx < query->set.size, "invariant");
      query->process_idx = current_query_idx;
      return OS_OK;
    }

    dt query -b
    Local var @ 0xc9014fed80 Type MultiCounterQueryS*
    0x00000262`5dba9fd0
       +0x000 query            : UpdateQueryS
          +0x000 pdh_query_handle : 0x00000262`5dbddd00
          +0x008 lastUpdate       : 0n0
       +0x010 counters         : 0x00000262`5dab3100
       +0x018 noOfCounters     : 0n2
       +0x01c initialized      : 1

    static void close_query(HQUERY* const pdh_query_handle, HCOUNTER* const counter) {
      if (counter != nullptr && *counter != nullptr) {
        PdhDll::PdhRemoveCounter(*counter);
        *counter = nullptr;
      }
      if (pdh_query_handle != nullptr && *pdh_query_handle != nullptr) {
        PdhDll::PdhCloseQuery(*pdh_query_handle);
        *pdh_query_handle = nullptr;
      }
    }

    static void close_query(MultiCounterQueryP query) {
      for (int i = 0; i < query->noOfCounters; ++i) {
        close_query(nullptr, &query->counters[i]);
      }
      close_query(&query->query.pdh_query_handle, nullptr);
      query->initialized = false;
    }

    // Counters (two)
    dps poi(counter)
    00000262`5dab3100  00000262`5dbde050
    00000262`5dab3108  00000262`5dbe0bf0

    // PdhRemoveCounter(00000262`5dbde050)
    00000262`5dbde050  feeefeee`feeefeee   <<--- PdhRemoveCounter == deallocate
    00000262`5dbde058  feeefeee`feeefeee
    00000262`5dbde060  feeefeee`feeefeee
    00000262`5dbde068  feeefeee`feeefeee

    // PdhRemoveCounter(00000262`5dbe0bf0)
    00000262`5dbe0bf0  feeefeee`feeefeee   <<--- PdhRemoveCounter == deallocate
    00000262`5dbe0bf8  feeefeee`feeefeee
    00000262`5dbe0c00  feeefeee`feeefeee
    00000262`5dbe0c08  feeefeee`feeefeee

    // Query (one)
    // PdhCloseQuery(0x00000262`5dbddd00)
    00000262`5dbddd00  feeefeee`feeefeee   <<--- PdhCloseQuery == deallocate
    00000262`5dbddd08  feeefeee`feeefeee
    00000262`5dbddd10  feeefeee`feeefeee
    00000262`5dbddd18  feeefeee`feeefeee

Hypothesis and scenario

1. What happens if two threads enter ensure_current_process_query_index() simultaneously (which is a broken invariant)?
2. The first thread would pull out the current_process_query_index() and pass the corresponding query object into one of the OS PDH APIs.
3. Another Java process terminates, thus updating the kernel PDH process list.
4. A second periodic thread enters ensure_current_process_query_index() and discovers that the kernel PDH process list has changed.
5. The second thread deallocates the query object and its associated counters at the index for the query object already passed into the OS PDH APIs by the first thread.

This is a situation that could result in an ACCESS_VIOLATION (0xc0000005, which is -0n1073741819 (in decimal)) raised by the OS (inside the PDH APIs) and not the JVM, which would explain why we are not getting an hs_err<pid>.log file created.

This hypothesis is supported by "JDK-8295223: JFR: At most one native periodic event thread at a time", which introduced a ReentrantLock to prevent more than one periodic thread from calling into JVM.emitEvent(). JDK-8295223 was fixed in JDK 20. We have not seen further JFR bugs reporting "exitValue = -1073741819" since JDK 20; see filter https://bugs.openjdk.org/browse/JDK-8273455?filter=40705. In addition, some of the tests reporting "exitValue = -1073741819" were removed from ProblemList.txt as of JDK 21: https://bugs.openjdk.org/browse/JDK-8303085
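As a rough illustration of the locking approach described above, the sketch below wraps a stand-in for the native call in a ReentrantLock and measures how many threads are ever inside it at once. It is not the JDK change itself: SerializedEmitter, NativeEmitter, the parameter names and maxConcurrentCallers are all hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class SerializedEmitter {

    // Hypothetical stand-in for the native call; parameter names are assumptions.
    interface NativeEmitter {
        void emit(long typeId, long timestamp, long when);
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final NativeEmitter delegate;

    SerializedEmitter(NativeEmitter delegate) {
        this.delegate = delegate;
    }

    // At most one thread at a time reaches the delegate.
    void emitEvent(long typeId, long timestamp, long when) {
        lock.lock();
        try {
            delegate.emit(typeId, timestamp, when);
        } finally {
            lock.unlock();
        }
    }

    // Runs several threads against one emitter and returns the highest
    // number of threads observed inside the delegate at the same time.
    static int maxConcurrentCallers(int threadCount, int callsPerThread) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        SerializedEmitter emitter = new SerializedEmitter((typeId, timestamp, when) -> {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
            inside.decrementAndGet();
        });
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    emitter.emitEvent(1L, System.nanoTime(), 0L);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxInside.get();
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrentCallers(4, 10_000)); // prints 1
    }
}
```

With the lock in place the observed maximum is always 1. Without the lock()/unlock() pair, the same harness can report values above 1, which is the overlap the hypothesis depends on.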
17-01-2025

Changeset: c7f65438
Author: Erik Gahlin <egahlin@openjdk.org>
Date: 2022-10-13 15:53:33 +0000
URL: https://git.openjdk.org/jdk/commit/c7f65438bb4a4fd449bd19b68574cfa4b42d7ca8
13-10-2022

A pull request was submitted for review.
URL: https://git.openjdk.org/jdk/pull/10676
Date: 2022-10-12 13:11:46 +0000
12-10-2022