JDK-8279011 : JFR: JfrChunkWriter incorrectly handles int64_t chunk size as size_t
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 17,18,19
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2021-12-20
  • Updated: 2022-01-03
  • Resolved: 2021-12-21
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 17 JDK 18 JDK 19
17.0.3Fixed 18 b29Fixed 19Fixed
Related Reports
Relates :  
Description
Followed up on JFR failures on x86_32, and spotted this failure:

$ CONF=linux-x86-server-fastdebug make run-test TEST=jdk/jfr/jmx/streaming/TestClose.java 

STDOUT:
test-close-1640016512165/2021_12_20_17_08_33.jfr
STDERR:
java.lang.Exception: Expected repository to be empty
	at jdk.jfr.jmx.streaming.TestClose.main(TestClose.java:69)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:577)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:833)

Intrusive debugging shows the same picture: chunkSize is recorded as huge value, so the JFR code internally receives the EOF when reading, event handling gets borked, and the test fails, never observing the chunk flushes.

[3.141s][info][jfr,system,parser  ] Chunk: startPosition=0
[3.141s][info][jfr,system,parser  ] Chunk: major=2
[3.141s][info][jfr,system,parser  ] Chunk: minor=1
[3.141s][info][jfr,system,parser  ] Chunk: chunkSize=9896023423722239
[3.141s][info][jfr,system,parser  ] Chunk: constantPoolPosition=0
[3.141s][info][jfr,system,parser  ] Chunk: metadataPosition=0
[3.141s][info][jfr,system,parser  ] Chunk: startNanos=1640015039383595228
[3.141s][info][jfr,system,parser  ] Chunk: durationNanos=945990689
[3.141s][info][jfr,system,parser  ] Chunk: startTicks=1203376481
[3.141s][info][jfr,system,parser  ] Chunk: ticksPerSecond=1000000000
[3.141s][info][jfr,system,parser  ] Setting input size to 9896023423722239
[3.141s][info][jfr,system,parser  ] Chunk: chunkSize=9896023423722239
[3.141s][info][jfr,system,parser  ] Chunk: constantPoolPosition=2304002
[3.141s][info][jfr,system,parser  ] Chunk: metadataPosition=2103375
[3.141s][info][jfr,system,parser  ] Chunk: durationNanos =945990689
[3.141s][info][jfr,system,parser  ] Chunk: generation=2
[3.141s][info][jfr,system,parser  ] Chunk: finished=false
[3.141s][info][jfr,system,parser  ] Chunk: fileSize=9896023423722239
[3.141s][info][jfr,system,parser  ] Chunk: finalChunk=false

The problem seems to be here:

int64_t JfrChunkWriter::write_chunk_header_checkpoint(bool flushpoint) {
   const u4 checkpoint_size = current_offset() - event_size_offset;
   write_padded_at_offset<u4>(checkpoint_size, event_size_offset);
   set_last_checkpoint_offset(event_size_offset);
   const size_t sz_written = size_written();  // <-- returns int64_t
   write_be_at_offset(sz_written, chunk_size_offset); // <--- template instantiation with type=size_t
   return sz_written;
 }

This would have been nearly fine -- small size_t -> int64_t conversion is okay value-wise. But write_be_at_offset calculates the position for the writeout using sizeof(T), which silently borks the whole thing on at least 32-bit platforms, where sizeof(size_t) != sizeof(int64_t).

This might also explain timeouts in these tests, as the fix apparently resolves the timeouts there too:

# Timeouts
jdk/jfr/jmx/streaming/TestRemoteDump.java
jdk/jfr/jmx/streaming/TestSetSettings.java
jdk/jfr/jmx/streaming/TestDelegated.java
jdk/jfr/jmx/streaming/TestRotate.java
jdk/jfr/jmx/streaming/TestEnableDisable.java
Comments
A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk17u-dev/pull/24 Date: 2021-12-21 13:37:46 +0000
21-12-2021

Fix Request (17u) Fixes the JFR bug that manifests on 32-bit platforms. Applies cleanly. Tests pass. (Will wait a bit for mainline testing to catch up before pushing to 17u, although the fix is pretty trivial).
21-12-2021

Changeset: 467f6549 Author: Aleksey Shipilev <shade@openjdk.org> Date: 2021-12-21 12:33:59 +0000 URL: https://git.openjdk.java.net/jdk18/commit/467f654916764751746934f80877d49a5f107194
21-12-2021

A pull request was submitted for review. URL: https://git.openjdk.java.net/jdk18/pull/50 Date: 2021-12-20 16:34:07 +0000
20-12-2021