JDK-8231081 : TestMetadataRetention fails due to missing symbol id
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 14
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2019-09-16
  • Updated: 2022-06-27
  • Resolved: 2019-10-24
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 11 JDK 13 JDK 14
11.0.7Fixed 13.0.4Fixed 14 b21Fixed
Related Reports
Relates :  
Relates :  
Description
----------System.err:(33/2595)----------
java.lang.InternalError: Missing object id=16777227 in pool jdk.types.Symbol. All ids should reference object
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.get(ConstantMap.java:100)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap$Reference.resolve(ConstantMap.java:56)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.resolve(ConstantMap.java:137)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.resolve(ConstantMap.java:143)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.get(ConstantMap.java:118)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap$Reference.resolve(ConstantMap.java:56)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.resolve(ConstantMap.java:137)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.resolve(ConstantMap.java:143)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.get(ConstantMap.java:118)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.lambda$resolve$0(ConstantMap.java:151)
	at jdk.jfr/jdk.jfr.internal.LongMap.forEachKey(LongMap.java:224)
	at jdk.jfr/jdk.jfr.internal.consumer.ConstantMap.resolve(ConstantMap.java:151)
	at jdk.jfr/jdk.jfr.internal.consumer.ChunkParser.lambda$new$2(ChunkParser.java:137)
	at jdk.jfr/jdk.jfr.internal.LongMap.forEach(LongMap.java:233)
	at jdk.jfr/jdk.jfr.internal.consumer.ChunkParser.<init>(ChunkParser.java:137)
	at jdk.jfr/jdk.jfr.internal.consumer.ChunkParser.nextChunkParser(ChunkParser.java:144)
	at jdk.jfr/jdk.jfr.consumer.RecordingFile.findNext(RecordingFile.java:244)
	at jdk.jfr/jdk.jfr.consumer.RecordingFile.readEvent(RecordingFile.java:110)
	at jdk.jfr/jdk.jfr.consumer.RecordingFile.readAllEvents(RecordingFile.java:227)
	at jdk.test.lib.jfr.Events.fromRecording(Events.java:274)
	at jdk.jfr.event.oldobject.TestMetadataRetention.main(TestMetadataRetention.java:103)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:830)

JavaTest Message: Test threw exception: java.lang.InternalError: Missing object id=16777227 in pool jdk.types.Symbol. All ids should reference object
Comments
Fix request (13u): The change applies cleanly, tested with jdk/jfr, and tier1 tests.
29-05-2020

Fix Request [11] This is a necessary followup fix for the planned backport of JDK-8225797. The patch does not apply cleanly and needs to be adjusted due to missing ClassDataGraph_lock and is_unsafe_anonymous() symbols in JDK 11. There is very low risk associated with this change. Public review available as https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2020-January/002418.html
24-01-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/7909763ad193 User: mgronlun Date: 2019-10-24 14:37:49 +0000
24-10-2019

Great! Thanks! I can confirm the reproducer works as expected and the fix is indeed making the problem go away.
25-09-2019

[~jbachorik] Hi again Jaroslav, I think i have managed to create a JTREG reproducer for this issue, please see: http://cr.openjdk.java.net/~mgronlun/8231081/testcase/webrev/ Running this testcase without the fix will have an NPE be thrown, i.e.: [5.459s][info][class,unload] unloading class jdk.jfr.event.runtime.TestClasses 0x000000010010b840 [5.719s][info][gc ] GC(4) Pause Full (System.gc()) 7M->3M(14M) 367.474ms STDERR: java.lang.NullPointerException at jdk.jfr/jdk.jfr.consumer.RecordedClass.getName(RecordedClass.java:90) at jdk.jfr/jdk.jfr.internal.tool.PrettyWriter.printClass(PrettyWriter.java:481) at jdk.jfr/jdk.jfr.internal.tool.PrettyWriter.printValue(PrettyWriter.java:296) at jdk.jfr/jdk.jfr.internal.tool.PrettyWriter.printFieldValue(PrettyWriter.java:270) at jdk.jfr/jdk.jfr.internal.tool.PrettyWriter.print(PrettyWriter.java:210) at jdk.jfr/jdk.jfr.consumer.RecordedObject.toString(RecordedObject.java:883) at java.base/java.lang.StringConcatHelper.stringOf(StringConcatHelper.java:439) at java.base/java.lang.StringConcatHelper.simpleConcat(StringConcatHelper.java:418) at jdk.jfr.event.runtime.Test8231081.locateClassDefineEvent(Test8231081.java:102) at jdk.jfr.event.runtime.Test8231081.recordClassDefinition(Test8231081.java:93) at jdk.jfr.event.runtime.Test8231081.main(Test8231081.java:84) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:830) JavaTest Message: Test threw exception: java.lang.NullPointerException JavaTest Message: shutting down test No NPE's with the fix. HTH Markus
25-09-2019

[~jbachorik] Hi Jaroslav, I have tried to create a standalone reproducer but with no luck. There are a lot of time dependent details that must converge for this to occur (i could only have this error provoked by longer stress testing (hours) using our Streaming prototype. The reason it happens there is because streaming will continuously parse the recordings, and we currently have stricter sanity checking in place. In earlier versions this would only manifest as some symbol names being null. What needs to occur is: 1. JFR takes the system to a safepoint to switch the epoch. 2. After coming out of the safepoint, some thread must tag an artifact that is very close to being unloaded (klass, method) for this current epoch. 3. JFR collects the artifacts for the previous epoch and stops recording. 4. A class unload occurs, involving the artifacts that were tagged in the short window 2-3. This tags related symbols. 5. Recording starts up again, and a typeset is again written involving the symbols that were related by 4. The best way i can think of is really heavy stress testing (involving lots of stacktraces) and frequent recording start / stops.
23-09-2019

Is there a reproducer for this? I've been trying to get the test to crash but with no luck. Would be great to be able to validate the patch - the approach looks correct AFAICT (with my limited depth of knowledge).
23-09-2019

The above comment is necessary but not sufficient to fully resolve the problem related to "Missing object id=n in pool jdk.types.Symbol. All ids should reference object". The root cause is a situation involving a class unload interleaved between recorder stop and recorder start. Class unloading will serialize classes and artifacts that are about to be removed from the system. In addition, it will update state as to what traceid's are no longer in the system to allow for scavenging of stale type entries. What happens is that the chunk finalization rotation (when the recording is stopped), collected all artifacts and reset the type state. Then a class unload occurs which will tag some artifacts (also some artifacts that are not unloading, but are in the unloading closure), in the process marking them as serialized. Unfortunately, the typeset does not currently involve an explicit clear on start step. Instead, when the recorder starts up again, some of the artifacts will already be marked as serialized (without having been written to the chunk) by the interleaved class unload. This can be solved by introducing a clearing of the typeset on recorder start (implemented using clear() routines), so that the type set is always reset upon recording (re)start. Updated webrev: http://cr.openjdk.java.net/~mgronlun/8231081/webrev02/
22-09-2019

JDK-8225797 introduced a caching mechanism for serializing symbols based on the assumption that almost all lookup queries come back-to-back. Usually the first query is issued by the writer associated with leak profiler, followed by the same query issued by the regular writer. The problem is located here: https://hg.openjdk.java.net/jdk/jdk/rev/caa25ab47aca#l26.211 The fact that caching is in place could cause a tagging attempt of a symbol as "leakp" to not complete as intended. Therefore, when later iterating symbols to serialize, this entry will not be included in the leakp set of symbols. Not being included in the blob associated with an object sample candidate, the parser will not be able to resolve the referenced symbol id successfully.
16-09-2019