JDK-8168445 : make pd_get_top_frame_for_profiling more robust
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2016-10-20
  • Updated: 2023-09-11
Other: tbd (Unresolved)
Description
Implementations of pd_get_top_frame_for_profiling use frame::safe_for_sender to check whether the top frame is in a safe state for stack walking. safe_for_sender has many checks and heuristics, but some of them are not foolproof, particularly for compiled frames.

C1/C2 compiled frames have a "frame complete" offset for frame setup that safe_for_sender can use, but no corresponding "frame incomplete" offset for frame tear-down.

Deoptimization stubs have even more problems.  They have no "frame complete" offset, and they have multiple entry points.

To allow safe_for_sender to work correctly on compiled frames, we would need to map each code offset to the frame/stack delta at that location. This would make "frame complete" obsolete, but would require more metadata about the codeblob. A possible solution would be a new reloc type.
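
As a rough illustration of the per-offset mapping idea, the C++ sketch below models a sorted table that reports how many stack bytes the frame owns at a given code offset. It is not HotSpot code: `FrameDeltaEntry`, `FrameDeltaTable`, `frame_bytes_at` and `complete_by_single_offset` are hypothetical names, and the offsets used in the note that follows are invented. It says nothing about how such a table would actually be encoded (for example as a reloc type).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical entry: from code offset `start` onward, the frame owns
// `frame_bytes` bytes of stack (0 means no frame is set up at that point).
struct FrameDeltaEntry {
    uint32_t start;
    uint32_t frame_bytes;
};

// Hypothetical stand-in for the extra per-codeblob metadata.
// Entries must be sorted by ascending `start`, and the first must start at 0.
class FrameDeltaTable {
public:
    explicit FrameDeltaTable(std::vector<FrameDeltaEntry> entries)
        : entries_(std::move(entries)) {
        assert(!entries_.empty() && entries_.front().start == 0);
        assert(std::is_sorted(entries_.begin(), entries_.end(),
                              [](const FrameDeltaEntry& a, const FrameDeltaEntry& b) {
                                  return a.start < b.start;
                              }));
    }

    // Returns the frame size in effect at `offset`, i.e. the value of the
    // last entry whose `start` is <= offset.
    uint32_t frame_bytes_at(uint32_t offset) const {
        auto it = std::upper_bound(entries_.begin(), entries_.end(), offset,
                                   [](uint32_t off, const FrameDeltaEntry& e) {
                                       return off < e.start;
                                   });
        return std::prev(it)->frame_bytes;
    }

private:
    std::vector<FrameDeltaEntry> entries_;
};

// Simplified model of a single "frame complete" offset, for comparison:
// everything at or past the offset is reported as complete.
inline bool complete_by_single_offset(uint32_t frame_complete_offset,
                                      uint32_t offset) {
    return offset >= frame_complete_offset;
}
```

For an imagined method whose prologue finishes at offset 8 and whose epilogue pops a 48-byte frame at offset 40, the table would hold the entries (0, 0), (8, 48) and (40, 0). At offset 44 the single-offset check still reports the frame as complete, while the table reports 0 bytes, which is the distinction a stack walker would need during tear-down.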

Alternatively, we could make safe_for_sender more conservative, at the cost of fine-grained profiling, so that it only returns true when the frame is really safe. So instead of blindly declaring deopt stubs safe, it would need to return false if it lacks enough information.
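
The conservative alternative can be sketched in the same spirit. This is a simplified model under stated assumptions, not the real frame::safe_for_sender: `BlobKind`, `TopFrameInfo` and `conservative_safe_for_sender` are hypothetical names, and the boolean fields stand in for checks the real code would have to perform itself.

```cpp
// Hypothetical classification of the code the sampled PC falls in.
enum class BlobKind { CompiledMethod, DeoptStub, Other };

// Hypothetical summary of what is known about the top frame at sample time.
struct TopFrameInfo {
    BlobKind kind;
    bool sp_in_thread_stack;      // SP lies within the thread's stack bounds
    bool frame_state_known;       // metadata describes the frame at this exact PC
    bool sender_pc_in_code_cache; // candidate return address looks plausible
};

// Returns true only when every piece of evidence is present. Missing
// information is treated as "unsafe" rather than being guessed at.
inline bool conservative_safe_for_sender(const TopFrameInfo& f) {
    if (!f.sp_in_thread_stack) {
        return false;
    }
    // Deopt stubs and compiled code in an unknown frame state are rejected
    // instead of being declared safe by default.
    if (!f.frame_state_known) {
        return false;
    }
    return f.sender_pc_in_code_cache;
}
```

The trade-off is the one named above: a sample that lands in a deopt stub or in a compiled epilogue without frame metadata is dropped, so the profile loses some detail but the stack walker never starts from a frame it cannot describe.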
Comments
This is a rare bug that is hard to reproduce. I haven't seen any recent reports on Linux, but there are some older bugs against 9 on Solaris x86. We really need a way to reproduce the problem reliably, which is why this bug is blocked on JDK-8170152, which is targeted for 13. Changing both to tbd_feature.
23-02-2018

An example of what is wrong is the compiled epilogue. We pop the frame, then have a small window before we return where is_frame_complete_at() returns true and the PC is still in compiled code, but the compiled frame is gone. The checks used by safe_for_sender() are susceptible to a false positive if the right junk is on the stack. To fix this properly, we need C1/C2/Graal/AOT to create compiled methods with extra metadata so that is_frame_complete_at() [or perhaps a new frame_adjustment_at()] gives correct results for arbitrary locations.
07-11-2017

I don't think we have time to fix this properly for jdk9. The easiest fix would be to ignore top frames that are compiled, but that makes pd_get_top_frame_for_profiling not very useful.
22-02-2017

We don't have a fix, and it probably doesn't deserve to be a P2. The related bugs are P3. The workaround is to not use this kind of profiling.
21-02-2017

Why is this P2 deferred to 10?
21-02-2017

Hi Dean, we need to keep track of this issue, so I'm assigning it to you. We can think later how to take care of it. Zoltan
15-12-2016

ILW=crash,low frequency of occurrence,no workaround=HLH=P2
21-11-2016