JDK-8214542 : JFR: Old Object Sample event slow on a deep heap in debug builds
  • Type: Bug
  • Component: hotspot
  • Sub-Component: jfr
  • Affected Version: 12
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2018-11-30
  • Updated: 2020-04-29
  • Resolved: 2019-07-03
Fixed in:
JDK 11: 11.0.6
JDK 13: 13 b29
JDK 14: 14
Other: openjdk8u-jfr-incubator
Description
The test test\jdk\jdk\jfr\event\oldobject\TestHeapDeep.java tries to locate memory leaks in a small heap that consists of a linked list of 100,000 elements. The test passes, but it executes slowly, taking about 30 seconds instead of a few milliseconds.

This is a product bug. 
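For context, the shape of heap the test exercises can be reproduced with the standard jdk.jfr recording API roughly as follows. This is a minimal sketch, not the actual TestHeapDeep code: the Node and DeepChainSketch names and the exact event settings are assumptions.

    import java.nio.file.Path;
    import jdk.jfr.Recording;

    public class DeepChainSketch {
        // Each Node keeps the next element alive, forming one long reference chain.
        static final class Node {
            final Node next;
            Node(Node next) { this.next = next; }
        }

        // A static root keeps the whole chain reachable, so a sampled object near
        // the tail has a path of roughly 100,000 edges back to a GC root.
        static Node root;

        public static void main(String[] args) throws Exception {
            try (Recording r = new Recording()) {
                // Ask the leak profiler to keep full reference chains at dump time.
                r.enable("jdk.OldObjectSample").withoutStackTrace().with("cutoff", "infinity");
                r.start();

                Node head = null;
                for (int i = 0; i < 100_000; i++) {
                    head = new Node(head);
                }
                root = head;

                // Dumping triggers the path-to-GC-root search inside a safepoint.
                r.dump(Path.of("deep-chain.jfr"));
            }
        }
    }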
Comments
Replacing jdk8u-fix-request with link to JDK-8239140
17-02-2020

RFC: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-January/011063.html
30-01-2020

Addition to the 11u fix request by [~hirt]: the patch did not apply cleanly. Review request sent out on the mailing list: https://mail.openjdk.java.net/pipermail/jdk-updates-dev/2019-September/001811.html Testing shows no regressions. I will submit this item immediately followed by JDK-8228834, since the latter fixes a regression introduced by this fix. I will also include JDK-8229437 when pushing this, since it is a regression fix for this item as well.
03-09-2019

Fix request comment: This fix should be backported to make it viable to use the Old Object Sample event in production systems. It depends on fixing and backporting JDK-8225797, without which the event cannot be used in production anyway.
22-07-2019

I cannot see any algorithmic time-complexity issue in the code that would explain what you describe, except for some super-heavy assertions like this one:

    // bfsclosure.cpp
    // if we are processing initial root set, don't add to queue
    if (_current_parent != NULL) {
      assert(_current_parent->distance_to_root() == _current_frontier_level, "invariant");  // line 141
      _edge_queue->add(_current_parent, reference);
    }

There are memory-complexity issues that get worse in fastdebug / slowdebug runs, because the assertion above keeps a large working set alive for a longer period of time. This gets very bad for concurrent test runs (again fastdebug / slowdebug), where a -conc:16 setting leads to a memory footprint of almost 8 GB just from running the OldObject tests. I did not see this with product builds. Is it possible that a concurrent test run with fastdebug / slowdebug could have caused swapping due to the high memory footprint?

There are several improvements that would trim the working sets:

1. The BitSet used for marking is sized from Universe::heap()->reserved_region(), which is almost always 8 GB regardless of actual usage. This leads to a committed bitset of 127 MB, which is rather wasteful when the current occupancy is in the vicinity of 1-2 MB. I have elaborated on building a page-manager plug-in for the virtual-memory backing storage that only commits the pages that are needed. The downside is that we then need to check a bitset for the mapped page before we can check the bitset for the marked address. Then again, 127 MB might be acceptable since we are running exclusively inside a VM operation.

2. Currently we store the entire chain in the EdgeStore, but we only serialize the top 100 edges and the root 100 edges. I have updated the code so that we only store the edges that are actually needed. In addition, we do not need the intermediate Edge[] in the process of storing edges.

3. The BitSet uses par_set_bit, which uses cmpxchg() underneath. We do not need CAS since we are single-threaded (this is more of a performance optimization).
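As a rough illustration of why such an assertion is super-heavy on a deep chain: assuming distance_to_root() is computed by walking the parent chain (an assumption here, but it is what makes the check expensive), a per-edge invariant check turns a linear walk into a quadratic one in debug builds. The sketch below is illustrative Java, not the HotSpot code; Edge, distanceToRoot and visit are made-up names.

    public final class ChainWalkSketch {
        static final class Edge {
            final Edge parent;                     // null for a GC root
            Edge(Edge parent) { this.parent = parent; }

            // O(depth): counts the hops back to the root.
            int distanceToRoot() {
                int d = 0;
                for (Edge e = parent; e != null; e = e.parent) d++;
                return d;
            }
        }

        static void visit(Edge leaf) {
            int expected = leaf.distanceToRoot();
            for (Edge e = leaf; e != null; e = e.parent, expected--) {
                // Analogue of the debug-only invariant in bfsclosure.cpp: with -ea
                // the check re-walks the chain on every step, so a 100,000-element
                // list costs on the order of n^2/2 = 5 * 10^9 hops instead of
                // n = 10^5; without -ea (cf. product builds) the walk stays linear.
                assert e.distanceToRoot() == expected : "invariant";
            }
        }

        public static void main(String[] args) {
            Edge e = new Edge(null);
            for (int i = 0; i < 100_000; i++) {
                e = new Edge(e);
            }
            long t0 = System.nanoTime();
            visit(e);
            System.out.printf("walk took %.1f ms%n", (System.nanoTime() - t0) / 1e6);
        }
    }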
17-12-2018

Impact: High. This is a scalability issue: the time spent searching for a chain of length n with k samples is O(n*k) instead of O(n), which means the algorithm can be up to 256 times as slow. In the test, with a rather small heap, that meant 15 s spent in a safepoint. This is not acceptable in a production environment.
Likelihood: Low/Medium. How big the problem is depends on the heap size and the length of the longest chain, so it is hard to say anything conclusive about the likelihood; it depends on the application. This only happens if you request path to GC root.
Workaround: High. There is no known workaround.
ILW = H (L/M) H => P1/P2
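For scale, using the numbers above: with k = 256 samples and the 100,000-element chain from the test, O(n*k) means on the order of 25.6 million edge visits in a single safepoint pause, versus roughly 100,000 for an O(n) search, which is where the factor of 256 comes from.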
13-12-2018