Bug ID: JDK-8251570 JDK-8215624 causes assert(worker_id < _n_workers) failed: Invalid worker

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 16

Priority: P1
Status: Closed
Resolution: Fixed

Submitted: 2020-08-14
Updated: 2024-11-20
Resolved: 2020-08-17

JDK 11: 11.0.14 (Fixed)
JDK 16: 16 b12 (Fixed)

JDK-8215624 ('Add parallel heap iteration for jmap -histo') implemented a way to parallelize heap inspection. There is a mismatch between the number of parallel heap inspection threads that is requested and the number of threads that are actually spawned.

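The heap region claimer is initialized with the value requested by the heap inspection system, while the task-spawning mechanism uses the current number of active GC worker threads. When more workers are spawned than the claimer was sized for, the extra workers pass an out-of-range worker_id and the following assert fires: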

#  assert(worker_id < _n_workers) failed: Invalid worker_id.

V  [libjvm.so+0xcb85dc]  HeapRegionClaimer::offset_for_worker(unsigned int) const+0x4c
V  [libjvm.so+0xaeabfb]  G1ParallelObjectIterator::object_iterate(ObjectClosure*, unsigned int)+0x3b
V  [libjvm.so+0xc9f66e]  ParHeapInspectTask::work(unsigned int)+0x7e
V  [libjvm.so+0x187cfd4]  GangWorker::run_task(WorkData)+0x84
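A minimal single-threaded sketch of the mismatch, for illustration only. `RegionClaimer` loosely mirrors the role of `HeapRegionClaimer` (it is sized with a worker count at construction and checks the same `worker_id < n_workers` precondition), and `invalid_worker_ids` stands in for a gang that hands out ids 0..active_workers-1 without consulting the claimer. Both names and the offset formula are assumptions made for this sketch, not the HotSpot code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for HeapRegionClaimer: it is told up front
// how many workers will share the heap regions.
class RegionClaimer {
 public:
  RegionClaimer(unsigned n_regions, unsigned n_workers)
      : n_regions_(n_regions), n_workers_(n_workers) {}

  // The precondition that fired in this bug: worker ids must be strictly
  // less than the worker count the claimer was sized for.
  bool is_valid_worker(unsigned worker_id) const {
    return worker_id < n_workers_;
  }

  // Starting region for a worker, spreading workers evenly over the regions.
  unsigned offset_for_worker(unsigned worker_id) const {
    assert(is_valid_worker(worker_id) && "Invalid worker_id.");
    return n_regions_ * worker_id / n_workers_;
  }

 private:
  unsigned n_regions_;
  unsigned n_workers_;
};

// Hypothetical stand-in for the gang: it dispatches ids 0..active_workers-1
// regardless of how the claimer was sized, and this helper returns the ids
// that would trip the assert in offset_for_worker().
std::vector<unsigned> invalid_worker_ids(const RegionClaimer& claimer,
                                         unsigned active_workers) {
  std::vector<unsigned> bad;
  for (unsigned id = 0; id < active_workers; ++id) {
    if (!claimer.is_valid_worker(id)) {
      bad.push_back(id);
    }
  }
  return bad;
}
```

With a claimer sized for 4 workers and a gang running 8, ids 4 through 7 are out of range. That is consistent with the reproduction that raises `-XX:ParallelGCThreads`, since a larger GC worker pool makes it more likely that the active worker count exceeds the requested inspection thread count.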

Fix Request (11u): Please approve backporting this to OpenJDK 11u. The PR is https://github.com/openjdk/jdk11u-dev/pull/284. This fix is required because we are backporting https://bugs.openjdk.java.net/browse/JDK-8215624. The risk is low because most of the code is the same as in jdk-master. Thanks, Lin
08-09-2021
URL: https://hg.openjdk.java.net/jdk/jdk/rev/dd827a012e43 User: stefank Date: 2020-08-17 09:33:15 +0000
17-08-2020
The initial patch was incomplete and did not handle all cases. See the review thread: https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-August/030560.html Updated webrev: https://cr.openjdk.java.net/~stefank/8251570/webrev.02/
14-08-2020
Bumping from P4 -> P3. This failure mode reproduces about once per Tier4 job set, ~10 times per Tier5 job set and ~6 times per Tier6 job set. Update: ~6 times in a Tier7 job set.
14-08-2020
Spotted in the jdk-16+12-434-tier5 CI job set: https://mach5.us.oracle.com/mdash/jobs/mach5-one-jdk-16+12-434-tier5-20200814-1100-13437418/results?search=status%3Afailed%20AND%20-state%3Ainvalid
Multiple failures of the following tests:
sun/tools/jmap/BasicJMapTest.java
java/util/logging/TestLoggerWeakRefLeak.java
14-08-2020
https://cr.openjdk.java.net/~stefank/8251570/webrev.01
14-08-2020
Hi Stefan, sounds great. Please add me to the cc list when you post a webrev so I can learn from it. Thanks! Cheers, Lin
14-08-2020
Hi [~lzang], I'm in the process of testing a patch that does something similar to what you describe. However, when looking at this I realized that we use the wrong workers for ZGC and Shenandoah. The patch I'm working on rips out the GC-specific run_task functions and instead provides a run_task_at_safepoint function that uses the "safepoint workers", which should be used when non-GC subsystems want to run tasks in parallel:

    void CollectedHeap::run_task_at_safepoint(AbstractGangTask* task, uint num_workers) {
      WorkGang* gang = get_safepoint_workers();
      if (gang == NULL) {
        // GC doesn't support parallel worker threads.
        // Execute in this thread with worker id 0.
        task->work(0);
      } else {
        gang->run_task(task, num_workers);
      }
    }

I'll post a webrev soon.
14-08-2020
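The dispatch logic in the run_task_at_safepoint snippet can be modeled roughly as follows. This is a hedged, single-threaded sketch: `Task`, `WorkGang`, `RecordingTask` and this `run_task_at_safepoint` are simplified hypothetical stand-ins (a real gang runs `work()` concurrently on its own threads), and a null gang pointer plays the role of a GC that has no safepoint workers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical task interface: each worker calls work() with its own id.
struct Task {
  virtual ~Task() = default;
  virtual void work(unsigned worker_id) = 0;
};

// Hypothetical gang. A real gang runs work() on several threads at once;
// this sketch calls it sequentially so the result is deterministic. It
// assumes the caller never asks for more workers than the gang owns.
class WorkGang {
 public:
  void run_task(Task* task, unsigned num_workers) {
    assert(num_workers > 0 && "a gang task needs at least one worker");
    for (unsigned id = 0; id < num_workers; ++id) {
      task->work(id);
    }
  }
};

// Same shape as the proposed helper: with no safepoint workers, run the task
// in the calling thread as worker 0; otherwise dispatch exactly num_workers.
void run_task_at_safepoint(WorkGang* gang, Task* task, unsigned num_workers) {
  if (gang == nullptr) {
    task->work(0);
  } else {
    gang->run_task(task, num_workers);
  }
}

// Example task that records which worker ids it was called with.
struct RecordingTask : Task {
  std::vector<unsigned> seen;
  void work(unsigned worker_id) override { seen.push_back(worker_id); }
};
```

The design point is that the caller passes `num_workers` explicitly, so the same value can be used to size its region claimer and to dispatch workers, and the two can no longer drift apart.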
Hi Stefan, I did a quick investigation. It seems one possible reason is that G1CollectedHeap::run_task(AbstractGangTask* task) uses a fixed "workers()->active_workers()", which can be equal to ParallelGCThreads, but the number of parallel heap inspection threads is set separately. One solution may be to change run_task(AbstractGangTask* task) to run_task(AbstractGangTask* task, uint workers) and pass the heap iteration parallel thread number to it during heap inspection. Do you think that is reasonable? P.S. run_task() is currently used only for parallel heap inspection, and I can provide a fix ASAP if this is OK. BRs, Lin
14-08-2020
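The proposed signature change can be illustrated by comparing how many dispatched worker ids fall outside the claimer's range under each shape. In this hypothetical model, `Heap::active_workers` stands in for `workers()->active_workers()`, the heap inspection sizes its claimer with `requested`, and all function names are illustrative assumptions rather than HotSpot APIs.

```cpp
#include <cassert>

// Hypothetical heap model: only the number of active GC workers matters here.
struct Heap {
  unsigned active_workers;  // stands in for workers()->active_workers()
};

// Number of dispatched worker ids that fall outside [0, claimer_workers).
unsigned out_of_range_ids(unsigned claimer_workers, unsigned dispatched) {
  return dispatched > claimer_workers ? dispatched - claimer_workers : 0;
}

// Current shape, run_task(task): the gang dispatches active_workers ids,
// while the heap inspection sized its claimer with `requested`.
unsigned inspect_with_fixed_count(const Heap& heap, unsigned requested) {
  return out_of_range_ids(requested, heap.active_workers);
}

// Proposed shape, run_task(task, workers): the gang dispatches exactly the
// number of workers the heap inspection asked for, so both sides agree.
unsigned inspect_with_explicit_count(const Heap& heap, unsigned requested) {
  (void)heap;  // the GC's own active worker count is no longer consulted
  assert(requested > 0 && "heap inspection needs at least one worker");
  return out_of_range_ids(requested, requested);
}
```

For example, with 100 active GC workers and 4 requested inspection threads, the fixed-count shape dispatches 96 ids that the claimer rejects, while the explicit-count shape dispatches none.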
Reproducible by bumping up the number of parallel GC threads: make -C ../build/fastdebug test TEST="java/util/logging/TestLoggerWeakRefLeak.java" JTREG="JAVA_OPTIONS=-XX:ParallelGCThreads=100"
14-08-2020
Ping [~lzang] [~phh] [~sspitsyn]. The jmap changeset had a problem. I'll try to deal with it ASAP.
14-08-2020