JDK-8252885 : Re-implement ThreadGroup
  • Type: JEP
  • Component: core-libs
  • Priority: P4
  • Status: Closed
  • Resolution: Withdrawn
  • Submitted: 2020-09-07
  • Updated: 2022-07-02
  • Resolved: 2021-11-20
Description
## Summary

Re-implement `java.lang.ThreadGroup` to address its most significant flaws. Remove the ability to explicitly _destroy_ a thread group. Remove the notion of _daemon thread group_. Degrade or remove the methods that are already deprecated and terminally deprecated, but otherwise leave the ThreadGroup API "_as is_".



## Non-Goals

It is not a goal of this JEP to terminally deprecate or remove ThreadGroup. Removing ThreadGroup would be a disruptive change and is discussed further in the _Alternatives_ section below. This JEP does not preclude deprecating, terminally deprecating, or removing ThreadGroup in the future.

It is not a goal to _modernize_ or add new features to the ThreadGroup API.



## Description

From the API docs: 

"_A thread group represents a set of threads. In addition, a thread group can also include other thread groups. The thread groups form a tree in which every thread group except the initial thread group has a parent._"

The `java.lang.ThreadGroup` API dates from JDK 1.0. It appears to have been influenced by the Unix process groups of the time and was intended as a form of job control for threads, e.g. "stop all threads". Modern code is more likely to use the thread pool executors provided by the `java.util.concurrent` API.
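To illustrate the contrast, here is a minimal sketch of "stop all threads" style job control done two ways. The first uses `ThreadGroup.interrupt()`, and the second uses an `ExecutorService` and `shutdownNow()`, which is what modern code would more typically do. The class and method names (`JobControlSketch`, `stopAllViaThreadGroup`, `stopAllViaExecutor`) are hypothetical, and interruption is used as the cooperative "stop" signal rather than the unsafe `stop` methods.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class JobControlSketch {

    // A task that signals it is running, then sleeps until it is interrupted.
    static Runnable sleeper(CountDownLatch running, CountDownLatch interrupted) {
        return () -> {
            running.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                interrupted.countDown(); // the "stop" request arrived
            }
        };
    }

    // Legacy style: start the threads in a group, then interrupt the whole group.
    public static boolean stopAllViaThreadGroup(int n) {
        ThreadGroup group = new ThreadGroup("workers");
        CountDownLatch running = new CountDownLatch(n);
        CountDownLatch interrupted = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            new Thread(group, sleeper(running, interrupted), "worker-" + i).start();
        }
        try {
            running.await();
            group.interrupt(); // interrupts every thread in the group and its subgroups
            return interrupted.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Modern style: submit the tasks to an executor, then shut it down.
    public static boolean stopAllViaExecutor(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CountDownLatch running = new CountDownLatch(n);
        CountDownLatch interrupted = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            pool.execute(sleeper(running, interrupted));
        }
        try {
            running.await();
            pool.shutdownNow(); // interrupts the threads running the tasks
            return pool.awaitTermination(10, TimeUnit.SECONDS)
                    && interrupted.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ThreadGroup.interrupt stopped all: " + stopAllViaThreadGroup(2));
        System.out.println("shutdownNow stopped all: " + stopAllViaExecutor(2));
    }
}
```

With the executor, the "group" is implicit in the pool, and shutting it down also stops the pool from accepting new work.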

ThreadGroup supported the isolation of applets in early JDK releases. The Java security architecture evolved significantly in Java 1.2, after which ThreadGroups no longer had a significant role.

ThreadGroup was also intended to be useful for diagnostic purposes, but since Java 5 that aspect has been superseded by the monitoring and management support, including the `java.lang.management` API.
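For comparison with `ThreadGroup.list()` or `enumerate`, here is a minimal sketch of the kind of thread diagnostics available through `java.lang.management`. It uses the platform `ThreadMXBean`. The class name `ThreadDiagnosticsSketch` is hypothetical. Note that `ThreadInfo` exposes thread names, states, and lock information, but nothing about thread groups.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadDiagnosticsSketch {

    // Names of all live threads, as reported by the platform ThreadMXBean.
    public static List<String> liveThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        // getThreadInfo returns a null element for any thread that terminated
        // between getAllThreadIds() and this call.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + mx.getThreadCount());
        System.out.println("peak live threads: " + mx.getPeakThreadCount());
        long[] deadlocked = mx.findDeadlockedThreads(); // null when no deadlock is found
        System.out.println("deadlocked threads: " + (deadlocked == null ? 0 : deadlocked.length));
        liveThreadNames().forEach(System.out::println);
    }
}
```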



### Problems with ThreadGroup

Relevance aside, the ThreadGroup API and implementation have a number of significant problems:

1. The API and mechanism to destroy thread groups is flawed. To avoid a memory leak, users need to explicitly _destroy_ a thread group when it is empty and no longer needed, or set its _daemon status_ so that the group is automatically destroyed when the last thread in the group terminates. There is often no reliable point at which to destroy a thread group or set its daemon status, e.g.:

   - `ThreadGroup::destroy` is not an atomic operation. It can fail with `IllegalThreadStateException` after destroying some, but not all, subgroups.
   - Threads can be created, but not started, before their thread group is destroyed. This can lead to `Thread::start` failing with an undocumented `IllegalThreadStateException`.
   - If `setDaemon(true)` is called after starting threads in the group, then it races with the termination of the last thread in the group. If the last thread terminates before `setDaemon` is called, then the group will not be destroyed.
   - If `setDaemon(true)` is called before starting threads in the group, then it also races with thread termination when there is more than one thread in the group. Thread termination may destroy the group before the remaining threads have been created or started.
2. The implementation keeps a reference to all live threads in the group. This adds synchronization and contention overhead to thread creation, start, and termination. ThreadGroup's internal "threads" array also increases the likelihood of Thread objects being on the same cache line (hence the padding in Thread to avoid false sharing on the mutable fields used by `java.util.concurrent.ThreadLocalRandom`).
3. Defines `suspend`, `resume`, and `stop` methods that are inherently deadlock-prone and unsafe because they invoke the same-named deadlock-prone and unsafe methods defined by `java.lang.Thread`.
4. Defines `enumerate` methods that are inherently racy and flawed. These methods are unable to return a complete snapshot of all threads when called with an *undersized array* and so need to be called in a loop until a thread count less than the array size is returned.
5. Defines several methods that should have been final (`activeCount`, `enumerate`, `isDestroyed`, ...).
6. ThreadGroup has a number of concurrency bugs, e.g. the `daemon` and `maxPriority` fields are accessed without synchronization.
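The retry loop that problem 4 forces on callers of `enumerate` can be sketched as follows. The class and method names (`EnumerateSketch`, `snapshot`) are hypothetical. The sketch relies only on the documented behavior that `enumerate(Thread[], boolean)` silently ignores threads that do not fit in the array. Even the final result is only a snapshot, because threads may start or terminate immediately afterwards.

```java
import java.util.Arrays;

public class EnumerateSketch {

    // Returns the live threads in group and its subgroups at some instant.
    // enumerate drops threads that do not fit, so a count equal to the array
    // length means the result may be truncated: grow the array and retry.
    public static Thread[] snapshot(ThreadGroup group) {
        int size = group.activeCount() + 1; // activeCount is only an estimate
        while (true) {
            Thread[] threads = new Thread[size];
            int n = group.enumerate(threads, true); // true = include subgroups
            if (n < threads.length) {
                return Arrays.copyOf(threads, n); // room to spare: complete snapshot
            }
            size *= 2;
        }
    }

    public static void main(String[] args) {
        ThreadGroup current = Thread.currentThread().getThreadGroup();
        for (Thread t : snapshot(current)) {
            System.out.println(t.getName());
        }
    }
}
```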



### Usages of ThreadGroup

A search of 100,000 artifacts on Maven Central found 2,394 unique artifacts with compiled code referencing `java.lang.ThreadGroup`. Many of the sampled usages have "ThreadFactory" in the class name and appear to just create threads. Several of the sampled usages are classes that extend Thread and define constructors that take a ThreadGroup.
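A typical usage of this kind is a thread factory that places every thread it creates in one named group. The grouping is mainly for organization, for example so that a debugger can show or hide the group's threads. The sketch below is a hypothetical example of that pattern and is not taken from any of the surveyed artifacts. `GroupThreadFactory` and its `group()` accessor are illustrative names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: every thread it creates belongs to one named ThreadGroup.
public class GroupThreadFactory implements ThreadFactory {
    private final ThreadGroup group;
    private final AtomicInteger counter = new AtomicInteger();

    public GroupThreadFactory(String groupName) {
        // The new group is a child of the current thread's thread group.
        this.group = new ThreadGroup(groupName);
    }

    @Override
    public Thread newThread(Runnable task) {
        String name = group.getName() + "-" + counter.incrementAndGet();
        Thread t = new Thread(group, task, name);
        t.setDaemon(true); // Thread daemon status; unrelated to ThreadGroup.setDaemon
        return t;
    }

    public ThreadGroup group() {
        return group;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, new GroupThreadFactory("http-workers"));
        // Prints the name of the group that the pool thread belongs to.
        pool.submit(() -> System.out.println(Thread.currentThread().getThreadGroup().getName())).get();
        pool.shutdown();
    }
}
```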

The following is the usage count of specific methods from the search. The usages of Thread.activeCount and Thread.enumerate are included as these methods delegate to ThreadGroup.

<table>
  <tr>
    <th>Method</th>
    <th>#Usages</th>
    <th>Notes</th>
  </tr>
  <tr>
    <td>suspend / resume / allowThreadSuspension</td>
    <td>0 / 0 / 0</td>
    <td></td>
  </tr>
  <tr>
    <td>stop</td>
    <td>3</td>
    <td></td>
  </tr>
  <tr>
    <td>interrupt</td>
    <td>420</td>
    <td></td>
  </tr>
  <tr>
    <td>list</td>
    <td>10</td>
    <td></td>
  </tr>
  <tr>
    <td>setDaemon / isDaemon</td>
    <td>154 / 12</td>
    <td></td>
  </tr>
  <tr>
    <td>destroy / isDestroyed</td>
    <td>57 / 71</td>
    <td></td>
  </tr>
  <tr>
    <td>setMaxPriority / getMaxPriority</td>
    <td>11 / 132</td>
    <td></td>
  </tr>
  <tr>
    <td>activeCount</td>
    <td>744</td>
    <td></td>
  </tr>
  <tr>
    <td>activeGroupCount</td>
    <td>325</td>
    <td></td>
  </tr>
  <tr>
    <td>enumerate</td>
    <td>651</td>
    <td>Mix of enumerate(Thread[]) and enumerate(ThreadGroup[])</td>
  </tr>
  <tr>
    <td>Thread.activeCount</td>
    <td>175</td>
    <td>Delegates to ThreadGroup activeCount</td>
  </tr>
  <tr>
    <td>Thread.enumerate</td>
    <td>134</td>
    <td>Delegates to ThreadGroup enumerate(Thread[])</td>
  </tr>
</table>

The search doesn't provide insight into the age or relevance of the artifacts, but it at least shows which methods have some usage and which are rarely or never used. In particular:

1. There are no usages of the `suspend`, `resume` or `allowThreadSuspension` methods and only 3 usages of the `stop` method. These problematic methods could be removed with little impact.
2. The `activeCount` and `enumerate` methods are used (usually in conjunction with each other). Removing these methods would be disruptive for at least some libraries and tools.



### Debugger support

Changes to ThreadGroup need to take debugger support into account:

- The [JVM Tool Interface](https://docs.oracle.com/en/java/javase/15/docs/specs/jvmti.html) (JVM TI) defines functions that allow agents to enumerate the threads in a thread group, enumerate subgroups, and get information about thread groups.
- The [Java Debug Wire Protocol](https://docs.oracle.com/en/java/javase/15/docs/specs/jdwp/jdwp-protocol.html) (JDWP) defines command packets and replies to support debugger operations that are the wire-protocol equivalents of the JVM TI functions.
- The JDK-specific [Java Debug Interface](https://docs.oracle.com/en/java/javase/15/docs/api/jdk.jdi/module-summary.html) (JDI) defines a mirror API for thread groups. This allows IDEs and debuggers to enumerate the threads in a thread group, enumerate subgroups, and get information about thread groups.

In addition to debugger support, changes to ThreadGroup need to take the [Java Flight Recorder API](https://docs.oracle.com/en/java/javase/15/docs/api/jdk.jfr/module-summary.html) into account, because its API for consuming JFR data supports accessing data recorded about threads and thread groups.



### Project Loom and ThreadGroup

Project Loom aims to drastically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications that make the best use of available hardware. The foundation of the current design and prototype is _virtual threads_, which are scheduled by the Java virtual machine rather than the operating system. The current proposal uses the existing `java.lang.Thread` API, which necessitates addressing a number of issues with both `Thread` and `ThreadGroup`. In the current prototype, the relationship between virtual threads and thread groups is as follows:

1. Virtual threads are not _active_ threads of a thread group. `Thread::getThreadGroup` returns a placeholder "VirtualThreads" thread group that cannot be destroyed. Its `activeCount()` method returns 0, so it appears empty. There is no way to create virtual threads in other thread groups.
2. By default, threads are created in the same thread group as their parent. If code executing in the context of a virtual thread creates a kernel thread then it is currently created in a sub-group of "VirtualThreads".



### Proposed Changes

1. Remove the ability to explicitly _destroy_ a thread group. This impacts the `destroy` and `isDestroyed` methods which can be *degraded* for a release or two prior to removal. In degraded form, the `destroy` method is changed to be a no-op or unconditionally throw, and the `isDestroyed` method changed to always return `false`.
2. Remove the notion of a _daemon thread group_ that is automatically destroyed when the last thread in the group terminates. This impacts the `setDaemon` and `isDaemon` methods which can be *degraded* for a release or two prior to removal. In degraded form, the `setDaemon` method is changed to be a no-op or unconditionally throw, and the `isDaemon` method changed to always return `false`.
3. Re-specify Thread to allow a thread group to be eligible for garbage collection when there are no live threads in the group and nothing else is keeping the thread group alive.
4. Change the implementation to not keep a reference to the threads in the group, meaning its internal "threads" array goes away. The `activeCount` and `enumerate` methods are re-implemented to take a snapshot of the VM thread list. This change should be transparent to users of the API.
5. The `suspend`, `resume`, and `stop` methods are degraded to unconditionally throw `UnsupportedOperationException`, or are removed.
6. If they are removed, the `allowThreadSuspension` method will also be removed.
7. Fix the concurrency bugs by way of re-implementation.
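Change 4 replaces the per-group "threads" array with a snapshot of the live threads. The sketch below approximates that idea in plain Java. It is an assumption-laden stand-in rather than the proposed implementation: it uses the public `Thread.getAllStackTraces()` as the snapshot, which is far heavier than a VM-internal thread list because it also captures stack traces. The class name `SnapshotCountSketch` is hypothetical. Because no reference is held from a group to its threads, nothing here keeps an empty group reachable.

```java
import java.util.concurrent.CountDownLatch;

public class SnapshotCountSketch {

    // Counts the live threads in group and its subgroups from a snapshot of all
    // live threads, instead of from a per-group array of thread references.
    public static int activeCount(ThreadGroup group) {
        int count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ThreadGroup g = t.getThreadGroup(); // null if t has terminated since the snapshot
            if (g != null && group.parentOf(g)) { // g is group itself or a descendant
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        Runnable parked = () -> {
            try {
                release.await();
            } catch (InterruptedException ignored) {
            }
        };
        ThreadGroup parent = new ThreadGroup("parent");
        ThreadGroup child = new ThreadGroup(parent, "child");
        Thread a = new Thread(parent, parked);
        Thread b = new Thread(child, parked);
        a.start();
        b.start();
        System.out.println("parent: " + activeCount(parent) + ", child: " + activeCount(child));
        release.countDown();
        a.join();
        b.join();
    }
}
```

The cost of walking every live thread on each call is consistent with the `activeCount` regression noted under _Risks and Assumptions_.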

The proposal does not deprecate the flawed `enumerate` methods. It wouldn't be hard to define better methods to enumerate the set of threads or subgroups, but ThreadGroup is legacy and not interesting for modern code.

The proposed changes do nothing about the "_should be final_" methods; fixing them is not worth the disruption.

The proposed changes require adjustments to the implementation of 3 JVM TI functions, and a small update to the specification of the JVM TI `GetThreadGroupInfo` function, but otherwise have no impact on the debugger support.



### Preparatory Changes

In advance of the changes, the `stop`, `destroy`, `isDestroyed`, `setDaemon` and `isDaemon` methods will be terminally deprecated. 

The `suspend`, `resume`, and `allowThreadSuspension` methods are already terminally deprecated. These methods could be removed in advance of the changes proposed in this JEP, although this is not critical.



## Alternatives

Deprecate, terminally deprecate, and eventually remove ThreadGroup. This would be a disruptive change as there are at least some tools using it (including Apache `ant` and the `jtreg` test harness used by the JDK tests). The debugger support is deeply tied to thread groups, meaning ThreadGroup cannot be significantly degraded or removed without also providing migration paths for debuggers and other tools that use JVM TI or JDI. The proposed changes do not preclude re-visiting this alternative in the future.

Reduce the impact of removing the ability to destroy a thread group. The main compatibility impact in the proposal is on code that depends on destroying a thread group. Several alternatives to this aspect of the proposal have been explored:

1. Keep the daemon status and change `isDestroyed` to return `true` if the thread group is a daemon thread group and `activeCount` returns 0. This alternative was ruled out because it creates inconsistencies, such as not preventing a thread from being created or started in a "destroyed" thread group. It would also mean that `isDestroyed` could return `false` some time after it has returned `true`.
2. Change `isDestroyed` to return `true` if `destroy` has been invoked. This alternative was ruled out because it creates the same inconsistencies as the previous alternative. Fixing the anomalies or races would require thread creation or start to coordinate with the thread group.

Do nothing. This alternative is problematic for Project Loom because the introduction of virtual threads necessitates specifying the behavior of their thread group (e.g. an early prototype had to specify that the thread group could not be destroyed). 



## Risks and Assumptions

The following are the risks and compatibility issues that have been identified with the proposed changes. Preliminary testing of the changes with tools that use ThreadGroup in anger has not run into any of these issues.

1. Code that depends on destroying a thread group may be impacted, e.g.
   - There may be code that waits for a group to be destroyed with a loop like this:
     `while (!group.isDestroyed()) { ... }`
     This code would loop forever with the proposed change.
   - There may be code that invokes `destroy` and catches `IllegalThreadStateException` to detect that there are threads still running. The exception will not be thrown with the proposed change.
2. Code that depends on finding a thread group by name may be impacted, e.g. it is possible that code enumerates a thread group to find a subgroup by name. In that scenario, the subgroup may have been GC'ed so the search will fail.
3. Performance. The performance of `activeCount` will regress. One data point is a group with two subgroups, each with 100 threads: `activeCount` takes around 45 ns/op on a 2.6 GHz Intel i7 with the existing ThreadGroup implementation, and about 4 µs/op on the same system with the proposal, a slowdown of roughly 90x. The performance of `enumerate(Thread[])` may be impacted in some cases, although it may be faster than the existing implementation in other cases.
4. Code that depends on `suspend`, `resume`, or `stop` will fail (at run-time if they are changed to throw `UnsupportedOperationException` or at compile-time if these methods are removed).
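For risk 1, one possible migration is to track the threads directly instead of polling `isDestroyed` or calling `destroy` to detect running threads. The sketch below assumes the application starts its own threads and can remember them. `JoinInsteadOfDestroySketch` and its methods are hypothetical names. The sketch keeps every thread it has started, so long-running code would want to prune terminated entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: tracks the threads it starts so that completion can be
// detected without ThreadGroup.destroy() or ThreadGroup.isDestroyed().
public class JoinInsteadOfDestroySketch {
    private final ThreadGroup group;
    private final List<Thread> threads = new ArrayList<>();

    public JoinInsteadOfDestroySketch(String groupName) {
        this.group = new ThreadGroup(groupName); // still useful for organization
    }

    public synchronized Thread start(Runnable task) {
        Thread t = new Thread(group, task);
        threads.add(t);
        t.start();
        return t;
    }

    // Replaces "call destroy() and catch IllegalThreadStateException"
    // as the way to detect that threads are still running.
    public synchronized boolean anyAlive() {
        for (Thread t : threads) {
            if (t.isAlive()) {
                return true;
            }
        }
        return false;
    }

    // Replaces "while (!group.isDestroyed()) { ... }": waits, up to the timeout,
    // for the threads started so far to terminate.
    public boolean awaitAll(long timeoutMillis) {
        List<Thread> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(threads);
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            for (Thread t : snapshot) {
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remaining > 0) {
                    t.join(remaining); // join(0) would mean "wait forever", hence the guard
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (Thread t : snapshot) {
            if (t.isAlive()) {
                return false;
            }
        }
        return true;
    }
}
```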


Comments
The proposal in this JEP has been subsumed into a section of the Virtual Threads JEP (JDK-8277131).
20-11-2021

For programs with hundreds of threads being able to organize the threads into hierarchical groups has been a useful feature, particularly for debugging. I do note that most of the existing web containers; Glassfish, Wildfly, Jetty make at least some effort to use thread groups for the threads they create and the main value is to ignore groups or owners of threads in the debugger. When confronted with hundreds or thousands of unstructured threads in a debugger it can be a challenge to locate the 2 or 3 that you care about. It is very much appreciated when the owner of a thread pool creates all of their threads in a group that can be ignored. Absent use of ThreadGroup, filtering for threads by name, which is effectively by group, is sometimes the only option. The current pattern I have used for destroying a thread group is : try { threadGroup.setDaemon(true); threadGroup.destroy(); } catch(IllegalThreadStateException likely) { logger.note("Incomplete destruction of thread group"); } The setDaemon(true) is intended to allow eventual automatic cleanup if the destroy() fails. The pain points you describe are all familiar. In my experience everybody gives up on using ThreadGroup for anything beyond organization after brief and painful experimentation.
11-05-2021