JDK-6394757 : (coll) AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Affected Version: 6
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2006-03-07
  • Updated: 2023-08-29
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
One would think that the following program
---------------------------
import java.util.*;

public class Bug {
    static void test(String... elts) {
        Set<String> s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b"));
        s.removeAll(Arrays.asList(elts));
        System.out.println(s);
    }

    public static void main(String[] args) {
        test("A");
        test("A", "C");
    }
}
---------------------------

would print 2 identical lines, but in fact it prints

[b]
[a, b]

The results are dependent on the relative sizes of the two sets,
violating the Principle of Least Astonishment.

Comments
Fixing this bug by removing AbstractSet.removeAll will also fix JDK-4744966.
02-05-2020

Latest review thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2020-May/066196.html
01-05-2020

Some early discussion (July 2011) on this topic: http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html
28-02-2020

Second review thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060147.html Updated webrev: http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.1/
14-05-2019

Review thread: http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060007.html Webrev: http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.0/
03-05-2019

The approach used in AbstractCollection.removeAll(), AbstractCollection.retainAll(), and AbstractSet.retainAll() is to iterate this collection and to remove based on the result of calling contains() on the argument collection. AbstractSet.removeAll() does the same if this collection's size is <= the argument collection's size. However, if this collection's size > the argument collection's size, AbstractSet.removeAll() iterates the argument collection and calls remove() on this collection. This iterates the wrong collection and thus uses the wrong collection's semantics for containership. A consequence of using the wrong collection's containership semantics is that removeAll() and retainAll() are no longer complementary. Here's an example of that. jshell> Set<String> removals = new TreeSet<>(String.CASE_INSENSITIVE_ORDER) jshell> removals.addAll(List.of("A", "B", "C")) jshell> List<String> init = List.of("a", "b", "c", "d", "e", "f") jshell> Set<String> hs = new HashSet<>(init) hs ==> [a, b, c, d, e, f] jshell> hs.removeAll(removals) hs ==> [a, b, c, d, e, f] jshell> Set<String> hs = new HashSet<>(init) hs ==> [a, b, c, d, e, f] jshell> hs.retainAll(removals) hs ==> [a, b, c]
15-02-2019

Recent email discussion: http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html Tentative conclusion is that the performance optimization should be removed, the specification for the performance heuristic should be removed from AbstractSet.removeAll(), and the implementation should iterate this collection and call contains() on the argument. It might also be wise to clarify the specs of Collection.{remove,retain}All to say that the contains() semantics of the argument are used, without necessarily specifying the implementation.
14-02-2019

This has been a continual source of problems. It may be time to bite the compatibility bullet and remove the size-based optimization. The specification of AbstractSet.removeAll would need to be changed to be aligned with AbstractCollection.removeAll. (This might entail simply removing the override of removeAll from AbstractSet. But it would be good to put a note in here somewhere.) Collections.disjoint() may have to be adjusted as well, since it also uses several heuristics to determine which collection to iterate and which collection to probe. The size-based heuristic is only applied if neither collection is a set, however. But if some non-JDK collection type is introduced that uses an inconsistent-with-equals Comparator (or an "equalator") the problem could still occur. See JDK-6728865 and JDK-8179007. A couple linked bugs (JDK-6982173, JDK-8178425) mention possible performance improvements. These will have to be scrutinized to ensure they don't introduce semantic problems. Note, this problem also affects TreeMap's keySet, which is implemented using AbstractSet. Here's a modified version of the description's test case that illustrates the issue with TreeMap: static void testMap(String... elts) { Map<String, Integer> m = new TreeMap<>(String.CASE_INSENSITIVE_ORDER); m.put("a", 1); m.put("b", 2); m.keySet().removeAll(Arrays.asList(elts)); System.out.println(m); } public static void main(String[] args) { testMap("A"); testMap("A", "C"); } Output is {b=2} {a=1, b=2} which is analogous to the original test program's unexpected output.
12-06-2017

EVALUATION Jason writes insightfully: The spec AbstractSet.removeAll and Collections.disjoint should be aligned with each other. This suggested fix uses the wording from Collections.disjoint and IdentityHashMap and applies it to Collection.removeAll spec. Jason Source from jdk7b17 --- c:\temp\Collection.java 2007-08-03 08:54:36.941733500 -0500 +++ c:\temp\6394757\Collection.java 2007-08-08 16:51:39.907991600 -0500 @@ -337,6 +337,15 @@ * specified collection (optional operation). After this call returns, * this collection will contain no elements in common with the specified * collection. + * + * &lt;p&gt;Care must be exercised if this method is used on collections that + * do not comply with the general contract for {@code Collection}. + * Implementations may elect to iterate over either collection and test + * for containment in the other collection (or to perform any equivalent + * computation). If either collection uses a nonstandard equality test + * whose ordering is not &lt;i&gt;compatible with equals&lt;/i&gt; or uses + * reference-equality semantics then both collections must use the + * same nonstandard equality test, or the result of this method is undefined. * * @param c collection containing elements to be removed from this collection * @return &lt;tt&gt;true&lt;/tt&gt; if this collection changed as a result of the
09-08-2007

EVALUATION Contribution forum : https://jdk-collaboration.dev.java.net/servlets/ProjectForumMessageView?forumID=1463&messageID=21022
09-08-2007

EVALUATION Josh Bloch writes: "The spec clearly states that removeAll "Removes all this collection's elements that are also contained in the specified collection." So the current behavior is not just unpredictable, it's broken. It worked until 1.3, and was broken in 1.3-1.5. Sigh..." (And indeed, AbstractCollection.removeAll does precisely what the Collection.removeAll spec demands)
07-03-2006

EVALUATION This is apparently due to a misguided optimization 4266874: java.util.AbstractSet can speed up removeAll There was a missed opportunity to fix it in 4730113: TreeSet removeAll(), retainAll() don't use comparator; addAll() does The Right Thing is to simply remove AbstractSet.removeAll. Some more spec work on Collection.removeAll would also be worthwhile.
07-03-2006