JDK-8218945 : ConcurrentSkipListSet.removeAll() uses semantics of wrong collection
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Priority: P4
  • Status: In Progress
  • Resolution: Unresolved
  • Submitted: 2019-02-14
  • Updated: 2019-02-15
Related Reports
Relates :  
Description
Collection.removeAll() is specified to remove from this collection every element that is contained in the argument collection. This implies that the contains() semantics of the argument collection are used. However, CSLS.removeAll() iterates its argument and calls this.remove(), using the contains() semantics of this set instead of the argument collection.

(This is complicated by the performance heuristic in AbstractSet.removeAll, which is arguably broken. I've concluded that the contains() semantics of the argument collection must always be used. See JDK-6394757.)

The following jshell sessions illustrate the anomalous behavior (edited for brevity).

==========
jshell> List<String> removes = List.of("a", "b", "c", "d", "e", "f")
removes ==> [a, b, c, d, e, f]
jshell> Set<String> ts = new TreeSet<>(String.CASE_INSENSITIVE_ORDER)
jshell> ts.addAll(List.of("A", "b", "x"))
ts ==> [A, b, x]
jshell> ts.removeAll(removes)
ts ==> [A, x]
jshell> ts.addAll(List.of("A", "b", "x"))
jshell> ts.retainAll(removes)
ts ==> [b]
==========

The above shows that TreeSet uses the argument List's semantics of containership, namely case-sensitive equals(). Also note that the result of retainAll() is the complement of the result of removeAll(), which is expected. Now on to ConcurrentSkipListSet:

==========
jshell> Set<String> csls = new ConcurrentSkipListSet<>(String.CASE_INSENSITIVE_ORDER)
jshell> csls.addAll(List.of("A", "b", "x"))
csls ==> [A, b, x]
jshell> csls.removeAll(removes)
csls ==> [x]     ***1***
jshell> csls.addAll(List.of("A", "b", "x"))
jshell> csls.retainAll(removes)
csls ==> [b]     ***2***
==========

The result of csls.removeAll() at ***1*** indicates that the case-insensitive semantics of csls are used, instead of the case-sensitive semantics of List.

The result of csls.retainAll() at ***2*** is not a complement of the result of removeAll(), because retainAll() is inherited from AbstractSet and uses the contains() semantics of the argument.

CSLS should be fixed to iterate "this" and check contains() on its argument. It might be able to inherit from AbstractSet once JDK-6394757 is fixed.
Comments
I didn't make an effort to fix this back in 2005 because there's no solution without drawbacks and any change was likely to be rejected due to Peak Compatibility Era. There's no way to fix existing third party implementations, so there's no good way to make the spec completely precise - we can't promise to call contains() on the argument collection. It would be good for the JDK itself to offer predictable behavior, clearly documented, and for any collection where we are potentially less efficient, we should document that clearly. An alternative is to document that the two collections should have consistent containment semantics. In practice, users often have better knowledge of semantics and performance, and should be encouraged to write their own methods. Maybe CSLS should have a custom removeIf implementation that can be reused for removeAll and retainAll as in other collection classes
14-02-2019

Also note that ConcurrentHashMap's CollectionView nested class has the size-based heuristic. This should probably be removed too. Let me know how you want to proceed with this: as part of this bug, as part of JDK-6394757, or separately.
14-02-2019

See email threads starting here: http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html in particular this one from me: http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058539.html where I conclude that given this.removeAll(arg), one must iterate "this" and call arg.contains() on each element. Do you agree?
14-02-2019

Maybe Doug can give some history here. We've done some fiddling recently with removeAll/retainAll/removeIf methods in j.u.c. but generally to unify them, and not in ConcurrentSkipList* If/when we agree on a policy for all collection class implementations, we should apply and test uniformly.
14-02-2019

I agree that CSLS could just use AbstractSet version when fixed. I don't think there's an especially better alternative.
14-02-2019