JDK-8180409 : TreeSet removeAll inconsistent behaviour with String.CASE_INSENSITIVE_ORDER
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Affected Version: 8u131
  • Priority: P4
  • Status: Closed
  • Resolution: Duplicate
  • Submitted: 2017-05-11
  • Updated: 2021-01-06
  • Resolved: 2021-01-06
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdResolved
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601]

A DESCRIPTION OF THE PROBLEM :
The TreeSet removeAll method seems to remove elements correctly with a custom comparator (String.CASE_INSENSITIVE_ORDER)  when the number of elements in the Set is more than the number of elements provided as an argument to removeAll.

Consider the following snippet:

import java.util.*;

public class Test {
        public static void main(String[] args) {
                final Set<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
                names.addAll(Arrays.asList("jack", "john", "mark"));

                System.out.println("NAMES=" + names);

                names.removeAll(Arrays.asList("JACK"));
                System.out.println("After removing JACK=" + names);

                System.out.printf("JOHN %s in names%n", names.contains("JOHN") ? "exists" : "does not exist");
                names.removeAll(Arrays.asList("PAUL", "CINDY", "SARAH", "JOHN"));
                System.out.println("After removing JOHN=" + names);
        }
}

Why was JACK removed from the initial set : NAMES=[jack, john, mark] while in the next invocation john was NOT removed from the above TreeSet ?

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the following program:

import java.util.*;

public class Test {
        public static void main(String[] args) {
                final Set<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
                names.addAll(Arrays.asList("jack", "john", "mark"));

                System.out.println("NAMES=" + names);

                names.removeAll(Arrays.asList("JACK"));
                System.out.println("After removing JACK=" + names);

                System.out.printf("JOHN %s in names%n", names.contains("JOHN") ? "exists" : "does not exist");
                names.removeAll(Arrays.asList("PAUL", "CINDY", "SARAH", "JOHN"));
                System.out.println("After removing JOHN=" + names);
        }
}

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
NAMES=[jack, john, mark]
After removing JACK=[john, mark]
JOHN exists in names
After removing JOHN=[mark]
ACTUAL -
NAMES=[jack, john, mark]
After removing JACK=[john, mark]
JOHN exists in names
After removing JOHN=[john, mark]

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.util.*;

public class Test {
        public static void main(String[] args) {
                final Set<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
                names.addAll(Arrays.asList("jack", "john", "mark"));

                System.out.println("NAMES=" + names);

                names.removeAll(Arrays.asList("JACK"));
                System.out.println("After removing JACK=" + names);

                System.out.printf("JOHN %s in names%n", names.contains("JOHN") ? "exists" : "does not exist");
                names.removeAll(Arrays.asList("PAUL", "CINDY", "SARAH", "JOHN"));
                System.out.println("After removing JOHN=" + names);
        }
}
---------- END SOURCE ----------


Comments
This is indeed a duplicate of JDK-6394757. AbstractSet.removeAll (inherited by TreeSet) will use the relative sizes to determine whether to iterate this set or to iterate the argument. If the semantics of contains() differs between this set and the argument, this may give different results depending on the relative sizes. This will be fixed by JDK-6394757, so closing as a duplicate. In my comment above about "But that seems exactly backward" it might seem more intuitive to iterate the argument and call remove() on this set. However, the specifications of removeAll and retainAll in Collection are carefully worded to imply iterating this collection, calling contains() on the argument, and removing the current element if contains() returned true. This is so that removeAll and retainAll are complements of each other. It also ensures that if multiple elements in this set match one in the argument collection, then all are removed.
06-01-2021

This is closely related to, if not a duplicate of, JDK-6394757.
16-05-2017

Reopening. I think this is a bug. At least, it's seriously surprising behavior from TreeSet. The analysis of AbstractSet.removeAll is correct. However, the choice of iterating the smaller set as an optimization that AbstractSet makes assumes that the contains() operation is symmetric. In the case of TreeSet.removeAll(Collection), that's not the case, since the contains() operation of TreeSet is potentially different from the contains() operation of the collection. Simplified example: TreeSet<String> orig = new TreeSet<>(String.CASE_INSENSITIVE_ORDER); orig.addAll(Arrays.asList("a", "b", "c")); TreeSet<String> tree1 = new TreeSet<>(orig) ==> tree1 is [a, b, c] tree1.removeAll(Arrays.asList("A", "B")) ==> returns true ==> tree1 is [c] TreeSet<String> tree2 = new TreeSet<>(orig); tree2.removeAll(Arrays.asList("A", "B", "C", "D")); ==> returns false ==> tree2 is [a, b, c] The general contract of removeAll is stated in the Set.removeAll() method: ��Removes from this set all of its elements that are contained in the specified collection�� This overrides, but modifies only slightly the semantics of the Collection.removeAll() method: ��Removes all of this collection's elements that are also contained in the specified collection�� Adding irrelevant elements to the collection that's the argument of removeAll() shouldn't change the outcome, so this clearly a violation of the contract of Set.removeAll and Collection.removeAll. Looking closely at the wording, both specs talk about removing elements from this set that are "contained in the specified collection." This would seem to imply that this set is iterated and the contains() operation is performed on the Collection argument. But that seems exactly backward, since we are operating on THIS set, and this set has been provided special semantics via the comparator passed at construction time. So it would seem more sensible to iterate the Collection, and for every element e, to perform this.remove(e). Note: that's what ConcurrentSkipListSet does. This would entail revising the SortedSet specification to make it clear that the semantics of this set are used, and TreeSet should override removeAll instead of inheriting it from AbstractSet. Similar analysis and modifications should be made to SortedMap and TreeMap, and also to the retainAll() and containsAll() bulk operations for both sets and maps. Workaround: use forEach on the collection argument. For example, Arrays.asList("A", "B", "C", "D").forEach(tree::remove)
16-05-2017

The API documentation at http://docs.oracle.com/javase/8/docs/api/java/util/SortedSet.html states : "Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface." Here, the comparator orders by ignoring the case, so compare("john","JOHN") == 0 . However, "john".equals("JOHN") is false. Hence ordering is not consistent with equals. The API documentation of AbstractSet removeAll method (http://docs.oracle.com/javase/8/docs/api/java/util/AbstractSet.html#removeAll-java.util.Collection- ) also provides the details of the implementation , which explains why "jack" was removed from the Set while "john" was not. "This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method." The collection with the "JACK" element is smaller than the set names, so it's elements are just removed by calling the set's remove method. However, the collection with the "JOHN" element is larger than the set names, so it is checked if each element of the set is contained in the backing array ( using array[i].equals(<set element>) ) . "john" is not equals to "JOHN", hence it is not removed.
16-05-2017