JDK-6529800 : (coll) ArrayList.removeAll should be O(n), but is O(n*n)
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:collections
  • Affected Version: 6
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2007-03-01
  • Updated: 2012-10-08
  • Resolved: 2011-05-17
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 7
7 b15Fixed
Description
A DESCRIPTION OF THE REQUEST :
ArrayList.removeAll runs in worst case O(n^2) time. The problem with ArrayList.removeAll was discovered in Sun's Java 6 SDK. ArrayList does not directly implement the removeAll method: rather, it is implemented by its superclass AbstractCollection. The listing below shows the implementation of that method.

 1  public boolean removeAll(Collection c) {
 2    boolean modified = false;
 3    Iterator e = iterator();
 4    while (e.hasNext()) {
 5      if (c.contains(e.next())) {
 6        e.remove();
 7        modified = true;
 8      }
 9    }
10    return modified;
11  }

This raises red flags because remove operations on array lists are O(n) operations, so while the code above is optimal for a linked list, it is definitely not for an array list. In particular, at every iteration in the loop, it is possible to trigger a remove, hence the worst case performance is O(n2).

JUSTIFICATION :
There is no reason why removeAll can't have O(n) worst-case performance.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
removeAll should have worst case O(n) behavior, disregarding the behavior of the call to "contains"
ACTUAL -
removeAll has worst case O(n^2) behavior, assuming O(1) behavior for the "contains" invocation.

---------- BEGIN SOURCE ----------
http://www.ahmadsoft.org/code/arraytest.zip
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
See http://www.ahmadsoft.org/articles/removeall/index.html for two O(n) implementations as well as benchmarks and analysis.

Comments
EVALUATION Amin Ahmad has put up a beautiful web page analysing this problem: http://www.ahmadsoft.org/articles/removeall/index.html I think we should go with Amin's solution #1.
2007-03-01