JDK-8167002 : JAXP schema validator: Use HashSet instead of ArrayList for tracking XML IDs
  • Type: Bug
  • Component: xml
  • Sub-Component: jaxp
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2016-10-01
  • Updated: 2020-01-29
  • Resolved: 2016-10-05
JDK 9: 9 b140 (Fixed)
Description
A colleague reported:

ValidationState is used to validate XML ID and IDREF elements (among other types). To do so, it keeps data structures containing all IDs and IDREFs that have been seen in a document. The only methods ever called on the ID container are add() and contains(), so substituting HashSet for ArrayList makes no difference in behavior, while improving performance by orders of magnitude in large documents. No change is necessary or possible, however, for the IDREF container. This one is only ever iterated over -- never searched -- and order matters, so ArrayList is appropriate.
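
A minimal sketch of the container choice described above. The class IdTrackingSketch and its method names are hypothetical and are not the JDK's ValidationState code; the sketch only assumes that IDs are added and searched, while IDREFs are appended and later iterated in document order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the ID/IDREF bookkeeping; not the actual ValidationState. */
public class IdTrackingSketch {
    // IDs are only added and searched, so a hash-based set gives expected O(1) per call.
    private final Set<String> ids = new HashSet<>();
    // IDREFs are only appended and iterated, and order matters, so a list is kept.
    private final List<String> idRefs = new ArrayList<>();

    /** Records an ID; returns false if the same ID was already declared. */
    public boolean addId(String id) {
        return ids.add(id);
    }

    /** Records an IDREF in document order. */
    public void addIdRef(String idRef) {
        idRefs.add(idRef);
    }

    /** Returns the first IDREF (in document order) with no matching ID, or null if all resolve. */
    public String firstUnresolvedIdRef() {
        for (String ref : idRefs) {
            if (!ids.contains(ref)) {
                return ref;
            }
        }
        return null;
    }
}
```

With an ArrayList in place of the HashSet, each contains() call would scan the whole list, which is where the quadratic cost in large documents comes from.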

On a test document with ~1.5M elements, ~330K IDs, and ~430K IDREFs, this
change speeds up parsing (with validation enabled) by a factor of 26 (from
21.4 minutes, ~800µs/element, to 49 seconds, ~31µs/element).

There are further obvious algorithmic improvements possible for additional
constant-factor gains, but this is simple, safe, and brings schema validation
from O(n^2+mn) to O(n+m), where n is the number of IDs and m is the number of
IDREFs.
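
As a rough illustration of where these bounds come from, assume each new ID is checked for duplicates with a linear scan of the IDs seen so far, and each IDREF is resolved with a linear scan of all n IDs. These are assumptions about the list-based behavior, counting comparisons only, not measured time:

```latex
\begin{align*}
T_{\text{list}}(n, m) &\le \sum_{k=0}^{n-1} k + m\,n
    = \frac{n(n-1)}{2} + mn = O(n^2 + mn), \\
T_{\text{hash}}(n, m) &= n \cdot O(1) + m \cdot O(1) = O(n + m)
    \quad \text{(expected)}.
\end{align*}
% For the test document (n \approx 3.3 \times 10^5, m \approx 4.3 \times 10^5):
% n(n-1)/2 \approx 5.4 \times 10^{10}, \quad mn \approx 1.4 \times 10^{11},
% versus n + m \approx 7.6 \times 10^5 hash operations.
```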


Comments
http://cr.openjdk.java.net/~martin/webrevs/openjdk9/xml-id-validation-by-hash/
01-10-2016