JDK-6483188 : JDK1.5.0_09 Schema validation slow (took 10-20 minutes) with SAP schema
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.validation
  • Affected Version: 1.1
  • Priority: P4
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2006-10-17
  • Updated: 2017-04-14
  • Resolved: 2017-04-14
Description
Problem:
The NB5.5 EntPack xmltools module lets users edit a schema through its Schema view (a tree or column representation). At certain points while the user is editing an XML schema, it internally calls the JDK schema validator. With the attached schema, the IDE appears to hang while invoking the JDK JAXP schema validator.

Issue:
The schema validator bundled with JDK 1.5.0_09 or JDK1.6beta2 is slow with the attached schema (see the source code section for the schema).

Steps to reproduce:

1. Download the attachment and unzip it to "C:\tmp"
2. cd c:\tmp\SchemaValidate
3. java -Xmx256m -classpath .\dist\SchemaValidate.jar;%classpath% schemavalidate.Main c:\tmp\SchemaValidate\src\schemavalidate\CREMAS04.xsd

This takes about 20 minutes to validate.

There is no error message. However, I took a thread dump; the relevant part is below.

The code appears to hang at CMStateSet.java:226

==========================================================
        at com.sun.org.apache.xerces.internal.impl.dtd.models.CMStateSet.union(CMStateSet.java:226)
        at com.sun.org.apache.xerces.internal.impl.xs.models.XSDFACM.buildDFA(XSDFACM.java:562)
        at com.sun.org.apache.xerces.internal.impl.xs.models.XSDFACM.<init>(XSDFACM.java:182)
        at com.sun.org.apache.xerces.internal.impl.xs.models.CMBuilder.createDFACM(CMBuilder.java:132)
        at com.sun.org.apache.xerces.internal.impl.xs.models.CMBuilder.getContentModel(CMBuilder.java:92)
        at com.sun.org.apache.xerces.internal.impl.xs.XSComplexTypeDecl.getContentModel(XSComplexTypeDecl.java:153)
        - locked <0x05fe2ef0> (a com.sun.org.apache.xerces.internal.impl.xs.XSComplexTypeDecl)
        at com.sun.org.apache.xerces.internal.impl.xs.XSConstraints.fullSchemaChecking(XSConstraints.java:421)
        at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(XMLSchemaLoader.java:526)
        at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadGrammar(XMLSchemaLoader.java:485)
        at com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory.newSchema(XMLSchemaFactory.java:206)
        at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:585)
=======================================================
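The trace ends in SchemaFactory.newSchema, so the time is spent compiling the schema rather than validating an instance document. The source of schemavalidate.Main is not shown in this report; the following is only a minimal sketch of a comparable timing harness, and the class name SchemaLoadTimer and method timeNewSchema are hypothetical.

```java
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical stand-in for the attachment's schemavalidate.Main (assumption:
// the real program does little more than call SchemaFactory.newSchema).
class SchemaLoadTimer {

    // Compiles the given schema and returns the elapsed wall-clock time in milliseconds.
    static long timeNewSchema(Source schemaSource) throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        long start = System.nanoTime();
        factory.newSchema(schemaSource); // content models are built during this call
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws SAXException {
        File xsd = new File(args[0]); // path to the .xsd file to compile
        long millis = timeNewSchema(new StreamSource(xsd));
        System.out.println(xsd + " compiled in " + millis + " ms");
    }
}
```

Timing only the newSchema call keeps JVM startup and file lookup out of the measurement, which matters when the expected figure is in milliseconds.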
Attaching schema from http://www.netbeans.org/issues/show_bug.cgi?id=87152 for testing purposes
Also, I found that validating each of the following schemas takes 3-4 seconds.
I would expect each validation to take well under 1 second (milliseconds rather than seconds).

The issue is that in our tool (the NB5.5 EntPack Schema tool), when the user opens messages.xsd in the IDE and tries to expand the messages.xsd|Groups|ALLMESSAGES.CONTENT|choice|ACK element, the tool tries to open the files "ACK.xsd", "segments.xsd", "fields.xsd", and "datatypes.xsd". While opening these files it also validates them. This makes the IDE appear slow to a user performing basic operations in the schema tools: some 10-15 seconds on top of the other UI time spent by the schema tool. This is unacceptable to schema tool users.

To get these schemas, download the attached hl7_v2.3.zip file and unzip it into c:\tmp\hld.

C:\Documents and Settings\Owner\SchemaValidate>java -Xmx256m -classpath dist\SchemaValidate.jar;%classpath% schemavalidate.Main "C:\tmp\hld\messages.xsd"

C:\Documents and Settings\Owner\SchemaValidate>java -Xmx256m -classpath dist\SchemaValidate.jar;%classpath% schemavalidate.Main "C:\tmp\hld\ACK.xsd"

C:\Documents and Settings\Owner\SchemaValidate>java -Xmx256m -classpath dist\SchemaValidate.jar;%classpath% schemavalidate.Main "C:\tmp\hld\segments.xsd"

C:\Documents and Settings\Owner\SchemaValidate>java -Xmx256m -classpath dist\SchemaValidate.jar;%classpath% schemavalidate.Main "C:\tmp\hld\fields.xsd"

C:\Documents and Settings\Owner\SchemaValidate>java -Xmx256m -classpath dist\SchemaValidate.jar;%classpath% schemavalidate.Main "C:\tmp\hld\datatypes.xsd"

Comments
No longer reproducible as of JDK 6.0 b104, refer to Santiago's comments above.
14-04-2017

EVALUATION Here is a more detailed evaluation of this bug.
JDK 5.0: large values of maxOccurs can cause the VM to run out of memory due to the expansion explained earlier. The workaround of using "unbounded" still applies. The other alternative is to enable "secure processing", in which case a limit of 5000 is imposed on maxOccurs attributes and an error is reported immediately. This is explained here: http://java.sun.com/developer/technicalArticles/xml/jaxp1-3/#Security Note that validation won't be carried out, but at least the application won't run out of memory. This is a known limitation that we are not planning on fixing at the moment.
JDK 6.0: certain schemas with large values of maxOccurs can be processed in constant space. The latest JDK 6.0 build includes this enhancement. In fact, the schema attached to this report is processed in only a few milliseconds using b104. Note, however, that there are certain schemas with large values of maxOccurs which can still cause the VM to run out of memory unless "secure processing" is enabled, just like in JDK 5.0.
I'm removing any references to JDK 6.0, as this bug is not reproducible in b104. Moreover, since schemas with very large maxOccurs values are not that common, I'm lowering the priority of this bug for JDK 5.0. For JDK 5.0, our current recommendation is to use secure processing or, if possible, to upgrade JDK 5.0 to use the JAXP 1.4 RI by using the endorsed mechanism.
10-11-2006
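Secure processing, as recommended in the evaluation above, is switched on through the standard JAXP feature constant. This is a minimal sketch; the class name SecureSchemaFactory and method newSecureFactory are illustrative only, and the resulting maxOccurs behavior depends on the JAXP implementation in use.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Illustrative helper (hypothetical name): returns a W3C XML Schema factory
// with the JAXP secure-processing feature explicitly enabled.
class SecureSchemaFactory {

    static SchemaFactory newSecureFactory() throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // setFeature throws SAXNotRecognizedException / SAXNotSupportedException
        // (both SAXException subclasses) if the implementation rejects the feature.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory;
    }
}
```

A schema that exceeds the implementation's limit should then fail in newSchema with a SAXException instead of exhausting the heap, so callers need to handle that error path.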

WORK AROUND Replace "9999" by "unbounded" in the CREMAS04.xsd schema to ensure validation runs in constant space.
18-10-2006
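As a small illustration of this workaround, the sketch below builds a toy schema shaped like the problematic pattern and compiles it with maxOccurs="unbounded". The element names SEGMENT, F1 and F2 and the class UnboundedWorkaround are invented for the example and are not taken from CREMAS04.xsd.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Toy example only: SEGMENT, F1 and F2 are invented names, not from CREMAS04.xsd.
class UnboundedWorkaround {

    // Builds a schema whose repeated group is (F1?, F2?) with the given maxOccurs value.
    static String schemaWithMaxOccurs(String maxOccurs) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"SEGMENT\"><xs:complexType>"
             + "<xs:sequence minOccurs=\"0\" maxOccurs=\"" + maxOccurs + "\">"
             + "<xs:element name=\"F1\" type=\"xs:string\" minOccurs=\"0\"/>"
             + "<xs:element name=\"F2\" type=\"xs:string\" minOccurs=\"0\"/>"
             + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    // Compiles schema text with the default W3C XML Schema factory.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    public static void main(String[] args) throws SAXException {
        // The workaround itself: swap the bounded value for "unbounded".
        String bounded = schemaWithMaxOccurs("9999");
        String relaxed = bounded.replace("\"9999\"", "\"unbounded\"");
        compile(relaxed);
        System.out.println("compiled with maxOccurs=\"unbounded\"");
    }
}
```

Note that the workaround changes the schema's meaning slightly: documents with more than 9999 repetitions become valid.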

EVALUATION The problem is that the schema uses very large (9999) but bounded values for maxOccurs in several places. A type expression of the form E{0,N}, where 0 is minOccurs and N is maxOccurs, is equivalent to E?, E?, ..., E? (N times, here 9999), and this expression needs to be expanded internally by the validator. It follows that validation does not run in constant space with respect to N: memory use grows with N. This is causing the VM to run out of memory.
For Java SE 6, we've implemented a new algorithm that runs in constant space when E is an element whose type is a simple datatype (like string, integer, etc.). However, our optimization does not apply to the CREMAS04.xsd schema, since E in this case is of the form (F1?, F2?, ..., F14?), which combined with the expression shown above results in (F1?, F2?, ..., F14?), (F1?, F2?, ..., F14?), ..., (F1?, F2?, ..., F14?) (9999 times), where the F's are all simple element types. Implementing a constant-space algorithm to validate this schema is non-trivial.
A simple workaround is to change "9999" to "unbounded", which results in the expression (F1?, F2?, ..., F14?)*
18-10-2006
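To give a feel for the size of the expansion described in this evaluation, the arithmetic can be sketched as follows. It only counts leaf particles in the expanded expression and does not model the validator's actual data structures; the class and method names are hypothetical.

```java
// Back-of-the-envelope count of leaf particles after expanding E{0,N}
// into N copies of E. Not a model of the validator's internals.
class ExpansionSize {

    static long expandedLeafCount(int leavesInE, int maxOccurs) {
        return (long) leavesInE * maxOccurs;
    }

    public static void main(String[] args) {
        // (F1?, ..., F14?) repeated 9999 times
        System.out.println(expandedLeafCount(14, 9999)); // prints 139986
    }
}
```

With "unbounded" the group is kept as a single starred expression, so the count stays at the 14 leaves of E regardless of how many repetitions a document contains.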