JDK-6564400 : setIgnoringContentElementWhitespace doesn't work with XSD (did in 1.5)
  • Type: Bug
  • Component: xml
  • Sub-Component: jaxp
  • Affected Version: 6
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2007-06-01
  • Updated: 2012-06-09
  • Resolved: 2007-07-13
java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) Server VM (build 1.6.0_01-b06, mixed mode)

Linux 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:17:21 EST 2007 i686 i686 i386 GNU/Linux

In 1.5, when validating an XML document against a schema, if DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) was used, the parser would remove all of the extraneous whitespace. In 1.6 it no longer does this (though setIgnoringComments() still works). The following code, run under 1.5 and 1.6, produces very different results:

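import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;

// Load the schema.
File schemaFile = new File("test.xsd");
SchemaFactory schFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schFactory.newSchema(schemaFile);
File xmlFile = new File("test.xml");
// Set the options on the DocumentBuilderFactory to remove comments, remove whitespace
// and validate against the schema.
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setIgnoringComments(true);
docFactory.setIgnoringElementContentWhitespace(true);
docFactory.setSchema(schema);
DocumentBuilder parser = docFactory.newDocumentBuilder();
Document xmlDoc = parser.parse(xmlFile);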
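
The report does not include the code that prints the NODE/TYPE/VALUE listings shown further down. A minimal sketch of how such a listing could be produced is a recursive walk over the DOM tree. The class name NodeDump, its helper methods, and the choice to trim text values before printing are assumptions made for illustration, not the reporter's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class NodeDump {
    // Hypothetical helper: maps DOM node type constants to the labels used in the listings.
    static String typeName(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE: return "element";
            case Node.TEXT_NODE:    return "text";
            case Node.COMMENT_NODE: return "comment";
            default:                return "other(" + n.getNodeType() + ")";
        }
    }

    // Appends one line per node, indenting children by two spaces per level.
    static void dump(Node n, String indent, StringBuilder out) {
        String value = n.getNodeValue();
        out.append(indent).append("NODE: ").append(n.getNodeName())
           .append("   TYPE: ").append(typeName(n))
           // Assumption: values are trimmed, so whitespace-only text prints as empty.
           .append("   VALUE: ").append(value == null ? "" : value.trim())
           .append('\n');
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "  ", out);
        }
    }

    // Parses an XML string with a default (non-validating) parser and dumps the tree.
    static String dump(String xml) throws Exception {
        DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = b.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        dump(doc.getDocumentElement(), "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump("<Person><FirstName>Doofus</FirstName></Person>"));
    }
}
```

With a dumper like this, a whitespace-only text node shows up as a "#text" line with an empty VALUE, which is how the extra nodes appear in the 1.6 listing.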

Here is the sample XML:

<Person>
        <FirstName>Doofus</FirstName><!-- MONKEY -->
        <LastName>McGee</LastName>
</Person>

Here is the sample schema:

<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='Person' type='PersonType'/>
  <xsd:complexType name='PersonType'>
    <xsd:sequence>
      <xsd:element name='FirstName' type='xsd:string'/>
      <xsd:element name='LastName' type='xsd:string'/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

In 1.5, it correctly removes the extraneous whitespace nodes and in 1.6 it does not. Java 1.5 gives me:

NODE: Person   TYPE: element   VALUE:
  NODE: FirstName   TYPE: element   VALUE:
    NODE: #text   TYPE: text   VALUE: Doofus
  NODE: LastName   TYPE: element   VALUE:
    NODE: #text   TYPE: text   VALUE: McGee

Java 1.6 gives me:

NODE: Person   TYPE: element   VALUE:
  NODE: #text   TYPE: text   VALUE:
  NODE: FirstName   TYPE: element   VALUE:
    NODE: #text   TYPE: text   VALUE: Doofus
  NODE: #text   TYPE: text   VALUE:
  NODE: LastName   TYPE: element   VALUE:
    NODE: #text   TYPE: text   VALUE: McGee
  NODE: #text   TYPE: text   VALUE:

Both versions correctly remove the comments but only 1.5 removes the whitespace.
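
On a release where the parser leaves these nodes in place, one possible workaround is to remove whitespace-only text nodes from the tree after parsing. The sketch below is an assumption about how that could be done, not something proposed in this report; the class name WhitespaceStripper and its methods are hypothetical. Note that it is not schema-aware: it also drops whitespace-only text inside simple-content elements such as FirstName, so it only suits documents where that is acceptable.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class WhitespaceStripper {
    // Recursively removes text nodes whose content is empty after trimming.
    static void strip(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            // Remember the sibling first, because removeChild detaches the current node.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                parent.removeChild(child);
            } else {
                strip(child);
            }
            child = next;
        }
    }

    // Parses an XML string with a default (non-validating) parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Person>\n  <FirstName>Doofus</FirstName>\n"
                + "  <LastName>McGee</LastName>\n</Person>");
        Node person = doc.getDocumentElement();
        System.out.println(person.getChildNodes().getLength()); // 5: three text nodes, two elements
        strip(person);
        System.out.println(person.getChildNodes().getLength()); // 2: FirstName and LastName
    }
}
```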

This bug can be reproduced always.

Release Regression From : 5.0u11
The above release value was the last known release where this bug was not reproducible. Since then there has been a regression.

EVALUATION I've checked in a fix that corrects the problem. Please see CR6545684 for more details.

EVALUATION I can't explain why the behavior changed from 1.5 to 1.6, but I'm afraid the current behavior is the correct one. A text node counts as "element content whitespace" if and only if the infoset [element content whitespace] property is true for it. When DTD validation occurs, the parser sets that property to true on whitespace it encounters in element content. Schema validation, unfortunately, is not defined to change the [element content whitespace] infoset property, nor does it provide an equivalent property (at least not that I recall). I'm concerned that this is a regression, but I don't know what to do about it.
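
For contrast, the DTD case described in this evaluation can be sketched as follows. The example assumes a DTD-validating parser and an internal DTD subset that mirrors the Person/FirstName/LastName structure of the sample; the class name DtdWhitespaceDemo and the inline document are illustrative, not part of the original report. With a DTD supplying the content model, setIgnoringElementContentWhitespace(true) is expected to drop the whitespace between child elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class DtdWhitespaceDemo {
    // Illustrative document: the internal DTD declares Person as element-only content.
    static final String XML =
          "<?xml version='1.0'?>\n"
        + "<!DOCTYPE Person [\n"
        + "  <!ELEMENT Person (FirstName, LastName)>\n"
        + "  <!ELEMENT FirstName (#PCDATA)>\n"
        + "  <!ELEMENT LastName (#PCDATA)>\n"
        + "]>\n"
        + "<Person>\n"
        + "  <FirstName>Doofus</FirstName>\n"
        + "  <LastName>McGee</LastName>\n"
        + "</Person>\n";

    // Parses the document with DTD validation on, optionally ignoring element content whitespace.
    static Document parse(boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(true); // DTD validation, so the parser knows the content model
        f.setIgnoringElementContentWhitespace(ignoreWhitespace);
        DocumentBuilder b = f.newDocumentBuilder();
        return b.parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        Node kept = parse(false).getDocumentElement();
        System.out.println(kept.getChildNodes().getLength()); // 5: whitespace text nodes kept
        Node first = kept.getFirstChild();
        if (first instanceof Text) {
            // DOM Level 3 view of the infoset [element content whitespace] property.
            System.out.println(((Text) first).isElementContentWhitespace());
        }
        Node dropped = parse(true).getDocumentElement();
        System.out.println(dropped.getChildNodes().getLength()); // 2: only the two elements
    }
}
```

The same factory setting applied with setSchema(...) instead of a DTD is the case this report is about.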