JDK-6971190 : Xml document validator partly accepts UTF lexical presentation of digit and words
  • Type: Bug
  • Component: xml
  • Sub-Component: javax.xml.validation
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2010-07-22
  • Updated: 2019-05-13
  • Resolved: 2012-03-06
Versions
  • Other: 1.4.0, 1.4 (Fixed)
  • JDK 8: 8 (Fixed)
Description
The validator uses the following schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="doc">
    <xsd:complexType>
      <xsd:choice>
        <xsd:element name="elem" type="Regex" minOccurs="1" maxOccurs="unbounded"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="Regex">
    <xsd:attribute name="att">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:pattern value="\d"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:complexType>

</xsd:schema>

and the following XML instance document:

<doc  xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation='reS17.xsd' >

<!-- 
base='string', pattern='\d', value='#x1369;', type='valid', RULE='37'
-->

      <elem att='&#x1369;'/>

</doc>
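The failure can be reproduced without files through the standard javax.xml.validation API. The sketch below inlines the schema above as a string and validates a one-element document for a given attribute value. It drops xsi:noNamespaceSchemaLocation because the Validator is created directly from the compiled Schema. The class and method names are hypothetical, and the result for the Ethiopic digit depends on which JDK runs it, so no outcome is claimed for that case.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class PatternDigitRepro {
    // The schema from the report, inlined; pattern '\d' restricts attribute att.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='doc'><xsd:complexType><xsd:choice>"
        + "<xsd:element name='elem' type='Regex' minOccurs='1' maxOccurs='unbounded'/>"
        + "</xsd:choice></xsd:complexType></xsd:element>"
        + "<xsd:complexType name='Regex'><xsd:attribute name='att'><xsd:simpleType>"
        + "<xsd:restriction base='xsd:string'><xsd:pattern value='\\d'/></xsd:restriction>"
        + "</xsd:simpleType></xsd:attribute></xsd:complexType>"
        + "</xsd:schema>";

    static final Schema SCHEMA = compileSchema();

    // Compile once; a broken schema is a programming error, not a validation result.
    private static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    /** Validates <doc><elem att='attValue'/></doc>; attValue may contain character references. */
    static boolean isValid(String attValue) {
        String xml = "<doc><elem att='" + attValue + "'/></doc>";
        Validator validator = SCHEMA.newValidator(); // no ErrorHandler: errors are thrown
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("7"));        // ASCII digit
        System.out.println(isValid("x"));        // not a digit
        System.out.println(isValid("&#x1369;")); // ETHIOPIC DIGIT ONE, as in the report
    }
}
```

On an affected validator the last call prints a cvc-pattern-valid message before returning false.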

If the value of the att attribute is one of the following non-ASCII digits, the validator accepts the document as valid. The tested characters it accepts are:
U+0C66  	TELUGU DIGIT ZERO
U+0CE6  	KANNADA DIGIT ZERO
U+0D66  	MALAYALAM DIGIT ZERO
U+0E50  	THAI DIGIT ZERO
U+0ED0  	LAO DIGIT ZERO
U+0F20  	TIBETAN DIGIT ZERO
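XML Schema 1.0 defines the escape \d as \p{Nd}, the Unicode decimal-digit category, so these characters should match the pattern, and so should any other Nd character. java.util.regex makes the distinction easy to see. Its plain \d is ASCII-only unless the UNICODE_CHARACTER_CLASS flag is set, while \p{Nd} matches the whole category. This is a minimal sketch; the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

public class DigitEscapeDemo {
    // Equivalent of the XML Schema \d escape: Unicode category Nd.
    static final Pattern XSD_STYLE_DIGIT = Pattern.compile("\\p{Nd}");
    // java.util.regex default \d: ASCII [0-9] only.
    static final Pattern JAVA_DEFAULT_DIGIT = Pattern.compile("\\d");
    // With UNICODE_CHARACTER_CLASS, \d means \p{IsDigit} (Character.isDigit).
    static final Pattern JAVA_UNICODE_DIGIT =
            Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        String thaiZero = "\u0E50"; // THAI DIGIT ZERO, from the list above
        System.out.println(XSD_STYLE_DIGIT.matcher(thaiZero).matches());    // true
        System.out.println(JAVA_DEFAULT_DIGIT.matcher(thaiZero).matches()); // false
        System.out.println(JAVA_UNICODE_DIGIT.matcher(thaiZero).matches()); // true
    }
}
```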

If att instead contains one of the following digits, which the test also expects to be valid:
U+1040  	MYANMAR DIGIT ZERO
U+1369  	ETHIOPIC DIGIT ONE
U+17E0  	KHMER DIGIT ZERO
U+1810  	MONGOLIAN DIGIT ZERO
U+FF10  	FULLWIDTH DIGIT ZERO
U+1049  	MYANMAR DIGIT NINE
U+1371  	ETHIOPIC DIGIT NINE
U+17E9  	KHMER DIGIT NINE

The validator rejects the document with the following error:
SAX error: file:/devel/analysis/reS17.xml(9,29): cvc-pattern-valid: Value '፩' is not facet-valid with respect to pattern '\d' for type '#AnonType_attRegex'.
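Every character in both lists is a digit, so the accept/reject split does not follow what the characters mean. A quick way to check which of them the running JDK's Unicode tables place in category Nd, the category \d is defined on, is the hypothetical helper below. The Unicode version behind java.lang.Character changes between JDK releases and is not the version XML Schema 1.0 was written against, so the Ethiopic digits in particular may not report Nd.

```java
public class DigitCategories {
    // Code points copied from the two lists in this report.
    static final int[] ACCEPTED = {0x0C66, 0x0CE6, 0x0D66, 0x0E50, 0x0ED0, 0x0F20};
    static final int[] REJECTED = {0x1040, 0x1369, 0x17E0, 0x1810, 0xFF10, 0x1049, 0x1371, 0x17E9};

    // True if the running JDK's Unicode data assigns cp the general category Nd.
    static boolean isNd(int cp) {
        return Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER;
    }

    // Prints code point, Unicode name and Nd membership for each entry.
    static void report(String label, int[] cps) {
        for (int cp : cps) {
            System.out.printf("%s U+%04X %-28s Nd=%b%n",
                    label, cp, Character.getName(cp), isNd(cp));
        }
    }

    public static void main(String[] args) {
        report("accepted", ACCEPTED);
        report("rejected", REJECTED);
    }
}
```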

Comments
EVALUATION: Add just enough ranges to cover what the JCK tests require. Since the likelihood that support for more languages will be requested is low, I'll leave out the other blocks, just as the original implementation did.
12-11-2010
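The evaluation describes the fix as adding code-point ranges to the table behind \d rather than consulting full Unicode category data. A minimal sketch of that idea, assuming a sorted array of inclusive [start, end] pairs searched with binary search. The class name, field name and selection of ranges are illustrative only and are not taken from the actual JAXP change.

```java
public class DigitRangeTable {
    // Hypothetical inclusive ranges, sorted by start and non-overlapping.
    // Scripts not listed here (e.g. Arabic-Indic U+0660..U+0669) stay unmatched.
    static final int[][] DIGIT_RANGES = {
        {0x0030, 0x0039}, // ASCII
        {0x0E50, 0x0E59}, // Thai
        {0x1040, 0x1049}, // Myanmar
        {0x1369, 0x1371}, // Ethiopic, digits one to nine (no zero)
        {0x17E0, 0x17E9}, // Khmer
        {0x1810, 0x1819}, // Mongolian
        {0xFF10, 0xFF19}, // Fullwidth
    };

    /** Binary search for a range containing cp. */
    static boolean isDigit(int cp) {
        int lo = 0, hi = DIGIT_RANGES.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int[] r = DIGIT_RANGES[mid];
            if (cp < r[0]) {
                hi = mid - 1;
            } else if (cp > r[1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (int cp : new int[] {'7', 0x1369, 0x17E9, 0x0660, 'x'}) {
            System.out.printf("U+%04X -> %b%n", cp, isDigit(cp));
        }
    }
}
```

The table keeps lookups cheap and the change small, at the cost that any digit block left out, like Arabic-Indic here, silently fails to match.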