JDK-8057941 : Xml document validator partly accepts UTF lexical presentation of digit and words
  • Type: Bug
  • Component: xml
  • Sub-Component: jaxp
  • Affected Version: 8,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2011-07-27
  • Updated: 2015-12-16
  • Resolved: 2015-12-16
JDK 9: 9 (Fixed)
Description
Since the original CR was only partially fixed, it seemed better to handle the invalid JCK tests separately from the original CR, so that the original could remain marked as fixed in JAXP.

The new results show that reS21 still fails. reS21 is a negative test asserting that 𝟎 is NOT a digit. However, Character.isDigit returns true for U+1D7CE, 'MATHEMATICAL BOLD DIGIT ZERO'.

Similarly, reS42 asserts that 𝟿 is NOT a digit. But U+1D7FF, 'MATHEMATICAL MONOSPACE DIGIT NINE', is indeed a digit.

Both reS21 and reS42 are therefore invalid tests. reT21 and reT42 test the opposite and pass. The two pairs of tests cannot both be valid.
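
The digit claim above can be checked with a small program. This is a minimal sketch, not part of the JCK tests: the class name DigitCheck and the helper isDecimalDigit are hypothetical, and the result depends on the Unicode data shipped with the JDK in use.

```java
public class DigitCheck {
    // Hypothetical helper: true if the JDK treats the code point as a digit.
    // The int overload is required because both code points lie outside the BMP.
    static boolean isDecimalDigit(int codePoint) {
        return Character.isDigit(codePoint);
    }

    public static void main(String[] args) {
        // U+1D7CE is used by reS21, U+1D7FF by reS42.
        int[] codePoints = {0x1D7CE, 0x1D7FF};
        for (int cp : codePoints) {
            System.out.printf("U+%X %s isDigit=%b%n",
                    cp, Character.getName(cp), isDecimalDigit(cp));
        }
    }
}
```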


Among the negative tests, reV16 - reV24, reV27 - reV43 are invalid. See below for more details.


<!--reV10--> <elem>&#x2B0;</elem>
<!--reV11--> <elem>&#x2B0;</elem>
<!--reV12--> <elem>&#xFF9F;</elem>
<!--reV15--> <elem>&#x2FA1D;</elem>
<!--reV16--> <!--elem>&#x64B;</elem  064b is ARABIC FATHATAN, not a letter according to Character.isLetter, the current range \u0641\u064a (Arabic letters) is correct-->
<!--reV17--> <!-- elem>&#x1D1AD;</elem MUSICAL SYMBOL COMBINING SNAP PIZZICATO, is not a letter-->
<!--reV18--> <!-- elem>&#x903;</elem  'DEVANAGARI SIGN VISARGA' , not a letter -->
<!--reV19--> <!-- elem>&#x1D172;</elem 'MUSICAL SYMBOL COMBINING FLAG-5', not a letter -->
<!--reV20--> <!-- elem>&#x903;</elem -->
<!--reV21--> <!-- elem>&#x1D172;</elem -->
<!--reV22 elem text--> <!-- elem>&#x20DD;</elem 'COMBINING ENCLOSING CIRCLE' , not a letter -->
<!--reV23 attribute--> <!--elem>&#x20DD;</elem-->
<!--reV24--> <!-- elem>&#x20E2;</elem 'COMBINING ENCLOSING SCREEN' , not a letter -->
<!--reV26--> <elem>&#x1D7FF;</elem> <!-- 1D7FF 'MATHEMATICAL MONOSPACE DIGIT NINE', added to digit range -->
<!--reV27--> <!-- elem>&#x1034A;</elem 'GOTHIC LETTER NINE HUNDRED', not a letter -->
<!--reV28--> <!--elem>&#x1034A;</elem-->
<!--reV30--> <!-- elem>&#xB2;</elem 'SUPERSCRIPT TWO', not a letter -->
<!--reV31--> <!-- elem>&#xB2;</elem-->
<!--reV32--> <!-- elem>&#x10323;</elem OLD ITALIC NUMERAL FIFTY, not a letter. In fact, none of the OLD ITALIC NUMERALs are considered letter -->
<!--reV33--> <!-- elem>&#x2044;</elem 'FRACTION SLASH' , not a letter -->
<!--reV34--> <!-- elem>&#xFFE2;</elem 'FULLWIDTH NOT SIGN', not a letter -->
<!--reV35--> <!-- elem>&#x20A0;</elem 'EURO-CURRENCY SIGN', not a letter -->
<!--reV36--> <!-- elem>&#x20A0;</elem -->
<!--reV37--> <!-- elem>&#xFFE6;</elem 'FULLWIDTH WON SIGN' , not a letter -->
<!--reV38--> <!-- elem>&#x309B;</elem 'KATAKANA-HIRAGANA VOICED SOUND MARK', not a letter -->
<!--reV39--> <!-- elem>&#x309B;</elem -->
<!--reV40--> <!-- elem>&#xFFE3;</elem 'FULLWIDTH MACRON', not a letter -->
<!--reV41--> <!-- elem>&#x3190;</elem 'IDEOGRAPHIC ANNOTATION LINKING MARK', not a letter -->
<!--reV42--> <!-- elem>&#x3190;</elem-->
<!--reV43--> <!-- elem>&#x1D1DD;</elem 'MUSICAL SYMBOL PES SUBPUNCTIS', not a letter -->
<!--reV3--> <elem>&#x1D7A8;</elem>
<!--reV6--> <elem>&#x1D7C9;</elem>
<!--reV7--> <elem>&#x1C5;</elem>
<!--reV8--> <elem>&#x1C5;</elem>
The following tests from CR 6971190 also still fail:
xml_schema/msData/regex/jaxp area
Tests: reV3, reV6-reV8, reV10-reV12, reV15, reV26
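
The "not a letter" annotations in the listing above can be reproduced with Character.isLetter. The following is only a sketch of that check: the class name LetterCheck and the SAMPLES array are hypothetical, the code points are a subset copied from the commented-out entries, and the output depends on the JDK's Unicode data.

```java
public class LetterCheck {
    // A subset of the code points from the commented-out test cases
    // (reV16, reV18, reV22, reV30, reV33, reV35, reV38).
    static final int[] SAMPLES = {0x64B, 0x903, 0x20DD, 0xB2, 0x2044, 0x20A0, 0x309B};

    // Thin wrapper so the check can be called from other code.
    static boolean isLetter(int codePoint) {
        return Character.isLetter(codePoint);
    }

    public static void main(String[] args) {
        for (int cp : SAMPLES) {
            System.out.printf("U+%04X %s isLetter=%b%n",
                    cp, Character.getName(cp), isLetter(cp));
        }
        // For contrast: U+0641 ARABIC LETTER FEH, the start of the
        // Arabic letter range mentioned for reV16.
        System.out.printf("U+0641 isLetter=%b%n", isLetter(0x641));
    }
}
```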

Comments
I verified using a current JDK 9 build that tests reV16 - reV43, excluding reV25, reV26 and reV29, now pass.
16-12-2015

I believe no. Sorry missed it when moving to JDK; changing it now to unassigned.
14-01-2015

JDK version: JDK-9 (Jigsaw)
Adding the new ruleset of Aurora:
RULE xml_schema/msData/regex/jaxp/reV16_reV16.i any any
RULE xml_schema/msData/regex/jaxp/reV17_reV17.i any any
RULE xml_schema/msData/regex/jaxp/reV18_reV18.i any any
RULE xml_schema/msData/regex/jaxp/reV19_reV19.i any any
RULE xml_schema/msData/regex/jaxp/reV20_reV20.i any any
RULE xml_schema/msData/regex/jaxp/reV21_reV21.i any any
RULE xml_schema/msData/regex/jaxp/reV22_reV22.i any any
RULE xml_schema/msData/regex/jaxp/reV23_reV23.i any any
RULE xml_schema/msData/regex/jaxp/reV24_reV24.i any any
RULE xml_schema/msData/regex/jaxp/reV27_reV27.i any any
RULE xml_schema/msData/regex/jaxp/reV28_reV28.i any any
RULE xml_schema/msData/regex/jaxp/reV30_reV30.i any any
RULE xml_schema/msData/regex/jaxp/reV31_reV31.i any any
RULE xml_schema/msData/regex/jaxp/reV32_reV32.i any any
RULE xml_schema/msData/regex/jaxp/reV33_reV33.i any any
RULE xml_schema/msData/regex/jaxp/reV34_reV34.i any any
RULE xml_schema/msData/regex/jaxp/reV35_reV35.i any any
RULE xml_schema/msData/regex/jaxp/reV36_reV36.i any any
RULE xml_schema/msData/regex/jaxp/reV37_reV37.i any any
RULE xml_schema/msData/regex/jaxp/reV38_reV38.i any any
RULE xml_schema/msData/regex/jaxp/reV39_reV39.i any any
RULE xml_schema/msData/regex/jaxp/reV40_reV40.i any any
RULE xml_schema/msData/regex/jaxp/reV41_reV41.i any any
RULE xml_schema/msData/regex/jaxp/reV42_reV42.i any any
RULE xml_schema/msData/regex/jaxp/reV43_reV43.i any any
27-11-2014

The issue is moved to JDK. Following tests still fail on JDK8b129, JDK8b132, JDK9b28:
xml_schema/msData/regex/jaxp/reV16.html#reV16.i
xml_schema/msData/regex/jaxp/reV17.html#reV17.i
xml_schema/msData/regex/jaxp/reV18.html#reV18.i
xml_schema/msData/regex/jaxp/reV19.html#reV19.i
xml_schema/msData/regex/jaxp/reV20.html#reV20.i
xml_schema/msData/regex/jaxp/reV21.html#reV21.i
xml_schema/msData/regex/jaxp/reV22.html#reV22.i
xml_schema/msData/regex/jaxp/reV23.html#reV23.i
xml_schema/msData/regex/jaxp/reV24.html#reV24.i
xml_schema/msData/regex/jaxp/reV27.html#reV27.i
xml_schema/msData/regex/jaxp/reV28.html#reV28.i
xml_schema/msData/regex/jaxp/reV30.html#reV30.i
xml_schema/msData/regex/jaxp/reV31.html#reV31.i
xml_schema/msData/regex/jaxp/reV32.html#reV32.i
xml_schema/msData/regex/jaxp/reV33.html#reV33.i
xml_schema/msData/regex/jaxp/reV34.html#reV34.i
xml_schema/msData/regex/jaxp/reV35.html#reV35.i
xml_schema/msData/regex/jaxp/reV36.html#reV36.i
xml_schema/msData/regex/jaxp/reV37.html#reV37.i
xml_schema/msData/regex/jaxp/reV38.html#reV38.i
xml_schema/msData/regex/jaxp/reV39.html#reV39.i
xml_schema/msData/regex/jaxp/reV40.html#reV40.i
xml_schema/msData/regex/jaxp/reV41.html#reV41.i
xml_schema/msData/regex/jaxp/reV42.html#reV42.i
xml_schema/msData/regex/jaxp/reV43.html#reV43.i
The minimized test is attached; some tests that pass are added to the minimized cases as valid samples. All the tests are negative so each of them should produce errors. In order to start minimized test:
1. Unzip archive;
2. Compile Test3.java;
3. Execute Test3 class passing full path to Test3Files as first argument.
09-09-2014

Regarding reV16 - reV24 and reV27 - reV43: all of these are negative tests that verify the regex multi-character escape \W. The W3C specification [1] states:

[37] MultiCharEsc ::= '\' [sSiIcCdDwW]
\w   [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)
\W   [^\w]

So \W accepts "punctuation", "separator" and "other" characters: if a character belongs to any of the P, Z or C Unicode General Categories it is accepted by \W, and it is not accepted otherwise. According to [2] or [3], and as specified in the issue description, the characters in reV16 - reV24 and reV27 - reV43 do not belong to the P, Z or C Unicode categories. Therefore they should not be accepted by \W, and the negative tests reV16 - reV24 and reV27 - reV43 are valid. I'm afraid I don't quite understand why Character.isLetter is used as the judging criterion in the issue description. So this seems to be a JDK bug.

[1] http://www.w3.org/TR/xmlschema-2/
[2] http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt
[3] http://www.unicode.org/Public/6.2.0/ucd/UnicodeData.txt
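
The category rule quoted from the specification can be sketched in Java as follows. This is an illustration only, not the JAXP implementation: the class name XsdNonWordCheck and the method isXsdNonWord are hypothetical, and Character.getType reflects whichever Unicode version the running JDK supports, which may differ from the data files cited in [2] and [3].

```java
public class XsdNonWordCheck {
    // Hypothetical helper: true if the code point falls in Unicode general
    // category P (punctuation), Z (separator) or C (other), i.e. the set
    // that the quoted definition assigns to \W.
    static boolean isXsdNonWord(int codePoint) {
        switch (Character.getType(codePoint)) {
            // P: punctuation
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
            // Z: separators
            case Character.SPACE_SEPARATOR:
            case Character.LINE_SEPARATOR:
            case Character.PARAGRAPH_SEPARATOR:
            // C: other (control, format, surrogate, private use, unassigned)
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.SURROGATE:
            case Character.PRIVATE_USE:
            case Character.UNASSIGNED:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // A few code points from the disputed tests, plus '.' and ' ' for contrast.
        int[] samples = {0x64B, 0xB2, 0x2044, 0x20A0, '.', ' '};
        for (int cp : samples) {
            System.out.printf("U+%04X matches \\W: %b%n", cp, isXsdNonWord(cp));
        }
    }
}
```

Under this rule a combining mark such as U+064B or a currency sign such as U+20A0 is not matched by \W even though Character.isLetter returns false for it, which is the distinction the comment draws.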
14-08-2014

for correct bug linkage in aurora, affected tests:
xml_schema/msData/regex/jaxp/reV16_reV16.i
xml_schema/msData/regex/jaxp/reV17_reV17.i
xml_schema/msData/regex/jaxp/reV18_reV18.i
xml_schema/msData/regex/jaxp/reV19_reV19.i
xml_schema/msData/regex/jaxp/reV20_reV20.i
xml_schema/msData/regex/jaxp/reV21_reV21.i
xml_schema/msData/regex/jaxp/reV22_reV22.i
xml_schema/msData/regex/jaxp/reV23_reV23.i
xml_schema/msData/regex/jaxp/reV24_reV24.i
xml_schema/msData/regex/jaxp/reV27_reV27.i
xml_schema/msData/regex/jaxp/reV28_reV28.i
xml_schema/msData/regex/jaxp/reV29_reV29.i
xml_schema/msData/regex/jaxp/reV30_reV30.i
xml_schema/msData/regex/jaxp/reV31_reV31.i
xml_schema/msData/regex/jaxp/reV32_reV32.i
xml_schema/msData/regex/jaxp/reV33_reV33.i
xml_schema/msData/regex/jaxp/reV34_reV34.i
xml_schema/msData/regex/jaxp/reV35_reV35.i
xml_schema/msData/regex/jaxp/reV36_reV36.i
xml_schema/msData/regex/jaxp/reV37_reV37.i
xml_schema/msData/regex/jaxp/reV38_reV38.i
xml_schema/msData/regex/jaxp/reV39_reV39.i
xml_schema/msData/regex/jaxp/reV40_reV40.i
xml_schema/msData/regex/jaxp/reV41_reV41.i
xml_schema/msData/regex/jaxp/reV42_reV42.i
xml_schema/msData/regex/jaxp/reV43_reV43.i
28-03-2013

EVALUATION Review excluded tests and correct validity if needed
15-11-2011