Bug ID: JDK-8074297 substring in XSLT returns wrong character if string contains supplementary chars

Type: Bug
Component: xml
Sub-Component: jaxp
Affected Version: 8u31

Priority: P3
Status: Closed
Resolution: Fixed

Submitted: 2015-03-03
Updated: 2016-08-24
Resolved: 2015-03-30

Versions (Unresolved/Resolved/Fixed)



JDK 6	JDK 7	JDK 8	JDK 9	Other
6u101 (Fixed)	7u85 (Fixed)	8u51 (Fixed)	9 b59 (Fixed)	openjdk7u (Fixed)

substring() in an XSLT stylesheet returns the wrong character when the string
contains Unicode supplementary characters.

For example, the Unicode supplementary character '&#131083;' (code point
U+2000B) counts as a single character, so substring('&#131083;ABC', 3, 1)
is expected to return the third character, 'B'. Instead, it returns the
second character, 'A'.
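
The symptom is consistent with positions being counted in UTF-16 code units, 
where U+2000B occupies two units (a surrogate pair). The following is a minimal 
sketch of that difference in plain Java, not the actual XSLT processor code or 
the fix; the class and method names are hypothetical, and bounds checking and 
the XPath rounding rules are omitted.

```java
public class SubstringDemo {
    // Hypothetical helper: XPath-style substring with a 1-based start,
    // where start and length are counted in Unicode code points.
    static String substringByCodePoints(String s, int start, int length) {
        int begin = s.offsetByCodePoints(0, start - 1);
        int end = s.offsetByCodePoints(begin, length);
        return s.substring(begin, end);
    }

    // Hypothetical helper: the same call, but counting UTF-16 code units.
    // A supplementary character is counted as two positions here.
    static String substringByCodeUnits(String s, int start, int length) {
        return s.substring(start - 1, start - 1 + length);
    }

    public static void main(String[] args) {
        // U+2000B followed by "ABC"
        String s = new String(Character.toChars(0x2000B)) + "ABC";
        System.out.println(substringByCodeUnits(s, 3, 1));  // prints A
        System.out.println(substringByCodePoints(s, 3, 1)); // prints B
    }
}
```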

A similar issue existed in string-length() and was fixed in JDK-8032909.
On a JDK that includes that fix, string-length('&#131083;') correctly
returns 1, the length of the supplementary character; before the fix, it
wrongly returned 2.
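
The two string-length() results correspond to two ways of measuring a Java 
String. The sketch below illustrates this with standard String methods only; 
it assumes nothing about how the XSLT processor computes the length, and the 
class and method names are hypothetical.

```java
public class LengthDemo {
    // Hypothetical helper: length in Unicode code points.
    static int lengthInCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x2000B)); // U+2000B only
        System.out.println(s.length());            // prints 2 (UTF-16 code units)
        System.out.println(lengthInCodePoints(s)); // prints 1 (code points)
    }
}
```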