(1) In CANON_EQ mode, a "Character Classes" pattern that contains only composite characters throws an exception. The example below shows the problem.

    import java.util.regex.*;

    public class RegTest {
        public static void main(String args[]) {
            CharSequence inputStr = "ab\u1f82cd";
            String patternStr = "[\u1f80\u1f82]";
            Pattern pattern = Pattern.compile(patternStr, Pattern.CANON_EQ);
            Matcher matcher = pattern.matcher(inputStr);
            boolean matchFound = matcher.find();
            if (matchFound) {
                System.out.println("<" + Integer.toString(matcher.start()) + ","
                        + Integer.toString(matcher.end()) + "> ");
            }
        }
    }

(2) Replacing the pattern with String patternStr = "\u1f80\u1f82"; also throws an exception.

(3) Pattern "[\u1f80-\u1f82]" does not match the input string "ab\u1f81cd" in CANON_EQ mode, though it does match the characters \u1f80 and \u1f82. In CANON_EQ mode we need to iterate over all characters in the "Range" and list all of their "EquivalentAlternation" forms.

    import java.util.regex.*;

    public class RegTest {
        public static void main(String args[]) {
            CharSequence inputStr = "ab\u1f81cd";
            String patternStr = "[\u1f80-\u1f82]";
            Pattern pattern = Pattern.compile(patternStr, Pattern.CANON_EQ);
            Matcher matcher = pattern.matcher(inputStr);
            boolean matchFound = matcher.find();
            if (matchFound) {
                System.out.println("<" + Integer.toString(matcher.start()) + ","
                        + Integer.toString(matcher.end()) + "> ");
            } else {
                System.out.println("No Match");
            }
        }
    }

(4) Though not critical, produceEquivalentAlternation() seems to create some redundant patterns when dealing with multiple combining characters in CANON_EQ mode. For example, pattern "\u1f80" will create

    (?: 0x3b1 0x313 0x345 | 0x1f00 0x345 | 0x1f80 | 0x3b1 0x345 0x313 | 0x1fb3 0x313 | 0x1f80)

and "\u1f82" will create

    (?: 0x3b1 0x313 0x300 0x345 | 0x1f00 0x300 0x345 | 0x1f02 0x345 | 0x1f82
      | 0x1f00 0x345 0x300 | 0x1f80 0x300 | 0x1f82
      | 0x3b1 0x313 0x345 0x300 | 0x1f00 0x345 0x300 | 0x1f80 0x300 | 0x1f82
      | 0x1f00 0x300 0x345 | 0x1f02 0x345 | 0x1f82
      | 0x3b1 0x345 0x313 0x300 | 0x1fb3 0x313 0x300 | 0x1f80 0x300 | 0x1f82)

(Spaces have been added between the hexadecimal numbers for readability.)
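
The range iteration suggested in (3) and the duplicate removal hinted at in (4) can be illustrated with a rough sketch. This is not the Harmony implementation: expandRange and findSpan are hypothetical helper names, and the sketch assumes that java.text.Normalizer is available. It expands every code point in the range into its original form plus its fully decomposed (NFD) form only, so partially composed sequences such as 0x1f00 0x345 are not covered. A LinkedHashSet drops repeated alternatives while keeping their order.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeExpansionSketch {

    // Hypothetical helper: builds a non-capturing alternation that contains,
    // for every code point in [first, last], the character itself and its
    // NFD form. The set silently discards alternatives that repeat.
    static String expandRange(int first, int last) {
        Set<String> alternatives = new LinkedHashSet<>();
        for (int cp = first; cp <= last; cp++) {
            String original = new String(Character.toChars(cp));
            alternatives.add(original);
            alternatives.add(Normalizer.normalize(original, Normalizer.Form.NFD));
        }
        StringBuilder regex = new StringBuilder("(?:");
        String separator = "";
        for (String alternative : alternatives) {
            // Quote each alternative so that it is matched literally.
            regex.append(separator).append(Pattern.quote(alternative));
            separator = "|";
        }
        return regex.append(')').toString();
    }

    // Hypothetical helper: reports the first match in the same <start,end>
    // style as the test programs in this report. CANON_EQ is deliberately
    // not used here; the expanded alternation does the work instead.
    static String findSpan(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find()
                ? "<" + matcher.start() + "," + matcher.end() + ">"
                : "No Match";
    }

    public static void main(String[] args) {
        String regex = expandRange(0x1f80, 0x1f82);
        String composed = "ab\u1f81cd";
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(findSpan(regex, composed));   // prints <2,3>
        System.out.println(findSpan(regex, decomposed)); // prints <2,5>
    }
}
```

A complete fix would also have to generate the intermediate, partially composed forms for each character in the range, which is what produceEquivalentAlternation() already does for a single character.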