A DESCRIPTION OF THE PROBLEM :
Hi, the operator '^' (negation in a character class) does not seem to work.
I provide a source code example whose behavior is totally different in Java 8 and Java 11.
REGRESSION : Last worked in version 8u191
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Encoding : UTF-8
The output is ooerdqK$Fop22{78ae������������
ACTUAL -
Encoding : UTF-8
The output is ooerdqKFop22{78ae
---------- BEGIN SOURCE ----------
import java.text.Normalizer;
import java.util.regex.Pattern;
/**
*
* @author Andres Bel Alonso
*/
public class BugExample {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// UTF-8 encoding is required; print the active encoding to verify it
System.out.println("Encoding : " + System.getProperty("file.encoding"));
// I want to change this input in order to delete the non-ASCII characters and non-combining diacritical marks
// but keep ������ and $
String input = "oo����er����������dqK$F����o����p����2������2����{78a������������e������������";
String str = Normalizer.normalize(input, Normalizer.Form.NFD);
Pattern pattern = Pattern.compile("[^\\p{ASCII}&&[^\\p{InCombiningDiacriticalMarks}]&&[^������$]]");
// Build the cleaned string
String out = pattern.matcher(str).replaceAll("");
// Java 8 output : ooerdqK$Fop22{78ae������������
// Java 11 output : ooerdqKFop22{78ae
// The Java 11 output is not what I expect because it also removes the characters I wanted to keep. The Java 8 output is OK
System.out.println("The output is " + out);
// Finally, using the regex [\\P{ASCII}&&[\\P{InCombiningDiacriticalMarks}]&&[^������$]] works correctly in Java 11
}
}
---------- END SOURCE ----------
FREQUENCY : always
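WORKAROUND SKETCH :
A minimal, self-contained sketch of the workaround mentioned in the last source comment, which writes each negated property as \P{...} instead of relying on a leading '^' in front of the intersection. The characters to keep did not survive encoding in this report, so U+20AC (euro sign) is used here as a hypothetical stand-in. The class name, the clean method and the sample input are also illustrative assumptions, not part of the original test case.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class NegationWorkaroundSketch {
    // Matches characters that are non-ASCII AND not combining diacritical marks
    // AND not in the keep list. U+20AC is a hypothetical stand-in for the
    // keep-list characters that are unreadable in the report.
    private static final Pattern REMOVE = Pattern.compile(
            "[\\P{ASCII}&&[\\P{InCombiningDiacriticalMarks}]&&[^\u20AC$]]");

    static String clean(String input) {
        // Decompose first, as in the original example, then delete the matches.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return REMOVE.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // U+00E9 decomposes to 'e' + U+0301 (a combining mark, so not matched),
        // U+4E2D is matched and removed, U+20AC and '$' are kept.
        String out = clean("a\u00E9\u4E2D\u20AC$b");
        System.out.println(out.equals("ae\u0301\u20AC$b"));
    }
}
```

Because every negation is local to one operand of the intersection, this form does not depend on how a leading '^' is scoped across '&&' and nested classes, which appears to be the point where the two Java versions disagree.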