Relates :
|
JDK-8216391 :
|
> >> I have been looking into the definition of [character set]
> >> expressions in Java regular expressions, to understand what needs to
> >> be done to make ICU compatible, or at least more compatible.
> >>
> >> There does not appear to be any formal definition of [set
> >> expressions], or at least none that I can find.
> >>
> >> In my tests, one aspect of the behavior seems really odd. It would
> >> be good if we could find out from Sun whether it was really intended
> >> to work the way that it does.
> >>
> >> The question concerns the negation of a set, for example
> >> [^0-9], to get everything except the ASCII digits.
> >>
> >> In Java, the negation does _not_ apply to anything appearing in
> >> nested [brackets].
> >>
> >> So [^c] does not match "c", as you would expect.
> >> [^[c]] does match "c". Not what I would expect.
> >> [[^c]] does not match "c".
> >>
> >> The same holds true for ranges or property expressions: if they're
> >> inside brackets, a negation at an outer level does not affect them.
> >>
> >> [^a-z] is the opposite of [^[a-z]].
> >>
> >> And the same seems to hold for set expressions with &&, although the
> >> cases become hard to understand.
> >>
> >> Perl and POSIX behavior don't provide any guidance here, as they do
> >> not support nested brackets at all: a '[' is not special within a
> >> set, and just becomes yet another member of the set.
|
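The cases listed in the message above can be checked with a small program. This is only a sketch for reproducing the report, not part of the original message: the class name `NestedNegationDemo` and the helper `matchesWhole` are made up for illustration, and the only library call used is `java.util.regex.Pattern.matches`. No expected output is given for the nested forms such as `[^[c]]`, since the result may depend on the Java release being run.

```java
import java.util.regex.Pattern;

public class NestedNegationDemo {
    // Hypothetical helper: true when the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Patterns taken from the report; each is tested against "c".
        String[] patterns = {"[^c]", "[^[c]]", "[[^c]]", "[^a-z]", "[^[a-z]]"};
        for (String p : patterns) {
            System.out.println(p + " matches \"c\": " + matchesWhole(p, "c"));
        }
    }
}
```

Comparing the line printed for `[^c]` with the one for `[^[c]]`, and `[^a-z]` with `[^[a-z]]`, shows whether the outer negation reaches into the nested brackets on a given runtime.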