Bug ID: JDK-8046101 JEP 111: Additional Unicode Constructs for Regular Expressions

Summary
-------

Adopt further regular-expression constructs from [Unicode TR#18][tr18].


Motivation
----------

The primary motivation is to enrich Unicode support so that developers can
write sophisticated Unicode-enabled regular expressions on the Java platform.
This is important to keep the Java platform competitive with other languages
that already offer more complete support for Unicode regular expressions.


Description
-----------

Java Regular Expressions are derived from Perl Regular Expressions and are
intended to provide Java developers with most of the Perl-style regular
expression features.  Perl Regular Expressions have evolved rapidly in the past
few years to follow
[Unicode Standard TR#18 Unicode Regular Expressions][tr18].  Java Regular
Expressions claim conformance with Level 1 of the same TR#18, plus RL2.1
Canonical Equivalents; Level 1 is the "lowest" level of conformance.  Given
that the Unicode Standard has been widely accepted as the de facto standard for
development platforms, and that Java uses Unicode as its internal encoding
scheme, higher-level Unicode support is desirable for developers working on
Unicode-aware applications.  The following new constructs and features are
proposed to provide better Unicode support in Java Regular Expressions:

  - \\N{...} -- Unicode Name Properties
  - \\X -- Extended Grapheme Clusters
  - Fix the broken Canonical Equivalent support
  - \\R -- Unicode line-break sequence, as suggested in TR#18 Line Boundaries
  - \\g{...} -- Perl-style construct for named and numbered capturing groups
  - More complete Unicode properties, as in \\p{IsXXXX}
  - \\h \\H \\v \\V -- Horizontal/vertical whitespace
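
As an illustration of how some of the constructs listed above might be used,
here is a minimal sketch.  It assumes a JDK release in which they are
implemented in `java.util.regex.Pattern` (JDK 9 or later for \\X and
\\N{...}).  The class name `UnicodeRegexSketch` and its helper methods are
hypothetical and exist only for this example; \\g{...} and the Canonical
Equivalent fix are not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK or of this proposal.
final class UnicodeRegexSketch {

    // Splits text on \R, which treats CR LF as a single line break.
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    // Counts extended grapheme clusters by matching \X repeatedly.
    static int countGraphemes(String text) {
        Matcher m = Pattern.compile("\\X").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if the whole text is the single character named by unicodeName.
    static boolean isNamedChar(String text, String unicodeName) {
        return text.matches("\\N{" + unicodeName + "}");
    }

    public static void main(String[] args) {
        // Three lines, even though the first separator is two chars long.
        System.out.println(splitLines("a\r\nb\nc").length);

        // "e" followed by combining acute accent (U+0301) is one cluster,
        // so together with "x" there are two clusters.
        System.out.println(countGraphemes("e\u0301x"));

        // Named character construct.
        System.out.println(
            isNamedChar("\u00E9", "LATIN SMALL LETTER E WITH ACUTE"));

        // Horizontal and vertical whitespace classes.
        System.out.println("a\tb".matches("a\\hb"));
        System.out.println("\n".matches("\\v"));

        // Binary Unicode property via the Is prefix.
        System.out.println("\u00E9".matches("\\p{IsAlphabetic}"));
    }
}
```

Under these assumptions, the example prints `3`, `2`, and then `true` four
times.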


Testing
-------

All the features (new regex constructs) listed here will be covered by new
unit tests that run in the existing test framework.


[tr18]: http://unicode.org/reports/tr18

Relates :	JDK-7014640 - To add a metachar \R for line ending and character classes for vertical/horizontal ws \v \V \h \H
Relates :	JDK-8147531 - To add named character construct \N{...} to support Unicode name property
Relates :	JDK-7071819 - To support Extended Grapheme Clusters in Regex