JDK-8354659 : PathMatcher doesn't match path with specific unicode symbol
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.regex
  • Affected Version: 11,17,24,25
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • OS: os_x
  • CPU: generic
  • Submitted: 2025-04-14
  • Updated: 2025-05-21
  • Resolved: 2025-05-21
Description
ADDITIONAL SYSTEM INFORMATION :
macOS Monterey

A DESCRIPTION OF THE PROBLEM :
Java's glob PathMatcher does not match file names that contain certain Unicode symbols.

Note that the same call returns true on Windows and Linux.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :

---------- BEGIN SOURCE ----------
$ jshell
|  Welcome to JShell -- Version 24
|  For an introduction type: /help intro

jshell>  var matcher = java.nio.file.FileSystems.getDefault().getPathMatcher("glob:*.*")
matcher ==> sun.nio.fs.UnixFileSystem$1@2328c243

jshell>  matcher.matches(new java.io.File("Article.md").toPath())
$2 ==> true

jshell>  matcher.matches(new java.io.File("🗞️ Article.md").toPath())
$3 ==> false
---------- END SOURCE ----------

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
matcher.matches(new java.io.File("🗞️ Article.md").toPath()) returns true

ACTUAL -
matcher.matches(new java.io.File("🗞️ Article.md").toPath()) returns false
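
For reference, the jshell session above can also be written as a standalone program. This is a minimal sketch and not part of the original report: the class name GlobRepro and the helper matches are hypothetical, and the emoji is assumed to be U+1F5DE followed by U+FE0F, written as escapes so that the source file encoding does not matter.

```java
import java.nio.file.FileSystems;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

public class GlobRepro {
    /** Returns whether fileName matches the glob on the default file system. */
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Path.of(fileName));
    }

    public static void main(String[] args) {
        String plain = "Article.md";
        // Assumption: U+1F5DE (surrogate pair) followed by U+FE0F, then the plain name
        String withEmoji = "\uD83D\uDDDE\uFE0F Article.md";

        System.out.println(plain + " -> " + matches("*.*", plain));
        try {
            System.out.println(withEmoji + " -> " + matches("*.*", withEmoji));
        } catch (InvalidPathException e) {
            // Some platform encodings cannot represent the name as a path at all
            System.out.println("cannot build path: " + e.getMessage());
        }
    }
}
```

According to the report, the second line prints false on macOS and true on Windows and Linux, so no single expected output is given here.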



Comments
The problem as I see it is that for some reason the 🗞️ glyph in this example has a variation selector included with it. This doesn't seem to change the appearance of the glyph, but it does make it incompatible with the regular expression matcher as it turns it into an extended grapheme cluster instead of a regular grapheme. The character classes generated from the glob:*.* pattern do not support grapheme clusters. You can observe this with:

jshell> var matcher = java.nio.file.FileSystems.getDefault().getPathMatcher("glob:*.*")
jshell> String s = "🗞️"
s ==> "🗞️"
jshell> s.length()
$35 ==> 3
jshell> s.substring(0,2);
$36 ==> "🗞"
jshell> var file2 = new java.io.File("🗞 Article.md");
file2 ==> 🗞 Article.md
jshell> matcher.matches(file2.toPath());
$38 ==> true
21-05-2025
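
The variation selector described in the comment above can be made visible by listing the code points of the string. The following is a minimal sketch, assuming the glyph is U+1F5DE followed by U+FE0F (variation selector-16), which is consistent with the length of 3 reported by jshell. The class and method names are illustrative only.

```java
import java.util.stream.Collectors;

public class VariationSelectorDemo {
    // Assumption: U+1F5DE (as a surrogate pair) followed by U+FE0F
    static final String WITH_SELECTOR = "\uD83D\uDDDE\uFE0F";

    /** Formats each code point of s as U+XXXX, separated by spaces. */
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    /** Returns s with every U+FE0F removed. */
    static String stripVariationSelector16(String s) {
        return s.replace("\uFE0F", "");
    }

    public static void main(String[] args) {
        // 3 UTF-16 code units: two for the surrogate pair, one for the selector
        System.out.println(WITH_SELECTOR.length());
        // prints "U+1F5DE U+FE0F"
        System.out.println(codePoints(WITH_SELECTOR));
        // prints "U+1F5DE"
        System.out.println(codePoints(stripVariationSelector16(WITH_SELECTOR)));
    }
}
```

Stripping the selector yields the same two-unit string as s.substring(0,2) in the session above, which is the form the comment reports as matching.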

Additional Information from submitter:
======================================
Another issue has been made from the same incident: https://bugs.openjdk.org/browse/JDK-8354490
16-04-2025

The observations on macOS:
JDK 11: Failed, matcher.matches(new java.io.File("🗞️ Article.md").toPath()) returns false
JDK 17: Failed.
JDK 24: Failed.
JDK 25ea+13: Failed.
15-04-2025