JDK-6792400 : Avoid loading of Normalizer resources for simple uses
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.text
  • Affected Version: 6
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2009-01-11
  • Updated: 2010-07-29
  • Resolved: 2009-02-13
Fix versions: 6u14 b02 (Fixed), 7 (Fixed)
Using Normalizer.normalize() on simple ASCII strings should not trigger loading and parsing of unorm.icu. This file is over 100k and its initialization is not cheap. Loading it also requires opening the resource.jar file and reading its directory.

In many situations Normalizer is used "to be safe" even though it has no effect on the result of the program. For example, Normalizer is used to get canonical names of certificates, but many names are ASCII and do not need to be normalized.
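
The point about ASCII names can be checked directly against the public java.text.Normalizer API. The sketch below is only an illustration: the class name, the helper method, and the sample distinguished names are made up for this example and do not come from the certificate code.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of the JDK.
class AsciiNormalizationDemo {

    /** Returns true if none of the four normalization forms changes s. */
    static boolean unchangedByAllForms(String s) {
        for (Normalizer.Form form : Normalizer.Form.values()) {
            if (!Normalizer.normalize(s, form).equals(s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // ASCII-only name: no form (NFC, NFD, NFKC, NFKD) alters it.
        System.out.println(unchangedByAllForms("CN=Example, O=Acme, C=US"));
        // Name with a precomposed e-acute: NFD decomposes it, so it changes.
        System.out.println(unchangedByAllForms("CN=Jos\u00e9"));
    }
}
```

For the ASCII name the full normalization machinery is exercised only to return an equal string, which is the wasted work this report is about.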

Unfortunately, some of these uses of Normalizer happen at application startup. For example, the Web Start code verifies certificates and thereby triggers full initialization of Normalizer.

It may be possible to do a simple check of whether a normalization request is trivial before performing full normalization for the first time. See the suggested fix for a possible solution.

EVALUATION Changed NormalizerBase.java to handle ASCII-only text as a special case, regardless of whether initialization is complete. Performance tests show that processing ASCII-only text is now 20 times faster, while the overhead for non-ASCII text is negligible.
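
A minimal sketch of an ASCII-only special case that does not depend on initialization state is shown here as a caller-side wrapper. It is not the actual NormalizerBase change; the class AsciiFastPath and its methods are hypothetical names used only for illustration.

```java
import java.text.Normalizer;

// Hypothetical wrapper, not part of the JDK.
final class AsciiFastPath {

    /** Returns true if every char in s is in the ASCII range 0..127. */
    static boolean isAscii(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    /**
     * ASCII-only text is already in every normalization form, so it is
     * returned as is; anything else goes through the full Normalizer.
     */
    static String normalize(String s, Normalizer.Form form) {
        return isAscii(s) ? s : Normalizer.normalize(s, form);
    }

    public static void main(String[] args) {
        String ascii = "CN=Example, O=Acme, C=US";
        // Fast path: the very same String instance comes back.
        System.out.println(normalize(ascii, Normalizer.Form.NFC) == ascii);
        // Slow path: 'e' + combining acute composes to U+00E9 under NFC.
        System.out.println(normalize("e\u0301", Normalizer.Form.NFC).equals("\u00e9"));
    }
}
```

Because the check is done on every call and keeps no shared state, it behaves the same before and after the normalization data has been loaded, at the cost of one linear scan per string.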

SUGGESTED FIX
diff -r e811a72bfbf4 addon/sun/text/normalizer/NormalizerBase.java
--- a/addon/sun/text/normalizer/NormalizerBase.java Sun Jan 11 20:32:53 2009 +0300
+++ b/addon/sun/text/normalizer/NormalizerBase.java Sun Jan 11 21:19:47 2009 +0300
@@ -699,11 +699,26 @@ public final class NormalizerBase implem
      * @param options The normalization options, ORed together (0 for no options).
      * @return String The decomposed string
      * @stable ICU 2.6
-     */
+     */
+    private static boolean stillLazy = true;
+
     public static String decompose(String str, boolean compat, int options) {
         int[] trailCC = new int[1];
         int destSize=0;
+
+        if (stillLazy) {
+            char c[] = str.toCharArray();
+            for(int i=0;i<c.length;i++) {
+                if (c[i] > 127) {
+                    stillLazy = false;
+                    break;
+                }
+            }
+            if (stillLazy)
+                return str;
+        }
+
         UnicodeSet nx = NormalizerImpl.getNX(options);
         char[] dest;