JDK-6447475 : charsets.jar is at least 1.1MB bigger than it should be.
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 6
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2006-07-11
  • Updated: 2010-08-19
  • Resolved: 2006-07-22
  • Fixed in: JDK 6, build b93
Description
A set of double-byte charsets in the ExtendedCharsets package (sun.nio.cs.ext) share
the following implementation model:

public class CharsetXYZ extends Charset {
...
    public String getDecoderIndex2() {
        return Decoder.index2;
    }
    public String getEncoderIndex2() {
        return Encoder.index2;
    }
    private static class Decoder extends XYZDecoder {
        private final static String index2 = "HUGE STRING CONSTANT 1";
        ...
    }
    private static class Encoder extends XYZEncoder {
        private final static String index2 = "HUGE STRING CONSTANT 2";
        ...
    }
...
}

getDecoderIndex2() and getEncoderIndex2() are utility methods that share the huge
String data with the corresponding converter implementations in the sun.io package.
They are meant to save space, both at runtime and in static storage (the jar file),
by letting the two implementations (sun.nio.cs.ext and sun.io) use the same data.
However, this implementation model has a loophole that completely defeats that goal.
Because De/Encoder.index2 is a "static final" String initialized with a literal, it
is a compile-time constant, so javac copies its value into CharsetXYZ.class instead
of emitting a reference to De/Encoder.index2. As a result, the supposedly lightweight
CharsetXYZ.class becomes unreasonably large: it carries its own copies of both huge
De/Encoder.index2 strings.
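
The copying described above can be observed indirectly at runtime. The following is a minimal sketch, not the actual sun.nio.cs.ext code: the class ConstantInlineDemo, the INITIALIZED list, and the short placeholder string are all hypothetical stand-ins. It relies on the language rule that reading a compile-time constant does not trigger initialization of the class that declares it, which is only possible because the value has been copied into the reading class.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantInlineDemo {
    // Names of holder classes whose static initializers have run.
    static final List<String> INITIALIZED = new ArrayList<>();

    // Hypothetical stand-in for CharsetXYZ.Decoder.
    static class Decoder {
        // "static final" plus a literal initializer makes this a compile-time constant.
        static final String index2 = "HUGE STRING CONSTANT 1";

        static {
            INITIALIZED.add("Decoder");
        }
    }

    // Hypothetical stand-in for CharsetXYZ.getDecoderIndex2().
    // javac compiles this to load the string from this class's own constant pool.
    static String getDecoderIndex2() {
        return Decoder.index2;
    }

    public static void main(String[] args) {
        System.out.println(getDecoderIndex2());
        // Decoder was never initialized, because no reference to it was needed.
        System.out.println("Decoder initialized: " + INITIALIZED.contains("Decoder"));
    }
}
```

Running main should print the string followed by "Decoder initialized: false". Inspecting the compiled outer class with javap -v would be a more direct check, since the string then shows up in its constant pool.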

The charsets in charsets.jar that have an overweight charset class are listed at
the end of this description, together with the size of each class file.

Either removing the keyword "final" from the String constant declaration, or
reorganizing the declaration as

private final static String index2;
static {
    index2 = "HUGE STRING CONSTANT 2";
}

yields a surprising 1.1MB decrease in size out of the 4.6MB charsets.jar.
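
Both variants of the fix can be checked with a small runtime probe. This is a sketch under stated assumptions: ConstantFixDemo, NonFinalHolder, StaticInitHolder, and the INITIALIZED list are hypothetical names, and the string is a short placeholder. In neither variant is the field a compile-time constant, so the reading class has to reference the holder's field, and that reference forces the holder class to be initialized.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantFixDemo {
    // Names of holder classes whose static initializers have run.
    static final List<String> INITIALIZED = new ArrayList<>();

    // Variant 1: the "final" keyword is dropped.
    static class NonFinalHolder {
        static String index2 = "HUGE STRING CONSTANT 2";

        static {
            INITIALIZED.add("NonFinalHolder");
        }
    }

    // Variant 2: still final, but assigned in a static initializer block.
    static class StaticInitHolder {
        static final String index2;

        static {
            index2 = "HUGE STRING CONSTANT 2";
            INITIALIZED.add("StaticInitHolder");
        }
    }

    // Each accessor reads the holder's field, so the holder must be initialized first.
    static String readNonFinal() {
        return NonFinalHolder.index2;
    }

    static String readStaticInit() {
        return StaticInitHolder.index2;
    }

    public static void main(String[] args) {
        System.out.println(readNonFinal());
        System.out.println(readStaticInit());
        System.out.println("Initialized: " + INITIALIZED);
    }
}
```

Running main should end with "Initialized: [NonFinalHolder, StaticInitHolder]", which indicates that the string is fetched from the holder classes rather than duplicated in the outer class.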

   297551  EUC_TW$Decoder.class
   486384  EUC_TW$Encoder.class
   468722  EUC_TW.class

    47562  IBM1381$Decoder.class
    67448  IBM1381$Encoder.class
    80838  IBM1381.class

    27037  IBM1383$Decoder.class
    65805  IBM1383$Encoder.class
    76730  IBM1383.class

    57450  IBM33722$Decoder.class
   113585  IBM33722$Encoder.class
   151495  IBM33722.class

    45223  IBM930$Decoder.class
    77387  IBM930$Encoder.class
    98091  IBM930.class

    91826  IBM933$Decoder.class
   130404  IBM933$Encoder.class
    59788  IBM933.class

    39135  IBM935$Decoder.class
    67545  IBM935$Encoder.class
    82161  IBM935.class

    72513  IBM937$Decoder.class
    85852  IBM937$Encoder.class
   142030  IBM937.class

    45223  IBM939$Decoder.class
    77386  IBM939$Encoder.class
    98090  IBM939.class

    38683  IBM942$Decoder.class
    69778  IBM942$Encoder.class
    82680  IBM942.class
    17292  IBM942C$Encoder.class

    39149  IBM943$Decoder.class
    68398  IBM943$Encoder.class
    90800  IBM943.class
    25827  IBM943C$Encoder.class

    74150  IBM948$Decoder.class
    85592  IBM948$Encoder.class
   141632  IBM948.class

    80623  IBM950$Decoder.class
    85592  IBM950$Encoder.class
   139913  IBM950.class

   110046  IBM964$Decoder.class
   156843  IBM964$Encoder.class
   255762  IBM964.class

    27011  IBM970$Decoder.class
   121104  IBM970$Encoder.class
    77734  IBM970.class

Comments
EVALUATION: Something worth doing in Mustang.
11-07-2006