A set of double-byte charsets in the ExtendedCharsets package (sun.nio.cs.ext) share the
following implementation model:
public class CharsetXYZ extends Charset {
    ...
    public String getDecoderIndex2() {
        return Decoder.index2;
    }
    public String getEncoderIndex2() {
        return Encoder.index2;
    }
    private static class Decoder extends XYZDecoder {
        private final static String index2 = "HUGE STRING CONSTANT 1";
        ...
    }
    private static class Encoder extends XYZEncoder {
        private final static String index2 = "HUGE STRING CONSTANT 2";
        ...
    }
    ...
}
getDecoderIndex2() and getEncoderIndex2() are utility methods used to share
the huge String data with the corresponding converter implementations in the sun.io package.
They are supposed to save space, both at runtime and in static storage (in the jar file),
by letting the two implementations (sun.nio.cs.ext and sun.io) share the same data. However,
the implementation model above has a loophole that completely defeats this expectation. Because
De/Encoder.index2 is a "final" and "static" String initialized with a constant, javac copies
its value into CharsetXYZ.class instead of emitting a reference to De/Encoder.index2.
As a result, the supposedly lightweight CharsetXYZ.class becomes unreasonably huge,
since it carries its own copies of both huge De/Encoder.index2 strings.
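The copying described above can be observed without inspecting class files. The following is only a minimal sketch: InlinedConstantDemo, its nested Decoder and the holderInitialized flag are hypothetical stand-ins, not the real sun.nio.cs.ext classes. It relies on the language rule that reading a compile-time constant field does not initialize the declaring class, because the compiler has already copied the value into the caller.

```java
public class InlinedConstantDemo {
    // Becomes true only if the nested Decoder class is ever initialized.
    public static boolean holderInitialized = false;

    // Hypothetical stand-in for CharsetXYZ.Decoder and its huge table.
    static class Decoder {
        final static String index2 = "HUGE STRING CONSTANT 1";
        static { holderInitialized = true; }
    }

    // Mirrors getDecoderIndex2(): because index2 is a constant variable,
    // the compiler stores a copy of the string in this class's constant
    // pool and never touches the Decoder class at runtime.
    public static String getDecoderIndex2() {
        return Decoder.index2;
    }

    public static void main(String[] args) {
        getDecoderIndex2();
        // Prints "Decoder initialized: false"
        System.out.println("Decoder initialized: " + holderInitialized);
    }
}
```

The flag stays false because the outer class never reads the field at all, which is the same reason the string ends up stored a second time in the outer class file.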
The charsets in charsets.jar that have an overweight charset class are listed
at the end of this report (class file sizes in bytes).
Either removing the keyword "final" from the String constant declaration or
reorganizing the declaration as
    private final static String index2;
    static {
        index2 = "HUGE STRING CONSTANT 2";
    }
yields a surprising 1.1MB decrease in size out of the 4.6MB charsets.jar.
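Both variants of the fix can be checked with a small sketch. The class and member names below (SharedReferenceDemo, NonFinalEncoder, BlankFinalEncoder and the two flags) are hypothetical and chosen only for illustration. In either variant the field is no longer a compile-time constant, so the outer class has to read it from the nested class at runtime, which shows up as the nested class being initialized on first access.

```java
public class SharedReferenceDemo {
    // Each flag becomes true when the corresponding nested class is initialized.
    public static boolean nonFinalInitialized = false;
    public static boolean blankFinalInitialized = false;

    // Variant 1: "final" removed, so index2 is an ordinary static field.
    static class NonFinalEncoder {
        static String index2 = "HUGE STRING CONSTANT 2";
        static { nonFinalInitialized = true; }
    }

    // Variant 2: "final" kept, but the value is assigned in a static
    // initializer, so index2 is not a constant variable.
    static class BlankFinalEncoder {
        final static String index2;
        static {
            index2 = "HUGE STRING CONSTANT 2";
            blankFinalInitialized = true;
        }
    }

    // Both accessors compile to a real field read on the nested class.
    public static String readNonFinal() { return NonFinalEncoder.index2; }
    public static String readBlankFinal() { return BlankFinalEncoder.index2; }

    public static void main(String[] args) {
        readNonFinal();
        readBlankFinal();
        // Prints "true true"
        System.out.println(nonFinalInitialized + " " + blankFinalInitialized);
    }
}
```

Variant 2 keeps the field immutable, which may make it the safer choice if other code relies on index2 never changing.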
297551 EUC_TW$Decoder.class
486384 EUC_TW$Encoder.class
468722 EUC_TW.class
47562 IBM1381$Decoder.class
67448 IBM1381$Encoder.class
80838 IBM1381.class
27037 IBM1383$Decoder.class
65805 IBM1383$Encoder.class
76730 IBM1383.class
57450 IBM33722$Decoder.class
113585 IBM33722$Encoder.class
151495 IBM33722.class
45223 IBM930$Decoder.class
77387 IBM930$Encoder.class
98091 IBM930.class
91826 IBM933$Decoder.class
130404 IBM933$Encoder.class
59788 IBM933.class
39135 IBM935$Decoder.class
67545 IBM935$Encoder.class
82161 IBM935.class
72513 IBM937$Decoder.class
85852 IBM937$Encoder.class
142030 IBM937.class
45223 IBM939$Decoder.class
77386 IBM939$Encoder.class
98090 IBM939.class
38683 IBM942$Decoder.class
69778 IBM942$Encoder.class
82680 IBM942.class
17292 IBM942C$Encoder.class
39149 IBM943$Decoder.class
68398 IBM943$Encoder.class
90800 IBM943.class
25827 IBM943C$Encoder.class
74150 IBM948$Decoder.class
85592 IBM948$Encoder.class
141632 IBM948.class
80623 IBM950$Decoder.class
85592 IBM950$Encoder.class
139913 IBM950.class
110046 IBM964$Decoder.class
156843 IBM964$Encoder.class
255762 IBM964.class
27011 IBM970$Decoder.class
121104 IBM970$Encoder.class
77734 IBM970.class