While debugging a problem recently, I unjarred rt.jar, copied over a few classes, and rejarred it with:
jar cv0fm ../rt-new.jar META-INF/MANIFEST.MF com java javax ...
Doing this, I found that the CPU stayed at 100% for quite some time (a minute or more on my laptop) before jar started hitting the disk to create the archive. The behavior is nearly identical on Windows and Solaris. On Solaris, a couple of thread dumps taken with JDK 1.4.2 while the CPU is busy look like this:
"main" prio=5 tid=0x00032e98 nid=0x1 runnable [0xffbfd000..0xffbfe510]
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
at java.io.File.isFile(File.java:725)
at sun.tools.jar.Main.expand(Main.java:365)
at sun.tools.jar.Main.expand(Main.java:376)
at sun.tools.jar.Main.expand(Main.java:376)
at sun.tools.jar.Main.expand(Main.java:346)
at sun.tools.jar.Main.run(Main.java:143)
- locked <0xf1ff0e78> (a sun.tools.jar.Main)
at sun.tools.jar.Main.main(Main.java:904)
and
"main" prio=5 tid=0x00032e98 nid=0x1 runnable [0xffbfd000..0xffbfe510]
at java.io.UnixFileSystem.compare(UnixFileSystem.java:290)
at java.io.File.compareTo(File.java:1463)
at java.io.File.equals(File.java:1509)
at java.util.Hashtable.contains(Hashtable.java:274)
- locked <0xf1ff0ed8> (a java.util.Hashtable)
at sun.tools.jar.Main.expand(Main.java:366)
at sun.tools.jar.Main.expand(Main.java:376)
at sun.tools.jar.Main.expand(Main.java:376)
at sun.tools.jar.Main.expand(Main.java:376)
at sun.tools.jar.Main.expand(Main.java:376)
at sun.tools.jar.Main.expand(Main.java:376)
at sun.tools.jar.Main.expand(Main.java:346)
at sun.tools.jar.Main.run(Main.java:143)
- locked <0xf1ff0e78> (a sun.tools.jar.Main)
at sun.tools.jar.Main.main(Main.java:904)
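Read together, the two traces suggest a single recursive walk: expand() calls File.isFile() on each entry (Main.java:365, one stat() per entry), then checks Hashtable.contains (Main.java:366), and recurses into subdirectories (Main.java:376). The sketch below is a hypothetical reconstruction of that call pattern, inferred only from the frame order. The class ExpandSketch, its fields, and the choice of the path as the table key are illustrative assumptions, not the actual sun.tools.jar.Main source.

```java
import java.io.File;
import java.util.Hashtable;

// HYPOTHETICAL reconstruction of the call pattern in the thread dumps above.
// This is NOT the real sun.tools.jar.Main code; names and keys are assumed.
final class ExpandSketch {
    // Assumed: entry path -> File, used to avoid adding a file twice.
    private final Hashtable<String, File> entries = new Hashtable<>();

    void expand(File dir, String[] names) {
        for (String name : names) {
            File f = (dir == null) ? new File(name) : new File(dir, name);
            if (f.isFile()) {                   // one stat() per entry (first trace)
                // contains(value) scans every stored value (second trace);
                // entries.containsKey(f.getPath()) would be a single hash lookup.
                if (!entries.contains(f)) {
                    entries.put(f.getPath(), f);
                }
            } else if (f.isDirectory()) {
                String[] children = f.list();
                if (children != null) {
                    expand(f, children);        // recursion, as in the repeated frames
                }
            }
        }
    }

    int size() {
        return entries.size();
    }

    // Usage: java ExpandSketch com java javax
    public static void main(String[] args) {
        ExpandSketch s = new ExpandSketch();
        s.expand(null, args);
        System.out.println(s.size() + " files collected");
    }
}
```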
It seems to me that the stat() calls and hash lookups occurring here take longer than necessary. I would expect jar to behave much more like tar, which starts hitting the disk and writing the archive almost immediately. The second trace is suggestive: java.util.Hashtable.contains(Object) tests whether a value (not a key) is present, which requires scanning every entry in the table, so if expand() makes that check once per file, the total work grows quadratically with the number of files, and rt.jar contains thousands of classes. Given that we have found inefficient algorithms in portions of the platform before, I think this area of code warrants further investigation. The prolonged high CPU usage has been present since 1.4.2 and also appears in 5.0 and 6.
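To make the suspected cost concrete, the hypothetical reduction below counts equals() calls while de-duplicating n distinct items two ways: checking Hashtable.contains(value) before each put, and adding to a HashSet. Item is an assumed stand-in for java.io.File whose equals() increments a counter; nothing here is taken from the jar tool itself. With the value scan, each check compares against every entry already stored, so the count is n(n-1)/2; with HashSet, equals() is only consulted on hash collisions.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;

// HYPOTHETICAL reduction, not jar code: compares the number of equals() calls
// made by value-scan de-duplication versus hash-based de-duplication.
final class DedupCost {
    static long equalsCalls = 0;

    // Assumed stand-in for java.io.File; equals() counts its own invocations.
    static final class Item {
        final int id;
        Item(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Item && ((Item) o).id == id;
        }
        @Override public int hashCode() { return id; }
    }

    // Value scan: Hashtable.contains(value) visits every stored entry.
    static long hashtableEqualsCalls(int n) {
        equalsCalls = 0;
        Hashtable<Integer, Item> table = new Hashtable<>();
        for (int i = 0; i < n; i++) {
            Item item = new Item(i);
            if (!table.contains(item)) {
                table.put(i, item);
            }
        }
        return equalsCalls;
    }

    // Hash lookup: HashSet.add() calls equals() only when hashes collide.
    static long hashSetEqualsCalls(int n) {
        equalsCalls = 0;
        Set<Item> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new Item(i));
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("Hashtable.contains: " + hashtableEqualsCalls(n) + " equals() calls");
        System.out.println("HashSet.add:        " + hashSetEqualsCalls(n) + " equals() calls");
    }
}
```

If the jar tool does something similar, replacing the value scan with a Set (or a containsKey lookup) would be one candidate fix, but that is an assumption to be confirmed against the actual source.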