Operating System: Windows 2003, 64-bit
OS version: 2003, 64-bit
Product Name: Java 2SE (Java 5)
Product version: 1.5.0_03 through 1.5.0_06
Hardware platform: Windows, Intel
Severity : 1
Short description : Poor garbage collection performance
Full problem description:
When using large Java heaps (1GB, 2GB, 4GB, 16GB, and 28GB), garbage collection performs poorly. CMS parallel rescans range in length from 2 to 75 minutes and can choke the entire server.
Hardware being used is:
Dell PowerEdge 6800 with 32GB physical memory and 4 HT-Xeon processors (8 logical processors) running on Windows 2003 Advanced Server 64-bit edition. We have run the tests using Java 1.5.0_03 through 1.5.0_06 with similar results.
Incremental CMS does not help. We used the recommendations from the Garbage Collection Tuning documentation at http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html. When using incremental CMS, GCs happen periodically; however, long full GCs are triggered very early on:
Stock generation sizes yield unacceptable results (long ParNew collections of ~1 to ~4 seconds every second), and all live objects are immediately tenured due to tiny 64K survivor pools. The system spends > 20% of processing time in young collections. Also, the longer the server runs, the longer the CMS GCs take.
The system uses a large amount of softly and weakly referenced data in addition to strongly referenced data.
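As a rough way to check the survivor pool sizes and the share of time spent in GC from inside the running JVM, a minimal sketch using the standard java.lang.management API (available since Java 5) might look like the following. The class name GcSnapshot and its helper methods are hypothetical and are not part of our server code. The collection times reported by the JVM are approximate, and for a concurrent collector such as CMS they may not correspond exactly to stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints memory pool sizes and an approximate GC overhead figure.
public class GcSnapshot {

    /** Fraction of uptime spent in GC, given cumulative GC time and uptime in milliseconds. */
    public static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0; // avoid division by zero right after startup
        }
        return (double) gcMillis / uptimeMillis;
    }

    /** Sums the collection time of all collectors; a collector reporting -1 (undefined) is skipped. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // List every memory pool (eden, survivor, old generation, and so on) with its sizes.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            System.out.println(pool.getName()
                    + ": used=" + (usage.getUsed() / 1024) + "K"
                    + ", committed=" + (usage.getCommitted() / 1024) + "K"
                    + ", max=" + usage.getMax());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("Approximate GC fraction of uptime: "
                + gcFraction(totalGcMillis(), uptime));
    }
}
```

Running the same code inside the server process (rather than as a separate program) is what would report that server's pools; pool names vary with the collector in use.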
An average light load for the system results in allocation of 1GB of memory every 8 to 9 seconds, with 1-2% eventually being tenured. A heavy load will be more than double that amount. All of the log files below are from the lightly loaded scenarios with the exception of the 75 minute CMS GC (this was at the full load).
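To put the light-load figures above in terms of old generation growth, the arithmetic can be sketched as follows. With 1GB allocated every 8 to 9 seconds and 1-2% tenured, roughly 1.1 to 2.6 MB per second is promoted. The class name PromotionRate is hypothetical, and the 16GB old generation size used in main is only an assumed value for illustration, not a measured one.

```java
// Hypothetical worked example of the promotion-rate arithmetic.
public class PromotionRate {

    /** Megabytes promoted to the old generation per second. */
    public static double promotedMBPerSecond(double allocatedMB,
                                             double intervalSeconds,
                                             double tenuredFraction) {
        return allocatedMB / intervalSeconds * tenuredFraction;
    }

    /** Seconds until an old generation of the given size fills, assuming nothing is reclaimed. */
    public static double secondsToFill(double oldGenMB, double promotedMBPerSecond) {
        return oldGenMB / promotedMBPerSecond;
    }

    public static void main(String[] args) {
        // Light load: 1GB (1024MB) every 8 to 9 seconds, 1% to 2% eventually tenured.
        double low = promotedMBPerSecond(1024, 9, 0.01);
        double high = promotedMBPerSecond(1024, 8, 0.02);
        System.out.printf("Promotion rate: %.2f to %.2f MB/s%n", low, high);

        // Assumption for illustration only: a 16GB old generation.
        double oldGenMB = 16 * 1024;
        System.out.printf("Minutes to fill %.0f MB: %.0f to %.0f%n",
                oldGenMB,
                secondsToFill(oldGenMB, high) / 60,
                secondsToFill(oldGenMB, low) / 60);
    }
}
```

Under heavy load the allocation rate more than doubles, so the same calculation with a doubled allocatedMB gives a lower bound on the promotion rate in that scenario.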
Additional Information :
1) The application deployed.
- This is a custom multi-tier collaboration engine. It consists of one or more servers and many clients. This issue concerns a two-tier configuration (a server and 150 to 350 users). The system is written in Java 5, uses NIO for network communications and uses an embedded DBJE system for data persistence (DBJE is Sleepycat Software's pure Java Berkeley DB implementation).
2) If this problem is seen frequently.
- Yes. All the time.
3) Is the application crashing or hanging.
- Yes, it is hanging. If we do not throttle the JVM via the ParallelGCThreads flag, the server will hang until the parallel rescan is complete (or power is cycled on the server). The application has not crashed, but it is completely unusable because of the extremely long stop-the-world GCs.
4) It's good that the latest version is being tried. Have you collected the dumps and logs?
- We have approximately 20GB of log files collected. More information can be provided; however, proprietary information would need to be removed. If you can provide me with a list of things you would like to see, we can work to accommodate that request.