JDK-6775807 : G1: Crash in G1 collector
  • Type: Bug
  • Component: hotspot
  • Sub-Component: gc
  • Affected Version: 7
  • Priority: P3
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: windows_vista
  • CPU: x86
  • Submitted: 2008-11-24
  • Updated: 2013-09-18
  • Resolved: 2010-05-05
Description
J2SE Version (please include all output from java -version flag):
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b40)
Java HotSpot(TM) Client VM (build 14.0-b07, mixed mode, sharing)


Does this problem occur on J2SE 1.3, 1.4.x or 1.5?  Yes / No (pick one)
No (bug specific to the new G1 collector)


Operating System Configuration Information (be specific):
Windows Vista SP1 [6.0.6001], Brazilian Portuguese, fully patched


Hardware Configuration Information (be specific):
Dell Precision T3400 (Intel Core2 Quad Q6600; NVidia Quadro FX1700)


Bug Description:
Testing the new "Garbage First" collector, I executed the same benchmark first with the CMS collector (default settings) and then with G1. The CMS test executed successfully, and G1 proceeded quite far into the test but crashed with an access violation, apparently in a Young-GC pause.


Steps to Reproduce (be specific):
The benchmark is a unit test of a large customer application, so it would be difficult, if possible at all, to deliver a reproducible test case. I picked this app to stress-test G1 because it is very memory-intensive. A description of the app follows, in case it helps. I'm also attaching the detailed GC logs for both runs, along with the crash log file produced by HotSpot.

The system calculates the billing for Brazil's largest inter-bank ATM network, from TecBan. The monthly billing job requires processing ~8 million transaction records and produces ~2.5 million result records, all in a single transaction performed by a J2EE 1.4 app running on WebSphere 6.1. (The unit test was performed without any appserver and with Sun's JVMs, however.) This transaction is split into many groups (one per bank+product billed); the largest group reaches ~700,000 records, which are loaded into memory and processed. For these worst-case groups, the system requires ~700 MB of heap to run successfully, without OutOfMemoryErrors or degradation from excessive GC events. The result records are also voluminous, but they are inserted into the database in batches of 1,000 and then discarded (made eligible for GC). So there are two main sources of GC activity:
1) Continuous Young-GC to clean up these batches of output records (and also all the garbage produced by the Oracle 9i JDBC driver, which is a quite messy driver that allocates a ton of trash).
2) After each bank+product group's processing is complete, all its input records are discarded, which makes virtually the entire content of the heap eligible for GC. The program doesn't use explicit System.gc() calls, so in practice this leaves an enormous number of tenured objects in the heap, and the next group may hit the heap limit as it loads and accumulates its own input records in the heap, which induces Full-GC events.
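
The two sources of GC activity described above can be approximated with a small stand-alone program. This is only a sketch of the allocation pattern, not the customer application: the class name, the record classes, the 64-byte payload, the group sizes, and the one-result-per-input ratio are all assumptions made for illustration, and the "batch insert" is simulated by simply dropping the batch list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the allocation pattern described in this report.
// None of these names come from the real application.
class AllocationPatternSketch {
    static final int BATCH_SIZE = 1_000; // batch size taken from the report

    // Stand-in for an input transaction record (payload size is an assumption).
    static final class InputRecord {
        final long id;
        final byte[] payload = new byte[64];
        InputRecord(long id) { this.id = id; }
    }

    // Stand-in for a result record that would be written to the database.
    static final class ResultRecord {
        final long sourceId;
        ResultRecord(long sourceId) { this.sourceId = sourceId; }
    }

    // Processes one bank+product group and returns the number of batches flushed.
    static int processGroup(int groupSize) {
        // Source 2: the whole group is held in memory, so its records tend to
        // be promoted to the old generation while the group is processed.
        List<InputRecord> inputs = new ArrayList<>(groupSize);
        for (int i = 0; i < groupSize; i++) {
            inputs.add(new InputRecord(i));
        }

        // Source 1: short-lived result batches that die young.
        List<ResultRecord> batch = new ArrayList<>(BATCH_SIZE);
        int batches = 0;
        for (InputRecord in : inputs) {
            batch.add(new ResultRecord(in.id));
            if (batch.size() == BATCH_SIZE) {
                // A real run would do a JDBC batch insert here; dropping the
                // list makes the whole batch eligible for a young collection.
                batch = new ArrayList<>(BATCH_SIZE);
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            batches++; // final partial batch
        }
        // On return, 'inputs' becomes unreachable: the group's records are now
        // garbage, with no explicit System.gc() call.
        return batches;
    }

    public static void main(String[] args) {
        int[] groupSizes = {50_000, 200_000, 20_000}; // illustrative sizes only
        for (int size : groupSizes) {
            System.out.println("group of " + size + " -> " + processGroup(size) + " batches");
        }
    }
}
```

Running this with a deliberately small fixed heap and GC logging enabled should show the same broad shape as the report describes (frequent young collections, plus old-generation pressure at group boundaries), although it is far too simple to be expected to reproduce the crash.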

In my benchmark, I configured the heap to a static size: min=max=750M, so the test should run comfortably with either GC. The primary purpose was investigating the performance of G1, but since it crashed, I'm now using it as a test for G1's stability.
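
The report gives the heap as min=max=750M and mentions -Xbatch, but does not list the full command line. A plausible invocation would combine -Xms750m -Xmx750m -Xbatch with a collector flag such as -XX:+UseConcMarkSweepGC or -XX:+UseG1GC; that is an assumption, and early JDK 7 builds may also have needed -XX:+UnlockExperimentalVMOptions for G1. Whatever flags are used, a small check like the following sketch (the class name is hypothetical) can confirm which collectors and heap limit a run actually picked up:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the active collectors and the heap ceiling.
class GcConfigReport {
    // Names of the garbage collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Maximum heap the JVM will attempt to use, in MiB. Depending on the
    // collector, this can be slightly lower than the -Xmx value.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Printing this at the start of each benchmark run makes it easier to match a GC log or crash log to the configuration that produced it.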

It's worth noting that the benchmarked app is strictly single-threaded, but I executed it on a quad-core CPU, and G1 itself is a heavily concurrent and parallel collector, so this may be a race bug. (I used -Xbatch to eliminate JIT concurrency from the diagnostic space.)

Comments
EVALUATION Old bug that was likely caused by bad oop issues. We have resolved many issues related to bad oops since. I'm closing this as not reproducible.
05-05-2010