Bug ID: JDK-4829040 -XX:+UseParallelGC causes JVM crash

Type: Bug
Component: hotspot
Sub-Component: gc
Affected Version: 1.4.2

Priority: P2
Status: Closed
Resolution: Duplicate
OS: windows
CPU: x86

Submitted: 2003-03-07
Updated: 2003-03-11
Resolved: 2003-03-11

Recently we noticed a problem with parallel garbage collection in the 1.4.2
Mantis beta.

This can be reproduced with the latest build (build 17) on IA-32 servers.

The problem is easy to reproduce by running the industry-standard benchmark
SPECjbb2000.

The failure occurs very quickly, as soon as the application starts up
(typically just as the benchmark exercises the GC in preparation for the
benchmark run).

Adding -XX:+UseParallelGC to the command line makes the JVM crash.
We tested this on a 16-way IA-32 system with 1600 MHz chips, where we had to
acknowledge the fault 15 times in succession.
On an 8-way IA-32 system with 700 MHz chips, we had to acknowledge the fault 7
times.

Reducing the value of -XX:ParallelGCThreads=n may reduce the number of
faults (or avoid the problem).

--------

Here is some more detailed diagnostic information:

A Windows error message box pops up: 

Application popup: java.exe - Application Error : The instruction at
"0x08191b28" referenced memory at "0x00000004". The memory could not be
"read".

Click on OK to terminate the program
Click on CANCEL to debug the program 

Clicking Cancel did not put me into the debugger.

java_g (the debug build), on the other hand, was able to run beyond the point
of failure. It failed much later instead, and it isn't clear that it is
pinpointing the same issue:

# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:
SuppressErrorAt=/hotspot\src\share\vm\gc_implementation\parallelScavenge\psPromotionManager.inline.hpp:42
#
# HotSpot Virtual Machine Error, assertion failure
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Java VM: Java HotSpot(TM) Server VM (1.4.2-beta-b17-debug mixed mode)
#
#
assert(!((ParallelScavengeHeap*)Universe::heap())->young_gen()->to_space()->contains(*p), "Attempt to rescan object")
#
# Error ID:
D:/BUILD_AREA/jdk1.4.2/hotspot\src\share\vm\gc_implementation\parallelScavenge\psPromotionManager.inline.hpp, 42
#
# Problematic Thread: prio=5 tid=0x00276f38 nid=0x838 runnable
#

Heap at VM Abort:
Heap
 PSYoungGen      total 54016K, used 50351K [0x10260000, 0x15020000, 0x15020000)
  eden space 26624K, 99% used [0x10260000,0x11c5fff8,0x11c60000)
  from space 27392K, 86% used [0x11c60000,0x1338c000,0x13720000)
  to   space 25600K, 48% used [0x13720000,0x14350000,0x15020000)
 PSOldGen        total 637184K, used 543549K [0x15020000, 0x3be60000, 0x3be60000)
  object space 637184K, 85% used [0x15020000,0x362ef630,0x3be60000)
 PSPermGen       total 16384K, used 2972K [0x3be60000, 0x3ce60000, 0x3fe60000)
  object space 16384K, 18% used [0x3be60000,0x3c147260,0x3ce60000)

Increasing the heap size for java_g resulted in a stack overflow instead:

PhaseIdealLoop::set_preorder_visited(Node * 0x72f26074, int 1459) line 326 + 5 bytes
PhaseIdealLoop::build_loop_tree(Node * 0x72f26074, int 1459) line 1491
PhaseIdealLoop::build_loop_tree(Node * 0x72f26020, int 1459) line 1521 + 22 bytes
PhaseIdealLoop::build_loop_tree(Node * 0x72f2568c, int 1458) line 1521 + 22 bytes
PhaseIdealLoop::build_loop_tree(Node * 0x72f253ec, int 1457) line 1521 + 22 bytes
PhaseIdealLoop::build_loop_tree(Node * 0x72f23f14, int 1455) line 1521 + 22 bytes

EVALUATION This is exactly the same as a crash I recently debugged on a 4-way P4 box, down to the stack overflow seen when running the debug VM :-). The problem is that on win32, membar was a no-op. The failures show up more often with +UseParallelGC because its work-stealing code requires membars to function correctly. The fix was put back into mantis-beta over the weekend; I'm not sure when the next build comes out, but it should have the fix. ###@###.### 2003-03-11
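To illustrate why a no-op membar breaks work stealing, here is a minimal sketch using C++17 atomics. It assumes a fixed-capacity Chase-Lev style deque, a formulation swapped in from the literature. It is not HotSpot's 1.4.2 task queue code, and `WorkStealingDeque` and `count_claims` are hypothetical names. The key point is in `pop()`: the owner stores `bottom_` and then loads `top_`, and x86 is allowed to reorder a store with a later load from a different location. Only a full (StoreLoad) fence prevents the owner and a thief from both claiming the last element. In the collector, that would plausibly surface as the same object being processed twice, the kind of duplicate work the "Attempt to rescan object" assertion guards against.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical sketch: a fixed-capacity Chase-Lev work-stealing deque.
// Not HotSpot's task queue; it only shows where the full fence is needed.
template <typename T, std::size_t Capacity>
class WorkStealingDeque {
 public:
  // Owner thread only: push at the bottom. Returns false when full.
  bool push(T item) {
    std::int64_t b = bottom_.load(std::memory_order_relaxed);
    std::int64_t t = top_.load(std::memory_order_acquire);
    if (b - t >= static_cast<std::int64_t>(Capacity)) return false;
    slot(b).store(item, std::memory_order_relaxed);
    // Release: the element becomes visible before the new bottom does.
    bottom_.store(b + 1, std::memory_order_release);
    return true;
  }

  // Owner thread only: pop from the bottom.
  std::optional<T> pop() {
    std::int64_t b = bottom_.load(std::memory_order_relaxed) - 1;
    bottom_.store(b, std::memory_order_relaxed);
    // Full fence (StoreLoad). x86 may let the load of top_ below pass the
    // store to bottom_ above; without it the owner and a thief can both
    // take the last element. A no-op membar drops exactly this barrier.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    std::int64_t t = top_.load(std::memory_order_relaxed);
    if (t > b) {  // already empty: undo the decrement
      bottom_.store(b + 1, std::memory_order_relaxed);
      return std::nullopt;
    }
    T item = slot(b).load(std::memory_order_relaxed);
    if (t == b) {  // last element: race the thieves with a CAS on top_
      bool won = top_.compare_exchange_strong(
          t, t + 1, std::memory_order_seq_cst, std::memory_order_relaxed);
      bottom_.store(b + 1, std::memory_order_relaxed);
      if (!won) return std::nullopt;  // a thief got it first
    }
    return item;
  }

  // Any thread: steal from the top. Returns nullopt if empty or on a lost race.
  std::optional<T> steal() {
    std::int64_t t = top_.load(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    std::int64_t b = bottom_.load(std::memory_order_acquire);
    if (t >= b) return std::nullopt;
    T item = slot(t).load(std::memory_order_relaxed);
    if (!top_.compare_exchange_strong(t, t + 1, std::memory_order_seq_cst,
                                      std::memory_order_relaxed)) {
      return std::nullopt;
    }
    return item;
  }

 private:
  std::atomic<T>& slot(std::int64_t i) {
    return buffer_[static_cast<std::size_t>(i) % Capacity];
  }
  std::atomic<std::int64_t> top_{0};
  std::atomic<std::int64_t> bottom_{0};
  std::atomic<T> buffer_[Capacity];
};

// One owner pushes and pops task ids 0..n-1 while `thieves` threads steal.
// Returns how many times each id was claimed; correct code gives all 1s.
std::vector<int> count_claims(int n, int thieves) {
  assert(n > 0 && thieves >= 0);
  WorkStealingDeque<int, 64> dq;
  std::vector<std::atomic<int>> claims(n);
  std::atomic<bool> done{false};
  std::vector<std::thread> pool;
  for (int i = 0; i < thieves; ++i) {
    pool.emplace_back([&] {
      while (!done.load(std::memory_order_acquire)) {
        if (auto v = dq.steal()) claims[*v].fetch_add(1);
      }
    });
  }
  for (int i = 0; i < n; ++i) {
    while (!dq.push(i)) {  // full: do some local work to make room
      if (auto v = dq.pop()) claims[*v].fetch_add(1);
    }
  }
  while (auto v = dq.pop()) claims[*v].fetch_add(1);  // drain until empty
  done.store(true, std::memory_order_release);
  for (auto& th : pool) th.join();
  std::vector<int> result;
  for (auto& c : claims) result.push_back(c.load());
  return result;
}
```

On x86, a `seq_cst` fence is typically compiled to an `mfence` or a locked instruction, which is the kind of barrier a working membar provides. Deleting the fence in `pop()` can make some claim counts come out as 2 under contention. Such races are timing-dependent, though, and may not reproduce on every run, much like the thread-count sensitivity described in the report.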

11-03-2003

Duplicate :	JDK-4827353 - atomic::membar doesn't on x86
Relates :	JDK-4779902 - ParallelGC is clearing card marks it should not clear.