The customer's application creates 50-100 threads. A
NullPointerException is thrown while the threads are
calling BigInteger.toString().
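The BigIntegerTest program used in the runs below is not included in this report. The following is only a minimal sketch of what such a stress test might look like, written in current Java syntax rather than 1.4.2. The class name BigIntegerStress, the meaning of the two arguments (iterations per thread, then thread count), and the choice of test values are all assumptions. Printing the elapsed milliseconds mirrors the number seen in the transcripts, which appears consistent with the "real" times reported by time.

```java
import java.math.BigInteger;
import java.util.Date;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for BigIntegerTest; not the customer's actual test case.
final class BigIntegerStress {

    // Starts `threads` workers that each call BigInteger.toString() `iterations`
    // times. Returns the total number of characters produced, and rethrows the
    // first failure (for example a NullPointerException) seen in any worker.
    static long run(int iterations, int threads) {
        final AtomicLong totalChars = new AtomicLong();
        final AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < iterations; i++) {
                        // Assumed workload: powers of two up to 2^511.
                        BigInteger value = BigInteger.ONE.shiftLeft(i % 512);
                        totalChars.addAndGet(value.toString().length());
                    }
                } catch (Throwable e) {
                    failure.compareAndSet(null, e);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        if (failure.get() != null) {
            throw new IllegalStateException("worker failed", failure.get());
        }
        return totalChars.get();
    }

    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10000;
        int threads = args.length > 1 ? Integer.parseInt(args[1]) : 50;
        System.out.println("Start: " + new Date());
        long begin = System.currentTimeMillis();
        run(iterations, threads);
        System.out.println(System.currentTimeMillis() - begin); // elapsed ms
    }
}
```

Under these assumptions it would be invoked in the same way as the original test, for example "java BigIntegerStress 10000 50".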
Comments
EVALUATION
Will investigate; suspect cause may be a threading issue independent of BigInteger.
###@###.### 2004-04-28
This bug does not appear to be reproducible with 1.4.2_05, the latest release.
----------------copy-paste begins here -----------------
bash-2.05$ /usr/java/j2re1.4.2_05/bin/java BigIntegerTest 10000 50
Start: Tue May 18 01:10:18 IST 2004
163173
bash-2.05$ /usr/java/j2re1.4.2_05/bin/java BigIntegerTest 100000 50
Start: Tue May 18 01:14:02 IST 2004
1477004
bash-2.05$ hostname
jlab110
bash-2.05$ pwd
/home/vs147299/test
bash-2.05$ uname -a
Linux jlab110 2.4.7-10 #1 Thu Sep 6 17:27:27 EDT 2001 i686 unknown
bash-2.05$ cat /proc/version
Linux version 2.4.7-10 (###@###.###) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 Thu Sep 6 17:27:27 EDT 2001
bash-2.05$
--------------copy-paste ends here -------------------
###@###.### 2004-05-17
I am unable to reproduce the bug on my system "jlab110.india.sun.com"
with any of j2re1.4.2_05/bin/java, j2sdk1.4.2_05/bin/java, or j2sdk1.4.2_05/jre/bin/java.
On "goya.japan.Sun.COM", however, I am able to reproduce the bug using /usr/j2sdk1.4.2_05/bin/java.
Both machines produce identical output for the command "cat /proc/version":
Linux version 2.4.7-10 (###@###.###) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 Thu Sep 6 17:27:27 EDT 2001
However, if I run the command "ulimit -a" on both machines, I see only one difference:
on jlab110.india.sun.com: "max user processes 2047"
on goya.japan.Sun.COM: "max user processes 255"
So I ran (as root) the command "ulimit -u 2047" on goya.japan.Sun.COM.
As a result, the output of "ulimit -a" is now identical on both machines.
I then ran "/usr/j2sdk1.4.2_05/bin/java BigIntegerTest 100000 50" on goya.japan.Sun.COM.
It fails as before, but this time the command runs longer before failing.
With the max number of processes set to 255, goya.japan.Sun.COM takes:
real 9m8.366s; user 2m56.200s; sys 0m7.800s to fail,
whereas with the max number of processes set to 2047, goya.japan.Sun.COM takes:
real 23m51.409s; user 6m27.470s; sys 0m17.540s.
I further investigated the output of the command "cat /proc/meminfo" on both these machines:
On jlab110.india.sun.com: RAM: 525MB
On goya.japan.Sun.COM: RAM: 63MB
I believe, therefore, that it is indeed because of a lack of system resources that the program is failing on the goya.japan.Sun.COM machine.
If the field/customer can use a "well-endowed" machine with appropriately set ulimit values,
then the test case may not fail, just as it does not on jlab110.india.sun.com.
----
###@###.### 2004-05-28
With the '-server' option, the same test does not crash the JVM on either of the machines mentioned above:
[root@goya 18132]# time java -server BigIntegerTest 1000000 50
Start: Tue Jun 01 12:17:32 JST 2004
41671630
real 694m33.515s
user 665m9.280s
sys 28m31.180s
[root@goya 18132]# which java
/usr/j2sdk1.4.2_05/bin/java
[root@goya 18132]#
Similarly on the other machine:
[root@jlab110 test]# time /usr/java/j2sdk1.4.2_05/jre/bin/java BigIntegerTest 1000000 50
Start: Tue Jun 01 19:11:46 IST 2004
14207671
real 236m49.085s
user 225m35.950s
sys 9m46.000s
[root@jlab110 test]# time /usr/java/j2sdk1.4.2_05/jre/bin/java -server BigIntegerTest 1000000 50
Start: Tue Jun 01 23:49:20 IST 2004
12964860
real 216m5.823s
user 205m12.970s
sys 9m34.470s
[root@jlab110 test]#
--
###@###.### 2004-06-02
Being investigated by C1 team.
###@###.### 2004-07-21
This bug is reproducible on Windows/x86 as well. So far, in a few runs, I wasn't
able to get 1.5.0 to crash, which suggests the bug has already been
fixed there. Looking into the current 1.4.2_06 sources to attempt to narrow down
the root cause.
###@###.### 2004-07-28
This is the same bug as 4917709. The fix for that bug contained not only the
needed bailout for a non-empty expression stack at a backward branch (which
isn't triggered in this case), but also a fix for the contents of the oop map
at a backward branch safepoint. The latter is the change that fixes this bug.
###@###.### 2004-07-28
WORK AROUND
There are currently two workarounds discovered for this problem:
1. Use a machine with more resources. The analysis under X-Evaluation reasons that the program fails on the goya.japan.Sun.COM machine because of a lack of system resources. If the field/customer can use a "well-endowed" machine with appropriately set ulimit values, then the test case may not fail, just as it does not on jlab110.india.sun.com.
2. The crash does not happen when the JVM is invoked with the '-server' option.
--
###@###.### 2004-06-02
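When applying the '-server' workaround, it can help to confirm which VM the application is actually running on, since that option selects the server compiler rather than the client compiler. The following is a small sketch using only standard system properties; the class name VmInfo is hypothetical, and the exact strings printed vary by JVM vendor and release.

```java
// Hypothetical helper; reports which JVM the process is running on.
final class VmInfo {

    // Builds a one-line description from standard system properties.
    static String describe() {
        return System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version")
                + " (java.version " + System.getProperty("java.version") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as "java -server VmInfo" and again without the option shows whether the VM name changes between the two invocations.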