Bug ID: JDK-4761344 NullPointerException just before data loss

JDK-4761344 : NullPointerException just before data loss

Type: Bug
Component: hotspot
Sub-Component: compiler
Affected Version: 1.4.1,1.4.1_01a,1.4.2

Priority: P1
Status: Resolved
Resolution: Fixed
OS: generic,solaris_8
CPU: generic,sparc

Submitted: 2002-10-10
Updated: 2003-03-19
Resolved: 2003-03-04

Versions (Unresolved/Resolved/Fixed)

The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.

Other	Other
1.4.1_03 03Fixed	1.4.2Fixed

Related Reports

Duplicate :	JDK-4793625 - iMM4.5 on S1Appserver gave NPE with 1.4.2 Build 10 in C2 mode
Relates :	JDK-4684993 - jdk1.4.1 b11 crashes iAS7/TPC-W
Relates :	JDK-4807707 - osr compile problem with jdk 1.4.1, 1.4.1_01 and 1.4.2-b11
Relates :	JDK-4292742 - NullPointerException with no stack trace
Relates :	JDK-6913410 - Crash: class resolution triggered within hotspot compiler.
Relates :	JDK-6272923 - CI wrongly reports array class as unloaded after constrained klass is reported as loaded

Description

===============================================
09/10/2002 11:17:47
===============================================
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) 64-Bit Server VM (build 1.4.1_01-b01, mixed mode)

com.adventnet.snmp.snmp2.agent.AgentRuntimeException: Failure in opening SnmpSession, errstat: 0, errvalue: 0

	at com.adventnet.snmp.snmp2.agent.SnmpAgent.addClientAndOpenSession(SnmpAgent.java:801)
	at com.adventnet.snmp.snmp2.agent.SnmpAgent.restartSnmpAgent(SnmpAgent.java:939)
	at com.adventnet.snmp.snmp2.agent.SnmpAgent.restartSnmpAgent(SnmpAgent.java:955)
	at com.adventnet.snmp.snmp2.agent.SnmpAgent.setPort(SnmpAgent.java:1034)
	at com.xacct.xacctusage.util.xmi.snmpagent.XacctUsage.initXacctUsage(XacctUsage.java:250)
	at com.xacct.xacctusage.util.xmi.snmpagent.XacctUsage.init(XacctUsage.java:320)
	at com.xacct.xacctusage.core.Application$1.run(Application.java:897)
	at java.lang.Thread.run(Thread.java:536)
CSVLogReader.CreateMapArguments() - Exception !!!
java.lang.NullPointerException

All files mentioned in the description are in the attachments

Problem description :
------------------------------

We have CSVLogReader.class that performs CSV file line parsing . After
reading the line from the file, the CreateMapArguments(String line)
method is called in order to put each field to the buffer according to its
type. Inside this method there is a try/catch (Exception E) block, that
sometimes
catches NullPointerException. In order to understand the reason, we've
inserted E.printStackTrace() call, and we've expected to see the stack trace
for this
Exception object in the standard output. But only
"java.lang.NullPointerException" was printed. So the first question is: why
we didn't see the full stack trace ?
Using
                Throwable X = E.fillInStackTrace();
                X.printStackTrace();
gives the same results.

Then we've concluded that the only way to see  what object caused the
NullPointerException is to produce a thread dump immediately after catching
the exception.
From the program we called to a script that sent kill -QUIT signal to the
process. The results of the thread point that the source of this Exception
is inside Java classes.

Examples
---------------

Here are some examples from several thread dumps that show the threads which
called to the CreateMapArguments() method:

1.  Thread dump from gath7_stdout.log

"Thread-24" prio=5 tid=0x1053d0680 nid=0x4a runnable
[fffffffec7a00000..fffffffec7a014a0]
	at
ISM.ISM_114_1_0.CSVFieldTokenizer.fieldAsString(CSVFieldTokenizer.java:161)
	at ISM.ISM_114_1_0.CSVLogReader.CreateMapArguments(CSVLogReader.java:426)
	at ISM.ISM_114_1_0.CSVLogReader.processRecord(CSVLogReader.java:333)
	at ISM.ISM_114_1_0.LogReader.readNewLines(LogReader.java:575)
	at ISM.ISM_114_1_0.LogReader.run(LogReader.java:484)
	- locked <fffffffee8b85858> (a ISM.ISM_114_1_0.CSVLogReader)

Here is the  ISM.ISM_114_1_0.CSVFieldTokenizer.fieldAsString() method with
line 161 specified:

    public String fieldAsString(int FieldNo)
    {
        char[] arr = new char[m_FieldLen[FieldNo]]; // line 161
        System.arraycopy(m_Buffer[FieldNo], 0, arr, 0, m_FieldLen[FieldNo]);
        return (new String(arr));
    }

As you can see, no our objects were called in this line.

The same situation we can see in the gath9_stdout.log :

"Thread-15" prio=5 tid=0x1055f2180 nid=0x30 runnable
[fffffffecae00000..fffffffecae014a0]
	at
ISM.ISM_114_1_0.CSVFieldTokenizer.fieldAsString(CSVFieldTokenizer.java:161)
	at ISM.ISM_114_1_0.CSVLogReader.CreateMapArguments(CSVLogReader.java:426)
	at ISM.ISM_114_1_0.CSVLogReader.processRecord(CSVLogReader.java:333)
	at ISM.ISM_114_1_0.LogReader.readNewLines(LogReader.java:575)
	at ISM.ISM_114_1_0.LogReader.run(LogReader.java:484)
	- locked <fffffffee8b86ef8> (a ISM.ISM_114_1_0.CSVLogReader)

2.  Thread dump from gath17_stdout.log

"Thread-26" prio=5 tid=0x1002eb170 nid=0x53 runnable
[fffffffec6600000..fffffffec66014a0]
	at java.lang.Integer.intValue(Integer.java:591)
	at Util.MapArguments.putString(MapArguments.java:889)
	at ISM.ISM_114_1_0.CSVLogReader.CreateMapArguments(CSVLogReader.java:426)
	at ISM.ISM_114_1_0.CSVLogReader.processRecord(CSVLogReader.java:333)
	at ISM.ISM_114_1_0.LogReader.readNewLines(LogReader.java:575)
	at ISM.ISM_114_1_0.LogReader.run(LogReader.java:484)
	- locked <fffffffee8b85070> (a ISM.ISM_114_1_0.CSVLogReader)

Here is the  Util.MapArguments.putString() method with line 889 specified:

    public void putString(Integer Key, String Value)
    {
	putString(Key.intValue(), Value); // line 889
    }

As you can see, the Integer.intValue() method is called before Exception was
thrown.

3. Thread dump from gath20_stdout.log

"Thread-24" prio=5 tid=0x100301430 nid=0x44 runnable
[fffffffec8400000..fffffffec84014a0]
	at java.util.Calendar.complete(Calendar.java:1095)
	at java.util.Calendar.get(Calendar.java:942)
	at ISM.ISM_114_1_0.SuperParser.parseDateTime(SuperParser.java:286)
	at ISM.ISM_114_1_0.CSVLogReader.CreateMapArguments(CSVLogReader.java:476)
	at ISM.ISM_114_1_0.CSVLogReader.processRecord(CSVLogReader.java:333)
	at ISM.ISM_114_1_0.LogReader.readNewLines(LogReader.java:575)
	at ISM.ISM_114_1_0.LogReader.run(LogReader.java:484)
	- locked <fffffffee81867c8> (a ISM.ISM_114_1_0.CSVLogReader)

The relevant line in the ISM.ISM_114_1_0.SuperParser.parseDateTime() is :
        int Year = m_CalObj.get(Calendar.YEAR); // line 286

The same situation we can see in the gath24_stdout.log :

"Thread-23" prio=5 tid=0x1002fa8e0 nid=0x43 runnable
[fffffffec8500000..fffffffec85014a0]
	at java.util.Calendar.get(Calendar.java:943)
	at ISM.ISM_114_1_0.SuperParser.parseDateTime(SuperParser.java:286)
	at ISM.ISM_114_1_0.CSVLogReader.CreateMapArguments(CSVLogReader.java:476)
	at ISM.ISM_114_1_0.CSVLogReader.processRecord(CSVLogReader.java:333)
	at ISM.ISM_114_1_0.LogReader.readNewLines(LogReader.java:575)
	at ISM.ISM_114_1_0.LogReader.run(LogReader.java:484)
	- locked <fffffffee8181e08> (a ISM.ISM_114_1_0.CSVLogReader)

As you can see, the Calendar m_CalObj method is not null here,  and the
problem is in the Calendar methods.

4. Thread dump from gath28_stdout.log

"Thread-30" prio=5 tid=0x105562140 nid=0x52 runnable
[fffffffec6f00000..fffffffec6f014a0]
	at Util.MapArguments.putString(MapArguments.java:903)
	at Util.MapArguments.putString(MapArguments.java:889)
	at ISM.ISM_114_1_0.CSVLogReader.CreateMapArguments(CSVLogReader.java:426)
	at ISM.ISM_114_1_0.CSVLogReader.processRecord(CSVLogReader.java:333)
	at ISM.ISM_114_1_0.LogReader.readNewLines(LogReader.java:575)
	at ISM.ISM_114_1_0.LogReader.run(LogReader.java:484)
	- locked <fffffffee8b814a0> (a ISM.ISM_114_1_0.CSVLogReader)

The relevant line in the Util.MapArguments.putString() is :
             m_StringErr [KeyDesc.Index] = NO_ERROR; // line 903

The m_StringErr is a array of booleans.

How do you explain this situation ? What object can cause the
NullPointerException ? And why we can't see the stack trace
by using printStackTrace() call ?

Attached with this email :
1. All mentioned thread dumps (gath7_stdout.log, gath9_stdout.log,
gath17_stdout.log, gath20_stdout.log, gath24_stdout.log, gath28_stdout.log)
.
2. The CSVLogReader  source file.

Comments

CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: 1.4.1_03 mantis-beta FIXED IN: 1.4.1_03 mantis-beta INTEGRATED IN: 1.4.1_03 mantis-b18 mantis-beta tiger
17-09-2004
SUGGESTED FIX ###@###.### 2003-01-13 Record additional "dependence" information for OSR compiles so that the runtime system will invalidate/deoptimize the OSR code when the specified class is loaded. Problem still exists in 1.4.2 but may be less common after the fix for 4778368 since this allows most type information to flow through ciTypeFlow's checkcast. ===== =====
17-09-2004
PUBLIC COMMENTS NullPointerExceptions are thrown by compiled code, but when the source code is inspected, it is provably impossible for a null object to be present. The problem is exacerbated by the behavior of bug 4292742, which causes NPE backtraces to be lost in server-compiled methods. This particular JVM failure has been linked to the use of multiple application-defined class loaders and aliasing of classes among them. This confuses the compiler and causes it to "prove" to itself, wrongly, that a certain reference must be null, leading to the NPE. Customers suspecting occurrences of this bug should first verify that it is not the simpler "missing backtrace" problem of 4292742.
17-09-2004
EVALUATION According to the description, the customer is looking for the stack trace on the standard output. The printStackTrace() method prints the stack trace on the standard error, not standard output. However it does appear that the log file do capture standard error becaue the traceback from the second call to printStackTrace() does come out (the description implies that it doesn't.) He should modify his program to write his extra debugging output to System.err instead of System.out and capture stderr and stdout separately when running the program. The combined output to the 2 output streams may not be intermixed in the same order as written. His invocation of the "printstacktrace" script does not do what he wants. The execution will run concurrently with his program, so the thread he is interested in could have continued execution far beyond his call to Runtime.exec(). He needs to add a call to MyProcess.waitFor() after the call to Runtime.getRuntime().exec(). However this will show the location in the exception handling code, not where the original exception occurred. His analysis of where the NullPointerException occurred from the thread dumps is incorrect, the thread dumps show where the program execution had avanced to by the time the SIGQUIT was delivered. It does seem to be a bug that the printStackTrace() call on the original exception object does not print the stack trace. This does work correctly on a small test case I have written. Without more information on how to reproduce the bug I cannot evaluate it any further. ###@###.### 2002-10-16 ---------------------------------------- ###@###.### 2002-11-19 Initially, I suspected it to be a compiler issue, but after inspecting the 2 cores at the given location, using SA, I could find very few compiled frames: com.xacct.xacctusage.gatherer.flow.reliable.ReliableTriggerCompFlowStep.forwardPendingData() java.util.Calendar.clear() Util.Pulse.xacctRun(). And the method CreateMapArguments() that is seeing the data loss, wasn't compiled in any of the cores. I suspect the generated interpreter code right now. To make sure it's an interpreter issue & not a compiler problem, I would suggest a run with java -Xcomp. And if that run fails, do another run with disabling compilation of above three methods. ###@###.### 2002-11-20 New class file was developed as suggested and tested. In summary, it is getting worse :-( The class file is under /dumps/CSVLogReader.class, if needed. There have been several crashes due to Null pointer exception. For a sample, look in the file /t3-5/gatherer/g5/Log/Gatherer_stdout.log towards the end of the logfile. There was also another crash with little information. Pertinent details are in /t3-1/gatherer/g29/Log/Gatherer_stdout.log and its corresponding core file is /t3-1/gatherer/g29/Gatherer/core. There were also Hashtable hang bug and its details are in /t3-7/gatherer/g21/Log/Gatherer_stdout.log Using -Xcomp is NOT an acceptable workaround for several reasons. - We do not ask a customer to run with un-documented flags in production. - We very well know that using it is deterimetal to performance and that's probably the reason we don't document it. - This application's (ISVs) main competitive feature is performance, the main reason for which I (MDE) am working with them now but in the middle of these bugs. let me stop with this....... The system is on SWAN. When I left I was told that there's some maintanance work going on with the building router but that should not affect this system. However, I was not able to get to my mail in sfbay domain; but, I can access that E10K from here now. So, the connectivity may be intermittent but please don't give up. > Date: Wed, 20 Nov 2002 19:41:53 -0500 (GMT) > From: ###@###.### > > Synopsis: NullPointerException just before data loss > > **************************************************** > > Progress: > > Nov 20 2002 5:56AM R,Anupam ** > Event: The test with -Xcomp doesn't reproduce the problem. So most probably it's the generated interpreter code, that is the culprit. I am currently looking at the genereated codelets for bytecodes. My suspecion right now is on iconst_m1 opcode, which will put a -1(0xffffffffffffffff) on the operand stack. > Looking in CreateMapArguments() the place where this opcode is coming from is, > > boolean Found = false; > > Can you try a run, after making Found as "int" data type and using values of 1 & 2 as false & true respectively. This will avoid iconst_m1. > > But since, -Xcomp doen't reproduce the problem that can be used as workaround for the time being. > > I will continue to need the machine and I haven't made any modification to the set up on E10K today, as requested. > ----------------------------------------- ###@###.### 2002-11-21 Looked at every word in the stack frame of failing method's frame using SA. I would expect the local reference which is getting corrupted to be present there, but didn't find any oop of type Util.MapArguments. I did see a null value in the region where other local oops were present. Knowing how interpreter frames are laid out, would help get the exact offset for this local. I think an earlier bytecode execution corrupted(null) this location, so next iload_2 loading from that location puts null on the operand stack. ---------------------------------------- ###@###.### 2002-12-10 Test doesn't fail when compilation of CrateMapArguments() is disabled and in -Xcomp mode. Looking at TraceDeoptimization outputs, CrateMapArguments() is both compiled and deoptimized twice: 1. First one is a normal compile, and is deopt due to failed optimized checkcast. Deopt here looks proper and execution proceeds normally. 2. Second compile is an OSR compile, due to the for loop in CrateMapArguments(), and is deoptimized due to "unresolved class: Util/KDSServer". Excecution after this deopt gives a NullPointerException. This points to a problem in setting up of interpreted frames after deopt. -------------------------------------------- ###@###.### 2003-01-03 looking at .class and PrintOpto, it looks solely a OSR compile only problem and deopt may not be the issue. Although, deopt packing showed map as NULL, deopt itself happened inside the catch block, and catch was executed because map was already null. PrintOpto shows that at calls to parse*() methods in the pseudo compiled code L2=#NULL(that's the place where local#2, map, will be kept). Whereas, for normal compiles(where there is no data corruption) L2 has good values. Compiler has assumed _L2(map) to be dead in a range where it should have been alive. -------------------------------------------- Attached file BadNullAssert.java shows a test case for a possible root cause of this bug. The problem is subtle: An OSR method is compiled using a model of typestates which are derived from flow analysis over the bytecodes. This flow analysis is necessarily incomplete if the interpreter has not yet exercised all paths in the code, leaving some classes still unloaded. In effect the CFG for the method grows over time, as the interpreter exercises new paths and loads new classes. Exercising a new path must invalidate pre-existing OSR methods, since they may have compiled into them assumptions about typestates that must change as new CFG edges come into being. The problem can manifest (in the test case above, for example) when an unloaded class is the subject of a checkcast operation. The flow analysis that precedes an OSR compilation will simply assert that any value casted to an unloaded class must be null, and the compilation proceeds on that assumption. Later on, if the class loads, the interpreter can checkast non-null values successfully, leading to typestates that were previously proved impossible. The OSR method can be invoked on a typestate created by the interpreter containing non-null values, and yet the OSR code will assume that they are null, based on the out-of-date type analysis. Solution is to have OSR methods register dependencies on unloaded classes, and get flushed if the classes ever load. This is not an issue with regular (non-OSR) methods, since a normal method creates all its own typestates, from the method's entry point forward. There is no way for the interpreter dump surprising typestates into a non-OSR method. A binary which fixes this bug is here: /net/lukasiewicz/export/lukasiewicz1/4761344 There are product, optimized, and fastdebug flavors. It is based on the current 1.4.2 baseline. To gather more information, testing should use the following extra flags: -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=hotspot.log The file "hotspot.log" will be written with diagnostic information. The name can be adjusted to any pathname. ###@###.### 2003-01-17 -------------------------------------------- The libjvm.so binaries containing the fix are under this directory: /net/lukasiewicz.sfbay/export/lukasiewicz1/4761344 The 32-bit VM builds are: fastdebug/libjvm_g.so asserts enabled, compiled optimized, for java_g optimized/libjvm.so debugging options compiled in product/libjvm.so product build The corresponding 64-bit VM builds are: sparcv9/fastdebug/libjvm_g.so sparcv9/optimized/libjvm.so sparcv9/product/libjvm.so All of these were built from the current 1.4.2 baseline, but they can run inserted into the 1.4.1 JRE. To verify installation, start a VM with -Xinternalversion, and make sure the version string includes "jrose in mantis". If these binaries do not fix the problem, please re-run with the previously suggested three options including +LogCompilation. ###@###.### 2003-01-22 -------------------------------------------- After some intensive analysis, we are more confident the root problem is the compiler erroneously deciding that a particular class is unloaded, due to the fact that it is defined in a delegate class loader and not directly visible in the current class loader. We believe we can fix this cleanly and reliably by using the class loader constraints recorded by the VM, to correctly report that the class in question is in fact loaded in the VM, even though it has not yet been fetched from the local class loader. I am working on a VM that includes this adjustment. Meanwhile, run with logging turned on, and the log file set to a non-existent file, such as "/no/such/file". This will cause the VM to choose a unique file in /tmp/, avoiding the problem with multiple VMs overwriting a single log. ###@###.### 2003-02-12 -------------------------------------------- There is an updated VM which fixes the problems mentioned above. Please test with it. /net/lukasiewicz.sfbay/export/lukasiewicz1/4761344 The 64-bit VM builds are: sparcv9/fastdebug/libjvm_g.so sparcv9/optimized/libjvm.so sparcv9/product/libjvm.so ###@###.### 2003-02-14 -------------------------------------------- Recent JDC comments report that there is some variation of this bug active in 1.4.2. The particular bug fixed under this report had to do with the compiler getting confused by aliasing of classes through different clas loaders, causing it to "know" that a given class was not yet loaded and therefore all references "must" be null--leading to embarassing spurious NPEs. On top of this problem, up until 1.5 the compiler optimizes the treatment of NPEs by using a preallocated NPE object, rather than creating it every time. The preallocated NPE object works great, except that it lacks a backtrace. A backtrace-free NPE makes it hard to diagnose any NPE bug, whether it comes from user error (usually) or a JVM bug (rarely). We have opened a separate bug 4292742 for the missing backtrace, and fixed it in 1.5beta2. This backtrace problem is probably bothering people on 1.4.2 release. It is probably making it difficult to debug application errors in in compiled code, forcing them to run -Xint or -client to debug. If this is true for you, I suggest transferring your JDC vote to bug 4292742, so we can consider backporting a version of the 1.5 fix. On the other hand, NPE exceptions that are truly spurious, and are known to be due to JVM errors, should probably be reported as new bugs. However, if they are clearly cases of the complex syndrome described in this bug, we would appreciate an explicit note to this effect, so we can re-evaluate the fix to the class loading logic. ###@###.### 2004-03-26
26-03-2004