Bug ID: JDK-7012088 jump to 0 address because of lack of memory ordering in SignatureHandlerLibrary::add

Type: Bug
Component: hotspot
Sub-Component: runtime
Affected Version: 5.0u14

Priority: P3
Status: Closed
Resolution: Fixed
OS: solaris_10
CPU: sparc

Submitted: 2011-01-13
Updated: 2015-11-20
Resolved: 2011-03-08

JDK 6	JDK 7	Other
6u111 Fixed	7 Fixed	hs21 Fixed

A customer (CU) found that the JVM jumps to address 0 and terminates abnormally.

CONFIGURATION :
JDK : JDK5u14 server
OS : Solaris 10

PHENOMENA:
The following is the stack trace when the abort occurs.

#0  0xfe33092c in JVM_handle_solaris_signal
#1  <signal handler called>
#2  0x00000000 in ??
#3  0xf880c1d8 in %%% sun.misc.GC#maxObjectInspectionAge()J
#4  0xf8805904 in %%% sun.misc.GC$Daemon#run()V
#5  0xf8800220 in === BufferBlob[StubRoutines (1)] @ 0xf8800108 ===
#6  0xfe1a2e74 in void JavaCalls::call_helper(JavaValue*,methodHandle*,JavaCallArguments*,Thread*)
#7  0xfe37f424 in void JavaCalls::call_virtual(JavaValue*,Handle,KlassHandle,symbolHandle,symbolHandle,Thread*)
#8  0xfe39f9f0 in void thread_entry(JavaThread*,Thread*)
#9  0xfe39b2e4 in void JavaThread::run()
#10 0xfe7a6968 in void*_start(void*)
#11 0xfeac8c70 in _lwp_start (0x0, 0x0, 0x0, 0x0, 0x0, 0x0) from root/usr/lib/libc.so.1

(%%% means interpreter is running.)

The JVM jumps to address 0 at the "call" instruction at 0xf880c224.
That instruction is in the interpreter codelet
  ...
method entry point (kind = native) [0xf880c070, 0xf880c4c0]  1104 bytes not safepoint safe
  ...
The relevant part of the codelet is:

 0xf880c1d0:     call  0xfe33fcc8 <void InterpreterRuntime::prepare_native_call(JavaThread*,methodOopDesc*)>
 0xf880c1d4:     mov  %g2, %o0
 0xf880c1d8:     mov  %l7, %g2
 0xf880c1dc:     clr  [ %g2 + 0x120 ]
 0xf880c1e0:     clr  [ %g2 + 0x124 ]
 0xf880c1e4:     clr  [ %g2 + 0x12c ]
 0xf880c1e8:     ld  [ %g2 + 4 ], %g5
 0xf880c1ec:     tst  %g5
 0xf880c1f0:     be  %icc, 0xf880c200
 0xf880c1f4:     nop
 0xf880c1f8:     call  0xf8800140
 0xf880c1fc:     nop
 0xf880c200:     ld  [ %l2 + 0x4c ], %g3
 0xf880c204:     st  %l2, [ %sp + 8 ]
 0xf880c208:     mov  %l3, %o1
 0xf880c20c:     add  %fp, -24, %o2
 0xf880c210:     sub  %sp, %fp, %o3
 0xf880c214:     save  %sp, %o3, %sp
 0xf880c218:     ld  [ %fp + 8 ], %l2
 0xf880c21c:     mov  %i1, %l3
 0xf880c220:     mov  %i2, %l6
 0xf880c224:     call  %g3                 //  NOTE !!

At 0xf880c224, %g3 should hold the signature_handler of the methodOop
(loaded at 0xf880c200), but it actually holds 0.
The signature_handler should have been set by InterpreterRuntime::prepare_native_call,
which is called at 0xf880c1d0, but it was not.

The code of SignatureHandlerLibrary::add called from InterpreterRuntime::prepare_native_call is as follows.

    if (method->signature_handler() == NULL) {
    ...
      int handler_index = -1;
    ...
      if (UseFastSignatureHandlers && method->size_of_parameters() <= Fingerprinter::max_size_of_parameters) {
    ...
           MutexLocker mu(SignatureHandlerLibrary_lock);
    ...
           uint64_t fingerprint = Fingerprinter(method).fingerprint();
           handler_index = _fingerprints->find(fingerprint);
    ...
           address handler = set_handler(buffer);
    ...
           _fingerprints->append(fingerprint);
           _handlers->append(handler);               // *1
    ...
      }
    ...
      method->set_signature_handler(_handlers->at(handler_index));   // *2
    }

MutexLocker mu holds SignatureHandlerLibrary_lock through line *1, but the lock
is no longer held at line *2.
If one thread is executing line *2 while another thread executes line *1
and GrowableArray::grow is called, GrowableArray::_data may be replaced.
The thread executing _handlers->at(handler_index) at *2
may then read from different data than it expects.

The following code is extracted from GrowableArray::grow:

  for (int i = 0; i < _len; i++) newData[i] = _data[i];
  ...
  _data = newData;

The above code assigns the new array to the _data variable after copying
the elements of the old _data. Nothing orders those stores for a concurrent
reader, so some memory ordering code seems to be needed here.

When the crash occurs, the state seen at *2 appears to be that
  the new _data is visible but the copied elements are not visible yet.
(This is similar to 6704010.)
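
The ordering requirement described above can be illustrated with a minimal
sketch in standard C++. This is not the HotSpot code: PublishedArray, its
members and the _retired list are hypothetical names, and std::atomic stands in
for whatever barrier primitive the VM would use. It assumes writers are
serialized by an external lock and that a reader only asks for an index it
obtained while holding that lock.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for GrowableArray<address>; not the HotSpot class.
class PublishedArray {
 public:
  // Assumes capacity >= 1.
  explicit PublishedArray(int capacity)
      : _len(0), _max(capacity), _data(new void*[capacity]()) {}

  ~PublishedArray() { delete[] _data.load(std::memory_order_relaxed); }

  // Writers are assumed to be serialized by an external lock.
  void append(void* elem) {
    if (_len == _max) grow();
    _data.load(std::memory_order_relaxed)[_len++] = elem;
  }

  // Lock-free reader: the acquire load pairs with the release store in
  // grow(), so a reader that sees the new array also sees the copied slots.
  void* at(int i) const {
    return _data.load(std::memory_order_acquire)[i];
  }

 private:
  void grow() {
    int new_max = _max * 2;
    void** old_data = _data.load(std::memory_order_relaxed);
    void** new_data = new void*[new_max]();
    for (int i = 0; i < _len; i++) new_data[i] = old_data[i];
    // Publish the new array only after the copy is complete.
    _data.store(new_data, std::memory_order_release);
    // Keep the old array alive: a concurrent reader may still be using it.
    _retired.emplace_back(old_data);
    _max = new_max;
  }

  int _len;
  int _max;
  std::atomic<void**> _data;
  std::vector<std::unique_ptr<void*[]>> _retired;
};
```

Without the release/acquire pair, a reader on a weakly ordered CPU could
observe the new _data pointer before the copied elements, which matches the
zero handler seen in the crash. The sketch only shows the ordering; it does
not claim that this is how the bug was resolved.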


Please also see the "comment" section.

Just in the last few weeks, the jdk6u-cpu repo has failed to build a control job in jprt, crashing on solaris i586 fastdebug. No good full backtrace of the fault, but apparently within SignatureHandlerLibrary::add, with another thread blocked on the Mutex acquire in the same routine. This change appears to fix it. 3 passing jprt runs so far. We'll go ahead with a jdk6u-cpu backport.
07-10-2015
EVALUATION Summary: Write method signature handler under lock to prevent race with growable array resizing Reviewed-by: dsamersoff, dholmes
03-02-2011
SUGGESTED FIX I've been removing fixes for this all day.
25-01-2011
PUBLIC COMMENTS This seems to be a basic race condition. method->set_signature_handler(_handlers->at(handler_index)); must be executed with the SignatureHandlerLibrary_lock held; otherwise, as per the description, more than one thread can create and add a handler. Further, one thread can be accessing _handlers at the same time as it is being grown, and the crash occurs as described. An assertion failure in this code exposed this very issue only recently, but its relevance to non-debug code was not realized at the time - see 6704010
13-01-2011
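
The approach named in the evaluation (write the signature handler under the
lock) can be sketched in standard C++ as follows. This is an illustration under
stated assumptions, not the actual patch: Method, HandlerLibrary, make_handler
and handler_count are hypothetical names, std::mutex stands in for
SignatureHandlerLibrary_lock, and for simplicity the NULL check is also done
under the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

typedef unsigned char* address;

// Hypothetical stand-in for methodOopDesc.
struct Method {
  explicit Method(uint64_t fp) : fingerprint(fp), signature_handler(nullptr) {}
  uint64_t fingerprint;
  address signature_handler;
};

// Hypothetical stand-in for SignatureHandlerLibrary.
class HandlerLibrary {
 public:
  void add(Method* method) {
    std::lock_guard<std::mutex> guard(_lock);
    if (method->signature_handler != nullptr) return;
    int index = find(method->fingerprint);
    if (index < 0) {
      _fingerprints.push_back(method->fingerprint);
      _handlers.push_back(make_handler());   // may reallocate the array
      index = static_cast<int>(_handlers.size()) - 1;
    }
    // The array is read and the handler is stored while the lock is still
    // held, so no other thread can grow _handlers during the read.
    method->signature_handler = _handlers[index];
  }

  std::size_t handler_count() {
    std::lock_guard<std::mutex> guard(_lock);
    return _handlers.size();
  }

 private:
  int find(uint64_t fingerprint) const {
    for (std::size_t i = 0; i < _fingerprints.size(); i++) {
      if (_fingerprints[i] == fingerprint) return static_cast<int>(i);
    }
    return -1;
  }

  // Placeholder for handler generation: returns a distinct non-null address.
  address make_handler() {
    _code.push_back(0);
    return &_code.back();   // deque::push_back keeps earlier elements in place
  }

  std::mutex _lock;
  std::vector<uint64_t> _fingerprints;
  std::vector<address> _handlers;
  std::deque<unsigned char> _code;
};
```

Two methods with the same fingerprint end up sharing one handler, and a method
never receives a handler read from an array that is being resized at that
moment.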

Duplicate :	JDK-8137023 - JDK6u sanity run is failing for platform "solaris_i586_5.8-fastdebug"
Relates :	JDK-6704010 - Internal Error (src/share/vm/interpreter/interpreterRuntime.cpp:1106)