JDK-6769124 : various 64-bit fixes for c1
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: hs14
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_9
  • CPU: sparc
  • Submitted: 2008-11-07
  • Updated: 2011-03-08
  • Resolved: 2011-03-08
Fixed in: JDK 6u21, JDK 7, HotSpot hs17
Description
As part of the JRTS (Java RTS) porting work, a number of issues with the 64 bit c1 port have been identified, since that team is doing more extensive testing. This bug covers a small collection of them.

Hi Tom,

For 32 bit moves on 64 bit platforms, LIR_Assembler::stack2stack() does:

__ pushl(frame_map()->address_for_slot(src ->single_stack_ix()));
__ popl (frame_map()->address_for_slot(dest->single_stack_ix()));

But there's no 32 bit push/pop in 64 bit mode. So pushl+popl ends up moving a 64 bit quantity and can corrupt the stack.

I've replaced it with:

 if (src->is_single_stack()) {
   if (type == T_OBJECT || type == T_ARRAY) {
     __ pushptr(frame_map()->address_for_slot(src ->single_stack_ix()));
     __ popptr (frame_map()->address_for_slot(dest->single_stack_ix()));
   } else {
     //no pushl on 64bits
#ifndef _LP64
     __ pushl(frame_map()->address_for_slot(src ->single_stack_ix()));
     __ popl (frame_map()->address_for_slot(dest->single_stack_ix()));
#else
     __ movl(rscratch1, frame_map()->address_for_slot(src ->single_stack_ix()));
     __ movl(frame_map()->address_for_slot(dest->single_stack_ix()), rscratch1);
#endif
   }

 } else if (src->is_double_stack()) {
---

Hi Tom,

Here is a bug in 64bit c1 that we hit with RTS.

The bug is CR 6848568. I attached a simple testcase to the CR that can
be used to reproduce the problem on both sparcv9 and amd64.

The problem is that the 64 bit registers that hold the 32 bit position arguments to System.arraycopy() are used to compute start addresses for the copy without making sure that the high 32 bits are cleared first.

I fixed it by changing LIR_Assembler::emit_arraycopy() to: 

   __ add(src, arrayOopDesc::base_offset_in_bytes(basic_type), src_ptr);
+  __ sra(src_pos, 0, src_pos); //higher 32bits must be null
   if (shift == 0) {
     __ add(src_ptr, src_pos, src_ptr);
   } else {
     __ sll(src_pos, shift, tmp);
     __ add(src_ptr, tmp, src_ptr);
   }

   __ add(dst, arrayOopDesc::base_offset_in_bytes(basic_type), dst_ptr);
+  __ sra(dst_pos, 0, dst_pos); //higher 32bits must be null
   if (shift == 0) {
     __ add(dst_ptr, dst_pos, dst_ptr);
   } else {
     __ sll(dst_pos, shift, tmp);
     __ add(dst_ptr, tmp, dst_ptr);
   }

on sparc. And:

 #ifdef _LP64
   assert_different_registers(c_rarg0, dst, dst_pos, length);
+  __ movl2ptr(src_pos, src_pos); //higher 32bits must be null
   __ lea(c_rarg0, Address(src, src_pos, scale, arrayOopDesc::base_offset_in_bytes(basic_type)));
   assert_different_registers(c_rarg1, length);
+  __ movl2ptr(dst_pos, dst_pos); //higher 32bits must be null
   __ lea(c_rarg1, Address(dst, dst_pos, scale, arrayOopDesc::base_offset_in_bytes(basic_type)));
   __ mov(c_rarg2, length);

 #else

on x86.

Roland.
---

minor issue in the OSR code:
 
I found a minor issue in the OSR code. It is harmless in hotspot: the code in SharedRuntime::OSR_migration_begin() prepares an OSR buffer that contains the current BasicObjectLocks. The displaced header is stored first, followed by the oop. Offsets in the OSR buffer for those fields are hardcoded (2 entries per BasicObjectLock, displaced header first, the oop right behind). In LIR_Assembler::osr_entry() the OSR buffer is unpacked. The code there computes offsets within the OSR buffer for BasicObjectLock fields using BasicObjectLock::size(), BasicObjectLock::obj_offset_in_bytes() and BasicObjectLock::lock_offset_in_bytes(). BasicObjectLock::size() happens to be 2 words in hotspot, and the lock comes first and the object second in the BasicObjectLock, so that works ok. In Java RTS we have a different BasicLock (to implement priority inheritance): BasicObjectLock::size() is not 2 and there is something between the lock and the object. Even though we don't have to copy the extra fields we've added when entering an OSR method, the current code breaks because the code that builds the OSR buffer uses hardcoded offsets while the code that unpacks the OSR buffer uses offsets taken from the data structure.
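Something like the following illustrates the idea (a sketch only; the helper name is hypothetical and this is not the actual patch): both the packing code in SharedRuntime::OSR_migration_begin() and the unpacking code in LIR_Assembler::osr_entry() would derive the monitor slot offsets from BasicObjectLock itself, so the two sides can never disagree on the layout.

 // Sketch: compute a field's position within the monitor area of the OSR
 // buffer from BasicObjectLock's own layout instead of assuming 2 words per
 // monitor with the displaced header first.
 static int osr_monitor_field_offset_in_bytes(int monitor_index, ByteSize field_offset) {
   return monitor_index * BasicObjectLock::size() * wordSize + in_bytes(field_offset);
 }

 // Both sides would then locate the fields the same way, e.g.:
 int lock_off = osr_monitor_field_offset_in_bytes(i, BasicObjectLock::lock_offset_in_bytes());
 int obj_off  = osr_monitor_field_offset_in_bytes(i, BasicObjectLock::obj_offset_in_bytes());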

--- 

Depending on the endianness of the platform, the Location::normal case in StackValue::create_stack_value() can be broken (JRTS bug 6788087).

I made the following change to c1_LinearScan.cpp:
 
    } else if (opr->is_single_cpu()) {
      bool is_oop = opr->is_oop_register();
      int cache_idx = opr->cpu_regnr() * 2 + (is_oop ? 1 : 0);
 +    Location::Type int_loc_type = NOT_LP64(Location::normal) LP64_ONLY(Location::int_in_long);
 
      ScopeValue* sv = _scope_value_cache.at(cache_idx);
 +
      if (sv == NULL) {
 -      Location::Type loc_type = is_oop ? Location::oop : Location::normal;
 +      Location::Type loc_type = is_oop ? Location::oop : int_loc_type;
        VMReg rname = frame_map()->regname(opr);
        sv = new LocationValue(Location::new_reg_loc(loc_type, rname));
        _scope_value_cache.at_put(cache_idx, sv);
      }
 
      // check if cached value is correct
 -    DEBUG_ONLY(assert_equal(sv, new LocationValue(Location::new_reg_loc(is_oop ? Location::oop : Location::normal, frame_map()->regname(opr)))));
 +    DEBUG_ONLY(assert_equal(sv, new LocationValue(Location::new_reg_loc(is_oop ? Location::oop : int_loc_type, frame_map()->regname(opr)))));
 
      scope_values->append(sv);
      return 1;

---
Also, I think something like the following is missing from
c1_LinearScan.cpp in
LinearScan::append_scope_value_for_constant (JRTS bug 6787476):
 
[..]
#ifdef _LP64
    case T_LONG: 
    case T_DOUBLE: {
      scope_values->append(&_int_0_scope_value);
      scope_values->append(new ConstantLongValue(c->as_jlong_bits()));
      return 2;
    }
#else
[..]

--- 

Bug in handling of unaligned load with c1 on sparcv9
 
I found that the handling of unaligned loads on sparc/64 bits is broken. The current code does:
 
__ ld(base, offset + hi_word_offset_in_bytes, to_reg->as_register_lo());
__ sllx(to_reg->as_register_lo(), 32, to_reg->as_register_lo());
__ ld(base, offset + lo_word_offset_in_bytes, to_reg->as_register_lo());
 
The second load overwrites the whole register instead of just adding in the missing 32 bit piece.
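Something like the following would fix it (a sketch only, assuming a scratch register such as O7 is free at this point; the actual fix may be shaped differently): load the low word into the scratch register and or it into the destination instead of overwriting it.

 Register dst = to_reg->as_register_lo();
 __ ld  (base, offset + hi_word_offset_in_bytes, dst);
 __ sllx(dst, 32, dst);
 __ lduw(base, offset + lo_word_offset_in_bytes, O7); // zero-extending 32 bit load into the scratch register
 __ or3 (dst, O7, dst);                               // merge the low 32 bits into the destination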

---

unaligned load for doubles on sparcv9 broken
 
int LIR_Assembler::load(Register base, int offset, LIR_Opr to_reg, BasicType type, bool unaligned) 
 
In the T_DOUBLE case, it does:
 
 __ ldf(FloatRegisterImpl::S, base, offset + BytesPerWord, reg->successor());
 
I think it should be:

 __ ldf(FloatRegisterImpl::S, base, offset + 4, reg->successor());

since on 64 bit BytesPerWord is 8, so offset + BytesPerWord points past the second 32 bit half of the double.

---
implicit exception not recognized with c1 on amd64
 
Another problem with c1, this time on amd64. This is JRTS bug 6798116.
 
In LIR_Assembler::const2mem(),
 
     case T_OBJECT:  // fall through
     case T_ARRAY:
 
I think it should be:
 
             ShouldNotReachHere();
             __ movoop(as_Address(addr, noreg), c->as_jobject());
           } else {
+ #ifdef _LP64
+            __ movoop(rscratch1, c->as_jobject());
+            null_check_here = code_offset();
+            __ movptr(as_Address_lo(addr), rscratch1);
+ #else
             __ movoop(as_Address(addr), c->as_jobject());
+ #endif
           }
         }
         break;
 
So that the implicit null check is recorded with the correct pc.
---

On x86, LIRGenerator::do_CompareAndSwap() does:
 
LIR_Opr addr = new_pointer_register();
__ move(obj.result(), addr);
__ add(addr, offset.result(), addr);
 
If obj.result() is spilled at the time the move is generated, a stack2reg move is attempted from a "single" stack slot to a "double" register, which is forbidden.
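One possible shape of a fix (a sketch only, using the usual LIRGenerator idiom; the actual change may differ) is to form the address with leal on a LIR_Address instead of move + add, so that a single-slot oop never has to be moved into a double-word pointer register:

 LIR_Opr addr = new_pointer_register();
 LIR_Address* a = new LIR_Address(obj.result(), offset.result(),
                                  LIR_Address::times_1, 0, as_BasicType(type));
 __ leal(LIR_OprFact::address(a), addr);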

---

On sparc, LIRGenerator::generate_address() uses new_register(T_INT) and LIR_OprFact::intConst(disp) to build addresses rather than new_pointer_register() and LIR_OprFact::intptrConst(disp), which is what it should use on 64 bit.
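A small sketch of the intended direction (illustrative only, not the actual patch):

 LIR_Opr tmp = new_pointer_register();          // instead of new_register(T_INT)
 __ move(LIR_OprFact::intptrConst(disp), tmp);  // instead of LIR_OprFact::intConst(disp)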

---

On x86:
MacroAssembler::andpd(), MacroAssembler::comisd() and MacroAssembler::comiss() might be used with an address that is not reachable with pc-relative addressing.
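The usual HotSpot pattern for such cases, sketched here for andpd (assuming rscratch1 is free at the call sites; comisd/comiss would be handled the same way):

 void MacroAssembler::andpd(XMMRegister dst, AddressLiteral src) {
   if (reachable(src)) {
     Assembler::andpd(dst, as_Address(src));
   } else {
     // materialize the far address in a scratch register first
     lea(rscratch1, src);
     Assembler::andpd(dst, Address(rscratch1, 0));
   }
 }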

---

In LIRGenerator::do_UnsafeGetRaw() and LIRGenerator::do_UnsafePutRaw(), if the incoming index is a 32 bit value, it first needs to be converted to a 64 bit value before an address can be built from it.
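A sketch of the kind of conversion needed (LP64 only; names follow the existing do_UnsafeGetRaw() code and the actual change may differ):

 LIR_Opr index_op = idx.result();
 #ifdef _LP64
 if (index_op->type() == T_INT) {
   // widen the 32 bit index to 64 bits before it is used in an address
   LIR_Opr wide = new_register(T_LONG);
   __ convert(Bytecodes::_i2l, index_op, wide);
   index_op = wide;
 }
 #endif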

---

Compressed oops are enabled for some 64 bit configurations but should never be enabled if c1 is used.

Comments
EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/323bd24c6520
04-11-2009