As part of the JRTS port, issues with the 64-bit c1 port have been identified, since the JRTS team is doing more extensive testing. This bug covers a small collection of them.

Hi Tom,

For 32-bit moves in 64-bit mode, LIR_Assembler::stack2stack() does:

  __ pushl(frame_map()->address_for_slot(src ->single_stack_ix()));
  __ popl (frame_map()->address_for_slot(dest->single_stack_ix()));

But there is no 32-bit push/pop in 64-bit mode, so pushl+popl ends up moving a 64-bit quantity and can corrupt the stack. I've replaced it with:

  if (src->is_single_stack()) {
    if (type == T_OBJECT || type == T_ARRAY) {
      __ pushptr(frame_map()->address_for_slot(src ->single_stack_ix()));
      __ popptr (frame_map()->address_for_slot(dest->single_stack_ix()));
    } else {
      // no pushl on 64bits
#ifndef _LP64
      __ pushl(frame_map()->address_for_slot(src ->single_stack_ix()));
      __ popl (frame_map()->address_for_slot(dest->single_stack_ix()));
#else
      __ movl(rscratch1, frame_map()->address_for_slot(src ->single_stack_ix()));
      __ movl(frame_map()->address_for_slot(dest->single_stack_ix()), rscratch1);
#endif
    }
  } else if (src->is_double_stack()) {

Hi Tom,

Here is a bug in 64-bit c1 that we hit with RTS. The bug is CR 6848568. I attached a simple test case to the CR that can be used to reproduce the problem on both sparcv9 and amd64. The problem is that the 64-bit registers that hold the 32-bit position arguments to System.arraycopy() are used to compute start addresses for the copy without first making sure that the high 32 bits are cleared. I fixed it by changing LIR_Assembler::emit_arraycopy() to:

    __ add(src, arrayOopDesc::base_offset_in_bytes(basic_type), src_ptr);
+   __ sra(src_pos, 0, src_pos); // higher 32bits must be null
    if (shift == 0) {
      __ add(src_ptr, src_pos, src_ptr);
    } else {
      __ sll(src_pos, shift, tmp);
      __ add(src_ptr, tmp, src_ptr);
    }

    __ add(dst, arrayOopDesc::base_offset_in_bytes(basic_type), dst_ptr);
+   __ sra(dst_pos, 0, dst_pos); // higher 32bits must be null
    if (shift == 0) {
      __ add(dst_ptr, dst_pos, dst_ptr);
    } else {
      __ sll(dst_pos, shift, tmp);
      __ add(dst_ptr, tmp, dst_ptr);
    }

on sparc, and:

  #ifdef _LP64
    assert_different_registers(c_rarg0, dst, dst_pos, length);
+   __ movl2ptr(src_pos, src_pos); // higher 32bits must be null
    __ lea(c_rarg0, Address(src, src_pos, scale, arrayOopDesc::base_offset_in_bytes(basic_type)));
    assert_different_registers(c_rarg1, length);
+   __ movl2ptr(dst_pos, dst_pos); // higher 32bits must be null
    __ lea(c_rarg1, Address(dst, dst_pos, scale, arrayOopDesc::base_offset_in_bytes(basic_type)));
    __ mov(c_rarg2, length);
  #else

on x86.

Roland.

minor issue in the OSR code:

I found a minor issue in the OSR code. It is harmless in hotspot. The code in SharedRuntime::OSR_migration_begin() prepares an OSR buffer that contains the current BasicObjectLocks. The displaced header is stored first, followed by the oop. Offsets in the OSR buffer for those fields are hardcoded (2 entries per BasicObjectLock, displaced header first, the oop right behind). In LIR_Assembler::osr_entry() the OSR buffer is unpacked. The code there computes offsets within the OSR buffer for BasicObjectLock fields using BasicObjectLock::size(), BasicObjectLock::obj_offset_in_bytes() and BasicObjectLock::lock_offset_in_bytes(). BasicObjectLock::size() happens to be 2 words in hotspot, and the lock is first and the object second in the BasicObjectLock, so that works ok. In Java RTS we have a different BasicLock (to implement priority inheritance): BasicObjectLock::size() is not 2 and there is something in between the lock and the object. Even though we don't have to copy the extra data we've added when entering an OSR method, the current code breaks, because the code that builds the OSR buffer uses hardcoded offsets while the code that unpacks the OSR buffer uses offsets derived from the data structure.
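For illustration, here is a schematic sketch of the two sides (paraphrased, not a verbatim quote of either function; register and local variable names are only indicative):

  // Packing side, SharedRuntime::OSR_migration_begin(): hardcoded layout,
  // 2 words per monitor, displaced header first, then the oop.
  buf[i++] = (intptr_t) kptr->lock()->displaced_header();
  buf[i++] = (intptr_t) kptr->obj();

  // Unpacking side, LIR_Assembler::osr_entry() (x86 flavor): offsets derived
  // from the BasicObjectLock data structure.
  int slot_offset = monitor_offset - (i * BasicObjectLock::size()) * BytesPerWord;
  __ movptr(reg, Address(OSR_buf, slot_offset + BasicObjectLock::lock_offset_in_bytes()));
  __ movptr(frame_map()->address_for_monitor_lock(i), reg);
  __ movptr(reg, Address(OSR_buf, slot_offset + BasicObjectLock::obj_offset_in_bytes()));
  __ movptr(frame_map()->address_for_monitor_object(i), reg);

The two only agree as long as a BasicObjectLock is exactly 2 words with the lock first, which is true in hotspot but not in Java RTS.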
---

Depending on the endianness of the platform, case Location::normal in StackValue::create_stack_value() can be broken (JRTS bug 6788087). I made the following change to c1_LinearScan.cpp:

    } else if (opr->is_single_cpu()) {
      bool is_oop = opr->is_oop_register();
      int cache_idx = opr->cpu_regnr() * 2 + (is_oop ? 1 : 0);
+     Location::Type int_loc_type = NOT_LP64(Location::normal) LP64_ONLY(Location::int_in_long);

      ScopeValue* sv = _scope_value_cache.at(cache_idx);
      if (sv == NULL) {
-       Location::Type loc_type = is_oop ? Location::oop : Location::normal;
+       Location::Type loc_type = is_oop ? Location::oop : int_loc_type;
        VMReg rname = frame_map()->regname(opr);
        sv = new LocationValue(Location::new_reg_loc(loc_type, rname));
        _scope_value_cache.at_put(cache_idx, sv);
      }

      // check if cached value is correct
-     DEBUG_ONLY(assert_equal(sv, new LocationValue(Location::new_reg_loc(is_oop ? Location::oop : Location::normal, frame_map()->regname(opr)))));
+     DEBUG_ONLY(assert_equal(sv, new LocationValue(Location::new_reg_loc(is_oop ? Location::oop : int_loc_type, frame_map()->regname(opr)))));

      scope_values->append(sv);
      return 1;

---

Also, I think something like the following is missing from LinearScan::append_scope_value_for_constant() in c1_LinearScan.cpp (JRTS bug 6787476):

  [..]
#ifdef _LP64
    case T_LONG:
    case T_DOUBLE: {
      scope_values->append(&_int_0_scope_value);
      scope_values->append(new ConstantLongValue(c->as_jlong_bits()));
      return 2;
    }
#else
  [..]

---

Bug in handling of unaligned load with c1 on sparcv9

I found that the handling of unaligned loads on sparc/64 bits is broken. The current code does:

  __ ld(base, offset + hi_word_offset_in_bytes, to_reg->as_register_lo());
  __ sllx(to_reg->as_register_lo(), 32, to_reg->as_register_lo());
  __ ld(base, offset + lo_word_offset_in_bytes, to_reg->as_register_lo());

The second load, instead of just adding the missing low 32-bit half to the register, overwrites the whole register.
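A minimal sketch of a fix, assuming a scratch register (O7 here, purely as an assumption) is free at this point and not aliased with base or to_reg:

  __ ld  (base, offset + hi_word_offset_in_bytes, to_reg->as_register_lo());
  __ sllx(to_reg->as_register_lo(), 32, to_reg->as_register_lo()); // move the high word into the upper half
  __ lduw(base, offset + lo_word_offset_in_bytes, O7);             // zero-extending load of the low word
  __ or3 (to_reg->as_register_lo(), O7, to_reg->as_register_lo()); // combine without clobbering the upper half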
---

unaligned load for doubles on sparcv9 broken

In

  int LIR_Assembler::load(Register base, int offset, LIR_Opr to_reg, BasicType type, bool unaligned)

the T_DOUBLE case does:

  __ ldf(FloatRegisterImpl::S, base, offset + BytesPerWord, reg->successor());

I think it should be:

  __ ldf(FloatRegisterImpl::S, base, offset + 4, reg->successor());

---

implicit exception not recognized with c1 on amd64

Another problem with c1, this time on amd64. This is JRTS bug 6798116. In LIR_Assembler::const2mem(),

  case T_OBJECT:  // fall through
  case T_ARRAY:

I think it should be:

        ShouldNotReachHere();
        __ movoop(as_Address(addr, noreg), c->as_jobject());
      } else {
+ #ifdef _LP64
+       __ movoop(rscratch1, c->as_jobject());
+       null_check_here = code_offset();
+       __ movptr(as_Address_lo(addr), rscratch1);
+ #else
        __ movoop(as_Address(addr), c->as_jobject());
+ #endif
      }
    }
    break;

so that the implicit null check is recorded with the correct pc.

---

On x86, LIRGenerator::do_CompareAndSwap() does:

  LIR_Opr addr = new_pointer_register();
  __ move(obj.result(), addr);
  __ add(addr, offset.result(), addr);

If obj.result() is spilled at the time the move is generated, a stack2reg is attempted from a "single" stack slot to a "double" register, which is forbidden.

---

On sparc, LIRGenerator::generate_address() uses new_register(T_INT) and LIR_OprFact::intConst(disp) to build addresses rather than new_pointer_register() and LIR_OprFact::intptrConst(disp).

---

On x86, MacroAssembler::andpd(), MacroAssembler::comisd() and MacroAssembler::comiss() might be used with an address that is not reachable with pc-relative addressing.

---

In LIRGenerator::do_UnsafeGetRaw() and LIRGenerator::do_UnsafePutRaw(), if the incoming index is a 32-bit value, it first needs to be converted to a 64-bit value before an address can be built from it.
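A minimal sketch of the kind of conversion implied, assuming the index is available as idx.result() (the names index_op and wide_index are illustrative, not taken from an actual patch):

  LIR_Opr index_op = idx.result();
#ifdef _LP64
  if (index_op->type() == T_INT) {
    // widen the 32-bit index before it is used in a 64-bit address computation
    LIR_Opr wide_index = new_register(T_LONG);
    __ convert(Bytecodes::_i2l, index_op, wide_index);
    index_op = wide_index;
  }
#endif
  // index_op can now be used to build the address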
---

Compressed oops are enabled for some 64-bit configurations but should never be enabled if c1 is used.
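A minimal sketch of the kind of guard implied, assuming it lives near the other 64-bit ergonomics in arguments.cpp (the exact location and flag handling may differ):

#if defined(_LP64) && defined(COMPILER1) && !defined(TIERED)
  // c1 has no support for encoding/decoding compressed oops, so force them off
  if (UseCompressedOops) {
    warning("UseCompressedOops is not supported by the client compiler; disabling it");
    FLAG_SET_DEFAULT(UseCompressedOops, false);
  }
#endif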