The issue seems to be with 64-bit immediates in the instruction selection.
The attached test case clearly demonstrates the issue.
Running the test shows the following in the OptoAssembly output:
4ce B50: # B8 <- B49 B46 B45 Freq: 0.00189711
4ce MEMBAR-release ! (empty encoding)
4ce
4ce movq R10, [rsp + #8] # spill
4d3 ADDQ [[R10 + #152 (32-bit)]],#281474976710656
4dc
4dc MEMBAR-acquire ! (empty encoding)
Note the "proper" immediate "1 << 48" (281474976710656) in the ADDQ. However, it gets further lowered to:
;; B49: # B8 <- B48 B45 B44 Freq: 0.00189711
0x00007f1c991d4a5e: mov 0x8(%rsp),%r10
0x00007f1c991d4a63: lock addq $0x0,0x98(%r10)
So we are indeed adding zero instead of the properly-offset bit.
There are two matching rules for this node, xaddL and xaddL_no_res. The problematic rule seems to be xaddL_no_res. If we hack the code to force selection of xaddL, we get the register-based selection and the proper code:
0x00007fe76d15b703: mov $0x1000000000000,%r10
;; B52: # B8 <- B51 B53 Freq: 0.00189711
0x00007fe76d15b70d: mov 0x8(%rsp),%r11
0x00007fe76d15b712: lock xadd %r10,0x98(%r11)
This opens up the way for a workaround: feed in $delta values that cannot be constant-folded, so that the delta ends up in a register rather than an immediate. The simplest approach would be to read the delta from a volatile field.
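A minimal sketch of that workaround, assuming the affected call is AtomicLong.addAndGet with an ignored result. The class, field, and method names are hypothetical, and whether this actually steers C2 away from xaddL_no_res in a given build has not been verified here:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical workaround sketch; names are illustrative only.
class VolatileDeltaWorkaround {
    // Volatile, so the JIT cannot treat the delta as a compile-time constant
    // and has to load it at run time instead of encoding it as an immediate.
    static volatile long delta = 1L << 48;

    static final AtomicLong counter = new AtomicLong();

    static void bump() {
        // The result is deliberately ignored, which is the assumed shape
        // of the code that matched the no-result rule.
        counter.addAndGet(delta);
    }

    public static void main(String[] args) {
        long before = counter.get();
        bump();
        // Print the observed increment as hex for comparison with 1 << 48.
        System.out.println(Long.toHexString(counter.get() - before));
    }
}
```

The cost is one extra volatile load per update, so this is only a stopgap until the matching rule itself is fixed.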
Speculation: the problem seems to be the *missing* overloaded macro addq(Address addr, int64_t); we are probably selecting the closest match, addq(Address addr, int32_t), which truncates the immediate. The low 32 bits of 1 << 48 are all zero, which is consistent with the $0x0 in the generated code.
This issue affects jsr166 development and existing classes (AtomicLong), and has a potentially large impact.
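The suspected truncation can be illustrated outside the VM. The sketch below is not HotSpot code; truncateToImm32 is a hypothetical helper that only mimics what happens when a 64-bit value is passed through a 32-bit parameter:

```java
// Hypothetical illustration of the suspected narrowing; not HotSpot code.
class ImmediateTruncationDemo {
    // Mimics handing a 64-bit immediate to a parameter declared as 32-bit.
    static int truncateToImm32(long imm64) {
        return (int) imm64; // narrowing conversion keeps only the low 32 bits
    }

    public static void main(String[] args) {
        long delta = 1L << 48;
        System.out.println(Long.toHexString(delta));    // 1000000000000
        System.out.println(truncateToImm32(delta));     // 0
    }
}
```

Any delta whose low 32 bits are zero would be affected the same way under this assumption.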
I/L/W = H/M/M => P2