JDK-8198543 : C2: Wrong type of return value from Unsafe.getAndSetObject() call
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 8u161
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2018-02-21
  • Updated: 2023-07-21
  • Resolved: 2023-07-21
Related Reports
Duplicate :  
Duplicate :  
Description
Information from JDK-8198531:

Sometimes C2 eliminates the non-null branch in the following code, as if it had proved that ThreadCont.getAndSet() cannot return a non-null value:

    val cont = ThreadCont.getAndSet(threads[i], null)
    if (cont != null) { /* do something */ }

Detailed problem description & reproducer (by Roman Elizarov):
  https://github.com/ktorio/ktor/blob/resumeAnyThread_HotSpotBug/BUG_README_FIRST.md



FULL PRODUCT VERSION :
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

FULL OS VERSION :
Darwin unit-940.labs.intellij.net 17.3.0 Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64 x86_64

EXTRA RELEVANT SYSTEM CONFIGURATION :
Not relevant. Reproduces on various Linux versions, too

A DESCRIPTION OF THE PROBLEM :
We have code that uses AtomicReferenceFieldUpdater.getAndSet. The way it is used can be summarized like this (in Kotlin/JVM code):

val cont = ThreadCont.getAndSet(threads[i], null)
if (cont != null) { /* do something */ }

Now, under some circumstances HotSpot produces the following assembly for the above 'getAndSet':

  0x00000001110d42b2: xchg   QWORD PTR [rcx+0x1b8],r8  ;*invokevirtual getAndSetObject
                                                ; - java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet@19 (line 468)
                                                ; - io.ktor.network.util.IOCoroutineDispatcher::resumeAnyThread@38 (line 76)

  0x00000001110d42b9: mov    r8,QWORD PTR [rsp+0x28]

As you can see, the result of getAndSet (the xchg instruction, which leaves the previous field value in r8) is immediately lost (overwritten by the following mov into r8) instead of being checked for null.
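For reference, a minimal sketch of the semantics the compiled code is expected to preserve. The Worker class and its cont field are hypothetical stand-ins (the report does not show the real declarations), and the updater below only assumes that ThreadCont is an AtomicReferenceFieldUpdater over such a volatile field:

```kotlin
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater

// Hypothetical stand-in for the per-thread object in the report.
class Worker {
    // The updater needs a volatile field; @JvmField exposes it as a public
    // JVM field so that the top-level updater below is allowed to access it.
    @Volatile
    @JvmField
    var cont: Runnable? = null
}

// Assumed shape of the report's ThreadCont updater (not taken from the source).
val ThreadCont = AtomicReferenceFieldUpdater.newUpdater(
    Worker::class.java, Runnable::class.java, "cont"
)

fun main() {
    val worker = Worker()
    worker.cont = Runnable { println("resumed") }

    // getAndSet atomically stores null and returns the previous value,
    // so the null check must see the Runnable stored above.
    val cont = ThreadCont.getAndSet(worker, null)
    if (cont != null) cont.run()

    // A second call returns null because the field was already cleared.
    check(ThreadCont.getAndSet(worker, null) == null)
}
```

With correct code generation the non-null branch runs on the first call. The miscompiled code behaves as if that branch were unreachable.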



THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Clone this code branch from github: https://github.com/ktorio/ktor/tree/resumeAnyThread_HotSpotBug

git clone https://github.com/ktorio/ktor.git -b resumeAnyThread_HotSpotBug

2. Build the corresponding test classes:

./gradlew :ktor-client:ktor-client-cio:compileTestKotlin

3. Run the script that runs the test with all the appropriate JVM options (dumps assembly, etc)

./run_test.sh

EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected behavior: The test should pass (it takes up to 45 seconds).
Actual behavior: The test fails (it hangs for a minute or more).

ERROR MESSAGES/STACK TRACES THAT OCCUR :
HotSpot does not crash, but miscompiles method 'resumeAnyThread' '(Lkotlinx/coroutines/experimental/internal/LockFreeLinkedListNode;)V' in 'io/ktor/network/util/IOCoroutineDispatcher'

The relevant part of the run_test.txt file (which is also committed to the branch) is shown below. The miscompiled code can be found on line 444115 of that file:

  0x00000001110d42b2: xchg   QWORD PTR [rcx+0x1b8],r8  ;*invokevirtual getAndSetObject
                                                ; - java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl::getAndSet@19 (line 468)
                                                ; - io.ktor.network.util.IOCoroutineDispatcher::resumeAnyThread@38 (line 76)

  0x00000001110d42b9: mov    r8,QWORD PTR [rsp+0x28]


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
We've failed to minimize the problem because it is fleeting, but it reproduces reliably with this particular test of this particular application: https://github.com/ktorio/ktor/tree/resumeAnyThread_HotSpotBug
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
We've found that many changes in the code make the bug go away. The simplest workaround is to introduce a variable in the source code, e.g. replace this code:

val cont = ThreadCont.getAndSet(threads[i], null) // BAD

with this one:

val t = threads[i] ; val cont = ThreadCont.getAndSet(t, null) // GOOD

See here: https://github.com/ktorio/ktor/blob/41aa9a71c33b9c6fb1de7f50d63df5d3f029f4d1/ktor-network/src/io/ktor/network/util/IOCoroutineDispatcher.kt#L76
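The two forms are equivalent at the source level, which is why the difference in behavior points at the JIT rather than at the program. The sketch below is not a reproducer (the report states that the problem could not be minimized); it only shows both forms side by side. Slot, SlotCont, takeInline and takeViaLocal are hypothetical names introduced for illustration:

```kotlin
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater

// Hypothetical stand-in for the element type of `threads`.
class Slot {
    @Volatile
    @JvmField
    var cont: Runnable? = null
}

// Hypothetical updater playing the role of ThreadCont.
val SlotCont = AtomicReferenceFieldUpdater.newUpdater(
    Slot::class.java, Runnable::class.java, "cont"
)

// Form marked BAD in the report: the list element is read inside the call.
fun takeInline(threads: ArrayList<Slot>, i: Int): Runnable? =
    SlotCont.getAndSet(threads[i], null)

// Form marked GOOD in the report: the element is read into a local first.
fun takeViaLocal(threads: ArrayList<Slot>, i: Int): Runnable? {
    val t = threads[i]
    return SlotCont.getAndSet(t, null)
}

fun main() {
    val task = Runnable { }
    val threads = arrayListOf(Slot(), Slot())
    threads[0].cont = task
    threads[1].cont = task

    // Both forms must return the stored value and leave null behind.
    check(takeInline(threads, 0) === task)
    check(takeViaLocal(threads, 1) === task)
    check(threads[0].cont == null && threads[1].cont == null)
}
```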

Using an array for threads (instead of an ArrayList) also fixes the problem, as does extracting a method, etc. Other simplifications to the code fix it too. For example, removing logging from this method fixes it, as long as the method itself is kept from being inlined via the compiler oracle: -XX:CompileCommand=dontinline,*.resumeAnyThread

However, even the version w


Comments
Copying information from JDK-8198531: ILW = Incorrect execution of compiled code, reproducible with regression test provided by customer, modify Java source code (see report) or exclude method from compilation = HMM = P2
22-02-2018

Verified on 8u172 ea b06 and confirmed that the issue exists. Because the project has dependencies, it could not be verified on 9 and 10. This looks like an issue in the code generated for the inlined method: the result of getAndSet is immediately lost (overwritten) instead of being checked for null.
22-02-2018