JDK-8292302 : Windows GetLastError value overwritten by ThreadLocalStorage::thread
  • Type: Bug
  • Component: core-svc
  • Sub-Component: debugger
  • Affected Version: 17,20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows
  • CPU: generic
  • Submitted: 2022-08-12
  • Updated: 2023-08-30
  • Resolved: 2022-09-13
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 20
20 b15Fixed
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
It seems that when running a program in a debugger (e.g. jdb), the value of GetLastError can be overwritten, and the behavior of the program essentially changes.

Take for instance this example program:

```
import java.lang.invoke.MethodHandle;

import java.lang.foreign.*;
import static java.lang.foreign.ValueLayout.*;

public class Main {
    public static void main(String[] args) throws Throwable {
        Linker linker = Linker.nativeLinker();
        System.loadLibrary("Kernel32");
        SymbolLookup lookup = SymbolLookup.loaderLookup();
        MethodHandle getLastError = linker.downcallHandle(
            lookup.lookup("GetLastError").orElseThrow(),
            FunctionDescriptor.of(JAVA_INT));
        MethodHandle setLastError = linker.downcallHandle(
            lookup.lookup("SetLastError").orElseThrow(),
            FunctionDescriptor.ofVoid(JAVA_INT));

        setLastError.invoke(42);
        System.out.println(getLastError.invoke());
    }
}
```
(Note that the same issue occurs with JNI, but the reproducer is more complex since it requires a native library to be compiled as well).

Compile with: `javac --release 20 --enable-preview -d classes Main.java`.

Then, when running normally with: `java --enable-preview -cp classes Main` the program prints '42'.

However, when the same program is run in `jdb` with `jdb -R--enable-preview -classpath classes Main`, the program prints '0'.

The GetLastError value is thread local, so it seems that some debugging routine is overwriting the value by calling Windows API methods on the debuggee thread, and afterwards doesn't restore the previous error value.
Comments
Changeset: dfc16e04 Author: Kevin Walls <kevinw@openjdk.org> Date: 2022-09-13 07:34:55 +0000 URL: https://git.openjdk.org/jdk/commit/dfc16e047f1f8adaa8510574d00bf9f958902c43
13-09-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/10147 Date: 2022-09-02 14:47:35 +0000
02-09-2022

Yes, that's right - looping the test I see 0,42,42,42,42,42...etc...
26-08-2022

Given your stack trace above, I assume that means this issue is only turning up for the test case the first time the code is executed, since it is triggered by constant pool resolution. If you put it in a loop, subsequent iterations of the loop would not see the problem.
26-08-2022

Update... We know a thread itself can break its own last error value in various ways, like suspending or calling Thread.currentThread(), and mostly we can argue that's not a problem as you should be calling GetLastError "immediately". Presence of jdb disturbs the test even when GetLastError is called as immediately as possible, so Java code using Panama to access the Windows last error value is "un-debuggable". Have found the specific place that happens under jdb: access to the JavaThread's thread->threadObj() by JvmtiVirtualThreadEventMark: 1:033> k # Child-SP RetAddr Call Site 00 000000b3`8f2fe0b0 00007ffc`f6ba1417 jvm!ThreadLocalStorage::thread+0xf [s\open\src\hotspot\os\windows\threadlocalstorage_windows.cpp @ 56] 01 (Inline Function) --------`-------- jvm!Thread::current_or_null_safe+0xe [s\open\src\hotspot\share\runtime\thread.hpp @ 663] 02 000000b3`8f2fe0e0 00007ffc`f6d14ed0 jvm!JavaThread::threadObj+0x17 [s\open\src\hotspot\share\runtime\javathread.cpp @ 163] 03 (Inline Function) --------`-------- jvm!JvmtiVirtualThreadEventMark::{ctor}+0x51 [s\open\src\hotspot\share\prims\jvmtiexport.cpp @ 205] 04 (Inline Function) --------`-------- jvm!JvmtiClassEventMark::{ctor}+0x51 [s\open\src\hotspot\share\prims\jvmtiexport.cpp @ 217] 05 000000b3`8f2fe110 00007ffc`f6b76de7 jvm!JvmtiExport::post_class_prepare+0x2b0 [s\open\src\hotspot\share\prims\jvmtiexport.cpp @ 1392] 06 000000b3`8f2fe250 00007ffc`f6f5edec jvm!InstanceKlass::link_class_impl+0x4f7 [s\open\src\hotspot\share\oops\instanceklass.cpp @ 919] 07 000000b3`8f2fe350 00007ffc`f6f5ec4a jvm!SystemDictionary::resolve_hidden_class_from_stream+0x18c [s\open\src\hotspot\share\classfile\systemdictionary.cpp @ 922] 08 000000b3`8f2fe3e0 00007ffc`f6c12bb8 jvm!SystemDictionary::resolve_from_stream+0x1a [s\open\src\hotspot\share\classfile\systemdictionary.cpp @ 1014] 09 000000b3`8f2fe420 00007ffc`f6c24643 jvm!jvm_lookup_define_class+0x508 [s\open\src\hotspot\share\prims\jvm.cpp @ 1004] 0a 000000b3`8f2fe690 00007ffd`5b731657 jvm!JVM_LookupDefineClass+0x103 [s\open\src\hotspot\share\prims\jvm.cpp @ 1060] 0b 000000b3`8f2fe6f0 000001b8`fba52b9c java!Java_java_lang_ClassLoader_defineClass0+0x13b [s\open\src\java.base\share\native\libjava\classloader.c @ 277] 0c 000000b3`8f2fe820 00000008`004261c8 0x000001b8`fba52b9c 0d 000000b3`8f2fe828 00000008`0043e470 0x00000008`004261c8 0e 000000b3`8f2fe830 00000006`22806878 0x00000008`0043e470 ...etc... JvmtiVirtualThreadEventMark calls: _jthread = to_jobject(thread->threadObj()); (JvmtiThreadEventMark does this also) PreserveLastError is a utility in os_windows.cpp. I can use a copy of this in JvmtiVirtualThreadEventMark before the call to threadObj(), and it avoids the problem. But there are a number of places JVMTI calls threadObj() which might need the same preservation code, so that could get messy and hard to know it's correct. So that's another possible solution: the brute-force band-aid, or specific preservation of last error in maybe various places in JVMTI, or to rework or revert JDK-8289091. The 8289091 change makes threadObj call Thread::current_or_null_safe "only" to be able to assert or guarantee on it. Putting that check back in to SharedRuntime::get_java_tid() still passes the GetLastError test in jdb, so that seems a good option. Then the bad news. You would expect a thread calling System.gc to break its own last error value. But, a separate thread calling System.gc can also break a thread that sets and checks last error! A GC between your set and get, will clear your last error. This is rare, it takes tens of thousands of attempts for last error to get reset. However, that is NOT caused by 8289091, I can reproduce it before that change. Does the brute-force/band aid save/restore fix that? No! Not sure what exactly in the GC is clearing it, but it doesn't seem to be from threadObj() or similar. It may be that switching back the Thread::current() and guarantee into SharedRuntime::get_java_tid() is the immediate fix here, as that makes most usage work as well as it used to. More follow up needed about the last error reset caused by GC.
26-08-2022

I don't think we should add even more overhead to ThreadLocalStorage::thread. I would rather we avoid the usage of ThreadLocalStorage::thread that is triggering the problem i.e that we revert the change made in JDK-8289091. From https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-getlasterror "The Return Value section of the documentation for each function that sets the last-error code notes the conditions under which the function sets the last-error code. Most functions that set the thread's last-error code set it when they fail. However, some functions also set the last-error code when they succeed. If the function is not documented to set the last-error code, the value returned by this function is simply the most recent last-error code to have been set; some functions set the last-error code to 0 on success and others do not." I remain convinced that the more general problem here is very much panama specific because it separates the API call and the check of GetLastError, in a way that you would not do in native code.
22-08-2022

I think this fix is probably a good idea, but it does seem somewhat like a band aid. Certainly there are many other windows APIs being called that could potentially set the last error if one occurred. The underlying issue of the jvm and libraries potentially causing the last error to be modified is still there and could be observed with code similar to the example given in this CR. It just a matter of triggering a failure on a windows API call.
22-08-2022

[~dholmes] - Got it. Thanks for the clarification. [~kevinw] - I like that fix. However, it is begging for a comment. Perhaps some like: // Save the last error value since TlsGetValue() always sets the last error value // unlike most Windows APIs which set last error value only when there is an error: DWORD lastErr = GetLastError();
19-08-2022

I think you usually only call GetLastError if the api call returns e.g. -1 to signify an error occurred.
19-08-2022

[~dcubed] When I said "back it out again" I didn't mean to imply it was previously backed out, I simply meant having put in this change perhaps we should take it out "again". > it specifically is documented to set the last error to zero. Most calls only set if there is an error. I would expect every successful call to a win32 API to set it to zero otherwise any pre-existing non-zero would be wrongly reported when the API call actually succeeded!
19-08-2022

ThreadLocalStorage::thread() could save, and restore, the last error value, to cope with TlsGetValue() being unusual in that it specifically is documented to set the last error to zero. Most calls only set if there is an error. That does work fine, but maybe has an overhead: bash-4.2$ git diff diff --git a/src/hotspot/os/windows/threadLocalStorage_windows.cpp b/src/hotspot/os/windows/threadLocalStorage_windows.cpp index 7d809518aab..dd556abb9a6 100644 --- a/src/hotspot/os/windows/threadLocalStorage_windows.cpp +++ b/src/hotspot/os/windows/threadLocalStorage_windows.cpp @@ -51,9 +51,11 @@ Thread* ThreadLocalStorage::thread() { // the initialization process, which is using Thread::current without // checking TLS is initialized - see java.cpp vm_exit assert(_initialized, "TLS not initialized yet!"); + DWORD lastErr = GetLastError(); Thread* current = (Thread*) TlsGetValue(_thread_key); assert(current != 0 || GetLastError() == ERROR_SUCCESS, "TlsGetValue failed with error code: %lu", GetLastError()); + SetLastError(lastErr); return current; }
19-08-2022

Wow this is quite the mess... First, I'm slightly confused about something: [~dholmes] you wrote: > but I think it may be better to back it out again. I don't remember backing out JDK-8289091 before. Please clarify. Second, perhaps we should be looking at ThreadLocalStorage::thread() which, as a getter, should not be overwriting the Windows last error value.
19-08-2022

Great find [~kevinw]! We may have to review what we did in JDK-8289091. Even ignoring this issue the use of current_or_null_safe() is slow, and that may have a performance impact that we have not observed/attributed yet. I was the one that wanted the change in JDK-8289091 but I think it may be better to back it out again. [~dcubed] - thoughts?
19-08-2022

It's more general than that, can reproduce it without jdb, e.g. if the thread is suspended. Any use of ThreadLocalStorage::thread, such as the call by Thread::current_or_null_safe, is enough to clear the Windows last errror value (and this happens frequently). 8289091 adds places we call Thread::current_or_null_safe, and does turn the behaviour on and off. Editing the testcase to do: ... setLastError.invoke(42); System.out.println(getLastError.invoke()); try { System.in.read(); } catch (Exception e) { } System.out.println(getLastError.invoke()); ...and running outside of jdb, will print 42 the first time, and 0 the second time (with jdb means it prints 0 both times). Similarly, calling: System.out.println(Thread.currentThread().getName()); ...will reset the last error value to zero, and so will entering a synchronized block.
18-08-2022

The debug agent and JDI also use TlsGetValue in the transport code. Amusingly, one call is indirectly from and API called setLastError, although has to do with the Windows SetLastError. Have you tried bracketing the TlsGetValue call with Get/SetLastError to restore it?
17-08-2022

Seems to be a recent change: failure first appears in jdk-20+5-232 jdk-20+5-231 WORKS ...prints 42 jdk-20+5-232 FAILS ...prints 0 In that build the change: JDK-8289091: move oop safety check from SharedRuntime::get_java_tid() to JavaThread::threadObj() ...has a new call to Thread::current_or_null_safe(), which calls ThreadLocalStorage::thread(). That calls TlsGetValue(), and checks the value of GetLastError(). TlsGetValue specifically calls SetLastError when it _succeeds_ which is unusual: https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-tlsgetvalue Built a latest jdk that still shows the error (prints 0), then built the same with 8289091 changed removed, and no error (prints 42). (which is not saying it should be removed!) A call to e.g. Thread::get_thread_name_string() could be the cause of the behaviour change under jdb.
17-08-2022

Yes, I had already looked into that. There are some, but I don't think it likely they would be be triggered during execution of the code in question: VM_INIT VM_DEATH THREAD_START THREAD_END CLASS_PREPARE GC_FINISH
16-08-2022

[~cjplummer] when running under the debugger are any JVM TI events enabled?
16-08-2022

Seems to me what is most likely resetting GetLastError is the debug agent's use of JVMTI raw monitors, which would make this a JVMTI bug. However that's just a guess at this point. Assuming no actual debugging is being done (breakpoints, single stepping, etc), and this problem still happens just by virtue of running in the debugger, then it's not clear to me what interaction the debug agent or JVMTI would be having with the app. Seems it should just run without any involvement by either.
15-08-2022

See the discussion in panama-dev: https://mail.openjdk.org/pipermail/panama-dev/2022-August/017339.html I think the debugger is but one thing that would have to be fixed for this to reliably work in general. Even then I don't know if it is actually technically feasible to save/restore last-error around the debugger "entry" points. The Java debugger isn't really designed/intended/expected to interact with native code this way.
15-08-2022

I made this issue to see how others think about this. It seems that at least in theory the debugger could save and restore the value of thread-local variables like (WSA)GetLastError/errno around API calls when running code on the debuggee thread. I don't know if there are any technical issue with that? From talking to people in the community, it seems like a common problem that frameworks that interact with native code have to work around. So, it seems like there is significant value in fixing this. (rather than adding yet another workaround)
13-08-2022

I think this is expecting a bit much of the debugger. It doesn't know about native state like this and can't know that your apparent Java code is actually manipulating native state.
12-08-2022