JDK-8270549 : Java_java_lang_ProcessEnvironment_environ is not thread safe
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 8,11,16,17,18
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2021-07-15
  • Updated: 2021-09-28
Other: tbd (Unresolved)
Description
The native implementation of ProcessEnvironment.environ[1] iterates over the global `environ` variable twice, expecting it to be the same length in both iterations. However, any interleaving call to setenv/putenv could change the contents. One such call is from libjli.dylib on macOS, which sets an environment variable[2] after the JVM has been booted.

We see this with some frequency during GraalVM testing with stack traces such as:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 145
        at java.lang.ProcessEnvironment.environ(java.base@11.0.10/Native Method)
        at java.lang.ProcessEnvironment.<clinit>(java.base@11.0.10/ProcessEnvironment.java:70)
        at java.lang.System.getenv(java.base@11.0.10/System.java:1049)
        at org.graalvm.compiler.options.ModuleSupport.<clinit>(jdk.internal.vm.compiler/ModuleSupport.java:31)

[1] https://github.com/openjdk/jdk/blob/739769c8fc4b496f08a92225a12d07414537b6c0/src/java.base/unix/native/libjava/ProcessEnvironment_md.c#L50
[2] https://github.com/openjdk/jdk/blob/739769c8fc/src/java.base/macosx/native/libjli/java_md_macosx.m#L862
Comments
[~rriggs] this problem continues to persist in GraalVM testing. At this point, it seems like the best option going forward is one of the following:
* Fix AWT initialization on macOS to avoid use of env vars
* Snapshot the env vars during VM startup while the VM is single threaded
* Fix ProcessEnvironment.environ to be more defensive
What do you think?
28-09-2021

Graal initialization happens when the first top tier compilation is scheduled on a CompilerThread. That is, it happens soon after JIT compilation is enabled in the bootstrap sequence. The problematic call is happening inside libgraal which has a separate heap (it's a GraalVM Native Image heap) from the HotSpot heap so calling into libs should not be a problem. It's only the JNI call to ProcessEnvironment.environ that is a problem.
I think focussing on Graal may be a bit of a distraction. As far as I know, there's nothing guaranteeing that ProcessEnvironment.environ is called during VM startup. The first call to System.getenv() may be from app code long after the VM has started. And that app code may be running concurrently with another thread that is deep in some native library code that mutates `environ` with setenv/putenv. It makes sense to harden ProcessEnvironment.environ for that scenario as well. For instance, this has been reported in non-Graal cases (e.g. https://github.com/GoogleCloudPlatform/google-cloud-eclipse/issues/1445).
Alternatively, maybe the focus should be on the AWT code in question. Maybe there's a better way to do the required communication other than via env vars.
31-08-2021

Another idea: eagerly copy environ while the VM is single threaded. The copy can be made in native code and still let the Java copy be done lazily.
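A minimal sketch of the eager-copy idea, not taken from the JDK sources. The names `capture_env_at_startup` and `startup_env` are hypothetical, and the sketch assumes the capture hook runs exactly once, before any other thread exists:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Deep copy taken at startup; NULL-terminated. Hypothetical storage. */
static char **saved_env = NULL;
static size_t saved_count = 0;

/* Hypothetical startup hook: call once while the process is still
 * single threaded. Returns 0 on success, -1 on allocation failure. */
static int capture_env_at_startup(char **env) {
    assert(saved_env == NULL);          /* must only be called once */
    size_t n = 0;
    while (env[n] != NULL) n++;

    char **copy = calloc(n + 1, sizeof *copy);
    if (copy == NULL) return -1;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(env[i]) + 1;
        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            while (i > 0) free(copy[--i]);
            free(copy);
            return -1;
        }
        memcpy(copy[i], env[i], len);
    }
    saved_env = copy;                   /* copy[n] is NULL from calloc */
    saved_count = n;
    return 0;
}

/* Later readers (for example a lazy Java-side copy) would consult the
 * startup copy instead of the live `environ` array. */
static char *const *startup_env(size_t *count_out) {
    if (count_out != NULL) *count_out = saved_count;
    return saved_env;
}
```

Because the copy owns its strings, later setenv/putenv calls on the live environment would not affect it. The trade-off is that variables set after startup are not visible through the copy.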
22-07-2021

At what point in the bootstrap of the libraries is the GraalVM thread making that call to System.getenv()? Is initPhase3 complete? (See jdk.internal.misc.VM.initlevel()). Calling into the libraries before then can disturb the boot sequence and possibly affect startup performance. Can GraalVM directly read from the native environment instead to avoid changing the Java startup? (And it would apply to all versions, not just the latest).
22-07-2021

> Suppose AWT loses and the snapshot does not include the environment variable of the main class.
As far as I can see, the env var related to AWT is set by native code (in src/java.base/macosx/native/libjli/java_md_macosx.m) and also read by native code (in src/java.desktop/macosx/native/libosxapp/NSApplicationAWT.m), so hardening the Java snapshotting of the environment should not be a problem. In terms of hardening the snapshot, it could restart if some other thread changes the environment while the snapshot is being taken.
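The restart idea could be sketched roughly as follows. This is an illustration only, not the JDK's code: `snapshot_with_retry`, `env_matches`, and the `max_attempts` limit are hypothetical. The sketch detects a change by re-reading the array after copying and comparing it with the copy. It narrows the race window but does not make the read thread safe, since another thread may still be mutating the array during either pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static char *dup_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *d = malloc(len);
    if (d != NULL) memcpy(d, s, len);
    return d;
}

static void free_copy(char **copy, size_t n) {
    for (size_t i = 0; i < n; i++) free(copy[i]);
    free(copy);
}

/* True if `env` still holds exactly the `n` strings in `copy`,
 * followed by the NULL terminator. */
static bool env_matches(char **env, char **copy, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (env[i] == NULL || strcmp(env[i], copy[i]) != 0) return false;
    }
    return env[n] == NULL;
}

/* Copies the array that *env_ref points to and restarts if the array was
 * replaced or its contents changed in the meantime. Returns a
 * NULL-terminated deep copy, or NULL if every attempt failed. */
static char **snapshot_with_retry(char ***env_ref, int max_attempts,
                                  size_t *count_out) {
    assert(env_ref != NULL && count_out != NULL);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        char **env = *env_ref;
        size_t n = 0;
        while (env[n] != NULL) n++;

        char **copy = calloc(n + 1, sizeof *copy);
        if (copy == NULL) return NULL;

        size_t i = 0;
        for (; i < n && env[i] != NULL; i++) {
            copy[i] = dup_string(env[i]);
            if (copy[i] == NULL) { free_copy(copy, i); return NULL; }
        }

        /* Accept the copy only if nothing appears to have moved. */
        if (i == n && *env_ref == env && env_matches(env, copy, n)) {
            *count_out = n;
            return copy;
        }
        free_copy(copy, i);             /* changed underneath us: restart */
    }
    return NULL;
}
```

A caller would pass the address of the global, for example `snapshot_with_retry(&environ, 3, &count)`, and fall back to an empty environment or an error if it returns NULL.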
21-07-2021

ProcessEnvironment makes a snapshot of the environment. The call by GraalVM is in a race with the AWT initialization. Suppose AWT loses and the snapshot does not include the environment variable of the main class. It could result in intermittent problems getting the application/main class. Simply hardening the snapshot will just trade one intermittent problem for another.
21-07-2021

I don't have a pre-existing AWT test that fails and I'm not sure what extra help it would be if there was one. How is the state reproduced by the GetEnv test invalid? Granted it may not be common, but a JNI method might well mutate `environ` as this test does. In fact, that's exactly what the JVM thread does.
In terms of what we see with GraalVM, this is what I assume is happening: by the time the main thread calls PostJVMInit, the VM has already started the compiler threads. When the main thread calls SetMainClassForAWT (which mutates environ[1]), it can happen that a compiler thread is initializing Graal.
Let me know what further context you need. The details of GR-30530 don't contain anything more than what I've outlined here.
[1] https://github.com/openjdk/jdk/blob/739769c8fc/src/java.base/macosx/native/libjli/java_md_macosx.m#L862
21-07-2021

Ok, do you now have sufficient failure info? If not, can you please clarify what you mean by "original test failure".
21-07-2021

Thanks Doug, I want to understand the original test failure that occurs with AWT initialization before patching in a workaround.
21-07-2021

In addition to the stack trace in this issue's description, I have this stack trace:

Exception during HotSpotJVMCIRuntime initialization
java.lang.ExceptionInInitializerError
        at java.lang.System.getenv(java.base@11.0.11/System.java:1049)
        at org.graalvm.compiler.options.ModuleSupport.<clinit>(jdk.internal.vm.compiler/ModuleSupport.java:31)
        at org.graalvm.compiler.options.OptionsParser.getOptionsLoader(jdk.internal.vm.compiler/OptionsParser.java:54)
        at org.graalvm.compiler.hotspot.HotSpotGraalOptionValues.parseOptions(jdk.internal.vm.compiler/HotSpotGraalOptionValues.java:104)
        at org.graalvm.compiler.hotspot.HotSpotGraalOptionValues.initializeOptions(jdk.internal.vm.compiler/HotSpotGraalOptionValues.java:156)
        at org.graalvm.compiler.hotspot.HotSpotGraalOptionValues.defaultOptions(jdk.internal.vm.compiler/HotSpotGraalOptionValues.java:84)
        at org.graalvm.compiler.hotspot.HotSpotGraalCompilerFactory.onSelection(jdk.internal.vm.compiler/HotSpotGraalCompilerFactory.java:89)
        at jdk.vm.ci.hotspot.HotSpotJVMCICompilerConfig.getCompilerFactory(jdk.internal.vm.ci@11.0.11/HotSpotJVMCICompilerConfig.java:133)
        at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.<init>(jdk.internal.vm.ci@11.0.11/HotSpotJVMCIRuntime.java:552)
        at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci@11.0.11/HotSpotJVMCIRuntime.java:176)
        at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci@11.0.11/Native Method)
        at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci@11.0.11/JVMCI.java:65)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 137 out of bounds for length 136
        at java.lang.ProcessEnvironment.environ(java.base@11.0.11/Native Method)
        at java.lang.ProcessEnvironment.<clinit>(java.base@11.0.11/ProcessEnvironment.java:70)
        at java.lang.System.getenv(java.base@11.0.11/System.java:1049)
        at org.graalvm.compiler.options.ModuleSupport.<clinit>(jdk.internal.vm.compiler/ModuleSupport.java:31)
        at org.graalvm.compiler.options.OptionsParser.getOptionsLoader(jdk.internal.vm.compiler/OptionsParser.java:54)
        at org.graalvm.compiler.hotspot.HotSpotGraalOptionValues.parseOptions(jdk.internal.vm.compiler/HotSpotGraalOptionValues.java:104)
        at org.graalvm.compiler.hotspot.HotSpotGraalOptionValues.initializeOptions(jdk.internal.vm.compiler/HotSpotGraalOptionValues.java:156)
        at org.graalvm.compiler.hotspot.HotSpotGraalOptionValues.defaultOptions(jdk.internal.vm.compiler/HotSpotGraalOptionValues.java:84)
        at org.graalvm.compiler.hotspot.HotSpotGraalCompilerFactory.onSelection(jdk.internal.vm.compiler/HotSpotGraalCompilerFactory.java:89)
        at jdk.vm.ci.hotspot.HotSpotJVMCICompilerConfig.getCompilerFactory(jdk.internal.vm.ci@11.0.11/HotSpotJVMCICompilerConfig.java:133)
        at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.<init>(jdk.internal.vm.ci@11.0.11/HotSpotJVMCIRuntime.java:552)
        at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.runtime(jdk.internal.vm.ci@11.0.11/HotSpotJVMCIRuntime.java:176)
        at jdk.vm.ci.runtime.JVMCI.initializeRuntime(jdk.internal.vm.ci@11.0.11/Native Method)
        at jdk.vm.ci.runtime.JVMCI.getRuntime(jdk.internal.vm.ci@11.0.11/JVMCI.java:65)

In terms of a test that fails, there's the one I mention in the second comment of this issue.
20-07-2021

Can you give an example test or tests that fail? A stack trace of the other active threads would be useful too. The use of startOnFirstThread is said to be very atypical.
20-07-2021

In the context of GraalVM, one case where we have seen this is where Graal initialization reads an environment variable at the same time AWT is initializing. I agree that there appears to be no way to snapshot the environment completely safely. However, the implementation could be more defensive. For example, the second loop over the `environ` global should process at most `result.length` entries. If there are fewer than `result.length` entries, then `result` should be resized before returning.
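A rough sketch of that defensive pattern, written against a plain NULL-terminated string array rather than the actual JNI code. The helper names `count_entries`, `copy_bounded`, and `shrink` are hypothetical. The first pass sizes the result, the second pass stops at that size even if the array grew, and the result is trimmed if the array shrank:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* First pass: count the entries currently in a NULL-terminated array. */
static size_t count_entries(char **env) {
    size_t n = 0;
    while (env[n] != NULL) n++;
    return n;
}

/* Second pass: copy at most `capacity` entries into `result`, so a
 * concurrent setenv/putenv that grew the array cannot cause a write past
 * the end. Returns the number of entries copied, which is smaller than
 * `capacity` if the array shrank, or (size_t)-1 on allocation failure. */
static size_t copy_bounded(char **env, char **result, size_t capacity) {
    assert(result != NULL || capacity == 0);
    size_t n = 0;
    while (n < capacity && env[n] != NULL) {
        size_t len = strlen(env[n]) + 1;
        result[n] = malloc(len);
        if (result[n] == NULL) {
            while (n > 0) free(result[--n]);
            return (size_t)-1;
        }
        memcpy(result[n], env[n], len);
        n++;
    }
    return n;
}

/* Trim `result` to the entries actually filled in. */
static char **shrink(char **result, size_t used) {
    if (used == 0) {
        free(result);
        return NULL;
    }
    char **smaller = realloc(result, used * sizeof *result);
    return smaller != NULL ? smaller : result;
}
```

This only prevents the out-of-bounds write; the entries read may still be inconsistent if another thread is modifying the environment during either pass.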
19-07-2021

Looks like this issue has been lurking for some time. Maybe the usages of putenv in the AWT code can be revisited to see if they are still required. If they are, then we may have to coordinate with the initialisation of the process environment.
19-07-2021

The Linux setenv, putenv, and getenv functions are not thread safe and would require external synchronization. There is no thread safe way to snapshot the environment. The ProcessEnvironment asserts that it reads the process environment exactly once. What other threads are running that would be modifying the environment? The sole uses in the JDK are related to AWT startup.
19-07-2021

I wrote a simple jtreg test for this bug: https://github.com/dougxc/jdk/commit/1941789ae228f987f7a4b40c437e3b2c3c282cf0

    make exploded-test TEST=test/jdk/java/lang/System/GetEnv.java
15-07-2021