JDK-8341518 : TestJcmdWithSideCar.java fails with 'sun.tools.jcmd.JCmd' missing from stdout/stderr
  • Type: Bug
  • Component: core-svc
  • Sub-Component: tools
  • Affected Version: 24
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2024-10-04
  • Updated: 2024-10-14
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Relates :  
Sub Tasks
JDK-8341803 :  
Description
The test containers/docker/TestJcmdWithSideCar.java is failing (sometimes!) in the TMP_MOUNTED_INTO_SIDECAR mode.

It fails where the command:

docker run --tty=true --rm --cap-add=SYS_PTRACE --sig-proxy=true --pid=container:test-container-main --cap-add=NET_BIND_SERVICE --user=10668:10668 --volumes-from test-container-main jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd /jdk/bin/jcmd -l

...gets no output, where a good run would show:

[STDOUT]
1 EventGeneratorLoop 120
24 jdk.jcmd/sun.tools.jcmd.JCmd -l

e.g.

[main-container-process] MAIN_METHOD_STARTED, argument is 120
Attach strategy TMP_MOUNTED_INTO_SIDECAR
[COMMAND]
docker run --tty=true --rm --cap-add=SYS_PTRACE --sig-proxy=true --pid=container:test-container-main --cap-add=NET_BIND_SERVICE --user=10668:10668 --volumes-from test-container-main jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd /jdk/bin/jcmd -l
[2024-10-03T04:30:35.273416534Z] Gathering output for process 12125
[ELAPSED: 1068 ms]
[STDERR]

[STDOUT]

Full child process STDOUT was saved to docker-stdout-12125.log
[2024-10-03T04:30:36.341378706Z] Waiting for completion for process 12125
[2024-10-03T04:30:36.341534140Z] Waiting for completion finished for process 12125
[COMMAND]
docker rmi --force jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd
[2024-10-03T04:30:36.349399928Z] Gathering output for process 12260
[ELAPSED: 27 ms]
[STDERR]

[STDOUT]
Untagged: jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd

Full child process STDOUT was saved to docker-stdout-12260.log
----------System.err:(16/748)----------
stdout: [];
stderr: []
exitValue = 0

java.lang.RuntimeException: 'sun.tools.jcmd.JCmd' missing from stdout/stderr
at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:253)
at TestJcmdWithSideCar.testCase01(TestJcmdWithSideCar.java:135)
at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:111)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:573)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
at java.base/java.lang.Thread.run(Thread.java:1576)
Comments
Our CI tests with docker, fail with elevated-true. "elevated" means add "--cap-add=NET_BIND_SERVICE" Our CI runs with podman are OK using "--cap-add=NET_BIND_SERVICE"
09-10-2024

This failure should be addressed by the change in JDK-8341436: https://github.com/openjdk/jdk/pull/21331 (plan to close this as a duplicate if that all works out!).
08-10-2024

Timing in several failures shows the EventGeneratorLoop (and main container) finishing after around 10 seconds, not the intended 120 seconds. e.g. [2024-10-05T03:17:24.894641357Z] is timestamp gathering output for first jcmd ... [2024-10-05T03:17:34.716242208Z] is timestamp before event generator ending message, then the failing jcmd runs. [2024-10-05T03:19:26.708287284Z] is the timestamp for gathering that jcmd output, which is 2 minutes later. EventGenerator loop could calculate and ending wall clock timestamp rather than count iterations of a 1-second delay loop.
07-10-2024

Looking through some failures: Often podman is quicker, and docker runs are slower and fail, but this is not always the case. A GOOD run on OEL 8 with podman: Attach strategy TMP_MOUNTED_INTO_SIDECAR [COMMAND] podman run --tty=true --rm --cap-add=SYS_PTRACE --sig-proxy=true --pid=container:test-container-main --volumes-from test-container-main jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd /jdk/bin/jcmd -l [2024-10-03T18:25:56.355782702Z] Gathering output for process 3428477 [ELAPSED: 800 ms] A bad run on OEL 7 with docker: The failing part of the test has a time like: [ELAPSED: 1304 ms] ..and has shown the mesage: EventGeneratorLoop is coming to an end But then we also have a GOOD OEL 7 run with docker: [main-container-process]:docker run --tty=true --rm --volume /opt/..../testoutput/test-support/jtreg_open_test_hotspot_jtreg_containers/classes/0/containers/docker/TestJcmdWithSideCar.d:/test-classes/ --cap-add=SYS_PTRACE --name test-container-main --volume /tmp --volume /opt/..../testoutput/test-support/jtreg_open_test_hotspot_jtreg_containers/scratch/0/.:/workdir/ jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd /jdk/bin/java -XX:+ErrorFileToStderr -cp /test-classes/ -XX:+UsePerfData EventGeneratorLoop 120 MainContainer: setting mainMethodStarted [main-container-process] MAIN_METHOD_STARTED, argument is 120 [COMMAND] docker run --tty=true --rm --cap-add=SYS_PTRACE --sig-proxy=true --pid=container:test-container-main --volumes-from test-container-main jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd /jdk/bin/jcmd -l [2024-10-03T17:14:52.878657109Z] Gathering output for process 16355 [ELAPSED: 1465 ms] [STDERR] [STDOUT] 1 EventGeneratorLoop 120 24 jdk.jcmd/sun.tools.jcmd.JCmd -l Much later ".......EventGeneratorLoop is coming to an end" appears just before the "docker rmi" call. If EventGeneratorLoop gets an InterruptedException, it can end early? (well, the 1 second sleep ends early, but it is doing that 120 times...) A timing issue? But the test that fails is only looking for JCmd (which it checks exits with 0). We don't even see JCmd in the output? It would look for EventGeneratorLoop next, but doesn't get that far.
04-10-2024