JDK-6303969 : JDWP: Socket Transport handshake fails rarely on InstancesTest.java
  • Type: Bug
  • Component: core-svc
  • Sub-Component: debugger
  • Affected Version: 5.0u16,6,7,8,9,10
  • Priority: P2
  • Status: Resolved
  • Resolution: Duplicate
  • OS: solaris,solaris_9
  • CPU: x86,sparc
  • Submitted: 2005-07-29
  • Updated: 2019-05-22
  • Resolved: 2017-10-16
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 10
10Resolved
Related Reports
Duplicate :  
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
I've seen this twice out of 1000s of runs of InstancesTest.java with
a nightly test jdk - ie, a promoted jdk product build (eg b44) containing a fastdebug VM.

In the first failure, the test hung.
The debuggee was started by the debugger with this cmd line:

/net/vmsqe.sfbay/export/weekly/mustang/JDK/service_hs_baseline/jdk1.6/solaris-sparcv9/jre/bin/java \
   -Xdebug -Xrunjdwp:transport=dt_socket,address=mmm:32833,suspend=y \
   InstancesTarg

The debuggee is trying to attach to the debugger:

----------------- t@1 -----------------
0xff31c578	_so_recv + 0x8
0xfd090dd8	handshake + 0x78
0xfd091730	socketTransport_attach + 0x178
0xff0bd610	transport_startTransport + 0x2ac
0xff0a4904	startTransport + 0x70
0xff0a1004	bagEnumerateOver + 0x24
0xff0a5190	initialize + 0x4c0
0xff0a4078	cbEarlyVMInit + 0x11c
0xfddbdb34	void JvmtiExport::post_vm_initialized() + 0xb94
0xfe1f45a0	int Threads::create_vm(JavaVMInitArgs*,bool*) + 0x1064
0xfdadc174	JNI_CreateJavaVM + 0xe4
0x00011df8	main + 0x8e8
0x000110e0	_start + 0x108




Here are the relevant debugger threads.  It looks to me
like it is just waiting for the debuggee to attach:

---- the main thread -- it is waiting for the accept to finish
----------------- t@1 -----------------
0xff31f7d8	___lwp_cond_wait + 0x4
0xfdfd4358	void os::Solaris::Event::park() + 0xb8
0xfdfcf6c4	void ObjectMonitor::wait(long long,bool,Thread*) + 0x998
0xfe194aa8	void ObjectSynchronizer::wait(Handle,long long,Thread*) + 0x4cc
0xfdb4cc88	JVM_MonitorWait + 0xcd0
0xfb01513c	* java.lang.Object.wait(long) bci:0 (Interpreted frame)
0xfb015010	* java.lang.Object.wait(long) bci:0 (Interpreted frame)
0xfb0057e8	* java.lang.Object.wait() bci:2 line:484 (Interpreted frame)
0xfb0057e8	* com.sun.tools.jdi.AbstractLauncher$Helper.launchAndAccept() bci:46 line:170 (Interpreted frame)
0xfb0057e8	* com.sun.tools.jdi.AbstractLauncher.launch(java.lang.String[], java.lang.String, com.sun.jdi.connect.spi.TransportService$ListenKey, com.sun.jdi.connect.spi.TransportService) bci:17 line:114 (Interpreted frame)
0xfb005908	* com.sun.tools.jdi.SunCommandLineLauncher.launch(java.util.Map) bci:585 line:217 (Interpreted frame)


----- the connection acceptor thread - it is doing an accept

----------------- t@13 -----------------
0xff31c3ec	_so_accept + 0x4
0xfd04c500	Java_java_net_PlainSocketImpl_socketAccept + 0xfc
0xfb01513c	* java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) bci:694001496 (Interpreted frame)
0xfb015010	* java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) bci:0 (Interpreted frame)
0xfb0057e8	* java.net.PlainSocketImpl.accept(java.net.SocketImpl) bci:7 line:384 (Interpreted frame)
0xfb0057e8	* java.net.ServerSocket.implAccept(java.net.Socket) bci:50 line:450 (Interpreted frame)
0xfb0057e8	* java.net.ServerSocket.accept() bci:48 line:421 (Interpreted frame)
0xfb005908	* com.sun.tools.jdi.SocketTransportService.accept(com.sun.jdi.connect.spi.TransportService$ListenKey, long, long) bci:95 line:330 (Interpreted frame)
0xfb005908	* com.sun.tools.jdi.AbstractLauncher$Helper$2.run() bci:16 line:237 (Interpreted frame)



--- the monitoring thread - it is waiting for the debuggee to exit
----------------- t@14 -----------------
0xff31f7d8	___lwp_cond_wait + 0x4
0xfdfd4358	void os::Solaris::Event::park() + 0xb8
0xfdfcf6c4	void ObjectMonitor::wait(long long,bool,Thread*) + 0x998
0xfe194aa8	void ObjectSynchronizer::wait(Handle,long long,Thread*) + 0x4cc
0xfdb4cc88	JVM_MonitorWait + 0xcd0
0xfb01513c	* java.lang.Object.wait(long) bci:0 (Interpreted frame)
0xfb015010	* java.lang.Object.wait(long) bci:0 (Interpreted frame)
0xfb0057e8	* java.lang.Object.wait() bci:2 line:484 (Interpreted frame)
0xfb0057e8	* java.lang.UNIXProcess.waitFor() bci:8 line:115 (Interpreted frame)
0xfb005950	* com.sun.tools.jdi.AbstractLauncher$Helper$1.run() bci:7 line:217 (Interpreted frame)

----------------------------

In the 2nd failure, the test did not hang, instead we got a handshake failure
message, containing gabage in the msg received:
VM initialization failed for: /net/vmsqe.sfbay/export/weekly/mustang/JDK/service_hs_baseline/jdk1.6/solaris-sparcv9/jre/bin/java -Xdebug -Xrunjdwp:transport=dt_socket,address=mmm:48158,suspend=y InstancesTarg

ERROR: transport error 202: handshake failed - received >gnudoit: Conne< - excepted >JDWP-Handshake<
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:678]
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

 Target VM failed to initialize.
Exception in thread "main" java.lang.NullPointerException
	at VMConnection.open(VMConnection.java:146)
	at TestScaffold.connect(TestScaffold.java:609)
	at TestScaffold.startUp(TestScaffold.java:340)
	at TestScaffold.startTo(TestScaffold.java:350)
	at TestScaffold.startToMain(TestScaffold.java:345)
	at InstancesTest.runTests(InstancesTest.java:129)
	at TestScaffold.startTests(TestScaffold.java:406)
	at InstancesTest.main(InstancesTest.java:99)


The garbage msg is "gnudoit: Conne".  gnudoit is a shell script in xemacs
that calls gnuclient to connect to a running xemacs process via a program
called gnuserv.  I suppose it uses sockets.
Is that msg just garbage, or did some sort of cross connection foul up occur?
I don't think I ever use gnudoit - I use gnuclient directly.
I don't think gnclient nor gnuserv were started or stopped
anywhere near when this error occured.
The previous entry refers to the following test:

    com/sun/jdi/InstancesTest.java


This failure mode has been seen in nightly testing. Here are
some entries from my analysis report:

New nsk.quick_jdwp failures (from 2007.12.15)
    nsk/jdwp/ReferenceType/Methods/methods001
        This test failed due to:

            ERROR: transport error 202: handshake failed - connection prematurally closed
            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [/export2/BUILD_AREA/jdk7/jdk/src/share/back/debugInit.c:708]

        on Solaris AMD64 Server VM (machine colfax002). The test
        failure occurred between 2125 and 2155 on 2007.12.15.
        /var/adm/messages shows no log messages during that time; the
        previous message is at 2105 and the next message is at 2221.

New JDI_REGRESSION failures (from 2007.12.14)
    com/sun/jdi/GetLocalVariables3Test.sh
        This test failed due to:

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [/export2/BUILD_AREA/jdk7/jdk/src/share/back/debugInit.c:708]

        on Solaris SPARC Server VM (machine vm-t2000-04c). The test
        failure occurred at 0222 on 2007.12.15. /var/adm/messages shows
        no log messages during that time; the previous message is a
        little less than an hour before and the next message is more
        than an hour later.

New nsk.quick_jdwp failures (from 2007.10.10)
    nsk/jdwp/ObjectReference/ReferringObjects/referringObjects001
        This test failed due to "FATAL ERROR in native method: JDWP No
        transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)"
        on Solaris X86 Client VM (machine vm-v20z-15).

        This test started at 2239 PDT and failed by 2309 PDT.
        /var/adm/messages on vm-v20z-15 shows an NFS error from pkg.eng
        about 1 minute after the test failed.

New nsk.quick_jdwp failures (from 2007.08.22)
    nsk/jdwp/ThreadReference/OwnedMonitors/ownmonitors001
        This test failed due to "FATAL ERROR in native method: JDWP No
        transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)"
        on Solaris X86 Server VM (machine vm-linux-1).

        Update: This test started at 2123 PDT and failed by 2154 PDT.
            /var/adm/messages on vm-linux-1 shows an NFS error from
            pkg.eng about 10 minutes before this test started and
            another a few minutes after the test failed. This test
            appeared to use all local resources.
Another occurrence in a different test from nightly:

New nsk.jdb failures (from 2008.01.29)
    nsk/jdb/set/set002
        This test failed due to:

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [src/share/back/debugInit.c:708]
            CompileOnly: compileonly *nsk*.*
            VM option 'CompileOnly=nsk'
            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

        on Solaris AMD64 Server VM (machine vm-v20z-15).
Another occurrence in a different test from nightly:

New nsk.quick-jdwp failures (from 2008.03.08)
*   nsk/jdwp/ReferenceType/GetValues/getvalues001
        This test failed due to:
            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [/export2/BUILD_AREA/jdk7-64bit/jdk/src/share/back/debugInit.c:708]
            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

        on Solaris SPARC-64 Server VM (machine vm-v215-02).
Another occurrence in a different test from nightly:

New nsk.quick-jdwp failures (from 2008.04.05)
    nsk/jdwp/ObjectReference/GetValues/getvalues001
        This test failed due to:

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [/BUILD_AREA/jdk7/jdk/src/share/back/debugInit.c:708]
            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

        on Solaris X86 Client VM (machine vm-v20z-11).
New nsk.quick-jdwp failures (from 2008.06.04)
*   nsk/jdwp/ReferenceType/SourceDebugExtension/srcdebugext001
        This test failed due to:

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/densk.share.Failure: Cought IOException while establishing JDWP transport connection:

        on Win32 Client VM (machine vmnightly2).
Another instance of this failure:

New JDI_REGRESSION failures (from 2008.07.29)
*   com/sun/jdi/RedefineImplementor.sh
        This test failed due to:

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]

        on Solaris SPARC Client VM (machine sfsparc001).
Another instance of this failure from nightly:

New nsk.quick-jdwp failures (from 2008.09.20)
*   nsk/jdwp/Event/FIELD_ACCESS/fldaccess001
        This test failed due to

            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]
            java.io.IOException: Target VM failed to handshake: too few bytes in reply: 0(expected: 14)
            at nsk.share.jdwp.Transport.handshake(Transport.java:115)

        on Win32 Client VM (machine vmnightly2).
Another instance of this failure from nightly:

New nsk.quick-jdwp failures (from 2008.10.21)
*   nsk/jdwp/ThreadReference/Frames/frames001
        This test failed due to

            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]

        on Solaris X86 Client VM (machine vm-v20z-9).

        This failure mode is covered by the following bug:
            6303969 4/4 JDWP: Socket Transport handshake fails rarely
                        on InstancesTest.java

            I will copy this entry to 6303969.
nsk/jdwp/ReferenceType/ClassObject/classobj001 failed during testbase 1.5 testing with 
[2008-10-28T17:26:53.66] # Actual: /export/local/common/jdk/baseline/solaris-sparc/bin/java -client -Xmixed -Dsun.jvm.hotspot.runtime.VM.disableVersionCheck=1 nsk.jdwp.ReferenceType.ClassObject.classobj001 -arch=sparc -waittime=2 -debugee.vmkind=java -transport.address=dynamic "-debugee.vmkeys=-client -Xmixed"
[2008-10-28T17:26:53.67] debugee.stderr> ERROR: transport error 202: handshake failed - connection prematurally closed ["transport.c",L41]
[2008-10-28T17:56:53.98] debugee.stdout> FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=JVMTI_ERROR_INTERNAL(113)
[2008-10-28T17:56:53.98] debugee.stderr> ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510) ["debugInit.c",L500]
[2008-10-28T17:56:53.98] debugee.stderr> JDWP exit error JVMTI_ERROR_INTERNAL(113): No transports initialized
[

see
http://sqeweb.sfbay.sun.com/nfs/tools/gtee/results/JDK5/BASELINE/VM/r35_02/vm/solaris-sparc/client/mixed/solaris-sparc_client_mixed_nsk.jdwp.testlist/ResultDir/classobj001/classobj001.log
Another instance of this failure from nightly:

New JDI_REGRESSION failures (from 2008.11.11)
*   com/sun/jdi/WatchFramePop.sh
        This test failed due to

            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]

        on Solaris X86 Client VM (machine sfv20-02).

        This failure mode is covered by the following bug:
            6303969 4/4 JDWP: Socket Transport handshake fails rarely
                        on InstancesTest.java

        I will copy this entry to 6303969.
Another instance of this failure from nightly:

New nsk.quick-jdwp failures (from 2008.11.26)
*   nsk/jdwp/Method/LineTable/linetable001
        This test failed due to

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]
            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

        on Solaris SPARC-64 Server VM (machine jtg-s130).

        This failure mode is covered by the following bug:
            6303969 4/4 JDWP: Socket Transport handshake fails rarely
                        on InstancesTest.java

        I will copy this entry to 6303969.
Another instance of this failure from nightly:

New JDI_REGRESSION failures (from 2009.01.07)
*   com/sun/jdi/ExclusiveBind.java
        This test failed due to

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]
            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

        on Solaris SPARC-64 Server VM (machine meena). This failure
        mode is covered by the following bug:

            6303969 4/4 JDWP: Socket Transport handshake fails rarely
                        on InstancesTest.java

        I will copy this entry to 6303969.

        This instance of the 6303969 failure mode is obscured by the
        nature of the test which is to try and launch two debuggees
        using the same port serially. The error messages from the
        expected failure are mixed in with the error messages from the
        6303969 failure mode.

http://sqeweb.sfbay.sun.com/nfs/results/vm/gtee/JDK7/NIGHTLY/VM/2009-01-07/RT_Baseline/javase/solaris-sparcv9/server/comp/solaris-sparcv9_server_comp_JDI_REGRESSION/analysis.html
Another instance of this failure from nightly:

New nsk.quick-jdwp failures (from 2009.03.13)
*   nsk/jdwp/ReferenceType/Instances/instances002
        This test failed due to

            ERROR: transport error 202: handshake failed - connection prematurally closed
            ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
            JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]
            FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)

        on Solaris X86 Server VM -Xcomp (machine intelsdv15). This
        failure mode is covered by the following bug:

            6303969 4/4 JDWP: Socket Transport handshake fails rarely
                        on InstancesTest.java

        I will copy this entry to 6303969.
While testing the fix for 6419370, I ran into an instance of
this bug with the following VM/NSK test:

    nsk/jdwp/ThreadReference/FrameCount/framecnt001

I was running the tests on my Solaris X86 machine using:

    java version "1.7.0-ea"
    Java(TM) SE Runtime Environment (build 1.7.0-ea-b71)
    Java HotSpot(TM) Server VM (build 17.0-b01-internal, mixed mode)

The machine info is:

SunOS oliver-gs 5.10 Generic_137138-09 i86pc i386 i86pc
Status of virtual processor 0 as of: 09/12/2009 20:06:46
  on-line since 08/22/2009 20:09:35.
  The i386 processor operates at 2800 MHz,
        and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 09/12/2009 20:06:46
  on-line since 08/22/2009 20:09:50.
  The i386 processor operates at 2800 MHz,
        and has an i387 compatible floating point processor.

I was able to reproduce the hang and got the following
interesting stack trace snippet:

THREAD t@2

t@2 (l@2) stopped in _so_recv at 0xfeec83a5
0xfeec83a5: _so_recv+0x0015:    jae      _so_recv+0x21  [ 0xfeec83b1, .+0xc ]
current thread: t@2
  [1] _so_recv(0x6, 0xfe26e600, 0xe, 0x0, 0xfe022b98), at 0xfeec83a5
  [2] dbgsysRecv(0x6, 0xfe26e600, 0xe, 0x0), at 0xfe0122f4
  [3] handshake(0x6, 0x0, 0x0, 0xfe011786), at 0xfe010ed7
  [4] socketTransport_attach(0xfe022d28, 0x8064972, 0x0, 0x0, 0x0, 0x0, 0xfed2e03c, 0xfe1458c9), at 0xfe011880
  [5] transport_startTransport(0x0, 0x8064968, 0x8064972, 0x0), at 0xfe145930
  [6] startTransport(0x80649a8, 0xfe26ec28, 0xfe26ebdc, 0xfe12d416), at 0xfe13047d
  [7] bagEnumerateOver(0x8060fc8, 0xfe130410, 0xfe26ec28, 0xfe13096d), at 0xfe12d442
  [8] initialize(0x8067110, 0x812a970, 0x13, 0xfe12fc2a), at 0xfe130986
  [9] cbEarlyVMInit(0x8064830, 0x8067110, 0x812a970, 0xfe6202fb), at 0xfe12fc93
  [10] JvmtiExport::post_vm_initialized(0x0, 0x0, 0x0, 0x0, 0x80666e0, 0xfed13d64), at 0xfe62068f
  [11] Threads::create_vm(0xfe26efb0, 0xfe26ef0f, 0xfe26efb0, 0xfe612dda), at 0xfeb30b63
=>[12] JNI_CreateJavaVM(vm = ???, penv = ???, args = ???) (optimized), at 0xfe612e57 (line ~3266) in "jni.cpp"

The entire doit_loop run got the following results:

Results: 18795 runs (3 FAILed, 18792 PASSed)
Failure details: hangs: 3
Run duration:   7 hour(s)  48 minute(s)  29 second(s)

I also did a run with the baseline Server VM:

Results: 31528 runs (5 FAILed, 31523 PASSed)
Failure details: hangs: 5
Run duration:   12 hour(s)  34 minute(s)  13 second(s)
While doing baseline JDK7-B72 testing, I ran into an instance of
this bug with the following VM/NSK test:

    nsk/jdwp/VirtualMachine/RedefineClasses/redefinecls001

I was running the tests on my Solaris X86 machine using:

    java version "1.7.0-ea"
    Java(TM) SE Runtime Environment (build 1.7.0-ea-b72)
    Java HotSpot(TM) Server VM (build 17.0-b01, mixed mode)
While doing baseline JDK7-B73 testing, I ran into an instance of
this bug with the following VM/NSK test:

    nsk/jdwp/Method/Bytecodes/bytecodes001

I was running the tests on my Solaris X86 machine using:

    CompileOnly: compileonly *nsk*.*
    VM option 'CompileOnly=nsk'
    java version "1.7.0-ea"
    Java(TM) SE Runtime Environment (build 1.7.0-ea-b73)
    Java HotSpot(TM) Client VM (build 17.0-b02-fastdebug, compiled mode)

I used my doit_loop script to try and reproduce the hang and
the entire doit_loop run got the following results:

Results: 43999 runs (6 FAILed, 43993 PASSed)
Failure details: crashes: 3 hangs: 3
Run duration:   20 hour(s)  5 minute(s)  22 second(s)

Run #7168 hung in the handshake() code path
Run #15074 crashed with "handshake failed - connection prematurally closed"
Run #23075 crashed with "handshake failed - connection prematurally closed"
Run #23076 hung in the handshake() code path
Run #24004 hung in the handshake() code path
Run #39378 crashed with "handshake failed - connection prematurally closed"
While testing the fix for 6580131, I ran into an instance of
this bug with the following VM/NSK test:

    nsk/jdwp/VirtualMachine/HoldEvents/holdevents001

I was running the tests on my Solaris X86 machine using:

    CompileOnly: compileonly *nsk*.*
    VM option 'CompileOnly=nsk'
    java version "1.7.0-ea"
    Java(TM) SE Runtime Environment (build 1.7.0-ea-b78)
    Java HotSpot(TM) Client VM (build 17.0-b06-internal-fastdebug, compiled mode)

Here is a snippet of the failure:

ERROR: transport error 202: handshake failed - connection prematurally closed
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
While testing the fix for 6580131, I ran into an instance of
the hanging version of this bug with the following VM/NSK test:

    nsk/jdwp/ObjectReference/InvokeMethod/invokemeth001

I was running the tests on my Solaris X86 machine using:

    java version "1.7.0-ea"
    Java(TM) SE Runtime Environment (build 1.7.0-ea-b78)
    Java HotSpot(TM) Server VM (build 17.0-b06-internal, mixed mode)

I was able to reproduce a hang with my doit_loop script in a 12 hour
run. Here are the stats for that run:

Results: 23297 runs (3 FAILed, 23294 PASSed)
Failure details: hangs: 3 
Run duration:   12 hour(s)  1 second(s)


Here is a snippet of the debuggee's stack trace for the first
hang; see attached threads.log.4853 for the whole stack dump:

THREAD t@2

t@2 (l@2) stopped in _so_recv at 0xfeec83a5
0xfeec83a5: _so_recv+0x0015:    jae      _so_recv+0x21  [ 0xfeec83b1, .+0xc ]
current thread: t@2
  [1] _so_recv(0x6, 0xfe23e5d0, 0xe, 0x0, 0xfe1c0490), at 0xfeec83a5
  [2] dbgsysRecv(0x6, 0xfe23e5d0, 0xe, 0x0), at 0xd710224c
  [3] handshake(0x6, 0x0, 0x0, 0xd71016de), at 0xd7100ed9
  [4] socketTransport_attach(0xd7112c80, 0x8064aca, 0x0, 0x0, 0x0, 0x0, 0xfed59e
c8, 0xfe1158c9), at 0xd71017d8
  [5] transport_startTransport(0x0, 0x8064ac0, 0x8064aca, 0x0), at 0xfe115930
  [6] startTransport(0x8064b00, 0xfe23ec28, 0xfe23ebdc, 0xfe0fd416), at 0xfe1004
7d
  [7] bagEnumerateOver(0x8060f78, 0xfe100410, 0xfe23ec28, 0xfe10096d), at 0xfe0f
d442
  [8] initialize(0x8067510, 0x812b130, 0x13, 0xfe0ffc2a), at 0xfe100986
  [9] cbEarlyVMInit(0x8064568, 0x8067510, 0x812b130, 0xfe61e587), at 0xfe0ffc93
  [10] JvmtiExport::post_vm_initialized(0x0, 0x0, 0x0, 0x80668f0, 0xfed3ed01, 0x
8067400), at 0xfe61e91b
  [11] Threads::create_vm(0xfe23efb0, 0xfe23ef0f, 0xfe23efb0, 0xfe610722), at 0x
feb4c247
=>[12] JNI_CreateJavaVM(vm = ???, penv = ???, args = ???) (optimized), at 0xfe61
079f (line ~3282) in "jni.cpp"
  [13] JavaMain(0x8047460), at 0xfef72d31
  [14] _thr_setup(0xfe1d0200), at 0xfeec59b9
  [15] _lwp_start(), at 0xfeec5ca0

Frame 5 in the above stack trace is:

src/share/back/transport.c: transport_startTransport()

   513      } else {
   514          /*
   515           * Note that we don't attempt to do a launch here. Launching
   516           * is currently supported only in server mode.
   517           */
   518
   519          /*
   520           * If we're connecting to another process, there shouldn't be
   521           * any concurrent listens, so its ok if we block here in this
   522           * thread, waiting for the attach to finish.
   523           */
   524           err = (*trans)->Attach(trans, address, timeout, 0);
   525           if (err != JDWPTRANSPORT_ERROR_NONE) {
   526               printLastError(trans, err);
   527               serror = JDWP_ERROR(TRANSPORT_INIT);
   528               return serror;
   529           }

So frame 5 in the debuggee is in the Attach() call on line 524.

Frame 4 in the above stack trace is in:

src/share/transport/socket/socketTransport.c: socketTransport_attach()


   459      err = dbgsysConnect(socketFD, (struct sockaddr *)&sa, sizeof(sa));
   460      if (err == DBG_EINPROGRESS && attachTimeout > 0) {
   461          err = dbgsysFinishConnect(socketFD, (long)attachTimeout);
   462
   463          if (err == DBG_ETIMEOUT) {
   464              dbgsysConfigureBlocking(socketFD, JNI_TRUE);
   465              RETURN_ERROR(JDWPTRANSPORT_ERROR_TIMEOUT, "connect timed out
");
   466          }
   467      }
   468
   469      if (err < 0) {
   470          RETURN_IO_ERROR("connect failed");
   471      }
   472
   473      if (attachTimeout > 0) {
   474          dbgsysConfigureBlocking(socketFD, JNI_TRUE);
   475      }
   476
   477      err = handshake(socketFD, handshakeTimeout);
   478      if (err) {
   479          dbgsysSocketClose(socketFD);
   480          socketFD = -1;
   481          return err;
   482      }
   483
   484      return JDWPTRANSPORT_ERROR_NONE;
   485  }

So frame 4 in the debuggee is in the handshake() call on line 477.
Note that the call to handshake() is after the call to
dbgsysConnect() so we should be after the "connect" portion of
the "accept/connect" protocol.

My doit_loop script doesn't generate a thread dump for the debugger
so I don't have that info. This means I don't know if the debugger
is still in the "accept" call like in description note #1.

The stack for the second hang matches the first; see the attached
threads.log.15733 for that complete stack dump.

However, here is a snippet of the debuggee's stack trace for the
third hang; see the attached threads.log.17034 for that stack dump:

THREAD t@2

t@2 (l@2) stopped in __pollsys at 0xfeec8da5
0xfeec8da5: __pollsys+0x0015:   jb       __cerror       [ 0xfee405a0, .-0x88805
]
current thread: t@2
=>[1] __pollsys(0xfe26e990, 0x1, 0xfe26e960, 0x0), at 0xfeec8da5
  [2] _poll(0xfe26e990, 0x1, 0x493e0, 0xfe8cfc92), at 0xfee6f502
  [3] hpi::timeout(0x7, 0x493e0, 0x0, 0xfe8c5f98, 0xfe26ea48), at 0xfe8cfdb2
  [4] JVM_Timeout(0x7, 0x493e0, 0x22, 0xb743a7b9), at 0xfe8c5fbd
  [5] Java_java_net_PlainSocketImpl_socketAccept(), at 0xb743a844
  [6] 0xfa80a152(0x0, 0xfa8080f9, 0xf3699bf0, 0xf366b4b8, 0xfe26ea9c, 0xb77dc2ef
), at 0xfa80a152
  [7] 0xfa80308d(0x0, 0xf366bf28, 0xf3699bf0, 0xf366b4b8, 0xfe26ead4, 0xb77d3eec
), at 0xfa80308d
  [8] 0xfa80308d(0x0, 0xf3699bf0, 0xf3699bd0, 0xf366ac50, 0xfe26eb0c, 0xb77d3de8
), at 0xfa80308d
  [9] 0xfa80308d(0xf3699bd0, 0xf366ac50, 0xf3699748, 0xfe26eb40, 0xb7804c19, 0xf
e26eb94), at 0xfa80308d
  [10] 0xfa802f27(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfa802f27
  [11] 0xfa80308d(0x493e0, 0x0, 0xf3699748, 0xfe26ebcc, 0xb77d0e4f, 0xfe26ebf4),
 at 0xfa80308d
  [12] 0xfa80308d(0xf366aa08, 0xfe26ebf8, 0xb77d0c80, 0xfe26ec24, 0xb77d2168, 0x
0), at 0xfa80308d
  [13] 0xfa80308d(0x0, 0xf366aa08, 0xfe26ec28, 0xb77b8256, 0xfe26ec64, 0xb77b93a
8), at 0xfa80308d
  [14] 0xfa802f27(0x0, 0x0, 0x0, 0xf366c718, 0xf36627a8, 0xf3638f50), at 0xfa802
f27
  [15] 0xfa80308d(0xf3638870, 0xfecc6000, 0x1f80, 0xfecc6000, 0xfa8002e3, 0x1),
at 0xfa80308d
  [16] 0xfa800348(0xfe26ecf0, 0xfe26eee4, 0xa, 0xb77b8570, 0xfa8090e0, 0xfe26edd
4, 0x1, 0x8066c00), at 0xfa800348
  [17] JavaCalls::call_helper(0xfe26eee0, 0xfe26eda4, 0xfe26edd0, 0x8066c00), at
 0xfe51ea36
  [18] os::os_exception_wrapper(0xfe51e874, 0xfe26eee0, 0xfe26eda4, 0xfe26edd0,
0x8066c00, 0x8066c00, 0xb77b8570, 0xfe51ecf8), at 0xfe51ece7
  [19] JavaCalls::call(0xfe26eee0), at 0xfe51ed1b
  [20] jni_invoke_static(0x8066d10, 0xfe26eee0, 0x0, 0x0, 0x80f71ac), at 0xfe5ab
1cc
  [21] jni_CallStaticVoidMethod(0x8066d10), at 0xfe5b51bb
  [22] JavaMain(0x80478e0), at 0xfef73487
  [23] _thr_setup(0xfe200200), at 0xfeec59b9
  [24] _lwp_start(), at 0xfeec5ca0
Here is an instance of this failure from 2010.08.16 RT_Baseline
nightly testing run:

http://sqeweb.sfbay/nfs/tools/gtee/results/JDK7/NIGHTLY/VM/2010-09-16/RT_Baseline/javase/solaris-i586/server/mixed/solaris-i586_javase_server_mixed_JT_JDK_com_sun_jdi/analysis.html

This failure instance affects the following test:

    com/sun/jdi/ExceptionEvents.java

Here is a snippet from the .jtr file:

Testing StackOverflowCaughtTarg with java.lang.Error/caught=false/uncaught=true suspend=Thread
-- Added debuggeeVM options from file /export/local/38228.JDK7.NIGHTLY.VM+solaris-i586_javase_server_mixed_JT_JDK_com_sun_jdi/results/workDir/classes/com/sun/jdi/@debuggeeVMOptions: -Xmixed -server -XX:-PrintVMOptions -XX:+StartAttachListener -Dsun.jvm.hotspot.runtime.VM.disableVersionCheck=1 -XX:DefaultMaxRAMFraction=8 -XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedOops
ERROR: transport error 202: handshake failed - connection prematurally closed
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [../../../src/share/back/debugInit.c:708]
FATAL ERROR in native method: JDWP No transports initialized, jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
Current thread is 2
Dumping core ...
----------System.err:(32/1892)----------
run args: [StackOverflowCaughtTarg]
pass: got expected event
run args: [StackOverflowCaughtTarg]
pass: got expected event
run args: [StackOverflowCaughtTarg]
VM initialization failed for: /export/local/common/jdk/baseline/solaris-i586/jre/bin/java -classpath /export/local/38228.JDK7.NIGHTLY.VM+solaris-i586_javase_server_mixed_JT_JDK_com_sun_jdi/results/workDir/classes/com/sun/jdi -Xmixed -server -XX:-PrintVMOptions -XX:+StartAttachListener -Dsun.jvm.hotspot.runtime.VM.disableVersionCheck=1 -XX:DefaultMaxRAMFraction=8 -XX:+IgnoreUnrecognizedVMOptions -XX:-UseCompressedOops -Xdebug -Xrunjdwp:transport=dt_socket,address=vm-v20z-9:61915,suspend=y StackOverflowCaughtTarg


 Target VM failed to initialize.
java.lang.NullPointerException
	at VMConnection.open(VMConnection.java:196)
	at TestScaffold.connect(TestScaffold.java:632)
	at TestScaffold.startUp(TestScaffold.java:363)
	at TestScaffold.startTo(TestScaffold.java:373)
	at TestScaffold.startToMain(TestScaffold.java:368)
	at ExceptionEvents.runTests(ExceptionEvents.java:403)
	at TestScaffold.startTests(TestScaffold.java:429)
	at ExceptionEvents.go(ExceptionEvents.java:336)
	at ExceptionEvents.goSuspendPolicy(ExceptionEvents.java:292)
	at ExceptionEvents.goCaught(ExceptionEvents.java:246)
	at ExceptionEvents.main(ExceptionEvents.java:189)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:613)
	at com.sun.javatest.regtest.MainWrapper$MainThread.run(MainWrapper.java:94)
	at java.lang.Thread.run(Thread.java:717)

JavaTest Message: Test threw exception: java.lang.NullPointerException
JavaTest Message: shutting down test

STATUS:Failed.`main' threw exception: java.lang.NullPointerException
result: Failed. Execution failed: `main' threw exception: java.lang.NullPointerException


test result: Failed. Execution failed: `main' threw exception: java.lang.NullPointerException

Comments
I was hesitant to close this bug as a duplicate of "JDK-8182757 JDWP: Socket Transport handshake hangs on Solaris" because a few of the failures listed in this bug may not be a duplicate of JDK-8182757. I now agree that this bug should be closed as a duplicate of JDK-8182757 and if another failure is seen that does not match JDK-8182757 a new bug should be created.
16-10-2017

I agree, it makes sense to close it now as a dup of JDK-8182757.
16-10-2017

Since this CR has gotten overloaded with RULEs, and most are probably instances of JDK-8182757, which has been fixed, shouldn't we close it so we can do a better job of noticing new timeout failures?
16-10-2017

One more failure RULE "nsk/jdi/VirtualMachine/canRequestVMDeathEvent/canreqvmdev001" Timeout none http://aurora.us.oracle.com/functional/faces/RunDetails.xhtml?names=2320170.PIT.VM.hs-tier6-rt-24
13-06-2017

A bug like this that affects random tests and have a huge amount of generic rules that will catch any timeout issue in all these tests can not be considered a P4. Increasing priority to P2. This needs to be investigated.
15-05-2017

From JDK-8079704: RULE "nsk/jdwp/ThreadReference/Suspend/suspend001" Timeout none http://aurora.us.oracle.com/functional/faces/RunDetails.xhtml?names=1601579.JAVASE.NIGHTLY.VM.Main_Baseline-Tier5.2016-09-27-523 RULE "nsk/jdwp/ThreadGroupReference/Children/children001" Timeout none http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=1926492.PIT.VM.hs-tier6-comp-89 RULE "nsk/jdwp/Event/THREAD_DEATH/thrdeath001" Timeout none http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=1919748.JAVASE.NIGHTLY.VM.Main_Baseline-Tier5.2017-01-02-541
22-02-2017

RULE nsk/jdwp/Event/CLASS_PREPARE/clsprepare001 Timeout none RULE nsk/jdwp/Event/METHOD_ENTRY/methentry001 Timeout none ----------------- lwp# 2 / thread# 2 -------------------- fffffd7fff28235a recv (6, fffffd7ffb98e170, e, 0) fffffd7feb4c31aa dbgsysRecv () + 2e fffffd7feb4c2abe recv_fully () + 32 fffffd7feb4c1fbe handshake () + 5e fffffd7feb4c2777 socketTransport_attach () + d3 Instance of https://bugs.openjdk.java.net/browse/JDK-6303969 Test run URL: http://aurora.ru.oracle.com/functional/faces/RunDetails.xhtml?names=487246.JAVASE.NIGHTLY.VM.RT_Baseline.2014-05-15-224 Host: vm-x2250-01, Solaris 10 (5.10) Options: -server -Xmixed -XX:MaxRAMFraction=8 -XX:+CreateMinidumpOnCrash -XX:NativeMemoryTracking=detail -XX:ReservedCodeCacheSize=256M
16-05-2014