Bug ID: JDK-6954420 jps shows "process information unavailable" sometimes

Type: Bug
Component: hotspot
Sub-Component: svc
Affected Version: 7

Priority: P2
Status: Closed
Resolution: Fixed
OS: windows
CPU: generic

Submitted: 2010-05-20
Updated: 2011-03-08
Resolved: 2011-03-08

JDK 7	Other
7Fixed	hs21Fixed

The jps command is used on Windows by the VM/NSK run_sajdi.sh script
to translate from Cygwin pids to Windows pids for the purpose of
attaching to the target VM. Sometimes this command fails for unknown
reasons when the target VM is known to be up and running.

The failure mode looks like:

[2010-05-20T05:18:31.05] Debuggee class started
[2010-05-20T05:18:31.05] Debuggee class ready for debugging
[2010-05-20T05:18:31.05] DEBUGGEE_READY_FOR_DEBUGGING
[2010-05-20T05:18:31.05] script> Debuggee VM got ready for debugging
[2010-05-20T05:18:31.05] script> # ERROR: jps could not find Windows debuggee VM pid
[2010-05-20T05:18:31.05] script> # ERROR: DEBUGGEE_CLASS='nsk.sajdi.ReferenceType.fieldByName.fldbyname001a'
[2010-05-20T05:18:31.05] script> # ERROR: <begin_jps_and_ps_output>
[2010-05-20T05:18:31.05] 5480 
[2010-05-20T05:18:31.05] 29772 sun.tools.jps.Jps
[2010-05-20T05:18:31.05] 3956 -- process information unavailable
[2010-05-20T05:18:31.05] 10736 Tonga
[2010-05-20T05:18:31.05] 13376 com.sun.javaws.Main
[2010-05-20T05:18:31.05]     17172   10700   28952       3956  con 1004 23:16:58 /cygdrive/c/work/local/baseline_jdks/1.7.0_b93/windows-i586/bin/java
[2010-05-20T05:18:35.14] script> # ERROR: <end_jps_and_ps_output>
[2010-05-20T05:18:35.14] script> Kill debuggee VM process for pid: 17172


The java process is present and has written the DEBUGGEE_READY_FOR_DEBUGGING
token before waiting on a monitor for the next stage of the test. I don't
know why jps fails to get the process information and I'm filing this bug
to make note of the fact that this issue needs investigation.

No specific tests are named in this bug intentionally because this issue
shows up during VM/NSK sajdi test setup and thus can affect any test.

EVALUATION http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/de14f1eee390
16-02-2011
SUGGESTED FIX One minor change during the code review: ************* * 1358,1364 **** int ret_code = ::stat(filename, &statbuf); if (ret_code == OS_ERR) { if (PrintMiscellaneous && Verbose) { ! warning("could not stat file %s: %d\n", filename, ret_code); } CloseHandle(fmh); CloseHandle(fh); --- 1358,1365 ---- int ret_code = ::stat(filename, &statbuf); if (ret_code == OS_ERR) { if (PrintMiscellaneous && Verbose) { ! warning("Could not get status information from file %s: %s\n", ! filename, strerror(errno)); } CloseHandle(fmh); CloseHandle(fh); Not worth a new webrev...
16-02-2011
SUGGESTED FIX See the attached 6954420-webrev-cr0.tgz for the proposed fix that was sent out for code review (round 0).
16-02-2011
EVALUATION The "why" is due to an empty hsperfdata file. Or at least stat() returns an st_size == 0 value after the shared memory mapping is set up. Doesn't happen very often but it does happen.
14-02-2011
SUGGESTED FIX The target VM should flush the hsperfdata file after the shared memory mapping is created to make sure that the correct size is returned.
12-02-2011
EVALUATION src/os/windows/vm/perfMemory_windows.cpp: 1529 static void open_file_mapping(const char* user, int vmid, 1530 PerfMemory::PerfMemoryMode mode, 1531 char** addrp, size_t* sizep, TRAPS) { <snip> 1560 // if a user name wasn't specified, then find the user name for 1561 // the owner of the target vm. 1562 if (user == NULL \|\| strlen(user) == 0) { 1563 luser = get_user_name(vmid); 1564 } 1565 else { 1566 luser = user; 1567 } 1568 1569 if (luser == NULL) { 1570 THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(), 1571 "Could not map vmid to user name"); 1572 } The get_user_name(vmid) call on line 1563 is returning NULL which is causing the exception to be thrown on line 1570. get_user_name(vmid) returns NULL because it cannot find the hsperfdata file for the target VM; more correctly, the hsperfdata file is found, but it has a size of 0. The jps query code in run_sajdi.sh does the jps query in a 30 iteration loop with a 2 second delay between retries. This means that run_sajdi.sh tries for at least 1 minute to get the info that it needs. In a debug version of run_sajdi.sh, after the 30 iteration loop fails, I ran a special version of jps and that version also detected an hsperfdata file with a size of 0. I ran "ps -l" to verify that the target java process was there. I ran "ls -l" in the hsperfdata directory and the size of the hsperfdata file was no longer 0. I ran jps again and all was good. I added debug code to the target VM and verified that sometimes after the hsperfdata is mapped to the shared memory region, the size is 0 and not the expected 65536. So now we know the "what", but don't yet know the "why".
10-02-2011