United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6954420 jps shows "process information unavailable" sometimes
JDK-6954420 : jps shows "process information unavailable" sometimes

Details
Type:
Bug
Submit Date:
2010-05-20
Status:
Closed
Updated Date:
2011-03-08
Project Name:
JDK
Resolved Date:
2011-03-08
Component:
hotspot
OS:
windows
Sub-Component:
svc
CPU:
generic
Priority:
P2
Resolution:
Fixed
Affected Versions:
7
Fixed Versions:
hs21 (b03)

Related Reports
Backport:

Sub Tasks

Description
The jps command is used on Windows by the VM/NSK run_sajdi.sh script
to translate from Cygwin pids to Windows pids for the purpose of
attaching to the target VM. Sometimes this command fails for unknown
reasons when the target VM is known to be up and running.

The failure mode looks like:

[2010-05-20T05:18:31.05] Debuggee class started
[2010-05-20T05:18:31.05] Debuggee class ready for debugging
[2010-05-20T05:18:31.05] DEBUGGEE_READY_FOR_DEBUGGING
[2010-05-20T05:18:31.05] script> Debuggee VM got ready for debugging
[2010-05-20T05:18:31.05] script> # ERROR: jps could not find Windows debuggee VM pid
[2010-05-20T05:18:31.05] script> # ERROR: DEBUGGEE_CLASS='nsk.sajdi.ReferenceType.fieldByName.fldbyname001a'
[2010-05-20T05:18:31.05] script> # ERROR: <begin_jps_and_ps_output>
[2010-05-20T05:18:31.05] 5480 
[2010-05-20T05:18:31.05] 29772 sun.tools.jps.Jps
[2010-05-20T05:18:31.05] 3956 -- process information unavailable
[2010-05-20T05:18:31.05] 10736 Tonga
[2010-05-20T05:18:31.05] 13376 com.sun.javaws.Main
[2010-05-20T05:18:31.05]     17172   10700   28952       3956  con 1004 23:16:58 /cygdrive/c/work/local/baseline_jdks/1.7.0_b93/windows-i586/bin/java
[2010-05-20T05:18:35.14] script> # ERROR: <end_jps_and_ps_output>
[2010-05-20T05:18:35.14] script> Kill debuggee VM process for pid: 17172


The java process is present and has written the DEBUGGEE_READY_FOR_DEBUGGING
token before waiting on a monitor for the next stage of the test. I don't
know why jps fails to get the process information and I'm filing this bug
to make note of the fact that this issue needs investigation.

No specific tests are named in this bug intentionally because this issue
shows up during VM/NSK sajdi test setup and thus can affect any test.

                                    

Comments
EVALUATION

src/os/windows/vm/perfMemory_windows.cpp:

   1529 static void open_file_mapping(const char* user, int vmid,
   1530                               PerfMemory::PerfMemoryMode mode,
   1531                               char** addrp, size_t* sizep, TRAPS) {

<snip>

   1560   // if a user name wasn't specified, then find the user name for
   1561   // the owner of the target vm.
   1562   if (user == NULL || strlen(user) == 0) {
   1563     luser = get_user_name(vmid);
   1564   }
   1565   else {
   1566     luser = user;
   1567   }
   1568 
   1569   if (luser == NULL) {
   1570     THROW_MSG(vmSymbols::java_lang_IllegalArgumentException(),
   1571               "Could not map vmid to user name");
   1572   }

The get_user_name(vmid) call on line 1563 is returning NULL
which is causing the exception to be thrown on line 1570.

get_user_name(vmid) returns NULL because it cannot find the
hsperfdata file for the target VM; more correctly, the
hsperfdata file is found, but it has a size of 0.

The jps query code in run_sajdi.sh does the jps query in a
30 iteration loop with a 2 second delay between retries.
This means that run_sajdi.sh tries for at least 1 minute
to get the info that it needs. In a debug version of
run_sajdi.sh, after the 30 iteration loop fails, I ran
a special version of jps and that version also detected
an hsperfdata file with a size of 0. I ran "ps -l" to
verify that the target java process was there. I ran
"ls -l" in the hsperfdata directory and the size of the
hsperfdata file was no longer 0. I ran jps again and all
was good.

I added debug code to the target VM and verified that
sometimes after the hsperfdata is mapped to the shared
memory region, the size is 0 and not the expected 65536.

So now we know the "what", but don't yet know the "why".
                                     
2011-02-10
SUGGESTED FIX

The target VM should flush the hsperfdata file after the
shared memory mapping is created to make sure that the
correct size is returned.
                                     
2011-02-12
EVALUATION

The "why" is due to an empty hsperfdata file. Or at least stat()
returns an st_size == 0 value after the shared memory mapping is
set up. Doesn't happen very often but it does happen.
                                     
2011-02-14
SUGGESTED FIX

See the attached 6954420-webrev-cr0.tgz for the proposed fix that
was sent out for code review (round 0).
                                     
2011-02-16
SUGGESTED FIX

One minor change during the code review:

***************
*** 1358,1364 ****
      int ret_code = ::stat(filename, &statbuf);
      if (ret_code == OS_ERR) {
        if (PrintMiscellaneous && Verbose) {
!         warning("could not stat file %s: %d\n", filename, ret_code);
        }
        CloseHandle(fmh);
        CloseHandle(fh);
--- 1358,1365 ----
      int ret_code = ::stat(filename, &statbuf);
      if (ret_code == OS_ERR) {
        if (PrintMiscellaneous && Verbose) {
!         warning("Could not get status information from file %s: %s\n",
!             filename, strerror(errno));
        }
        CloseHandle(fmh);
        CloseHandle(fh);

Not worth a new webrev...
                                     
2011-02-16
EVALUATION

http://hg.openjdk.java.net/jdk7/hotspot-rt/hotspot/rev/de14f1eee390
                                     
2011-02-16



Hardware and Software, Engineered to Work Together