JDK-8160350 : cannot truss jdk9 [ solaris ]
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: solaris_10,solaris_11,solaris_12
  • CPU: generic
  • Submitted: 2016-06-27
  • Updated: 2017-08-17
  • Resolved: 2016-07-07
JDK 9
9 b129 Fixed
Description
When attempting to truss a JVM:

truss -f java Main

/2:     mmap(0x00000000, 1048576, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE, 3, 0) = 0xFFFFFFFF792C0000
/2:     mmap(0xFFFFFFFF792C0000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFFFFFFFF792C0000
/2:     openat(AT_FDCWD, "/proc/self/ctl", O_WRONLY)
Error: Could not find or load main class MainClass.java
        *** process otherwise traced, releasing ...

This appears to be caused by:

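bool os::enable_vtime() {
  // Open this process's own /proc control file for writing.
  int fd = ::open("/proc/self/ctl", O_WRONLY);
  if (fd == -1) {
    return false;
  }

  // PCSET control message: set the PR_MSACCT (microstate accounting) mode flag.
  long cmd[] = { PCSET, PR_MSACCT };
  int res = ::write(fd, cmd, sizeof(long) * 2);
  ::close(fd);
  // Report failure unless the whole two-word message was written.
  if (res != sizeof(long) * 2) {
    return false;
  }
  return true;
}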

This is curious, because since Solaris 10 PR_MSACCT has been deprecated and no longer has any effect.

This causes a serious loss of observability in the JDK.
Comments
Note: the current proposal is to use https://bugs.openjdk.java.net/secure/attachment/60884/JDK-8160350.patch and defer the full cross-platform cleanup to JDK-8160887.
06-07-2016

It's the exact opposite: on Solaris, microstate accounting is always on and has been since Solaris 10, so this is old dead code that simply needs ripping out. It does absolutely nothing other than cause truss to fail. The reason this causes problems is that it uses /proc/self/ctl, which is also used by truss. When truss detects the open of /proc/self/ctl, it stops tracing the JVM process, which means the JVM can't be traced from startup.
01-07-2016

Remove obsolete *_vtime methods
30-06-2016

To answer the last question first: on Solaris, gethrvtime() returns values at nanosecond resolution, whereas getrusage() has microsecond resolution. In the kernel both use the same nanosecond-resolution microstate accounting data, so the only difference is the resolution of the returned values. The values returned by gethrvtime() are more accurate, as there is no scaling from nanoseconds to microseconds, so yes, using gethrvtime() is significantly better. Windows, Linux and AIX all provide per-thread time: Linux and AIX use getrusage(RUSAGE_THREAD), which has microsecond resolution, and Windows uses GetThreadTimes(), which has 100-nanosecond resolution. Only BSD uses total process time, which appears to be because the BSDs are a bit of a mess when it comes to measuring thread execution time. FreeBSD supports getrusage(RUSAGE_THREAD), but NetBSD, OpenBSD and OSX don't. Another possibility is pthread_getcpuclockid() + clock_gettime(), but availability of those is also patchy across the BSDs. For OSX, task_info() from the Mach subsystem looks like the best option. However, I'm not proposing to boil that particular ocean here. What I'll do is remove supports_vtime(), enable_vtime() and vtime_enabled() from all platforms, along with any references to them.
29-06-2016
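The pthread_getcpuclockid() + clock_gettime() route mentioned in the comment above can be sketched roughly as follows. This is an illustrative sketch, not HotSpot code: the function name `thread_cpu_time_ns` is hypothetical, and it assumes a POSIX system that defines `CLOCK_THREAD_CPUTIME_ID`, the calling thread's CPU-time clock (the same clock that `pthread_getcpuclockid(pthread_self(), ...)` would identify). Where that clock is absent it returns -1, reflecting the patchy availability across platforms.

```cpp
#include <time.h>  // POSIX clock_gettime(), CLOCK_THREAD_CPUTIME_ID

// Hypothetical helper: CPU time consumed by the calling thread, in
// nanoseconds, or -1 if no thread CPU-time clock is available.
static long long thread_cpu_time_ns() {
#ifdef CLOCK_THREAD_CPUTIME_ID
    struct timespec ts;
    // CLOCK_THREAD_CPUTIME_ID reads the calling thread's CPU-time clock;
    // pthread_getcpuclockid(pthread_self(), &id) followed by
    // clock_gettime(id, &ts) would read the same per-thread clock.
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0) {
        return -1;
    }
    // timespec is already in seconds + nanoseconds, so no microsecond
    // scaling step is involved.
    return static_cast<long long>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
#else
    return -1;  // no thread CPU-time clock on this platform
#endif
}
```

Because the value is reported in nanoseconds rather than scaled to microseconds, this is closer in spirit to gethrvtime() than to getrusage().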

Thanks for clarifying the connection back to truss. So our code is broken with respect to "vtime": we are using elapsedVTime() without checking supports_vtime(). Consequently every platform has implemented elapsedVTime() even if it doesn't really return the per-thread execution time! If it were checked correctly, supports_vtime() would still be needed: BSD would set it to false and elapsedVTime() would be Unimplemented() (or we'd find out how to do it on OSX if getrusage is not supported there). vtime_enabled() is dead code, and enable_vtime() becomes unnecessary once the Solaris version is a no-op. Is gethrvtime significantly better than using getrusage on Solaris? If not, we can promote this to a POSIX function and implement it only once instead of three times (four if OSX supports getrusage).
29-06-2016
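The "promote this to a POSIX function and implement it once" idea could look roughly like the sketch below, assuming getrusage() is the shared primitive. Both function names are hypothetical stand-ins for os::supports_vtime() and os::elapsedVTime(), not the actual HotSpot code; the fallback to `RUSAGE_SELF` mirrors the BSD behaviour of reporting whole-process time when no per-thread variant exists.

```cpp
#include <sys/resource.h>  // getrusage(), RUSAGE_SELF, RUSAGE_THREAD (where available)
#include <sys/time.h>      // struct timeval

// Hypothetical stand-in for os::supports_vtime(): true only when the
// platform reports per-thread, rather than per-process, CPU time.
static bool supports_thread_vtime() {
#ifdef RUSAGE_THREAD
    return true;
#else
    return false;
#endif
}

// Hypothetical stand-in for os::elapsedVTime(): user + system CPU time in
// seconds, or -1.0 on error. Per-thread where RUSAGE_THREAD exists,
// otherwise whole-process time.
static double elapsed_vtime() {
#ifdef RUSAGE_THREAD
    const int who = RUSAGE_THREAD;
#else
    const int who = RUSAGE_SELF;
#endif
    struct rusage ru;
    if (getrusage(who, &ru) != 0) {
        return -1.0;
    }
    // timeval fields have microsecond resolution.
    return static_cast<double>(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) +
           static_cast<double>(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}
```

A caller that needs genuine per-thread time would check supports_thread_vtime() first rather than trusting elapsed_vtime() unconditionally, which is the missing check described in the comment above.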

That sounds fine. Thanks.
29-06-2016

Yes, if they're all nops, it's good to get rid of them. I'll sponsor your change. PM me if you need more instructions.
28-06-2016

I didn't simply remove it, as there are Linux, BSD, AIX and Windows versions of enable_vtime(), although they are all in effect no-ops as well. If it were to be cleaned up fully, then supports_vtime() and vtime_enabled() should be removed as well as enable_vtime(), along with any calls to them. I'm happy to do that, but I don't know who gets to make that decision. Yes, I'm happy to do the change.
28-06-2016

So the patch should be to remove enable_vtime() completely, including the call in G1. Should this be assigned to the GC group or will you make this change?
28-06-2016

And I'm not seeing the connection between the vtime calls and the truss problem; could that be clarified, please?
28-06-2016

If no one supports vtime (what is that?), then it may be time for vtime to be ripped out.
28-06-2016

Suggested patch attached
27-06-2016

This appears to be a side effect of the G1 garbage collector being made the default in JDK 9: there is a call to os::enable_vtime() from G1CollectedHeap::initialize(). Using the appropriate -XX flag to select a different GC makes the problem disappear. As enable_vtime() is now a no-op, the suggested fix is to make the behaviour of os::enable_vtime() and os::vtime_enabled() the same as Linux, i.e. to return false.
27-06-2016