JDK-6855228 : VMware+Solaris/x86: System.nanoTime stops advancing (and GC Logging malfunctions reporting 0 time)
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 6u13
  • Priority: P4
  • Status: Closed
  • Resolution: Cannot Reproduce
  • OS: solaris_10
  • CPU: sparc
  • Submitted: 2009-06-26
  • Updated: 2011-06-02
  • Resolved: 2010-09-03
Related Reports
Relates :  
Description
FULL PRODUCT VERSION :
Reproduced with both 1.6.0_2 and 1.6.0_13

ADDITIONAL OS VERSION INFORMATION :
SunOS 5.10 Generic_138889-06 i86pc i386 i86pc - latest patch level

EXTRA RELEVANT SYSTEM CONFIGURATION :
Reproduced on both AMD Opteron 8356 and Intel Xeon E5345.
This issue has only been observed when running with multiple processors on VMWare ESX. Not observed or reproducible on multi-processor Ubuntu 8.10, Kubuntu 9.04, Gentoo Linux, or Sparc Solaris when they are not running on VMWare. Also not observed on single processor Solaris running on VMWare ESX.

A DESCRIPTION OF THE PROBLEM :
GC logging malfunctions then stops reporting accurately. The underlying symptoms can be reproduced if System.nanoTime is polled continuously because it eventually stops incrementing. This condition causes many java.util.concurrent classes which rely on System.nanoTime to malfunction including ThreadPoolExecution, delay, and blocking queues among others.

Specifically in java.util.concurrent.locks.AbstractQueuedSynchronizer.ConditionObject.awaitNanos(long nanosTimeout) will not timeout if nanoTime stops incrementing and client code can hang indefinitely. Also it has been noted that System.nanoTime only on multi-processor VMWare sporadically increments by zero and negative values. This has never been observed when non-VMWare multi-processor or when on single-processor VMWare. System nanoTime returning sporadic negative values would also cause the above referenced awaitNanos to malfunction.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the attached test code on multi-processor VMWare and eventually the nanoTime will stop incrementing.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Not expecting nanoTime to stop incrementing indefinitely.
ACTUAL -
GC log sporadically reports large value, then reports zero indefinately:

Below you can see that GC logging initially is normal, then it reports a very large value ~38 hours, then reports zero indefinitely (for at least 12+ hours).

295.341: [GC [PSYoungGen: 197118K->3205K(204224K)] 578435K->388848K(1387968K), 0.5534078 secs]
296.489: [GC [PSYoungGen: 195013K->5698K(204672K)] 580656K->393592K(1388416K), 0.3963039 secs]
298.493: [GC [PSYoungGen: 197890K->3580K(204608K)] 585783K->392768K(1388352K), 0.4133058 secs]
300.697: [GC [PSYoungGen: 196156K->10969K(202688K)] 585344K->402131K(1386432K), 138476.3001686 secs]
138776.998: [GC [PSYoungGen: 201592K->12755K(203520K)] 592754K->413325K(1387264K), 0.0000000 secs]
138776.998: [GC [PSYoungGen: 203475K->12147K(198912K)] 604045K->423147K(1382656K), 0.0000000 secs]
138776.998: [GC [PSYoungGen: 197491K->8443K(201216K)] 608491K->430399K(1384960K), 0.0000000 secs]

Output from attached test code: eventually the nanoTime stops incrementing indefinately:

Below it can be seen that nanoTime has not incremented for 40-50 thousand executions.
Also it can be observed that the maximum difference between two consecutive nanoTime execution was reported as ~80 days. The test only ran for a few minutes.
On non-VMWare it increments on every execution and has never been observed to stop incrementing.

pool-1-thread-9 maxConsec=43314 minDiff=0 maxDiff=6978157632565349 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-22 maxConsec=48786 minDiff=0 maxDiff=6978158312802313 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-24 maxConsec=50552 minDiff=0 maxDiff=6978158093338584 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-16 maxConsec=44805 minDiff=0 maxDiff=6978157852663558 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-27 maxConsec=48969 minDiff=0 maxDiff=6978158252826428 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-19 maxConsec=48628 minDiff=0 maxDiff=6978157632643095 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-1 maxConsec=46531 minDiff=0 maxDiff=6978157792642802 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-3 maxConsec=46575 minDiff=0 maxDiff=6978157752584362 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-12 maxConsec=49622 minDiff=0 maxDiff=6978157472579260 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-17 maxConsec=48469 minDiff=0 maxDiff=6978157912748608 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-28 maxConsec=53015 minDiff=0 maxDiff=6978157674021878 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-13 maxConsec=42159 minDiff=0 maxDiff=6978157512622147 a=13956314863692146 b=13956314863692146 changed=maxConsec
pool-1-thread-28 maxConsec=53016 minDiff=0 maxDiff=6978157674021878 a=13956314863692146 b=13956314863692146 changed=maxConsec


REPRODUCIBILITY :
This bug can be reproduced often.

---------- BEGIN SOURCE ----------
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * @author Troy T. Collinsworth
 */
public class TestNanoTime implements Runnable {

    public static void main(String[] args) throws InterruptedException {
        int threads = 30;

        final ExecutorService es = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            es.execute(new TestNanoTime());
        }
        Thread.sleep(30000000);
        System.out.println("Completed test, exiting");
        System.exit(0);
    }

    public void run() {
        int loopSize = 10;
        long[] times = new long[loopSize];

        int consec = 0;
        int maxConsec = 0;
        int currConsec = 0;
        long a = System.nanoTime();
        long b = System.nanoTime();
        long minDiff = b - a;
        long maxDiff = b - a;

        String changed = null;
        while (true) {
            for (int i = 0; i < loopSize; i++) {
                times[i] = System.nanoTime();
            }
            a = b;
            b = System.nanoTime();
            long newDiff = b - a;
            if (newDiff < minDiff) {
                minDiff = newDiff;
                changed = "minDiff";
            }
            if (newDiff > maxDiff) {
                maxDiff = newDiff;
                changed = "maxDiff";
            }
            if (a == b) {
                ++consec;
                if (consec > currConsec) {
                    currConsec = consec;
                }
                if (currConsec > maxConsec) {
                    maxConsec = currConsec;
                    changed = "maxConsec";
                }
            } else {
                consec = 0;
            }
            if (changed != null) {
                System.out.println(Thread.currentThread().getName() + " maxConsec=" + maxConsec + " minDiff=" + minDiff
                        + " maxDiff=" + maxDiff + " a=" + a + " b=" + b + " changed=" + changed);
                changed = null;
            }
        }
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Run on single processor VMWare or don't run on VMWare.

Comments
EVALUATION See 6953818 - when running under VMware you need to set certain options so that Solaris is presented with a stable TSC.
02-06-2011

EVALUATION Closing because we never heard back from submitter re efficacy of above patch.
03-09-2010

EVALUATION Please test with kernel patch 137138-09 in place. Bug 6600939, fixed in that kernel patch, may solve the problem you are seeing. In general, it is a good idea to stay abreast of the latest set of recommended patches for Java SE, see:- http://sunsolve.sun.com/show.do?target=patches/patch-access (click of "J2SE Patch Clusters", then get (for Solaris 10/x86) :- http://sunsolve.sun.com/pdownload.do?target=J2SE_Solaris_10_x86_Recommended.README http://sunsolve.sun.com/pdownload.do?target=J2SE_Solaris_10_x86_Recommended.zip May require SDN login/registration).
27-07-2009

EVALUATION The bug will be suitably processed once further information is available from the customer.
29-06-2009

EVALUATION If the timer is not incrementing this is a bug in VMware. GC logging behaves as one would expect with the malfunctioning timer.
26-06-2009

PUBLIC COMMENTS Is this behavior reproducible if -XX:+UseSerialGC is specified on the command line?
26-06-2009