JDK-5014723 : implement "strip mining" loop optimization
  • Type: Bug
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 9,10
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2004-03-17
  • Updated: 2019-09-13
  • Resolved: 2017-11-28
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 10
Related Reports
Duplicate :  
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
These JSR-166 tests (part of the J2SE workspace)


are known to fail on machines with multiple CPUs, both
solaris-sparc and linux-amd64.  It has never been seen to fail on
uniprocessor machines (tested on linux-i586, windows-i586). 
The failures are seen on Tiger b43.

The failures are not easy to reproduce.  I see a failure about 1% of the
time for both tests on both solaris-sparc and linux-amd64.

My recipe for reproducing it is to run an infinite loop till a failure
occurs, eg. for linux-amd64 I did:

(martin@quoddy) java/util/concurrent/locks/ReentrantLock $ perl -e 'for ($i = 0;; $i++) { print "$i\n"; system("/u/martin/jct-tools/2.1.6/solaris/bin/jtreg -jdk:/u/martin/ws/tiger/build/linux-amd64/j2sdk-image -automatic TimeoutLockLoops.java"); exit 1 if $? != 0; }'

which failed after 98 iterations.

Here is an extract from a failing TimeoutLockLoops.jtr file:

Threads: 1	 0.038286s run time
Threads: 2	 0.039877s run time
Threads: 3	 0.0399s run time
Threads: 5	 0.039472s run time
Threads: 8	 0.059474s run time
Threads: 12	 0.079438s run time
result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/linux-amd64/j2sdk-image/bin/java' interrupted! (timed out?)

test result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/linux-amd64/j2sdk-image/bin/java' interrupted! (timed out?)

Here is an extract from a failing CancelledProducerConsumerLoops.jtr file:

ArrayBlockingQueue      	 3.0553042s run time
LinkedBlockingQueue     	 2.2291932s run time
SynchronousQueue        	 2.485516s run time
ArrayBlockingQueue      	 4.0495974s run time
LinkedBlockingQueue     	 3.4787056s run time
SynchronousQueue        	 3.558907s run time
ArrayBlockingQueue      	 4.8086858s run time
LinkedBlockingQueue     	 3.1188338s run time
SynchronousQueue        	 3.3789728s run time
ArrayBlockingQueue      	 3.3516484s run time
result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/solaris-sparc/j2sdk-image/bin/java' interrupted! (timed out?)

test result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/solaris-sparc/j2sdk-image/bin/java' interrupted! (timed out?)

I notice there are lingering processes after the test has stopped
running, which must be killed manually.

martin   15329  0.0  2.1 1344824 21604 pts/2 S    15:17   0:00 /u/martin/ws/tiger/build/linux-amd64/j2sdk-image/bin/java -Dtest.src=/export/martin/ws/canWrite/test/java/util/concurrent/BlockingQueue -Dtest.classes=/export/martin/ws/canWrite/test/java/util/concurrent/BlockingQueue/JTwork/classes/java/util/concurrent/BlockingQueue com.sun.javatest.regtest.MainWrapper /export/martin/ws/canWrite/test/java/util/concurrent/BlockingQueue/JTwork/classes/java/util/concurrent/BlockingQueue/CancelledProducerConsumerLoops.jta

###@###.### 2004-03-16

Implemented by JDK-8186027 changes.

Implementation suggested by Roland in RH is tracked by JDK-8186027. I will closed this bug when/if we push that implementation.

ILW = HLL = P4 sounds more reasonable for this bug, lowering back to P4.

JDK-8009993 (P3) was closed as a dup of this, increasing priority to P3

I would argue that W = M because it may not be possible for the end user to control the command line eg browser based apps.

ILW = HLL = P4 I = H: non-responsive application L = L: long running counted loop W = L: -XX:+UseCountedLoopSafepoints

SUGGESTED FIX Copied from Evaluation: The possible long solution (for next release) is the implementation of "strip mining" when you transform a countable loop into two nested loops and place safepoint into external loop.

CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: dragon

EVALUATION ###@###.### 2004-04-09 This is what I found. I got 2 kinds of hangs. One when it hangs without any message on amd64 and I could not connect gdb to it. The second with unreachable safepoint: # SafepointSynchronize::begin: Fatal error: # SafepointSynchronize::begin: Timed out while attempting to reach a safepoint. # SafepointSynchronize::begin: Threads which did not reach the safepoint: I reproduced the last hang on solaris_x86 (I know dbx better and I was able to debug my binaries). The loop "do {} while (n-- > 0);" in ReentrantLockLoop::run() is countable even if the limit is "ITERS = Integer.MAX_VALUE". We don't put safepoints in countable loops assuming it will finish in reasonable time. The scenario is simple: one thread acquired 'lock' (lock.tryLock() == true) and now processing the loop body. Other threads are in barrier.await() or waiting lock.tryLock(). At this time VM got a request for an operation and forced all threads to go to safepoint state. All threads which are in barrier.await() and lock.tryLock() go to safepoint (I have frames traces which confirms this). The only thread left is one which processed the loop body. And now the 'lock' can be easy acquired by this thread since all other threads are in safepoint state and don't look for 'lock'. The are only 2 places in a compiled method where we are placing safepoints: before 'return' and at the end of uncountable loops or loops where at least one call always executed (assuming we hit the safepoint on the return from the call). As I pointed above we don't have safepoint at the end of the loop. Due to optimization the methods compute1() and compute2(x) are fully inlined so we don't have safepoints for their 'return's. The methods lock.tryLock() and lock.unlock() are inlined partially: the code still has some nested calls (calls in tryLock() and unlock()). But the execution only of the inlined parts of lock.tryLock() and lock.unlock() is enough to acquire the 'lock' since all other threads are in safepoint state. I executed the compiled code of ReentrantLockLoop::run() by steps in dbx in this state and saw that we indeed never calls any functions. This is why we avoid any safepoints until the end of iterations and it is relatively long time (2**31-1 iterations). On perf-lx2 (2x1.4GHz P3): Threads: 18 0.225942996s run time Threads: 27 397.664724057s run time Exception in thread "main" java.lang.Error: Some thread completed instead of timing out at TimeoutLockLoops$ReentrantLockLoop.test(TimeoutLockLoops.java:69) at TimeoutLockLoops.main(TimeoutLockLoops.java:32) The compiler should somehow figure out that a countable loop may take long time to execute and place safepoint into it. The fast solution (for 1.5.0 beta2) is to add a new flag to force safepoint in countable loops. It should be 'false' by default since the performance can be hit very hard. But we will get the workaround for customers who get this problem. And we can use the flag for these failed (timeout) test cases. The possible long solution (for next release) is the implementation of "strip mining" when you transform a countable loop into two nested loops and place safepoint into external loop.