JDK-8043743 : Data missed in java.util.concurrent.LinkedTransferQueue
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Affected Version: 8
  • Priority: P2
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_7
  • CPU: x86_64
  • Submitted: 2014-05-20
  • Updated: 2016-06-13
  • Resolved: 2015-09-21
JDK 9
9 b88 Resolved
Description
FULL PRODUCT VERSION :
java version "1.8.0"
Java<TM> SE Runtime Environment <build 1.8.0-b132>
Java HotSpot<TM> 64-Bit Server VM <build 25.0-b70, mixed mode>

A DESCRIPTION OF THE PROBLEM :
LinkedTransferQueue is based on dual queues with slack, in which nodes may represent either data or requests. A node can be appended only if its predecessor is either already matched or of the same mode (see method tryAppend). However, matched nodes and cancelled nodes are represented in the same way: the "item" field of each eventually refers to the node itself. This allows data to be lost in the following situation. Starting from an empty list, there is first an untimed call to take, then a second untimed call to take from a thread whose interrupt status is set, and then a call to offer. The first two calls each try to add a request node to the list. The first node waits for another thread to match it; the second node is cancelled because its thread was interrupted. The offer call then finds a "cancelled" node at the end of the list, which the code treats as a "matched" node, so the offer call also appends a data node. Now the first take call is waiting for data that has already been inserted into the list, separated from it by the "matched" node. The data is effectively lost, and the list contains nodes of both modes: the first is a request node and the last is a data node. If take is then called once more, it gets caught in an infinite loop that fully occupies one CPU core.
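
The ambiguity described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ToyNode`, `waitingRequest`, `cancelledRequest`, and `fulfilledRequest` are hypothetical names invented for illustration, and the two checks are simplified renderings of the rule in the description (a node of the opposite mode may follow only a matched node), not the JDK source.

```java
public class CancelledVersusMatched {
    /** Toy stand-in for a queue node (hypothetical; not the JDK's Node class). */
    static final class ToyNode {
        final boolean isData; // true for a data node, false for a request node
        Object item;          // payload, null for a waiting request, or the node itself

        ToyNode(Object item, boolean isData) {
            this.item = item;
            this.isData = isData;
        }

        /** Matched if self-linked, or if the item no longer fits the node's mode. */
        boolean isMatched() {
            Object x = item;
            return x == this || (x == null) == isData;
        }

        /** True if a node with mode haveData must not be appended after this node. */
        boolean cannotPrecede(boolean haveData) {
            return isData != haveData && !isMatched();
        }
    }

    /** A request node still waiting for data. */
    static ToyNode waitingRequest() {
        return new ToyNode(null, false);
    }

    /** A request node whose thread was interrupted: cancellation self-links the item. */
    static ToyNode cancelledRequest() {
        ToyNode n = new ToyNode(null, false);
        n.item = n;
        return n;
    }

    /** A request node that received data and then dropped it by self-linking. */
    static ToyNode fulfilledRequest(Object data) {
        ToyNode n = new ToyNode(null, false);
        n.item = data; // matched by a producer
        n.item = n;    // contents forgotten afterwards
        return n;
    }

    public static void main(String[] args) {
        boolean haveData = true; // the arriving operation is an offer
        System.out.println("waiting   blocks append: " + waitingRequest().cannotPrecede(haveData));
        System.out.println("cancelled blocks append: " + cancelledRequest().cannotPrecede(haveData));
        System.out.println("fulfilled blocks append: " + fulfilledRequest(8L).cannotPrecede(haveData));
    }
}
```

In this model only the waiting request prevents a data node from being appended behind it. The cancelled and the fulfilled request are indistinguishable to the check, which is the condition the report relies on.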

ADDITIONAL REGRESSION INFORMATION: 
java version "1.8.0"
Java<TM> SE Runtime Environment <build 1.8.0-b132>
Java HotSpot<TM> 64-Bit Server VM <build 25.0-b70, mixed mode>

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Multi-threaded bugs like this one are extremely difficult to reproduce,
 so I reproduced it using breakpoints.
 Set a breakpoint at line 643 of the code below, from LinkedTransferQueue.class (jdk1.8.0):

 642                   s = new Node(e, haveData);
 643               Node pred = tryAppend(s, haveData);
 644               if (pred == null)

Then run the test case given under "Source code for an executable test case".
When all four threads (Thread-0/Thread-1/Thread-2/Thread-3) are suspended at the breakpoint:

 resume "Thread-0" first; there is no output;
 resume "Thread-1" second; example output: "14 java.lang.InterruptedException";
 resume "Thread-2" third; example output: "15 offerTask thread has come out!". The bug has now occurred: our "8" has been lost;
 disable all breakpoints, then resume "Thread-3" last; one CPU core is now fully occupied.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
"8" will be output
ACTUAL -
"8" is not output

REPRODUCIBILITY :
This bug can be reproduced rarely.
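
Because the failure is rare and a blocked take never returns once it occurs, a harness can check for the expected "8" with a bounded wait instead of hanging. The following is a minimal sketch of that idea, not part of the submitted test case: `DeliveryCheck` and `takeWithWatchdog` are hypothetical names, and the two-second timeout is an arbitrary assumption. It does not force the interleaving described in the steps above; it only shows how delivery can be observed safely.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.atomic.AtomicReference;

public class DeliveryCheck {
    /**
     * Starts a consumer that calls take(), offers one value, and waits at most
     * timeoutMillis for the consumer to finish.
     * Returns the value taken, or null if nothing was delivered in time.
     */
    static Long takeWithWatchdog(long value, long timeoutMillis) {
        final LinkedTransferQueue<Long> queue = new LinkedTransferQueue<>();
        final AtomicReference<Long> received = new AtomicReference<>();

        Thread consumer = new Thread(() -> {
            try {
                received.set(queue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
        });
        consumer.setDaemon(true); // a stuck consumer must not keep the JVM alive
        consumer.start();

        queue.offer(value);
        try {
            consumer.join(timeoutMillis); // bounded wait instead of blocking forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received.get();
    }

    public static void main(String[] args) {
        Long got = takeWithWatchdog(8L, 2000);
        System.out.println(got == null ? "value was not delivered in time" : "delivered " + got);
    }
}
```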

---------- BEGIN SOURCE ----------

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedTransferQueue;

public class TestForLinkedTransferQueue {
	public static void main(String[] args) {
		final BlockingQueue<Long> queue = new LinkedTransferQueue<Long>();

		Runnable takeTask = new Runnable() {
			public void run() {
				try {
					System.out.println(Thread.currentThread().getId() + " "
							+ queue.take());
				} catch (InterruptedException e) {
					e.printStackTrace();
				}
			}
		};
		Runnable takeTaskInterrupted = new Runnable() {
			public void run() {
				Thread.currentThread().interrupt();
				try {
					System.out.println(Thread.currentThread().getId() + " "
							+ queue.take());
				} catch (InterruptedException e) {
					System.out.println(Thread.currentThread().getId() + " " + e);
				}
			}
		};
		Runnable offerTask = new Runnable() {
			public void run() {
				queue.offer(8L);
				System.out.println(Thread.currentThread().getId() + " offerTask thread has come out!");
			}
		};
		new Thread(takeTask).start();// first untimed call to take
		new Thread(takeTaskInterrupted).start();// second untimed call to take with interrupted status 
		new Thread(offerTask).start();//  a call to offer
	        
		new Thread(takeTask).start();

	}
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
This bug is a potential problem; I cannot reproduce it in practice either.

I have provided a patch for it: https://github.com/zuai/Hui/blob/master/LinkedTransferQueueFix.java




Comments
No bulk pulling from 166 to JDK 9 dev has occurred recently. It should soon happen in bulk over the next few months along with API updates.
28-08-2015

I think we pull 166 CVS changes into the JDK codebase periodically. Has this happened recently, and if so, has this issue been addressed?
25-08-2015

The submitter (I assume it was submitter) sent me mail last year asking about this. There is a fairly straightforward way to address the theoretical problem, but because it cannot be replicated in practice, the change wasn't committed. But I'll do so now to 166 CVS. This change unsplices nodes before (not after) self-matching to indicate cancellation, avoiding this scenario. It seems impossible to test though.

*** LinkedTransferQueue.java.~1.83.~ 2015-03-04 19:40:31.306461097 -0500
--- LinkedTransferQueue.java 2015-03-05 19:12:20.825480962 -0500
***************
*** 676,686 ****
          return itemE;
      }
      else if (w.isInterrupted() || (timed && nanos <= 0)) {
!         unsplice(pred, s); // try to unlink and cancel
!         if (s.casItem(e, s)) // return normally if lost CAS
              return e;
      }
!     else if (spins < 0) { // establish spins at/near front
          if ((spins = spinsFor(pred, s.isData)) > 0)
              randomYields = ThreadLocalRandom.current();
      }
--- 676,686 ----
          return itemE;
      }
      else if (w.isInterrupted() || (timed && nanos <= 0)) {
!         unsplice(pred, s); // try to unlink and cancel
!         if (s.casItem(e, s)) // return normally if lost CAS
              return e;
      }
!     else if (spins < 0) { // establish spins at/near front
          if ((spins = spinsFor(pred, s.isData)) > 0)
              randomYields = ThreadLocalRandom.current();
      }
06-03-2015

Can't reproduce it manually (following the customer's suggested steps to reproduce exactly):

$ ~/code/jdk8u-dev/build/linux-x86_64-normal-server-release/jdk/bin/jdb TestForLinkedTransferQueue
Initializing jdb ...
> stop at java.util.concurrent.LinkedTransferQueue:643
Deferring breakpoint java.util.concurrent.LinkedTransferQueue:643.
It will be set after the class is loaded.
> run
run TestForLinkedTransferQueue
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started: Set deferred breakpoint java.util.concurrent.LinkedTransferQueue:643

Breakpoint hit: "thread=Thread-1", java.util.concurrent.LinkedTransferQueue.xfer(), line=643 bci=246

Thread-1[1] threads
Group system:
  (java.lang.ref.Reference$ReferenceHandler)0x14d Reference Handler cond. waiting
  (java.lang.ref.Finalizer$FinalizerThread)0x14c Finalizer cond. waiting
  (java.lang.Thread)0x14b Signal Dispatcher running
Group main:
  (java.lang.Thread)0x1 main running
  (java.lang.Thread)0x1ab Thread-0 running (at breakpoint)
  (java.lang.Thread)0x1ac Thread-1 running (at breakpoint)
  (java.lang.Thread)0x1b0 Thread-2 running
Thread-1[1] thread 0x1ab
Thread-0[1] cont
> Breakpoint hit: "thread=Thread-0", java.util.concurrent.LinkedTransferQueue.xfer(), line=643 bci=246

Thread-0[1] cont
> Breakpoint hit: "thread=Thread-3", java.util.concurrent.LinkedTransferQueue.xfer(), line=643 bci=246

Thread-3[1] thread 0x1ac
Thread-1[1] cont
> 13 java.lang.InterruptedException
12 8
14 offerTask thread has come out!

> threads
Group system:
  (java.lang.ref.Reference$ReferenceHandler)0x14d Reference Handler cond. waiting
  (java.lang.ref.Finalizer$FinalizerThread)0x14c Finalizer cond. waiting
  (java.lang.Thread)0x14b Signal Dispatcher running
Group main:
  (java.lang.Thread)0x1b8 Thread-3 cond. waiting
  (java.lang.Thread)0x1b7 DestroyJavaVM running
05-03-2015

[~pardesha], could you please ask the customer either to provide a reliable reproducer (with a custom JDWP agent, if it always reproduces under a debugger), or to point out what I'm doing wrong while trying to reproduce it in the manual debug session above.
05-03-2015

Sounds like another hard-to-reproduce possible bug. We don't have a clean repro and we don't have a clean patch. The submitter's change includes unrelated gratuitous changes, making it tough to extract a change against head. The model appears to get more complicated with a CANCEL state.
05-03-2015

Looks like this bug got lost somewhere :( Flagging it with j.u.c upstream maintainers.
05-03-2015

Unable to reproduce with 8u20, moving to PDE team for evaluation
22-05-2014