JDK-4850373 : Blocking Selector stops Blocking occasionally
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 1.4.2,1.4.2_02
  • Priority: P3
  • Status: Closed
  • Resolution: Fixed
  • OS: windows_2000,windows_xp
  • CPU: x86
  • Submitted: 2003-04-17
  • Updated: 2004-04-28
  • Resolved: 2004-02-06

Other
1.4.2_05    05    Fixed
Description

Name: nt126004			Date: 04/17/2003


FULL PRODUCT VERSION :
java version "1.4.2-beta"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

FULL OS VERSION :
Microsoft Windows 2000 [Version 5.00.2195]

A DESCRIPTION OF THE PROBLEM :
Blocking Selector stops Blocking occasionally.

It would appear that when wakeup() is called at a certain point then the selector.select() method starts returning instantly with no selected keys from then on.

This problem is difficult to reproduce consistently, hence it is difficult to track down the exact cause.

It is a very serious problem for anyone relying on a blocking select(), as CPU usage jumps to 100%.
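
One way an application could notice the condition described above is to time each select() call and count consecutive calls that return no keys almost instantly without the application having asked for a wakeup. The sketch below is only an illustration: SpinDetector, its thresholds, and the record() method are hypothetical names, not part of java.nio, and a thread interrupt would also cause a legitimate early return that this helper does not distinguish.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical diagnostic helper (not a JDK class): flags a selector that appears to spin. */
final class SpinDetector {
    private final long minBlockNanos; // returns faster than this are treated as "premature"
    private final int threshold;      // consecutive premature returns before reporting a spin
    private int consecutive;          // current run of premature returns

    SpinDetector(long minBlockMillis, int threshold) {
        this.minBlockNanos = TimeUnit.MILLISECONDS.toNanos(minBlockMillis);
        this.threshold = threshold;
    }

    /**
     * Records the outcome of one select() call.
     *
     * @param selectedCount   value returned by select()
     * @param elapsedNanos    how long the call took, e.g. measured with System.nanoTime()
     * @param wakeupRequested true if the application itself called wakeup() for this round
     * @return true once the run of premature zero-key returns reaches the threshold
     */
    boolean record(int selectedCount, long elapsedNanos, boolean wakeupRequested) {
        boolean premature = selectedCount == 0
                && !wakeupRequested
                && elapsedNanos < minBlockNanos;
        consecutive = premature ? consecutive + 1 : 0;
        return consecutive >= threshold;
    }

    public static void main(String[] args) {
        // Synthetic measurements standing in for a real select loop.
        SpinDetector detector = new SpinDetector(1, 3);
        long fastReturn = TimeUnit.MICROSECONDS.toNanos(5);
        for (int i = 1; i <= 4; i++) {
            boolean spinning = detector.record(0, fastReturn, false);
            System.out.println("call " + i + " spinning=" + spinning);
        }
    }
}
```

In a real loop, the elapsed time would be taken around selector.select(), and wakeupRequested would come from whatever flag the application sets before calling wakeup().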

As it is difficult to produce a small test case I have done some research to help find the bug:

* once the problem is occurring, the following debug information is available:
keys == {key with interestOps == 1 fd == 1788}
cancelledKeys == {}
sourceFD == 1552
sinkFD == 1600
interruptTriggered == false
timeout == -1
threads == {}
2 channels
exceptIDs == {0, 0, 0, ...}
readIDs == {1, 1552, 1788, 0, 0, 0, ...}
writeIDs == {0, 1788, 0, 0, 0, ...}

* the trace through the code of WindowsSelectorImpl doSelect(long aTimout)
1) calls processDeregisterQueue();
2) calls adjustThreadsCount();
3) calls finishLock.reset();
4) calls startLock.startThreads();
5) calls begin();
6) calls subSelector.poll(); (this doesn't block, I guess this is the problem)
7) calls end();
8) calls resetWakeupSocket();
9) calls finishLock.checkForException();
10) calls processDeregisterQueue();
11) calls updateSelectedKeys();
12) returns 0

Note that the same problem also exists in JDK 1.4.1_02 (I thought maybe the synchronization changes would have fixed it, however they didn't).
This is a replacement bug for review ID: 183301 (review id 183301 is no longer important)

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Difficult to provide a test case due to the nature of the problem.
I have only been able to reproduce the problem when wakeup() has been called.
It is too difficult to establish at what point in select() the code is up to when wakeup() causes the problem.

EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: when no keys are ready to be selected, select() should block again until keys become ready for selection or wakeup/interrupt is called.
Actual: select() returns no keys without blocking.
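
The expected behavior can be checked with a small timing probe. This is a sketch only, not the reporter's code: BlockingSelectDemo and measure() are hypothetical names, and the delays are arbitrary. It opens a selector with no registered keys, lets another thread call wakeup() after a delay, and then issues a second, timed select(). On a correctly behaving selector both calls should block for roughly the stated durations; with the behavior reported here, the second call would be expected to return almost immediately.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

/** Hypothetical probe: measures how long select() blocks before and after a wakeup(). */
final class BlockingSelectDemo {

    /**
     * @param wakeupDelayMillis delay before a helper thread calls wakeup()
     * @param timeoutMillis     timeout used for the second select call
     * @return {millis blocked in first select(), millis blocked in second select(timeout)}
     */
    static long[] measure(long wakeupDelayMillis, long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            Thread waker = new Thread(() -> {
                try {
                    Thread.sleep(wakeupDelayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                selector.wakeup(); // should end the first select() below
            });

            long start = System.nanoTime();
            waker.start();
            selector.select(); // no keys registered: blocks until wakeup()
            long first = (System.nanoTime() - start) / 1_000_000;
            waker.join();

            start = System.nanoTime();
            selector.select(timeoutMillis); // the wakeup was consumed, so this should block again
            long second = (System.nanoTime() - start) / 1_000_000;

            return new long[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] t = measure(200, 300);
        System.out.println("first select blocked ~" + t[0] + " ms");
        System.out.println("second select blocked ~" + t[1] + " ms");
    }
}
```

Because the problem is reported as occasional, a single run of such a probe says little; it would need to be repeated many times, ideally with wakeup() called at varying points.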

REPRODUCIBILITY :
This bug can be reproduced occasionally.

---------- BEGIN SOURCE ----------
Difficult to make a small test case; however, we are able to reproduce the problem occasionally with our actual code, hence I can provide more debug info if what is provided in the description is not enough.  This is a very serious problem for everyone relying on select() to block, hence I will try to answer questions about this ASAP.
---------- END SOURCE ----------
(Review ID: 183670) 
======================================================================

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
1.4.2_05
generic
tiger-beta2

FIXED IN:
1.4.2_05
tiger-beta2

INTEGRATED IN:
1.4.2_05
tiger-b38
tiger-beta2

VERIFIED IN:
1.4.2_05
14-06-2004

EVALUATION

Not for mantis.
###@###.### 2003-04-22

Selector spin can be caused by these two situations, both of which are the result of ready-to-write being identical to ready-to-connect at the native level.

1. The channel is not connected, the key is interested in writing, and the key is ready for writing. Because the key is ready for something it is interested in, the select operation returns immediately, but the key is not marked as ready for write because the channel is not in the connected state. The key is therefore not added to the selected set, so the selector spins and returns 0.

2. The channel is connected, the key is interested in connecting, and the key is ready for connecting. Because the key is ready for something it is interested in, the select operation returns immediately, but the key is not marked as ready to connect because the channel is already connected. The key is therefore not added to the selected set, so the selector spins and returns 0.
###@###.### 2003-08-13

There is a timing issue in the wakeup mechanism which arises when the wakeup is observed before the "wakeup byte" is received. Specifically, WindowsSelectorImpl.doSelect will reset the wakeup socket when interruptTriggered is set. If the wakeup byte hasn't been received at that point (Nagle is enabled by default, for example), then all subsequent calls to select will return immediately.
###@###.### 2004-01-07
07-01-2004
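
The timing issue in the last evaluation note can be illustrated with a toy state machine. This is a deliberately simplified model written for this report, not the WindowsSelectorImpl source: WakeupRaceModel and all of its members are hypothetical, apart from the interruptTriggered name taken from the debug output in the description. It assumes that a select returns at once when either the flag is set or a wakeup byte is readable, and that the wakeup socket is drained only when the flag is set.

```java
/** Toy model (an assumption, not JDK code) of a wakeup byte that arrives after the reset. */
final class WakeupRaceModel {
    private boolean interruptTriggered; // set by wakeup(), cleared by the reset step
    private int inFlight;               // wakeup bytes written but not yet readable
    private int readable;               // wakeup bytes readable on the source socket

    /** Models wakeup(): set the flag and send one byte that is not yet delivered. */
    void wakeup() {
        if (!interruptTriggered) {
            interruptTriggered = true;
            inFlight++;
        }
    }

    /** Models the network delivering any pending wakeup bytes (e.g. after a Nagle delay). */
    void deliver() {
        readable += inFlight;
        inFlight = 0;
    }

    /** Models one select call; returns true if it returned immediately instead of blocking. */
    boolean select() {
        boolean immediate = interruptTriggered || readable > 0;
        if (interruptTriggered) {
            readable = 0;               // reset step: drain whatever has arrived so far
            interruptTriggered = false;
        }
        return immediate;
    }

    /** Counts immediate returns in the selects that follow the one that observed the wakeup. */
    static int immediateReturns(boolean byteArrivesLate, int rounds) {
        WakeupRaceModel m = new WakeupRaceModel();
        m.wakeup();
        if (!byteArrivesLate) m.deliver();
        m.select();                     // this select observes the wakeup and resets
        if (byteArrivesLate) m.deliver();
        int immediate = 0;
        for (int i = 0; i < rounds; i++) {
            if (m.select()) immediate++;
        }
        return immediate;
    }

    public static void main(String[] args) {
        System.out.println("byte on time: " + immediateReturns(false, 5) + " immediate returns");
        System.out.println("byte late:    " + immediateReturns(true, 5) + " immediate returns");
    }
}
```

In the late-byte case the model ends up with interruptTriggered false and an undrained byte on the source socket, which is consistent with the debug output in the description (interruptTriggered == false, sourceFD 1552 present in readIDs). How the actual fix in 1.4.2_05 addresses this is not stated in this report.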