JDK-6429204 : (se) Concurrent Selector.register and SelectionKey.interestOps can ignore interestOps
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.nio
  • Affected Version: 1.4.1,6
  • Priority: P4
  • Status: Closed
  • Resolution: Fixed
  • OS: generic,windows_xp
  • CPU: generic,x86
  • Submitted: 2006-05-23
  • Updated: 2013-06-26
  • Resolved: 2013-02-18
  • JDK 7: 7u40 (Fixed)
  • JDK 8: 8 b82 (Fixed)
Description
FULL PRODUCT VERSION :
This bug is present in all NIO releases.  Verified on 1.5 and 1.6 beta.

ADDITIONAL OS VERSION INFORMATION :
Windows XP

EXTRA RELEVANT SYSTEM CONFIGURATION :
2 CPU box (total of 4 virtual CPUs due to hyperthreading)

A DESCRIPTION OF THE PROBLEM :
A bug in Sun's WindowsSelectorImpl can cause an interestOps update to be ignored when Selector.register and SelectionKey.interestOps run concurrently.  I suspect that a concurrent Selector.deregister and SelectionKey.interestOps can cause the same problem (although I did not debug that failure mode carefully).

The problem happens when WindowsSelectorImpl tries to grow the natively allocated FD array (via PollWrapper.grow()).  To do this, a new, bigger array is allocated, the data from the old array is copied into the new one, and the new one is assigned to be used by the Selector.

However, if a change to the interestOps happens while the process above is being performed, the new interestOps could be written to the OLD array after that channel's record has been copied to the new array but before the copying process is complete.  That will cause the change to the interestOps to be lost.

The way deregister is moving the last channel to the deleted position also seems to open up a possibility to lose an interestOps update to the last channel (the one being moved).

Basically, changes to the interest ops must be synchronized with the growing and reorganization of the FD array.
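
To illustrate the lost update described above, here is a minimal, single-threaded sketch. It is a simplified model only, not the real WindowsSelectorImpl or PollWrapper code: the class PollArrayModel, its methods, and the midCopy hook are hypothetical names made up for this example. The hook stands in for another thread calling interestOps() part way through the copy.

```java
import java.nio.channels.SelectionKey;

/** Simplified model of a growable poll array; not the real JDK class. */
public class PollArrayModel {
    private int[] events;   // one interest-ops word per registered channel

    PollArrayModel(int size) {
        events = new int[size];
    }

    /** Stores the interest ops for a slot in whichever array is current. */
    void putEvents(int slot, int ops) {
        events[slot] = ops;
    }

    int getEvents(int slot) {
        return events[slot];
    }

    /**
     * Grows the array by copying slot by slot. The midCopy hook runs right
     * after slot hookSlot has been copied and stands in for another thread
     * calling interestOps() while the copy is in progress.
     */
    void grow(int newSize, int hookSlot, Runnable midCopy) {
        int[] bigger = new int[newSize];
        for (int i = 0; i < events.length; i++) {
            bigger[i] = events[i];
            if (i == hookSlot) {
                midCopy.run();      // this write lands in the OLD array
            }
        }
        events = bigger;            // the old array and the late write are dropped
    }

    /** Returns the interest ops of slot 0 after an update races with grow(). */
    public static int demo() {
        PollArrayModel poll = new PollArrayModel(4);   // slot 0 starts at interest 0
        // Slot 0 is already copied when the hook fires after slot 2.
        poll.grow(8, 2, () -> poll.putEvents(0, SelectionKey.OP_READ));
        return poll.getEvents(0);
    }

    public static void main(String[] args) {
        System.out.println("slot 0 interest after grow: " + demo());
    }
}
```

In this model demo() returns 0 rather than OP_READ, because the write went to the array that grow() then discarded. Guarding putEvents() and grow() with the same lock would close the window in the model.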

As side points:
1) Why is the replaceEntry in PollWrapper NOT a static function?
2) Why is PollWrapper.grow so inefficient in copying data?!  It performs a few function calls and a lot of arithmetic for each record copied.  This is bad for cases when you have a LOT of connections and need to double the array -- the very cases that NIO was designed to implement efficiently.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
I have an NIO server that I am stress testing to accept over 50K connections as rapidly as I can, while doing full-duplex communication over those connections.

I implement this by having one thread accept all IO requests and save the keys away for the worker thread pool to execute.  To prevent the same key from being enqueued more than once at a time, I set its interest in events to 0 when I put it on the queue (this is done in the selector's thread, so it is synchronous with other selector ops).

When a worker thread is done with the IO operation on the socket channel, it puts its interest back to the "interested" operations.  That is done asynchronously in a different thread than the selector.

Hence, I am performing a lot of IO and interest set changes while other socket channels are being registered.
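
The hand-off pattern described in these steps could look roughly like the sketch below. This is not the reporter's server code: the class HandOffSketch and the method runOnce are hypothetical, and a java.nio.channels.Pipe stands in for a socket channel so that the example is self-contained. It runs one cycle only: select, set the interest to 0 on the selector thread, then restore the interest from a worker thread.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandOffSketch {
    /** Runs one select / hand-off / re-arm cycle and returns the final interest ops. */
    public static int runOnce() {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            Selector selector = Selector.open();
            Pipe pipe = Pipe.open();   // assumption: stands in for a SocketChannel
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(new byte[] {42}));

                // Selector thread: wait for readiness, then disarm the key so
                // it cannot be queued a second time while a worker owns it.
                selector.select();
                selector.selectedKeys().clear();
                key.interestOps(0);

                // Worker thread: do the IO, then re-arm the key asynchronously.
                // This is the call that can race with a concurrent register().
                Future<?> done = workers.submit(() -> {
                    pipe.source().read(ByteBuffer.allocate(16));
                    key.interestOps(SelectionKey.OP_READ);
                    selector.wakeup();   // let a blocked select() see the change
                    return null;
                });
                done.get();
                return key.interestOps();
            } finally {
                pipe.sink().close();
                pipe.source().close();
                selector.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("interest ops after re-arm: " + runOnce());
    }
}
```

A single cycle like this does not trigger the bug by itself. The failure needs many such re-arm calls running while other channels are being registered and the FD array is growing.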

This problem reproduces almost every run.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
When I reset the interest back to, say, READ, I expect the channel to be selected some time soon after new data arrives (or has already arrived) to the socket.
ACTUAL -
Some of the channels whose interest set was changed while the selector was growing its array (and in a VERY inefficient way, which opened up the window for the bug even wider) would never be selected again -- because setting their interest from 0 to READ was lost.

REPRODUCIBILITY :
This bug can be reproduced often.

---------- BEGIN SOURCE ----------
Just read the description in "Steps to Reproduce".
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Externally synchronize key.interestOps() with selector.register and selector.deregister.
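
One possible shape for this workaround is sketched below. The wrapper class GuardedSelector and its method names are hypothetical. It routes registration and interest changes through one lock so that they cannot overlap. As far as I understand, a cancelled key is actually deregistered by the selector during a later selection operation, so this sketch does not cover the deregister case.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical wrapper that serializes register() and interestOps() calls. */
public class GuardedSelector {
    private final Selector selector;
    private final Object lock = new Object();

    public GuardedSelector(Selector selector) {
        this.selector = selector;
    }

    /** Registers a channel while holding the lock shared with setInterestOps(). */
    public SelectionKey register(SelectableChannel channel, int ops) throws IOException {
        synchronized (lock) {
            return channel.register(selector, ops);
        }
    }

    /** Changes a key's interest set under the same lock, then wakes the selector. */
    public void setInterestOps(SelectionKey key, int ops) {
        synchronized (lock) {
            key.interestOps(ops);
        }
        selector.wakeup();
    }

    /** Small self-check that uses a Pipe in place of a socket channel. */
    public static int demo() {
        try {
            Selector selector = Selector.open();
            Pipe pipe = Pipe.open();
            try {
                GuardedSelector guarded = new GuardedSelector(selector);
                pipe.source().configureBlocking(false);
                SelectionKey key = guarded.register(pipe.source(), 0);
                guarded.setInterestOps(key, SelectionKey.OP_READ);
                return key.interestOps();
            } finally {
                pipe.sink().close();
                pipe.source().close();
                selector.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Every thread has to go through the wrapper for the lock to help. Any direct call to key.interestOps() or channel.register() bypasses it.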

Comments
Review thread: http://mail.openjdk.java.net/pipermail/nio-dev/2013-February/002105.html
18-02-2013

EVALUATION This issue is mostly addressed by the changes for 5025260 in jdk7 b39. These changes synchronize the registration and event updates so that the problems listed in the description cannot happen. The only remaining issue that needs to be checked is the deregistration which is not fully synchronized yet.
31-03-2009

EVALUATION -- We plan to put back the fix for this bug early in jdk7. Once it has baked for a bit we will evaluate it for inclusion in a jdk6 update too.
07-12-2006

EVALUATION The analysis in the description is correct. If the interest set is updated during the expansion of the poll array then the update is lost. Unfortunately, this bug was submitted too late to fix in Mustang, so it will need to be examined for an update instead. The /dev/poll and epoll Selectors are not impacted by this issue.
25-05-2006