JDK-6705443 : D3D: Crash in awt.dll code when SwingSet2 is closed, on Windows + multiscreen
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 6u10
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: windows
  • CPU: x86
  • Submitted: 2008-05-21
  • Updated: 2011-01-19
  • Resolved: 2008-06-09
JDK 6: 6u10 b26 (Fixed)
Description
SwingSet2 sometimes crashes on closing when running on my WinXP dual-screen desktop. No specific actions are performed before exit, just the usual navigation through most of the SwingSet2 tabs and some other actions (changing the L&F, creating SS2 on different windows, showing various dialogs and choosers, etc.).

The HotSpot log file is attached to the bug report. According to the awt.map file for 6u10-b24, the crash happens somewhere in the AccelGlyphCache code, which is a Java2D area, I believe. A multiscreen environment seems to be important for reproducing the bug, although this may be a timing issue as well. I couldn't reproduce the crash with 6u10-b12 on the same desktop.

Comments
SUGGESTED FIX http://sa.sfbay.sun.com/projects/java2d_data/6u10/6705443.1
28-05-2008

EVALUATION Looks like I have identified the cause of the crash. When a strike is released we currently set the cache cell's glyph reference to NULL to indicate that this cell no longer has a glyph associated with it. The problem is that there may be more than one cell associated with a glyph (they are linked in a list through the nextGCI field), because we have a cache per device (and also per text type: grayscale and lcd). That means that some cells could have pointers to freed glyphs, which may lead to crashes when we walk through the cell list in order to invalidate them. After I added code to invalidate all cells associated with a glyph before the glyph is disposed, the crash went away.

That still leaves the problem with the (lack of) synchronization between glyph disposal and invalidation of the accelerated glyph cache. We do have a mechanism that ensures that the glyphs currently in the queue to be rendered are not collected until the queue is flushed, but in theory the strikes may be released just when the accelerated glyph cache is being invalidated. This can happen during a display mode change, or when Vista's UC dialog is shown, or when fs screen saver is enabled, etc.

The fix for this will require that the strike disposal happens on the Rendering thread (if one of the accelerated pipelines is present). Simple RQ locking may not be enough, because the accelerated cache invalidation may happen on the rendering thread (the toolkit thread in the case of the d3d pipeline) without taking the RQ lock. Freeing the strikes on the RT shouldn't be a problem performance-wise, since it is a relatively rare operation (surface disposal is a more frequent operation, and it doesn't seem to be an issue).
21-05-2008
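
The "invalidate all cells before the glyph is disposed" step described in the evaluation above can be sketched roughly as follows. This is only an illustration, not the code from the suggested fix: the two structs are simplified stand-ins that keep just the glyphInfo and nextGCI fields named in the evaluation, while the cellInfo list head and the helper names attach_cell, detach_all_cells and dispose_glyph are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the real structures (assumption). */
typedef struct CacheCellInfo CacheCellInfo;

typedef struct GlyphInfo {
    CacheCellInfo *cellInfo;   /* hypothetical head of this glyph's cell list */
} GlyphInfo;

struct CacheCellInfo {
    GlyphInfo     *glyphInfo;  /* glyph stored in this cell, or NULL */
    CacheCellInfo *nextGCI;    /* next cell (in another cache) with the same glyph */
};

/* Hypothetical helper: record that a cell in some per-device cache holds the glyph. */
static void attach_cell(GlyphInfo *glyph, CacheCellInfo *cell)
{
    assert(glyph != NULL && cell != NULL);
    cell->glyphInfo = glyph;
    cell->nextGCI = glyph->cellInfo;
    glyph->cellInfo = cell;
}

/* Clear the glyph reference in every cell on the list, not just the first one.
 * Returns the number of cells that were detached. */
static int detach_all_cells(GlyphInfo *glyph)
{
    int count = 0;
    CacheCellInfo *cell = glyph->cellInfo;
    while (cell != NULL) {
        CacheCellInfo *next = cell->nextGCI;  /* save the link before clearing it */
        cell->glyphInfo = NULL;
        cell->nextGCI = NULL;
        cell = next;
        count++;
    }
    glyph->cellInfo = NULL;
    return count;
}

/* Free the glyph only after no cell points at it any more. */
static int dispose_glyph(GlyphInfo *glyph)
{
    int detached = detach_all_cells(glyph);
    free(glyph);
    return detached;
}
```

With this ordering, a later walk over the cache cells sees NULL instead of a pointer to freed memory. It does not address the race between disposal and invalidation on different threads.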

EVALUATION The crash can be reproduced with SwingSet2, and it always happens on exit. Just play around with SS2 for a while, change L&Fs, then click the "X" button on the main frame to quit. It crashes about one time in three.

The crash happens in the AccelGlyphCache_RemoveCellInfo() function (AccelGlyphCache.c) during the AccelGlyphCache_Invalidate() call, which is made when the d3d pipeline shuts down. What I am seeing in the debugger is that the cache cell is valid but it has a pointer to an apparently disposed glyph, since cellinfo->glyphInfo points to trash memory. It crashes when trying to dereference nextGCI:

        prevInfo = currCellInfo;
        currCellInfo = currCellInfo->nextGCI;   <<<<<< here
    } while (currCellInfo != NULL);

currCellInfo is taken from the glyph.

I noticed that the glyph info can be released on the Disposer thread from one of the StrikeCache_free*Memory() calls in scalerMethods.c, which probably can happen during VM shutdown. These functions do set the cache cell info to NULL prior to freeing the GlyphInfo structure, but that is not enough, since we could already have stored a pointer to this GlyphInfo in a local variable on another thread (the toolkit thread in this case) and be about to use it. So it looks like we need some sort of synchronization between the users of accelerated caches and these strike disposer methods. One way would be for the disposer method to lock on the rendering thread lock when disposing. Alternatively, it could execute the *free methods on the RQ thread.

Unfortunately, while this is a legitimate concern, it may not be the cause of this particular crash (unless there are some other places where ginfo can be released from). Basically, I don't see these free* methods being called during shutdown, so it must be something else that's stomping on that memory. But it is definitely the GlyphInfo that is trashed; the cache cells themselves have correct data.
21-05-2008
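
The first synchronization option mentioned in the evaluation above (the disposer takes a lock that cache users also hold) might look something like the sketch below. It is a hedged illustration under stated assumptions, not the actual JDK code: cache_lock is a hypothetical pthread mutex standing in for the rendering lock, the structs are simplified stand-ins, and invalidate_cell and dispose_glyph_locked are invented names. It only helps if every code path that touches a cell's glyph pointer takes the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the real structures (assumption). */
typedef struct GlyphInfo GlyphInfo;

typedef struct CacheCellInfo {
    GlyphInfo            *glyphInfo;  /* glyph stored in this cell, or NULL */
    struct CacheCellInfo *nextGCI;    /* next cell holding the same glyph */
} CacheCellInfo;

struct GlyphInfo {
    CacheCellInfo *cellInfo;          /* hypothetical head of the cell list */
};

/* Hypothetical lock shared by cache users and the strike disposer. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cache user: unlink one cell from its glyph. The glyph pointer is read and
 * used entirely under the lock, so the disposer cannot free it in between.
 * Returns 1 if the cell was unlinked from a glyph, 0 otherwise. */
static int invalidate_cell(CacheCellInfo *cell)
{
    int unlinked = 0;
    assert(cell != NULL);
    pthread_mutex_lock(&cache_lock);
    GlyphInfo *glyph = cell->glyphInfo;
    if (glyph != NULL) {
        CacheCellInfo **link = &glyph->cellInfo;
        while (*link != NULL && *link != cell) {
            link = &(*link)->nextGCI;
        }
        if (*link == cell) {
            *link = cell->nextGCI;    /* splice the cell out of the list */
            unlinked = 1;
        }
        cell->glyphInfo = NULL;
        cell->nextGCI = NULL;
    }
    pthread_mutex_unlock(&cache_lock);
    return unlinked;
}

/* Disposer: detach every cell and free the glyph while holding the same lock. */
static void dispose_glyph_locked(GlyphInfo *glyph)
{
    pthread_mutex_lock(&cache_lock);
    CacheCellInfo *cell = glyph->cellInfo;
    while (cell != NULL) {
        CacheCellInfo *next = cell->nextGCI;
        cell->glyphInfo = NULL;
        cell->nextGCI = NULL;
        cell = next;
    }
    free(glyph);
    pthread_mutex_unlock(&cache_lock);
}
```

The key point is that no thread keeps a GlyphInfo pointer in a local variable across an unlock, which is the stale-pointer scenario the evaluation describes.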