Bug ID: JDK-6705443 D3D: Crash in awt.dll code when SwingSet2 is closed, on Windows + multiscreen

Fixed Versions: 6u10 (b26)


SwingSet2 sometimes crashes on exit when running on my WinXP dual-screen desktop. No specific actions are performed before exit, just the usual navigation through most of the SwingSet2 tabs and other actions (changing the L&F, creating SS2 instances in different windows, showing various dialogs and choosers, etc.)

A HotSpot log file is attached to the bug report. According to the awt.map file for 6u10-b24, the crash happens somewhere in the AccelGlyphCache code, which I believe is a Java2D area. The multiscreen environment seems to be important for reproducing the bug, although this may also be a timing issue. I could not reproduce the crash with 6u10-b12 on the same desktop.



The crash can be reproduced with SwingSet2; it always happens on exit.

  Just play around with SS2 for a while, change L&Fs, then
  click the "X" button on the main frame to quit.
  It crashes about one time in three.

  The crash happens in the AccelGlyphCache_RemoveCellInfo()
  function (AccelGlyphCache.c) during the AccelGlyphCache_Invalidate()
  call, which is made when the D3D pipeline shuts down.

  In the debugger I can see that the cache
  cell is valid, but it points to an apparently
  disposed glyph: cellinfo->glyphInfo points
  to trashed memory.

  It crashes when trying to dereference nextGCI:
    do {
        ...
        prevInfo = currCellInfo;
        currCellInfo = currCellInfo->nextGCI; /* <<<<<< crashes here */
    } while (currCellInfo != NULL);

  currCellInfo is taken from the glyph.
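  To make the failure mode concrete, here is a simplified model of the
  list being walked. It is a sketch, not the AccelGlyphCache.c source.
  The struct layouts keep only the cellInfo, glyphInfo and nextGCI links
  mentioned in this report. RemoveCellInfo is a hypothetical stand-in
  for AccelGlyphCache_RemoveCellInfo(), and it uses a while loop instead
  of the do/while in the excerpt above so that an empty list is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real structures. */
typedef struct CacheCellInfo CacheCellInfo;

typedef struct GlyphInfo {
    CacheCellInfo *cellInfo;   /* head of the list of cells caching this glyph */
} GlyphInfo;

struct CacheCellInfo {
    GlyphInfo     *glyphInfo;  /* back pointer to the cached glyph */
    CacheCellInfo *nextGCI;    /* next cell caching the same glyph */
};

/* Hypothetical model of the cell-list walk. The walk starts from
 * glyph->cellInfo, so if 'glyph' points to freed memory, currCellInfo is
 * garbage and reading currCellInfo->nextGCI is where the process faults.
 * Returns 1 if the cell was found and unlinked, 0 otherwise. */
int RemoveCellInfo(GlyphInfo *glyph, CacheCellInfo *cell)
{
    assert(glyph != NULL && cell != NULL);
    CacheCellInfo *currCellInfo = glyph->cellInfo;
    CacheCellInfo *prevInfo = NULL;
    while (currCellInfo != NULL) {
        if (currCellInfo == cell) {
            if (prevInfo == NULL) {
                glyph->cellInfo = currCellInfo->nextGCI;  /* removing the head */
            } else {
                prevInfo->nextGCI = currCellInfo->nextGCI;
            }
            currCellInfo->glyphInfo = NULL;
            currCellInfo->nextGCI = NULL;
            return 1;
        }
        prevInfo = currCellInfo;
        currCellInfo = currCellInfo->nextGCI;  /* the line from the report */
    }
    return 0;
}
```

  With a live glyph this unlinks the cell cleanly. With a freed glyph,
  the very first read of glyph->cellInfo already yields garbage, which
  is consistent with a valid cell pointing at a trashed GlyphInfo.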

  I noticed that the glyph info can be released on the
  Disposer thread by one of the StrikeCache_free*Memory()
  calls in scalerMethods.c, which could plausibly happen
  during VM shutdown.

  These functions do set the cache cell info to NULL before
  freeing the GlyphInfo structure, but that is not enough:
  another thread (the toolkit thread in this case) may
  already have stored a pointer to this GlyphInfo in a
  local variable and be about to use it.

  So it looks like we need some sort of synchronization
  between the users of accelerated caches and these
  strike disposer methods.

  One option would be for the disposer method to take
  the rendering thread lock while disposing. Another is
  to execute the *free methods on the RQ thread.
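  The first option (locking) can be sketched as follows. This assumes a
  single pthread mutex stands in for the rendering thread lock. The names
  renderLock, DisposeGlyphLocked and InvalidateCellLocked are hypothetical,
  and the structs are simplified stand-ins for GlyphInfo and
  CacheCellInfo. The idea is that a cache user reads cell->glyphInfo and
  finishes using it inside the same critical section in which the
  disposer clears and frees the glyph, which closes the window described
  above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct CacheCellInfo CacheCellInfo;
typedef struct GlyphInfo { CacheCellInfo *cellInfo; } GlyphInfo;
struct CacheCellInfo {
    GlyphInfo     *glyphInfo;  /* glyph cached in this cell, or NULL */
    CacheCellInfo *nextGCI;    /* next cell caching the same glyph */
};

/* Hypothetical stand-in for the rendering thread lock. */
static pthread_mutex_t renderLock = PTHREAD_MUTEX_INITIALIZER;

/* Disposer side (hypothetical): detach every cell, then free the glyph,
 * all while holding the lock. */
void DisposeGlyphLocked(GlyphInfo *glyph)
{
    assert(glyph != NULL);
    pthread_mutex_lock(&renderLock);
    CacheCellInfo *cell = glyph->cellInfo;
    while (cell != NULL) {
        CacheCellInfo *next = cell->nextGCI;  /* read before clearing */
        cell->glyphInfo = NULL;
        cell->nextGCI = NULL;
        cell = next;
    }
    glyph->cellInfo = NULL;
    free(glyph);
    pthread_mutex_unlock(&renderLock);
}

/* Cache side (hypothetical): unlink one cell from its glyph's list under
 * the same lock. The disposer cannot run concurrently, so a non-NULL
 * glyphInfo read here stays valid until we are done with it.
 * Returns 1 if the cell was unlinked, 0 if it held no glyph. */
int InvalidateCellLocked(CacheCellInfo *cell)
{
    int unlinked = 0;
    pthread_mutex_lock(&renderLock);
    GlyphInfo *glyph = cell->glyphInfo;
    if (glyph != NULL) {
        CacheCellInfo **link = &glyph->cellInfo;
        while (*link != NULL && *link != cell) {
            link = &(*link)->nextGCI;
        }
        if (*link == cell) {
            *link = cell->nextGCI;
            unlinked = 1;
        }
        cell->glyphInfo = NULL;
        cell->nextGCI = NULL;
    }
    pthread_mutex_unlock(&renderLock);
    return unlinked;
}
```

  This only helps if every invalidation path actually takes the lock.
  Code that touches the cache on the rendering thread without it is
  still exposed.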

  Unfortunately, while this is a legitimate concern, it may
  not be the cause of this particular crash (unless ginfo
  can also be released from some other place).

  In fact, I don't see these free* methods being called
  during shutdown, so something else must be
  stomping on that memory.

  But it is definitely the GlyphInfo that is trashed - the
  cache cells themselves have correct data.

Looks like I have identified the cause of the crash.
When a strike is released, we currently set the cache cell's
glyph reference to NULL to indicate that this cell no
longer has a glyph associated with it.

The problem is that there may be more than one cell
associated with a glyph (they are linked in a list
through the nextGCI field), because we have a cache
per device (and also per text type: grayscale and LCD).

That means that some cells could hold pointers to freed glyphs,
which may lead to crashes when we walk through the cell list
in order to invalidate them.

After I added code to invalidate all cells associated with
a glyph before the glyph is disposed, the crash went away.
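The fix described above can be modeled as follows. The idea is to
detach every cell on the glyph's nextGCI list before freeing the glyph,
not just one cell. This is a sketch under stated assumptions. The
struct layouts are simplified, and AddCellInfo, InvalidateAllCells and
DisposeGlyph are hypothetical names rather than the actual patch. It
also assumes that each per-device, per-text-type cache pushes its cell
onto the glyph's list when it caches the glyph.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct CacheCellInfo CacheCellInfo;
typedef struct GlyphInfo { CacheCellInfo *cellInfo; } GlyphInfo;
struct CacheCellInfo {
    GlyphInfo     *glyphInfo;  /* glyph cached in this cell, or NULL */
    CacheCellInfo *nextGCI;    /* next cell (in another cache) with the same glyph */
};

/* Hypothetical: record that 'cell' now caches 'glyph'. One glyph can end up
 * in several caches, one per device and per text type (grayscale, LCD). */
void AddCellInfo(GlyphInfo *glyph, CacheCellInfo *cell)
{
    assert(glyph != NULL && cell != NULL);
    cell->glyphInfo = glyph;
    cell->nextGCI = glyph->cellInfo;
    glyph->cellInfo = cell;
}

/* Hypothetical: detach every cell that references 'glyph', not only the
 * list head, so no cache keeps a dangling pointer once the glyph is freed.
 * Returns the number of cells detached. */
int InvalidateAllCells(GlyphInfo *glyph)
{
    int count = 0;
    CacheCellInfo *cell = glyph->cellInfo;
    while (cell != NULL) {
        CacheCellInfo *next = cell->nextGCI;  /* read before clearing */
        cell->glyphInfo = NULL;
        cell->nextGCI = NULL;
        cell = next;
        count++;
    }
    glyph->cellInfo = NULL;
    return count;
}

/* Hypothetical disposal order: invalidate all cells first, then free. */
void DisposeGlyph(GlyphInfo *glyph)
{
    InvalidateAllCells(glyph);
    free(glyph);
}
```

With two screens and both text types, a single glyph can sit in four
cells. Clearing only the head would leave three of them pointing at
freed memory, which is the situation the cell-list walk tripped over.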

That still leaves the problem of the (lack of) synchronization
between glyph disposal and invalidation of the accelerated glyph caches.

We do have a mechanism that ensures that glyphs
currently in the queue to be rendered are not collected until
the queue is flushed, but in theory the strikes may be released
just as the accelerated glyph cache is being invalidated. This
can happen during a display mode change, when Vista's UAC dialog
is shown, when a full-screen screen saver is activated, etc.

The fix for this will require that strike disposal
happen on the rendering thread (if one of the accelerated
pipelines is present). Simple RQ locking may not be
enough, because the accelerated cache invalidation may happen
on the rendering thread (the toolkit thread in the case of the
D3D pipeline) without taking the RQ lock.
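A minimal sketch of this approach follows. It assumes a simple
pending-disposal list that the Disposer thread appends to and the
rendering thread drains at points of its own choosing.
EnqueueDisposal and ProcessPendingDisposals are hypothetical names.
The actual change may instead route the request through the existing
render queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A disposal request handed from the Disposer thread to the rendering thread. */
typedef struct DisposeTask {
    void (*dispose)(void *ptr);
    void *ptr;
    struct DisposeTask *next;
} DisposeTask;

/* Hypothetical pending-disposal list. The mutex protects only the list
 * itself; the actual frees run on the rendering thread. */
static pthread_mutex_t queueLock = PTHREAD_MUTEX_INITIALIZER;
static DisposeTask *queueHead = NULL;
static DisposeTask *queueTail = NULL;

/* Called on the Disposer thread: it never frees strike memory itself.
 * Returns 0 on success, -1 if the request could not be allocated. */
int EnqueueDisposal(void (*dispose)(void *), void *ptr)
{
    assert(dispose != NULL);
    DisposeTask *task = malloc(sizeof *task);
    if (task == NULL) return -1;
    task->dispose = dispose;
    task->ptr = ptr;
    task->next = NULL;
    pthread_mutex_lock(&queueLock);
    if (queueTail != NULL) queueTail->next = task; else queueHead = task;
    queueTail = task;
    pthread_mutex_unlock(&queueLock);
    return 0;
}

/* Called only on the rendering thread, e.g. between render operations.
 * Cache invalidation runs on this same thread, so a glyph can never be
 * freed in the middle of an invalidation pass. Returns the tasks run. */
int ProcessPendingDisposals(void)
{
    pthread_mutex_lock(&queueLock);
    DisposeTask *task = queueHead;       /* take the whole batch at once */
    queueHead = queueTail = NULL;
    pthread_mutex_unlock(&queueLock);

    int count = 0;
    while (task != NULL) {
        DisposeTask *next = task->next;
        task->dispose(task->ptr);        /* free runs on the rendering thread */
        free(task);
        task = next;
        count++;
    }
    return count;
}
```

The design choice is that the frees and the invalidation are serialized
by running on the same thread. The invalidation code therefore needs no
lock at all. The batch is detached under the mutex and executed outside
it, so the Disposer thread is never blocked behind a long disposal run.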

Freeing the strikes on the RT shouldn't be a problem performance-wise,
since it is a relatively rare operation.
(Surface disposal is a more frequent operation, and it doesn't seem to be
an issue.)

