JDK-6245013 : When switching users on Windows BufferStrategy.contentsLost continually returns true
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 5.0,6
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_2000,windows_xp
  • CPU: x86
  • Submitted: 2005-03-23
  • Updated: 2008-02-06
  • Resolved: 2005-04-13
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 6
6 b32Fixed
Related Reports
Duplicate :  
Relates :  
Description
Here's the test case.
Try running it, switching users and logging back in.  In some situations, not all, the code will keep printing out contents lost.  To avoid getting stuck in an infinite lop the code exits after the 10th paint.

Both Chet and myself have been able to reproduce this.

import java.awt.*;
import java.awt.event.*;
import java.awt.image.*;
import javax.swing.*;

public class BSTest extends Frame {
    private int GRID_SIZE = 30;

    public static void main(String[] args) {
        BSTest f = new BSTest();
        f.setBounds(0, 0, 200, 200);
        f.show();
    }

    private BufferStrategy bs;

    public BSTest() {
    }

    private int paintIndex = 0;

    public void paint(Graphics g) {
        System.err.println(paintIndex + " painting");
        if (paintIndex++ > 10) {
            System.exit(0);
        }
        if (bs == null) {
            bs = createBufferStrategy();
            Graphics g2 = bs.getDrawGraphics();
            drawGrid(g2);
            g2.dispose();
        }
        Graphics g2 = bs.getDrawGraphics();
        if (bs.contentsRestored()) {
            System.err.println("contents restored, forcing repaint");
            repaint();
        }
        else if (bs.contentsLost()) {
            System.err.println("contents lost, forcing repaint");
            repaint();
        }
        else {
            System.err.println("all good");
            drawGrid(g2);
            bs.show();
            if (bs.contentsLost()) {
                System.err.println("\tcontents lost2, forcing repaint");
                repaint();
            }
        }
        g2.dispose();
    }

    private void drawGrid(Graphics g) {
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, getWidth(), getHeight());
        Insets insets = getInsets();
        FontMetrics fm = g.getFontMetrics();
        int ascent = fm.getAscent();
        g.setColor(Color.BLACK);
        for (int x = 0; x < getWidth(); x += GRID_SIZE) {
            g.drawLine(x, 0, x, getHeight());
            g.drawString(Integer.toString(x), x, ascent +
                         (insets.top / GRID_SIZE) * GRID_SIZE);
        }
        int stringXOrigin = getInsets().left;
        for (int y = 0; y < getHeight(); y += GRID_SIZE) {
            g.drawLine(0, y, getWidth(), y);
            g.drawString(Integer.toString(y), stringXOrigin, y + ascent);
        }
    }

    private BufferStrategy createBufferStrategy() {
        BufferCapabilities caps = new BufferCapabilities(
		new ImageCapabilities(true),
		new ImageCapabilities(true),
                null);
        try {
            createBufferStrategy(2, caps);
        } catch (AWTException ae) {
            System.out.println("ae: " + ae);
        }
        return getBufferStrategy();
    }
}

###@###.### 2005-03-23 18:46:49 GMT

Comments
EVALUATION This is a problem with our validation/restoreSurface logic for VolatileImages. Basically, when we fail to restore a surface, we install a software image as sdCurrent in VolatileSM. But we still use the lostSurface boolean to tell the SM that the accelerated surface is lost. Since we're already using the sw surface, we shouldn't care whether the hw surface has been restored (other than to continue trying to restore it when appropriate). Here's the series of calls that screws us up: WinVolSM.validate(): lostSurface = false isAccelerationEnabled() true, so: sdAccel.isSurfaceLost() true, so: restoreAcceleratedSurface(), native call which calls DDRestoreSurface(), which may or may not succeed DDLock(), which gets DDERR_SURFACELOST error and calls Win32OSSD_RestoreSurface, which calls Win32OSSD.markSurfaceLost(), which calls WinVolatileSM.acceleratedSurfaceLost(), which checks isAccelerationEnabled() and sets lostSurface = true and then it throws an IPE sdCurrent = getBackupSurface() return IMAGE_OK then later the app calls contentsLost(), which calls return lostSurface then the app will spin around the validate/render/contentsLost() loop again (and again, and again, until either the app is exited or the accelerated surface gets back into a state where it can be restored). Somehow, we need to distinguish between the acceleratedSurface needing to be restored and the VolatileImage (which has more than simply the accelerated surface) being lost. I'm curious why we are failing restoration in the first place; there seems to be something odd with DirectX here. We succeed in our call to RestoreSurface(), but we run a further check to make sure we've really restored it, DDLock()/DDUnlock(). This is where we fail, when we get an error in DDLock(). I think this is beyond our control and we should just fix the lostSurface logic above, but it is a strange thing about DirectX (surprise, surprise). ###@###.### 2005-03-23 21:54:57 GMT I came up with an easy potential fix, but it brought forth a new related problem that has to be fixed. The potential fix to the root problem here is to make the logic of VolatileSurfaceManager.acceleratedSurfaceLost() more correct. Currently, a call to this method (which is caused by the failure of Win32OSSD_restoreSurface(), as detailed above) tells the VolatileSM that the accelerated surface is lost and must be restored. This is a fine thing to do in principle, but the result of acceleratedSurfaceLost() is the setting of the more general lostSurface boolean. lostSurface is used to detect whether the "VolatileImage" needs to be restored, not simply the accelerated version of the VolatileImage. In the situation where the accelerated surface has been lost and we have created a backup software surface (sdBackup), we should keep trying to restore the accelerated surface when possible ... but we should not flag the VolatileImage itself as lost and needing restoration; that sdBackup surface is constant and does not ever need restoration (after the initial creation time when the contents need to be drawn the first time). The possible solution here is to change the condition of acceleratedSurfaceLost() from this: if (isAccelerationEnabled()) { lostSurface = true; } to this: if (isAccelerationEnabled() && (sdCurrent == sdAccel)) { lostSurface = true; } This new version says "only bother to mark the VolatileImage as lost if the accelerated surface is the one currently in use by the VolatileImage". Note that we have already marked the actual _surface_ lost in the call to Win32OSSD.markSurfaceLost() (which then calls this acceleratedSurfaceLost() method). So the only reason to call acceleratedSurfaceLost() is to tell the surface manager that its accelerated representation needs to be restored. In this situation, that does not warrant setting the lostSurface flag; we use that flag to indicate that the VolatileImage itself is lost, and we use the call to sdAccel.isSurfaceLost() in validate() to check whether the accelerated surface specifically needs to be restored. So I believe this fix to be the one we want here; this should trigger a restoration for the VolatileImage in cases where it matters (when we are using the accelerated surface as the current representation of the VImage), but it should avoid needless restoration in cases where we have already punted to the backup surface. ... and now we get to a further bug, that is masked by this bug and by Swing's workaround to such issues (Swing tends to punt liberally to a BufferedImage back buffer when it gets too many contentsLost() failures in a row): - We are not reporting a failure when a Blt to the screen happens to the primary surface which needs restoring. When I plugged in the fix above and re-ran the test, tracing output showed that the offscreen surface was being correctly (and no longer infinitely) restored. But the window was blank after returning from hibernation or switching users. A simple expose event on the window forced a repaint, and things returned to normal. What was happening under the hood was that the offscreen surface was getting restored properly, contentsLost() was returning the correct thing (causing a re-rendering when the accelerated surface got restored), but in copying the back buffer contents to the screen, we were getting a ddraw error from the Blt operation. Specifically, we were getting a DDERR_SURFACELOST error (the same one that caused the offscreen surface to be restored earlier). This causes us to restore the onscreen surface ... but it does not cause us to redo the Blt operation. So what's happening now is that the Blt operation occurs, we get an error, we restore the onscreen (primary) surface (which succeeds), and we return from the Blt call, but the caller of the Blt operation (Swing and BufferStrategy) have no idea that anything bad happened, so the bs.show() call gets noop'd and the window is blank (until another paint event causes things to happen correctly). This is specifically a problem with operations to the onscreen surface. Whenever there is a ddraw operation to the offscreen surface, an error of SURFACELOST will cause us to call Win32OSSD_RestoreSurface(), which throws an InvalidPipeException. This trickles up to the caller of the rendering operation, which revalidates the pipelines and calls the operation again. But in the case of an onscreen operation like bs.show(), we may call Win32SD_RestoreSurface() which simply restores the onscreen surface without throwing an exception, thus there is no indication at the Java level that anything bad happened. A simple and obvious fix here is to apply the same logic to Win32SD_RestoreSurface() as we currently have in Win32OSSD_RestoreSurface(); go ahead and try to restore the surface, but throw an exception while we're at it. This is worth trying, but we need to make sure that this new exception will not cause problems in the current code (such as if we may now throw an exception in a place that causes problems with our JNI interactions). ###@###.### 2005-03-25 15:25:12 GMT It appears that we used to have this exact logic in Win32SD_RestoreSurface(); it was removed in fixing bug 4965928. The SCCS comments for that change are as follows: ^Ac 4965928: do not throw an exception during restoration failure. ^Ac Failure at this point would be followed by repainting events in any case, ^Ac and re-rendering one part of a frame based on this exception would be ^Ac worthless It seems that the assumption that there would be a repaint was incorrect. Maybe this repainting used to occur because of the incorrect logic of the offscreen restore... ###@###.### 2005-03-25 15:41:48 GMT Re-inserting this exception worked ... and didn't. First of all, the logic of this code is different from the original; originally, we threw an IPE if the restore on the primary surface failed. But in this case, we want the opposite; we want to throw an exception in the case where a restore _succeeded_; this is to indicate to the caller that they need to redo their rendering operation because the surface is now in a different state than it was previously. So I added some code to throw the IPE if DDRestored() succeeded in Win32SD_RestoreSurface(), and this had the correct effect of causing the window to get repainted, and the application appears to work now. However, this succeeded through unexpected means. The IPE is caught in DrawImage.renderImageCopy(), where we force the source surface (in this case, the already-restored offscreen surface) to restore itself and then re-do the operation. This offscreen replacement causes (you guessed it) marking the offscreen lostSurface flag to true, which causes a contentsLost() at the BufferStrategy level to succeed, thus Swing forces a repaint. The end result is correct; Swing goes through another repaint cycle to restore the contents. But the intermediate results are completely wrong; we should not replace the offscreen surface; we only want to force a repaint of the onscreen surface. In addition, throwing this new exception could have unexpected results, such as causing other native code to be incorrect because it assumes that we have not thrown an exception. (For example, throwing an exception while we are in the middle of one already can cause problems). I'm wondering if there is a better way to work around this issue. Namely, we can force a repaint at the native level that will give the results we want. We have restored the onscreen surface and the offscreen surface and everything is ready to go; we just need to make sure that we get a further request to repaint to make the contents correct (in line with the incorrect sccs comment pasted above...) ###@###.### 2005-03-25 16:42:21 GMT This approach appears to work; calling: InvalidateRect(wsdo->window, NULL, FALSE); causes Windows to invalidate the entire client region of the window, which causes a later WM_PAINT message to propagate, which causes a Java-level repaint() event (at least without the buffer-per-window fix, which should just cause the back buffer to be copied to the screen, which should do just as well). I think the logic here is sound: we have just restored the onscreen surface, and we need to ensure that it gets repainted in case any rendering to it got noop'd prior to restoration. Invalidating the region will force a repaint, but if a repaint was already on the way it will not force a second refresh (it simply adds the window client area to the current area waiting to be refreshed). So the current fix proposal is twofold: - fix the logic in VolatileSM.acceleratedSurfaceLost() to only set the lostSurface flag if the accelerated surface is the current surface for that surface manager. - upon successful restoration of the primary surface (in Win32SD_RestoreSurface), invalidate the window in question to force a later repaint. ###@###.### 2005-03-25 18:47:29 GMT
23-03-2005