JDK-4527745 : drawLine performance seriously degraded in JDK1.4 vs 1.3
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 1.4.0
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • OS: windows_2000
  • CPU: x86
  • Submitted: 2001-11-15
  • Updated: 2002-05-13
  • Resolved: 2002-05-13
Fixed in: 1.4.1 beta
Description

Name: ddT132432			Date: 11/15/2001


java version "1.4.0-beta3"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta3-b84)
Java HotSpot(TM) Client VM (build 1.4.0-beta3-b84, mixed mode)

I am drawing approximately 1000 line segments (using Graphics.drawLine) in a
JPanel. Running under JDK 1.3.1, the draw operation completes in approximately
10 ms. Running under JDK 1.4beta3, the same operation requires approximately 60
ms. I've noticed that vertical/horizontal lines are equally quick to draw in
1.3 and 1.4. The issue is only with non-vertical/horizontal lines.

I have a Celeron 366 with an ATI Rage Pro Turbo. Turning hardware acceleration
completely off in the Windows Display Settings dialog restores the JDK 1.4
performance to equal that of JDK 1.3.

Test case is attached to this bug

Webbugs Test
--------------------
Using 1.3.1 b24
C:\01>c:\jdk1.3.1\bin\java DrawLineTest
plotData took 90 ms
plotData took 90 ms
plotData took 120 ms
plotData took 100 ms
plotData took 20 ms
plotData took 10 ms
plotData took 20 ms
plotData took 10 ms
plotData took 10 ms
plotData took 0 ms
plotData took 10 ms
plotData took 0 ms

Using 1.4beta3 b85
C:\01>c:\jdk1.4\bin\java DrawLineTest
plotData took 60 ms
plotData took 291 ms
plotData took 301 ms
plotData took 290 ms
plotData took 280 ms
plotData took 301 ms
plotData took 290 ms
plotData took 290 ms
plotData took 321 ms
plotData took 360 ms
plotData took 300 ms

Using latest build JDK1.4 b86
plotData took 331 ms
plotData took 280 ms
plotData took 290 ms
plotData took 290 ms
plotData took 311 ms
plotData took 291 ms
plotData took 291 ms
plotData took 291 ms
plotData took 320 ms
plotData took 331 ms
plotData took 291 ms
plotData took 291 ms
(Review ID: 135678) 
======================================================================

Note that you can also reproduce the results above by running the test
in jdk1.4 with the -Dsun.java2d.ddoffscreen=false flag  on win32 (forcing the
VolatileImage to live in system memory instead of VRAM).

###@###.### 2001-11-20

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: hopper-beta
FIXED IN: hopper-beta
INTEGRATED IN: hopper-beta
14-06-2004

WORK AROUND See flags documented in Evaluation for possible workarounds in jdk1.4.
11-06-2004
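
As an illustration of the programmatic workaround mentioned in the Evaluation
(making Swing single-buffered so that rendering goes directly to the screen),
here is a minimal sketch. The class name SingleBufferSketch and the helper
renderDirectly are hypothetical names chosen for this example. In current JDKs,
setDoubleBufferingEnabled is an instance method reached through
RepaintManager.currentManager(...). The command line in the comment reuses a
flag quoted in this report; those flags are unsupported and may not exist in
later releases.

```java
import javax.swing.JComponent;
import javax.swing.JPanel;
import javax.swing.RepaintManager;

public class SingleBufferSketch {
    // Command-line alternative quoted in this report (no code change needed):
    //   java -Dsun.java2d.ddoffscreen=false DrawLineTest

    /** Turns off Swing double buffering for the RepaintManager that serves c. */
    static void renderDirectly(JComponent c) {
        RepaintManager.currentManager(c).setDoubleBufferingEnabled(false);
    }

    public static void main(String[] args) {
        JPanel panel = new JPanel(); // stand-in for the application's drawing panel
        renderDirectly(panel);
        System.out.println("double buffering enabled: "
                + RepaintManager.currentManager(panel).isDoubleBufferingEnabled());
    }
}
```

Single buffering avoids the back buffer entirely, so it may reintroduce visible
flicker; whether that is acceptable depends on the application.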

EVALUATION

Diagonal lines are indeed slower in some situations in 1.4 vs. 1.3. In
particular, diagonal lines drawn into a VolatileImage are slower on win32.

The DrawLineTest example uses a JPanel as the drawing destination. This is a
Swing component which is automatically double-buffered, so rendering to the
component actually goes to the back buffer and is then copied onto the screen.
So the test is actually measuring the time taken to draw diagonal lines to a
VolatileImage (1.4) vs. the type of back buffer used in 1.3 (a BufferedImage).

The reason for the slowdown is that we do not hardware-accelerate diagonal line
drawing in this release. So for each line, we end up getting a lock to VRAM,
drawing the line, and unlocking VRAM (actually, we use GDI via DirectDraw, but
that does essentially the same thing). Locking VRAM is somewhat expensive, but
so is drawing into VRAM directly (as opposed to issuing hardware-accelerated
commands, such as those available through DirectDraw). It is much slower to
send pixels over the system bus to VRAM than to write them to system memory
(which may even be cached local to the CPU). So writes to VRAM will necessarily
be more expensive than writes to system memory. Thus pixels drawn into a
VolatileImage (on win32) will take longer than pixels drawn into a
BufferedImage (which is always stored in system memory).

This is a known problem with using VolatileImage objects for the back buffer
and one that will not be fixed for this release. However, it was decided that,
in general, the tradeoff is worthwhile. If all we did was draw diagonal lines
(or other primitives which cannot be hardware-accelerated), then a VRAM back
buffer would obviously be a horrible thing in general. But there are many other
things drawn to and from the buffer which make the tradeoff worthwhile. For one
particularly important example, copying the back buffer to the screen _is_
hardware accelerated. Other operations, such as rectangular fills to the back
buffer, are also accelerated. In general, normal Swing GUI operations tend to
be hardware accelerated (barring some examples such as diagonal lines and
text), so the net result is better performance for Swing overall.

To illustrate the importance of the tradeoff, I wrote a new example called
LineBufferPerformanceTest and attached it to this bug. This test times diagonal
line AND buffer copying performance together and illustrates the tradeoff in
performance. It runs the test on both a VolatileImage buffer and a
BufferedImage buffer (equivalent to the 1.4 vs. 1.3 test). For each test, a
certain number of 100-pixel lines is drawn into the buffer and the buffer is
copied onto the window. For some tests, this is done many times for one timing
so that we can get a more accurate measurement (for example, the 1 line/frame
test is run 1000 times). The total time for each test is taken and printed out
so that the user can compare the performance of a VolatileImage situation to a
BufferedImage situation.

On most framebuffers of the last 3 years or so (I ran my test on a TnT2), you
should find that for the tests with fewer lines/frame, there is a net win for
the VolatileImage approach. This is because the win from the buffer copy
greatly outweighs the loss due to slower line performance. As the number of
lines/frame increases, this win is less obvious. When you get to the number of
lines used in the DrawLineTest submitted with this bug (5000), the test favors
the BufferedImage approach.
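
The line-drawing half of such a measurement can be sketched as follows. This is
not the attached LineBufferPerformanceTest: the class name LineTimingSketch,
the method timeLines, the random endpoints, and the image size are assumptions
made for illustration, and the buffer-to-window copy is left out so the code
also runs without a display. It uses System.nanoTime(), which is only available
in JDKs newer than the ones discussed in this report.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.awt.image.VolatileImage;
import java.util.Random;

public class LineTimingSketch {
    /** Draws linesPerFrame random lines per frame into img; returns elapsed nanoseconds. */
    static long timeLines(Image img, int frames, int linesPerFrame, long seed) {
        Random rnd = new Random(seed);
        int w = img.getWidth(null);
        int h = img.getHeight(null);
        long start = System.nanoTime();
        for (int f = 0; f < frames; f++) {
            Graphics g = img.getGraphics();
            g.setColor(Color.WHITE);
            for (int i = 0; i < linesPerFrame; i++) {
                // Endpoints stay inside the image, so no line needs clipping.
                g.drawLine(rnd.nextInt(w), rnd.nextInt(h), rnd.nextInt(w), rnd.nextInt(h));
            }
            g.dispose();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int w = 500, h = 500;
        Image sys = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        System.out.println("BufferedImage: "
                + timeLines(sys, 10, 100, 42L) / 1_000_000 + " ms");
        if (!GraphicsEnvironment.isHeadless()) {
            GraphicsConfiguration gc = GraphicsEnvironment.getLocalGraphicsEnvironment()
                    .getDefaultScreenDevice().getDefaultConfiguration();
            VolatileImage vram = gc.createCompatibleVolatileImage(w, h);
            vram.validate(gc); // make sure the surface exists before drawing
            System.out.println("VolatileImage: "
                    + timeLines(vram, 10, 100, 42L) / 1_000_000 + " ms");
        }
    }
}
```

Absolute numbers from a sketch like this depend heavily on the JDK, the
rendering pipeline in use, and the video hardware, so they should only be
compared within a single machine and configuration.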
Here are the LineBufferPerformanceTest results I got for my system (dual
PIII-850, TnT2, 32 bpp):

Frames  Lines/Frame  VImage buffer  Image buffer
------  -----------  -------------  ------------
  1000            1       11312 ms      18578 ms
   100           10        1156 ms       1859 ms
    10          100         141 ms        187 ms
     1         1000          47 ms         31 ms
     1         5000         141 ms         94 ms
     1        10000         281 ms        141 ms

You can see from these results that when the number of lines reaches 1000 per
frame, the loss of performance due to lines outweighs the gain due to the
buffer copy and the VolatileImage buffer is slower. These results should be
similar on most current framebuffers and most older ones (I did notice somewhat
poor performance on my ATI Rage Pro Turbo card, however). Note, too, that the
speed of the CPU will affect the tests as well.

In general, the benefit of having a VolatileImage back buffer on win32 is
dependent on the application. It is expected that most applications (especially
those performing simple GUI rendering) will greatly benefit from the
acceleration provided by VolatileImage. Some applications may be doing so much
unaccelerated rendering into the back buffer each frame that the benefit may
not be as obvious (or perhaps the app will even perform somewhat worse).
Hopefully the test application rendering 5000 diagonal lines per frame is not a
typical usage scenario for the bug submitter.

A couple of points that might help users in the long run:

- We cannot accelerate diagonal lines and various other primitives in the 1.4
  release, but we are looking into doing so in future releases. There is a
  fair amount of work to be done for this (thus we cannot deliver it in 1.4),
  but the benefit should be obvious...

- If a user does find that performance is bad for their particular situation
  (such as an app that draws 5000 diagonal lines per frame), there are some
  runtime flags which may help. These flags can be used to force the Swing
  back buffer into system memory instead. Note that these flags are useful for
  now (jdk1.4), but they may be changed or go away in the future (hopefully
  when we do not have these performance concerns).

    -Dsun.java2d.noddraw=true
        This disables the use of DirectDraw entirely. This is a somewhat
        extreme workaround, since there are still benefits to DirectDraw
        acceleration even if the back buffer is not a DirectDraw surface. For
        example, copies to the screen still use DirectDraw, which is faster
        than the alternative. If you're just trying to disable the use of
        DirectDraw for storing images in VRAM, try one of the other
        suggestions below.

    -Dsun.java2d.ddoffscreen=false
        This disables the use of DirectDraw for all VolatileImage objects,
        thus making them equivalent to BufferedImage objects (at least as far
        as memory placement is concerned).

    -Dswing.volatileImageBufferEnabled=false
        This tells Swing to not use VolatileImage for its back buffer, thus
        avoiding the whole VRAM issue.

  Also, you can programmatically tell Swing to be single-buffered instead of
  double-buffered, so all rendering goes directly to the screen. This is done
  by calling setDoubleBufferingEnabled on the current RepaintManager:

    RepaintManager.currentManager(component).setDoubleBufferingEnabled(false);

Since the main problem that the bug was submitted under is not going to be
fixed, I would like to close the bug as "Will not fix". However, there are some
outstanding issues with slight degradations in BufferedImage performance
between 1.3 and 1.4, so I will keep the bug open for now.

###@###.### 2001-11-20

I think the real fix for this problem is to use an API that will allow us to
draw hw-accelerated lines to our hw-accelerated images. On Windows, this means
using Direct3D or OpenGL, the only two APIs generally available that have
hardware-accelerated line and offscreen buffer support. Given our current use
of DDraw, using Direct3D is probably the best way to go for now. I implemented
a prototype of this support and got favorable results.
In general, I found that:

- Vertical/horizontal lines were still faster using DDraw than D3D (presumably
  because a simple color-filled rectangle is way faster than the rasterization
  setup for even a simple D3D primitive). So we should probably keep these
  operations going through DDraw for now, and just handle non-DDraw-friendly
  primitives through D3D.

- Diagonal lines were way faster in D3D than in our current code (in which we
  punt from DDraw to GDI).

- Diagonal lines through D3D still do not match the speed of diagonal lines to
  BufferedImage surfaces, presumably because the overhead of D3D operations
  for such a simple primitive outweighs the acceleration we get. For a
  BufferedImage, we simply use a straightforward Bresenham algorithm to render
  the line into system memory which, for small primitives, is pretty fast. But
  we do, at least, come a lot closer to the old software-buffer performance
  than we do in jdk1.4 with only DDraw/GDI support.

- Outlined rectangles (Graphics.drawRect()) are much faster in D3D than in
  DDraw, presumably because we can now batch multiline primitives in D3D
  instead of issuing separate DDraw calls for each of the four sides.

- All of this performance data is sensitive to the OS, the video card and
  driver, and the size of the lines being tested.

Given these results, using D3D seems like a pretty good way to go. Although it
does not match the performance of lines on software back buffers (a la 1.3.1)
in every situation, it does get us considerably closer to that performance than
we were in jdk1.4. And given the other benefits of the hardware-accelerated
back buffer, this seems at least a lot better than where we're at now and
possibly good enough in general.

However, the other major thing I found in implementing this functionality was
rendering differences between different video cards. Specifically, I found that
diagonal lines on the Intel i815E graphics chip were rendered 2 pixels wide by
default. This is not something that can be specified or even queried through
the D3D API; it's just a bug in how this primitive is rendered on that specific
hardware. I believe the same bug also exists on earlier Intel chips (e.g., the
810 chip). Since we obviously cannot provide lines that render arbitrarily
based on your runtime platform, using D3D must involve checking for rendering
artifacts and enabling/disabling this approach based on runtime tests.

Given this extra work, we have not yet made D3D the default renderer for lines
in 1.4.1. However, we did putback code that allows you to force the use of D3D
for testing purposes. You can enable this for now by using the command-line
switch

    -Dsun.java2d.d3d=true

(Note that this is yet another unsupported command-line switch that may go away
or evolve with future releases.) Hopefully we will finish the error-checking
part of this work soon and be able to depend on D3D lines (or punting in
artifact-laden situations) as the default case.

###@###.### 2002-04-29

This fix is now complete. We now do a runtime check to ensure that the runtime
platform draws lines that are compatible with our software renderer.

Note that the original test case for this bug (DrawLineTest) does not show any
improvement (in fact, it may actually run slower with D3D lines on some
platforms). But this can be fixed in the test itself. The test creates a frame
of size 500x500 and then draws lines whose endpoints are randomly generated in
that same area. But a frame of 500x500 will have a drawing surface that is
actually only 492x473. So many of the lines are being drawn to/from points
outside the drawing area. Currently, the D3D renderer does not handle clipping
of lines and punts to a more general solution. This punting causes a
performance hit (due to thrashing between Direct3D and GDI, from what I have
been able to figure out).

A simple change to the test app would be to change the random generator code to
generate points only within the drawing area (using getWidth() and getHeight(),
for example). When this change is made, the use of D3D lines makes the app
significantly faster than either 1.4 or 1.3.1 (assuming your platform has a 3D
accelerator that draws Java2D-compatible lines, which should be the case for
most modern video cards).

A future fix should include the ability to submit lines that require clipping
without the performance penalty we are currently seeing. But for now, the base
case of unclipped diagonal lines is vastly improved over previous releases, so
I'm marking this bug fixed.

###@###.### 2002-05-10
10-05-2002
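
The change to the test suggested in the final evaluation note (generating
endpoints only within the drawing area) might look like the sketch below. The
DrawLineTest source is not reproduced in this report, so the class
ClippedEndpointsSketch, the helper randomLine, and the nested PlotPanel are
hypothetical stand-ins; the count of 1000 lines follows the bug description.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.util.Random;
import javax.swing.JPanel;

public class ClippedEndpointsSketch {
    /** Returns {x1, y1, x2, y2} with x in [0, width) and y in [0, height). */
    static int[] randomLine(Random rnd, int width, int height) {
        return new int[] {
            rnd.nextInt(width), rnd.nextInt(height),
            rnd.nextInt(width), rnd.nextInt(height)
        };
    }

    /** Hypothetical panel standing in for the one used by DrawLineTest. */
    static class PlotPanel extends JPanel {
        private final Random rnd = new Random();

        @Override
        protected void paintComponent(Graphics g) {
            super.paintComponent(g);
            // Use the panel's real size, not the frame size, so that
            // no endpoint falls outside the drawing surface.
            int w = getWidth();
            int h = getHeight();
            if (w <= 0 || h <= 0) {
                return; // panel has not been laid out yet
            }
            g.setColor(Color.BLACK);
            for (int i = 0; i < 1000; i++) {
                int[] p = randomLine(rnd, w, h);
                g.drawLine(p[0], p[1], p[2], p[3]);
            }
        }
    }
}
```

Reading the size inside paintComponent, rather than caching the frame's
500x500 dimensions, also keeps the endpoints valid if the window is resized.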