JDK-8233028 : Investigate slow performance of ICMColorDataTest on Windows x64
  • Type: Bug
  • Component: client-libs
  • Sub-Component: 2d
  • Affected Version: 7,8,11,13,14
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: windows
  • CPU: x86_64
  • Submitted: 2019-10-25
  • Updated: 2021-10-12
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
Other
tbdUnresolved
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Description
During code review for JDK-8232634, it was noted [1] ICMColorDataTest¹ completes 300_000 iterations on macOS in about 3 minutes whereas on 64-bit Windows it usually takes more than 30 minutes.

It makes sense to investigate where the most time is spent on Windows. [2][3]

¹ test/jdk/java/awt/image/BufferedImage/ICMColorDataTest/ICMColorDataTest.java
[1] http://mail.openjdk.java.net/pipermail/2d-dev/2019-October/010359.html
[2] http://mail.openjdk.java.net/pipermail/2d-dev/2019-October/010365.html
[3] http://mail.openjdk.java.net/pipermail/2d-dev/2019-October/010366.html
Comments
That would be OK with me. We still have Sergey's open issue to figure out why it is so darn slow and we can keep that around.
23-03-2020

[~prr] The proposed patch for JDK-8232634 http://mail.openjdk.java.net/pipermail/2d-dev/2019-October/010358.html reduces the number of iterations to 30,000. Limiting to 1,000 iterations is not enough even for 32 bit Windows: 20,000–25,000 iterations are required. The proposed patch reduces the time on Windows to about 5 minutes and changes the timeout to 10 minutes. I should've followed up to problem list the test after we decided to investigate the differences in performance more thoroughly. Since the test mostly targets 32 bit Windows which is now unsupported, shall we remove the test altogether?
23-03-2020

We even have a bug for that: http://mail.openjdk.java.net/pipermail/2d-dev/2019-October/010368.html https://bugs.openjdk.java.net/browse/JDK-8232634
20-03-2020

Seems like there's no quick solution. We should problem list this test since it is causing entire test tasks to timeout. And it is a lot more than 30 minutes - I am seeing 90 minutes on a GOOD day.
20-03-2020

I'll try: - Remove all finalizers, probably we overload finalizer thread. - Split the work between Disposer thread and some other thread. - Check how Cleaner will work, probably our DIsposer thread is too complicated(too advanced) for such use-cases?
22-01-2020

But everything above did not shed the light on why the Windows is so much slower.
21-01-2020

Sergey, Your understanding is correct. The test produces lots of garbage both in Java and in native memory. The purpose of the test is to exhaust the native memory. If native memory allocation fails, the test catches OutOfMemoryError or NullPointerException. Before the fix for JDK-8201433 where the test was added, this condition lead to a crash. As I explained earlier, the test is targeted to 32 bit Windows where this problem easily reproducible. If you call System.gc() from the test, the native memory will never be exhausted to a point that native memory allocation fails. Your investigation also shows that 2D primitives work fast. If it were not for the accumulated garbage, the test would be pretty fast. Having the above in mind, I see two options: * Apply the fix suggested in JDK-8232634 which reduces the execution time (on 64 bit Windows). * Remove the test altogether as 32 bit platforms are unsupported now. It is unlikely we can exhaust native memory on 64 bit platform to a point where native allocation fails.
20-01-2020

My observations: 1. The test generates a custom IndexColorModel and creates a BufferedImage on top of it. 2. The test draws the line to the BufferedImage. 3. The test repeats the steps above 300_000 times. ========================================== - The culprit is in the drawRect if this method is commented then the test completes in seconds - The test generates tons of garbage in the java heap, but the time spent on it is not comparable to drawRect - The test also generates tonnes of native garbage, which cleaned by the disposer thread. - The test also allocates tonnes of the small objects via malloc and clean them via free Initially, the test works fast but at some point, it slows down extremely and spent most of the time in this trace: -- DrawRect -- BufImg_Lock -- BufImg_SetupICM -- initCubemap -- recursiveLevel -<<<< This method call itself recursively 47 times My understanding of this issue: - There is no difference between the code executed on different platforms(at least related to the Client Libraries) - Every piece of test works fast if run separately The reason for slowness is: - In the BufImg_SetupICM we use GetPrimitiveArrayCritical which blocks the GC. - Inside GetPrimitiveArrayCritical we call "recursiveLevel" which is our slowest place in the test - Since all other places in the test are fast we spent most of the time GetPrimitiveArrayCritical section, effectively blocking any GC - The native data which should be deallocated via disposer stop working, the usage of the memory grows - At some point, we will get a situation when the GC/Finalizer/Disposer threads start to clean the 1G of memory allocated by the test, while the test itself still call GetPrimitiveArrayCritical at hight rate. As a result, the DrawRect starts to execute in more than 10 seconds. This is the reason why the test so much slower on the single-threaded system. To prove above things I have tried(my initial time is ~1h30m): - Add a System.gc per 1000 iterations, it helps - 5 minutes - Change the max memory of the test from 1G to 30m, it helps - 1m45sec - increase the memory to 5G, I have waited for the result for the whole day, and killed the test. Still trying to figure out is it possible to fix this in the product or not.
19-01-2020

I will take a look to this as a side project.
16-01-2020

So I think there is nothing to do here. H/W accelerated is fast on Windows and Mac. S/W is much slower.
16-01-2020

Running the test with -Dsun.java2d.trace=count gives the following output: 300000 calls to sun.java2d.loops.DrawRect::DrawRect(AnyColor, SrcNoEa, ByteBinary1Bit)
13-11-2019

I ran this test on different hardware. It looks the speed greatly depends on graphics card. A laptop with an integrated graphics card (i5 @ 2.50 GHz): noddraw d3d true false true false 1:46:08 1:11:57 1:11:36 1:26:29 A laptop with a discrete graphics card (i7 @ 2.20 GHz): noddraw d3d true false true false 0:04:51 0:05:33 0:05:21 0:05:17 These are average times over 5 runs of ICMColorDataTest. I also ran this test on an old desktop, average run time was 0:21:06. Running this test on single-core VM took more than 10 hours.
12-11-2019

Looks like all software loops, and should be shared code, but perhaps it behaves a little differently ? 10x can't just be down to compiler options. Is the VM and GC playing a big role here ?
28-10-2019