JDK-8240849 : Improve performance of IntTo4ByteSameConverter
Type:Enhancement
Component:javafx
Sub-Component:graphics
Affected Version:openjfx14
Priority:P4
Status:Open
Resolution:Unresolved
Submitted:2020-03-11
Updated:2020-05-12
IntTo4ByteSameConverter's methods do bulk manipulation of arrays directly. It may be possible to improve their performance by using Buffers (specifically, IntBuffer) instead.
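For illustration, here is a minimal sketch of the two approaches being compared. It assumes the conversion takes packed 32-bit ARGB pixels and writes them as four bytes in B, G, R, A order. The class and method names are hypothetical and this is not the actual IntTo4ByteSameConverter code. It makes no performance claim; it only shows that a little-endian IntBuffer view produces the same bytes as the explicit loop.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class, not the JavaFX implementation.
final class ArgbToBgraSketch {

    // Array-based variant: each ARGB int is written lowest byte first,
    // which yields the byte order B, G, R, A.
    static void convertWithArrays(int[] src, byte[] dst) {
        int di = 0;
        for (int pixel : src) {
            dst[di++] = (byte) pixel;          // B
            dst[di++] = (byte) (pixel >> 8);   // G
            dst[di++] = (byte) (pixel >> 16);  // R
            dst[di++] = (byte) (pixel >> 24);  // A
        }
    }

    // Buffer-based variant: view dst as a little-endian IntBuffer and
    // copy all pixels with a single bulk put.
    static void convertWithIntBuffer(int[] src, byte[] dst) {
        ByteBuffer.wrap(dst)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asIntBuffer()
                  .put(src);
    }

    public static void main(String[] args) {
        int[] argb = { 0xFF102030, 0x80ABCDEF };
        byte[] viaArrays = new byte[argb.length * 4];
        byte[] viaBuffer = new byte[argb.length * 4];
        convertWithArrays(argb, viaArrays);
        convertWithIntBuffer(argb, viaBuffer);
        System.out.println(Arrays.equals(viaArrays, viaBuffer));
    }
}
```

Which variant is faster would have to be measured, for example with a JMH benchmark over realistic image sizes.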
Comments
I'm not sure about the performance difference. I suggested checking it, but if you are familiar with it we can close this. I think that maybe one of the loops can be removed, because the height calculation can be determined.
I think the question about the native instructions should be asked on a technical development list, perhaps compiler-dev.
12-05-2020
For input buffers that are allocated on-heap, I think it is actually faster to manipulate the underlying array rather than use the get and put methods of the Buffer class.
The current implementation actually goes as far as figuring out whether the underlying array is accessible and in this case explicitly favors array manipulation over the buffer accessors; I'm guessing precisely for that reason.
However, looking at the exact nature of the byte manipulations this method does, swapping pixel format between ARGB and BGRA, it turns out there exists an x86 assembly instruction that does precisely that: BSWAP (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf#I5.1.337692)
Short of a SIMD equivalent (which I'm not aware of), that's no doubt the most efficient way to do it, but unfortunately I'm not aware of a Java intrinsic that exposes it explicitly.
Does anyone know if C2 can, now or in an upcoming JEP, intrinsify that byte-swapping loop automatically?
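To make the byte manipulation concrete, here is a minimal sketch that assumes each pixel is a packed 32-bit int and that the ARGB/BGRA swap is a full reversal of its four bytes. The class and method names are hypothetical. The standard library method Integer.reverseBytes performs the same reversal as the hand-written shifts. As far as I know HotSpot treats it as an intrinsic candidate, but whether a given JIT emits BSWAP for it, or vectorizes the loop, is not verified here.

```java
// Hypothetical demo class illustrating the four-byte reversal.
final class ByteSwapSketch {

    // Hand-written reversal: 0xAARRGGBB becomes 0xBBGGRRAA.
    static int swapManually(int p) {
        return (p << 24)
             | ((p & 0xFF00) << 8)
             | ((p >>> 8) & 0xFF00)
             | (p >>> 24);
    }

    // Same reversal over a whole pixel array, using the JDK method.
    static void swapAll(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = Integer.reverseBytes(pixels[i]);
        }
    }

    public static void main(String[] args) {
        int argb = 0x11223344;
        System.out.println(Integer.toHexString(swapManually(argb)));
        System.out.println(swapManually(argb) == Integer.reverseBytes(argb));
    }
}
```

Inspecting the generated code (for example with -XX:+PrintAssembly and the hsdis plugin) would show what the JIT actually emits on a given platform.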