Bug ID: JDK-8311774 Optimize MTLCommandBuffer creation and commit

Type: Sub-task
Component: javafx
Sub-Component: graphics
Affected Version: internal

Priority: P3
Status: Resolved
Resolution: Fixed
OS: os_x
CPU: generic

Submitted: 2023-07-10
Updated: 2024-07-06
Resolved: 2023-08-14

Other: internal (Fixed)

Currently we create a new MTLCommandBuffer each time Prism makes a draw call, which results in several MTLCommandBuffers being committed per frame.
However, we should have only one or two MTLCommandBuffers per frame, as per the best practices suggested by Apple here:
https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/CommandBuffers.html

We need to follow this guideline to improve performance.
Below are the performance observations from a basic trial of this change.
Changes:
1. Single CommandBuffer for each frame.
2. Remove [commandBuffer waitUntilCompleted]
3. Remove the for loop for copying RTT data (in Java_com_sun_prism_mtl_MTLRTTexture_nReadPixelsFromContextRTT): this displays a black screen
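
To illustrate change 1, here is a minimal Java sketch that only counts commits under the two strategies. It does not touch the real Metal or Prism code: FakeQueue, FakeBuffer and both method names are hypothetical stand-ins, and the draw-call count of 100 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class FrameBatchingSketch {
    /** Hypothetical stand-in for a command queue; it only counts commits. */
    static final class FakeQueue {
        int commits;
        FakeBuffer newBuffer() { return new FakeBuffer(this); }
    }

    /** Hypothetical stand-in for a command buffer; it only records encoded calls. */
    static final class FakeBuffer {
        private final FakeQueue queue;
        private final List<String> encoded = new ArrayList<>();
        FakeBuffer(FakeQueue queue) { this.queue = queue; }
        void encode(String drawCall) { encoded.add(drawCall); }
        void commit() { queue.commits++; }
    }

    /** Behaviour described in this report: a new buffer is created and committed per draw call. */
    static int commitsWithBufferPerDrawCall(int drawCalls) {
        FakeQueue queue = new FakeQueue();
        for (int i = 0; i < drawCalls; i++) {
            FakeBuffer buffer = queue.newBuffer();
            buffer.encode("draw " + i);
            buffer.commit();
        }
        return queue.commits;
    }

    /** Change 1: all draw calls of a frame are encoded into one buffer, committed once. */
    static int commitsWithBufferPerFrame(int drawCalls) {
        FakeQueue queue = new FakeQueue();
        FakeBuffer buffer = queue.newBuffer();
        for (int i = 0; i < drawCalls; i++) {
            buffer.encode("draw " + i);
        }
        buffer.commit();
        return queue.commits;
    }

    public static void main(String[] args) {
        int drawCalls = 100; // arbitrary number of draw calls in one frame
        System.out.println("commits, buffer per draw call: " + commitsWithBufferPerDrawCall(drawCalls));
        System.out.println("commits, buffer per frame:     " + commitsWithBufferPerFrame(drawCalls));
    }
}
```

In this model the number of commits per frame drops from the number of draw calls to one, which is the effect change 1 aims for.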

Test execution and observations:
40,000 Rectangles:
ES2: Rectangle (Objects Frames FPS), 40000, 148, 14.741
MTL: Rectangle (Objects Frames FPS), 40000, 174, 17.376 (with all 3 changes above)
MTL: Rectangle (Objects Frames FPS), 40000, 172, 17.102 (with only changes 1 and 2 above)

10,000 Rectangles:
ES2: Rectangle (Objects Frames FPS), 10000, 548, 54.793
MTL: Rectangle (Objects Frames FPS), 10000, 552, 55.114 (with all 3 changes above)
MTL: Rectangle (Objects Frames FPS), 10000, 454, 45.370 (with only changes 1 and 2 above)
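
To make the numbers above easier to compare, the following small Java sketch computes the relative FPS change of each MTL run against the ES2 run with the same object count. The FPS values are copied from the results above; the class and method names are only illustrative.

```java
final class FpsComparison {
    /** Relative change of a measured value against a baseline, in percent. */
    static double percentChange(double baseline, double measured) {
        return (measured - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // 40,000 rectangles: ES2 baseline is 14.741 FPS
        System.out.printf("40000, MTL changes 1-3 vs ES2: %+.1f%%%n", percentChange(14.741, 17.376));
        System.out.printf("40000, MTL changes 1-2 vs ES2: %+.1f%%%n", percentChange(14.741, 17.102));
        // 10,000 rectangles: ES2 baseline is 54.793 FPS
        System.out.printf("10000, MTL changes 1-3 vs ES2: %+.1f%%%n", percentChange(54.793, 55.114));
        System.out.printf("10000, MTL changes 1-2 vs ES2: %+.1f%%%n", percentChange(54.793, 45.370));
    }
}
```

With these figures, MTL is roughly 18% and 16% above ES2 for 40,000 rectangles. For 10,000 rectangles it is roughly level with ES2 (under 1% above) with all three changes, and about 17% below ES2 with only changes 1 and 2.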

More about the above three tasks:
1. Single CommandBuffer for each frame -> We might need more than one CommandBuffer per frame. We need to identify all the scenarios that would lead to committing a CommandBuffer.
2. Remove [commandBuffer waitUntilCompleted] -> We need to use the completion handler provided by MTLCommandBuffer. This will need some synchronisation logic among blitting from the RTT to the CAMetalLayer, handling completion handlers, and committing a new command buffer.
3. Remove the for loop for copying RTT data -> We will eventually use a BlitEncoder.
With the above changes done correctly, Metal should be close to ES2 for large data.
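
The synchronisation in task 2 could look roughly like the following Java simulation. It is only a sketch under stated assumptions, not the Metal API and not the actual fix: a single-threaded executor plays the role of the GPU, a semaphore limits the number of command buffers in flight (the limit of 2 is an assumption), and releasing the semaphore at the end of the submitted task plays the role of the completion handler. All names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CompletionHandlerSketch {
    // Assumed limit on committed-but-unfinished frames; not taken from the report.
    static final int MAX_FRAMES_IN_FLIGHT = 2;

    /** Submits frameCount simulated frames and returns how many completed. */
    static int renderFrames(int frameCount) throws InterruptedException {
        ExecutorService fakeGpu = Executors.newSingleThreadExecutor(); // stand-in for the GPU
        Semaphore inFlight = new Semaphore(MAX_FRAMES_IN_FLIGHT);
        AtomicInteger completed = new AtomicInteger();

        for (int frame = 0; frame < frameCount; frame++) {
            // The render thread blocks here only when MAX_FRAMES_IN_FLIGHT frames
            // are still pending, instead of waiting after every single commit.
            inFlight.acquire();
            fakeGpu.submit(() -> {
                try {
                    completed.incrementAndGet(); // simulated GPU work for one frame
                } finally {
                    inFlight.release(); // role of the completion handler
                }
            });
        }

        fakeGpu.shutdown();
        if (!fakeGpu.awaitTermination(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("simulated GPU did not finish in time");
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("frames completed: " + renderFrames(10));
    }
}
```

The point of the design is that the thread submitting frames never waits for the frame it has just committed; it only waits when too many earlier frames are still outstanding.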

After that we can re-evaluate FPS for smaller data (which should definitely be higher than it is currently).

Changeset: 83c8385b
Author: Ambarish Rapte <arapte@openjdk.org>
Date: 2023-08-14 12:25:12 +0000
URL: https://git.openjdk.org/jfx-sandbox/commit/83c8385ba5db7119924a065b2f0309da9e602f04

14-08-2023