Currently we create a new MTLCommandBuffer each time Prism makes a draw call, which results in several MTLCommandBuffers being committed per frame.
Instead, we should have one or two MTLCommandBuffers per frame, as per the best practices suggested by Apple here:
https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/CommandBuffers.html
We need to follow this guideline to improve performance.
Below are the performance observations from a basic trial of this change.
Changes:
1. Single CommandBuffer for each frame.
2. Remove [commandBuffer waitUntilCompleted]
3. Remove the for loop that copies RTT data (in Java_com_sun_prism_mtl_MTLRTTexture_nReadPixelsFromContextRTT). Note: with this change, a black screen is displayed.
Test execution and observations:
40,000 Rectangles:
ES2: Rectangle (Objects Frames FPS), 40000, 148, 14.741
MTL: Rectangle (Objects Frames FPS), 40000, 174, 17.376 (with all 3 changes above)
MTL: Rectangle (Objects Frames FPS), 40000, 172, 17.102 (with only changes 1 and 2 above)
10,000 Rectangles:
ES2: Rectangle (Objects Frames FPS), 10000, 548, 54.793
MTL: Rectangle (Objects Frames FPS), 10000, 552, 55.114 (with all 3 changes above)
MTL: Rectangle (Objects Frames FPS), 10000, 454, 45.370 (with only changes 1 and 2 above)
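For reference, the relative differences implied by the figures above can be recomputed with a small helper. This is only arithmetic on the reported numbers; the class and method names are made up for illustration and are not part of Prism. Dividing frames by FPS also suggests that each run lasted roughly 10 seconds.

```java
import java.util.Locale;

/** Recomputes relative FPS differences from the figures reported above (illustrative helper only). */
public class FpsComparison {

    /** Percentage change of candidateFps relative to baselineFps. */
    static double relativeChangePercent(double baselineFps, double candidateFps) {
        return (candidateFps / baselineFps - 1.0) * 100.0;
    }

    /** Approximate run duration implied by a frame count and an average FPS. */
    static double impliedDurationSeconds(int frames, double fps) {
        return frames / fps;
    }

    public static void main(String[] args) {
        // { objects, ES2 FPS, MTL FPS with all 3 changes, MTL FPS with changes 1 and 2 only }
        double[][] runs = {
            { 40000, 14.741, 17.376, 17.102 },
            { 10000, 54.793, 55.114, 45.370 },
        };
        for (double[] run : runs) {
            System.out.println(String.format(Locale.ROOT,
                "%.0f rectangles: MTL (all 3) %+.1f%%, MTL (1 and 2 only) %+.1f%% vs ES2",
                run[0],
                relativeChangePercent(run[1], run[2]),
                relativeChangePercent(run[1], run[3])));
        }
    }
}
```

With these inputs, MTL is ahead of ES2 at 40,000 rectangles in both configurations, while at 10,000 rectangles only the configuration with all 3 changes matches ES2.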
More about the above three tasks:
1. Single CommandBuffer for each frame -> We might need more than one CommandBuffer per frame. We need to identify all the scenarios that would lead to committing a CommandBuffer.
2. Remove [commandBuffer waitUntilCompleted] -> We need to use the completion handler provided by MTLCommandBuffer. This will need some synchronisation logic among the blit from RTT to CAMetalLayer, the handling of completion handlers, and the committing of a new command buffer.
3. Remove for loop for copying RTT data -> We will eventually use a BlitEncoder.
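To make tasks 1 and 2 more concrete, here is a rough Java model of the intended flow: all draws of a frame go into one buffer, the buffer is committed once, and a completion callback (standing in for MTLCommandBuffer's addCompletedHandler:) replaces the blocking wait. This is only a sketch under assumptions. None of the names below are Metal or Prism API, the single-threaded executor merely simulates the GPU queue, and the limit of 3 frames in flight is an assumed value, not something measured here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of per-frame command submission; hypothetical names, not Metal or Prism API. */
public class FrameSubmissionSketch {

    /** Hypothetical stand-in for a command buffer: it only counts encoded draws. */
    static final class ModelCommandBuffer {
        int encodedDraws;
        void encodeDraw() { encodedDraws++; }
    }

    static final int MAX_FRAMES_IN_FLIGHT = 3; // assumed limit, not from the issue

    private final Semaphore inFlight = new Semaphore(MAX_FRAMES_IN_FLIGHT);
    private final ExecutorService gpu = Executors.newSingleThreadExecutor(); // simulated GPU queue
    final AtomicInteger commits = new AtomicInteger();
    final AtomicInteger completedFrames = new AtomicInteger();
    final AtomicInteger completedDraws = new AtomicInteger();

    /** Encodes all draws of one frame into a single buffer and commits it once. */
    void renderFrame(int drawCalls) {
        // Blocks only when MAX_FRAMES_IN_FLIGHT frames are still pending,
        // instead of waiting for every single commit to finish.
        inFlight.acquireUninterruptibly();
        ModelCommandBuffer buffer = new ModelCommandBuffer();
        for (int i = 0; i < drawCalls; i++) {
            buffer.encodeDraw();
        }
        commit(buffer, () -> {
            completedFrames.incrementAndGet();
            inFlight.release(); // the completion callback frees a slot for a later frame
        });
    }

    /** Hands the buffer to the simulated GPU and returns immediately. */
    private void commit(ModelCommandBuffer buffer, Runnable completionHandler) {
        commits.incrementAndGet();
        gpu.submit(() -> {
            completedDraws.addAndGet(buffer.encodedDraws); // "execute" the encoded work
            completionHandler.run();
        });
    }

    /** Lets already submitted frames finish, then stops the simulated GPU. */
    void shutdown() {
        gpu.shutdown();
        try {
            gpu.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        FrameSubmissionSketch sketch = new FrameSubmissionSketch();
        for (int frame = 0; frame < 5; frame++) {
            sketch.renderFrame(1000);
        }
        sketch.shutdown();
        System.out.println("commits=" + sketch.commits.get()
                + " completedFrames=" + sketch.completedFrames.get());
    }
}
```

The semaphore is one possible form of the synchronisation mentioned in task 2: the render thread never waits on an individual commit, but it also cannot run arbitrarily far ahead of the GPU. Scenarios that force an extra commit within a frame would need an additional commit path that this model does not cover.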
With the above changes done correctly, Metal should be close to ES2 for large data.
After that, we can re-evaluate FPS for smaller data (which should definitely be higher than it is currently).