While implementing a solution for RT-30107 (use new 1-step clip support in Canvas code), a bottleneck was discovered in the way we manage the clip (and temp) buffers: we validate and lock them, and then unlock them, on every operation that uses them. For the Bitmap test with a large number of monsters, we render hundreds or thousands of images per frame, and each one requires us to lock a clip texture if clipping is turned on (and, after the fix for RT-37300, if the clip is non-rectangular).
As a result, the method of performing the clip, as discussed in RT-30107, has little effect on performance. However, a simple change to lock the clip only the first time it is used while rendering a buffer, and to unlock it only as the method exits, increased performance by 6-8x.
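
To illustrate the change described above, here is a minimal sketch of the two locking strategies. It is not the actual Canvas rendering code: ClipBuffer, drawImage, renderLockingPerOp and renderLockingOnce are hypothetical names, and the "texture" only counts lock/unlock calls so the difference is visible.

```java
// Sketch only: all names here are hypothetical stand-ins, not the real Canvas/Prism classes.
final class LazyClipLockSketch {

    /** Hypothetical stand-in for a clip texture; it only counts lock/unlock calls. */
    static final class ClipBuffer {
        int lockCount;
        int unlockCount;
        private boolean locked;

        void lock()   { locked = true;  lockCount++; }
        void unlock() { locked = false; unlockCount++; }
        boolean isLocked() { return locked; }
    }

    /** Old pattern: lock and unlock the clip around every operation. */
    static ClipBuffer renderLockingPerOp(int imageCount) {
        ClipBuffer clip = new ClipBuffer();
        for (int i = 0; i < imageCount; i++) {
            clip.lock();
            try {
                drawImage(i, clip);
            } finally {
                clip.unlock();
            }
        }
        return clip;
    }

    /** New pattern: lock on first use, unlock once as the method exits. */
    static ClipBuffer renderLockingOnce(int imageCount) {
        ClipBuffer clip = new ClipBuffer();
        boolean clipLocked = false;
        try {
            for (int i = 0; i < imageCount; i++) {
                if (!clipLocked) {
                    clip.lock();          // only the first operation pays for the lock
                    clipLocked = true;
                }
                drawImage(i, clip);
            }
        } finally {
            if (clipLocked) {
                clip.unlock();            // released exactly once, even on an exception
            }
        }
        return clip;
    }

    /** Placeholder for a clipped image draw; it just checks that the clip is locked. */
    private static void drawImage(int index, ClipBuffer clip) {
        if (!clip.isLocked()) {
            throw new IllegalStateException("clip not locked for image " + index);
        }
    }

    public static void main(String[] args) {
        ClipBuffer perOp = renderLockingPerOp(1000);
        ClipBuffer once = renderLockingOnce(1000);
        System.out.println("per-op locks: " + perOp.lockCount);
        System.out.println("lazy locks:   " + once.lockCount);
    }
}
```

With this structure the number of lock/unlock pairs no longer grows with the number of images rendered into the buffer, and a buffer that never uses the clip never locks it at all.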