Rendering performance on Windows has improved greatly thanks to the new GPU-accelerated Direct3D (D3D) pipeline in 6u10. However, many machines (perhaps 50% or more of the Windows market) still cannot use the D3D pipeline because of inadequate drivers and/or hardware (e.g. Intel integrated graphics chipsets). On machines without D3D support we use the software-based pipelines by default, so it is essential that we improve the performance of the software code paths to bring them closer to that of the GPU-accelerated paths. Adding MMX and/or SSE optimized versions of our software loops, similar to what was done with our VIS loops on SPARC platforms, would go a long way toward improving rendering performance for many important cases on x86/x64 hardware.
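To illustrate the kind of loop this request has in mind, here is a minimal C sketch. It is not taken from the JDK sources, and the function names `srcover_scalar` and `srcover_sse2` are hypothetical. It shows a SrcOver blend over premultiplied 0xAARRGGBB pixels, first as a scalar reference loop and then with SSE2 intrinsics processing four pixels per iteration. The sketch assumes little-endian x86/x64 and uses the same exact divide-by-255 rounding in both versions so that they produce identical results. On non-SSE2 targets it falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exact round(x * a / 255) for x, a in [0, 255]. */
static inline uint32_t mul_div255(uint32_t x, uint32_t a) {
    uint32_t t = x * a + 128u;
    return (t + (t >> 8)) >> 8;
}

/* Hypothetical scalar reference: SrcOver for premultiplied 0xAARRGGBB pixels,
 * dst = src + dst * (255 - srcA) / 255 per channel, saturated at 255. */
void srcover_scalar(uint32_t *dst, const uint32_t *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i], d = dst[i];
        uint32_t ia = 255u - (s >> 24);
        uint32_t out = 0;
        for (unsigned sh = 0; sh < 32; sh += 8) {
            uint32_t c = ((s >> sh) & 0xFFu) + mul_div255((d >> sh) & 0xFFu, ia);
            out |= (c > 255u ? 255u : c) << sh;
        }
        dst[i] = out;
    }
}

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>

/* Hypothetical SSE2 version: 4 pixels per iteration, channels widened to 16-bit lanes. */
void srcover_sse2(uint32_t *dst, const uint32_t *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    const __m128i zero = _mm_setzero_si128();
    const __m128i c255 = _mm_set1_epi32(255);
    const __m128i c128 = _mm_set1_epi16(128);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i d = _mm_loadu_si128((const __m128i *)(dst + i));
        /* 255 - alpha in each 32-bit lane, then copied into both 16-bit halves. */
        __m128i ia = _mm_sub_epi32(c255, _mm_srli_epi32(s, 24));
        ia = _mm_or_si128(ia, _mm_slli_epi32(ia, 16));
        __m128i ia_lo = _mm_unpacklo_epi32(ia, ia); /* pixels 0 and 1 */
        __m128i ia_hi = _mm_unpackhi_epi32(ia, ia); /* pixels 2 and 3 */
        /* Widen dst bytes to 16 bits and apply the same rounding as mul_div255. */
        __m128i t_lo = _mm_add_epi16(_mm_mullo_epi16(_mm_unpacklo_epi8(d, zero), ia_lo), c128);
        __m128i t_hi = _mm_add_epi16(_mm_mullo_epi16(_mm_unpackhi_epi8(d, zero), ia_hi), c128);
        t_lo = _mm_srli_epi16(_mm_add_epi16(t_lo, _mm_srli_epi16(t_lo, 8)), 8);
        t_hi = _mm_srli_epi16(_mm_add_epi16(t_hi, _mm_srli_epi16(t_hi, 8)), 8);
        /* Narrow back to bytes and add src with unsigned saturation. */
        __m128i r = _mm_adds_epu8(s, _mm_packus_epi16(t_lo, t_hi));
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
    srcover_scalar(dst + i, src + i, n - i); /* leftover 0-3 pixels */
}
#else
/* Non-SSE2 fallback so the sketch still compiles on other targets. */
void srcover_sse2(uint32_t *dst, const uint32_t *src, size_t n) {
    srcover_scalar(dst, src, n);
}
#endif
```

The SSE2 path widens each 8-bit channel to a 16-bit lane so that the products (at most 255 × 255) fit without overflow, then packs the results back to bytes. Whether a loop like this actually closes much of the gap to the D3D pipeline would have to be measured on the target hardware.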