Relates :
|
|
This patch adds autovectorization support for the VNNI VPDPWSSD instruction. It can vectorize the following operation in a loop:

    out[i] += ((in1[2*i] * in2[2*i]) + (in1[2*i+1] * in2[2*i+1]));

The patch is useful for AI (ML/DL) applications such as convolution-based neural networks. More information on VNNI can be found here:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Code contributed by: razvan.a.lupusoru@intel.com and vdeshpande (vivek.r.deshpande@intel.com)

The initial performance gain measured with a microbenchmark on Skylake with AVX3 is 10.8x. On that machine the loop compiles to:

    vmovdqu xmm3, xmmword ptr [rbp+r8*2+0x10]
    vmovdqu xmm6, xmmword ptr [rdx+r8*2+0x10]
    vpmaddwd xmm3, xmm6, xmm3
    vpaddd xmm3, xmm3, xmmword ptr [r9+rdi*4+0x10]
    vmovdqu xmmword ptr [r9+rdi*4+0x10], xmm3

On Cascade Lake it can generate the vpdpwssd instruction instead.

The webrev is here: http://cr.openjdk.java.net/~vdeshpande/8214751/VNNI/webrev.00/
|
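For illustration, the loop shape described above can be written in Java roughly as follows. This is a minimal sketch, not code from the patch: the class name `MulAddSketch`, the method name `mulAddPairs`, and the choice of `short[]` inputs with an `int[]` accumulator are assumptions, chosen because VPDPWSSD multiplies signed 16-bit words and accumulates into 32-bit lanes. Whether the JIT actually vectorizes the loop depends on the JVM build and the CPU.

```java
import java.util.Arrays;

// Hypothetical example class; the names here are illustrative only.
class MulAddSketch {

    // For each i, multiplies two adjacent pairs of 16-bit inputs and
    // accumulates the sum of both products into the 32-bit out[i].
    // in1 and in2 must each hold at least 2 * out.length elements.
    static void mulAddPairs(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            // short * short is promoted to int in Java, so each product is 32-bit.
            out[i] += ((in1[2 * i] * in2[2 * i]) + (in1[2 * i + 1] * in2[2 * i + 1]));
        }
    }

    public static void main(String[] args) {
        short[] in1 = {1, 2, 3, 4};
        short[] in2 = {5, 6, 7, 8};
        int[] out = {1, 2};
        mulAddPairs(in1, in2, out);
        // out[0] = 1 + (1*5 + 2*6) = 18, out[1] = 2 + (3*7 + 4*8) = 55
        System.out.println(Arrays.toString(out)); // prints [18, 55]
    }
}
```

The scalar loop gives the reference result that any vectorized version must match, so it can double as a correctness check when comparing runs with and without the optimization.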