JDK-8265783 : Create a separate library for x86 Intel SVML assembly intrinsics
  • Type: Sub-task
  • Component: hotspot
  • Sub-Component: compiler
  • Affected Version: 17
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • CPU: x86
  • Submitted: 2021-04-22
  • Updated: 2021-06-18
  • Resolved: 2021-06-03
JDK 17: Fixed in 17 b26
Related Reports
CSR :  
Description
Intel Short Vector Math Library (SVML) based intrinsics, written in native x86 assembly, provide optimized implementations of the Vector API transcendental and trigonometric methods.
These methods are built into a separate library instead of being part of libjvm.so or jvm.dll.

The following changes are made:
   The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml.
   The assembly source files are named "*.S" and the include files are named "*.S.inc".
   The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk.
   The build system is changed to support dependency tracking for assembly files with includes.
   The built native libraries (libsvml.so/svml.dll) are placed in the bin directory of the JDK on Windows and the lib directory of the JDK on Linux.
   The C2 JIT uses dll_load and dll_lookup to get the addresses of the optimized methods from this library.
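
As an illustration of the library placement described above, here is a minimal Java sketch that maps an operating system name to the expected location of the library relative to the JDK home. The class SvmlLibraryLocation and its method are hypothetical names invented for this example, not part of the JDK; the file names and directories are the ones given in this description. The actual lookup is done natively by C2 through dll_load and dll_lookup, which this sketch does not reproduce.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the JDK.
final class SvmlLibraryLocation {

    // Returns the expected library path relative to the JDK home,
    // using '/' as the separator on every platform.
    static String relativePath(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("windows")) {
            return "bin/svml.dll";      // bin directory on Windows
        }
        if (os.contains("linux")) {
            return "lib/libsvml.so";    // lib directory on Linux
        }
        // This description only covers Linux and Windows.
        throw new IllegalArgumentException("No SVML library described for: " + osName);
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        try {
            System.out.println(relativePath(osName));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Taking the OS name as a parameter instead of reading it inside the method keeps the mapping easy to check for both platforms on a single machine.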

Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bursie@oracle.com).

This work is part of the second round of incubation of the Vector API.
JEP: https://bugs.openjdk.java.net/browse/JDK-8261663

Performance:

Micro benchmark	Base	Optimized	Unit	Gain(Optimized/Base)
Double128Vector.ACOS	45.91	87.34	ops/ms	1.90
Double128Vector.ASIN	45.06	92.36	ops/ms	2.05
Double128Vector.ATAN	19.92	118.36	ops/ms	5.94
Double128Vector.ATAN2	15.24	88.17	ops/ms	5.79
Double128Vector.CBRT	45.77	208.36	ops/ms	4.55
Double128Vector.COS	49.94	245.89	ops/ms	4.92
Double128Vector.COSH	26.91	126.00	ops/ms	4.68
Double128Vector.EXP	71.64	379.65	ops/ms	5.30
Double128Vector.EXPM1	35.95	150.37	ops/ms	4.18
Double128Vector.HYPOT	50.67	174.10	ops/ms	3.44
Double128Vector.LOG	61.95	279.84	ops/ms	4.52
Double128Vector.LOG10	59.34	239.05	ops/ms	4.03
Double128Vector.LOG1P	18.56	200.32	ops/ms	10.79
Double128Vector.SIN	49.36	240.79	ops/ms	4.88
Double128Vector.SINH	26.59	103.75	ops/ms	3.90
Double128Vector.TAN	41.05	152.39	ops/ms	3.71
Double128Vector.TANH	45.29	169.53	ops/ms	3.74
Double256Vector.ACOS	54.21	106.39	ops/ms	1.96
Double256Vector.ASIN	53.60	107.99	ops/ms	2.01
Double256Vector.ATAN	21.53	189.11	ops/ms	8.78
Double256Vector.ATAN2	16.67	140.76	ops/ms	8.44
Double256Vector.CBRT	56.45	397.13	ops/ms	7.04
Double256Vector.COS	58.26	389.77	ops/ms	6.69
Double256Vector.COSH	29.44	151.11	ops/ms	5.13
Double256Vector.EXP	86.67	564.68	ops/ms	6.52
Double256Vector.EXPM1	41.96	201.28	ops/ms	4.80
Double256Vector.HYPOT	66.18	305.74	ops/ms	4.62
Double256Vector.LOG	71.52	394.90	ops/ms	5.52
Double256Vector.LOG10	65.43	362.32	ops/ms	5.54
Double256Vector.LOG1P	19.99	300.88	ops/ms	15.05
Double256Vector.SIN	57.06	380.98	ops/ms	6.68
Double256Vector.SINH	29.40	117.37	ops/ms	3.99
Double256Vector.TAN	44.90	279.90	ops/ms	6.23
Double256Vector.TANH	54.08	274.71	ops/ms	5.08
Double512Vector.ACOS	55.65	687.54	ops/ms	12.35
Double512Vector.ASIN	57.31	777.72	ops/ms	13.57
Double512Vector.ATAN	21.42	729.21	ops/ms	34.04
Double512Vector.ATAN2	16.37	414.33	ops/ms	25.32
Double512Vector.CBRT	56.78	834.38	ops/ms	14.69
Double512Vector.COS	59.88	837.04	ops/ms	13.98
Double512Vector.COSH	30.34	172.76	ops/ms	5.70
Double512Vector.EXP	99.66	1608.12	ops/ms	16.14
Double512Vector.EXPM1	43.39	318.61	ops/ms	7.34
Double512Vector.HYPOT	73.87	1502.72	ops/ms	20.34
Double512Vector.LOG	74.84	996.00	ops/ms	13.31
Double512Vector.LOG10	71.12	1046.52	ops/ms	14.72
Double512Vector.LOG1P	19.75	776.87	ops/ms	39.34
Double512Vector.POW	37.42	384.13	ops/ms	10.26
Double512Vector.SIN	59.74	728.45	ops/ms	12.19
Double512Vector.SINH	29.47	143.38	ops/ms	4.87
Double512Vector.TAN	46.20	587.21	ops/ms	12.71
Double512Vector.TANH	57.36	495.42	ops/ms	8.64
Double64Vector.ACOS	24.04	73.67	ops/ms	3.06
Double64Vector.ASIN	23.78	75.11	ops/ms	3.16
Double64Vector.ATAN	14.14	62.81	ops/ms	4.44
Double64Vector.ATAN2	10.38	44.43	ops/ms	4.28
Double64Vector.CBRT	16.47	107.50	ops/ms	6.53
Double64Vector.COS	23.42	152.01	ops/ms	6.49
Double64Vector.COSH	17.34	113.34	ops/ms	6.54
Double64Vector.EXP	27.08	203.53	ops/ms	7.52
Double64Vector.EXPM1	18.77	96.73	ops/ms	5.15
Double64Vector.HYPOT	18.54	103.62	ops/ms	5.59
Double64Vector.LOG	26.75	142.63	ops/ms	5.33
Double64Vector.LOG10	25.85	139.71	ops/ms	5.40
Double64Vector.LOG1P	13.26	97.94	ops/ms	7.38
Double64Vector.SIN	23.28	146.91	ops/ms	6.31
Double64Vector.SINH	17.62	88.59	ops/ms	5.03
Double64Vector.TAN	21.00	86.43	ops/ms	4.12
Double64Vector.TANH	23.75	111.35	ops/ms	4.69
Float128Vector.ACOS	57.52	110.65	ops/ms	1.92
Float128Vector.ASIN	57.15	117.95	ops/ms	2.06
Float128Vector.ATAN	22.52	318.74	ops/ms	14.15
Float128Vector.ATAN2	17.06	246.07	ops/ms	14.42
Float128Vector.CBRT	29.72	443.74	ops/ms	14.93
Float128Vector.COS	42.82	803.02	ops/ms	18.75
Float128Vector.COSH	31.44	118.34	ops/ms	3.76
Float128Vector.EXP	72.43	855.33	ops/ms	11.81
Float128Vector.EXPM1	37.82	127.85	ops/ms	3.38
Float128Vector.HYPOT	53.20	591.68	ops/ms	11.12
Float128Vector.LOG	52.95	877.94	ops/ms	16.58
Float128Vector.LOG10	49.26	603.72	ops/ms	12.26
Float128Vector.LOG1P	20.89	430.59	ops/ms	20.61
Float128Vector.SIN	43.38	745.31	ops/ms	17.18
Float128Vector.SINH	31.11	112.91	ops/ms	3.63
Float128Vector.TAN	37.25	332.13	ops/ms	8.92
Float128Vector.TANH	57.63	453.77	ops/ms	7.87
Float256Vector.ACOS	65.23	123.73	ops/ms	1.90
Float256Vector.ASIN	63.41	132.86	ops/ms	2.10
Float256Vector.ATAN	23.51	649.02	ops/ms	27.61
Float256Vector.ATAN2	18.19	455.95	ops/ms	25.07
Float256Vector.CBRT	45.99	594.81	ops/ms	12.93
Float256Vector.COS	43.75	926.69	ops/ms	21.18
Float256Vector.COSH	33.52	130.46	ops/ms	3.89
Float256Vector.EXP	75.70	1366.72	ops/ms	18.05
Float256Vector.EXPM1	39.00	149.72	ops/ms	3.84
Float256Vector.HYPOT	52.91	1023.18	ops/ms	19.34
Float256Vector.LOG	53.31	1545.77	ops/ms	29.00
Float256Vector.LOG10	50.31	863.80	ops/ms	17.17
Float256Vector.LOG1P	21.51	616.59	ops/ms	28.66
Float256Vector.SIN	44.07	911.04	ops/ms	20.67
Float256Vector.SINH	33.16	122.50	ops/ms	3.69
Float256Vector.TAN	37.85	497.75	ops/ms	13.15
Float256Vector.TANH	64.27	537.20	ops/ms	8.36
Float512Vector.ACOS	67.33	1718.00	ops/ms	25.52
Float512Vector.ASIN	66.12	1780.85	ops/ms	26.93
Float512Vector.ATAN	22.63	1780.31	ops/ms	78.69
Float512Vector.ATAN2	17.52	1113.93	ops/ms	63.57
Float512Vector.CBRT	54.78	2087.58	ops/ms	38.11
Float512Vector.COS	40.92	1567.93	ops/ms	38.32
Float512Vector.COSH	33.42	138.36	ops/ms	4.14
Float512Vector.EXP	70.51	3835.97	ops/ms	54.41
Float512Vector.EXPM1	38.06	279.80	ops/ms	7.35
Float512Vector.HYPOT	50.99	3287.55	ops/ms	64.47
Float512Vector.LOG	49.61	3156.99	ops/ms	63.64
Float512Vector.LOG10	46.94	2489.16	ops/ms	53.02
Float512Vector.LOG1P	20.66	1689.86	ops/ms	81.81
Float512Vector.POW	32.73	1015.85	ops/ms	31.04
Float512Vector.SIN	41.17	1587.71	ops/ms	38.56
Float512Vector.SINH	33.05	129.39	ops/ms	3.91
Float512Vector.TAN	35.60	1336.11	ops/ms	37.53
Float512Vector.TANH	65.77	2295.28	ops/ms	34.90
Float64Vector.ACOS	48.41	89.34	ops/ms	1.85
Float64Vector.ASIN	47.30	95.72	ops/ms	2.02
Float64Vector.ATAN	20.62	49.45	ops/ms	2.40
Float64Vector.ATAN2	15.95	112.35	ops/ms	7.04
Float64Vector.CBRT	24.03	134.57	ops/ms	5.60
Float64Vector.COS	44.28	394.33	ops/ms	8.91
Float64Vector.COSH	28.35	95.27	ops/ms	3.36
Float64Vector.EXP	65.80	486.37	ops/ms	7.39
Float64Vector.EXPM1	34.61	85.99	ops/ms	2.48
Float64Vector.HYPOT	50.40	147.82	ops/ms	2.93
Float64Vector.LOG	51.93	163.25	ops/ms	3.14
Float64Vector.LOG10	49.53	147.98	ops/ms	2.99
Float64Vector.LOG1P	19.20	206.81	ops/ms	10.77
Float64Vector.SIN	44.41	382.09	ops/ms	8.60
Float64Vector.SINH	28.20	90.68	ops/ms	3.22
Float64Vector.TAN	36.29	160.89	ops/ms	4.43
Float64Vector.TANH	47.65	214.04	ops/ms	4.49
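
The Gain column is the ratio Optimized/Base, as stated in the table header. A minimal Java sketch of that arithmetic follows; the class GainCheck and its method names are hypothetical, chosen for this example. Because the Base and Optimized values shown are already rounded, a recomputed gain may differ slightly from the printed one for some rows.

```java
// Hypothetical helper for illustration only.
final class GainCheck {

    // Gain as defined by the table header: Optimized / Base.
    static double gain(double base, double optimized) {
        if (base <= 0.0) {
            throw new IllegalArgumentException("base must be positive");
        }
        return optimized / base;
    }

    // Rounds to two decimals, the precision used in the table.
    static double roundToHundredths(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // Double128Vector.ACOS row: Base 45.91, Optimized 87.34
        System.out.println(roundToHundredths(gain(45.91, 87.34)));
        // Double128Vector.ATAN row: Base 19.92, Optimized 118.36
        System.out.println(roundToHundredths(gain(19.92, 118.36)));
    }
}
```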

Comments
Hi Sandhya, thanks for confirming the minimum requirement of binutils 2.25. Should we have a test in the configure step for this? (My colleagues first blamed gcc 7 when seeing the build error.) I created https://bugs.openjdk.java.net/browse/JDK-8269031 for the binutils check.
18-06-2021

Yes, this is due to binutils. The minimum required version would be binutils 2.25, where AVX512 support was added for Skylake.
15-06-2021

On an old Ubuntu Linux with binutils 2.24 and gcc-7.3 we noticed this build error:

* For target support_native_jdk.incubator.vector_libsvml_svml_d_acos_linux_x86.o:
svml_d_acos.c: Assembler messages:
svml_d_acos.c:1137: Error: operand type mismatch for `vorpd'
svml_d_acos.c:1138: Error: operand type mismatch for `vandpd'
svml_d_acos.c:1149: Error: operand type mismatch for `vxorpd'
svml_d_acos.c:1155: Error: operand type mismatch for `vorpd'
svml_d_acos.c:1166: Error: operand type mismatch for `vxorpd'
svml_d_acos.c:1191: Error: operand type mismatch for `vorpd'
svml_d_acos.c:1196: Error: operand type mismatch for `vxorpd'
svml_d_acos.c:1199: Error: operand type mismatch for `vorpd'
svml_d_acos.c:1201: Error: operand type mismatch for `vxorpd'
svml_d_acos.c:1205: Error: operand type mismatch for `vxorpd'

No issue has been seen, however, with gcc-7/gcc-8 and later binutils (e.g. 2.29). Is there some minimum binutils requirement for this to compile? If so, should we check for the binutils version in configure (or is it something else and not related to binutils)?
15-06-2021

Changeset: 9f05c411
Author: Sandhya Viswanathan <sviswanathan@openjdk.org>
Date: 2021-06-03 20:03:36 +0000
URL: https://git.openjdk.java.net/jdk/commit/9f05c411e6d6bdf612cf0cf8b9fe4ca9ecde50d1
03-06-2021