United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6396979 Performance slide in cipher micro-benchmark.
JDK-6396979 : Performance slide in cipher micro-benchmark.

Details
Type:
Bug
Submit Date:
2006-03-10
Status:
Resolved
Updated Date:
2010-04-03
Project Name:
JDK
Resolved Date:
2006-04-19
Component:
hotspot
OS:
linux
Sub-Component:
compiler
CPU:
x86
Priority:
P2
Resolution:
Fixed
Affected Versions:
6
Fixed Versions:

Related Reports
Backport:
Duplicate:

Sub Tasks

Description
FULL PRODUCT VERSION :
java version "1.6.0-beta2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-beta2-b72)
Java HotSpot(TM) Client VM (build 1.6.0-beta2-b72, mixed mode)


ADDITIONAL OS VERSION INFORMATION :
Linux cehost 2.6.15-1.1831_FC4 #1 Tue Feb 7 13:37:42 EST 2006 i686 i686 i386 GNU/Linux

EXTRA RELEVANT SYSTEM CONFIGURATION :
Pentium4 Northwood, 2.6ghz

A DESCRIPTION OF THE PROBLEM :
I use some encryption code in my application and just for interest I thought benchmarking it would be quite interesting. The code was ported from C (not by me) and performs quite poor on hotspot.
Please don't comment on code-quality, it is just an example where code generated by hotspot does not perform as well as it could.

Here are some numbers for the original version:
IBM 1.4.2:                             2.29874488529263mb/s
Hotspot 1.3.1 Server:         1.6164489848700376mb/s
Hotspot 6.0b72 Server:     1.5088873464707124mb/s

Numbers for the tuned Version:
Hotspot 1.3.1 Server:       2.9244896765514414mb/s
Hotspot 6.0b72 Server:   2.7785495971103082mb/s
IBM 1.4.2:                           2.7506312220847015mb/s

Both times the Hotspot 1.3.1 Server  compiler produces faster code, thats why I marked it as regression.
The main problem seems to be the access to member-variables inside of code(), IBM can optimize this quite well whereas Hotspot has some troubles with the code.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the sample-code attached with the server compiler.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Mustang's c2 should at least be faster than 1.3.1's hotspot server compiler.
ACTUAL -
6.0b72 is always slower than 1.3.1, IBM seems to handle member-field access much better on P4.

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
1.) Not tuned version:
-----------------------------------------------------------------------
/* PORTED TO JAVA BY ROBERT NEILD November 1999 */

/* PC1 Cipher Algorithm ( Pukall Cipher 1 ) */
/* By Alexander PUKALL 1991 */
/* free code no restriction to use */
/* please include the name of the Author in the final software */
/* the Key is 128 bits */

/* Only the K zone change in the two routines */
/* You can create a single routine with the two parts in it */

public class PC1_Stream
{
	int ax, bx, cx, dx, si, tmp, x1a2, res, i, inter, cfc, cfd, compte;
	int x1a0[] = new int[8];
	byte cle[] = new byte[17]; // Hold key
	private boolean usedAsEncrypter = false;
	private boolean alreadyUsed = false;

	/**
	 * Creates a PC1_InputStream. Decodes an input stream of encoded data.
	 * @param in  The input stream to decode.
	 * @param password 16 byte encryption key
	 */

	public PC1_Stream(byte[] password)
	{
		System.arraycopy(password, 0, cle, 0, Math.min(16, password.length));
	}

	private void assemble()
	{
		x1a0[0] = ((cle[0] * 256) + cle[1]);

		code();
		inter = res;

		x1a0[1] = (x1a0[0] ^ ((cle[2] * 256) + cle[3]));
		code();
		inter = (inter ^ res);

		x1a0[2] = (x1a0[1] ^ ((cle[4] * 256) + cle[5]));
		code();
		inter = (inter ^ res);

		x1a0[3] = (x1a0[2] ^ ((cle[6] * 256) + cle[7]));
		code();
		inter = (inter ^ res);

		x1a0[4] = (x1a0[3] ^ ((cle[8] * 256) + cle[9]));
		code();
		inter = (inter ^ res);

		x1a0[5] = (x1a0[4] ^ ((cle[10] * 256) + cle[11]));
		code();
		inter = (inter ^ res);

		x1a0[6] = (x1a0[5] ^ ((cle[12] * 256) + cle[13]));
		code();
		inter = (inter ^ res);

		x1a0[7] = (x1a0[6] ^ ((cle[14] * 256) + cle[15]));
		code();
		inter = (inter ^ res);

		i = 0;
	}

	void code()
	{
		dx = (x1a2 + i);
		ax = x1a0[i];

		cx = 0x015a;
		bx = 0x4e35;

		tmp = ax;
		ax = si;
		si = tmp;

		tmp = ax;
		ax = dx;
		dx = tmp;

		if (ax != 0)
		{
			ax = (ax * bx);
		}

		tmp = ax;
		ax = cx;
		cx = tmp;

		if (ax != 0)
		{
			ax = (ax * si);
			cx = (ax + cx);
		}

		tmp = ax;
		ax = si;
		si = tmp;
		ax = (ax * bx);
		dx = (cx + dx);

		ax = (ax + 1);

		x1a2 = dx;
		x1a0[i] = ax;

		res = (ax ^ dx);
		i = (i + 1);
	}

	/**
	 * Returns a plain byte, which has been unencrypted from the underlying
	 * InputStream.
	 * @see java.io.FilterInputStream
	 */

	public byte[] decrypt(byte[] encryptedData)
	{
		checkUsage(false);

		for (int i = 0; i < encryptedData.length; i++)
		{
			int c = encryptedData[i];

			assemble();
			cfc = (inter >> 8);
			cfd = (inter & 255);

			c = c ^ (cfc ^ cfd);

			for (compte = 0; compte <= 15; compte++)
			{
				/* we mix the plaintext byte with the key */
				cle[compte] = (byte) (cle[compte] ^ c);
			}

			encryptedData[i] = (byte) c;
		}

		return encryptedData;
	}

	public byte[] encrypt(byte[] data)
	{
		checkUsage(true);

		for (int i = 0; i < data.length; i++)
		{
			int c = data[i];

			assemble();
			cfc = (inter >> 8);
			cfd = (inter & 255);

			for (compte = 0; compte <= 15; compte++)
			{
				/* we mix the plaintext byte with the key */
				cle[compte] = (byte) (cle[compte] ^ c);
			}

			c = c ^ (cfc ^ cfd);
			data[i] = (byte) c;
		}

		return data;
	}

	private void checkUsage(boolean isEncrypter)
	{
		if (alreadyUsed)
		{
			if (usedAsEncrypter != isEncrypter)
			{
				throw new IllegalArgumentException("You may either use this class as encrypter or decrypter, not both!");
			}
		} else
		{
			alreadyUsed = true;
			usedAsEncrypter = isEncrypter;
		}
	}

	public static void main(String[] args)
	{
		byte[] testData = new byte[1024 * 1024];
		for (int i = 0; i < testData.length; i++)
		{
			testData[i] = (byte) (i % 127);
		}

		PC1_Stream dec = new PC1_Stream(testData);
		PC1_Stream enc = new PC1_Stream(testData);

		for (int i = 0; i < 5; i++)
		{
			enc.encrypt(testData);
			dec.decrypt(testData);
		}

		for (int m = 0; m < 10; m++)
		{
			int encCount = 50;
			System.out.println("Starte Verschl????sselung");
			long start = System.currentTimeMillis();
			for (int i = 0; i < encCount; i++)
			{
				enc.encrypt(testData);
				dec.decrypt(testData);
			}
			long end = System.currentTimeMillis();
			long duration = end - start;
			System.out.println("Encryption took: " + duration + " with " + ((double) encCount / ((double) duration / (double) 1000)) + "mb/s ");
		}
	}
}
------------------------------------------------------------------------------

2.) Slightly tuned version:
-----------------------------------------------------------------------------
public class PC1_Stream_tuned
{
	int si, x1a2;
	int x1a0[] = new int[8];
	byte cle[] = new byte[17]; // Hold key
	private boolean usedAsEncrypter = false;
	private boolean alreadyUsed = false;

	/**
	 * Creates a PC1_InputStream. Decodes an input stream of encoded data.
	 * @param in  The input stream to decode.
	 * @param password 16 byte encryption key
	 */

	public PC1_Stream_tuned(byte[] password)
	{
		System.arraycopy(password, 0, cle, 0, Math.min(16, password.length));
	}

	private int assemble()
	{
		x1a0[0] = ((cle[0] * 256) + cle[1]);

		int inter = code(0);

		x1a0[1] = (x1a0[0] ^ ((cle[2] * 256) + cle[3]));
		inter = inter ^ code(1);

		x1a0[2] = (x1a0[1] ^ ((cle[4] * 256) + cle[5]));
		inter = inter ^ code(2);

		x1a0[3] = (x1a0[2] ^ ((cle[6] * 256) + cle[7]));
		inter = inter ^ code(3);

		x1a0[4] = (x1a0[3] ^ ((cle[8] * 256) + cle[9]));
		inter = inter ^ code(4);

		x1a0[5] = (x1a0[4] ^ ((cle[10] * 256) + cle[11]));
		inter = inter ^ code(5);

		x1a0[6] = (x1a0[5] ^ ((cle[12] * 256) + cle[13]));
		inter = inter ^ code(6);

		x1a0[7] = (x1a0[6] ^ ((cle[14] * 256) + cle[15]));
		inter = inter ^ code(7);
		
		return inter;
	}

	protected int code(final int i)
	{
		int ax = x1a0[i];
		int dx = (x1a2 + i);
		int cx = 0x015a;
		int tmp;
		final int bx = 0x4e35;
	
		tmp = ax;
		ax = si;
		si = tmp;

		tmp = ax;
		ax = dx;
		dx = tmp;

		if (ax != 0)
		{
			ax = (ax * bx);
		}

		tmp = ax;
		ax = cx;
		cx = tmp;

		if (ax != 0)
		{
			ax = (ax * si);
			cx = (ax + cx);
		}

		tmp = ax;
		ax = si;
		si = tmp;
		
		ax = (ax * bx);
		dx = (cx + dx);
		ax++;

		x1a2 = dx;
		x1a0[i] = ax;

		return  (ax ^ dx);
	}

	/**
	 * Returns a plain byte, which has been unencrypted from the underlying
	 * InputStream.
	 * @see java.io.FilterInputStream
	 */

	public byte[] decrypt(byte[] encryptedData)
	{
		checkUsage(false);

		for (int i = 0; i < encryptedData.length; i++)
		{
			int c = encryptedData[i];

			int inter = assemble();
			int cfc = (inter >> 8);
			int cfd = (inter & 255);

			c = c ^ (cfc ^ cfd);

			for (int compte = 0; compte <= 15; compte++)
			{
				/* we mix the plaintext byte with the key */
				cle[compte] = (byte) (cle[compte] ^ c);
			}

			encryptedData[i] = (byte) c;
		}

		return encryptedData;
	}

	public byte[] encrypt(byte[] data)
	{
		checkUsage(true);

		for (int i = 0; i < data.length; i++)
		{
			int c = data[i];

			int inter = assemble();
			int cfc = (inter >> 8);
			int cfd = (inter & 255);

			for (int compte = 0; compte <= 15; compte++)
			{
				/* we mix the plaintext byte with the key */
				cle[compte] = (byte) (cle[compte] ^ c);
			}

			c = c ^ (cfc ^ cfd);
			data[i] = (byte) c;
		}

		return data;
	}

	private void checkUsage(boolean isEncrypter)
	{
		if (alreadyUsed)
		{
			if (usedAsEncrypter != isEncrypter)
			{
				throw new IllegalArgumentException("You may either use this class as encrypter or decrypter, not both!");
			}
		} else
		{
			alreadyUsed = true;
			usedAsEncrypter = isEncrypter;
		}
	}

	public static void main(String[] args)
	{
		byte[] testData = new byte[1024 * 1024];
		for (int i = 0; i < testData.length; i++)
		{
			testData[i] = (byte) (i % 127);
		}

		PC1_Stream_tuned dec = new PC1_Stream_tuned(testData);
		PC1_Stream_tuned enc = new PC1_Stream_tuned(testData);

		for (int i = 0; i < 5; i++)
		{
			enc.encrypt(testData);
			dec.decrypt(testData);
		}

		for (int m = 0; m < 10; m++)
		{
			int encCount = 50;
			System.out.println("Starte Verschl????sselung");
			long start = System.currentTimeMillis();
			for (int i = 0; i < encCount; i++)
			{
				enc.encrypt(testData);
				dec.decrypt(testData);
			}
			long end = System.currentTimeMillis();
			long duration = end - start;
			System.out.println("Encryption took: " + duration + " with " + ((double) encCount / ((double) duration / (double) 1000)) + "mb/s ");
		}
	}
}


---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
use IBM Jvm, try to not access member-variables too often.

Release Regression From : 1.3.1
The above release value was the last known release where this 
bug was known to work. Since then there has been a regression.

                                    

Comments
EVALUATION

This micro-benchmark performance regression is difficult
to trace back. Most likely it is due to icache or dcache conflicts.
Part of the regression came from safepoint polling.
But it was noticed that we generate back-to-back stores to the same address
in this test case. Only this problem was fixed but it gave good results.
                                     
2006-04-05
SUGGESTED FIX

Webrev:                 http://analemma.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2006/20060405104120.kvn.6396979/workspace/webrevs/webrev-2006.04.05/index.html

Put store's uses on IGVN worklist when one of uses is removed to
enable folding optimization for store/store and store/load to the same address.
                                     
2006-04-06



Hardware and Software, Engineered to Work Together