JDK-8028623 : SA: hash codes in SymbolTable mismatching java_lang_String::hash_code for extended characters.
  • Type: Bug
  • Component: core-svc
  • Sub-Component: tools
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: generic
  • CPU: generic
  • Submitted: 2013-11-19
  • Updated: 2015-02-23
  • Resolved: 2014-01-20
  • JDK 7: 7u72 (Fixed)
  • JDK 8: 8u20 (Fixed)
  • JDK 9: 9 b03 (Fixed)
Description
For some Strings containing extended characters, the SA fails to find the String in its SymbolTable because the hash code it calculates does not match the one stored in the table (which came from java_lang_String::hash_code).

This manifests as jmap -F failing to dump a heap with:

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:491)
	at sun.tools.jmap.JMap.runTool(JMap.java:197)
	at sun.tools.jmap.JMap.main(JMap.java:128)
Caused by: java.lang.NullPointerException
	at sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeSymbolID(HeapHprofBinWriter.java:995)
	at sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeFieldDescriptors(HeapHprofBinWriter.java:828)
	at sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeClassDumpRecord(HeapHprofBinWriter.java:587)
	at sun.jvm.hotspot.utilities.HeapHprofBinWriter.access$000(HeapHprofBinWriter.java:310)
	at sun.jvm.hotspot.utilities.HeapHprofBinWriter$1.visit(HeapHprofBinWriter.java:524)
	at sun.jvm.hotspot.memory.SystemDictionary$2.visit(SystemDictionary.java:179)
	at sun.jvm.hotspot.memory.Dictionary.classesDo(Dictionary.java:68)
	at sun.jvm.hotspot.memory.SystemDictionary.classesDo(SystemDictionary.java:190)
	at sun.jvm.hotspot.memory.SystemDictionary.allClassesDo(SystemDictionary.java:183)
	at sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeClassDumpRecords(HeapHprofBinWriter.java:520)
	at sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:430)
	at sun.jvm.hotspot.tools.HeapDumper.run(HeapDumper.java:62)
	at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
	at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
	at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
	at sun.jvm.hotspot.tools.HeapDumper.main(HeapDumper.java:83)
	... 6 more
Comments
Suggested fix, will get out for review: http://cr.openjdk.java.net/~kevinw/8028623/webrev.00/
2013-12-18

Testing a Java app with a one-character field identifier: Ã

On using jmap -F to dump the heap, the above error is seen. writeFieldDescriptors fails to find a Symbol, as SymbolTable.probe fails to find an entry with hash = 0x1820.

The actual Symbol table contains Ã with hash = 0xfffff820.

The SA's Hashtable and javaClasses.hpp are getting different values for the same String.

SA Hashtable.java does:

	protected static long hashSymbol(byte[] buf) {
	  long h = 0;
	  int s = 0;
	  int len = buf.length;
	  while (len-- > 0) {
	    h = 31*h + (0xFFL & buf[s]);
	    s++;
	  }
	  return h & 0xFFFFFFFFL;
	}

The algorithm in the string class:

	template <typename T> static unsigned int hash_code(T* s, int len) {
	  unsigned int h = 0;
	  while (len-- > 0) {
	    h = 31*h + (unsigned int) *s;
	    s++;
	  }
	  return h;
	}

Where the native hash_code does (unsigned int) *s, the Java/SA implementation does (0xFFL & buf[s]).

The problematic char is 0xc3 0x83 in UTF-8. In the native code, cast to unsigned int, these bytes are sign-extended and treated as 0xffffffc3 and 0xffffff83, whereas the SA masks them to 0xc3 and 0x83.

Either side could be "corrected"; the least impact/risk would likely be on the SA side. I'm currently getting good, matching results by having the SA Hashtable do:

	protected static long hashSymbol(byte[] buf) {
	  long h = 0;
	  int s = 0;
	  int len = buf.length;
	  while (len-- > 0) {
	    h = 31*h + (0xFFFFFFFFL & buf[s]);
	    s++;
	  }
	  return h & 0xFFFFFFFFL;
	}
2013-11-19
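
The mismatch described in the 2013-11-19 comment can be reproduced outside the SA. The following is a minimal sketch, not the SA code itself: the class name SymbolHashDemo and the method names hashMasked and hashSignExtended are hypothetical and chosen only for illustration. The two loops mirror the original and the proposed SA hashSymbol bodies quoted in that comment, applied to the UTF-8 bytes 0xc3 0x83.

```java
// Sketch only: SymbolHashDemo and its method names are hypothetical,
// not part of the SA sources.
final class SymbolHashDemo {

    // Mirrors the original SA loop: each byte is masked to 0..255.
    static long hashMasked(byte[] buf) {
        long h = 0;
        for (byte b : buf) {
            h = 31 * h + (0xFFL & b);
        }
        return h & 0xFFFFFFFFL;
    }

    // Mirrors the proposed SA loop: the byte is sign-extended to 32 bits,
    // matching what the native (unsigned int) cast produces for a signed byte.
    static long hashSignExtended(byte[] buf) {
        long h = 0;
        for (byte b : buf) {
            h = 31 * h + (0xFFFFFFFFL & b);
        }
        return h & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] utf8 = { (byte) 0xC3, (byte) 0x83 };
        System.out.println(Long.toHexString(hashMasked(utf8)));       // prints 1820
        System.out.println(Long.toHexString(hashSignExtended(utf8))); // prints fffff820
    }
}
```

For byte values below 0x80 the two loops agree, which is consistent with the failure only showing up for Strings containing extended characters.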