JDK-8151751 : Stack trace population code can (re-)use cached Strings
  • Type: Enhancement
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 9
  • Priority: P3
  • Status: Closed
  • Resolution: Other
  • Submitted: 2016-03-11
  • Updated: 2019-05-02
  • Resolved: 2019-01-15
Related Reports
Relates :  
Relates :  
Relates :  
Sub Tasks
JDK-8216302 :  
JDK-8216307 :  
JDK-8216308 :  
JDK-8216309 :  
Description
JDK-8150778 switched to StackWalker API for populating stacktrace elements in Throwable. We may optimize the VM parts even further:
 http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039280.html

Profiling shows we spent lots of time StringTable::intern-ing to convert Symbols to Strings:

                6.600   (60%)    org.openjdk.StackTraceBench.doDepth(int)
                +-  3.920   (36%)    java.lang.Throwable.getStackTrace()
                  +-  3.920   (36%)    java.lang.Throwable.getStackTraceElements(java.lang.StackTraceElement[])
                    +-  3.920   (36%)    JVM_GetStackTraceElements
                      +-  3.910   (36%)    java_lang_Throwable::get_stack_trace_elements(Handle,objArrayHandle,Thread*)
                      |  +-  3.910   (36%)    java_lang_StackTraceElement::fill_in(Handle,InstanceKlass*,const methodHandle&,int,int,int,Thread*)
                      |    +-  1.890   (17%)    StringTable::intern(const char*,Thread*)
                      |    |  +-  0.480   (4%)    StringTable::lookup_shared(unsigned short*,int)
                      |    |  +-  0.350   (3%)    UTF8::unicode_length(const char*,bool&,bool&)
                      |    |  +-  0.320   (3%)    java_lang_String::equals(oopDesc*,unsigned short*,int)
                      |    |  +-  0.300   (3%)    StringTable::hash_string<unsigned short>(const unsigned short*,int)
                      |    |  +-  0.260   (2%)    UTF8::convert_to_unicode<unsigned short>(const char*,unsigned short*,int)
                      |    |  +-  0.140   (1%)    resource_allocate_bytes(unsigned long,AllocFailStrategy::AllocFailEnum)
                      |    |  +-  0.040   (0%)    G1SATBCardTableModRefBS::enqueue(oopDesc*)
                      |    +-  1.260   (12%)    StringTable::intern(Symbol*,Thread*)
                      |    |  +-  0.490   (4%)    java_lang_String::equals(oopDesc*,unsigned short*,int)
                      |    |  +-  0.360   (3%)    StringTable::lookup_shared(unsigned short*,int)
                      |    |  +-  0.300   (3%)    StringTable::hash_string<unsigned short>(const unsigned short*,int)
                      |    |  +-  0.080   (1%)    Symbol::as_unicode(int&)const
                      |    |  +-  0.030   (0%)    G1SATBCardTableModRefBS::enqueue(oopDesc*)
                      |    +-  0.500   (5%)    Klass::external_name()const
                      |    |  +-  0.380   (3%)    Symbol::as_klass_external_name()const
                      |    |  +-  0.120   (1%)    strlen
                      |    +-  0.250   (2%)    Backtrace::get_line_number(const methodHandle&,int)
                      |    |  +-  0.250   (2%)    Method::line_number_from_bci(int)const
                      |    +-  0.010   (0%)    oopDesc::obj_field_put(int,oopDesc*)
                      +-  0.010   (0%)    HandleMark::pop_and_restore()

At least class name may be cached in Class.name field in Java, e.g.:
 http://cr.openjdk.java.net/~shade/8151751/poc-cache-Class.name.patch

Method name and source file name might piggyback on ConstantPool constant string resolution?

The performance improvements are very promising, up to 3-4x performance improvement across stack trace depths, see the benchmark and results here:
  http://cr.openjdk.java.net/~shade/8151751/StackTraceBench.java
  (runnable JAR: http://cr.openjdk.java.net/~shade/8151751/benchmarks.jar)

Comments
Parent task is closed until future need arises.
15-01-2019

It seems two first subtasks have fixed the performance regression since 8, and got 2x improvement on top. The other two subtasks do not seem to worth it at this point. Therefore, I'd say this is complete, at least for a time being. We can restart other two subtasks if future need arises. Benchmark (depth) Mode Cnt Score Error Units # 8u191 StackTraceBench.test 1 avgt 15 10.857 �� 0.041 us/op StackTraceBench.test 10 avgt 15 15.356 �� 0.086 us/op StackTraceBench.test 100 avgt 15 60.160 �� 0.645 us/op StackTraceBench.test 1000 avgt 15 530.555 �� 4.319 us/op # 11.0.1 StackTraceBench.test 1 avgt 15 19.372 �� 0.215 us/op StackTraceBench.test 10 avgt 15 25.771 �� 0.155 us/op StackTraceBench.test 100 avgt 15 90.448 �� 0.413 us/op StackTraceBench.test 1000 avgt 15 754.071 �� 3.623 us/op # jdk/jdk (before patches) StackTraceBench.test 1 avgt 15 23.835 �� 0.188 us/op StackTraceBench.test 10 avgt 15 33.204 �� 0.191 us/op StackTraceBench.test 100 avgt 15 125.195 �� 0.694 us/op StackTraceBench.test 1000 avgt 15 1051.047 �� 9.779 us/op # jdk/jdk + JDK-8216302 + JDK-8216308 StackTraceBench.test 1 avgt 15 8.803 �� 0.034 us/op StackTraceBench.test 10 avgt 15 12.005 �� 0.067 us/op StackTraceBench.test 100 avgt 15 43.472 �� 0.427 us/op StackTraceBench.test 1000 avgt 15 356.592 �� 2.801 us/op # jdk/jdk + JDK-8216302 + JDK-8216308 + JDK-8216049 StackTraceBench.test 1 avgt 15 7.037 �� 0.053 us/op StackTraceBench.test 10 avgt 15 9.145 �� 0.082 us/op StackTraceBench.test 100 avgt 15 30.239 �� 0.240 us/op StackTraceBench.test 1000 avgt 15 239.322 �� 2.426 us/op
14-01-2019

Come to think about it, I am going to split this thing in three patches: *) Class {name, source_file} *) Module {name, version} *) Method {name} Class name and source_file seems to be the least controversial. It seems to me Module version and name are already available in java.lang.Module::version and java.lang.Module::descriptor -> java.lang.module.ModuleDescriptor.Version::version. Accessing those would require drilling a few new holes in VM symbols. Method name would require some more tinkering, and maybe we would not cache that anywhere.
07-01-2019

[~shade] great, send me a preview and I'll help it along.
07-01-2019

I am reassigning this to myself, while I am working for a better patch.
07-01-2019

--- a/src/hotspot/share/classfile/moduleEntry.hpp Wed Jan 02 13:37:55 2019 -0500 +++ b/src/hotspot/share/classfile/moduleEntry.hpp Thu Jan 03 00:47:04 2019 +0100 @@ -66,10 +66,12 @@ OopHandle _pd; // java.security.ProtectionDomain, cached // for shared classes from this module ClassLoaderData* _loader_data; GrowableArray<ModuleEntry*>* _reads; // list of modules that are readable by this module Symbol* _version; // module version number + jobject _string_version; + jobject _string_name; Like I said before, you could cache the name and version oop into the module oop, which this moduleEntry points to. Then you don't have to deallocate the jobject somewhere and we don't want to have Metadata pointing to jobjects or oops either.
03-01-2019

+ // FIXME: Hack! We better avoid having these fields in method, due to increased footprint + jobject _name; + jobject name() const { return _name; } + void set_name(jobject name) { _name = name; } + Yes, plus you leak the jobject when unloading occurs. What is the performance if you just cache the class name?
03-01-2019

Was following up on StackOverflow regression report here: https://stackoverflow.com/q/54008778/2613885 -- it seems to be caused by what we observed here. Rebased the "cache all" proof-of-concept to jdk/jdk, and verified the performance regression goes away (and performance even improves) after this dirty patch: http://cr.openjdk.java.net/~shade/8151751/poc-cache-all-jdk13.patch I had to run with -Xshare:off, because new patch does not like CDS for some reason.
02-01-2019

We shouldn't have any more jobject fields in metadata. The module oop should have a version string that you can get to if you need it to be an oop.
25-08-2017

Instead of adding a new version related field to the ModuleEntry*, as proposed above: Symbol* _version; // module version number + jobject _string_version; We could possibly just replace the "Symbol* version;" with "jobject _string_version;" ? The _version field is primarily used to generate a C_String for logging, stack traces, etc. (Although JFR also uses it.)
17-03-2017

Okay, even without caching the method, we have very good improvements: http://cr.openjdk.java.net/~shade/8151751/poc-cache-allButMethod.patch http://cr.openjdk.java.net/~shade/8151751/StackTraceBench.java Looking deeper at what can be done to improve Method's Symbol* -> String conversion, it would seem the conversion itself is not very efficient, see JDK-8153849.
08-04-2016

I like the injecting source_file_name in java_lang_Class but you can't add a jobject field into constMethod because it'll increase footprint a lot. Caching the name from the java_lang_ref_Module makes a lot of sense though.
07-04-2016

Update proof-of-concept, that caches everything, including new module data: http://cr.openjdk.java.net/~shade/8151751/poc-cache-all.patch This keeps improving performance significantly: http://cr.openjdk.java.net/~shade/8151751/StackTraceBench.java The potential drawback is that it adds a field in Method, Class, and ModuleEntry, which will have some impact on footprint. Rule of thumb: the entire JDK has around 2^15 classes and 2^18 methods, which makes the very stressful scenario of resolving all the methods in JDK (e.g. CompileTheWorld) take +2M of footprint. sizeof(Method) + sizeOf(ConstMethod) is about 88 bytes in current baseline, which makes method metadata to consume *at least* (not counting anything else references from *Method) 23M in the same scenario. I am not sure trying to cut this down has significant impact on footprint in realistic cases. Nevertheless, I tried to piggyback on ConstantPool's resolved_references mechanics for keeping cached JVMConstant_UTF8 -> java.lang.String resuly -- that is doable. But it linearly searches through the _resolved_references looking for a match -- which has unpredictable impact on performance in classes with large constant pools.
06-04-2016