JDK-8158168 : Missing bounds checks for some String intrinsics
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.lang
  • Affected Version: 9
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • Submitted: 2016-05-30
  • Updated: 2022-08-05
  • Resolved: 2017-04-12
Fix Versions:
JDK 10: 10 (Fixed)
JDK 9: 9 b166 (Fixed)
Description
#  SIGSEGV (0xb) at pc=0x0000007fa4726ec0, pid=3211, tid=3218
#
# JRE version: Java(TM) SE Runtime Environment (9.0) (build 9-internal+0-2016-05-26-211740.amurillo.jdk9-hs-2016-05-26-snapshot)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (9-internal+0-2016-05-26-211740.amurillo.jdk9-hs-2016-05-26-snapshot, compiled mode, tiered, compressed oops, g1 gc, linux-aarch64)
# Problematic frame:
# V  [libjvm.so+0x3d2ec0]  CollectedHeap::fill_with_objects(HeapWord*, unsigned long, bool)+0xa8
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %P" (or dumping to /export/local/aurora/sandbox/results/ResultDir/stressHierarchy009/core.3211)


---------------  S U M M A R Y ------------

Command Line: -XX:MaxMetaspaceSize=450m -Xss10m -Xbootclasspath/a:/export/local/aurora/CommonData/vm/lib/wb.jar -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -Xcomp -XX:+CreateCoredumpOnCrash -XX:+IgnoreUnrecognizedVMOptions -XX:ReservedCodeCacheSize=256M -Xcomp metaspace.stressHierarchy.common.StressHierarchy3 -treeDepth 70 -minLevelSize 10 -maxLevelSize 100 -hierarchyType INTERFACES -triggerUnloadingByFillingMetaspace

Host: AArch64 Processor rev 0 (aarch64), 8 cores, 15G, Ubuntu 14.04.3 LTS
Time: Fri May 27 10:15:09 2016 PDT elapsed time: 26 seconds (0d 0h 0m 26s)

---------------  T H R E A D  ---------------

Current thread (0x0000007f9c029000):  GCTaskThread "GC Thread#0" [stack: 0x0000007fa344a000,0x0000007fa354a000] [id=3218]

Stack: [0x0000007fa344a000,0x0000007fa354a000],  sp=0x0000007fa3548740,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x3d2ec0]  CollectedHeap::fill_with_objects(HeapWord*, unsigned long, bool)+0xa8
V  [libjvm.so+0x4dc5b8]  RemoveSelfForwardPtrHRClosure::doHeapRegion(HeapRegion*)+0x190
V  [libjvm.so+0x4c2a54]  G1CollectedHeap::collection_set_iterate_from(HeapRegion*, HeapRegionClosure*)+0x3c
V  [libjvm.so+0x4dc26c]  G1ParRemoveSelfForwardPtrsTask::work(unsigned int)+0x9c
V  [libjvm.so+0x9db9ac]  GangWorker::loop()+0x50
V  [libjvm.so+0x9db888]  AbstractGangWorker::run()+0x2c
V  [libjvm.so+0x825fe4]  thread_native_entry(Thread*)+0x104
C  [libpthread.so.0+0x7e2c]  start_thread+0xb0


siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00000006d4c3093c
Comments
URL: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/ede22275fbfa User: lana Date: 2017-04-19 21:02:35 +0000
19-04-2017

URL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/4d6df9a75465 User: lana Date: 2017-04-19 21:02:23 +0000
19-04-2017

URL: http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/4d6df9a75465 User: dlong Date: 2017-04-12 21:47:55 +0000
12-04-2017

URL: http://hg.openjdk.java.net/jdk9/dev/jdk/rev/ede22275fbfa User: dlong Date: 2017-04-12 21:47:55 +0000
12-04-2017

This is a crash in JDK 9, so it should be fixed in 9.
06-04-2017

I changed the JBS category. [~alanb], please take a look.
31-03-2017

I am fine with the HotSpot changes, but the HotSpot changes are just one check and tests only. The fix is mostly in core libs. I think it should be approved by the Core Libs lead, Alan Bateman. Maybe we need to change the JBS category. You also need SQE approval.
31-03-2017

Fix Request: http://cr.openjdk.java.net/~dlong/8158168/webrev.2/
Fix needed to prevent heap corruption. The fix refactors code to place bounds checks on all paths to the getChar/putChar intrinsics, and adds a test for those methods in StringUTF16.
Test coverage: hs-tier0-comp, jdk-tier1, jdk-tier2, hotspot/test/:hotspot_all, noncolo.testlist, vm.compiler.testlist, vm.regression.testlist, nsk.regression.testlist, nsk.split_verifier.testlist, nsk.stress.testlist, nsk.stress.jck.testlist, jdk/test/:jdk_jfr, jdk/test/:svc_tools, jdk/test/:jdk_instrument, jdk/test/:jdk_lang, jdk/test/:jdk_svc, Kitchensink, runThese, Weblogic12medrec, jck/api/java_lang, jck/lang/EXPR, jck/api/signaturetest
Reviewed by vlivanov, thartmann, sherman.
Review thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-March/025858.html
30-03-2017
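
For illustration, a minimal sketch of the kind of bounds check the fix request above describes. This is not the actual webrev: the class and helper names are made up, and putCharRaw merely stands in for the unchecked, intrinsified accessor.

public class StringUTF16BoundsSketch {

    // Stand-in for the raw, intrinsified accessor; the assumption here is that
    // the real intrinsic performs no bounds check of its own.
    private static void putCharRaw(byte[] value, int index, int c) {
        value[index << 1]       = (byte) (c >> 8);
        value[(index << 1) + 1] = (byte) c;
    }

    // Shape of the fix described above: every path validates the index against
    // the current length of the byte[] before reaching the raw accessor, so a
    // racy caller (e.g. an unsynchronized StringBuilder) can no longer write
    // outside the array.
    static void putChar(byte[] value, int index, int c) {
        if (index < 0 || index >= (value.length >> 1)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        putCharRaw(value, index, c);
    }

    public static void main(String[] args) {
        byte[] value = new byte[2 * 4];   // room for four UTF-16 chars
        putChar(value, 3, 'x');           // in bounds
        putChar(value, 4, 'x');           // throws StringIndexOutOfBoundsException
    }
}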

incomplete or incorrect in the face of concurrent updates? I read it as the latter, based on what is described above - in which case the intrinsic cannot trust the caller to do the check and the whole arrangement is fundamentally broken. It seems AbstractStringBuilder cannot use the intrinsic.
21-03-2017

So the problem is the missing bounds check in the intrinsic? Unsynchronized access should never lead to a crash. How does CompactStrings avoid the problem?
21-03-2017

This happens on arm64 because it doesn't implement compact strings. I would expect the same problem on other platforms if -XX:-CompactStrings is used.
10-02-2017
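
For context, a hedged illustration of the suggestion above: the exposure can be forced on platforms that do implement compact strings by disabling the feature. -XX:-CompactStrings is a standard JDK 9 product flag; the main class below is only a placeholder.

java -XX:-CompactStrings <test-main-class>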

I've made some progress. I can reproduce the failure about 33% of the time now, with a fastdebug build, and with instrumentation to check for a bad array at arbitrary points in the Java code.
09-02-2017

Yes. The objects that are referenced are in the collection set, i.e. eden or some old regions (before gc, the previous gc's survivors are relabelled as eden). Eden regions are only written by the mutator, so the safepointing provides memory consistency. G1 also does not write into any old region in the collection set, so the object's values must have been written during the last gc too. The GC also does not overwrite or reuse the length field in these areas due to some hacks (however, the length field of the destination object for large j.l.O. arrays may not always contain the true length of the array). That one is synchronized across threads by the implicit memory barriers in the work-stealing protocol.
08-02-2017

Is it safe for G1ParScanClosure::do_oop_nv to examine the size of an object for the is_in_cset path where we push it to the queue?
08-02-2017

Does not reproduce after 1984 iterations, so reassigning. If you think I can help further in some way, tell me.
08-02-2017

[~tschatzl] If it doesn't reproduce with -Xint, go ahead and assign it back to me.
08-02-2017

I got a crash (with fastdebug) where the bad byte[] length was 0x00320044. The actual length appeared to be 74. The funny thing is, 0x00320044 divides evenly by 74. Coincidence?
07-02-2017
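
For reference, a quick check of the arithmetic in the comment above shows the division really is exact; the class name is only illustrative.

public class LengthCheck {
    public static void main(String[] args) {
        // 0x00320044 is 3,276,868 in decimal, and 74 * 44,282 is also 3,276,868.
        System.out.println(0x00320044 / 74);   // 44282
        System.out.println(0x00320044 % 74);   // 0, i.e. it divides evenly by 74
    }
}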

Isn't the pattern with the block size always 0x00xx00yy? And you are seeing different numbers after 11 seconds than I did after 30 seconds, so I wonder if these values are changing (increasing?) with time.
06-02-2017

This test has 30 threads doing the following:

V  [libjvm.so+0x945d10]  TypeArrayKlass::copy_array(arrayOopDesc*, int, arrayOopDesc*, int, int, Thread*)+0x244
V  [libjvm.so+0x619c48]  JVM_ArrayCopy+0x9c
J 35 java.lang.System.arraycopy(Ljava/lang/Object;ILjava/lang/Object;II)V java.base@9-internal (0 bytes) @ 0x000003ff8b58b734 [0x000003ff8b58b680+0x00000000000000b4]
v  blob 0x000003ff8b522938
j  java.lang.String.getBytes([BIB)V+22 java.base@9-internal
j  java.lang.AbstractStringBuilder.putStringAt(ILjava/lang/String;)V+25 java.base@9-internal
j  java.lang.AbstractStringBuilder.append(Ljava/lang/String;)Ljava/lang/AbstractStringBuilder;+30 java.base@9-internal
j  java.lang.StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;+2 java.base@9-internal
j  nsk.share.gc.gp.classload.GeneratedClassProducer.getNewName()Ljava/lang/String;+22
j  nsk.share.gc.gp.classload.GeneratedClassProducer.create(J)Ljava/lang/Class;+65
j  vm.share.gc.TriggerUnloadingByFillingMetaspace$FillMetaspace$FillMetaspaceTask.call()Ljava/lang/Object;+35
j  java.util.concurrent.FutureTask.run()V+39 java.base@9-internal
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+95 java.base@9-internal
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@9-internal
j  java.lang.Thread.run()V+11 java.base@9-internal

The StringBuilder is used to generate class names, and is not synchronized, but is shared between all threads. This results in lots of weird class names, and also plenty of ClassNotFoundException, ArrayIndexOutOfBoundsException and StringIndexOutOfBoundsException exceptions. These are the same class names that are showing up in the byte[] with the bad length.
03-02-2017
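
For illustration, a minimal sketch of the sharing pattern described above: one StringBuilder appended to by many threads with no synchronization. The class and constants are made up; this is not the nsk test code, only the shape of the race.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedBuilderRace {

    // One StringBuilder shared by all worker threads, never synchronized.
    private static final StringBuilder NAME = new StringBuilder("Class");
    private static int counter;   // also updated without synchronization

    static String getNewName() {
        NAME.append('_').append(counter++);   // racy append, as in GeneratedClassProducer
        return NAME.toString();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(30);
        for (int i = 0; i < 30; i++) {
            pool.execute(() -> {
                for (int j = 0; j < 100_000; j++) {
                    try {
                        getNewName();
                    } catch (RuntimeException e) {
                        // Expected under the race: ArrayIndexOutOfBoundsException,
                        // StringIndexOutOfBoundsException, garbled names, ...
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}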

Nevermind, it fails if "end" is at the exact end of the region.
02-02-2017

Is this check safe? I hit this assert even on x64 (unfortunately I didn't get a core file):

HeapWord* end = obj_addr + obj_size1;
if (_hr->is_young()) {
  guarantee(_hr->is_in(end), "BEFORE, object is out of region(" PTR_FORMAT ")", p2i(end));
}
02-02-2017

[~tschatzl] "1 failure on AMD out of 800+ runs." comment and earlier.
01-02-2017

@[~dlong]: in your investigation, which of your comments relate to successful reproduction without [~sangheki]'s patch? As mentioned above, that patch causes crashes itself...
31-01-2017

The current code to reproduce the issue is buggy, i.e. both verification patches attached to this issue so far. An oop* with a reference into to-space on the task queue might come from the fact that we don't guarantee that cards are only iterated over once. I.e. a particular oop* p might be pushed onto the task queue twice: the first pop fixes up the value of *p, while the second may, in these verification patches, access the object at *p (in to-space) with that memory not necessarily completely visible yet. (The comment that refers to that in G1ParScanThreadState::do_oop_evac() should be improved.) So the actual crash issue is something else.
31-01-2017

Actually there should not be a reference from the to-space in the survivor/old gen on the task queue at all.
31-01-2017

The problem is that the klass value seems to be not visible in one thread after it has been popped from the task queue (to which it had been added by another thread). G1 requires this, as do other collectors iirc.

Attached a patch that:
- prints a message with information about a PLAB when it has been allocated
- for the first object copied into the PLAB, prints its source (and destination)
- adds a check to verify that, after popping an object from the task queue, its klass is usable

Example output of the crash:

guarantee(!Klass::is_null(_metadata._compressed_klass)) failed: 0x00007f84c4051000 klass of oop is null, oop is 0x00000005e7043590 klass 0

I.e. thread 0x00007f84c4051000 is going to process the object at 0x00000005e7043590, but found its klass has value 0. Searching the log for 0x00000005e7043590 gives:

[25.338s][info][gc ] GC(6) New PLAB bottom 0x00000005e7043590 top 0x00000005e7043590 end 0x00000005e704e918 hard_end 0x00000005e704e928 size 5747
[...]
[25.338s][info][gc ] GC(6) alloc new plab 0x00000005e7043590 refilled 0
[25.381s][info][gc ] GC(6) 0x00007f84c404d800 old 0x00000005e881e078 new 0x00000005e7043590 sz 21
[25.381s][info][gc ] GC(6) retire PLAB bottom 0x00000005e7043590 top 0x00000005e704e900 end 0x00000005e704e918 hard_end 0x00000005e704e928 size 5747
[...]

I.e. the thread (thread 0x00007f84c404d800, from the next message) allocated a PLAB starting at 0x00000005e7043590 and copied an object from 0x00000005e881e078 to 0x00000005e7043590. GDB'ing the associated core file shows that the klass field (offset 8) contains the same (non-zero) values at both the source and destination:

(gdb) x/10x 0x00000005e881e078
0x5e881e078: 0xe7043593 0x00000005 0xf80000f5 0x00000093
0x5e881e088: 0xbebafeca 0x35000000 0x00070900 0x07000706
0x5e881e098: 0x01080007 0x6f530a00
(gdb) x/10x 0x00000005e7043590
0x5e7043590: 0x00000009 0x00000000 0xf80000f5 0x00000093
0x5e70435a0: 0xbebafeca 0x35000000 0x00070900 0x07000706
0x5e70435b0: 0x01080007 0x6f530a00

I.e. the compressed klass pointer is 0xf80000f5 in both.
31-01-2017

I can reproduce the crashes locally with fastdebug builds with Sangheon's patches typically after a few young-only gcs (no marking running). The bad object is always at the start of a new PLAB (I think Dean meant that when talking about TLAB; they are basically the same, just during GC). -XX:-UseTLAB does not disable PLABs. There is no way to disable PLABs. The most common issue I can reproduce is that the narrow klass is zero ("narrow klass value can never be zero") for these objects. So far they were all located in old gen.
31-01-2017

Also, with the verify_size() patch, I was seeing crashes even before we hit a VM_CollectForMetadataAllocation.
31-01-2017

The test crashes with -Xint.
27-01-2017

Some more observations:
1) I am seeing crashes where we are looking at a bad Klass* located at 0x800000000 (narrow klass base), so we must have read a narrow klass value of 0, but in the core dump the value is no longer 0. The object is in an Old region.
2) The bad array length values look suspiciously like these fields from Klass: _layout_helper = 32, _super_check_offset = 56
27-01-2017

[~sangheki] Thanks for the verify_size() patch. It's very useful for hitting the problem sooner. In my analysis, the oop map is actually correct, but the code is inlined and optimized so that gdb displays the wrong value. So we have a StringBuilder object which contains a byte[] field. That array does not appear to cross regions, because I see what appears to be forwarded objects following it. The only thing wrong with the array is its length is bad.
26-01-2017

It is okay. VM_CollectForMetadataAllocation may also start concurrent mark (actually the first thing that is tried) by executing an initial-mark young gc. That one might have an evacuation failure that needs to remove self-forwarding pointers. I.e. the code path is entirely possible.
26-01-2017

[~tschatzl] "it seems that an initial mark is triggered for this VM operation" Does that mean removing self forwarding pointers is OK during CollectForMetadataAllocation? If not, I can provide a VM Thread stack trace.
26-01-2017

If I use -XX:-UseTLAB, I still see the large [I arrays, so maybe those aren't TLABs. The crash with TLABs off had a different bad length value of 0x00390020.
25-01-2017

After reproducing this several times on xgene, I have noticed the following pattern. The first object in the heap region looks like a retired TLAB, because its Klass is [I but its contents look like smaller objects. Following the [I are one or more String and [B objects. The problem is when the [B object has a bad length. Because compressed oops are on, the length follows the 32-bit compressed klass pointer:

0x6d4532698: 0x00000001
0x6d453269c: 0x00000000
0x6d45326a0: 0x000007a8
0x6d45326a4: 0x00380020
0x6d45326a8: 0x006c0043
0x6d45326ac: 0x00730061
0x6d45326b0: 0x00300073
0x6d45326b4: 0x005f005f
0x6d45326b8: 0x005f005f

The bad length is always the same (0x00380020) and looks suspiciously like '8' and ' '. Usually the bad array is immediately after the [I TLAB, but not always.
24-01-2017
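
For illustration, decoding the words at and after the corrupted length field as little-endian UTF-16 code units (values copied from the dump above); the result is consistent with the 03-02-2017 observation that the generated class names show up in the byte[] with the bad length. The class name is only illustrative.

public class DumpDecode {
    public static void main(String[] args) {
        // Words copied from the dump above, starting at the corrupted length field.
        int[] words = {0x00380020, 0x006c0043, 0x00730061, 0x00300073, 0x005f005f, 0x005f005f};
        StringBuilder sb = new StringBuilder();
        for (int w : words) {
            sb.append((char) (w & 0xFFFF)).append((char) (w >>> 16));
        }
        System.out.println(sb);   // " 8Class0____", a fragment of the generated class names
    }
}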

[~tschatzl]: thanks for adding the information. Attached 8158168_verifier.diff, which was used for the testing.
30-06-2016

[~sangheki], could you please provide the patch with the verify_size() method?
30-06-2016

Hi Rahul, can you please try to reproduce this issue (once you have time)? Thank you! Best regards, Zoltan
30-06-2016

Also, Sangheon reported that some byte arrays (apparently the "character array" of the StringBuilder) were allocated such that they crossed regions. This must never happen, and it explains the values in the stack trace in Sangheon's comment about the "start address is larger than end address" in https://bugs.openjdk.java.net/browse/JDK-8158168?focusedCommentId=13952842&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13952842 . He also could not reproduce the issue with -Xint (within 1000 iterations).
30-06-2016

Compiler team, please look at this issue. The oop map has a negative value and sometimes very large positive values (from frame 11 in the stack trace below):

(gdb) p map->_offset
$1 = -1632484432
(gdb) p map->_count
$2 = 127

G1ParScanThreadState::copy_to_survivor_space() {
  obj->iterate_backward(); // ==> obj is StringBuilder
  ...
  InstanceKlass::oop_oop_iterate_oop_map_reverse() {
    T* const start = (T*)obj->obj_field_addr<T>(map->offset());
    T* p = start + map->count();
    while (start < p) {
      Devirtualizer<nv>::do_oop(closure, p); // ==> p is byte array and oop::size() is ok but it doesn't fit into current region.
    }
  }
}

The original SIGSEGV occurred as we are trying to zap a dead object which has a wrong (very large, negative) size. From the original core dump, the start address is larger than the end address and this results in the crash. However, this is due to the wrong oop map. The zap operation is started as a part of post evacuation failure. And I observed that the oop map is broken when we process G1ParScanThreadState::copy_to_survivor_space(). The stack trace below was created at copy_to_survivor_space() when we have the wrong oop map.

Stack trace:
(gdb) bt
#0  0x0000007f9ed91814 in read () from /home/gtee/8158168/rt2_20/libpthread.so.0
#1  0x0000007f9e854f74 in os::message_box (title=title@entry=0x7f9ea82b20 "Unexpected Error", message=message@entry=0x7f9ebf4300 <VMError::report_and_die(int, char const*, char const*, std::__va_list, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)::buffer> "Internal Error at g1CollectedHeap.inline.hpp:323, pid=22740, tid=22742\nguarantee(is_in) failed: verify_size, -, ill, from mark, region=0x0000007f602ab580 'oop(0x00000007048d8b40, [B) + oop->size(37356"...) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/os/linux/vm/os_linux.cpp:5149
#2  0x0000007f9e857060 in os::start_debugging (buf=buf@entry=0x7f9ebf4300 <VMError::report_and_die(int, char const*, char const*, std::__va_list, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)::buffer> "Internal Error at g1CollectedHeap.inline.hpp:323, pid=22740, tid=22742\nguarantee(is_in) failed: verify_size, -, ill, from mark, region=0x0000007f602ab580 'oop(0x00000007048d8b40, [B) + oop->size(37356"..., buflen=buflen@entry=2000) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/os/linux/vm/os_linux.cpp:6091
#3  0x0000007f9e9ed4cc in show_message_box (buflen=<optimized out>, buf=<optimized out>) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/utilities/vmError.cpp:1394
#4  VMError::report_and_die (id=id@entry=-536870912, message=message@entry=0x7f9ea48bb0 "guarantee(is_in) failed", detail_fmt=0x7f9ea48bc8 "verify_size, %s, %s, %s, region=0x%016lx 'oop(0x%016lx, %s) + oop->size(%lu)' (0x%016lx) exceeds region->end(0x%016lx, %s) age=%u", detail_args=..., thread=0x7f98029800, pc=pc@entry=0x0, siginfo=siginfo@entry=0x0, context=context@entry=0x0, filename=filename@entry=0x7f9ea48b50 "/scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1CollectedHeap.inline.hpp", lineno=lineno@entry=323, size=size@entry=0) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/utilities/vmError.cpp:1155
#5  0x0000007f9e9edc20 in VMError::report_and_die (thread=<optimized out>, filename=filename@entry=0x7f9ea48b50 "/scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1CollectedHeap.inline.hpp", lineno=lineno@entry=323, message=message@entry=0x7f9ea48bb0 "guarantee(is_in) failed", detail_fmt=<optimized out>, detail_args=<error reading variable: Cannot access memory at address 0xffffffbb>) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/utilities/vmError.cpp:1103
#6  0x0000007f9e46a8f0 in report_vm_error (file=file@entry=0x7f9ea48b50 "/scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1CollectedHeap.inline.hpp", line=line@entry=323, error_msg=error_msg@entry=0x7f9ea48bb0 "guarantee(is_in) failed", detail_fmt=detail_fmt@entry=0x7f9ea48bc8 "verify_size, %s, %s, %s, region=0x%016lx 'oop(0x%016lx, %s) + oop->size(%lu)' (0x%016lx) exceeds region->end(0x%016lx, %s) age=%u") at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/utilities/debug.cpp:224
#7  0x0000007f9e51ff38 in verify_size (p=0x7048d8b40, this=<optimized out>) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1CollectedHeap.inline.hpp:320
#8  verify_size (ref=0x7030e60a4, this=<optimized out>) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1CollectedHeap.inline.hpp:330
#9  G1ParScanClosure::do_oop_nv<unsigned int> (this=0x7f7c0059f0, p=0x7030e60a4) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1OopClosures.inline.hpp:70
#10 0x0000007f9e5189bc in do_oop<G1ParScanClosure, unsigned int> (p=0x7030e60a4, closure=0x7f7c0059f0) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/memory/iterator.inline.hpp:72
#11 oop_oop_iterate_oop_map_reverse<true, unsigned int, G1ParScanClosure> (this=<optimized out>, closure=0x7f7c0059f0, obj=0x7030e6090, map=0x80000e478) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/oops/instanceKlass.inline.hpp:58
#12 oop_oop_iterate_oop_maps_specialized_reverse<true, unsigned int, G1ParScanClosure> (closure=0x7f7c0059f0, obj=0x7030e6090, this=<optimized out>) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/oops/instanceKlass.inline.hpp:104
#13 oop_oop_iterate_oop_maps_reverse<true, G1ParScanClosure> (closure=0x7f7c0059f0, obj=0x7030e6090, this=<optimized out>) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/oops/instanceKlass.inline.hpp:132
#14 oop_oop_iterate_reverse<true, G1ParScanClosure> (closure=0x7f7c0059f0, obj=0x7030e6090, this=<optimized out>) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/oops/instanceKlass.inline.hpp:165
#15 InstanceKlass::oop_oop_iterate_backwards_nv (this=<optimized out>, obj=0x7030e6090, closure=0x7f7c0059f0) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1OopClosures.cpp:64
#16 0x0000007f9e52393c in oop_iterate_backwards (blk=0x7f7c0059f0, this=0x7030e6090) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/oops/oop.inline.hpp:726
#17 G1ParScanThreadState::copy_to_survivor_space (this=this@entry=0x7f7c005880, state=..., old=<optimized out>, old_mark=0x5) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1ParScanThreadState.cpp:325
#18 0x0000007f9e524274 in do_oop_evac<unsigned int> (from=0x7f60296dc0, p=0x70319db70, this=0x7f7c005880) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1ParScanThreadState.inline.hpp:47
#19 deal_with_reference<unsigned int> (ref_to_scan=0x70319db70, this=0x7f7c005880) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1ParScanThreadState.inline.hpp:120
#20 dispatch_reference (ref=..., this=0x7f7c005880) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1ParScanThreadState.inline.hpp:129
#21 G1ParScanThreadState::trim_queue (this=this@entry=0x7f7c005880) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1ParScanThreadState.cpp:144
#22 0x0000007f9e4f5df8 in G1ParEvacuateFollowersClosure::do_void (this=this@entry=0x7f9d220870) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1CollectedHeap.cpp:3529
#23 0x0000007f9e4fad08 in G1ParTask::work (this=0x7f7884e080, worker_id=0) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/g1/g1CollectedHeap.cpp:3593
#24 0x0000007f9ea0e52c in run_task (data=..., this=0x7f98029800) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/shared/workgroup.cpp:327
#25 GangWorker::loop (this=0x7f98029800) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/shared/workgroup.cpp:337
#26 0x0000007f9ea0e408 in AbstractGangWorker::run (this=0x7f98029800) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/share/vm/gc/shared/workgroup.cpp:286
#27 0x0000007f9e857200 in thread_native_entry (thread=0x7f98029800) at /scratch/opt/jprt/T/P1/041108.sangheki/s/hotspot/src/os/linux/vm/os_linux.cpp:688
#28 0x0000007f9ed89e48 in start_thread () from /home/gtee/8158168/rt2_20/libpthread.so.0
#29 0x0000007f9ecd8610 in clone () from /home/gtee/8158168/rt2_20/libc.so.6
30-06-2016

The stack trace seems odd. We are currently in a "CollectForMetadataAllocation" VM operation, i.e. full gc according to the log:

Event: 26.796 Executing VM operation: CollectForMetadataAllocation

During full gc that task (removing self forwarding pointers) should never be executed. This is a phase only ever executed during young gc. What's the stack trace of the VM thread?

Edit: it seems that an initial mark is triggered for this VM operation.
29-06-2016

Right, as you already expected, the segv still happens with your patch. The crash happened after 187 iterations, which is usual.
29-06-2016

This issue is an error within the pause. The barrier changes won't help here.
29-06-2016

Two comments from Thomas at JDK-8159864:
-----------------------------------------------------
(attached: assert-to-guarantee.diff)
This is the change that modifies one assert to a guarantee as discussed recently. I think, particularly for this test, you could add an upper bound for the object size of maybe 32 * M too. (From the logs the max object size is 1M.)
-----------------------------------------------------
(attached: some-barreirs.diff)
Patch adds storestore/loadload barriers (in a somewhat conservative way) to see if the test uses garbage values when scanning objects. Potential partial fix for JDK-8160369. Note that I think there is still a missing storestore barrier that is needed after setting the layout helper in a Klass - I could not find a good place, and maybe it's already done somewhere (before returning the klass pointer to store it somewhere). (There is no need for a loadload barrier on our supported platforms, as the layout_helper is an address-dependent load from the klass pointer.) This does not seem to be a problem either in this particular test case, as the amount of class loading done here is minimal and should be finished by the time we experience the crashes.
28-06-2016