Bug ID: JDK-8237859 C2: Crash when loads float above range check

JDK 11	JDK 14	JDK 15
11.0.8-oracleFixed	14.0.2Fixed	15 b16Fixed

Fix request (11u) -- will label after testing completed. I would like to downport this for parity with 11.0.8-oracle. Applies clean.
14-04-2020
Fix Request (JDK 14.0.2) This patch fixes a crash that often manifests with ZGC. The fix is medium risk and applies cleanly to JDK 14.0.2. In addition to CI testing through all tiers in JDK 15 and 11u, additional testing will be executed in JDK 14.0.2 before pushing.
08-04-2020
URL: https://hg.openjdk.java.net/jdk/jdk/rev/5d3f6f0582fe User: chagedorn Date: 2020-03-25 15:17:24 +0000
25-03-2020
The following patch is an incremental patch to JDK-8240227 [1] which is a prerequisite for this fix: https://bugs.openjdk.java.net/browse/JDK-8237859 http://cr.openjdk.java.net/~chagedorn/8237859/webrev.incremental.00/ (incremental changes to [1]) http://cr.openjdk.java.net/~chagedorn/8237859/webrev.01/ (combined changes, JDK-8237859 with [1]) In test(), the loop is unswitched and the slow and fast loop are unrolled once. For both loop versions, pre/main/post loops are created. The pre-loop is executed once and the main-loop represents 2 iterations. Depending on the value of flag, wrappers_array can have length 2 or 3. But the unrolled loop bodies contain an access to wrappers_array[1] and wrappers_array[2]. This means that in some cases (when flag is true and wrappers_array has length 2), the main-loop is not entered and the out-of-bounds access of wrappers_array[2] should not be exeucted. Therefore, the corresponding LoadP node should not be scheduled before the If node that decides if the main-loop is executed or not. However, this is exactly what happens: The LoadP node is scheduled before the If node which decides if the main-loop should be executed resulting in a segfault. Background: A first observation was that the loop is additionally unswitched. We do not process and rewire control edges of range check predicates to data nodes belonging to the loop (for example, 366 IfTrue to 247 LoadP in [2] for test()) when unswitching the loop (before doing pre/main/post and unrolling). As a result, the slow and fast loop do not have any predicates before the loop header node. Some loads could now even be performed before the "loop-selection-if" itself (which either selects the slow or fast loop). This can be seen in the mach graph [3] where 145 jmpCon is the loop-selection-if that selects between the fast and slow loop and 161 zLoadP reads from index 2 (which is out of bounds when the wrappers_array has only length 2). This load can now be scheduled way too early before deciding if the main-loop is executed (and in fact is scheduled too early resulting in a segfault). The fix for JDK-8240227 [1] addresses this problem for loop unswitching and clones range check predicates to both unswitched loops and updates the control edges to the data nodes to the new predicates accordingly. This first seemed to fix this bug. However, it is not enough when only ensuring that the data nodes are not scheduled before the loop-selection-if. They can still be scheduled before the If that selects if the main-loop should be entered when enabling -XX:+StressGCM. This can even be observed when running test3() with -XX:+StressGCM which does not do any loop unswitching. A segfault occurs after a few runs. The problem is when creating pre/main/post loops, we do not rewire the control inputs from the range check predicate projections to data nodes of the main- and post-loop body to the main- and post-loop. For example, in [4], 247 LoadP belongs to the main-loop, 377 LoadP belongs to the post-loop and 418 LoadP to the pre-loop. But the control edges of the LoadP nodes belonging to the main- and post-loop (247, 377) are not updated. As a result, all new loads created at unrolling also have a control input from 366 IfTrue. These loads belonging to the main-loop can then end up being scheduled before the If that decides if the main-loop should be entered resulting in a segfault. This fix addresses this issue and rewires the control input of data nodes belonging to the main-loop to a range check predicate before the pre-loop to the copied main-loop range check predicates (copied in PhaseIdealLoop::copy_skeleton_predicates_to_main_loop_helper). When unrolling, the range check predicates of the main-loop are updated together with the control edges (done in PhaseIdealLoop::update_main_loop_skeleton_predicates). The data nodes belonging to the post-loop are rewired to the zero-trip guard If node right before the CountedLoopNode (there are no predicates copied down to the post-loop). For example, in [5] (in contrast to [4]), we rewired the control of 247 LoadP to the main-loop and the control of 377 LoadP to the post-loop. I included some renaming for predicate related code that should make the difference between the various predicate types and how they are used clearer (skeleton, copies, updated copies...). [1] http://cr.openjdk.java.net/~chagedorn/8240227/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8240227 [2] https://bugs.openjdk.java.net/secure/attachment/87123/LoadP_control_Input.png [3] https://bugs.openjdk.java.net/secure/attachment/87125/wrong_controls_in_mach_graph.png [4] https://bugs.openjdk.java.net/secure/attachment/87261/wrong_control_edges_pre_main_post.png [5] https://bugs.openjdk.java.net/secure/attachment/87262/fixed_control_edges_pre_main_post.png
11-03-2020
A fix for JDK-8240227 [1] is currently being tested which clones a range check predicate down to both unswitched loop versions if it has control edges to data nodes that are specific to one of the unswitched loops (part of the original, unswitched loop and therefore cloned to the slow loop while the original node belongs to the fast loop). It also rewires the control inputs to the original predicates to the corresponding cloned version. A range check predicate is not cloned if it has only control edges to data nodes that are not part of the (original, unswitched) loop anymore. Range check predicates without any control edges to data nodes (like a range check for the loop condition variable) are cloned. The fix for JDK-8240227 is needed for the current fix of JDK-8237859 [2] on top of it which rewires control edges to data nodes from the (already cloned) range check predicates of the preloop in the main loop. Additionally, we also rewire all control inputs to predicates of those data nodes belonging to the post loop. Since we do not have any predicates for the post loop, we just use the projection of the zero-trip guard if node as control input. A first run of the Lucene-Solr test suite with ZGC did not show any crashes. I will run some additional testing over the weekend. [1] http://cr.openjdk.java.net/~chagedorn/8240227/webrev.00/ [2] http://cr.openjdk.java.net/~chagedorn/8237859/webrev.01/ (including changes for [1])
06-03-2020
I'm pretty sure now that we need to copy down the loop predicates to the unswitched loop versions. I played around with some existing overunrolling tests to additionally enable loop unswitching for them and found a crash which must be related to those missing copies in the unswitched loop versions. I filed JDK-8240227 to fix this issue. The current fix [1] for copying down any control edges from loop predicates to the main and to the post loop (without considering loop unswitching) will only work once the loop predicates are part of the unswitched loops (i.e. when JDK-8240227 is fixed). Only then, they will be found together with the control edges to the LoadP nodes and can be moved to the main and post loop and fixing this bug. The original thought "fix" [2] only works because GCM does not schedule it too early. But it does not fix the problem. [1] http://cr.openjdk.java.net/~chagedorn/8237859/webrev.00/ [2] http://cr.openjdk.java.net/~chagedorn/8237859/webrev.initial/
28-02-2020
Latest updates/insights: We initially traced the problem back to loop unswitching where we do not consider control inputs from RangeCheck projection nodes to LoadP nodes (for example, 247 LoadP in [1]) when unswitching the loop. As a result, some loads are performed even before the "loop unswitching if" itself. This can be seen in [2] where 145 jmpCon is the "loop unswitching if" that selects between the fast and slow loop and 161 zLoadP reads from index 2 which is out of bounds when the wrappers_array has only 2 elements - this load is scheduled way too early. A first fix tried to move the loads down to the corresponding fast or slow loop. That seemed to work [3]. However, [~stefank] investigated further together with [~neliasso] and [~phedlin] and found that the IR also looks wrong without loop unswitching right after creating the pre-main-post loops. But it did not crash anymore. I reran the Reproducer2.java with -XX:+StressGCM and after some repeated runs it crashed! This indicates that the original problem is somewhere else. I found an even simpler test together with [~thartmann], Reproducer3.java [4], that crashes with -XX:+StressGCM after some runs. A closer analysis suggests that when creating the pre-main-post loops, we do not rewire the control inputs from the RangeCheck projections (like 247 LoadP in [1]) from loop predication down to the main and post loop. When later optimizing the loads in the main loop it assumes that the loop is executed twice (i.e. with the array of length 3). If that is not the case it does not enter the main loop. But since the load at index 2 has a control input from the RangeCheck projection before the pre loop, it can be scheduled before deciding if the main loop is executed or not, resulting in a crash. I'm currently working on a fix to rewire the control inputs from the RangeCheck nodes to the main loop and post loop (right before them). However, loop unswitching is still a problem since we do not copy the loop predication RangeCheck nodes (for example, RangeCheck 360 in [1]) to neither the fast nor the slow loop. Therefore, when creating the pre-main-post loop we do not find any RangeCheck nodes anymore and therefore cannot apply the very same fix. This case needs some more thought. Maybe we need to also clone those RangeChecks down at unswitching. [1] https://bugs.openjdk.java.net/secure/attachment/87123/LoadP_control_Input.png [2] https://bugs.openjdk.java.net/secure/attachment/87125/wrong_controls_in_mach_graph.png [3] http://cr.openjdk.java.net/~chagedorn/8237859/webrev.initial/ [4] https://bugs.openjdk.java.net/secure/attachment/87124/Reproducer3.java
27-02-2020
The problem is even earlier than update_skeleton_predicates. Because of the unswitching of the code in PhaseIdealLoop::duplicate_predicates (called from insert_pre_post_loops) doesn't recognize the graph pattern and doesn't copy the "predicate" and the "skeleton predicate". In a run with out loop unswitching (-XX:-LoopUnswitch) we have this control graph when entering duplicate_predicates: RangeCheck : for "skeleton predicate" IfTrue If : for "predicate" (Note: This and previous predicate binds the IfFalse leg to the same region. Something the code looks for) IfTrue If : for "profile predicate" IfTrue If : for "loop limit check" IfTrue CountedLoop This is what it looks like with loop unswitching (-XX:+LoopUnswitching): --- Same as above --- RangeCheck : for "skeleton predicate" IfTrue If : for "predicate" IfTrue If : for "profile predicate" IfTrue If : for "loop limit check" --- But then comes the unswitching and cloned predicates --- If : Take us to one or the other loops created by the "unswitching" IfFalse (or IfTrue) If : for "predicate" IfTrue If : for "profile predicate IfTrue (note "loop limit check" is removed in unswitched loops) CountedLoop The duplicate_predicates searches from the last (unswitched) CountedLoop, finds the node for "predicate", but can't find the matching "skeleton predicate".
24-02-2020
[~chagedorn] Simplified the reproducer further - I've uploaded this version as Reproducer2.java
24-02-2020
Latest insights: It looks like the code has undergone loop prediction: https://wiki.openjdk.java.net/display/HotSpot/LoopPredication But it's odd that from the IfTrue node (in the out edges) of that range check, we have the expected control flow, but we also immediately have LoadPs to array[0], array[1], array[2] without any further range checks. I see that there used to be a "skeleton range check" between the IfTrue node and the LoadPs, but that gets eliminated. The skeleton range check is added with this function: --- // After pre/main/post loops are created, we'll put a copy of some // range checks between the pre and main loop to validate the value // of the main loop induction variable. Make a copy of the predicates // here with an opaque node as a place holder for the value (will be // updated by PhaseIdealLoop::clone_skeleton_predicate()). ProjNode* PhaseIdealLoop::insert_skeleton_predicate(IfNode* iff, IdealLoopTree loop, --- clone_skeleton_predicate is called by: - duplicate_predicates_helper - update_skeleton_predicate The latter is called when a loop is unrolled, which is the case for the main loop in this code. It is unrolled two visit two elements. update_skeleton_predicates have this comment --- void PhaseIdealLoop::update_skeleton_predicates(Node ctrl, CountedLoopNode* loop_head, Node* init, int stride_con) { // Search for skeleton predicates and update them according to the new stride --- So, with new stride (two instead of one element) we should update this skeleton. I traced the call to update_skeleton_predicates and we don't seem to perform the update ... The code immediately exits through: --- if (proj->unique_ctrl_out()->Opcode() != Op_Halt) { break; } --- What's more interesting is that if I turn off LoopUnswitching, the update_skeleton_predicates function actually do perform the mentioned update! LoopUnswitching takes the Reproducer code: --- for (Wrapper wrapper : wrappers_array) { wrapper.maybeMaskBits(two); } --- With inlining: --- for (Wrapper wrapper : wrappers_array) { if (b) { wrapper.longs &= 0x1F1F1F1F; } } --- and turns it into: --- if (b) { for (Wrapper wrapper : wrappers_array) { wrapper.longs &= 0x1F1F1F1F; } } else { for (Wrapper wrapper : wrappers_array) { // Eventually nothing and the else branch disappears } } --- My best guess so far is that the unswitching causes the update_skeleton_predicates to not understand the graph, failing to update the range check stride, and incorrectly getting the range check removed.
20-02-2020
The above check was simply the "upper bounds range check". Focusing on loop peeling instead. The original loop is split into two because of the if (b) check in the reproducer. The following focusing on one of these two loops: * The loop is split into a pre-loop, main-loop, post-loop - The pre-loop strides one element at a time - The main-loop strides two elements - Using an AddI(..., 2) for the index - The post-loop strides one element * The main-loop is removed in "PhasedIdealLoop iterations" * The pre-loop is removed in "Macro expand" If we zoom in on the graph after "PhaseIdealLoop iterations": - The pre-loop iterates over all elements - There's a check after the pre-loop whether the 'index < length - 1', which I guess is a check to see if we can take a two-element stride. This is weird, becuase the main-loop is removed and we shouldn't be striding two-elements any more. If we follow the IfTrue branch for that check, we find another weird thing, in that the AddI(..., 2) node from the main-loop is then used to increment the index. I see in the graphs that after "PhaseIdealLoop iterations" that the AddI(..., 2) node used in the main-loop is used to pre-increment the index coming out from the pre-loop, and then add it to AddP(.., array, 32). This seems to be remnants of the main-loop even though it has already been removed. My guess is that later, when we remove the pre-loop in "Macro expand", the post-loop is already broken and we get the resulting, broken code. I think the next step is to investigate why the AddI(..., 2) stride node is not removed with the main-loop.
19-02-2020
I've been looking at Ideal graphs and stepping through the code with rr. I see that PhaseIdealLoop::rc_predicate converts the range check into something that looks really weird to me: if (array.length - 1 < array.length) Later code that calculates the offset into the array, looks at these nodes and determine that the type of the cmp is int:1..2, and converts the CastII node to type int:1. After that many nodes are constant folded and we end up with an offset outside of the array (40 = 24 + 8 + 8) [one of the eights come from the 1 above]. The code that figures out that the type must be int:1 is here: const Type* CastIINode::Value(PhaseGVN* phase) const { ... if (m == BoolTest::le \|\| m == BoolTest::lt) { hi_long = in2_t->_hi; if (m == BoolTest::lt) { hi_long -= 1; } which seems to check if there's a dominating CmpI/If that can sharpen the type. This stops crashing if I run with -XX:-UseLoopPredicate.
18-02-2020
Attached: Reproducer.java - since JBS messes with the whitespaces.
17-02-2020
I've managed to create a small, stand-alone reproducer for this bug: --- import java.util.Random; public class Reproducer { static Random random = new Random(0); static Object dummy; public static void main(String... args) { int length = Integer.parseInt(args[0]); for (int x = 0; x < length; x++) { dummy = makeRandomQuery(); } } private static class Wrapper { long longs; public void maybeMaskBits(boolean b) { if (b) { longs &= 0x1F1F1F1F; } } } private static String makeRandomQuery() { Wrapper w1 = new Wrapper(); Wrapper w2 = new Wrapper(); Wrapper w3 = new Wrapper(); boolean one = false; boolean two = random.nextBoolean(); Wrapper[] wrappers_array = one ? new Wrapper[] {w1} : (two ? new Wrapper[] {w1, w2} : new Wrapper[] {w1, w2, w3}); boolean b = random.nextBoolean(); for (Wrapper wrapper : wrappers_array) { wrapper.maybeMaskBits(b); } return " " + one; } } --- With patch above, this reproduces with: javac Reproducer.java && java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ZCollectionInterval=1 -cp . Reproducer 100000000 Giving a crash that says something like this: # guarantee((addr & 0x7) == 0) failed: Should not have bad bits: 0x000006bebaadbabe It probably reproduces without the patch, but the patch makes us crash in the load barrier of the problematic read. Without the patch, we crash later when the marking threads find the broken oop.
17-02-2020
FWIW this reproducer also crashes with Shenandoah.
14-02-2020
< 1 minute reproducer: - Apply the attached guarantee.patch file - export ANT_HOME=/usr/share/ant/ - export JAVA_HOME=/home/stefank/hg/jdk/jdk/build/fastdebug/images/jdk/ - cd ~/src/solr-8.4.1/solr/core - ant -autoproxy test -Dargs="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xms512m -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=0 -XX:PrintIdealGraphFile=graph.xml -XX:CompileCommand=option,org.apache.solr.search.TestFiltering::makeRandomQuery,intx,IGVPrintLevel,3 -XX:ZCollectionInterval=1" -Dtests.jvms=1 -Dtests.heapsize=512m -Dtestcase=TestFiltering -Dtestmethod=testRandomFiltering -Dtests.iters=100
14-02-2020
Yes, after fixing it I will backport to 14 update. That is the plan.
06-02-2020
As far as I can see, this bug affects all GCs, not just ZGC. There's no guarantee that the allocated array has a small buffer after it that is safe to read. The array might not even be allocated in a TLAB at all. It just happens to be easier to catch this bug with ZGC since the problem will be detected by the load barrier. If we defer this out of 14, I think we should at least work on a fix for 14.0.1.
06-02-2020
Deferral Request I need more time to work on solution. The issue is not new and C2 related where loads are moved above range check. which may be fine in other than ZGC - load will be from TLAB after array and we have small buffer at the end of TLAB which prevents access outside TLAN so no SEGV and load results is ignored. With ZGC we have load barriers which catch the issue.
06-02-2020
Crashes with latest jdk/jdk (0f53754d8577).
30-01-2020
Copy conversation form Slack: Stefan Karlsson 2 day ago We are seeing a JDK 14 ZGC Lucene/Solr crash that seems to be a C2 bug. It creates a two-element array with this funny code shape: boolean exclude = facetQuery ? false : random().nextBoolean(); // can't exclude a facet query from facetingFixedBitSet[] sets = facetQuery ? new FixedBitSet[]{model.facetQuery} : (exclude ? new FixedBitSet[]{model.answer, model.facetQuery} : new FixedBitSet[]{model.answer, model.multiSelect, model.facetQuery}); but then we index the third element of that array (so, out-of-bounds).It almost looks like: //------------------------------optimize_trichotomy-------------------------- // Optimize nested comparisons of the following kind: // // int compare(int a, int b) { // return (a < b) ? -1 : (a == b) ? 0 : 1; // } Could someone take a quick look? It's fairly hard to reproduce, but we have an instance of this failure in the debugger.https://bugs.openjdk.java.net/browse/JDK-8237859 (edited) 23 replies Vladimir Kozlov 1 day ago But last array has 3 elements: new FixedBitSet[]{model.answer, model.multiSelect, model.facetQuery}) Stefan Karlsson 1 day ago When this crashes the array has only two elements. Vladimir Kozlov 1 day ago may be branches messed up Vladimir Kozlov 23 hours ago @skarlsso can you attach disassembler code for whole method to bug report. Especially around code which loads barriers. My suspicion is that code is moved above range check. Stefan Karlsson 23 hours ago @vkozlov file is uploaded Stefan Karlsson 23 hours ago In the assembly the loads seems to come from this code: for (FixedBitSet set : sets) { set.clear(0,l); if (u + 1 < model.indexSize) { set.clear(u+1, model.indexSize); } } Stefan Karlsson 23 hours ago We see code from clear both before and after the loads. (edited) Vladimir Kozlov 22 hours ago There are 2 allocations and 2 loops in code: 7f3d8ccfcff4-7f3d8ccfd0b6 array [3]allocation 7f3d8ccfd0bb-0x00007f3d8ccfd15c array [2] array One good loop where barriers are checked inside loop: 0x00007f3d8ccfdaa0-0x00007f3d8ccfdb37Bad loop where checks are moved outside: 0x00007f3d8ccfdc40-0x00007f3d8ccfdcd1 but I don't see condition of jump to it in code you provided The start of the block with bad loop is 0x00007f3d8ccfdb45 Vladimir Kozlov 15 hours ago I think it is next loops: if (positive) { for (FixedBitSet set : sets) { set.and(pset); } } else { for (FixedBitSet set : sets) { set.andNot(pset); } } Vladimir Kozlov 15 hours ago because I see and at the end of loops: 0x00007f3d8ccfdb24: mov %rbx,%r8 0x00007f3d8ccfdb27: or %r9,%r8 0x00007f3d8ccfdb2a: and %r8,0x18(%rbp,%rdx,8) Stefan Karlsson 13 hours ago Weird. Because I see generated code for set.clear around the instruction were we crash. The instructions you pasted is from the "good loop", but we don't crash in that block. Need to check more when I get to the office. Stefan Karlsson 10 hours ago Doh. The file I uploaded didn't contain the entire assembly listing. I've uploaded a new file. Stefan Karlsson 7 hours ago Me, ErikÖ and Nils has traced the code and are quite certain that we're inside 339-344: 310 String makeRandomQuery(Model model, boolean mainQuery, boolean facetQuery) { 311 312 boolean cache = random().nextBoolean(); 313 int cost = cache ? 0 : random().nextBoolean() ? random().nextInt(200) : 0; 314 boolean positive = random().nextBoolean(); 315 boolean exclude = facetQuery ? false : random().nextBoolean(); // can't exclude a facet query from faceting 316 317 FixedBitSet[] sets = facetQuery ? new FixedBitSet[]{model.facetQuery} : 318 (exclude ? new FixedBitSet[]{model.answer, model.facetQuery} : new FixedBitSet[]{model.answer, model.multiSelect, model.facetQuery}); 319 320 if (random().nextInt(100) < 60) { 321 // frange 322 int l=0; 323 int u=0; 324 325 if (positive) { 326 // positive frange, make it big by taking the max of 4 tries 327 int n=-1; 328 329 for (int i=0; i<4; i++) { 330 int ll = random().nextInt(model.indexSize); 331 int uu = ll + ((ll==model.indexSize-1) ? 0 : random().nextInt(model.indexSize-l)); 332 if (uu-ll+1 > n) { 333 n = uu-ll+1; 334 u = uu; 335 l = ll; 336 } 337 } 338 339 for (FixedBitSet set : sets) { 340 set.clear(0,l); 341 if (u + 1 < model.indexSize) { 342 set.clear(u+1, model.indexSize); 343 } 344 } Stefan Karlsson 7 hours ago After loading the three elements we jump to this code: 0x00007f3d8ccfdcdc: mov 0x10(%rsi),%edx - assert numBits 0x00007f3d8ccfdcdf: test %edx,%edx - 0x00007f3d8ccfdce1: jle 0x7f3d8cd061f4 0x00007f3d8ccfdce7: mov 0x4c(%rsp),%r11d - the other assert?? 0x00007f3d8ccfdcec: cmp %edx,%r11d 0x00007f3d8ccfdcef: jg 0x7f3d8cd062a4 0x00007f3d8ccfdcf5: mov 0x18(%rsi),%r10 - read object (bits?) 0x00007f3d8ccfdcf9: test %r10,0x20(%r15) 0x00007f3d8ccfdcfd: jne 0x7f3d8cd0852b 0x00007f3d8ccfdd03: vmovq %r10,%xmm0 0x00007f3d8ccfdd08: mov 0x10(%r10),%r11d - read bits.length 0x00007f3d8ccfdd0c: test %r11d,%r11d - not negative? 0x00007f3d8ccfdd0f: jbe 0x7f3d8cd05604 0x00007f3d8ccfdd15: and %rcx,0x18(%r10) - & first element 0x00007f3d8ccfdd19: mov 0xc(%rsp),%r10d - load facetQuery 0x00007f3d8ccfdd1e: cmp $0x1,%r10d - 0x00007f3d8ccfdd22: jle 0x7f3d8cd05724 - uncommon trapp 0x00007f3d8ccfdd28: mov 0x10(%r12),%edx 0x00007f3d8ccfdd2d: test %edx,%edx 0x00007f3d8ccfdd2f: jle 0x7f3d8cd061fc 0x00007f3d8ccfdd35: mov 0x4c(%rsp),%r11d 0x00007f3d8ccfdd3a: cmp %edx,%r11d 0x00007f3d8ccfdd3d: jg 0x7f3d8cd062ac 0x00007f3d8ccfdd43: mov 0x18(%r12),%r10 0x00007f3d8ccfdd48: test %r10,0x20(%r15) 0x00007f3d8ccfdd4c: jne 0x7f3d8cd0855c 0x00007f3d8ccfdd52: vmovq %r10,%xmm0 0x00007f3d8ccfdd57: mov 0x10(%r10),%r10d 0x00007f3d8ccfdd5b: test %r10d,%r10d 0x00007f3d8ccfdd5e: jbe 0x7f3d8cd0560c 0x00007f3d8ccfdd64: vmovq %xmm0,%r10 0x00007f3d8ccfdd69: and %rcx,0x18(%r10) 0x00007f3d8ccfdd6d: mov 0x10(%r14),%edx 0x00007f3d8ccfdd71: test %edx,%edx 0x00007f3d8ccfdd73: jle 0x7f3d8cd06207 0x00007f3d8ccfdd79: cmp %edx,%r11d 0x00007f3d8ccfdd7c: jg 0x7f3d8cd062b7 0x00007f3d8ccfdd82: mov 0x18(%r14),%r10 0x00007f3d8ccfdd86: test %r10,0x20(%r15) 0x00007f3d8ccfdd8a: jne 0x7f3d8cd0858e 0x00007f3d8ccfdd90: vmovq %r10,%xmm0 0x00007f3d8ccfdd95: mov 0x10(%r10),%r10d 0x00007f3d8ccfdd99: test %r10d,%r10d 0x00007f3d8ccfdd9c: jbe 0x7f3d8cd05617 0x00007f3d8ccfdda2: vmovq %xmm0,%r10 0x00007f3d8ccfdda7: and %rcx,0x18(%r10) That uses the three loaded values. It seems to have a check of the case if we only have one element (facetQuery == true), but no check if it two or three. Stefan Karlsson 7 hours ago When tracing this we see that there's a path from "alloc Array[2]" to this usage above. (Given that we found the correct way). Stefan Karlsson 7 hours ago Array[2]: new FixedBitSet[]{model.answer, model.facetQuery} 0x00007f3d8ccfd0bb: mov 0x120(%r15),%r8 ... 320 if (random().nextInt(100) < 60) { 0x00007f3d8ccfd2f3: cmp $0x3c,%r11d 0x00007f3d8ccfd2f7: jl 0x7f3d8ccfd653 ... 0x00007f3d8ccfd653: mov 0x30(%rsp),%r10 325 if (positive) { 0x00007f3d8ccfd658: test %r10,%r10 0x00007f3d8ccfd65b: jl 0x7f3d8ccfd6eb If not jumped (we take this path): 0x00007f3d8ccfd661: mov $0x2,%r10d 0x00007f3d8ccfd667: mov $0x11,%r8d 0x00007f3d8ccfd66d: mov $0xffffffffffffffff,%rcx 0x00007f3d8ccfd674: movabs $0x400050014f8,%rax 0x00007f3d8ccfd67e: movabs $0x40001866f00,%rdx 0x00007f3d8ccfd688: mov $0x7ffffff7,%edi 0x00007f3d8ccfd68d: movabs $0x40003800668,%rsi 0x00007f3d8ccfd697: mov $0x2d,%r11d 0x00007f3d8ccfd69d: mov $0xffffffff,%r12d 0x00007f3d8ccfd6a3: xor %r9d,%r9d 0x00007f3d8ccfd6a6: xor %ebx,%ebx 0x00007f3d8ccfd6a8: xor %ebp,%ebp 0x00007f3d8ccfd6aa: mov %r10d,0x9c(%rsp) 0x00007f3d8ccfd6b2: mov %r8d,0x98(%rsp) 0x00007f3d8ccfd6ba: mov %rcx,0x88(%rsp) 0x00007f3d8ccfd6c2: mov %rax,0x78(%rsp) 0x00007f3d8ccfd6c7: mov %rdx,0xa8(%rsp) 0x00007f3d8ccfd6cf: mov %edi,0x80(%rsp) 0x00007f3d8ccfd6d6: mov %rsi,0xa0(%rsp) 0x00007f3d8ccfd6de: mov %r11d,0xb0(%rsp) Jump to 329 for (int i=0; i<4; i++) { 0x00007f3d8ccfd6e6: jmpq 0x7f3d8cd00943 ... Top of loop: 329 for (int i=0; i<4; i++) { 0x00007f3d8cd00909: add 0x70(%rsp),%edx 0x00007f3d8cd0090d: mov %edx,0x34(%rsp) 0x00007f3d8cd00911: mov %r11d,0x68(%rsp) 0x00007f3d8cd00916: mov 0x70(%rsp),%r11d 0x00007f3d8cd0091b: mov %r11d,0x4c(%rsp) 0x00007f3d8cd00920: mov 0x6c(%rsp),%r11d 0x00007f3d8cd00925: inc %r11d 0x00007f3d8cd00928: cmp $0x4,%r11d Jump to broken code (exit the loop): 0x00007f3d8cd0092c: jge 0x7f3d8ccfdb45 0x00007f3d8cd00932: mov 0x68(%rsp),%r12d 0x00007f3d8cd00937: mov 0x4c(%rsp),%r9d 0x00007f3d8cd0093c: mov 0x34(%rsp),%ebx 0x00007f3d8cd00940: mov %r11d,%ebp 0x00007f3d8cd00943: mov %ebp,0x6c(%rsp) 0x00007f3d8cd00947: mov %ebx,0x34(%rsp) 0x00007f3d8cd0094b: mov %r9d,0x4c(%rsp) 0x00007f3d8cd00950: mov %r12d,0x68(%rsp) Stefan Karlsson 7 hours ago We are confused by this part: 325 if (positive) { 0x00007f3d8ccfd658: test %r10,%r10 0x00007f3d8ccfd65b: jl 0x7f3d8ccfd6eb From what I could tell this test/jl seemed to check if the msb is set. It doesn't make sense for a boolean, but maybe it all depends on how random.nextBoolean() is implemented. Stefan Karlsson 7 hours ago I'm going to assign this bug over to hs/comp, but will keep looking into this. Should I assign this to someone else? Vladimir Kozlov 2 hours ago Assign it to me I tried to reproduce this code shape by preparing small test based on sources and keeping TestFiltering code unchanged. But I was not able to get even closer to assembler we have here. One thing is strange is the loops I see which seems coming from ‘clean()’ method don’t have polling page touch code. It means they are counted but I don’t see corresponding code around it. We should have polling page touch code somewhere. The was bug Roland fixed in b33 related to loop strip mining which may be related or not. If we can reproduce the failure reliably and try to switch off strip mining it would be great. Vladimir Kozlov 2 hours ago And I was mistaking about and() vs clean() methods. I looked on wrong clean() method with one argument. Clean() method with 2 arguments which is used here has ‘and’ instruction at the end. So you are right.
29-01-2020
We had a thought that this could be caused by: //------------------------------optimize_trichotomy-------------------------- // Optimize nested comparisons of the following kind: However, adding guarantees in this code showed that this optimization doesn't change anything in the affected function.
29-01-2020
FTR: Also caught the same broken load from a relocate_object load barrier (matching the bug title)
28-01-2020
We've the try_mark_object bug in a debugger now. We added an assert that the object is sane when we push entries to the marking stack. What we see is that we push a broken object pointer when execute the load barrier in this C2 compiled method: org/apache/solr/search/TestFiltering.makeRandomQuery(Lorg/apache/solr/search/TestFiltering$Model;ZZ)Ljava/lang/String; We crash when we read the third element in a FixedBitSet array of size 2! 0x00007f3d8ccfdbad: mov 0x28(%r10),%r14 0x00007f3d8ccfdbb1: test %r14,0x20(%r15) Broken object pointer is 0x28(%r10), which is 0x40004227dd0. Looking at the address: 0x40004227dc0 - 0x28: 0x40004227da8: 0x0000000000000001 0x00007f3b48457648 0x40004227db8: 0x0000000000000002 0x0000040004227c58 0x40004227dc8: 0x0000040004227cd8 0x00007f3da02b1860 We have the following data at the offsets: 0x0: mark word 0x8: Klass* 0x00007f3b48457648 - the object klass is an array: [Lorg/apache/lucene/util/FixedBitSet; 0x10: array length: 0x2 0x18: FixedBitSet instance 0x0000040004227c58 0x20: FixedBitSet instance 0x0000040004227cd8 0x28: outside of the object (pointing to stale data that happens to be a Klass*) The code that triggers the load barrier that asserts is: 0x00007f3d8ccfdba8: mov 0x40(%rsp),%r10 ; Load the FixedBitSet[] instance (call it sets) 0x00007f3d8ccfdbad: mov 0x28(%r10),%r14 ; Load sets[2] // This is the broken load 0x00007f3d8ccfdbb1: test %r14,0x20(%r15) ; Load barrier test 0x00007f3d8ccfdbb5: jne 0x7f3d8cd08325 ; Load barrier jump // Goes to the load barrier that finds the broken load 0x00007f3d8ccfdbbb: mov 0x20(%r10),%r12 ; Load sets[1] 0x00007f3d8ccfdbbf: test %r12,0x20(%r15) ; Load barrier test 0x00007f3d8ccfdbc3: jne 0x7f3d8cd0838a ; Load barrier jump 0x00007f3d8ccfdbc9: mov 0x18(%r10),%rsi ; Load sets[0] 0x00007f3d8ccfdbcd: test %rsi,0x20(%r15) ; Load barrier jump 0x00007f3d8ccfdbd1: jne 0x7f3d8cd083ef Now looking at the code: String makeRandomQuery(Model model, boolean mainQuery, boolean facetQuery) { ... boolean exclude = facetQuery ? false : random().nextBoolean(); // can't exclude a facet query from faceting FixedBitSet[] sets = facetQuery ? new FixedBitSet[]{model.facetQuery} : (exclude ? new FixedBitSet[]{model.answer, model.facetQuery} : new FixedBitSet[]{model.answer, model.multiSelect, model.facetQuery}); So, depending on some parameters and randomness, it sets up a FixedBitSet[] that is of one of the lengths: 1, 2, or 3. Then further down, we run the code that matches the assembly: for (FixedBitSet set : sets) { set.clear(0,l); if (u + 1 < model.indexSize) { set.clear(u+1, model.indexSize); } } The loads seems to come from the for loop: for (FixedBitSet set : sets) { It looks like C2 has managed to allocate a FixedBitSet[2] but is running the code as if it were a FixedBitSet[3]. This points towards a non-GC specific C2 bug.
28-01-2020
Regarding the 13.0.2 crash. It's not unlikely that it's caused by: JDK-8233506: ZGC: the load for Reference.get() can be converted to a load for strong refs which hasn't been backported to 13.0.2.
27-01-2020
No luck so far with catching this with more verification. These kind of crashes indicate that an object is broken. Need to figure out if it is caused by GC, missing handles, missing oop maps, compiler bugs, or anything else.
27-01-2020
Managed to reproduce a marking crash with JDK 15: $ cd /localhome/src/solr-8.4.1/solr/core $ JAVA_HOME=/home/stefank/hg/jdk/jdk/build/fastdebug/images/jdk/ ant -autoproxy test -Dargs="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC" # SIGSEGV (0xb) at pc=0x00007fa37cc265e6, pid=18631, tid=18638 V [libjvm.so+0x17ed5e6] ZMark::try_mark_object(ZMarkCache, unsigned long, bool)+0x56 V [libjvm.so+0x17ed5e6] ZMark::try_mark_object(ZMarkCache, unsigned long, bool)+0x56 V [libjvm.so+0x17eee19] ZMark::mark_and_follow(ZMarkCache, ZMarkStackEntry)+0x39 V [libjvm.so+0x17ef13d] ZMark::work_without_timeout(ZMarkCache, ZMarkStripe, ZMarkThreadLocalStacks)+0xed V [libjvm.so+0x17ef665] ZMark::work(unsigned long)+0xf5 V [libjvm.so+0x1826f9d] ZTask::GangTask::work(unsigned int)+0x1d V [libjvm.so+0x17b1804] GangWorker::run_task(WorkData)+0x84 V [libjvm.so+0x17b1948] GangWorker::loop()+0x48 V [libjvm.so+0x16882e6] Thread::call_run()+0xf6 V [libjvm.so+0x12917de] thread_native_entry(Thread*)+0x10e Rerunning with more verification.
26-01-2020
Yes, sorry. I reconfigured Jenkins to keep logs of failed builds for longer time and don't discard them.
26-01-2020
It would be interesting to get hold of a hs_err file from JDK 14. The logs seems to have been deleted for that crash.
26-01-2020
In one of the 13.0.2 hs_err files, we crash while scanning java.lang.ref.[*]Reference objects and the stack contains addresses to WeakHashMap. I wonder that crash is a reference processing bug that we've fixed in 14. The other hs_err file also has WeakHashMaps on the stack.
26-01-2020
Here is a similar one with 13.0.2: https://jenkins.thetaphi.de/job/Lucene-Solr-8.x-Linux/1900/ (hserr and replay files can be downloaded there).
26-01-2020

Relates :	JDK-8333840 - C2 SuperWord: wrong result for MulAddS2I when inputs permuted
Relates :	JDK-8240227 - Loop predicates should be copied to unswitched loops