JDK-8232642 : Investigate tsan warnings with jsr166 tests
  • Type: Task
  • Component: core-libs
  • Sub-Component: java.util.concurrent
  • Priority: P3
  • Status: Open
  • Resolution: Unresolved
  • Submitted: 2019-10-18
  • Updated: 2019-10-22
Description
The java tsan project has advanced enough that  it can find improvements to java.util.concurrent, and java.util.concurrent is weird enough that it finds problems in tsan.
Comments
Here's a breadcrumb-encrusted script I've been using to investigate this. Doug, would you like to try it out? ``` #!/bin/bash readonly JSR166_WORKSPACE=$1 TSAN_JDK_HOME=$2 [[ "$#" == 2 \ && -d "$JSR166_WORKSPACE/src/main/java/util/concurrent" \ && -x "$TSAN_JDK_HOME/bin/java" ]] \ || { PROG="${0##*/}" cat <<EOF $PROG JSR166-CVS-WORKSPACE TSAN-JDK-HOME Runs jsr166 tck tests reporting java tsan warnings. Preparation: 1. Check out a jsr166 CVS workspace http://gee.cs.oswego.edu/dl/concurrency-interest/index.html 2. Build a tsan-enabled jdk as described here: https://wiki.openjdk.java.net/display/tsan/Main 3. Disable troublesome test TimeUnitTest.testTimedWait_IllegalMonitorException Else a FATAL error masks later warnings. 4. Spray volatile over ForkJoinPool.WorkQueue fields. EOF exit 1 } set -eu -o pipefail readonly TMPDIR="/tmp/${0##*/}" readonly IGNORE_FILE="$TMPDIR/IGNORE_FILE" readonly SUPPRESSIONS_FILE="$TMPDIR/SUPPRESSIONS_FILE" readonly LOG_PREFIX="$TMPDIR/log" readonly LOG="$TMPDIR/log" readonly -a TSAN_OPTIONS=( log_path="$LOG_PREFIX" exitcode=0 suppressions="$SUPPRESSIONS_FILE" ) readonly -a TSAN_VMOPTIONS=( -XX:+ThreadSanitizer -XX:ThreadSanitizerIgnoreFile="$IGNORE_FILE" ) rm -rf "$TMPDIR" mkdir -p "$TMPDIR" #---------------------------------------------------------------- cat <<'EOF' > "$IGNORE_FILE" # Intentional races on Thread fields relying on OOTA guarantees java.util.concurrent.locks.AbstractOwnableSynchronizer exclusiveOwnerThread java.util.concurrent.locks.AbstractQueuedSynchronizer$Node waiter java.util.concurrent.locks.AbstractQueuedLongSynchronizer$Node waiter java.util.concurrent.locks.StampedLock$Node waiter java.util.concurrent.locks.ReentrantReadWriteLock$Sync cachedHoldCounter java.util.concurrent.locks.ReentrantReadWriteLock$Sync firstReader # Intentional races nulling out fields in Completion implementations java.util.concurrent.CompletableFuture$UniCompletion executor java.util.concurrent.CompletableFuture$UniCompletion src java.util.concurrent.CompletableFuture$UniCompletion dep java.util.concurrent.CompletableFuture$BiCompletion snd java.util.concurrent.CompletableFuture$OrRun fn java.util.concurrent.CompletableFuture$OrAccept fn java.util.concurrent.CompletableFuture$OrApply fn java.util.concurrent.CompletableFuture$BiRun fn java.util.concurrent.CompletableFuture$BiAccept fn java.util.concurrent.CompletableFuture$BiApply fn java.util.concurrent.CompletableFuture$UniHandle fn # Intentional races lazily initializing stateless view objects java.util.concurrent.locks.StampedLock readLockView java.util.concurrent.locks.StampedLock writeLockView java.util.concurrent.locks.StampedLock readWriteLockView # TODO: Investigate cancel/run race java.util.concurrent.FutureTask callable # TODO: Investigate rare failure (spliterator related) java.util.SplittableRandom seed # TODO: Investigate java.util.concurrent.SubmissionPublisher$BufferedSubscription head java.util.concurrent.SubmissionPublisher$BufferedSubscription tail java.util.concurrent.SubmissionPublisher$BufferedSubscription waiter java.util.concurrent.SubmissionPublisher$BufferedSubscription executor # TODO: Investigate java.util.concurrent.ConcurrentSkipListMap$Index right java.util.concurrent.ConcurrentSkipListMap$Node val java.util.concurrent.ConcurrentSkipListMap$Node next # TODO: Investigate java.util.stream.ForEachOps$ForEachTask spliterator java.util.stream.ForEachOps$ForEachTask targetSize # Work stealing queue mechanics # jtsan seems to have trouble understanding fork-join (who doesn't?) java.util.concurrent.ForkJoinPool$WorkQueue base java.util.concurrent.ForkJoinPool$WorkQueue top java.util.concurrent.ForkJoinPool$WorkQueue stackPred java.util.concurrent.ForkJoinPool$WorkQueue nsteals java.util.concurrent.ForkJoinPool$WorkQueue array # TODO: Fix test infrastructure buglets ForkJoinTask8Test$AsyncFib number JSR166TestCase$LatchAwaiter state EOF #---------------------------------------------------------------- # Suppressions syntax (yes, horrifying!) documented here: # https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions # Use '*' to work around inability to escape '$' # Is the semantics of race_top documented somewhere? cat <<'EOF' > "$SUPPRESSIONS_FILE" # tsan doesn't know about mutex in hotspot native code mutex:^ZIP_Get_From_Cache mutex:^ZIP_GetEntry2 # Concurrent access to ForkJoinPool.workQueues array race_top:^java.util.concurrent.ForkJoinPool.registerWorker( race_top:^java.util.concurrent.ForkJoinPool.deregisterWorker( # Concurrent access to WorkQueue.array race_top:^java.util.concurrent.ForkJoinPool.scan( race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.lockedPush( # TODO: investigate WorkQueue mechanics # race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.tryRemoveAndExec( # race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.push( # race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.poll( # race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.tryUnpush( # race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.nextLocalTask( # race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.<init>( # intentional monitor misuse in testTimedWait_IllegalMonitorException # causes FATAL tsan error # race:TimeUnitTest*1.realRun( # TODO: investigate - possibly related to Spliterators race_top:^java.util.concurrent.LinkedBlockingDeque.unlinkFirst( race_top:^java.util.concurrent.LinkedBlockingDeque.unlinkLast( race_top:^java.util.concurrent.LinkedBlockingDeque.linkLast( race_top:^java.util.concurrent.LinkedBlockingDeque.linkFirst( race_top:^jjava.util.concurrent.LinkedBlockingDeque*LBDSpliterator.trySplit( race_top:^java.util.concurrent.LinkedBlockingQueue.enqueue( race_top:^java.util.concurrent.LinkedBlockingQueue*LBQSpliterator.trySplit( # TODO: racy accesses to Cell[] elements race_top:^java.util.concurrent.atomic.Striped64.longAccumulate( race_top:^java.util.concurrent.atomic.Striped64.doubleAccumulate( race_top:^java.util.concurrent.atomic.LongAdder.add( race_top:^java.util.concurrent.atomic.LongAdder.sum( race_top:^java.util.concurrent.atomic.DoubleAdder.sum( # TODO: Vector.equals acquires both locks in deadlock-prone order deadlock:java.util.Vector.equals( EOF main() { cd "$JSR166_WORKSPACE" env TSAN_OPTIONS="${TSAN_OPTIONS[*]}" \ ant -v -Djava.runtime.target=13 -Djdk13.home="$TSAN_JDK_HOME" \ -Dvmoptions="${TSAN_VMOPTIONS[*]}" \ tck >& "$LOG" ls -1l "$LOG"* exit 0 } main "$@" ```
22-10-2019

tsan wishlist: - llvm tsan doesn't support fences, but in jsr166 we are increasingly using fences - we use racy accesses to array elements in fork-join and Striped64, and there's no way to suppress these with field suppressions. We could use VarHandles for such array access. - tsan race reports threads with strings like "thread T31" but we would like the kind of information Java gives us, notably the Java-level thread name and lock status. - the tsan suppression mini-language is an embarrassment and should be made more powerful (but I can understand why the llvm project might not want to make suppressions easier) - we should have more powerful annotations indicating programmer intent for races. - running under tsan should be completely transparent except for producing a file by side effect, but that seems impossible. Again, because of the mindset that C++ races are undefined behavior and so abnormal termination is considered a feature. - it's time for the tsan project to be updated to jdk head
22-10-2019

> java tsan doesn't seem to have a bug component. Is that a bug? No, tsan is only a project. If tsan is using a special project repository then they could request for a repo-tsan release value and use that for filing bugs.
20-10-2019

java tsan doesn't seem to have a bug component. Is that a bug? --- java tsan seems to have trouble understanding fork-join, but not sure why. As a result many false positives are reported from tasks running in a fork join pool. Probably we need to teach tsan about fences, at the very least. This is a showstopper for adoption in jdk14+. --- This test is big trouble, causing the test process to abort: public void testTimedWait_IllegalMonitorException() { Thread t = newStartedThread(new CheckedRunnable() { public void realRun() throws InterruptedException { Object o = new Object(); try { MILLISECONDS.timedWait(o, LONGER_DELAY_MS); threadShouldThrow(); } catch (IllegalMonitorStateException success) {} }}); awaitTermination(t); } ThreadSanitizer:DEADLYSIGNAL or FATAL: ThreadSanitizer CHECK failed: /build/llvm-toolchain-snapshot-Js2WL8/llvm-toolchain-snapshot-8~svn342269/projects/compiler-rt/lib/tsan/rtl/tsan_interface_java.cc:224 "((rec)) > ((0))" (0x0, 0x0) #0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ??:? (java+0x49379d) #1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ??:? (java+0x4b0bc5) #2 __tsan_java_mutex_lock_rec ??:? (java+0x479d0c) #3 ObjectSynchronizer::wait(Handle, long, Thread*) :? (libjvm.so+0xbf1412) #4 JVM_MonitorWait ??:? (libjvm.so+0x8447f2) #5 <null> <null> (0x7f8a93af98b8)
18-10-2019