JDK-8232642 : Investigate tsan warnings with jsr166 tests
Type:Task
Component:core-libs
Sub-Component:java.util.concurrent
Priority:P3
Status:Open
Resolution:Unresolved
Submitted:2019-10-18
Updated:2019-10-22
Description
The java tsan project has advanced enough that it can find improvements to java.util.concurrent, and java.util.concurrent is weird enough that it finds problems in tsan.
Comments
Here's a breadcrumb-encrusted script I've been using to investigate this.
Doug, would you like to try it out?
```
#!/bin/bash
readonly JSR166_WORKSPACE=$1 TSAN_JDK_HOME=$2
[[ "$#" == 2 \
&& -d "$JSR166_WORKSPACE/src/main/java/util/concurrent" \
&& -x "$TSAN_JDK_HOME/bin/java" ]] \
|| {
PROG="${0##*/}"
cat <<EOF
$PROG JSR166-CVS-WORKSPACE TSAN-JDK-HOME
Runs jsr166 tck tests reporting java tsan warnings.
Preparation:
1. Check out a jsr166 CVS workspace
http://gee.cs.oswego.edu/dl/concurrency-interest/index.html
2. Build a tsan-enabled jdk as described here:
https://wiki.openjdk.java.net/display/tsan/Main
3. Disable troublesome test TimeUnitTest.testTimedWait_IllegalMonitorException
Else a FATAL error masks later warnings.
4. Spray volatile over ForkJoinPool.WorkQueue fields.
EOF
exit 1
}
set -eu -o pipefail
readonly TMPDIR="/tmp/${0##*/}"
readonly IGNORE_FILE="$TMPDIR/IGNORE_FILE"
readonly SUPPRESSIONS_FILE="$TMPDIR/SUPPRESSIONS_FILE"
readonly LOG_PREFIX="$TMPDIR/log"
readonly LOG="$TMPDIR/log"
readonly -a TSAN_OPTIONS=(
log_path="$LOG_PREFIX"
exitcode=0
suppressions="$SUPPRESSIONS_FILE"
)
readonly -a TSAN_VMOPTIONS=(
-XX:+ThreadSanitizer
-XX:ThreadSanitizerIgnoreFile="$IGNORE_FILE"
)
rm -rf "$TMPDIR"
mkdir -p "$TMPDIR"
#----------------------------------------------------------------
cat <<'EOF' > "$IGNORE_FILE"
# Intentional races on Thread fields relying on OOTA guarantees
java.util.concurrent.locks.AbstractOwnableSynchronizer exclusiveOwnerThread
java.util.concurrent.locks.AbstractQueuedSynchronizer$Node waiter
java.util.concurrent.locks.AbstractQueuedLongSynchronizer$Node waiter
java.util.concurrent.locks.StampedLock$Node waiter
java.util.concurrent.locks.ReentrantReadWriteLock$Sync cachedHoldCounter
java.util.concurrent.locks.ReentrantReadWriteLock$Sync firstReader
# Intentional races nulling out fields in Completion implementations
java.util.concurrent.CompletableFuture$UniCompletion executor
java.util.concurrent.CompletableFuture$UniCompletion src
java.util.concurrent.CompletableFuture$UniCompletion dep
java.util.concurrent.CompletableFuture$BiCompletion snd
java.util.concurrent.CompletableFuture$OrRun fn
java.util.concurrent.CompletableFuture$OrAccept fn
java.util.concurrent.CompletableFuture$OrApply fn
java.util.concurrent.CompletableFuture$BiRun fn
java.util.concurrent.CompletableFuture$BiAccept fn
java.util.concurrent.CompletableFuture$BiApply fn
java.util.concurrent.CompletableFuture$UniHandle fn
# Intentional races lazily initializing stateless view objects
java.util.concurrent.locks.StampedLock readLockView
java.util.concurrent.locks.StampedLock writeLockView
java.util.concurrent.locks.StampedLock readWriteLockView
# TODO: Investigate cancel/run race
java.util.concurrent.FutureTask callable
# TODO: Investigate rare failure (spliterator related)
java.util.SplittableRandom seed
# TODO: Investigate
java.util.concurrent.SubmissionPublisher$BufferedSubscription head
java.util.concurrent.SubmissionPublisher$BufferedSubscription tail
java.util.concurrent.SubmissionPublisher$BufferedSubscription waiter
java.util.concurrent.SubmissionPublisher$BufferedSubscription executor
# TODO: Investigate
java.util.concurrent.ConcurrentSkipListMap$Index right
java.util.concurrent.ConcurrentSkipListMap$Node val
java.util.concurrent.ConcurrentSkipListMap$Node next
# TODO: Investigate
java.util.stream.ForEachOps$ForEachTask spliterator
java.util.stream.ForEachOps$ForEachTask targetSize
# Work stealing queue mechanics
# jtsan seems to have trouble understanding fork-join (who doesn't?)
java.util.concurrent.ForkJoinPool$WorkQueue base
java.util.concurrent.ForkJoinPool$WorkQueue top
java.util.concurrent.ForkJoinPool$WorkQueue stackPred
java.util.concurrent.ForkJoinPool$WorkQueue nsteals
java.util.concurrent.ForkJoinPool$WorkQueue array
# TODO: Fix test infrastructure buglets
ForkJoinTask8Test$AsyncFib number
JSR166TestCase$LatchAwaiter state
EOF
#----------------------------------------------------------------
# Suppressions syntax (yes, horrifying!) documented here:
# https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions
# Use '*' to work around inability to escape '$'
# Is the semantics of race_top documented somewhere?
cat <<'EOF' > "$SUPPRESSIONS_FILE"
# tsan doesn't know about mutex in hotspot native code
mutex:^ZIP_Get_From_Cache
mutex:^ZIP_GetEntry2
# Concurrent access to ForkJoinPool.workQueues array
race_top:^java.util.concurrent.ForkJoinPool.registerWorker(
race_top:^java.util.concurrent.ForkJoinPool.deregisterWorker(
# Concurrent access to WorkQueue.array
race_top:^java.util.concurrent.ForkJoinPool.scan(
race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.lockedPush(
# TODO: investigate WorkQueue mechanics
# race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.tryRemoveAndExec(
# race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.push(
# race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.poll(
# race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.tryUnpush(
# race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.nextLocalTask(
# race_top:^java.util.concurrent.ForkJoinPool*WorkQueue.<init>(
# intentional monitor misuse in testTimedWait_IllegalMonitorException
# causes FATAL tsan error
# race:TimeUnitTest*1.realRun(
# TODO: investigate - possibly related to Spliterators
race_top:^java.util.concurrent.LinkedBlockingDeque.unlinkFirst(
race_top:^java.util.concurrent.LinkedBlockingDeque.unlinkLast(
race_top:^java.util.concurrent.LinkedBlockingDeque.linkLast(
race_top:^java.util.concurrent.LinkedBlockingDeque.linkFirst(
race_top:^jjava.util.concurrent.LinkedBlockingDeque*LBDSpliterator.trySplit(
race_top:^java.util.concurrent.LinkedBlockingQueue.enqueue(
race_top:^java.util.concurrent.LinkedBlockingQueue*LBQSpliterator.trySplit(
# TODO: racy accesses to Cell[] elements
race_top:^java.util.concurrent.atomic.Striped64.longAccumulate(
race_top:^java.util.concurrent.atomic.Striped64.doubleAccumulate(
race_top:^java.util.concurrent.atomic.LongAdder.add(
race_top:^java.util.concurrent.atomic.LongAdder.sum(
race_top:^java.util.concurrent.atomic.DoubleAdder.sum(
# TODO: Vector.equals acquires both locks in deadlock-prone order
deadlock:java.util.Vector.equals(
EOF
main() {
cd "$JSR166_WORKSPACE"
env TSAN_OPTIONS="${TSAN_OPTIONS[*]}" \
ant -v -Djava.runtime.target=13 -Djdk13.home="$TSAN_JDK_HOME" \
-Dvmoptions="${TSAN_VMOPTIONS[*]}" \
tck >& "$LOG"
ls -1l "$LOG"*
exit 0
}
main "$@"
```
22-10-2019
tsan wishlist:
- llvm tsan doesn't support fences, but in jsr166 we are increasingly using fences
- we use racy accesses to array elements in fork-join and Striped64, and there's no way to suppress these with field suppressions. We could use VarHandles for such array access.
- tsan race reports threads with strings like "thread T31" but we would like the kind of information Java gives us, notably the Java-level thread name and lock status.
- the tsan suppression mini-language is an embarrassment and should be made more powerful (but I can understand why the llvm project might not want to make suppressions easier)
- we should have more powerful annotations indicating programmer intent for races.
- running under tsan should be completely transparent except for producing a file by side effect, but that seems impossible. Again, because of the mindset that C++ races are undefined behavior and so abnormal termination is considered a feature.
- it's time for the tsan project to be updated to jdk head
22-10-2019
> java tsan doesn't seem to have a bug component. Is that a bug?
No, tsan is only a project. If tsan is using a special project repository then they could request for a repo-tsan release value and use that for filing bugs.
20-10-2019
java tsan doesn't seem to have a bug component. Is that a bug?
---
java tsan seems to have trouble understanding fork-join, but not sure why.
As a result many false positives are reported from tasks running in a fork join pool.
Probably we need to teach tsan about fences, at the very least.
This is a showstopper for adoption in jdk14+.
---
This test is big trouble, causing the test process to abort:
public void testTimedWait_IllegalMonitorException() {
Thread t = newStartedThread(new CheckedRunnable() {
public void realRun() throws InterruptedException {
Object o = new Object();
try {
MILLISECONDS.timedWait(o, LONGER_DELAY_MS);
threadShouldThrow();
} catch (IllegalMonitorStateException success) {}
}});
awaitTermination(t);
}
ThreadSanitizer:DEADLYSIGNAL
or
FATAL: ThreadSanitizer CHECK failed: /build/llvm-toolchain-snapshot-Js2WL8/llvm-toolchain-snapshot-8~svn342269/projects/compiler-rt/lib/tsan/rtl/tsan_interface_java.cc:224 "((rec)) > ((0))" (0x0, 0x0)
#0 __tsan::TsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ??:? (java+0x49379d)
#1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) ??:? (java+0x4b0bc5)
#2 __tsan_java_mutex_lock_rec ??:? (java+0x479d0c)
#3 ObjectSynchronizer::wait(Handle, long, Thread*) :? (libjvm.so+0xbf1412)
#4 JVM_MonitorWait ??:? (libjvm.so+0x8447f2)
#5 <null> <null> (0x7f8a93af98b8)