The descriptions of index entries in the JDK API Specification are not deterministic. Their differences reveal an underlying bug unrelated to the goal of reproducible builds.
There are 55,782 index entries in the JDK 20 API Specification, but 56 of them get different descriptions depending on whether I run the build on my local workstation, on a remote Launchpad build machine, or in a QEMU/KVM virtual machine. Of those 56 index entries, 40 are static variables of classes in the package 'java.util.jar'.
For example, each build defines a single entry for LOCCRC in the L-Index file and in the member search index, but the descriptions for the entry differ as shown below.
Local Workstation
LOCCRC - Static variable in class java.util.jar.JarEntry
Search: java.util.zip.JarEntry.LOCCRC -> File not found
Remote Launchpad
LOCCRC - Static variable in class java.util.jar.JarOutputStream
Search: java.util.zip.JarOutputStream.LOCCRC -> File not found
Virtual Machine
LOCCRC - Static variable in class java.util.jar.JarInputStream
Search: java.util.zip.JarInputStream.LOCCRC -> File not found
When I list the source files of the package in directory order (unsorted) on my local workstation, JarEntry is the first class found that inherits the LOCCRC variable:
$ ls -1U ~/opt/jdk-20/src/java.base/java/util/jar/
JarEntry.java
package-info.java
JarOutputStream.java
Attributes.java
JarInputStream.java
JarException.java
JarFile.java
JarVerifier.java
JavaUtilJarAccessImpl.java
Manifest.java
On the virtual machine, JarInputStream is the first such class found:
$ ls -1U ~/opt/jdk-20/src/java.base/java/util/jar/
JarVerifier.java
JarInputStream.java
JavaUtilJarAccessImpl.java
Manifest.java
JarOutputStream.java
Attributes.java
JarFile.java
JarException.java
package-info.java
JarEntry.java
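The comparison between the two listings can be scripted. The sketch below is a hypothetical helper, not part of the JDK or of the fix. It assumes, as the listings suggest, that the index entry is attributed to whichever inheriting class is read first. It reports that class for directory order ('ls -1U') and for sorted order.

```bash
#!/bin/bash
# Hypothetical helpers for comparing directory order with sorted order.

# The four 'java.util.jar' classes that inherit LOCCRC from 'java.util.zip'.
inheritors=(JarEntry JarFile JarInputStream JarOutputStream)

# Reads file names, one per line, from standard input and prints the first
# class name (from the arguments) whose '.java' file appears in the listing.
first_listed() {
  local file name
  while IFS= read -r file; do
    for name in "$@"; do
      if [[ "$file" == "$name.java" ]]; then
        printf '%s\n' "$name"
        return 0
      fi
    done
  done
  return 1
}

# Prints the first inheriting class in directory order and in sorted order.
compare_orders() {
  local dir="$1"
  printf 'directory order: %s\n' \
    "$(ls -1U "$dir" | first_listed "${inheritors[@]}")"
  printf 'sorted order:    %s\n' \
    "$(LC_ALL=C ls -1 "$dir" | first_listed "${inheritors[@]}")"
}
```

For example, 'compare_orders ~/opt/jdk-20/src/java.base/java/util/jar/' would report JarEntry for directory order on the local workstation and JarInputStream on the virtual machine, judging by the listings above. Sorted order gives JarEntry on both. Sorting would make the description deterministic, but it would not restore the missing entries.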
At first, this issue appeared to be the usual file-ordering problem of reproducible builds. Yet the LOCCRC variable is inherited by four of the classes in the 'java.util.jar' package: JarEntry, JarFile, JarInputStream, and JarOutputStream. The variable is also inherited by four classes in the 'java.util.zip' package: ZipEntry, ZipFile, ZipInputStream, and ZipOutputStream.
The underlying issue is that the LOCCRC index entry, and others like it, should be listed once for each inheriting class. For LOCCRC, that would require eight entries. The problem seems to occur when documenting members inherited from classes or interfaces with package access, such as the package-private interface java.util.zip.ZipConstants that declares LOCCRC. Such members are to be documented as though they were declared in the inheriting class. See JDK-4780441 for details.
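One way to check a build for the missing entries is to list the classes that each field is attributed to in the plain-text index files, produced with 'w3m' as described under ACTUAL RESULT. The function below is a hypothetical sketch. It assumes each entry appears on one line in the form shown earlier, 'LOCCRC - Static variable in class java.util.jar.JarEntry'. With the current behavior it should print one class for LOCCRC; with one entry per inheriting class it would print eight.

```bash
#!/bin/bash
# Hypothetical check: prints, in sorted order, the classes to which a static
# variable is attributed in a plain-text dump of a Javadoc index file.
# Assumes each entry is on one line, as in:
#   LOCCRC - Static variable in class java.util.jar.JarEntry
index_entries() {
  local field="$1" file="$2"
  # The leading group keeps, for example, XLOCCRC from matching LOCCRC.
  grep -oE "(^|[^[:alnum:]_])$field - Static variable in class [[:alnum:]._]+" \
    "$file" | sed 's/.* in class //' | sort
}
```

For example, 'index_entries LOCCRC local/index-12.txt' lists the classes credited with LOCCRC in the local build's L index, and piping the result to 'wc -l' gives the number of entries.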
SYSTEM / OS / JAVA RUNTIME INFORMATION
System information for my local workstation running Ubuntu 20.04.4 LTS is listed below:
$ uname -a
Linux tower 5.15.0-46-generic #49~20.04.1-Ubuntu SMP
Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31
$ getconf GNU_LIBPTHREAD_VERSION
NPTL 2.31
$ $HOME/opt/jdk-20/bin/java --version
openjdk 20-ea 2023-03-21
OpenJDK Runtime Environment (build 20-ea+11-661)
OpenJDK 64-Bit Server VM (build 20-ea+11-661, mixed mode, sharing)
STEPS TO REPRODUCE
I was able to reproduce the problem by building on three different systems with the following processors:
* Local Workstation: 4-core Intel Xeon CPU E3-1225 v5
* Remote Launchpad: 4-core AMD EPYC-Rome Processor
* Virtual Machine: Single-core Intel Core Processor (Skylake, IBRS)
I can't be certain that the processor played a role in the ordering of files in their directories, but it may have affected the timing of the process that created them.
EXPECTED RESULTS
The builds of the JDK are identical.
ACTUAL RESULT
The builds are different, but they differ only in their Javadoc API index files and the corresponding 'member-search-index.js' file. There are 56 entries in the index that differ:
$ git diff --numstat --shortstat local remote
11 11 {local => remote}/index-12.txt
1 1 {local => remote}/index-13.txt
5 5 {local => remote}/index-18.txt
1 1 {local => remote}/index-19.txt
2 2 {local => remote}/index-20.txt
1 1 {local => remote}/index-21.txt
22 22 {local => remote}/index-3.txt
12 12 {local => remote}/index-5.txt
1 1 {local => remote}/index-8.txt
9 files changed, 56 insertions(+), 56 deletions(-)
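The total of 56 can be cross-checked by summing the columns of the '--numstat' output. The awk-based function below is a small sketch, not part of the original workflow. It skips the '--shortstat' summary line and any binary files, which 'git diff' reports with '-' instead of counts.

```bash
#!/bin/bash
# Sums the added and deleted line counts in 'git diff --numstat' output read
# from standard input. Lines whose first two fields are not both numbers,
# such as the '--shortstat' summary or binary files ("-"), are skipped.
sum_numstat() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
         added += $1; deleted += $2; files++
       }
       END { printf "%d files, %d added, %d deleted\n", files, added, deleted }'
}
```

Running 'git diff --numstat local remote | sum_numstat' on the output above gives "9 files, 56 added, 56 deleted". Assuming each index entry occupies one line of the text dump, the 56 changed lines correspond to the 56 differing entries.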
The differences occur for index entries that are identified on my local workstation as:
* Methods in class java.awt.BufferCapabilities.FlipContents
* Methods in class java.time.chrono.HijrahDate
* Methods in class jdk.incubator.vector.ByteVector
* Static variables in class java.util.jar.JarEntry
They are:
java.awt.BufferCapabilities.FlipContents.hashCode()
java.awt.BufferCapabilities.FlipContents.toString()
java.time.chrono.HijrahDate.toString()
java.time.chrono.HijrahDate.until(Temporal, TemporalUnit)
java.util.jar.JarEntry.CENATT
java.util.jar.JarEntry.CENATX
java.util.jar.JarEntry.CENCOM
java.util.jar.JarEntry.CENCRC
java.util.jar.JarEntry.CENDSK
java.util.jar.JarEntry.CENEXT
java.util.jar.JarEntry.CENFLG
java.util.jar.JarEntry.CENHDR
java.util.jar.JarEntry.CENHOW
java.util.jar.JarEntry.CENLEN
java.util.jar.JarEntry.CENNAM
java.util.jar.JarEntry.CENOFF
java.util.jar.JarEntry.CENSIG
java.util.jar.JarEntry.CENSIZ
java.util.jar.JarEntry.CENTIM
java.util.jar.JarEntry.CENVEM
java.util.jar.JarEntry.CENVER
java.util.jar.JarEntry.ENDCOM
java.util.jar.JarEntry.ENDHDR
java.util.jar.JarEntry.ENDOFF
java.util.jar.JarEntry.ENDSIG
java.util.jar.JarEntry.ENDSIZ
java.util.jar.JarEntry.ENDSUB
java.util.jar.JarEntry.ENDTOT
java.util.jar.JarEntry.EXTCRC
java.util.jar.JarEntry.EXTHDR
java.util.jar.JarEntry.EXTLEN
java.util.jar.JarEntry.EXTSIG
java.util.jar.JarEntry.EXTSIZ
java.util.jar.JarEntry.LOCCRC
java.util.jar.JarEntry.LOCEXT
java.util.jar.JarEntry.LOCFLG
java.util.jar.JarEntry.LOCHDR
java.util.jar.JarEntry.LOCHOW
java.util.jar.JarEntry.LOCLEN
java.util.jar.JarEntry.LOCNAM
java.util.jar.JarEntry.LOCSIG
java.util.jar.JarEntry.LOCSIZ
java.util.jar.JarEntry.LOCTIM
java.util.jar.JarEntry.LOCVER
jdk.incubator.vector.ByteVector.castShape(VectorSpecies<F>, int)
jdk.incubator.vector.ByteVector.check(Class<F>)
jdk.incubator.vector.ByteVector.check(VectorSpecies<F>)
jdk.incubator.vector.ByteVector.convertShape(VectorOperators.Conversion<Byte, F>, VectorSpecies<F>, int)
jdk.incubator.vector.ByteVector.convert(VectorOperators.Conversion<Byte, F>, int)
jdk.incubator.vector.ByteVector.maskAll(boolean)
jdk.incubator.vector.ByteVector.reinterpretAsDoubles()
jdk.incubator.vector.ByteVector.reinterpretAsFloats()
jdk.incubator.vector.ByteVector.reinterpretAsInts()
jdk.incubator.vector.ByteVector.reinterpretAsLongs()
jdk.incubator.vector.ByteVector.reinterpretAsShorts()
jdk.incubator.vector.ByteVector.species()
I attached the following two files that show all of the differences:
* index-local-vs-remote.diff - Compares 'api/index-files/*.html'
* search-local-vs-remote.diff - Compares 'api/member-search-index.js'
I made the comparisons easier by converting the HTML files to plain text with 'w3m' and by expanding the JavaScript file using the 'js-beautify' tool, as in the examples below:
$ w3m -dump -cols 10000 -T text/html index-12.html > index-12.txt
$ js-beautify --end-with-newline -o search-local.js \
member-search-index.js
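To repeat the conversion for every index file of a build, the two commands above can be wrapped in a loop. The function name and directory layout below are hypothetical. The sketch assumes that 'w3m' and 'js-beautify' are installed and that the first argument is a build's 'api' directory.

```bash
#!/bin/bash
# Hypothetical helper: converts one build's Javadoc index files to plain text
# and expands its member search index, writing the results to an output
# directory so that two builds can be compared with 'git diff'.
to_text() {
  local api_dir="$1" out_dir="$2" html
  mkdir -p "$out_dir"
  for html in "$api_dir"/index-files/index-*.html; do
    [[ -e "$html" ]] || continue  # skip an unmatched glob
    # Same w3m options as above; index-12.html becomes index-12.txt.
    w3m -dump -cols 10000 -T text/html "$html" \
      > "$out_dir/$(basename "$html" .html).txt"
  done
  js-beautify --end-with-newline -o "$out_dir/member-search-index.js" \
    "$api_dir/member-search-index.js"
}
```

After running, for example, 'to_text path/to/local/api local' and 'to_text path/to/remote/api remote', the command 'git diff --numstat --shortstat local remote' compares the two builds as shown under ACTUAL RESULT.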
SOURCE CODE FOR AN EXECUTABLE TEST CASE
I used the following shell script to narrow the scope of packages while testing:
#!/bin/bash
# Runs Javadoc for testing
# The JDK home directory and its extracted source files
jdk_dir="$HOME/opt/jdk-20"
jdk_src="$jdk_dir/src"
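# Run the Javadoc tool, patched with the locally built classes in
# target/classes, on the 'java.util.jar' package; extra options pass through
"$jdk_dir/bin/java" --patch-module jdk.javadoc=target/classes \
    jdk.javadoc.internal.tool.Main \
    --source-path "$jdk_src/java.base" \
    -d tmp/doc -notimestamp -Xdoclint:none \
    java.util.jar "$@"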
WORKAROUND
I don't have a workaround, but I do have a fix. My pull request will follow shortly.