Context
======
This note follows from the observed increase in the size (character count) of the JDK API docs caused by introducing an extra level of module directories (JDK-8195795). Although motivated by that work, it describes a conceptually independent small project to reduce the size of the docs in general, by making better use of relative links within and between pages.
The general problem is that many, if not most, links are generated in the form <path-from-origin-to-root>/<path-from-root-to-target>, even when the origin and target are close together. For example, in java/lang/Object.html in the current JDK docs, there are 94 hrefs beginning '..', of which 76 point to java.* classes, 74 to java.lang classes, and 37 to the same file, java/lang/Object.html! Here is an extract from the list, to give you the picture:
../../stylesheet.css
../../jquery/jquery-ui.css
../../overview-summary.html
../../java.base-summary.html
../../deprecated-list.html
../../index-files/index-1.html
../../help-doc.html
../../java/lang/NumberFormatException.html
../../java/lang/OutOfMemoryError.html
../../index.html?java/lang/Object.html
../../allclasses-noframe.html
../../java.base-summary.html
../../java/lang/package-summary.html
../../java/lang/Class.html
../../java/lang/Object.html#%3Cinit%3E()
../../java/lang/Object.html
../../java/lang/Object.html#clone()
../../java/lang/Object.html
../../java/lang/Object.html#finalize()
../../java/lang/Class.html
../../java/lang/Object.html#getClass()
../../java/lang/Object.html#hashCode()
../../java/lang/Object.html#notify()
../../java/lang/Object.html#notifyAll()
../../java/lang/String.html
../../java/lang/Object.html#toString()
../../java/lang/Object.html#wait()
../../java/lang/Object.html#wait(long)
../../java/lang/Object.html#wait(long,int)
../../java/lang/Class.html
../../java/util/HashMap.html
../../java/lang/Object.html#equals(java.lang.Object)
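So, yes, we could shorten those references a lot by removing unnecessary ../<name> pairs. For example, from java/lang/Object.html, the link ../../java/lang/Class.html could simply be Class.html, and ../../java/lang/Object.html#clone() could simply be #clone().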
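Counts like the ones above can be gathered mechanically. The following is a minimal sketch, not the tool that produced the figures quoted: it assumes href attributes are written with double quotes (as in the extract), that the page path is given relative to the docs root, and the names HrefCensus, count and Counts are made up for illustration. It only measures; it does not rewrite anything.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefCensus {
    // Assumption: javadoc writes attributes as href="..." with double quotes.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    /** Hrefs beginning "..", how many of those lead back to this page, and their total length. */
    record Counts(int dotDot, int samePage, int dotDotChars) {}

    /** Scans the HTML of the page at pagePath (a path from the docs root, e.g. java/lang/Object.html). */
    static Counts count(String html, String pagePath) {
        // In root-relative form, a link back to this page is "../" once per directory level,
        // followed by the page's own path from the root.
        int depth = (int) pagePath.chars().filter(c -> c == '/').count();
        String self = "../".repeat(depth) + pagePath;
        int dotDot = 0, samePage = 0, chars = 0;
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (!href.startsWith("..")) {
                continue;
            }
            dotDot++;
            chars += href.length();
            if (href.equals(self) || href.startsWith(self + "#")) {
                samePage++;
            }
        }
        return new Counts(dotDot, samePage, chars);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java HrefCensus.java <docs-root> java/lang/Object.html
        Path root = Path.of(args[0]);
        String page = args[1];
        System.out.println(count(Files.readString(root.resolve(page)), page));
    }
}
```

The exact numbers will depend on the JDK build being scanned; the point is that dotDotChars gives a direct measure of how many characters per page are spent on root-relative prefixes.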
Classes
======
Now, a brief taxonomy of the classes involved in making links, in roughly top-down order.
* LinkFactoryImpl
Very high-level class, responsible for making complete links between elements: the full <a href="...">description</a>
* Links
Low-level factory for creating HtmlTree nodes for <a href="...">description</a>, given the link and the content
* DocLink
A (String) path to another file, together with optional query and fragment components: all the bits that go in an href attribute
* DocPath
A relative path to another file
* HtmlTree
The basic element of the generated docs
Proposal
======
The proposal to optimize links comes in three parts, one each for DocPath, DocLink and Links. By doing the work at these levels, we should catch all links generated by javadoc itself (ignoring, for now, explicit links in user-provided doc comments).
1. Enhance DocPath
DocPath is currently little more than a type-safe wrapper around a string representing a relative path. (For those interested, it was added back in 2012, in JDK-8000741.) It does have a "resolve" method, but that's about it.
>> We should add normalize and relativize methods, similar to those found on java.nio.file.Path and java.net.URI. There's nothing rocket-science about adding such methods.
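One caveat when borrowing from those models: java.net.URI.relativize returns the given URI unchanged unless this URI's path is a prefix of it, so it never produces ".." segments, whereas java.nio.file.Path.relativize does. The minimal sketch below follows the Path behaviour but works on '/'-separated strings, so the result does not depend on the platform's file separator. MiniDocPath and its methods are hypothetical; they are not the existing DocPath API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for DocPath: a relative path that always uses '/', whatever the host OS. */
final class MiniDocPath {
    private final String path;

    MiniDocPath(String path) {
        this.path = path;
    }

    /** The directory containing this file; the empty path for a file at the root. */
    MiniDocPath parent() {
        int slash = path.lastIndexOf('/');
        return new MiniDocPath(slash < 0 ? "" : path.substring(0, slash));
    }

    /** Appends other to this path, treating this path as a directory. */
    MiniDocPath resolve(String other) {
        if (path.isEmpty()) return new MiniDocPath(other);
        if (other.isEmpty()) return this;
        return new MiniDocPath(path + "/" + other);
    }

    /** Drops "." and empty segments, and cancels each "name/.." pair. */
    MiniDocPath normalize() {
        Deque<String> out = new ArrayDeque<>();
        for (String seg : path.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) {
                continue;
            }
            if (seg.equals("..") && !out.isEmpty() && !out.peekLast().equals("..")) {
                out.removeLast();
            } else {
                out.addLast(seg);
            }
        }
        return new MiniDocPath(String.join("/", out));
    }

    /**
     * The path leading from this directory to other. Both paths are assumed to be
     * normalized, and this path is assumed not to contain "..".
     */
    MiniDocPath relativize(MiniDocPath other) {
        List<String> from = segments(path);
        List<String> to = segments(other.path);
        int common = 0;
        while (common < from.size() && common < to.size()
                && from.get(common).equals(to.get(common))) {
            common++;
        }
        List<String> result = new ArrayList<>();
        for (int i = common; i < from.size(); i++) {
            result.add("..");   // climb out of each directory not shared with the target
        }
        result.addAll(to.subList(common, to.size()));
        return new MiniDocPath(String.join("/", result));
    }

    private static List<String> segments(String p) {
        return p.isEmpty() ? List.of() : Arrays.asList(p.split("/"));
    }

    @Override
    public String toString() {
        return path;
    }

    public static void main(String[] args) {
        MiniDocPath dir = new MiniDocPath("java/lang/Object.html").parent();  // java/lang
        MiniDocPath target = dir.resolve("../../java/util/HashMap.html").normalize();
        System.out.println(target);                   // java/util/HashMap.html
        System.out.println(dir.relativize(target));   // ../util/HashMap.html
    }
}
```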
2. Enhance DocLink
DocLink is a type-safe tuple of (path, query, fragment), added at the same time as DocPath. The path component of a DocLink is currently a String, but all instances are created directly or indirectly from a DocPath. So far, the DocPath has been unwrapped and discarded because there was no reason to keep it. Now, there is.
>> The path component of a DocLink should be a DocPath, so that we have access to the resolve/normalize/relativize methods on that DocPath.
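To illustrate why keeping the DocPath matters, here is a minimal sketch in which the link keeps its path as a structured value and renders the href only when it knows which page it is being written into. DocLinkSketch, DocPathStub, DocLinkStub and hrefFrom are hypothetical names; the stub path type is assumed to hold an already-normalized path from the docs root, and null stands for an absent query or fragment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DocLinkSketch {
    /** Hypothetical stand-in for DocPath: a normalized, '/'-separated path from the docs root. */
    record DocPathStub(String path) {
        List<String> segments() {
            return path.isEmpty() ? List.of() : Arrays.asList(path.split("/"));
        }

        /** The directory containing this file (empty for a file at the root). */
        DocPathStub parent() {
            int slash = path.lastIndexOf('/');
            return new DocPathStub(slash < 0 ? "" : path.substring(0, slash));
        }

        /** Path from this directory to other; assumes neither contains "." or "..". */
        DocPathStub relativize(DocPathStub other) {
            List<String> from = segments(), to = other.segments();
            int common = 0;
            while (common < from.size() && common < to.size()
                    && from.get(common).equals(to.get(common))) {
                common++;
            }
            List<String> out = new ArrayList<>();
            for (int i = common; i < from.size(); i++) {
                out.add("..");
            }
            out.addAll(to.subList(common, to.size()));
            return new DocPathStub(String.join("/", out));
        }
    }

    /** Hypothetical DocLink whose path is kept as a DocPathStub instead of a String. */
    record DocLinkStub(DocPathStub path, String query, String fragment) {
        /** Renders the href for use in the given page; query and fragment may be null. */
        String hrefFrom(DocPathStub page) {
            StringBuilder sb = new StringBuilder();
            // A link into the same page, with no query, needs only its fragment.
            boolean samePage = path.equals(page) && query == null && fragment != null;
            if (!samePage) {
                sb.append(page.parent().relativize(path).path());
            }
            if (query != null) sb.append('?').append(query);
            if (fragment != null) sb.append('#').append(fragment);
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        DocPathStub page = new DocPathStub("java/lang/Object.html");
        System.out.println(new DocLinkStub(page, null, "clone()").hrefFrom(page));
        System.out.println(new DocLinkStub(new DocPathStub("java/util/HashMap.html"), null, null)
                .hrefFrom(page));
    }
}
```

With the page java/lang/Object.html, the two calls in main render #clone() and ../util/HashMap.html, where the current root-relative form would be ../../java/lang/Object.html#clone() and ../../java/util/HashMap.html.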
3. Enhance Links
The Links class was added recently, as part of the Html[Doc[let]]Writer cleanup. It provides factory methods for HtmlTree objects representing links. Most of its methods are currently static; a few are instance methods, where the generated HtmlTree object depends on the HtmlVersion (to choose between '<a name=...>' for HTML 4 and '<a id=...>' for HTML5). There is currently a single global instance of this class, available from the HtmlConfiguration object.
>> We should have one Links instance per HtmlDocletWriter, holding the DocPath for the page being generated. All static methods that generate links should be converted to instance methods, so that the DocPath in any DocLink can be normalized and made relative to the DocPath for the page being generated, which is the goal.
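A minimal sketch of the per-page shape, assuming one factory object is constructed for each page and knows both that page's path and the HTML version. PageLinks, its constructor and its methods are hypothetical; they return HTML strings where the real Links class returns HtmlTree nodes, labels are assumed to be HTML-safe already, and java.nio.file.Path stands in for the proposed DocPath methods.

```java
import java.nio.file.Path;

/** Hypothetical per-page link factory; one instance per page being generated. */
class PageLinks {
    enum HtmlVersion { HTML4, HTML5 }

    private final Path page;      // path of the page being generated, from the docs root
    private final Path pageDir;   // its directory: hrefs are made relative to this
    private final HtmlVersion version;

    PageLinks(String pagePath, HtmlVersion version) {
        this.page = Path.of(pagePath);
        Path parent = page.getParent();
        this.pageDir = (parent == null) ? Path.of("") : parent;
        this.version = version;
    }

    /** Builds an a element; targetPath is from the docs root, fragment may be null. */
    String createLink(String targetPath, String fragment, String label) {
        Path target = Path.of(targetPath).normalize();
        String href = target.equals(page) && fragment != null
                ? ""   // same page: the fragment alone is enough
                : pageDir.relativize(target).toString()
                        .replace(target.getFileSystem().getSeparator(), "/");
        if (fragment != null) {
            href += "#" + fragment;
        }
        return "<a href=\"" + href + "\">" + label + "</a>";
    }

    /** An anchor target: the name attribute for HTML 4, the id attribute for HTML5. */
    String createAnchor(String id) {
        String attr = (version == HtmlVersion.HTML5) ? "id" : "name";
        return "<a " + attr + "=\"" + id + "\"></a>";
    }

    public static void main(String[] args) {
        PageLinks links = new PageLinks("java/lang/Object.html", HtmlVersion.HTML5);
        System.out.println(links.createLink("java/lang/Class.html", null, "Class"));
        System.out.println(links.createLink("java/lang/Object.html", "clone()", "clone()"));
        System.out.println(links.createAnchor("clone()"));
    }
}
```

Because the page path lives in the instance, call sites need not pass it to every link method, which is what makes converting the static methods to instance methods worthwhile.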
Testing
======
This has the potential to affect many of our golden-file tests. That means a lot of fix-up work :-( but it also gives us a lot of positive test cases :-)
Future Work
=========
We have already noted problems where user-written links in doc comments become invalid when the text is copied elsewhere in the generated documentation. One interesting possibility (not yet investigated) would be to detect the use of '<a href=....>' in doc comments and convert it into an internal form, such as a hidden taglet, that can leverage the mechanisms above, so that the link is automatically made relative to the context in which it is placed. That would make this proposal the basis for bug fixes as well as enhancements.
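Detecting such links, and deciding which ones depend on where the text appears, could look roughly like this. It is a minimal sketch that assumes a regular expression is good enough for the HTML found in doc comments; since javadoc already parses doc comments into DocTree nodes, a real implementation would more likely inspect those than rescan text. DocCommentLinks, Kind and Href are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DocCommentLinks {
    /** EXTERNAL hrefs work anywhere; the other two kinds depend on the page the comment came from. */
    enum Kind { EXTERNAL, FRAGMENT_ONLY, PAGE_RELATIVE }

    record Href(String value, Kind kind) {}

    // <a ... href="..."> or href='...'; a simplification, not a full HTML parser.
    private static final Pattern A_HREF = Pattern.compile(
            "<a\\s[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // A URI scheme such as "https:" or "mailto:".
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    static List<Href> find(String commentHtml) {
        List<Href> result = new ArrayList<>();
        Matcher m = A_HREF.matcher(commentHtml);
        while (m.find()) {
            String href = m.group(2);
            Kind kind;
            if (SCHEME.matcher(href).find() || href.startsWith("/")) {
                kind = Kind.EXTERNAL;        // absolute URL, or a path from the server root
            } else if (href.startsWith("#")) {
                kind = Kind.FRAGMENT_ONLY;   // only correct on the original page
            } else {
                kind = Kind.PAGE_RELATIVE;   // resolved against the original page's directory
            }
            result.add(new Href(href, kind));
        }
        return result;
    }

    public static void main(String[] args) {
        String comment = "See <a href=\"../util/List.html\">List</a>, "
                + "<a href='#impl-note'>the notes</a> and "
                + "<a href=\"https://openjdk.org/\">OpenJDK</a>.";
        find(comment).forEach(System.out::println);
    }
}
```

Note that fragment-only links are just as fragile as page-relative ones when text is copied: "#impl-note" silently refers to whichever page the text ends up on.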
The same general comments might also apply to the method HtmlDocletWriter.redirectRelativeLinks: it too might be better rewritten to use the mechanisms proposed here.
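In terms of those mechanisms, redirecting a copied link amounts to resolve, normalize, then relativize. The sketch below is not how HtmlDocletWriter.redirectRelativeLinks is currently written; it uses java.nio.file.Path as a stand-in for the proposed DocPath methods, assumes both pages are given as paths from the docs root, and RedirectSketch, redirect and dirOf are hypothetical names.

```java
import java.nio.file.Path;

class RedirectSketch {
    /** Renders a Path with '/' separators, whatever the host file system uses. */
    static String slashed(Path p) {
        return p.toString().replace(p.getFileSystem().getSeparator(), "/");
    }

    /** Directory of a page given as a path from the docs root (empty for a root-level page). */
    static Path dirOf(String page) {
        Path parent = Path.of(page).getParent();
        return (parent == null) ? Path.of("") : parent;
    }

    /**
     * Rewrites href, written for the page fromPage, so that it still points at the same
     * target when the text is copied into toPage. Hrefs with a URI scheme, or starting
     * with '/', are assumed to be page-independent and are returned unchanged.
     */
    static String redirect(String href, String fromPage, String toPage) {
        if (href.matches("[A-Za-z][A-Za-z0-9+.-]*:.*") || href.startsWith("/")) {
            return href;
        }
        // Split off any query or fragment: only the path part is rewritten.
        int q = href.indexOf('?'), h = href.indexOf('#');
        int cut = (q < 0) ? h : (h < 0) ? q : Math.min(q, h);
        String path = (cut < 0) ? href : href.substring(0, cut);
        String rest = (cut < 0) ? "" : href.substring(cut);

        // Resolve against the original page to get a path from the docs root...
        Path target = path.isEmpty()
                ? Path.of(fromPage)
                : dirOf(fromPage).resolve(path).normalize();
        // ...then express it relative to the page the text is copied into.
        if (target.equals(Path.of(toPage)) && rest.startsWith("#")) {
            return rest;
        }
        return slashed(dirOf(toPage).relativize(target)) + rest;
    }

    public static void main(String[] args) {
        // A link from the doc comment for java.util.List, copied into a page one level deeper.
        System.out.println(redirect("../lang/Object.html#equals(java.lang.Object)",
                "java/util/List.html", "java/util/concurrent/ConcurrentHashMap.html"));
        // A fragment-only link must now name the page it originally referred to.
        System.out.println(redirect("#impl-note", "java/util/List.html", "java/util/ArrayList.html"));
    }
}
```

The two calls in main produce ../../lang/Object.html#equals(java.lang.Object) and List.html#impl-note respectively.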