JDK-8044555 : The Tree API gives no access to non-Javadoc comments
  • Type: Enhancement
  • Component: tools
  • Sub-Component: javac
  • Priority: P4
  • Status: New
  • Resolution: Unresolved
  • Submitted: 2014-06-02
  • Updated: 2014-06-02
Description
The Tree API gives access to Javadoc comments but not to any other comments, such as line comments and multi-line comments.
There are at least two reasons to provide access to non-Javadoc comments:
- The Tree API should give a complete view of the Java source code; see bug JDK-8024098.
- Some annotations, such as @todo, appear in non-Javadoc comments, and Java Models want to make such annotations easily accessible.
Comments
Yes, although javadoc comments are not part of the syntax tree (and not part of the tree nodes in com.sun.source.tree.*), they are significant enough to warrant representation in the "utility" classes in com.sun.source.util.*. We also provide methods to relate tree nodes to their position in the source text, and we provide methods that relate tree nodes to constructs in the Java Language Model API, as represented by javax.lang.model.*.
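As an illustration of the utility-class access mentioned above, here is a minimal sketch that uses Trees.getDocComment from com.sun.source.util to look up the Javadoc comment attached to each method. It assumes a JDK (not just a JRE) so that ToolProvider.getSystemJavaCompiler() returns a compiler. The class DocCommentLookup, its nested StringSource helper, and the Demo source are hypothetical names made up for this example. Methods preceded only by a line comment are expected to map to null, which is the gap this report describes.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the JDK.
class DocCommentLookup {

    // In-memory source file so the example needs no files on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Maps each method name to its Javadoc comment text, or null if it has none.
    static Map<String, String> docCommentsByMethod(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        Trees trees = Trees.instance(task);
        Map<String, String> result = new LinkedHashMap<>();
        try {
            // Parsing alone is enough here; no attribution is required.
            for (CompilationUnitTree unit : task.parse()) {
                new TreePathScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        result.put(method.getName().toString(),
                                trees.getDocComment(getCurrentPath()));
                        return super.visitMethod(method, unused);
                    }
                }.scan(new TreePath(unit), null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String code = String.join("\n",
                "class Demo {",
                "    /** Says hello. */",
                "    void documented() {}",
                "    // TODO: not a doc comment",
                "    void plain() {}",
                "}");
        docCommentsByMethod("Demo", code)
                .forEach((name, doc) -> System.out.println(name + " -> " + doc));
    }
}
```

The sketch only parses the source; it does not run the rest of the compilation pipeline, which should be sufficient for looking up doc comments on parsed trees.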
02-06-2014

My point in the second paragraph was that you say the Tree API is a syntax tree, not a lexical tree, while at the same time the Tree API provides access to Javadoc comments, which are apparently not part of the syntactical grammar but part of the lexical grammar.
02-06-2014

I accept that some applications, like IDEs, may want to build a full and exact model of the Java source code. That is not what the Tree API, and the underlying javac AST, aim to provide. The javac AST provides no way of keeping info about whitespace (e.g. spaces vs. tabs), Unicode escapes (e.g. space or \u0020), non-javadoc comments and even misplaced/irrelevant javadoc comments (e.g. within a method body). It currently has no way of retaining that low-level lexical information, nor would most users want javac to provide all that information. So, I'm sorry, but javac can't easily do what you're asking. As to your second paragraph, I'm not sure exactly what point you are making. If you are saying that javadoc comments are not defined in JLS, then yes, that is true. They have never been formally specified in JLS. They are mostly a convention adopted by the javadoc tool. That being said, their existence is admitted by the Java Language Model API in Elements.getDocComment, see http://docs.oracle.com/javase/8/docs/api/javax/lang/model/util/Elements.html#getDocComment-javax.lang.model.element.Element- although that only admits to a comment and its general form, and says nothing about any interpretation of the text of the comment.
02-06-2014

This is a good example of the difference between the effort needed to build a full-fledged Java Model on top of the Tree API and the effort needed to supply enough data in that Tree API. It's easy to say you need to read the source file to augment the Tree API, but it puts a strain on the Java Model: for example, a full merge of all non-Javadoc comments with the Tree API doesn't perform very well, whereas javac could easily supply the comments itself. Also, I can't find Javadoc comments in the description of the syntactic grammar of Java code. I agree that all comments basically don't exist for the javac compiler, but at the same time all comments are important to any Java Model that gives a complete view of the source code.
02-06-2014

Note that it *is* a goal for the Tree API to provide positions that are accurate enough for you to be able to read the source text in the vicinity of the tree nodes.
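To make this concrete, here is a minimal sketch of reading the source text near a tree node. It asks SourcePositions (obtained from Trees.getSourcePositions) for the start offset of each method and then inspects the line just before that offset for a line comment. The class CommentsNearNodes, its StringSource helper, and the "previous line starts with //" rule are assumptions made for this example; a real tool would need a proper scan that also handles block comments, blank lines, and comments on the same line.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreeScanner;
import com.sun.source.util.Trees;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
class CommentsNearNodes {

    // In-memory source file so the example needs no files on disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Returns "methodName: comment" for each method whose preceding line is a // comment.
    static List<String> lineCommentsBeforeMethods(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        SourcePositions positions = Trees.instance(task).getSourcePositions();
        List<String> found = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree method, Void unused) {
                        int start = (int) positions.getStartPosition(unit, method);
                        // Newline that ends the line before the method's own line.
                        int prevEnd = start > 0 ? code.lastIndexOf('\n', start - 1) : -1;
                        if (prevEnd >= 0) {
                            int prevStart = code.lastIndexOf('\n', prevEnd - 1) + 1;
                            String prevLine = code.substring(prevStart, prevEnd).trim();
                            if (prevLine.startsWith("//")) {
                                found.add(method.getName() + ": " + prevLine);
                            }
                        }
                        return super.visitMethod(method, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String code = String.join("\n",
                "class Demo {",
                "    // TODO: handle null",
                "    void first() {}",
                "",
                "    /** Documented. */",
                "    void second() {}",
                "}");
        lineCommentsBeforeMethods("Demo", code).forEach(System.out::println);
    }
}
```

The tree supplies only the offsets; the comment text itself comes from the source string, which is the division of labour described in the comments above.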
02-06-2014

This is a non-goal for the Tree API. The "Tree" is a syntax tree, not a lexical tree. At some point, to see the raw lexical items in the source file, you have to get down and dirty and read the source file. Note that JDK-8024098 specifically excludes going down to the whitespace and comment level. Also, @todo in a C-style or C++-style comment is neither an annotation nor a javadoc tag. It is simply an arbitrary sequence of characters.
02-06-2014