JDK-8214423 : URI.getQuery(): decoding all percent-encoded octets can prevent the returned string from being correctly interpreted
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.net
  • Affected Version: 8,11,12
  • Priority: P4
  • Status: Open
  • Resolution: Unresolved
  • OS: windows_8
  • CPU: x86_64
  • Submitted: 2018-11-24
  • Updated: 2018-11-28
Description
A DESCRIPTION OF THE PROBLEM :
URI.getQuery() decodes the query string as documented, but query strings should never be decoded in that way because it loses the distinction between ampersands separating arguments and ampersands that are in argument values.

This means part of the javadoc is either incorrect or very misleading, where it says:
     new URI(u.getScheme(),
             u.getUserInfo(), u.getAuthority(),
             u.getPath(), u.getQuery(),
             u.getFragment())
     .equals(u)
It says this only holds true if the URI "does not encode characters except those that must be quoted". You could say that when a query string argument value contains an ampersand character it must be encoded, but only because of the conventional syntax of query strings, which is not part of the URI specification.

For example:
URI u=new URI("http://localhost/?x=Q%26A&y=2");
URI u2=new URI(u.getScheme(),u.getUserInfo(),u.getHost(),u.getPort(),u.getPath(),u.getQuery(),u.getFragment());

The value of u2 is "http://localhost/?x=Q&A&y=2", which is not equal to u, and represents a completely different set of query string arguments.

The arguments in http://localhost/?x=Q%26A&y=2 are:
x=Q&A, y=2
The arguments in http://localhost/?x=Q&A&y=2 are:
x=Q, A, y=2
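
The difference can be observed directly by comparing getRawQuery() with getQuery() on the same URI. The following is a minimal sketch, not part of the original report; the class name QueryDecodingDemo and the helper countParameters are hypothetical and exist only for illustration.

```java
import java.net.URI;

public class QueryDecodingDemo {
    // Hypothetical helper: counts the '&'-separated parameters in a query string.
    static int countParameters(String query) {
        return query.split("&").length;
    }

    public static void main(String[] args) {
        URI u = URI.create("http://localhost/?x=Q%26A&y=2");

        // The raw query keeps the escaped ampersand, so the structure is intact.
        System.out.println(u.getRawQuery() + " -> " + countParameters(u.getRawQuery()));
        // prints: x=Q%26A&y=2 -> 2

        // The decoded query turns %26 into a literal '&', adding a spurious separator.
        System.out.println(u.getQuery() + " -> " + countParameters(u.getQuery()));
        // prints: x=Q&A&y=2 -> 3
    }
}
```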

My suggestion is to completely deprecate the URI.getQuery() method in favour of the getRawQuery() method.

At the very least, the javadoc of the getQuery() method should explain that it should never be used on query strings that follow the standard convention of separating arguments with ampersands.


---------- BEGIN SOURCE ----------
URI u=new URI("http://localhost/?x=Q%26A&y=2");
URI u2=new URI(u.getScheme(),u.getUserInfo(),u.getHost(),u.getPort(),u.getPath(),u.getQuery(),u.getFragment());
System.out.println(u.equals(u2)); // returns false, but any reasonable person reading the javadoc would expect true.
---------- END SOURCE ----------

FREQUENCY : always



Comments
URI::getQuery, URI::getSchemeSpecificPart, and URI::getAuthority all decode any percent-encoded octets, and therefore should not be used in cases where further decomposition into sub-components is necessary, since the decoding may alter the structure of the string. Whenever decomposition into sub-components is necessary, the application should use the raw form of the component and split it into its sub-components prior to decoding. This obviously has the disadvantage of putting the burden of decoding on the application itself, but there is not much that can be done in the generic URI class, since it is by definition only aware of the generic syntax. An @apiNote to clarify the issue might be envisaged, though.
28-11-2018
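
The approach described in the comment above (split the raw component first, decode afterwards) could look roughly like the sketch below. It is an illustration only, not an API proposed for java.net.URI: the class RawQueryParser and its parse method are hypothetical names, and the sketch assumes the conventional name=value&name=value layout and UTF-8 escapes. It also uses URLDecoder, which follows form-encoding rules and therefore turns '+' into a space; an application that needs strict URI percent-decoding would have to substitute its own decoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RawQueryParser {
    // Hypothetical helper: splits the RAW query on '&' and '=', then decodes each piece.
    static Map<String, List<String>> parse(URI uri) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        String raw = uri.getRawQuery();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    // Assumption: escapes are UTF-8; note that URLDecoder also maps '+' to a space.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(URI.create("http://localhost/?x=Q%26A&y=2")));
        // prints: {x=[Q&A], y=[2]}
    }
}
```

Because the split happens before decoding, the escaped ampersand in x stays part of the value instead of being treated as a separator.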

To reproduce the issue, run the attached test case.
JDK 8u191 - Fail
JDK 11.0.1 - Fail
JDK 12-ea+21 - Fail
Output: false
28-11-2018