JDK-5049382 : compiler failed with "invalid" bytes in comments lines on UTF-8 environment
  • Type: Bug
  • Component: tools
  • Sub-Component: javac
  • Affected Version: 5.0
  • Priority: P3
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 2004-05-19
  • Updated: 2004-05-19
  • Resolved: 2004-05-19
Related Reports
Duplicate :  
Relates :  
Relates :  
Description
java version : 1.5.0-beta2 b51
Platform : Solaris Sparc 9
Locale : ja_JP.UTF-8, zh_CN.UTF-8, ... (any UTF-8 configs)

If "invalid" bytes appear in comment lines of a Java source file, as in the attached HelloWorld.java, the compiler fails in UTF-8 locales. The compiler should handle this the old way (1.4: silently replace them with the Unicode replacement character).
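
The difference between the two behaviors can be sketched with the standard java.nio.charset API. This is only an illustration, not javac's actual source-reading code: the class name, the method names, and the sample bytes (assumed to be "日本語" encoded as EUC-JP) are hypothetical and are not taken from the attached file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodePolicyDemo {
    // Assumed sample: "日本語" in EUC-JP. These bytes are not well-formed UTF-8.
    static final byte[] EUC_JP_BYTES = {
        (byte) 0xC6, (byte) 0xFC, (byte) 0xCB, (byte) 0xDC, (byte) 0xB8, (byte) 0xEC
    };

    // Builds the raw bytes of a one-line comment: "// <EUC-JP bytes>\n".
    public static byte[] commentLine() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prefix = "// ".getBytes(StandardCharsets.US_ASCII);
        out.write(prefix, 0, prefix.length);
        out.write(EUC_JP_BYTES, 0, EUC_JP_BYTES.length);
        out.write('\n');
        return out.toByteArray();
    }

    // Lenient decoding (the 1.4-style behavior described above):
    // malformed input is silently replaced with U+FFFD.
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding: malformed input is reported as an error.
    // Returns true if the bytes are rejected.
    public static boolean strictDecodeFails(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] line = commentLine();
        System.out.println("lenient contains U+FFFD: "
                + (decodeLenient(line).indexOf('\uFFFD') >= 0));
        System.out.println("strict decode fails:     " + strictDecodeFails(line));
    }
}
```

With the lenient policy the comment text is damaged but the line is still readable as a comment. With the strict policy the whole decode is rejected, which matches the failure reported here.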

This issue is caused by the CCC4767128 putback. I agree that compilation should fail if the "invalid" bytes are outside Java comments. However, putting native characters in comment lines is expected and common behavior for non-English-speaking end users.

On the other hand, even though HelloWorld.java includes "invalid" bytes, it compiles without problems if the locale is not a UTF-8 locale. This does not seem to match the CCC strictly; the compilation should also fail as designed. I tried zh_CN.GBK, ja_JP.eucJP, and zh_CN.eucCN. BTW, the "invalid" bytes are in eucJP encoding.
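
One plausible reason for the difference is that the same bytes are malformed under UTF-8 but form legal sequences under other encodings. The sketch below checks this with a strict decoder per charset. The class name, method name, and sample bytes (assumed to be "日本語" in EUC-JP) are hypothetical. EUC-JP and GBK are optional charsets in a Java runtime, so the loop skips any that are not installed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class EncodingCheck {
    // Assumed sample: "日本語" in EUC-JP (not taken from the attached file).
    public static final byte[] EUC_JP_BYTES = {
        (byte) 0xC6, (byte) 0xFC, (byte) 0xCB, (byte) 0xDC, (byte) 0xB8, (byte) 0xEC
    };

    // Returns true if the bytes decode under the charset with no
    // malformed or unmappable sequences.
    public static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = { "UTF-8", "EUC-JP", "GBK", "ISO-8859-1" };
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this runtime");
                continue;
            }
            boolean clean = decodesCleanly(EUC_JP_BYTES, Charset.forName(name));
            System.out.println(name + ": " + (clean ? "decodes cleanly" : "malformed input"));
        }
    }
}
```

A strict decoder can only reject bytes that are illegal in the encoding it was given, so a non-UTF-8 locale may accept the file even when the bytes were written in a different encoding.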

Steps to reproduce:
1. get HelloWorld.java from bugtraq (attached)
2. set locale to any UTF-8 locale 
   setenv LC_ALL ja_JP.UTF-8
3. compile it with b49 or later of Java 1.5.0 beta2

###@###.### 2004-05-18