JDK-4518039 : "~" in the html file does not display on the URL scraper.
  • Type: Bug
  • Component: core-libs
  • Sub-Component: java.util:i18n
  • Affected Version: 3.0,6
  • Priority: P1
  • Status: Resolved
  • Resolution: Fixed
  • OS: solaris_8
  • CPU: generic,sparc
  • Submitted: 2001-10-23
  • Updated: 2003-02-12
  • Resolved: 2003-01-31
Fixed in: 1.2.2_015, 1.2.2_15, 1.3.1_08
Description
NCJQA-00007843

If an HTML file contains a "~" character, iPS3.0 SP3 cannot display the character correctly.
The content is at the URL below and should be displayed as follows.
http://boss.mcom.com:8001/hisho.htm

However, the URL Scraper displays it as shown below.
http://spiderman.mcom.com/~hirayama/fujitsu/7843/imageB0M.JPG

Many sites use this character, so this is a critical problem.
Please provide a fix and let me know whether iPS fails to display any other
Japanese characters or special characters.
 
[Steps to reproduce]
1. Access the console and log in as the super user.
2. Specify the URL (http://boss.mcom.com:8001/hisho.htm) in the desktop channel wizard.
3. Access and log in to the gateway, then select the URL Scraper in the contents tab.



###@###.### 2001-11-21

Due to AOL's redeployment, boss.mcom.com can no longer be accessed, so I set up
a test environment at jun.red.iplanet.com and created two URL Scrapers:
"shift_jis URL Scraper" and "euc-jp URL Scraper". The steps to reproduce are:
1. Go to URL: http://jun.red.iplanet.com:8080
2. Use Unix authentication and log in as yourself.
3. Select contents.
4. Select the two URL Scrapers and click OK.
5. Only one of the two URL Scraper channels is displayed correctly.
6. If you go to the console and switch your Default HTML Character Set from
   euc-jp to shift_jis, the other channel is displayed correctly, but
   the "euc-jp URL Scraper" channel is not.

In the customer's environment, it will be very common for end users to pull different
websites into their desktop page with the URL Scraper. Because the customer has no
control over which character sets those websites use, the customer strongly requests
that the portal server be able to display multiple channels that use different
character sets (for example, shift_jis and euc-jp).

###@###.### 2001-11-21

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.2.2_015 1.2.2_15 1.3.1_08
FIXED IN: 1.2.2_015 1.2.2_15 1.3.1_08
INTEGRATED IN: 1.2.2_015 1.2.2_15 1.3.1_08
14-06-2004

PUBLIC COMMENTS
The iPS software is working correctly. The problem in this case is that the EUC-JP character set has characters that cannot be represented using SJIS. Specifically, the character in question is called WAVE DASH. In Unicode, this character has the value \u301C. In EUC-JP, this character is represented as the two-byte sequence 0xa1 0xc1. However, this character has no representation in the SJIS character set, so when the user selects SJIS as the encoding type, the character is displayed as a question mark.

There are several other characters that are similar to WAVE DASH:
- ASCII tilde (~), Unicode \u007e, SJIS 0x7e, EUC-JP 0x7e
- WAVY DASH, Unicode \u3030, no SJIS representation
- FULL WIDTH TILDE, Unicode \uff5e, SJIS 0x81 0x60, EUC-JP 0x8f 0xa2 0xb7

One of the tilde characters can be used instead of WAVE DASH to resolve this problem.
10-06-2004
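
The workaround suggested in this comment, substituting a tilde character for WAVE DASH before the text is encoded as SJIS, could be sketched roughly as follows. This is only an illustration and not iPS code: the class and method names are hypothetical, and picking FULL WIDTH TILDE (U+FF5E) as the substitute is an assumption based on the list above.

```java
// Hypothetical helper for illustration; not part of iPS or the JDK.
final class WaveDashWorkaround {
    static final char WAVE_DASH = '\u301C';
    static final char FULLWIDTH_TILDE = '\uFF5E';

    /** Returns the text with every WAVE DASH replaced by FULLWIDTH TILDE. */
    static String replaceWaveDash(String text) {
        return text.replace(WAVE_DASH, FULLWIDTH_TILDE);
    }

    public static void main(String[] args) {
        String original = "10\u301C20";
        String patched = replaceWaveDash(original);
        // Print code points instead of glyphs so the output does not
        // depend on the console encoding.
        patched.chars().forEach(c -> System.out.printf("U+%04X%n", c));
    }
}
```

A substitution like this changes the character stored in the text, so it only suits cases where the two glyphs are treated as interchangeable.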

EVALUATION
The iPS software is working correctly. The problem in this case is that the EUC-JP character set has characters that cannot be represented using SJIS. Specifically, the character in question is called WAVE DASH. In Unicode, this character has the value \u301C. In EUC-JP, this character is represented as the two-byte sequence 0xa1 0xc1. However, this character has no representation in the SJIS character set, so when the user selects SJIS as the encoding type, the character is displayed as a question mark.

There are several other characters that are similar to WAVE DASH:
- ASCII tilde (~), Unicode \u007e, SJIS 0x7e, EUC-JP 0x7e
- WAVY DASH, Unicode \u3030, no SJIS representation
- FULL WIDTH TILDE, Unicode \uff5e, SJIS 0x81 0x60, EUC-JP 0x8f 0xa2 0xb7

One of the tilde characters can be used instead of WAVE DASH to resolve this problem.
###@###.### 2002-02-05

Re-open the bug:
1. The WAVE DASH character exists in both Shift_JIS and EUC-JP.
2. It is a known issue in Japan that each vendor provides a different conversion table for Shift_JIS <-> Unicode and EUC-JP <-> Unicode, so the same character can map to different Unicode code points. On the Windows platform (Shift_JIS), WAVE DASH is mapped to \uff5e. On the UNIX platform (EUC-JP), WAVE DASH is mapped to \u301c. This is why the problem happens.
3. Many vendors who sell software products in Japan handle this intelligently to prevent the problem (they maintain a special mapping table).
4. From the end user's point of view, it does not matter how these characters are encoded. They just want to see the WAVE DASH symbol.
5. If an end user views the URL Scraper channel's content directly in a browser, the browser displays it correctly, but when it goes through our portal, WAVE DASH shows up as a '?' mark.
So it is a bug, and we need to resolve it.

The fundamental problem here is that Microsoft uses its own mappings between JIS X 0208 and Unicode for its products rather than following the national standard.

JIS X 0208:1997 clearly defines that the code point 0x2141 (GL) (which is 0x8160 (SJIS)) is WAVE DASH. Note that JIS X 0208:1997 does define SJIS. In Unicode, U+301C is WAVE DASH and U+FF5E is FULLWIDTH TILDE. In J2SE, the JIS X 0208 and SJIS converters strictly follow the JIS X 0208:1997 standard. On the other hand, the MS932 converters support bug-for-bug compatible Microsoft Codepage 932 conversions. If you mix the two conversions, you get the well-known problem described above. See also 4426415.
###@###.### 2002-12-03
03-12-2002
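
The mismatch described in the evaluation can be reproduced with a short Java sketch. It derives the Shift_JIS and EUC-JP byte sequences for JIS X 0208 code point 0x2141 using the standard byte arithmetic, then decodes the Shift_JIS bytes with both the strict and the Microsoft-compatible converter. The class and method names are hypothetical, and the sketch assumes a JDK that ships the extended charsets `Shift_JIS`, `windows-31j` (the canonical name for MS932), and `EUC-JP`.

```java
import java.nio.charset.Charset;

// Illustration only; class and method names are hypothetical.
final class WaveDashMappings {
    /** Converts a JIS X 0208 code (e.g. 0x2141) to its two Shift_JIS bytes. */
    static byte[] jisToSjis(int jis) {
        int hi = (jis >> 8) & 0xFF; // row byte, 0x21..0x7E
        int lo = jis & 0xFF;        // cell byte, 0x21..0x7E
        // Two JIS rows share one Shift_JIS lead byte: odd rows use the
        // lower trail-byte range (skipping 0x7F), even rows the upper one.
        if ((hi & 1) == 1) {
            lo += (lo < 0x60) ? 0x1F : 0x20;
        } else {
            lo += 0x7E;
        }
        hi = (hi + 1) / 2 + (hi < 0x5F ? 0x70 : 0xB0);
        return new byte[] {(byte) hi, (byte) lo};
    }

    /** Converts a JIS X 0208 code to EUC-JP by setting the high bit of each byte. */
    static byte[] jisToEuc(int jis) {
        return new byte[] {(byte) ((jis >> 8) | 0x80), (byte) ((jis & 0xFF) | 0x80)};
    }

    /** Decodes the bytes with the named charset and formats the first char as U+XXXX. */
    static String decodeFirst(byte[] bytes, String charsetName) {
        String s = new String(bytes, Charset.forName(charsetName));
        return String.format("U+%04X", (int) s.charAt(0));
    }

    public static void main(String[] args) {
        byte[] sjis = jisToSjis(0x2141);
        byte[] euc = jisToEuc(0x2141);
        System.out.println("Shift_JIS:   " + decodeFirst(sjis, "Shift_JIS"));
        System.out.println("windows-31j: " + decodeFirst(sjis, "windows-31j"));
        System.out.println("EUC-JP:      " + decodeFirst(euc, "EUC-JP"));
    }
}
```

If the converters behave as the evaluation describes, the `Shift_JIS` and `EUC-JP` lines should report U+301C and the `windows-31j` line U+FF5E. Text decoded with one converter and re-encoded with the other then has no target mapping, which is where the '?' comes from.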