JDK-6589705 : JDK should provide support for charset detection.
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.nio.charsets
  • Affected Version: 6
  • Priority: P5
  • Status: Resolved
  • Resolution: Won't Fix
  • OS: windows_vista
  • CPU: x86
  • Submitted: 2007-08-06
  • Updated: 2017-08-18
  • Resolved: 2017-08-18
Description
A DESCRIPTION OF THE REQUEST :
The ICU Unicode utilities include a CharsetDetector class that is able to scan a byte stream and detect the character encoding of character data in an unknown format. It produces a CharsetMatch that contains the name of the detected charset and indicates the level of confidence of the match. It can also guess the language of the text. This would be a very useful addition to the core Java libraries.

JUSTIFICATION :
Charset detection is useful when reading in text documents where the character set is unknown. It would allow the documents to be decoded correctly with a reasonable level of confidence. The language detection allows the text to be processed correctly in a language-sensitive manner.


CUSTOMER SUBMITTED WORKAROUND :
Bundle the ICU libraries with the application.
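
Where bundling ICU is not an option, a much cruder check is possible with the JDK alone. The sketch below is not ICU's statistical detection and is not part of this request; it is a swapped-in technique that tries a caller-supplied list of candidate charsets in order and returns the first one that decodes the bytes strictly, with no malformed or unmappable input. The class name StrictDecodeProbe and the method firstStrictMatch are hypothetical names chosen for illustration. It reports no confidence level and no language, and single-byte charsets such as ISO-8859-1 accept any byte sequence, so they only make sense as the last candidate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only: a strict-decoding probe,
// not a statistical detector like ICU's CharsetDetector.
public class StrictDecodeProbe {

    /**
     * Returns the first candidate charset that decodes the bytes without
     * any malformed or unmappable input, or empty if none does.
     */
    public static Optional<Charset> firstStrictMatch(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            // REPORT makes the decoder throw instead of silently
            // substituting U+FFFD for bytes it cannot decode.
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                return Optional.of(cs);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Order matters: strictest charsets first, catch-all charsets last.
        List<Charset> candidates = Arrays.asList(
                StandardCharsets.US_ASCII,
                StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1);

        // "caf\u00e9" encoded as ISO-8859-1 ends in the lone byte 0xE9,
        // which is neither ASCII nor a complete UTF-8 sequence.
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(firstStrictMatch(latin1Bytes, candidates));
    }
}
```

A probe like this can only rule charsets out; it cannot rank the ones that remain, which is the part the evaluation below calls difficult to implement correctly.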

Comments
EVALUATION A "useful but difficult to implement correctly" feature. It might be worth considering adding this to the core libraries as a utility class, should requests for the same feature keep coming in.
18-12-2007