A DESCRIPTION OF THE REQUEST :
The ICU (International Components for Unicode) libraries include a CharsetDetector class that can scan a byte stream and detect the encoding of character data in an unknown format. It produces a CharsetMatch that holds the name of the detected charset and a confidence level for the match. It can also guess the language of the text. This would be a very useful addition to the core Java libraries.
JUSTIFICATION :
Charset detection is useful when reading text documents whose character set is unknown. It would allow such documents to be decoded correctly with a reasonable level of confidence. Language detection would then allow the decoded text to be processed in a language-sensitive manner.
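
To illustrate the gap, here is a minimal sketch of what an application can do today with only the core libraries. It is not ICU's algorithm: it only checks for a byte order mark and then attempts a strict UTF-8 decode, falling back to ISO-8859-1. The class name CharsetGuessSketch, the Guess type, and the confidence scores are hypothetical and arbitrary, chosen only to mirror the "name plus confidence" shape described above. UTF-32 and all legacy multi-byte encodings are ignored.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CharsetGuessSketch {

    /** Hypothetical result type: a charset name plus an arbitrary 0-100 score. */
    static final class Guess {
        final String name;
        final int confidence;

        Guess(String name, int confidence) {
            this.name = name;
            this.confidence = confidence;
        }
    }

    /** True if data begins with the given byte values (each 0-255). */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    static Guess guess(byte[] data) {
        // 1. A byte order mark is the strongest signal available.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return new Guess("UTF-8", 100);
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return new Guess("UTF-16BE", 100);
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return new Guess("UTF-16LE", 100);
        }

        // 2. Try a strict UTF-8 decode that reports errors instead of replacing them.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
        } catch (CharacterCodingException e) {
            // 3. Not valid UTF-8: fall back to a single-byte charset, low score.
            return new Guess("ISO-8859-1", 20);
        }

        // Valid multi-byte UTF-8 is unlikely to occur by accident; pure ASCII
        // is valid in many encodings, so it earns a much lower score.
        boolean hasNonAscii = false;
        for (byte b : data) {
            if (b < 0) {
                hasNonAscii = true;
                break;
            }
        }
        return new Guess("UTF-8", hasNonAscii ? 80 : 10);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded as UTF-8
        Guess g = guess(sample);
        System.out.println(g.name + " (confidence " + g.confidence + ")");
    }
}
```

A heuristic like this cannot tell one single-byte encoding from another and says nothing about language, which is why a statistical detector in the core libraries would be valuable.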
CUSTOMER SUBMITTED WORKAROUND :
Bundle the ICU libraries with the application.