Bug ID: JDK-4614120 UTF-8 vmspec not verified by java -Xfuture

Details
Type: Bug
Submit Date: 2001-12-14
Status: Closed
Updated Date: 2012-10-08
Project Name: JDK
Resolved Date: 2002-10-26
Component: hotspot
OS: generic
Sub-Component: runtime
CPU: generic
Priority: P4
Resolution: Fixed
Affected Versions: 1.4.0, 1.4.2
Fixed Versions: 1.4.2 (mantis)

Description
Name: gm110360			Date: 12/14/2001


java version "1.4.0-beta3"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-beta3-b84)
Java HotSpot(TM) Client VM (build 1.4.0-beta3-b84, mixed mode)

> http://java.sun.com/products/jdk/1.2/compatibility.html

> Runtime Incompatibilities in Version 1.2

> In JDK 1.2 software
> the -Xfuture option enables the strictest possible
> class-file format checks ...

If only this were true.  Please try the demo below, see that something's
broken, tell me what it is, and fix it.

> reject... illegal UTF-8 strings

As I read "4.4.7 The CONSTANT_Utf8_info Structure", vmspec UTF-8 is strictly
shortest-form UTF-8, with one exception: u0000, if present, always appears as
x C0 80.  I hope you agree?

By that definition, the `java -Xfuture` verification of .class file format
rejects a lot less than all forms of "illegal UTF-8 strings".

1) The verification never complains of not-shortest-form UTF-8.  (Though it
does complain of the too-short-form x 00.)

2) The verification accepts truncated and ill-formed UTF-8 in string values,
attribute names, and unused entries.
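
A minimal sketch of the strict check that points 1) and 2) ask for, assuming
the one-, two-, and three-byte layouts of vmspec 4.4.7 with u0000 written as
x C0 80.  The class name StrictModifiedUtf8 and the method isStrict are made up
for illustration; this is not the HotSpot format checker.

```java
final class StrictModifiedUtf8 {
    /** Returns true only if every char is encoded in its single legal form. */
    static boolean isStrict(byte[] b) {
        int i = 0;
        while (i < b.length) {
            int b0 = b[i] & 0xFF;
            // x 00 and x F0..FF may never appear in a CONSTANT_Utf8_info body.
            if (b0 == 0x00 || b0 >= 0xF0) return false;
            if (b0 < 0x80) {                       // one-byte form: u0001..u007F
                i += 1;
            } else if ((b0 & 0xE0) == 0xC0) {      // two-byte form: 110x:xxxx
                if (i + 1 >= b.length) return false;            // truncated
                int b1 = b[i + 1] & 0xFF;
                if ((b1 & 0xC0) != 0x80) return false;          // ill-formed
                int ch = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
                // Not shortest form, except that u0000 must be x C0 80.
                if (ch != 0 && ch < 0x80) return false;
                i += 2;
            } else if ((b0 & 0xF0) == 0xE0) {      // three-byte form: 1110:xxxx
                if (i + 2 >= b.length) return false;            // truncated
                int b1 = b[i + 1] & 0xFF;
                int b2 = b[i + 2] & 0xFF;
                if ((b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80) return false;
                int ch = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
                if (ch < 0x800) return false;                   // not shortest form
                i += 3;
            } else {
                return false;                      // stray 10xx:xxxx byte
            }
        }
        return true;
    }
}
```

Under these assumptions the check rejects x 74 68 65 49 6E E0 (truncated),
x D0 01 (ill-formed), and x E0 90 81 (not shortest form), and accepts x D0 81
and x C0 80.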

We care because, by design, vmspec UTF-8 defines precisely zero or one ways to
represent any sequence of chars.  Once a verifier accepts more than one
sequence of bytes as equal to a given sequence of chars, we raise unanswerable
questions.  Does one method override another?  Is a field present?  Is a
constant initialiser present?

Now for the promised quick, rough demo of some of this.  Try editing the binary
A.class after compiling this source:

        class A {
            // Compile-time constant, so this field carries a ConstantValue attribute.
            final static int theInt = 0x9ABCDEF0;
            // Same text as that attribute's name.
            String theString = "ConstantValue";
        }

        class B {
            public static void main(String[] strings) {
                System.out.println(Character.isJavaIdentifierPart('\u00E0'));

                // Print each char of A.theString in hex, as the VM decoded it.
                A a = new A();
                String st = a.theString;
                for (int index = 0; index < st.length(); ++index) {
                    char ch = st.charAt(index);
                    System.out.println("x" + Integer.toHexString(ch).toUpperCase());
                }
            }
        }

In the binary A.class, confirm you see only one CONSTANT_Utf8_info entry that
equals "theInt":

        01 00:06 74 68 65 49 6E 74 // theInt

See also that `java -Xfuture B` accepts the A.class binary.

Now change the A.class binary.  Change the trailing x74 to an xE0.  See that
`java -Xfuture B` explodes, complaining of an "Illegal Field name".  So far so
good.

Now restore the original A.class binary (most simply, recompile it).  Go find
the one entry of:

        01 00:0D 43 6F 6E 73 74 61 6E 74 56 61 6C 75 65 // ConstantValue

Change the trailing x65 to an xE0.  See that `java -Xfuture B` is happy.
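
For anyone without a hex editor handy, the byte edits above can be scripted.
What follows is a rough sketch only: Utf8EntryPatcher and its methods are
hypothetical names, it assumes the text is plain ASCII, and it does a raw byte
scan for the 01 <length> <bytes> pattern rather than parsing the constant pool,
so it could in principle match bytes that are not a CONSTANT_Utf8_info entry.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class Utf8EntryPatcher {
    /** Offset of the first "01 len-hi len-lo text" pattern, or -1 if absent. */
    static int findUtf8Entry(byte[] classBytes, String text) {
        byte[] body = text.getBytes(StandardCharsets.US_ASCII);
        byte[] pattern = new byte[body.length + 3];
        pattern[0] = 0x01;                          // CONSTANT_Utf8 tag
        pattern[1] = (byte) (body.length >> 8);     // u2 length, high byte
        pattern[2] = (byte) body.length;            // u2 length, low byte
        System.arraycopy(body, 0, pattern, 3, body.length);
        outer:
        for (int i = 0; i + pattern.length <= classBytes.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (classBytes[i + j] != pattern[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    /** Returns a copy with the last byte of the matched entry replaced. */
    static byte[] patchTrailingByte(byte[] classBytes, String text, int newByte) {
        if (text.isEmpty()) throw new IllegalArgumentException("empty text");
        int at = findUtf8Entry(classBytes, text);
        if (at < 0) throw new IllegalArgumentException("entry not found: " + text);
        byte[] out = classBytes.clone();
        out[at + 3 + text.length() - 1] = (byte) newByte;
        return out;
    }

    // Arguments: class file, entry text, replacement byte in hex.
    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        byte[] patched = patchTrailingByte(
                Files.readAllBytes(file), args[1], Integer.parseInt(args[2], 16));
        Files.write(file, patched);
    }
}
```

With this sketch, `java Utf8EntryPatcher A.class ConstantValue E0` would make
the edit just described, and `java Utf8EntryPatcher A.class theInt E0` the
earlier one.  It overwrites A.class in place, so recompile to restore it.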

Conclude that string values and attribute names may contain truncated Utf.

Repeat, if you like, changing two trailing bytes, to see constant pool Utf may
contain ill-formed Utf, such as x D0 01 (b10xx:xxxx does not follow b110x:xxxx).

Repeat, if you like, changing three trailing bytes, to see constant pool Utf
may contain not-shortest-form Utf, such as x E0 90 81.  So may field names, etc.
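
To see why x E0 90 81 counts as not-shortest-form, it helps to work the bits.
A small sketch (OverlongDemo and its method names are illustrative only)
decodes the three-byte form by hand and compares it with the encoding the
standard library produces for the same char:

```java
import java.nio.charset.StandardCharsets;

final class OverlongDemo {
    /** Decodes 1110:xxxx 10xx:xxxx 10xx:xxxx without any validity checks. */
    static int decodeThreeByte(int b0, int b1, int b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    /** Shortest-form UTF-8 for one char, as produced by the standard encoder. */
    static byte[] shortestForm(char ch) {
        return String.valueOf(ch).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int ch = decodeThreeByte(0xE0, 0x90, 0x81);
        System.out.println("u" + Integer.toHexString(ch).toUpperCase()); // u401
        for (byte b : shortestForm((char) ch)) {
            System.out.println("x" + Integer.toHexString(b & 0xFF).toUpperCase());
        }
        // prints xD0 then x81: two bytes suffice, so the three-byte form is
        // not the shortest one.
    }
}
```

The payload bits of x E0 90 81 are 0000, 01:0000 and 00:0001, i.e. u0401,
which falls in the u0080..u07FF range that the two-byte form covers.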

Please tell me what's broken and fix it - or unconfuse me!

Thanks in advance.    Pat LaVarre

> http://developer.java.sun.com/developer/bugParade/
> +-Xfuture +utf
> 4 Results Found, Sorted by [lack of] Relevance
(Review ID: 136117) 
======================================================================

                                    

Comments
EVALUATION

The fix for bug 4169783 partially fixes this bug. In future releases the format checker will verify that UTF-8 strings are in shortest form.

2002-03-19
WORK AROUND

Name: gm110360			Date: 12/14/2001


Workaround?  Ouch.  No easy answer here.

I guess people can run a separate verifier to reject stuff we think we'll
almost never find, like:

        truncated or ill-formed UTF-8
        unused entries containing illegal UTF-8 of any kind

But as for not-shortest-form UTF-8, neglecting to fix that in jdk1.2 has left
us facing a slowly growing horror.

Per review ID 136105, we know that javac as recent as jdk1.3 commonly produces
not-shortest-form UTF-8 for chars u0400..u07FF, i.e. Cyrillic, Armenian,
Hebrew, Arabic, Syriac, and Thaana.  Not just in strings: in identifiers too.

We know that only now, with the jdk1.4 of late 2001, has javac decided by
default to break compatibility with the jdk1.0.2 JVMs of Win95 IE 3.

We can conclude we're going to be living with not-shortest-form UTF-8 for a
long, long time.

Somebody Who Matters has gotta decide.  Is it better to have the vmspec be
stable, short, and unreal ... or do we answer the unanswerable?

When is it ok for two .class file readers - a jvm, a javac, java.lang.reflect,
whatever - to disagree about what a .class file means?  When must each see
bytes as bytes?  When must each see bytes as chars?

If we say a jvm has to see bytes as bytes, don't we have to let
java.lang.reflect see bytes as bytes?

If we say a jvm has to see bytes as chars, don't we implicitly require any jvm
to convert every UTF-8 string it sees?

All wrong answers.

I can't tell you how pleased I'd be to be told I'm all mixed up here.
======================================================================
                                     
2004-06-11
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX: mantis
FIXED IN: mantis
INTEGRATED IN: mantis, mantis-b05


                                     
2004-06-14


