JDK-4213176 : A Proposal for User-Defined Lightweight Value-Semantics Objects
  • Type: Enhancement
  • Component: tools
  • Sub-Component: javac
  • Affected Version: 1.2.0
  • Priority: P5
  • Status: Closed
  • Resolution: Duplicate
  • OS: generic
  • CPU: generic
  • Submitted: 1999-02-19
  • Updated: 1999-02-26
  • Resolved: 1999-02-26
Related Reports
Duplicate :  
Description
=================================


Name: sg39081			Date: 02/19/99


A Proposal for User-Defined Lightweight Value-Semantics Objects

Introduction:
-------------

This is a proposal to allow a Java programmer to define new
Lightweight Value-Semantics Objects [LVSOs] to the Java language.
"Lightweight" means that manipulating the objects is approximately as
efficient as manipulating primitive objects.  "Value-Semantics" means
that the "objects" have value semantics as opposed to the reference
semantics of ordinary Java objects.  "Objects" means that the
declarations whose addition I propose be added adding to the language
are similar to class declarations, and you get all of the class
options that are reasonable given value semantics.  

Please note that I propose to allow the user to declare new classes of
LVSOs, not to create the notion of LVSOs ab initio.  Java already has
LVSOs; int, boolean, etc.  We should not debate whether we need LVSOs,
only whether we should let the user add new LVSO classes.

I will develop the proposal in three stages: 

   the basic stage, in which you essentially get to define your
   primitive types 

   the subtypable primitives stage, in which we explore the
   implications of inheritance for LVSOs [whose method calls would not
   be polymorphic, since there are no pointers in Java and no
   references to LVSOs].

   the overloaded operators stage.  I won't devote too much attention
   to this because the Java developers do not plan to introduce
   operator overloading in the near future.  See the resolution of bug
   #4087427.

Some of the ideas for type safety language features come from Ada.

Motivation:
-----------

Consider the Apples and Oranges classes of Appendix A.  These classes
have simple behaviors and few instance variables; the motivation for
declaring the classes is type safety; the compiler won't let you
compare Apples and Oranges.  In this case the programmer's intent was
to create a new primitive type with int's semantics but with type
safety.

Consider the Money class and its subclasses of Appendix B.  In this
case we have a class that implements a non-integral fixed point
representation, and two subclasses that are, again, provided for type
safety.

Lastly, consider the Complex class of Appendix C.  The Complex class
is a canonical example of a type that is not a primitive type in many
languages, but which you would like to be implemented efficiently.
The Complex class differs from the other two in the fact that the
instance variable set is more than a single instance of one primitive
type.

These classes could be treated similarly to complicated classes for
which the overhead of object creation is amortized over a long object
lifetime, but there are several reasons why the programmer might
prefer a different treatment for these tiny objects:

  1: The objects are small.  They don't represent the fruits of much
     computation, making the high cost of object creation a problem. 

  2: You probably want value semantics rather than reference
     semantics.  In particular, equals() is likely to do the right
     thing in more cases than == 

Because all objects are manipulated with references in Java, and
storage for non-primitive objects are managed by the garbage collector
and never stack allocated, we must allocate and free objects to get
type safety or to get the effects of structures.  You would like the
compiler to prevent you from comparing Apples and Oranges, but it cost
980 units of time[1] to create an Apple or an Orange, where a simple
assignment costs one time unit.  In the real world people will be
inclined to declare integer variables for discrete units such as
Apples and Oranges, and float variables for physical units such as
meters or seconds, and from time to time Java programs will have bugs
that trace to type mistakes.  In the case of complex numbers and other
small aggregates of primitive types with simple methods, programmers
will have three unappetizing choices:

  1: Create the objects and pay the performance penalty

  2: Create the objects and reuse them, managing your own storage and
     having the same dangling reference problem you get in languages
     like C++ [although there would be no memory leaks]

  3: Maintain clusters of local variables; in this example you might
     have double ac_signal_real; double ac_signal_imag; where you
     would like to have written Complex ac_signal; this is what you
     would like the compiler to do for you.

One problem with simply creating the objects is that we are then faced
with a second choice.  We can make the small objects immutable, so for
example the sole Complex addition method looks like this

       Complex add(Complex addend)
       {
          return new Complex(real + addend.real, imag + addend.imag);
       }

but efficiency considerations would impel us to at least provide the
option of modifying an existing Complex and including the method

       Complex add_d(Complex addend)
       {
          real = real + addend.real;
          imag = imag + addend.imag;
          return this;
       }

and the programmer must always keep in mind whether a Complex
reference has more than one name at a given point in his program in
order to be able to use add_d and avoid the cost of an allocation
every time you add.

The Basic Proposal; extensions of Primitive
-------------------------------------------

When a programmer declares an int in Java [or, for that matter, in
almost any other programming language], he allocates a cell able to
contain a basic integer in whatever frame he's declaring in.  For
example, if he declares an int in a method he gets an int in the local
variables area of the stack frame of every invocation of that method.
If the programmer keeps modifying that int, the same cell is reused.

On the other hand, when a programmer declares an object, including a
boxed instance of a primitive type such as Integer, he allocates a
cell able to hold a reference to an instance.  The class designer has
to decide whether to make the object updatable, which might reduce the
creation of new objects but which will lead to unexpected behavior if
there are multiple pointers to the same object, or immutable, in which
case the new operator with its performance penalty needs to be
performed repeatedly.  The boxed primitive types in java.lang.* are
immutable.

I propose that the programmer be able to declare that a class extends
java.lang.Primitive instead of the usual java.lang.Object or one of
its subclasses.  If he does this, then the declared memory contains
the object itself instead of a reference to the object.  If a class is
derived from Primitive then there is no way to declare a reference to
instances of that class.

I further propose the following conventions:

  1: You may use any visible constructor to create a value of a
     Primitive extension.  You can't apply new to the class name,
     since Primitive extensions are not allocated on the heap.

  2: If you declare a variable of a Primitive extension but provide no
     initial value the instance variables are built with the default
     constructor.  There must be one, or an uninitialized declaration
     is forbidden.  However, a consequence of rule 1 is that if you
     elect to initialize your variable you could write something like

        Complex unit_real_vector = Complex(1.0, 0.0);

     Of course the constructor must exist and must be visible per its
     access modifier.

  3: The equals method inherited from Primitive recursively applies
     equal to corresponding instance variables.  This can, of course,
     be overridden but there is no polymorphism because all value
     types can be determined at compile time.

     A == B is not allowed if A and B are different Primitive subtypes
     of if one operand is a Primitive subtype and the other isn't, and
     is determined by applying A.equals(B) if they are of the same
     Primitive subtype.

  4: Primitive subtypes cannot be declared to implement an interface.

A Primitive subtype cannot contain itself as an instance variable.
There can be no sequence of Primitive subtypes S1, S2, ... Sn such
that S1 contains an S2, S2 contains an S3, ..., S(n-1) contains an Sn
and Sn contains an S1.

proposed implementation
-----------------------

Types come in three flavors: primitive, Primitive subtype, or object.
Every Primitive subtype reduces to a sequence of types of the form
<Array>^i<plain Java type> where i is a non-negative integer.  A
<plain Java type> is a primitive type or an Object.  The reduction is
as follows:

  1: a non-array Object or a primitive type reduces to itself

  2: a Primitive subtype reduces to the concatenation of the reductions
     of the types of its instance variables

  3: If Type reduces to [ <array>^i1<t1>, <array>^i2<t2>, ... ] then
     Type[] reduces to [ <array>^i1+1<t1>, <array>^i2+1<t2>, ... ]


This is guaranteed to terminate by noncircularity of the instance
variable type definitions and the fact that there is a finite number
of Primitive subtypes.  In practice I do not expect the reduction of a
primitive subtype to reduce to more than a handful of types.  We call
the result of this process the "decomposition" of the Primitive
subtype.

If a Primitive subtype P decomposes to a single type T, then the
compiler shall generate code that allocates a cell that can hold a
value of type T for each variable of type P.  The compiler shall
arrange for the methods to be called on implicit arguments of type T.
Byte code for static methods is the way to accomplish this since there
is no this for an instance method call.  Therefore, the "methods" of a
Primitive subtype compile as static methods.

If a Primitive subtype P decomposes to a sequence of types T1, T2,
... Tn, then the compiler shall generate code that allocates n cells,
one each to hold a value of each of the Ti types, for each variable of
type P.  The compiler shall arrange for the methods to be called on a
sequence of implicit arguments of types Ti.  For each of the explicit
arguments of a Primitive subtype several formal parameters, one for
each of that subtype's Ti's, shall be passed instead.  For example,

    double magnitude(Complex C){ .. real .. imag .. }

compiles into byte code appropriate for

    static double magnitude(double real, double imag){ .. real .. imag .. }

Do note that an array of Complex decomposes to two arrays, each of
double. 

Because the byte code fundamentally can't return multiple values [as
in the LISP style], return values of Primitive subtypes which
decompose to multiple types must be returned in some other manner.  I
propose hidden formal parameters to methods returning such a value.
The compiler shall generate one mutable boxed primitive type instance
for each primitive value returned as a Primitive subtype, except for
the first type T1 which can be returned as the return value.  [Owen,
help me word this?]  Although these boxed types must be allocated,
they only need to be allocated once per calling method invocation at
worst, and in some cases a method would be able to pass some of the
hidden boxed primitive types to its callees and therefore avoid extra
allocations.  Another possibility is to have instance variables on
Thread that maintain per-Thread lists of mutable boxed primitive type
instances for this purpose.  So

    Complex add(Complex addend){ ... }

compiles as if it were

    static double add(boxed_double imag_ret, double real, double imag,
                      double addend_real, double addend_imag)
      { ... }

There is one other small detail.  The == operator must generate a call to
equals() when the operands are compatible LVSOs.

Extending Primitive subtypes:
-----------------------------

I do not propose that type information be kept with LVSOs.  You may
therefore not assign a DollarsAndCents to a Money.  However,
inheritance is useful even without polymorphism.  I propose the Ada
convention that an operation on a supertype be extended to the subtype
by replacing the supertype by the subtype everywhere in the signature,
including the result type.  An operation will not be extended if any
part of the signature is already of the subtype, or if the result is
of the supertype and the subtype adds instance variables.

Overloaded Operators:
---------------------

It doesn't look like operator overloading will be added to Java any
time soon.  However, if it ever is added, say with C++ syntax, you
might want to be able to engage an operation on its right-hand
operand.  The C++ convention of declaring a variant of a method
applicable to a different syntax by adding a bogus int formal
parameter may serve us well here.  For example, scalar multiplication
of a Complex might look like this:

    Complex operation*(double scalar)
    {
       return Complex(real * scalar, imag * scalar);
    }
    Complex operation*(double scalar, int i)
    {
       return Complex(real * scalar, imag * scalar);
    }

...

    Complex C(1.0, 2.0)

       ... C * 3.0  // engages Complex operation*(double scalar)

       ... 3.0 * C  // engages Complex operation*(double scalar, int i)





			      Appendix A

public final class Apples
{
  private int number;

  public Apples(int i)
  {
    number = i;
  }

  public Apples plus(Apples addend)
  {
    return Apples(number + addend.number);
  }

  // you get this one for free; it's placed here for emphasis 
  public boolean equals(Apples comparand)
  {
    return number == comparand.number;
  }

  ...
}

public final class Oranges
{
  private int number;

  public Oranges(int i)
  {
    number = i;
  }

  public Oranges plus(Oranges addend)
  {
    return Oranges(number + addend.number);
  }

  // you get this one for free; it's placed here for emphasis 
  public boolean equals(Oranges comparand)
  {
    return number == comparand.number;
  }

  ...

}

			      Appendix B

public class Money
{
  private long cents;

  public Money(double amount)
  {
    cents = amount * 100 + 0.1;
  }

  private Money(long amount)
  {
    cents = amount;
  }

  public Money add(Money addend)
  {
    return Money(cents + addend.cents);
  }

  public boolean equals(Money comparand)
  {
    return cents = addend.cents;
  }

  ...
}

public final class DollarsAndCents extends Money
...

public final class Euros extends Money
...

			      Appendix C

public final class Complex
{
  private double real;
  private double imag;

  public Complex(double _real, double _imag)
  {
    real = _real; imag = _imag;
  }

  public Complex add(Complex addend)
  {
    return Complex(real + addend.real, imag + addend.imag);
  }

  public boolean equals(Complex comparand)
  {
    return real == comparand.real && imag = comparand.imag;
  }

  ...
}








notes

[1] Bruce Eckels, "Thinking in Java", Prentice Hall, 1998,
    Pp. 1060-1061

    These timings were performed on Sun's JDK1.1.4 on a Pentium Pro
    200MHz.
(Review ID: 48962)
=====================================

Comments
WORK AROUND Name: sg39081 Date: 02/19/99 Since this is a request for an enhancement, this section might not really be appropriate. The alternatives, however, are: 1: Create objects and pay the performance penalty 2: Create the objects and reuse them, managing your own storage and having the same dangling reference problem you get in languages like C++ [although there would be no memory leaks] 3: Maintain clusters of local variables; in this example you might have double ac_signal_real; double ac_signal_imag; where you would like to have written Complex ac_signal; this is what you would like the compiler to do for you. ======================================================================
11-06-2004

EVALUATION This is a duplicate bug report. iris.garcia@eng 1999-02-26
26-02-1999