JDK-6389107 : (ref) Request Reference.link(Object,Object) for "biweak" caches
  • Type: Enhancement
  • Status: Open
  • Resolution: Unresolved
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P3
  • Affected Version: 5.0
  • OS: linux
  • CPU: x86
  • Submit Date: 2006-02-22
  • Updated Date: 2017-01-10
Related Reports
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
First off - this RFE will probably be closed as "won't fix", but I figure it's worth a shot.

In various caching algorithms it is desirable to have a value which refers to the key. The clearest example is BeanInfo. A BeanInfo object has to keep a strong reference to the Class which it introspects, since it produces Method objects and so on. How do you then implement Introspector.getBeanInfo(Class)? You have some choices, none of which are satisfactory:

1. Compute a fresh BeanInfo every time it is called. Can be expensive if it is called a lot. You want to cache it.

2. Keep a WeakHashMap<Class,BeanInfo> cache. This is unacceptable since the Class object will never be collected (since the value refers to the key), and so dynamic class loaders will never be collected, JARs will never be unlocked, etc.

3. Keep a WeakHashMap<Class,WeakReference<BeanInfo>>. This solves the problem in #2 but is not much of an improvement over #1 since the WeakReference<BeanInfo> will in practice be collected very quickly and have to be recomputed.

4. Keep a WeakHashMap<Class,SoftReference<BeanInfo>>. This avoids frequent recomputation (does not prevent it altogether), and it does permit the Class to be collected; but it will keep the Class (and its ClassLoader, etc.) in memory much longer than it is actually needed, which is undesirable.

5. Use #2 but introduce an explicit clearCache() method. This is undesirable as it forces every application making a custom class loader to know about this cache, even if its usage is buried in some otherwise self-contained library. And it forces the application to explicitly manage memory by deciding when the ClassLoader is no longer in use, which is not always possible.

6. Use #3 but have the cache use as a key a subclass of the nominal key which includes a strong reference to the cache value. Works fine if you can in fact subclass the key's class and there is a reliable way of finding the cache key from an externally visible key. Does not work for java.lang.Class.

7. Make the value hold only a weak (or soft) reference to the key. If some method is called on it which would require the original key, but the key has already been collected, throw an IllegalStateException or similar and ask clients to deal with that.

Here are some instances of this problem in the real world:

1. JDK #5102804 regarding BeanInfo (as above), with an impact on the NetBeans IDE (http://www.netbeans.org/nonav/issues/show_bug.cgi?id=46532) and only worked around in a crude way by #5.

2. Apache Ant uses a similar per-Class cache:


which suffers from the same problem, worked around using technique #3.

3. Tomcat apparently has a similar problem:


Not known to be worked around yet.

4. In the NetBeans IDE, we had a cache which was leaking memory.


Worked around using technique #7.
JSR 292 turns up another interesting scenario. It describes a MethodHandle, which must hold strong refs to various Class objects (return and parameter types). The JSR requests that MethodHandle's be interned, i.e. the factory return the same object when given the same Class arguments.

That would seem to be impossible without an RFE like this one. In fact just

  void link(Object from, Object to)

would not seem to suffice: there is no single Class which is "distinguished" in general, since the classes might have different loaders, and not all of these loaders need be in a straight chain of ancestry.

(If they were, then you could associate the handle with a class from the most derived loader: classes in base loaders could not be collected before this one anyway. For example, in 'String myMethod(MyObject x)', you would want to tie the MethodType to the loader of MyObject.class, not String.class. The problem can be seen with a method 'void m(A a, B b)' where A.class and B.class come from sibling loaders; linking the MethodType to, say, A.class would mean that B.class and thus its loader could not be collected so long as A.class and its loader existed, creating a potentially huge memory leak.)

A more powerful primitive API would be

  void link(Object to, Object... from)

with the semantics that 'to' is to be considered strongly held so long as all of the 'from' objects are strongly held, but collectible as soon as any of them becomes collectible.

In the absence of such a facility, 292 will likely resort to a slightly compromised implementation of its spec: hold MethodType's using soft references. One of these references may be cleared during a GC cycle even when all the associated Class objects are strongly held, which is undesirable but not likely to be a significant performance problem under the assumption that active code in the JVM is holding local references to most of the MethodType's. (The impl can however assure that no two MethodType's with the same signature are live at a given time.)

SUGGESTED FIX At least one existing language has direct support for this use case: http://docs.racket-lang.org/reference/ephemerons.html

SUGGESTED FIX A much more conservative fix for many, though not all, use cases would be to permit certain very common key classes of which there are typically few instances in a VM to have attached foreign fields, with an API at its simplest like JComponent.putClientProperty. Clearly Class and ClassLoader are prime key candidates. Thread already has ThreadLocal, which might be a cleaner API pattern than putClientProperty, as it guarantees by construction that only the maintainer of a cache can access its values. The performance penalty when the feature is unused can be kept to one null object reference field per key instance; no change to the GC implementation would be needed. Clearly this tactic is unworkable for key classes with numerous small instances, as you would be adding noticeable bloat to the heap. For the special case of java.lang.Class as a key, adding a foreign field only to ClassLoader might suffice, since a class and its loader refer to each other strongly anyway. Then for example the equivalent of the originally motivating BiweakHashMap<Class,BeanInfo> could be implemented easily as a ClassLoaderLocal<HashMap<Class,BeanInfo>>: either the whole ClassLoader is collected together with the HashMap and all its keys and values, or all stay in memory. This trick would greatly reduce the performance penalty since you then need add only one field to ClassLoader, of which there will typically be far fewer instances than Class. Such a system would not be helpful for unsupported key classes; nor for the case that you need multiple origins for the reference, e.g. a cross-product as a cache key as in the MethodHandle example. It would also be possible to create a generic API that would permit any cooperative key class to offer to store foreign fields in its instances: public interface ObjectLocalCapable { void initLocal(Object referent); Object retrieveLocal(); } public class ObjectLocal<K extends LocalCapable, V> { protected V initialValue(); public ObjectLocal(); public V get(K key); public void set(K key, V value); public void remove(K key); } The Object arguments mentioned in ObjectCapable would be some private inner class of ObjectLocal, comparable to ThreadLocalMap, and thus opaque. The implementation would be trivial: public ClassLoader implements ObjectLocalCapable { // ... private Object local; public void initLocal(Object referent) { if (local != null) throw new IllegalStateException(); local = referent; } public Object retrieveLocal() { return local; } }

SUGGESTED FIX How can the problem be solved? I don't believe it is possible to solve it using stock JVMs; I have tried to do it using ReferenceQueue tricks but without success. The minimal primitive VM feature would be a single static method, e.g. in java.lang.ref.Reference: public static native void link(Object from, Object to); with the contract that after calling this method, 'from' is considered to strongly refer to 'to' for purposes of garbage collection. Here is the unit test for its behavior: public void testLinkControl() { Object a = new Object(); Object b = new Object[] {a}; Reference ra = new WeakReference(a); Reference rb = new WeakReference(b); assertFalse(collectible(ra)); assertFalse(collectible(rb)); b = null; assertFalse(collectible(ra)); assertTrue(collectible(rb)); a = null; assertTrue(collectible(ra)); assertTrue(collectible(rb)); } public void testLink() { Object a = new Object(); Object b = new Object[] {a}; Reference.link(a, b); Reference ra = new WeakReference(a); Reference rb = new WeakReference(b); assertFalse(collectible(ra)); assertFalse(collectible(rb)); b = null; assertFalse(collectible(ra)); assertFalse("a still links to b", collectible(rb)); a = null; assertTrue("both are collectible #1", collectible(ra)); assertTrue("both are collectible #2", collectible(rb)); } private static boolean collectible(Reference ref) { // adapted from org.netbeans.junit.NbTestCase: List alloc = new ArrayList(); int size = 100000; for (int i = 0; i < 50; i++) { if (ref.get() == null) { return true; } System.gc(); System.runFinalization(); try { alloc.add(new byte[size]); size = (int) (((double) size) * 1.3); } catch (OutOfMemoryError error) { size = size / 2; } try { if (i % 3 == 0) { Thread.sleep(321); } } catch (InterruptedException t) { // ignore } } return false; } With such a primitive you could implement e.g. a BiweakHashMap<K,V> which would act like WeakHashMap except that the value is permitted to strongly refer to the key without causing a memory leak. (I.e. an entry will remain in the map so long as either the key or the value is referenced, but no longer.) An alternate primitive, which might admit to an efficient implementation (I don't understand enough about garbage collectors to know for sure) would be as follows: public class LazarusReference<T> extends Reference<T> { public LazarusReference(T referent, ReferenceQueue<? super T> q) { super(referent, q); } public native void dispose(); } In contrast to WeakReference, code finding this reference from a ReferenceQueue (when there are no outstanding references to the referent) would still see a non-null value from get(); the referent would really be discarded only if dispose() were called. If dispose() is not called the reference is simply left alone until the next collection cycle and the referent lives on. I think you could then implement link(...) using this kind of reference, but of course I cannot try it. Something like public static void link(Object from, Object to) { bindings.add(new Binding(from, to)); } private static final Collection bindings = Collections.synchronizedList(new LinkedList()); private static final class Binding extends LazarusReference { final Reference from; Binding(Object from, Object to) { super(to, QUEUE); this.from = new WeakReference(from); } } private static final ReferenceQueue QUEUE = new ReferenceQueue(); private static final class Cleaner implements Runnable { public void run() { try { while (true) { Binding b = (Binding) QUEUE.remove(); if (b == null) { continue; } if (b.from.get() == null) { bindings.remove(b); b.dispose(); } } } catch (InterruptedException x) { x.printStackTrace(); } } } static { Thread t = new Thread(new Cleaner()); t.setDaemon(true); t.start(); }