United StatesChange Country, Oracle Worldwide Web Sites Communities I am a... I want to...
Bug ID: JDK-6920977 G1: guarantee(k == probe->klass(),"klass should be in dictionary") fails
JDK-6920977 : G1: guarantee(k == probe->klass(),"klass should be in dictionary") fails

Details
Type:
Bug
Submit Date:
2010-01-28
Status:
Closed
Updated Date:
2012-10-08
Project Name:
JDK
Resolved Date:
2010-03-02
Component:
hotspot
OS:
generic
Sub-Component:
runtime
CPU:
generic
Priority:
P3
Resolution:
Fixed
Affected Versions:
hs17
Fixed Versions:
hs17 (b10)

Related Reports
Backport:
Backport:

Sub Tasks

Description
While testing G1 I hit the following guarantee failure:

Internal Error at loaderConstraints.cpp:500, pid=21053, tid=15 
Error: guarantee(k == probe->klass(),"klass should be in dictionary")

I can reproduce it reasonably easily (within maybe an hour) on my OpenSolaris workstation (intel quadcore) with a solaris / 32-bit / fastdebug build.

                                    

Comments
EVALUATION

I've been looking at this with help from Karen (thanks!) and I've got it to fail with a lot of extra instrumentation. This is what we know so far.

The guarantee fails while the loader constraints table is being verified. Verification is initiated from Universe::verify(), which is called because +VerifyBeforeGC was set. So, the issue is found at the beginning of a G1 pause.

In all the failures I've seen, the class in question always seems to be  'java/lang/SecurityManager'. The reason the guarantee fails is that when we do a lookup in the system dictionary to find the class that's in the probe, we don't find it and we get back NULL:

  k              = 0x00000000

whereas what we're looking for clearly is not:

  probe->klass() = 0xf4f93b00
  probe->name()  = 0xf4a0fad8
                 = java/lang/SecurityManager
  probe->loaders()
                   sun/misc/Launcher$AppClassLoader
                   <bootloader>

The class loader of ik (ik = instanceKlass::cast(probe->klass())) seems to be the boot loader:

  ik->class_loader()   = <bootloader>

The first two times I saw this it happend at the beginning of the first pause after a cleanup pause, so I naturally thought they would be related. However, in the last two failures, it happens two pauses after a cleanup pause. So, maybe, thinking that the cleanup pause is somehow to blame might be a red herring and it could simply be due to timing. However, I had added instrumentation to dump the contents of the system dictionary before and after the cleanup pause and enabled verification during GC to also verify that cleanup didn't mess anything up. There are a few interesting observations that we can make by looking at the contents of the system dictionary at that point:

- 'java/lang/Security' does not seem to be in the system dictionary
- 'java/lang/Security' does appear in the loader constraints table:

   3: Symbol: 'java/lang/SecurityManager' , loaders:a 'sun/misc/Launcher$AppClassLoader', NULL, 

- verification passes

I print the contents of the system dictionary when we reach the point of the failure too. The scenario is as above, with a minor difference:

- 'java/land/Security' does not seem to be in the system dictionary
- 'java/lang/Security' does appear in the loader constraints table:

   3: Symbol: 'java/lang/SecurityManager' , loaders:a 'sun/misc/Launcher$AppClassLoader', NULL, 

- 'java/lang/SecurityManager' also now appears in the placeholders table:

 place holder 'java/lang/SecurityManager', supername 'java/lang/Object', definer AllocatedObj(0x099f1c00)

- verification now fails
                                     
2010-01-28
SUGGESTED FIX

Change the guarantee to also check the placeholders table, if find_class() returns NULL. I'll also add an explicit check to check that k != NULL, to catch that condition separately.
                                     
2010-02-01
WORK AROUND

Well, there's no "easy" workaround. Given that this is a bug in the verification code, not running with verification turned on is the only way not to trip over it.
                                     
2010-02-01
EVALUATION

The issue seems to be that the klass oop can be installed in the placeholders table, while a class is being defined, and before it is added to the system dictionary. So, find_class() on the dictionary is not guaranteed to find the klassOop in the system dictionary, when said klassOop is well formed and appearing in the loader constraints table.

So, this guarantee:

        klassOop k = dictionary->find_class(d_index, d_hash, name, loader);
        guarantee(k == probe->klass(), "klass should be in dictionary");

is too strict, as the klassOop might not yet be appearing in the system dictionary.
                                     
2010-02-01
EVALUATION

Also, an e-mail from Karen:

So - to the best of my understanding you were actually correct - the guarantee should
be changed from find_class to find_class_or_placeholder. Sorry I didn't have this
clearly thought out before.

Here is the logic:

In define_instance_class the order of operations is tricky and deliberate:
  1. check_constraints: which both checks the constraints, and updates the constraint table with the new instanceKlass
  2. javaCall to loader_addClass_method
  3. update_dictionary: which adds the new instanceKlass with the defining loader
  4. remove from placeholder table

There is a comment that check_constraints and update_dictionary need to be "atomic" with
respect to each other.

You have to check_constraints before you update the java class loader table. You can't hold
a lock around the call out to java. You can't update the dictionary until all checks are done,
because once an instanceKlass is in the dictionary it is globally visible.

So if you have a gc between the check_constraints and update_dictionary - you will see the
situation you have today. And you've caught it in define_instance_class because you
have a placeholder entry with a definer set.

I suspect that I should think through a bit more how to recover from an OutOfMemoryError
in the call to loader_addClass_method. Right now we crash so it is not critical. If MVM lets
you keep running we will end up with an entry in the loader_constraint table referring to
an instanceKlass that is not in the dictionary, and a left over placeholder table entry, so
I should add a comment to the code that when we recover from OutOfMemoryErrors that
should be cleaned up.
                                     
2010-02-01
SUGGESTED FIX

So, it turns out that if the entry is in the placeholders table, the instanceKlass on the placeholder entry might not have been installed yet. So, we'll split the check into two:

- if the look up for the class in the system dictionary is successful, we'll leave the k == probe->klass() guarantee as is
- if the look up for the class in the system dictionary fails, we'll only check that the class appears in the placeholders table (as we cannot k from the placeholde entry).
                                     
2010-02-05
EVALUATION

ChangeSet=http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/38836cf1d8d2,ChangeRequest=6920977
                                     
2010-02-10



Hardware and Software, Engineered to Work Together