JDK-8169931 : 8k class metaspace chunks misallocated from 4k chunk freelist
  • Type: Bug
  • Status: Closed
  • Resolution: Fixed
  • Component: hotspot
  • Sub-Component: gc
  • Priority: P3
  • Affected Version: 8,9
  • OS: linux
  • CPU: x86_64
  • Submit Date: 2016-11-17
  • Updated Date: 2016-12-22
  • Resolved Date: 2016-11-24
JDK 9: Fixed in 9 b150
Related Reports
Cloners :  
Relates :  
Description
FULL PRODUCT VERSION :
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)

Reproduced with a default clone of OpenJDK jdk8u

FULL OS VERSION :
Linux pm-cluster-rhel7-1b 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
(can be reproduced on any 64-bit Linux flavour)


A DESCRIPTION OF THE PROBLEM :
We have an application server that generates code when applications are deployed. Recently it started failing during deployment of a large application with "java.lang.OutOfMemoryError: Metaspace" errors. Experimenting with the command-line metaspace configuration flags made no difference. The only thing that worked was to disable CMS entirely, but this is not a practical long-term solution.

To determine the root cause, we investigated the issue using a clone of the OpenJDK jdk8u Mercurial repository and made local builds of the JDK with extra debug logging. The bug was eventually tracked down to an implementation error in hotspot/src/share/vm/memory/metaspace.cpp: the ChunkManager::list_index() method returns the wrong answer for humongous class metadata chunks whose size happens to equal that of a non-class metadata medium chunk (8K words).

Chunk sizes are specified as follows (from metaspace.cpp):

  enum ChunkSizes {    // in words.
    ClassSpecializedChunk = 128,
    SpecializedChunk = 128,
    ClassSmallChunk = 256,
    SmallChunk = 512,
    ClassMediumChunk = 4 * K,
    MediumChunk = 8 * K
  };

list_index() is a static method that returns the index of an appropriate freelist:

  ChunkIndex ChunkManager::list_index(size_t size) {
    switch (size) {
      case SpecializedChunk:
        assert(SpecializedChunk == ClassSpecializedChunk,
               "Need branch for ClassSpecializedChunk");
        return SpecializedIndex;
      case SmallChunk:
      case ClassSmallChunk:
        return SmallIndex;
      case MediumChunk:
      case ClassMediumChunk:
        return MediumIndex;
      default:
        assert(size > MediumChunk || size > ClassMediumChunk,
               "Not a humongous chunk");
        return HumongousIndex;
    }
  }

Looking at list_index(), it is clear that if an 8K-word class metadata chunk is requested, the method erroneously reports it as a medium chunk rather than a humongous chunk. As a result, the request is served from the class medium chunk freelist, if that list has any chunks available, and those chunks are only 4K words, too small to hold the 8K words needed. The allocation therefore fails and is retried a couple of times; a GC is then initiated and the allocation is tried again, but it fails for the same reason, eventually causing the java.lang.OutOfMemoryError.

The error *only* occurs when there are free chunks available on the medium chunk freelist. If there aren't any there, new chunks *of the correct size* are allocated from virtual memory space and all is well.

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

REGRESSION.  Last worked in version 7u80

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Load a class that requires an 8K-word class metadata chunk while 4K-word chunks are available on the class medium chunk freelist.


EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: The class should load successfully

Actual: A java.lang.OutOfMemoryError: Metaspace error occurs

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Metaspace debug log showing a (failed) request for 8061 words being "satisfied" using a 4096-word chunk:

SpaceManager::grow_and_allocate for 8061 words 2627 words used 1469 words left
Metadata humongous allocation:
  word_size 0x0000000000001f7d
  chunk_word_size 0x0000000000002000
    chunk overhead 0x0000000000000005
ChunkManager::free_chunks_get: free_list 0x00007f57c00a3fc0 head 0x0000000104729c00 size 4096
ChunkManager::chunk_freelist_allocate: 0x00007f57c00a3f80 chunk 0x0000000104729c00  size 4096 count 292 Free chunk total 1285504  count 609
SpaceManager::add_chunk: 8) Metachunk: bottom 0x0000000104729c00 top 0x0000000104729c28 end 0x0000000104731c00 size 4096
    used 5 free 4091


REPRODUCIBILITY :
This bug can be reproduced often.

---------- BEGIN SOURCE ----------
Once the issue was understood an attempt was made to create a standalone test case that could reproduce it, but that effort has so far failed.
---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Disabling CMS GC is the only known effective workaround.

A patch against the OpenJDK that fixes the issue has been written, but it's too big to fit here.


Comments
This is a great description of the bug! ErikH and I created a patch to fix this: http://cr.openjdk.java.net/~stefank/8169931/webrev.01
The same patch, split into a unit test and a fix: http://cr.openjdk.java.net/~stefank/8169931/webrev.01.unittest/ http://cr.openjdk.java.net/~stefank/8169931/webrev.01.fix/
I'm going to run more tests on this patch.
2016-11-18

The report contains a detailed explanation of the issue. The behavior is misleading: when an 8K class metadata chunk is requested, "ChunkManager::list_index(size_t size)" always returns the index of the medium freelist, which holds 4K chunks.
2016-11-18