JDK-4469343 : Heap snapshot causes signal 11
  • Type: Bug
  • Component: vm-legacy
  • Sub-Component: jvmpi
  • Affected Version: 1.3.1
  • Priority: P1
  • Status: Closed
  • Resolution: Fixed
  • OS: generic
  • CPU: sparc
  • Submitted: 2001-06-13
  • Updated: 2007-12-07
  • Resolved: 2002-08-30
Other
1.3.1_01 01 Fixed
Description
Customer description of problem:

With JDK 1.2.2
===============

When we requested a heapdump through JVMPI it always appeared that a
garbage collection was done before the heap dump was performed.

With JDK 1.3.1 (released version)
=================================

When we request a heapdump it does not always seem to be the case that a
garbage collection is first performed. In some cases we have tracked the
request being made to the JVM, but then the machine becomes unresponsive
for up to 15 minutes and usually the JVM crashes.

Even in cases where the heap dump succeeds, we see many objects in the
dump without referrers or referees, which indicates that a gc has not taken
place.
One possibility which occurred to us is that this problem might be
related to generational garbage collection, but we haven't been able to
test this theory.

However, if we request a Garbage collection manually through our
app's interface first, we usually are able to take heap snapshots
correctly. It also seems to be the case that if we add code to request
a gc through JVMPI
before requesting the heap dump, the heap dump operation completes as
expected. We don't know why, but maybe through doing a garbage
collection the heap information is better structured to allow for a
heapdump (?).


janet.koenig@Eng 2001-06-13
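
The workaround the customer describes (force a collection, then request the dump) can be sketched as follows. This is only an illustration: MockProfilerInterface and heap_dump_after_gc are hypothetical names, not the declarations from jvmpi.h, and the return value 0 is assumed to mean success, as the "RequestEvent(): 0" debug line later in this report suggests.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the profiler-facing VM interface. It models only
// the two steps the customer describes; it is not the real JVMPI declaration.
struct MockProfilerInterface {
    std::vector<std::string> calls;  // records the order of requests
    bool heap_is_collected = false;

    // Models asking the VM to run a garbage collection.
    void RunGC() {
        heap_is_collected = true;
        calls.push_back("gc");
    }

    // Models requesting a heap dump; 0 is assumed to indicate success.
    int RequestHeapDump() {
        calls.push_back("heap_dump");
        return 0;
    }
};

// Workaround from the report: collect first, then request the dump.
inline int heap_dump_after_gc(MockProfilerInterface& vm) {
    vm.RunGC();
    return vm.RequestHeapDump();
}
```

Calling heap_dump_after_gc on a fresh MockProfilerInterface records "gc" followed by "heap_dump". The ordering is the point of the sketch; it does not explain why the collection helps.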


karen.kinnear@East 2001-06-21

Customer's small test case for duplicating:

Attached to this bug report is examples.tar.gz. After gunzip
and tar extracting it, run the following (the customer uses
UseBoundThreads only on Solaris):

java -noclassgc -Xrunhprof:heap=dump -XX:+UseBoundThreads examples/profiler/applications/network/Network

A small screen comes up - click the START button. Once the simulation
has started, go to the terminal where you launched java and press
CTRL-\, which should give you a thread dump and a heap dump.

See comments section for the 5 fixes for hprof to now work.

To duplicate with JProbe:

See Karen for a preliminary copy of JProbe 3.0 for Solaris.

Install via java -Xmx100m -jar suite30_solaris.jar

You also want the kknetwork.jpl file for initial settings. This is
attached here.

To run the profiler, execute "<JProbeDir>/profiler/jpprofiler -Dvmcheck=false"

After the "Welcome" dialog, click RUN which will bring you to
the JProbe LaunchPad. Click "Load ..." and use the chooser to get
the kknetwork.jpl file. Change the Working Directory on the Program tab
to be the JProbe install directory. On the VM tab change the VM path.
You probably want to uncheck "close system console on exit". You
can leave the existing VM arguments or add to them. Then just click
Run and it will run the Network example (assumes that examples/...
is in the JProbe30 directory - or a link is there).

When you click RUN an xterm pops up with some JProbe debug and any
VM debug info. On the "Runtime Heap Summary" there is a blue and pink
memory use graph. On the toolbar, there is a group of buttons including
a little yellow "mountain" and at the right end of that group a little
camera whose tooltip says "Take Heap Snapshot". If you click that
it should do a heap_dump.

This JProbe version will print out debugging info like:

jvmpi:RequestEvent(heap_dump, 0x...)
jvmpi: 0x NotifyEvent(37:, requestedheap_dump) ...
jvmpi::0x NotifyEvent() ok
jvmpi: RequestEvent(): 0

If you click on the camera a lot as the app is coming up and running it crashes
trying to call RequestEvent (logging lines are flushed).

(note: these instructions are from Sitraka and I ran them and it duplicated
the problem - I did not try the manual instructions below)

You can also run the viewer separately if you want to. From the LaunchPad, "Save As ..."
your jpl file. From the Program menu, choose "Attach to remote session" and
click ok. Then in a terminal window you can run
  "profiler/jprun -jp_input=/full/path/to/jpl"

It should connect to the viewer and you can take heap snapshots again.
You can skip jprun and go straight to java if you want, by adding
the profiler directory to your PATH and LD_LIBRARY_PATH and setting
the environment variable JPROBE_ARGS_0="-jp_input=/full/path/to/jpl",
and running with "java -Xnoclassgc -Xbootclasspath/a:<JProbeDir>/profiler/jpagent.jar -Xrunjprobeagent etc."

Note that the customer sees these problems on Win2000 and Linux as well
as Solaris.



Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: 1.3.1_01 generic
FIXED IN: 1.3.1_01
INTEGRATED IN: 1.3.1_01
VERIFIED IN: 1.3.1_01
14-06-2004

SUGGESTED FIX

mandy.chung@eng 2001-07-19

Attached is the webrev for fixes for this bug for Ladybird. For Merlin, see 4478223.

mandy.chung@eng 2001-07-25

Chris Hegarty from CTE caught that the change in globals.cpp is not necessary. The third argument in the develop_pd macro is just a description and does not allow the initial value to be specified. Karen and I have confirmed that we can take out the changes in globals.cpp for Ladybird. The change in globals.hpp is:

870c870
< develop_pd(bool, UseStackBanging, true
---
> develop_pd(bool, UseStackBanging,
11-06-2004
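
To illustrate why passing an initial value in the third slot has no effect, here is a simplified sketch of a three-argument flag macro. It is not the HotSpot definition: DEMO_DEVELOP_PD, the pd_UseStackBanging constant, and the description string are all hypothetical, and the idea that the default comes from a separate platform-dependent constant is an assumption made for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical platform-dependent default, standing in for a value that a
// per-platform header would supply.
const bool pd_UseStackBanging = true;

// Simplified three-argument flag macro: (type, name, description).
// The initial value is taken from pd_<name>, never from the argument list,
// so the third argument can only ever be documentation text.
#define DEMO_DEVELOP_PD(type, name, doc) \
    type name = pd_##name;               \
    const char* name##_doc = doc;

// The description text below is made up for the example.
DEMO_DEVELOP_PD(bool, UseStackBanging, "use stack banging for overflow checks")
```

With a macro of this shape, the only way to change the flag's default is to change the platform-dependent constant, which is consistent with the comment above that the third argument is just a description.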

EVALUATION

karen.kinnear@East 2001-06-21

Initial analysis of the heap dump bug in the comment section: "oopmap not found" matches the non-debug stack trace duplicated here:

> [11] __sighndlr(0xb, 0xfee0b670, 0xfee0b3b8, 0xff368f78, 0xfee0be10, 0xfee0be00), at 0xff36bbcc
> ---- called from signal handler with signal 11 (SIGSEGV) ------
> [12] __1cJOopMapSetSfind_map_at_offset6kMill_pnGOopMap__(), at 0xfe6286e0
> [13] __1cJOopMapSetGall_do6FpknFframe_pnICodeBlob_pknLRegisterMap_pFppnHoopDesc__vpF9C9C_vpF9C_vp9H_v_(0xfee0b8c8, 0xfb008850, 0xfee0b8d8, 0xfe6d2cd4, 0xfe71d810, 0xfe71d810), at 0xfe627f28
> [14] __1cJOopMapSetHoops_do6FpknFframe_pnICodeBlob_pknLRegisterMap_pFppnHoopDesc__v_v_(0xfee0b8c8, 0xfb008850, 0xfee0b8d8, 0xfe6d2cd4, 0x0, 0x15c1c0), at 0xfe627ec4
> [15] __1cFframeHoops_do6MpFppnHoopDesc__vpnLRegisterMap__v_(0xfb008850, 0xfe6d2cd4, 0xfee0b8d8, 0xfe784000, 0xfee0b8c8, 0xfe6d2cd4), at 0xfe626378
> [16] __1cKJavaThreadHoops_do6MpFppnHoopDesc__v_v_(0x0, 0xfe6d2cd4, 0x400, 0x10, 0xfe784000, 0xfe6d2cd4), at 0xfe625ee4
> [17] __1cHThreadsHoops_do6FpFppnHoopDesc__v_v_(0xfe784000, 0x29188, 0xfe6d2cd4, 0xfee0b, 0xfe784000, 0xfee0b9d4), at 0xfe625d74
> [18] cNRootCollector2t5B6M_v_(0x9fac0, 0xfe784000, 0xf8d22ac0, 0xfee0bac4, 0x22141e28, 0xfee0bad0), at 0xfe6d18bc
> [19] cUVM_JVMPIPostHeapDumpEdoit6M_v_(0xfad818e4, 0xfe79a4a8, 0xfe784000, 0xfad818e4, 0xfe784000, 0xfee0bacc), at 0xfe6d0d90
>
> HeapDumper::doit
> RootCollector
> Threads::oops_do(f)
> JavaThread::oops_do
>   // Traverse the execution stack
>   for(StackFrameStream fst(this); !fst.is_done(); fst.next()) {
>     fst.current()->oops_do(f, fst.register_map());
>
> where StackFrameStream fst(this) with no 2nd arg automatically should mark update=true
> where fst.current() is a frame*
>
> frame::oops_do:
>   void oops_do() {
>     if (!_is_static) {
>       --_offset;
>       oop_offset_do();
>     }
>     iterate_parameters();
>   }
>
> void OopMapSet::oops_do(const frame *fr, CodeBlob* cb, const RegisterMap *reg_map, void f(oop*)) {
>   // add derived oops to a table
>   all_do(fr, cb, reg_map, f, add_derived_oop, do_nothing, do_nothing);
> }
>
> void OopMapSet::all_do(const frame *fr, CodeBlob* cb, const RegisterMap *reg_map,
>                        void oop_fn(oop*), void derived_oop_fn(oop*, oop*),
>                        void value_fn(oop*), void dead_fn(oop*)) {
>   { debug_only(CodeBlob* t_cb = CodeCache::find_blob(fr->pc());)
>     assert(cb != NULL && cb == t_cb, "wrong codeblob passed in");
>   }
>
>   NOT_PRODUCT(if (TraceCodeBlobStacks) trace_codeblob_maps(fr, cb, reg_map);)
>   OopMapSet* maps = cb->oop_maps();
>   OopMap* map = cb->oop_map_for_return_address(fr->pc(), reg_map->is_pc_at_call(fr->sp()));
>   ...
>
> OopMap* CodeBlob::oop_map_for_return_address(address return_address, bool at_call) {
>   address pc = return_address ;
>   assert (oop_maps() != NULL, "nope");
>   if (is_native_method()) {
>     // do it unprecise; for natives we are using the value stored in
>     // TLS and not the return address when we cook the last frame
>     return oop_maps()->find_map_at_offset ((int) pc - (int) instructions_begin(), at_call, false);
>   } else {
>     return oop_maps()->find_map_at_offset ((int) pc - (int) instructions_begin(), at_call, true);
>   }
> }
>
> OopMap* OopMapSet::find_map_at_offset(int pc_offset, bool at_call, bool precise ) const {

My initial assessment is that we should strongly consider backporting the now working Merlin heap dump code to Ladybird, since fixing each type of dump (this is from the threads dump) might actually require more work and lead to repeated surprise bugs. For this reason I am handing the bug over to Mandy Chung, who fixed the HeapDump code for Merlin in bugid 4363856.

mandy.chung@eng 2001-06-28

"oopmap not found" assertions are hit when a JavaThread is exiting a compiled method and oopmaps are not tracked. A safepoint can stop a JavaThread while exiting a compiled method due to two changes:

1. The fix for 4331687, which changed SharedRuntime::jvmpi_method_exit() from JRT_ENTRY to JRT_LEAF.
2. jvmpi::post_event_vm_mode() makes the thread state transition from vm to native or vice versa. Making transition() calls allows a safepoint to block.

"oopmap not found" assertions were found in two different places:

1. While collecting the heap dump, in frames::oops_do()
2. In GC, in frames::oops_do()

mandy.chung@eng 2001-07-12

For Ladybird, I implemented a hack in the jvmpi heap dump to check whether a compiled method is exiting at that safepoint. If so, skip and retry. For Merlin, the C1 team agreed to fix C1 to generate an oopmap at method exit to resolve that problem. A new bug 4478223 was filed to track that.

mandy.chung@eng 2001-07-19

Sitraka has confirmed that 1.3.1_01 hotspot with the fixes for the heap dump problem fixes the heap dump and no longer has the SafepointSynchronize failure.

###@###.### 2002-01-04

Verified the bug fix as described. Also, the customer does not see the problem any more.
04-01-2002
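
The "skip and retry" approach described for Ladybird can be sketched roughly as below. This is a toy model, not the actual VM change: StoppedThread, safe_to_walk, dump_with_retry, and the next_safepoint callback are all hypothetical names, and the attempt limit is an assumption added so the example terminates.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of a stopped thread: the only thing tracked is whether
// it was caught while exiting a compiled method (where no oopmap exists).
struct StoppedThread {
    bool exiting_compiled_method;
};

// True when no thread is exiting a compiled method at this safepoint.
inline bool safe_to_walk(const std::vector<StoppedThread>& threads) {
    for (const StoppedThread& t : threads) {
        if (t.exiting_compiled_method) return false;
    }
    return true;
}

// Try to dump at successive safepoints, skipping any safepoint where a thread
// is exiting a compiled method. next_safepoint is a hypothetical callback
// that returns the thread states observed at the next safepoint.
// Returns the number of the attempt that succeeded, or -1 if none did.
inline int dump_with_retry(
    const std::function<std::vector<StoppedThread>()>& next_safepoint,
    int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        std::vector<StoppedThread> threads = next_safepoint();
        if (safe_to_walk(threads)) {
            // A real implementation would walk the roots and emit the dump here.
            return attempt;
        }
    }
    return -1;
}
```

The sketch only captures the control flow: check the condition at each safepoint, back off when a stack cannot be walked, and try again later. The Merlin approach described above removes the need for the check by generating an oopmap at method exit.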