JDK-4430697 : Linux: -Xrunhprof:cpu=samples PepTest crashes in RawMonitorExit
  • Type: Bug
  • Component: vm-legacy
  • Sub-Component: jvmpi
  • Affected Version: 1.3.1
  • Priority: P2
  • Status: Closed
  • Resolution: Fixed
  • OS: linux
  • CPU: generic
  • Submitted: 2001-03-27
  • Updated: 2002-09-06
  • Resolved: 2002-09-06
The Version table provides details about the release in which this issue/RFE will be addressed.

Unresolved: Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed: Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

Platform   Release     Status
Other      1.3.1 rc2   Fixed
Other      1.4.0       Fixed
Related Reports
Relates :  
Description
daniel.daugherty@Eng 2001-03-27

This bug was encountered while chasing the following bug:

4369489 2/5 jdb does not work on Linux platform.

This bug is the seventh layer of that onion.

When PepTest is run with "java_g -Xrunhprof:cpu=samples", it
frequently crashes in RawMonitorExit(). Running it with the java
command crashes in the same way. Adding -XX:+SafepointALot makes
the crash happen earlier.

Comments
CONVERTED DATA
BugTraq+ Release Management Values
COMMIT TO FIX: ladybird
FIXED IN: ladybird merlin-beta merlin-beta2
INTEGRATED IN: ladybird-rc2 merlin-beta
14-06-2004

EVALUATION

daniel.daugherty@Eng 2001-03-27

While trying to reproduce this crash in the debugger, I ran into the following assertion failure:

    assert(_is_owned == v_false, "mutex_lock should not have had owner")
    src/os/linux/vm/os_linux.hpp, 256

This assertion failure indicates a race between RawMonitorExit and RawMonitorEnter. This is quite possibly a different version of this crash.

daniel.daugherty@Eng 2001-04-03

Hui Huang pointed out a problem in the new SR_handler() logic:

     do {
       sigsuspend(&set);
    +  if (osthread->suspend_action() != SR_CONTINUE &&
    +      osthread->is_try_mutex_enter()) {
    +    // We received a signal, but not from a resume operation. Since
    +    // we were trying to enter a mutex, we need to partially resume
    +    // so we can unlock the mutex. We break rather than return so
    +    // that the context information is cleared and then we restore
    +    // our "suspended" state so that ObjectMonitor::raw_enter()
    +    // knows what is happening.
    +    mutex_granted = true;
    +    break;
    +  }
     } while (osthread->suspend_action() != SR_CONTINUE);

When we return from sigsuspend() above, if the suspend_action field is not SR_CONTINUE and we are trying to enter a mutex, then we assume we have been granted the mutex.

Now consider the new "else" case in SR_handler():

    +  } else if (action == SR_CONTINUE && osthread->is_try_mutex_enter()) {
    +    // Normally, we receive a resume signal in the sigsuspend() loop
    +    // above. However, if we were granted a mutex while suspended,
    +    // then we returned back to ObjectMonitor::raw_enter() to unlock
    +    // the mutex. When the subsequent resume signal comes in, we
    +    // catch it here and need to "resume" the thread.
    +    osthread->set_suspend_action(SR_NONE);
    +  }

When we are in sigsuspend() in the previous loop, SR_handler() gets called again when the resume signal comes in. Since the action is SR_CONTINUE (set by the resumer thread), we get to the else case and, if we are trying to enter a mutex, we reset the suspend_action field to SR_NONE.
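The interaction between the nested handler call and the outer loop can be sketched as a single-threaded C++ model. This is a hypothetical simplification, not the HotSpot code: only SR_NONE, SR_SUSPENDED, SR_CONTINUE and the suspend_action/is_try_mutex_enter concepts come from the report, while ModelOSThread, nested_sr_handler(), outer_loop_assumes_grant() and resume_misread_as_grant() are illustrative names. It assumes the resumer sets SR_CONTINUE before the signal is delivered and that the nested handler call completes before sigsuspend() returns.

```cpp
#include <cassert>

// Hypothetical single-threaded model of the SR_handler() logic quoted above.
// SR_NONE, SR_SUSPENDED and SR_CONTINUE come from the report; ModelOSThread
// and the helper functions are illustrative names, not HotSpot source.
enum SuspendAction { SR_NONE, SR_SUSPENDED, SR_CONTINUE };

struct ModelOSThread {
  SuspendAction suspend_action = SR_SUSPENDED;  // thread is parked in sigsuspend()
  bool try_mutex_enter = false;
};

// The new "else" case, run by the nested SR_handler() call for the resume signal.
void nested_sr_handler(ModelOSThread& t) {
  if (t.suspend_action == SR_CONTINUE && t.try_mutex_enter) {
    t.suspend_action = SR_NONE;
  }
}

// The test the outer loop applies after sigsuspend() returns.
bool outer_loop_assumes_grant(const ModelOSThread& t) {
  return t.suspend_action != SR_CONTINUE && t.try_mutex_enter;
}

// Delivers one resume signal to a suspended thread and reports whether the
// outer loop concludes "mutex granted". The final suspend_action is written
// to *final_action.
bool resume_misread_as_grant(bool try_mutex_enter, SuspendAction* final_action) {
  ModelOSThread t;
  t.try_mutex_enter = try_mutex_enter;
  t.suspend_action = SR_CONTINUE;  // set by the resumer thread before signalling
  nested_sr_handler(t);            // handler runs while sigsuspend() is blocked
  *final_action = t.suspend_action;
  return outer_loop_assumes_grant(t);
}
```

In this model, when try_mutex_enter is set, a plain resume leaves suspend_action at SR_NONE and the outer loop reports a grant; when it is not set, the action stays SR_CONTINUE and the loop exits normally.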
This completes the resume action.

Now return to the previous loop. When a resume signal comes in while we are trying to enter a mutex, we find that suspend_action != SR_CONTINUE, because the nested SR_handler() call has just reset it to SR_NONE. So when we get resumed, we also think we have been granted the mutex. This double interpretation of a single resume signal results in us setting suspend_action to SR_SUSPENDED just before returning to the caller, pthread_mutex_lock(). So we have been resumed, but we are marked as suspended (SR_SUSPENDED).

Another problem with the new loop is that a signal other than a resume or a mutex grant will cause us to return to the caller prematurely. I have not investigated this trail very far.

For some reason, this early return to pthread_mutex_lock() does not leave us blocked there until we are granted the mutex. We fall through and fail the assertion because we think we have been granted the mutex, but it is owned by another thread.
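Both failure modes can be summarized by classifying each wake-up from sigsuspend(). The sketch below is a hypothetical model, not HotSpot source: WakeResult, LoopOutcome, classify_wakeup() and run_wakeup() are illustrative names, and it assumes the grant path always restores SR_SUSPENDED before returning, as the quoted code comment describes.

```cpp
#include <cassert>

// Hypothetical model (not HotSpot source) of how the flawed sigsuspend()
// loop classifies a wake-up. Only the SR_* names come from the report.
enum SuspendAction { SR_NONE, SR_SUSPENDED, SR_CONTINUE };
enum class WakeResult { KeepWaiting, Resumed, MutexGranted };

struct LoopOutcome {
  WakeResult result;
  SuspendAction action_on_return;  // suspend_action when control leaves the loop
};

// Mirrors the tests applied after sigsuspend() returns.
WakeResult classify_wakeup(SuspendAction action, bool try_mutex_enter) {
  if (action != SR_CONTINUE && try_mutex_enter) {
    return WakeResult::MutexGranted;  // the new "break" path
  }
  if (action == SR_CONTINUE) {
    return WakeResult::Resumed;       // normal exit from the do/while loop
  }
  return WakeResult::KeepWaiting;     // loop calls sigsuspend() again
}

// The grant path restores the "suspended" state before returning to
// pthread_mutex_lock(); as a modelling assumption, other outcomes leave
// suspend_action unchanged.
LoopOutcome run_wakeup(SuspendAction action, bool try_mutex_enter) {
  WakeResult r = classify_wakeup(action, try_mutex_enter);
  SuspendAction on_return = (r == WakeResult::MutexGranted) ? SR_SUSPENDED : action;
  return {r, on_return};
}
```

Here run_wakeup(SR_NONE, true) models the misread resume: it reports MutexGranted and returns with SR_SUSPENDED, a running thread marked as suspended. run_wakeup(SR_SUSPENDED, true) models a stray signal arriving while suspended, which takes the same grant path prematurely.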
11-06-2004