JDK-8297106 : Remove the -Xcheck:jni local reference capacity checking
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 8,11,17,20
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2022-11-16
  • Updated: 2022-12-05
  • Resolved: 2022-11-29
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 20
20 b26Fixed
Related Reports
Duplicate :  
Relates :  
Description
The Xcheck:jni version of EnsureLocalCapacity tries to validate use of local refs by tracking an artificial limit (the "planned capacity") which is checked on all JNI method returns, to ensure that the limit has not been exceeded (with some slack built in- CHECK_JNI_LOCAL_REF_CAP_WARN_THRESHOLD):

Checked JNI function exit:
  size_t planned_capacity = handles->get_planned_capacity();
  size_t live_handles = handles->get_number_of_live_handles();
  if (live_handles > planned_capacity) {
    IN_VM(
      tty->print_cr("WARNING: JNI local refs: " SIZE_FORMAT ", exceeds capacity: " SIZE_FORMAT,
                    live_handles, planned_capacity);
      thr->print_stack();
    )

A well-written piece of native code that uses JNI is supposed to track the expected number of local refs it will use and call EnsureLocalCapacity beforehand to ensure it won't exceed the planned capacity.

Prior to JDK-8193222 EnsureLocalCapacity(capacity) simply did:

 add_planned_handle_capacity(thr->active_handles(), capacity);

where:

add_planned_handle_capacity(JNIHandleBlock* handles, size_t capacity) {
  handles->set_planned_capacity(capacity +
                                handles->get_number_of_live_handles() +
                                CHECK_JNI_LOCAL_REF_CAP_WARN_THRESHOLD);
}

so we simply add the required amount of capacity to the current in-use amount, plus the slack.

That approach was wrong because it did not handle nested native method calls correctly. Suppose funcA will create 20 local refs, and funcB will create 5, then they both call EnsureLocalCapacity with 20 and 5 respectively. But if funcA calls funcB then we have a problem. Lets say there are initially 0 active handles and the slack is 2. Then:

EnsureLocalCapacity(20) will do:

planned_capacity = 20 + 0 + 2 = 22

Lets say funcA creates no local refs before calling funcB, then in funcB EnsureLocalCapacity(5) will do:

planned_capacity = 5 + 0 + 2 = 7

we then return to funcA and proceed to loop 20 times to call a JNI function that returns a local ref. Once we get to the 8th iteration the function exit check will detect 8 active handles is greater than the planned capacity of 7 and produce the warning! The code lost the fact that funcA had requested a capacity of 20.

So the fix in JDK-8193222 was to have EnsureLocalCapacity only call add_planned_handle_capacity if the requested capacity was greater than the current planned_capacity. So in funcB the call to EnsureLocalCapacity becomes a no-op and when we return to funcA we still have a planned_capacity of 22 and now our loop completes fine. So bug fixed in a simple way - all good!

Except it isn't all good. What we haven't considered is that funcB created 5 local refs. Now if funcB calls DeleteLocalRef for each local ref it created all is well and good - we still have capacity for 22 local refs. However, funcB is not required to call DeleteLocalRef, and if it doesn't then our available capacity has been reduced to 22 - 5 = 17. So on the 18th iteration of the loop the function exit check will again fail and we get the warning!

So how to address this? The problem is that at the time EnsureLocalCapacity is called in funcB there is no knowledge of how many active handles will be created and remain when funcB returns. So there is no way to address this problem in EnsureLocalCapacity. Also funcB is just application native code, not a JNI function itself, so we can't put an adjustment in the JNI function exit hook that Xcheck:jni installs. So in short there seems no way to address this!

Now you may suggest that funcA has to call EnsureLocalCapacity with a value that accounts for all the local-refs created transitively by the code in funcA, including funcB - that would certainly fix the problem. But there is no way in general to know what this value would be. We would just encourage programmers to call EnsureLocalCapacity(BigNumber) to "ensure" there is plenty of allowance made. That makes the checked EnsureLocalCapacity a useless tool.

In practice, in Hotspot, the actual functionality of EnsureLocalcapacity is a no-op: there is no inherent local ref capacity limit: we create them till we run out of memory. So the Xcheck:jni version of EnsureLocalCapacity is just a way to encourage people to write portable JNI code, in case it runs on a VM that does have a limit. So the checked version is there to help developers write their native code in a way that will work on other VMs. This raises the question why we actually do this? If another VM has an inherent local ref limit (per the JNI specification) then why doesn't that other VM provide the tools to help developers write correct JNI code?

I think the checked version of EnsureLocalCapacity, whilst well-intentioned, cannot be implemented in a correct and useful way, and so has no real value and should be removed.
Comments
Changeset: 692bedbc Author: David Holmes <dholmes@openjdk.org> Date: 2022-11-29 02:03:49 +0000 URL: https://git.openjdk.org/jdk/commit/692bedbc1df153f362b8e85693f20b089b5594e2
29-11-2022

Even if this was C++ and we could use "helper" objects to increment a relative local capacity in funcN() and decrement that relative local capacity when funcN() returns, we are killed by this: > However, funcB is not required to call DeleteLocalRef, and if it doesn't then our > available capacity has been reduced to 22 - 5 = 17. If funcB doesn't have to clean up after itself, then nothing we do in the caller's context will help. Sigh... There's a reason I hate JNI... :-)
22-11-2022

Thanks for the comment [~jvernee]. The main use of PushLocalFrame/PopLocalframe is to set up a scope that automatically clears local refs when you pop. The capacity associated with that is unused other than being range checked. In essence PushLocalFrame(n) is the same as doing a (hypothetical) PushLocalFrame() followed by EnsureLocalCapacity(n), so has the same issues in terms of capacity checking. It is also unclear how the capacity specified for PushLocalFrame interacts with that set by EnsureLocalCapacity. As we have no inherent limit these spec issues have never been raised.
22-11-2022

We also have PushLocalFrame/PopLocalFrame, the former of which can be used to set a capacity as well it looks like. These push/pop a new jni handle block, which tracks the planned capacity, so EnsureLocalCapacity could work for nested function calls if those functions push a local frame at the start, and pop it again at the end. I guess EnsureLocalCapacity alone would still be useful for root calls (i.e. first native frame when coming from Java). In HotSpot we automatically only set a new handle block when doing an upcall into Java. In other words, the handle block is the same across consecutive down calls from Java to native. Each jni call that comes from Java would have to call EnsureLocalCapacity to set the capacity for the currently active handle block. But, I don't think we do that anywhere at the moment. Just adding some information. I don't have an opinion on whether the current checking behavior is useful or not.
21-11-2022

A pull request was submitted for review. URL: https://git.openjdk.org/jdk/pull/11259 Date: 2022-11-21 06:53:02 +0000
21-11-2022

Alternatively consider if EnsureLocalCapacity(n) always meant "increase the previous planned capacity by 'n' " (not, importantly, increase the available capacity by 'n'). That way the "planned capacity" would only ever grow as we have no way to know when it could be shrunk again (other than returning to Java code - but not all native code returns to Java). In this case if we had things like JVMTI GetLoadedClasses call EnsureLocalCapacity before converting the oops into local refs, then we would not get warnings in the caller. But this seems of very limited use if the goal is to encourage programmers to correctly "size" their JNI code, as now failure to do so can easily be masked by a large increase to planned capacity by a JVM TI function. And as the "planned capacity" cannot be reduced we would easily mask many cases of incorrectly "sized" code. So I remain of the view that the checked version of EnsureLocalCapacity has no utility. I also think the regular version of EnsureLocalCapacity is a flawed and impractical API, whose flaws and limited were never really exposed because we don't actually have any inherent capacity limit.
21-11-2022

Thanks [~cjplummer] I will remove the test from the ProblemList and close JDK-8296936 in favour of this issue.
17-11-2022

The issue we run across with JDK-8296936 is that there are a few JVMTI functions that will allocate and return a number of localrefs (in an array), and the caller has no way of knowing what that number may be. Therefore it can't call EnsureLocalCapacity() in a meaningful way to avoid the -Xcheck:jni warning.
17-11-2022

JDK-8296709 added a test case for JDK-8296936 and then immediately problem listed it due to the -Xcheck:jni warning. The test case should be removed from the problem list when this CR is implemented.
17-11-2022