Bug ID: JDK-8178536 OOM ERRORS + SERVICE-THREAD TAKES A PROCESSOR TO 100%

JDK 10	JDK 7	JDK 8	JDK 9
10 Fixed	7u181 Fixed	8u141 Fixed	9 b175 Fixed

Approved for JDK 9.
17-06-2017
Fix Request

This is an escalated bug against 8u112, and the fix needs to be integrated into the 10, 9 and 8uxx repositories. Here are the details of the problem and the solution.

Problem: If listeners are installed for MemoryMXBean, then in the event of an OOME the JVM-internal 'Service Thread', which is responsible for delivering notifications to the Java threads, may itself encounter an OOM exception and get into a loop processing its pending requests. This happens when the Service Thread, while executing native code, calls the corresponding Java methods and hits an OOM exception. The pending exception makes the thread exit early from the SensorInfo::trigger() function before it can update its pending requests counter (_pending_trigger_count). The pending exception is never cleared, and that makes the thread loop in LowMemoryDetector::process_sensor_changes().

Hotspot changes: http://cr.openjdk.java.net/~poonam/8178536/webrev.hotspot/
These changes check for the pending exception and clear it, and make sure that the pending requests counters are kept in sync on both the Java and the VM side.

JDK changes: http://cr.openjdk.java.net/~poonam/8178536/webrev.jdk/
These changes make triggerAction() a no-op, since we need to call this method if MemoryService::create_MemoryUsage_obj() encounters an OOM exception and we want to avoid further potential OOM exceptions in triggerAction(MemoryUsage).

Please approve this fix for inclusion into JDK 9. This is a no-risk change that adds proper handling of the pending OOM exception observed by the low memory detector. The changes have been reviewed by Mandy Chung and Daniel Daugherty.
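The looping behaviour described in the problem statement can be illustrated with a small standalone model. This is only a sketch, not HotSpot code: SimThread, SimSensor, sim_java_trigger, trigger_early_return, trigger_clear_and_continue and drain are hypothetical names, and the iteration cap exists only so that the demonstration terminates.

```cpp
#include <cassert>

// Simplified stand-ins for VM state. These are NOT the HotSpot types.
struct SimThread {
  bool pending_exception = false;  // models a pending OOME on the thread
};

struct SimSensor {
  int pending_trigger_count = 0;   // models _pending_trigger_count
};

// Models the upcall into Java: when memory is exhausted it leaves an
// exception pending on the calling thread. It never clears one.
inline void sim_java_trigger(SimThread& t, bool out_of_memory) {
  if (out_of_memory) t.pending_exception = true;
}

// Problem shape: return as soon as an exception is pending, which skips
// the counter update below.
inline void trigger_early_return(SimSensor& s, SimThread& t, bool oom) {
  int count = s.pending_trigger_count;
  sim_java_trigger(t, oom);
  if (t.pending_exception) return;  // counter is left untouched
  s.pending_trigger_count -= count;
}

// Fixed shape: clear the exception (the notification is dropped) and
// still update the counter.
inline void trigger_clear_and_continue(SimSensor& s, SimThread& t, bool oom) {
  int count = s.pending_trigger_count;
  sim_java_trigger(t, oom);
  if (t.pending_exception) t.pending_exception = false;
  s.pending_trigger_count -= count;
}

// Models the service loop: keep calling trigger while requests are pending.
// Only the first attempt runs out of memory. max_iterations is a safety cap
// for this demo only.
template <typename TriggerFn>
int drain(SimSensor& s, SimThread& t, TriggerFn trigger, int max_iterations) {
  int iterations = 0;
  bool oom = true;
  while (s.pending_trigger_count > 0 && iterations < max_iterations) {
    trigger(s, t, oom);
    oom = false;
    ++iterations;
  }
  return iterations;
}
```

With three pending requests, drain() using trigger_early_return runs until the cap is reached even though memory pressure ends after the first attempt, because the stale exception keeps forcing the early return. With trigger_clear_and_continue the loop finishes after one iteration.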
17-06-2017
I thought this issue was targeted for JDK 10. You pushed it to JDK 9 without a JDK 9 Fix Request approval.
17-06-2017
Moved from hotspot/runtime -> hotspot/svc. The ServiceThread in this situation is on the JVM side of Monitoring and Management support.
07-06-2017
Instead of catching the exception on the Java side, I suggest making the following changes to handle the pending exception on the VM side in SensorInfo::trigger():

1. For the OOM exception that we could encounter in:

    Handle usage_h = MemoryService::create_MemoryUsage_obj(_usage, CHECK);

Here, we can change CHECK to CHECK_CLEAR. With that, we would return from the VM's SensorInfo::trigger() without calling the Java Sensor::trigger(). The Java and the VM state will stay in sync and we would have the exception cleared.

    Handle usage_h = MemoryService::create_MemoryUsage_obj(_usage, CHECK_CLEAR);

2. For:

    JavaCalls::call_virtual(&result,
                            sensorKlass,
                            vmSymbols::trigger_name(),
                            vmSymbols::trigger_method_signature(),
                            &args,
                            CHECK);

I suggest the following change:

     JavaCalls::call_virtual(&result,
                             sensorKlass,
                             vmSymbols::trigger_name(),
                             vmSymbols::trigger_method_signature(),
                             &args,
    +                        THREAD);
    +if (HAS_PENDING_EXCEPTION) {
    +  // tty->print_cr("Pending exception after Java trigger() call...");
    +  // we just clear the OOM pending exception that we might have encountered in Java's triggerAction(),
    +  // and continue with updating the counters since the Java counters have been updated too.
    +  assert((PENDING_EXCEPTION->is_a(SystemDictionary::OutOfMemoryError_klass())), "we expect only an OOM error here");
    +  CLEAR_PENDING_EXCEPTION;
    +}
     {
       // Holds Service_lock and update the sensor state
       MutexLockerEx ml(Service_lock, Mutex::_no_safepoint_check_flag);
       _sensor_on = true;
       _sensor_count += count;
       _pending_trigger_count = _pending_trigger_count - count;
     }

With this change, we clear the exception (to avoid the ServiceThread's spinning) and continue with updating the counters to keep the Java and VM state in sync.
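To make the difference between the three call styles in this comment concrete, here is a toy re-creation of the trailing-argument macro pattern. It is a sketch under assumptions: TOY_CHECK and TOY_CHECK_CLEAR are hypothetical macros modelled on my understanding of how CHECK and CHECK_CLEAR behave, not the actual HotSpot definitions, and ToyThread and toy_allocate are made-up stand-ins.

```cpp
#include <cassert>

struct ToyThread {
  bool pending = false;  // models HAS_PENDING_EXCEPTION
};

// Toy trailing-argument macros. Each one closes the call, tests the thread
// for a pending exception, and reopens a dummy expression so that the
// caller's own ");" still parses. They assume a local named `thread`.
#define TOY_CHECK        thread); if (thread->pending) return; (void)(0
#define TOY_CHECK_CLEAR  thread); if (thread->pending) { thread->pending = false; return; } (void)(0

// Models an allocation that can fail and leave an exception pending.
inline void toy_allocate(bool fail, ToyThread* thread) {
  if (fail) thread->pending = true;
}

// CHECK style: returns early and leaves the exception pending.
inline void with_check(bool fail, bool* reached_end, ToyThread* thread) {
  toy_allocate(fail, TOY_CHECK);
  *reached_end = true;
}

// CHECK_CLEAR style: returns early but clears the exception first.
inline void with_check_clear(bool fail, bool* reached_end, ToyThread* thread) {
  toy_allocate(fail, TOY_CHECK_CLEAR);
  *reached_end = true;
}

// THREAD plus explicit test: clears the exception and keeps going, so the
// bookkeeping after the call still runs.
inline void clear_and_continue(bool fail, bool* reached_end, ToyThread* thread) {
  toy_allocate(fail, thread);
  if (thread->pending) thread->pending = false;
  *reached_end = true;
}
```

Only the third variant reaches the code after the call when the allocation fails, which is why the comment proposes it for the call whose counters must still be updated.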
28-05-2017
SensorInfo::trigger and SensorInfo::clear are called for low memory detection. This is the VM implementation behind java.lang.management.MemoryPoolMXBean for the memory usage and GC notification. The threshold is set on the Java side and passed to the VM for monitoring. The VM notifies the Java side via sun.management.Sensor objects. Objects are allocated during the notification in the sun.management.Sensor::trigger method. In the current implementation, no object allocation happens when clearing the sensor. The sensor state is also maintained on the VM side, and it has to be kept in sync with the changes on the Java side.

One possible fix is to have:

1. SensorInfo::trigger catches and clears any exception thrown in the Sensor.trigger(int, MemoryUsage) method call. If this happens, it means that the notification is not sent. The VM may want to log this cleared exception for troubleshooting. The sensor state on both the VM and the Java side is updated regardless of whether the request is processed successfully or not. Although Sensor.clear() does not do any object allocation, there is no harm in catching and clearing a pending exception thrown by Sensor.clear().

2. A MemoryUsage object is created and passed to the Sensor::trigger method call. An OOME may be thrown here:

    Handle usage_h = MemoryService::create_MemoryUsage_obj(_usage, CHECK);

The VM should check and clear the OOME (or maybe assert that it is an OOME). In this case, one approach is to continue to update the sensor counters but just drop the notification (as in situation #1). The VM can call the Sensor::trigger(int) method if it fails to create the MemoryUsage object. The triggerAction() method in the PoolSensor and CollectionSensor classes [1] needs to be changed to be a nop. The VM can log this exception case as in #1 to help troubleshooting.

[1] http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/2d94659f7ff3/src/java.management/share/classes/sun/management/MemoryPoolImpl.java#l292

3. Note that ServiceThread also calls other Java methods. They all may run into an OOME, and similar issues need to be resolved as well.

    if (sensors_changed) {
      LowMemoryDetector::process_sensor_changes(jt);
    }

    if(has_gc_notification_event) {
      GCNotifier::sendNotification(CHECK);
    }

    if(has_dcmd_notification_event) {
      DCmdFactory::send_notification(CHECK);
    }

    if (acs_notify) {
      AllocationContextService::notify(CHECK);
    }
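The fallback idea in point 2 of this comment (use an allocation-free trigger when the MemoryUsage object cannot be created, and update the counters either way) can be sketched as a standalone model. Everything here is hypothetical and simplified: DemoThread, DemoUsage, DemoSensor, demo_create_usage and demo_trigger are illustration names, and the two notification counters merely stand in for the Sensor::trigger(int, MemoryUsage) and Sensor::trigger(int) upcalls.

```cpp
#include <cassert>

struct DemoThread { bool pending = false; };  // models a pending OOME
struct DemoUsage  { long used = 0; };         // models a MemoryUsage payload

struct DemoSensor {
  int sensor_count = 0;           // models _sensor_count
  int pending_trigger_count = 0;  // models _pending_trigger_count
  int full_notifications = 0;     // stands in for trigger(int, MemoryUsage)
  int bare_notifications = 0;     // stands in for trigger(int), no payload
};

// Returns false and leaves an exception pending when the allocation fails.
inline bool demo_create_usage(bool alloc_fails, DemoUsage* out, DemoThread& t) {
  if (alloc_fails) { t.pending = true; return false; }
  out->used = 1024;
  return true;
}

inline void demo_trigger(DemoSensor& s, int count, bool alloc_fails, DemoThread& t) {
  DemoUsage usage;
  if (demo_create_usage(alloc_fails, &usage, t)) {
    ++s.full_notifications;       // normal path with payload
  } else {
    t.pending = false;            // clear the OOME on the VM side
    ++s.bare_notifications;       // fall back to the allocation-free call
  }
  // Bookkeeping runs on both paths, so the two sides stay in sync.
  s.sensor_count += count;
  s.pending_trigger_count -= count;
}
```

In this model a failed allocation costs one notification payload but never leaves a stale exception or an undrained counter behind.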
26-05-2017
Here's what is going wrong with the ServiceThread: the ServiceThread gets stuck in LowMemoryDetector::process_sensor_changes(TRAPS), and this happens because SensorInfo::trigger() and SensorInfo::clear() fail to update the values of _pending_trigger_count and _pending_clear_count. Debugging showed that since clear() calls sun_management_Sensor_klass() with CHECK, this function returns without setting _pending_clear_count to 0 because there is a pending exception on the thread.

    void SensorInfo::clear(int count, TRAPS) {
      if (_sensor_obj != NULL) {
        Klass* k = Management::sun_management_Sensor_klass(CHECK);
        instanceKlassHandle sensorKlass (THREAD, k);
        Handle sensor(THREAD, _sensor_obj);

        JavaValue result(T_VOID);
        JavaCallArguments args(sensor);
        args.push_int((int) count);
        JavaCalls::call_virtual(&result,
                                sensorKlass,
                                vmSymbols::clear_name(),
                                vmSymbols::int_void_signature(),
                                &args,
                                CHECK);
      }

      {
        // Holds Service_lock and update the sensor state
        MutexLockerEx ml(Service_lock, Mutex::_no_safepoint_check_flag);
        _sensor_on = false;
        _pending_clear_count = 0;
        _pending_trigger_count = _pending_trigger_count - count;
      }
    }

    (gdb) disassemble
    Dump of assembler code for function SensorInfo::clear(int, Thread):
       0x00007f120c46f9f0 <+0>:     push   %rbp
       0x00007f120c46f9f1 <+1>:     mov    %rsp,%rbp
       0x00007f120c46f9f4 <+4>:     mov    %rbx,-0x20(%rbp)
       0x00007f120c46f9f8 <+8>:     mov    %r12,-0x18(%rbp)
       0x00007f120c46f9fc <+12>:    mov    %rdi,%rbx
       0x00007f120c46f9ff <+15>:    mov    %r13,-0x10(%rbp)
       0x00007f120c46fa03 <+19>:    mov    %r14,-0x8(%rbp)
       0x00007f120c46fa07 <+23>:    sub    $0xc0,%rsp
       0x00007f120c46fa0e <+30>:    cmpq   $0x0,(%rdi)
       0x00007f120c46fa12 <+34>:    mov    %esi,%r13d
       0x00007f120c46fa15 <+37>:    mov    %rdx,%r12
       0x00007f120c46fa18 <+40>:    je     0x7f120c46fad0 <SensorInfo::clear(int,Thread)+224>
       0x00007f120c46fa1e <+46>:    mov    %rdx,%rdi
       0x00007f120c46fa21 <+49>:    callq  0x7f120c4b0a10 <Management::sun_management_Sensor_klass(Thread)>
       0x00007f120c46fa26 <+54>:    cmpq   $0x0,0x8(%r12)
       0x00007f120c46fa2c <+60>:    mov    %rax,%r14
       0x00007f120c46fa2f <+63>:    je     0x7f120c46fa48 <SensorInfo::clear(int,Thread)+88>
       0x00007f120c46fa31 <+65>:    mov    -0x20(%rbp),%rbx
       0x00007f120c46fa35 <+69>:    mov    -0x18(%rbp),%r12
       0x00007f120c46fa39 <+73>:    mov    -0x10(%rbp),%r13
       0x00007f120c46fa3d <+77>:    mov    -0x8(%rbp),%r14
       0x00007f120c46fa41 <+81>:    leaveq
    => 0x00007f120c46fa42 <+82>:    retq
       0x00007f120c46fa43 <+83>:    nopl   0x0(%rax,%rax,1)
       0x00007f120c46fa48 <+88>:    mov    (%rbx),%rdx
       0x00007f120c46fa4b <+91>:    lea    -0x30(%rbp),%rdi
       0x00007f120c46fa4f <+95>:    mov    %r12,%rsi
       0x00007f120c46fa52 <+98>:    callq  0x7f120c02b720 <Handle::Handle(Thread*, oop)>
       0x00007f120c46fa57 <+103>:   mov    -0x30(%rbp),%rax
       0x00007f120c46fa5b <+107>:   lea    -0xc0(%rbp),%r8
       0x00007f120c46fa62 <+114>:   movl   $0xe,-0x40(%rbp)

I think we should be updating the values of these flags before making the Java call on the sensor object.
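The reordering suggested at the end of this comment (update the sensor state first, then make the Java call) could look roughly like the following standalone model. This is a sketch of the suggestion only, not the change that was eventually made: OrderThread, OrderSensor, order_java_clear and clear_state_first are hypothetical names, and locking is omitted.

```cpp
#include <cassert>

struct OrderThread { bool pending = false; };  // models a pending exception

struct OrderSensor {
  bool sensor_on = true;          // models _sensor_on
  int pending_clear_count = 0;    // models _pending_clear_count
  int pending_trigger_count = 0;  // models _pending_trigger_count
};

// Models the upcall to the Java sensor's clear method, which may fail and
// leave an exception pending.
inline void order_java_clear(OrderThread& t, bool fail) {
  if (fail) t.pending = true;
}

inline void clear_state_first(OrderSensor& s, int count, OrderThread& t, bool fail) {
  // VM-side bookkeeping happens before anything that can raise.
  s.sensor_on = false;
  s.pending_clear_count = 0;
  s.pending_trigger_count -= count;

  order_java_clear(t, fail);
  if (t.pending) return;  // an early return no longer skips the bookkeeping
}
```

One trade-off visible in the model: the counters drain, so the loop condition is satisfied, but the exception is still pending on the thread and the VM state has moved ahead even though the Java call failed. That is presumably why the later comments in this thread also discuss clearing the exception.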
12-05-2017