JDK-8239005 : [TESTBUG] test/hotspot/jtreg/runtime/StackGuardPages/TestStackGuardPages.java: exeinvoke.c: must initialize static state before calling do_overflow()
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 11,12,13,14,15
  • Priority: P4
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2020-02-13
  • Updated: 2020-06-05
  • Resolved: 2020-02-17
Fixed in:
  • JDK 11: 11.0.7
  • JDK 13: 13.0.4
  • JDK 14: 14.0.2
  • JDK 15: 15 b11
Description
The test case test_native_overflow fails for the initial thread if the stack size of the initial thread is much smaller than the stack size of the "other" thread.

The reason is that the static variable _rec_count used in do_overflow() is not reset to 0 before testing the initial thread.

test_native_overflow step-by-step:

* static variables _rec_count and _kp_rec_count get initialized to 0

* Initial thread T0 creates other thread T1 with run_native_overflow as start routine

* T1 executes run_native_overflow

  - T1 calls AttachCurrentThread()

  - T1 calls do_overflow() 

  - T1 receives SIGSEGV as it reaches the VM's stack guard pages and does the longjmp back

  - _rec_count has a value of let's say 50000

  - T1 calls DetachCurrentThread() and returns

* Initial thread T0 joins T1

* T0 executes run_native_overflow

  - T0 calls AttachCurrentThread()

  - BUG: _rec_count is *not* reset and keeps its value of 50000.

  - T0 calls do_overflow() 

  - T0 receives SIGSEGV after let's say 10000 recursions as it reaches the VM's stack guard pages, because T0's stack is smaller. T0 does the longjmp back

  - _rec_count has the incorrect value of 60000 (50000 left over from T1 plus T0's own 10000).

  - T0 calls DetachCurrentThread()

  - Assignment _kp_rec_count = _rec_count

  - Only now is _rec_count reset to 0

  - T0 calls do_overflow() 

  - T0 receives SIGSEGV at _rec_count == 20000, well before reaching the limit _kp_rec_count == 60000, because it overflows its stack, which is smaller than the stack of the other thread T1.

  - T0 signals test failure, because that SIGSEGV was unexpected.


### FAILURE output when manually executing "invoke test_native_overflow"
Machine: Linux ld9510 3.12.57-60.35-default #1 SMP Tue Mar 22 10:47:09 UTC 2016 (1cd55eb) ppc64le ppc64le ppc64le GNU/Linux

Test started with pid: 5258

Testing NATIVE_OVERFLOW
Testing stack guard page behaviour for other thread
run_native_overflow 5981
Java thread is alive.
Got SIGSEGV(2) at address: 0x3fff4d3effc0
Test PASSED. Got access violation accessing guard page at 50750
Test PASSED. Not initial thread
Testing stack guard page behaviour for initial thread
run_native_overflow 5258
Java thread is alive.
Got SIGSEGV(2) at address: 0x3fffd7b1fff8
Test PASSED. Got access violation accessing guard page at 55822
Got SIGSEGV(2) at address: 0x3fffd73efff8
Test FAILED. Stack guard page is still there at 52176

### SUCCESS output *without* fix
Machine: Linux lu0486 4.4.0-173-generic #203-Ubuntu SMP Wed Jan 15 02:55:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Test started with pid: 80410

Testing NATIVE_OVERFLOW
Testing stack guard page behaviour for other thread
run_native_overflow 80425
Java thread is alive.
Got SIGSEGV(2) at address: 0x7f8814cbeff8
Test PASSED. Got access violation accessing guard page at 135867
Test PASSED. Not initial thread
Testing stack guard page behaviour for initial thread
run_native_overflow 80410
Java thread is alive.
Got SIGSEGV(2) at address: 0x7ffe45f90ff8
Test PASSED. Got access violation accessing guard page at 151828
Got SIGSEGV(1) at address: 0x7ffe4583cff8
Test PASSED. No stack guard page is present. SIGSEGV(1) at 136025

### Note: the initial thread receives a SIGSEGV after detaching too, but with SEGV_MAPERR (1) instead of SEGV_ACCERR (2)

### Output *WITH* fix (linuxppc64le)
Machine: Linux ld9510 3.12.57-60.35-default #1 SMP Tue Mar 22 10:47:09 UTC 2016 (1cd55eb) ppc64le ppc64le ppc64le GNU/Linux

Test started with pid: 35544

Testing NATIVE_OVERFLOW
Testing stack guard page behaviour for other thread
run_native_overflow 35589
Java thread is alive.
Got SIGSEGV(2) at address: 0x3fff7011ffc0
Test PASSED. Got access violation accessing guard page at 50750
Test PASSED. Not initial thread
Testing stack guard page behaviour for initial thread
run_native_overflow 35544
Java thread is alive.
Got SIGSEGV(2) at address: 0x3fffe0a7ffb0
Test PASSED. Got access violation accessing guard page at 5020
Test PASSED. No stack guard page is present. Maximum recursion level reached at 5020

### Output *WITH* fix (linuxx86_64)
Machine: Linux lu0486 4.4.0-173-generic #203-Ubuntu SMP Wed Jan 15 02:55:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Test started with pid: 83586

Testing NATIVE_OVERFLOW
Testing stack guard page behaviour for other thread
run_native_overflow 83599
Java thread is alive.
Got SIGSEGV(2) at address: 0x7f17e8036ff8
Test PASSED. Got access violation accessing guard page at 135867
Test PASSED. Not initial thread
Testing stack guard page behaviour for initial thread
run_native_overflow 83586
Java thread is alive.
Got SIGSEGV(2) at address: 0x7fffb250efe8
Test PASSED. Got access violation accessing guard page at 15895
Test PASSED. No stack guard page is present. Maximum recursion level reached at 15895

### Note: no signal is received until max. recursions are reached.

Comments
Fix request (13u): The original change applies cleanly, passes tier1,tier2,tier3 tests.
05-06-2020

Fix Request (14u): This makes the test more reliable. Patch applies cleanly to 14u. Test keeps passing on x86_64.
06-03-2020

Fix Request (11u): This patch improves test reliability. Patch applies cleanly to 11u, the affected test still passes, tier{1,2,3} pass.
18-02-2020

URL: https://hg.openjdk.java.net/jdk/jdk/rev/f53f0d0637d8 User: rrich Date: 2020-02-17 10:42:02 +0000
17-02-2020

We are getting test failures on one of our linuxppcle boxes since JDK-8179317.
17-02-2020