JDK-8137240 : Negative lookahead in RegEx breaks backreference
  • Type: Bug
  • Component: core-libs
  • Sub-Component: jdk.nashorn
  • Affected Version: 8,9
  • Priority: P3
  • Status: Resolved
  • Resolution: Fixed
  • OS: other
  • CPU: x86
  • Submitted: 2015-09-12
  • Updated: 2016-10-13
  • Resolved: 2016-06-24
JDK 8: 8u112 (Fixed)
JDK 9: 9 b125 (Fixed)
Description
FULL PRODUCT VERSION :
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

ADDITIONAL OS VERSION INFORMATION :
Darwin 14.5.0 Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64 i386

A DESCRIPTION OF THE PROBLEM :
In the Nashorn engine, using a JavaScript RegEx containing a matching group followed by a negative lookahead followed by a backreference to that matching group does not work as expected.
The backreference appears to refer to the negative lookahead instead of the matching group.
Since a negative lookahead is zero-length, the backreference then matches the empty string, so the pattern matches every single 'a' on its own.
This can be seen, for example, when using the RegEx for replacement, using JavaScript's String.prototype.replace function.

Example JavaScript code:

'aa'.replace(/(a)(?!b)\1/gm, 'c');

This returns

cc

when one would expect only

c

I don't know whether this only applies to negative lookaheads or lookarounds in general, but it certainly does apply to negative lookaheads.
Running the exact same code in the browser console of Chrome/Firefox/Safari/Opera produces the expected result.
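
For comparison, the same pattern can be checked against java.util.regex, which is used here only as a stand-in reference engine (it is not the engine Nashorn uses). The sketch below also runs a second pattern without the backreference, to illustrate the hypothesis above that the backreference matches the empty string. The class and constant names are made up for this illustration and are not part of the original report.

```java
import java.util.regex.Pattern;

class LookaheadBackrefDemo {
    // The pattern from the report: group, negative lookahead, backreference.
    static final Pattern REPORTED = Pattern.compile("(a)(?!b)\\1");

    // What the pattern reduces to if \1 matches the empty string
    // (the reporter's hypothesis; an assumption, not a confirmed diagnosis).
    static final Pattern EMPTY_BACKREF = Pattern.compile("(a)(?!b)");

    // Replaces every match with "c", like the g flag in the JavaScript example.
    static String replaceAll(Pattern p, String input) {
        return p.matcher(input).replaceAll("c");
    }

    public static void main(String[] args) {
        System.out.println(replaceAll(REPORTED, "aa"));      // prints "c"
        System.out.println(replaceAll(EMPTY_BACKREF, "aa")); // prints "cc"
    }
}
```

The first line of output corresponds to the expected result, the second to the result observed in Nashorn.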

Related question on StackOverflow:

https://stackoverflow.com/q/32480370

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Just run the attached code.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
A console output of:

c
ACTUAL -
A console output of:

cc

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

class Test
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");
        // "\\1" in the Java string literal reaches JavaScript as the backreference \1.
        js.eval("function x(s){return s.replace(/(a)(?!b)\\1/gm, 'c');}");
        System.out.println(String.valueOf(((Invocable)js).invokeFunction("x", "aa")));
    }
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None, apart from installing a separate program to parse and run JavaScript, and calling said program using Runtime.getRuntime().exec().
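
One further option, not mentioned in the report, might be to perform the affected replacement on the Java side with java.util.regex instead of inside the script. The following is only a sketch under that assumption: the helper class is hypothetical, it presumes the pattern is also valid java.util.regex syntax, and exposing it to the script is left out.

```java
import java.util.regex.Pattern;

class JavaSideReplace {
    // Hypothetical helper; not part of the report or of any Nashorn API.
    // MULTILINE mirrors the m flag of the JavaScript regex.
    private static final Pattern DOUBLED_A =
            Pattern.compile("(a)(?!b)\\1", Pattern.MULTILINE);

    // replaceAll mirrors the g flag of the JavaScript regex.
    static String replace(String s) {
        return DOUBLED_A.matcher(s).replaceAll("c");
    }

    public static void main(String[] args) {
        System.out.println(replace("aa")); // prints "c"
    }
}
```

This avoids launching an external process, but it only helps where the regex is under the caller's control and does not rely on JavaScript-specific regex syntax.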


Comments
The attached test case was executed on:
  • JDK 8: Fail
  • JDK 8u66: Fail
  • JDK 9 EA: Fail
This works fine with the "Rhino" engine in JDK 7u80. The output produced by Nashorn should be consistent with that produced by the browsers (tested with Chrome, Firefox, and IE), so moving this to the dev team to fix.
28-09-2015