JDK-8343111 : Add getChars(int, int, char[], int) to CharSequence and CharBuffer
  • Type: CSR
  • Component: core-libs
  • Sub-Component: java.lang
  • Priority: P4
  • Status: Closed
  • Resolution: Approved
  • Fix Versions: 25
  • Submitted: 2024-10-26
  • Updated: 2025-05-06
  • Resolved: 2025-05-05
Related Reports
CSR :  
Description
Summary
-

Add a new method `getChars(int, int, char[], int)` to `java.lang.CharSequence` to **bulk-read** characters from a region of the `CharSequence` into a region of the provided `char[]`.

Problem
-

The `CharSequence` interface does not provide a bulk-read facility.
Some `CharSequence` implementations do not support bulk-reading at all, while others (like `String` and `CharBuffer`) do support bulk-reading but do not share a *common* method signature to perform it.

Applications wanting to bulk-read from a `CharSequence` must maintain a (theoretically infinite) list of bulk-reading solutions, e.g. (copied from `Reader.of(CharSequence)` in JDK 24):

    switch (cs) {
        case String s -> s.getChars(next, next + n, cbuf, off);
        case StringBuilder sb -> sb.getChars(next, next + n, cbuf, off);
        case StringBuffer sb -> sb.getChars(next, next + n, cbuf, off);
        case CharBuffer cb -> cb.get(next, cbuf, off, n);
        default -> {
            for (int i = 0; i < n; i++)
                cbuf[off + i] = cs.charAt(next + i);
        }
    }

The motivation for the new method is performance: reading characters one at a time through `charAt` is expensive compared with a bulk copy.

Solution
-

Add a new method `public void getChars(int, int, char[], int)` to `CharSequence` and `CharBuffer`.


Specification
-

String:

    /**
     * {@inheritDoc CharSequence}
     * @param srcBegin {@inheritDoc CharSequence}
     * @param srcEnd   {@inheritDoc CharSequence}
     * @param dst      {@inheritDoc CharSequence}
     * @param dstBegin {@inheritDoc CharSequence}
     * @throws IndexOutOfBoundsException {@inheritDoc CharSequence}
     */
    public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)

StringBuffer and StringBuilder:

    /**
     * {@inheritDoc CharSequence}
     */
    public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)

CharSequence:

    /**
     * Copies characters from this sequence into the given destination array.
     * The first character to be copied is at index {@code srcBegin}; the last
     * character to be copied is at index {@code srcEnd-1}. The total number of
     * characters to be copied is {@code srcEnd-srcBegin}. The
     * characters are copied into the subarray of {@code dst} starting
     * at index {@code dstBegin} and ending at index:
     * <pre>{@code
     * dstBegin + (srcEnd-srcBegin) - 1
     * }</pre>
     *
     * @param      srcBegin   start copying at this offset.
     * @param      srcEnd     stop copying at this offset.
     * @param      dst        the array to copy the data into.
     * @param      dstBegin   offset into {@code dst}.
     * @throws     IndexOutOfBoundsException  if any of the following is true:
     *             <ul>
     *             <li>{@code srcBegin} is negative
     *             <li>{@code dstBegin} is negative
     *             <li>the {@code srcBegin} argument is greater than
     *             the {@code srcEnd} argument.
     *             <li>{@code srcEnd} is greater than
     *             {@code this.length()}.
     *             <li>{@code dstBegin+srcEnd-srcBegin} is greater than
     *             {@code dst.length}
     *             </ul>
     * @throws     NullPointerException if {@code dst} is {@code null}
     *
     * @implSpec
     * The default implementation invokes {@link #charAt(int index)} in a loop
     * iterating {@code index} from {@code srcBegin} to {@code srcEnd-1}.
     * Concurrent truncation of this character sequence can throw
     * {@code IndexOutOfBoundsException}. In this case, some characters, but not
     * all, may be already transferred.
     *
     * @since 25
     */
    public default void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
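
The behavior described by the `@implSpec` above can be illustrated on JDKs that predate the new method. The following is only a sketch: the class `GetCharsSketch` and its static helper `getChars` are hypothetical names, and the bounds checks via `Objects.checkFromToIndex` and `Objects.checkFromIndexSize` are an assumption about how the listed `IndexOutOfBoundsException` conditions could be enforced, not the actual JDK implementation.

```java
import java.util.Arrays;
import java.util.Objects;

class GetCharsSketch {
    // Hypothetical stand-in for the proposed default method: copies
    // cs[srcBegin, srcEnd) into dst starting at dstBegin using charAt.
    static void getChars(CharSequence cs, int srcBegin, int srcEnd,
                         char[] dst, int dstBegin) {
        // Assumed validation: rejects negative srcBegin, srcBegin > srcEnd,
        // and srcEnd > cs.length().
        Objects.checkFromToIndex(srcBegin, srcEnd, cs.length());
        // Rejects negative dstBegin and dstBegin + count > dst.length;
        // evaluating dst.length throws NullPointerException if dst is null.
        Objects.checkFromIndexSize(dstBegin, srcEnd - srcBegin, dst.length);
        for (int i = srcBegin; i < srcEnd; i++) {
            dst[dstBegin++] = cs.charAt(i);
        }
    }

    public static void main(String[] args) {
        char[] dst = new char[8];
        Arrays.fill(dst, '.');
        getChars("hello world", 6, 11, dst, 2);
        System.out.println(new String(dst)); // prints "..world."

        try {
            getChars("abc", 1, 5, dst, 0); // srcEnd exceeds length()
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the copy loop calls `charAt` once per index, a sequence that is truncated concurrently may fail partway through, leaving `dst` partially filled, which is the caveat the `@implSpec` spells out.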

CharBuffer:

    /**
     * Absolute bulk <i>get</i> method.
     *
     * <p> This method transfers {@code srcEnd-srcBegin} characters from this
     * buffer into the given array, starting at index {@code srcBegin} in this
     * buffer and at offset {@code dstBegin} in the array. The position of this
     * buffer is unchanged.
     *
     * @param  srcBegin
     *         The index in this buffer from which the first character will be
     *         read; must be non-negative and less than {@code limit()}
     *
     * @param  srcEnd
     *         The index in this buffer directly after the last character to
     *         read; must be non-negative and less than or equal to {@code limit()}
     *         and must be greater than or equal to {@code srcBegin}
     *
     * @param  dst
     *         The destination array
     *
     * @param  dstBegin
     *         The offset within the array of the first character to be
     *         written; must be non-negative and less than {@code dst.length}
     *
     * @throws  IndexOutOfBoundsException
     *          If the preconditions on the {@code srcBegin}, {@code srcEnd},
     *          and {@code dstBegin} parameters do not hold
     *
     * @implSpec This method is equivalent to
     *           {@code get(srcBegin, dst, dstBegin, srcEnd - srcBegin)}.
     *
     * @since 25
     */
    @Override
    public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
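
The equivalence stated in the `@implSpec` can be tried out with the existing absolute bulk get, `CharBuffer.get(int, char[], int, int)`, which has been available since JDK 13. This is a minimal sketch: the class `CharBufferBulkGetSketch` and its static helper `getChars` are hypothetical stand-ins for the proposed method, not JDK code.

```java
import java.nio.CharBuffer;

class CharBufferBulkGetSketch {
    // Hypothetical stand-in for the proposed CharBuffer.getChars: delegates
    // to the existing absolute bulk get, converting the end index to a length.
    static void getChars(CharBuffer cb, int srcBegin, int srcEnd,
                         char[] dst, int dstBegin) {
        cb.get(srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.wrap("hello world".toCharArray());
        cb.position(3); // absolute access ignores and preserves the position

        char[] dst = new char[5];
        getChars(cb, 6, 11, dst, 0);

        System.out.println(new String(dst)); // prints "world"
        System.out.println(cb.position());   // prints 3
    }
}
```

The indices are absolute, counted from the start of the buffer rather than from its current position, which is why the position is still 3 after the call.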

Comments
> mkarg, it is generally the responsibility of the assignee to move the CSR to Finalized, as discussed in the CSR documentation.

I know that. What I do *not* know is how I can find out *whom* I have to wait for before I move the CSR to Finalized, i.e. the one who is performing those "internal compatibility investigations". Is that always Jaikiran, or is there a group that I can ask or a doc section that tells me?
06-05-2025

[~jpai], thank you for the additional analysis. Moving this CSR to Approved contingent on the planned release note being written.
05-05-2025

I have now completed the corpus analysis for the new (default) method `void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)` on CharSequence and CharBuffer. The corpus run found no existing methods on subclasses of `java.nio.CharBuffer` which match this signature. On the other hand, several classes in various libraries already declare a method matching the proposed `java.lang.CharSequence.getChars(...)`. I have analyzed a majority of those existing implementations and almost all of them look compatible with the proposed semantics of this method. Many of them merely call `System.arraycopy(...)` or call `.getChars()` on an internal `String` or `StringBuilder` field. The exceptions thrown from these existing methods appear to be `NullPointerException`, `StringIndexOutOfBoundsException` or `IndexOutOfBoundsException`, so those match the proposed API specification too.

I found very few instances where a class implementing `CharSequence` had a `getChars()` method matching the signature of this proposed new method but with private or package-private visibility. Those instances will run into source compatibility issues with this change, but those classes don't appear to be prominent and there are not many of them. In the corpus run, no existing interfaces were found to have a "default" method with this signature. That's a good thing and reduces the chances of running into the binary compatibility issues that we previously saw when the default isEmpty() method on CharSequence was introduced (explained at https://stuartmarks.wordpress.com/2020/09/22/incompatibilities-with-jdk-15-charsequence-isempty/ ).

Given all this, from a compatibility point of view, I think the impact of this proposed API is minimal. Furthermore, having now analyzed these existing custom implementations of getChars(), I think this proposal to introduce getChars(...) on the CharSequence interface is a useful enhancement.
05-05-2025

[~mkarg], it is generally the responsibility of the assignee to move the CSR to Finalized, as discussed in the CSR documentation: "After being reviewed by at least one engineer familiar with that technology area, the request is Finalized by the assignee. After being Finalized, the CSR reviews the request and if there are no problems or shortcomings with the request it will be Approved by the CSR lead." https://wiki.openjdk.org/display/csr/Main
05-05-2025

Hello Markus,

> I know, but Joe wrote that he moved back from finalized to provisional for some internal compatibility investigations,

I am running some internal compatibility tests. I should have those results soon. Once I have those results I'll add a comment to this issue. Changing the assignee to someone else isn't necessary for such activities. Leaving yourself as the assignee is the correct state.
04-05-2025

> Markus Karg You should transition this CSR to Finalized for a review, instead of assigning Joe.

I know, but Joe wrote that he moved back from finalized to provisional for `some internal compatibility investigations`, and these investigations are performed *not by me*. So it was unclear *when and by whom* to move to finalized again.
04-05-2025

[~mkarg] You should transition this CSR to Finalized for a review, instead of assigning Joe. Your CSR will be reviewed if it appears on the jdk-csr-issues-under-review filter as shown in this dashboard: https://bugs.openjdk.org/secure/Dashboard.jspa?selectPageId=17313 This is a helpful page for you to determine if your CSR is in the right state for review.
04-05-2025

[~darcy] You assigned this CSR back to me, but I do not have any open issues, so how should I proceed now? Do you want me to confirm by setting it to the Finalized state?
04-05-2025

> Markus Karg, please update the Specification section of this CSR to match the change made in the PR to use @inheritDoc, as suggested in CSR comments.

[~darcy] Done. :-)
01-05-2025

[~mkarg], please update the Specification section of this CSR to match the change made in the PR to use `@inheritDoc`, as suggested in CSR comments.
01-05-2025

> mkarg, it is not necessary to change the CSR assignee in that situation.

Thank you, understood!
01-05-2025

[~mkarg], it is not necessary to change the CSR assignee in that situation.
30-04-2025

[~darcy] I have added your proposal to the spec. Assigning to you to wait for your internal compatibility investigations.
30-04-2025

Note it is possible to have javadoc like

    /**
     * {@inheritDoc CharSequence}
     * ... // inheritDoc other sections of the documentation
     */

to avoid including a misleading `@since` tag. I've used that structure in an API I maintain when a similar situation arose. Moving back to Provisional pending resolution of this item and some internal compatibility investigations.
29-04-2025

> For String, StringBuffer and StringBuilder the wording is actually almost the same (nearly identical in wording, and perfectly identical in semantics), hence I propose to completely drop the JavaDocs of getChars in those classes in favor of fully inherited JavaDocs.

Hello Markus, removing the javadoc text from the `getChars(...)` method of `String`, `StringBuffer` and `StringBuilder` would mean that `getChars(...)` won't be listed in the methods section of these classes anymore and instead will just be listed in the "Methods declared in interface" section of those classes (I built the latest docs and checked locally). That would have been OK, but the linked `CharSequence.getChars(...)` has a `@since 25` (for the right reasons), and that would mean that the detail that this method has been available on String, StringBuilder and StringBuffer for several older releases will not be properly conveyed. I don't expect you to revert that change to the javadoc text just yet; instead I suggest we wait for Joe or others for the javadoc guidance.
27-04-2025

> Alan Bateman, understood for CharBuffer; I was thinking more of String, StringBuffer, and StringBuilder where there looks to be more textual similarities.

For String, StringBuffer and StringBuilder the wording is actually almost the same (nearly identical in wording, and perfectly identical in semantics), hence I propose to completely drop the JavaDocs of getChars in those classes in favor of fully inherited JavaDocs.
26-04-2025

> Per the compatibility risk, were any corpus-style analyses done to see if there are conflicting getChars methods in the wild?

I have used the service "grep.app" to analyze GitHub and did not find a single conflicting one. It returned over 100 implementations of our proposed method signature. None of them will break. Just a single one did not use the same semantics for the same signature, but used the second int as a count instead of an end index; nevertheless, that was just an internally used class, hence it will not have any effect; the software will still work (and if wanted could be adapted within one minute). Looking over the results, it looks like all of them actually implemented getChars inspired by String's API (which supports my initial assumption: people do copy APIs). This result also supports Alan's assumption that, due to the number of parameters, the risk of breaking existing code is very low (or even near zero).
24-04-2025

[~alanb], understood for CharBuffer; I was thinking more of String, StringBuffer, and StringBuilder where there looks to be more textual similarities.
23-04-2025

> Was any effort made to explore re-using the text of the components getChars in the various overrides using directed inheritDoc?

CharBuffer needs to be specified as an absolute bulk get method as otherwise it won't fit with the other methods.

On the compatibility concern: I put text in the Compatibility Risk Description to remind us of the issues when isEmpty was added. I'm less concerned with getChars because it's a 4-arg method so less likely to conflict.
23-04-2025

Moving to Provisional, not Approved. Was any effort made to explore re-using the text of the components getChars in the various overrides using directed inheritDoc? Per the compatibility risk, were any corpus-style analyses done to see if there are conflicting getChars methods in the wild?
22-04-2025

If it's in the "proposed" state, you *can* wait for the CSR reviewer to complete their initial review and advance its state. But it's more expedient to go back to draft -> finalized.
16-04-2025

The transitions/edges in the state diagram that are labelled "CSR review" are done by the CSR chair. Large changes will usually use the two-step process; small changes will usually go straight to Finalized. So I think we are good here.
16-04-2025

[~liach] I understand the chart in the exact same way you did, and that is why I originally asked the reviewers of this CSR (you and Alan) to perform exactly that status change. What I do not understand is why you did not simply do exactly that, but instead asked me to switch state back to draft and then forward to finalized? No offence, just trying to learn. 🤔
16-04-2025

As you can see in that chart, the proposed -> provisional arrow is labelled "CSR review", meaning this transition is only made by the CSR reviewers. Unfortunately this is not as clear as the transitions in the JEP process, which are distinguished by colored arrows.
15-04-2025

[~liach] Did as you said and it worked, thank you! 😃 What makes me wonder is that https://wiki.openjdk.org/download/attachments/31850534/CSR-two-phase.JPG?version=1&modificationDate=1553121507000&api=v2 says that one *actually can* go from proposed to provisional. This is confusing?! 🤔
15-04-2025

[~mkarg] You cannot transition from "proposed" directly to "finalized". You must move a proposed CSR back to draft and then directly to finalized.
14-04-2025

As we have now had two reviews without requested changes, I would like to move this CSR to the Finalized state. Unfortunately the Web-UI does not offer this option to me (it only allows me to Withdraw or Back-To-Draft), so I assume one of you, [~alanb] or [~liach], needs to do that? Thanks! 🙂
14-04-2025

Aligned spec with latest state of API discussion.
30-03-2025

I have removed apiNotes and implNotes from CharBuffer, as suggested by Chen.
24-03-2025

This API can be helpful for Appendable implementations such as StringBuilder to batch move characters. I recommend removing the api and implementation notes for CharBuffer.getChars; they are redundant.
24-03-2025