JDK-8329968 : os::random should be random
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 23
  • Priority: P3
  • Status: Resolved
  • Resolution: Won't Fix
  • Submitted: 2024-04-09
  • Updated: 2024-08-10
  • Resolved: 2024-08-10
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 24
24Resolved
Related Reports
Blocks :  
Relates :  
Relates :  
Relates :  
Description
We don't initialize the random seed for os::random. We start with a static seed. So, every sequence of os::random will produce exactly the same sequence of numbers.


Comments
We decided on that its not considered a bug within hotspot. For gtests specifically, I will open a separate issue.
10-08-2024

A pull request was submitted for review. Branch: master URL: https://git.openjdk.org/jdk/pull/18702 Date: 2024-04-09 16:55:07 +0000
09-07-2024

[~jrose] I'm not fighting you :) I like the idea of having a single knob, and unifying seed handling. It is spread out currently. A single seed would also help with certain reproducibility issues. Note that even if we decide that current users of os::random in hotspot don't need true randomization, we have areas that really should be randomized by default. Today, these cases are handled with local pseudo RNGs with random seeds. E.g. we partly roll our own ASLR logic. That should stay random by default. If we unify seed handling, we would have to make a distinction between cases that, by default, should be random (e.g. ihashes ?) and those that should not. Maybe we need two knobs, then.
13-06-2024

There is no one right answer here, and it is useless to try to settle on one by argument. It would be a silly risk to just change the PRNG to seed from some variable source. And it would be hopelessly obscure to make such a change, but only for gtest. Here’s what we should do: Make a VM command option (if it doesn’t exist already!) which controls PRNG seeding. Set it to a default value which is compatible with today’s (predictable) behavior. Allow gtest users to set it to another value which means “pick a different number this time, and then mix it up a bit”. (It doesn’t need to be a truly fresh crypto key, necessarily; a weaker nonce is OK for gtest.) As a THIRD option for that flag, allow it to be set to a value which will cause the JVM to reproduce the PRNG series from a previous run. Where do you get those third values? Well, there needs to be an -Xlog option which will get the JVM to report the chosen seed. (Especially if that seed was chosen as “pick a different number this time”.) And, the crash-dump report from the JVM (if it fails) must also report this number. (I think 64 bits is too small for this parameter. It should be a string. Whether the JVM digests it down to 64 or 128 or 512 bits is the JVM’s business.) As a different “axis” in the design, perhaps there could be a mode, some day, which selects the strength of the PRNG to use. A modular mixer is fine for video games and for most JVMs, but sometimes you want a crypto-strong PRNG or even one which harvests physical entropy (presumably not on every query). To me that feels like a mode switch, which controls QoS for os::random. The JDK uses probabilistic algorithms also, at times. It uses its own PRNGs, but it should (in my opinion) be willing to seed them from os::random, so that one option controls all the streams of PRNGs, both in the JVM and in the JDK. (I don’t mean application classes; they are on their own.) Fight me! :-)
13-06-2024

> I am surprised to encounter that opinion, tbh. I am surprised by your surprise! How else can you know what a function in the VM does without looking at it? We don't have copious API documentation. If you think os::random might be an API you could use then you need to go look at it and make sure. > Therefore, if the original intent had been to have a reproducible random sequence, it failed in doing that. True. The sequence of "random" numbers is deterministic but the consumer of any given random-number is not. But I remain surprised at the surprise this code is causing. It produces a well-defined sequence of psuedo-random numbers based on the initial seed. The initial seed in our case is statically defined to, IIUC, aid in reproducibility. This code was put in place to "randomize" object hash values back in 1998 and the only change since then was the split of `random` and `next_rand` to make the former thread-safe.
13-05-2024

One more note, and an argument for random seeds: Since the random sequence depends on order of os::random(), the sequence is somewhat random today already, since concurrent threads will (at least once in their constructors) call os::random() at quasi-random points. Therefore, if the original intent had been to have a reproducible random sequence, it failed in doing that. So we are a tiny bit undeterministic as soon as multiple threads come into play, but we are not really random either since as long as we call os::random from a single thread and no concurrent threads are running, we always run through the same sequence. That is the worst of both worlds.
10-05-2024

>> os::random is used, among other things, to fuzzy test runs. >Who/what uses it for that? I do, as do others. Grepping for it in gtest yields 21 results. >> The second reason is surprise. >If you want to understand a VM API you need to go look at it. I am surprised to encounter that opinion, tbh. I want my code to be usable by others, without them having to reverse-engineer my intents when I wrote my code. Established standards go a long way to ensure that. >> Lastly, the most observable part of this is that identity hashes are deterministic and always follow the same sequence. I don't particularly like that fact. > I still believe that was a deliberate intent of the way this was written, to provide reproducibility and ease of debugging. With "it" you mean ihash generation, or os::random, or both? If it is the first, perhaps, but a little comment then would have gone a long way to preserve that idea.
10-05-2024

> os::random is used, among other things, to fuzzy test runs. Who/what uses it for that? > The second reason is surprise. Sorry but this is not a library method to be used by applications, it is part of the VM. If you want to understand a VM API you need to go look at it. If I see a function called `random()` I don't assume I know anything about it - I have to go find out exactly what it means in the context it exists. > Lastly, the most observable part of this is that identity hashes are deterministic and always follow the same sequence. I don't particularly like that fact. I still believe that was a deliberate intent of the way this was written, to provide reproducibility and ease of debugging.
10-05-2024

[~dholmes] I disagree. os::random is used, among other things, to fuzzy test runs. Repeating the same random sequence over and over minimizes the potential for these tests to find errors. The second reason is surprise. As an API user, I expect this API to be somewhat random. I don't expect cryptographic randomness, but I do expect non-repeatable runs. After all, for repeatable runs, I have a version of that API that takes a seed. Following expectations is the principle of least surprise and good API design. Lastly, the most observable part of this is that identity hashes are deterministic and always follow the same sequence. I don't particularly like that fact.
10-05-2024

We have 2 different API's for random numbers: // random number generation static int random(); // return 32bit pseudorandom number static int next_random(unsigned int rand_seed); // pure version of random() If code needs true randomness then it uses the `next_random` API. CDS uses a fixed seed for dumping, which I presume is for reproducability. Otherwise the default seed can probably be anything but it has been what is is since day 1. I don;t think true randomness buys us anything here. This is not used for cryptography or anything of that nature. If the seed changes then you need to know what seed was used when trying to debug issues and reproduce crashes. So then we would need a way to force the initrial seed to a specific value. I am not seeing a bug here.
11-04-2024