JDK-8332965 : os::random shows bad distribution
  • Type: Bug
  • Component: hotspot
  • Sub-Component: runtime
  • Affected Version: 23
  • Priority: P4
  • Status: Closed
  • Resolution: Not an Issue
  • Submitted: 2024-05-27
  • Updated: 2024-05-28
  • Resolved: 2024-05-27
Description
Random value distribution of os::next_random is pretty bad. This little test:

```
TEST_VM(os, rand_dist) {
  int buckets[64];
  constexpr unsigned num_buckets = sizeof(buckets)/sizeof(buckets[1]);
  constexpr size_t total_range = (4 * G);
  constexpr unsigned range_bucket = (unsigned)(total_range/num_buckets);
  for (unsigned i = 0; i < 10000000; i++) {
    unsigned x = (unsigned)os::random();
    unsigned y = x / range_bucket;
    buckets[y] ++;
  }
  for (unsigned i = 0; i < num_buckets; i++) {
    tty->print("%d ", buckets[i]);
  }
  int largest = 0;
  for (unsigned i = 0; i < num_buckets; i++) {
    largest = MAX2(largest, buckets[i]);
  }
  constexpr int num_lines = 16;
  const int step = largest / num_lines;
  for (int line = 0; line < num_lines; line ++) {
    tty->cr();
    const int threshold = largest - (line * step);
    for (unsigned i = 0; i < num_buckets; i++) {
      tty->print("%c ", buckets[i] > threshold ? 'X' : ' ');
    }
  }
}
```

For 10 million values, it prints:

```
1804613236 312218 -1971345340 1248903260 313243 313064 -228130408 312236 312027 312940 312756 312965 814028451 312553 311926 312095 814007330 313399 1923859592 300131575 54085151 17113941 312838 313075 312041 312827 -1781999222 431490 1804613540 312669 312917 312017 1804300624 1 -1971635044 84606977 0 0 0 0 1 0 -1782311381 118933 53772288 24576 1804300832 1 1804300816 1 -1971604812 -753893375 1716799706 0 253119 0 150085632 1 814760336 1 150089176 1 0 0 

```

The distribution is very spiky, with a strong emphasis on lower values. Interestingly enough, the spikes are also somewhat independent of the seed, e.g. we will always see a lot of values in the lowest bucket.

Consequences:

- This mostly affects gtests and some other parts of the JVM. Note that I have not tested whether this affects ihashes. The ihash RNG seeds are generated with os::random, so their seed quality suffers; on the other hand, the ihash RNG is a different generator from then on.

Comments
[~dholmes] I screwed up in my test program: I did not initialize the histogram array (big facepalm). With a zero-initialized array, the distribution is perfectly flat. Yes, the 2^31 issue comes on top of that.
28-05-2024

Please elaborate on the bug. :) I was going to query the 4G range when the output is limited to 2^31 - 1.
28-05-2024

Turns out my tester had a bug, and this is not an issue at all.
27-05-2024
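
For reference, the two problems named in the comments (the uninitialized histogram array and the 4G range versus an output limited to 2^31 - 1) can be reproduced outside the JVM with a small standalone sketch. This is not the HotSpot gtest: `std::minstd_rand0` is used here only as a stand-in for os::random, on the assumption that both produce values in [1, 2^31 - 2], and the names `histogram`, `print_histogram` and `kNumBuckets` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <random>

constexpr unsigned kNumBuckets = 64;

// Builds a histogram of `samples` random values.
// `range` is the size of the value range that is split evenly across the buckets.
std::array<int, kNumBuckets> histogram(uint64_t range, unsigned samples, unsigned seed) {
  // Value-initialization sets every counter to 0; the original test skipped this.
  std::array<int, kNumBuckets> buckets{};
  // Stand-in generator (assumption): yields values in [1, 2^31 - 2].
  std::minstd_rand0 rng(seed);
  const uint64_t bucket_width = range / kNumBuckets;
  assert(bucket_width > 0);
  for (unsigned i = 0; i < samples; i++) {
    const uint64_t x = rng();
    buckets[x / bucket_width]++;
  }
  return buckets;
}

// Prints the raw bucket counts on one line.
void print_histogram(const std::array<int, kNumBuckets>& buckets) {
  for (int count : buckets) {
    std::printf("%d ", count);
  }
  std::printf("\n");
}
```

Calling `print_histogram(histogram(uint64_t(1) << 31, 10000000, 1234567))` spreads the samples over all 64 buckets. Passing `uint64_t(1) << 32` (the 4G range of the original test) instead leaves buckets 32 to 63 empty, because no generated value reaches 2^31.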