FULL PRODUCT VERSION :
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 10.0.10586]
A DESCRIPTION OF THE PROBLEM :
The attached program should produce a stream of the cartesian product of several given sets. (In the example program, the "sets" are sets of digits, and the "products" are numbers composed of those digits.) Instead, it buffers the entire cartesian product before any processing begins, thereby consuming an unlimited amount of memory, causing OutOfMemoryError in practical applications with large sets.
The documentation does not explicitly state that flatMap will use a bounded amount of memory if it can, but it does not seem to be in the spirit of streaming, to buffer the entire stream before processing any of it. The Workaround section below shows an alternative implementation using LongStream.concat, which actually *does* explicitly state that it "creates a lazily concatenated stream"; yet even that solution eagerly evaluates all but one of the dimensions of the cartesian product.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the attached program and observe that the complete cartesian product is constructed and buffered (as shown by the "peek" code that prints all generated values) before any processing begins.
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
+ 1
+ 11
+ 111
First is 111
ACTUAL -
+ 1
+ 11
+ 111
+ 112
+ 113
+ 12
+ 121
+ 122
+ 123
+ 13
+ 131
+ 132
+ 133
+ 2
+ 21
+ 211
+ 212
+ 213
+ 22
+ 221
+ 222
+ 223
+ 23
+ 231
+ 232
+ 233
+ 3
+ 31
+ 311
+ 312
+ 313
+ 32
+ 321
+ 322
+ 323
+ 33
+ 331
+ 332
+ 333
First is 111
REPRODUCIBILITY :
This bug can be reproduced always.
---------- BEGIN SOURCE ----------
package org.vena.qb;
import java.util.stream.LongStream;
public class BugReport {
static final int NUM_DIMENSIONS = 3;
static final int TOP_DIGIT = 3;
public static void main(String[] args) {
LongStream permutations = LongStream.of(0L);
for (int dimension = 0; dimension < NUM_DIMENSIONS; dimension++) {
permutations = permutations.flatMap(tens -> LongStream
.rangeClosed(1, TOP_DIGIT).map(ones-> 10*tens + ones)
.peek(v->System.out.println("+ "+v)));
}
long first = permutations.findFirst().getAsLong();
System.out.println("First is " + first);
}
}
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
I've been unable to find a workaround involving streams. I have a suitable workaround using non-stream code, which I then wrap in a stream, so I've listed this as No Impact.
The closest I got to a stream-based solution was this implementation, but it still buffers all but one dimension:
for (int dimension = 0; dimension < NUM_DIMENSIONS; dimension++) {
permutations = permutations
.mapToObj(tens -> LongStream
.rangeClosed(1, TOP_DIGIT)
.map(ones -> 10*tens + ones)
.peek(n->System.err.println("+ " + n)))
.reduce(LongStream.empty(), (a,b)->LongStream.concat(a, b));
}