As an alternative to calling `collect()` on a Dataset query, you can
use this function to access the stream of RecordBatches in the Dataset.
This lets you aggregate each chunk and pull the intermediate results into
a data.frame for further aggregation, even when the full Dataset result
would not fit in memory.
```r
map_batches(X, FUN, ..., .data.frame = TRUE)
```
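| Argument | Description |
|---|---|
| `X` | A `Dataset` or `arrow_dplyr_query` object, as returned by the dplyr methods on `Dataset` |
| `FUN` | A function or purrr-style lambda expression to apply to each batch |
| `...` | Additional arguments passed to `FUN` |
| `.data.frame` | logical: collect the resulting chunks into a single `data.frame`? Default `TRUE` |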
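To make the roles of `FUN`, `...`, and `.data.frame` concrete, here is a minimal base-R sketch of the calling convention. It does not use arrow: `map_batches_sketch()` is a hypothetical stand-in, and a plain list of data.frames plays the part of the Dataset's RecordBatch stream. It only illustrates how extra arguments are forwarded and how the per-chunk results are combined; it is not the package's implementation.

```r
# Hypothetical stand-in for map_batches(), for illustration only.
# `batches` is a plain list of data.frames standing in for the Dataset's
# RecordBatch stream; arrow itself is not used here.
map_batches_sketch <- function(batches, FUN, ..., .data.frame = TRUE) {
  # Apply FUN to each chunk, forwarding any extra arguments in `...`
  results <- lapply(batches, FUN, ...)
  if (.data.frame) {
    # Bind the per-chunk results into a single data.frame
    do.call(rbind, results)
  } else {
    # Otherwise return the per-chunk results as a list
    results
  }
}

batches <- list(
  data.frame(x = c(1, 2, 3)),
  data.frame(x = c(10, 20))
)

# `scale` is not a parameter of the sketch itself; it reaches FUN via `...`
chunk_sums <- map_batches_sketch(
  batches,
  function(batch, scale) data.frame(total = sum(batch$x) * scale),
  scale = 2
)
print(chunk_sums)
```

Setting `.data.frame = FALSE` in the sketch returns one result per chunk as a list, which is useful when `FUN` produces something that should not be row-bound.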
This is experimental and not recommended for production use.
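The chunk-then-combine pattern described at the top of this page can be sketched in base R. This is an illustration under stated assumptions: `split()` on a small in-memory data.frame stands in for the RecordBatch stream, and no arrow functions are called. Each chunk is reduced to per-group sums and counts, the small partial results are bound into one data.frame, and a second aggregation produces the final per-group means.

```r
# Toy data split into chunks to stand in for RecordBatches (assumption:
# in practice the chunks would come from the Dataset, not from split()).
df <- data.frame(
  g = c("a", "b", "a", "b", "a", "b"),
  x = c(1, 2, 3, 4, 5, 6)
)
chunks <- split(df, rep(1:2, each = 3))

# Stage 1: aggregate within each chunk, keeping sums and counts
# (not means) so that the partial results can be combined correctly.
partials <- do.call(rbind, lapply(chunks, function(chunk) {
  sums <- aggregate(x ~ g, data = chunk, FUN = sum)
  counts <- aggregate(x ~ g, data = chunk, FUN = length)
  data.frame(g = sums$g, sum_x = sums$x, n = counts$x)
}))

# Stage 2: combine the small per-chunk partials into final group means
totals <- aggregate(cbind(sum_x, n) ~ g, data = partials, FUN = sum)
totals$mean_x <- totals$sum_x / totals$n
print(totals)
```

Carrying sums and counts rather than per-chunk means matters: averaging per-chunk means gives the wrong answer whenever groups have different row counts in different chunks.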