Arrow Datasets allow you to query against data that has been split across multiple files. This sharding of data may indicate partitioning, which can accelerate queries that only touch some partitions (files).
The Dataset$create() factory method instantiates a Dataset and takes the following arguments:
sources: a list of DataSource objects
schema: a Schema
$NewScan(): Returns a ScannerBuilder for building a query
$schema: Active binding, returns the Schema of the Dataset
See open_dataset() for a simple way to create a Dataset that has a single DataSource.
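The following is a minimal sketch of the open_dataset() route, not a definitive recipe. It assumes a recent release of the arrow package in which open_dataset() accepts a directory path, ScannerBuilder has a Finish() method, and Scanner has a ToTable() method; the exact API may differ in the version this page describes. The directory name, file names, and example data are hypothetical.

```r
library(arrow)

# Hypothetical example data: two Parquet files in a temporary directory,
# standing in for a dataset that has been split across multiple files.
dir <- file.path(tempdir(), "dataset_example")
dir.create(dir, showWarnings = FALSE)
write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")),
              file.path(dir, "part-1.parquet"))
write_parquet(data.frame(x = 4:6, y = c("d", "e", "f")),
              file.path(dir, "part-2.parquet"))

# Create a Dataset over every file in the directory.
ds <- open_dataset(dir)

# Active binding: the Schema shared by the files.
ds$schema

# Start a new scan, finish the builder to get a Scanner,
# then materialize the result as an Arrow Table (assumed API, see above).
scanner <- ds$NewScan()$Finish()
tab <- scanner$ToTable()
nrow(tab)
```

Here the scan reads both files. A real query would normally configure the ScannerBuilder (for example with a filter or a column projection) before finishing it.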
arrow::Object -> Dataset
NewScan()
Start a new scan of the data
Dataset$NewScan()
clone()
The objects of this class are cloneable with this method.
Dataset$clone(deep = FALSE)
deep: Whether to make a deep clone.