Arrow Datasets let you query data that has been split across multiple files. This sharding may reflect partitioning, which can speed up queries that touch only some partitions (files).

Factory

The Dataset$create() factory method instantiates a Dataset and takes the following arguments:

Methods

  • $NewScan(): Returns a ScannerBuilder for building a query

  • $schema: Active binding, returns the Schema of the Dataset

See also

open_dataset() for a simple way to create a Dataset that has a single DataSource.

Super class

arrow::Object -> Dataset

Methods

Method NewScan()

Start a new scan of the data

Usage

Dataset$NewScan()

Returns

A ScannerBuilder
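
The scan workflow can be sketched as follows. This is a minimal illustration, not the package's canonical example. It assumes an arrow release in which write_parquet(), open_dataset(), ScannerBuilder$Finish() and Scanner$ToTable() are available; none of those names are documented on this page. The temporary directory, file names and columns are hypothetical.

```r
library(arrow)

# Hypothetical data: two small Parquet files in a temporary directory,
# standing in for a dataset that is sharded across files.
dir <- tempfile("dataset_")
dir.create(dir)
write_parquet(data.frame(x = 1:3, g = "a"), file.path(dir, "part-1.parquet"))
write_parquet(data.frame(x = 4:6, g = "b"), file.path(dir, "part-2.parquet"))

# open_dataset() is the simple way to obtain a Dataset (see "See also").
ds <- open_dataset(dir)

# Active binding: the Schema shared by the files in the Dataset.
sch <- ds$schema

# NewScan() returns a ScannerBuilder.
builder <- ds$NewScan()

# Assumption: Finish() turns the builder into a Scanner, and ToTable()
# reads the scan results into a single Arrow Table.
scanner <- builder$Finish()
tab <- scanner$ToTable()
```

The builder step is where a query would be narrowed before any file is read, which is what allows a scan to skip partitions it does not need.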


Method clone()

The objects of this class are cloneable with this method.

Usage

Dataset$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.