A Dataset can have one or more DataSources. A DataSource contains one or more DataFragments, such as files, of a common type and partition scheme. DataSourceDiscovery is used to create a DataSource, inspect the Schema of the fragments contained in it, and declare a partition scheme. FileSystemDataSourceDiscovery is a subclass of DataSourceDiscovery for discovering files in the local file system, the only currently supported file system.

Factory

The DataSourceDiscovery$create() factory method instantiates a DataSourceDiscovery and takes the following arguments:

  • path: A string path to a directory containing data files

  • filesystem: Currently only "local" is supported

  • format: Currently only "parquet" is supported

  • allow_non_existent: logical: is path allowed to not exist? Default FALSE. See Selector.

  • recursive: logical: should files be discovered in subdirectories of path? Default TRUE.

  • ...: Additional arguments passed to the FileSystem$create() method
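As a minimal sketch of the arguments listed above, the wrapper below spells out each documented value explicitly. It assumes the arrow version described on this page is installed; the function name discover_parquet is hypothetical and used only for illustration.

```r
# Hypothetical wrapper: builds a DataSourceDiscovery for a directory of
# Parquet files, using only the arguments documented for
# DataSourceDiscovery$create(). Nothing from arrow is evaluated until
# the function is called.
discover_parquet <- function(path, recursive = TRUE) {
  arrow::DataSourceDiscovery$create(
    path,
    filesystem = "local",        # the only supported file system
    format = "parquet",          # the only supported format
    allow_non_existent = FALSE,  # `path` is required to exist
    recursive = recursive        # also search subdirectories of `path`
  )
}
```

Calling discover_parquet() with the path of a directory of Parquet files should return a DataSourceDiscovery that can then be inspected and finished.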

FileSystemDataSourceDiscovery$create() is a lower-level factory method and takes the following arguments:

  • filesystem: A FileSystem

  • selector: A Selector

  • format: Currently only "parquet" is supported
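The lower-level factory might be used roughly as sketched below. This is an illustration under assumptions, not a confirmed recipe: LocalFileSystem$create() and the Selector$create() argument names are assumed to match the arrow version documented here, and the function name discover_parquet_low_level is hypothetical.

```r
# Hypothetical wrapper around the lower-level factory.
# ASSUMPTION: LocalFileSystem$create() takes no required arguments and
# Selector$create() accepts (path, allow_non_existent, recursive) in the
# arrow version documented on this page.
discover_parquet_low_level <- function(path) {
  fs <- arrow::LocalFileSystem$create()
  sel <- arrow::Selector$create(
    path,
    allow_non_existent = FALSE,
    recursive = TRUE
  )
  # Arguments in the documented order: filesystem, selector, format
  arrow::FileSystemDataSourceDiscovery$create(fs, sel, "parquet")
}
```

Building the FileSystem and Selector by hand is only needed when they must be configured separately; otherwise DataSourceDiscovery$create() covers the same case with fewer steps.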

Methods

DataSource has no defined methods. It is just passed to Dataset$create().

DataSourceDiscovery and its subclasses have the following methods:

  • $Inspect(): Walks the files in the directory and returns a common Schema

  • $SetPartitionScheme(part): Takes a PartitionScheme

  • $Finish(): Returns a DataSource
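The three methods above are typically called in sequence. The helper below is a minimal sketch of that sequence; the name finish_discovery and the shape of its return value are hypothetical, and setting the partition scheme before inspecting is one reasonable ordering rather than a documented requirement.

```r
# Hypothetical helper: runs the discovery steps on an existing
# DataSourceDiscovery `dsd`. `part` is an optional PartitionScheme.
finish_discovery <- function(dsd, part = NULL) {
  if (!is.null(part)) {
    dsd$SetPartitionScheme(part)  # declare the partition scheme
  }
  common_schema <- dsd$Inspect()  # common Schema of the discovered files
  data_source <- dsd$Finish()     # the resulting DataSource
  list(source = data_source, schema = common_schema)
}
```

The returned DataSource is the object that gets passed on to Dataset$create().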

See also

Dataset for what to do with a DataSource

Super class

arrow::Object -> DataSource

Methods

Method clone()

The objects of this class are cloneable with this method.

Usage

DataSource$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Super class

arrow::Object -> DataSourceDiscovery

Methods

Method Finish()

Usage

DataSourceDiscovery$Finish()


Method SetPartitionScheme()

Usage

DataSourceDiscovery$SetPartitionScheme(part)


Method Inspect()

Usage

DataSourceDiscovery$Inspect()


Method clone()

The objects of this class are cloneable with this method.

Usage

DataSourceDiscovery$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Super classes

arrow::Object -> arrow::DataSourceDiscovery -> FileSystemDataSourceDiscovery

Methods

Method clone()

The objects of this class are cloneable with this method.

Usage

FileSystemDataSourceDiscovery$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.