A Dataset can have one or more DataSources. A DataSource contains one
or more DataFragments, such as files, of a common type and partition
scheme. DataSourceDiscovery is used to create a DataSource, inspect the
Schema of the fragments contained in it, and declare a partition scheme.
FileSystemDataSourceDiscovery is a subclass of DataSourceDiscovery for
discovering files in the local file system, the only currently supported
file system.
The DataSourceDiscovery$create() factory method instantiates a
DataSourceDiscovery and takes the following arguments:
path: A string path to a directory containing data files
filesystem: Currently only "local" is supported
format: Currently only "parquet" is supported
allow_non_existent: logical: is path allowed to not exist? Default
FALSE. See Selector.
recursive: logical: should files be discovered in subdirectories of
path? Default TRUE.
...: Additional arguments passed to the FileSystem$create() method
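As a minimal sketch of the factory call, assuming a version of the arrow package that exports DataSourceDiscovery as described here, the arguments above might be supplied as follows. The wrapper name discover_parquet() is hypothetical, and the values simply restate the documented defaults and supported options.

```r
# Hypothetical wrapper around the documented factory method.
# Assumes an arrow build that exports DataSourceDiscovery; the `arrow::`
# prefix means nothing is looked up until the function is actually called.
discover_parquet <- function(path, recursive = TRUE) {
  arrow::DataSourceDiscovery$create(
    path,
    filesystem = "local",        # only "local" is documented as supported
    format = "parquet",          # only "parquet" is documented as supported
    allow_non_existent = FALSE,  # documented default: path must exist
    recursive = recursive        # documented default TRUE: search subdirectories
  )
}
```

For example, discover_parquet("path/to/data") would be expected to return a DataSourceDiscovery for a directory of Parquet files; the path shown is a placeholder.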
FileSystemDataSourceDiscovery$create() is a lower-level factory method and
takes the following arguments:
filesystem: A FileSystem
selector: A Selector
format: Currently only "parquet" is supported
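The lower-level factory could be used roughly as sketched below. The argument names filesystem, selector and format come from the list above; LocalFileSystem$create() and the Selector$create() arguments are assumptions about the companion classes, which are documented on their own pages, and the function name is hypothetical.

```r
# Hypothetical helper showing the lower-level factory.
# Assumes an arrow build that exports these classes; nothing is evaluated
# until the function is called.
discover_parquet_low_level <- function(path) {
  # Assumption: the local file system object is built with no arguments.
  fs <- arrow::LocalFileSystem$create()

  # Assumption: Selector$create() takes the base directory first and
  # accepts the same two flags described for DataSourceDiscovery$create().
  sel <- arrow::Selector$create(
    path,
    allow_non_existent = FALSE,
    recursive = TRUE
  )

  # Documented arguments: a FileSystem, a Selector, and the file format.
  arrow::FileSystemDataSourceDiscovery$create(
    filesystem = fs,
    selector = sel,
    format = "parquet"
  )
}
```

Building the FileSystem and Selector by hand gives control over how files are selected, at the cost of more setup than DataSourceDiscovery$create().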
DataSource has no defined methods. It is just passed to Dataset$create().
DataSourceDiscovery and its subclasses have the following methods:
$Inspect(): Walks the files in the directory and returns a common Schema
$SetPartitionScheme(part): Declares the partition scheme; part is a PartitionScheme
$Finish(): Returns a DataSource
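The three methods are typically called in sequence. The sketch below uses only the method names listed above; the helper name finish_discovery() is hypothetical, and how to construct a PartitionScheme is left to the caller because it is not covered on this page.

```r
# Hypothetical helper: inspect, optionally declare a partition scheme,
# then finish. `dsd` is a DataSourceDiscovery (or subclass) object and
# `part` is a PartitionScheme built elsewhere, or NULL to skip that step.
finish_discovery <- function(dsd, part = NULL) {
  # Walk the files and obtain the common Schema.
  schema <- dsd$Inspect()

  # Declare the partition scheme only if one was supplied.
  if (!is.null(part)) {
    dsd$SetPartitionScheme(part)
  }

  # Produce the DataSource and return it together with the schema.
  source <- dsd$Finish()
  list(schema = schema, source = source)
}
```

The returned source is the object that would then be passed to Dataset$create().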
See Dataset for what to do with a DataSource
arrow::Object -> DataSource
clone(): The objects of this class are cloneable with this method.
DataSource$clone(deep = FALSE)
deep: Whether to make a deep clone.
arrow::Object -> DataSourceDiscovery
Finish(): DataSourceDiscovery$Finish()
SetPartitionScheme(): DataSourceDiscovery$SetPartitionScheme(part)
Inspect(): DataSourceDiscovery$Inspect()
clone(): The objects of this class are cloneable with this method.
DataSourceDiscovery$clone(deep = FALSE)
deep: Whether to make a deep clone.
arrow::Object -> arrow::DataSourceDiscovery -> FileSystemDataSourceDiscovery
clone(): The objects of this class are cloneable with this method.
FileSystemDataSourceDiscovery$clone(deep = FALSE)
deep: Whether to make a deep clone.