A Dataset can have one or more DataSources. A DataSource contains one or more DataFragments, such as files, of a common type and partition scheme. DataSourceDiscovery is used to create a DataSource, inspect the Schema of the fragments contained in it, and declare a partition scheme. FileSystemDataSourceDiscovery is a subclass of DataSourceDiscovery for discovering files in the local file system, the only currently supported file system.
The DataSourceDiscovery$create() factory method instantiates a DataSourceDiscovery and takes the following arguments:
path: A string file path containing data files
filesystem: Currently only "local" is supported
format: Currently only "parquet" is supported
allow_non_existent: logical: is path allowed to not exist? Default FALSE. See Selector.
recursive: logical: should files be discovered in subdirectories of path? Default TRUE.
...: Additional arguments passed to the FileSystem$create() method
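The factory call described above might look like the following minimal sketch. It assumes an arrow build that exports DataSourceDiscovery as documented here. The directory data_dir and the file name part-0.parquet are hypothetical and exist only for the illustration.

```r
library(arrow)

# Hypothetical directory holding a single Parquet file, created for this sketch
data_dir <- tempfile("datasource-")
dir.create(data_dir)
write_parquet(data.frame(x = 1:3), file.path(data_dir, "part-0.parquet"))

# Arguments follow the list above; allow_non_existent and recursive
# are spelled out with their documented defaults
dsd <- DataSourceDiscovery$create(
  path = data_dir,
  filesystem = "local",
  format = "parquet",
  allow_non_existent = FALSE,
  recursive = TRUE
)
```

Leaving out allow_non_existent and recursive should be equivalent, since the values shown are the documented defaults.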
FileSystemDataSourceDiscovery$create() is a lower-level factory method and takes the following arguments:
filesystem: A FileSystem
selector: A Selector
format: Currently only "parquet" is supported
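A rough sketch of the lower-level factory follows. This page does not document how to build the FileSystem and Selector objects, so LocalFileSystem$create() and Selector$create(base_dir, allow_non_existent, recursive) are assumptions about the matching arrow version; check them against the FileSystem and Selector help pages. The directory data_dir is hypothetical.

```r
library(arrow)

# Hypothetical directory holding a single Parquet file, created for this sketch
data_dir <- tempfile("datasource-")
dir.create(data_dir)
write_parquet(data.frame(x = 1:3), file.path(data_dir, "part-0.parquet"))

# ASSUMPTION: these two constructors and their argument names are not
# documented on this page; verify them against the FileSystem and
# Selector documentation for your arrow version
fs <- LocalFileSystem$create()
sel <- Selector$create(data_dir, allow_non_existent = FALSE, recursive = TRUE)

# The three arguments listed above
fsd <- FileSystemDataSourceDiscovery$create(
  filesystem = fs,
  selector = sel,
  format = "parquet"
)
```

Building the Selector by hand is where allow_non_existent and recursive would be set when using this lower-level route.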
DataSource has no defined methods. It is just passed to Dataset$create().
DataSourceDiscovery and its subclasses have the following methods:
$Inspect(): Walks the files in the directory and returns a common Schema
$SetPartitionScheme(part): Takes a PartitionScheme
$Finish(): Returns a DataSource
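The methods listed above could be combined as in the sketch below, again assuming an arrow build that exports these classes as documented here. The directory data_dir is hypothetical. $SetPartitionScheme() is left out because this page does not show how to construct a PartitionScheme.

```r
library(arrow)

# Hypothetical directory holding a single Parquet file, created for this sketch
data_dir <- tempfile("datasource-")
dir.create(data_dir)
write_parquet(data.frame(x = 1:3), file.path(data_dir, "part-0.parquet"))

dsd <- DataSourceDiscovery$create(
  path = data_dir,
  filesystem = "local",
  format = "parquet"
)

# Walk the discovered files and obtain their common Schema
common_schema <- dsd$Inspect()
print(common_schema)

# Finalize discovery; the result is a DataSource with no methods of its own
src <- dsd$Finish()
```

The resulting src object is what would then be handed to Dataset$create(), whose arguments are described on the Dataset page rather than here.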
See Dataset for what to do with a DataSource.
arrow::Object -> DataSource
clone()
The objects of this class are cloneable with this method.
DataSource$clone(deep = FALSE)
deep: Whether to make a deep clone.
arrow::Object -> DataSourceDiscovery
Finish()
DataSourceDiscovery$Finish()
SetPartitionScheme()
DataSourceDiscovery$SetPartitionScheme(part)
Inspect()
DataSourceDiscovery$Inspect()
clone()
The objects of this class are cloneable with this method.
DataSourceDiscovery$clone(deep = FALSE)
deep: Whether to make a deep clone.
arrow::Object -> arrow::DataSourceDiscovery -> FileSystemDataSourceDiscovery
clone()
The objects of this class are cloneable with this method.
FileSystemDataSourceDiscovery$clone(deep = FALSE)
deep: Whether to make a deep clone.