Open a multi-file dataset

open_dataset(path, schema = NULL, partition = NULL, ...)

Arguments

path

String path to a directory containing the data files.

schema

Schema for the dataset. If NULL (the default), the schema is inferred from the data files.

partition

One of:

  • A Schema, in which case the file paths relative to path are parsed and each path segment is matched, in order, to a field of the schema. For example, schema(year = int16(), month = int8()) would create partitions for file paths like "2019/01/file.parquet", "2019/02/file.parquet", etc.

  • A HivePartitionScheme, as returned by hive_partition()

  • NULL, the default, for no partitioning
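Examples

The following is a minimal sketch of the two non-NULL partition options, assuming the signature documented on this page. The temporary directory, the column x, and the variable names are illustrative only; the "key=value" directory layout used for the hive_partition() case is the usual Hive convention and is an assumption here, since this page does not spell it out.

```r
library(arrow)

# Build a tiny dataset laid out as <year>/<month>/file.parquet,
# mirroring the example paths given for the Schema option.
root_path <- tempfile("dataset_path_")
for (month in c("01", "02")) {
  dir.create(file.path(root_path, "2019", month), recursive = TRUE)
  write_parquet(
    data.frame(x = 1:3),
    file.path(root_path, "2019", month, "file.parquet")
  )
}

# Schema partitioning: the first path segment is read as year,
# the second as month.
ds_by_path <- open_dataset(
  root_path,
  partition = schema(year = int16(), month = int8())
)

# Assumed Hive-style layout: <key>=<value> directory names.
root_hive <- tempfile("dataset_hive_")
for (month in c("1", "2")) {
  dir.create(
    file.path(root_hive, "year=2019", paste0("month=", month)),
    recursive = TRUE
  )
  write_parquet(
    data.frame(x = 1:3),
    file.path(root_hive, "year=2019", paste0("month=", month), "file.parquet")
  )
}

# Hive partitioning: field names are taken from the directory names.
ds_hive <- open_dataset(
  root_hive,
  partition = hive_partition(year = int16(), month = int8())
)
```

With the Schema option the directory names carry only values, so the order of the schema fields must match the nesting order of the directories.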

...

Additional arguments passed to DataSourceDiscovery$create()

Value

A Dataset R6 object. Use dplyr methods on it to query the data, or call $NewScan() to construct a query directly.
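As a hedged illustration of the dplyr route, the sketch below writes a single small Parquet file to a temporary directory, opens it without partitioning, and pulls a filtered result into R. It assumes that filter(), select(), and collect() are among the dplyr methods supported on a Dataset in the installed version; the columns x and y are made up for the example.

```r
library(arrow)
library(dplyr)

# Illustrative data: one Parquet file in a temporary directory.
root <- tempfile("dataset_")
dir.create(root)
write_parquet(
  data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE),
  file.path(root, "file.parquet")
)

# No partition argument: the schema is inferred from the file itself.
ds <- open_dataset(root)

# Describe the query with dplyr verbs; collect() runs it and
# returns the matching rows as an in-memory data frame.
result <- ds %>%
  filter(x > 2) %>%
  select(x, y) %>%
  collect()
```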

See also

PartitionScheme for defining partitioning