Parquet is a columnar storage file format. This function enables you to write Parquet files from R.
write_parquet(
  x,
  sink,
  chunk_size = NULL,
  version = NULL,
  compression = NULL,
  compression_level = NULL,
  use_dictionary = NULL,
  write_statistics = NULL,
  data_page_size = NULL,
  properties = ParquetWriterProperties$create(
    x,
    version = version,
    compression = compression,
    compression_level = compression_level,
    use_dictionary = use_dictionary,
    write_statistics = write_statistics,
    data_page_size = data_page_size
  ),
  use_deprecated_int96_timestamps = FALSE,
  coerce_timestamps = NULL,
  allow_truncated_timestamps = FALSE,
  arrow_properties = ParquetArrowWriterProperties$create(
    use_deprecated_int96_timestamps = use_deprecated_int96_timestamps,
    coerce_timestamps = coerce_timestamps,
    allow_truncated_timestamps = allow_truncated_timestamps
  )
)
Argument | Description |
---|---|
x | An arrow::Table, or an object convertible to one (for example, a data.frame). |
sink | An arrow::io::OutputStream, or a string that is interpreted as a file path. |
chunk_size | Chunk size, in number of rows. If NULL, the total number of rows is used. |
version | Parquet format version, "1.0" or "2.0". |
compression | Compression algorithm. No compression by default. |
compression_level | Compression level. |
use_dictionary | Whether to use dictionary encoding. |
write_statistics | Whether to write statistics. |
data_page_size | Target threshold for the approximate encoded size of data pages within a column chunk. If omitted, the default data page size (1 MB) is used. |
properties | Properties for the Parquet writer, derived from the arguments above. |
use_deprecated_int96_timestamps | Write timestamps in the INT96 Parquet format. |
coerce_timestamps | Cast timestamps to a particular resolution. Can be NULL, "ms" or "us". |
allow_truncated_timestamps | Allow loss of data when coercing timestamps to a particular resolution. For example, if microsecond or nanosecond data is lost when coercing to "ms", do not raise an exception. |
arrow_properties | Arrow-specific writer properties, derived from the arguments above. |
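The timestamp arguments interact, so a short sketch may help. It assumes the arrow package is installed and uses a made-up data frame `events` with sub-millisecond timestamps. Coercing to "ms" would lose that precision, so `allow_truncated_timestamps = TRUE` is passed to accept the loss instead of raising an error.

```r
library(arrow)

# Hypothetical data: timestamps carrying microsecond detail
events <- data.frame(
  id = 1:3,
  ts = as.POSIXct("2020-01-01 12:00:00", tz = "UTC") + c(0.000123, 0.5, 1.25)
)

ts_file <- tempfile(fileext = ".parquet")

# Store timestamps at millisecond resolution and accept the truncation
write_parquet(
  events,
  ts_file,
  coerce_timestamps = "ms",
  allow_truncated_timestamps = TRUE
)

ts_back <- read_parquet(ts_file)
```

Leaving `allow_truncated_timestamps` at its default of FALSE is the safer choice when silent precision loss would be a problem.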
NULL, invisibly
The parameters compression, compression_level, use_dictionary and write_statistics support various patterns:
- The default NULL leaves the parameter unspecified, and the C++ library uses an appropriate default for each column
- A single, unnamed value (e.g. a single string for compression) applies to all columns
- An unnamed vector, of the same size as the number of columns, specifies a value for each column, in positional order
- A named vector specifies the value for the named columns; the default value for the setting is used for columns that are not named
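The patterns listed above can be sketched as follows. This is an illustration only: the data frame `df` and its column names are invented, and the logical settings use_dictionary and write_statistics are used so the sketch does not depend on which compression codecs a given build provides.

```r
library(arrow)

# Hypothetical three-column data frame
df <- data.frame(
  id = 1:5,
  label = letters[1:5],
  score = c(0.1, 0.2, 0.3, 0.4, 0.5)
)

# Single unnamed value: applies to every column
f_all <- tempfile(fileext = ".parquet")
write_parquet(df, f_all, use_dictionary = FALSE)

# Unnamed vector with one entry per column, matched by position
f_pos <- tempfile(fileext = ".parquet")
write_parquet(df, f_pos, use_dictionary = c(FALSE, TRUE, FALSE))

# Named vector: only "score" is set, other columns keep the default
f_named <- tempfile(fileext = ".parquet")
write_parquet(df, f_named, write_statistics = c(score = FALSE))

# These settings change the file layout, not the data read back
back <- read_parquet(f_named)
```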
tf1 <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf1)

# using compression
tf2 <- tempfile(fileext = ".gz.parquet")
write_parquet(data.frame(x = 1:5), tf2, compression = "gzip", compression_level = 5)
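The sink argument also accepts an output stream rather than a path. The following is a minimal sketch of that form, assuming the arrow package's `FileOutputStream` class and a made-up data frame `d`. It additionally sets chunk_size, and it closes the stream explicitly, on the assumption that a stream supplied by the caller is not closed by write_parquet().

```r
library(arrow)

d <- data.frame(x = 1:5)  # hypothetical example data
stream_path <- tempfile(fileext = ".parquet")

# Open the stream ourselves and hand it to write_parquet() as the sink
out <- FileOutputStream$create(stream_path)
write_parquet(d, out, chunk_size = 2)
out$close()

# Read the file back to confirm that the rows were written
d_back <- read_parquet(stream_path)
```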