## arrow 0.16.0

### Multi-file datasets

This release includes a `dplyr` interface to Arrow Datasets. Explore a directory of data files with `open_dataset()` and then use `dplyr` methods to `select()`, `filter()`, etc., and work will be done where possible in Arrow memory. When necessary, data is pulled into R for further computation. `dplyr` methods are conditionally loaded if you have `dplyr` available; it is not a hard dependency.
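A minimal sketch of that workflow (the directory path and column names here are illustrative, not from the package):

```r
library(arrow)
library(dplyr)

# Point at a directory of data files; nothing is read into memory yet
ds <- open_dataset("nyc-taxi/")

# select() and filter() are evaluated in Arrow where possible;
# collect() pulls the reduced result into an R data frame
ds %>%
  select(passenger_count, fare_amount) %>%
  filter(passenger_count > 1) %>%
  collect()
```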
### Data exploration

* `Table`s and `RecordBatch`es also have `dplyr` methods.
* For exploration without `dplyr`, `[` methods for Tables, RecordBatches, Arrays, and ChunkedArrays now support natural row extraction operations. These use the C++ `Filter`, `Slice`, and `Take` methods for efficient access, depending on the type of selection vector.
* An `array_expression` class has also been added, enabling among other things the ability to filter a Table with some function of Arrays, such as `arrow_table[arrow_table$var1 > 5, ]` without having to pull everything into R first.
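A short sketch of these extraction methods (the table and column names are made up for illustration):

```r
library(arrow)

arrow_table <- Table$create(var1 = 1:10, var2 = letters[1:10])

# Integer selection vectors use the C++ Slice/Take methods
arrow_table[1:5, ]

# A comparison on an Arrow column builds an array_expression,
# so the filter runs in Arrow memory, not in R
arrow_table[arrow_table$var1 > 5, ]
```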
### Compression

* `write_parquet()` now supports compression
* `codec_is_available()` returns `TRUE` or `FALSE` according to whether the Arrow C++ library was built with support for a given compression library (e.g. gzip, lz4, snappy)
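A sketch of how these two functions could be used together (the file name and data are illustrative):

```r
library(arrow)

# Check whether this build of the Arrow C++ library supports a codec
codec_is_available("snappy")

# If so, write a compressed Parquet file
if (codec_is_available("snappy")) {
  write_parquet(data.frame(x = 1:10), "data.parquet", compression = "snappy")
}
```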
## arrow 0.15.0

### Breaking changes

* Many functions that instantiated Arrow objects have been removed in favor of `Class$create()` methods. Notably, `arrow::array()` and `arrow::table()` have been removed in favor of `Array$create()` and `Table$create()`, eliminating the package startup message about masking `base` functions. For more information, see the new `vignette("arrow")`.
* Data written with the new 0.15 IPC message format may not be readable by older Arrow versions. To write data in the pre-0.15 format for compatibility with processes using older Arrow libraries, set the environment variable `ARROW_PRE_0_15_IPC_FORMAT=1`.
* The `as_tibble` argument in the `read_*()` functions has been renamed to `as_data_frame` (ARROW-6337, @jameslamb)
* The `arrow::Column` class has been removed, as it was removed from the C++ library
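Illustrative use of the new constructors, and of the compatibility environment variable, if older readers need to consume the data:

```r
library(arrow)

# Previously arrow::array() and arrow::table(); now:
a <- Array$create(c(1, 2, 3))
tab <- Table$create(x = 1:3, y = c("a", "b", "c"))

# Opt into the pre-0.15 IPC format before writing,
# for consumers on older Arrow versions
Sys.setenv(ARROW_PRE_0_15_IPC_FORMAT = 1)
```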
### New features

* `Table` and `RecordBatch` objects have S3 methods that enable you to work with them more like `data.frame`s. Extract columns, subset, and so on. See `?Table` and `?RecordBatch` for examples.
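A brief sketch of the `data.frame`-like access described above (the data is illustrative; see `?Table` for the authoritative examples):

```r
library(arrow)

tab <- Table$create(x = 1:5, y = letters[1:5])

dim(tab)    # data.frame-like dimensions
names(tab)  # column names
tab$x       # extract a column
```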
### Other upgrades

* `read_csv_arrow()` supports more parsing options, including `col_names`, `na`, `quoted_na`, and `skip`
* `read_parquet()` and `read_feather()` can ingest data from a `raw` vector (ARROW-6278)
* File readers now properly handle paths that need expanding, such as `~/file.parquet` (ARROW-6323)
* Data type functions have human-friendly names (e.g. `double()`), and time types can be created with human-friendly resolution strings ("ms", "s", etc.). (ARROW-6338, ARROW-6364)

## arrow 0.14.1

Initial CRAN release of the `arrow` package. Key features include: