NEWS.md
# arrow 0.16.0

## Multi-file datasets

This release adds support for multi-file datasets: explore a directory of data files with `open_dataset()` and then use `dplyr` methods to `select()`, `filter()`, etc., and work will be done where possible in Arrow memory. When necessary, data is pulled into R for further computation. `dplyr` methods are conditionally loaded if you have `dplyr` available; it is not a hard dependency.

## Data exploration

* `Table`s and `RecordBatch`es also have `dplyr` methods.
* For exploration without `dplyr`, `[` methods for `Table`s, `RecordBatch`es, `Array`s, and `ChunkedArray`s now support natural row extraction operations. These use the C++ `Filter`, `Slice`, and `Take` methods for efficient access, depending on the type of selection vector.
* An `array_expression` class has also been added, enabling, among other things, the ability to filter a `Table` with some function of `Array`s, such as `arrow_table[arrow_table$var1 > 5, ]`, without having to pull everything into R first.

## Compression

* `write_parquet()` now supports compression.
* `codec_is_available()` returns `TRUE` or `FALSE` according to whether the Arrow C++ library was built with support for a given compression library (e.g. gzip, lz4, snappy).

# arrow 0.15.0

## Breaking changes

* Many functions that instantiated Arrow objects have been removed in favor of `Class$create()` methods. Notably, `arrow::array()` and `arrow::table()` have been removed in favor of `Array$create()` and `Table$create()`, eliminating the package startup message about masking `base` functions. For more information, see the new `vignette("arrow")`.
* Due to a change in the Arrow IPC message format, data written by the 0.15 version libraries may not be readable by older versions. To write data in the pre-0.15 format, set the environment variable `ARROW_PRE_0_15_IPC_FORMAT=1`.
* The `as_tibble` argument in the `read_*()` functions has been renamed to `as_data_frame` (ARROW-6337, @jameslamb).
* The `arrow::Column` class has been removed, as it was removed from the C++ library.

## New features

* `Table` and `RecordBatch` objects have S3 methods that enable you to work with them more like `data.frame`s. Extract columns, subset, and so on. See `?Table` and `?RecordBatch` for examples.

## Other improvements

* `read_csv_arrow()` supports more parsing options, including `col_names`, `na`, `quoted_na`, and `skip`.
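As a sketch of how these parsing options fit together (the file path, column names, and values here are hypothetical, not from the release notes):

```r
library(arrow)

# Read a headerless CSV file, skipping an initial comment line,
# supplying column names, and treating "" and "NA" as missing values.
tab <- read_csv_arrow(
  "data.csv",                            # hypothetical file path
  col_names = c("id", "score", "label"), # names for a headerless file
  skip = 1,                              # skip the comment line
  na = c("", "NA")                       # strings to read as missing
)
```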
* `read_parquet()` and `read_feather()` can ingest data from a `raw` vector (ARROW-6278).
* File readers now properly handle paths that need expanding, such as `~/file.parquet` (ARROW-6323).
* The Arrow type functions now return correctly classed objects (rather than `double()`), and time types can be created with human-friendly resolution strings ("ms", "s", etc.) (ARROW-6338, ARROW-6364).

# arrow 0.14.1

Initial CRAN release of the arrow package. Key features include: