pyarrow.RecordBatchFileWriter

class pyarrow.RecordBatchFileWriter(sink, schema, *, use_legacy_format=None, options=None)

Bases: pyarrow.lib._RecordBatchFileWriter

Writer that creates files in the Arrow binary file format.

Parameters
  • sink (str, pyarrow.NativeFile, or file-like Python object) – Either a file path, or a writable file object.

  • schema (pyarrow.Schema) – The Arrow schema for data to be written to the file.

  • options (pyarrow.ipc.IpcWriteOptions) –

    Options for IPC serialization.

    If None, default values will be used: the legacy format will not be used unless overridden by setting the environment variable ARROW_PRE_0_15_IPC_FORMAT=1, and the V5 metadata version will be used unless overridden by setting the environment variable ARROW_PRE_1_0_METADATA_VERSION=1.

  • use_legacy_format (bool, default None) –

    Deprecated in favor of setting options. Cannot be provided with options.

    If None, False will be used unless this default is overridden by setting the environment variable ARROW_PRE_0_15_IPC_FORMAT=1.

__init__(sink, schema, *, use_legacy_format=None, options=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

  • __init__(sink, schema, *[, …]) – Initialize self.

  • close(self) – Close stream and write end-of-stream 0 marker.

  • write(self, table_or_batch) – Write RecordBatch or Table to stream.

  • write_batch(self, RecordBatch batch) – Write RecordBatch to stream.

  • write_table(self, Table table[, max_chunksize]) – Write Table to stream in (contiguous) RecordBatch objects.

close(self)

Close stream and write end-of-stream 0 marker.

stats

Current IPC write statistics.

write(self, table_or_batch)

Write RecordBatch or Table to stream.

Parameters

table_or_batch ({RecordBatch, Table}) – The RecordBatch or Table to write.

write_batch(self, RecordBatch batch)

Write RecordBatch to stream.

Parameters

batch (RecordBatch) – The RecordBatch to write.

write_table(self, Table table, max_chunksize=None, **kwargs)

Write Table to stream in (contiguous) RecordBatch objects.

Parameters
  • table (Table) – The Table to write.

  • max_chunksize (int, default None) – Maximum size for RecordBatch chunks. Individual chunks may be smaller depending on the chunk layout of individual columns.