pyarrow.RecordBatchStreamReader

class pyarrow.RecordBatchStreamReader(source)[source]

Bases: pyarrow.lib._RecordBatchStreamReader

Reader for the Arrow streaming binary format.

Parameters

source (bytes/buffer-like, pyarrow.NativeFile, or file-like Python object) – Either an in-memory buffer, or a readable file object.

__init__(source)[source]

Create a reader over source. See help(type(self)) for the accurate signature.

Methods

__init__(source)

Initialize self.

get_next_batch(self)

read_all(self)

Read all record batches as a pyarrow.Table.

read_next_batch(self)

Read next RecordBatch from the stream.

read_pandas(self, **options)

Read contents of stream to a pandas.DataFrame.

Attributes

schema

Shared schema of the record batches in the stream.

static from_batches(schema, batches)

Create a RecordBatchReader from an iterable of batches.

Parameters
  • schema (Schema) – The shared schema of the record batches.

  • batches (Iterable[RecordBatch]) – The batches that this reader will return.

Returns

reader (RecordBatchReader)

get_next_batch(self)

read_all(self)

Read all record batches as a pyarrow.Table.

read_next_batch(self)

Read next RecordBatch from the stream.

Raises

StopIteration – At end of stream.

read_pandas(self, **options)

Read contents of stream to a pandas.DataFrame.

Reads all record batches into a pyarrow.Table, then converts that table to a pandas.DataFrame using Table.to_pandas.

Parameters

**options – Keyword arguments to forward to Table.to_pandas.

Returns

df (pandas.DataFrame)

schema

Shared schema of the record batches in the stream.

stats

Current IPC read statistics.