pyarrow.ChunkedArray¶
- 
class pyarrow.ChunkedArray¶
- Bases: pyarrow.lib._PandasConvertible
- An array-like composed from a (possibly empty) collection of pyarrow.Arrays.
- Warning: Do not call this class’s constructor directly.
 - 
__init__(*args, **kwargs)¶
- Initialize self. See help(type(self)) for accurate signature. 
 - Methods
- __init__(*args, **kwargs) – Initialize self.
- cast(self, target_type[, safe]) – Cast array values to another data type
- chunk(self, i) – Select a chunk by its index
- combine_chunks(self, MemoryPool memory_pool=None) – Flatten this ChunkedArray into a single non-chunked array.
- dictionary_encode(self) – Compute dictionary-encoded representation of array
- equals(self, ChunkedArray other) – Return whether the contents of two chunked arrays are equal.
- fill_null(self, fill_value) – See pyarrow.compute.fill_null docstring for usage.
- filter(self, mask[, null_selection_behavior]) – Select values from a chunked array.
- flatten(self, MemoryPool memory_pool=None) – Flatten this ChunkedArray.
- format(self, **kwargs)
- is_null(self) – Return BooleanArray indicating the null values.
- is_valid(self) – Return BooleanArray indicating the non-null values.
- iterchunks(self)
- length(self)
- slice(self[, offset, length]) – Compute zero-copy slice of this ChunkedArray
- take(self, indices) – Select values from a chunked array.
- to_numpy(self) – Return a NumPy copy of this array (experimental).
- to_pandas(self[, memory_pool, categories, …]) – Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
- to_pylist(self) – Convert to a list of native Python objects.
- to_string(self, int indent=0, int window=10) – Render a “pretty-printed” string representation of the ChunkedArray
- unique(self) – Compute distinct elements in array
- validate(self, *[, full]) – Perform validation checks.
- value_counts(self) – Compute counts of unique elements in array.
 - Attributes
- nbytes – Total number of bytes consumed by the elements of the chunked array.
- null_count – Number of null entries
- num_chunks – Number of underlying chunks
 - 
cast(self, target_type, safe=True)¶
- Cast array values to another data type - See pyarrow.compute.cast for usage 
 - 
chunk(self, i)¶
- Select a chunk by its index - Parameters
- i (int) – 
- Returns
- pyarrow.Array 
 
 - 
chunks¶
 - 
combine_chunks(self, MemoryPool memory_pool=None)¶
- Flatten this ChunkedArray into a single non-chunked array. - Parameters
- memory_pool (MemoryPool, default None) – For memory allocations, if required, otherwise use default pool 
- Returns
- result (Array) 
 
 - 
data¶
 - 
dictionary_encode(self)¶
- Compute dictionary-encoded representation of array - Returns
- pyarrow.ChunkedArray – Same chunking as the input, all chunks share a common dictionary. 
 
 - 
equals(self, ChunkedArray other)¶
- Return whether the contents of two chunked arrays are equal. - Parameters
- other (pyarrow.ChunkedArray) – Chunked array to compare against. 
- Returns
- are_equal (bool) 
 
 - 
fill_null(self, fill_value)¶
- See pyarrow.compute.fill_null docstring for usage. 
 - 
filter(self, mask, null_selection_behavior='drop')¶
- Select values from a chunked array. See pyarrow.compute.filter for full usage. 
 - 
flatten(self, MemoryPool memory_pool=None)¶
- Flatten this ChunkedArray. If it has a struct type, the column is flattened into one array per struct field. - Parameters
- memory_pool (MemoryPool, default None) – For memory allocations, if required, otherwise use default pool 
- Returns
- result (List[ChunkedArray]) 
 
 - 
format(self, **kwargs)¶
 - 
is_null(self)¶
- Return BooleanArray indicating the null values. 
 - 
is_valid(self)¶
- Return BooleanArray indicating the non-null values. 
 - 
iterchunks(self)¶
 - 
length(self)¶
 - 
nbytes¶
- Total number of bytes consumed by the elements of the chunked array. 
 - 
null_count¶
- Number of null entries - Returns
- int 
 
 - 
num_chunks¶
- Number of underlying chunks - Returns
- int 
 
 - 
slice(self, offset=0, length=None)¶
- Compute zero-copy slice of this ChunkedArray - Parameters
- offset (int, default 0) – Offset from start of array to slice 
- length (int, default None) – Length of slice (default is until end of array starting from offset) 
 
- Returns
- sliced (ChunkedArray) 
 
 - 
take(self, indices)¶
- Select values from a chunked array. See pyarrow.compute.take for full usage. 
 - 
to_numpy(self)¶
- Return a NumPy copy of this array (experimental). - Returns
- array (numpy.ndarray) 
 
 - 
to_pandas(self, memory_pool=None, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool timestamp_as_object=False, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False, bool safe=True, bool split_blocks=False, bool self_destruct=False, types_mapper=None)¶
- Convert to a pandas-compatible NumPy array or DataFrame, as appropriate - Parameters
- memory_pool (MemoryPool, default None) – Arrow MemoryPool to use for allocations. Uses the default memory pool if not passed. 
- strings_to_categorical (bool, default False) – Encode string (UTF8) and binary types to pandas.Categorical. 
- categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures. 
- zero_copy_only (bool, default False) – Raise an ArrowException if this function call would require copying the underlying data. 
- integer_object_nulls (bool, default False) – Cast integers with nulls to objects 
- date_as_object (bool, default True) – Cast dates to objects. If False, convert to datetime64[ns] dtype. 
- timestamp_as_object (bool, default False) – Cast non-nanosecond timestamps (np.datetime64) to objects. This is useful if you have timestamps that don’t fit in the normal date range of nanosecond timestamps (1678 CE-2262 CE). If False, all timestamps are converted to datetime64[ns] dtype. 
- use_threads (bool, default True) – Whether to parallelize the conversion using multiple threads. 
- deduplicate_objects (bool, default True) – Do not create multiple copies of Python objects when created, to save on memory use. Conversion will be slower. 
- ignore_metadata (bool, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present 
- safe (bool, default True) – For certain data types, a cast is needed in order to store the data in a pandas DataFrame or Series (e.g. timestamps are always stored as nanoseconds in pandas). This option controls whether it is a safe cast or not. 
- split_blocks (bool, default False) – If True, generate one internal “block” for each column when creating a pandas.DataFrame from a RecordBatch or Table. While this can temporarily reduce memory use, note that various pandas operations can trigger “consolidation”, which may balloon memory use. 
- self_destruct (bool, default False) – EXPERIMENTAL: If True, attempt to deallocate the originating Arrow memory while converting the Arrow object to pandas. If you use the object after calling to_pandas with this option it will crash your program. 
- types_mapper (function, default None) – A function mapping a pyarrow DataType to a pandas ExtensionDtype. This can be used to override the default pandas type for conversion of built-in pyarrow types or in absence of pandas_metadata in the Table schema. The function receives a pyarrow DataType and is expected to return a pandas ExtensionDtype or None if the default conversion should be used for that type. If you have a dictionary mapping, you can pass dict.get as the function.
 
- Returns
- pandas.Series or pandas.DataFrame depending on type of object 
 
 - 
to_pylist(self)¶
- Convert to a list of native Python objects. 
 - 
to_string(self, int indent=0, int window=10)¶
- Render a “pretty-printed” string representation of the ChunkedArray 
 - 
type¶
 - 
unique(self)¶
- Compute distinct elements in array - Returns
- pyarrow.Array 
 
 - 
validate(self, *, full=False)¶
- Perform validation checks. An exception is raised if validation fails. - By default only cheap validation checks are run. Pass full=True for thorough validation checks (potentially O(n)). - Parameters
- full (bool, default False) – If True, run expensive checks, otherwise cheap checks only. 
- Raises
- ArrowInvalid – 
 
 - 
value_counts(self)¶
- Compute counts of unique elements in array. - Returns
- An array of <input type “Values”, int64_t “Counts”> structs 
 
 