pyarrow.DictionaryArray¶
- 
class pyarrow.DictionaryArray¶
- Bases: pyarrow.lib.Array - Concrete class for dictionary-encoded Arrow arrays. - 
__init__(*args, **kwargs)¶
- Initialize self. See help(type(self)) for accurate signature. 
 - Methods
- __init__(*args, **kwargs) – Initialize self.
- buffers(self) – Return a list of Buffer objects pointing to this array’s physical storage.
- cast(self, target_type[, safe]) – Cast array values to another data type.
- dictionary_encode(self)
- diff(self, Array other) – Compare contents of this array against another one.
- equals(self, Array other)
- fill_null(self, fill_value) – See pyarrow.compute.fill_null for usage.
- filter(self, Array mask[, …]) – Select values from an array.
- format(self, **kwargs)
- from_arrays(indices, dictionary[, mask]) – Construct a DictionaryArray from indices and values.
- from_buffers(DataType type, length, buffers) – Construct an Array from a sequence of buffers.
- from_pandas(obj[, mask, type]) – Convert pandas.Series to an Arrow Array.
- is_null(self) – Return BooleanArray indicating the null values.
- is_valid(self) – Return BooleanArray indicating the non-null values.
- slice(self[, offset, length]) – Compute zero-copy slice of this array.
- sum(self) – Sum the values in a numerical array.
- take(self, indices) – Select values from an array.
- to_numpy(self[, zero_copy_only, writable]) – Return a NumPy view or copy of this array (experimental).
- to_pandas(self[, memory_pool, categories, …]) – Convert to a pandas-compatible NumPy array or DataFrame, as appropriate.
- to_pylist(self) – Convert to a list of native Python objects.
- to_string(self, int indent=0, int window=10)
- tolist(self) – Alias of to_pylist for compatibility with NumPy.
- unique(self) – Compute distinct elements in array.
- validate(self, *[, full]) – Perform validation checks.
- value_counts(self) – Compute counts of unique elements in array.
- view(self, target_type) – Return zero-copy “view” of array as another data type.
 - Attributes
- nbytes – Total number of bytes consumed by the elements of the array.
- offset – A relative position into another array’s data.
 - 
buffers(self)¶
- Return a list of Buffer objects pointing to this array’s physical storage. - To correctly interpret these buffers, you need to also apply the offset multiplied with the size of the stored data type. 
 - 
cast(self, target_type, safe=True)¶
- Cast array values to another data type. - See pyarrow.compute.cast for usage. 
 - 
dictionary¶
 - 
dictionary_encode(self)¶
 - 
diff(self, Array other)¶
- Compare contents of this array against another one. - Return string containing the result of arrow::Diff comparing contents of this array against the other array. 
 - 
equals(self, Array other)¶
 - 
fill_null(self, fill_value)¶
- See pyarrow.compute.fill_null for usage. 
 - 
filter(self, Array mask, null_selection_behavior='drop')¶
- Select values from an array. See pyarrow.compute.filter for full usage. 
 - 
format(self, **kwargs)¶
 - 
static from_arrays(indices, dictionary, mask=None, bool ordered=False, bool from_pandas=False, bool safe=True, MemoryPool memory_pool=None)¶
- Construct a DictionaryArray from indices and values. - Parameters
- indices (pyarrow.Array, numpy.ndarray or pandas.Series, int type) – Non-negative integers referencing the dictionary values by zero based index. 
- dictionary (pyarrow.Array, ndarray or pandas.Series) – The array of values referenced by the indices. 
- mask (ndarray or pandas.Series, bool type) – True values indicate that indices are actually null. 
- from_pandas (bool, default False) – If True, the indices should be treated as though they originated in a pandas.Categorical (null encoded as -1). 
- ordered (bool, default False) – Set to True if the category values are ordered. 
- safe (bool, default True) – If True, check that the dictionary indices are in range. 
- memory_pool (MemoryPool, default None) – For memory allocations, if required, otherwise uses default pool. 
 
- Returns
- dict_array (DictionaryArray) 
 
 - 
static from_buffers(DataType type, length, buffers, null_count=-1, offset=0, children=None)¶
- Construct an Array from a sequence of buffers. - The concrete type returned depends on the datatype. - Parameters
- type (DataType) – The value type of the array. 
- length (int) – The number of values in the array. 
- buffers (List[Buffer]) – The buffers backing this array. 
- null_count (int, default -1) – The number of null entries in the array. Negative value means that the null count is not known. 
- offset (int, default 0) – The array’s logical offset (in values, not in bytes) from the start of each buffer. 
- children (List[Array], default None) – Nested type children with length matching type.num_fields. 
 
- Returns
- array (Array) 
 
 - 
static from_pandas(obj, mask=None, type=None, bool safe=True, MemoryPool memory_pool=None)¶
- Convert pandas.Series to an Arrow Array. - This method uses Pandas semantics about what values indicate nulls. See pyarrow.array for more general conversion from arrays or sequences to Arrow arrays. - Parameters
- obj (ndarray, pandas.Series, array-like) – 
- mask (array (boolean), optional) – Indicate which values are null (True) or not null (False). 
- type (pyarrow.DataType) – Explicit type to attempt to coerce to, otherwise will be inferred from the data. 
- safe (bool, default True) – Check for overflows or other unsafe conversions. 
- memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool. 
 
 - Notes - Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC. - Returns
- array (pyarrow.Array or pyarrow.ChunkedArray) – ChunkedArray is returned if object data overflows binary buffer. 
 
 - 
indices¶
 - 
is_null(self)¶
- Return BooleanArray indicating the null values. 
 - 
is_valid(self)¶
- Return BooleanArray indicating the non-null values. 
 - 
nbytes¶
- Total number of bytes consumed by the elements of the array. 
 - 
null_count¶
 - 
offset¶
- A relative position into another array’s data. - The purpose is to enable zero-copy slicing. This value defaults to zero but must be applied on all operations with the physical storage buffers. 
 - 
slice(self, offset=0, length=None)¶
- Compute zero-copy slice of this array. - Parameters
- offset (int, default 0) – Offset from start of array to slice. 
- length (int, default None) – Length of slice (default is until end of Array starting from offset). 
 
- Returns
- sliced (Array) 
 
 - 
sum(self)¶
- Sum the values in a numerical array. 
 - 
take(self, indices)¶
- Select values from an array. See pyarrow.compute.take for full usage. 
 - 
to_numpy(self, zero_copy_only=True, writable=False)¶
- Return a NumPy view or copy of this array (experimental). - By default, tries to return a view of this array. This is only supported for primitive arrays with the same memory layout as NumPy (e.g. integers and floating point) and without any nulls. - Parameters
- zero_copy_only (bool, default True) – If True, an exception will be raised if the conversion to a numpy array would require copying the underlying data (e.g. in presence of nulls, or for non-primitive types). 
- writable (bool, default False) – For numpy arrays created with zero copy (view on the Arrow data), the resulting array is not writable (Arrow data is immutable). By setting this to True, a copy of the array is made to ensure it is writable. 
 
- Returns
- array (numpy.ndarray) 
 
 - 
to_pandas(self, memory_pool=None, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool timestamp_as_object=False, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False, bool safe=True, bool split_blocks=False, bool self_destruct=False, types_mapper=None)¶
- Convert to a pandas-compatible NumPy array or DataFrame, as appropriate - Parameters
- memory_pool (MemoryPool, default None) – Arrow MemoryPool to use for allocations. Uses the default memory pool if not passed. 
- strings_to_categorical (bool, default False) – Encode string (UTF8) and binary types to pandas.Categorical. 
- categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures. 
- zero_copy_only (bool, default False) – Raise an ArrowException if this function call would require copying the underlying data. 
- integer_object_nulls (bool, default False) – Cast integers with nulls to objects 
- date_as_object (bool, default True) – Cast dates to objects. If False, convert to datetime64[ns] dtype. 
- timestamp_as_object (bool, default False) – Cast non-nanosecond timestamps (np.datetime64) to objects. This is useful if you have timestamps that don’t fit in the normal date range of nanosecond timestamps (1678 CE-2262 CE). If False, all timestamps are converted to datetime64[ns] dtype. 
- use_threads (bool, default True) – Whether to parallelize the conversion using multiple threads. 
- deduplicate_objects (bool, default True) – Do not create multiple copies of Python objects when created, to save on memory use. Conversion will be slower. 
- ignore_metadata (bool, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present 
- safe (bool, default True) – For certain data types, a cast is needed in order to store the data in a pandas DataFrame or Series (e.g. timestamps are always stored as nanoseconds in pandas). This option controls whether it is a safe cast or not. 
- split_blocks (bool, default False) – If True, generate one internal “block” for each column when creating a pandas.DataFrame from a RecordBatch or Table. While this can temporarily reduce memory use, note that various pandas operations can trigger “consolidation”, which may balloon memory use. 
- self_destruct (bool, default False) – EXPERIMENTAL: If True, attempt to deallocate the originating Arrow memory while converting the Arrow object to pandas. If you use the object after calling to_pandas with this option it will crash your program. 
- types_mapper (function, default None) – A function mapping a pyarrow DataType to a pandas ExtensionDtype. This can be used to override the default pandas type for conversion of built-in pyarrow types or in absence of pandas_metadata in the Table schema. The function receives a pyarrow DataType and is expected to return a pandas ExtensionDtype or None if the default conversion should be used for that type. If you have a dictionary mapping, you can pass dict.get as the function.
 
- Returns
- pandas.Series or pandas.DataFrame depending on type of object 
 
 - 
to_pylist(self)¶
- Convert to a list of native Python objects. - Returns
- lst (list) 
 
 - 
to_string(self, int indent=0, int window=10)¶
 - 
tolist(self)¶
- Alias of to_pylist for compatibility with NumPy. 
 - 
type¶
 - 
unique(self)¶
- Compute distinct elements in array. 
 - 
validate(self, *, full=False)¶
- Perform validation checks. An exception is raised if validation fails. - By default only cheap validation checks are run. Pass full=True for thorough validation checks (potentially O(n)). - Parameters
- full (bool, default False) – If True, run expensive checks, otherwise cheap checks only. 
- Raises
- ArrowInvalid – 
 
 - 
value_counts(self)¶
- Compute counts of unique elements in array. - Returns
- An array of <input type “Values”, int64_t “Counts”> structs 
 
 
-