ArrayDict

class ArrayDict(**kwargs)

A dictionary of arrays that share the same first dimension. The arrays may differ in their number of dimensions, but each must be at least 1-dimensional.

Parameters: **kwargs (ndarray) – Arrays that share the same first dimension.

Example

>>> from temporaldata import ArrayDict
>>> import numpy as np

>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )

>>> units
ArrayDict(
  unit_id=[2],
  brain_region=[2],
  waveform_mean=[2, 48]
)
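
The shared-first-dimension requirement can be illustrated with plain NumPy. The sketch below only illustrates the constraint: the helper `first_dims` is hypothetical, is not part of temporaldata, and makes no claim about how ArrayDict validates its inputs.

```python
import numpy as np


def first_dims(**arrays):
    """Return the first-dimension length of each array (hypothetical helper)."""
    for name, arr in arrays.items():
        # Scalars (0-dimensional arrays) have no first dimension to share.
        if np.ndim(arr) < 1:
            raise ValueError(f"{name!r} must be at least 1-dimensional")
    return {name: np.shape(arr)[0] for name, arr in arrays.items()}


dims = first_dims(
    unit_id=np.array(["unit01", "unit02"]),
    brain_region=np.array(["M1", "M1"]),
    waveform_mean=np.zeros((2, 48)),
)

# The arrays agree on the first dimension even though their ranks differ.
shared = len(set(dims.values())) == 1
```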

keys()

Returns a list of all array attribute names.

Return type: List[str]

select_by_mask(mask, **kwargs)

Return a new ArrayDict object where all array attributes are indexed using the boolean mask.

Parameters:

mask (ndarray) – Boolean array used for masking. It must be 1-dimensional and have the same length as the first dimension of the ArrayDict.
**kwargs – Private attributes that are not masked must be passed as keyword arguments.

Example

>>> from temporaldata import ArrayDict
>>> import numpy as np

>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )

>>> units_subset = units.select_by_mask(np.array([True, False]))
>>> units_subset
ArrayDict(
  unit_id=[1],
  brain_region=[1],
  waveform_mean=[1, 48]
)
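
As a rough mental model, masking indexes every array along its first dimension with the same boolean mask. Below is a minimal plain-NumPy sketch, assuming that behaviour. The dictionary `arrays` merely stands in for the ArrayDict and is not the library's internal representation.

```python
import numpy as np

arrays = {
    "unit_id": np.array(["unit01", "unit02"]),
    "brain_region": np.array(["M1", "M1"]),
    "waveform_mean": np.zeros((2, 48)),
}
mask = np.array([True, False])

# The mask must be 1-dimensional, boolean, and match the shared first dimension.
assert mask.ndim == 1 and mask.dtype == bool
assert all(len(arr) == len(mask) for arr in arrays.values())

# Boolean indexing keeps the rows where the mask is True;
# trailing dimensions (here the 48 waveform samples) are left untouched.
subset = {name: arr[mask] for name, arr in arrays.items()}
shapes = {name: arr.shape for name, arr in subset.items()}
```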

classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)

Creates an ArrayDict object from a pandas DataFrame.

Columns in the DataFrame are converted to arrays when possible; columns that cannot be converted are skipped.

Parameters:

df (pandas.DataFrame) – The DataFrame to convert.
unsigned_to_long (bool, optional) – If True, automatically converts unsigned integers to int64. Defaults to True.
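
The effect of unsigned_to_long can be pictured with NumPy alone. The sketch below assumes the conversion amounts to casting unsigned integer arrays to int64 and leaving every other dtype alone. `to_long_if_unsigned` is a hypothetical helper, not a temporaldata function, and the column names are made up for illustration.

```python
import numpy as np


def to_long_if_unsigned(arr, unsigned_to_long=True):
    """Cast unsigned integer arrays to int64 (hypothetical helper)."""
    if unsigned_to_long and np.issubdtype(arr.dtype, np.unsignedinteger):
        return arr.astype(np.int64)
    return arr


spike_count = np.array([3, 0, 7], dtype=np.uint16)
firing_rate = np.array([1.5, 0.0, 3.2], dtype=np.float32)

converted = to_long_if_unsigned(spike_count)  # unsigned, so it is cast
untouched = to_long_if_unsigned(firing_rate)  # float, so it is returned as-is
kept = to_long_if_unsigned(spike_count, unsigned_to_long=False)  # conversion disabled
```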

to_hdf5(file)

Saves the data object to an HDF5 file.

Parameters: file (h5py.File) – HDF5 file.

import h5py
import numpy as np
from temporaldata import ArrayDict

data = ArrayDict(
    unit_id=np.array(["unit01", "unit02"]),
    brain_region=np.array(["M1", "M1"]),
    waveform_mean=np.zeros((2, 48)),
)

with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)

classmethod from_hdf5(file)

Loads the data object from an HDF5 file.

Parameters: file (h5py.File) – HDF5 file.

Note

This method loads all data into memory. To use lazy loading, call LazyArrayDict.from_hdf5() instead.

import h5py
from temporaldata import ArrayDict

with h5py.File("data.h5", "r") as f:
    data = ArrayDict.from_hdf5(f)

materialize()

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type: ArrayDict

class LazyArrayDict(**kwargs)

Lazy variant of ArrayDict. The data is not loaded until it is accessed. This class is meant to be used when the data is too large to fit in memory, and is intended to be instantiated via LazyArrayDict.from_hdf5.

Note

To access an attribute without triggering in-memory loading, use self.__dict__[key]. Accessing it through self.key or getattr(self, key) triggers lazy loading: the h5py dataset is converted to a numpy array and any outstanding masks are applied.
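
The difference between the two access paths can be sketched with a toy class. This is a pure-Python illustration of the general lazy-loading pattern, not the temporaldata implementation: `Pending`, `LazyBox`, and `load_waveforms` are hypothetical names, and a plain list stands in for the h5py dataset.

```python
class Pending:
    """Marks a value that has not been loaded yet (hypothetical)."""

    def __init__(self, load):
        self.load = load  # zero-argument callable that produces the data


class LazyBox:
    """Toy container that resolves Pending values on attribute access."""

    def __init__(self, **values):
        self.__dict__.update(values)

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if isinstance(value, Pending):
            # First regular access: load the data and cache the result.
            value = value.load()
            self.__dict__[name] = value
        return value


calls = []


def load_waveforms():
    calls.append("waveform_mean")  # record every real load
    return [[0.0] * 48 for _ in range(2)]


box = LazyBox(waveform_mean=Pending(load_waveforms))

raw = box.__dict__["waveform_mean"]  # bypasses loading, still a Pending marker
loads_after_raw_access = len(calls)

loaded = box.waveform_mean  # triggers the load
again = getattr(box, "waveform_mean")  # served from the cached value
loads_after_attribute_access = len(calls)
```

Reading through `__dict__` returns whatever is stored without running the loader, which is why it is the safe way to inspect an attribute that may not have been loaded yet.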

select_by_mask(mask)

Return a new ArrayDict object where all array attributes are indexed using the boolean mask.

Parameters:

mask (ndarray) – Boolean array used for masking. It must be 1-dimensional and have the same length as the first dimension of the ArrayDict.
**kwargs – Private attributes that are not masked must be passed as keyword arguments.

Example

>>> from temporaldata import ArrayDict
>>> import numpy as np

>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )

>>> units_subset = units.select_by_mask(np.array([True, False]))
>>> units_subset
ArrayDict(
  unit_id=[1],
  brain_region=[1],
  waveform_mean=[1, 48]
)

classmethod from_dataframe(df, unsigned_to_long=True)

Creates an ArrayDict object from a pandas DataFrame.

Columns in the DataFrame are converted to arrays when possible; columns that cannot be converted are skipped.

Parameters:

df (pandas.DataFrame) – The DataFrame to convert.
unsigned_to_long (bool, optional) – If True, automatically converts unsigned integers to int64. Defaults to True.

to_hdf5(file)

Saves the data object to an HDF5 file.

Parameters: file (h5py.File) – HDF5 file.

import h5py
import numpy as np
from temporaldata import ArrayDict

data = ArrayDict(
    unit_id=np.array(["unit01", "unit02"]),
    brain_region=np.array(["M1", "M1"]),
    waveform_mean=np.zeros((2, 48)),
)

with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)

classmethod from_hdf5(file)

Loads the data object from an HDF5 file.

Parameters: file (h5py.File) – HDF5 file.

import h5py
from temporaldata import ArrayDict

with h5py.File("data.h5", "r") as f:
    data = ArrayDict.from_hdf5(f)