Data Objects
ArrayDict | A class representing an array dictionary.
IrregularTimeSeries | A class representing an irregular time series.
RegularTimeSeries | A class representing a regular time series.
Interval | A class representing an interval.
Data | A class representing data objects.
- class ArrayDict(**kwargs)
Bases: object
A dictionary of arrays that share the same first dimension. The number of dimensions for each array can be different, but they need to be at least 1-dimensional.
Example
>>> from temporaldata import ArrayDict
>>> import numpy as np
>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )
>>> units
ArrayDict(
  unit_id=[2],
  brain_region=[2],
  waveform_mean=[2, 48]
)
- select_by_mask(mask, **kwargs)
Return a new ArrayDict object where all array attributes are indexed using the boolean mask.
- Parameters:
mask (ndarray) – Boolean array used to index the first dimension of every array attribute.
Example
>>> from temporaldata import ArrayDict
>>> import numpy as np
>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )
>>> units_subset = units.select_by_mask(np.array([True, False]))
>>> units_subset
ArrayDict(
  unit_id=[1],
  brain_region=[1],
  waveform_mean=[1, 48]
)
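The shared-first-dimension rule and boolean-mask selection can be illustrated without NumPy. The following is a minimal sketch under stated assumptions, not the temporaldata implementation: `MiniArrayDict` is a hypothetical class, and it stores plain Python lists, so the "first dimension" of an attribute is simply its `len()`.

```python
from typing import Any, Dict, List


class MiniArrayDict:
    """Hypothetical stand-in for ArrayDict, using lists instead of NumPy arrays."""

    def __init__(self, **arrays: List[Any]) -> None:
        # Every attribute must share the same first dimension (here: the same length).
        lengths = {len(values) for values in arrays.values()}
        if len(lengths) > 1:
            raise ValueError(f"attributes have different first dimensions: {sorted(lengths)}")
        self.arrays: Dict[str, List[Any]] = dict(arrays)

    def __len__(self) -> int:
        # Length of the shared first dimension (0 if there are no attributes).
        return len(next(iter(self.arrays.values()))) if self.arrays else 0

    def select_by_mask(self, mask: List[bool]) -> "MiniArrayDict":
        if len(mask) != len(self):
            raise ValueError("mask length must match the shared first dimension")
        # Keep row i of every attribute wherever mask[i] is True.
        return MiniArrayDict(
            **{
                name: [row for row, keep in zip(values, mask) if keep]
                for name, values in self.arrays.items()
            }
        )


units = MiniArrayDict(
    unit_id=["unit01", "unit02"],
    brain_region=["M1", "M1"],
    waveform_mean=[[0.0] * 48, [1.0] * 48],
)
subset = units.select_by_mask([True, False])
print(len(subset), subset.arrays["unit_id"])  # 1 ['unit01']
```

Applying one mask to every attribute at once is what keeps the rows of all attributes aligned after selection.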
- classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)
Creates an ArrayDict object from a pandas DataFrame. The columns of the DataFrame are converted to arrays when possible; columns that cannot be converted are skipped.
- Parameters:
df (pandas.DataFrame) – DataFrame.
unsigned_to_long (bool, optional) – If True, automatically converts unsigned integers to int64. Defaults to True.
- to_hdf5(file)
Saves the data object to an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
import h5py
import numpy as np
from temporaldata import ArrayDict

data = ArrayDict(
    unit_id=np.array(["unit01", "unit02"]),
    brain_region=np.array(["M1", "M1"]),
    waveform_mean=np.zeros((2, 48)),
)

# Open the file for writing and store every array attribute in it.
with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)
- classmethod from_hdf5(file)
Loads the data object from an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
Note
This method loads all data into memory. If you would like to use lazy loading, call LazyArrayDict.from_hdf5() instead.
import h5py
from temporaldata import ArrayDict

with h5py.File("data.h5", "r") as f:
    # Every array is read from the file eagerly.
    data = ArrayDict.from_hdf5(f)
- class IrregularTimeSeries(timestamps, *, timekeys=None, domain, **kwargs)
Bases: ArrayDict
An irregular time series is defined by a set of timestamps and a set of attributes that must share the same first dimension as the timestamps. This data object is ideal for event-based data as well as irregularly sampled time series.
- Parameters:
timestamps (ndarray) – an array of timestamps of shape (N,).
timekeys (Optional[List[str]]) – a list of strings that specify which attributes are time-based attributes; this ensures that these attributes are updated appropriately when slicing.
domain (Union[Interval, str]) – an Interval object that defines the domain over which the time series is defined. If set to "auto", the domain is automatically set to the interval spanning the minimum and maximum timestamps.
**kwargs (Dict[str, ndarray]) – arrays that share the same first dimension N.
Example
>>> import numpy as np
>>> from temporaldata import IrregularTimeSeries
>>> spikes = IrregularTimeSeries(
...     unit_index=np.array([0, 0, 1, 0, 1, 2]),
...     timestamps=np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]),
...     waveforms=np.zeros((6, 48)),
...     domain="auto",
... )
>>> spikes
IrregularTimeSeries(
  timestamps=[6],
  unit_index=[6],
  waveforms=[6, 48]
)
>>> spikes.domain.start, spikes.domain.end
(array([0.1]), array([0.6]))
>>> spikes.keys()
['timestamps', 'unit_index', 'waveforms']
>>> spikes.is_sorted()
True
>>> slice_of_spikes = spikes.slice(0.2, 0.5)
>>> slice_of_spikes
IrregularTimeSeries(
  timestamps=[3],
  unit_index=[3],
  waveforms=[3, 48]
)
>>> slice_of_spikes.domain.start, slice_of_spikes.domain.end
(array([0.]), array([0.3]))
>>> slice_of_spikes.timestamps
array([0. , 0.1, 0.2])
- property domain
The time domain over which the time series is defined. Usually a single interval, but could also be a set of intervals.
- sort()
Sorts the timestamps, and reorders the other attributes accordingly. This method is applied in place.
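Sorting in place while keeping rows aligned amounts to computing an argsort of the timestamps and applying that permutation to every attribute. A minimal pure-Python sketch of the idea, assuming attributes are plain lists; `sort_in_place` is a hypothetical helper, not part of temporaldata:

```python
from typing import Dict, List


def sort_in_place(timestamps: List[float], attributes: Dict[str, List]) -> None:
    """Sort timestamps ascending and apply the same permutation to every attribute."""
    # Pure-Python argsort; sorted() is stable, so tied timestamps keep their original order.
    order = sorted(range(len(timestamps)), key=timestamps.__getitem__)
    # Slice assignment mutates the existing lists, mirroring an in-place sort.
    timestamps[:] = [timestamps[i] for i in order]
    for values in attributes.values():
        values[:] = [values[i] for i in order]


timestamps = [0.3, 0.1, 0.2]
attributes = {"unit_index": [2, 0, 1]}
sort_in_place(timestamps, attributes)
print(timestamps, attributes["unit_index"])  # [0.1, 0.2, 0.3] [0, 1, 2]
```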
- slice(start, end, reset_origin=True)
Returns a new IrregularTimeSeries object that contains the data between the start and end times. The end time is exclusive: the slice only includes data in \([\textrm{start}, \textrm{end})\).
If reset_origin is True, all time attributes are updated to be relative to the new start time, and the domain is updated accordingly.
Warning
If the time series is not sorted, it will be automatically sorted in place.
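The half-open window and the effect of reset_origin can be sketched with the standard-library `bisect` module. This is an illustration under stated assumptions, not the library's code: timestamps are a sorted Python list, and the returned domain is taken to be the slicing window itself, which matches the `slice(0.2, 0.5)` example above but may differ from how temporaldata resolves domains in general. `slice_timestamps` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Tuple


def slice_timestamps(
    timestamps: List[float], start: float, end: float, reset_origin: bool = True
) -> Tuple[List[float], Tuple[float, float]]:
    """Keep sorted timestamps in [start, end); optionally shift them so start becomes 0."""
    lo = bisect_left(timestamps, start)  # first index with t >= start (start is inclusive)
    hi = bisect_left(timestamps, end)  # first index with t >= end (end is exclusive)
    selected = timestamps[lo:hi]
    domain = (start, end)  # assumption: the new domain is the slicing window
    if reset_origin:
        # Express every kept time relative to the start of the window.
        selected = [t - start for t in selected]
        domain = (0.0, end - start)
    return selected, domain


selected, domain = slice_timestamps([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], 0.2, 0.5)
print(len(selected), domain)
```

Binary search is only valid on sorted input, which is one reason a slice has to sort an unsorted series first.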
- select_by_mask(mask)
Return a new IrregularTimeSeries object where all array attributes are indexed using the boolean mask.
Note that this does not update the domain, as it is unclear how the domain should be resolved when the mask is applied. If you wish to update the domain, you should do so manually.
- select_by_interval(interval)
Return a new IrregularTimeSeries object where all timestamps are within the interval.
- Parameters:
interval (Interval) – Interval object.
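One way to picture selection by an interval set is to keep every index whose timestamp falls inside at least one interval. The sketch below assumes each interval is half-open, [start, end), mirroring slice(); the documentation above does not state the boundary convention for select_by_interval, and `select_indices` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def select_indices(
    timestamps: Sequence[float], intervals: Sequence[Tuple[float, float]]
) -> List[int]:
    """Indices of timestamps lying in any interval, treating each interval as [start, end)."""
    return [
        i
        for i, t in enumerate(timestamps)
        if any(start <= t < end for start, end in intervals)
    ]


spike_times = [0.1, 0.2, 0.3, 2.1, 2.2, 2.3]
keep = select_indices(spike_times, [(0.0, 0.25), (2.0, 2.2)])
print(keep)  # [0, 1, 3]
```

The resulting index list can then be applied to every attribute, which keeps the rows aligned.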
- add_split_mask(name, interval)
Adds a boolean mask as an array attribute, which is defined for each timestamp and is set to True for all timestamps that are within interval. The mask attribute will be called <name>_mask.
This is used to mark points in the time series as part of the train, validation, or test sets, and is useful to ensure that there is no data leakage.
- classmethod from_dataframe(df, domain='auto', unsigned_to_long=True)
Create an IrregularTimeSeries object from a pandas DataFrame. The DataFrame must have a timestamps column named "timestamps" (use pd.DataFrame.rename if needed).
The columns in the DataFrame are converted to arrays when possible; columns that cannot be converted are skipped.
- Parameters:
df (DataFrame) – DataFrame.
unsigned_to_long (bool) – Whether to automatically convert unsigned integers to int64 dtype. Defaults to True.
domain (optional) – The domain over which the time series is defined. If set to "auto", the domain is automatically set to the interval spanning the minimum and maximum timestamps. Defaults to "auto".
- to_hdf5(file)
Saves the data object to an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
Warning
If the time series is not sorted, it will be automatically sorted in place.
import h5py
import numpy as np
from temporaldata import IrregularTimeSeries

data = IrregularTimeSeries(
    unit_index=np.array([0, 0, 1, 0, 1, 2]),
    timestamps=np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]),
    waveforms=np.zeros((6, 48)),
    domain="auto",
)

# Open the file for writing and store the timestamps, attributes, and domain.
with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)
- classmethod from_hdf5(file)
Loads the data object from an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
Note
This method loads all data into memory. If you would like to use lazy loading, call LazyIrregularTimeSeries.from_hdf5() instead.
import h5py
from temporaldata import IrregularTimeSeries

with h5py.File("data.h5", "r") as f:
    # Every array is read from the file eagerly.
    data = IrregularTimeSeries.from_hdf5(f)
- class RegularTimeSeries(*, sampling_rate, domain=None, domain_start=0.0, **kwargs)
Bases: ArrayDict
A regular time series is the same as an irregular time series, but it has a regular sampling rate. This allows faster indexing, patching of data, and meaningful Fourier operations. The first dimension of all attributes must be the time dimension.
Note
If you have a matrix of shape (N, T), where N is the number of channels and T is the number of time points, you should transpose it to (T, N) before passing it to the constructor, since the first dimension should always be time.
- Parameters:
sampling_rate (float) – Sampling rate in Hz.
domain (Optional[Interval]) – an Interval object that defines the domain over which the time series is defined. It is not possible to set domain to "auto".
**kwargs (Dict[str, ndarray]) – Arbitrary keyword arguments where the values are arbitrary multi-dimensional (2d, 3d, …, nd) arrays with shape (N, *).
Example
>>> import numpy as np
>>> from temporaldata import Interval, RegularTimeSeries
>>> lfp = RegularTimeSeries(
...     raw=np.zeros((1000, 128)),
...     sampling_rate=250.,
...     domain=Interval(0., 4.),
... )
>>> lfp.slice(0, 1)
RegularTimeSeries(
  raw=[250, 128]
)
>>> lfp.to_irregular()
IrregularTimeSeries(
  timestamps=[1000],
  raw=[1000, 128]
)
- select_by_mask(mask)
Return a new ArrayDict object where all array attributes are indexed using the boolean mask.
- Parameters:
mask (ndarray) – Boolean array used to index the first dimension of every array attribute.
Example
>>> from temporaldata import ArrayDict
>>> import numpy as np
>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )
>>> units_subset = units.select_by_mask(np.array([True, False]))
>>> units_subset
ArrayDict(
  unit_id=[1],
  brain_region=[1],
  waveform_mean=[1, 48]
)
- slice(start, end, reset_origin=True)
Returns a new RegularTimeSeries object that contains the data between the start (inclusive) and end (exclusive) times.
When slicing, the start and end times are rounded to the nearest timestamp.
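Because samples are evenly spaced, slicing reduces to index arithmetic. The sketch assumes sample i sits at domain_start + i / sampling_rate (the constructor's domain_start parameter suggests this, but it is an assumption) and rounds both bounds to the nearest sample index. `slice_indices` is a hypothetical helper, and it does not clamp the result to the length of the data.

```python
def slice_indices(
    start: float, end: float, sampling_rate: float, domain_start: float = 0.0
) -> range:
    """Sample indices covering [start, end) in a regularly sampled series (sketch)."""
    # Convert times to fractional sample positions, then round to the nearest sample.
    # Python's round() rounds exact halves to the nearest even integer.
    first = round((start - domain_start) * sampling_rate)
    stop = round((end - domain_start) * sampling_rate)
    return range(first, stop)  # the end index is exclusive, like the end time


print(len(slice_indices(0.0, 1.0, 250.0)))  # 250
```

With a 250 Hz series, a one-second window maps to 250 samples, consistent with the `lfp.slice(0, 1)` example above.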
- add_split_mask(name, interval)
Adds a boolean mask as an array attribute, which is defined for each timestamp and is set to True for all timestamps that are within interval. The mask attribute will be called <name>_mask.
This is used to mark points in the time series as part of the train, validation, or test sets, and is useful to ensure that there is no data leakage.
- property timestamps
Returns the timestamps of the time series.
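A regular series does not need to store its timestamps, since they can be regenerated from the sampling rate. A minimal sketch, assuming timestamps of the form domain_start + i / sampling_rate (an assumption, not confirmed by the text above); `regular_timestamps` is a hypothetical name:

```python
from typing import List


def regular_timestamps(
    num_samples: int, sampling_rate: float, domain_start: float = 0.0
) -> List[float]:
    """Timestamps of num_samples evenly spaced samples (assumed formula)."""
    # Compute each timestamp from its index instead of accumulating 1 / sampling_rate,
    # so rounding errors do not build up along the series.
    return [domain_start + i / sampling_rate for i in range(num_samples)]


ts = regular_timestamps(1000, 250.0)
print(len(ts), ts[0], ts[250])  # 1000 0.0 1.0
```

For the 1000-sample, 250 Hz series in the example above, this yields 1000 timestamps, matching the size reported by `to_irregular()`.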
- to_hdf5(file)
Saves the data object to an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
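import h5py
import numpy as np
from temporaldata import Interval, RegularTimeSeries

data = RegularTimeSeries(
    raw=np.zeros((1000, 128)),
    sampling_rate=250.,
    domain=Interval(0., 4.),
)

# Open the file for writing and store the samples, sampling rate, and domain.
with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)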
- classmethod from_hdf5(file)
Loads the data object from an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
Note
This method loads all data into memory. If you would like to use lazy loading, call LazyRegularTimeSeries.from_hdf5() instead.
import h5py
from temporaldata import RegularTimeSeries

with h5py.File("data.h5", "r") as f:
    # Every array is read from the file eagerly.
    data = RegularTimeSeries.from_hdf5(f)
- classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)
Creates an ArrayDict object from a pandas DataFrame. The columns of the DataFrame are converted to arrays when possible; columns that cannot be converted are skipped.
- Parameters:
df (pandas.DataFrame) – DataFrame.
unsigned_to_long (bool, optional) – If True, automatically converts unsigned integers to int64. Defaults to True.
- class Interval(start, end, *, timekeys=None, **kwargs)
Bases: ArrayDict
An interval object is a set of time intervals, each defined by a start time and an end time. For Interval, we do not need to define a domain, since the interval itself is its own domain.
- Parameters:
start (Union[float, ndarray]) – an array of start times of shape (N,), or a float.
end (Union[float, ndarray]) – an array of end times of shape (N,), or a float.
timekeys – a list of strings that specify which attributes are time-based attributes.
**kwargs – arrays that share the same first dimension N.
Example
>>> import numpy as np
>>> from temporaldata import Interval
>>> intervals = Interval(
...     start=np.array([0., 1., 2.]),
...     end=np.array([1., 2., 3.]),
...     go_cue_time=np.array([0.5, 1.5, 2.5]),
...     drifting_gratings_dir=np.array([0, 45, 90]),
...     timekeys=["start", "end", "go_cue_time"],
... )
>>> intervals
Interval(
  start=[3],
  end=[3],
  go_cue_time=[3],
  drifting_gratings_dir=[3]
)
>>> intervals.keys()
['start', 'end', 'go_cue_time', 'drifting_gratings_dir']
>>> intervals.is_sorted()
True
>>> intervals.is_disjoint()
True
>>> intervals.slice(1.5, 2.5)
Interval(
  start=[2],
  end=[2],
  go_cue_time=[2],
  drifting_gratings_dir=[2]
)
An Interval object with a single interval can be created simply by passing a single float to the start and end arguments.
Example
>>> Interval(0., 1.)
Interval(
  start=[1],
  end=[1]
)
- is_disjoint()
Returns True if the intervals are disjoint, i.e. if no two intervals overlap.
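A disjointness check only needs to compare neighbours once the intervals are sorted by start time. The sketch below treats intervals that merely touch as disjoint, which is consistent with the Interval example above reporting `is_disjoint()` as True for [0, 1], [1, 2], [2, 3]. `spans_are_disjoint` is a hypothetical helper on (start, end) tuples, not the library's code.

```python
from typing import Sequence, Tuple


def spans_are_disjoint(intervals: Sequence[Tuple[float, float]]) -> bool:
    """True if no two (start, end) intervals overlap; touching endpoints are allowed."""
    ordered = sorted(intervals)
    # Once sorted by start, an overlap anywhere implies an overlap between neighbours,
    # so comparing each interval with the next one is enough.
    return all(
        prev_end <= next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


print(spans_are_disjoint([(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]))  # True
print(spans_are_disjoint([(0.0, 2.0), (1.0, 3.0)]))  # False
```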
- sort()
Sorts the intervals, and reorders the other attributes accordingly. Sorting is performed in place.
Note
This method only works if the intervals are disjoint. If the intervals overlap, it is not possible to resolve the order of the intervals, and this method will raise an error.
- slice(start, end, reset_origin=True)
Returns a new Interval object that contains the data between the start and end times. An interval is included if it has any overlap with the slicing window. The end time is exclusive.
If reset_origin is set to True, all time attributes will be updated to be relative to the new start time.
Warning
If the intervals are not sorted, they will be automatically sorted in place.
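The inclusion rule can be written as a single comparison per interval. The sketch assumes an interval (a, b) overlaps the window [start, end) when a < end and b > start, so an interval that ends exactly at start is left out; it only shows which intervals are kept, not whether their bounds are clipped to the window. `overlapping_indices` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def overlapping_indices(
    intervals: Sequence[Tuple[float, float]], start: float, end: float
) -> List[int]:
    """Indices of intervals that share some time with the window [start, end)."""
    return [i for i, (a, b) in enumerate(intervals) if a < end and b > start]


trials = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
print(overlapping_indices(trials, 1.5, 2.5))  # [1, 2]
```

Under this rule, two intervals survive the window [1.5, 2.5), matching the `intervals.slice(1.5, 2.5)` example above.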
- select_by_mask(mask)
Return a new Interval object where all array attributes are indexed using the boolean mask.
- select_by_interval(interval)
Return a new Interval object restricted to the given interval.
- Parameters:
interval (Interval) – Interval object.
- dilate(size, max_len=None)
Dilates the intervals by a given size. The dilation is performed in both directions. This operation is designed not to create overlapping intervals: for a given interval and a given direction, dilation stops if another interval is too close. If the distance between two intervals is less than size, both of them dilate until they meet halfway, but they never overlap. You can think of dilation as inflating balloons that never merge and that stop each other from expanding too far.
- Parameters:
size (float) – The size of the dilation.
max_len – Dilation will not exceed this maximum length. For intervals that are already longer than max_len, there will be no dilation. By default, there is no maximum length.
- coalesce(eps=1e-06)
Coalesces the intervals that are closer than eps. This operation returns a new Interval object, and does not resolve the existing attributes.
- Parameters:
eps – The distance threshold for coalescing the intervals. Defaults to 1e-6.
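A sketch of coalescing, assuming "closer than eps" means the gap between one interval's end and the next interval's start is strictly smaller than eps, and that overlapping intervals are merged too. `coalesce_spans` is a hypothetical helper operating on (start, end) tuples; like the method described above, it keeps no per-interval attributes, since merged intervals no longer map one-to-one onto the originals.

```python
from typing import List, Sequence, Tuple


def coalesce_spans(
    intervals: Sequence[Tuple[float, float]], eps: float = 1e-6
) -> List[Tuple[float, float]]:
    """Merge intervals whose gap is smaller than eps (sketch on sorted copies)."""
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < eps:
            # Close enough to (or overlapping) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(coalesce_spans([(0.0, 1.0), (1.0, 2.0), (3.0, 4.0)]))  # [(0.0, 2.0), (3.0, 4.0)]
```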
- difference(other)
Returns the difference between two sets of intervals. The intervals are redefined so as not to intersect with any interval in other.
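Set difference on intervals can be sketched by cutting every removed span out of each interval, which may leave zero, one, or two pieces per cut. This is a hypothetical, quadratic-time illustration on (start, end) tuples (`interval_difference` is not part of temporaldata), and it ignores any attributes attached to the intervals.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]


def interval_difference(intervals: Sequence[Span], other: Sequence[Span]) -> List[Span]:
    """Remove from each interval every portion covered by an interval in `other`."""
    result: List[Span] = []
    for start, end in intervals:
        pieces = [(start, end)]
        for o_start, o_end in other:
            next_pieces = []
            for a, b in pieces:
                # Keep the part of the piece that lies before the removed span, if any.
                if a < min(b, o_start):
                    next_pieces.append((a, min(b, o_start)))
                # Keep the part of the piece that lies after the removed span, if any.
                if max(a, o_end) < b:
                    next_pieces.append((max(a, o_end), b))
            pieces = next_pieces
        result.extend(pieces)
    return result


print(interval_difference([(0.0, 10.0)], [(2.0, 3.0), (5.0, 6.0)]))
# [(0.0, 2.0), (3.0, 5.0), (6.0, 10.0)]
```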
- split(sizes, *, shuffle=False, random_seed=None)
Splits the set of intervals into multiple subsets. This returns a number of new Interval objects equal to the number of elements in sizes. If shuffle is set to True, the intervals are shuffled before splitting.
- Parameters:
sizes – one entry per subset to create.
shuffle (bool) – Whether to shuffle the intervals before splitting. Defaults to False.
random_seed – Random seed used when shuffling. Defaults to None.
Note
This method does not guarantee that the resulting sets are disjoint if the intervals are not already disjoint.
- add_split_mask(name, interval)
Adds a boolean mask as an array attribute, which is defined for each interval in the object and is set to True if the interval intersects with the provided Interval object. The mask attribute will be called <name>_mask.
This is used to mark intervals as part of the train, validation, or test sets, and is useful to ensure that there is no data leakage.
If an interval belongs to multiple splits, an error is raised, unless this is expected, in which case allow_split_mask_overlap() should be called.
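The split-mask bookkeeping, including the overlap check, can be sketched on (start, end) tuples. Everything here is an assumption for illustration: `split_masks` is a hypothetical helper, its `allow_overlap` flag stands in for calling allow_split_mask_overlap(), and "intersects" is taken to mean a < e and b > s for half-open intervals.

```python
from typing import Dict, List, Sequence, Tuple

Span = Tuple[float, float]


def split_masks(
    intervals: Sequence[Span],
    splits: Dict[str, Sequence[Span]],
    allow_overlap: bool = False,
) -> Dict[str, List[bool]]:
    """One boolean mask per split, named "<name>_mask" (hypothetical helper)."""
    masks = {
        f"{name}_mask": [
            # An interval joins a split if it intersects any of the split's intervals.
            any(a < e and b > s for s, e in split_spans)
            for a, b in intervals
        ]
        for name, split_spans in splits.items()
    }
    if not allow_overlap:
        # Reject any interval that landed in more than one split.
        for i in range(len(intervals)):
            if sum(mask[i] for mask in masks.values()) > 1:
                raise ValueError(f"interval {i} belongs to more than one split")
    return masks


trials = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
masks = split_masks(trials, {"train": [(0.0, 2.0)], "test": [(2.0, 3.0)]})
print(masks)  # {'train_mask': [True, True, False], 'test_mask': [False, False, True]}
```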
- allow_split_mask_overlap()
Disables the check for split mask overlap. This means there could be an overlap between the intervals across different splits. This is useful when an interval is allowed to belong to multiple splits.
- classmethod linspace(start, end, steps)
Create a regular interval with a given number of samples.
Example
>>> from temporaldata import Interval
>>> interval = Interval.linspace(0., 10., 100)
>>> interval
Interval(
  start=[100],
  end=[100]
)
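The example yields 100 intervals for 100 steps, which is consistent with splitting the range into `steps` contiguous, equal-width intervals. The sketch below assumes exactly that; `linspace_intervals` is hypothetical and returns (start, end) tuples rather than an Interval object.

```python
from typing import List, Tuple


def linspace_intervals(start: float, end: float, steps: int) -> List[Tuple[float, float]]:
    """Split [start, end] into `steps` contiguous, equal-width intervals (assumed semantics)."""
    width = (end - start) / steps
    # Boundaries are computed from their index, and the last one is pinned to `end`
    # so floating-point rounding cannot leave a gap at the end of the range.
    bounds = [start + i * width for i in range(steps)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


grid = linspace_intervals(0.0, 10.0, 100)
print(len(grid), grid[0], grid[-1][1])  # 100 (0.0, 0.1) 10.0
```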
- classmethod arange(start, end, step, include_end=True)
Create a grid of intervals with a given step size. If the last step cannot reach the end time, a final interval that stops at the end time, and is therefore shorter than step, is added. This behavior can be changed by setting include_end to False.
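A sketch of the grid construction, assuming that include_end=False simply drops the shorter final interval and that no extra interval is added when end is reached exactly. `arange_intervals` is hypothetical and returns (start, end) tuples rather than an Interval object.

```python
from typing import List, Tuple


def arange_intervals(
    start: float, end: float, step: float, include_end: bool = True
) -> List[Tuple[float, float]]:
    """Consecutive intervals of length `step` from start to end (assumed semantics)."""
    if step <= 0:
        raise ValueError("step must be positive")
    spans = []
    i = 0
    # Emit full-length intervals while they fit before `end`; bounds come from the index.
    while start + (i + 1) * step <= end:
        spans.append((start + i * step, start + (i + 1) * step))
        i += 1
    last = start + i * step
    # Optionally close the remaining gap with one shorter interval.
    if include_end and last < end:
        spans.append((last, end))
    return spans


print(arange_intervals(0, 10, 3))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
print(arange_intervals(0, 10, 3, include_end=False))  # [(0, 3), (3, 6), (6, 9)]
```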
- classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)
Create an Interval object from a pandas DataFrame. The DataFrame must have start time and end time columns, named "start" and "end" respectively (use pd.DataFrame.rename if needed).
The columns in the DataFrame are converted to arrays when possible; columns that cannot be converted are skipped.
- Parameters:
df (pandas.DataFrame) – DataFrame.
unsigned_to_long (bool, optional) – Whether to automatically convert unsigned integers to int64 dtype. Defaults to True.
- classmethod from_list(interval_list)
Create an Interval object from a list of (start, end) tuples.
Example
>>> from temporaldata import Interval
>>> interval_list = [(0, 1), (1, 2), (2, 3)]
>>> interval = Interval.from_list(interval_list)
>>> interval.start, interval.end
(array([0., 1., 2.]), array([1., 2., 3.]))
- to_hdf5(file)
Saves the data object to an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
import h5py
import numpy as np
from temporaldata import Interval

interval = Interval(
    start=np.array([0, 1, 2]),
    end=np.array([1, 2, 3]),
    go_cue_time=np.array([0.5, 1.5, 2.5]),
    drifting_gratings_dir=np.array([0, 45, 90]),
    timekeys=["start", "end", "go_cue_time"],
)

# Open the file for writing and store the intervals and their attributes.
with h5py.File("data.h5", "w") as f:
    interval.to_hdf5(f)
- classmethod from_hdf5(file)
Loads the data object from an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
Note
This method loads all data into memory. If you would like to use lazy loading, call LazyInterval.from_hdf5() instead.
import h5py
from temporaldata import Interval

with h5py.File("data.h5", "r") as f:
    # Every array is read from the file eagerly.
    interval = Interval.from_hdf5(f)
- class Data(*, domain=None, **kwargs)
Bases: object
A data object is a container for other data objects such as ArrayDict, RegularTimeSeries, IrregularTimeSeries, and Interval objects, as well as regular objects such as scalars, strings, and NumPy arrays.
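- Parameters:
domain (Interval, optional) – the time domain over which the data object is defined.
**kwargs – the data objects (ArrayDict, IrregularTimeSeries, RegularTimeSeries, Interval, or Data) and other values (scalars, strings, NumPy arrays) to store as attributes.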
Example
>>> import numpy as np
>>> from temporaldata import (
...     ArrayDict,
...     IrregularTimeSeries,
...     RegularTimeSeries,
...     Interval,
...     Data,
... )
>>> data = Data(
...     session_id="session_0",
...     spikes=IrregularTimeSeries(
...         timestamps=np.array([0.1, 0.2, 0.3, 2.1, 2.2, 2.3]),
...         unit_index=np.array([0, 0, 1, 0, 1, 2]),
...         waveforms=np.zeros((6, 48)),
...         domain=Interval(0., 3.),
...     ),
...     lfp=RegularTimeSeries(
...         raw=np.zeros((1000, 3)),
...         sampling_rate=250.,
...         domain=Interval(0., 4.),
...     ),
...     units=ArrayDict(
...         id=np.array(["unit_0", "unit_1", "unit_2"]),
...         brain_region=np.array(["M1", "M1", "PMd"]),
...     ),
...     trials=Interval(
...         start=np.array([0, 1, 2]),
...         end=np.array([1, 2, 3]),
...         go_cue_time=np.array([0.5, 1.5, 2.5]),
...         drifting_gratings_dir=np.array([0, 45, 90]),
...     ),
...     drifting_gratings_imgs=np.zeros((8, 3, 32, 32)),
...     domain=Interval(0., 4.),
... )
>>> data
Data(
  session_id='session_0',
  spikes=IrregularTimeSeries(
    timestamps=[6],
    unit_index=[6],
    waveforms=[6, 48]
  ),
  lfp=RegularTimeSeries(
    raw=[1000, 3]
  ),
  units=ArrayDict(
    id=[3],
    brain_region=[3]
  ),
  trials=Interval(
    start=[3],
    end=[3],
    go_cue_time=[3],
    drifting_gratings_dir=[3]
  ),
  drifting_gratings_imgs=[8, 3, 32, 32],
)
>>> data.slice(1, 3)
Data(
  session_id='session_0',
  spikes=IrregularTimeSeries(
    timestamps=[3],
    unit_index=[3],
    waveforms=[3, 48]
  ),
  lfp=RegularTimeSeries(
    raw=[500, 3]
  ),
  units=ArrayDict(
    id=[3],
    brain_region=[3]
  ),
  trials=Interval(
    start=[2],
    end=[2],
    go_cue_time=[2],
    drifting_gratings_dir=[2]
  ),
  drifting_gratings_imgs=[8, 3, 32, 32],
  _absolute_start=1.0,
)
- property domain
Returns the domain of the data object.
- property start
Returns the start time of the data object.
- property end
Returns the end time of the data object.
- property absolute_start
Returns the start time of this slice relative to the original start time. This is 0.0 if the data object has not been sliced.
Example
>>> from temporaldata import Data, Interval
>>> data = Data(domain=Interval(0., 4.))
>>> data.absolute_start
0.0
>>> data = data.slice(1, 3)
>>> data.absolute_start
1.0
>>> data = data.slice(0.4, 1.4)
>>> data.absolute_start
1.4
- slice(start, end, reset_origin=True)
Returns a new Data object that contains the data between the start and end times. This method will slice all time-based attributes that are present in the data object.
- select_by_interval(interval)
Return a new Data object restricted to the given interval.
- Parameters:
interval (Interval) – Interval object.
- to_hdf5(file, serialize_fn_map=None)
Saves the data object to an HDF5 file. This method also calls the to_hdf5 method of every contained data object, so the entire data object is saved to the HDF5 file; there is no need to call to_hdf5 on each contained object individually.
- Parameters:
file (h5py.File) – HDF5 file.
import h5py
from temporaldata import Data

data = Data(...)
# Contained data objects are written recursively by this single call.
with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)
- classmethod from_hdf5(file, lazy=True)
Loads the data object from an HDF5 file. This method also calls the from_hdf5 method of every contained data object, so the entire data object is loaded from the HDF5 file; there is no need to call from_hdf5 on each contained object individually.
- Parameters:
file (h5py.File) – HDF5 file.
Note
This method loads all data into memory. If you would like to use lazy loading, call LazyData.from_hdf5() instead.
import h5py
from temporaldata import Data

with h5py.File("data.h5", "r") as f:
    data = Data.from_hdf5(f)
- add_split_mask(name, interval)
Creates split masks for all Data, Interval, and IrregularTimeSeries objects contained within this Data object.
- get_nested_attribute(path)
Returns the attribute specified by the path. The path can be nested using dots. For example, if the path is “spikes.timestamps”, this method will return the timestamps attribute of the spikes object.
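Resolving a dotted path amounts to a left fold of `getattr` over the path components. A minimal sketch, where `resolve_path` is a hypothetical helper and `types.SimpleNamespace` stands in for a Data object:

```python
from functools import reduce
from types import SimpleNamespace
from typing import Any


def resolve_path(obj: Any, path: str) -> Any:
    """Return obj.<a>.<b>... for a dotted path such as "spikes.timestamps"."""
    # reduce applies getattr(current, name) for each component, left to right;
    # a missing component raises AttributeError.
    return reduce(getattr, path.split("."), obj)


data = SimpleNamespace(spikes=SimpleNamespace(timestamps=[0.1, 0.2, 0.3]))
print(resolve_path(data, "spikes.timestamps"))  # [0.1, 0.2, 0.3]
```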
Functions
concat | Concatenate multiple data objects into a single object.
- concat(objs, sort=True)
Concatenates multiple time series objects into a single object.
- Parameters:
objs (List[Union[IrregularTimeSeries, RegularTimeSeries]]) – List of time series objects to concatenate.
sort (bool, optional) – Whether to sort the resulting time series by timestamps. Only applies to IrregularTimeSeries. Defaults to True.
- Returns:
The concatenated time series object.
- Return type:
Union[IrregularTimeSeries, RegularTimeSeries]
- Raises:
ValueError – If objects are not all of the same type or don’t have matching keys.
NotImplementedError – If concatenation is not implemented for the given object type.
Example
>>> import numpy as np
>>> from temporaldata import IrregularTimeSeries, Interval, concat
>>> ts1 = IrregularTimeSeries(
...     timestamps=np.array([0.0, 1.0]),
...     values=np.array([1.0, 2.0]),
...     domain="auto",
... )
>>> ts2 = IrregularTimeSeries(
...     timestamps=np.array([2.0, 3.0]),
...     values=np.array([3.0, 4.0]),
...     domain="auto",
... )
>>> ts_concat = concat([ts1, ts2])
>>> ts_concat
IrregularTimeSeries(
  timestamps=[4],
  values=[4]
)
>>> ts_concat.timestamps
array([0., 1., 2., 3.])
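The concatenation rules described for concat (matching keys, optional sort by timestamps) can be sketched on plain dictionaries of lists. This hypothetical `concat_columns` helper only models the IrregularTimeSeries case and does not attempt to merge domains, so it is an illustration of the rules rather than the library's implementation.

```python
from typing import Dict, List

Columns = Dict[str, List[float]]


def concat_columns(parts: List[Columns], sort: bool = True) -> Columns:
    """Concatenate column dicts that share the same keys, optionally sorting by timestamps."""
    keys = list(parts[0])
    if any(set(part) != set(keys) for part in parts):
        raise ValueError("all objects must have the same keys")
    # Join each column across parts, preserving the order in which parts are given.
    out = {key: [v for part in parts for v in part[key]] for key in keys}
    if sort:
        # Reorder every column by ascending timestamp so rows stay aligned.
        order = sorted(range(len(out["timestamps"])), key=out["timestamps"].__getitem__)
        out = {key: [values[i] for i in order] for key, values in out.items()}
    return out


late = {"timestamps": [2.0, 3.0], "values": [3.0, 4.0]}
early = {"timestamps": [0.0, 1.0], "values": [1.0, 2.0]}
print(concat_columns([late, early]))
# {'timestamps': [0.0, 1.0, 2.0, 3.0], 'values': [1.0, 2.0, 3.0, 4.0]}
```

Passing the parts out of order shows why the sort step matters: without it, the concatenated timestamps would not be monotonic.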