Data Objects

ArrayDict

A class representing an array dictionary.

IrregularTimeSeries

A class representing an irregular time series.

RegularTimeSeries

A class representing a regular time series.

Interval

A class representing an interval.

Data

A class representing data objects.

class ArrayDict(**kwargs)[source]

Bases: object

A dictionary of arrays that share the same first dimension. The number of dimensions for each array can be different, but they need to be at least 1-dimensional.

Parameters:

**kwargs (Dict[str, ndarray]) – arrays that share the same first dimension.

Example

>>> from temporaldata import ArrayDict
>>> import numpy as np

>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )

>>> units
ArrayDict(
  unit_id=[2],
  brain_region=[2],
  waveform_mean=[2, 48]
)
keys()[source]

Returns a list of all array attribute names.

Return type:

List[str]

select_by_mask(mask, **kwargs)[source]

Return a new ArrayDict object where all array attributes are indexed using the boolean mask.

Parameters:
  • mask (ndarray) – Boolean array used for masking. The mask needs to be 1-dimensional, and of equal length as the first dimension of the ArrayDict.

  • **kwargs – Private attributes that will not be masked will need to be passed as arguments.

Example

>>> from temporaldata import ArrayDict
>>> import numpy as np

>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )

>>> units_subset = units.select_by_mask(np.array([True, False]))
>>> units_subset
ArrayDict(
  unit_id=[1],
  brain_region=[1],
  waveform_mean=[1, 48]
)
classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)[source]

Creates an ArrayDict object from a pandas DataFrame.

The columns in the DataFrame are converted to arrays when possible, otherwise they will be skipped.

Parameters:
  • df (pandas.DataFrame) – DataFrame.

  • unsigned_to_long (bool, optional) – If True, automatically converts unsigned integers to int64. Defaults to True.

to_hdf5(file)[source]

Saves the data object to an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

import h5py
import numpy as np
from temporaldata import ArrayDict

data = ArrayDict(
    unit_id=np.array(["unit01", "unit02"]),
    brain_region=np.array(["M1", "M1"]),
    waveform_mean=np.zeros((2, 48)),
)

with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)
classmethod from_hdf5(file)[source]

Loads the data object from an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

Note

This method loads all data into memory. If you would like to use lazy loading, call LazyArrayDict.from_hdf5() instead.

import h5py
from temporaldata import ArrayDict

with h5py.File("data.h5", "r") as f:
    data = ArrayDict.from_hdf5(f)
materialize()[source]

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type:

ArrayDict

class IrregularTimeSeries(timestamps, *, timekeys=None, domain, **kwargs)[source]

Bases: ArrayDict

An irregular time series is defined by a set of timestamps and a set of attributes that must share the same first dimension as the timestamps. This data object is ideal for event-based data as well as irregularly sampled time series.

Parameters:
  • timestamps (ndarray) – an array of timestamps of shape (N,).

  • timekeys (Optional[List[str]]) – a list of strings that specify which attributes are time-based. This ensures that these attributes are updated appropriately when slicing.

  • domain (Union[Interval, str]) – an Interval object that defines the domain over which the time series is defined. If set to "auto", the domain is set automatically to the interval spanned by the minimum and maximum timestamps.

  • **kwargs (Dict[str, ndarray]) – arrays that share the same first dimension N.

Example

>>> import numpy as np
>>> from temporaldata import IrregularTimeSeries

>>> spikes = IrregularTimeSeries(
...     unit_index=np.array([0, 0, 1, 0, 1, 2]),
...     timestamps=np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]),
...     waveforms=np.zeros((6, 48)),
...     domain="auto",
... )

>>> spikes
IrregularTimeSeries(
  timestamps=[6],
  unit_index=[6],
  waveforms=[6, 48]
)

>>> spikes.domain.start, spikes.domain.end
(array([0.1]), array([0.6]))

>>> spikes.keys()
['timestamps', 'unit_index', 'waveforms']

>>> spikes.is_sorted()
True

>>> slice_of_spikes = spikes.slice(0.2, 0.5)
>>> slice_of_spikes
IrregularTimeSeries(
  timestamps=[3],
  unit_index=[3],
  waveforms=[3, 48]
)

>>> slice_of_spikes.domain.start, slice_of_spikes.domain.end
(array([0.]), array([0.3]))

>>> slice_of_spikes.timestamps
array([0. , 0.1, 0.2])
property domain

The time domain over which the time series is defined. Usually a single interval, but could also be a set of intervals.

timekeys()[source]

Returns a list of all time-based attributes.

register_timekey(timekey)[source]

Register a new time-based attribute.

is_sorted()[source]

Returns True if the timestamps are sorted.

sort()[source]

Sorts the timestamps, and reorders the other attributes accordingly. This method is applied in place.

slice(start, end, reset_origin=True)[source]

Returns a new IrregularTimeSeries object that contains the data between the start and end times. The end time is exclusive, the slice will only include data in \([\textrm{start}, \textrm{end})\).

If reset_origin is True, all time attributes are updated to be relative to the new start time. The domain is also updated accordingly.

Warning

If the time series is not sorted, it will be automatically sorted in place.

Parameters:
  • start (float) – Start time.

  • end (float) – End time.

  • reset_origin (bool) – If True, all time attributes will be updated to be relative to the new start time. Defaults to True.
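
The half-open window and the effect of reset_origin can be illustrated without the library. The following is a minimal plain-Python sketch, not the library's implementation: the helper name slice_timestamps is hypothetical, and the timestamps are assumed to be sorted already.

```python
from bisect import bisect_left


def slice_timestamps(timestamps, start, end, reset_origin=True):
    """Keep sorted timestamps t with start <= t < end (hypothetical helper)."""
    lo = bisect_left(timestamps, start)  # first index with t >= start
    hi = bisect_left(timestamps, end)    # first index with t >= end (excluded)
    kept = timestamps[lo:hi]
    if reset_origin:
        # Express the kept times relative to the new start time.
        kept = [t - start for t in kept]
    return kept


timestamps = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
absolute = slice_timestamps(timestamps, 0.2, 0.5, reset_origin=False)
relative = slice_timestamps(timestamps, 0.2, 0.5)
print(absolute)
print(relative)
```

With reset_origin=False the kept timestamps keep their original values; with reset_origin=True they are shifted so that the window starts at zero, up to floating-point rounding.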

select_by_mask(mask)[source]

Return a new IrregularTimeSeries object where all array attributes are indexed using the boolean mask.

Note that this will not update the domain, as it is unclear how to resolve the domain when the mask is applied. If you wish to update the domain, you should do so manually.

select_by_interval(interval)[source]

Return a new IrregularTimeSeries object where all timestamps are within the interval.

Parameters:

interval (Interval) – Interval object.

add_split_mask(name, interval)[source]

Adds a boolean mask as an array attribute, which is defined for each timestamp, and is set to True for all timestamps that are within interval. The mask attribute will be called <name>_mask.

This is used to mark points in the time series, as part of train, validation, or test sets, and is useful to ensure that there is no data leakage.

Parameters:
  • name (str) – name of the split, e.g. “train”, “valid”, “test”.

  • interval (Interval) – a set of intervals defining the split domain.
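
The idea behind a split mask can be sketched in plain Python. This is an illustration only, not the library's code: the helper split_mask is hypothetical, and treating each interval as half-open, [start, end), is an assumption that the text above does not state.

```python
def split_mask(timestamps, intervals):
    """Return one boolean per timestamp: True if it lies in any interval.

    Assumption: each (start, end) pair is treated as half-open, [start, end).
    """
    return [any(s <= t < e for s, e in intervals) for t in timestamps]


timestamps = [0.1, 0.2, 0.3, 2.1, 2.2, 2.3]
train_mask = split_mask(timestamps, [(0.0, 1.0)])
test_mask = split_mask(timestamps, [(2.0, 3.0)])

# A timestamp marked for both splits would indicate leakage.
leak = any(a and b for a, b in zip(train_mask, test_mask))
print(train_mask, test_mask, leak)
```

Checking that no element is True in two masks at once is one simple way to confirm that the splits do not share data points.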

classmethod from_dataframe(df, domain='auto', unsigned_to_long=True)[source]

Create an IrregularTimeSeries object from a pandas DataFrame. The DataFrame must have a timestamps column named "timestamps" (use pd.DataFrame.rename if needed).

The columns in the DataFrame are converted to arrays when possible, otherwise they will be skipped.

Parameters:
  • df (DataFrame) – DataFrame.

  • unsigned_to_long (bool) – Whether to automatically convert unsigned integers to int64 dtype. Defaults to True.

  • domain (optional) – The domain over which the time series is defined. If set to "auto", the domain is set automatically to the interval spanned by the minimum and maximum timestamps. Defaults to "auto".

to_hdf5(file)[source]

Saves the data object to an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

Warning

If the time series is not sorted, it will be automatically sorted in place.

import h5py
import numpy as np
from temporaldata import IrregularTimeSeries

data = IrregularTimeSeries(
    unit_index=np.array([0, 0, 1, 0, 1, 2]),
    timestamps=np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]),
    waveforms=np.zeros((6, 48)),
    domain="auto",
)

with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)
classmethod from_hdf5(file)[source]

Loads the data object from an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

Note

This method loads all data into memory. If you would like to use lazy loading, call LazyIrregularTimeSeries.from_hdf5() instead.

import h5py
from temporaldata import IrregularTimeSeries

with h5py.File("data.h5", "r") as f:
    data = IrregularTimeSeries.from_hdf5(f)
keys()

Returns a list of all array attribute names.

Return type:

List[str]

materialize()

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type:

ArrayDict

class RegularTimeSeries(*, sampling_rate, domain=None, domain_start=0.0, **kwargs)[source]

Bases: ArrayDict

A regular time series is the same as an irregular time series, but it has a regular sampling rate. This allows for faster indexing, makes it possible to patch data, and makes Fourier operations meaningful. The first dimension of all attributes must be the time dimension.

Note

If you have a matrix of shape (N, T), where N is the number of channels and T is the number of time points, you should transpose it to (T, N) before passing it to the constructor, since the first dimension should always be time.

Parameters:
  • sampling_rate (float) – Sampling rate in Hz.

  • domain (Optional[Interval]) – an Interval object that defines the domain over which the timeseries is defined. It is not possible to set domain to "auto".

  • **kwargs (Dict[str, ndarray]) – Arbitrary keyword arguments where the values are arbitrary multi-dimensional (2d, 3d, …, nd) arrays with shape (N, *).

Example

>>> import numpy as np
>>> from temporaldata import RegularTimeSeries, Interval

>>> lfp = RegularTimeSeries(
...     raw=np.zeros((1000, 128)),
...     sampling_rate=250.,
...     domain=Interval(0., 4.),
... )

>>> lfp.slice(0, 1)
RegularTimeSeries(
  raw=[250, 128]
)

>>> lfp.to_irregular()
IrregularTimeSeries(
  timestamps=[1000],
  raw=[1000, 128]
)
property sampling_rate: float

Returns the sampling rate in Hz.

property domain: Interval

Returns the domain of the time series.

timekeys()[source]

Returns a list of all time-based attributes.

select_by_mask(mask)[source]

Return a new ArrayDict object where all array attributes are indexed using the boolean mask.

Parameters:
  • mask (ndarray) – Boolean array used for masking. The mask needs to be 1-dimensional, and of equal length as the first dimension of the ArrayDict.

  • **kwargs – Private attributes that will not be masked will need to be passed as arguments.

Example

>>> from temporaldata import ArrayDict
>>> import numpy as np

>>> units = ArrayDict(
...     unit_id=np.array(["unit01", "unit02"]),
...     brain_region=np.array(["M1", "M1"]),
...     waveform_mean=np.random.rand(2, 48),
... )

>>> units_subset = units.select_by_mask(np.array([True, False]))
>>> units_subset
ArrayDict(
  unit_id=[1],
  brain_region=[1],
  waveform_mean=[1, 48]
)
slice(start, end, reset_origin=True)[source]

Returns a new RegularTimeSeries object that contains the data between the start (inclusive) and end (exclusive) times.

When slicing, the start and end times are rounded to the nearest timestamp.

Parameters:
  • start (float) – Start time.

  • end (float) – End time.

  • reset_origin (bool) – If True, all time attributes will be updated to be relative to the new start time. Defaults to True.
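
How a time window maps onto sample indices can be sketched as follows. This is a hedged illustration, not the library's implementation: the helper slice_indices is hypothetical, and it assumes that sample i sits at time domain_start + i / sampling_rate and that ties are broken by Python's built-in round.

```python
def slice_indices(start, end, sampling_rate, domain_start=0.0, num_samples=None):
    """Map a time window to a [start_idx, end_idx) sample range.

    Assumption: sample i is located at domain_start + i / sampling_rate.
    """
    start_idx = round((start - domain_start) * sampling_rate)
    end_idx = round((end - domain_start) * sampling_rate)
    start_idx = max(start_idx, 0)  # do not index before the first sample
    if num_samples is not None:
        end_idx = min(end_idx, num_samples)  # do not run past the last sample
    return start_idx, end_idx


first_second = slice_indices(0.0, 1.0, sampling_rate=250.0, num_samples=1000)
middle = slice_indices(1.0, 3.0, sampling_rate=250.0, num_samples=1000)
print(first_second, middle)
```

Under these assumptions, a one-second window at 250 Hz covers 250 samples and a two-second window covers 500, which is consistent with the shapes shown in the examples on this page.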

add_split_mask(name, interval)[source]

Adds a boolean mask as an array attribute, which is defined for each timestamp, and is set to True for all timestamps that are within interval. The mask attribute will be called <name>_mask.

This is used to mark points in the time series, as part of train, validation, or test sets, and is useful to ensure that there is no data leakage.

Parameters:
  • name (str) – name of the split, e.g. “train”, “valid”, “test”.

  • interval (Interval) – a set of intervals defining the split domain.

to_irregular()[source]

Converts the time series to an irregular time series.

property timestamps

Returns the timestamps of the time series.
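
Because sampling is regular, timestamps can be derived rather than stored. The following plain-Python sketch shows one way to compute them; the helper regular_timestamps is hypothetical, and placing the first sample exactly at domain_start is an assumption.

```python
def regular_timestamps(num_samples, sampling_rate, domain_start=0.0):
    """Timestamps of a regularly sampled series (hypothetical helper).

    Assumption: the first sample is located exactly at domain_start.
    """
    return [domain_start + i / sampling_rate for i in range(num_samples)]


ts = regular_timestamps(1000, 250.0)
print(len(ts), ts[0], ts[1])
```

Making these timestamps explicit is what distinguishes the irregular representation from the regular one: 1000 samples yield 1000 timestamps.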

to_hdf5(file)[source]

Saves the data object to an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

import h5py
import numpy as np
from temporaldata import RegularTimeSeries, Interval

data = RegularTimeSeries(
    raw=np.zeros((1000, 128)),
    sampling_rate=250.,
    domain=Interval(0., 4.),
)

with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)
classmethod from_hdf5(file)[source]

Loads the data object from an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

Note

This method loads all data into memory. If you would like to use lazy loading, call LazyRegularTimeSeries.from_hdf5() instead.

import h5py
from temporaldata import RegularTimeSeries

with h5py.File("data.h5", "r") as f:
    data = RegularTimeSeries.from_hdf5(f)
classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)

Creates an ArrayDict object from a pandas DataFrame.

The columns in the DataFrame are converted to arrays when possible, otherwise they will be skipped.

Parameters:
  • df (pandas.DataFrame) – DataFrame.

  • unsigned_to_long (bool, optional) – If True, automatically converts unsigned integers to int64. Defaults to True.

keys()

Returns a list of all array attribute names.

Return type:

List[str]

materialize()

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type:

ArrayDict

class Interval(start, end, *, timekeys=None, **kwargs)[source]

Bases: ArrayDict

An interval object is a set of time intervals each defined by a start time and an end time. For Interval, we do not need to define a domain, since the interval itself is its own domain.

Parameters:
  • start (Union[float, ndarray]) – an array of start times of shape (N,) or a float.

  • end (Union[float, ndarray]) – an array of end times of shape (N,) or a float.

  • timekeys – a list of strings that specify which attributes are time-based attributes.

  • **kwargs – arrays that share the same first dimension N.

Example

>>> import numpy as np
>>> from temporaldata import Interval

>>> intervals = Interval(
...    start=np.array([0., 1., 2.]),
...    end=np.array([1., 2., 3.]),
...    go_cue_time=np.array([0.5, 1.5, 2.5]),
...    drifting_gratings_dir=np.array([0, 45, 90]),
...    timekeys=["start", "end", "go_cue_time"],
... )

>>> intervals
Interval(
  start=[3],
  end=[3],
  go_cue_time=[3],
  drifting_gratings_dir=[3]
)

>>> intervals.keys()
['start', 'end', 'go_cue_time', 'drifting_gratings_dir']

>>> intervals.is_sorted()
True

>>> intervals.is_disjoint()
True

>>> intervals.slice(1.5, 2.5)
Interval(
  start=[2],
  end=[2],
  go_cue_time=[2],
  drifting_gratings_dir=[2]
)

An Interval object with a single interval can be simply created by passing a single float to the start and end arguments.

Example

>>> Interval(0., 1.)
Interval(
  start=[1],
  end=[1]
)
timekeys()[source]

Returns a list of all time-based attributes.

register_timekey(timekey)[source]

Register a new time-based attribute.

is_disjoint()[source]

Returns True if the intervals are disjoint, i.e. if no two intervals overlap.
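
A disjointness check can be sketched in a few lines of plain Python. This is an illustration rather than the library's code: the standalone function below is hypothetical, and it treats intervals that merely touch at an endpoint as disjoint, which agrees with the class example above where [0, 1], [1, 2], [2, 3] is reported as disjoint.

```python
def is_disjoint(starts, ends):
    """True if no two intervals overlap; touching endpoints are allowed."""
    ordered = sorted(zip(starts, ends))
    # After sorting by start, only neighbouring intervals need comparing.
    return all(
        prev_end <= next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


touching = is_disjoint([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
overlapping = is_disjoint([0.0, 0.5], [1.0, 2.0])
print(touching, overlapping)
```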

is_sorted()[source]

Returns True if the intervals are sorted.

sort()[source]

Sorts the intervals, and reorders the other attributes accordingly. This method operates in place.

Note

This method only works if the intervals are disjoint. If the intervals overlap, it is not possible to resolve the order of the intervals, and this method will raise an error.

slice(start, end, reset_origin=True)[source]

Returns a new Interval object that contains the data between the start and end times. An interval is included if it has any overlap with the slicing window. The end time is exclusive.

If reset_origin is set to True, all time attributes will be updated to be relative to the new start time.

Warning

If the intervals are not sorted, they will be automatically sorted in place.

Parameters:
  • start (float) – Start time.

  • end (float) – End time.

  • reset_origin (bool) – If True, all time attributes will be updated to be relative to the new start time. Defaults to True.
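
The overlap rule can be sketched in plain Python. This is only an illustration: the helper slice_intervals is hypothetical, the strict comparisons are an assumption about boundary handling, and the sketch does not model whether intervals that extend past the window are clipped.

```python
def slice_intervals(intervals, start, end, reset_origin=True):
    """Keep (s, e) intervals that overlap the window [start, end)."""
    kept = [(s, e) for s, e in intervals if s < end and e > start]
    if reset_origin:
        # Shift times so that they are relative to the window start.
        kept = [(s - start, e - start) for s, e in kept]
    return kept


trials = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
absolute = slice_intervals(trials, 1.5, 2.5, reset_origin=False)
relative = slice_intervals(trials, 1.5, 2.5)
print(absolute)
print(relative)
```

Two of the three intervals overlap the window from 1.5 to 2.5, matching the count in the class example above.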

select_by_mask(mask)[source]

Return a new Interval object where all array attributes are indexed using the boolean mask.

select_by_interval(interval)[source]

Return a new IrregularTimeSeries object where all timestamps are within the interval.

Parameters:

interval (Interval) – Interval object.

dilate(size, max_len=None)[source]

Dilates the intervals by a given size. The dilation is performed in both directions. This operation is designed to not create overlapping intervals, meaning that for a given interval and a given direction, dilation will stop if another interval is too close. If the distance between two intervals is less than size, both of them will dilate until they meet halfway but will never overlap. You can think of dilation as inflating balloons that will never merge, and will stop each other from moving too far.

Parameters:
  • size (float) – The size of the dilation.

  • max_len – Dilation will not exceed this maximum length. For intervals that are already longer than max_len, there will be no dilation. By default, there is no maximum length.
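
The balloon analogy can be made concrete with a small plain-Python sketch. It is not the library's implementation: the function below is hypothetical, it assumes that size is added on each side of every interval, it expects sorted and disjoint input, and it leaves out max_len.

```python
def dilate(intervals, size):
    """Grow sorted, disjoint (start, end) intervals by `size` on each side.

    Assumption: growth into a gap stops at the midpoint of that gap, so
    neighbouring intervals can touch but never overlap.
    """
    out = []
    for i, (s, e) in enumerate(intervals):
        new_s, new_e = s - size, e + size
        if i > 0:
            prev_end = intervals[i - 1][1]
            new_s = max(new_s, (prev_end + s) / 2)
        if i < len(intervals) - 1:
            next_start = intervals[i + 1][0]
            new_e = min(new_e, (e + next_start) / 2)
        out.append((new_s, new_e))
    return out


dilated = dilate([(1.0, 2.0), (3.0, 4.0), (10.0, 11.0)], size=1.0)
print(dilated)
```

In this example the first two intervals meet halfway at 2.5, while the third is far enough away to grow by the full size on both sides.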

coalesce(eps=1e-06)[source]

Coalesces the intervals that are closer than eps. This operation returns a new Interval object, and does not resolve the existing attributes.

Parameters:

eps – The distance threshold for coalescing the intervals. Defaults to 1e-6.
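
Coalescing can be sketched in plain Python as a single pass over sorted intervals. The function below is a hypothetical illustration, not the library's code; using a strict "gap smaller than eps" comparison is an assumption, and only start and end times are carried over.

```python
def coalesce(intervals, eps=1e-6):
    """Merge sorted (start, end) intervals whose gap is smaller than eps."""
    merged = []
    for s, e in intervals:
        if merged and s - merged[-1][1] < eps:
            # Close enough to the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


result = coalesce([(0.0, 1.0), (1.0, 2.0), (2.5, 3.0)])
print(result)
```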

difference(other)[source]

Returns the difference between two sets of intervals. The intervals are redefined as to not intersect with any interval in other.
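
The set difference of two interval lists can be sketched in plain Python. This is a hypothetical illustration, not the library's implementation, and it assumes that both inputs are sorted and disjoint.

```python
def difference(intervals, other):
    """Remove from each (start, end) interval the parts covered by `other`.

    Assumption: both lists are sorted and contain disjoint intervals.
    """
    result = []
    for s, e in intervals:
        cursor = s  # left edge of the part not yet processed
        for o_start, o_end in other:
            if o_end <= cursor or o_start >= e:
                continue  # no overlap with the remaining part
            if o_start > cursor:
                result.append((cursor, o_start))
            cursor = max(cursor, o_end)
            if cursor >= e:
                break
        if cursor < e:
            result.append((cursor, e))
    return result


holes = difference([(0.0, 10.0)], [(2.0, 3.0), (5.0, 6.0)])
trimmed = difference([(0.0, 1.0), (2.0, 3.0)], [(0.5, 2.5)])
print(holes)
print(trimmed)
```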

split(sizes, *, shuffle=False, random_seed=None)[source]

Splits the set of intervals into multiple subsets. This will return a number of new Interval objects equal to the number of elements in sizes. If shuffle is set to True, the intervals will be shuffled before splitting.

Parameters:
  • sizes (Union[List[int], List[float]]) – A list of integers or floats. If integers, the list must sum to the number of intervals. If floats, the list must sum to 1.0.

  • shuffle – If True, the intervals will be shuffled before splitting.

  • random_seed – The random seed to use for shuffling.

Note

This method does not guarantee that the resulting sets are disjoint if the intervals are not already disjoint.
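
The two ways of specifying sizes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function operates on a plain list, and the way fractional sizes are turned into counts (round down, remainder to the last subset) is an assumption.

```python
import random


def split(items, sizes, shuffle=False, random_seed=None):
    """Partition `items` into len(sizes) consecutive subsets (hypothetical)."""
    items = list(items)
    if all(isinstance(s, int) for s in sizes):
        if sum(sizes) != len(items):
            raise ValueError("integer sizes must sum to the number of items")
        counts = list(sizes)
    else:
        if abs(sum(sizes) - 1.0) > 1e-9:
            raise ValueError("float sizes must sum to 1.0")
        # Assumption: round down, and give the remainder to the last subset.
        counts = [int(s * len(items)) for s in sizes[:-1]]
        counts.append(len(items) - sum(counts))
    if shuffle:
        random.Random(random_seed).shuffle(items)
    subsets, pos = [], 0
    for count in counts:
        subsets.append(items[pos:pos + count])
        pos += count
    return subsets


by_fraction = split(range(10), [0.8, 0.1, 0.1])
by_count = split(range(3), [2, 1])
print([len(part) for part in by_fraction], by_count)
```

Passing the same random_seed twice with shuffle=True yields the same partition in this sketch, which is the usual reason for exposing a seed.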

add_split_mask(name, interval)[source]

Adds a boolean mask as an array attribute, which is defined for each interval in the object, and is set to True if the interval intersects with the provided Interval object. The mask attribute will be called <name>_mask.

This is used to mark intervals as part of train, validation, or test sets, and is useful to ensure that there is no data leakage.

If an interval belongs to multiple splits, an error will be raised, unless this is expected, in which case the method allow_split_mask_overlap() should be called.

Parameters:
  • name (str) – name of the split, e.g. “train”, “valid”, “test”.

  • interval (Interval) – a set of intervals defining the split domain.

allow_split_mask_overlap()[source]

Disables the check for split mask overlap. This means there could be an overlap between the intervals across different splits. This is useful when an interval is allowed to belong to multiple splits.

classmethod linspace(start, end, steps)[source]

Create a regular interval with a given number of samples.

Parameters:
  • start (float) – Start time.

  • end (float) – End time.

  • steps (int) – Number of samples.

Example

>>> from temporaldata import Interval

>>> interval = Interval.linspace(0., 10., 100)

>>> interval
Interval(
  start=[100],
  end=[100]
)
classmethod arange(start, end, step, include_end=True)[source]

Create a grid of intervals with a given step size. If the last step cannot reach the end time, a final interval that stops at the end time is added; it will be shorter than step. This behavior can be disabled by setting include_end to False.

Parameters:
  • start (float) – Start time.

  • end (float) – End time.

  • step (float) – Step size.

  • include_end (bool) – Whether to include a partial interval at the end.
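
The role of include_end can be sketched in plain Python. The function below is a hypothetical illustration that works on (start, end) tuples and is not the library's implementation.

```python
def arange_intervals(start, end, step, include_end=True):
    """Build consecutive (left, right) intervals of width `step`."""
    intervals = []
    i = 0
    # Add full-width intervals while they fit before the end time.
    while start + (i + 1) * step <= end:
        intervals.append((start + i * step, start + (i + 1) * step))
        i += 1
    last = start + i * step
    if include_end and last < end:
        # Shorter final interval that stops exactly at the end time.
        intervals.append((last, end))
    return intervals


with_partial = arange_intervals(0.0, 10.0, 4.0)
without_partial = arange_intervals(0.0, 10.0, 4.0, include_end=False)
print(with_partial)
print(without_partial)
```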

classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)[source]

Create an Interval object from a pandas DataFrame. The DataFrame must have a start time column and an end time column, named “start” and “end” (use pd.DataFrame.rename if needed).

The columns in the DataFrame are converted to arrays when possible, otherwise they will be skipped.

Parameters:
  • df (pandas.DataFrame) – DataFrame.

  • unsigned_to_long (bool, optional) – Whether to automatically convert unsigned integers to int64 dtype. Defaults to True.

classmethod from_list(interval_list)[source]

Create an Interval object from a list of (start, end) tuples.

Parameters:

interval_list (List[Tuple[float, float]]) – List of (start, end) tuples.

Example

>>> from temporaldata import Interval

>>> interval_list = [(0, 1), (1, 2), (2, 3)]
>>> interval = Interval.from_list(interval_list)

>>> interval.start, interval.end
(array([0., 1., 2.]), array([1., 2., 3.]))
to_hdf5(file)[source]

Saves the data object to an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

import h5py
import numpy as np
from temporaldata import Interval

interval = Interval(
    start=np.array([0, 1, 2]),
    end=np.array([1, 2, 3]),
    go_cue_time=np.array([0.5, 1.5, 2.5]),
    drifting_gratings_dir=np.array([0, 45, 90]),
    timekeys=["start", "end", "go_cue_time"],
)

with h5py.File("data.h5", "w") as f:
    interval.to_hdf5(f)
classmethod from_hdf5(file)[source]

Loads the data object from an HDF5 file.

Parameters:

file (h5py.File) – HDF5 file.

Note

This method loads all data into memory. If you would like to use lazy loading, call LazyInterval.from_hdf5() instead.

import h5py
from temporaldata import Interval

with h5py.File("data.h5", "r") as f:
    interval = Interval.from_hdf5(f)
keys()

Returns a list of all array attribute names.

Return type:

List[str]

materialize()

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type:

ArrayDict

class Data(*, domain=None, **kwargs)[source]

Bases: object

A data object is a container for other data objects such as ArrayDict, RegularTimeSeries, IrregularTimeSeries, and Interval objects, as well as regular objects like scalars, strings, and numpy arrays.

Parameters:

Example

>>> import numpy as np
>>> from temporaldata import (
...     ArrayDict,
...     IrregularTimeSeries,
...     RegularTimeSeries,
...     Interval,
...     Data,
... )

>>> data = Data(
...     session_id="session_0",
...     spikes=IrregularTimeSeries(
...         timestamps=np.array([0.1, 0.2, 0.3, 2.1, 2.2, 2.3]),
...         unit_index=np.array([0, 0, 1, 0, 1, 2]),
...         waveforms=np.zeros((6, 48)),
...         domain=Interval(0., 3.),
...     ),
...     lfp=RegularTimeSeries(
...         raw=np.zeros((1000, 3)),
...         sampling_rate=250.,
...         domain=Interval(0., 4.),
...     ),
...     units=ArrayDict(
...         id=np.array(["unit_0", "unit_1", "unit_2"]),
...         brain_region=np.array(["M1", "M1", "PMd"]),
...     ),
...     trials=Interval(
...         start=np.array([0, 1, 2]),
...         end=np.array([1, 2, 3]),
...         go_cue_time=np.array([0.5, 1.5, 2.5]),
...         drifting_gratings_dir=np.array([0, 45, 90]),
...     ),
...     drifting_gratings_imgs=np.zeros((8, 3, 32, 32)),
...     domain=Interval(0., 4.),
... )

>>> data
Data(
session_id='session_0',
spikes=IrregularTimeSeries(
  timestamps=[6],
  unit_index=[6],
  waveforms=[6, 48]
),
lfp=RegularTimeSeries(
  raw=[1000, 3]
),
units=ArrayDict(
  id=[3],
  brain_region=[3]
),
trials=Interval(
  start=[3],
  end=[3],
  go_cue_time=[3],
  drifting_gratings_dir=[3]
),
drifting_gratings_imgs=[8, 3, 32, 32],
)

>>> data.slice(1, 3)
Data(
session_id='session_0',
spikes=IrregularTimeSeries(
  timestamps=[3],
  unit_index=[3],
  waveforms=[3, 48]
),
lfp=RegularTimeSeries(
  raw=[500, 3]
),
units=ArrayDict(
  id=[3],
  brain_region=[3]
),
trials=Interval(
  start=[2],
  end=[2],
  go_cue_time=[2],
  drifting_gratings_dir=[2]
),
drifting_gratings_imgs=[8, 3, 32, 32],
_absolute_start=1.0,
)
property domain

Returns the domain of the data object.

property start

Returns the start time of the data object.

property end

Returns the end time of the data object.

property absolute_start

Returns the start time of this slice relative to the original start time. Should be 0. if the data object has not been sliced.

Example

>>> from temporaldata import Data, Interval
>>> data = Data(domain=Interval(0., 4.))

>>> data.absolute_start
0.0

>>> data = data.slice(1, 3)
>>> data.absolute_start
1.0

>>> data = data.slice(0.4, 1.4)
>>> data.absolute_start
1.4
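
The bookkeeping behind absolute_start can be sketched in plain Python. This is a hypothetical stand-in class, not the library's Data object, and it assumes that every slice uses reset_origin=True so that each new start time is relative to the previous slice.

```python
class SlicedView:
    """Tracks only the offset from the original start time (hypothetical)."""

    def __init__(self, absolute_start=0.0):
        self.absolute_start = absolute_start

    def slice(self, start, end):
        # `start` is relative to this view, so offsets add up across slices.
        return SlicedView(self.absolute_start + start)


view = SlicedView().slice(1, 3).slice(0.4, 1.4)
print(view.absolute_start)
```

The second slice starts 0.4 after the start of the first, which itself starts 1.0 after the original origin, so the offsets accumulate to about 1.4.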
slice(start, end, reset_origin=True)[source]

Returns a new Data object that contains the data between the start and end times. This method will slice all time-based attributes that are present in the data object.

Parameters:
  • start (float) – Start time.

  • end (float) – End time.

  • reset_origin (bool) – If True, all time attributes will be updated to be relative to the new start time. Defaults to True.

select_by_interval(interval)[source]

Return a new IrregularTimeSeries object where all timestamps are within the interval.

Parameters:

interval (Interval) – Interval object.

to_dict()[source]

Returns a dictionary of stored key/value pairs.

Return type:

Dict[str, Any]

to_hdf5(file, serialize_fn_map=None)[source]

Saves the data object to an HDF5 file. This method will also call the to_hdf5 method of all contained data objects, so that the entire data object is saved to the HDF5 file, i.e. no need to call to_hdf5 for each contained data object.

Parameters:

file (h5py.File) – HDF5 file.

import h5py
from temporaldata import Data

data = Data(...)

with h5py.File("data.h5", "w") as f:
    data.to_hdf5(f)
classmethod from_hdf5(file, lazy=True)[source]

Loads the data object from an HDF5 file. This method will also call the from_hdf5 method of all contained data objects, so that the entire data object is loaded from the HDF5 file, i.e. no need to call from_hdf5 for each contained data object.

Parameters:

file (h5py.File) – HDF5 file.

Note

This method will load all data in memory. If you would like to use lazy loading, call LazyData.from_hdf5() instead.

import h5py
from temporaldata import Data

with h5py.File("data.h5", "r") as f:
    data = Data.from_hdf5(f)
set_train_domain(interval)[source]

Set the train domain for all attributes.

set_valid_domain(interval)[source]

Set the valid domain for all attributes.

set_test_domain(interval)[source]

Set the test domain for all attributes.

add_split_mask(name, interval)[source]

Create split masks for all Data, Interval & IrregularTimeSeries objects contained within this Data object.

keys()[source]

Returns a list of all attribute names.

Return type:

List[str]

get_nested_attribute(path)[source]

Returns the attribute specified by the path. The path can be nested using dots. For example, if the path is “spikes.timestamps”, this method will return the timestamps attribute of the spikes object.

Parameters:

path (str) – Nested attribute path.

Return type:

Any

has_nested_attribute(path)[source]

Check if the attribute specified by the path exists in the Data object.

Return type:

bool
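
Dotted-path lookup can be sketched with the standard library alone. The two functions below are hypothetical stand-ins that operate on any Python object, shown here with SimpleNamespace in place of a Data object; they are not the library's implementation.

```python
from functools import reduce
from types import SimpleNamespace


def get_nested_attribute(obj, path):
    """Follow a dotted path such as 'spikes.timestamps' one name at a time."""
    return reduce(getattr, path.split("."), obj)


def has_nested_attribute(obj, path):
    """True if every name along the dotted path exists."""
    try:
        get_nested_attribute(obj, path)
        return True
    except AttributeError:
        return False


data = SimpleNamespace(spikes=SimpleNamespace(timestamps=[0.1, 0.2]))
print(get_nested_attribute(data, "spikes.timestamps"))
print(has_nested_attribute(data, "spikes.unit_index"))
```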

materialize()[source]

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type:

Data

Functions

concat()

Concatenate multiple data objects into a single object.

concat(objs, sort=True)[source]

Concatenates multiple time series objects into a single object.

Parameters:
  • objs (List[Union[IrregularTimeSeries, RegularTimeSeries]]) – List of time series objects to concatenate.

  • sort (bool, optional) – Whether to sort the resulting time series by timestamps. Only applies to IrregularTimeSeries. Defaults to True.

Returns:

The concatenated time series object.

Return type:

Union[IrregularTimeSeries, RegularTimeSeries]

Raises:
  • ValueError – If objects are not all of the same type or don’t have matching keys.

  • NotImplementedError – If concatenation is not implemented for the given object type.

Example

>>> import numpy as np
>>> from temporaldata import IrregularTimeSeries, Interval, concat

>>> ts1 = IrregularTimeSeries(
...     timestamps=np.array([0.0, 1.0]),
...     values=np.array([1.0, 2.0]),
...     domain="auto",
... )
>>> ts2 = IrregularTimeSeries(
...     timestamps=np.array([2.0, 3.0]),
...     values=np.array([3.0, 4.0]),
...     domain="auto",
... )

>>> ts_concat = concat([ts1, ts2])
>>> ts_concat
IrregularTimeSeries(
  timestamps=[4],
  values=[4]
)
>>> ts_concat.timestamps
array([0., 1., 2., 3.])