Interval¶

class Interval(start, end, *, timekeys=None, **kwargs)[source]¶

Bases: ArrayDict

An interval object is a set of time intervals each defined by a start time and an end time. For Interval, we do not need to define a domain, since the interval itself is its own domain.

Parameters:

start (Union[float, ndarray]) – an array of start times of shape (N,) or a float.
end (Union[float, ndarray]) – an array of end times of shape (N,) or a float.
timekeys – a list of strings that specify which attributes are time-based attributes.
**kwargs (ndarray) – arrays that share the same first dimension N.

Example

>>> import numpy as np
>>> from temporaldata import Interval

>>> intervals = Interval(
...    start=np.array([0., 1., 2.]),
...    end=np.array([1., 2., 3.]),
...    go_cue_time=np.array([0.5, 1.5, 2.5]),
...    drifting_gratings_dir=np.array([0, 45, 90]),
...    timekeys=["start", "end", "go_cue_time"],
... )

>>> intervals
Interval(
  start=[3],
  end=[3],
  go_cue_time=[3],
  drifting_gratings_dir=[3]
)

>>> intervals.keys()
['start', 'end', 'go_cue_time', 'drifting_gratings_dir']

>>> intervals.is_sorted()
True

>>> intervals.is_disjoint()
True

>>> intervals.slice(1.5, 2.5)
Interval(
  start=[2],
  end=[2],
  go_cue_time=[2],
  drifting_gratings_dir=[2]
)

An Interval object with a single interval can be simply created by passing a single float to the start and end arguments.

Example

>>> Interval(0., 1.)
Interval(
  start=[1],
  end=[1]
)

timekeys()[source]¶: Returns a list of all time-based attributes.

register_timekey(timekey)[source]¶: Register a new time-based attribute.

is_disjoint()[source]¶: Returns True if the intervals are disjoint, i.e. if no two intervals overlap.
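The disjointness check can be illustrated with a small pure-Python sketch. The helper name `is_disjoint_sketch` is hypothetical and this is not the library's implementation. It assumes that intervals which merely touch (one ends exactly where the next starts) count as disjoint, which is consistent with the class example above.

```python
def is_disjoint_sketch(starts, ends):
    """Return True if no two intervals overlap; touching endpoints are allowed."""
    # Sort by start time so that only neighbouring intervals need comparing.
    pairs = sorted(zip(starts, ends))
    return all(
        prev_end <= next_start
        for (_, prev_end), (next_start, _) in zip(pairs, pairs[1:])
    )


print(is_disjoint_sketch([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # True
print(is_disjoint_sketch([0.0, 0.5], [1.0, 2.0]))            # False
```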

is_sorted()[source]¶: Returns True if the intervals are sorted.

sort()[source]¶: Sorts the intervals, and reorders the other attributes accordingly. This method is done in place.

Note

This method only works if the intervals are disjoint. If the intervals overlap, it is not possible to resolve the order of the intervals, and this method will raise an error.

slice(start, end, reset_origin=True)[source]¶

Returns a new Interval object that contains the data between the start and end times. An interval is included if it has any overlap with the slicing window. The end time is exclusive.

If reset_origin is set to True, all time attributes will be updated to be relative to the new start time.

Warning

If the intervals are not sorted, they will be automatically sorted in place.

Parameters:

start (float) – Start time.
end (float) – End time.
reset_origin (bool) – If True, all time attributes will be updated to be relative to the new start time. Defaults to True.
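The selection rule and the effect of reset_origin can be sketched in plain Python. The helper `slice_sketch` is hypothetical and not the library's code. It assumes that an interval is kept when it starts before the window end and ends after the window start, and that kept intervals are not clipped to the window (the description above does not say whether clipping happens).

```python
def slice_sketch(starts, ends, win_start, win_end, reset_origin=True):
    """Keep intervals overlapping [win_start, win_end) and optionally shift them."""
    kept = [
        (s, e)
        for s, e in zip(starts, ends)
        if s < win_end and e > win_start  # assumed overlap test, end exclusive
    ]
    # With reset_origin, times become relative to the window start.
    offset = win_start if reset_origin else 0.0
    return [(s - offset, e - offset) for s, e in kept]


print(slice_sketch([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], 1.5, 2.5))
# [(-0.5, 0.5), (0.5, 1.5)]
```

In the library, any other attribute registered as a timekey (such as go_cue_time in the class example) would be shifted in the same way, according to the description of reset_origin.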

select_by_mask(mask)[source]¶: Return a new Interval object where all array attributes are indexed using the boolean mask.
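As a rough illustration of mask-based selection, the following sketch applies one boolean mask to every attribute of a plain dictionary of lists. The helper `select_by_mask_sketch` and the dictionary layout are assumptions made for the example; the library operates on arrays.

```python
def select_by_mask_sketch(attrs, mask):
    """Apply the same boolean mask to every attribute (all share length N)."""
    return {
        key: [value for value, keep in zip(values, mask) if keep]
        for key, values in attrs.items()
    }


attrs = {
    "start": [0.0, 1.0, 2.0],
    "end": [1.0, 2.0, 3.0],
    "drifting_gratings_dir": [0, 45, 90],
}
print(select_by_mask_sketch(attrs, [True, False, True]))
# {'start': [0.0, 2.0], 'end': [1.0, 3.0], 'drifting_gratings_dir': [0, 90]}
```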

select_by_interval(interval)[source]¶

Return a new IrregularTimeSeries object where all timestamps are within the interval.

Parameters:: interval (Interval) – Interval object.

dilate(size, max_len=None)[source]¶

Dilates the intervals by a given size. The dilation is performed in both directions. This operation is designed not to create overlapping intervals: for a given interval and a given direction, dilation stops if another interval is too close. If the distance between two intervals is less than size, both of them dilate until they meet halfway, but they never overlap. You can think of dilation as inflating balloons that never merge and that stop each other from expanding too far.

Parameters:

size (float) – The size of the dilation.
max_len – Dilation will not exceed this maximum length. For intervals that are already longer than max_len, there will be no dilation. By default, there is no maximum length.
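The "meet halfway" behaviour can be sketched as follows. The helper `dilate_sketch` is hypothetical. It assumes sorted, disjoint input and assumes that each side of an interval grows by up to `size`; whether the library applies `size` per side or in total is not stated above. The `max_len` cap is left out of the sketch.

```python
def dilate_sketch(starts, ends, size):
    """Grow each interval by `size` on both sides without creating overlaps."""
    n = len(starts)
    new_starts, new_ends = [], []
    for i in range(n):
        lo = starts[i] - size
        hi = ends[i] + size
        if i > 0:
            # Do not cross the midpoint of the gap to the previous interval.
            lo = max(lo, (ends[i - 1] + starts[i]) / 2)
        if i < n - 1:
            # Do not cross the midpoint of the gap to the next interval.
            hi = min(hi, (ends[i] + starts[i + 1]) / 2)
        new_starts.append(lo)
        new_ends.append(hi)
    return new_starts, new_ends


print(dilate_sketch([0.0, 3.0, 10.0], [1.0, 4.0, 11.0], 2.0))
# ([-2.0, 2.0, 8.0], [2.0, 6.0, 13.0])
```

The first two intervals are 2.0 apart, so they meet at the midpoint 2.0, while the third interval is far enough away to expand by the full size.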

coalesce(eps=1e-06)[source]¶

Coalesces the intervals that are closer than eps. This operation returns a new Interval object, and does not resolve the existing attributes.

Parameters:: eps (float) – The distance threshold for coalescing the intervals. Defaults to 1e-6.

Example

>>> interval = Interval(
...     start=np.array([0.0, 1.0, 2.0, 5.0, 5.5, 10.0]),
...     end=np.array([1.0, 2.0, 3.0, 5.5, 7.0, 12.0]),
... )
>>> coalesced = interval.coalesce()
>>> coalesced.start
array([ 0.,  5., 10.])
>>> coalesced.end
array([ 3.,  7., 12.])
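The merging rule behind this example can be sketched in plain Python. The helper `coalesce_sketch` is hypothetical and only handles start and end times. It assumes that two neighbouring intervals are merged when the gap between them is strictly smaller than eps.

```python
def coalesce_sketch(starts, ends, eps=1e-6):
    """Merge intervals whose gap to the previous interval is smaller than eps."""
    merged = []
    for s, e in sorted(zip(starts, ends)):
        if merged and s - merged[-1][1] < eps:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [m[0] for m in merged], [m[1] for m in merged]


starts = [0.0, 1.0, 2.0, 5.0, 5.5, 10.0]
ends = [1.0, 2.0, 3.0, 5.5, 7.0, 12.0]
print(coalesce_sketch(starts, ends))
# ([0.0, 5.0, 10.0], [3.0, 7.0, 12.0])
```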

difference(other)[source]¶: Returns the difference between two sets of intervals. The intervals are redefined so as not to intersect with any interval in other.
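A set difference over intervals can be sketched as below. The helper `difference_sketch` is hypothetical, works on lists of (start, end) tuples, and assumes both inputs are sorted and disjoint. It is not the library's implementation and ignores additional attributes.

```python
def difference_sketch(a, b):
    """Return the parts of the intervals in `a` that no interval in `b` covers."""
    result = []
    for s, e in a:
        cursor = s  # left edge of the part of (s, e) not yet handled
        for bs, be in b:
            if be <= cursor:
                continue  # this b-interval ends before the remaining part
            if bs >= e:
                break  # this and all later b-intervals start after (s, e)
            if bs > cursor:
                result.append((cursor, bs))  # uncovered piece before b starts
            cursor = max(cursor, be)
            if cursor >= e:
                break
        if cursor < e:
            result.append((cursor, e))
    return result


a = [(0.0, 10.0), (20.0, 30.0)]
b = [(2.0, 3.0), (8.0, 22.0)]
print(difference_sketch(a, b))
# [(0.0, 2.0), (3.0, 8.0), (22.0, 30.0)]
```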

split(sizes, *, shuffle=False, random_seed=None)[source]¶

Splits the set of intervals into multiple subsets. This will return a number of new Interval objects equal to the number of elements in sizes. If shuffle is set to True, the intervals will be shuffled before splitting.

Parameters:

sizes (Union[List[int], List[float]]) –
A list of integers or floats.
- Integers: The list must sum to the number of intervals. Example: [60, 20, 20] for 100 intervals.
- Floats: The list must sum to 1.0. Example: [0.6, 0.2, 0.2] for a 60/20/20 split.
shuffle – If True, the intervals will be shuffled before splitting.
random_seed – The random seed to use for shuffling.

Returns:

A list of Interval objects, one for each element in sizes.

Note

This method does not guarantee that the resulting sets are disjoint if the intervals are not already disjoint.

Examples

Split 10 intervals into 60/20/20 sets using integers:

>>> from temporaldata import Interval
>>> intervals = Interval.linspace(0, 1, 10)
>>> train, val, test = intervals.split([6, 2, 2])
>>> print(len(train), len(val), len(test))
6 2 2

Split using proportions (floats):

>>> intervals = Interval.linspace(0, 1, 100)
>>> train, val, test = intervals.split([0.7, 0.15, 0.15])
>>> print(len(train), len(val), len(test))
70 15 15

Split with shuffling:

>>> intervals = Interval.linspace(0, 1, 100)
>>> train, test = intervals.split(
...     [0.8, 0.2],
...     shuffle=True,
...     random_seed=42
... )
>>> print(len(train), len(test))
80 20
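How the two kinds of sizes might map to subset lengths can be sketched in plain Python. The helper `split_sketch` is hypothetical. The rounding rule for float sizes (round each proportion, give the remainder to the last subset) and the use of `random.Random` for shuffling are assumptions, not the library's documented behaviour.

```python
import random


def split_sketch(items, sizes, shuffle=False, random_seed=None):
    """Split `items` into consecutive chunks described by `sizes`."""
    items = list(items)
    n = len(items)
    if all(isinstance(size, int) for size in sizes):
        if sum(sizes) != n:
            raise ValueError("integer sizes must sum to the number of items")
        counts = list(sizes)
    else:
        if abs(sum(sizes) - 1.0) > 1e-9:
            raise ValueError("float sizes must sum to 1.0")
        # Assumed rounding rule: the last subset absorbs any remainder.
        counts = [round(p * n) for p in sizes[:-1]]
        counts.append(n - sum(counts))
    if shuffle:
        random.Random(random_seed).shuffle(items)
    chunks, pos = [], 0
    for count in counts:
        chunks.append(items[pos:pos + count])
        pos += count
    return chunks


train, val, test = split_sketch(range(100), [0.7, 0.15, 0.15])
print(len(train), len(val), len(test))  # 70 15 15
```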

subdivide(step, drop_short=False)[source]¶

Subdivides each interval into fixed-duration segments while preserving attributes.

If the last segment of an interval is shorter than step, it will be included by default. Set drop_short to True to exclude these partial segments. If an interval is shorter than step, it will be treated as a partial segment (kept if drop_short is False, dropped otherwise).

Parameters:

step (float) – The duration of each segment.
drop_short (bool) – If True, excludes segments shorter than step. Defaults to False.

Return type:

Interval

Returns:

A new Interval object with the subdivided segments.

Example

>>> from temporaldata import Interval
>>> import numpy as np

>>> interval = Interval(
...     start=np.array([0.0, 20.0]),
...     end=np.array([10.0, 30.0]),
...     trial_id=np.array([1, 2])
... )
>>> subdivided = interval.subdivide(2.5)
>>> subdivided
Interval(
  start=[8],
  end=[8],
  trial_id=[8]
)
>>> subdivided.trial_id
array([1, 1, 1, 1, 2, 2, 2, 2])
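The segmenting rule, including the treatment of short trailing segments, can be sketched as follows. The helper `subdivide_sketch` is hypothetical and carries a single per-interval label instead of arbitrary attributes; it is an illustration, not the library's code.

```python
def subdivide_sketch(starts, ends, labels, step, drop_short=False):
    """Cut each interval into segments of length `step`, repeating its label."""
    segments = []
    for s, e, label in zip(starts, ends, labels):
        k = 0
        while s + k * step < e:
            seg_start = s + k * step
            is_short = seg_start + step > e  # segment would overshoot the end
            seg_end = e if is_short else seg_start + step
            if not (drop_short and is_short):
                segments.append((seg_start, seg_end, label))
            k += 1
    return segments


segments = subdivide_sketch([0.0, 20.0], [10.0, 30.0], [1, 2], 2.5)
print([label for _, _, label in segments])
# [1, 1, 1, 1, 2, 2, 2, 2]
```

With an interval from 0.0 to 6.0 and the same step, the sketch yields a final segment from 5.0 to 6.0, which is dropped when drop_short is True.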

classmethod linspace(start, end, steps)[source]¶

Create a regular interval with a given number of samples.

Parameters:

start (float) – Start time.
end (float) – End time.
steps (int) – Number of samples.

Example

>>> from temporaldata import Interval

>>> interval = Interval.linspace(0., 10., 100)

>>> interval
Interval(
  start=[100],
  end=[100]
)
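One way to picture the result is as `steps` contiguous, equal-width intervals tiling the range from start to end. The sketch below follows that reading; the helper `linspace_sketch` is hypothetical, and the tiling is an assumption based on the example above, where steps=100 yields 100 intervals.

```python
def linspace_sketch(start, end, steps):
    """Return starts and ends of `steps` equal-width, contiguous intervals."""
    width = (end - start) / steps
    # Build steps + 1 edges; the last edge is set to `end` exactly.
    edges = [start + i * width for i in range(steps)] + [end]
    return edges[:-1], edges[1:]


starts, ends = linspace_sketch(0.0, 10.0, 100)
print(len(starts), starts[0], ends[-1])  # 100 0.0 10.0
```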

classmethod arange(start, end, step, include_end=True)[source]¶

Create a grid of intervals with a given step size. If the last step cannot reach the end time, a shorter interval is added that stops at the end time. This behavior can be disabled by setting include_end to False.

Parameters:

start (float) – Start time.
end (float) – End time.
step (float) – Step size.
include_end (bool) – Whether to include a partial interval at the end.
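The effect of include_end can be illustrated with a short sketch. The helper `arange_sketch` is hypothetical and returns plain lists; it is meant to mirror the behaviour described above, not to reproduce the library's implementation.

```python
def arange_sketch(start, end, step, include_end=True):
    """Return starts and ends of a grid of intervals of length `step`."""
    starts, ends = [], []
    k = 0
    # Add full-length intervals while they still fit before `end`.
    while start + (k + 1) * step <= end:
        starts.append(start + k * step)
        ends.append(start + (k + 1) * step)
        k += 1
    last = start + k * step
    if include_end and last < end:
        # Partial interval that stops exactly at `end`.
        starts.append(last)
        ends.append(end)
    return starts, ends


print(arange_sketch(0.0, 10.0, 4.0))
# ([0.0, 4.0, 8.0], [4.0, 8.0, 10.0])
print(arange_sketch(0.0, 10.0, 4.0, include_end=False))
# ([0.0, 4.0], [4.0, 8.0])
```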

classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)[source]¶

Create an Interval object from a pandas DataFrame. The DataFrame must have start time and end time columns. The names of these columns need to be “start” and “end” (use pd.DataFrame.rename if needed).

The columns in the DataFrame are converted to arrays when possible, otherwise they will be skipped.

Parameters:

df (pandas.DataFrame) – DataFrame.
unsigned_to_long (bool, optional) – Whether to automatically convert unsigned integers to int64 dtype. Defaults to True.

classmethod from_list(interval_list)[source]¶

Create an Interval object from a list of (start, end) tuples.

Parameters:: interval_list (List[Tuple[float, float]]) – List of (start, end) tuples.

Example

>>> from temporaldata import Interval

>>> interval_list = [(0, 1), (1, 2), (2, 3)]
>>> interval = Interval.from_list(interval_list)

>>> interval.start, interval.end
(array([0., 1., 2.]), array([1., 2., 3.]))

to_hdf5(file)[source]¶

Saves the data object to an HDF5 file.

Parameters:: file (h5py.File) – HDF5 file.

import h5py
import numpy as np
from temporaldata import Interval

interval = Interval(
    start=np.array([0, 1, 2]),
    end=np.array([1, 2, 3]),
    go_cue_time=np.array([0.5, 1.5, 2.5]),
    drifting_gratings_dir=np.array([0, 45, 90]),
    timekeys=["start", "end", "go_cue_time"],
)

with h5py.File("data.h5", "w") as f:
    interval.to_hdf5(f)

classmethod from_hdf5(file)[source]¶

Loads the data object from an HDF5 file.

Parameters:: file (h5py.File) – HDF5 file.

Note

This method loads all data into memory. If you would like to use lazy loading, call LazyInterval.from_hdf5() instead.

import h5py
from temporaldata import Interval

with h5py.File("data.h5", "r") as f:
    interval = Interval.from_hdf5(f)

keys()¶

Returns a list of all array attribute names.

Return type:: List[str]

materialize()¶

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type:: ArrayDict

class LazyInterval(start, end, *, timekeys=None, **kwargs)[source]¶

Bases: Interval

Lazy variant of Interval. The data is not loaded until it is accessed. This class is meant to be used when the data is too large to fit in memory, and is intended to be instantiated via LazyInterval.from_hdf5.

Note

To access an attribute without triggering in-memory loading, use self.__dict__[key]. Using self.key or getattr(self, key) triggers the lazy loading, automatically converts the h5py dataset to a numpy array, and applies any outstanding masks.

select_by_mask(mask)[source]¶: Return a new Interval object where all array attributes are indexed using the boolean mask.

slice(start, end, reset_origin=True)[source]¶: Returns a new Interval object that contains the data between the start and end times. An interval is included if it has any overlap with the slicing window.

to_hdf5(file)[source]¶

Saves the data object to an HDF5 file.

Parameters:: file (h5py.File) – HDF5 file.

import h5py
import numpy as np
from temporaldata import Interval

interval = Interval(
    start=np.array([0, 1, 2]),
    end=np.array([1, 2, 3]),
    go_cue_time=np.array([0.5, 1.5, 2.5]),
    drifting_gratings_dir=np.array([0, 45, 90]),
    timekeys=["start", "end", "go_cue_time"],
)

with h5py.File("data.h5", "w") as f:
    interval.to_hdf5(f)

classmethod arange(start, end, step, include_end=True)¶

Parameters:

start (float) – Start time.
end (float) – End time.
step (float) – Step size.
include_end (bool) – Whether to include a partial interval at the end.

coalesce(eps=1e-06)¶

Coalesces the intervals that are closer than eps. This operation returns a new Interval object, and does not resolve the existing attributes.

Parameters:: eps (float) – The distance threshold for coalescing the intervals. Defaults to 1e-6.

Example

>>> interval = Interval(
...     start=np.array([0.0, 1.0, 2.0, 5.0, 5.5, 10.0]),
...     end=np.array([1.0, 2.0, 3.0, 5.5, 7.0, 12.0]),
... )
>>> coalesced = interval.coalesce()
>>> coalesced.start
array([ 0.,  5., 10.])
>>> coalesced.end
array([ 3.,  7., 12.])

difference(other)¶: Returns the difference between two sets of intervals. The intervals are redefined so as not to intersect with any interval in other.

dilate(size, max_len=None)¶

Parameters:

size (float) – The size of the dilation.
max_len – Dilation will not exceed this maximum length. For intervals that are already longer than max_len, there will be no dilation. By default, there is no maximum length.

classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)¶

The columns in the DataFrame are converted to arrays when possible, otherwise they will be skipped.

Parameters:

df (pandas.DataFrame) – DataFrame.
unsigned_to_long (bool, optional) – Whether to automatically convert unsigned integers to int64 dtype. Defaults to True.

classmethod from_hdf5(file)[source]¶

Loads the data object from an HDF5 file.

Parameters:: file (h5py.File) – HDF5 file.

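import h5py
from temporaldata import LazyInterval

with h5py.File("data.h5", "r") as f:
    # Data is read on demand, so access attributes while the file is open.
    interval = LazyInterval.from_hdf5(f)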

classmethod from_list(interval_list)¶

Create an Interval object from a list of (start, end) tuples.

Parameters:: interval_list (List[Tuple[float, float]]) – List of (start, end) tuples.

Example

>>> from temporaldata import Interval

>>> interval_list = [(0, 1), (1, 2), (2, 3)]
>>> interval = Interval.from_list(interval_list)

>>> interval.start, interval.end
(array([0., 1., 2.]), array([1., 2., 3.]))

is_disjoint()¶: Returns True if the intervals are disjoint, i.e. if no two intervals overlap.

is_sorted()¶: Returns True if the intervals are sorted.

keys()¶

Returns a list of all array attribute names.

Return type:: List[str]

classmethod linspace(start, end, steps)¶

Create a regular interval with a given number of samples.

Parameters:

start (float) – Start time.
end (float) – End time.
steps (int) – Number of samples.

Example

>>> from temporaldata import Interval

>>> interval = Interval.linspace(0., 10., 100)

>>> interval
Interval(
  start=[100],
  end=[100]
)

materialize()¶

Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.

Return type:: ArrayDict

register_timekey(timekey)¶: Register a new time-based attribute.

select_by_interval(interval)¶

Return a new IrregularTimeSeries object where all timestamps are within the interval.

Parameters:: interval (Interval) – Interval object.

sort()¶: Sorts the intervals, and reorders the other attributes accordingly. This method is done in place.

Note

This method only works if the intervals are disjoint. If the intervals overlap, it is not possible to resolve the order of the intervals, and this method will raise an error.

split(sizes, *, shuffle=False, random_seed=None)¶

Parameters:

sizes (Union[List[int], List[float]]) –
A list of integers or floats.
- Integers: The list must sum to the number of intervals. Example: [60, 20, 20] for 100 intervals.
- Floats: The list must sum to 1.0. Example: [0.6, 0.2, 0.2] for a 60/20/20 split.
shuffle – If True, the intervals will be shuffled before splitting.
random_seed – The random seed to use for shuffling.

Returns:

A list of Interval objects, one for each element in sizes.

Note

This method does not guarantee that the resulting sets are disjoint if the intervals are not already disjoint.

Examples

Split 10 intervals into 60/20/20 sets using integers:

>>> from temporaldata import Interval
>>> intervals = Interval.linspace(0, 1, 10)
>>> train, val, test = intervals.split([6, 2, 2])
>>> print(len(train), len(val), len(test))
6 2 2

Split using proportions (floats):

>>> intervals = Interval.linspace(0, 1, 100)
>>> train, val, test = intervals.split([0.7, 0.15, 0.15])
>>> print(len(train), len(val), len(test))
70 15 15

Split with shuffling:

>>> intervals = Interval.linspace(0, 1, 100)
>>> train, test = intervals.split(
...     [0.8, 0.2],
...     shuffle=True,
...     random_seed=42
... )
>>> print(len(train), len(test))
80 20

subdivide(step, drop_short=False)¶

Subdivides each interval into fixed-duration segments while preserving attributes.

Parameters:

step (float) – The duration of each segment.
drop_short (bool) – If True, excludes segments shorter than step. Defaults to False.

Return type:

Interval

Returns:

A new Interval object with the subdivided segments.

Example

>>> from temporaldata import Interval
>>> import numpy as np

>>> interval = Interval(
...     start=np.array([0.0, 20.0]),
...     end=np.array([10.0, 30.0]),
...     trial_id=np.array([1, 2])
... )
>>> subdivided = interval.subdivide(2.5)
>>> subdivided
Interval(
  start=[8],
  end=[8],
  trial_id=[8]
)
>>> subdivided.trial_id
array([1, 1, 1, 1, 2, 2, 2, 2])

timekeys()¶: Returns a list of all time-based attributes.