Interval
- class Interval(start, end, *, timekeys=None, **kwargs)
Bases: ArrayDict
An interval object is a set of time intervals, each defined by a start time and an end time. For Interval, we do not need to define a domain, since the interval itself is its own domain.
- Parameters:
start (Union[float, ndarray]) – an array of start times of shape (N,), or a float.
end (Union[float, ndarray]) – an array of end times of shape (N,), or a float.
timekeys – a list of strings that specify which attributes are time-based attributes.
**kwargs (ndarray) – arrays that share the same first dimension N.
Example
>>> import numpy as np
>>> from temporaldata import Interval
>>> intervals = Interval(
...     start=np.array([0., 1., 2.]),
...     end=np.array([1., 2., 3.]),
...     go_cue_time=np.array([0.5, 1.5, 2.5]),
...     drifting_gratings_dir=np.array([0, 45, 90]),
...     timekeys=["start", "end", "go_cue_time"],
... )
>>> intervals
Interval(
  start=[3],
  end=[3],
  go_cue_time=[3],
  drifting_gratings_dir=[3]
)
>>> intervals.keys()
['start', 'end', 'go_cue_time', 'drifting_gratings_dir']
>>> intervals.is_sorted()
True
>>> intervals.is_disjoint()
True
>>> intervals.slice(1.5, 2.5)
Interval(
  start=[2],
  end=[2],
  go_cue_time=[2],
  drifting_gratings_dir=[2]
)
An Interval object with a single interval can be created simply by passing a single float to the start and end arguments.
Example
>>> Interval(0., 1.)
Interval(
  start=[1],
  end=[1]
)
- is_disjoint()
Returns True if the intervals are disjoint, i.e. if no two intervals overlap.
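As an illustration of what "disjoint" means here, the check can be sketched in plain Python on lists of start and end times. This is only a sketch of the idea, not the library's implementation; the free function below is hypothetical. Touching intervals (one ends exactly where the next starts) are treated as disjoint, which matches the class example above where `[0, 1]`, `[1, 2]`, `[2, 3]` reports `True`.

```python
def is_disjoint(starts, ends):
    """Return True if no two (start, end) intervals overlap.

    Hypothetical helper for illustration; not part of temporaldata.
    """
    # Sort by start time so only neighbouring intervals need comparing.
    pairs = sorted(zip(starts, ends))
    # Touching intervals (end == next start) count as disjoint.
    return all(
        prev_end <= next_start
        for (_, prev_end), (next_start, _) in zip(pairs, pairs[1:])
    )


print(is_disjoint([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # True
print(is_disjoint([0.0, 0.5], [1.0, 2.0]))            # False
```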
- sort()
Sorts the intervals and reorders the other attributes accordingly. This operation is performed in place.
Note
This method only works if the intervals are disjoint. If the intervals overlap, it is not possible to resolve the order of the intervals, and this method will raise an error.
- slice(start, end, reset_origin=True)
Returns a new Interval object that contains the data between the start and end times. An interval is included if it has any overlap with the slicing window. The end time is exclusive.
If reset_origin is set to True, all time attributes will be updated to be relative to the new start time.
Warning
If the intervals are not sorted, they will be automatically sorted in place.
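The overlap rule and the origin reset can be sketched in plain Python on `(start, end)` tuples. The function name is hypothetical and this is not the library's code. Two assumptions are made: an interval that only touches the window start is not counted as overlapping, and kept intervals are not clipped to the window (this page does not say whether the library clips them).

```python
def slice_intervals(intervals, start, end, reset_origin=True):
    """Keep every (s, e) pair that overlaps the window [start, end).

    Hypothetical helper for illustration; not part of temporaldata.
    """
    # An interval overlaps the window if it begins before the window ends
    # (end is exclusive) and ends after the window begins.
    kept = [(s, e) for s, e in intervals if s < end and e > start]
    if reset_origin:
        # Express times relative to the start of the window.
        kept = [(s - start, e - start) for s, e in kept]
    return kept


intervals = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
print(slice_intervals(intervals, 1.5, 2.5))
# [(-0.5, 0.5), (0.5, 1.5)]
print(slice_intervals(intervals, 1.5, 2.5, reset_origin=False))
# [(1.0, 2.0), (2.0, 3.0)]
```

With the same inputs as the class example above, two intervals are kept, which agrees with the `start=[2]` shown there.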
- select_by_mask(mask)
Return a new Interval object where all array attributes are indexed using the boolean mask.
- select_by_interval(interval)
Return a new IrregularTimeSeries object where all timestamps are within the interval.
- Parameters:
interval (Interval) – Interval object.
- dilate(size, max_len=None)
Dilates the intervals by a given size. The dilation is performed in both directions. This operation is designed to not create overlapping intervals: for a given interval and a given direction, dilation stops if another interval is too close. If the distance between two intervals is less than size, both of them will dilate until they meet halfway, but they will never overlap. You can think of dilation as inflating balloons that will never merge and will stop each other from moving too far.
- Parameters:
size (float) – The size of the dilation.
max_len – Dilation will not exceed this maximum length. For intervals that are already longer than max_len, there will be no dilation. By default, there is no maximum length.
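The "balloons that meet halfway" behavior can be sketched in plain Python. This is a hypothetical helper, not the library's implementation. It assumes the intervals are sorted and disjoint, and it assumes that each side grows by `size / 2` (an interpretation chosen so that neighbors closer than `size` meet at the midpoint, as described above). The `max_len` option is left out.

```python
def dilate(intervals, size):
    """Grow sorted, disjoint (start, end) pairs without creating overlaps.

    Hypothetical helper for illustration; not part of temporaldata.
    Assumption: each side grows by size / 2.
    """
    half = size / 2
    out = []
    for i, (s, e) in enumerate(intervals):
        new_s, new_e = s - half, e + half
        if i > 0:
            # Never grow past the midpoint of the gap to the previous interval.
            new_s = max(new_s, (intervals[i - 1][1] + s) / 2)
        if i < len(intervals) - 1:
            # Never grow past the midpoint of the gap to the next interval.
            new_e = min(new_e, (e + intervals[i + 1][0]) / 2)
        out.append((new_s, new_e))
    return out


print(dilate([(0.0, 1.0), (2.0, 3.0), (10.0, 11.0)], size=2.0))
# [(-1.0, 1.5), (1.5, 4.0), (9.0, 12.0)]
```

The first two intervals are only 1.0 apart, so they stop at 1.5; the third is far enough away to grow freely.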
- coalesce(eps=1e-06)
Coalesces the intervals that are closer than eps. This operation returns a new Interval object, and does not resolve the existing attributes.
- Parameters:
eps (float) – The distance threshold for coalescing the intervals. Defaults to 1e-6.
Example
>>> interval = Interval(
...     start=np.array([0.0, 1.0, 2.0, 5.0, 5.5, 10.0]),
...     end=np.array([1.0, 2.0, 3.0, 5.5, 7.0, 12.0]),
... )
>>> coalesced = interval.coalesce()
>>> coalesced.start
array([ 0.,  5., 10.])
>>> coalesced.end
array([ 3.,  7., 12.])
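The merging rule can be sketched in plain Python on sorted `(start, end)` tuples. The helper below is hypothetical and is not the library's implementation; it reads "closer than eps" as "the gap between one interval's end and the next interval's start is smaller than eps".

```python
def coalesce(intervals, eps=1e-6):
    """Merge sorted (start, end) pairs whose gap is smaller than eps.

    Hypothetical helper for illustration; not part of temporaldata.
    """
    merged = []
    for s, e in intervals:
        if merged and s - merged[-1][1] < eps:
            # Close enough to the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


starts = [0.0, 1.0, 2.0, 5.0, 5.5, 10.0]
ends = [1.0, 2.0, 3.0, 5.5, 7.0, 12.0]
print(coalesce(list(zip(starts, ends))))
# [(0.0, 3.0), (5.0, 7.0), (10.0, 12.0)]
```

The output matches the start and end arrays in the example above.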
- difference(other)
Returns the difference between two sets of intervals. The intervals are redefined so that they do not intersect with any interval in other.
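Set difference on intervals can be sketched in plain Python. The helper is hypothetical and not the library's implementation; it assumes both inputs are sorted, disjoint lists of `(start, end)` tuples and ignores any extra attributes.

```python
def difference(intervals, other):
    """Remove every part of `intervals` that is covered by `other`.

    Hypothetical helper for illustration; not part of temporaldata.
    """
    result = []
    for s, e in intervals:
        cursor = s  # left edge of the part not yet processed
        for o_start, o_end in other:
            if o_end <= cursor or o_start >= e:
                continue  # no overlap with the remaining part
            if o_start > cursor:
                result.append((cursor, o_start))  # keep the part before the cut
            cursor = max(cursor, o_end)
            if cursor >= e:
                break
        if cursor < e:
            result.append((cursor, e))  # keep whatever is left at the end
    return result


print(difference([(0.0, 10.0)], [(2.0, 3.0), (5.0, 12.0)]))
# [(0.0, 2.0), (3.0, 5.0)]
```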
- split(sizes, *, shuffle=False, random_seed=None)
Splits the set of intervals into multiple subsets. This will return a number of new Interval objects equal to the number of elements in sizes. If shuffle is set to True, the intervals will be shuffled before splitting.
- Parameters:
sizes (Union[List[int], List[float]]) – A list of integers or floats.
Integers: The list must sum to the number of intervals. Example: [60, 20, 20] for 100 intervals.
Floats: The list must sum to 1.0. Example: [0.6, 0.2, 0.2] for a 60/20/20 split.
shuffle – If True, the intervals will be shuffled before splitting.
random_seed – The random seed to use for shuffling.
- Returns:
A list of Interval objects, one for each element in sizes.
Note
This method does not guarantee that the resulting sets are disjoint if the intervals are not already disjoint.
Examples
Split 10 intervals into 60/20/20 sets using integers:
>>> from temporaldata import Interval
>>> intervals = Interval.linspace(0, 1, 10)
>>> train, val, test = intervals.split([6, 2, 2])
>>> print(len(train), len(val), len(test))
6 2 2
Split using proportions (floats):
>>> intervals = Interval.linspace(0, 1, 100)
>>> train, val, test = intervals.split([0.7, 0.15, 0.15])
>>> print(len(train), len(val), len(test))
70 15 15
Split with shuffling:
>>> intervals = Interval.linspace(0, 1, 100)
>>> train, test = intervals.split(
...     [0.8, 0.2],
...     shuffle=True,
...     random_seed=42
... )
>>> print(len(train), len(test))
80 20
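How integer and float sizes could map to group sizes can be sketched in plain Python on any sequence. The helper is hypothetical and not the library's implementation. In particular, the rounding of fractional sizes (rounding cumulative boundaries) is an assumption; this page does not say how the library rounds.

```python
import random


def split(items, sizes, shuffle=False, random_seed=None):
    """Partition `items` into consecutive groups described by `sizes`.

    Hypothetical helper for illustration; not part of temporaldata.
    """
    items = list(items)
    n = len(items)
    if all(isinstance(x, int) for x in sizes):
        if sum(sizes) != n:
            raise ValueError("integer sizes must sum to the number of items")
        counts = list(sizes)
    else:
        if abs(sum(sizes) - 1.0) > 1e-9:
            raise ValueError("float sizes must sum to 1.0")
        # Assumption: round cumulative boundaries so counts always add up to n.
        bounds, total = [0], 0.0
        for frac in sizes:
            total += frac
            bounds.append(round(total * n))
        bounds[-1] = n
        counts = [b - a for a, b in zip(bounds, bounds[1:])]
    if shuffle:
        # A seeded generator makes the shuffle reproducible.
        random.Random(random_seed).shuffle(items)
    groups, pos = [], 0
    for count in counts:
        groups.append(items[pos:pos + count])
        pos += count
    return groups


train, val, held_out = split(range(100), [0.7, 0.15, 0.15])
print(len(train), len(val), len(held_out))  # 70 15 15
```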
- subdivide(step, drop_short=False)
Subdivides each interval into fixed-duration segments while preserving attributes.
If the last segment of an interval is shorter than step, it will be included by default. Set drop_short to True to exclude these partial segments. If an interval is shorter than step, it will be treated as a partial segment (kept if drop_short is False, dropped otherwise).
- Returns:
A new Interval object with the subdivided segments.
Example
>>> from temporaldata import Interval
>>> import numpy as np
>>> interval = Interval(
...     start=np.array([0.0, 20.0]),
...     end=np.array([10.0, 30.0]),
...     trial_id=np.array([1, 2])
... )
>>> subdivided = interval.subdivide(2.5)
>>> subdivided
Interval(
  start=[8],
  end=[8],
  trial_id=[8]
)
>>> subdivided.trial_id
array([1, 1, 1, 1, 2, 2, 2, 2])
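The segmenting rule, including the treatment of short trailing segments, can be sketched in plain Python. The helper is hypothetical and not the library's implementation; each record is a `(start, end, attrs)` tuple, where `attrs` stands in for the per-interval attributes that every segment inherits.

```python
def subdivide(intervals, step, drop_short=False):
    """Cut each (start, end, attrs) record into segments of length `step`.

    Hypothetical helper for illustration; not part of temporaldata.
    """
    out = []
    for s, e, attrs in intervals:
        k = 0
        # Compute each segment start from the index to limit float drift.
        while s + k * step < e:
            seg_start = s + k * step
            seg_end = min(seg_start + step, e)
            is_short = (seg_end - seg_start) < step
            if not (is_short and drop_short):
                # Every segment inherits the attributes of its parent.
                out.append((seg_start, seg_end, attrs))
            k += 1
    return out


segments = subdivide([(0.0, 10.0, 1), (20.0, 30.0, 2)], step=2.5)
print([trial_id for _, _, trial_id in segments])
# [1, 1, 1, 1, 2, 2, 2, 2]
```

This reproduces the `trial_id` pattern from the example above. With `drop_short=True`, a 5-second interval cut with `step=2.0` would keep only its two full segments.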
- classmethod linspace(start, end, steps)
Create a regular interval with a given number of samples.
Example
>>> from temporaldata import Interval
>>> interval = Interval.linspace(0., 10., 100)
>>> interval
Interval(
  start=[100],
  end=[100]
)
- classmethod arange(start, end, step, include_end=True)
Create a grid of intervals with a given step size. If the last step cannot reach the end time, a final interval is added that stops at the end time and is shorter than step. This behavior can be changed by setting include_end to False.
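The effect of include_end can be sketched in plain Python. The helper is hypothetical and not the library's implementation; it returns `(start, end)` tuples rather than an Interval object.

```python
def arange_intervals(start, end, step, include_end=True):
    """Tile [start, end] with (s, e) pairs of length `step`.

    Hypothetical helper for illustration; not part of temporaldata.
    """
    out = []
    k = 0
    # Add full-length intervals while they still fit before `end`.
    while start + (k + 1) * step <= end:
        out.append((start + k * step, start + (k + 1) * step))
        k += 1
    last = start + k * step
    if include_end and last < end:
        # The remainder becomes a final, shorter interval ending at `end`.
        out.append((last, end))
    return out


print(arange_intervals(0.0, 10.0, 4.0))
# [(0.0, 4.0), (4.0, 8.0), (8.0, 10.0)]
print(arange_intervals(0.0, 10.0, 4.0, include_end=False))
# [(0.0, 4.0), (4.0, 8.0)]
```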
- classmethod from_dataframe(df, unsigned_to_long=True, **kwargs)
Create an Interval object from a pandas DataFrame. The DataFrame must have start time and end time columns, and these columns must be named “start” and “end” (use pd.DataFrame.rename if needed).
The columns in the DataFrame are converted to arrays when possible; otherwise they are skipped.
- Parameters:
df (pandas.DataFrame) – DataFrame.
unsigned_to_long (bool, optional) – Whether to automatically convert unsigned integers to int64 dtype. Defaults to True.
- classmethod from_list(interval_list)
Create an Interval object from a list of (start, end) tuples.
Example
>>> from temporaldata import Interval
>>> interval_list = [(0, 1), (1, 2), (2, 3)]
>>> interval = Interval.from_list(interval_list)
>>> interval.start, interval.end
(array([0., 1., 2.]), array([1., 2., 3.]))
- to_hdf5(file)
Saves the data object to an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
    import h5py
    import numpy as np
    from temporaldata import Interval

    interval = Interval(
        start=np.array([0, 1, 2]),
        end=np.array([1, 2, 3]),
        go_cue_time=np.array([0.5, 1.5, 2.5]),
        drifting_gratings_dir=np.array([0, 45, 90]),
        timekeys=["start", "end", "go_cue_time"],
    )

    # Open the file for writing and store the interval in it.
    with h5py.File("data.h5", "w") as f:
        interval.to_hdf5(f)
- classmethod from_hdf5(file)
Loads the data object from an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
Note
This method loads all data into memory. If you would like to use lazy loading, call LazyInterval.from_hdf5() instead.
    import h5py
    from temporaldata import Interval

    # Open the file read-only and load the whole interval into memory.
    with h5py.File("data.h5", "r") as f:
        interval = Interval.from_hdf5(f)
- class LazyInterval(start, end, *, timekeys=None, **kwargs)
Bases: Interval
Lazy variant of Interval. The data is not loaded until it is accessed. This class is meant to be used when the data is too large to fit in memory, and is intended to be instantiated via LazyInterval.from_hdf5.
Note
To access an attribute without triggering the in-memory loading, use self.__dict__[key]. Using self.key or getattr(self, key) will trigger the lazy loading, automatically convert the h5py dataset to a numpy array, and apply any outstanding masks.
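The distinction in the note above, between reading `__dict__` directly and going through normal attribute access, can be illustrated with a toy class. `LazyBox` is hypothetical and unrelated to how LazyInterval is actually implemented; it stores zero-argument loader functions in place of h5py datasets and loads them on first attribute access.

```python
class LazyBox:
    """Toy container that loads attributes on first access.

    Hypothetical class for illustration; not part of temporaldata.
    """

    def __init__(self, **loaders):
        # Store zero-argument callables instead of the data itself.
        self.__dict__.update(loaders)

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        store = object.__getattribute__(self, "__dict__")
        # A callable kept in the instance dict is a not-yet-loaded attribute:
        # load it now and cache the result for later accesses.
        if name in store and callable(value):
            value = value()
            store[name] = value
        return value


calls = []


def load_start():
    calls.append("start")  # record that a load happened
    return [0.0, 1.0, 2.0]


box = LazyBox(start=load_start)
print(callable(box.__dict__["start"]), calls)  # True []  (nothing loaded yet)
print(box.start, calls)  # [0.0, 1.0, 2.0] ['start']  (loaded on access)
```

Reading `box.__dict__["start"]` returns the raw, unloaded entry, while `box.start` triggers the load exactly once.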
- select_by_mask(mask)
Return a new Interval object where all array attributes are indexed using the boolean mask.
- slice(start, end, reset_origin=True)
Returns a new Interval object that contains the data between the start and end times. An interval is included if it has any overlap with the slicing window.
- to_hdf5(file)
Saves the data object to an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
    import h5py
    import numpy as np
    from temporaldata import Interval

    interval = Interval(
        start=np.array([0, 1, 2]),
        end=np.array([1, 2, 3]),
        go_cue_time=np.array([0.5, 1.5, 2.5]),
        drifting_gratings_dir=np.array([0, 45, 90]),
        timekeys=["start", "end", "go_cue_time"],
    )

    # Open the file for writing and store the interval in it.
    with h5py.File("data.h5", "w") as f:
        interval.to_hdf5(f)
- classmethod from_hdf5(file)
Loads the data object from an HDF5 file.
- Parameters:
file (h5py.File) – HDF5 file.
import h5py from temporaldata import ArrayDict with h5py
- materialize()
Materializes the data object, i.e., loads into memory all of the data that is still referenced in the HDF5 file.
- register_timekey(timekey)
Register a new time-based attribute.
- timekeys()
Returns a list of all time-based attributes.