Lazy Loading#

temporaldata uses hdf5 to store and load data. When loading data from hdf5, temporaldata uses lazy loading to defer loading data until it is needed:

  • Loading data only when attributes are accessed

  • Lazy slicing without loading the entire data object by creating an internal cached kd-tree for fast timestamp lookup

  • Deferring operations like slicing and masking until data is needed (when an attribute is accessed)

  • Managing an internal queue of operations (successive slices, masks, etc.) that is only resolved when the attribute is accessed

Attribute-Based Lazy Loading#

Data is loaded on a per-attribute basis. Only the attributes you access are loaded into memory:

user_session = Data.load("user_data.h5")

# No data loaded, just returns attribute names
print(user_session.keys())  # ['clicks', 'sensor', 'user_id', 'device']

# Only loads timestamps array from clicks
print(user_session.clicks.timestamps)  # [1.2, 2.3, 3.1]

# Only loads accelerometer data when accessed
print(user_session.sensor.accelerometer)
session = Data.load("neural_data.h5")

# No data loaded, just returns attribute names
print(session.keys())  # ['spikes', 'lfp', 'subject_id', 'date']

# Only loads timestamps array from spikes
print(session.spikes.timestamps)  # [1.2, 2.3, 3.1]

# Only loads raw LFP data when accessed
print(session.lfp.raw)

Time-Based Lazy Loading#

For time series data, you can efficiently load specific time windows without loading the entire dataset:

user_session = Data.load("user_data.h5")

# Define time window without loading
window = user_session.slice(start=0.0, end=2.0)

# Data loaded only for requested window
print(window.clicks.timestamps)  # [1.2]
print(window.sensor.accelerometer)  # First 200 samples
session = Data.load("neural_data.h5")

# Define time window without loading
window = session.slice(start=0.0, end=2.0)

# Data loaded only for requested window
print(window.spikes.timestamps)  # [1.2]
print(window.lfp.raw)  # First 2000 samples

Best Practices#

  1. Use lazy loading when:

    • Working with large datasets that may not fit in memory

    • Only needing specific time windows or attributes

    • Performing multiple operations before accessing data

  2. Consider materializing the data using .materialize() if you need to load the full data object into memory