Dataset

Pure PyTorch Dataset for windowed time series from HDF5 files.

FileEntry dataclass

FileEntry(path: str, resampling_factor: float = 1.0)

A single HDF5 file with optional resampling.

Parameters:

Name               Type   Description                             Default
path               str    filesystem path to the HDF5 file        required
resampling_factor  float  scaling factor for the sequence length  1.0
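
As a rough illustration of how entries might be built, the sketch below uses a stand-in dataclass that mirrors the documented signature, so it runs without tsfast installed. The file paths and the `effective_length` helper are hypothetical; the helper only restates the `int(raw_len * e.resampling_factor)` expression that appears in the `WindowedDataset` source.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Stand-in mirroring the documented signature (not the tsfast class itself)."""
    path: str
    resampling_factor: float = 1.0


# Hypothetical paths, used only for illustration.
entries = [
    FileEntry("data/run_a.h5"),                         # default factor 1.0
    FileEntry("data/run_b.h5", resampling_factor=0.5),  # half the samples
]


def effective_length(raw_len: int, entry: FileEntry) -> int:
    """Sequence length after resampling, truncated toward zero by int()."""
    return int(raw_len * entry.resampling_factor)


print(effective_length(1000, entries[0]))
print(effective_length(1000, entries[1]))
```

With real data, the import would presumably come from `tsfast/tsdata/dataset.py` instead of the stand-in class.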

WindowedDataset

WindowedDataset(entries: list[FileEntry], inputs, targets, win_sz: int | None = None, stp_sz: int = 1)

Bases: Dataset

Pure PyTorch Dataset for windowed time series from HDF5 files.

Parameters:

Name     Type             Description                                                Default
entries  list[FileEntry]  list of FileEntry (path + resampling_factor)               required
inputs   -                single reader or tuple of readers for input signals        required
targets  -                single reader or tuple of readers for target signals       required
win_sz   int | None       window size in (resampled) samples, None = full-file mode  None
stp_sz   int              step size between windows                                  1
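
To make the interaction of `win_sz` and `stp_sz` concrete, the following sketch reproduces the per-file window count used in the constructor, `max(0, (eff_len - win_sz) // stp_sz + 1)`, on made-up effective lengths. The `locate` helper, which maps a global sample index back to a file and a window start via the cumulative counts, is an assumption about how such a lookup could work; it is not taken from the tsfast source.

```python
from bisect import bisect_right
from itertools import accumulate


def window_count(eff_len: int, win_sz: int, stp_sz: int = 1) -> int:
    """Number of full windows that fit into eff_len samples (never negative)."""
    return max(0, (eff_len - win_sz) // stp_sz + 1)


# Made-up effective (resampled) lengths for three files.
eff_lens = [1000, 500, 50]
win_sz, stp_sz = 100, 10

counts = [window_count(n, win_sz, stp_sz) for n in eff_lens]
cumsum = list(accumulate(counts))  # running total of windows across files


def locate(idx: int) -> tuple[int, int]:
    """Hypothetical lookup: global window index -> (file index, window start)."""
    if not 0 <= idx < cumsum[-1]:
        raise IndexError(idx)
    file_idx = bisect_right(cumsum, idx)          # first file whose total exceeds idx
    prev = cumsum[file_idx - 1] if file_idx else 0
    return file_idx, (idx - prev) * stp_sz


print(counts)      # [91, 41, 0]
print(cumsum)      # [91, 132, 132]
print(locate(91))  # (1, 0)
```

Note that the third file is shorter than one window, so it contributes zero windows and is never selected by the lookup.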
Source code in tsfast/tsdata/dataset.py
def __init__(
    self,
    entries: list[FileEntry],
    inputs,
    targets,
    win_sz: int | None = None,
    stp_sz: int = 1,
):
    self.entries = entries
    self._inputs = (inputs,) if not isinstance(inputs, tuple) else inputs
    self._targets = (targets,) if not isinstance(targets, tuple) else targets
    self._single_input = not isinstance(inputs, tuple)
    self._single_target = not isinstance(targets, tuple)
    self.win_sz = win_sz
    self.stp_sz = stp_sz
    self._ref_block = self._find_temporal(*self._inputs, *self._targets)

    if win_sz is not None:
        ref_block = self._ref_block
        counts = []
        for e in entries:
            raw_len = ref_block.file_len(e.path)
            eff_len = int(raw_len * e.resampling_factor)
            n = max(0, (eff_len - win_sz) // stp_sz + 1)
            counts.append(n)
        self._cumsum = np.cumsum(counts)
        self._counts = np.array(counts)