Data Pipeline¶
DataLoaders container and create_dls factory for the pure-PyTorch data pipeline.
DataLoaders ¶
DataLoaders(train: DataLoader, valid: DataLoader, test: DataLoader | None = None, *, dls_id: str | None = None)
Container for train/valid/test DataLoaders with on-demand normalization stats.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `train` | `DataLoader` | training DataLoader | *required* |
| `valid` | `DataLoader` | validation DataLoader | *required* |
| `test` | `DataLoader \| None` | test DataLoader, or `None` if no test split | `None` |
| `dls_id` | `str \| None` | cache id for exact file-based normalization stats | `None` |
Source code in tsfast/tsdata/pipeline.py
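The container itself is a thin wrapper around the three splits, with `test` optional. A minimal pure-Python sketch of the same pattern (hypothetical `SimpleLoaders` class, not the tsfast implementation):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class SimpleLoaders:
    """Hypothetical stand-in for tsfast's DataLoaders container."""
    train: Any
    valid: Any
    test: Optional[Any] = None
    dls_id: Optional[str] = None

    @property
    def loaders(self):
        # Return only the splits that actually exist, in a fixed order.
        return [dl for dl in (self.train, self.valid, self.test) if dl is not None]

dls = SimpleLoaders(train=[1, 2], valid=[3])
print(len(dls.loaders))  # test is None, so only two loaders
```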
norm_stats property ¶
Normalization stats, computed lazily on first access.
Uses exact file-based stats when dls_id was provided to create_dls, otherwise estimates from the first 10 training batches.
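The lazy-on-first-access behavior can be sketched with a cached property. The class below is a hypothetical illustration using plain Python lists instead of tensors (tsfast's internals may differ): mean and standard deviation are estimated from the first `n_batches` batches and cached on first access.

```python
import math

class LazyStats:
    """Estimate mean/std from the first n batches, once, on first access."""
    def __init__(self, batches, n_batches=10):
        self.batches = batches
        self.n_batches = n_batches
        self._stats = None

    @property
    def norm_stats(self):
        if self._stats is None:  # computed lazily, then cached
            values = [x for batch in self.batches[: self.n_batches] for x in batch]
            mean = sum(values) / len(values)
            var = sum((x - mean) ** 2 for x in values) / len(values)
            self._stats = (mean, math.sqrt(var))
        return self._stats

stats = LazyStats([[1.0, 3.0], [2.0, 2.0]])
print(stats.norm_stats)  # (2.0, ~0.707)
```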
stats ¶
Estimate normalization stats from training batches.
stats_from_files ¶
Compute exact stats from full HDF5 scan, with optional disk caching.
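Exact stats over a full scan only need one pass if count, sum, and sum of squares are accumulated per file. A sketch under the assumption that each "file" is just a list of floats (tsfast scans HDF5 datasets instead):

```python
import math

def exact_stats(files):
    """One-pass mean/std over all samples in all files."""
    n, s, sq = 0, 0.0, 0.0
    for data in files:            # in tsfast this would be an HDF5 scan
        for x in data:
            n += 1
            s += x
            sq += x * x
    mean = s / n
    var = sq / n - mean * mean    # E[x^2] - E[x]^2
    return mean, math.sqrt(max(var, 0.0))

mean, std = exact_stats([[1.0, 2.0], [3.0, 4.0]])
print(mean, std)  # mean 2.5, std sqrt(1.25)
```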
Source code in tsfast/tsdata/pipeline.py
get_io_size ¶
Get total input/output feature counts from DataLoaders readers.
Source code in tsfast/tsdata/pipeline.py
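With tuples of readers on each side, the total input/output sizes are just sums over the readers' feature counts. A sketch with hypothetical readers exposing an assumed `n_features` attribute (the attribute name is illustrative, not tsfast's actual API):

```python
def get_io_size(input_readers, target_readers):
    """Sum feature counts across the readers on each side."""
    n_in = sum(r.n_features for r in input_readers)
    n_out = sum(r.n_features for r in target_readers)
    return n_in, n_out

class Reader:
    """Hypothetical reader exposing only a feature count."""
    def __init__(self, n_features):
        self.n_features = n_features

print(get_io_size([Reader(2), Reader(1)], [Reader(1)]))  # (3, 1)
```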
get_file_paths ¶
get_signal_names ¶
Extract (input_names, target_names) from a DataLoader's readers.
Returns None if readers don't expose signal names (non-HDF5 readers).
Source code in tsfast/tsdata/pipeline.py
create_dls_from_readers ¶
create_dls_from_readers(inputs, targets, train_files: list[Path | str], valid_files: list[Path | str], test_files: list[Path | str] | None = None, win_sz: int = 100, stp_sz: int = 1, bs: int = 64, valid_stp_sz: int | None = None, num_workers: int = 0, n_batches_train: int | None = 300, n_batches_valid: int | None = None, targ_fs: list[float] | float | None = None, src_fs: float | str | Callable | None = None, cache: bool = False, dls_id: str | None = None) -> DataLoaders
Create DataLoaders from user-provided readers and file lists.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | | input reader or tuple of readers | *required* |
| `targets` | | target reader or tuple of readers | *required* |
| `train_files` | `list[Path \| str]` | training HDF5 files | *required* |
| `valid_files` | `list[Path \| str]` | validation HDF5 files | *required* |
| `test_files` | `list[Path \| str] \| None` | test HDF5 files, or `None` | `None` |
| `win_sz` | `int` | window size in (resampled) samples | `100` |
| `stp_sz` | `int` | step size between consecutive training windows | `1` |
| `bs` | `int` | batch size | `64` |
| `valid_stp_sz` | `int \| None` | step size between consecutive validation windows, defaults to `win_sz` | `None` |
| `num_workers` | `int` | number of worker processes for the DataLoader | `0` |
| `n_batches_train` | `int \| None` | exact number of training batches per epoch, `None` for all | `300` |
| `n_batches_valid` | `int \| None` | exact number of validation batches per epoch, `None` for all | `None` |
| `targ_fs` | `list[float] \| float \| None` | target sampling frequency/frequencies for resampling | `None` |
| `src_fs` | `float \| str \| Callable \| None` | source sampling frequency (number or HDF5 attribute name) | `None` |
| `cache` | `bool` | cache file data in memory on first read for faster subsequent access | `False` |
| `dls_id` | `str \| None` | cache id for exact file-based normalization stats | `None` |
Source code in tsfast/tsdata/pipeline.py
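How `win_sz` and `stp_sz` interact is easiest to see from the window-count arithmetic. The sketch below assumes standard sliding-window indexing over a sequence (tsfast's internal layout may differ); note how `valid_stp_sz` defaulting to `win_sz` yields non-overlapping validation windows:

```python
def window_starts(seq_len, win_sz=100, stp_sz=1):
    """Start indices of all full windows of length win_sz, stepping by stp_sz."""
    return list(range(0, seq_len - win_sz + 1, stp_sz))

# A 250-sample sequence with win_sz=100:
print(len(window_starts(250, win_sz=100, stp_sz=1)))    # 151 overlapping training windows
print(window_starts(250, win_sz=100, stp_sz=100))       # [0, 100]: non-overlapping validation windows
```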
create_dls ¶
create_dls(u: list[str], y: list[str], dataset: Path | str | list | dict, win_sz: int = 100, stp_sz: int = 1, bs: int = 64, valid_stp_sz: int | None = None, num_workers: int = 0, n_batches_train: int | None = 300, n_batches_valid: int | None = None, dls_id: str | None = None, targ_fs: list[float] | float | None = None, src_fs: float | str | Callable | None = None, cache: bool = False) -> DataLoaders
Create DataLoaders from HDF5 time-series files.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `u` | `list[str]` | list of input signal names | *required* |
| `y` | `list[str]` | list of output signal names | *required* |
| `dataset` | `Path \| str \| list \| dict` | path to dataset, list of filepaths, or `{'train':[], 'valid':[], 'test':[]}` dict | *required* |
| `win_sz` | `int` | window size in (resampled) samples | `100` |
| `stp_sz` | `int` | step size between consecutive training windows | `1` |
| `bs` | `int` | batch size | `64` |
| `valid_stp_sz` | `int \| None` | step size between consecutive validation windows, defaults to `win_sz` | `None` |
| `num_workers` | `int` | number of worker processes for the DataLoader | `0` |
| `n_batches_train` | `int \| None` | exact number of training batches per epoch, `None` for all | `300` |
| `n_batches_valid` | `int \| None` | exact number of validation batches per epoch, `None` for all | `None` |
| `dls_id` | `str \| None` | cache id: when provided, computes exact stats from full training set and caches to disk | `None` |
| `targ_fs` | `list[float] \| float \| None` | target sampling frequency/frequencies for resampling | `None` |
| `src_fs` | `float \| str \| Callable \| None` | source sampling frequency (number or HDF5 attribute name) | `None` |
| `cache` | `bool` | cache file data in memory on first read for faster subsequent access | `False` |
Source code in tsfast/tsdata/pipeline.py
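The `dataset` argument accepts three shapes: a directory path, a flat list of filepaths, or an explicit split dict. A sketch of how such an argument could be normalized into `(train, valid, test)` file lists — the helper name and the 80/20 split for the list form are assumptions for illustration, not tsfast's actual logic:

```python
from pathlib import Path

def normalize_dataset(dataset):
    """Turn path / list / dict input into (train, valid, test) file lists."""
    if isinstance(dataset, dict):
        # Explicit splits: take them as given.
        return dataset.get("train", []), dataset.get("valid", []), dataset.get("test")
    if isinstance(dataset, (str, Path)):
        # Directory: collect its HDF5 files.
        dataset = sorted(Path(dataset).glob("*.hdf5"))
    cut = int(0.8 * len(dataset))   # assumed 80/20 split for illustration
    return dataset[:cut], dataset[cut:], None

train, valid, test = normalize_dataset([f"f{i}.hdf5" for i in range(10)])
print(len(train), len(valid), test)  # 8 2 None
```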