Data Splitting
File discovery and train/valid/test splitting utilities.
get_hdf_files
Recursively find .hdf5/.h5 files under path.
Source code in tsfast/tsdata/split.py
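The recursive search described above can be approximated with the standard library alone. The block below is a minimal sketch, not the tsfast implementation: `find_hdf_files` is a hypothetical stand-in, and the sorted output and case-sensitive suffix match are assumptions.

```python
from pathlib import Path
import tempfile


def find_hdf_files(root):
    """Hypothetical stand-in: recursively collect .hdf5/.h5 files under root."""
    root = Path(root)
    # Assumption: suffix matching is case-sensitive and results are sorted.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in {".hdf5", ".h5"}
    )


# Build a throwaway directory tree to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "train").mkdir()
    (base / "train" / "a.hdf5").touch()
    (base / "train" / "b.h5").touch()
    (base / "notes.txt").touch()  # ignored: not an HDF5 suffix
    found_names = [p.name for p in find_hdf_files(base)]

print(found_names)
```

Only the two HDF5 files inside the nested directory are returned; the text file at the top level is skipped.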
discover_split_files
discover_split_files(path: Path | str, train_name: str = 'train', valid_name: str = 'valid', test_name: str = 'test') -> dict[str, list[Path]]
Auto-discover train/valid/test HDF5 files by parent directory name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `Path \| str` | root directory containing train/valid/test subdirectories | required |
| `train_name` | `str` | name of training subdirectory | `'train'` |
| `valid_name` | `str` | name of validation subdirectory | `'valid'` |
| `test_name` | `str` | name of test subdirectory | `'test'` |
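To illustrate grouping by parent directory name, here is a minimal standard-library sketch. It is not the tsfast implementation: `discover_sketch` is a hypothetical stand-in, and the result keys `"train"`, `"valid"`, and `"test"` are an assumption, since the signature only promises a `dict[str, list[Path]]`.

```python
from pathlib import Path
import tempfile


def discover_sketch(path, train_name="train", valid_name="valid", test_name="test"):
    """Hypothetical stand-in: group HDF5 files by the name of their parent directory."""
    # Assumption: the result is keyed by "train", "valid", and "test".
    groups = {"train": [], "valid": [], "test": []}
    key_for = {train_name: "train", valid_name: "valid", test_name: "test"}
    for f in sorted(Path(path).rglob("*")):
        is_hdf = f.is_file() and f.suffix in {".hdf5", ".h5"}
        if is_hdf and f.parent.name in key_for:
            groups[key_for[f.parent.name]].append(f)
    return groups


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for sub, fname in [("train", "a.hdf5"), ("valid", "b.hdf5"),
                       ("test", "c.h5"), ("other", "d.hdf5")]:
        (base / sub).mkdir()
        (base / sub / fname).touch()
    counts = {k: len(v) for k, v in discover_sketch(base).items()}

print(counts)
```

In this sketch, the file under `other/` is left out because its parent directory matches none of the three names.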
split_by_parent
split_by_parent(files: list, train_name: str = 'train', valid_name: str = 'valid') -> tuple[list[int], list[int]]
Return (train_indices, valid_indices) based on parent directory names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `files` | `list` | list of file paths | required |
| `train_name` | `str` | parent directory name for training files | `'train'` |
| `valid_name` | `str` | parent directory name for validation files | `'valid'` |
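The index-based return value is easiest to see on a small list. The following is a minimal sketch under the assumption that a file is assigned by the name of its immediate parent directory; `split_by_parent_sketch` is a hypothetical stand-in, not the library function.

```python
from pathlib import Path


def split_by_parent_sketch(files, train_name="train", valid_name="valid"):
    """Hypothetical stand-in: return (train_indices, valid_indices)."""
    # Assumption: only the immediate parent directory name is compared.
    train_idx = [i for i, f in enumerate(files) if Path(f).parent.name == train_name]
    valid_idx = [i for i, f in enumerate(files) if Path(f).parent.name == valid_name]
    return train_idx, valid_idx


files = [
    "data/train/a.hdf5",
    "data/valid/b.hdf5",
    "data/train/c.hdf5",
    "data/test/d.hdf5",  # matches neither name, so it appears in neither list
]
train_idx, valid_idx = split_by_parent_sketch(files)
print(train_idx, valid_idx)
```

Returning positions rather than paths lets the caller index into any parallel structure built from the same file list.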
split_by_percentage
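Split a list sequentially by percentage: the first `pct` fraction of items forms the first split and the remainder forms the second.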
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `files` | `list` | list of items to split | required |
| `pct` | `float` | fraction of items assigned to the first split | `0.8` |
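A sequential split keeps the original order and cuts the list at a single point. The sketch below is an illustration only: `split_sequential` is a hypothetical stand-in, and both the index-based return value and the truncating `int()` rounding are assumptions, since the page does not show the function's signature or return type.

```python
def split_sequential(items, pct=0.8):
    """Hypothetical stand-in: cut the list at int(len(items) * pct)."""
    # Assumption: the cut point is truncated toward zero and indices are returned.
    cut = int(len(items) * pct)
    first = list(range(cut))
    second = list(range(cut, len(items)))
    return first, second


items = [f"run_{i}.hdf5" for i in range(10)]
first_idx, second_idx = split_sequential(items)
print(first_idx, second_idx)
```

With ten items and the default `pct` of 0.8, the first eight positions land in the first split and the last two in the second. No shuffling takes place, so the order of the input decides which items end up where.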
is_dataset_directory
Check if path contains train/valid/test subdirectories with HDF5 files.
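A check of this kind might look like the following minimal sketch. It is not the tsfast implementation: `looks_like_dataset_dir` is a hypothetical stand-in, and requiring all three subdirectories to hold at least one HDF5 file is an assumption, since the description does not say whether some of them are optional.

```python
from pathlib import Path
import tempfile


def looks_like_dataset_dir(path, names=("train", "valid", "test")):
    """Hypothetical stand-in: True if every named subdirectory holds an HDF5 file."""
    path = Path(path)

    def has_hdf(directory):
        # A missing directory counts as "no HDF5 files".
        return directory.is_dir() and any(
            f.suffix in {".hdf5", ".h5"}
            for f in directory.rglob("*") if f.is_file()
        )

    # Assumption: all three subdirectories are required.
    return all(has_hdf(path / name) for name in names)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ("train", "valid", "test"):
        (base / name).mkdir()
        (base / name / "run.hdf5").touch()
    complete = looks_like_dataset_dir(base)

    # Emptying one subdirectory makes the check fail in this sketch.
    (base / "test" / "run.hdf5").unlink()
    incomplete = looks_like_dataset_dir(base)

print(complete, incomplete)
```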