# Test: BenchmarkSpec basic initialization and defaults
_spec_sim = BenchmarkSpecSimulation(
name='_spec_default', dataset_id='_dummy_default',
u_cols=['u0'], y_cols=['y0'], metric_func=identibench.metrics.rmse,
download_func=_dummy_dataset_loader
)
test_eq(_spec_sim.init_window, None)
test_eq(_spec_sim.name, '_spec_default')
Benchmark
Benchmark Specifications
BenchmarkSpecSimulation
BenchmarkSpecSimulation (name:str, dataset_id:str, u_cols:list[str], y_cols:list[str], metric_func:collections.abc.Callable[[numpy.ndarray,numpy.ndarray],float], x_cols:list[str]|None=None, sampling_time:float|None=None, download_func:collections.abc.Callable[[pathlib.Path,bool],None]|None=None, test_model_func:collections.abc.Callable[[__main__.BenchmarkSpecBase,collections.abc.Callable],dict[str,typing.Any]]=<function _test_simulation>, custom_test_evaluation=None, init_window:int|None=None, data_root:[<class'pathlib.Path'>,collections.abc.Callable[[],pathlib.Path]]=<function get_default_data_root>)
*Specification for a simulation benchmark task.
Inherits common parameters from BenchmarkSpecBase. Use this when the goal is to simulate the system’s output given the input u.*
| | Type | Default | Details |
|---|---|---|---|
| name | str | | Unique name identifying this benchmark task. |
| dataset_id | str | | Identifier for the raw dataset source. |
| u_cols | list | | List of column names for input signals (u). |
| y_cols | list | | List of column names for output signals (y). |
| metric_func | Callable | | Primary metric: func(y_true, y_pred). |
| x_cols | list[str] \| None | None | Optional state inputs (x). |
| sampling_time | float \| None | None | Optional sampling time (seconds). |
| download_func | collections.abc.Callable[[pathlib.Path, bool], None] \| None | None | Dataset preparation func. |
| test_model_func | Callable | _test_simulation | |
| custom_test_evaluation | NoneType | None | |
| init_window | int \| None | None | Steps for warm-up, potentially ignored in evaluation. |
| data_root | [<class 'pathlib.Path'>, collections.abc.Callable[[], pathlib.Path]] | get_default_data_root | Root dir for dataset, may be a callable or path. |
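The `metric_func` argument only has to follow the `func(y_true, y_pred)` convention listed in the table and return a float. As a minimal sketch, assuming NumPy arrays of equal shape, a hand-written RMSE could look like this. The name `my_rmse` is hypothetical and the function is not claimed to match `identibench.metrics.rmse` in every detail.

```python
import numpy as np

def my_rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-square error over all samples and output channels."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Three samples, one output channel; only the last sample is off (by 2)
y_true = np.array([[0.0], [1.0], [2.0]])
y_pred = np.array([[0.0], [1.0], [4.0]])
score = my_rmse(y_true, y_pred)  # sqrt(4 / 3)
```

Any callable with this shape could presumably be passed as `metric_func=my_rmse` when constructing a spec.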
BenchmarkSpecPrediction
BenchmarkSpecPrediction (name:str, dataset_id:str, u_cols:list[str], y_cols:list[str], metric_func:collections.abc.Callable[[numpy.ndarray,numpy.ndarray],float], pred_horizon:int, pred_step:int, x_cols:list[str]|None=None, sampling_time:float|None=None, download_func:collections.abc.Callable[[pathlib.Path,bool],None]|None=None, test_model_func:collections.abc.Callable[[__main__.BenchmarkSpecBase,collections.abc.Callable],dict[str,typing.Any]]=<function _test_prediction>, custom_test_evaluation=None, init_window:int|None=None, data_root:[<class'pathlib.Path'>,collections.abc.Callable[[],pathlib.Path]]=<function get_default_data_root>)
*Specification for a k-step ahead prediction benchmark task.
Inherits common parameters from BenchmarkSpecBase and adds prediction-specific ones. Use this when the goal is to predict y some steps ahead based on past u and y.*
| | Type | Default | Details |
|---|---|---|---|
| name | str | | Unique name identifying this benchmark task. |
| dataset_id | str | | Identifier for the raw dataset source. |
| u_cols | list | | List of column names for input signals (u). |
| y_cols | list | | List of column names for output signals (y). |
| metric_func | Callable | | Primary metric: func(y_true, y_pred). |
| pred_horizon | int | | The 'k' in k-step ahead prediction (mandatory for this type). |
| pred_step | int | | Step size for k-step ahead prediction (e.g., predict y[t+k] using data up to t). |
| x_cols | list[str] \| None | None | Optional state inputs (x). |
| sampling_time | float \| None | None | Optional sampling time (seconds). |
| download_func | collections.abc.Callable[[pathlib.Path, bool], None] \| None | None | Dataset preparation func. |
| test_model_func | Callable | _test_prediction | |
| custom_test_evaluation | NoneType | None | |
| init_window | int \| None | None | Steps for warm-up, potentially ignored in evaluation. |
| data_root | [<class 'pathlib.Path'>, collections.abc.Callable[[], pathlib.Path]] | get_default_data_root | Root dir for dataset, may be a callable or path. |
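One plausible reading of how `init_window`, `pred_horizon`, and `pred_step` interact is sketched below: each evaluation window consists of `init_window` warm-up samples followed by `pred_horizon` samples to predict, and consecutive windows start `pred_step` samples apart. This layout is an assumption made for illustration, not a description of the library's internals, and `prediction_windows` is a hypothetical helper.

```python
def prediction_windows(n_samples: int, init_window: int, pred_horizon: int, pred_step: int):
    """Yield (start, warmup_end, target_end) index triples.

    Assumed layout: samples [start, warmup_end) are the known past,
    samples [warmup_end, target_end) are the horizon to predict.
    """
    window_len = init_window + pred_horizon
    # Advance the window start by pred_step while a full window still fits
    for start in range(0, n_samples - window_len + 1, pred_step):
        warmup_end = start + init_window
        yield start, warmup_end, warmup_end + pred_horizon

windows = list(prediction_windows(n_samples=32, init_window=20, pred_horizon=5, pred_step=2))
```

With 32 samples and the parameter values used in the prediction test (`init_window=20`, `pred_horizon=5`, `pred_step=2`), this yields four windows, the first covering indices 0 to 24 and the last covering indices 6 to 30.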
# Test: BenchmarkSpec initialization with prediction-related parameters
_spec_pred = BenchmarkSpecPrediction(
name='_spec_pred_params', dataset_id='_dummy_pred_params',
u_cols=['u0'], y_cols=['y0'], metric_func=identibench.metrics.rmse,
download_func=_dummy_dataset_loader,
init_window=20, pred_horizon=5, pred_step=2
)
test_eq(_spec_pred.init_window, 20)
test_eq(_spec_pred.pred_horizon, 5)
test_eq(_spec_pred.pred_step, 2)
# Test: BenchmarkSpec ensure_dataset_exists - first call (creation)
_spec_ensure = BenchmarkSpecSimulation(
name='_spec_ensure', dataset_id='_dummy_ensure',
u_cols=['u0'], y_cols=['y0'], metric_func=identibench.metrics.rmse,
download_func=_dummy_dataset_loader
)
_spec_ensure.ensure_dataset_exists()
_dataset_path_ensure = _spec_ensure.dataset_path
test_eq(_dataset_path_ensure.is_dir(), True)
test_eq((_dataset_path_ensure / 'train' / 'train_0.hdf5').is_file(), True)
# Test: BenchmarkSpec ensure_dataset_exists - second call (skip)
_mtime_before_skip = (_dataset_path_ensure / 'train' / 'train_0.hdf5').stat().st_mtime
time.sleep(0.1)
_spec_ensure.ensure_dataset_exists()
_mtime_after_skip = (_dataset_path_ensure / 'train' / 'train_0.hdf5').stat().st_mtime
test_eq(_mtime_before_skip, _mtime_after_skip)
# Test: BenchmarkSpec ensure_dataset_exists - third call (force_download=True)
_mtime_before_force = (_dataset_path_ensure / 'train' / 'train_0.hdf5').stat().st_mtime
time.sleep(0.1)
_spec_ensure.ensure_dataset_exists(force_download=True)
_mtime_after_force = (_dataset_path_ensure / 'train' / 'train_0.hdf5').stat().st_mtime
test_ne(_mtime_before_force, _mtime_after_force)
Preparing dataset for '_spec_ensure' at /Users/daniel/.identibench_data/_dummy_ensure...
Dataset '_spec_ensure' prepared successfully.
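The three calls tested above follow a create, skip, force cycle. A minimal sketch of that pattern is shown here, assuming a loader with the `(path, force_download)` shape suggested by the `download_func` type. Both `ensure_dataset` and `fake_loader` are hypothetical stand-ins and do not reproduce the library's `ensure_dataset_exists`.

```python
import tempfile
from pathlib import Path

def ensure_dataset(path: Path, download_func, force_download: bool = False) -> bool:
    """Run download_func only if the dataset is missing or a refresh is forced.

    Returns True when download_func was actually called.
    """
    if path.is_dir() and not force_download:
        return False  # dataset already present: skip the download
    path.mkdir(parents=True, exist_ok=True)
    download_func(path, force_download)
    return True

calls = []

def fake_loader(save_path: Path, force_download: bool) -> None:
    # Stand-in loader: writes one small placeholder file
    (save_path / "train_0.txt").write_text("placeholder")
    calls.append(force_download)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dummy"
    first = ensure_dataset(root, fake_loader)                       # creates
    second = ensure_dataset(root, fake_loader)                      # skips
    third = ensure_dataset(root, fake_loader, force_download=True)  # refreshes
```

Returning a flag instead of comparing modification times keeps the sketch independent of file-system timestamp resolution.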
Training Context
TrainingContext
TrainingContext (spec:__main__.BenchmarkSpecBase, hyperparameters:dict[str,typing.Any], seed:int|None=None)
*Context object passed to the user’s training function (build_predictor).
Holds the benchmark specification, hyperparameters, and seed. Provides methods to access the raw, full-length training and validation data sequences. Windowing/batching for training must be handled within the user’s build_predictor function.*
| Type | Default | Details | |
|---|---|---|---|
| spec | BenchmarkSpecBase | The benchmark specification. | |
| hyperparameters | dict | User-provided dictionary containing model and training hyperparameters. | |
| seed | int | None | None | Optional random seed for reproducibility. |
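Because the context hands back full-length sequences, any windowing has to happen in user code. A minimal sketch using NumPy's `sliding_window_view` follows. The helper `make_windows`, its `win_len` and `stride` parameters, and the `(time, channels)` array layout are assumptions for illustration and are not part of the library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(seq: np.ndarray, win_len: int, stride: int = 1) -> np.ndarray:
    """Cut a (time, channels) sequence into overlapping windows.

    Returns an array of shape (n_windows, win_len, channels).
    """
    # sliding_window_view appends the window axis last: (n, channels, win_len)
    views = sliding_window_view(seq, window_shape=win_len, axis=0)
    # Keep every stride-th window, then move the window axis next to the batch axis
    return np.swapaxes(views[::stride], 1, 2)

seq = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 channels
batches = make_windows(seq, win_len=4, stride=3)
```

Here `batches` has shape `(3, 4, 2)`, and `batches[1]` holds the same values as `seq[3:7]`. The result is a read-only view, so it should be copied before any in-place modification.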
#todo: test
Benchmark Runtime
run_benchmark
run_benchmark (spec, build_model, hyperparameters={}, seed=None)
# Example usage of run_benchmark
hyperparams = {'learning_rate': 0.01, 'epochs': 5} # Example hyperparameters
benchmark_results = run_benchmark(
spec=_spec_sim,
build_model=_dummy_build_model,
hyperparameters=hyperparams
)
Building model with spec: _spec_default, seed: 138830228
{'benchmark_name': '_spec_default',
'dataset_id': '_dummy_default',
'hyperparameters': {'learning_rate': 0.01, 'epochs': 5},
'seed': 138830228,
'training_time_seconds': 4.279200220480561e-05,
'test_time_seconds': 0.0013009580434300005,
'benchmark_type': 'BenchmarkSpecSimulation',
'metric_name': 'rmse',
'metric_score': 0.5644842382745956,
'custom_scores': {}}
# Example usage of run_benchmark
benchmark_results = run_benchmark(
spec=_spec_pred,
build_model=_dummy_build_model,
hyperparameters=hyperparams
)
Building model with spec: _spec_pred_params, seed: 3900254360
{'benchmark_name': '_spec_pred_params',
'dataset_id': '_dummy_pred_params',
'hyperparameters': {'learning_rate': 0.01, 'epochs': 5},
'seed': 3900254360,
'training_time_seconds': 6.71250163577497e-05,
'test_time_seconds': 0.0010067080147564411,
'benchmark_type': 'BenchmarkSpecPrediction',
'metric_name': 'rmse',
'metric_score': 0.5594019958882623,
'custom_scores': {}}
def custom_evaluation(results, spec):
    # Largest absolute deviation between prediction and ground truth
    def get_max_abs_error(y_pred, y_test):
        return np.max(np.abs(y_test - y_pred))

    # Largest signed deviation y_test - y_pred
    def get_max_error(y_pred, y_test):
        return np.max(y_test - y_pred)

    avg_max_abs_error = aggregate_metric_score(results, get_max_abs_error, score_name='avg_max_abs_error', sequence_aggregation_func=np.mean, window_aggregation_func=np.mean)
    median_max_error = aggregate_metric_score(results, get_max_error, score_name='median_max_abs_error', sequence_aggregation_func=np.median, window_aggregation_func=np.median)
    return {**avg_max_abs_error, **median_max_error}

spec_with_custom_test = BenchmarkSpecSimulation(
name="CustomTestExampleBench",
dataset_id="dummy_core_data_v1", # Same dataset ID as before
download_func=_dummy_dataset_loader,
u_cols=['u0', 'u1'],
y_cols=['y0'],
custom_test_evaluation=custom_evaluation,
metric_func=identibench.metrics.rmse
)
# Run benchmark using the spec with the custom test function
hyperparams = {'model_type': 'dummy_v2'}
benchmark_results = run_benchmark(
spec=spec_with_custom_test,
build_model=_dummy_build_model,
hyperparameters=hyperparams
)
Building model with spec: CustomTestExampleBench, seed: 1172241199
{'benchmark_name': 'CustomTestExampleBench',
'dataset_id': 'dummy_core_data_v1',
'hyperparameters': {'model_type': 'dummy_v2'},
'seed': 1172241199,
'training_time_seconds': 2.1415995433926582e-05,
'test_time_seconds': 0.0015841670101508498,
'benchmark_type': 'BenchmarkSpecSimulation',
'metric_name': 'rmse',
'metric_score': 0.5739597924041242,
'custom_scores': {'avg_max_abs_error': 0.9934645593166351,
'median_max_abs_error': 0.9934645593166351}}
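The custom evaluation above passes separate `window_aggregation_func` and `sequence_aggregation_func` arguments. One way to picture such two-level aggregation is sketched below: score each window, reduce the scores within each sequence, then reduce across sequences. This is an assumption about the general idea, not a description of `aggregate_metric_score` itself, and every name in the sketch is hypothetical.

```python
import numpy as np

def two_level_score(sequences, metric, window_agg=np.mean, sequence_agg=np.mean) -> float:
    """sequences: list of sequences, each a list of (y_pred, y_test) window pairs."""
    per_sequence = []
    for windows in sequences:
        # First level: one score per window, reduced within the sequence
        window_scores = [metric(y_pred, y_test) for y_pred, y_test in windows]
        per_sequence.append(window_agg(window_scores))
    # Second level: reduce the per-sequence values to a single number
    return float(sequence_agg(per_sequence))

def max_abs_error(y_pred, y_test):
    return np.max(np.abs(y_test - y_pred))

zeros = np.zeros(3)
sequences = [
    # Sequence 1 has two windows with max abs errors 2 and 4
    [(zeros, np.array([0.0, 1.0, -2.0])), (zeros, np.array([4.0, 0.0, 0.0]))],
    # Sequence 2 has one window with max abs error 6
    [(zeros, np.array([0.0, -6.0, 1.0]))],
]
mean_of_means = two_level_score(sequences, max_abs_error)
max_of_max = two_level_score(sequences, max_abs_error, np.max, np.max)
```

In this toy data the mean-of-means is 4.5 (the mean of 3 and 6), while taking the maximum at both levels gives 6, which shows that the choice of aggregation functions can change the reported score noticeably.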
benchmark_results_to_dataframe
benchmark_results_to_dataframe (results_list:list[dict[str,typing.Any]])
Transforms a list of benchmark result dictionaries into a pandas DataFrame.
| | Type | Details |
|---|---|---|
| results_list | list | List of benchmark result dictionaries from run_benchmark. |
| Returns | DataFrame | |
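The result dictionaries nest user-defined scores under `custom_scores`, while the DataFrame output shown on this page exposes them as `cs_`-prefixed columns. A minimal sketch of that flattening in plain Python follows. `flatten_result` is a hypothetical helper that only mirrors the column naming visible in the output and is not the library's implementation.

```python
def flatten_result(result: dict, prefix: str = "cs_") -> dict:
    """Lift entries of result['custom_scores'] to top-level keys with a prefix."""
    flat = {k: v for k, v in result.items() if k != "custom_scores"}
    for name, value in result.get("custom_scores", {}).items():
        flat[prefix + name] = value
    return flat

rows = [
    {"benchmark_name": "a", "metric_score": 0.56, "custom_scores": {}},
    {"benchmark_name": "b", "metric_score": 0.57, "custom_scores": {"avg_max_abs_error": 0.99}},
]
flat_rows = [flatten_result(r) for r in rows]
```

Passing a list like `flat_rows` to `pandas.DataFrame` leaves `NaN` wherever a row lacks a key, which is consistent with the `NaN` cells for benchmarks without custom scores.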
run_benchmarks
run_benchmarks (specs:list[__main__.BenchmarkSpecBase]|dict[str,__main__.BenchmarkSpecBase], build_model:collections.abc.Callable[[__main__.TrainingContext],collections.abc.Callable], hyperparameters:dict[str,typing.Any]|list[dict[str,typing.Any]]|None=None, n_times:int=1, continue_on_error:bool=True, return_dataframe:bool=True)
*Runs multiple benchmarks sequentially, with repetitions and flexible hyperparameters.
Returns either a pandas DataFrame summarizing the results (default) or a list of raw result dictionaries.*
| | Type | Default | Details |
|---|---|---|---|
| specs | list[__main__.BenchmarkSpecBase] \| dict[str, __main__.BenchmarkSpecBase] | | Collection of specs to run. |
| build_model | Callable | | User function to build the model/predictor. |
| hyperparameters | dict[str, typing.Any] \| list[dict[str, typing.Any]] \| None | None | Single dict, list of dicts (matching specs), or None. |
| n_times | int | 1 | Number of times to repeat each benchmark specification. |
| continue_on_error | bool | True | If True, continue running benchmarks even if one fails. |
| return_dataframe | bool | True | If True, return results as a pandas DataFrame, otherwise return a list of dicts. |
| Returns | pandas.core.frame.DataFrame \| list[dict[str, typing.Any]] | | |
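The `hyperparameters` argument accepts a single dict, a list of dicts matching the specs, or `None`. A minimal sketch of how such an input could be expanded into one dict per spec is given below. `normalize_hyperparameters` is a hypothetical helper, and the length check and copying behavior are assumptions rather than documented library behavior.

```python
from typing import Any

def normalize_hyperparameters(hyperparameters, n_specs: int) -> list[dict[str, Any]]:
    """Expand None, one dict, or a list of dicts into one dict per spec."""
    if hyperparameters is None:
        return [{} for _ in range(n_specs)]
    if isinstance(hyperparameters, dict):
        # Copy so that runs cannot mutate each other's settings
        return [dict(hyperparameters) for _ in range(n_specs)]
    if len(hyperparameters) != n_specs:
        raise ValueError(f"expected {n_specs} hyperparameter dicts, got {len(hyperparameters)}")
    return [dict(h) for h in hyperparameters]

shared = normalize_hyperparameters({"epochs": 5}, n_specs=3)
per_spec = normalize_hyperparameters([{"epochs": 1}, {"epochs": 2}], n_specs=2)
empty = normalize_hyperparameters(None, n_specs=2)
```

The `None` case produces empty dicts, which is consistent with the `{}` entries in the `hyperparameters` column of the result tables.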
benchmark_results = run_benchmarks(
specs=[_spec_sim,_spec_pred,spec_with_custom_test],
build_model=_dummy_build_model,
return_dataframe=False
)
benchmark_results_to_dataframe(benchmark_results)
--- Starting benchmark run for 3 specifications, repeating each 1 times ---
-- Repetition 1/1 --
[1/3] Running: _spec_default (Rep 1)
Building model with spec: _spec_default, seed: 2979218856
-> Success: _spec_default (Rep 1) completed.
[2/3] Running: _spec_pred_params (Rep 1)
Building model with spec: _spec_pred_params, seed: 2767908549
-> Success: _spec_pred_params (Rep 1) completed.
[3/3] Running: CustomTestExampleBench (Rep 1)
Building model with spec: CustomTestExampleBench, seed: 3139743514
-> Success: CustomTestExampleBench (Rep 1) completed.
--- Benchmark run finished. 3/3 individual runs completed successfully. ---
| | benchmark_name | dataset_id | hyperparameters | seed | training_time_seconds | test_time_seconds | benchmark_type | metric_name | metric_score | cs_avg_max_abs_error | cs_median_max_abs_error |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | _spec_default | _dummy_default | {} | 2979218856 | 0.000006 | 0.001325 | BenchmarkSpecSimulation | rmse | 0.564484 | NaN | NaN |
| 1 | _spec_pred_params | _dummy_pred_params | {} | 2767908549 | 0.000006 | 0.000844 | BenchmarkSpecPrediction | rmse | 0.559402 | NaN | NaN |
| 2 | CustomTestExampleBench | dummy_core_data_v1 | {} | 3139743514 | 0.000005 | 0.000521 | BenchmarkSpecSimulation | rmse | 0.573960 | 0.993465 | 0.993465 |
results_multiple_runs = run_benchmarks(
specs=[_spec_sim,_spec_pred,spec_with_custom_test],
build_model=_dummy_build_model,
n_times=3
)
results_multiple_runs
--- Starting benchmark run for 3 specifications, repeating each 3 times ---
-- Repetition 1/3 --
[1/9] Running: _spec_default (Rep 1)
Building model with spec: _spec_default, seed: 30935737
-> Success: _spec_default (Rep 1) completed.
[2/9] Running: _spec_pred_params (Rep 1)
Building model with spec: _spec_pred_params, seed: 2986847840
-> Success: _spec_pred_params (Rep 1) completed.
[3/9] Running: CustomTestExampleBench (Rep 1)
Building model with spec: CustomTestExampleBench, seed: 1147267216
-> Success: CustomTestExampleBench (Rep 1) completed.
-- Repetition 2/3 --
[4/9] Running: _spec_default (Rep 2)
Building model with spec: _spec_default, seed: 3191904871
-> Success: _spec_default (Rep 2) completed.
[5/9] Running: _spec_pred_params (Rep 2)
Building model with spec: _spec_pred_params, seed: 1536587039
-> Success: _spec_pred_params (Rep 2) completed.
[6/9] Running: CustomTestExampleBench (Rep 2)
Building model with spec: CustomTestExampleBench, seed: 3900899545
-> Success: CustomTestExampleBench (Rep 2) completed.
-- Repetition 3/3 --
[7/9] Running: _spec_default (Rep 3)
Building model with spec: _spec_default, seed: 3797015292
-> Success: _spec_default (Rep 3) completed.
[8/9] Running: _spec_pred_params (Rep 3)
Building model with spec: _spec_pred_params, seed: 3789263585
-> Success: _spec_pred_params (Rep 3) completed.
[9/9] Running: CustomTestExampleBench (Rep 3)
Building model with spec: CustomTestExampleBench, seed: 851966748
-> Success: CustomTestExampleBench (Rep 3) completed.
--- Benchmark run finished. 9/9 individual runs completed successfully. ---
| | benchmark_name | dataset_id | hyperparameters | seed | training_time_seconds | test_time_seconds | benchmark_type | metric_name | metric_score | cs_avg_max_abs_error | cs_median_max_abs_error |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | _spec_default | _dummy_default | {} | 30935737 | 0.000009 | 0.001040 | BenchmarkSpecSimulation | rmse | 0.564484 | NaN | NaN |
| 1 | _spec_pred_params | _dummy_pred_params | {} | 2986847840 | 0.000004 | 0.000537 | BenchmarkSpecPrediction | rmse | 0.559402 | NaN | NaN |
| 2 | CustomTestExampleBench | dummy_core_data_v1 | {} | 1147267216 | 0.000004 | 0.000385 | BenchmarkSpecSimulation | rmse | 0.573960 | 0.993465 | 0.993465 |
| 3 | _spec_default | _dummy_default | {} | 3191904871 | 0.000003 | 0.000280 | BenchmarkSpecSimulation | rmse | 0.564484 | NaN | NaN |
| 4 | _spec_pred_params | _dummy_pred_params | {} | 1536587039 | 0.000003 | 0.000285 | BenchmarkSpecPrediction | rmse | 0.559402 | NaN | NaN |
| 5 | CustomTestExampleBench | dummy_core_data_v1 | {} | 3900899545 | 0.000003 | 0.000330 | BenchmarkSpecSimulation | rmse | 0.573960 | 0.993465 | 0.993465 |
| 6 | _spec_default | _dummy_default | {} | 3797015292 | 0.000003 | 0.000264 | BenchmarkSpecSimulation | rmse | 0.564484 | NaN | NaN |
| 7 | _spec_pred_params | _dummy_pred_params | {} | 3789263585 | 0.000003 | 0.000278 | BenchmarkSpecPrediction | rmse | 0.559402 | NaN | NaN |
| 8 | CustomTestExampleBench | dummy_core_data_v1 | {} | 851966748 | 0.000003 | 0.000531 | BenchmarkSpecSimulation | rmse | 0.573960 | 0.993465 | 0.993465 |
aggregate_benchmark_results
aggregate_benchmark_results (results_df:pandas.core.frame.DataFrame, group_by_cols:str|list[str]='benchmark_name', agg_funcs:str|list[str]='mean')
Aggregates numeric results from a benchmark DataFrame, grouped by specified columns.
| | Type | Default | Details |
|---|---|---|---|
| results_df | DataFrame | | DataFrame returned by run_benchmarks (with return_dataframe=True). |
| group_by_cols | str \| list[str] | benchmark_name | Column(s) to group by before aggregation. |
| agg_funcs | str \| list[str] | mean | Aggregation function(s) ('mean', 'median', 'std', etc.) or list thereof. |
| Returns | DataFrame | | |
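Grouped aggregation of this kind maps naturally onto pandas `groupby(...).agg(...)`. The sketch below, using made-up toy data, shows how a list of aggregation names produces a two-level column index of the form (column, function). It illustrates the general pandas mechanism under that assumption and is not the body of `aggregate_benchmark_results`.

```python
import pandas as pd

# Toy results: two benchmarks, two repetitions each (values are invented)
df = pd.DataFrame({
    "benchmark_name": ["a", "b", "a", "b"],
    "seed": [1, 2, 3, 4],
    "metric_score": [0.5, 0.7, 0.5, 0.9],
})

# Aggregate only the score column; the seed is an identifier, not a result
summary = df.groupby("benchmark_name")[["metric_score"]].agg(["mean", "std"])
```

A standard deviation of exactly 0.0 for benchmark `a` reflects identical scores across repetitions, the same pattern that appears when a deterministic model is rerun with different seeds.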
aggregate_benchmark_results(results_multiple_runs,agg_funcs=['mean','std'])
| | training_time_seconds | | test_time_seconds | | metric_score | | cs_avg_max_abs_error | | cs_median_max_abs_error | |
|---|---|---|---|---|---|---|---|---|---|---|
| benchmark_name | mean | std | mean | std | mean | std | mean | std | mean | std |
| CustomTestExampleBench | 0.000003 | 4.453506e-07 | 0.000415 | 0.000104 | 0.573960 | 0.0 | 0.993465 | 0.0 | 0.993465 | 0.0 |
| _spec_default | 0.000005 | 3.395723e-06 | 0.000528 | 0.000443 | 0.564484 | 0.0 | NaN | NaN | NaN | NaN |
| _spec_pred_params | 0.000003 | 3.254011e-07 | 0.000367 | 0.000147 | 0.559402 | 0.0 | NaN | NaN | NaN | NaN |