# Running Benchmarks

How to reproduce and extend benchmark experiments.
## Quick Start

```bash
# Run a single-method benchmark
python -m counterfactuals.pipelines.run_ppcef_pipeline \
    dataset.config_path=config/datasets/adult.yaml

# Run with multiple seeds
for seed in 0 1 2 3 4; do
    python -m counterfactuals.pipelines.run_ppcef_pipeline \
        random_state=$seed
done
```
## Multi-Dataset Benchmark

```bash
# Run across all classification datasets
for dataset in adult compas german_credit heloc; do
    python -m counterfactuals.pipelines.run_ppcef_pipeline \
        dataset.config_path=config/datasets/${dataset}.yaml
done
```
## Comparing Methods

```bash
# Run multiple methods on the same dataset
python -m counterfactuals.pipelines.run_ppcef_pipeline
python -m counterfactuals.pipelines.run_dice_pipeline
python -m counterfactuals.pipelines.run_globe_ce_pipeline
```
## Viewing Results

```python
import mlflow

# List all runs
runs = mlflow.search_runs()
print(runs[["run_id", "params.method", "metrics.validity"]])

# Compare methods: mean validity per method
print(runs.groupby("params.method")["metrics.validity"].mean())
```
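To sanity-check the aggregation logic without a live MLflow tracking server, the same `groupby` pattern can be exercised on a hand-made DataFrame (the data below is purely illustrative; the column names mirror the flattened `params.*`/`metrics.*` columns that `mlflow.search_runs()` returns):

```python
import pandas as pd

# Illustrative stand-in for the DataFrame returned by mlflow.search_runs():
# one row per run, with method name and validity as flattened columns.
runs = pd.DataFrame({
    "run_id": ["a1", "b2", "c3", "d4"],
    "params.method": ["ppcef", "ppcef", "dice", "dice"],
    "metrics.validity": [0.98, 0.96, 0.90, 0.92],
})

# Mean validity per method, best first
summary = (
    runs.groupby("params.method")["metrics.validity"]
    .mean()
    .sort_values(ascending=False)
)
print(summary)
```

The same pattern extends to any logged metric: swap `metrics.validity` for another `metrics.*` column, or aggregate several at once by passing a list of columns to the groupby.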
## Custom Benchmark Configuration

Create a custom config:

```yaml
# pipelines/conf/benchmark.yaml
defaults:
  - _self_
  - override hydra/sweeper: basic

hydra:
  sweeper:
    params:
      dataset.config_path:
        - config/datasets/adult.yaml
        - config/datasets/compas.yaml
      random_state: range(0, 5)
```
Run sweep:
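Hydra sweeps are launched with the `--multirun` flag. Assuming the config above is discoverable as `benchmark` in the pipelines' config search path (a sketch; adjust the config name to your layout):

```shell
# Launch the sweep defined in pipelines/conf/benchmark.yaml
python -m counterfactuals.pipelines.run_ppcef_pipeline \
    --multirun --config-name benchmark
```

Hydra will expand the `sweeper.params` grid (here, 2 datasets × 5 seeds = 10 runs) and execute the pipeline once per combination.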
## Adding New Methods to Benchmarks

- Create a pipeline in `counterfactuals/pipelines/`
- Add its configuration in `pipelines/conf/`
- Run the benchmark with the same datasets
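The steps above can be sketched as a minimal pipeline module. This is a hypothetical skeleton, not the repo's actual API: the function names, `Config` fields, and returned metrics are all assumptions chosen for illustration, and a real pipeline would be Hydra-driven and log to MLflow.

```python
"""Hypothetical skeleton for counterfactuals/pipelines/run_mymethod_pipeline.py."""
from dataclasses import dataclass


@dataclass
class Config:
    # Mirrors the overrides used elsewhere in this page (assumed names).
    dataset_config_path: str = "config/datasets/adult.yaml"
    random_state: int = 0


def generate_counterfactuals(cfg: Config) -> dict:
    """Stub: a real pipeline would load the dataset, fit a model,
    generate counterfactuals, and compute benchmark metrics here."""
    return {"validity": 1.0, "proximity": 0.0}


def run_pipeline(cfg: Config) -> dict:
    metrics = generate_counterfactuals(cfg)
    # A real pipeline would call mlflow.log_metrics(metrics) here so the
    # run shows up alongside the other methods in mlflow.search_runs().
    return metrics


if __name__ == "__main__":
    print(run_pipeline(Config()))
```

Keeping the same datasets, seeds, and logged metric names as the existing pipelines is what makes the MLflow comparison in "Viewing Results" work across methods.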