Skip to content

Benchmark Results

Comparison of counterfactual methods across datasets.

Summary

Method Validity Proximity Sparsity Plausibility Speed
PPCEF 0.95 0.82 0.71 0.89 Medium
DICE 0.98 0.75 0.65 0.62 Fast
DiCoFlex 0.94 0.80 0.73 0.85 Medium
GLOBE-CE 0.91 0.88 0.78 0.71 Fast
ReViCE 0.93 0.84 0.74 0.82 Medium

Per-Dataset Results

Adult Dataset

Method Validity Proximity (L2) Sparsity
PPCEF 0.96 0.45 3.2
DICE 0.99 0.52 4.1
GLOBE-CE 0.92 0.38 2.8

COMPAS Dataset

Method Validity Proximity (L2) Sparsity
PPCEF 0.94 0.41 2.9
DICE 0.97 0.48 3.5
GLOBE-CE 0.89 0.35 2.4

Runtime Comparison

Method CPU (s/instance) GPU (s/instance)
PPCEF 0.15 0.02
DICE 0.08 N/A
GLOBE-CE 0.05 0.01

Reproducing Results

See Running Benchmarks for instructions on reproducing these results.