impl
ROLL (Ranking via Optimized Label Learning) is a PyTorch research project that implements custom loss functions for binary classification. The losses use kernel density estimation (KDE) to optimize the true positive rate (TPR) at a target false positive rate (FPR) threshold, and are aimed at imbalanced classification problems.
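To make the objective concrete, here is a minimal pure-Python sketch of a KDE-smoothed "TPR at target FPR" estimate. It is not the code in `src/roll.py`: the function names, the Gaussian kernel, the fixed bandwidth, and the bisection search for the threshold are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF, used as the integrated Gaussian kernel."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_rate(scores: Sequence[float], threshold: float, bandwidth: float) -> float:
    """KDE-smoothed fraction of `scores` lying above `threshold`."""
    return sum(normal_cdf((s - threshold) / bandwidth) for s in scores) / len(scores)


def threshold_for_fpr(neg_scores: Sequence[float], target_fpr: float,
                      bandwidth: float, iters: int = 60) -> float:
    """Bisect for the threshold at which the smoothed FPR equals `target_fpr`."""
    lo = min(neg_scores) - 10.0 * bandwidth  # smoothed FPR is ~1 here
    hi = max(neg_scores) + 10.0 * bandwidth  # smoothed FPR is ~0 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if smoothed_rate(neg_scores, mid, bandwidth) > target_fpr:
            lo = mid  # too many false positives: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


def smoothed_tpr_at_fpr(pos_scores: Sequence[float], neg_scores: Sequence[float],
                        target_fpr: float, bandwidth: float = 0.1) -> float:
    """Smoothed TPR at the threshold fixed by the negatives and the target FPR."""
    threshold = threshold_for_fpr(neg_scores, target_fpr, bandwidth)
    return smoothed_rate(pos_scores, threshold, bandwidth)
```

Because every step is a smooth function of the scores, the same quantity can be written with tensors and used as a training objective (for example, as one minus the smoothed TPR); whether ROLL does exactly this is not stated here.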
Architecture
- `src/roll.py`: Loss function implementations (Normal/Beta/Kernelized ROLL)
- `src/experiment.py`: Training loop, evaluation infra, `run_configurations()` entry point
- `src/datasets.py`: 10+ dataset loaders (KEEL, UCI, Kaggle, synthetic)
- `src/networks.py`: `KeelNet` MLP architecture
- `src/summary.py`: Plotly HTML visualization (ROC, score distributions, ECDF)
- `src/utils.py`: Logging, output dir creation (`init_experiment`)
- `experiments/keel/`, `experiments/other/`, `experiments/large/`: experiment scripts
Conventions
- All hyperparameters live in Python dataclasses (`ExperimentConfiguration`); there is no CLI parsing
- Experiments follow `experiment-*.py` naming; all call `run_configurations()`
- KEEL experiments share the `experiments/keel/_base.py` runner; individual files just call it
- All datasets expose `__getitem__`, `__len__`, and the `.x`, `.y` attributes
- Episode-based eval: N independent train runs per config, results aggregated
- GPU enabled via `cudaSupport = true` in flake.nix; `get_device()` in utils.py auto-selects GPU/CPU
- Beta distribution variant (`roll_beta_loss_from_fpr`) is kept for thesis writing but is not actively developed or used
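The dataclass-config and episode-based evaluation conventions can be sketched as follows. This is an illustration only: `SketchConfiguration`, its fields, `run_episodes`, and `fake_train_and_eval` are hypothetical names, not the real `ExperimentConfiguration` or `run_configurations()` API.

```python
import random
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Dict


@dataclass(frozen=True)
class SketchConfiguration:
    """Hypothetical stand-in for ExperimentConfiguration; all fields are assumptions."""
    name: str
    episodes: int = 5        # number of independent training runs
    target_fpr: float = 0.05
    seed: int = 0


def run_episodes(config: SketchConfiguration,
                 train_and_eval: Callable[[int], float]) -> Dict[str, float]:
    """Run `config.episodes` independent trainings and aggregate the metric."""
    results = [train_and_eval(config.seed + episode)
               for episode in range(config.episodes)]
    return {
        "mean": mean(results),
        "std": stdev(results) if len(results) > 1 else 0.0,
        "episodes": float(len(results)),
    }


def fake_train_and_eval(seed: int) -> float:
    """Placeholder for a real training run: returns a seeded pseudo-metric."""
    rng = random.Random(seed)
    return 0.8 + 0.1 * rng.random()
```

An experiment script in this style would build one or more configurations in Python and pass them to a runner, for example `run_episodes(SketchConfiguration(name="demo"), fake_train_and_eval)`, with no argument parsing involved.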
Gotchas
- Dataset paths are injected as env vars by the Nix shell hook (`$keel_wisconsin_dir`, etc.), so you must use `nix develop`
- `_calc_moments()` in roll.py is unused and has a variable typo (`array` vs `arr`)
- CIFAR-10 binary: class 1 vs rest (not class 0)
- Multiprocessing uses the `spawn` method via `torch.multiprocessing`
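Two of these gotchas are easy to guard against in code. The sketch below shows a dataset-path lookup that fails with a clear message outside the Nix shell, and a one-vs-rest label mapping for CIFAR-10. The helper names are hypothetical, and treating class 1 as the positive label is an assumption drawn from the "class 1 vs rest" note.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping


def dataset_dir(var_name: str, env: Mapping[str, str] = os.environ) -> Path:
    """Resolve a dataset directory from an env var set by the Nix shell hook."""
    value = env.get(var_name)
    if not value:
        raise RuntimeError(
            f"${var_name} is not set; enter the project shell with `nix develop` first"
        )
    return Path(value)


def cifar10_binary_labels(labels: Iterable[int], positive_class: int = 1) -> List[int]:
    """Map 10-class labels to binary: `positive_class` becomes 1, the rest 0."""
    return [1 if label == positive_class else 0 for label in labels]
```

For example, `dataset_dir("keel_wisconsin_dir")` returns the injected path inside `nix develop` and raises `RuntimeError` elsewhere.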
Key Files
- src/roll.py: core loss, the `KernelizedROLLoss` custom autograd Function with KDE backward pass
- src/experiment.py: `ExperimentConfiguration` dataclass, `Criteriorator` ABC, `run_configurations()`
- experiments/keel/_base.py: shared KEEL runner `run_keel_experiment()`
- flake.nix: Nix env with dataset downloads, hash-pinned, exports path env vars
- AGENTS.md: Project guidelines (naming conventions, env, dataset list)
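As background for the "KDE backward pass" mentioned for src/roll.py, the following shows why a kernel density term appears in the gradient of a smoothed TPR. It is a generic derivation under stated assumptions, not the formula implemented in `KernelizedROLLoss`: it assumes a Gaussian kernel with bandwidth $h$, scores $s_i$, labels $y_i \in \{0, 1\}$, $n_+$ positive samples, and a threshold $t$ held fixed, with $\Phi$ and $\varphi$ the standard normal CDF and PDF.

```latex
\widehat{\mathrm{TPR}}_h(t)
  = \frac{1}{n_+} \sum_{i:\, y_i = 1} \Phi\!\left(\frac{s_i - t}{h}\right),
\qquad
\frac{\partial\, \widehat{\mathrm{TPR}}_h(t)}{\partial s_i}
  = \frac{1}{n_+ h}\, \varphi\!\left(\frac{s_i - t}{h}\right)
  \quad \text{for } y_i = 1 .
```

Summing the right-hand side over the positive samples gives the Gaussian KDE estimate of the positive-score density at $t$, so each positive sample's gradient is its own kernel contribution to that density. In practice the threshold also depends on the negative scores through the FPR constraint, which this sketch ignores.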
Subnodes
- impl/loss-functions — Loss variants, KDE internals, gradient computation
- impl/experiments — Experiment structure, training flow, metrics, output layout
- impl/datasets — Dataset catalog, KEEL list, eval metrics