GLIDE
Generated Label Inference & Debiasing Engine
🧭 What is GLIDE?
GLIDE is a Python library for rigorous evaluation of GenAI systems using hybrid human/proxy annotations.
GLIDE implements methods from prediction-powered inference (PPI), a statistical framework that combines a small set of human-labeled data with a large set of proxy-labeled data to produce valid, debiased estimates. See the implemented papers below.
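To make the idea concrete, here is a minimal NumPy sketch of the classical prediction-powered estimate of a mean. It is not GLIDE's API: the function name `ppi_mean`, its arguments, and the simulated judge are assumptions made for illustration only. The proxy mean over the large set is shifted by a "rectifier", the average proxy error measured on the human-labeled subset.

```python
import numpy as np
from statistics import NormalDist


def ppi_mean(y_labeled, proxy_labeled, proxy_unlabeled, alpha=0.05):
    """Hypothetical helper: PPI estimate of a mean with a normal-approximation CI."""
    y = np.asarray(y_labeled, dtype=float)
    f_lab = np.asarray(proxy_labeled, dtype=float)
    f_unl = np.asarray(proxy_unlabeled, dtype=float)
    n, N = len(y), len(f_unl)

    # Rectifier: average proxy error on the human-labeled subset.
    rectifier = np.mean(y - f_lab)
    estimate = np.mean(f_unl) + rectifier

    # The labeled and unlabeled samples are independent, so their variances add.
    variance = np.var(f_unl, ddof=1) / N + np.var(y - f_lab, ddof=1) / n
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)


# Simulated data (an assumption, not a real benchmark): true success rate 0.7,
# and a judge that wrongly accepts 30% of the failures, so it over-estimates.
rng = np.random.default_rng(0)
truth = rng.binomial(1, 0.7, size=20_000).astype(float)
proxy = np.where(truth == 1, 1.0, rng.binomial(1, 0.3, size=truth.size))

n_human = 500  # only this many items receive a human label
estimate, interval = ppi_mean(truth[:n_human], proxy[:n_human], proxy[n_human:])
print("proxy-only mean:", proxy.mean())
print("PPI estimate:", estimate, "95% CI:", interval)
```

In this simulation the proxy-only mean sits well above the true rate, while the rectified estimate is centered on it, at the cost of a confidence interval whose width depends on how noisy the proxy errors are.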
🤔 Why GLIDE?
- 🤖 GenAI applications are everywhere — and imperfect. Deployed systems make mistakes, and measuring how often matters.
- ⚖️ LLM-as-judge is biased. Proxy evaluators (models, heuristics) are cheap but systematically over- or under-estimate true performance.
- 🧑 Rigorous evaluation requires a human in the loop. Ground-truth labels from humans are expensive, so only a small subset is feasible.
- 📐 GLIDE bridges the gap. It combines a small set of human annotations with a large set of proxy predictions to produce statistically valid metrics — correcting proxy bias without requiring full human labeling.
⚡ Quick Start
Install the package with your favorite package manager:
uv add glide-py
or
pip install glide-py
Then follow our practical quickstart.
📚 Documentation
Explore the full documentation — from practical tutorials and user guides to scientific deep dives into the methods behind GLIDE.
🤝 Contributing
Contributions are welcome! Please read the contributing guide for setup instructions, an architectural overview, and the checklist to follow before opening a pull request. Feel free to open an issue to report a bug or suggest a feature.
🔢 Versioning
This project follows Semantic Versioning (SemVer): MAJOR.MINOR.PATCH.
📦 Dependency Support
This project follows SPEC 0 for dependency support windows.
📄 License & Citation
This project is licensed under the Apache 2.0 License.
If you use GLIDE in your work, please cite us using the "Cite this repository" button on the GitHub repository page.
📰 Implemented Papers
| Year | Title | Venue | Original Implementation | GLIDE class |
|---|---|---|---|---|
| 2023 | Prediction-powered inference | Science | Link | estimators.PPIMeanEstimator (with power_tuning=False) |
| 2023 | PPI++: Efficient Prediction-Powered Inference | Preprint | Link | estimators.PPIMeanEstimator |
| 2024 | Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation | NeurIPS'24 | N/A | estimators.StratifiedPPIMeanEstimator |
| 2024 | A framework for efficient model evaluation through stratification, sampling, and estimation | ECCV'24 | Link | samplers.StratifiedSampler, estimators.StratifiedPPIMeanEstimator |
| 2024 | Active Statistical Inference | ICML'24 | Link | samplers.ActiveSampler, estimators.ASIMeanEstimator |
| 2025 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? | NAACL'25 | Link | samplers.ActiveSampler, estimators.ASIMeanEstimator |
| 2025 | Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling | Preprint | Link | estimators.PTDMeanEstimator, estimators.StratifiedPTDMeanEstimator, estimators.IPWPTDMeanEstimator |
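
The first two rows of the table differ only in power tuning. As a rough illustration of what that means for mean estimation, the sketch below weights the proxy term by a factor `lam`: `lam=1` gives the classical PPI estimate, `lam=0` ignores the proxy entirely, and `lam=None` picks the variance-minimizing weight. This is a plain NumPy sketch under those assumptions, not GLIDE's implementation; the function name `tuned_ppi_mean` and its signature are hypothetical.

```python
import numpy as np


def tuned_ppi_mean(y_labeled, proxy_labeled, proxy_unlabeled, lam=None):
    """Hypothetical helper: PPI mean estimate with an optional tuned proxy weight."""
    y = np.asarray(y_labeled, dtype=float)
    f_lab = np.asarray(proxy_labeled, dtype=float)
    f_unl = np.asarray(proxy_unlabeled, dtype=float)
    n, N = len(y), len(f_unl)

    if lam is None:
        # Minimizing the variance expression below over lam gives
        # lam = Cov(Y, f) / ((1 + n/N) * Var(f)).
        cov_yf = np.cov(y, f_lab, ddof=1)[0, 1]
        var_f = np.var(np.concatenate([f_lab, f_unl]), ddof=1)
        lam = cov_yf / ((1 + n / N) * var_f) if var_f > 0 else 0.0

    # Human-label mean, corrected by the weighted gap between proxy means.
    estimate = np.mean(y) + lam * (np.mean(f_unl) - np.mean(f_lab))

    # Variance of the labeled part plus variance of the unlabeled part.
    variance = (
        np.var(y - lam * f_lab, ddof=1) / n
        + lam**2 * np.var(f_unl, ddof=1) / N
    )
    return estimate, variance, lam
```

When the proxy carries no information about the human labels, the tuned weight falls to zero and the estimate reduces to the plain mean of the human labels, so tuning protects against a poor judge making things worse.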
📬 Stay Updated
Follow our LinkedIn newsletter for updates on GLIDE and GenAI evaluation.
🏛️ Affiliation
Developed at Emerton Data.
