GLIDE

Generated Label Inference & Debiasing Engine

🧭 What is GLIDE?

GLIDE is a Python library for rigorous evaluation of GenAI systems using hybrid human/proxy annotations.

GLIDE implements methods from prediction-powered inference (PPI), a statistical framework that combines a small set of human-labeled data with a large set of proxy-labeled data to produce valid, debiased estimates. See the implemented papers below.

Figure: Prediction-powered inference schema

🤔 Why GLIDE?

🤖 GenAI applications are everywhere — and imperfect. Deployed systems make mistakes, and measuring how often matters.
⚖️ LLM-as-judge is biased. Proxy evaluators (models, heuristics) are cheap but can systematically over- or underestimate true performance.
🧑 Rigorous evaluation requires a human in the loop. Ground-truth labels from humans are expensive, so only a small subset of outputs can be labeled.
📐 GLIDE bridges the gap. It combines a small set of human annotations with a large set of proxy predictions to produce statistically valid metrics — correcting proxy bias without requiring full human labeling.
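
To make the idea concrete, here is a minimal NumPy sketch of the classic prediction-powered mean estimate: the proxy average on the large set, plus the average human-minus-proxy gap measured on the small labeled set. It does not use GLIDE's API. The function name `ppi_mean`, its arguments, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def ppi_mean(y_human, proxy_labeled, proxy_unlabeled, z=1.96):
    """Classic prediction-powered mean with a normal-approximation interval.

    y_human:         human labels on the small labeled set (length n)
    proxy_labeled:   proxy scores on that same labeled set (length n)
    proxy_unlabeled: proxy scores on the large unlabeled set (length N)
    """
    y_human = np.asarray(y_human, dtype=float)
    proxy_labeled = np.asarray(proxy_labeled, dtype=float)
    proxy_unlabeled = np.asarray(proxy_unlabeled, dtype=float)
    n, N = len(y_human), len(proxy_unlabeled)

    # Rectifier: per-item gap between the human label and the proxy score.
    rectifier = y_human - proxy_labeled
    estimate = proxy_unlabeled.mean() + rectifier.mean()

    # The two samples are independent, so their variances add.
    std_err = np.sqrt(proxy_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


# Synthetic demo with made-up numbers: true pass rate 0.70, lenient proxy judge.
rng = np.random.default_rng(0)
truth = rng.random(20_000) < 0.70            # hidden human verdicts
proxy = truth | (rng.random(20_000) < 0.30)  # judge passes 30% of true failures
truth, proxy = truth.astype(float), proxy.astype(float)

n = 500  # only the first n items receive a human label
naive = proxy.mean()
estimate, (low, high) = ppi_mean(truth[:n], proxy[:n], proxy[n:])
print(f"proxy-only: {naive:.3f}  PPI: {estimate:.3f}  95% CI: [{low:.3f}, {high:.3f}]")
```

In this toy setup the proxy-only average overstates the pass rate, while the rectified estimate is centered on the human-measured rate and its interval accounts for both sources of sampling noise.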

⚡ Quick Start

Install the package with your preferred package manager:

uv add glide-py

or

pip install glide-py

Then work through the practical quickstart.

📚 Documentation

Explore the full documentation — from practical tutorials and user guides to scientific deep dives into the methods behind GLIDE.

🤝 Contributing

Contributions are welcome! Please read the contributing guide for setup instructions, an architectural overview, and the checklist to follow before opening a pull request. Feel free to open an issue to report a bug or suggest a feature.

🔢 Versioning

This project follows Semantic Versioning (SemVer): MAJOR.MINOR.PATCH.

📦 Dependency Support

This project follows SPEC 0 for dependency support windows.

📄 License & Citation

This project is licensed under the Apache 2.0 License.

If you use GLIDE in your work, please cite us using the "Cite this repository" button on the GitHub repository page.

📰 Implemented Papers

Year	Title	Venue	Original Implementation	GLIDE class
2023	Prediction-powered inference	Science	Link	estimators.PPIMeanEstimator (with `power_tuning=False`)
2023	PPI++: Efficient Prediction-Powered Inference	Preprint	Link	estimators.PPIMeanEstimator
2024	Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation	NeurIPS'24	NA	estimators.StratifiedPPIMeanEstimator
2024	A framework for efficient model evaluation through stratification, sampling, and estimation	ECCV'24	Link	samplers.StratifiedSampler, estimators.StratifiedPPIMeanEstimator
2024	Active Statistical Inference	ICML'24	Link	samplers.ActiveSampler, estimators.ASIMeanEstimator
2025	Can Unconfident LLM Annotations Be Used for Confident Conclusions?	NAACL'25	Link	samplers.ActiveSampler, estimators.ASIMeanEstimator
2025	Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling	Preprint	Link	estimators.PTDMeanEstimator, estimators.StratifiedPTDMeanEstimator, estimators.IPWPTDMeanEstimator
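
The table distinguishes the classic PPI estimator (`power_tuning=False`) from the power-tuned PPI++ variant. The NumPy sketch below illustrates what tuning means for a mean: the proxy correction is scaled by a weight `lam` chosen to minimize the plug-in variance, which reduces to Cov(Y, f) / ((1 + n/N) Var(f)) when both samples share the same proxy variance. The function `tuned_ppi_mean` is a hypothetical stand-in, not GLIDE's implementation; its `power_tuning` flag only mirrors the name used in the table, and clipping `lam` to [0, 1] is a choice made for this sketch.

```python
import numpy as np


def tuned_ppi_mean(y_human, proxy_labeled, proxy_unlabeled, power_tuning=True):
    """Return (estimate, standard_error, lam) for a PPI mean estimate.

    Hypothetical helper for illustration; not part of GLIDE.
    """
    y = np.asarray(y_human, dtype=float)
    f_l = np.asarray(proxy_labeled, dtype=float)
    f_u = np.asarray(proxy_unlabeled, dtype=float)
    n, N = len(y), len(f_u)
    var_l, var_u = f_l.var(ddof=1), f_u.var(ddof=1)

    lam = 1.0  # lam = 1 recovers the classic, untuned estimator
    if power_tuning:
        cov = np.cov(y, f_l, ddof=1)[0, 1]
        denom = var_l + (n / N) * var_u
        # Minimizer of the plug-in variance computed below, clipped to [0, 1].
        lam = float(np.clip(cov / denom, 0.0, 1.0)) if denom > 0 else 0.0

    estimate = y.mean() + lam * (f_u.mean() - f_l.mean())
    variance = (y - lam * f_l).var(ddof=1) / n + lam**2 * var_u / N
    return estimate, float(np.sqrt(variance)), lam


# Synthetic demo with made-up numbers: a weak proxy that flips 35% of verdicts.
rng = np.random.default_rng(1)
truth = (rng.random(5_000) < 0.6).astype(float)
noisy = np.where(rng.random(5_000) < 0.35, 1.0 - truth, truth)

n = 200  # size of the human-labeled subset
plain = tuned_ppi_mean(truth[:n], noisy[:n], noisy[n:], power_tuning=False)
tuned = tuned_ppi_mean(truth[:n], noisy[:n], noisy[n:], power_tuning=True)
print(f"untuned: est={plain[0]:.3f} se={plain[1]:.4f} lam={plain[2]:.2f}")
print(f"tuned:   est={tuned[0]:.3f} se={tuned[1]:.4f} lam={tuned[2]:.2f}")
```

With a weak proxy, the tuned weight shrinks toward 0 and the estimate falls back toward the plain human-label average. With a strong proxy it moves toward 1 and behaves like classic PPI. By construction, the tuned plug-in standard error is never larger than the untuned one.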

📬 Stay Updated

Follow our LinkedIn newsletter for updates on GLIDE and GenAI evaluation.

🏛️ Affiliation

Developed at Emerton Data.