GLIDE

Generated Label Inference & Debiasing Engine

🧭 What is GLIDE?

GLIDE is a Python library for rigorous evaluation of GenAI systems using hybrid human/proxy annotations.

GLIDE implements methods from prediction-powered inference (PPI), a family of statistical techniques that combine a small set of human-labeled data with a large set of proxy-labeled data to produce valid, debiased estimates. See the implemented papers below.

[Figure: Prediction-powered inference schema]
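
To make the idea concrete, here is a minimal sketch of the basic prediction-powered mean estimate, written with the Python standard library only. It does not use GLIDE: the function `ppi_mean`, the helper `simulate`, and the simulated "optimistic judge" are all hypothetical names and data introduced for illustration, and the confidence interval assumes a normal approximation and independent labeled and unlabeled samples.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def ppi_mean(y_labeled, proxy_labeled, proxy_unlabeled, alpha=0.05):
    """Prediction-powered estimate of a mean with a normal-approximation CI.

    Illustrative helper only; this is not part of the GLIDE API.
    """
    n, N = len(y_labeled), len(proxy_unlabeled)
    # Rectifier: gap between human label and proxy on the labeled subset.
    residuals = [y - f for y, f in zip(y_labeled, proxy_labeled)]
    # Proxy mean on the large set, corrected by the average gap.
    estimate = fmean(proxy_unlabeled) + fmean(residuals)
    # The two terms come from independent samples, so their variances add.
    std_err = math.sqrt(variance(proxy_unlabeled) / N + variance(residuals) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


rng = random.Random(0)


def simulate(size):
    # Simulated ground truth: 70% of answers are actually good.
    y = [1.0 if rng.random() < 0.7 else 0.0 for _ in range(size)]
    # Simulated optimistic judge: approves every good answer
    # and wrongly approves 40% of the bad ones.
    f = [1.0 if (label == 1.0 or rng.random() < 0.4) else 0.0 for label in y]
    return y, f


y_lab, f_lab = simulate(300)      # small set with human labels
_, f_unlab = simulate(20000)      # large set with proxy labels only

naive = fmean(f_unlab)            # proxy-only estimate, biased upward here
estimate, (low, high) = ppi_mean(y_lab, f_lab, f_unlab)
print(f"proxy only: {naive:.3f}")
print(f"PPI: {estimate:.3f}  95% CI: [{low:.3f}, {high:.3f}]")
```

By construction the simulated judge approves about 82% of answers in expectation while the true rate is 70%. The correction term, estimated from only 300 human labels, is what pulls the estimate back toward the true rate.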

🤔 Why GLIDE?

  • 🤖 GenAI applications are everywhere — and imperfect. Deployed systems make mistakes, and measuring how often matters.
  • ⚖️ LLM-as-judge is biased. Proxy evaluators (models, heuristics) are cheap but systematically over- or under-estimate true performance.
  • 🧑 Rigorous evaluation requires a human in the loop. Ground-truth labels from humans are expensive, so only a small subset is feasible.
  • 📐 GLIDE bridges the gap. It combines a small set of human annotations with a large set of proxy predictions to produce statistically valid metrics — correcting proxy bias without requiring full human labeling.

⚡ Quick Start

Install the package with your favorite package manager:

uv add glide-py

or

pip install glide-py

Then work through the practical quickstart.

📚 Documentation

Explore the full documentation — from practical tutorials and user guides to scientific deep dives into the methods behind GLIDE.

🤝 Contributing

Contributions are welcome! Please read the contributing guide for setup instructions, an architectural overview, and the checklist to follow before opening a pull request. Feel free to open an issue to report a bug or suggest a feature.

🔢 Versioning

This project follows Semantic Versioning (SemVer): MAJOR.MINOR.PATCH.

📦 Dependency Support

This project follows SPEC 0 for dependency support windows.

📄 License & Citation

This project is licensed under the Apache 2.0 License.

If you use GLIDE in your work, please cite us using the "Cite this repository" button on the GitHub repository page.

📰 Implemented Papers

| Year | Title | Venue | Original Implementation | GLIDE class |
|------|-------|-------|-------------------------|-------------|
| 2023 | Prediction-powered inference | Science | Link | estimators.PPIMeanEstimator (with power_tuning=False) |
| 2023 | PPI++: Efficient Prediction-Powered Inference | Preprint | Link | estimators.PPIMeanEstimator |
| 2024 | Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation | NeurIPS'24 | NA | estimators.StratifiedPPIMeanEstimator |
| 2024 | A framework for efficient model evaluation through stratification, sampling, and estimation | ECCV'24 | Link | samplers.StratifiedSampler, estimators.StratifiedPPIMeanEstimator |
| 2024 | Active Statistical Inference | ICML'24 | Link | samplers.ActiveSampler, estimators.ASIMeanEstimator |
| 2025 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? | NAACL'25 | Link | samplers.ActiveSampler, estimators.ASIMeanEstimator |
| 2025 | Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling | Preprint | Link | estimators.PTDMeanEstimator, estimators.StratifiedPTDMeanEstimator, estimators.IPWPTDMeanEstimator |
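
The first two rows differ only in power tuning. As a rough illustration of what that means for mean estimation, the sketch below weights the proxy correction by a factor lambda: lambda = 1 gives the original PPI estimate, and lambda = 0 falls back to the labeled-only mean. The plug-in formula for lambda is stated here from the PPI++ paper and is worth checking against it. The functions `tuned_lambda` and `ppi_pp_mean` are hypothetical names, the data are simulated, and nothing here reflects how GLIDE implements `power_tuning` internally.

```python
import random
from statistics import fmean, variance


def tuned_lambda(y_labeled, proxy_labeled, proxy_unlabeled):
    """Plug-in weight for mean estimation: Cov(Y, f) / ((1 + n/N) * Var(f)).

    Illustrative helper only; this is not part of the GLIDE API.
    """
    n, N = len(y_labeled), len(proxy_unlabeled)
    y_bar, f_bar = fmean(y_labeled), fmean(proxy_labeled)
    # Sample covariance between human labels and proxy on the labeled set.
    cov = sum(
        (y - y_bar) * (f - f_bar) for y, f in zip(y_labeled, proxy_labeled)
    ) / (n - 1)
    # Proxy variance estimated from all available proxy predictions.
    var_f = variance(list(proxy_labeled) + list(proxy_unlabeled))
    return cov / ((1 + n / N) * var_f)


def ppi_pp_mean(y_labeled, proxy_labeled, proxy_unlabeled, lam):
    # lam = 1 recovers the original PPI estimate; lam = 0 ignores the proxy.
    return fmean(y_labeled) + lam * (fmean(proxy_unlabeled) - fmean(proxy_labeled))


rng = random.Random(1)
y_lab = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Informative proxy: the true value plus a little noise.
good_lab = [y + rng.gauss(0.0, 0.1) for y in y_lab]
good_unlab = [rng.gauss(0.0, 1.0) + rng.gauss(0.0, 0.1) for _ in range(5000)]

# Uninformative proxy: pure noise, unrelated to the true value.
noise_lab = [rng.gauss(0.0, 1.0) for _ in y_lab]
noise_unlab = [rng.gauss(0.0, 1.0) for _ in range(5000)]

lam_good = tuned_lambda(y_lab, good_lab, good_unlab)
lam_noise = tuned_lambda(y_lab, noise_lab, noise_unlab)
print(f"lambda, informative proxy:   {lam_good:.2f}")
print(f"lambda, uninformative proxy: {lam_noise:.2f}")
print(f"tuned estimate: {ppi_pp_mean(y_lab, good_lab, good_unlab, lam_good):.3f}")
```

The design point is that an informative proxy earns a weight close to 1, while an uninformative one is weighted close to 0, so a poor proxy adds little extra variance compared with using the human labels alone.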

📬 Stay Updated

Follow our LinkedIn newsletter for updates on GLIDE and GenAI evaluation.

🏛️ Affiliation

Developed at Emerton Data.
