
Contributing to GLIDE

Thank you for considering a contribution to GLIDE! This guide covers everything you need to set up your environment, understand the codebase, and submit a pull request.

Depending on what you want to do, jump to the relevant section: Setup, Architectural overview, Possible contributions, or A note on LLM-assisted contributions.

Before writing any code, please open an issue to discuss the scope of your change. This is highly recommended and especially important for new estimators and samplers: sharing the reference paper upfront gives maintainers a chance to read it and frame the ticket to guide your implementation. When you are ready to submit, fork the repository, create a branch off main, and open a pull request against main. The PR template lists all conditions that must be satisfied before requesting a review.


Setup

GLIDE uses uv to manage the virtual environment and all dependency groups.

1. Install uv (skip if already installed):

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Create the virtual environment and install all dependencies:

make venv

This installs the main package, test dependencies, and documentation dependencies in one step.

3. Verify the setup by running the test suite:

make tests

4. Install the git pre-commit hooks:

GLIDE uses prek (a lightweight pre-commit hook runner) configured in prek.toml. The hooks run automatically on every git commit and enforce formatting, type checking, and notebook output stripping.

Install the hooks once after cloning:

uv run prek install

5. Testing notebooks locally (optional):

The project includes example notebooks in docs/. To test all notebooks locally:

make test-notebooks

Note: Notebook testing also runs in CI for all pull requests, so local testing is optional. The CI workflow ensures notebooks are executed and validated before merge.


Architectural overview

The package is organised around four concerns: estimators, samplers, core building blocks, and I/O.

glide/
├── estimators/             # Public API — mean estimators
│   ├── ppi.py
│   ├── ...
│
├── samplers/               # Public API — sampling strategies
│   ├── active.py
│   ├── ...
│
├── simulators/             # Public API — synthetic data generators for tests
│   ├── binary.py
│   ├── ...
│
├── confidence_intervals/   # Confidence interval types
│   ├── base.py
│   ├── ...
│
├── mean_inference_results/ # Result types returned by estimators
│   ├── base.py
│   ├── ...
│
├── utils.py                # General-purpose helpers
│
└── io/                     # Serialisation helpers (e.g., to_json)
    └── export.py

How the pieces fit together. Estimators accept raw NumPy arrays and return a MeanInferenceResult subclass: prediction-powered estimators return a PredictionPoweredMeanInferenceResult, classical ones a ClassicalMeanInferenceResult. Every result embeds a ConfidenceInterval (e.g. CLTConfidenceInterval). Samplers produce the labeled arrays that estimators consume. The io module serialises result objects.
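The composition described above (a result object that embeds a confidence interval and is then serialised) can be sketched with plain dataclasses. This is a minimal, hypothetical sketch, not GLIDE's implementation: the class names mirror the ones above, but their fields, the example values, and the to_json helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CLTConfidenceInterval:
    # Hypothetical stand-in: the real class in glide/confidence_intervals has its own fields.
    lower: float
    upper: float
    alpha: float


@dataclass
class ClassicalMeanInferenceResult:
    # Hypothetical stand-in: every result embeds a confidence interval.
    mean: float
    n: int
    confidence_interval: CLTConfidenceInterval


def to_json(result) -> str:
    # Hypothetical stand-in for the io module: asdict() recurses into the embedded interval.
    return json.dumps(asdict(result), sort_keys=True)


# Illustrative values only.
ci = CLTConfidenceInterval(lower=1.5, upper=2.5, alpha=0.05)
result = ClassicalMeanInferenceResult(mean=2.0, n=100, confidence_interval=ci)
print(to_json(result))
```

Keeping the interval as a nested object (rather than flattening its bounds into the result) is what lets every result type share the same interval classes and serialisation path.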


Possible contributions

The main kinds of contribution are listed below.

1. Bug fixes

Reproduce the bug in a failing test first — this confirms the bug exists and guarantees it stays fixed. Then make the minimal code change that makes the test pass.
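A minimal sketch of this workflow, built around a hypothetical helper standard_error that is not part of GLIDE: suppose the buggy version divided by n - 1 unconditionally and raised ZeroDivisionError for a single observation. The first test is written before the fix and fails against the buggy code; the second guards against the fix changing behaviour elsewhere.

```python
import math


def standard_error(values: list[float]) -> float:
    """Hypothetical helper: standard error of the sample mean.

    Returns NaN when fewer than two observations are given, because the
    sample variance is undefined in that case.
    """
    n = len(values)
    if n < 2:
        return math.nan  # minimal fix: previously this path divided by zero
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


def test_standard_error_single_observation_is_nan():
    # Written first: against the buggy version this raised ZeroDivisionError.
    assert math.isnan(standard_error([3.0]))


def test_standard_error_unchanged_for_regular_input():
    # Sample variance of [1, 2, 3] is 1, so the standard error is 1 / sqrt(3).
    assert math.isclose(standard_error([1.0, 2.0, 3.0]), 1.0 / math.sqrt(3))
```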

2. New features

New estimators and samplers should be backed by a scientific publication. Please first open an issue sharing the reference paper to give maintainers a chance to read and frame it to guide your implementation.

Adding a new estimator — step by step

  1. Identify the inputs, outputs, and any tunable hyperparameters.
  2. Implement the estimator class:
     - If your estimator belongs to an existing family, add it to the corresponding file (e.g. PPI-based methods go in glide/estimators/ppi.py). Otherwise, create glide/estimators/<name>.py.
     - estimate(array1, array2, ...) runs the method and returns an inference result object. Reuse one from glide/mean_inference_results (e.g. a MeanInferenceResult subclass) or add a new one there.
     - If your estimator has hyperparameters, these should be optional parameters of estimate() with default values.
  3. Export the new class from glide/estimators/__init__.py.
  4. Write unit tests in tests/unit/estimators/test_<name>.py. Cover at minimum:
     - Correct output type and shape.
     - Known analytical results (e.g. the estimator reduces to the classical mean in special cases).
     - Doctests in the class docstring.
  5. Write functional tests in tests/functional/estimators/test_<name>.py. If applicable, test expected behaviours and properties of your estimator in specific situations; see the existing files in tests/functional/estimators for examples.
  6. Write a numpy-style docstring that includes the reference paper, parameter descriptions, and a short Examples section with a minimal runnable doctest. See existing estimators for inspiration.
  7. Add an example script in docs/examples/plot_<name>.py demonstrating the estimator on some synthetic data.
  8. Update CHANGELOG.md under the [Next release] section.
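The layout the steps above ask for (an estimate() method with defaulted hyperparameters, a numpy-style docstring, and a doctest in the class docstring) can be sketched as follows. This is a hypothetical example, not a GLIDE estimator: SampleMeanEstimator and MeanResult are illustrative stand-ins (a real estimator would return a class from glide/mean_inference_results), and the method shown is the classical sample mean with a CLT (normal-approximation) interval. Only the standard library is used to keep the sketch self-contained.

```python
from dataclasses import dataclass
from statistics import NormalDist, fmean, stdev


@dataclass
class MeanResult:
    # Hypothetical stand-in for a result class from glide/mean_inference_results.
    mean: float
    lower: float
    upper: float


class SampleMeanEstimator:
    """Classical sample-mean estimator with a CLT confidence interval.

    Hypothetical example illustrating the expected layout; not part of GLIDE.

    References
    ----------
    Placeholder: cite the reference paper here.

    Examples
    --------
    >>> result = SampleMeanEstimator().estimate([1.0, 2.0, 3.0])
    >>> result.mean
    2.0
    >>> result.lower < result.mean < result.upper
    True
    """

    def estimate(self, y, alpha: float = 0.05) -> MeanResult:
        """Estimate the mean of ``y``.

        Parameters
        ----------
        y : sequence of float
            Labeled observations (at least two).
        alpha : float, default=0.05
            Miscoverage level; the interval has nominal coverage ``1 - alpha``.

        Returns
        -------
        MeanResult
            Point estimate and confidence interval bounds.
        """
        n = len(y)
        mean = fmean(y)
        # Normal approximation: mean +/- z_{1 - alpha/2} * s / sqrt(n).
        z = NormalDist().inv_cdf(1 - alpha / 2)
        half_width = z * stdev(y) / n**0.5
        return MeanResult(mean=mean, lower=mean - half_width, upper=mean + half_width)


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

Because the Examples section is a doctest, it doubles as the doctest the unit tests must cover; it can be run with python -m doctest on the file.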

Adding a new sampler — step by step

  1. Identify the inputs the sampler requires (e.g. proxy labels, uncertainty scores, stratum labels), the budget parameter, and the values it returns.
  2. Implement the sampler class:
     - Create glide/samplers/<name>.py.
     - sample(...) runs the sampling procedure and returns the computed values (at least a vector xi of sampling indicators and possibly a vector pi of sampling probabilities).
     - If your sampler has hyperparameters, these should be optional parameters of sample() with default values.
  3. Export the new class from glide/samplers/__init__.py.
  4. Write unit tests in tests/unit/samplers/test_<name>.py. Cover at minimum:
     - Correct output type and shape.
     - Known analytical results (e.g. uniform inputs should yield equal probabilities).
     - Edge cases for input parameters (e.g. budget equals dataset size).
     - Doctests in the class docstring.
  5. Write functional tests in tests/functional/samplers/test_<name>.py. If applicable, test expected behaviours and properties of your sampler; see the existing files in tests/functional/samplers for examples.
  6. Write a numpy-style docstring that includes the reference paper, parameter descriptions, and a short Examples section with a minimal runnable doctest. See existing samplers for inspiration.
  7. Update CHANGELOG.md under the [Next release] section.
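A minimal sketch of the sample() contract described above, using a hypothetical ProportionalSampler that is not part of GLIDE. It assigns inclusion probabilities proportional to non-negative scores, capped at 1, and draws independent Bernoulli indicators (Poisson sampling). Plain lists and the seed argument are assumptions made to keep the sketch dependency-free; the class name and parameters are illustrative only.

```python
import random


class ProportionalSampler:
    """Hypothetical sampler: inclusion probabilities proportional to scores.

    Not part of GLIDE; illustrates the ``sample()`` contract.

    Examples
    --------
    >>> xi, pi = ProportionalSampler().sample([1.0, 1.0, 1.0, 1.0], budget=2, seed=0)
    >>> pi
    [0.5, 0.5, 0.5, 0.5]
    """

    def sample(self, scores, budget, seed=None):
        """Return sampling indicators ``xi`` and probabilities ``pi``.

        Parameters
        ----------
        scores : sequence of float
            Non-negative scores (e.g. uncertainty); at least one must be positive.
        budget : int
            Target expected number of items to label.
        seed : int or None, default=None
            Seed for reproducible Bernoulli draws.
        """
        total = sum(scores)
        # pi_i = budget * s_i / sum(s), capped at 1 (capping can lower the expected count).
        pi = [min(1.0, budget * s / total) for s in scores]
        rng = random.Random(seed)
        # Independent Bernoulli(pi_i) draws; rng.random() is in [0, 1), so pi_i = 1 always selects.
        xi = [1 if rng.random() < p else 0 for p in pi]
        return xi, pi
```

The two unit-test cases listed above map directly onto this sketch: uniform scores give equal probabilities budget / N, and a budget equal to the dataset size gives pi = 1 and selects every item.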

3. Documentation

Corrections, clarifications, and new examples live in docs/. Build the docs locally with:

make doc

4. Repository hygiene

Improvements to CI, Makefile targets, GitHub Actions workflows, or dependency configuration. These changes should not affect the public API or test behaviour.

5. Refactoring

Restructuring code without changing observable behaviour. Refactoring PRs must be accompanied by the full passing test suite and must not be bundled with functional changes.


A note on LLM-assisted contributions

LLM usage is welcome and must be disclosed in the PR description. Reviewers should be aware that LLM-generated code tends to increase review burden: it is often verbose, introduces unnecessary abstractions, and may silently diverge from the project's conventions. Contributors are expected to thoroughly read, understand, and validate every line before submitting — not just run the tests. Undisclosed or unvalidated LLM output is grounds for requesting a rewrite.