Estimators
glide.estimators.classical.ClassicalMeanEstimator
Estimator for population mean using the classical sample mean.
Uses only a single array y to compute the sample mean and its
standard error via the Central Limit Theorem. This serves as a baseline
that does not require proxy predictions.
Examples:
>>> import numpy as np
>>> from glide.estimators import ClassicalMeanEstimator
>>> y = np.array([5.0, 6.0, 4.0, 7.0])
>>> estimator = ClassicalMeanEstimator()
>>> result = estimator.estimate(y)
>>> print(result)
Metric: Metric
Point Estimate: 5.500
Confidence Interval (95%): [4.235, 6.765]
Estimator : ClassicalMeanEstimator
n: 4
Source code in glide/estimators/classical.py
estimate
estimate(y, metric_name='Metric', confidence_level=0.95)
Estimate the population mean using the classical sample mean.
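As a rough illustration of the computation described for this class, the sketch below computes the sample mean and a normal-approximation interval with NumPy. It is not the library's source: the function name `classical_mean_ci` is hypothetical, and the use of the unbiased sample variance (`ddof=1`) with a two-sided normal quantile is an assumption, although those choices give the same numbers as the class example.

```python
from statistics import NormalDist

import numpy as np


def classical_mean_ci(y, confidence_level=0.95):
    """Sample mean with a CLT-based (normal) confidence interval."""
    y = np.asarray(y, dtype=float)
    n = y.size
    mean = y.mean()
    # Standard error of the mean, assuming the unbiased variance (ddof=1).
    se = y.std(ddof=1) / np.sqrt(n)
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return mean, (mean - z * se, mean + z * se)


mean, (lo, hi) = classical_mean_ci([5.0, 6.0, 4.0, 7.0])
print(f"{mean:.3f} [{lo:.3f}, {hi:.3f}]")  # 5.500 [4.235, 6.765]
```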
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | NDArray | Array of observations, shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval, e.g. 0.95. | 0.95 |
Returns:
| Type | Description |
|---|---|
| ClassicalMeanInferenceResult | Contains the CLT-based confidence interval, the metric name, the estimator name (…) |
Raises:
| Type | Description |
|---|---|
| ValueError | If … |
glide.estimators.stratified_classical.StratifiedClassicalMeanEstimator
Stratified classical estimator for population mean.
Extends mean estimation as in ClassicalMeanEstimator to datasets partitioned
into strata (e.g. by language, domain, or data source). A per-stratum sample
mean and standard error are computed independently, then combined with
population-proportional weights.
Examples:
>>> import numpy as np
>>> from glide.estimators import StratifiedClassicalMeanEstimator
>>> y = np.array([1.0, 3.0, 5.0, 7.0])
>>> groups = np.array(["A", "A", "B", "B"])
>>> estimator = StratifiedClassicalMeanEstimator()
>>> result = estimator.estimate(y, groups)
>>> print(result)
Metric: Metric
Point Estimate: 4.000
Confidence Interval (95%): [2.614, 5.386]
Estimator : StratifiedClassicalMeanEstimator
n: 4
Source code in glide/estimators/stratified_classical.py
estimate
estimate(
y,
groups,
metric_name="Metric",
confidence_level=0.95,
stratum_weights=None,
)
Estimate the population mean using stratified classical inference.
Splits observations by groups, computes a classical sample-mean
estimate within each stratum, and combines them with stratum weights:
theta = sum_k w_k * theta_k
sigma2 = sum_k w_k^2 * sigma2_k
where w_k is the weight of stratum k. By default w_k is the
sample fraction n_samples_k / n_samples; pass stratum_weights
to use a different weighting.
It is assumed that w_k reflects the true weight of stratum k for
all k.
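The two formulas above can be sketched in a few lines of NumPy. This is an illustrative reading, not the library's code: `stratified_mean_ci` is a hypothetical name, and the per-stratum variance is assumed to be the unbiased sample variance divided by the stratum size. Under those assumptions the sketch gives the same numbers as the class example.

```python
from statistics import NormalDist

import numpy as np


def stratified_mean_ci(y, groups, confidence_level=0.95, stratum_weights=None):
    """Combine per-stratum sample means with weights w_k."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)  # sorted stratum labels
    masks = [groups == g for g in labels]
    if stratum_weights is None:
        # Default weights: sample fractions n_k / n.
        w = np.array([m.mean() for m in masks])
    else:
        w = np.asarray(stratum_weights, dtype=float)
    theta_k = np.array([y[m].mean() for m in masks])
    # Variance of each stratum mean: s_k^2 / n_k (assumed ddof=1).
    sigma2_k = np.array([y[m].var(ddof=1) / m.sum() for m in masks])
    theta = np.sum(w * theta_k)
    se = np.sqrt(np.sum(w**2 * sigma2_k))
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return theta, (theta - z * se, theta + z * se)


theta, (lo, hi) = stratified_mean_ci(
    [1.0, 3.0, 5.0, 7.0], ["A", "A", "B", "B"]
)
print(f"{theta:.3f} [{lo:.3f}, {hi:.3f}]")  # 4.000 [2.614, 5.386]
```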
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | NDArray | Array of observations. | required |
| groups | NDArray | Array of group identifiers (same length as …) | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval, e.g. 0.95. | 0.95 |
| stratum_weights | NDArray | Stratum weights in sorted stratum order. When provided, these override the sample-count proportions. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| ClassicalMeanInferenceResult | Contains the CLT-based confidence interval, the metric name, the estimator name (…) |
Raises:
| Type | Description |
|---|---|
| ValueError | … |
glide.estimators.ipw_classical.IPWClassicalMeanEstimator
Estimator for population mean using Inverse Probability Weighting (IPW).
Extends the classical sample mean to handle non-uniform sampling. Each observation y_i is reweighted by 1/π_i, where π_i is the pre-determined probability that sample i was selected for labeling. Entries of y may be NaN, marking instances that were not sampled.
For the computation to be statistically valid, the sum of the π_i should be approximately equal to the number of observed elements y_i.
Examples:
>>> import numpy as np
>>> from glide.estimators import IPWClassicalMeanEstimator
>>> y = np.array([5.0, 6.0, 4.0, np.nan, np.nan, np.nan])
>>> pi = np.array([0.2, 0.8, 0.6, 0.6, 0.4, 0.4])
>>> estimator = IPWClassicalMeanEstimator()
>>> result = estimator.estimate(y, pi)
>>> print(result)
Metric: Metric
Point Estimate: 6.528
Confidence Interval (95%): [-1.230, 14.286]
Estimator : IPWClassicalMeanEstimator
n: 3
Source code in glide/estimators/ipw_classical.py
estimate
estimate(
y,
sampling_probability,
metric_name="Metric",
confidence_level=0.95,
)
Estimate the population mean using IPW-corrected sample mean.
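The reweighting by 1/π_i can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `ipw_mean_ci` is a hypothetical name, unobserved rows are assumed to contribute zero, and the average is assumed to run over all rows, observed or not. With the unbiased variance (`ddof=1`) these assumptions give the same numbers as the class example.

```python
from statistics import NormalDist

import numpy as np


def ipw_mean_ci(y, sampling_probability, confidence_level=0.95):
    """Inverse-probability-weighted mean with a CLT-based interval."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(sampling_probability, dtype=float)
    observed = ~np.isnan(y)
    # Per-row contribution: y_i / pi_i when observed, 0 when not sampled.
    terms = np.zeros_like(y)
    terms[observed] = y[observed] / pi[observed]
    theta = terms.mean()  # average over all rows, observed or not
    se = terms.std(ddof=1) / np.sqrt(y.size)
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return theta, (theta - z * se, theta + z * se)


y = [5.0, 6.0, 4.0, np.nan, np.nan, np.nan]
pi = [0.2, 0.8, 0.6, 0.6, 0.4, 0.4]
theta, (lo, hi) = ipw_mean_ci(y, pi)
print(f"{theta:.3f} [{lo:.3f}, {hi:.3f}]")  # 6.528 [-1.230, 14.286]
```

The wide interval in this toy case comes from the first row: a label sampled with probability 0.2 is up-weighted by a factor of 5.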
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | NDArray | 1-D array of observations, may contain unobserved NaN values. | required |
| sampling_probability | NDArray | 1-D array of pre-determined sampling probabilities π_i ∈ [0, 1], one per observation. Must have the same length as … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval. Defaults to 0.95. | 0.95 |
Returns:
| Type | Description |
|---|---|
| ClassicalMeanInferenceResult | Contains the CLT-based confidence interval, the metric name, the estimator name (…) |
Raises:
| Type | Description |
|---|---|
| ValueError | If any value in … |
glide.estimators.cluster_classical.ClusterClassicalMeanEstimator
Cluster classical estimator for population mean.
Extends mean estimation as in ClassicalMeanEstimator to datasets where
observations are grouped into clusters. Each cluster's size-weighted
contribution is treated as the sampling unit, which accounts for
within-cluster correlation and produces valid confidence intervals under
cluster sampling designs.
Examples:
>>> import numpy as np
>>> from glide.estimators import ClusterClassicalMeanEstimator
>>> y = np.array([5.0, 5.0, 7.0, 7.0])
>>> clusters = np.array(["A", "A", "B", "B"])
>>> estimator = ClusterClassicalMeanEstimator()
>>> result = estimator.estimate(y, clusters)
>>> print(result)
Metric: Metric
Point Estimate: 6.000
Confidence Interval (95%): [4.040, 7.960]
Estimator : ClusterClassicalMeanEstimator
n: 4
Source code in glide/estimators/cluster_classical.py
estimate
estimate(
y, clusters, metric_name="Metric", confidence_level=0.95
)
Estimate the population mean using the cluster classical estimator.
Computes within-cluster sums and uses them as sampling units to apply the CLT:
theta = (1 / N) * sum_l u_l
sigma2 = L * Var(u_l, ddof=1) / N^2
where u_l = sum_{i in l} y_i are the cluster sums, L is the
number of clusters, and N = sum_l n_l is the total number of
observations. NaN values in y are dropped before making the
computations. Clusters that contain only NaN are not used.
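The cluster-sum formulas above translate almost directly into NumPy. The sketch below is illustrative only: `cluster_mean_ci` is a hypothetical name, and the normal quantile for the interval is an assumption. It gives the same numbers as the class example.

```python
from statistics import NormalDist

import numpy as np


def cluster_mean_ci(y, clusters, confidence_level=0.95):
    """Mean estimate that treats cluster sums as the sampling units."""
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    keep = ~np.isnan(y)  # drop NaN observations first
    y, clusters = y[keep], clusters[keep]
    # Clusters that held only NaN no longer appear after the filter.
    labels, inverse = np.unique(clusters, return_inverse=True)
    u = np.bincount(inverse, weights=y)  # cluster sums u_l
    n_clusters, n_obs = labels.size, y.size
    theta = u.sum() / n_obs
    sigma2 = n_clusters * u.var(ddof=1) / n_obs**2
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    half = z * np.sqrt(sigma2)
    return theta, (theta - half, theta + half)


theta, (lo, hi) = cluster_mean_ci([5.0, 5.0, 7.0, 7.0], ["A", "A", "B", "B"])
print(f"{theta:.3f} [{lo:.3f}, {hi:.3f}]")  # 6.000 [4.040, 7.960]
```

With these four observations the plain sample-mean formula would give a standard error of about 0.58, while the cluster version gives 1.0, because each cluster contributes only one independent unit.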
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | NDArray | Array of observations, shape … | required |
| clusters | NDArray | Array of cluster identifiers, shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval. Defaults to 0.95. | 0.95 |
Returns:
| Type | Description |
|---|---|
| ClassicalMeanInferenceResult | Contains the CLT-based confidence interval, the metric name, the estimator name (…) |
Raises:
| Type | Description |
|---|---|
| ValueError | … |
glide.estimators.ppi.PPIMeanEstimator
Estimator for population mean using Prediction-Powered Inference (PPI).
This class implements the PPI method which combines a small set of labeled samples with a large set of unlabeled samples whose labels are approximated by a proxy model. The method provides consistent estimates even when the proxy is imperfect. An optional power-tuning mode (enabled by default) applies the optimal weight λ from PPI++, ensuring the confidence interval is never wider than the one obtained without the proxy.
References
Angelopoulos, Anastasios N., Stephen Bates, Clara Fannjiang, Michael I. Jordan, and Tijana Zrnic. "Prediction-powered inference." Science 382, no. 6671 (2023): 669-674.
Angelopoulos, Anastasios N., John C. Duchi, and Tijana Zrnic. "PPI++: Efficient prediction-powered inference." arXiv preprint arXiv:2311.01453 (2023).
Examples:
>>> import numpy as np
>>> from glide.estimators import PPIMeanEstimator
>>> y_true = np.array([5.0, 6.0, np.nan, np.nan])
>>> y_proxy = np.array([4.9, 6.1, 5.2, 6.1])
>>> estimator = PPIMeanEstimator()
>>> result = estimator.estimate(y_true, y_proxy)
>>> print(result)
Metric: Metric
Point Estimate: 5.618
Confidence Interval (95%): [4.923, 6.312]
Estimator : PPIMeanEstimator
n_true: 2
n_proxy: 4
Effective Sample Size: 3
Source code in glide/estimators/ppi.py
estimate
estimate(
y_true,
y_proxy,
metric_name="Metric",
confidence_level=0.95,
power_tuning=True,
)
Estimate the population mean using Prediction-Powered Inference (PPI).
Combines a small set of labeled samples with a large set of unlabeled samples whose
labels are approximated by a proxy (e.g. a pretrained model). The rectifier
mean(y_true) - λ·mean(y_proxy_labeled) corrects the bias of the proxy, yielding
a consistent estimate even when the proxy is imperfect.
The weight λ interpolates between relying only on y_true (λ = 0) and the
standard PPI estimate that leverages both y_true and y_proxy with equal weights (λ = 1).
When power_tuning=True (default), the optimal λ is computed via the PPI++
closed-form formula to minimise the confidence interval width. When
power_tuning=False, λ = 1 and the estimator reduces to the classic PPI estimator.
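To make the role of λ concrete, here is a minimal sketch of a power-tuned PPI mean estimate. It is an assumption-laden reading rather than the library's code: `ppi_mean_ci` is a hypothetical name, λ is taken as Cov(y, f) / ((1 + n/N) · Var(f)) with the covariance computed on labeled rows and the variance on all proxy values, and all moments use `ddof=1`. Under those choices the sketch gives the same numbers as the class example.

```python
from statistics import NormalDist

import numpy as np


def ppi_mean_ci(y_true, y_proxy, confidence_level=0.95, power_tuning=True):
    """Prediction-powered mean estimate with an optional tuned lambda."""
    y_true = np.asarray(y_true, dtype=float)
    y_proxy = np.asarray(y_proxy, dtype=float)
    labeled = ~np.isnan(y_true)
    y, f_lab, f_unlab = y_true[labeled], y_proxy[labeled], y_proxy[~labeled]
    n, n_unlab = y.size, f_unlab.size
    if power_tuning:
        # Assumed closed form: Cov(y, f) / ((1 + n/N) * Var(f)).
        cov = np.cov(y, f_lab, ddof=1)[0, 1]
        lam = cov / ((1 + n / n_unlab) * y_proxy.var(ddof=1))
    else:
        lam = 1.0
    # Proxy mean on unlabeled rows plus the rectifier from labeled rows.
    theta = lam * f_unlab.mean() + (y.mean() - lam * f_lab.mean())
    var = (y - lam * f_lab).var(ddof=1) / n + lam**2 * f_unlab.var(ddof=1) / n_unlab
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    half = z * np.sqrt(var)
    return theta, (theta - half, theta + half), lam


y_true = [5.0, 6.0, np.nan, np.nan]
y_proxy = [4.9, 6.1, 5.2, 6.1]
theta, (lo, hi), lam = ppi_mean_ci(y_true, y_proxy)
print(f"{theta:.3f} [{lo:.3f}, {hi:.3f}]")  # 5.618 [4.923, 6.312]
```

Passing `power_tuning=False` fixes λ = 1, which in this toy case moves the point estimate to the unlabeled proxy mean, 5.65, because the labeled proxies and labels share the same mean.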
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | NDArray | Array of labeled observations, shape … | required |
| y_proxy | NDArray | Array of proxy predictions, shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval, e.g. 0.95. | 0.95 |
| power_tuning | bool | If … | True |
Returns:
| Type | Description |
|---|---|
| PredictionPoweredMeanInferenceResult | Contains the CLT-based confidence interval, the metric name, the estimator name (…) |
Raises:
| Type | Description |
|---|---|
| ValueError | … |
glide.estimators.stratified_ppi.StratifiedPPIMeanEstimator
Stratified PPI++ estimator for population mean.
Extends Prediction-Powered Inference to datasets that are naturally partitioned into strata (e.g. by language, domain, or data source). A per-stratum power-tuned lambda is computed independently for each stratum, and the final estimate is a population-proportional weighted average of the per-stratum PPI++ estimates.
This yields narrower confidence intervals than standard PPI++ whenever strata differ in proxy quality or relative size, because the optimal lambda can adapt to each stratum's signal-to-noise ratio.
References
Fisch, Adam, Joshua Maynez, R. Alex Hofer, Bhuwan Dhingra, Amir Globerson, and William W. Cohen. "Stratified prediction-powered inference for effective hybrid evaluation of language models." Advances in Neural Information Processing Systems 37 (2024): 111489-111514.
Fogliato, Riccardo, Pratik Patil, Mathew Monfort, and Pietro Perona. "A framework for efficient model evaluation through stratification, sampling, and estimation." In European Conference on Computer Vision, pp. 140-158. Cham: Springer Nature Switzerland, 2024.
Examples:
>>> import numpy as np
>>> from glide.estimators import StratifiedPPIMeanEstimator
>>> y_true = np.array([1.0, 2.0, np.nan, np.nan, 4.0, 5.0, np.nan, np.nan])
>>> y_proxy = np.array([1.1, 2.2, 1.5, 1.8, 3.9, 5.1, 4.5, 4.8])
>>> groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
>>> estimator = StratifiedPPIMeanEstimator()
>>> result = estimator.estimate(y_true, y_proxy, groups)
>>> print(result)
Metric: Metric
Point Estimate: 3.086
Confidence Interval (95%): [2.720, 3.452]
Estimator : StratifiedPPIMeanEstimator
n_true: 4
n_proxy: 8
Effective Sample Size: 14
Source code in glide/estimators/stratified_ppi.py
estimate
estimate(
y_true,
y_proxy,
groups,
metric_name="Metric",
confidence_level=0.95,
power_tuning=True,
)
Estimate the population mean using Stratified PPI++.
Splits arrays by unique values in groups, computes a power-tuned PPI++
estimate within each stratum, and combines them with
population-proportional weights:
theta = sum_k w_k * theta_k(lambda_k)
sigma2 = sum_k w_k^2 * sigma2_k(lambda_k)
where w_k is the fraction of samples in stratum k.
Note that this assumes the proportion of labeled to unlabeled samples is approximately the same in all strata, which is important for statistical validity.
Labeled and unlabeled samples are distinguished by NaN in y_true:
a sample is labeled if its y_true entry is not NaN.
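The per-stratum tuning and the weighted combination can be sketched as below. The names `_ppi_stratum` and `stratified_ppi_mean_ci` are hypothetical, and the per-stratum λ is assumed to follow the PPI++ closed form Cov(y, f) / ((1 + n/N) · Var(f)) with `ddof=1` moments. This is an illustration, not the library's source, though under these assumptions it gives the same numbers as the class example.

```python
from statistics import NormalDist

import numpy as np


def _ppi_stratum(y_true, y_proxy):
    """Power-tuned PPI estimate and its variance for one stratum."""
    labeled = ~np.isnan(y_true)
    y, f_lab, f_unlab = y_true[labeled], y_proxy[labeled], y_proxy[~labeled]
    n, n_unlab = y.size, f_unlab.size
    # Assumed PPI++ closed form for lambda within the stratum.
    lam = np.cov(y, f_lab, ddof=1)[0, 1] / ((1 + n / n_unlab) * y_proxy.var(ddof=1))
    theta = y.mean() + lam * (f_unlab.mean() - f_lab.mean())
    var = (y - lam * f_lab).var(ddof=1) / n + lam**2 * f_unlab.var(ddof=1) / n_unlab
    return theta, var


def stratified_ppi_mean_ci(y_true, y_proxy, groups, confidence_level=0.95):
    y_true = np.asarray(y_true, dtype=float)
    y_proxy = np.asarray(y_proxy, dtype=float)
    groups = np.asarray(groups)
    theta, var = 0.0, 0.0
    for g in np.unique(groups):
        mask = groups == g
        w_k = mask.mean()  # fraction of samples in stratum k
        theta_k, var_k = _ppi_stratum(y_true[mask], y_proxy[mask])
        theta += w_k * theta_k
        var += w_k**2 * var_k
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    half = z * np.sqrt(var)
    return theta, (theta - half, theta + half)


y_true = [1.0, 2.0, np.nan, np.nan, 4.0, 5.0, np.nan, np.nan]
y_proxy = [1.1, 2.2, 1.5, 1.8, 3.9, 5.1, 4.5, 4.8]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
theta, (lo, hi) = stratified_ppi_mean_ci(y_true, y_proxy, groups)
print(f"{theta:.3f} [{lo:.3f}, {hi:.3f}]")  # 3.086 [2.720, 3.452]
```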
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | NDArray | Array of observations, shape … | required |
| y_proxy | NDArray | Array of proxy predictions, shape … | required |
| groups | NDArray | Array of integer stratum identifiers, shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval. Defaults to 0.95. | 0.95 |
| power_tuning | bool | If … | True |
Returns:
| Type | Description |
|---|---|
| PredictionPoweredMeanInferenceResult | Contains the CLT-based confidence interval, the metric name, the estimator name (…) |
Raises:
| Type | Description |
|---|---|
| ValueError | … |
glide.estimators.asi.ASIMeanEstimator
Estimator for population mean using Active Statistical Inference (ASI).
This class implements the ASI method which extends PPI++ to non-uniform sampling. Each labeled sample has a known, pre-determined sampling probability π_i. Inverse probability weighting (IPW) corrects for this non-uniform selection, yielding valid confidence intervals under any sampling rule.
The special case where all π_i are equal to n_labeled / n recovers PPI++ at λ = 1.
References
Zrnic, Tijana, and Emmanuel J. Candès. "Active statistical inference." In Proceedings of the 41st International Conference on Machine Learning, pp. 62993-63010. 2024.
Gligorić, Kristina, Tijana Zrnic, Cinoo Lee, Emmanuel Candès, and Dan Jurafsky. "Can unconfident LLM annotations be used for confident conclusions?" In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 3514-3533. 2025.
Examples:
>>> import numpy as np
>>> from glide.estimators import ASIMeanEstimator
>>> y_true = np.array([0.0, 1.0, np.nan, np.nan])
>>> y_proxy = np.array([0.1, 0.9, 0.5, 0.5])
>>> pi = np.array([0.8, 0.8, 0.8, 0.8])
>>> estimator = ASIMeanEstimator()
>>> result = estimator.estimate(y_true, y_proxy, pi)
>>> print(result)
Metric: Metric
Point Estimate: 0.548
Confidence Interval (95%): [0.138, 0.958]
Estimator : ASIMeanEstimator
n_true: 2
n_proxy: 4
Effective Sample Size: 4
Source code in glide/estimators/asi.py
estimate
estimate(
y_true,
y_proxy,
pi,
metric_name="Metric",
confidence_level=0.95,
power_tuning=True,
)
Estimate the population mean using Active Statistical Inference (ASI).
Uses inverse-probability weighting (IPW) to correct for non-uniform sampling, combining labeled and unlabeled samples into a single IPW-corrected estimator. A power-tuning step (enabled by default) finds the λ that minimises asymptotic variance.
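One way to read the description above is as an average of per-row terms λ·f_i + (y_i − λ·f_i)·ξ_i/π_i, where ξ_i is 1 for labeled rows and 0 otherwise. The sketch below follows that reading. The name `asi_mean_ci` is hypothetical, and choosing λ as the value that minimises the sample variance of those terms is an assumption about the tuning step. Under these assumptions it gives the same numbers as the class example.

```python
from statistics import NormalDist

import numpy as np


def asi_mean_ci(y_true, y_proxy, pi, confidence_level=0.95, power_tuning=True):
    """IPW-corrected, proxy-assisted mean estimate with a CLT interval."""
    y_true = np.asarray(y_true, dtype=float)
    y_proxy = np.asarray(y_proxy, dtype=float)
    pi = np.asarray(pi, dtype=float)
    labeled = ~np.isnan(y_true)
    # xi_i / pi_i: inverse-probability weight, zero for unlabeled rows.
    ipw = np.zeros_like(pi)
    ipw[labeled] = 1.0 / pi[labeled]
    a = np.zeros_like(y_proxy)
    a[labeled] = y_true[labeled] * ipw[labeled]  # IPW-weighted labels
    b = y_proxy * (1.0 - ipw)  # proxy minus its IPW-weighted copy
    # Each row contributes a_i + lam * b_i; pick lam to minimise its variance.
    lam = -np.cov(a, b)[0, 1] / b.var(ddof=1) if power_tuning else 1.0
    terms = a + lam * b
    theta = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(terms.size)
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return theta, (theta - z * se, theta + z * se)


y_true = [0.0, 1.0, np.nan, np.nan]
y_proxy = [0.1, 0.9, 0.5, 0.5]
pi = [0.8, 0.8, 0.8, 0.8]
theta, (lo, hi) = asi_mean_ci(y_true, y_proxy, pi)
print(f"{theta:.3f} [{lo:.3f}, {hi:.3f}]")  # 0.548 [0.138, 0.958]
```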
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | NDArray | Array of shape … | required |
| y_proxy | NDArray | Array of shape … | required |
| pi | NDArray | Array of shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval. Defaults to 0.95. | 0.95 |
| power_tuning | bool | If … | True |
Returns:
| Type | Description |
|---|---|
| PredictionPoweredMeanInferenceResult | Contains a … |
Raises:
| Type | Description |
|---|---|
| ValueError | … |
glide.estimators.ptd.PTDMeanEstimator
Estimator for population mean using Predict-Then-Debias (PTD).
Combines a small set of labeled samples with a large set of unlabeled samples whose labels are approximated by a proxy model. Confidence intervals are constructed via a bootstrap percentile method, requiring no distributional assumptions on the proxy quality.
The bootstrap uses a CLT-based algorithm: the unlabeled proxy mean is computed once on the full unlabeled set and its sampling variability is simulated with a Gaussian draw at each iteration, making the per-iteration cost O(n_labeled) rather than O(n_labeled + n_unlabeled), where n_labeled and n_unlabeled are the number of labeled and unlabeled samples respectively.
References
Kluger, Dan M., Kerri Lu, Tijana Zrnic, Sherrie Wang, and Stephen Bates. "Prediction-powered inference with imputed covariates and nonuniform sampling." arXiv preprint arXiv:2501.18577 (2025).
Examples:
>>> import numpy as np
>>> from glide.estimators import PTDMeanEstimator
>>> y_true = np.array([5.0, 6.0, np.nan, np.nan])
>>> y_proxy = np.array([4.9, 6.1, 5.2, 6.1])
>>> estimator = PTDMeanEstimator()
>>> result = estimator.estimate(y_true, y_proxy, n_bootstrap=5, random_seed=0)
>>> print(result)
Metric: Metric
Point Estimate: 5.552
Confidence Interval (95%): [5.211, 5.865]
Estimator : PTDMeanEstimator
n_true: 2
n_proxy: 4
Effective Sample Size: 5
Source code in glide/estimators/ptd.py
estimate
estimate(
y_true,
y_proxy,
metric_name="Metric",
confidence_level=0.95,
n_bootstrap=2000,
power_tuning=True,
random_seed=None,
)
Estimate the population mean using Predict-Then-Debias (PTD).
Combines a small set of labeled samples with a large set of unlabeled
samples whose labels are approximated by a proxy model. The rectifier
mean(y_true) - λ·mean(y_proxy_labeled) corrects the bias of the proxy,
yielding a consistent estimate even when the proxy is imperfect.
The tuning parameter λ and the confidence interval are both derived from a bootstrap over the labeled set only. The sampling variability of the unlabeled proxy mean is approximated by a single Gaussian draw per iteration, keeping the per-iteration cost O(n_labeled), where n_labeled is the number of labeled samples.
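The bootstrap with a Gaussian shortcut can be sketched as follows. This is a simplified illustration, not the library's algorithm: `ptd_mean_ci` is a hypothetical name, and the tuning rule (λ as the ratio of the bootstrap covariance of the labeled means to the total variance of the proxy means) is an assumption. Because the library's random-number usage is not shown here, the sketch is not expected to reproduce the numbers in the class example.

```python
import numpy as np


def ptd_mean_ci(y_true, y_proxy, confidence_level=0.95,
                n_bootstrap=2000, random_seed=None):
    """Percentile-bootstrap interval that resamples only the labeled rows."""
    rng = np.random.default_rng(random_seed)
    y_true = np.asarray(y_true, dtype=float)
    y_proxy = np.asarray(y_proxy, dtype=float)
    labeled = ~np.isnan(y_true)
    y, f_lab, f_unlab = y_true[labeled], y_proxy[labeled], y_proxy[~labeled]
    n, n_unlab = y.size, f_unlab.size

    # Resample labeled rows with replacement: one row of indices per iteration.
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    y_star = y[idx].mean(axis=1)
    f_star = f_lab[idx].mean(axis=1)

    # CLT shortcut: the unlabeled proxy mean is computed once, and its
    # sampling noise is simulated with one Gaussian draw per iteration.
    var_unlab_mean = f_unlab.var(ddof=1) / n_unlab
    g_star = rng.normal(f_unlab.mean(), np.sqrt(var_unlab_mean), size=n_bootstrap)

    # Assumed tuning rule, estimated from the labeled bootstrap only.
    lam = np.cov(y_star, f_star)[0, 1] / (f_star.var(ddof=1) + var_unlab_mean)

    theta_star = y_star + lam * (g_star - f_star)
    theta = y.mean() + lam * (f_unlab.mean() - f_lab.mean())
    alpha = 1.0 - confidence_level
    lo, hi = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])
    return theta, (lo, hi)


y_true = [5.0, 6.0, np.nan, np.nan]
y_proxy = [4.9, 6.1, 5.2, 6.1]
theta, (lo, hi) = ptd_mean_ci(y_true, y_proxy, n_bootstrap=500, random_seed=0)
print(f"{theta:.3f} [{lo:.3f}, {hi:.3f}]")
```

Each iteration touches only the labeled rows plus one scalar draw, which is where the O(n_labeled) per-iteration cost comes from.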
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | NDArray | Array of labeled observations, shape … | required |
| y_proxy | NDArray | Array of proxy predictions, shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval. Defaults to 0.95. | 0.95 |
| n_bootstrap | int | Number of bootstrap resamples. Defaults to 2000. | 2000 |
| power_tuning | bool | If … | True |
| random_seed | int | Seed for the random number generator, for reproducibility. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| PredictionPoweredMeanInferenceResult | Contains a … |
Raises:
| Type | Description |
|---|---|
| ValueError | … |
glide.estimators.ipw_ptd.IPWPTDMeanEstimator
Estimator for population mean using IPW-corrected Predict-Then-Debias (IPW-PTD).
Extends PTD to handle non-uniform ground-truth labelling probabilities via inverse probability weighting. The bootstrap percentile confidence interval requires no distributional assumptions on the proxy quality. The CLT speedup is applied to the unlabeled proxies. However, inverse probability weighting requires sampling over the whole dataset to compute bootstrap ground-truth mean and labeled proxy mean estimates.
For large sample counts, where the CLT applies, it produces inference equivalent to ASIMeanEstimator,
but without relying on the normal approximation for the labeled rectifier.
References
Kluger, Dan M., Kerri Lu, Tijana Zrnic, Sherrie Wang, and Stephen Bates. "Prediction-powered inference with imputed covariates and nonuniform sampling." arXiv preprint arXiv:2501.18577 (2025).
Examples:
>>> import numpy as np
>>> from glide.estimators.ipw_ptd import IPWPTDMeanEstimator
>>> y_true = np.array([1.0, 0.0, np.nan, np.nan])
>>> y_proxy = np.array([0.9, 0.1, 0.8, 0.2])
>>> pi = np.array([0.4, 0.6, 0.3, 0.7])
>>> estimator = IPWPTDMeanEstimator()
>>> result = estimator.estimate(y_true, y_proxy, pi, n_bootstrap=5, random_seed=0)
>>> print(result)
Metric: Metric
Point Estimate: 0.253
Confidence Interval (95%): [-0.082, 0.633]
Estimator : IPWPTDMeanEstimator
n_true: 2
n_proxy: 4
Effective Sample Size: 9
Source code in glide/estimators/ipw_ptd.py
estimate
estimate(
y_true,
y_proxy,
pi,
metric_name="Metric",
confidence_level=0.95,
n_bootstrap=2000,
power_tuning=True,
random_seed=None,
)
Estimate the population mean using IPW-corrected Predict-Then-Debias.
Ground-truth labels were sampled with known, non-uniform probabilities π_i. Inverse probability weighting (IPW) corrects for this non-uniform selection, yielding valid confidence intervals under any sampling rule. The unlabeled proxy mean is not resampled: its sampling variability is injected via a single Gaussian draw per iteration (CLT speedup).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | NDArray | Array of shape … | required |
| y_proxy | NDArray | Array of shape … | required |
| pi | NDArray | Array of shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval. Defaults to 0.95. | 0.95 |
| n_bootstrap | int | Number of bootstrap resamples. Defaults to 2000. | 2000 |
| power_tuning | bool | If … | True |
| random_seed | int | Seed for the random number generator, for reproducibility. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| PredictionPoweredMeanInferenceResult | Contains a … |
Raises:
| Type | Description |
|---|---|
| ValueError | … |
glide.estimators.stratified_ptd.StratifiedPTDMeanEstimator
Stratified Predict-Then-Debias estimator for population mean.
Extends PTD to datasets partitioned into strata (e.g. by language, domain, or data source). A per-stratum power-tuning parameter is computed independently within each stratum, and the final confidence interval is constructed from a bootstrap distribution obtained by combining the per-stratum bootstrap estimates with weights proportional to the stratum sizes.
This yields narrower confidence intervals than standard PTD whenever strata differ in proxy quality, because the optimal power-tuning parameter can adapt to each stratum's signal-to-noise ratio.
References
Kluger, Dan M., Kerri Lu, Tijana Zrnic, Sherrie Wang, and Stephen Bates. "Prediction-powered inference with imputed covariates and nonuniform sampling." arXiv preprint arXiv:2501.18577 (2025).
Examples:
>>> import numpy as np
>>> from glide.estimators import StratifiedPTDMeanEstimator
>>> y_true = np.array([5.0, 6.0, np.nan, np.nan, 5.0, 6.0, np.nan, np.nan])
>>> y_proxy = np.array([4.9, 6.1, 5.2, 6.1, 4.9, 6.1, 5.2, 6.1])
>>> groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
>>> estimator = StratifiedPTDMeanEstimator()
>>> result = estimator.estimate(y_true, y_proxy, groups, n_bootstrap=5, random_seed=0)
>>> print(result)
Metric: Metric
Point Estimate: 5.578
Confidence Interval (95%): [5.400, 5.664]
Estimator : StratifiedPTDMeanEstimator
n_true: 4
n_proxy: 8
Effective Sample Size: 33
Source code in glide/estimators/stratified_ptd.py
estimate
estimate(
y_true,
y_proxy,
groups,
metric_name="Metric",
confidence_level=0.95,
n_bootstrap=2000,
power_tuning=True,
random_seed=None,
)
Estimate the population mean using Stratified Predict-Then-Debias.
Splits arrays by unique values in groups, applies the PTD bootstrap
algorithm within each stratum with a per-stratum power-tuning, and
combines the resulting per-stratum bootstrap arrays with weights proportional
to the stratum sizes into a single BootstrapConfidenceInterval:
theta = sum_k w_k * theta_k(lambda_k)
where w_k is the fraction of samples in stratum k and theta_k(lambda_k)
is the mean estimate for that stratum computed with power-tuning parameter
lambda_k.
Note that this assumes these fractions reflect the true stratum weights in the target data distribution, which is important for statistical validity.
Labeled and unlabeled samples are distinguished by NaN in y_true:
a sample is labeled if its y_true entry is not NaN.
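The combination step can be illustrated in isolation. The sketch below assumes the per-stratum bootstrap estimates are already available as a (strata × iterations) array. The helper name `combine_stratum_bootstraps` and the toy values are hypothetical, and pairing iteration b across strata is an assumption about how the arrays are aligned.

```python
import numpy as np


def combine_stratum_bootstraps(boot, weights, confidence_level=0.95):
    """Weight per-stratum bootstrap estimates and take a percentile interval.

    boot: array of shape (n_strata, n_bootstrap), one row per stratum.
    weights: array of shape (n_strata,), e.g. the fraction of samples per stratum.
    """
    boot = np.asarray(boot, dtype=float)
    w = np.asarray(weights, dtype=float)
    # theta*_b = sum_k w_k * theta*_{k,b}, for every bootstrap iteration b.
    combined = w @ boot
    alpha = 1.0 - confidence_level
    lo, hi = np.quantile(combined, [alpha / 2, 1 - alpha / 2])
    return combined, (lo, hi)


# Toy input: two equally weighted strata, three bootstrap iterations each.
boot = np.array([[1.0, 2.0, 3.0],
                 [5.0, 5.0, 8.0]])
combined, (lo, hi) = combine_stratum_bootstraps(boot, [0.5, 0.5])
print(combined)  # [3.  3.5 5.5]
```

In practice far more than three iterations are needed for stable percentiles; the documented default for n_bootstrap is 2000.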
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y_true | NDArray | Array of observations, shape … | required |
| y_proxy | NDArray | Array of proxy predictions, shape … | required |
| groups | NDArray | Array of stratum identifiers, shape … | required |
| metric_name | str | Human-readable label for the metric. Defaults to 'Metric'. | 'Metric' |
| confidence_level | float | Target coverage for the confidence interval. Defaults to 0.95. | 0.95 |
| n_bootstrap | int | Number of bootstrap resamples. Defaults to 2000. | 2000 |
| power_tuning | bool | If … | True |
| random_seed | int | Seed for the random number generator, for reproducibility. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| PredictionPoweredMeanInferenceResult | Contains the bootstrap-based confidence interval, the metric name, the estimator name (…) |
Raises:
| Type | Description |
|---|---|
| ValueError | … |