Confidence Intervals

glide.confidence_intervals.clt.CLTConfidenceInterval dataclass

Confidence interval based on the Central Limit Theorem.

Constructs a symmetric interval around the point estimate using the standard normal distribution: [mean - z * std, mean + z * std], where z is the critical value of the standard normal distribution for the target confidence level, i.e. its (1 - alpha/2) quantile with alpha = 1 - confidence_level.
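As a rough illustration of the formula above, the bounds can be reproduced with the Python standard library alone. This is a sketch, not the library's code: it assumes `statistics.NormalDist().inv_cdf` as a stand-in for the normal quantile function, and the variable names are illustrative.

```python
from statistics import NormalDist

mean, std, confidence_level = 5.0, 0.2, 0.95

# Two-sided interval: split the leftover probability evenly between the tails.
alpha_over_two = (1 - confidence_level) / 2
z = NormalDist().inv_cdf(1 - alpha_over_two)  # about 1.96 for a 95% level

lower_bound = mean - z * std
upper_bound = mean + z * std
print(f"[{lower_bound:.3f}, {upper_bound:.3f}]")  # [4.608, 5.392]
```

Note that `std` here plays the role of the standard error of the estimate, not the standard deviation of the raw data.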

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mean | float | The point estimate of the population mean. | required |
| std | float | The standard error (standard deviation of the estimate). | required |
| confidence_level | float | Target coverage probability, e.g. 0.95 for a 95% CI. Default is 0.95. | 0.95 |

Examples:

>>> from glide.confidence_intervals import CLTConfidenceInterval
>>> ci = CLTConfidenceInterval(mean=5.0, std=0.2, confidence_level=0.95)
>>> print(f"[{ci.lower_bound:.3f}, {ci.upper_bound:.3f}]")
[4.608, 5.392]
Source code in glide/confidence_intervals/clt.py
@dataclass
class CLTConfidenceInterval:
    """Confidence interval based on the Central Limit Theorem.

    Constructs a symmetric interval around the point estimate using the standard
    normal distribution: [mean - z * std, mean + z * std], where z is the critical
    value from the standard normal distribution corresponding to the target
    confidence level.

    Parameters
    ----------
    mean : float
        The point estimate of the population mean.
    std : float
        The standard error (standard deviation of the estimate).
    confidence_level : float, optional
        Target coverage probability, e.g. 0.95 for a 95% CI. Default is 0.95.

    Examples
    --------
    >>> from glide.confidence_intervals import CLTConfidenceInterval
    >>> ci = CLTConfidenceInterval(mean=5.0, std=0.2, confidence_level=0.95)
    >>> print(f"[{ci.lower_bound:.3f}, {ci.upper_bound:.3f}]")
    [4.608, 5.392]
    """

    mean: float
    std: float
    var: float = field(init=False, repr=False)
    _confidence_level: float = field(init=False, repr=False)
    lower_bound: float = field(init=False, repr=False)
    upper_bound: float = field(init=False, repr=False)
    width: float = field(init=False, repr=False)

    def __init__(self, mean: float, std: float, confidence_level: float = 0.95) -> None:
        self.mean = mean
        self.std = std
        self.var = std**2
        self.confidence_level = confidence_level

    @property
    def confidence_level(self) -> float:
        return self._confidence_level

    @confidence_level.setter
    def confidence_level(self, value: float) -> None:
        _validate_bounds(value, "confidence_level", lower=0, upper=1, left_inclusive=False, right_inclusive=False)
        self._confidence_level = value
        alpha_over_two = (1 - value) / 2
        z_score = norm.ppf(1 - alpha_over_two)
        self.lower_bound = self.mean - self.std * z_score
        self.upper_bound = self.mean + self.std * z_score
        self.width = 2 * self.std * z_score

    def test_null_hypothesis(
        self, h0_value: float, alternative: Literal["larger", "smaller", "two-sided"] = "two-sided"
    ) -> Tuple[float, float, float]:
        """Perform a one-sample z-test against a null hypothesis value.

        Parameters
        ----------
        h0_value : float
            The hypothesized population mean under the null hypothesis (H0: μ = h0_value).
        alternative : str, optional
            The alternative hypothesis. One of:
            - ``'two-sided'`` (default): H1: μ ≠ h0_value
            - ``'larger'``: H1: μ > h0_value
            - ``'smaller'``: H1: μ < h0_value

        Returns
        -------
        Tuple[float, float, float]
            ``(z_stat, p_value, df)`` where ``z_stat`` is the test statistic
            (mean - h0_value) / std, ``p_value`` is the p-value under the standard
            normal distribution, and ``df`` is ``float('inf')``.

        Raises
        ------
        ValueError
            If ``alternative`` is not one of ``'two-sided'``, ``'larger'``, or ``'smaller'``.
        """
        z_stat = (self.mean - h0_value) / self.std
        alternatives = ["two-sided", "larger", "smaller"]
        _validate_literal(alternative, "alternative", alternatives)

        if alternative == alternatives[0]:
            p_value = 2 * norm.sf(abs(z_stat))
        elif alternative == alternatives[1]:
            p_value = norm.sf(z_stat)
        else:
            p_value = norm.cdf(z_stat)

        df = float("inf")
        return z_stat, p_value, df
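The listing above recomputes the bounds and the width whenever `confidence_level` is assigned. A minimal, self-contained sketch of that property-setter pattern follows. `IntervalSketch` is a hypothetical stand-in written for illustration, the inline range check only approximates what `_validate_bounds` is assumed to do (including the exception type), and `statistics.NormalDist` replaces `norm.ppf`.

```python
from statistics import NormalDist


class IntervalSketch:
    """Hypothetical stand-in for illustration; not the glide class."""

    def __init__(self, mean, std, confidence_level=0.95):
        self.mean = mean
        self.std = std
        self.confidence_level = confidence_level  # goes through the setter

    @property
    def confidence_level(self):
        return self._confidence_level

    @confidence_level.setter
    def confidence_level(self, value):
        # Assumed validation: the level must lie strictly between 0 and 1.
        if not 0 < value < 1:
            raise ValueError("confidence_level must lie strictly between 0 and 1")
        self._confidence_level = value
        z = NormalDist().inv_cdf(1 - (1 - value) / 2)
        # Derived attributes are refreshed on every assignment.
        self.lower_bound = self.mean - z * self.std
        self.upper_bound = self.mean + z * self.std
        self.width = 2 * z * self.std


ci = IntervalSketch(mean=5.0, std=0.2)
width_95 = ci.width
ci.confidence_level = 0.99  # bounds and width are recomputed here
width_99 = ci.width
print(f"{width_95:.3f} -> {width_99:.3f}")  # 0.784 -> 1.030
```

Raising the confidence level widens the interval because the critical value z grows while `mean` and `std` stay fixed.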

test_null_hypothesis

test_null_hypothesis(h0_value, alternative='two-sided')

Perform a one-sample z-test against a null hypothesis value.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| h0_value | float | The hypothesized population mean under the null hypothesis (H0: μ = h0_value). | required |
| alternative | str | The alternative hypothesis. One of 'two-sided' (default; H1: μ ≠ h0_value), 'larger' (H1: μ > h0_value), or 'smaller' (H1: μ < h0_value). | 'two-sided' |

Returns:

| Type | Description |
| --- | --- |
| Tuple[float, float, float] | (z_stat, p_value, df) where z_stat is the test statistic (mean - h0_value) / std, p_value is the p-value under the standard normal distribution, and df is float('inf'). |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If alternative is not one of 'two-sided', 'larger', or 'smaller'. |
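To make the three alternatives concrete, here is a small standard-library sketch of the same z-test arithmetic. `z_test` is a hypothetical helper, not part of glide, and it uses `NormalDist().cdf(-x)` for the upper tail, relying on the symmetry of the standard normal distribution.

```python
from statistics import NormalDist


def z_test(mean, std, h0_value, alternative="two-sided"):
    """Hypothetical helper returning (z_stat, p_value, df) as documented above."""
    if alternative not in ("two-sided", "larger", "smaller"):
        raise ValueError(f"unknown alternative: {alternative!r}")
    normal = NormalDist()
    z_stat = (mean - h0_value) / std
    if alternative == "two-sided":
        p_value = 2 * normal.cdf(-abs(z_stat))  # both tails
    elif alternative == "larger":
        p_value = normal.cdf(-z_stat)  # upper tail, P(Z >= z_stat)
    else:
        p_value = normal.cdf(z_stat)  # lower tail, P(Z <= z_stat)
    return z_stat, p_value, float("inf")


z_stat, p_two_sided, df = z_test(mean=5.0, std=0.2, h0_value=4.5)
_, p_larger, _ = z_test(5.0, 0.2, 4.5, alternative="larger")
_, p_smaller, _ = z_test(5.0, 0.2, 4.5, alternative="smaller")
print(f"z = {z_stat:.2f}, two-sided p = {p_two_sided:.4f}")  # z = 2.50, two-sided p = 0.0124
```

With these inputs the two-sided p-value falls below 0.05, which agrees with 4.5 lying outside the 95% interval [4.608, 5.392] from the class example.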

Source code in glide/confidence_intervals/clt.py
def test_null_hypothesis(
    self, h0_value: float, alternative: Literal["larger", "smaller", "two-sided"] = "two-sided"
) -> Tuple[float, float, float]:
    """Perform a one-sample z-test against a null hypothesis value.

    Parameters
    ----------
    h0_value : float
        The hypothesized population mean under the null hypothesis (H0: μ = h0_value).
    alternative : str, optional
        The alternative hypothesis. One of:
        - ``'two-sided'`` (default): H1: μ ≠ h0_value
        - ``'larger'``: H1: μ > h0_value
        - ``'smaller'``: H1: μ < h0_value

    Returns
    -------
    Tuple[float, float, float]
        ``(z_stat, p_value, df)`` where ``z_stat`` is the test statistic
        (mean - h0_value) / std, ``p_value`` is the p-value under the standard
        normal distribution, and ``df`` is ``float('inf')``.

    Raises
    ------
    ValueError
        If ``alternative`` is not one of ``'two-sided'``, ``'larger'``, or ``'smaller'``.
    """
    z_stat = (self.mean - h0_value) / self.std
    alternatives = ["two-sided", "larger", "smaller"]
    _validate_literal(alternative, "alternative", alternatives)

    if alternative == alternatives[0]:
        p_value = 2 * norm.sf(abs(z_stat))
    elif alternative == alternatives[1]:
        p_value = norm.sf(z_stat)
    else:
        p_value = norm.cdf(z_stat)

    df = float("inf")
    return z_stat, p_value, df

glide.confidence_intervals.bootstrap.BootstrapConfidenceInterval dataclass

Quantile bootstrap confidence interval.

Stores the full distribution of bootstrap point estimates and derives bounds as quantiles of that distribution.
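The class takes the bootstrap estimates as given. For context, the sketch below shows one common way such estimates might be produced and then turned into quantile bounds, using only the standard library. The data, the resampled statistic (the mean), and the helper `linear_quantile` are assumptions made for illustration; the helper interpolates linearly between order statistics, which is the default behaviour of `np.quantile`.

```python
import random
import statistics


def linear_quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    position = q * (len(sorted_values) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    return sorted_values[lower_index] + fraction * (
        sorted_values[upper_index] - sorted_values[lower_index]
    )


rng = random.Random(0)
sample = [rng.gauss(5.0, 1.0) for _ in range(50)]  # made-up data

# Resample with replacement and recompute the statistic B times.
B = 1000
bootstrap_estimates = sorted(
    statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(B)
)

confidence_level = 0.95
alpha_over_two = (1 - confidence_level) / 2
lower_bound = linear_quantile(bootstrap_estimates, alpha_over_two)
upper_bound = linear_quantile(bootstrap_estimates, 1 - alpha_over_two)
print(f"[{lower_bound:.3f}, {upper_bound:.3f}]")
```

Unlike the CLT interval, these bounds need not be symmetric around the point estimate, since they follow the shape of the bootstrap distribution.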

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| bootstrap_estimates | NDArray | Array of shape (B,) containing the B bootstrap point estimates. | required |
| confidence_level | float | Target coverage, e.g. 0.95 for a 95% CI. Default is 0.95. | 0.95 |

Examples:

>>> import numpy as np
>>> from glide.confidence_intervals import BootstrapConfidenceInterval
>>> rng = np.random.default_rng(0)
>>> estimates = rng.normal(loc=5.0, scale=0.3, size=20)
>>> ci = BootstrapConfidenceInterval(bootstrap_estimates=estimates, confidence_level=0.95)
>>> print(f"[{ci.lower_bound:.3f}, {ci.upper_bound:.3f}]")
[4.453, 5.354]
Source code in glide/confidence_intervals/bootstrap.py
@dataclass
class BootstrapConfidenceInterval:
    """Quantile bootstrap confidence interval.

    Stores the full distribution of bootstrap point estimates and derives
    bounds as quantiles of that distribution.

    Parameters
    ----------
    bootstrap_estimates : NDArray
        Array of shape (B,) containing the B bootstrap point estimates.
    confidence_level : float, optional
        Target coverage, e.g. 0.95 for a 95 % CI. Default is 0.95.

    Examples
    --------
    >>> import numpy as np
    >>> from glide.confidence_intervals import BootstrapConfidenceInterval
    >>> rng = np.random.default_rng(0)
    >>> estimates = rng.normal(loc=5.0, scale=0.3, size=20)
    >>> ci = BootstrapConfidenceInterval(bootstrap_estimates=estimates, confidence_level=0.95)
    >>> print(f"[{ci.lower_bound:.3f}, {ci.upper_bound:.3f}]")
    [4.453, 5.354]
    """

    mean: float = field(init=False, repr=False)
    var: float = field(init=False, repr=False)
    std: float = field(init=False, repr=False)
    _sorted_estimates: NDArray = field(init=False, repr=False)
    _confidence_level: float = field(init=False, repr=False)
    lower_bound: float = field(init=False, repr=False)
    upper_bound: float = field(init=False, repr=False)
    width: float = field(init=False, repr=False)

    def __init__(self, bootstrap_estimates: NDArray, confidence_level: float = 0.95) -> None:
        self.mean = float(np.mean(bootstrap_estimates))
        self.var = float(np.var(bootstrap_estimates, ddof=1))
        self.std = float(np.sqrt(self.var))
        self._sorted_estimates = np.sort(bootstrap_estimates)
        self.confidence_level = confidence_level

    @property
    def confidence_level(self) -> float:
        return self._confidence_level

    @confidence_level.setter
    def confidence_level(self, value: float) -> None:
        _validate_bounds(value, "confidence_level", lower=0, upper=1, left_inclusive=False, right_inclusive=False)
        self._confidence_level = value
        alpha_over_two = (1 - value) / 2
        self.lower_bound = float(np.quantile(self._sorted_estimates, alpha_over_two))
        self.upper_bound = float(np.quantile(self._sorted_estimates, 1 - alpha_over_two))
        self.width = self.upper_bound - self.lower_bound

    def test_null_hypothesis(
        self,
        h0_value: float,
        alternative: Literal["larger", "smaller", "two-sided"] = "two-sided",
    ) -> Tuple[float, float, float]:
        """Bootstrap hypothesis test against a null value.

        Computes a p-value as the proportion of bootstrap estimates that are
        at least as extreme as `h0_value` under the specified alternative.

        Parameters
        ----------
        h0_value : float
            The hypothesized population mean under the null hypothesis (H0: μ = h0_value).
        alternative : str, optional
            The alternative hypothesis. One of:
            - ``'two-sided'`` (default): H1: μ ≠ h0_value
            - ``'larger'``: H1: μ > h0_value
            - ``'smaller'``: H1: μ < h0_value

        Returns
        -------
        Tuple[float, float, float]
            ``(test_statistic, p_value, df)`` where ``test_statistic`` is the
            point estimate (mean of bootstrap distribution), ``p_value`` is the
            bootstrap p-value, and ``df`` is ``float('inf')``.
        """
        n = len(self._sorted_estimates)
        alternatives = ["two-sided", "larger", "smaller"]
        _validate_literal(alternative, "alternative", alternatives)

        if alternative == alternatives[0]:
            observed_deviation = abs(h0_value - self.mean)
            # Count estimates <= (mean - deviation) or >= (mean + deviation)
            lower_threshold = self.mean - observed_deviation
            upper_threshold = self.mean + observed_deviation
            count_below = np.searchsorted(self._sorted_estimates, lower_threshold, side="right")
            count_above = n - np.searchsorted(self._sorted_estimates, upper_threshold, side="left")
            count_extreme = count_below + count_above
        elif alternative == alternatives[1]:
            # Count estimates <= h0_value (evidence against "larger" alternative)
            count_extreme = np.searchsorted(self._sorted_estimates, h0_value, side="right")
        else:
            # Count estimates >= h0_value (evidence against "smaller" alternative)
            count_extreme = n - np.searchsorted(self._sorted_estimates, h0_value, side="left")

        p_value = float(count_extreme) / n

        return self.mean, p_value, float("inf")

test_null_hypothesis

test_null_hypothesis(h0_value, alternative='two-sided')

Bootstrap hypothesis test against a null value.

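Computes a p-value as a proportion of the bootstrap estimates. For 'larger', it is the share of estimates less than or equal to h0_value; for 'smaller', the share greater than or equal to h0_value; for 'two-sided', the share lying at least |h0_value - mean| away from the bootstrap mean, on either side.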

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| h0_value | float | The hypothesized population mean under the null hypothesis (H0: μ = h0_value). | required |
| alternative | str | The alternative hypothesis. One of 'two-sided' (default; H1: μ ≠ h0_value), 'larger' (H1: μ > h0_value), or 'smaller' (H1: μ < h0_value). | 'two-sided' |

Returns:

| Type | Description |
| --- | --- |
| Tuple[float, float, float] | (test_statistic, p_value, df) where test_statistic is the point estimate (mean of bootstrap distribution), p_value is the bootstrap p-value, and df is float('inf'). |
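The counting behind the bootstrap p-value can be sketched with the standard library's `bisect` module, where `bisect_right` and `bisect_left` play the roles of `np.searchsorted` with `side="right"` and `side="left"`. `bootstrap_p_value` is a hypothetical helper, and the final cap at 1.0 is an addition of this sketch for the corner case where an estimate equals the mean exactly and is counted in both tails.

```python
from bisect import bisect_left, bisect_right


def bootstrap_p_value(sorted_estimates, h0_value, alternative="two-sided"):
    """Hypothetical helper; expects estimates sorted in ascending order."""
    if alternative not in ("two-sided", "larger", "smaller"):
        raise ValueError(f"unknown alternative: {alternative!r}")
    n = len(sorted_estimates)
    mean = sum(sorted_estimates) / n
    if alternative == "two-sided":
        deviation = abs(h0_value - mean)
        # Estimates <= mean - deviation, plus estimates >= mean + deviation.
        count_below = bisect_right(sorted_estimates, mean - deviation)
        count_above = n - bisect_left(sorted_estimates, mean + deviation)
        count_extreme = count_below + count_above
    elif alternative == "larger":
        # Estimates <= h0_value speak against H1: mu > h0_value.
        count_extreme = bisect_right(sorted_estimates, h0_value)
    else:
        # Estimates >= h0_value speak against H1: mu < h0_value.
        count_extreme = n - bisect_left(sorted_estimates, h0_value)
    # Cap added in this sketch only (see the note above the block).
    return min(1.0, count_extreme / n)


estimates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # toy values, mean 5.0
p_two_sided = bootstrap_p_value(estimates, h0_value=2.0)
p_larger = bootstrap_p_value(estimates, 2.0, alternative="larger")
p_smaller = bootstrap_p_value(estimates, 2.0, alternative="smaller")
print(p_two_sided, p_larger, p_smaller)
```

For these toy values the two-sided count is 4 of 9 (the estimates 1, 2, 8, and 9), the 'larger' count is 2 of 9, and the 'smaller' count is 8 of 9.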

Source code in glide/confidence_intervals/bootstrap.py
def test_null_hypothesis(
    self,
    h0_value: float,
    alternative: Literal["larger", "smaller", "two-sided"] = "two-sided",
) -> Tuple[float, float, float]:
    """Bootstrap hypothesis test against a null value.

    Computes a p-value as the proportion of bootstrap estimates that are
    at least as extreme as `h0_value` under the specified alternative.

    Parameters
    ----------
    h0_value : float
        The hypothesized population mean under the null hypothesis (H0: μ = h0_value).
    alternative : str, optional
        The alternative hypothesis. One of:
        - ``'two-sided'`` (default): H1: μ ≠ h0_value
        - ``'larger'``: H1: μ > h0_value
        - ``'smaller'``: H1: μ < h0_value

    Returns
    -------
    Tuple[float, float, float]
        ``(test_statistic, p_value, df)`` where ``test_statistic`` is the
        point estimate (mean of bootstrap distribution), ``p_value`` is the
        bootstrap p-value, and ``df`` is ``float('inf')``.
    """
    n = len(self._sorted_estimates)
    alternatives = ["two-sided", "larger", "smaller"]
    _validate_literal(alternative, "alternative", alternatives)

    if alternative == alternatives[0]:
        observed_deviation = abs(h0_value - self.mean)
        # Count estimates <= (mean - deviation) or >= (mean + deviation)
        lower_threshold = self.mean - observed_deviation
        upper_threshold = self.mean + observed_deviation
        count_below = np.searchsorted(self._sorted_estimates, lower_threshold, side="right")
        count_above = n - np.searchsorted(self._sorted_estimates, upper_threshold, side="left")
        count_extreme = count_below + count_above
    elif alternative == alternatives[1]:
        # Count estimates <= h0_value (evidence against "larger" alternative)
        count_extreme = np.searchsorted(self._sorted_estimates, h0_value, side="right")
    else:
        # Count estimates >= h0_value (evidence against "smaller" alternative)
        count_extreme = n - np.searchsorted(self._sorted_estimates, h0_value, side="left")

    p_value = float(count_extreme) / n

    return self.mean, p_value, float("inf")