Statistics Utilities Reference
pytest-quantum ships statistical primitives that underpin its assertions and help you choose the right shot count for reliable tests. All functions are pure numpy/scipy — no quantum SDK is required.
Import from the top-level package:
from pytest_quantum import (
    min_shots,
    recommended_shots,
    fidelity,
    tvd,
    tvd_from_counts,
    chi_square_test,
)
Shot-count calculators
min_shots
min_shots(
    epsilon,
    alpha=0.05,
    power=0.80,
) -> int
Returns the minimum number of shots to reliably detect a Total Variation
Distance of epsilon between two distributions.
The formula
Based on two-sample statistical power analysis:
$$ N = \left\lceil \frac{(z_{1-\alpha/2} + z_{\text{power}})^2}{2\varepsilon^2} \right\rceil $$
where $z_p$ is the $p$-th quantile of the standard normal distribution.
With default settings ($\alpha = 0.05$, $\text{power} = 0.80$): $z_{0.975} \approx 1.96$, $z_{0.80} \approx 0.84$.
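As a rough illustration, the formula above can be evaluated with nothing but the Python standard library. The helper name shots_from_formula is hypothetical, and this sketch is not the library's implementation: min_shots may apply a different constant or correction, so its return values need not match what this snippet computes.

```python
from math import ceil
from statistics import NormalDist


def shots_from_formula(epsilon: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Evaluate N = ceil((z_{1-alpha/2} + z_power)^2 / (2 * epsilon^2)).

    Hypothetical helper for illustration only; not pytest-quantum's min_shots.
    """
    for name, value in (("epsilon", epsilon), ("alpha", alpha), ("power", power)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie in the open interval (0, 1), got {value}")
    normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    z_alpha = normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = normal.inv_cdf(power)  # quantile matching the requested power
    return ceil((z_alpha + z_power) ** 2 / (2.0 * epsilon**2))


# The two quantiles quoted for the default settings
print(round(NormalDist().inv_cdf(0.975), 2))  # 1.96
print(round(NormalDist().inv_cdf(0.80), 2))   # 0.84
```

Because epsilon enters as $1/\varepsilon^2$, halving the detectable TVD roughly quadruples the required shot count, which is why tight tolerances become expensive quickly.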
- Parameters
    - epsilon: Minimum detectable TVD. 0.01 means the test can reliably catch a 1% deviation from the expected distribution.
    - alpha: Significance level (default 0.05 → 95% confidence).
    - power: Statistical power, the probability of detecting a real error (default 0.80 → 80% power).
- Returns
    Minimum recommended shot count as an integer.
- Raises
    ValueError: any argument is outside its valid range (0, 1).
Worked examples
from pytest_quantum import min_shots
min_shots(0.10) # 74 — catch 10% TVD, 95% CI, 80% power
min_shots(0.05) # 293 — catch 5% TVD
min_shots(0.01) # 7299 — catch 1% TVD
min_shots(0.01, alpha=0.01, power=0.90) # 11282 — stricter: 99% CI, 90% power
Using in a test
import pytest
from pytest_quantum import assert_measurement_distribution, min_shots

@pytest.mark.quantum
def test_bell_5pct_sensitivity(aer_simulator):
    from qiskit import QuantumCircuit, transpile

    shots = min_shots(epsilon=0.05)  # 293 shots
    qc = QuantumCircuit(2)
    qc.h(0); qc.cx(0, 1); qc.measure_all()
    counts = aer_simulator.run(
        transpile(qc, aer_simulator), shots=shots
    ).result().get_counts()
    assert_measurement_distribution(counts, {"00": 0.5, "11": 0.5})
Choosing epsilon
| Use case | Recommended epsilon |
|---|---|
| Smoke test: just check the circuit runs | |
| Normal regression test | |
| Precise distribution validation | |
| High-precision scientific result | |
Mark high-shot tests with @pytest.mark.quantum_slow and run them with
--quantum-slow to keep the default suite fast.
recommended_shots
recommended_shots(
    expected_probs,
    min_expected_per_bucket=5,
) -> int
Returns the shot count needed so every non-zero bucket in expected_probs
gets at least min_expected_per_bucket expected counts.
The chi-square goodness-of-fit test (used by assert_measurement_distribution)
requires expected count ≥ 5 per cell to produce valid p-values. Violating
this gives unreliable results and triggers a UserWarning.
recommended_shots targets the rarest outcome: if the rarest outcome has
probability $p_{\min}$, you need at least $\lceil k / p_{\min} \rceil$ shots
(where $k$ is min_expected_per_bucket).
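The rule above is short enough to sketch directly. The helper name shots_for_rarest_bucket is hypothetical and only mirrors the $\lceil k / p_{\min} \rceil$ rule as described here; the library's own function may perform additional validation.

```python
from math import ceil


def shots_for_rarest_bucket(expected_probs: dict, min_expected_per_bucket: int = 5) -> int:
    """Smallest shot count giving every non-zero bucket the required expected count.

    Hypothetical helper for illustration only; not pytest-quantum's recommended_shots.
    """
    nonzero = [p for p in expected_probs.values() if p > 0]  # zero-probability outcomes are ignored
    if not nonzero:
        raise ValueError("expected_probs is empty or contains only zero probabilities")
    p_min = min(nonzero)  # the rarest outcome sets the shot budget
    return ceil(min_expected_per_bucket / p_min)


print(shots_for_rarest_bucket({"00": 0.5, "11": 0.5}))                 # 10
print(shots_for_rarest_bucket({"00": 0.499, "01": 0.001, "11": 0.5}))  # 5000
print(shots_for_rarest_bucket({f"{i:03b}": 1 / 8 for i in range(8)}))  # 40
```

A single rare outcome dominates the result: the 0.1% bucket in the second call raises the requirement from 10 shots to 5000.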
- Parameters
    - expected_probs: Dict mapping outcome labels to probabilities. Must sum to 1. Zero-probability outcomes are ignored.
    - min_expected_per_bucket: Minimum expected count per non-zero bucket (default 5).
- Returns
    Recommended shot count as an integer.
- Raises
    ValueError: expected_probs is empty or all probabilities are zero.
Examples
from pytest_quantum import recommended_shots
# Uniform Bell state — rarest outcome has probability 0.5
recommended_shots({"00": 0.5, "11": 0.5}) # 10
# Mostly-uniform, but one rare outcome at 0.1%
recommended_shots({"00": 0.499, "01": 0.001, "11": 0.5}) # 5000
# 3-qubit uniform (min_prob = 1/8)
recommended_shots({f"{i:03b}": 1/8 for i in range(8)}) # 40
Using in a test
from pytest_quantum import assert_measurement_distribution, recommended_shots

def test_ghz_distribution(aer_simulator):
    from qiskit import QuantumCircuit, transpile

    expected = {"000": 0.5, "111": 0.5}
    shots = recommended_shots(expected)  # 10, very cheap for a 50/50 split
    qc = QuantumCircuit(3)
    qc.h(0); qc.cx(0, 1); qc.cx(1, 2); qc.measure_all()
    # Never run fewer than 500 shots, even when the chi-square minimum is lower
    counts = aer_simulator.run(
        transpile(qc, aer_simulator), shots=max(shots, 500)
    ).result().get_counts()
    assert_measurement_distribution(counts, expected)
Note
recommended_shots guarantees chi-square validity but may return fewer
shots than min_shots for detection power. For production tests combine
both: use max(recommended_shots(probs), min_shots(epsilon=0.05)) to
satisfy both constraints.
Statistical primitives
fidelity
fidelity(
    psi,
    phi,
) -> float
Computes the pure-state fidelity $F = |\langle\psi|\phi\rangle|^2$.
Both arrays are flattened and normalised before computation, so minor normalisation errors from simulators do not affect the result.
Returns: Float in [0.0, 1.0]. 1.0 means identical states (up to
global phase). 0.0 means orthogonal states.
Raises: ValueError if arrays have different sizes or are zero-norm.
import numpy as np
from pytest_quantum import fidelity
zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
fidelity(zero, zero) # 1.0 — identical
fidelity(zero, one) # 0.0 — orthogonal
fidelity(zero, plus) # 0.5 — |<0|+>|² = 0.5
fidelity(plus, plus) # 1.0 — identical
Global phase invariance:
psi = np.array([1, 0], dtype=complex)
psi_j = 1j * np.array([1, 0], dtype=complex) # global phase i·|0>
fidelity(psi, psi_j) # 1.0 — global phase is invisible
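To make the normalisation step concrete, here is a dependency-free sketch of $F = |\langle\psi|\phi\rangle|^2$ for flat sequences of amplitudes. The function name pure_state_fidelity is hypothetical, and the sketch assumes already-flattened inputs; the library version works on numpy arrays and may differ in detail.

```python
def pure_state_fidelity(psi, phi) -> float:
    """|<psi|phi>|^2 after normalising both vectors.

    Hypothetical helper for illustration only; not pytest-quantum's fidelity.
    """
    if len(psi) != len(phi):
        raise ValueError("state vectors must have the same length")
    norm_sq_psi = sum(abs(a) ** 2 for a in psi)  # <psi|psi>
    norm_sq_phi = sum(abs(b) ** 2 for b in phi)  # <phi|phi>
    if norm_sq_psi == 0 or norm_sq_phi == 0:
        raise ValueError("state vectors must have non-zero norm")
    # Inner product <psi|phi>: conjugate the first argument
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))
    # Dividing by both squared norms is equivalent to normalising first
    return abs(overlap) ** 2 / (norm_sq_psi * norm_sq_phi)


print(pure_state_fidelity([1, 0], [1, 1]))   # 0.5, although [1, 1] is not normalised
print(pure_state_fidelity([1, 0], [1j, 0]))  # 1.0, the global phase drops out
```

Taking the absolute value of the overlap is what removes the global phase, and dividing by the squared norms is what makes small normalisation errors harmless.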
tvd
tvd(
    p,
    q,
) -> float
Computes the Total Variation Distance between two probability distributions:
$$ \text{TVD}(p, q) = \frac{1}{2} \sum_x |p(x) - q(x)| $$
Parameters: p, q — 1-D numpy arrays of probabilities (each sums to 1).
Returns: Float in [0.0, 1.0]. 0.0 means identical; 1.0 means
disjoint support.
import numpy as np
from pytest_quantum import tvd
# Identical distributions
tvd(np.array([0.5, 0.5]), np.array([0.5, 0.5])) # 0.0
# Small deviation
tvd(np.array([0.5, 0.5]), np.array([0.6, 0.4])) # 0.1
# Disjoint distributions (no shared support)
tvd(np.array([1.0, 0.0]), np.array([0.0, 1.0])) # 1.0
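The definition translates almost literally into code. The sketch below uses plain Python sequences rather than numpy arrays, and the name total_variation_distance is hypothetical; it also checks the equivalent reading of TVD as the total probability mass that would have to move to turn one distribution into the other.

```python
def total_variation_distance(p, q) -> float:
    """Half the L1 distance between two probability vectors of equal length.

    Hypothetical helper for illustration only; not pytest-quantum's tvd.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of outcomes")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

distance = total_variation_distance(p, q)
# When both vectors sum to 1, the surplus of p over q equals the same value
surplus = sum(max(a - b, 0.0) for a, b in zip(p, q))

print(abs(distance - 0.1) < 1e-12)      # True
print(abs(distance - surplus) < 1e-12)  # True
```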
Interpreting TVD values:
| TVD | Interpretation |
|---|---|
| 0.0 | Identical distributions |
| | Very close: acceptable for most tests |
| | Noticeable deviation: may indicate noise or error |
| | Significant: likely a bug or misconfiguration |
| 1.0 | Completely disjoint: certain error |
tvd_from_counts
tvd_from_counts(
    counts_a,
    counts_b,
) -> float
Computes TVD between two shot-count dictionaries. Each dict is normalised to a probability distribution before TVD is calculated. Outcomes present in one dict but absent in the other are treated as having count 0.
- Parameters:
    - counts_a: First counts dict, e.g. {"00": 489, "11": 511}.
    - counts_b: Second counts dict, e.g. {"00": 501, "11": 499}.
Returns: Float in [0.0, 1.0].
Raises: ValueError if either dict is empty.
from pytest_quantum import tvd_from_counts

# Nearly identical Bell distributions
tvd_from_counts(
    {"00": 489, "11": 511},
    {"00": 501, "11": 499},
)
# → 0.012

# One backend sees "01" where the other sees nothing
tvd_from_counts(
    {"00": 500, "11": 500},
    {"00": 450, "01": 50, "11": 500},
)
# → 0.05
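The normalise-then-compare behaviour described for tvd_from_counts can be sketched as follows. The name counts_tvd is hypothetical, and rejecting dicts whose counts sum to zero is an assumption of this sketch rather than documented library behaviour.

```python
def counts_tvd(counts_a: dict, counts_b: dict) -> float:
    """TVD between two count dicts, treating missing outcomes as count 0.

    Hypothetical helper for illustration only; not pytest-quantum's tvd_from_counts.
    """
    if not counts_a or not counts_b:
        raise ValueError("both counts dicts must be non-empty")
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        # Assumption of this sketch: all-zero counts cannot be normalised
        raise ValueError("counts must sum to a positive number")
    outcomes = set(counts_a) | set(counts_b)  # union, so one-sided outcomes are kept
    return 0.5 * sum(
        abs(counts_a.get(k, 0) / total_a - counts_b.get(k, 0) / total_b)
        for k in outcomes
    )


# Different shot totals are fine because each dict is normalised separately
print(counts_tvd({"00": 50, "11": 50}, {"00": 500, "11": 500}))  # 0.0
```

Iterating over the union of keys is the important detail: iterating over only one dict would silently drop outcomes that the other run observed.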
Using tvd_from_counts directly (instead of assert_counts_close):
from pytest_quantum import tvd_from_counts

def test_backend_drift(aer_simulator):
    """Fail if backend results drift more than 3% TVD day-over-day."""
    from qiskit import QuantumCircuit, transpile

    qc = QuantumCircuit(2)
    qc.h(0); qc.cx(0, 1); qc.measure_all()
    qc_t = transpile(qc, aer_simulator)
    # 20000 shots per run keep shot noise (standard deviation about 0.005
    # in the outcome-frequency difference) well below the 0.03 threshold;
    # at 2000 shots the same noise is about 0.016 and the test can fail by chance.
    run1 = aer_simulator.run(qc_t, shots=20000).result().get_counts()
    run2 = aer_simulator.run(qc_t, shots=20000).result().get_counts()
    distance = tvd_from_counts(run1, run2)
    assert distance < 0.03, f"Backend drift too large: TVD = {distance:.4f}"
chi_square_test
chi_square_test(
    observed,
    expected_probs,
    total_shots=None,
) -> tuple[float, float]
Chi-square goodness-of-fit test for quantum measurement distributions.
Tests whether observed counts are consistent with expected_probs.
This is the statistical engine behind assert_measurement_distribution.
Use it directly when you need the raw p-value or chi-square statistic.
- Parameters
    - observed: Either a count dict such as {"00": 489, "11": 511} or a 1-D numpy array of observed counts.
    - expected_probs: Either a probability dict such as {"00": 0.5, "11": 0.5} (must sum to 1) or a 1-D numpy array of expected probabilities.
    - total_shots: Required when both inputs are numpy arrays. Ignored when dict inputs are used (the total is inferred from observed).

Returns: (statistic, pvalue), the chi-square statistic and the p-value.
Reject the null hypothesis (i.e., declare the distributions inconsistent)
when pvalue falls below your chosen significance level, for example 0.05.
Raises: ValueError — Inconsistent inputs (mismatched keys, missing
total_shots for array inputs, observed counts summing to zero).
Example — dict inputs
from pytest_quantum import chi_square_test

# 1000 shots on a Bell circuit, expected to split 50/50
stat, p = chi_square_test(
    observed={"00": 495, "11": 505},
    expected_probs={"00": 0.5, "11": 0.5},
)
print(f"χ² = {stat:.4f}, p = {p:.4f}")
# χ² = 0.1000, p = 0.7518 → consistent

# Biased circuit: clearly the wrong distribution
stat, p = chi_square_test(
    observed={"00": 800, "11": 200},
    expected_probs={"00": 0.5, "11": 0.5},
)
print(f"χ² = {stat:.4f}, p = {p:.6f}")
# χ² = 360.0000, p = 0.000000 → reject null hypothesis
Example — numpy array inputs
import numpy as np
from pytest_quantum import chi_square_test

observed_counts = np.array([245, 255, 248, 252])  # 4-outcome uniform
expected_uniform = np.array([0.25, 0.25, 0.25, 0.25])
stat, p = chi_square_test(
    observed=observed_counts,
    expected_probs=expected_uniform,
    total_shots=1000,
)
assert p > 0.05  # consistent with uniform distribution
Interpreting p-values
| p-value | Interpretation |
|---|---|
| | Consistent with expected distribution: pass |
| | Marginal: consider more shots |
| | Significant deviation: likely a bug |
| | Strong evidence of error |
Degrees of freedom
The chi-square test has k - 1 degrees of freedom, where k is the number
of outcome buckets with non-zero expected probability. Outcomes that appear
in the observed counts but have zero expected probability do not add
degrees of freedom.
The test requires expected count ≥ 5 per cell. Use recommended_shots to
compute the shot count that satisfies this for your distribution.
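The statistic and the degrees-of-freedom rule can be sketched without scipy. The names chi_square_statistic and p_value_one_dof are hypothetical, and two choices below are assumptions of this sketch rather than documented library behaviour: buckets with zero expected probability are skipped entirely, and the shot total is taken over all observed counts. The p-value helper covers only the one-degree-of-freedom case, where the chi-square survival function reduces to erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt


def chi_square_statistic(observed: dict, expected_probs: dict):
    """Return (statistic, degrees_of_freedom) for dict inputs.

    Hypothetical helper for illustration only; not pytest-quantum's chi_square_test.
    """
    total = sum(observed.values())
    if total == 0:
        raise ValueError("observed counts sum to zero")
    statistic = 0.0
    buckets = 0
    for outcome, prob in expected_probs.items():
        if prob == 0:
            continue  # assumption: zero-probability buckets add no term and no freedom
        expected = prob * total
        statistic += (observed.get(outcome, 0) - expected) ** 2 / expected
        buckets += 1
    return statistic, buckets - 1


def p_value_one_dof(statistic: float) -> float:
    """Chi-square survival function for exactly one degree of freedom."""
    return erfc(sqrt(statistic / 2.0))


stat, dof = chi_square_statistic({"00": 495, "11": 505}, {"00": 0.5, "11": 0.5})
print(dof)                               # 1
print(round(stat, 4))                    # 0.1
print(round(p_value_one_dof(stat), 4))   # 0.7518
```

For more than one degree of freedom the p-value needs the general chi-square distribution, which is where a scipy-backed implementation becomes the practical choice.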