Concepts¶

Understanding why pytest-quantum makes certain design choices will help you write better quantum tests. This page explains the physics and statistics behind the library.

Why shot noise is a first-class problem¶

Every quantum measurement is fundamentally random. Even a perfect Bell circuit can produce {"00": 487, "11": 537} when the true probabilities are 50/50. This isn’t a bug; it’s quantum mechanics.

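Classical test frameworks don’t know this. assert counts["00"] == 512 fails about 97.5% of the time on a perfectly correct circuit at 1024 shots, because binomial sampling lands on exactly 512 only about 2.5% of the time. Replacing the exact value with a hand-picked tolerance only swaps one arbitrary threshold for another. This creates ghost failures in CI that waste hours of debugging time.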

pytest-quantum’s measurement assertions use chi-square goodness-of-fit testing:

H₀: the measured distribution matches the expected probabilities
Reject H₀ only if p-value < significance threshold (default: 0.01)

With 1024 shots and significance=0.01, you’ll get a false positive ~1% of the time on a correct circuit, rather than the much higher and harder-to-predict failure rates of hard-coded counts or hand-tuned tolerances.
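To make the test concrete, here is a minimal sketch of a chi-square goodness-of-fit check for the Bell counts quoted earlier. It is not pytest-quantum’s implementation: the function name is hypothetical, and it only handles the two-outcome case, where the chi-square distribution has one degree of freedom and its tail probability reduces to the complementary error function.

```python
import math

def chi_square_p_value_two_outcomes(counts, expected_probs):
    """Chi-square statistic and p-value for exactly two outcomes (1 degree of freedom).

    Hypothetical helper for illustration; not part of pytest-quantum.
    """
    shots = sum(counts.values())
    chi2 = 0.0
    for outcome, prob in expected_probs.items():
        expected = prob * shots
        observed = counts.get(outcome, 0)
        chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

counts = {"00": 487, "11": 537}
chi2, p_value = chi_square_p_value_two_outcomes(counts, {"00": 0.5, "11": 0.5})
reject = p_value < 0.01  # significance threshold from the text
print(f"chi2={chi2:.3f}, p={p_value:.3f}, reject H0: {reject}")
```

The p-value here is well above 0.01, so the 487/537 split is accepted as consistent with a 50/50 distribution.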

Rule of thumb: If your quantum test flakes in CI, it probably needs more shots or a statistical assertion. Use min_shots(epsilon=0.05) to compute the minimum shots needed to detect a 5% deviation with 99% confidence.
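For a feel of the numbers involved, the sketch below uses Hoeffding’s inequality to bound the shots needed to estimate a single outcome probability to within epsilon. This is an assumption for illustration only: the function name is hypothetical, and min_shots may use a different formula and return a different value.

```python
import math

def hoeffding_min_shots(epsilon, confidence=0.99):
    """Shots so that |estimate - true probability| <= epsilon with the given confidence.

    Hypothetical helper based on Hoeffding's inequality:
    P(|p_hat - p| >= epsilon) <= 2 * exp(-2 * n * epsilon^2).
    """
    delta = 1.0 - confidence
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

shots_5_percent = hoeffding_min_shots(0.05)
shots_1_percent = hoeffding_min_shots(0.01)
print(shots_5_percent, shots_1_percent)
```

The required shot count grows as 1/epsilon², so resolving a deviation five times smaller costs about 25 times more shots.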

Why global phase is not a bug¶

Two quantum gates are physically identical if they differ only by a global phase:

U₂ = e^(iθ) · U₁   for any real θ

The circuit H and e^(iπ/4) · H are completely indistinguishable by any quantum measurement. Yet np.allclose(U1, U2) returns False for them.

assert_unitary computes the physical distance:

distance = 1 - |Tr(U₁† U₂)| / d

This is zero if and only if U₁ and U₂ are equal up to global phase. You can opt into strict phase comparison with allow_global_phase=False when you specifically need to verify a global phase (rare, but relevant when the circuit will later be used as a controlled operation, where the phase becomes a measurable relative phase).
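The distance formula can be sketched in a few lines of plain Python. This is an illustration of the formula above rather than assert_unitary’s actual code; the helper name is hypothetical, and matrices are written as nested lists to keep the example dependency-free.

```python
import cmath
import math

def phase_invariant_distance(u1, u2):
    """1 - |Tr(U1^dagger U2)| / d for square matrices given as nested lists."""
    d = len(u1)
    # Tr(U1^dagger U2) equals the sum over i, j of conj(U1[i][j]) * U2[i][j].
    overlap = sum(u1[i][j].conjugate() * u2[i][j] for i in range(d) for j in range(d))
    return 1.0 - abs(overlap) / d

s = 1.0 / math.sqrt(2.0)
hadamard = [[s, s], [s, -s]]
phase = cmath.exp(1j * math.pi / 4)
phased_hadamard = [[phase * entry for entry in row] for row in hadamard]
pauli_x = [[0, 1], [1, 0]]

same_up_to_phase = phase_invariant_distance(hadamard, phased_hadamard)
different_gate = phase_invariant_distance(hadamard, pauli_x)
entries_equal = all(
    abs(a - b) < 1e-9
    for row_a, row_b in zip(hadamard, phased_hadamard)
    for a, b in zip(row_a, row_b)
)
print(same_up_to_phase, different_gate, entries_equal)
```

The element-wise comparison fails for H and e^(iπ/4) · H, while the phase-invariant distance is zero up to rounding; H and X remain clearly separated.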

How fidelity is measured¶

The state fidelity between two pure states is:

F(|ψ⟩, |φ⟩) = |⟨ψ|φ⟩|²

This ranges from 0 (orthogonal) to 1 (identical up to global phase). A fidelity of 0.99 means that a measurement projecting onto |ψ⟩ would accept |φ⟩ with probability 0.99. In practice, fidelities above 0.999 are often considered “high fidelity” for near-term quantum devices.

For mixed states (density matrices), the Uhlmann fidelity generalises this:

F(ρ, σ) = (Tr √(√ρ σ √ρ))²

The trace distance is a complementary metric:

T(ρ, σ) = ½ ‖ρ - σ‖₁

Both satisfy 0 ≤ F ≤ 1 and 0 ≤ T ≤ 1, with T = 0 and F = 1 meaning identical states. They’re related by: 1 - √F ≤ T ≤ √(1 - F).
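As a small worked example, the sketch below computes the pure-state fidelity between |0⟩ and |+⟩ and checks the inequality above. It assumes pure states given as amplitude lists, for which the trace distance has the closed form T = √(1 - F); the helper name is hypothetical.

```python
import math

def pure_state_fidelity(psi, phi):
    """F = |<psi|phi>|^2 for normalised amplitude lists."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2

s = 1.0 / math.sqrt(2.0)
zero = [1.0, 0.0]
plus = [s, s]
phased_plus = [1j * amplitude for amplitude in plus]  # |+> times a global phase

fidelity = pure_state_fidelity(zero, plus)
fidelity_with_phase = pure_state_fidelity(plus, phased_plus)

# For pure states the trace distance is sqrt(1 - F), which saturates the upper bound.
trace_distance = math.sqrt(1.0 - fidelity)
lower_bound = 1.0 - math.sqrt(fidelity)
upper_bound = math.sqrt(1.0 - fidelity)
print(fidelity, fidelity_with_phase, lower_bound, trace_distance, upper_bound)
```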

Statistical distance between distributions¶

When comparing measurement distributions, you have several options depending on what you care about:

Total Variation Distance (TVD)¶

TVD(P, Q) = ½ Σₓ |P(x) - Q(x)|

TVD is the operational distance: it’s the maximum probability difference any event can have between the two distributions. TVD = 0 means identical, TVD = 1 means completely different.

Use assert_counts_close(counts_a, counts_b, max_tvd=0.05).
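The TVD formula applied to raw counts looks roughly like this. The helper is a hypothetical stand-in written for illustration, not the code behind assert_counts_close.

```python
def tvd_from_counts(counts_a, counts_b):
    """Total variation distance between two count dictionaries."""
    shots_a = sum(counts_a.values())
    shots_b = sum(counts_b.values())
    outcomes = set(counts_a) | set(counts_b)  # outcomes missing on one side count as 0
    return 0.5 * sum(
        abs(counts_a.get(x, 0) / shots_a - counts_b.get(x, 0) / shots_b)
        for x in outcomes
    )

measured = {"00": 487, "11": 537}
ideal = {"00": 512, "11": 512}
tvd = tvd_from_counts(measured, ideal)
print(tvd)
```

For the Bell example the TVD is 25/1024, about 0.024, which sits under a max_tvd of 0.05.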

Hellinger Distance¶

H(P, Q) = (1/√2) √(Σₓ (√P(x) - √Q(x))²)

More sensitive to small probabilities than TVD. Good for detecting rare outcome deviations. Use assert_hellinger_close.
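The sensitivity to rare outcomes shows up in a two-outcome example where one distribution assigns 1% to an outcome the other never produces. The sketch below follows the formula above; the function name is hypothetical and this is not the implementation of assert_hellinger_close.

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between two probability dictionaries."""
    outcomes = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(x, 0.0)) - math.sqrt(q.get(x, 0.0))) ** 2 for x in outcomes
    )
    return math.sqrt(squared) / math.sqrt(2.0)

p = {"0": 0.99, "1": 0.01}
q = {"0": 1.0}  # outcome "1" never occurs

hellinger = hellinger_distance(p, q)
tvd = 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))
print(hellinger, tvd)
```

The TVD is 0.01, while the Hellinger distance is roughly seven times larger, so the missing rare outcome stands out more clearly.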

KL Divergence¶

KL(P ‖ Q) = Σₓ P(x) log(P(x)/Q(x))

Asymmetric. Measures how much information is lost when using Q instead of P. Useful for distribution comparison in variational algorithms.
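The asymmetry is easy to see numerically. The following sketch uses natural logarithms and a hypothetical helper name; it returns infinity when Q assigns zero probability to an outcome that P can produce, which is the standard convention for KL divergence.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for probability dictionaries."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with P(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # P produces an outcome that Q rules out
        total += px * math.log(px / qx)
    return total

p = {"0": 0.5, "1": 0.5}
q = {"0": 0.9, "1": 0.1}
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
print(kl_pq, kl_qp)
```

The two directions give about 0.51 and 0.37 nats, so the order of the arguments matters when setting a threshold.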

Quantum Volume¶

IBM’s Quantum Volume (QV) protocol is a widely used single-number benchmark for the overall capability of a quantum processor.

A backend has QV = 2^n if it can execute “model circuits” of width n × depth n using random SU(4) two-qubit unitaries, and achieve a heavy output probability (HOP) > 2/3 with statistical confidence:

HOP = fraction of measured bitstrings that are "heavy"
"heavy" = ideal probability above the median

The intuition: a perfect backend reproduces the ideal distribution exactly, and for large random model circuits its HOP approaches (1 + ln 2)/2 ≈ 0.85, comfortably above the 2/3 threshold. Noise flattens the output distribution toward uniform, pushing HOP toward 0.5.
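The HOP definition can be sketched for a single circuit as follows. The ideal probabilities and counts are made-up toy values, and the helper name is hypothetical; a real QV run averages this quantity over many random model circuits.

```python
import statistics

def heavy_output_probability(ideal_probs, counts):
    """Fraction of measured shots that landed on heavy bitstrings."""
    median = statistics.median(ideal_probs.values())
    heavy = {bits for bits, prob in ideal_probs.items() if prob > median}
    shots = sum(counts.values())
    return sum(n for bits, n in counts.items() if bits in heavy) / shots

# Toy 2-qubit example (values invented for illustration).
ideal_probs = {"00": 0.4, "01": 0.1, "10": 0.3, "11": 0.2}
counts = {"00": 400, "01": 100, "10": 300, "11": 200}
uniform_counts = {"00": 250, "01": 250, "10": 250, "11": 250}

hop = heavy_output_probability(ideal_probs, counts)
hop_uniform = heavy_output_probability(ideal_probs, uniform_counts)
print(hop, hop_uniform)
```

Counts that follow the ideal distribution give a HOP of 0.7 here, while fully uniform counts give 0.5, the value that noise drives toward.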

Randomised Benchmarking¶

RB measures the average Clifford gate error rate by fitting an exponential decay to survival probability vs sequence length:

P(m) = A · p^m + B

where p is the depolarising parameter and the average gate fidelity is:

F = 1 - (1-p)(d-1)/d

where d = 2^n is the Hilbert-space dimension (d = 2 for a single qubit, giving F = 1 - (1-p)/2). The key insight: RB is robust to state preparation and measurement (SPAM) errors because A and B absorb them.
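The SPAM robustness can be checked on noiseless synthetic data: taking differences of survival probabilities cancels B, and taking a ratio of differences cancels A, leaving only p. This is a sketch under the assumption of exact, equally spaced data points; real analyses fit noisy data with a nonlinear least-squares routine, and all names below are hypothetical.

```python
def survival(m, a, b, p):
    """RB decay model P(m) = A * p^m + B."""
    return a * p ** m + b

def decay_parameter(p0, p1, p2, step):
    """Recover p from three survival probabilities at lengths m, m+step, m+2*step."""
    ratio = (p2 - p1) / (p1 - p0)  # equals p^step, independent of A and B
    return ratio ** (1.0 / step)

def average_gate_fidelity(p, dim=2):
    return 1.0 - (1.0 - p) * (dim - 1) / dim

true_p = 0.99
lengths = (10, 20, 30)

# Two different SPAM settings (A, B) with the same underlying gate error.
good_spam = [survival(m, 0.50, 0.50, true_p) for m in lengths]
bad_spam = [survival(m, 0.40, 0.45, true_p) for m in lengths]

p_good = decay_parameter(*good_spam, step=10)
p_bad = decay_parameter(*bad_spam, step=10)
fidelity = average_gate_fidelity(p_good)
print(p_good, p_bad, fidelity)
```

Both SPAM settings return the same decay parameter, and a single-qubit p of 0.99 corresponds to an average gate fidelity of 0.995.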

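Interleaved RB (IRB) isolates the error of a specific gate G by interleaving G between random Cliffords. With p_G the decay parameter of the interleaved sequences and p_ref that of the reference RB run, the gate fidelity is estimated as: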

F_G = 1 - (d-1)/d · (1 - p_G/p_ref)
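A quick numerical instance of this formula, with made-up decay parameters and a hypothetical helper name:

```python
def interleaved_gate_fidelity(p_interleaved, p_ref, dim=2):
    """F_G = 1 - (d-1)/d * (1 - p_G / p_ref)."""
    return 1.0 - (dim - 1) / dim * (1.0 - p_interleaved / p_ref)

# Illustrative values only: reference decay 0.990, interleaved decay 0.985.
gate_fidelity = interleaved_gate_fidelity(0.985, 0.990)
no_extra_error = interleaved_gate_fidelity(0.990, 0.990)
print(gate_fidelity, no_extra_error)
```

When the interleaved decay matches the reference decay, the estimate is exactly 1: the extra gate added no measurable error.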

Cross-Entropy Benchmarking (XEB)¶

XEB is Google’s protocol for benchmarking (and demonstrating quantum advantage). For random circuits, the XEB fidelity is:

F_XEB = D · ⟨P_ideal(x)⟩_measured - 1

where D = 2^n and the average is over measured bitstrings. For an ideal noiseless backend, this equals 1.0 in expectation. Noise drives it toward 0.0.

XEB is more sensitive than QV for large circuits and is used in Google’s “quantum supremacy” and “quantum advantage” demonstrations.
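The linear XEB estimator can be sketched directly from the formula. The ideal probabilities below are toy values for a 2-qubit circuit, and the helper name is hypothetical. Because this toy distribution is not a deep random circuit, sampling it perfectly does not give 1.0; the point of the example is that uniform (fully depolarised) output gives exactly 0.

```python
def xeb_fidelity(ideal_probs, counts):
    """Linear XEB: D * <P_ideal(x)> over measured bitstrings, minus 1."""
    num_qubits = len(next(iter(ideal_probs)))
    dim = 2 ** num_qubits
    shots = sum(counts.values())
    mean_ideal = sum(ideal_probs[bits] * n for bits, n in counts.items()) / shots
    return dim * mean_ideal - 1.0

# Toy values invented for illustration.
ideal_probs = {"00": 0.4, "01": 0.1, "10": 0.3, "11": 0.2}
matching_counts = {"00": 400, "01": 100, "10": 300, "11": 200}
uniform_counts = {"00": 250, "01": 250, "10": 250, "11": 250}

xeb_matching = xeb_fidelity(ideal_probs, matching_counts)
xeb_uniform = xeb_fidelity(ideal_probs, uniform_counts)
print(xeb_matching, xeb_uniform)
```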

T1, T2, and T2* Coherence Times¶

These are the fundamental noise parameters for any quantum device:

Parameter	Physical meaning	Measurement	Typical value (superconducting qubits)
T1	Energy relaxation: excited state decays to ground state	`assert_t1_above`	50–200 µs
T2	Dephasing with echo: Hahn echo sequence	`assert_t2_above`	30–100 µs
T2*	Dephasing without echo: free induction decay (Ramsey)	`assert_t2star_above`	10–50 µs

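All three are extracted from exponential decay fits of the form P(t) = exp(-t/T); for T2* the Ramsey signal is a decaying oscillation, and the exponential describes its envelope. Note that T2 ≤ 2·T1 always, and typically T2* < T2 because the Hahn echo refocuses static noise.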
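A decay time can be extracted from such data with a log-linear least-squares fit, sketched below on noiseless synthetic points. This is an assumption for illustration, not the fitting routine behind assert_t1_above; real data has shot noise and offsets that usually call for a nonlinear fit.

```python
import math

def fit_decay_time(times, populations):
    """Fit ln P = -t/T + c by ordinary least squares and return T."""
    ys = [math.log(p) for p in populations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    return -1.0 / slope

true_t1_us = 80.0
delays_us = [0.0, 20.0, 40.0, 80.0, 160.0]
excited_population = [math.exp(-t / true_t1_us) for t in delays_us]

estimated_t1_us = fit_decay_time(delays_us, excited_population)
print(estimated_t1_us)
```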

Expressibility and Barren Plateaus¶

Two critical properties for variational quantum algorithms (VQAs):

Expressibility¶

How uniformly does a parameterised ansatz sample the unitary group? Measured via KL divergence between the ansatz’s fidelity distribution and the Haar-random (uniform) distribution:

expressibility ≈ exp(-KL(P_ansatz ‖ P_Haar))

Score = 1.0 means Haar-random (maximally expressive). Use assert_expressibility_above.
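The score can be sketched from a binned fidelity histogram. For Haar-random states in dimension N, the fidelity between two samples has density (N - 1)(1 - F)^(N - 2), so the probability of each bin follows from the CDF 1 - (1 - F)^(N - 1). The function names and the histograms below are hypothetical, and assert_expressibility_above may bin or estimate differently.

```python
import math

def haar_bin_probabilities(num_bins, dim):
    """Probability of each equal-width fidelity bin under the Haar distribution."""
    edges = [i / num_bins for i in range(num_bins + 1)]
    return [
        (1.0 - lo) ** (dim - 1) - (1.0 - hi) ** (dim - 1)
        for lo, hi in zip(edges, edges[1:])
    ]

def expressibility_score(ansatz_histogram, dim):
    """exp(-KL(P_ansatz || P_Haar)) for a normalised fidelity histogram."""
    haar = haar_bin_probabilities(len(ansatz_histogram), dim)
    kl = sum(p * math.log(p / q) for p, q in zip(ansatz_histogram, haar) if p > 0.0)
    return math.exp(-kl)

# Single qubit (dim = 2): the Haar fidelity distribution is uniform on [0, 1].
idle_ansatz = [0.0] * 9 + [1.0]  # always prepares the same state, so F is always 1
haar_like_ansatz = [0.1] * 10    # fidelities spread uniformly

idle_score = expressibility_score(idle_ansatz, dim=2)
haar_like_score = expressibility_score(haar_like_ansatz, dim=2)
print(idle_score, haar_like_score)
```

An ansatz that never moves the state scores about 0.1 with ten bins, while one whose fidelity histogram matches the Haar distribution scores 1.0.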

Barren Plateaus¶

In deep VQAs, gradients can vanish exponentially with system size, a phenomenon known as the barren plateau problem. pytest-quantum detects this via the parameter-shift rule:

∂E/∂θ = [E(θ+π/2) - E(θ-π/2)] / 2

If the variance of gradients across random initialisations falls below a threshold, the landscape is too flat to train. Use assert_no_barren_plateau.
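The parameter-shift rule and the variance check can be sketched with a cost function whose gradient is known analytically. The example assumes a single RY(θ) rotation on |0⟩ measured in Z, for which E(θ) = cos θ; the helper names are hypothetical and this is not the code behind assert_no_barren_plateau.

```python
import math
import random
import statistics

def energy(theta):
    """<Z> after RY(theta) applied to |0>, which equals cos(theta)."""
    return math.cos(theta)

def parameter_shift_gradient(cost, theta):
    """dE/dtheta = [E(theta + pi/2) - E(theta - pi/2)] / 2."""
    return (cost(theta + math.pi / 2) - cost(theta - math.pi / 2)) / 2.0

theta = 0.7
gradient = parameter_shift_gradient(energy, theta)
exact_gradient = -math.sin(theta)

# Gradient variance over random initialisations (seeded for reproducibility).
rng = random.Random(0)
gradients = [
    parameter_shift_gradient(energy, rng.uniform(0.0, 2.0 * math.pi))
    for _ in range(200)
]
gradient_variance = statistics.pvariance(gradients)
print(gradient, exact_gradient, gradient_variance)
```

For this one-parameter circuit the variance stays near 0.5, far from zero. In a barren plateau the same quantity shrinks exponentially as qubits are added.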

Diamond Norm¶

The diamond norm measures the worst-case operational distance between two quantum channels:

‖Φ - Ψ‖◇ = max_{ρ} ‖(Φ⊗I)(ρ) - (Ψ⊗I)(ρ)‖₁

It answers: “how well can the best possible experiment, including one that entangles the input with an ancilla, tell channel Φ from channel Ψ in a single use?” The optimal success probability is ½ + ¼‖Φ - Ψ‖◇, so a diamond norm of 0.01 lifts it only from 50% to 50.25%.

Computing the diamond norm exactly requires a semidefinite programme (SDP). pytest-quantum uses cvxpy when available, falling back to the spectral norm as an upper bound.

Why fixtures instead of setup code¶

Without pytest-quantum:

# In every test file... every project... forever:
import pytest
from qiskit_aer import AerSimulator

@pytest.fixture(scope="session")
def simulator():
    return AerSimulator()

With pytest-quantum:

# Nothing. Just:
def test_bell(aer_simulator):
    ...

Benefits:

Session-scoped by default: one simulator per pytest session, not one per test
Auto-skip: tests are automatically skipped if the SDK isn’t installed
Framework-agnostic: same patterns work for Qiskit, Cirq, Braket, PennyLane
Composable: fixtures can be combined with multi_backend_runner for cross-framework testing