Concepts

Understanding why pytest-quantum makes certain design choices will help you write better quantum tests. This page explains the physics and statistics behind the library.


Why shot noise is a first-class problem

Every quantum measurement is fundamentally random. Even a perfect Bell circuit can produce {"00": 487, "11": 537} when the true probabilities are 50/50. This isn’t a bug; it’s quantum mechanics.

Classical test frameworks don’t know this. assert counts["00"] == 512 fails ~5% of the time on a perfectly correct circuit simply because of Poisson sampling fluctuations. This creates ghost failures in CI that waste hours of debugging time.

pytest-quantum’s measurement assertions use chi-square goodness-of-fit testing:

H₀: the measured distribution matches the expected probabilities
Reject H₀ only if p-value < significance threshold (default: 0.01)

With 1024 shots and significance=0.01, you’ll get a false positive ~1% of the time on a correct circuit, not 5-50% like with hard-coded counts.

Rule of thumb: If your quantum test flakes in CI, it probably needs more shots or a statistical assertion. Use min_shots(epsilon=0.05) to compute the minimum shots needed to detect a 5% deviation with 99% confidence.


Why global phase is not a bug

Two quantum gates are physically identical if they differ only by a global phase:

U₂ = e^(iθ) · U₁   for any real θ

The circuit H and e^(iπ/4) · H are completely indistinguishable by any quantum measurement. Yet np.allclose(U1, U2) returns False for them.

assert_unitary computes the physical distance:

distance = 1 - |Tr(U₁† U₂)| / d

This is zero if and only if U₁ and U₂ are equal up to global phase. You can opt into strict phase comparison with allow_global_phase=False when you specifically need to verify a global phase (rare, but sometimes relevant for concatenating circuits).


How fidelity is measured

The state fidelity between two pure states is:

F(|ψ⟩, |φ⟩) = |⟨ψ|φ⟩|²

This ranges from 0 (orthogonal) to 1 (identical up to global phase). A fidelity of 0.99 means the states are 99% similar. In practice, fidelities above 0.999 are considered “high fidelity” for near-term quantum devices.

For mixed states (density matrices), the Uhlmann fidelity generalises this:

F(ρ, σ) = (Tr √(√ρ σ √ρ))²

The trace distance is a complementary metric:

T(ρ, σ) = ½ ‖ρ - σ‖₁

Both satisfy 0 ≤ F ≤ 1 and 0 ≤ T ≤ 1, with T = 0 and F = 1 meaning identical states. They’re related by: 1 - √F T √(1 - F).


Statistical distance between distributions

When comparing measurement distributions, you have several options depending on what you care about:

Total Variation Distance (TVD)

TVD(P, Q) = ½ Σₓ |P(x) - Q(x)|

TVD is the operational distance: it’s the maximum probability difference any event can have between the two distributions. TVD = 0 means identical, TVD = 1 means completely different.

Use assert_counts_close(counts_a, counts_b, max_tvd=0.05).

Hellinger Distance

H(P, Q) = (1/√2) √(Σₓ (√P(x) - √Q(x))²)

More sensitive to small probabilities than TVD. Good for detecting rare outcome deviations. Use assert_hellinger_close.

KL Divergence

KL(P ‖ Q) = Σₓ P(x) log(P(x)/Q(x))

Asymmetric. Measures how much information is lost when using Q instead of P. Useful for distribution comparison in variational algorithms.


Quantum Volume

IBM’s Quantum Volume (QV) protocol is the gold standard for characterising the overall capability of a quantum processor.

A backend has QV = 2^n if it can execute “model circuits” of width n × depth n using random SU(4) two-qubit unitaries, and achieve a heavy output probability (HOP) > 2/3 with statistical confidence:

HOP = fraction of measured bitstrings that are "heavy"
"heavy" = ideal probability above the median

The intuition: a perfect backend reproduces the ideal distribution exactly, so HOP = expected fraction of outcomes above median ≈ 2/3 + ε. Noise makes the distribution uniform, pushing HOP toward 0.5.


Randomised Benchmarking

RB measures the average Clifford gate error rate by fitting an exponential decay to survival probability vs sequence length:

P(m) = A · p^m + B

where p is the depolarising parameter and the average gate fidelity is:

F = 1 - (1-p)/d

for dimension d = 2 (single qubit). The key insight: RB is robust to state preparation and measurement (SPAM) errors because A and B absorb them.

Interleaved RB (IRB) isolates the error of a specific gate G by interleaving G between random Cliffords. The gate fidelity is:

F_G = 1 - (d-1)/d · (1 - p_G/p_ref)

Cross-Entropy Benchmarking (XEB)

XEB is Google’s protocol for benchmarking (and demonstrating quantum advantage). For random circuits, the XEB fidelity is:

F_XEB = D · ⟨P_ideal(x)⟩_measured - 1

where D = 2^n and the average is over measured bitstrings. For an ideal noiseless backend, this equals 1.0 in expectation. Noise drives it toward 0.0.

XEB is more sensitive than QV for large circuits and is used in Google’s “quantum supremacy” and “quantum advantage” demonstrations.


T1, T2, and T2* Coherence Times

These are the fundamental noise parameters for any quantum device:

Parameter

Physical meaning

Measurement

Typical value

T1

Energy relaxation: excited state decays to ground state

assert_t1_above

50–200 µs

T2

Dephasing with echo: Hahn echo sequence

assert_t2_above

30–100 µs

T2*

Dephasing without echo: free induction decay (Ramsey)

assert_t2star_above

10–50 µs

All three use exponential decay fits: P(t) = exp(-t/T). Note T2 ≤ 2·T1 always (the Hahn echo refocuses static noise, T2 > T2* typically).


Expressibility and Barren Plateaus

Two critical properties for variational quantum algorithms (VQAs):

Expressibility

How uniformly does a parameterised ansatz sample the unitary group? Measured via KL divergence between the ansatz’s fidelity distribution and the Haar-random (uniform) distribution:

expressibility ≈ exp(-KL(P_ansatz ‖ P_Haar))

Score = 1.0 means Haar-random (maximally expressive). Use assert_expressibility_above.

Barren Plateaus

In deep VQAs, gradients can vanish exponentially with system size, a phenomenon known as the barren plateau problem. pytest-quantum detects this via the parameter-shift rule:

∂E/∂θ = [E(θ+π/2) - E(θ-π/2)] / 2

If the variance of gradients across random initialisations is < threshold, the landscape is too flat to train. Use assert_no_barren_plateau.


Diamond Norm

The diamond norm measures the worst-case operational distance between two quantum channels:

‖Φ - Ψ‖◇ = max_{ρ} ‖(Φ⊗I)(ρ) - (Ψ⊗I)(ρ)‖₁

It answers: “what’s the maximum probability that any experiment can distinguish channel Φ from channel Ψ?” A diamond norm of 0.01 means channels are at most 1% distinguishable.

Computing the diamond norm exactly requires a semidefinite programme (SDP). pytest-quantum uses cvxpy when available, falling back to the spectral norm as an upper bound.


Why fixtures instead of setup code

Without pytest-quantum:

# In every test file... every project... forever:
from qiskit_aer import AerSimulator

@pytest.fixture(scope="session")
def simulator():
    return AerSimulator()

With pytest-quantum:

# Nothing. Just:
def test_bell(aer_simulator):
    ...

Benefits:

  • Session-scoped by default: one simulator per pytest session, not one per test

  • Auto-skip: tests are automatically skipped if the SDK isn’t installed

  • Framework-agnostic: same patterns work for Qiskit, Cirq, Braket, PennyLane

  • Composable: fixtures can be combined with multi_backend_runner for cross-framework testing