Random Returns Die Hard

The efficient market hypothesis is often defended in a familiar shorthand: returns are random, structureless, noise all the way down.

That claim has a consequence.

In cryptography, randomness is not a metaphor. It is tested. A sequence earns the label only if it survives batteries designed to detect structure, dependence, and distributional irregularity. The standard was set by George Marsaglia in 1995, when he published the Diehard battery — a suite of statistical tests that became the benchmark for evaluating random number generators. Its successor, Robert G. Brown’s Dieharder (2003–), extends the battery and remains in wide use. The distinction matters because weak entropy is not an academic flaw. It gets broken.

Apply that standard to the S&P 500 and the language of randomness begins to fall apart.


We took 31,456,311 one-second closing prices for SPY (the SPDR S&P 500 ETF, the most liquid equity instrument in the world), spanning May 2018 to April 2026. We computed log returns between consecutive seconds within each trading day, discarding overnight gaps. We then applied the empirical cumulative distribution function, a rank transform, to map each return to a uniform value in [0, 1] and scaled that value to the byte range 0–255. This encoding is distribution-free: it does not assume normality or any other parametric form. The resulting byte sequence preserves the original time ordering of returns while guaranteeing a perfectly uniform marginal distribution.
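The encoding steps can be sketched as follows. This is a minimal sketch on synthetic prices, not the original pipeline. The function names, the per-second day label used to find overnight gaps, and the random tie-breaking are all our assumptions. Tie-breaking matters at one-second resolution, where many returns are exactly zero. If tied returns shared an average rank, they would all land on a single byte value. If ties were broken by time order, a slow time trend would leak into the bytes.

```python
import numpy as np

def intraday_log_returns(prices: np.ndarray, day_ids: np.ndarray) -> np.ndarray:
    """Log returns between consecutive one-second prices, dropping pairs that straddle two days."""
    diffs = np.diff(np.log(prices))
    same_day = day_ids[1:] == day_ids[:-1]  # False at each overnight gap
    return diffs[same_day]

def rank_to_bytes(returns: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Empirical-CDF (rank) transform scaled to bytes 0..255, keeping the time order of returns."""
    n = len(returns)
    tiebreak = rng.random(n)  # assumption: random secondary key so tied returns get distinct ranks
    order = np.lexsort((tiebreak, returns))  # last key is primary: sort by return, then by tiebreak
    ranks = np.empty(n, dtype=np.int64)
    ranks[order] = np.arange(n)  # ranks[i] = position of returns[i] in sorted order
    return (ranks * 256 // n).astype(np.uint8)  # rank/n lies in [0, 1); times 256, floored

# Synthetic stand-in for SPY: four "days" of 2,500 seconds, prices rounded to cents (many zero returns)
rng = np.random.default_rng(0)
prices = np.round(400 * np.exp(np.cumsum(rng.normal(0, 1e-5, 10_000))), 2)
day_ids = np.repeat(np.arange(4), 2_500)

r = intraday_log_returns(prices, day_ids)
b = rank_to_bytes(r, rng)
print(len(r), int((r == 0).sum()), np.bincount(b, minlength=256)[:5])
```

With distinct ranks, each byte value receives either floor(n/256) or ceil(n/256) returns. The byte counts therefore differ by at most one, even when most of the returns are tied at zero.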

We fed the byte stream to the Diehard battery, run through Dieharder 3.31.1.

As a control, we randomly shuffled the same bytes and ran the battery again. Shuffling destroys the time ordering but preserves the exact set of byte values, and therefore the marginal distribution. If the failures were caused by the encoding or the distribution, the shuffled control should fail equally. If they were caused by temporal structure in the sequence, the control should pass.
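The control can be produced with a single permutation. This is a minimal sketch, assuming the encoded stream is held as a NumPy byte array and handed to Dieharder as a raw binary file. The helper name, the file names and the seed are hypothetical. The command in the comment uses Dieharder's raw-file generator (`-g 201`) with `-f` for the file name.

```python
import os
import tempfile
import numpy as np

def write_control_pair(byte_stream, ordered_path: str, shuffled_path: str, seed: int = 12345):
    """Write the ordered bytes and a shuffled copy as raw binary files (hypothetical helper)."""
    ordered = np.asarray(byte_stream, dtype=np.uint8)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(ordered)  # new array: same byte values, time order destroyed
    ordered.tofile(ordered_path)         # raw bytes, no header
    shuffled.tofile(shuffled_path)
    return ordered, shuffled

# Demo with random stand-in bytes; in practice this would be the rank-encoded SPY stream
demo = np.random.default_rng(1).integers(0, 256, 5_000, dtype=np.uint8)
tmp = tempfile.mkdtemp()
ordered_path = os.path.join(tmp, "spy_ordered.bin")
shuffled_path = os.path.join(tmp, "spy_shuffled.bin")
ordered, shuffled = write_control_pair(demo, ordered_path, shuffled_path)

# Each file can then be read by Dieharder's raw-file generator, for example:
#   dieharder -g 201 -f spy_ordered.bin -d 0    (test 0 is the birthday spacings test)
# `dieharder -l` lists the available test numbers.
```

Because both files contain the same multiset of bytes, any test that depends only on byte frequencies must treat them identically. Only order-sensitive behaviour can differ between the two runs.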

Test                      SPY Returns         Shuffled Control
Birthday Spacings         FAILED              PASSED (p = .291)
Overlapping Permutations  FAILED              FAILED
Binary Rank (32×32)       FAILED              PASSED (p = .102)
Binary Rank (6×8)         FAILED              PASSED (p = .283)
Bitstream                 FAILED              PASSED (p = .257)
OPSO                      FAILED              FAILED
OQSO                      FAILED              PASSED (p = .100)
DNA                       FAILED              PASSED (p = .073)
Count the 1s (stream)     FAILED              PASSED (p = .120)
Count the 1s (byte)       FAILED              PASSED (p = .931)
Parking Lot               FAILED              PASSED (p = .851)
Minimum Distance (2D)     FAILED              PASSED (p = .416)
Minimum Distance (3D)     FAILED              PASSED (p = .257)
Squeeze                   FAILED              FAILED
Overlapping Sums          FAILED              PASSED (p = .024)
Runs                      FAILED              PASSED (p = .011)
Runs (count)              WEAK (p = .000005)  PASSED (p = .994)
Craps (wins)              FAILED              WEAK (p = .002)
Craps (throws)            FAILED              PASSED (p = .008)
Total                     18 failed, 1 weak   15 passed, 1 weak, 3 failed

Marsaglia’s Diehard battery (1995), run through Dieharder 3.31.1. Source: 31,456,311 one-second log returns for SPY, regular trading hours, May 2018–April 2026. Data: Databento OHLCV-1s. Encoding: empirical CDF rank transform to bytes [0, 255]. The shuffled control uses the identical bytes in random order, isolating temporal structure as the cause of failure.
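The three verdicts are bands of the p-value. This minimal sketch reproduces the table's labels from its own p-values. It assumes Dieharder's default cutoffs as we understand them: WEAK when p falls below 0.005 or above 0.995, and FAILED when p falls below 0.000001 or above 0.999999. The helper name is ours, not Dieharder's. Both tails count because a p-value pinned near 1 means the data fit the null hypothesis too well, which is also a sign of non-randomness.

```python
def dieharder_verdict(p: float, weak: float = 0.005, fail: float = 0.000001) -> str:
    """Map a p-value to a verdict (hypothetical helper; cutoffs assumed to be Dieharder's defaults)."""
    # Both tails are suspect: p near 1 means a fit that is too good to be random
    if p < fail or p > 1 - fail:
        return "FAILED"
    if p < weak or p > 1 - weak:
        return "WEAK"
    return "PASSED"

# p-values and labels copied from the table, chosen because they sit near the cutoffs
reported = [
    ("Runs (count), SPY", 0.000005, "WEAK"),
    ("Craps (wins), shuffled", 0.002, "WEAK"),
    ("Craps (throws), shuffled", 0.008, "PASSED"),
    ("Runs, shuffled", 0.011, "PASSED"),
    ("Runs (count), shuffled", 0.994, "PASSED"),
]
for name, p, label in reported:
    print(f"{name}: p = {p} -> {dieharder_verdict(p)} (table: {label})")
```

Under these cutoffs, the SPY Runs (count) p-value of .000005 falls in the weak band, just above the failure line.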

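The ordered returns failed eighteen of nineteen tests and were rated weak on the remaining one, Runs (count). The shuffled control (same bytes, same distribution, time ordering destroyed) passed fifteen of nineteen. The three shared failures (Overlapping Permutations, OPSO, Squeeze) are attributable to residual file-rewind effects. When a test needs more bytes than the input file holds, Dieharder rewinds the file and rereads data it has already seen, and that repetition registers as structure. Because these failures appear in both runs, they tell us nothing about the returns themselves.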
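The rewind mechanism can be seen in miniature. The sketch below is a simplified, OPSO-style statistic of our own, not Dieharder's implementation. It counts how many of the 65,536 possible two-letter words over an 8-bit alphabet never appear among overlapping letter pairs. A long random stream leaves roughly W·exp(−n/W) words missing, where W is the number of possible words and n the number of letters. A stream built by cycling through a file that is too short can never contain more distinct words than the file has positions, so far too many words go missing. Marsaglia's OPSO uses larger letters and fixed sample sizes. The letter size, the 20,000-letter "file" and the function name here are illustrative assumptions. Shuffling the short file first would not help, because the cycled copy of a shuffled file repeats just as exactly.

```python
import numpy as np

def missing_pair_words(letters: np.ndarray, bits: int) -> int:
    """Toy OPSO-style statistic: how many two-letter words never appear as overlapping pairs."""
    letters = letters.astype(np.int64)
    words = (letters[:-1] << bits) | letters[1:]  # pack each overlapping pair into one integer
    seen = np.zeros(1 << (2 * bits), dtype=bool)
    seen[words] = True
    return int((~seen).sum())

rng = np.random.default_rng(7)
bits = 8                      # 8-bit letters: 65,536 possible two-letter words
n = 1 << 17                   # letters consumed by the test
word_count = 1 << (2 * bits)

fresh = rng.integers(0, 1 << bits, n)            # a stream long enough that nothing repeats
short_file = rng.integers(0, 1 << bits, 20_000)  # an input file too short for the test
rewound = np.resize(short_file, n)               # np.resize cycles the data, like a rewind

fresh_missing = missing_pair_words(fresh, bits)
rewound_missing = missing_pair_words(rewound, bits)
expected = word_count * np.exp(-(n - 1) / word_count)  # approximate mean for a random stream
print(fresh_missing, round(expected), rewound_missing)
```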

What the divergence tells us is specific. The non-randomness is not in the shape of the distribution. The rank transform guarantees uniform marginals by construction. The non-randomness is in the sequence — in the temporal dependence between consecutive returns. Volatility clusters. Autocorrelation persists. Structure survives at the one-second level and would only deepen at the daily level where most investors operate.
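Volatility clustering and signed autocorrelation are different kinds of temporal dependence, and they can be measured separately on the rank-transformed series. This minimal sketch uses simulated GARCH(1,1) returns. The model is a stand-in chosen for illustration; it is not a model of SPY and not part of the original analysis. The lag-1 autocorrelation of the ranks is near zero by construction. The autocorrelation of each rank's distance from the median, a proxy for move size, is clearly positive. Shuffling removes it. The parameter values and helper names are our assumptions.

```python
import numpy as np

def autocorr(x: np.ndarray, lag: int = 1) -> float:
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def simulate_garch(n: int, omega=1e-6, alpha=0.15, beta=0.80, seed=3) -> np.ndarray:
    """GARCH(1,1) returns: uncorrelated signs, clustered magnitudes (illustrative stand-in)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    r = np.empty(n)
    var = omega / (1 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(var) * z[t]
        var = omega + alpha * r[t] ** 2 + beta * var  # tomorrow's variance reacts to today's move
    return r

n = 100_000
r = simulate_garch(n)
u = (np.argsort(np.argsort(r)) + 0.5) / n  # rank transform to (0, 1); no ties in simulated data
size = np.abs(u - 0.5)                     # distance from the median rank, a proxy for move size

shuffled = np.random.default_rng(4).permutation(u)
print(autocorr(u), autocorr(size), autocorr(np.abs(shuffled - 0.5)))
```

A series can therefore show almost no signed autocorrelation and still carry temporal structure that an order-sensitive battery will detect.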

A note on resolution. The Diehard battery was built to test random number generators, which produce unlimited streams. Daily returns from a single index — the natural unit for the claim being tested — yield roughly 24,000 observations over a century. That is four orders of magnitude short of what the test suite requires. We use one-second bars to reach sufficient volume. This is a concession to the tool, not to the argument. Higher-frequency data is more forgiving to the efficient market hypothesis: microstructure effects that vanish at the daily level are still present at one-second resolution, giving the “random” claim every advantage. The sequence still fails.

None of this proves easy profit. Structured is not the same as exploitable. But it is not the same as random, either. And once that distinction is admitted, the stronger rhetoric — the casual invocation of randomness as a first principle — is dead.

So the burden shifts. If you insist that market returns are truly random, the implications should not trouble you. Your cryptographic standards are clear. Your confidence should be easy.


S&P 500 returns make a fine entropy pool. Encrypt something. We’ll wait.
