Writeup Highlights

The writeup documents the methodology, architecture decisions, and results. It covers point-in-time (PIT) enforcement, signal validation, portfolio construction, risk controls, the out-of-sample (OOS) protocol, and known limitations.


Correctness guardrails

The system treats LLM output as untrusted research input. Typed artifacts, schema validation, deterministic gates, and replayable result files keep persuasive prose from becoming executable truth.
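
As a rough illustration of treating LLM output as untrusted input, the sketch below parses raw model text into a typed artifact and rejects anything that does not match the schema. The `SignalProposal` class, its field names, and the `lookback_days` bounds are hypothetical and are not taken from the project.

```python
import json
from dataclasses import dataclass


# Hypothetical schema: field names and bounds are illustrative assumptions.
@dataclass(frozen=True)
class SignalProposal:
    name: str
    expression: str
    lookback_days: int


EXPECTED_FIELDS = {"name": str, "expression": str, "lookback_days": int}


def parse_proposal(raw: str) -> SignalProposal:
    """Turn untrusted LLM text into a typed artifact, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    if set(data) != set(EXPECTED_FIELDS):
        diff = sorted(set(data) ^ set(EXPECTED_FIELDS))
        raise ValueError(f"unexpected or missing fields: {diff}")
    for key, typ in EXPECTED_FIELDS.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(data[key], bool) or not isinstance(data[key], typ):
            raise ValueError(f"{key} must be of type {typ.__name__}")
    # Cross-field or range checks go here; this bound is an assumed example.
    if not 1 <= data["lookback_days"] <= 756:
        raise ValueError("lookback_days out of range")
    return SignalProposal(**data)
```

Only a value that survives `parse_proposal` would be passed downstream; free-form prose around the JSON never becomes an input to later stages.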

Agent-first communication

Agents communicate in structured decision records: what they saw, what they chose, why they chose it, and what downstream measurement changed. That makes review possible after the run.
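
A structured decision record of this kind might look like the following minimal sketch. The class name, field names, and the one-JSON-object-per-line log format are assumptions made for illustration; the writeup's actual record layout may differ.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical record layout mirroring the four elements named above.
@dataclass(frozen=True)
class DecisionRecord:
    agent: str               # which agent made the decision
    observed: str            # what it saw
    chosen: str              # what it chose
    rationale: str           # why it chose it
    measurement_delta: dict  # what downstream measurement changed


def to_log_line(record: DecisionRecord) -> str:
    """Serialize a record as one deterministic JSON line."""
    return json.dumps(asdict(record), sort_keys=True)


def from_log_line(line: str) -> DecisionRecord:
    """Rebuild a record from a log line for post-run review."""
    return DecisionRecord(**json.loads(line))
```

Sorting keys keeps the serialized form stable across runs, which makes logged records easy to diff during review.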

Point-in-time discipline

Fundamentals are keyed by availability, training windows are separated from holdout and OOS windows, and the signal proposal layer does not get to inspect future returns.
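
Keying fundamentals by availability can be sketched as an as-of lookup: a query on a given date only sees values whose availability date is on or before it. The function name, the tuple layout, and the choice to treat a value as usable on its availability date (with no extra lag) are assumptions for this example.

```python
from bisect import bisect_right
from datetime import date


def as_of(records, query_date):
    """Return the latest value available on or before query_date.

    records: iterable of (available_date, value) pairs, where available_date
    is when the figure became public, not the fiscal period it describes.
    Returns None if nothing was available yet.
    """
    ordered = sorted(records, key=lambda r: r[0])
    dates = [d for d, _ in ordered]
    # Index of the first record that became available after query_date.
    i = bisect_right(dates, query_date)
    return ordered[i - 1][1] if i else None


# Illustrative data only: two filings with made-up values.
filings = [(date(2013, 2, 15), 1.10), (date(2013, 5, 10), 1.25)]
print(as_of(filings, date(2013, 5, 9)))
```

A stricter variant could require the availability date to be strictly before the query date, or add a fixed lag, depending on when trades are assumed to execute.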

Pitfall-aware quant methodology

The writeup foregrounds survivorship bias, selection effects, null-model choice, multiple testing, benchmark choice, and the limits of a short bull-market OOS window.
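
To make the multiple-testing concern concrete, the sketch below computes the chance of at least one false positive when several candidate signals are each tested at the same significance level, along with a Bonferroni-adjusted per-test threshold. It assumes independent tests, and Bonferroni is used here only as a familiar example; the writeup does not say which correction, if any, the project applies.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """P(at least one false positive) across independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test level that caps the family-wise error rate at alpha."""
    return alpha / num_tests


# Hypothetical numbers: 5 candidates, each tested at the 5% level.
print(round(family_wise_error_rate(0.05, 5), 4))
print(bonferroni_threshold(0.05, 5))
```

With these illustrative inputs the family-wise rate is well above the nominal per-test level, which is why the number of candidates tried matters when reading any single passing result.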

What the system demonstrates

Signal rejection is load-bearing. In the live run, three of the five rejected candidates failed at DSL arity validation before reaching any statistical test, and the other two failed the coverage screen. The null tests (signal-shuffle, block-bootstrap) are a later gate; in this run the early structural checks accounted for every rejection.
Benchmark choice affects interpretation. The equal-weight (EW) and cap-weight benchmarks are fully net-long over a 2013-2015 bull-market window. The strategy includes a dollar-neutral sleeve and runs at lower gross exposure, so its return shortfall versus the benchmarks reflects a different net-exposure profile and is not, by itself, evidence of alpha failure.
LLM output should feed deterministic validation, not bypass it. The architecture keeps LLMs in proposal and critique roles; typed schemas, cross-field Python validation, and pre-registered gates determine what reaches the portfolio.
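
The early structural gates described above (arity validation, then a coverage screen) can be sketched as follows. The operator table, the nested-tuple expression format, and the 0.8 coverage threshold are hypothetical stand-ins; the project's actual DSL and thresholds are not given in this summary.

```python
# Hypothetical operator table: names and arities are illustrative only.
ARITY = {"rank": 1, "zscore": 1, "sub": 2, "div": 2}


def check_arity(expr):
    """Return arity errors for a nested-tuple expression.

    Example expression: ("sub", ("rank", "ret_12m"), "vol_60d").
    A bare string is treated as a leaf field name.
    """
    if isinstance(expr, str):
        return []
    if not isinstance(expr, tuple) or not expr or not isinstance(expr[0], str):
        return [f"malformed node: {expr!r}"]
    op, args = expr[0], expr[1:]
    if op not in ARITY:
        return [f"unknown operator: {op}"]
    errors = []
    if len(args) != ARITY[op]:
        errors.append(f"{op} expects {ARITY[op]} args, got {len(args)}")
    for arg in args:
        errors.extend(check_arity(arg))
    return errors


def coverage(values):
    """Fraction of non-missing (non-None) signal values."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def structural_gate(expr, values, min_coverage=0.8):
    """Run the cheap deterministic checks before any statistical test."""
    errors = check_arity(expr)
    if errors:
        return False, errors
    cov = coverage(values)
    if cov < min_coverage:
        return False, [f"coverage {cov:.2f} below {min_coverage:.2f}"]
    return True, []
```

Running the cheap checks first means a malformed or sparse candidate is rejected with a specific reason and never consumes a statistical test.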

OOS summary

The strategy did not outperform either universe benchmark on absolute return over 2013-2015 (system +15.4% vs. EW +58.3% / cap-weight +61.3%). On a risk-adjusted basis the strategy (Sharpe 0.751) also trails the EW (1.092) and cap-weight (1.242) benchmarks, though its maximum drawdown is roughly half to three-fifths of theirs (-7.8% vs. -15.1% / -12.8%). Returns are concentrated in 2013 (Sharpe 1.99); 2014 and 2015 were near-flat. These numbers are reported as-is from a single pre-registered OOS run.
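
For readers who want to reproduce statistics of this kind from a return series, here is a minimal sketch of an annualized Sharpe ratio and a maximum drawdown. It assumes daily simple returns, a zero risk-free rate, and 252 periods per year; the summary does not state which conventions the reported figures use, so results may not match exactly.

```python
import math


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled by sqrt(periods_per_year).

    Assumes a zero risk-free rate (an assumption of this sketch).
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    if variance == 0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst
```

Computing both statistics per calendar year, as well as over the full window, is one way to check how concentrated the results are in a single year.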