Writeup Highlights
I built the system as an argument for thoughtful agentic design: use language models where judgment, creativity, and critique are valuable, then force their outputs through deterministic, statistically pre-registered gates.
The system treats LLM output as untrusted research input. Typed artifacts, schema validation, deterministic gates, and replayable result files keep persuasive prose from becoming executable truth.
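As a minimal sketch of that idea, the snippet below parses a raw model response into a typed artifact and rejects anything that fails the schema, then applies a fixed numeric gate. The names `SignalProposal`, `parse_proposal`, `passes_gate`, the field set, and the threshold value are all hypothetical illustrations, not the system's actual schema or criteria.

```python
import json
from dataclasses import dataclass

# Hypothetical artifact type; the real system's fields are not given in the writeup.
@dataclass(frozen=True)
class SignalProposal:
    name: str
    expression: str
    lookback_days: int

ALLOWED_FIELDS = {"name", "expression", "lookback_days"}

def parse_proposal(raw: str) -> SignalProposal:
    """Turn untrusted LLM text into a typed artifact, or raise ValueError."""
    data = json.loads(raw)  # malformed JSON raises JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict) or set(data) != ALLOWED_FIELDS:
        raise ValueError("unexpected or missing fields")
    if not isinstance(data["name"], str) or not isinstance(data["expression"], str):
        raise ValueError("name and expression must be strings")
    lookback = data["lookback_days"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(lookback, bool) or not isinstance(lookback, int) or lookback < 1:
        raise ValueError("lookback_days must be a positive integer")
    return SignalProposal(data["name"], data["expression"], lookback)

# Assumed threshold, fixed before any results are seen.
PRE_REGISTERED_T_STAT = 3.0

def passes_gate(t_stat: float, threshold: float = PRE_REGISTERED_T_STAT) -> bool:
    """Deterministic gate: same input, same answer, no model in the loop."""
    return t_stat >= threshold
```

The point of the design is that the prose around a proposal never reaches the gate; only the validated fields and the measured statistic do.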
Agents communicate in structured decision records: what they saw, what they chose, why they chose it, and what downstream measurement changed. That makes review possible after the run.
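One way such a record could look is sketched below, with one field per item in the list above. The class name `DecisionRecord`, its field names, and the JSON-lines serialization are assumptions made for illustration; the writeup does not specify the actual record format.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record layout mirroring: what was seen, what was chosen,
# why, and what downstream measurement changed.
@dataclass(frozen=True)
class DecisionRecord:
    agent: str
    observed: str
    chosen: str
    rationale: str
    downstream_effect: str

def to_json_line(record: DecisionRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)

def from_json_line(line: str) -> DecisionRecord:
    """Rebuild a record from a stored line; unknown or missing keys raise TypeError."""
    return DecisionRecord(**json.loads(line))
```

Because each record round-trips through plain text, a reviewer can replay a run from the stored lines without re-invoking any agent.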
Fundamentals are keyed by the date they became available, training windows are kept separate from the holdout and out-of-sample (OOS) windows, and the signal proposal layer never gets to inspect future returns.
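The availability keying can be illustrated with a small as-of lookup: a value is visible on a given date only if it had already become available by then. The function names `as_of` and `windows_are_ordered` and the data layout are hypothetical; this is a sketch of the technique, not the system's data layer.

```python
from bisect import bisect_right
from datetime import date

def as_of(records, when):
    """Return the latest value available on or before `when`, else None.

    `records` is a list of (available_date, value) tuples sorted by
    available_date. Values that became available later are never returned.
    """
    dates = [available for available, _ in records]
    i = bisect_right(dates, when)
    return records[i - 1][1] if i else None

def windows_are_ordered(train, holdout, oos):
    """Check that (start, end) windows are valid, disjoint, and in time order."""
    windows = [train, holdout, oos]
    each_valid = all(start <= end for start, end in windows)
    no_overlap = train[1] < holdout[0] and holdout[1] < oos[0]
    return each_valid and no_overlap
```

For example, with a record that became available on 2013-02-15, a lookup dated 2013-01-31 returns `None` even if the record describes an earlier fiscal period.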
The writeup foregrounds survivorship bias, selection effects, null-model choice, multiple testing, benchmark choice, and the limits of a short bull-market OOS window.
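To make the multiple-testing concern concrete, here is a small sketch using the Bonferroni correction, chosen only because it is the simplest adjustment; the writeup does not say which correction the system uses. The function name `bonferroni_pass` and the example p-values are illustrative assumptions, not results from the run.

```python
def bonferroni_pass(p_values, alpha=0.05):
    """Flag which tests survive a Bonferroni correction.

    With m candidate signals tested, each p-value is compared against
    alpha / m rather than alpha, which bounds the chance of any false
    positive across the whole family at alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    cutoff = alpha / m
    return [p <= cutoff for p in p_values]

# Three hypothetical candidate signals: the cutoff becomes 0.05 / 3,
# so a p-value of 0.02 that looks significant alone no longer passes.
flags = bonferroni_pass([0.001, 0.02, 0.04])
```

The more signals a creative agent proposes, the stricter the per-signal cutoff becomes, which is why the number of attempts has to be recorded rather than only the winners.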
The system did not beat the equal-weight or cap-weighted universe benchmark over 2013-2015. The strongest outcome is methodological: the creative agent proposed ideas, the critic and validators applied real pressure, and the final report preserves the negative result instead of converting it into a marketing claim.