If you’re here, you’re probably interested in systematic trading, so we have that in common. Over the years, I’ve built tools for testing and trading systematic strategies, both in industry and academia. I’ve written plenty of code, but mostly for research and strategy logic. Whenever I tried to build full applications myself, I ran into the same bottlenecks: UI design, database structure, API integrations, deployment, and keeping code maintainable as complexity grows.
Recently, I’ve been experimenting with agentic AI for coding. That led to a concrete experiment:
Can an AI agent build a functional systematic trading system from scratch?
To make this a meaningful test, I’m going to focus (mostly) on trend-following using futures, but the idea is to build something general for systematic strategies.1 Using futures will also force the AI agent to deal with some technical details:
Futures data comes with structural complexities: expiries, contract multipliers, tick sizes, roll rules, and the construction of continuous series.
Trend following is simple enough to explain clearly, but rich enough to expose real system-design tradeoffs.
End-to-end automation is realistic: data ingestion, signal generation, monitoring, and broker execution can all be connected.
In other words, this isn’t a toy backtest: it’s a self-contained but non-trivial engineering problem. Throughout this series, I’ll highlight where the agent was surprisingly effective, where it struggled, where I had to intervene, and where domain knowledge proved indispensable.
This first post will focus on the data workflow (data acquisition and management).
The AI Agent Hype (or is it?)
While I’m more on the skeptical side regarding the AI hype in general (particularly when it comes to claims about cognition and AGI), the speed of improvement in AI for coding is mind-boggling. A lot of friends who are professional developers are doing very sophisticated things which make me feel like the ape in this meme. They have agents managing agents in self-improving loops that work while they, well, work even more on other things. It’s clear to me that AI agents are being used everywhere (even at Rockstar!), but I also see some criticism and risks:
It’s great for demos, but not for production.
What about debugging and code maintenance?
Governance and accountability: who is responsible when an AI agent ships bad code?
Security vulnerabilities
“AI agent made me forget how to code”
Define “Functional”
Like many people, I recently started experimenting with AI coding agents. I particularly like how I can quickly prototype simple web apps running locally on my machine to speed up recurring and time-consuming tasks. I’ve created some apps for teaching, some for research, and some for trading. It’s perfect for me because it gets me around the gaps in my coding skill set (which I have to admit, I already had little intention of fixing, even before AI agents came along).
My objective was to build a local web app that could run a systematic trading workflow with minimal hassle. The requirements I had in mind were:
Data pipeline: ingest historical contract-level data, fetch updates on demand, process continuous futures series, inspect quality/coverage.
Research workflow: generate trading signals, combine them into subsystems/strategies, and backtest across multiple instruments.
Execution plumbing: connect to broker, monitor signals/positions, stage orders, and optionally execute with safeguards.
Usable interface: browser-based UI on a local server. Based on advice from the AI agent, I used Streamlit for rapid iteration.
In short, “functional” means end-to-end: data, research, execution, and interface.
The Framework
The framework I’m building assumes that systems trade at most once a day. I have futures in mind for the moment, but it could be adapted to other asset classes like ETFs (in that case, it would look more like a tactical allocation program that should probably trade at most weekly or monthly, and this level of automation may be overkill).
The system is based on the following hierarchy2:
Parent instrument: the family of contracts for one instrument. For example, ES is the parent instrument for E-mini S&P 500 futures traded on the CME.
Child instruments: associated with each parent are the individual contracts with specific maturities. For example, ESH6 is the ES contract with expiry in March 2026.
Continuous series: associated with each parent, we will build a continuous time series that “stitches together” the children contracts. This is needed because as we roll positions, there is a difference in price between the current contract that will expire soon and the next one.
Trading rule: a trading rule generates a signal to be long or short a particular instrument. Example: a time-series momentum (TSMOM) rule on ES with a lookback period of 6 months would provide a +1 or -1 signal to be long or short that instrument.
Trading subsystem: a trading subsystem combines trading rules for one parent instrument. These could be either variations of the same trading rule (e.g. TSMOM with lookbacks of 6 and 12 months on ES), or different kinds of trading rules (e.g. TSMOM with lookback of 6 months and a moving average crossover). Later, we need to make choices about how to combine the signals from different trading rules.
Strategy: a combination of several trading subsystems for multiple instruments. When we get to this stage, we will need to focus on risk management and position sizing. I’ll implement different options and discuss the tradeoffs.
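To make the hierarchy concrete, here is a minimal sketch in Python. All names (`Contract`, `ParentInstrument`, `parse_contract`, `tsmom_signal`, `subsystem_signal`) are hypothetical and not taken from the app. The month codes are the standard futures convention. The equal-weight average in the subsystem is only a placeholder assumption, since the combination method is one of the choices still to be made, and the lookback is counted in observations rather than calendar months.

```python
from dataclasses import dataclass, field

# Standard futures month codes (F = January ... Z = December).
MONTH_CODES = {"F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
               "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12}


@dataclass(frozen=True)
class Contract:
    """Child instrument: one contract with a specific expiry month."""
    parent: str
    month: int
    year: int


@dataclass
class ParentInstrument:
    """Parent instrument: the family of contracts, e.g. ES."""
    symbol: str
    contracts: list = field(default_factory=list)


def parse_contract(symbol: str, parent: str, decade_start: int) -> Contract:
    """Parse e.g. 'ESH6' into (ES, March, 2026).

    The single-digit year is ambiguous, so the caller supplies the decade
    (an assumption of this sketch).
    """
    code = symbol[len(parent):]
    month = MONTH_CODES[code[0]]
    year = decade_start + int(code[1:])
    return Contract(parent, month, year)


def tsmom_signal(prices: list, lookback: int) -> int:
    """Trading rule: +1 if the price rose over the lookback window, else -1."""
    if len(prices) <= lookback:
        raise ValueError("not enough history for this lookback")
    return 1 if prices[-1] > prices[-1 - lookback] else -1


def subsystem_signal(prices: list, lookbacks: list) -> float:
    """Trading subsystem: equal-weight average of several TSMOM rules."""
    signals = [tsmom_signal(prices, lb) for lb in lookbacks]
    return sum(signals) / len(signals)


es = ParentInstrument("ES")
es.contracts.append(parse_contract("ESH6", "ES", decade_start=2020))
```

A strategy would then sit one level above, holding one subsystem per parent instrument plus the position-sizing logic.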
How I used the AI Agent
I started from scratch and built the app in an exploratory, sequential way. This was partly because I wanted to explore how the AI agent works, but also because I knew that many details and issues would only become clear once I started building the app, so I didn’t write a full set of detailed specs up front. Instead, I gave the agent as much detail as I could at each step. I had some ideas about the workflows, but I iterated quite a bit with the agent and requested suggestions on some architecture decisions (data layout, DB, web interface). According to my AI agent:
You optimized for fast learning and domain correctness under evolving requirements; professional usage usually optimize for predictability, auditability, and safe integration into established codebases.
I provided some of my own existing code to the AI agent, especially for the backtesting engine, but this was mostly because I wanted it to build something that I would be familiar with.
Even with this approach, I was able to build something functional fairly quickly. I have no doubt that a professional quant developer with AI minions could build something much more sophisticated within the same time frame.
Show me the Data
The first step was to find a data provider. To build a trading/backtesting app, I wanted to find a provider with:
contract-level historical futures data,
API access for updates,
low enough cost (ideally free) for iterative development.
A friend who is a professional systematic trader recommended I take a look at databento. For daily frequency data, the startup credits when you create a new account are more than enough to download historical data for a large number of futures contracts.3 The downside is that historical data are available for a relatively short period going back to 2010 for most contracts. After setting up the databento account, generating an API key is straightforward. This will come in handy later to automate downloads.
Databento provides multiple schemas. For this system’s first milestone, the key ones are:
Definitions
Instrument metadata over time: symbol mappings, instrument IDs, activation/expiration timestamps, exchange, tick size, multiplier, and related contract fields.
This info will be needed for correct contract mapping and roll logic.
OHLCV-1d
Daily open/high/low/close/volume bars at instrument level.
This is the core dataset needed for backtesting and daily signal generation.
Databento allows downloads in different formats. I chose “Databento Binary Encoding (DBN)” which seems to be the most convenient/fastest. I gave the DBN documentation to the AI agent so that it knew how to work with this format.
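For readers who want to try this, here is a rough sketch of a parent-level download using Databento's Python client. It is not the app's code. The helper names (`build_request`, `download_raw`, `load_raw`) and the file naming are hypothetical, and the dataset ID assumes the contracts trade on CME Globex. The `databento` import is kept inside the functions so the request-building part runs without the package installed.

```python
from pathlib import Path

DATASET = "GLBX.MDP3"  # assumption: CME Globex dataset on Databento


def build_request(parent: str, schema: str, start: str, end: str) -> dict:
    """Keyword arguments for a parent-level historical request."""
    return {
        "dataset": DATASET,
        "symbols": [f"{parent}.FUT"],  # parent symbology: all futures of this root
        "stype_in": "parent",
        "schema": schema,              # "definition" or "ohlcv-1d"
        "start": start,
        "end": end,
    }


def download_raw(api_key: str, request: dict, raw_dir: Path) -> Path:
    """Download one request and store the DBN file unchanged."""
    import databento as db  # lazy import; requires `pip install databento`

    raw_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: <symbol>_<schema>.dbn.zst
    path = raw_dir / f"{request['symbols'][0]}_{request['schema']}.dbn.zst"
    client = db.Historical(api_key)
    store = client.timeseries.get_range(**request)
    store.to_file(path)
    return path


def load_raw(path: Path):
    """Read a stored DBN file back into a pandas DataFrame."""
    import databento as db

    return db.DBNStore.from_file(path).to_df()


request = build_request("ES", "ohlcv-1d", "2024-01-01", "2024-02-01")
```

Calling `download_raw(api_key, request, Path("data/raw"))` would then consume credits on the account, so it is worth checking the date range before running it.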
Tickers
In this series, I’ll use the following contracts for illustration purposes:
There’s nothing special about these specific contracts. I chose them because they cover different asset classes and illustrate the complexities of futures data (e.g., the contracts have different maturities, different contract multipliers, etc.).
Data Workflow
The app supports two operational paths for data workflow:
Download data from API and ingest
Ingest previously downloaded local files
In practice, local ingest is often preferable for larger definition pulls, because databento takes a few minutes to prepare downloads of definitions files. In the future, I plan to automate this so I never have to touch it unless I want to add a new ticker.
Pipeline Design
Raw files are stored unchanged in a dedicated folder (data/raw/).
Ingestion parses and validates raw files.
Cleaned contract-level bars are written to curated Parquet datasets.
SQLite stores metadata such as:
parent-to-contract mappings,
data coverage windows,
parent-level contract specs (exchange, currency, tick size, multiplier, roll settings, adjustment method).
This raw/curated/metadata separation keeps updates reproducible and debugging manageable.
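As an illustration of the metadata layer, the sketch below sets up two SQLite tables with the standard library. The table and column names are my own guesses at a reasonable layout, not the app's actual schema, and the roll setting is reduced to a single "days before expiry" field for simplicity.

```python
import sqlite3

# Hypothetical schema: one row per parent, one row per contract.
SCHEMA = """
CREATE TABLE IF NOT EXISTS parents (
    parent TEXT PRIMARY KEY,
    exchange TEXT,
    currency TEXT,
    tick_size REAL,
    multiplier REAL,
    roll_days_before_expiry INTEGER,
    adjustment_method TEXT
);
CREATE TABLE IF NOT EXISTS contracts (
    contract TEXT PRIMARY KEY,
    parent TEXT NOT NULL REFERENCES parents(parent),
    expiry TEXT NOT NULL,
    first_bar TEXT,
    last_bar TEXT
);
"""


def open_metadata(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_parent(conn, parent, exchange, currency, tick_size, multiplier,
                roll_days, adjustment):
    """Insert or overwrite the parent-level contract specs."""
    conn.execute(
        "INSERT OR REPLACE INTO parents VALUES (?, ?, ?, ?, ?, ?, ?)",
        (parent, exchange, currency, tick_size, multiplier, roll_days, adjustment),
    )
    conn.commit()


def record_coverage(conn, contract, parent, expiry, first_bar, last_bar):
    """Register a contract and widen its coverage window (ISO date strings)."""
    row = conn.execute(
        "SELECT first_bar, last_bar FROM contracts WHERE contract = ?",
        (contract,),
    ).fetchone()
    if row:
        first_bar = min(first_bar, row[0])
        last_bar = max(last_bar, row[1])
    conn.execute(
        "INSERT OR REPLACE INTO contracts VALUES (?, ?, ?, ?, ?)",
        (contract, parent, expiry, first_bar, last_bar),
    )
    conn.commit()


def contracts_for(conn, parent):
    """Parent-to-contract mapping with coverage windows, ordered by expiry."""
    return conn.execute(
        "SELECT contract, expiry, first_bar, last_bar FROM contracts "
        "WHERE parent = ? ORDER BY expiry",
        (parent,),
    ).fetchall()
```

Because the raw files are never modified, the whole database can be dropped and rebuilt from data/raw/ if the schema changes.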
Creating Continuous Time Series
For backtesting and signal generation, we need to create a continuous time series that “stitches” together data from different maturities. There are several details that affect how these series are created4:
Contract listing structure
Example: ES is listed quarterly (H/M/U/Z), while other markets may list monthly.
Roll rule
The rule for when to roll positions as contracts approach expiry. Examples:
roll N days before expiry.
roll on liquidity trigger (e.g., next contract overtakes front contract in volume/open interest).
Adjustment method
How prices are adjusted to avoid artificial jumps at roll dates. Typical choices are:
Ratio adjustment: multiplicative scaling across rolls.
Additive adjustment: constant offsets across rolls.
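The simplest roll rule above, rolling N days before expiry, can be sketched as follows. The function names and the expiry dates are illustrative assumptions, and the sketch counts calendar days rather than business days, which a real implementation would probably want to refine.

```python
from datetime import date, timedelta


def roll_dates(expiries: dict, days_before: int) -> dict:
    """Roll date for each contract: N calendar days before its expiry."""
    return {sym: exp - timedelta(days=days_before) for sym, exp in expiries.items()}


def active_contract(day: date, expiries: dict, days_before: int) -> str:
    """Earliest-expiring contract whose roll date is still ahead of `day`."""
    for sym, exp in sorted(expiries.items(), key=lambda kv: kv[1]):
        if day < exp - timedelta(days=days_before):
            return sym
    raise ValueError("no contract available for this date")


# Illustrative expiries (third Friday of the contract month).
expiries = {"ESH6": date(2026, 3, 20), "ESM6": date(2026, 6, 19)}
front = active_contract(date(2026, 3, 10), expiries, days_before=5)
```

A liquidity-based trigger would replace the date comparison with a comparison of volume or open interest between the front and next contracts.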
The app infers available contract months from ingested metadata and lets the user define and save rolling rules per parent contract. Then, it builds or updates continuous series from those definitions, which the user can inspect on charts/tables before using them in backtests.
Different methods to create continuous series come with different tradeoffs. I chose to implement a methodology that always adjusts prices backwards. This means that the current price for the front contract matches the continuous series. However, whenever a new roll enters the dataset, we need to reprocess the entire series. This takes seconds so it’s totally acceptable for me. It can also be optimized by storing adjustment factors (which currently I don’t see a need to do).
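Here is a minimal sketch of backward adjustment, written in plain Python for clarity. It is not the app's implementation. It assumes the roll gap is measured on the last day the old contract is held, using the closes of both contracts on that day, and the names `back_adjust`, `schedule`, and `closes` are hypothetical.

```python
def back_adjust(schedule, closes, method="additive"):
    """Build a backward-adjusted continuous series.

    schedule: list of (day, contract) pairs in chronological order, giving
              the contract held on each day.
    closes:   {contract: {day: close}} with raw contract-level closes.
    Returns a list of (day, adjusted_close). The most recent contract is
    left unadjusted, so the latest value equals its raw close.
    """
    offset, factor = 0.0, 1.0
    out = []
    # Walk backwards so adjustments accumulate from the most recent roll.
    for i in range(len(schedule) - 1, -1, -1):
        day, contract = schedule[i]
        if i + 1 < len(schedule) and schedule[i + 1][1] != contract:
            # Roll: compare old and new contract on the last day of the old one.
            new = closes[schedule[i + 1][1]][day]
            old = closes[contract][day]
            offset += new - old   # additive adjustment
            factor *= new / old   # ratio adjustment
        raw = closes[contract][day]
        adjusted = raw + offset if method == "additive" else raw * factor
        out.append((day, adjusted))
    return out[::-1]


# Toy data: three contracts A, B, C and two rolls.
closes = {
    "A": {"d1": 100.0, "d2": 102.0},
    "B": {"d2": 105.0, "d3": 106.0, "d4": 108.0},
    "C": {"d4": 110.0, "d5": 111.0},
}
schedule = [("d1", "A"), ("d2", "A"), ("d3", "B"), ("d4", "B"), ("d5", "C")]
series = back_adjust(schedule, closes)
```

With additive adjustment, day-to-day changes in the continuous series equal the changes in whichever contract was held, which is a useful property to check when validating the stitching. The full recomputation on every call mirrors the tradeoff described above: simple, at the cost of reprocessing the whole history when a new roll arrives.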
Hiccups and Bugs
When I started building the app, I wasn’t quite sure what to expect. This is a breakdown of my experience in this part of the experiment:
Where AI Was Strong
The AI agent set up the local database with no trouble and created workflows to store curated data in Parquet format.
Likewise, the web app looked decent on the first iteration.
Implementing on-demand data updates using the databento API worked straightaway, with a few bugs on date ranges that were easily fixed.
The speed to prototype and test new features and make changes in the UI is incredible.
Where AI Struggled
UI behavior was occasionally flaky. A recurring issue was certain parts of a view being hidden because the AI agent placed the block in the wrong place.
The agent tended to infer structure from ingested data, which produced noisy metadata because the structure is not homogeneous across contracts.
Occasionally, asking the agent to fix one bug (for example, symbol normalization) made something else break (although this is pretty common when a human is writing code, too).
Human Intervention
I had to intervene heavily in a few cases:
The AI agent inferred contract roll dates incorrectly in a few cases because it overgeneralized from what it had inferred about one contract. This caused gaps in the continuous series for some contracts.
The first implementation of the contract stitching logic was incorrect. The error was neither obvious nor easy to detect.
Some Lessons
AI excels at structured “plumbing” tasks.
It struggles with heterogeneous domain structures.
Ambiguous specifications lead to brittle implementations. This was particularly relevant for UI-related requests. I learned that I need to be very precise when describing the behavior I want.
What the app looks like at this stage
In the video below, I go over the data management part of the current version of the app.
In the Next Post
Systematic trading requires good quality data. This initial step gave me a decent starting point. In the next post in this series, I’ll move from data plumbing to signal generation:
Defining trading rules
Combining rules into subsystems
Building strategies combining different instruments
Building Along
If you want to build along, these specs can be given to any AI agent.
The hierarchy for trading rules, subsystems and strategies was heavily inspired by this book by Robert Carver. The rest diverges because I mostly focus on binary trading signals, while his approach is based on continuous forecasts. On the site for his book, there’s a link to a Python project that implements the systematic trading framework exactly as in his book.
I have no affiliation with databento.


