Nikhil Adithyan - freeCodeCamp.org

How to Find Stock-Specific Moves in the S&P 500 with Python

Nikhil Adithyan — Wed, 24 Jun 2026 16:55:46 +0000

On June 12, 2026, SPY closed up 0.54%. EchoStar (SATS) dropped 11%. Lennar (LEN) dropped 4.9%. Most of the other 500 stocks in the index barely moved beyond what SPY’s own gain would predict.

That gap is the entire premise of this article. Every stock has a normal relationship to the market: how much it tends to rise when SPY rises, how much it tends to fall when SPY falls.

Once you know that relationship, you can calculate what a stock should have done on any given day and compare it to what it actually did. Most days, for most stocks, there’s almost nothing left over. Some days, for a handful of stocks, there’s a lot left over, and that’s where the real story is.

This article builds a Python scanner that runs that comparison across the entire S&P 500 every day, flags the stocks with the largest gap between expected and actual return, and checks whether news, volume, or sector activity explains what happened.

Prerequisites
Setting Up: Importing Packages
Building the S&P 500 Universe
Fetching Prices, Volume, and Daily Returns
- Calculating Daily Returns
Estimating Rolling Beta and Alpha
Computing the Residual Return
Scoring the Residual With a Drift-Corrected Z-Score
Adding Multi-Day Confirmation
Confirming With Volume
Building the Alpha Investigation Queue
Checking the Story Against the News
Visualizing the Abnormal Movers
Conclusions and Ideas for Next Steps

Prerequisites

To follow along, you should be comfortable with basic Python, pandas DataFrames, loops, functions, and simple plotting with matplotlib.

You’ll also need:

Python 3.9 or later
An EODHD API key
The following Python libraries: requests, pandas, numpy, matplotlib, and statsmodels
Basic familiarity with daily returns, beta, alpha, volume, z-scores, and stock tickers

You don't need advanced quantitative finance knowledge. The goal is to build a practical scanner that separates market-driven moves from stock-specific moves, then checks whether volume and news help explain the abnormal return.

Setting Up: Importing Packages

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

plt.style.use('ggplot')

requests and pandas handle the API calls and all the data wrangling. RollingOLS from statsmodels runs the rolling regression that estimates each stock's beta and alpha against SPY, which is the core of the scanner. ggplot gives the charts a cleaner look than matplotlib's default.

Building the S&P 500 Universe

The scanner needs a current list of S&P 500 tickers and their sectors. EODHD’s fundamentals endpoint for the index returns this directly.

api_key = 'eodhd api key'

url = f'https://eodhd.com/api/fundamentals/GSPC.INDX?api_token={api_key}&fmt=json&filter=Components'
r = requests.get(url)
components = r.json()

universe = pd.DataFrame(components).T[['Code', 'Sector']].rename(columns={
    'Code': 'ticker',
    'Sector': 'sector'
}).reset_index(drop=True)

tickers = universe['ticker'].tolist()

print(f'universe size: {len(universe)}')
print(universe['sector'].value_counts())

Output:

universe size: 503
sector
Technology                83
Industrials               75
Financial Services        70
Healthcare                59
Consumer Cyclical         54
Consumer Defensive        35
Utilities                 31
Real Estate               31
Communication Services    24
Energy                    21
Basic Materials           20
Name: count, dtype: int64

503 tickers, because the S&P 500 includes a handful of dual-class share structures. Technology and Industrials make up nearly a third of the index between them, which matters later when a cluster of moves shows up concentrated in one sector.

SPY is fetched separately in the next step and never enters this list. It’s the benchmark, not a candidate.

Fetching Prices, Volume, and Daily Returns

The regression needs a full year of price and volume history for every ticker in the universe, plus SPY as the benchmark. This historical data can be fetched using EODHD's historical EOD endpoint.

end_date = pd.Timestamp.today().strftime('%Y-%m-%d')
start_date = (pd.Timestamp.today() - pd.Timedelta(days=365)).strftime('%Y-%m-%d')

def fetch_ohlcv(ticker, start, end):
    url = f'https://eodhd.com/api/eod/{ticker}.US?from={start}&to={end}&api_token={api_key}&fmt=json'
    r = requests.get(url)
    data = r.json()
    df = pd.DataFrame(data)[['date', 'adjusted_close', 'volume']]
    df['date'] = pd.to_datetime(df['date'])
    df = df.set_index('date')
    df.columns = [ticker, f'{ticker}_vol']
    return df

all_prices = {}
all_volumes = {}

for ticker in tickers + ['SPY']:
    try:
        result = fetch_ohlcv(ticker, start_date, end_date)
        all_prices[ticker] = result[ticker]
        all_volumes[ticker] = result[f'{ticker}_vol']
        print(f'{ticker} DONE')
    except Exception as e:
        # catch ordinary errors only and show why the ticker failed
        print(f'{ticker} ERROR: {e}')

prices = pd.DataFrame(all_prices)
volumes = pd.DataFrame(all_volumes)

Two wide dataframes come out of the loop, one for price and one for volume, both indexed by date with a column per ticker. Each covers 251 trading days across one year and has 504 columns, because the 503 S&P 500 tickers plus SPY all came back successfully.

Calculating Daily Returns

Adjusted close converts directly into daily percentage returns, which is what the regression actually runs on, not raw price.

prices = prices.sort_index()
volumes = volumes.sort_index()

prices = prices.apply(pd.to_numeric, errors='coerce')
volumes = volumes.apply(pd.to_numeric, errors='coerce')

prices = prices.ffill(limit=3)

returns = prices.pct_change(fill_method=None)
returns = returns.iloc[1:]

missing_pct = returns.isna().mean()

valid_tickers = missing_pct[missing_pct <= 0.10].index.tolist()

if 'SPY' not in valid_tickers:
    valid_tickers.append('SPY')

returns = returns[valid_tickers]
volumes = volumes[valid_tickers]

spy_returns = returns['SPY']
stock_returns = returns.drop(columns=['SPY'])

stock_returns.head()

A couple of tickers, including Q, came back with NaN prices on certain days. This is the kind of one-off gap a 503-ticker pull is bound to hit.

Forward-filling that gap on the price itself, capped at three trading days, is what ffill(limit=3) does before the percentage change is taken. So the return calculated from it reflects an actual assumption: no new price, no change, instead of a fabricated number from filling the return directly.

Anything with a gap longer than three days still shows up as NaN in returns and gets dropped by the 10% missing threshold rather than patched.

fill_method=None on pct_change matters too, since older pandas versions forward-fill by default before differencing, which is the exact shortcut this fix avoids. The filter left 501 tickers instead of 503, with the two dropped names both falling above the missing threshold.
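To make the gap handling concrete, here is a minimal sketch on a made-up price series. The values are assumptions for illustration, not data from the scan. It shows a short gap being carried forward as a zero return, while a gap longer than three days stays NaN instead of being patched.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: a one-day gap, then a four-day gap
toy_prices = pd.Series([100.0, np.nan, 102.0, np.nan, np.nan, np.nan, np.nan, 110.0])

# Carry the last known price forward, but for at most three consecutive days
filled = toy_prices.ffill(limit=3)

# No implicit filling inside pct_change: remaining NaNs stay NaN
toy_returns = filled.pct_change(fill_method=None)

print(pd.DataFrame({'price': toy_prices, 'filled': filled, 'return': toy_returns}))
```

The one-day gap becomes a 0% return followed by the real 2% move. The fourth day of the long gap stays NaN, and so does the return on the day the price comes back, because there is no valid previous price to compare against.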

Estimating Rolling Beta and Alpha

Every stock has a normal sensitivity to SPY: how much it tends to move when the market moves. Beta captures that sensitivity, and a 60-day rolling window gives a stable estimate without overreacting to any single day. RollingOLS runs that regression for each ticker in turn inside a loop.

window = 60

rolling_beta = pd.DataFrame(np.nan, index=stock_returns.index, columns=stock_returns.columns)
rolling_alpha = pd.DataFrame(np.nan, index=stock_returns.index, columns=stock_returns.columns)

spy_with_const = sm.add_constant(spy_returns)

for ticker in stock_returns.columns:
    model = RollingOLS(stock_returns[ticker], spy_with_const, window=window).fit()
    rolling_beta[ticker] = model.params['SPY']
    rolling_alpha[ticker] = model.params['const']

print(f'beta estimated for: {rolling_beta.notna().any().sum()} tickers')
print(f'date range with estimates: {rolling_beta.dropna(how="all").index[0].date()} to {rolling_beta.dropna(how="all").index[-1].date()}')

sm.add_constant adds the intercept term to SPY's return series so the regression solves for both alpha and beta together. model.params['SPY'] is the beta, model.params['const'] is the alpha, pulled straight out of the fitted model for every ticker in the loop.

PGR's beta sitting around -0.42 to -0.53 in early June stands out immediately, an insurance name moving consistently opposite to the market over that stretch, while CSX holds steady near 0.43 to 0.49, a much more textbook beta for an industrial name.
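As a sanity check on what the rolling regression is doing, a single-factor OLS slope can also be written in closed form as covariance divided by variance. The sketch below is not the article's method. It uses synthetic returns and a hypothetical helper name, assumes no missing values, and cross-checks the last window against an ordinary line fit.

```python
import numpy as np
import pandas as pd

def rolling_beta_alpha(stock, market, window):
    """Closed-form rolling OLS of stock on market: slope = cov / var."""
    beta = stock.rolling(window).cov(market) / market.rolling(window).var()
    alpha = stock.rolling(window).mean() - beta * market.rolling(window).mean()
    return beta, alpha

# Synthetic daily returns with a known beta of 1.3 (illustrative only)
rng = np.random.default_rng(0)
idx = pd.date_range('2026-01-01', periods=120, freq='B')
market = pd.Series(rng.normal(0, 0.01, len(idx)), index=idx)
noise = pd.Series(rng.normal(0, 0.005, len(idx)), index=idx)
stock = 0.0002 + 1.3 * market + noise

beta, alpha = rolling_beta_alpha(stock, market, window=60)

# Cross-check the final 60-day window against a plain least-squares fit
slope, intercept = np.polyfit(market.iloc[-60:].to_numpy(), stock.iloc[-60:].to_numpy(), 1)
print(f'rolling beta: {beta.iloc[-1]:.4f}, polyfit slope: {slope:.4f}')
```

The two numbers should agree to many decimal places, which is a quick way to confirm that the beta column means what it is supposed to mean.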

Computing the Residual Return

Beta and alpha describe what a stock should have done given how SPY moved. Subtracting that expected return from what the stock actually did leaves the residual, the part of the move that has nothing to do with the market.

Using today’s beta to judge today’s move would let the move influence the very benchmark it’s being measured against, so both get shifted back a day first.

beta_shifted = rolling_beta.shift(1)
alpha_shifted = rolling_alpha.shift(1)

spy_aligned = spy_returns.reindex(stock_returns.index)

expected_returns = alpha_shifted.add(beta_shifted.multiply(spy_aligned, axis=0))
residuals = stock_returns - expected_returns

expected_returns is yesterday's alpha plus yesterday's beta times today's SPY return, the prediction a stock's normal market relationship would have made. residuals is the actual return minus that prediction.

Most of these numbers sit in a narrow band, a few tenths of a percent either way. This is exactly what a market-driven move looks like once the market's own contribution has been removed.
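The one-day shift is easier to see on a tiny example. The sketch below uses a hypothetical ticker AAA and made-up betas, alphas, and returns, then applies the same expected-return formula so the arithmetic can be checked by hand.

```python
import pandas as pd

dates = pd.date_range('2026-01-05', periods=4, freq='B')

# Made-up rolling estimates and returns for one hypothetical ticker
beta = pd.DataFrame({'AAA': [1.0, 1.2, 1.5, 2.0]}, index=dates)
alpha = pd.DataFrame({'AAA': [0.0, 0.0, 0.001, 0.001]}, index=dates)
spy = pd.Series([0.01, -0.02, 0.005, 0.01], index=dates)
stock = pd.DataFrame({'AAA': [0.012, -0.03, 0.02, -0.05]}, index=dates)

# Yesterday's alpha and beta applied to today's SPY return
expected = alpha.shift(1).add(beta.shift(1).multiply(spy, axis=0))
residual = stock - expected

print(pd.concat({'expected': expected['AAA'], 'residual': residual['AAA']}, axis=1))
```

On the last day the prediction uses the prior day's beta of 1.5 and alpha of 0.001, not the 2.0 estimated on the day itself: 0.001 + 1.5 × 0.01 = 0.016, so a -5% actual return leaves a residual of -6.6%. The first row is NaN because there is no prior estimate to use.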

Scoring the Residual With a Drift-Corrected Z-Score

A residual of 0.03 means nothing on its own. Some stocks routinely have noisier idiosyncratic moves than others, so the same residual needs to be judged against that stock’s own recent history, not a fixed threshold applied across all the names.

window_z = 20

resid_mean = residuals.shift(1).rolling(window_z).mean()
resid_std = residuals.shift(1).rolling(window_z).std()

zscore = (residuals - resid_mean) / resid_std

zscore.tail()

The rolling mean is in there deliberately, not just the rolling standard deviation. Some stocks carry a small persistent drift in their residuals, a slight tendency to run a touch above or below zero over any given stretch, and scoring against that drift rather than against zero keeps the z-score honest about what’s actually unusual for that specific stock.

Both the mean and the standard deviation get shifted by a day for the same reason beta and alpha did: today’s score can’t be built from a distribution that includes today’s own value.

AFL’s -2.31 on June 8 and SOLV’s -2.31 the same day already clear the threshold worth paying attention to (two standard deviations below their own recent norm), while SOLV swings to +2.40 the very next day.
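To see why the lag matters, the sketch below scores one large residual twice: once against the previous 20 days only, and once against a window that includes the day itself. The series is made up and the helper name is hypothetical.

```python
import pandas as pd

def drift_corrected_z(residuals, window=20, shift=1):
    """Z-score of each residual against its own rolling mean and std."""
    baseline = residuals.shift(shift)
    return (residuals - baseline.rolling(window).mean()) / baseline.rolling(window).std()

# Twenty quiet days of +/-1% residuals, then a single -10% day
resid = pd.Series([0.01, -0.01] * 10 + [-0.10])

z_lagged = drift_corrected_z(resid, shift=1).iloc[-1]    # baseline excludes today
z_same_day = drift_corrected_z(resid, shift=0).iloc[-1]  # baseline includes today

print(f'lagged baseline:   {z_lagged:.2f}')
print(f'same-day baseline: {z_same_day:.2f}')
```

When the spike is allowed into its own baseline, it drags the mean toward itself and inflates the standard deviation, so the same move scores far less extreme. Shifting by one day keeps the baseline clean.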

Adding Multi-Day Confirmation

A single day’s z-score can be noise, one stray print that happens to land outside the normal range. Compounding the residual over the trailing 3 and 5 days checks whether the move actually held.

residuals_3d = (1 + residuals).rolling(3).apply(np.prod, raw=True) - 1
residuals_5d = (1 + residuals).rolling(5).apply(np.prod, raw=True) - 1

print('3-day compounded residuals (last 5 rows, first 5 tickers):')
print(residuals_3d.iloc[-5:, :5].round(4))
print('\n5-day compounded residuals (last 5 rows, first 5 tickers):')
print(residuals_5d.iloc[-5:, :5].round(4))

Compounding rather than summing matters here because residual returns multiply through time the same way regular returns do.

AIZ's residual is 2.4% over 3 days but a near-flat 0.6% over 5, which means the move was concentrated in the most recent stretch and the two earlier days pulled the other way. MNST shows the opposite shape: a steady build from 2% to 4.4% to 5.8% across the three windows in the days leading into June 11, a sustained drift rather than a single spike.
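A small sketch of the difference between compounding and summing, on made-up residuals. It also shows an equivalent log-based form, which is an alternative and not the article's code. It relies on every residual being greater than -100% and can be faster on wide frames because it avoids a Python-level apply.

```python
import numpy as np
import pandas as pd

r = pd.Series([0.02, -0.01, 0.03, 0.00, -0.04])  # hypothetical daily residuals

# Compounded trailing 3-day residual, as in the scanner
compounded = (1 + r).rolling(3).apply(np.prod, raw=True) - 1

# Same quantity via logs: sum log(1 + r), then convert back
via_logs = np.expm1(np.log1p(r).rolling(3).sum())

# Naive sum, for comparison
summed = r.rolling(3).sum()

print(pd.DataFrame({'compounded': compounded, 'via_logs': via_logs, 'summed': summed}))
```

For small daily numbers the sum and the compounded figure are close, but they drift apart as the moves get larger, which is exactly when the scanner cares about them.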

Confirming With Volume

A large residual return on ordinary trading volume is easier to dismiss than the same move with twice the usual number of shares changing hands. Volume is the check on whether the move had real participation behind it.

vol_mean = volumes.shift(1).rolling(20).mean()
volume_ratio = volumes / vol_mean

volume_ratio = volume_ratio.drop(columns=['SPY'], errors='ignore')

print('volume ratios (last 5 rows, first 5 tickers):')
print(volume_ratio.iloc[-5:, :5].round(2))

The 20-day average volume used as the denominator is shifted by a day for the same reason every other rolling statistic in this scanner is: today's elevated volume shouldn't be allowed to inflate the baseline it's being measured against.

None of these five tickers cross 1.5x on these particular days, which is the threshold that turns a volume reading into a meaningful confirmation rather than ordinary day-to-day variation. A ratio above 1.5 paired with a z-score outside 2 standard deviations is a stronger candidate than either signal showing up alone.
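The effect of lagging the volume baseline can be checked with a made-up series: twenty days at a flat 100 shares, then one day at 300.

```python
import pandas as pd

toy_volume = pd.Series([100.0] * 20 + [300.0])  # hypothetical share counts

# Baseline built from the prior 20 days only
ratio_lagged = toy_volume / toy_volume.shift(1).rolling(20).mean()

# Baseline that includes today's own volume
ratio_same_day = toy_volume / toy_volume.rolling(20).mean()

print(f'lagged baseline:   {ratio_lagged.iloc[-1]:.2f}x')
print(f'same-day baseline: {ratio_same_day.iloc[-1]:.2f}x')
```

With the lag, a day at three times normal volume reads as 3x. Without it, the spike lifts its own average to 110 and the ratio falls to about 2.73x, so heavy-volume days are systematically understated.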

Building the Alpha Investigation Queue

Every piece built so far points at the same trading day. Pulling the most recent row out of each one and joining them by ticker turns several separate dataframes into the single table the whole scanner exists to produce.

scan_date = stock_returns.index[-1]

queue = pd.DataFrame({
    'sector': universe.set_index('ticker')['sector'],
    'actual_return': stock_returns.loc[scan_date],
    'spy_return': spy_returns.loc[scan_date],
    'beta': beta_shifted.loc[scan_date],
    'expected_return': expected_returns.loc[scan_date],
    'residual': residuals.loc[scan_date],
    'zscore': zscore.loc[scan_date],
    'residual_3d': residuals_3d.loc[scan_date],
    'residual_5d': residuals_5d.loc[scan_date],
    'volume_ratio': volume_ratio.loc[scan_date]
})

queue = queue.dropna()
queue = queue.reindex(queue['zscore'].abs().sort_values(ascending=False).index)
queue['high_confidence'] = (queue['zscore'].abs() > 2.0) & (queue['volume_ratio'] > 1.5)

queue.head(10)

A few names stand out for different reasons.

SATS, the volume outlier: Down almost 11% while SPY was up half a percent. A beta of 1.55 would have called for a small gain, not a double-digit drop, and the residual lands near -12%. Volume ran more than six times its 20-day average, the highest ratio in the table.

LEN, the extreme score: A z-score of -3.9, the single most negative number anywhere in the queue. Beta of 1.45 predicted a modest gain on a day SPY was up. The stock fell almost 5% instead.

MOS and ALB, a possible shared story: Both Basic Materials, both positive, both backed by elevated volume, sitting back to back in the ranking. Worth checking for a common catalyst before treating either one as an independent idiosyncratic move.

TKO, a flag with a catch: Clears the high-confidence bar on the numbers alone, but the ticker maps to two different companies depending on the source, TKO Group Holdings and Tikehau Capital. That collision turns into a real problem once the news search runs.

Checking the Story Against the News

A z-score only says a move was unusual, not why it happened. Pulling recent headlines for the flagged names is the only way to find out whether there’s an actual story behind the number. We’ll fetch the news data using EODHD’s financial news endpoint.

def fetch_news(ticker, start, end):
    url = f'https://eodhd.com/api/news?s={ticker}.US&from={start}&to={end}&limit=3&api_token={api_key}&fmt=json'
    r = requests.get(url)
    data = r.json()
    return [item['title'] for item in data[:3]]

news_start = (scan_date - pd.Timedelta(days=3)).strftime('%Y-%m-%d')
news_end = scan_date.strftime('%Y-%m-%d')

high_conf = queue[queue['high_confidence']].head(10)
remaining = queue[~queue['high_confidence']].head(max(0, 10 - len(high_conf)))
news_candidates = pd.concat([high_conf, remaining])

news_results = {}
for ticker in news_candidates.index:
    headlines = fetch_news(ticker, news_start, news_end)
    news_results[ticker] = headlines
    print(f'\n{ticker}:')
    if headlines:
        for h in headlines:
            print(f'  - {h}')
    else:
        print('  no news found')

Output:

LEN:
  - Lennar Corp (LEN) Q2 2026 Earnings Call Highlights: Strong Margins and Strategic Adjustments ...
  - Why Lennar (LEN) Stock Is Down Today
  - Update: Equities Rise as SpaceX Soars; Wall Street Logs Weekly Gain Amid Iran Deal Optimism

SATS:
  - Stocks Rally on Hopes for a Near-term US-Iran Interim Peace Agreement
  - Stock Market Today, June 12: EchoStar Falls as SpaceX-Linked Rally Meets DISH DBS Payment Risk
  - Why EchoStar (SATS) Stock Is Falling Today

MOS:
  - S&P 500 Movers: KLAC, MOS
  - Top 10 most oversold S&P 500 stocks
  - Mosaic (MOS) Down 5% Since Last Earnings Report: Can It Rebound?

ALB:
  - DuPont Achieves Renewable Power Milestone in US Healthcare Sites
  - S&P 500 Movers: KLAC, MOS
  - ATI and BWX Technologies Extend Strategic Partnership Through 2030

TKO:
  - Tikehau Capital: Disclosure of Shares Repurchases from 05 June 2026 to 11 June 2026
  - Here's Why We're Wary Of Buying TKO Group Holdings' (NYSE:TKO) For Its Upcoming Dividend
  - Tikehau Capital: Extension of the Share Repurchase Mandate

FOX:
  - Fox Could Unlock 800+ World Cup Ad Spots
  - World Cup Economics: How Much Boost Could The US Get?
  - Why Is Fox (FOXA) Up 3.3% Since Last Earnings Report?

DPZ:
  - Is Domino's (DPZ) Valuation Reset Revealing a Deeper Shift in Its Competitive Moat?
  - Domino's Pizza (DPZ) Stock Valuation Check After Mixed Recent Performance
  - Is Domino's Pizza, Inc. (DPZ) A Good Stock To Buy Now?

CTAS:
  - UniFirst Shareholders Approve Transaction with Cintas
  - Cintas Stock Bears Are Overlooking This Profit Engine

CTVA:
  - Corteva sees higher restructuring charges, plans to cease production at Spanish site
  - Zacks Industry Outlook Highlights Corteva, Archer Daniels Midland, The Scotts, Miracle-Gro, Adecoagro and Mission Produce
  - 5 Agriculture Operations Stocks to Benefit From Innovation-Driven Growth

TSN:
  - Tyson Foods Installs New COO As Beef Woes And Valuation Discount Persist
  - Why JBS Is Closing Plants Even as Beef Prices Hit Records
  - Tyson Foods, Inc. (TSN) is Attracting Investor Attention: Here is What You Should Know

The high-confidence stocks get pulled first, with the remaining slots filled by the next highest z-scores if fewer than 10 clear that bar. This is why DPZ, CTAS, CTVA, and TSN show up here despite not carrying the flag.

SATS holds up. A direct headline ties the drop to DISH DBS payment risk, surfacing on the exact day the residual shows up and lining up with the volume spike.

LEN holds up, too. "Why Lennar (LEN) Stock Is Down Today" is about as direct a confirmation as a headline gets, backed by a Q2 earnings call reference that explains why the market would be repricing the stock specifically.

TKO breaks. Two of the three headlines returned are about Tikehau Capital, a French asset manager that happens to share the same ticker as TKO Group Holdings on a different exchange. The high-confidence flag fired correctly. The news search mostly picked the wrong company.

MOS and ALB stay unexplained. ALB's headlines are about DuPont and a defense partnership, neither relevant. MOS gets a real mention in passing, a "down 5% since last earnings" line, but nothing that explains a same-day move. The shared-catalyst theory from the queue doesn't get resolved here either way.
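One way to guard against that kind of ticker collision is to keep only headlines that mention the company by name. The sketch below is a simple substring filter, not part of the article's scanner. The helper name and the keyword list are assumptions, and the headlines are the three returned for TKO above.

```python
def headlines_mentioning(headlines, name_keywords):
    """Keep headlines that contain at least one keyword (case-insensitive)."""
    keywords = [k.lower() for k in name_keywords]
    return [h for h in headlines if any(k in h.lower() for k in keywords)]

tko_headlines = [
    "Tikehau Capital: Disclosure of Shares Repurchases from 05 June 2026 to 11 June 2026",
    "Here's Why We're Wary Of Buying TKO Group Holdings' (NYSE:TKO) For Its Upcoming Dividend",
    "Tikehau Capital: Extension of the Share Repurchase Mandate",
]

# Hypothetical keyword list for the company the scanner actually flagged
relevant = headlines_mentioning(tko_headlines, ["TKO Group"])

for h in relevant:
    print(h)
```

Only the TKO Group Holdings headline survives. A plain substring match is crude and will miss headlines that use a brand or abbreviation, so in practice the keyword list would need some care for each company.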

Visualizing the Abnormal Movers

Actual vs Expected Return

A stock’s actual return only earns attention here if it breaks away from what beta alone would have predicted. A scatter against the expected return is the fastest way to see which ones did.

fig, ax = plt.subplots(figsize=(9, 7))

sector_list = queue['sector'].unique()
colors = plt.cm.tab20(np.linspace(0, 1, len(sector_list)))
sector_colors = dict(zip(sector_list, colors))

for sector in sector_list:
    subset = queue[queue['sector'] == sector]
    ax.scatter(
        subset['expected_return'], subset['actual_return'],
        s=subset['volume_ratio'] * 40,
        color=sector_colors[sector],
        label=sector, alpha=0.7, edgecolors='black', linewidth=0.5
    )

lims = [queue[['expected_return', 'actual_return']].min().min(),
        queue[['expected_return', 'actual_return']].max().max()]
ax.plot(lims, lims, color='black', linestyle='--', linewidth=1)

ax.set_xlabel('expected return (beta-adjusted)')
ax.set_ylabel('actual return')
ax.set_title(f'Actual vs Expected Return - {scan_date.date()}')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
plt.tight_layout()
plt.show()

Most of the 487 points crowd close to the dashed line, sitting in the narrow band near zero where actual and expected return roughly agree. This is what the bulk of any given day’s trading actually looks like once beta is accounted for.

SATS sits far below the line on the right side of the chart, the largest bubble in the entire plot, its size scaled directly to the 6.28x volume ratio that confirmed the move.

The large point near the bottom is LEN, also well clear of the line and large enough to stand out against the other Consumer Cyclical points clustered tighter to the diagonal.

A handful of other points drift noticeably off the line in both directions without being flagged as high-confidence, a reminder that distance from the line alone doesn’t guarantee a real story, which is exactly what the volume and news checks exist to settle.

Top 30 Abnormal Movers by Z-Score

A z-score ranking alone tells you which moves were statistically unusual. Pairing each bar with its volume ratio shows which of those moves also had real trading activity behind them, since the two together matter more than either alone.

top30 = queue.head(30).sort_values('zscore')

fig, ax = plt.subplots(figsize=(8, 6))
bar_colors = ['#2ca02c' if z > 0 else '#d62728' for z in top30['zscore']]
ax.barh(top30.index, top30['zscore'], color=bar_colors)

for i, (ticker, row) in enumerate(top30.iterrows()):
    ax.text(row['zscore'] + (0.1 if row['zscore'] > 0 else -0.1),
             i, f'vol={row["volume_ratio"]:.1f}x',
             va='center', ha='left' if row['zscore'] > 0 else 'right', fontsize=7)

ax.axvline(2.0, color='black', linestyle='--', linewidth=1)
ax.axvline(-2.0, color='black', linestyle='--', linewidth=1)
ax.set_xlabel('z-score')
ax.set_title(f'Top 30 Abnormal Movers by Z-Score - {scan_date.date()}')
plt.tight_layout()
plt.show()

LEN’s bar runs past -3.9, well clear of the -2.0 reference line, with a 2.4x volume label sitting right at the tip.

SATS follows close behind at roughly -3.0. But the number that actually stands out next to it is the volume label, 6.3x, the highest ratio anywhere on the chart and a much stronger confirmation than LEN’s.

On the positive side, MOS and ALB sit at the top of the green bars within a fraction of each other, both backed by volume north of 1.5x. This is consistent with the queue’s earlier suggestion that the two might share a catalyst.

ADBE is the one worth lingering on. Its bar barely crosses -1.6, short of the -2.0 threshold that would have earned it a high-confidence flag. But its volume ratio of 4.2x is among the highest in the entire chart. That combination, a moderate z-score paired with unusually heavy volume, is exactly the kind of case a fixed threshold misses and a chart like this one catches instead.

Trailing Abnormal Returns

A single day’s residual can’t say whether a move is building or already over. Lining up the 1, 3, and 5-day windows for the same set of stocks separates the two.

top15 = queue.head(15)
heatmap_data = top15[['residual', 'residual_3d', 'residual_5d']]
heatmap_data.columns = ['1-day', '3-day', '5-day']

fig, ax = plt.subplots(figsize=(7, 5))
im = ax.imshow(heatmap_data.values, cmap='RdYlGn', aspect='auto', vmin=-0.1, vmax=0.1)

ax.set_xticks(range(len(heatmap_data.columns)))
ax.set_xticklabels(heatmap_data.columns)
ax.set_yticks(range(len(heatmap_data.index)))
ax.set_yticklabels(heatmap_data.index)
ax.grid(False)

for i in range(heatmap_data.shape[0]):
    for j in range(heatmap_data.shape[1]):
        ax.text(j, i, f'{heatmap_data.values[i, j]:.1%}', ha='center', va='center', fontsize=8)

ax.set_title(f'Trailing Abnormal Returns - {scan_date.date()}')
plt.colorbar(im, ax=ax, label='abnormal return')
plt.tight_layout()
plt.show()

ALB is the clearest example of a move that built rather than spiked: 7.0% on the scan date, 11.8% over the trailing three days, and 10.5% over five. The two sessions before the scan date had already added roughly 4.5%, so the color stays deep green across all three windows instead of reversing.

SATS tells the opposite story. The 1-day column shows -11.8% (by far the darkest red cell in the entire heatmap), but the 3-day and 5-day columns fade to -2.9% and -2.3%. Because every window ends on the scan date, that fade means the stock had gained roughly 10% in residual terms over the two sessions before the drop, so the one-day fall mostly gave back a recent run-up rather than extending an existing slide.

CVNA shows a third pattern: -6.4% on the scan date, -8.9% over three days (the single deepest red cell outside of SATS’s first column), and -4.4% over five. The selling was already under way in the two prior sessions, while the two sessions before that ran positive and offset part of it.

Three names, three different shapes, and none of that distinction would be visible from the single-day z-score table alone.

Conclusions and Ideas for Next Steps

A few things stood out from today’s scan:

31 out of 487 stocks cleared the high-confidence bar, roughly 6%, which is a reasonable hit rate for a daily flag.
SATS and LEN both had real news behind the move, the best-case outcome for this kind of scanner.
TKO is a reminder that a ticker can mean two different companies depending on the data source.
MOS and ALB moving together with no news confirmation is worth a closer look, not just a glance at the table.

A few ways to take this further:

Match news by company name instead of ticker. That alone would've caught the TKO collision.
Pull more than 3 headlines per stock. ALB and MOS both got thin results.
Run this daily and keep a log. A single day’s queue can’t tell you if a move held or reversed.
Add a sector check. Two stocks from the same sector flagging together is worth a second look before calling either one idiosyncratic.
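The sector check from the list above could start as a simple group-and-count over the flagged rows. The sketch below uses a made-up queue with hypothetical tickers and the same column names as the scanner, and it treats two or more flagged names moving the same direction in one sector as a cluster. That cutoff is an assumption, not something the article tested.

```python
import pandas as pd

# Hypothetical queue rows using the scanner's column names
toy_queue = pd.DataFrame(
    {
        'sector': ['Basic Materials', 'Basic Materials', 'Consumer Cyclical',
                   'Communication Services', 'Technology'],
        'zscore': [2.6, 2.5, -3.9, -3.0, -1.6],
        'high_confidence': [True, True, True, True, False],
    },
    index=['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
)

flagged = toy_queue[toy_queue['high_confidence']].copy()
flagged['direction'] = flagged['zscore'].gt(0).map({True: 'up', False: 'down'})

# Count flagged names per sector and direction, keep groups of two or more
clusters = flagged.groupby(['sector', 'direction']).size()
shared = clusters[clusters >= 2]

print(shared)
```

Any sector that shows up in the result is a prompt to look for a common catalyst before treating the individual names as idiosyncratic.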

Beta explains most of what a stock does on most days. The exceptions are rare, and even then, it still takes a real check before you know if one means anything.

With that being said, you’ve reached the end of the article. Hope you learned something new and useful. Thank you very much for your time.

How to Analyze Analyst Estimate Ranges with Python

Nikhil Adithyan — Thu, 18 Jun 2026 15:49:47 +0000

Most financial models use analyst consensus as a single forward-looking input: revenue estimate, EPS estimate, EBITDA estimate, or some version of a forward margin assumption.

That works, but it flattens the data.

The average estimate is only the center of the range. Behind it, there is usually a low estimate, a high estimate, and the number of analysts contributing to the view. Two companies can have the same average estimate but very different levels of agreement behind it.

So I wanted to test a simple idea: what happens if we stop treating consensus as one number and start looking at its shape?

Not to predict stock returns or build a trading signal. Just to see whether the range around estimates tells us where analysts actually disagree.

Prerequisites
The Data I Needed To Test This
Pulling Analyst Estimates Across A Mixed Universe
Turning Estimate Ranges Into Spread Metrics
First View: Analyst Coverage Does Not Guarantee Agreement
A Few Names Made The Pattern Obvious
What This Changes In A Forecasting Workflow
What I Would Not Overclaim
Final Takeaway: Consensus Has Structure

Prerequisites

To follow along, you should be comfortable with basic Python, pandas DataFrames, dictionaries, loops, and simple plotting with matplotlib.

You’ll also need:

Python 3.9 or later
An FMP API key
The following Python libraries: requests, pandas, numpy, and matplotlib
Basic familiarity with analyst estimates, revenue, EPS, P/E-style forecasting inputs, and analyst coverage

You don't need advanced financial modeling knowledge. The goal is to show how low, average, and high estimates, together with analyst counts, can reveal the shape of consensus instead of treating analyst estimates as one flat number.

The Data I Needed to Test This

To test this properly, the average estimate wasn't enough. I needed the full estimate range.

For each company, I wanted:

revenue low, average, and high
EPS low, average, and high
number of analysts behind the revenue estimate
number of analysts behind the EPS estimate

That gives two useful views. The average shows the center of expectations. The low and high estimates show how wide the expectation range is. The analyst count gives a rough sense of how deep the consensus is.

I also wanted a mixed universe. If the sample only includes mega-cap tech names, the result can easily become too clean because most of those companies are heavily covered. So I used a mix of mega-cap tech, semiconductors, energy, financials, healthcare, consumer names, and higher-uncertainty growth companies.

For the data source, I used FMP’s analyst estimates data because it provides the low, high, average, and analyst count fields needed for this experiment.

Pulling Analyst Estimates Across A Mixed Universe

I started by importing the basic packages and defining the stock universe.

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from time import sleep

api_key = 'YOUR FMP API KEY'
base_url = 'https://financialmodelingprep.com/stable'

tickers = [
    'AAPL', 'MSFT', 'NVDA', 'AMZN', 'META', 'GOOGL',
    'TSLA', 'PLTR', 'COIN', 'RBLX', 'SNOW', 'UBER',
    'AMD', 'INTC', 'MU', 'AVGO', 'QCOM',
    'CAT', 'DE', 'BA', 'GE', 'XOM', 'CVX',
    'WMT', 'COST', 'NKE', 'SBUX', 'MCD', 'TGT',
    'JPM', 'BAC', 'GS', 'MS', 'V', 'MA',
    'UNH', 'PFE', 'LLY', 'MRK', 'ABBV',
    'ROKU', 'SHOP', 'SQ', 'PYPL', 'ZM'
]

The next step was to pull annual analyst estimates for every ticker. I used the nearest usable future estimate period for each company, because estimate endpoints can return multiple periods and some far-out periods may not be fully populated.

all_rows = []

today = pd.Timestamp.today().normalize()

for ticker in tickers:
    url = f'{base_url}/analyst-estimates'

    params = {
        'symbol': ticker,
        'period': 'annual',
        'limit': 10,
        'apikey': api_key
    }

    response = requests.get(url, params=params)
    data = response.json()

    df = pd.DataFrame(data)

    if len(df) == 0:
        print(f'{ticker}: no data')
        continue

    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values('date')

    df = df[
        (df['date'] > today) &
        (df['revenueAvg'].notna()) &
        (df['revenueLow'].notna()) &
        (df['revenueHigh'].notna()) &
        (df['epsAvg'].notna()) &
        (df['epsLow'].notna()) &
        (df['epsHigh'].notna())
    ].copy()

    if len(df) == 0:
        print(f'{ticker}: no usable future estimates')
        continue

    row = df.iloc[0].copy()
    all_rows.append(row)
    print(f'{ticker} done')
    
    sleep(0.2)

estimates = pd.DataFrame(all_rows)
estimates.head()

The output gave one usable forward estimate row per company.

This table is already more useful than a normal average estimate pull. It gives the center of the estimate, the range around it, and the analyst count behind it. That's enough to start measuring the shape of consensus instead of only storing the average.

Turning Estimate Ranges Into Spread Metrics

Once the estimate data was in place, I needed a way to compare estimate ranges across companies.

Raw ranges aren't enough. A $10 billion revenue range means something very different for a company expected to generate $50 billion in revenue versus one expected to generate $500 billion. So I normalized the range by the average estimate.

estimates['revenue_spread'] = ((estimates['revenueHigh'] - estimates['revenueLow']) / estimates['revenueAvg'])
estimates['eps_spread'] = ((estimates['epsHigh'] - estimates['epsLow']) / estimates['epsAvg'].abs())
shape_df = estimates[['symbol','date','revenueLow','revenueAvg','revenueHigh','revenue_spread','numAnalystsRevenue',
                      'epsLow','epsAvg','epsHigh','eps_spread','numAnalystsEps']].copy()

shape_df.head()

The logic is simple. revenue_spread tells us how wide the revenue estimate range is relative to the average revenue estimate. eps_spread does the same for EPS.

But EPS needs one extra check. If average EPS is close to zero, even a normal estimate range can create a huge spread. That doesn't always mean analysts are wildly uncertain. Sometimes it just means the denominator is too small.

So I kept the original EPS spread, but created a cleaner version for plotting.

shape_df['eps_spread_clean'] = shape_df['eps_spread']

shape_df.loc[shape_df['epsAvg'].abs() < 1, 'eps_spread_clean'] = np.nan
shape_df.loc[shape_df['eps_spread_clean'] > 3, 'eps_spread_clean'] = np.nan
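
To make the arithmetic concrete, here is a small worked example with three made-up companies. It applies the same spread formulas and the same two cleaning rules as above; the tickers and numbers are illustrative assumptions, not data from the API.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'symbol':      ['SMALL', 'LARGE', 'NEARZERO'],
    'revenueLow':  [45e9, 495e9, 9e9],
    'revenueAvg':  [50e9, 500e9, 10e9],
    'revenueHigh': [55e9, 505e9, 11e9],
    'epsLow':      [1.8, 9.5, -0.10],
    'epsAvg':      [2.0, 10.0, 0.05],
    'epsHigh':     [2.2, 10.5, 0.20],
})

# Same $10B revenue range for SMALL and LARGE, very different relative width
toy['revenue_spread'] = (toy['revenueHigh'] - toy['revenueLow']) / toy['revenueAvg']
toy['eps_spread'] = (toy['epsHigh'] - toy['epsLow']) / toy['epsAvg'].abs()

# Mask spreads built on a tiny denominator, then mask anything still above 3
toy['eps_spread_clean'] = toy['eps_spread'].where(toy['epsAvg'].abs() >= 1)
toy['eps_spread_clean'] = toy['eps_spread_clean'].where(toy['eps_spread_clean'] <= 3)

print(toy[['symbol', 'revenue_spread', 'eps_spread', 'eps_spread_clean']])
```

The same \$10 billion range works out to a 20% spread for the \$50 billion company and 2% for the \$500 billion one. The third row shows the denominator problem: a \$0.30 EPS range over an average of \$0.05 produces a spread of about 6, which the cleaning step masks.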

After that, I checked the widest and tightest ranges.

shape_df.sort_values('revenue_spread', ascending=False)[
    [
        'symbol',
        'revenueLow',
        'revenueAvg',
        'revenueHigh',
        'revenue_spread',
        'numAnalystsRevenue'
    ]
].head(10)

This was the first sign that the idea might be useful. Some names had wide revenue estimate ranges despite meaningful analyst coverage. TSLA had 35 analysts behind revenue estimates, NVDA had 39, and INTC had 31, but their revenue ranges were still relatively wide.

Then I checked the cleaned EPS spread.

shape_df.sort_values('eps_spread_clean', ascending=False)[
    [
        'symbol',
        'epsLow',
        'epsAvg',
        'epsHigh',
        'eps_spread_clean',
        'numAnalystsEps'
    ]
].head(10)

This made the analysis more interesting. Revenue and EPS weren't behaving the same way. TSLA had wide ranges on both. SQ had a very high EPS spread, even though its revenue spread was much tighter. That started to suggest something useful: consensus disagreement can sit in different parts of the model.

First View: Analyst Coverage Does Not Guarantee Agreement

The first thing I wanted to check was whether deeper analyst coverage automatically meant tighter consensus.

So I used two simple dimensions:

number of analysts covering revenue
revenue estimate spread

Then I split the data using median thresholds. This isn't meant to be a formal model. It's just a quick way to separate different consensus shapes.

analyst_threshold = shape_df['numAnalystsRevenue'].median()
spread_threshold = shape_df['revenue_spread'].median()

analyst_threshold, spread_threshold

Then I created coverage and spread buckets:

shape_df['coverage_bucket'] = np.where(
    shape_df['numAnalystsRevenue'] >= analyst_threshold,
    'high coverage',
    'low coverage'
)

shape_df['spread_bucket'] = np.where(
    shape_df['revenue_spread'] <= spread_threshold,
    'low spread',
    'high spread'
)

From there, each company falls into one of four simple categories:

conditions = [
    (shape_df['coverage_bucket'] == 'high coverage') & (shape_df['spread_bucket'] == 'low spread'),
    (shape_df['coverage_bucket'] == 'high coverage') & (shape_df['spread_bucket'] == 'high spread'),
    (shape_df['coverage_bucket'] == 'low coverage') & (shape_df['spread_bucket'] == 'low spread'),
    (shape_df['coverage_bucket'] == 'low coverage') & (shape_df['spread_bucket'] == 'high spread')
]

labels = [
    'tight consensus',
    'watched but uncertain',
    'thin but stable',
    'weak consensus'
]

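# A string default keeps the dtype consistent; recent NumPy versions reject mixing string labels with the implicit default of 0
shape_df['revenue_consensus_shape'] = np.select(conditions, labels, default='unclassified')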

The split came out more balanced than I expected:

That was useful because the labels weren't collapsing into one obvious bucket. The universe actually had different consensus shapes.
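
One way to tally a split like this is `value_counts()`. The sketch below repeats the median-threshold bucketing on eight made-up companies, so the counts it prints are illustrative only and are not the article's results.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'symbol': list('ABCDEFGH'),
    'numAnalystsRevenue': [40, 38, 35, 30, 12, 10, 8, 6],
    'revenue_spread': [0.02, 0.20, 0.03, 0.15, 0.04, 0.25, 0.01, 0.18],
})

# Median cutoffs, as in the main analysis
high_cov = toy['numAnalystsRevenue'] >= toy['numAnalystsRevenue'].median()
low_spread = toy['revenue_spread'] <= toy['revenue_spread'].median()

toy['shape'] = np.select(
    [high_cov & low_spread, high_cov & ~low_spread,
     ~high_cov & low_spread, ~high_cov & ~low_spread],
    ['tight consensus', 'watched but uncertain', 'thin but stable', 'weak consensus'],
    default='unclassified',  # string default so the result stays a string array
)

counts = toy['shape'].value_counts()
print(counts)
```

Because both cutoffs are medians, each single dimension splits roughly in half by construction. The four-way split is only balanced when coverage and spread are not strongly related, which is what makes a balanced result informative.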

Then I plotted coverage against revenue spread.

plt.figure(figsize=(12, 7))

for label in shape_df['revenue_consensus_shape'].unique():
    temp = shape_df[shape_df['revenue_consensus_shape'] == label]

    plt.scatter(
        temp['numAnalystsRevenue'],
        temp['revenue_spread'],
        s=80,
        label=label,
        alpha=0.8
    )

plt.axvline(analyst_threshold, linestyle='--', linewidth=1)
plt.axhline(spread_threshold, linestyle='--', linewidth=1)

for i, row in shape_df.iterrows():
    if row['revenue_spread'] > spread_threshold or row['numAnalystsRevenue'] > analyst_threshold:
        plt.text(
            row['numAnalystsRevenue'] + 0.3,
            row['revenue_spread'],
            row['symbol'],
            fontsize=9
        )

plt.title('Analyst Coverage vs Revenue Estimate Spread')
plt.xlabel('Number of Analysts Covering Revenue')
plt.ylabel('Revenue Estimate Spread')

plt.legend()
plt.show()

The chart made one thing clear: more analyst coverage doesn't always mean tighter agreement.

MSFT, AAPL, MA, WMT, and META sat closer to the tight consensus area. They had higher coverage and relatively narrow revenue ranges.

But TSLA, AVGO, NVDA, INTC, AMD, MU, and GOOGL were also heavily covered, yet their revenue estimate spreads were wider. These are the “watched but uncertain” names. The market isn't ignoring them. Analysts are looking at them closely, but the forecast range is still wide.

The weaker consensus area was also useful. CVX, XOM, and COIN had wide revenue ranges with lower coverage compared to the mega-cap names. That's a different kind of uncertainty. It's not just disagreement. It's disagreement with less analyst depth behind it.

This first view was helpful, but it still only looked at revenue. The next question was more interesting: does the uncertainty sit in revenue, EPS, or both?

plot_df = shape_df.dropna(subset=['revenue_spread', 'eps_spread_clean']).copy()

plt.figure(figsize=(12, 7))

plt.scatter(
    plot_df['revenue_spread'],
    plot_df['eps_spread_clean'],
    s=plot_df['numAnalystsRevenue'] * 3,
    alpha=0.75
)

for i, row in plot_df.iterrows():
    plt.text(
        row['revenue_spread'] + 0.002,
        row['eps_spread_clean'],
        row['symbol'],
        fontsize=9
    )

plt.title('Revenue Estimate Spread vs EPS Estimate Spread')
plt.xlabel('Revenue Estimate Spread')
plt.ylabel('EPS Estimate Spread')

plt.show()

This was the more useful view.

The chart showed that consensus uncertainty doesn't sit in the same place for every company. Some names had both revenue and EPS clustered tightly. Some had wide ranges across both. And a few had a much more specific kind of disagreement.

SQ was the clearest example. Its revenue spread was low, but its EPS spread was high. That suggests analysts were much closer on the revenue side than on the earnings side.

TSLA showed a different kind of extreme. Both revenue and EPS spreads were wide, so the average estimate was hiding disagreement across more than one part of the model.

At this point, I wanted to turn this into a simple classification. Again, this isn't a formal risk model. I used median thresholds only to separate the shapes clearly.

revenue_spread_threshold = plot_df['revenue_spread'].median()
eps_spread_threshold = plot_df['eps_spread_clean'].median()

plot_df['revenue_uncertainty'] = np.where(
    plot_df['revenue_spread'] <= revenue_spread_threshold,
    'low revenue uncertainty',
    'high revenue uncertainty'
)

plot_df['eps_uncertainty'] = np.where(
    plot_df['eps_spread_clean'] <= eps_spread_threshold,
    'low EPS uncertainty',
    'high EPS uncertainty'
)

Then I combined the two buckets into four forecast shapes.

conditions = [
    (plot_df['revenue_uncertainty'] == 'low revenue uncertainty') & (plot_df['eps_uncertainty'] == 'low EPS uncertainty'),
    (plot_df['revenue_uncertainty'] == 'low revenue uncertainty') & (plot_df['eps_uncertainty'] == 'high EPS uncertainty'),
    (plot_df['revenue_uncertainty'] == 'high revenue uncertainty') & (plot_df['eps_uncertainty'] == 'low EPS uncertainty'),
    (plot_df['revenue_uncertainty'] == 'high revenue uncertainty') & (plot_df['eps_uncertainty'] == 'high EPS uncertainty')
]

labels = [
    'stable forecast shape',
    'profitability uncertainty',
    'top-line uncertainty',
    'broad forecast uncertainty'
]

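# A string default keeps the dtype consistent; recent NumPy versions reject mixing string labels with the implicit default of 0
plot_df['forecast_shape'] = np.select(conditions, labels, default='unclassified')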

The distribution looked like this:

That split was more useful than the first one because it showed where the disagreement was located.

A stable forecast shape means both revenue and EPS ranges are relatively tight. Profitability uncertainty means revenue estimates are tighter, but EPS estimates are wider. Top-line uncertainty means the revenue range is wider while EPS is relatively tighter. Broad forecast uncertainty means both sides are wide.

Then I plotted the same chart again with these labels:

plt.figure(figsize=(12, 7))

for label in plot_df['forecast_shape'].unique():
    temp = plot_df[plot_df['forecast_shape'] == label]

    plt.scatter(
        temp['revenue_spread'],
        temp['eps_spread_clean'],
        s=temp['numAnalystsRevenue'] * 3,
        label=label,
        alpha=0.75
    )

plt.axvline(revenue_spread_threshold, linestyle='--', linewidth=1)
plt.axhline(eps_spread_threshold, linestyle='--', linewidth=1)

for i, row in plot_df.iterrows():
    if (
        row['revenue_spread'] > revenue_spread_threshold or
        row['eps_spread_clean'] > eps_spread_threshold
    ):
        plt.text(
            row['revenue_spread'] + 0.002,
            row['eps_spread_clean'],
            row['symbol'],
            fontsize=9
        )

plt.title('Revenue Uncertainty vs EPS Uncertainty')
plt.xlabel('Revenue Estimate Spread')
plt.ylabel('EPS Estimate Spread')

plt.legend()
plt.show()

This became the main chart for the analysis.

The average estimate gives the center of expectations, but this chart shows the structure around it. For a forecasting workflow, that matters. A model shouldn't treat a tight consensus estimate and a wide consensus estimate as if they carry the same level of agreement.

A Few Names Made The Pattern Obvious

Once the companies were grouped by forecast shape, the pattern became easier to read.

plot_df[
    [
        'symbol',
        'revenue_spread',
        'eps_spread_clean',
        'numAnalystsRevenue',
        'numAnalystsEps',
        'forecast_shape'
    ]
].sort_values(['forecast_shape', 'eps_spread_clean'], ascending=[True, False])

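The full table was useful, but for the article, the more important part is the examples from each bucket. The snippets below filter final_view, the percentage-scaled summary table that is built in the forecasting workflow section further down, so run that step first.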

broad_uncertainty = final_view[
    final_view['forecast_shape'] == 'broad forecast uncertainty'
].sort_values('eps_spread_pct', ascending=False)

broad_uncertainty.head(10)

TSLA was the obvious outlier. The revenue estimate spread was around 21.8%, and the EPS spread was over 104%. That's not just a wide range around one line item. It's disagreement across both the top line and bottom line.

CVX and XOM were also interesting, but for a different reason. Their revenue spreads were very wide, and analyst coverage was lower than many tech names in the sample. That makes their consensus shape different from a name like TSLA, where coverage is deeper but disagreement still remains.

Then I looked at the profitability uncertainty bucket.

profitability_uncertainty = final_view[
    final_view['forecast_shape'] == 'profitability uncertainty'
].sort_values('eps_spread_pct', ascending=False)

profitability_uncertainty

This was the most useful bucket conceptually.

SQ had only about 1.1% revenue spread, but nearly 73.8% EPS spread. That's a very different shape from TSLA. Here, analysts were much closer on revenue, but far apart on earnings.

That matters for a model. If I only store the average revenue estimate and average EPS estimate, I lose that distinction. The model can't see that the revenue estimate is relatively tight while the EPS estimate carries much more disagreement.

SNOW and PLTR showed a similar pattern, though not as extreme. Revenue expectations were relatively close together, but EPS expectations had a wider range. That points to uncertainty around profitability, margins, or earnings conversion rather than pure revenue growth.

The stable bucket gave the contrast.

stable_shape = final_view[
    final_view['forecast_shape'] == 'stable forecast shape'
].sort_values(['revenue_spread_pct', 'eps_spread_pct'])

stable_shape.head(10)

MSFT was the cleanest example here. Its revenue spread was around 0.4%, and its EPS spread was around 3.0%. MA, BAC, ABBV, and TGT also stayed in the stable zone, with relatively tight ranges across both revenue and EPS.

That doesn't mean these estimates will be right. It only means analysts are clustered more tightly around the forward numbers.

Finally, the top-line uncertainty bucket was smaller.

topline_uncertainty = final_view[
    final_view['forecast_shape'] == 'top-line uncertainty'
].sort_values('revenue_spread_pct', ascending=False)

topline_uncertainty

This group was smaller, but it completed the picture. These were cases where revenue uncertainty was more visible than EPS uncertainty.

The broader point is simple: consensus doesn't have one shape. Averages hide that. The range around the average shows whether disagreement sits around revenue, EPS, or both.

What This Changes In A Forecasting Workflow

The practical takeaway isn't that every model needs a new complicated uncertainty system. It's simpler than that.

If a model already stores analyst estimates, it should probably store the range around those estimates too.

Instead of keeping only this:

symbol | estimated_revenue | estimated_eps

I would rather keep this:

symbol | estimated_revenue | estimated_eps | revenue_spread | eps_spread | analyst_count | forecast_shape

That gives the model more context about the forecast input it's already using.

To make this usable, I created a final table with the estimate period, revenue spread, EPS spread, analyst coverage, revenue consensus shape, and overall forecast shape.

final_df = plot_df[
    [
        'symbol',
        'date',
        'revenueAvg',
        'revenueLow',
        'revenueHigh',
        'revenue_spread',
        'epsAvg',
        'epsLow',
        'epsHigh',
        'eps_spread_clean',
        'numAnalystsRevenue',
        'numAnalystsEps',
        'revenue_consensus_shape',
        'forecast_shape'
    ]
].copy()

final_df = final_df.rename(
    columns={
        'date': 'estimate_period',
        'revenueAvg': 'revenue_avg',
        'revenueLow': 'revenue_low',
        'revenueHigh': 'revenue_high',
        'epsAvg': 'eps_avg',
        'epsLow': 'eps_low',
        'epsHigh': 'eps_high',
        'eps_spread_clean': 'eps_spread',
        'numAnalystsRevenue': 'revenue_analysts',
        'numAnalystsEps': 'eps_analysts'
    }
)

final_df['revenue_spread_pct'] = final_df['revenue_spread'] * 100
final_df['eps_spread_pct'] = final_df['eps_spread'] * 100

final_view = final_df[
    [
        'symbol',
        'estimate_period',
        'revenue_spread_pct',
        'eps_spread_pct',
        'revenue_analysts',
        'eps_analysts',
        'revenue_consensus_shape',
        'forecast_shape'
    ]
].copy()

final_view = final_view.sort_values('eps_spread_pct', ascending=False)

final_view.head(15)

The output looked like this:

This table is mainly useful for spotting where the average estimate hides the most disagreement.

TSLA is the clearest broad uncertainty case. Both revenue and EPS spreads are wide, so storing only the average estimate would flatten too much of the forecast structure.

SQ is different. Its revenue spread is only about 1.1%, but its EPS spread is about 73.8%. That suggests the disagreement is much less about revenue and much more about profitability or earnings conversion.

SNOW and PLTR show a similar pattern, though less extreme. Their revenue spreads are relatively tight, while EPS spreads are much wider. That's a useful distinction for any model using estimates as inputs.

The point isn't to decide which estimate is right. The point is to avoid treating every consensus average as if it carries the same level of agreement. The average gives the center. The spread shows how much disagreement sits around that center.

What I Would Not Overclaim

I wouldn't treat these labels as a final model.

The stock universe here is handpicked, not the full market. The cutoffs are also simple median thresholds, not a statistical confidence model. They're useful for separating the data into readable groups, but they shouldn't be treated as exact boundaries.

EPS spread also needs care. If average EPS is close to zero, the spread can become distorted, which is why I cleaned extreme EPS cases before plotting.

Most importantly, this doesn't tell us which estimate is right. A wide range doesn't automatically mean the company is bad, and a tight range does not mean the forecast will be accurate.

The useful part is more basic: the model stops pretending that every average estimate carries the same level of agreement.

Final Takeaway: Consensus Has Structure

The average estimate is still useful. I wouldn't remove it from a forecasting model.

But after looking at the low, high, average, and analyst count together, using only the average feels incomplete.

Consensus has structure. Some estimates are tight. Some are wide. Sometimes disagreement sits around revenue. Sometimes it sits around EPS. Sometimes it shows up across both.

A better forecasting workflow should preserve that structure instead of flattening it away. It doesn't need to become complicated. Even a few extra fields, like revenue spread, EPS spread, analyst count, and forecast shape, can make the estimate layer more honest.

Geopolitical Risk Isn't One Thing. I Built a Python Framework to Prove It

Nikhil Adithyan — Sat, 13 Jun 2026 06:37:23 +0000

On April 3, 2025, the US announced sweeping tariffs on Chinese imports. SPY dropped 4.8% that day. The next day, it dropped another 6%. Financial news ran the usual headline: markets rattled by geopolitical uncertainty.

Eight months earlier, on August 5, 2024, the yen carry trade unwound. SPY dropped 3% in a single session. The VIX briefly spiked above 65. Same headline: geopolitical uncertainty roils markets.

Both events got the same label. But if you actually pull the data and look at what moved, the two events have almost nothing in common. Gold surged in the tariff shock. In the yen unwind, it fell. Bonds rallied in the yen unwind. In the tariff shock, they sold off alongside equities.

Same label. Completely different markets.

To understand why, in this analysis we'll forensically pull apart three geopolitical events using Python and EODHD’s market data APIs. We'll track what moved, in what order, what the options market was pricing before spot prices moved, and what news sentiment was saying through all of it. The data tells a more specific story than the headlines did.

Prerequisites
Setup: The Asset Basket and Data Source
The Repricing Sequence Engine
Options Data and IV Skew
Composite Stress Score
News Sentiment
Event 1: Hamas Attack on Israel, Oct 7 2023
Event 2: Yen Carry Unwind, Aug 5 2024
Event 3: US-China Tariff Shock, Apr 2025
Putting It All Together: The Heatmap
Final Thoughts

Prerequisites

Before following along, you should be comfortable with basic Python and pandas. This article assumes you can read DataFrames, work with dictionaries, write simple functions, and understand basic return calculations.

You’ll also need:

Python 3.9 or later
An EODHD API key
The following Python libraries: requests, pandas, numpy, and plotly
Basic familiarity with ETFs like SPY, QQQ, GLD, TLT, and VIXY
Some understanding of returns, volatility, implied volatility, options skew, correlation, and market sentiment

You don't need to be an options expert to follow the article. The options section uses one simple idea: if out-of-the-money puts become more expensive relative to at-the-money calls, the market is paying more for downside protection. We’ll use that as a rough fear signal, not as a full options pricing model.

The goal isn't to build a perfect geopolitical risk model. The goal is to show how different market data layers can help separate one type of shock from another.

Setup: The Asset Basket and Data Source

The asset basket is built around one question: which instruments reveal the most about how a shock is being interpreted by the market?

Broad equities (SPY, QQQ, IWM) show the scale of the selloff and which market cap segments are hit hardest. Sector ETFs (XLE, XLF, ITA, XLK) show where the economic consequence is being priced. Energy, financials, defense, and tech each respond differently depending on the nature of the shock. Safe havens (GLD, TLT, UUP) are the most diagnostic: how gold, bonds, and the dollar move relative to equities tells you what kind of fear the market is expressing. VIXY, an ETF that holds short-term VIX futures, serves as a tradable proxy for implied volatility.

Together, these 11 assets produce a fingerprint for each event.

We pull data from EODHD’s historical EOD API. Each event gets a window of 30 calendar days on either side of the event date.

import requests
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

api_key = 'your_eodhd_api_key'

events = {
    'oct7_attack': {
        'date': '2023-10-07',
        'label': 'Hamas Attack on Israel (Oct 2023)',
        'shock_type': 'confidence',
        'shock_label': 'Type 1 - Confidence Shock'
    },
    'yen_carry_unwind': {
        'date': '2024-08-05',
        'label': 'Yen Carry Unwind + Middle East Escalation (Aug 2024)',
        'shock_type': 'liquidity',
        'shock_label': 'Type 2 - Liquidity Shock'
    },
    'tariff_shock': {
        'date': '2025-04-03',
        'label': 'US-China Tariff Shock (Apr 2025)',
        'shock_type': 'structural',
        'shock_label': 'Type 3 - Structural Shock'
    }
}

assets = {
    'spy': 'SPY.US', 'qqq': 'QQQ.US', 'iwm': 'IWM.US',
    'xle': 'XLE.US', 'xlf': 'XLF.US', 'ita': 'ITA.US',
    'xlk': 'XLK.US', 'gld': 'GLD.US', 'tlt': 'TLT.US',
    'uup': 'UUP.US', 'vixy': 'VIXY.US'
}

def fetch_prices(ticker, start, end):
    url = f'https://eodhd.com/api/eod/{ticker}'
    params = {
        'from': start,
        'to': end,
        'api_token': api_key,
        'fmt': 'json'
    }
    r = requests.get(url, params=params)
    df = pd.DataFrame(r.json())
    df['date'] = pd.to_datetime(df['date'])
    df = df.set_index('date')[['adjusted_close']]
    df.columns = [ticker.split('.')[0].lower()]
    return df

def fetch_event_prices(event_date, lookback=30, lookahead=30):
    start = (pd.Timestamp(event_date) - pd.Timedelta(days=lookback)).strftime('%Y-%m-%d')
    end = (pd.Timestamp(event_date) + pd.Timedelta(days=lookahead)).strftime('%Y-%m-%d')
    frames = [fetch_prices(ticker, start, end) for ticker in assets.values()]
    return pd.concat(frames, axis=1)

event_prices = {name: fetch_event_prices(e['date']) for name, e in events.items()}

event_prices.keys()

This gives us three dataframes: one per event, each with 11 columns and roughly 40 rows, since a 60-calendar-day window contains about 42 trading days.

dict_keys(['oct7_attack', 'yen_carry_unwind', 'tariff_shock'])

All prices are adjusted close, which handles any splits or dividend distortions cleanly.
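
The lookback and lookahead arguments are calendar days, so the number of price rows per event is smaller than the window length suggests. A quick way to estimate it is to count weekdays in the same window. This is a rough check rather than an exchange calendar: `pd.bdate_range` skips weekends but does not remove market holidays.

```python
import pandas as pd

event_date = pd.Timestamp('2024-08-05')
start = event_date - pd.Timedelta(days=30)
end = event_date + pd.Timedelta(days=30)

# Weekdays only; exchange holidays are not removed, so real row counts run slightly lower
weekdays = pd.bdate_range(start, end)
n_rows = len(weekdays)

print(f'calendar days: {(end - start).days}, weekdays: {n_rows}')
```

If the later steps need a fixed number of trading days on each side, slicing by row position around the anchor (as `get_event_window` does in the next section) is more reliable than widening the calendar window.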

The Repricing Sequence Engine

Before looking at each event individually, we need a consistent way to measure what happened across all of them. The repricing sequence engine does three things: normalizes all asset prices to 100 at the event date so cross-asset comparison is clean, slices a tight window around the event, and ranks assets by the size of their T+1 move to identify what repriced fastest.

def normalize_to_event(df, event_date):
    event_date = pd.Timestamp(event_date)
    valid_dates = df.index[df.index >= event_date]
    anchor = valid_dates[0]
    normalized = df.div(df.loc[anchor]) * 100
    return normalized, anchor

def get_event_window(df, anchor, t_minus=5, t_plus=10):
    start_idx = df.index.get_loc(anchor) - t_minus
    end_idx = df.index.get_loc(anchor) + t_plus
    start_idx = max(start_idx, 0)
    return df.iloc[start_idx:end_idx + 1]

def repricing_leaderboard(window_df, anchor):
    anchor_idx = window_df.index.get_loc(anchor)
    post_event = window_df.iloc[anchor_idx:]
    cumulative_returns = (post_event / post_event.iloc[0] - 1) * 100
    t1_moves = cumulative_returns.iloc[1].abs().sort_values(ascending=False)
    return cumulative_returns, t1_moves

event_windows = {}
leaderboards = {}

for name, meta in events.items():
    df = event_prices[name]
    normalized, anchor = normalize_to_event(df, meta['date'])
    window = get_event_window(normalized, anchor)
    cumret, t1_rank = repricing_leaderboard(window, anchor)
    event_windows[name] = {'window': window, 'anchor': anchor, 'cumret': cumret}
    leaderboards[name] = t1_rank
    print(f"\n{meta['label']}")
    print(f'anchor date: {anchor.date()}')
    print('T+1 move ranking:')
    print(t1_rank.round(2))

Output:

Hamas Attack on Israel (Oct 2023)
anchor date: 2023-10-09
T+1 move ranking:
vixy    3.35
iwm     1.13
xlf     0.73
ita     0.72
qqq     0.55
spy     0.52
uup     0.24
gld     0.17
xlk     0.15
tlt     0.14
xle     0.12
Name: 2023-10-10 00:00:00, dtype: float64

Yen Carry Unwind + Middle East Escalation (Aug 2024)
anchor date: 2024-08-05
T+1 move ranking:
vixy    20.52
tlt      2.24
xlf      1.62
xlk      1.36
iwm      1.09
qqq      0.96
spy      0.92
gld      0.80
xle      0.61
ita      0.57
uup      0.32
Name: 2024-08-06 00:00:00, dtype: float64

US-China Tariff Shock (Apr 2025)
anchor date: 2025-04-03
T+1 move ranking:
vixy    19.97
xle      9.20
ita      8.44
xlf      7.32
xlk      6.59
qqq      6.21
spy      5.85
iwm      4.46
gld      2.34
uup      1.11
tlt      1.09
Name: 2025-04-04 00:00:00, dtype: float64

VIXY leads all three events at T+1, which makes sense. Volatility reprices faster than anything else. But look past VIXY and the rankings diverge completely.

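In the Hamas attack, moves were small across the board. The largest non-VIXY move was IWM at 1.13%. In the yen carry unwind, TLT was the second-biggest mover at 2.24%, which puts bonds, the classic safe haven, at the center of the repricing. In the tariff shock, the equity ETFs moved roughly 4.5% to 9% while TLT moved just 1.09%, and gold came in at 2.34%. Keep in mind that the leaderboard ranks absolute moves, so it shows how much each asset repriced, not in which direction.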

Three events with three completely different repricing hierarchies. The T+1 leaderboard alone tells you something meaningful about what each market was actually pricing.
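
Because `repricing_leaderboard` applies `.abs()` before sorting, the printed ranking carries no sign. A signed variant can be useful when the direction matters, for example to tell a bond rally from a bond selloff. The sketch below is one possible approach on made-up prices; the numbers are illustrative assumptions.

```python
import pandas as pd

# Made-up prices: row 0 is the anchor day, row 1 is T+1
prices = pd.DataFrame(
    {
        'spy': [100.0, 97.0, 96.0],
        'tlt': [100.0, 102.0, 101.0],
        'gld': [100.0, 100.5, 101.5],
    },
    index=pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
)

anchor = prices.index[0]
normalized = prices.div(prices.loc[anchor]) * 100      # every asset starts at 100
cumret = (normalized / normalized.iloc[0] - 1) * 100   # cumulative % move since the anchor

t1 = cumret.iloc[1]
# Order by magnitude, but keep the sign of each move
signed_rank = t1.reindex(t1.abs().sort_values(ascending=False).index)

print(signed_rank.round(2))
```

Here SPY ranks first with a negative move and TLT second with a positive one, which is the kind of distinction the absolute ranking cannot show.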

Note on the Oct 7 anchor: the attack happened on a Saturday. The first trading day was Monday, October 9, which is why the anchor is Oct 9 rather than Oct 7. This matters for the skew analysis later.

Options Data and IV Skew

Price data tells you what happened. Options data tells you what the market was willing to pay to protect against it.

The skew metric we compute here is straightforward: the difference between the average implied volatility of OTM puts (strikes at 90% to 97% of spot) and ATM calls (97% to 103% of spot). When this number rises, the market is paying a premium for downside protection relative to upside exposure. That is fear, quantified.
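
As a toy illustration of that definition, the sketch below computes the skew for a single day from a six-contract chain. The strikes and implied volatilities are made up, and the moneyness bands are the ones described above.

```python
import pandas as pd

spot = 500.0

# Made-up single-day option chain
chain = pd.DataFrame({
    'type':       ['put', 'put', 'put', 'call', 'call', 'call'],
    'strike':     [440.0, 460.0, 480.0, 490.0, 500.0, 560.0],
    'volatility': [0.30, 0.24, 0.20, 0.17, 0.16, 0.12],
})
chain['moneyness'] = chain['strike'] / spot

# OTM puts: strikes at 90% to 97% of spot; ATM calls: 97% to 103% of spot
otm_puts = chain[(chain['type'] == 'put') & chain['moneyness'].between(0.90, 0.97)]
atm_calls = chain[(chain['type'] == 'call') & chain['moneyness'].between(0.97, 1.03)]

skew = otm_puts['volatility'].mean() - atm_calls['volatility'].mean()
print(round(skew, 4))
```

The 440 put and the 560 call fall outside their bands and are ignored. The two remaining puts average 0.22 and the two calls average 0.165, so the skew is 0.055, or 5.5 volatility points of extra premium on downside protection.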

We pull SPY options data from EODHD's options EOD endpoint, paginating through the full dataset for each event window.

def fetch_options_all(ticker, start, end, exp_cap):
    url = 'https://eodhd.com/api/mp/unicornbay/options/eod'
    all_records = []
    offset = 0
    limit = 1000
    cols = None

    while True:
        params = {
            'filter[underlying_symbol]': ticker,
            'filter[tradetime_from]': start,
            'filter[tradetime_to]': end,
            'filter[exp_date_to]': exp_cap,
            'fields[options-eod]': 'type,exp_date,strike,volatility,tradetime',
            'page[limit]': limit,
            'page[offset]': offset,
            'api_token': api_key,
            'compact': 1
        }
        r = requests.get(url, params=params)
        payload = r.json()

        if 'meta' not in payload:
            print(f'unexpected response at offset {offset}: {list(payload.keys())}')
            break

        if cols is None:
            cols = [f.strip() for f in payload['meta']['fields']]

        batch = payload['data']
        all_records.extend(batch)

        total = payload['meta']['total']
        offset += limit
        if offset >= total or not batch:
            break

    df = pd.DataFrame(all_records, columns=cols)
    df['tradetime'] = pd.to_datetime(df['tradetime'])
    df['exp_date'] = pd.to_datetime(df['exp_date'])
    df['strike'] = pd.to_numeric(df['strike'], errors='coerce')
    df['volatility'] = pd.to_numeric(df['volatility'], errors='coerce')
    return df.dropna(subset=['volatility', 'strike']).query('volatility > 0')

def compute_skew(df, spot):
    df = df.copy()
    df['moneyness'] = df['strike'] / spot

    for expiry in sorted(df['exp_date'].unique()):
        sub = df[df['exp_date'] == expiry]
        otm_puts = sub[(sub['type'] == 'put') & (sub['moneyness'].between(0.90, 0.97))]
        atm_calls = sub[(sub['type'] == 'call') & (sub['moneyness'].between(0.97, 1.03))]
        if otm_puts.empty or atm_calls.empty:
            continue

        daily_skew = []
        for date, puts in otm_puts.groupby('tradetime'):
            calls = atm_calls[atm_calls['tradetime'] == date]
            if calls.empty:
                continue
            skew = puts['volatility'].mean() - calls['volatility'].mean()
            daily_skew.append({'date': date, 'skew': skew})

        if daily_skew:
            print(f'  using expiry: {expiry.date()}, {len(daily_skew)} days')
            return pd.DataFrame(daily_skew).set_index('date').sort_index()

    return pd.DataFrame()

spy_skew = {}

for name, meta in events.items():
    anchor = event_windows[name]['anchor']
    spot = event_prices[name].loc[anchor, 'spy']
    start = (anchor - pd.Timedelta(days=20)).strftime('%Y-%m-%d')
    end = (anchor + pd.Timedelta(days=5)).strftime('%Y-%m-%d')
    exp_cap = (pd.Timestamp(end) + pd.Timedelta(days=90)).strftime('%Y-%m-%d')
    raw = fetch_options_all('SPY', start, end, exp_cap)
    print(f'\n{meta["label"]} | total rows: {len(raw)}')
    skew_df = compute_skew(raw, spot)
    spy_skew[name] = skew_df
    print(skew_df)

Output:

Hamas Attack on Israel (Oct 2023) | total rows: 10435
  using expiry: 2023-11-17, 3 days
                skew
date                
2023-10-11  0.014164
2023-10-12  0.034279
2023-10-13  0.054055
unexpected response at offset 11000: ['errors']

Yen Carry Unwind + Middle East Escalation (Aug 2024) | total rows: 10660
  using expiry: 2024-10-18, 11 days
                skew
date                
2024-07-26  0.040748
2024-07-29  0.041219
2024-07-30  0.087402
2024-07-31  0.029824
2024-08-01  0.065074
2024-08-02  0.053369
2024-08-05  0.049848
2024-08-06  0.055957
2024-08-07  0.050664
2024-08-08  0.050283
2024-08-09  0.055462
unexpected response at offset 11000: ['errors']

US-China Tariff Shock (Apr 2025) | total rows: 10698
  using expiry: 2025-06-20, 18 days
                skew
date                
2025-03-14  0.042500
2025-03-17  0.029671
2025-03-18  0.027886
2025-03-19  0.029360
2025-03-20  0.026691
2025-03-21  0.008500
2025-03-24  0.013388
2025-03-25  0.022157
2025-03-26  0.012829
2025-03-27  0.009171
2025-03-28  0.026971
2025-03-31  0.036586
2025-04-01  0.022857
2025-04-02 -0.023000
2025-04-03  0.019729
2025-04-04  0.036729
2025-04-07  0.005257
2025-04-08  0.041543

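A few observations are worth noting before the event analysis. The Oct 7 dataset has only three data points, all post-event, due to limited options coverage for that period. The tariff shock dataset has the richest pre-event coverage, going back to March 14, nearly three weeks before the event. It also includes a negative skew reading on April 2, the day before the crash. We'll look at what each of these means in context when we get to the individual events. One caveat on completeness: the "unexpected response" lines show pagination ending on an error payload at offset 11000, so these pulls were likely capped at about 11,000 raw rows rather than covering every contract in the window.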

Composite Stress Score

The skew signal alone has a weakness: it can spike for reasons unrelated to geopolitical stress. To make it more robust, we combine it with a second signal: the rolling 10-day correlation between SPY and GLD.

Under normal conditions, equities and gold are weakly correlated or negatively correlated. When stress builds, that relationship breaks down. Tracking the breakdown gives us a second, independent measure of market stress that doesn't depend on options pricing.

Both signals are z-scored before combining, so neither dominates due to scale differences. The correlation signal is inverted since falling correlation means rising stress. The composite is the average of the two.

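def build_composite(event_name, skew_df, event_prices_df, anchor):
    # event_name and anchor are not used in the calculation;
    # they are kept so the calling loop below stays unchanged
    prices = event_prices_df[['spy', 'gld']].copy()
    # 10-day rolling SPY/GLD correlation (the first 9 rows are NaN)
    prices['corr'] = prices['spy'].rolling(10).corr(prices['gld'])

    def zscore(s):
        # standardize against the mean and standard deviation of this event window
        return (s - s.mean()) / s.std()

    skew_z = zscore(skew_df['skew'])
    corr_z = zscore(prices['corr'].dropna())

    # invert so that falling correlation reads as rising stress
    corr_z = corr_z * -1

    # align both signals on date and keep only days where both exist
    combined = pd.concat([skew_z.rename('skew_z'), corr_z.rename('corr_z')], axis=1).dropna()
    combined['composite'] = combined.mean(axis=1)

    # flag days where the averaged z-score exceeds 1.0
    combined['stress_flag'] = combined['composite'] > 1.0

    return combined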

composites = {}

for name, meta in events.items():
    anchor = event_windows[name]['anchor']
    skew_df = spy_skew[name]
    prices_df = event_prices[name]
    comp = build_composite(name, skew_df, prices_df, anchor)
    composites[name] = comp
    print(f"\n{meta['label']}")
    print(comp.round(3))

Output:

Hamas Attack on Israel (Oct 2023)
            skew_z  corr_z  composite  stress_flag
date                                              
2023-10-11  -1.003  -1.186     -1.094        False
2023-10-12   0.006  -1.316     -0.655        False
2023-10-13   0.997  -0.971      0.013        False

Yen Carry Unwind + Middle East Escalation (Aug 2024)
            skew_z  corr_z  composite  stress_flag
date                                              
2024-07-26  -0.808  -0.863     -0.835        False
2024-07-29  -0.776  -1.074     -0.925        False
2024-07-30   2.343  -0.559      0.892        False
2024-07-31  -1.546  -0.082     -0.814        False
2024-08-01   0.835   0.933      0.884        False
2024-08-02   0.044   2.117      1.081         True
2024-08-05  -0.194   1.977      0.892        False
2024-08-06   0.219   1.525      0.872        False
2024-08-07  -0.138   1.170      0.516        False
2024-08-08  -0.164   0.881      0.358        False
2024-08-09   0.186   0.371      0.278        False

US-China Tariff Shock (Apr 2025)
            skew_z  corr_z  composite  stress_flag
date                                              
2025-03-17   0.511   0.516      0.513        False
2025-03-18   0.398   0.493      0.445        False
2025-03-19   0.491   0.154      0.323        False
2025-03-20   0.322  -0.209      0.057        False
2025-03-21  -0.830  -1.023     -0.926        False
2025-03-24  -0.520  -0.999     -0.759        False
2025-03-25   0.035  -0.777     -0.371        False
2025-03-26  -0.556  -0.566     -0.561        False
2025-03-27  -0.787   0.096     -0.346        False
2025-03-28   0.340   1.093      0.716        False
2025-03-31   0.949   1.179      1.064         True
2025-04-01   0.080   1.309      0.694        False
2025-04-02  -2.824   1.190     -0.817        False
2025-04-03  -0.119   1.047      0.464        False
2025-04-04   0.958   0.119      0.539        False
2025-04-07  -1.035  -0.794     -0.915        False
2025-04-08   1.263  -1.274     -0.006        False

The stress flag threshold is set at 1.0. Two days get flagged across all three events: August 2, 2024, for the yen carry unwind, and March 31, 2025, for the tariff shock. Both are pre-event. The Oct 7 dataset is too sparse to produce a meaningful composite reading.
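To make the averaging concrete, here is a small check that recomputes the composite by hand for four rows of the output above. The z-scores are copied from the tables (the correlation z-score is already inverted there); the dictionary layout and variable names are only for illustration.

```python
# (skew_z, corr_z) pairs copied from the output tables above
rows = {
    '2024-07-30': (2.343, -0.559),
    '2024-08-02': (0.044, 2.117),
    '2025-03-31': (0.949, 1.179),
    '2025-04-02': (-2.824, 1.190),
}
THRESHOLD = 1.0  # same cut-off used for stress_flag

results = {}
for day, (skew_z, corr_z) in rows.items():
    composite = (skew_z + corr_z) / 2      # simple average of the two z-scores
    flagged = composite > THRESHOLD
    results[day] = (composite, flagged)
    print(day, round(composite, 3), flagged)
```

The July 30 row shows the trade-off of averaging: a skew z-score above 2 is not enough to trip the flag when the correlation component is still below its window average.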

The Apr 2 row in the tariff shock is worth noting: skew_z of -2.824, the most negative skew reading in the entire dataset, pulling the composite negative despite the correlation signal remaining elevated. The options market was actively pricing more upside than downside on the day before the largest single-day SPY drop of 2025. That isn't a signal failure to brush past. We'll come back to it.

News Sentiment

The final data layer is news sentiment. EODHD's sentiment API generates a daily normalized score for each ticker derived from financial news coverage, ranging from -1 (strongly negative) to +1 (strongly positive). We pull SPY sentiment as a broad market proxy for the same windows used in the options analysis.

def fetch_sentiment(ticker, start, end):
    url = 'https://eodhd.com/api/sentiments'
    params = {
        's': ticker,
        'from': start,
        'to': end,
        'api_token': api_key,
        'fmt': 'json'
    }
    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    data = r.json()
    # the response is keyed by ticker, sometimes with the exchange suffix
    key = ticker if ticker in data else ticker + '.US'
    if key not in data:
        return pd.DataFrame()  # no sentiment coverage for this ticker and window
    df = pd.DataFrame(data[key])
    df['date'] = pd.to_datetime(df['date'])
    df = df.set_index('date')[['normalized']].rename(columns={'normalized': 'sentiment'})
    return df.sort_index()

event_sentiment = {}

for name, meta in events.items():
    anchor = event_windows[name]['anchor']
    start = (anchor - pd.Timedelta(days=20)).strftime('%Y-%m-%d')
    end = (anchor + pd.Timedelta(days=10)).strftime('%Y-%m-%d')
    sent_df = fetch_sentiment('SPY', start, end)
    event_sentiment[name] = sent_df
    print(f"\n{meta['label']}")
    print(sent_df)

Output:

Hamas Attack on Israel (Oct 2023)
            sentiment
date                 
2023-09-25      0.997
2023-09-26      0.986

Yen Carry Unwind + Middle East Escalation (Aug 2024)
            sentiment
date                 
2024-07-17     0.9340
2024-07-22     0.9460
2024-07-23     0.9550
2024-07-25     0.9925
2024-07-26     0.9860
2024-07-29     0.9850
2024-07-30     0.9630
2024-07-31     0.9950
2024-08-02     0.3350
2024-08-05     0.9780
2024-08-06     0.3603
2024-08-15     0.9980

US-China Tariff Shock (Apr 2025)
            sentiment
date                 
2025-03-14    -0.9890
2025-03-15     0.9930
2025-03-17    -0.7010
2025-03-18     0.9990
2025-03-20    -0.8900
2025-03-22     0.9950
2025-03-24     0.9600
2025-03-27     0.9830
2025-03-28     0.9917
2025-04-03     0.9365
2025-04-05     0.0130
2025-04-06     0.9990
2025-04-07     0.9870
2025-04-09     0.5460
2025-04-10     0.8079
2025-04-11     0.0929
2025-04-12    -0.9920
2025-04-13     0.0130

Two things stand out immediately. For the yen carry unwind, sentiment ranged between 0.934 and 0.995 from July 17 through July 31 while skew was already spiking on July 30 and the composite was building. Sentiment did not register the stress the options market was pricing. For the tariff shock, sentiment on April 3, the day SPY dropped 4.8%, was +0.9365. Strongly positive. The news cycle had no idea what was coming.

The October 7 sentiment data has only two data points from late September, both near +1.0. This predates the event by nearly two weeks and tells us nothing about market sentiment around the attack itself. Coverage is too thin for this event to contribute to the sentiment analysis.
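One way to make "coverage is too thin" measurable is to compare the number of sentiment readings with the number of business days in the requested window. The sketch below is an illustration, not part of the original pipeline: `sentiment_coverage` and the `MIN_COVERAGE` cut-off are hypothetical, and `pd.bdate_range` counts weekdays, so market holidays are not excluded.

```python
import pandas as pd

MIN_COVERAGE = 0.5  # hypothetical cut-off for "enough data to analyze"

def sentiment_coverage(sent_df, start, end):
    """Share of weekdays in [start, end] that have a sentiment reading."""
    expected = pd.bdate_range(start, end)
    if len(expected) == 0:
        return 0.0
    observed = sent_df.index.normalize().unique()
    return len(expected.intersection(observed)) / len(expected)

# The two October 2023 readings from the output above
oct_sent = pd.DataFrame(
    {'sentiment': [0.997, 0.986]},
    index=pd.to_datetime(['2023-09-25', '2023-09-26']),
)

# Window used in the fetch loop: anchor (2023-10-09) minus 20 days to plus 10 days
coverage = sentiment_coverage(oct_sent, '2023-09-19', '2023-10-19')
print(f'coverage: {coverage:.0%}, usable: {coverage >= MIN_COVERAGE}')
```

A check like this can run right after `fetch_sentiment` so that sparse events are excluded from the sentiment comparison automatically rather than by eye.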

Event 1: Hamas Attack on Israel, Oct 7 2023

The Hamas attack on October 7, 2023, was a major geopolitical shock. The market's response was not.

SPY closed up 0.64% on October 9 relative to the October 6 close. The anchor is Monday, October 9, because the attack happened on a Saturday. GLD and TLT both rallied. VIXY spiked to a T+1 move of 3.35%, modest compared to the 20% readings in the other two events. Within two weeks, most assets had drifted back toward pre-event levels.
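The anchor rule described above (use the first trading day on or after the event) can be sketched as follows. `anchor_date` is a hypothetical helper, and the weekday calendar built with `pd.bdate_range` is a stand-in; in practice the index of the downloaded price data would be used so that market holidays are skipped as well.

```python
import pandas as pd

def anchor_date(event_date, trading_days):
    """Return the first trading day on or after event_date."""
    event_date = pd.Timestamp(event_date)
    pos = trading_days.searchsorted(event_date, side='left')
    if pos >= len(trading_days):
        raise ValueError('event_date is after the last available trading day')
    return trading_days[pos]

# Stand-in calendar: weekdays only (a real price index would also drop holidays)
trading_days = pd.bdate_range('2023-10-02', '2023-10-20')

anchor = anchor_date('2023-10-07', trading_days)  # the attack fell on a Saturday
print(anchor.date())
```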

The market's interpretation was specific: this was a regional conflict with limited direct economic transmission. Israel is not a major oil supplier, not a critical trade partner, and not deeply embedded in global supply chains in a way that would reprice earnings expectations. The uncertainty was real. The economic consequence was not.

That distinction shows up clearly in the safe haven behavior. GLD and TLT both up, UUP flat, equities essentially unchanged. When gold and bonds rally together while equities hold, the market is expressing classic flight-to-safety. Money moved into defensive assets as insurance against uncertainty, not as a response to any fundamental repricing.

The skew data for this event is limited to three post-event days: October 11, 12, and 13. Skew climbed steadily from 0.014 to 0.054 over those three days, consistent with the market pricing in ongoing uncertainty in the days following the attack.

But because the attack happened on a weekend and EODHD's options coverage for this period is thin, there is no pre-event skew data. We can't say whether the options market anticipated this event.

The composite is similarly sparse. Three data points, none flagged. There isn't enough data here to draw conclusions about early warning signals.

This is the weakest case study analytically. It stays in the analysis because the repricing fingerprint is informative and the contrast with the other two events is stark. The small moves, the clean flight-to-safety pattern, and the rapid recovery point to a specific kind of event: one where the market prices fear without pricing economic damage. That's a meaningful category even if the options data can't say more about it.

Event 2: Yen Carry Unwind, Aug 5 2024

The August 2024 event is the most analytically rich of the three. It's also the one where the data most clearly supports the idea that structured market signals were pricing stress before the crash arrived.

The repricing sequence tells an immediate story. VIXY exploded to a T+1 move of 20.52%. TLT was the second biggest mover at 2.24%, bid hard as a safe haven. Equities sold off across the board.

This is what a liquidity shock looks like. The Bank of Japan raised rates unexpectedly on July 31, triggering a massive unwind of yen carry trades.

The selling wasn't driven by a change in economic fundamentals. It was driven by positioning. Traders who had borrowed cheaply in yen to buy higher-yielding assets were forced to sell those assets simultaneously to cover their positions. The correlation between assets broke down because everything was being sold for the same mechanical reason.

Now look at what the skew data was doing before any of this:

On July 30, six days before the crash, skew spiked to 0.087. The highest reading in the entire pre-event window by a significant margin. It then compressed on July 31 before rising again on August 1 and 2. The crash hit on August 5.

That July 30 spike is the most important data point in this analysis. The BOJ rate decision that triggered the unwind came on July 31. The options market was pricing elevated downside risk the day before the trigger event, not after it. Someone, or more likely many someones, was paying up for SPY put protection before the news was public.

Now look at what sentiment was doing over the same period:

From July 17 through July 31, sentiment held between 0.934 and 0.995. Near maximum bullishness, every single day. On July 30, the same day skew spiked to 0.087, sentiment was 0.963. The news cycle was not concerned. The options market was.

Sentiment finally dropped to 0.335 on August 2, three days after the skew spike and three days before the crash. By that point, the options market had already been signaling stress for three days.
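The lead times quoted above can be reproduced from the printed outputs. In this sketch the readings are copied from the skew and sentiment tables, while the two cut-offs (`SKEW_SPIKE` and `SENTIMENT_DROP`) are hypothetical values chosen only to pick out the July 30 spike and the August 2 drop.

```python
from datetime import date

# Readings copied from the skew and sentiment outputs above
skew = {
    date(2024, 7, 26): 0.040748, date(2024, 7, 29): 0.041219,
    date(2024, 7, 30): 0.087402, date(2024, 7, 31): 0.029824,
    date(2024, 8, 1): 0.065074, date(2024, 8, 2): 0.053369,
}
sentiment = {
    date(2024, 7, 26): 0.986, date(2024, 7, 29): 0.985,
    date(2024, 7, 30): 0.963, date(2024, 7, 31): 0.995,
    date(2024, 8, 2): 0.335,
}
crash = date(2024, 8, 5)

SKEW_SPIKE = 0.08      # hypothetical threshold for a skew spike
SENTIMENT_DROP = 0.5   # hypothetical threshold for a sentiment drop

first_skew = min(d for d, v in skew.items() if v >= SKEW_SPIKE)
first_sent = min(d for d, v in sentiment.items() if v <= SENTIMENT_DROP)

skew_lead = (crash - first_skew).days   # calendar days of warning from skew
sent_lead = (crash - first_sent).days   # calendar days of warning from sentiment
print(first_skew, skew_lead, first_sent, sent_lead)
```

These are calendar days. Measured in trading days, the gaps are shorter because a weekend sits between August 2 and August 5.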

The composite flagged August 2 as a stress day, driven primarily by the correlation breakdown signal. The SPY/GLD rolling correlation had been deteriorating since late July as gold started decoupling from equities. The composite didn't catch the July 30 skew spike cleanly because the skew signal compressed the day after, pulling the z-score back down. But the combination of a spiking skew on July 30 and a flagged composite on August 2 gave a two-stage warning before the August 5 crash.

The yen carry unwind is the clearest case in this analysis for the thesis that structured market signals carry information that news sentiment does not. The options market wasn't prescient. But it was pricing something that the headlines weren't.

Event 3: US-China Tariff Shock, Apr 2025

The April 2025 tariff shock is the most interesting event in this analysis, not because the signals worked, but because of where they failed.

The numbers are severe. SPY dropped 5.85% at T+1 and continued falling through T+3. Every equity sector fell by roughly 4% to 9%. XLE led at 9.20%, reflecting the direct exposure of energy and trade-dependent sectors to tariff policy. ITA followed at 8.44%. Tech dropped 6.59%.

These aren't volatility moves. They're repricing moves, the market adjusting its estimate of what these companies are actually worth under a structurally different trade regime.

The safe haven behavior is the most diagnostic part of this chart. GLD rose 2.34% at T+1 and kept climbing in the days that followed. TLT moved only 1.09% at T+1 and then sold off. Bonds and equities fell together. There was no flight to bonds. The only clean safe haven was gold.

This is what distinguishes a structural shock from the other two event types. In a confidence shock, both gold and bonds rally. In a liquidity shock, bonds rally hard. In a structural shock, bonds offer no protection because the shock itself calls into question the fiscal and monetary outlook. Gold becomes the only asset without a counterparty.
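The three-way distinction above can be written down as a simple rule. This is a sketch under stated assumptions, not the method used in the analysis: `classify_shock` and the `equity_floor` cut-off are hypothetical, and the inputs in the example calls are made-up percentage returns rather than figures from the events.

```python
def classify_shock(equity_ret, gold_ret, bond_ret, equity_floor=-2.0):
    """Map post-event percentage returns to one of the three shock types.

    equity_floor is a hypothetical cut-off: equities above it count as 'held'.
    """
    equities_held = equity_ret > equity_floor
    if equities_held and gold_ret > 0 and bond_ret > 0:
        return 'confidence'    # gold and bonds rally, equities hold
    if not equities_held and bond_ret > 0:
        return 'liquidity'     # equities sell off, bonds rally
    if not equities_held and bond_ret <= 0 and gold_ret > 0:
        return 'structural'    # bonds give no protection, gold is the only hedge
    return 'unclassified'

# Illustrative inputs only (equity, gold, bond returns in percent)
print(classify_shock(0.6, 1.0, 0.8))
print(classify_shock(-3.0, 0.5, 2.2))
print(classify_shock(-8.0, 3.0, -1.5))
```

The measurement horizon matters. In the tariff shock TLT was still up 1.09% at T+1 before selling off, so a rule like this would need returns measured over several days rather than a single session to separate a structural shock from a liquidity shock.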

This is where the analysis gets genuinely uncomfortable.

On April 2, 2025, the day before the crash, skew was -0.023. Negative. ATM calls were more expensive than OTM puts. The options market wasn't pricing downside risk. It was pricing upside.

Skew had been elevated through mid-March, ranging from roughly 0.027 to 0.042, then compressed through late March. By the time the tariff announcement hit, the options market had actively de-risked its fear positioning.

There are two plausible explanations. The first is that the market had been pricing tariff risk as a negotiating tactic throughout March, then concluded by early April that a deal was likely. The negative skew on April 2 reflects collective confidence that the announced tariffs wouldn't materialize at full scale.

The second is that the options market simply didn't have the information. The tariff announcement on April 2 was more severe and more immediate than most participants expected.

Either way, the options market failed as an early warning signal here. This isn't a flaw in the methodology. It's a finding. Skew measures what market participants are willing to pay for protection. If participants have collectively decided a risk isn't worth pricing, skew won't warn you. That decision can be wrong.

The composite flagged March 31 as a stress day, three days before the crash. The correlation breakdown component was the larger contributor (corr_z of 1.179), although skew was also above its window average that day (skew_z of 0.949). The SPY/GLD rolling correlation had been deteriorating through late March as gold climbed while equities softened. The composite picked up that decoupling even though skew had been compressed for most of late March.

On April 2, the composite dropped sharply to -0.817. The skew component had turned strongly negative, overwhelming the still-elevated correlation signal and flipping the composite well below zero. The composite effectively said no stress, just before the largest single-day SPY drop of 2025.

The tariff shock exposes a real limitation of any signal built on options pricing. When the market has collectively mispriced a risk, the signal will reflect that mispricing. The correlation breakdown component performed better here, but one signal out of two isn't a reliable composite.

Putting It All Together: The Heatmap

The individual event analyses show three different stories. The heatmap puts them side by side so the differences are visible in one place.

fig = make_subplots(rows=1, cols=3,
    subplot_titles=[e['label'] for e in events.values()],
    horizontal_spacing=0.08)

for i, (name, meta) in enumerate(events.items()):
    window = event_windows[name]['window']
    anchor = event_windows[name]['anchor']
    anchor_idx = window.index.get_loc(anchor)

    start_i = max(anchor_idx - 3, 0)
    end_i = min(anchor_idx + 8, len(window))
    slice_df = window.iloc[start_i:end_i].copy()
    slice_df.columns = [c.upper() for c in slice_df.columns]

    anchor_pos = anchor_idx - start_i
    anchor_vals = slice_df.iloc[anchor_pos]
    pct_df = ((slice_df - anchor_vals) / anchor_vals * 100).round(2)

    n_days = len(pct_df)
    t_labels = [f'T{d:+d}' for d in range(-anchor_pos, -anchor_pos + n_days)]

    fig.add_trace(go.Heatmap(
        z=pct_df.values.T,
        x=t_labels,
        y=list(pct_df.columns),
        colorscale='RdYlGn',
        zmid=0,
        zmin=-15,
        zmax=15,
        showscale=(i == 2),
        colorbar=dict(title='% return from T0')
    ), row=1, col=i+1)

fig.update_layout(
    title='Asset Return Heatmap - T-3 to T+7 across Events',
    template='plotly_dark',
    height=500
)

for annotation in fig['layout']['annotations']:
    annotation['font'] = dict(size=11)
    annotation['y'] = 1.02
    
fig.show()

Three panels, one per event, each showing percentage returns relative to the event date from T-3 to T+7. Green means the asset gained relative to T0. Red means it lost. The color scale is capped at plus or minus 15%, so the tariff shock’s extreme moves don't wash out the smaller Oct 7 moves.

The VIXY row tells different stories depending on the event. In the Hamas attack and tariff shock, it spikes green post-event as volatility surged above its T0 level. In the yen carry unwind, the row is deep red throughout, not because volatility didn't spike but because VIXY was already at its highest point on August 5, the anchor date, making everything relative to T0 look flat or negative.
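The effect of rebasing to a peak is easy to see on a toy series. The closes below are made up and only mimic a volatility product that tops out on the anchor day; the rebasing formula is the same one used in the heatmap code.

```python
import pandas as pd

# Made-up closes with the peak on the anchor day (T+0)
closes = pd.Series(
    [20.0, 21.0, 24.0, 30.0, 27.0, 25.0, 24.0],
    index=['T-3', 'T-2', 'T-1', 'T+0', 'T+1', 'T+2', 'T+3'],
)

# Same rebasing as the heatmap: percentage change relative to the T+0 value
pct_from_t0 = ((closes - closes['T+0']) / closes['T+0'] * 100).round(2)
print(pct_from_t0)
```

Every value other than T+0 comes out negative, including the pre-event days, even though the series rose 50% from T-3 into the anchor. That is why a row can be red throughout while the underlying asset spiked.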

Look at the GLD row. In the Hamas attack, it stays near neutral, a minimal safe haven response. In the yen carry unwind, it turns green post-event as forced selling cleared and gold recovered. In the tariff shock, it turns deeply green and stays there, the strongest and most sustained move of any asset across the three events.

The TLT row shows the starkest contrast. Near neutral in the Hamas attack, clearly green in the yen carry unwind as bonds rallied hard, and near neutral to slightly negative in the tariff shock. Bonds were a reliable safe haven in one event and offered almost nothing in the other two.

The equity rows tell the scale story. In the Hamas attack, the colors are pale, with small moves in both directions. In the yen carry unwind, they're moderately red before recovering to green. In the tariff shock, they are deep red across every sector from T0 through T+3, the kind of uniform selloff that happens when the market is repricing fundamentals, not just pricing fear.

This is what the taxonomy looks like in data form. Three events, three fingerprints, and three different markets responding to three different things that all got filed under the same label.

Final Thoughts

The three events in this analysis all got the same label. But the data gave them three different ones.

A confidence shock prices fear without pricing economic damage. Gold and bonds rally, equities hold, recovery is faster than it feels.

A liquidity shock is mechanical: everything sells off because positioning unwinds, not because fundamentals changed.

A structural shock reprices what companies are actually worth under a different economic regime. Bonds offer no protection. Gold is the only clean hedge. Recovery timeline is unknown.

The IV skew and correlation composite built here using EODHD’s historical and options data worked cleanly for one event, partially for another, and failed for the third. That's not a reason to dismiss the signals. It's a reason to understand what they measure. Skew reflects what participants are paying for downside protection. When the market has collectively decided a risk isn't worth pricing, skew goes quiet. That silence isn't safety.

The most useful output of this framework isn't a signal. It's a question: what kind of shock is this? The answer changes everything that follows.

How to Choose the Best Stock Market API for FinTech Projects and AI Agents

Nikhil Adithyan — Sun, 07 Jun 2026 00:22:29 +0000

Choosing a stock API looks simple until the project becomes real.

At first, you only need a few prices. You send a request, get JSON back, load it into pandas, and move on. But the moment that API starts powering a backtester, dashboard, screener, valuation tool, or AI assistant, the decision becomes much more serious.

A backtester needs adjusted historical prices, splits, dividends, and stable time series. A dashboard needs fresh quotes, clean fields, and reliable responses. A stock screener needs fundamentals, ratios, and company metadata. An AI agent needs structured data that it can retrieve and use without guessing.

That's why I wouldn't start by comparing endpoint counts or pricing pages. Those matter, but they're not the first question.

The first question is: what are you building?

In this article, we’ll walk through how to choose a stock market API based on the workflow it needs to support. Then we’ll build a practical stock research workflow in Python using Alpha Vantage to see how prices, fundamentals, technical indicators, and AI-ready access can fit together in one project.

Why Stock API Choice Depends On The Workflow
What A Modern Stock Market Data Workflow Actually Requires
Building A Practical Stock Research Workflow With Alpha Vantage
Where Each Provider Fits In The Stock API Workflow
Provider Breakdown Through A Workflow Lens
Final Checklist Before Choosing A Stock API
Final Thoughts

Why Stock API Choice Depends On The Workflow

A stock API should be judged by the workflow it supports, not by how long its feature list looks. The same provider can be a good fit for one project and a weak fit for another.

A clean historical dataset matters more for a backtester than a live quote endpoint. A dashboard has different problems. It needs fresh responses, predictable fields, and rate limits that don't collapse once users start refreshing the page.

Here is how I would think about it.

1. If You Are Building A Backtester

Start with historical data quality.

A backtest needs adjusted prices, splits, dividends, long history, and stable time series. If those pieces are wrong, the backtest can still run, but the results may be misleading.

For this workflow, real-time data is usually secondary. Clean historical data matters more than fast quotes.
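A small example shows how an unadjusted split can mislead a backtest. The prices and the 2-for-1 split below are made up, and the adjustment handles splits only (dividends are ignored); it is a sketch of the idea, not any provider's adjustment method.

```python
import pandas as pd

# Made-up closes around a hypothetical 2-for-1 split that takes effect on day 2
raw_close = pd.Series([100.0, 102.0, 51.5, 52.0])
split_coefficient = pd.Series([1.0, 1.0, 2.0, 1.0])

# For each day, the product of split factors on that day and every later day
factor_from_day = split_coefficient.iloc[::-1].cumprod().iloc[::-1]
# Prices before a split are divided by all splits that come after them
future_factor = factor_from_day.shift(-1).fillna(1.0)
adjusted_close = raw_close / future_factor

raw_ret = raw_close.pct_change()
adj_ret = adjusted_close.pct_change()
print(pd.DataFrame({'raw_ret': raw_ret, 'adj_ret': adj_ret}).round(4))
```

On the split day the raw series shows a drop of about 50% that never happened economically, while the adjusted series shows a gain of about 1%. A strategy tested on the raw series would treat that day as a crash.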

2. If You Are Building A Dashboard

Start with freshness and reliability.

A dashboard needs quote data that updates consistently, fields that don't change unexpectedly, and rate limits that can handle repeated requests. A failed request in a notebook is annoying. A failed request in a user-facing dashboard is a product problem.
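For a user-facing dashboard, one common pattern is to retry with exponential backoff and to treat throttling payloads as failures rather than data. The sketch below is an assumption-laden illustration: `fetch_with_retry` is a hypothetical helper, and the keys in `PROBLEM_KEYS` are the ones Alpha Vantage is assumed to use for rate-limit and error messages, so check them against the responses you actually receive.

```python
import time

# Keys assumed to signal a throttled or failed response rather than data
PROBLEM_KEYS = ('Note', 'Information', 'Error Message')

def fetch_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a usable payload or retries run out."""
    last_problem = None
    for attempt in range(retries):
        try:
            payload = fetch()
        except Exception as exc:          # network errors, invalid JSON, etc.
            last_problem = repr(exc)
        else:
            problems = [k for k in PROBLEM_KEYS if k in payload]
            if not problems:
                return payload
            last_problem = f'{problems[0]}: {payload[problems[0]]}'
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)   # wait 1s, 2s, 4s, ...
    raise RuntimeError(f'no usable response after {retries} attempts ({last_problem})')

# Demonstration with canned responses instead of a live request
canned = iter([{'Note': 'rate limit reached'}, {'Time Series (Daily)': {}}])
data = fetch_with_retry(lambda: next(canned), sleep=lambda seconds: None)
print(list(data))
```

With a real endpoint, the callable would wrap the HTTP request, for example `lambda: requests.get(url, timeout=10).json()`.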

You also need to check whether the data can be displayed to users. Licensing becomes part of the workflow once the dashboard is public.

3. If You Are Building A Stock Screener

Start with fundamentals and structured fields.

A screener needs more than prices. It may need ratios, company profiles, sector data, market cap, earnings, and symbol coverage across many companies.

The hard part is comparison. If fields are inconsistent across tickers, the screener becomes a cleanup project before it becomes a useful tool.

4. If You Are Building A Valuation Or Research Tool

Start with financial statements.

A valuation workflow usually needs income statements, balance sheets, cash flow statements, earnings history, and historical fundamentals. Price data gives market context, but the business data does the heavier work.

This is where depth matters. The latest numbers are useful, but trends across multiple periods are often more important.

5. If You Are Building An AI Assistant Or Agent

Start with structure.

An AI agent shouldn't guess financial data from memory. It needs predictable API responses, clear schemas, and tool access it can use reliably.

This is where MCP-style workflows matter. If an agent can call a tool, retrieve a quote, pull fundamentals, or fetch a time series cleanly, the API becomes part of the agent’s reasoning loop.

The practical point is simple: choose the API around the system you're building. Once the workflow is clear, the rest of the decision becomes much easier.

What A Modern Stock Market Data Workflow Actually Requires

A modern stock data workflow is rarely just one API call.

You might start with market data, but most useful projects eventually need more layers. A research dashboard may need fundamentals. A screener may need technical indicators. An AI assistant may need structured responses that it can retrieve through a tool.

A simple way to think about the workflow is:

Market Data -> Fundamentals -> Indicators -> Structured Responses -> Programmatic Workflow -> AI/Agent Access

Each layer solves a different problem.

Market data gives you prices, volume, returns, and historical movement.
Fundamentals add business context through revenue, margins, cash flow, earnings, and company details.
Indicators help convert raw prices into features that can support screening, research, or signal testing.
Structured responses make the data easier to parse, join, and reuse.
Programmatic workflows turn the raw API response into tables, charts, models, dashboards, or research outputs.
AI or agent access lets an assistant call tools, retrieve current data, and work with structured financial context instead of relying only on static knowledge.

This is why stock API choice matters beyond the first request. The API is not only there to return data but to support the way the project grows after the prototype.

Building A Practical Stock Research Workflow With Alpha Vantage

Now let’s turn the framework into something practical.

For this section, we’ll use Alpha Vantage as the implementation API because it gives us the main layers we need for this workflow: adjusted historical prices, company data, technical indicators, and MCP-style access for AI agents.

The goal isn't to test every endpoint. The goal is to build a small research workflow that shows what a useful stock API should help us do.

We’ll build this in five steps:

Fetch adjusted historical prices.
Add company or fundamental data.
Add a technical indicator.
Combine everything into a research-ready table.
Connect the workflow to an AI-agent setup using MCP.

By the end, we should have a simple but practical stock research table that can support a screener, dashboard, research notebook, or AI assistant.

Step 1: Fetch Adjusted Historical Prices

Adjusted prices are the first thing I would check for any research or backtesting workflow. Raw prices can break around stock splits or dividends, while adjusted prices keep the series more useful for return calculations.

Let’s fetch daily adjusted price data for Apple.

import requests
import pandas as pd

api_key = 'YOUR ALPHA VANTAGE API KEY'

symbol = 'AAPL'

url = f'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol={symbol}&outputsize=compact&apikey={api_key}'

response = requests.get(url)
data = response.json()

prices = pd.DataFrame(data['Time Series (Daily)']).T

prices.index = pd.to_datetime(prices.index)
prices = prices.sort_index()

prices = prices.rename(columns={
    '1. open': 'open',
    '2. high': 'high',
    '3. low': 'low',
    '4. close': 'close',
    '5. adjusted close': 'adjusted_close',
    '6. volume': 'volume',
    '7. dividend amount': 'dividend',
    '8. split coefficient': 'split'
})

price_cols = ['open', 'high', 'low', 'close', 'adjusted_close', 'volume', 'dividend', 'split']
prices[price_cols] = prices[price_cols].astype(float)

prices.tail()

The output gives us a clean daily price table as you can see in the image below:

For a chart, you may only need close. For research or backtesting, I would usually work with adjusted_close because it handles corporate actions more safely. Next, we can convert the time series into a few basic price features.

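# most recent adjusted close
latest_price = prices['adjusted_close'].iloc[-1]
# return over the last 30 rows, i.e. 30 trading days
return_30d = prices['adjusted_close'].pct_change(30).iloc[-1]
# standard deviation of daily returns over the last 30 trading days (not annualized)
volatility_30d = prices['adjusted_close'].pct_change().tail(30).std()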

price_features = {'symbol': symbol, 'latest_price': latest_price, 'return_30d': return_30d, 'volatility_30d': volatility_30d}
price_features

This returns:

{'symbol': 'AAPL',
 'latest_price': 312.06,
 'return_30d': 0.18583097277442007,
 'volatility_30d': 0.012845143800989936}

This is already more useful than a raw API response. We now have a small set of price features that can feed a dashboard, screener, research table, or AI-assisted stock analysis workflow.

Step 2: Add Company Or Fundamental Data

Price data tells us how the stock moved, but it doesn't tell us much about the company behind the ticker. For a screener, valuation tool, or research workflow, we need some business context too.

Alpha Vantage’s OVERVIEW endpoint gives company-level fields like sector, industry, market cap, PE ratio, EPS, profit margin, and other summary metrics. Let’s pull those fields and keep only the ones we need for this workflow.

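overview_url = f'https://www.alphavantage.co/query?function=OVERVIEW&symbol={symbol}&apikey={api_key}'

response = requests.get(overview_url)
overview = response.json()

# OVERVIEW returns its values as strings, so convert the numeric fields explicitly
def to_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None  # missing or non-numeric value such as 'None' or '-'

fundamental_features = {
    'symbol': symbol,
    'name': overview.get('Name'),
    'sector': overview.get('Sector'),
    'industry': overview.get('Industry'),
    'market_cap': to_float(overview.get('MarketCapitalization')),
    'pe_ratio': to_float(overview.get('PERatio')),
    'eps': to_float(overview.get('EPS')),
    'profit_margin': to_float(overview.get('ProfitMargin')),
    'beta': to_float(overview.get('Beta'))
}

fundamental_features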

This returns:

{'symbol': 'AAPL',
 'name': 'Apple Inc',
 'sector': 'TECHNOLOGY',
 'industry': 'CONSUMER ELECTRONICS',
 'market_cap': 4583336182000.0,
 'pe_ratio': 37.73,
 'eps': 8.27,
 'profit_margin': 0.272,
 'beta': 1.065}

Now we have two layers: price behavior from the time series data and business context from the company overview. The next step is to add a technical indicator so the table includes a market-derived signal as well.

Step 3: Add Technical Indicators

Fundamentals give us business context, but many research workflows also need market-derived signals. A simple example is the relative strength index, or RSI, which is often used to measure recent momentum.

Alpha Vantage has an RSI endpoint, so we can pull the indicator directly instead of calculating it from scratch.

rsi_url = f'https://www.alphavantage.co/query?function=RSI&symbol={symbol}&interval=daily&time_period=14&series_type=close&apikey={api_key}'

response = requests.get(rsi_url)
rsi_data = response.json()

rsi = pd.DataFrame(rsi_data['Technical Analysis: RSI']).T

rsi.index = pd.to_datetime(rsi.index)
rsi = rsi.sort_index()
rsi['RSI'] = rsi['RSI'].astype(float)

latest_rsi = rsi['RSI'].iloc[-1]

indicator_features = {
    'symbol': symbol,
    'rsi_14': latest_rsi
}

indicator_features

This returns:

{'symbol': 'AAPL', 'rsi_14': 79.0043}
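If you want to sanity-check the API value, RSI can also be computed locally from the adjusted closes. The function below is a minimal sketch using Wilder-style exponential smoothing through pandas `ewm`; `rsi_wilder` is a hypothetical helper, and the provider's exact seeding and smoothing are not confirmed here, so small differences from the endpoint are to be expected.

```python
import pandas as pd

def rsi_wilder(close, period=14):
    """RSI with Wilder-style smoothing (exponential average, alpha = 1 / period)."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # positive moves, zero otherwise
    loss = (-delta).clip(lower=0)     # size of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Toy series: an uptrend with a pullback every fifth day
close = pd.Series([100 + i + (3 if i % 5 == 0 else 0) for i in range(60)], dtype=float)
rsi_local = rsi_wilder(close)
print(rsi_local.tail(3).round(2))
```

With the article's data, the call would be `rsi_wilder(prices['adjusted_close'])`, and the last value can be compared with `latest_rsi`.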

Now the workflow has three layers:

price behavior from adjusted historical data
business context from company fundamentals
momentum context from a technical indicator

None of these is enough on its own. Together, they start to look like a usable research workflow instead of a raw API test.

Step 4: Combine Everything Into A Research-Ready Table

Now we can combine the price, fundamentals, and indicator layers into one table.

This is the part that matters for most real projects. A dashboard, screener, notebook, or AI assistant usually needs a clean object it can reuse, not three separate raw API responses.

research_row = {
    **price_features,
    **fundamental_features,
    **indicator_features
}

research_table = pd.DataFrame([research_row])

research_table

This gives us a single-row research table:

This table is simple, but it already supports several use cases.

A screener can filter on pe_ratio, profit_margin, or rsi_14. A dashboard can show price, returns, sector, and market cap. A research notebook can add more tickers and compare them. An AI assistant can receive this as a compact context object instead of parsing multiple API responses on its own.
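Once the table holds more than one ticker, the screener use case is a boolean filter. The rows below are made up and the tickers are placeholders; the thresholds are arbitrary examples, not recommendations.

```python
import pandas as pd

# Hypothetical rows in the same shape as research_table (values are made up)
table = pd.DataFrame([
    {'symbol': 'AAA', 'pe_ratio': 18.0, 'profit_margin': 0.22, 'rsi_14': 55.0},
    {'symbol': 'BBB', 'pe_ratio': 42.0, 'profit_margin': 0.30, 'rsi_14': 81.0},
    {'symbol': 'CCC', 'pe_ratio': None, 'profit_margin': 0.05, 'rsi_14': 35.0},
])

# Make sure the screened columns are numeric; unparseable values become NaN
numeric_cols = ['pe_ratio', 'profit_margin', 'rsi_14']
table[numeric_cols] = table[numeric_cols].apply(pd.to_numeric, errors='coerce')

# Example screen: moderate valuation, healthy margin, not overbought
screen = table[
    (table['pe_ratio'] < 30)
    & (table['profit_margin'] > 0.15)
    & (table['rsi_14'] < 70)
]
print(screen)
```

Rows with a missing `pe_ratio` drop out of the screen because comparisons against NaN evaluate to False, which is usually the safer default for a screener.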

That's the real benefit of building the workflow this way. The API calls are only the beginning. The useful output is the structured table you create from them.
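To make the screener use case concrete, here is a minimal sketch of filtering a multi-ticker version of the table. The AAPL values come from this walkthrough's own output. The other two rows (tickers AAA and BBB) are made-up placeholders, and the thresholds are arbitrary assumptions rather than recommendations.

```python
import pandas as pd

rows = [
    {'symbol': 'AAPL', 'pe_ratio': 37.73, 'profit_margin': 0.272, 'rsi_14': 79.0043},
    {'symbol': 'AAA', 'pe_ratio': 18.0, 'profit_margin': 0.150, 'rsi_14': 45.0},  # placeholder row
    {'symbol': 'BBB', 'pe_ratio': 55.0, 'profit_margin': 0.050, 'rsi_14': 62.0},  # placeholder row
]
table = pd.DataFrame(rows)

# Arbitrary example thresholds: moderate valuation, positive margin, RSI below 70
screen = table[
    (table['pe_ratio'] < 40)
    & (table['profit_margin'] > 0.10)
    & (table['rsi_14'] < 70)
]

print(screen['symbol'].tolist())
```

Because every ticker shares the same columns, adding more rows doesn't change the filter logic at all.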

Step 5: Connect The Workflow To AI Agents With MCP

The table we created is useful because it has a predictable structure, which is exactly what AI workflows need.

If an agent needs stock context, it shouldn't guess from memory or parse several raw API responses every time. It should call a tool, retrieve the data, and receive something clean enough to use.

A simplified MCP workflow looks like this:

User question -> AI agent -> MCP tool call -> Stock API data -> Structured response -> Final answer

For example, a user might ask:

Is Apple looking expensive compared with its recent momentum?

An agent could retrieve price data, fundamentals, and an indicator such as RSI before answering. The important part is not that the model already “knows” the answer. It's that the model can call the right tool and work with current data.

That is where our research table helps:

research_table.to_dict(orient='records')[0]

This returns a compact dictionary:

{'symbol': 'AAPL',
 'latest_price': 312.06,
 'return_30d': 0.18583097277442007,
 'volatility_30d': 0.012845143800989936,
 'name': 'Apple Inc',
 'sector': 'TECHNOLOGY',
 'industry': 'CONSUMER ELECTRONICS',
 'market_cap': 4583336182000.0,
 'pe_ratio': 37.73,
 'eps': 8.27,
 'profit_margin': 0.272,
 'beta': 1.065,
 'rsi_14': 79.0043}
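Before a record like this is returned from a tool, it usually needs to be strict JSON. The sketch below shows one way to do that, assuming a hypothetical helper called to_tool_payload. It doesn't use any MCP library, only the standard json module. It converts missing values to null, since NaN is not valid JSON.

```python
import json
import math
import pandas as pd

def to_tool_payload(table: pd.DataFrame) -> str:
    """Serialize the first row of a research table as strict JSON (hypothetical helper)."""
    record = table.to_dict(orient='records')[0]
    clean = {}
    for key, value in record.items():
        if isinstance(value, float) and math.isnan(value):
            clean[key] = None            # missing values become JSON null
        elif hasattr(value, 'item'):
            clean[key] = value.item()    # numpy scalar -> native Python type
        else:
            clean[key] = value
    # allow_nan=False makes json.dumps raise if a NaN slipped through
    return json.dumps(clean, allow_nan=False)

# Small example with one missing field (forward_pe is an assumed column here)
example = pd.DataFrame([{'symbol': 'AAPL', 'pe_ratio': 37.73, 'forward_pe': float('nan')}])
payload = to_tool_payload(example)
print(payload)
```

The same string can be returned from whatever tool function the agent framework expects.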

This doesn't replace proper analysis, and it shouldn't be treated as investment advice. But it gives an AI assistant a cleaner starting point than raw JSON, stale model knowledge, or a vague prompt with no data attached.

AI readiness isn't just about saying an API supports agents. The API has to return data that can be retrieved, structured, checked, and passed into a workflow without fragile glue code at every step.

Where Each Provider Fits In The Stock API Workflow

The workflow we built above is one version of a modern stock data project: prices, fundamentals, indicators, programmatic analysis, and AI-agent access working together.

Other projects may need a narrower or more specialized provider. Here's a practical way to compare the fit:

Provider	Market Data	Fundamentals	Technical Indicators	Developer Workflow	AI / Agent Readiness	Workflow Completeness	Best Fit
Alpha Vantage	Strong	Strong	Strong	Strong	Strong	High	Broad technical projects, research tools, screeners, dashboards, and AI-agent workflows
Bloomberg API	Very strong	Strong	Moderate	Enterprise-focused	Enterprise-dependent	High	Institutions already using Bloomberg internally
QuoteMedia	Strong	Moderate	Limited / Moderate	Moderate	Limited	Medium	Investor relations websites and embedded market data widgets
EODHD	Strong	Good	Good	Good	Strong	High	Global EOD history, backtesting, and historical research
Intrinio	Good	Strong	Limited / Moderate	Good	Limited / Moderate	Medium / High	US fundamentals, valuation tools, and professional datasets
Xignite	Strong	Good	Limited / Moderate	Enterprise-focused	Limited / Moderate	Medium / High	Enterprise financial applications needing vendor support

No provider fits every workflow equally well. The point of this table is to show where the fit is strongest.

Alpha Vantage works well when a project needs several layers together, especially market data, fundamentals, indicators, developer usability, and AI-agent access. EODHD is stronger when the workflow is centered on global historical research. Intrinio fits better when standardized US fundamentals are the main requirement. Bloomberg API and Xignite are more natural for institutional or enterprise environments, while QuoteMedia is more specialized around investor relations and embedded market data widgets.

This is the right way to think about stock APIs: not as one universal winner, but as different tools for different workflow shapes.

Provider Breakdown Through A Workflow Lens

The table gives a quick comparison. This section explains what that means in practice.

Instead of asking which provider is “best” in general, it is better to ask: what kind of workflow is this provider naturally built for?

1. When The Project Needs Several Data Layers: Alpha Vantage

Alpha Vantage fits well when the project needs more than one type of market data in the same workflow.

In the workflow we built earlier, we used:

adjusted historical prices
company data
technical indicators
structured output for programmatic analysis
a format that can also support AI-agent workflows

That makes Alpha Vantage a flexible fit for stock research notebooks, screeners, dashboards, backtesting workflows, and AI assistants that need market data through tools or MCP-style access.

The main caveat is specialization. If your project needs direct exchange infrastructure, co-location, or a highly specialized institutional setup, you may need a more specialized provider. But for most research, fintech apps, and AI workflows, Alpha Vantage gives enough breadth without forcing you to combine several APIs too early.

2. When The Workflow Is Institutional: Bloomberg API

Bloomberg API makes sense when the organization already uses Bloomberg internally.

It's best suited for firms that want to connect Bloomberg data with internal tools, reports, models, and risk systems.

This isn't usually the right fit for solo developers or small teams. The cost, licensing, and ecosystem dependency make it more suitable for institutions.

3. When The Product Needs Investor Relations Widgets: QuoteMedia

QuoteMedia fits products where the main need is public-facing market data display.

That can include:

investor relations pages
quote widgets
embedded charts
company stock pages
market data modules for public websites

This is different from building a programmatic research workflow. QuoteMedia makes more sense when presentation and embedded financial data are the core product requirement.

4. When The Workflow Is Global Historical Research: EODHD

EODHD fits well when the project needs broad historical data across global markets.

It's useful for long-horizon backtesting, global screeners, and research workflows that depend on end-of-day data from many exchanges.

The tradeoff is cleanup. Global data often brings differences in symbols, exchange calendars, currencies, and local market conventions. That's manageable, but it should be expected.

5. When The Workflow Needs US Fundamentals: Intrinio

Intrinio fits well when standardized US fundamentals are the center of the product.

It's useful for:

valuation tools
earnings dashboards
fundamentals-based screeners
professional US equity research workflows

The main thing to check is dataset fit. Before building around Intrinio, I would look closely at the specific datasets, access terms, and coverage levels the product needs.

6. When The Workflow Needs Enterprise Data Delivery: Xignite

Xignite fits larger financial applications that need formal vendor support.

This can include banks, brokerages, wealth platforms, and enterprise fintech products where support, contracts, reliability, and data relationships matter as much as the endpoint itself.

For smaller developer projects, it may feel heavier than necessary. For enterprise products, that structure can be exactly the point.

Final Checklist Before Choosing A Stock API

Before choosing a provider, I would run through this checklist.

Question	Why It Matters
What am I building?	A backtester, dashboard, screener, valuation tool, and AI assistant all need different things.
Do I need real-time, delayed, or historical data?	Real-time access matters only if the workflow actually needs it.
Do I need adjusted prices?	For backtesting and research, adjusted prices are usually non-negotiable.
Do I need fundamentals?	Screeners, valuation tools, and research dashboards usually need company data, not just prices.
Do I need technical indicators?	Signal testing, filters, and momentum-style analysis may need indicators directly from the API or calculated separately.
How many symbols will I query?	One ticker in a notebook is easy. Hundreds of tickers can expose rate-limit and performance issues quickly.
Will users see the data?	If yes, licensing, display rights, storage rules, and redistribution terms matter before the product goes live.
Is the response easy to parse in Python or other programming languages?	Clean JSON can save a lot of cleanup work once the project grows.
Can it support AI or agent workflows?	AI assistants need structured responses, tool compatibility, or MCP-style access.
Will this API still work after the prototype stage?	A provider can be easy to try and still be hard to build around.

Final Thoughts

A good stock API should reduce project risk, not just return data.

If you're building a small chart, almost any clean price endpoint can work. But once the same API starts supporting a backtester, screener, dashboard, valuation tool, or AI assistant, the decision becomes more important. The provider affects your data quality, parsing logic, refresh jobs, licensing choices, and future product direction.

This is why workflow fit matters more than endpoint count. For projects that need several layers together, such as real-time and historical market data, fundamentals, indicators, developer-friendly access, spreadsheet support, and MCP-style AI workflows, Alpha Vantage fits well. For narrower workflow needs, another provider may make more sense.

Choose the API as part of your project’s data infrastructure, not just as a list of endpoints.

Beyond NVIDIA: Where the AI Infra Trade Actually Shows Up

Nikhil Adithyan — Fri, 29 May 2026 22:26:37 +0000

The AI capex trade is usually discussed like one clean idea. Capex simply means capital expenditure, or the money companies spend on long-term assets like data centers, chips, servers, power systems, and other infrastructure.

NVIDIA. Hyperscalers. Data centers. Power demand. Everything gets pushed into the same bucket and called "AI infrastructure."

But I don't think this is very useful anymore.

Capex doesn't move through the market as a headline. It moves through a chain. A cloud company decides to spend more on AI infrastructure, but that spending has to pass through chips, semiconductor equipment, servers, networking, data centers, power systems, cooling, and construction before it becomes usable compute.

That's where the story gets more interesting.

The obvious AI names still matter, but they're not the whole map. If AI capex is becoming one of the biggest investment cycles in the market, then the better question isn't just:

"Which companies are AI stocks?"

It's:

"Where does the money actually travel?"

In this article, we'll use Python and EODHD data to build a simple AI capex map. The goal isn't to create a buy list. The goal is to separate the theme into layers, compare fundamentals with market recognition, and see where the AI infrastructure trade is already showing up in the data.

Prerequisites
What We're Investigating
Import the Required Packages
Building the AI Capex Universe
Pulling the Financial Data Behind the Story
- Fundamentals Data
- Historical Prices Data
Separating Business Strength from Market Recognition
- Fundamental Signal
- Market Recognition Signal
The AI Capex Matrix: Where the Trade Actually Shows Up
Which AI Infrastructure Layers Has the Market Rewarded Most?
The Physical Infrastructure Layer Is No Longer Hidden
What the Market Has Already Noticed
What This Study Shows
Conclusion

Prerequisites

Before following along, you should be comfortable with basic Python, especially working with dictionaries, lists, functions, and pandas DataFrames.

You’ll also need:

Python 3.9 or later
An EODHD API key
The following Python libraries: requests, pandas, numpy, and matplotlib
Basic familiarity with financial metrics like revenue growth, profit margin, P/E ratio, stock returns, volatility, and drawdown

You don’t need advanced finance knowledge for this article. The goal is to show how data visualization can help map a market theme, not to build a complete valuation model or stock recommendation engine.

What We're Investigating

The lazy version of this article would be a list of AI stocks.

That's not what I want to do here.

The more useful approach is to treat AI capex as a spending chain and ask where each part of that chain appears in the market.

A company selling GPUs is exposed to the theme in one way. A company building electrical systems for data centers is exposed in a completely different way. Both can benefit from the same capex cycle, but the economics, margins, valuation, and market behavior may look very different.

So the investigation has three parts.

First, we'll create a working AI infrastructure universe across layers like chips, semiconductor equipment, servers, networking, data centers, power, cooling, and construction.

Second, we'll pull fundamentals and price data from EODHD to measure two things:

Fundamental signal: Is the business showing growth and profitability?
Market recognition signal: Has the stock already been rewarded by the market?

Third, we'll map the companies into a matrix and look for patterns.

The main output isn't a ranking of the "best AI infrastructure stocks." It's a clearer view of where the AI capex trade has already shown up, where it looks concentrated, and where the physical infrastructure layer starts becoming hard to ignore.

Import the Required Packages

We'll keep the setup light. This is an analysis notebook, not a production system.

import requests
import pandas as pd
import numpy as np
from datetime import date, timedelta
import matplotlib.pyplot as plt

These packages cover everything we need here.

requests will call the EODHD API, pandas will handle the tables, and numpy will help with basic calculations. We'll use date and timedelta for the one-year price window, and matplotlib for the charts.

Building the AI Capex Universe

There's one issue with analyzing AI infrastructure stocks: AI capex exposure isn't a clean financial field.

No API directly tells us that a company is "30% exposed to AI data center spending" or "highly tied to GPU infrastructure." So we need a research universe first.

For this article, I used an LLM as a research assistant to draft the first version of the AI capex chain, then manually reviewed the companies before pulling fundamentals and price data from EODHD.

The universe is split into layers:

Demand-side hyperscalers
AI compute and chips
Semiconductor equipment
Servers and storage
Networking
Data centers
Power and electrification
Cooling and industrial systems
Construction and engineering

ai_capex_universe = [
    {'ticker': 'MSFT.US', 'company': 'Microsoft', 'capex_layer': 'Demand-side hyperscalers', 'exposure_level': 'High', 'reason': 'Major cloud and AI infrastructure spender through Azure'},
    {'ticker': 'AMZN.US', 'company': 'Amazon', 'capex_layer': 'Demand-side hyperscalers', 'exposure_level': 'High', 'reason': 'Large AI and cloud infrastructure spender through AWS'},
    {'ticker': 'GOOGL.US', 'company': 'Alphabet', 'capex_layer': 'Demand-side hyperscalers', 'exposure_level': 'High', 'reason': 'Major AI infrastructure spender across Google Cloud and internal AI systems'},
    {'ticker': 'META.US', 'company': 'Meta Platforms', 'capex_layer': 'Demand-side hyperscalers', 'exposure_level': 'High', 'reason': 'Large AI compute and data center spending program'},

    {'ticker': 'NVDA.US', 'company': 'NVIDIA', 'capex_layer': 'AI compute and chips', 'exposure_level': 'Very High', 'reason': 'Core GPU and accelerator supplier for AI training and inference'},
    {'ticker': 'AMD.US', 'company': 'Advanced Micro Devices', 'capex_layer': 'AI compute and chips', 'exposure_level': 'High', 'reason': 'AI accelerator and data center CPU exposure'},
    {'ticker': 'AVGO.US', 'company': 'Broadcom', 'capex_layer': 'AI compute and chips', 'exposure_level': 'High', 'reason': 'Custom silicon and networking exposure for AI infrastructure'},
    {'ticker': 'MRVL.US', 'company': 'Marvell Technology', 'capex_layer': 'AI compute and chips', 'exposure_level': 'High', 'reason': 'Custom silicon, networking, and data infrastructure exposure'},

    {'ticker': 'AMAT.US', 'company': 'Applied Materials', 'capex_layer': 'Semiconductor equipment', 'exposure_level': 'High', 'reason': 'Supplies equipment used in advanced chip manufacturing'},
    {'ticker': 'LRCX.US', 'company': 'Lam Research', 'capex_layer': 'Semiconductor equipment', 'exposure_level': 'High', 'reason': 'Semiconductor manufacturing equipment supplier'},
    {'ticker': 'KLAC.US', 'company': 'KLA', 'capex_layer': 'Semiconductor equipment', 'exposure_level': 'High', 'reason': 'Process control and inspection tools for chip manufacturing'},
    {'ticker': 'ASML.US', 'company': 'ASML', 'capex_layer': 'Semiconductor equipment', 'exposure_level': 'Very High', 'reason': 'Critical lithography equipment supplier for advanced chips'},

    {'ticker': 'DELL.US', 'company': 'Dell Technologies', 'capex_layer': 'Servers and storage', 'exposure_level': 'High', 'reason': 'AI server and enterprise hardware exposure'},
    {'ticker': 'HPE.US', 'company': 'Hewlett Packard Enterprise', 'capex_layer': 'Servers and storage', 'exposure_level': 'Medium', 'reason': 'Server, storage, and enterprise infrastructure exposure'},
    {'ticker': 'SMCI.US', 'company': 'Super Micro Computer', 'capex_layer': 'Servers and storage', 'exposure_level': 'High', 'reason': 'AI server systems and data center hardware exposure'},

    {'ticker': 'ANET.US', 'company': 'Arista Networks', 'capex_layer': 'Networking', 'exposure_level': 'High', 'reason': 'Data center networking supplier tied to AI cluster buildouts'},
    {'ticker': 'CSCO.US', 'company': 'Cisco', 'capex_layer': 'Networking', 'exposure_level': 'Medium', 'reason': 'Networking and enterprise infrastructure exposure'},

    {'ticker': 'EQIX.US', 'company': 'Equinix', 'capex_layer': 'Data centers', 'exposure_level': 'Medium', 'reason': 'Global data center and interconnection infrastructure'},
    {'ticker': 'DLR.US', 'company': 'Digital Realty', 'capex_layer': 'Data centers', 'exposure_level': 'Medium', 'reason': 'Data center real estate exposure'},

    {'ticker': 'VRT.US', 'company': 'Vertiv', 'capex_layer': 'Power and electrification', 'exposure_level': 'High', 'reason': 'Power and thermal infrastructure for data centers'},
    {'ticker': 'ETN.US', 'company': 'Eaton', 'capex_layer': 'Power and electrification', 'exposure_level': 'Medium', 'reason': 'Electrical systems and power management exposure'},
    {'ticker': 'PWR.US', 'company': 'Quanta Services', 'capex_layer': 'Power and electrification', 'exposure_level': 'Medium', 'reason': 'Grid, power, and infrastructure construction exposure'},
    {'ticker': 'CEG.US', 'company': 'Constellation Energy', 'capex_layer': 'Power and electrification', 'exposure_level': 'Medium', 'reason': 'Power demand beneficiary from data center expansion'},

    {'ticker': 'TT.US', 'company': 'Trane Technologies', 'capex_layer': 'Cooling and industrial systems', 'exposure_level': 'Medium', 'reason': 'Cooling and climate systems exposure for buildings and infrastructure'},
    {'ticker': 'CARR.US', 'company': 'Carrier Global', 'capex_layer': 'Cooling and industrial systems', 'exposure_level': 'Medium', 'reason': 'Cooling, HVAC, and infrastructure systems exposure'},
    {'ticker': 'JCI.US', 'company': 'Johnson Controls', 'capex_layer': 'Cooling and industrial systems', 'exposure_level': 'Medium', 'reason': 'Building systems, controls, and cooling infrastructure exposure'},

    {'ticker': 'EME.US', 'company': 'EMCOR Group', 'capex_layer': 'Construction and engineering', 'exposure_level': 'Medium', 'reason': 'Electrical and mechanical construction exposure'},
    {'ticker': 'FIX.US', 'company': 'Comfort Systems USA', 'capex_layer': 'Construction and engineering', 'exposure_level': 'Medium', 'reason': 'Mechanical and electrical services for commercial infrastructure'}
]

universe = pd.DataFrame(ai_capex_universe)

universe.head()

This gives us the research universe.

The important thing is that this table doesn't prove anything by itself. It only defines the map. The actual comparison comes from the fundamentals and historical price data we pull next.

Pulling the Financial Data Behind the Story

The universe gives us the map, but the map is not the analysis.

Now we need actual data behind each company. For that, we'll use EODHD fundamentals and historical prices.

The fundamentals help us check business strength. The price data helps us see whether the market has already recognized the company as part of the AI capex trade.

Fundamentals Data

First, we'll pull fundamentals using EODHD's fundamentals endpoint.

api_key = 'YOUR EODHD API KEY'

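def get_fundamentals(ticker):
    url = f'https://eodhd.com/api/fundamentals/{ticker}?api_token={api_key}&fmt=json'
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise on HTTP errors so the caller can handle the failure
    return response.json()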

Note: Replace YOUR EODHD API KEY with your actual EODHD API key.

This function calls the fundamentals endpoint for one ticker and returns the full JSON response.

We don't need the entire response for this analysis, so we'll extract only the fields we care about.

def extract_fundamental_fields(ticker, data):
    general = data.get('General', {})
    highlights = data.get('Highlights', {})
    valuation = data.get('Valuation', {})
    technicals = data.get('Technicals', {})

    return {
        'ticker': ticker,
        'sector': general.get('Sector'),
        'industry': general.get('Industry'),
        'market_cap': highlights.get('MarketCapitalization'),
        'revenue_growth_yoy': highlights.get('QuarterlyRevenueGrowthYOY'),
        'profit_margin': highlights.get('ProfitMargin'),
        'operating_margin': highlights.get('OperatingMarginTTM'),
        'return_on_equity': highlights.get('ReturnOnEquityTTM'),
        'pe_ratio': highlights.get('PERatio'),
        'forward_pe': valuation.get('ForwardPE'),
        'beta': technicals.get('Beta')
    }

These fields give us a compact view of growth, profitability, valuation, and company context.

Now we can run this across the full universe.

fundamental_rows = []

for ticker in universe['ticker']:
    try:
        data = get_fundamentals(ticker)
        row = extract_fundamental_fields(ticker, data)
        fundamental_rows.append(row)
        print(f'{ticker} DONE')

    except Exception as e:
        fundamental_rows.append({
            'ticker': ticker,
            'sector': np.nan,
            'industry': np.nan,
            'market_cap': np.nan,
            'revenue_growth_yoy': np.nan,
            'profit_margin': np.nan,
            'operating_margin': np.nan,
            'return_on_equity': np.nan,
            'pe_ratio': np.nan,
            'forward_pe': np.nan,
            'beta': np.nan
        })
        print(f'{ticker} ERROR: {e}')

fundamentals = pd.DataFrame(fundamental_rows)

fundamentals.head()

The try block keeps the scan moving if one ticker fails. That matters because this universe mixes different types of companies, and one missing response should not break the whole analysis.
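Some failures are temporary, such as a timeout or a rate limit, so it can help to retry a call before recording it as an error. The sketch below is one possible approach. The helper name call_with_retry, the retry count, and the delays are all assumptions, and the demo uses a fake function instead of a real API call so it runs on its own.

```python
import time

def call_with_retry(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); if it raises, wait base_delay * 2**attempt and try again (illustrative helper)."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller's try/except record the failure
            sleep(base_delay * 2 ** attempt)

# With the article's function, usage would look like:
# data = call_with_retry(lambda: get_fundamentals('NVDA.US'))

# Self-contained demo: a fake call that fails twice, then succeeds
attempts = {'count': 0}

def flaky():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise ConnectionError('temporary failure')
    return {'status': 'ok'}

waits = []  # collect the delays instead of actually sleeping
result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)
```

The retry only helps when the wrapped call raises on failure, so it pairs naturally with a function that raises on HTTP errors.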

Historical Prices Data

Next, we'll pull one year of historical prices using EODHD's historical end-of-day prices endpoint.

price_start = date.today() - timedelta(days=365)
price_end = date.today()

def get_price_history(ticker):
    url = f'https://eodhd.com/api/eod/{ticker}?api_token={api_key}&fmt=json&from={price_start.isoformat()}&to={price_end.isoformat()}&period=d'
    data = requests.get(url).json()
    prices = pd.DataFrame(data)

    if prices.empty:
        return pd.DataFrame()

    prices['date'] = pd.to_datetime(prices['date'], errors='coerce')
    prices['adjusted_close'] = pd.to_numeric(prices['adjusted_close'], errors='coerce')

    prices = prices.dropna(subset=['date', 'adjusted_close'])
    prices = prices.sort_values('date').reset_index(drop=True)

    return prices[['date', 'adjusted_close']]

We use adjusted close because it's cleaner for return calculations after splits and dividends.
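A small example shows why this matters. The prices below are made up and assume a 2-for-1 split on the third day. The raw close shows a one-day drop of roughly 49.5% that never happened economically, while the adjusted close shows a normal daily move.

```python
import pandas as pd

# Hypothetical prices around a 2-for-1 split on the third day (values are made up)
prices = pd.DataFrame({
    'close':          [100.0, 102.0, 51.5, 52.0],
    'adjusted_close': [50.0, 51.0, 51.5, 52.0],
})

raw_returns = prices['close'].pct_change()
adj_returns = prices['adjusted_close'].pct_change()

# The raw series shows a large fake loss on the split day; the adjusted series does not
print(raw_returns.round(4).tolist())
print(adj_returns.round(4).tolist())
```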

Now we'll convert the price history into a few market signals.

def calculate_market_signals(prices):
    if prices.empty or len(prices) < 60:
        return {
            'return_1y': np.nan,
            'return_6m': np.nan,
            'return_3m': np.nan,
            'volatility_1y': np.nan,
            'max_drawdown_1y': np.nan
        }

    prices = prices.copy()
    prices['daily_return'] = prices['adjusted_close'].pct_change()

    latest_close = prices['adjusted_close'].iloc[-1]

    return_1y = (latest_close / prices['adjusted_close'].iloc[0]) - 1
    return_6m = (latest_close / prices['adjusted_close'].iloc[-126]) - 1 if len(prices) >= 126 else np.nan
    return_3m = (latest_close / prices['adjusted_close'].iloc[-63]) - 1 if len(prices) >= 63 else np.nan

    volatility_1y = prices['daily_return'].std() * np.sqrt(252)

    running_high = prices['adjusted_close'].cummax()
    drawdown = (prices['adjusted_close'] / running_high) - 1
    max_drawdown_1y = drawdown.min()

    return {
        'return_1y': return_1y,
        'return_6m': return_6m,
        'return_3m': return_3m,
        'volatility_1y': volatility_1y,
        'max_drawdown_1y': max_drawdown_1y
    }

These signals tell us how strongly the market has already responded to each company.
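The drawdown logic is the least obvious of these, so here it is on four made-up prices. The running high tracks the highest price seen so far, and the drawdown is how far each price sits below that high.

```python
import pandas as pd

prices = pd.Series([100.0, 120.0, 90.0, 110.0])  # made-up prices

running_high = prices.cummax()          # 100, 120, 120, 120
drawdown = prices / running_high - 1    # distance below the running high

print(drawdown.round(4).tolist())
print(drawdown.min())  # the deepest point: 90 against a high of 120
```

The maximum drawdown here is -25%, because the worst point is 90 measured against the earlier peak of 120, not against the starting price of 100.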

Now we run the same logic for every ticker.

market_rows = []

for ticker in universe['ticker']:
    try:
        prices = get_price_history(ticker)
        signals = calculate_market_signals(prices)
        signals['ticker'] = ticker
        market_rows.append(signals)
        print(f'{ticker} DONE')

    except Exception:
        market_rows.append({
            'ticker': ticker,
            'return_1y': np.nan,
            'return_6m': np.nan,
            'return_3m': np.nan,
            'volatility_1y': np.nan,
            'max_drawdown_1y': np.nan
        })
        print(f'{ticker} ERROR')

market_signals = pd.DataFrame(market_rows)

market_signals.head()

Finally, we merge the universe, fundamentals, and market signals into one dataset.

capex_data = universe.merge(fundamentals, on='ticker', how='left')
capex_data = capex_data.merge(market_signals, on='ticker', how='left')

print(capex_data.columns)
capex_data.head()
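A quick way to sanity-check a merge like this is to use the validate and indicator options of pandas' merge. The sketch below uses small placeholder frames (the profit margin values are made up) to show how a ticker with no fundamentals can be detected.

```python
import pandas as pd

# Placeholder frames: FIX.US has no fundamentals row, and the margin values are made up
universe_demo = pd.DataFrame({'ticker': ['NVDA.US', 'VRT.US', 'FIX.US']})
fundamentals_demo = pd.DataFrame({
    'ticker': ['NVDA.US', 'VRT.US'],
    'profit_margin': [0.5, 0.1],
})

merged = universe_demo.merge(
    fundamentals_demo,
    on='ticker',
    how='left',
    validate='one_to_one',  # raises a MergeError if a ticker repeats on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

missing = merged.loc[merged['_merge'] == 'left_only', 'ticker'].tolist()
print(missing)
```

Any ticker listed in missing will carry NaN fundamentals into the scoring step.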

Separating Business Strength from Market Recognition

Now comes the part that makes the analysis useful.

If we only look at stock returns, we end up chasing what already moved. If we only look at fundamentals, we miss how the market is actually treating the theme.

So I split the analysis into two simple signals:

Fundamental Signal: is the business showing growth and profitability?
Market Recognition Signal: has the market already rewarded the stock?

First, we need a helper function to normalize each metric.

def min_max_score(series):
    series = pd.to_numeric(series, errors='coerce')

    if series.isna().all():
        return pd.Series(0, index=series.index)

    min_val = series.min()
    max_val = series.max()

    if min_val == max_val:
        return pd.Series(0.5, index=series.index)

    return (series - min_val) / (max_val - min_val)

This brings every metric into a 0 to 1 range, so growth, margins, returns, and drawdowns can be compared without mixing raw scales.
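Two properties of this scaling are worth seeing on a small example. The scores are relative to the universe, so one outlier compresses everyone else toward zero, and a missing input stays missing, which means any weighted signal built on it will also be NaN for that row. The growth rates below are made up, and the function is a compact copy of the helper so the example runs on its own.

```python
import numpy as np
import pandas as pd

def min_max_score(series):
    # Compact copy of the article's helper so this example is self-contained
    series = pd.to_numeric(series, errors='coerce')
    if series.isna().all():
        return pd.Series(0, index=series.index)
    min_val, max_val = series.min(), series.max()
    if min_val == max_val:
        return pd.Series(0.5, index=series.index)
    return (series - min_val) / (max_val - min_val)

# Made-up growth rates: three ordinary values, one outlier, one missing
growth = pd.Series([0.10, 0.20, 0.30, 2.00, np.nan])
scores = min_max_score(growth)

print(scores.round(3).tolist())
```

The three ordinary values all land near the bottom of the range because the outlier defines the top.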

Fundamental Signal

Now we build the fundamental signal.

capex_data['revenue_growth_score'] = min_max_score(capex_data['revenue_growth_yoy'])
capex_data['profit_margin_score'] = min_max_score(capex_data['profit_margin'])
capex_data['operating_margin_score'] = min_max_score(capex_data['operating_margin'])
capex_data['roe_score'] = min_max_score(capex_data['return_on_equity'])

capex_data['fundamental_signal'] = (
    capex_data['revenue_growth_score'] * 0.35 +
    capex_data['operating_margin_score'] * 0.30 +
    capex_data['profit_margin_score'] * 0.20 +
    capex_data['roe_score'] * 0.15
) * 100

capex_data['fundamental_signal'] = capex_data['fundamental_signal'].round(2)
fundamental_cols = ['ticker', 'company', 'capex_layer', 'revenue_growth_yoy', 'operating_margin', 'profit_margin', 'return_on_equity', 'fundamental_signal']
capex_data[fundamental_cols].sort_values('fundamental_signal', ascending=False).head(10)

This signal isn't trying to crown the best company. It's just checking whether the business data supports the AI capex story.

In my run, NVIDIA clearly stood out because its revenue growth and margins were on a different level. But the interesting part was not only NVIDIA. Names like KLA, Arista, Broadcom, Microsoft, Meta, Lam Research, Alphabet, and Super Micro also appeared near the top for different reasons.

That already tells us something important: the AI capex chain has different types of winners. Some are high-margin platform businesses. Some are semiconductor equipment names. Some are high-growth hardware names with thinner margins.

Market Recognition Signal

Now we build the market recognition signal.

capex_data['return_1y_score'] = min_max_score(capex_data['return_1y'])
capex_data['return_6m_score'] = min_max_score(capex_data['return_6m'])
capex_data['return_3m_score'] = min_max_score(capex_data['return_3m'])
capex_data['drawdown_score'] = min_max_score(capex_data['max_drawdown_1y'])

capex_data['market_recognition_signal'] = (
    capex_data['return_1y_score'] * 0.40 +
    capex_data['return_6m_score'] * 0.30 +
    capex_data['return_3m_score'] * 0.20 +
    capex_data['drawdown_score'] * 0.10
) * 100

capex_data['market_recognition_signal'] = capex_data['market_recognition_signal'].round(2)
market_cols = ['ticker', 'company', 'capex_layer', 'return_1y', 'return_6m', 'return_3m', 'max_drawdown_1y', 'market_recognition_signal']
capex_data[market_cols].sort_values('market_recognition_signal', ascending=False).head(10)

This is where the story gets more interesting.

The market recognition list wasn't just filled with hyperscalers or chip names. Comfort Systems, Vertiv, Quanta Services, Dell, Applied Materials, and Lam Research showed up strongly. That is the first clear sign that the AI capex trade is spreading into the physical infrastructure layer, not staying locked inside the usual mega-cap AI basket.

The AI Capex Matrix: Where the Trade Actually Shows Up

At this point, we have two separate lenses.

The fundamental signal tells us whether the business looks strong.
The market recognition signal tells us whether the stock has already been rewarded.

Now we can put both on the same chart.

plt.figure(figsize=(12, 8))

plot_data = capex_data.dropna(
    subset=['market_recognition_signal', 'fundamental_signal', 'market_cap']
).copy()

plot_data['bubble_size'] = np.sqrt(plot_data['market_cap']) / 5000

for layer in plot_data['capex_layer'].unique():
    layer_data = plot_data[plot_data['capex_layer'] == layer]

    plt.scatter(
        layer_data['market_recognition_signal'],
        layer_data['fundamental_signal'],
        s=layer_data['bubble_size'],
        alpha=0.6,
        label=layer
    )

for _, row in plot_data.iterrows():
    if row['market_recognition_signal'] > 55 or row['fundamental_signal'] > 45:
        plt.text(row['market_recognition_signal'] + 0.8, row['fundamental_signal'] + 0.8, row['ticker'].replace('.US', ''), fontsize=10)

# Compute the medians once so the divider lines and quadrant labels share the same values
median_market = plot_data['market_recognition_signal'].median()
median_fundamental = plot_data['fundamental_signal'].median()

plt.axvline(median_market, linestyle='--', linewidth=1)
plt.axhline(median_fundamental, linestyle='--', linewidth=1)

plt.text(median_market + 2, median_fundamental + 55, 'Strong fundamentals,\nmore recognized', fontsize=10)
plt.text(4, median_fundamental + 55, 'Strong fundamentals,\nless recognized', fontsize=10)
plt.text(median_market + 2, 4, 'High market recognition,\nweaker fundamentals', fontsize=10)
plt.text(4, 4, 'Less clear in this framework', fontsize=10)

plt.title('AI Capex Matrix: Fundamentals vs Market Recognition')
plt.xlabel('Market Recognition Signal')
plt.ylabel('Fundamental Signal')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

This is the most useful chart in the study.

It makes one thing clear: AI capex doesn't show up in one clean cluster.

NVIDIA is the obvious fundamental outlier. That makes sense. Its growth and margins are difficult to compare with almost anything else in the universe.

But the right side of the chart is where the broader story starts. AMD, Marvell, Vertiv, Comfort Systems, Dell, Lam Research, Applied Materials, and Quanta Services show stronger market recognition. That is a very different mix of companies. Some are chip-related. Some are equipment-related. Some are physical infrastructure names.

That matters because it shows the market isn't only rewarding the most obvious AI companies. It's also rewarding the companies that help turn AI capex into actual infrastructure.

This is the main shift in the article: the AI capex trade starts looking less like a tech basket and more like a buildout chain.
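
The four quadrant labels on the chart can also be attached to each company as a column, which makes the matrix easy to filter. Here is a minimal sketch with made-up tickers and scores. Only the column names (ticker, market_recognition_signal, fundamental_signal) come from the study's data. The helper assign_quadrant and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; in the study these values would come from capex_data
sample = pd.DataFrame({
    "ticker": ["AAA.US", "BBB.US", "CCC.US", "DDD.US", "EEE.US"],
    "market_recognition_signal": [80.0, 20.0, 70.0, 10.0, 50.0],
    "fundamental_signal": [75.0, 65.0, 15.0, 5.0, 40.0],
})

# Same dividers as the dashed lines on the chart
median_market = sample["market_recognition_signal"].median()
median_fundamental = sample["fundamental_signal"].median()

def assign_quadrant(row: pd.Series) -> str:
    # Rows sitting exactly on a median fall on the lower side of that divider
    strong = row["fundamental_signal"] > median_fundamental
    recognized = row["market_recognition_signal"] > median_market
    if strong and recognized:
        return "Strong fundamentals, more recognized"
    if strong:
        return "Strong fundamentals, less recognized"
    if recognized:
        return "High market recognition, weaker fundamentals"
    return "Less clear in this framework"

sample["quadrant"] = sample.apply(assign_quadrant, axis=1)
print(sample[["ticker", "quadrant"]])
```

With a column like this, a one-line filter on quadrant pulls out, for example, the names with strong fundamentals that the market has recognized less.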

Which AI Infrastructure Layers Has the Market Rewarded Most?

The matrix is useful at the company level. But the AI capex trade also needs to be viewed by layer.

So next, I grouped the companies by capex_layer and calculated median returns and median signal scores.

layer_performance = capex_data.groupby('capex_layer').agg(
    company_count=('ticker', 'count'),
    median_return_1y=('return_1y', 'median'),
    median_return_6m=('return_6m', 'median'),
    median_fundamental_signal=('fundamental_signal', 'median'),
    median_market_recognition=('market_recognition_signal', 'median')
).reset_index()

layer_performance = layer_performance.sort_values('median_return_1y', ascending=False)

layer_performance

Then I plotted the median one-year return by infrastructure layer.

plt.figure(figsize=(11, 6))

plt.barh(layer_performance['capex_layer'], layer_performance['median_return_1y'] * 100)

plt.gca().invert_yaxis()

plt.title('Median 1Y Return by AI Infrastructure Layer', fontsize=14, pad=12)
plt.xlabel('Median 1Y Return (%)')
plt.ylabel('')

plt.grid(axis='x', alpha=0.25)

plt.tight_layout()
plt.show()

This chart is where the story becomes much less obvious.

Construction and engineering ranked at the top by median one-year return, followed by semiconductor equipment, AI compute and chips, and servers and storage. That's not the usual way people talk about the AI trade.

The takeaway is not that construction and engineering is automatically the best AI capex layer. The sample size is small, so the result should be read as directional. But it still tells us something useful: the market has been rewarding the physical buildout side of AI infrastructure, not just the companies selling chips or cloud services.
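
One way to keep the small-sample caveat visible is to carry the company count and the return range next to each median. The sketch below uses invented tickers and returns. The min_companies threshold of 3 is an arbitrary assumption, not a number from the study.

```python
import pandas as pd

# Hypothetical data; the study's version would use capex_data
sample = pd.DataFrame({
    "ticker": ["A", "B", "C", "D", "E", "F"],
    "capex_layer": [
        "Construction and engineering", "Construction and engineering",
        "AI compute and chips", "AI compute and chips",
        "AI compute and chips", "AI compute and chips",
    ],
    "return_1y": [0.90, 0.70, 0.40, 0.60, 0.20, 0.80],
})

min_companies = 3  # assumed cutoff below which a median is treated as directional only

layer_summary = (
    sample.groupby("capex_layer")
    .agg(
        company_count=("ticker", "count"),
        median_return_1y=("return_1y", "median"),
        min_return_1y=("return_1y", "min"),
        max_return_1y=("return_1y", "max"),
    )
    .reset_index()
)

# Flag layers whose median rests on very few companies
layer_summary["small_sample"] = layer_summary["company_count"] < min_companies
print(layer_summary)
```

A layer flagged as small_sample can still top the ranking. The flag just makes it harder to read that ranking as more than a direction.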

That's the larger point. Once AI capex becomes real-world infrastructure, the trade starts showing up in companies tied to equipment, servers, electrical work, and construction.

The Physical Infrastructure Layer Is No Longer Hidden

This is the part of the AI capex trade that I find most useful.

The obvious AI story starts with chips and hyperscalers. But once the spending becomes real infrastructure, the list gets wider. AI data centers need servers, networking equipment, power systems, cooling, grid work, electrical construction, and physical capacity.

So I filtered the dataset to focus on the non-obvious infrastructure layers.

physical_layers = ['Power and electrification', 'Cooling and industrial systems', 'Construction and engineering',
                   'Data centers', 'Servers and storage', 'Networking']

physical_infra = capex_data[capex_data['capex_layer'].isin(physical_layers)].copy()
physical_infra = physical_infra.sort_values(['market_recognition_signal', 'fundamental_signal'], ascending=False)
physical_watchlist = physical_infra[['ticker', 'company', 'capex_layer', 'revenue_growth_yoy', 'operating_margin',
                                     'return_1y', 'return_6m', 'fundamental_signal', 'market_recognition_signal']].head(12)

physical_watchlist.head(10)

Comfort Systems, Vertiv, Dell, Quanta Services, Cisco, HPE, EMCOR, Equinix, Johnson Controls, and Digital Realty all sit in different parts of the physical buildout. Some are tied to servers. Some are tied to power and electrification. Some are tied to data centers, cooling, or construction.

The key point is simple: the market is already treating parts of the physical infrastructure layer as part of the AI capex story.

That doesn't mean every name here has the same quality or the same upside. The fundamental signals vary a lot. But the table shows why looking only at "AI software" or "AI chip" names misses a large part of the spending chain.

What the Market Has Already Noticed

This section is important because not every AI capex name is early.

Some companies in the chain have already moved aggressively. That doesn't make them weak companies, but it changes the question. At that point, the question is no longer just whether the company is exposed to AI infrastructure. The better question is whether the market has already priced in a large part of that exposure.

To check that, I sorted the universe by the market recognition signal.

market_already_noticed = capex_data.sort_values('market_recognition_signal', ascending=False).head(10).copy()

market_already_noticed['return_1y'] = (market_already_noticed['return_1y'] * 100).round(2)
market_already_noticed['return_6m'] = (market_already_noticed['return_6m'] * 100).round(2)
market_already_noticed['return_3m'] = (market_already_noticed['return_3m'] * 100).round(2)
market_already_noticed['max_drawdown_1y'] = (market_already_noticed['max_drawdown_1y'] * 100).round(2)

market_already_noticed = market_already_noticed[['ticker', 'company', 'capex_layer', 'return_1y', 'return_6m', 'return_3m', 
                                                 'max_drawdown_1y', 'market_recognition_signal', 'fundamental_signal']]

market_already_noticed

This list is a useful reality check.

Comfort Systems, AMD, Marvell, Vertiv, Lam Research, Dell, Applied Materials, Quanta Services, Cisco, and Alphabet all show up with strong market recognition. The mix is the important part. It includes chips, semiconductor equipment, servers, networking, power, construction, and a hyperscaler.

That tells us the AI capex trade has already broadened in price action. It's not waiting quietly in the background.

But this also means we need to be careful with the "hidden beneficiary" framing. Some infrastructure names have already delivered very large one-year returns. So the smarter follow-up question is not:

"Which companies are exposed?"

It's:

"How much of that exposure has the market already recognized?"

What This Study Shows

The AI capex trade is easier to understand when we stop treating it as one group of "AI stocks."

The data shows three things clearly.

First, the obvious names still matter. NVIDIA remains the cleanest fundamental outlier in this universe, and chip-related names continue to sit close to the center of the AI infrastructure story.

Second, the trade has already moved beyond chips. Semiconductor equipment, servers, networking, power, and construction names all show up in the market recognition data. That makes sense. AI infrastructure isn't just model training. It needs physical capacity, electrical systems, cooling, data centers, and buildout work.

Third, market recognition and business strength don't always move together. Some companies have strong fundamentals but quieter price action. Others have already moved aggressively, even if their fundamental signal isn't as strong. That's why a simple "AI beneficiary" label isn't enough.

Conclusion

AI capex isn't just a mega-cap tech story. It's a spending chain.

Once we trace that chain, the theme becomes broader and more interesting. It moves from chips to semiconductor equipment, from servers to networking, from data centers to power, cooling, and construction.

The goal of this study wasn't to find the best AI infrastructure stock. It was to build a clearer map of where the trade is already showing up.

That map matters because the next phase of the AI story may not be about who mentions AI the most. It may be about who sits closest to the infrastructure that makes AI possible.

How to Build a Live Options Database in Python – A Complete Guide

Nikhil Adithyan — Thu, 07 May 2026 23:00:08 +0000

Live options analytics change constantly. Implied volatility shifts, Greeks drift, and the shape of the surface can look different even a few minutes later.

But a lot of teams still treat these numbers like something you glance at once. A screenshot in a deck. A one-off notebook cell. A quick check in a UI before a meeting.

That works until you need to answer basic questions that show up in real workflows:

What did TSLA's surface look like at 10:32? When did skew start steepening? Did the change come from the wings moving or the ATM shifting?

If you don't store the data as it arrives, you can't replay it, compare it, or audit it. You're stuck with whatever you happened to look at in the moment.

In this walkthrough, we'll build something small but practical: an internal database that continuously captures SpiderRock MLink's LiveImpliedQuote analytics for TSLA, stores each snapshot as queryable history, and also maintains a "latest view" table so you can pull the current surface state without scanning the full history.

The goal is not to build a trading system. It's to build a reliable internal dataset that you can monitor and query.

Note: SpiderRock MLink's LiveImpliedQuote analytics is a product offered for a fee, which includes exchange charges for the underlying market data used in its creation.

Prerequisites
What Data We're Using
Setup: Importing Packages
Database Design
Pulling LiveImpliedQuote
Normalizing the Response Into Rows
Writing to the Database
Running a Short Polling Capture
Analysis: Smile Reconstruction From the Database
Analysis: ATM IV and Skew Over Time
Alert-Style Thresholds
Wrapping Up

Prerequisites

Before running any of the code in this walkthrough, there are a few things you need to have in place.

On the API side, you need a SpiderRock MLink account with access to the LiveImpliedQuote feed. The examples use the REST interface, so no websocket setup is required, but you do need a valid API key. If you don't have one yet, you can reach out to SpiderRock directly to get access.

On the Python side, the environment is minimal. You need Python 3.9 or later for the built-in tuple[int, int] type hint used in one of the function signatures. The external packages are requests, pandas, numpy, and matplotlib. Everything else (sqlite3, time, datetime) is part of the standard library. You can install the external dependencies with:

pip install requests pandas numpy matplotlib

No database setup is required beyond a writable local path. SQLite creates the file automatically on first run, so there's nothing to install or configure separately.

Finally, the walkthrough uses TSLA as the target symbol because it has a liquid and active options chain. If you want to swap in a different underlying, the only thing you need to change is the symbol variable in the config block.

What Data We're Using

This build is driven by one OptAnalytics message type from SpiderRock MLink: LiveImpliedQuote.

Each message represents an option contract and comes with the analytics you actually need for monitoring:

the option identifier (symbol, expiry, strike, call or put)
surface IV (sVol) and related surface fields
Greeks (delta, gamma, theta, vega)
context fields like underlying price (uPrc), time to expiry (years), and rate (rate)
timestamps and calc source markers, which matter when you're turning a live feed into a database

We'll treat sVol as the main volatility field for the article and refer to it as surface IV. That keeps the workflow consistent when we rebuild smiles or compute skew proxies from stored history.

The demo uses TSLA because it has a rich and active options chain, which makes the database and queries more interesting even in a short capture window. The same pipeline works for any other underlying – the only thing you change is the symbol filter.

Setup: Importing Packages

Before touching the database or the API, we set up a small, repeatable environment. This section is intentionally minimal. We only import what we need for three things: making REST calls, storing data in SQLite, and doing basic analysis and plots.

import requests
import sqlite3
import pandas as pd
import numpy as np
import time
from datetime import datetime, timezone
import matplotlib.pyplot as plt
plt.style.use('ggplot')

requests is used for calling MLink REST endpoints.
sqlite3 gives us a lightweight database we can write to locally without extra setup.
pandas and numpy are only for shaping and filtering the data once it comes back.
time and datetime help us run a polling loop and timestamp each snapshot so the database becomes a proper time series.

Database Design

If the goal is to make live analytics queryable, the database design has to support two different needs.

First, you want an audit trail. Every snapshot should be preserved so you can reconstruct what the surface looked like at a specific time.

Second, you also want a fast way to answer "what does it look like right now" without scanning everything you've ever stored.

So we use two tables:

implied_quote_history: Append-only. Every poll inserts a full snapshot.
implied_quote_latest: One row per option contract. Each poll upserts into this table so it always reflects the most recent snapshot.

The core of both tables is a stable option identifier. In the feed, the option key is nested, so we normalize it into a single option_key string that includes symbol, expiry, strike, call or put, and venue fields. This becomes the primary key for the latest table and the main join key for queries.

#config
api_key = "YOUR SPIDERROCK API KEY"
mlink_url = "https://mlink-live.nms.saturn.spiderrockconnect.com/rest/json"

msg_type = "LiveImpliedQuote"

symbol = "TSLA"
poll_interval_s = 10
poll_duration_s = 120
limit = 2000

#create db connection
# Any writable local path works; change this if /mnt/data doesn't exist on your machine
db_path = "/mnt/data/optanalytics_iv_greeks.db"

def get_conn(path: str = db_path):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA synchronous=NORMAL;")
    return conn

#create db schema
def setup_db(path: str = db_path):
    conn = get_conn(path)
    cur = conn.cursor()

    cur.execute("""
    create table if not exists implied_quote_history (
        id integer primary key autoincrement,
        asof_ts text not null,

        option_key text not null,
        symbol text not null,
        expiry text not null,
        strike real not null,
        cp text not null,

        calc_source text,
        u_prc real,
        years real,
        rate real,

        s_vol real,
        atm_vol real,
        s_mark real,

        o_bid real,
        o_ask real,
        o_bid_iv real,
        o_ask_iv real,

        delta real,
        gamma real,
        theta real,
        vega real,

        src_ts text
    );
    """)

    cur.execute("""
    create index if not exists idx_hist_symbol_expiry_asof
    on implied_quote_history(symbol, expiry, asof_ts);
    """)

    cur.execute("""
    create index if not exists idx_hist_option_asof
    on implied_quote_history(option_key, asof_ts);
    """)

    cur.execute("""
    create table if not exists implied_quote_latest (
        option_key text primary key,

        last_asof_ts text not null,
        symbol text not null,
        expiry text not null,
        strike real not null,
        cp text not null,

        calc_source text,
        u_prc real,
        years real,
        rate real,

        s_vol real,
        atm_vol real,
        s_mark real,

        o_bid real,
        o_ask real,
        o_bid_iv real,
        o_ask_iv real,

        delta real,
        gamma real,
        theta real,
        vega real,

        src_ts text
    );
    """)

    cur.execute("""
    create index if not exists idx_latest_symbol_expiry
    on implied_quote_latest(symbol, expiry);
    """)

    conn.commit()
    conn.close()

setup_db()

This creates the SQLite database file and both tables. The history table is append-only and indexed for the two queries we'll run later: pulling snapshots by expiry and time, and pulling a specific option's timeline by option_key. The latest table is keyed by option_key, which lets us upsert and maintain a consistent "current view."

The columns we store are intentionally opinionated. We keep surface IV (s_vol), surface mark (s_mark), Greeks, and a few context fields. We also store timestamps so later we can reason about when a value was produced.

Pulling LiveImpliedQuote

Now we do the first live pull. The goal here is not to build a perfect filter. It's to confirm that we can retrieve a meaningful slice of TSLA option analytics and that the response structure is what we expect.

We request LiveImpliedQuote and filter by symbol using the where clause. The response is a list where most rows are actual LiveImpliedQuote messages, and one row at the end is a QueryResult summary.
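
To make that response shape concrete before the live call, here is a small sketch that separates quote messages from the trailing summary row. The sample list is hand-built and heavily trimmed. Only the header.mTyp field and the two message type names come from the feed as described above. The sVol values, the empty summary body, and the split_response helper are assumptions.

```python
from collections import Counter

# Hypothetical, trimmed-down response: real messages carry many more fields
sample_raw = [
    {"header": {"mTyp": "LiveImpliedQuote"}, "message": {"sVol": 0.52}},
    {"header": {"mTyp": "LiveImpliedQuote"}, "message": {"sVol": 0.49}},
    {"header": {"mTyp": "QueryResult"}, "message": {}},  # summary row at the end
]

def split_response(raw: list) -> tuple[list, list]:
    """Return (quote messages, everything else) based on header.mTyp."""
    quotes, other = [], []
    for row in raw:
        if row.get("header", {}).get("mTyp") == "LiveImpliedQuote":
            quotes.append(row)
        else:
            other.append(row)
    return quotes, other

quotes, other = split_response(sample_raw)
type_counts = Counter(r.get("header", {}).get("mTyp") for r in sample_raw)

print("quotes:", len(quotes), "other:", len(other))
print("types:", dict(type_counts))
```

Counting message types like this is a cheap sanity check to run on the first real response before writing any normalization code.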

def fetch_live_implied_quote(symbol: str, limit: int = 2000):
    where = f"okey.tk:eq:{symbol}"

    params = {
        "apiKey": api_key,
        "cmd": "getmsgs",
        "msgType": msg_type,
        "where": where,
        "limit": limit
    }

    # A timeout keeps one stalled request from hanging the whole polling loop
    r = requests.get(mlink_url, params=params, timeout=30)
    r.raise_for_status()
    return r.json()

raw = fetch_live_implied_quote(symbol, limit=limit)
print("raw messages:", len(raw))
print("first type:", raw[0].get("header", {}).get("mTyp") if raw else None)

This is a straight REST getmsgs call. We pass the API key, message type, and a simple symbol filter. The limit is important. It caps how many messages we get back in one poll, so for active underlyings, the returned set of strikes and expiries can vary between polls. That's fine for this tutorial, because the goal is to show the database pattern and the types of monitoring queries it enables.

The output reports how many raw messages came back and the message type of the first one.

Normalizing the Response Into Rows

Right now, raw is a list of nested message objects. That format is fine for transport, but it's not something you can store or query directly. So now, we turn each LiveImpliedQuote message into one flat row with a consistent schema.

def make_option_key(okey: dict) -> str:
    return "|".join([
        str(okey.get("tk")),
        str(okey.get("dt")),
        str(okey.get("xx")),
        str(okey.get("cp")),
        str(okey.get("at")),
        str(okey.get("ts")),
    ])

def normalize_liq(raw: list, asof_ts: str, keep_calc_source: str = "Loop") -> pd.DataFrame:
    rows = []

    for row in raw:
        if row.get("header", {}).get("mTyp") != "LiveImpliedQuote":
            continue

        m = row.get("message", {})
        if keep_calc_source and m.get("calcSource") != keep_calc_source:
            continue

        pkey = m.get("pkey", {})
        okey = pkey.get("okey", {})
        if not okey:
            continue

        s_vol = m.get("sVol")
        if s_vol is None or s_vol == 0:
            continue

        o_bid = m.get("oBid", 0) or 0
        o_ask = m.get("oAsk", 0) or 0

        quote_ok = int(not (o_bid == 0 and o_ask == 0))

        rows.append({
            "asof_ts": asof_ts,
            "option_key": make_option_key(okey),

            "symbol": okey.get("tk"),
            "expiry": okey.get("dt"),
            "strike": okey.get("xx"),
            "cp": okey.get("cp"),

            "calc_source": m.get("calcSource"),
            "u_prc": m.get("uPrc"),
            "years": m.get("years"),
            "rate": m.get("rate"),

            "s_vol": s_vol,
            "atm_vol": m.get("atmVol"),
            "s_mark": m.get("sMark"),

            "o_bid": o_bid,
            "o_ask": o_ask,
            "o_bid_iv": m.get("oBidIv"),
            "o_ask_iv": m.get("oAskIv"),
            "quote_ok": quote_ok,

            "delta": m.get("de"),
            "gamma": m.get("ga"),
            "theta": m.get("th"),
            "vega": m.get("ve"),

            "src_ts": m.get("timestamp"),
        })

    df = pd.DataFrame(rows)
    if df.empty:
        return df

    df = (
        df.sort_values("src_ts")
          .drop_duplicates(subset=["option_key"], keep="last")
          .reset_index(drop=True)
    )
    return df

asof_ts = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")
snapshot_df = normalize_liq(raw, asof_ts)

print("snapshot rows:", len(snapshot_df))
print("quote_ok distribution:", snapshot_df["quote_ok"].value_counts().to_dict() if not snapshot_df.empty else {})
snapshot_df.head()

There are three practical decisions baked into this normalization step:

First, we build a stable option_key from the option identifier so we have a consistent primary key for the latest table.
Second, we keep only calcSource="Loop". LiveImpliedQuote can include both Tick and Loop records. Loop records tend to be more consistent for snapshot-style analysis because the underlying reference price is stable across the surface.
Third, we avoid aggressive filtering. In this dataset, the top-of-book bid and ask fields can be zero even when the analytics fields are populated. So instead of dropping those rows, we store a quote_ok flag and keep the record. That keeps the pipeline usable while still making it obvious later which rows had live quotes.

The output shows the snapshot row count, the quote_ok distribution, and the first few normalized rows.

At this point, one row represents one option contract snapshot. The fact that quote_ok is 0 across the board simply means bid and ask are not populated in this slice, even though surface IV, Greeks, and other analytics fields are present. That's still useful for building a monitoring database, because the core idea here is tracking the evolution of analytics over time, not reconstructing executable markets.
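
The deduplication at the end of normalize_liq is easy to check in isolation. The sketch below runs the same sort_values and drop_duplicates chain on three made-up rows, two of which share an option key. The shortened keys and the timestamp format are assumptions. Sorting strings only gives chronological order when every timestamp uses the same fixed-width format.

```python
import pandas as pd

# Hypothetical rows: the first two describe the same contract at different source times
rows = pd.DataFrame({
    "option_key": ["TSLA|2026-11-20|400|Call", "TSLA|2026-11-20|400|Call", "TSLA|2026-11-20|420|Call"],
    "src_ts": ["2026-04-14 18:09:27.100", "2026-04-14 18:09:28.500", "2026-04-14 18:09:27.900"],
    "s_vol": [0.51, 0.52, 0.48],
})

# Same pattern as normalize_liq: order by source timestamp, keep the newest row per contract
deduped = (
    rows.sort_values("src_ts")
        .drop_duplicates(subset=["option_key"], keep="last")
        .reset_index(drop=True)
)

print(deduped)
```

The 400 strike survives with the later s_vol of 0.52, so each snapshot holds at most one row per contract before it reaches the database.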

Writing to the Database

Now that we have a clean snapshot DataFrame, the job is to persist it in two places.

History table: Append everything. This is the audit log.
Latest table: Upsert by option_key. This is the fast "current view."

This separation is what makes the database useful. History lets you reconstruct any past snapshot. Latest lets you answer "what does the surface look like right now" without scanning time series.

def safe_add_column(table: str, col: str, col_type: str, path: str = db_path):
    conn = get_conn(path)
    cur = conn.cursor()
    existing = [r[1] for r in cur.execute(f"PRAGMA table_info({table});").fetchall()]
    if col not in existing:
        cur.execute(f"ALTER TABLE {table} ADD COLUMN {col} {col_type};")
    conn.commit()
    conn.close()

safe_add_column("implied_quote_history", "quote_ok", "INTEGER")
safe_add_column("implied_quote_latest", "quote_ok", "INTEGER")

def write_snapshot_to_db(df: pd.DataFrame, path: str = db_path) -> tuple[int, int]:
    if df.empty:
        return 0, 0

    conn = get_conn(path)
    cur = conn.cursor()

    cols = [
        "asof_ts",
        "option_key","symbol","expiry","strike","cp",
        "calc_source","u_prc","years","rate",
        "s_vol","atm_vol","s_mark",
        "o_bid","o_ask","o_bid_iv","o_ask_iv",
        "delta","gamma","theta","vega",
        "quote_ok","src_ts"
    ]

    for c in cols:
        if c not in df.columns:
            df[c] = None

    insert_df = df[cols].copy()

    cur.executemany(
        """
        insert into implied_quote_history (
            asof_ts,
            option_key, symbol, expiry, strike, cp,
            calc_source, u_prc, years, rate,
            s_vol, atm_vol, s_mark,
            o_bid, o_ask, o_bid_iv, o_ask_iv,
            delta, gamma, theta, vega,
            quote_ok, src_ts
        ) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """,
        insert_df.itertuples(index=False, name=None)
    )
    history_inserted = cur.rowcount

    cur.executemany(
        """
        insert into implied_quote_latest (
            option_key,
            last_asof_ts, symbol, expiry, strike, cp,
            calc_source, u_prc, years, rate,
            s_vol, atm_vol, s_mark,
            o_bid, o_ask, o_bid_iv, o_ask_iv,
            delta, gamma, theta, vega,
            quote_ok, src_ts
        ) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        on conflict(option_key) do update set
            last_asof_ts=excluded.last_asof_ts,
            symbol=excluded.symbol,
            expiry=excluded.expiry,
            strike=excluded.strike,
            cp=excluded.cp,
            calc_source=excluded.calc_source,
            u_prc=excluded.u_prc,
            years=excluded.years,
            rate=excluded.rate,
            s_vol=excluded.s_vol,
            atm_vol=excluded.atm_vol,
            s_mark=excluded.s_mark,
            o_bid=excluded.o_bid,
            o_ask=excluded.o_ask,
            o_bid_iv=excluded.o_bid_iv,
            o_ask_iv=excluded.o_ask_iv,
            delta=excluded.delta,
            gamma=excluded.gamma,
            theta=excluded.theta,
            vega=excluded.vega,
            quote_ok=excluded.quote_ok,
            src_ts=excluded.src_ts
        """,
        insert_df[[
            "option_key","asof_ts","symbol","expiry","strike","cp",
            "calc_source","u_prc","years","rate",
            "s_vol","atm_vol","s_mark",
            "o_bid","o_ask","o_bid_iv","o_ask_iv",
            "delta","gamma","theta","vega",
            "quote_ok","src_ts"
        ]].itertuples(index=False, name=None)
    )
    latest_upserted = cur.rowcount

    conn.commit()
    conn.close()
    return history_inserted, latest_upserted

hist_n, latest_n = write_snapshot_to_db(snapshot_df)
print("history inserted:", hist_n)
print("latest upserted:", latest_n)

We batch write using executemany so inserts are fast even with thousands of option rows. The history insert is straightforward. The latest write uses a SQLite upsert keyed on option_key, which means if the contract already exists in the latest table, its fields are overwritten with the newest snapshot.

The output reports the number of rows inserted into history and upserted into latest.

After the first write, both tables have the same number of rows. That's expected, because there is only one snapshot in history so far. Once we start polling multiple snapshots, the history table will grow every cycle, while the latest table will stay roughly flat and continue updating in place.
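
The grow-versus-stay-flat behavior can be seen with a throwaway in-memory database. This is a cut-down sketch, not the article's schema: the table names history and latest, the keys K1 and K2, and the values are all made up. The on conflict ... do update syntax needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # nothing is written to disk
conn.execute("create table history (id integer primary key autoincrement, asof_ts text, option_key text, s_vol real)")
conn.execute("create table latest (option_key text primary key, last_asof_ts text, s_vol real)")

# Three hypothetical snapshots of the same two contracts
snapshots = [
    ("2026-04-14T18:09:29Z", [("K1", 0.50), ("K2", 0.45)]),
    ("2026-04-14T18:09:39Z", [("K1", 0.51), ("K2", 0.44)]),
    ("2026-04-14T18:09:49Z", [("K1", 0.52), ("K2", 0.46)]),
]

for asof_ts, rows in snapshots:
    # History: always append
    conn.executemany(
        "insert into history (asof_ts, option_key, s_vol) values (?, ?, ?)",
        [(asof_ts, key, vol) for key, vol in rows],
    )
    # Latest: insert, or overwrite the existing row for that contract
    conn.executemany(
        """
        insert into latest (option_key, last_asof_ts, s_vol) values (?, ?, ?)
        on conflict(option_key) do update set
            last_asof_ts=excluded.last_asof_ts,
            s_vol=excluded.s_vol
        """,
        [(key, asof_ts, vol) for key, vol in rows],
    )
conn.commit()

history_n = conn.execute("select count(*) from history").fetchone()[0]
latest_n = conn.execute("select count(*) from latest").fetchone()[0]
latest_k1 = conn.execute("select last_asof_ts, s_vol from latest where option_key = 'K1'").fetchone()
conn.close()

print("history rows:", history_n, "| latest rows:", latest_n, "| K1 latest:", latest_k1)
```

After three snapshots, history holds six rows while latest still holds two, and the K1 row carries the values from the final snapshot.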

Running a Short Polling Capture

At this point, the pipeline works end-to-end for a single snapshot. The whole point of the database, though, is to turn live analytics into a time series. So we run a short capture window and store multiple snapshots back-to-back.

This isn't meant to be a production scheduler. It's just a simple loop that runs for two minutes, polls every 10 seconds, timestamps the snapshot, and writes it to both tables.

def poll_and_write(symbol: str, duration_s: int = poll_duration_s, interval_s: int = poll_interval_s):
    start = time.time()
    polls = 0
    total_hist = 0

    while time.time() - start < duration_s:
        asof_ts = datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z")

        raw = fetch_live_implied_quote(symbol, limit=limit)
        df = normalize_liq(raw, asof_ts)

        hist_n, latest_n = write_snapshot_to_db(df)
        polls += 1
        total_hist += hist_n

        print(f"[{polls}] {asof_ts} snapshot_rows={len(df)} history+={hist_n} latest_upsert={latest_n}")
        time.sleep(interval_s)

    print(f"done. polls={polls}, total_history_added={total_hist}")

poll_and_write(symbol, duration_s=120, interval_s=10)

Each loop iteration represents one snapshot. We generate a UTC timestamp (asof_ts), pull the latest batch from LiveImpliedQuote, normalize it into rows, then write it into the database. The history table accumulates every snapshot. The latest table overwrites by option_key, so it always represents the most recent view.

One practical detail is worth calling out. The API call is capped by limit, so you're not guaranteed to receive an identical set of strikes and expiries every poll. That's why snapshot_rows can vary between iterations.

In production, you usually stabilize the slice by pinning specific expiries and a strike band or by interpolating IV to fixed moneyness points. For this tutorial, we're keeping ingestion simple and focusing on the database pattern and the monitoring queries it enables.
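
As a rough idea of what "interpolating IV to fixed moneyness points" can look like, the sketch below maps one snapshot's strikes to strike/spot and linearly interpolates s_vol onto a fixed grid. The strikes, vols, spot, and the iv_at_moneyness helper are all made up. Linear interpolation is a simplification chosen for brevity, not the method a production surface would necessarily use.

```python
import numpy as np
import pandas as pd

# Hypothetical single-expiry snapshot
snapshot = pd.DataFrame({
    "strike": [300.0, 350.0, 400.0, 450.0, 500.0],
    "s_vol": [0.60, 0.55, 0.50, 0.48, 0.47],
})
spot = 400.0
moneyness_grid = np.array([0.9, 1.0, 1.1])  # assumed fixed points to track over time

def iv_at_moneyness(df: pd.DataFrame, spot: float, grid: np.ndarray) -> pd.Series:
    """Linearly interpolate s_vol onto fixed strike/spot points."""
    df = df.sort_values("strike")  # np.interp needs increasing x values
    moneyness = df["strike"].to_numpy() / spot
    # Note: np.interp clamps to the edge values for grid points outside the strike range
    iv = np.interp(grid, moneyness, df["s_vol"].to_numpy())
    return pd.Series(iv, index=grid, name="s_vol")

fixed_iv = iv_at_moneyness(snapshot, spot, moneyness_grid)
print(fixed_iv)
```

Because the grid is the same on every poll, the resulting series stay comparable even when the set of strikes returned by the API changes between snapshots.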

You should see per-poll telemetry like this:

[1] 2026-04-14T18:09:29Z snapshot_rows=1454 history+=1454 latest_upsert=1454
...
done. polls=9, total_history_added=12806

This confirms the database is building a time series. Over nine polls, you stored 12,806 option rows in history. The latest table is updated each time, but it doesn't grow in the same way as history because it overwrites per contract key.
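
A loop that sleeps for the full interval after each poll runs slower than its nominal cadence, because fetch and write time is added on top of every sleep. If an even cadence matters, one option is to schedule against a monotonic clock and keep going when a single poll fails. The sketch below is one way to do that under those assumptions. The run_fixed_cadence helper and the fake_poll stand-in are hypothetical, and the tiny intervals are only there to keep the demo fast.

```python
import time

def run_fixed_cadence(poll_fn, duration_s: float, interval_s: float) -> int:
    """Call poll_fn roughly every interval_s seconds for duration_s seconds."""
    start = time.monotonic()
    next_due = start
    polls = 0

    while time.monotonic() - start < duration_s:
        try:
            poll_fn()
        except Exception as exc:
            # One failed poll should not end the capture
            print(f"poll failed, continuing: {exc}")
        polls += 1

        # Schedule from the previous due time, not from when the poll finished.
        # If a poll overran the interval, resume from now instead of bursting to catch up.
        next_due = max(next_due + interval_s, time.monotonic())
        sleep_for = next_due - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)

    return polls

# Stand-in for fetch + normalize + write; it just records when it was called
calls = []

def fake_poll():
    calls.append(time.monotonic())
    time.sleep(0.02)  # pretend the poll takes some time

n_polls = run_fixed_cadence(fake_poll, duration_s=0.5, interval_s=0.1)
print("polls:", n_polls)
```

In a real capture, poll_fn would wrap the fetch, normalize, and write steps for one snapshot.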

From the next section onward, we'll stop writing and start querying.

Analysis: Smile Reconstruction From the Database

Once the data is in implied_quote_history, the workflow flips. We stop thinking in terms of "API responses" and start thinking in terms of "queries." This section does two things. First, it picks an expiry that has enough rows to be representative. Then it reconstructs the call-side volatility smile for that expiry across a few timestamps.

Pick an Expiry with Good Coverage

If you pick an expiry that only appears sporadically in the captured snapshots, the smile plot will be misleading. So we start by looking at which expiries have the most rows in the history table.

conn = get_conn()

expiry_counts = pd.read_sql_query(
    """
    select expiry, count(*) as n
    from implied_quote_history
    where symbol = ?
    group by expiry
    order by n desc
    limit 10
    """,
    conn,
    params=(symbol,)
)

conn.close()
expiry_counts

This query scans only the history table, filters to TSLA, and counts how many option rows exist per expiry across the capture window. We keep the top 10 and pick the first one as the expiry we'll reconstruct.

The expiry date 2026-11-20 has the highest count.

Here, the count doesn't mean this expiry is "best" in any trading sense. It just means it showed up most consistently in the captured data. That makes it a practical choice for a clean smile comparison.
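
Raw row counts can favor an expiry that has many strikes but appears in only a few snapshots. If consistency is what matters, an alternative is to count the distinct snapshots each expiry appears in. The sketch below shows the idea in pandas on invented data. The expiry labels, timestamps, and the coverage_ratio column are assumptions.

```python
import pandas as pd

# Hypothetical history: one expiry has 1 row in each of 3 snapshots,
# the other has 5 rows but only in the first snapshot
history = pd.DataFrame({
    "asof_ts": ["2026-04-14T18:09:29Z", "2026-04-14T18:09:39Z", "2026-04-14T18:09:49Z"]
               + ["2026-04-14T18:09:29Z"] * 5,
    "expiry": ["2026-11-20"] * 3 + ["2026-12-18"] * 5,
})

total_snapshots = history["asof_ts"].nunique()

coverage = (
    history.groupby("expiry")
    .agg(n_rows=("asof_ts", "count"), n_snapshots=("asof_ts", "nunique"))
    .reset_index()
)
# Share of all captured snapshots in which the expiry shows up at least once
coverage["coverage_ratio"] = coverage["n_snapshots"] / total_snapshots

coverage = coverage.sort_values(["coverage_ratio", "n_rows"], ascending=False).reset_index(drop=True)
best_expiry = coverage.loc[0, "expiry"]
print(coverage)
print("best covered expiry:", best_expiry)
```

The same idea works directly in SQL by adding count(distinct asof_ts) to the grouped query.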

Rebuild the Smile Across Snapshots

Now we query the stored history for one expiry, keep only calls, and plot surface IV (s_vol) against strike for multiple snapshot timestamps.

chosen_expiry = "2026-11-20" 

conn = get_conn()
smile = pd.read_sql_query(
    """
    select asof_ts, strike, cp, s_vol, u_prc
    from implied_quote_history
    where symbol = ? and expiry = ?
    """,
    conn,
    params=(symbol, chosen_expiry)
)
conn.close()

smile_calls = smile[smile["cp"] == "Call"].copy()

ts_list = sorted(smile_calls["asof_ts"].unique())
pick = [ts_list[0], ts_list[len(ts_list)//2], ts_list[-1]]

plt.figure(figsize=(9,5))
for ts in pick:
    g = smile_calls[smile_calls["asof_ts"] == ts].sort_values("strike")
    plt.plot(g["strike"], g["s_vol"], label=ts)

plt.title(f"{symbol} Vol Smile (Calls) | Expiry {chosen_expiry} | 3 snapshots")
plt.xlabel("Strike")
plt.ylabel("Implied Vol (s_vol)")
plt.grid(True)
plt.legend()
plt.show()

We pull all rows for the chosen expiry from history, then filter to calls so we don't mix put and call shapes. To keep the plot readable, we only plot three snapshots: the first, the middle, and the last.

Over a short capture window, the smiles often overlap heavily. That doesn't mean the system isn't working. It usually means the surface didn't move much in those two minutes. The important part is that we can reconstruct and compare it purely from stored history.
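
When the lines overlap on the chart, a number is often easier to read than a picture. The sketch below pivots two snapshots into columns and measures the per-strike change in s_vol between them. The strikes, vols, and timestamps are made up. pivot_table with a mean is used as a defensive assumption in case a strike appears more than once in a snapshot.

```python
import pandas as pd

# Hypothetical call-side smile at two snapshot times
smile_calls = pd.DataFrame({
    "asof_ts": ["2026-04-14T18:09:29Z"] * 3 + ["2026-04-14T18:11:20Z"] * 3,
    "strike": [350.0, 400.0, 450.0] * 2,
    "s_vol": [0.550, 0.500, 0.480, 0.556, 0.502, 0.479],
})

# One row per strike, one column per snapshot
wide = smile_calls.pivot_table(index="strike", columns="asof_ts", values="s_vol", aggfunc="mean")

first_ts, last_ts = sorted(wide.columns)[0], sorted(wide.columns)[-1]

# Strikes missing from either snapshot drop out instead of producing a fake shift
shift = (wide[last_ts] - wide[first_ts]).dropna()

max_abs_shift = shift.abs().max()
strike_of_max = shift.abs().idxmax()

print(shift)
print(f"largest move: {max_abs_shift:.4f} vol points at strike {strike_of_max}")
```

A max_abs_shift close to zero confirms numerically what heavily overlapping lines suggest visually.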

Zoom-In Around Spot

The full-range plot is useful for shape, but it can hide small shifts near the region people actually care about. So we zoom to a band around the underlying price.

s0 = float(smile_calls["u_prc"].dropna().median())
low, high = s0 * 0.6, s0 * 1.4

# Start a new figure with the same size as the full-range plot
plt.figure(figsize=(9,5))

for ts in pick:
    g = smile_calls[smile_calls["asof_ts"] == ts].sort_values("strike")
    g = g[(g["strike"] >= low) & (g["strike"] <= high)]
    plt.plot(g["strike"], g["s_vol"], label=ts)

plt.title(f"{symbol} Vol Smile (Calls) | Expiry {chosen_expiry} | zoomed")
plt.xlabel("Strike")
plt.ylabel("Implied Vol (s_vol)")
plt.grid(True)
plt.legend(fontsize=8)
plt.show()

We take a robust spot proxy, the median of the stored u_prc values, and then keep strikes within a band around it (here, from 0.6 to 1.4 times spot). The goal is not precision. It's to make the chart readable and show whether the near-ATM region is drifting.

Here, even small changes become visible. This is also why storing history matters. If you only looked at one snapshot in isolation, these shifts would be easy to miss or dismiss.

Analysis: ATM IV and Skew Over Time

A full smile plot is useful, but it's not always the fastest way to monitor a surface. In practice, teams usually track a few summary numbers per expiry so they can spot changes quickly, then drill down only when something looks off.

Here we reduce each stored snapshot into two metrics for a single expiry.

ATM IV: Surface IV at the strike closest to spot.
Skew proxy: Surface IV at 0.9 times spot minus surface IV at 1.1 times spot, using the closest available strikes.

chosen_expiry = "2026-11-20"

conn = get_conn()
df = pd.read_sql_query(
    """
    select asof_ts, strike, s_vol, u_prc
    from implied_quote_history
    where symbol = ? and expiry = ? and cp = 'Call'
    """,
    conn,
    params=(symbol, chosen_expiry)
)
conn.close()

df["strike"] = df["strike"].astype(float)
df["s_vol"] = df["s_vol"].astype(float)

def closest_iv(grp: pd.DataFrame, target_strike: float):
    g = grp.iloc[(grp["strike"] - target_strike).abs().argsort()[:1]]
    return float(g["s_vol"].iloc[0]), float(g["strike"].iloc[0])

rows = []
for ts, grp in df.groupby("asof_ts"):
    spot = float(grp["u_prc"].dropna().median())
    atm_target = spot
    down_target = spot * 0.9
    up_target = spot * 1.1

    atm_iv, atm_k = closest_iv(grp, atm_target)
    down_iv, down_k = closest_iv(grp, down_target)
    up_iv, up_k = closest_iv(grp, up_target)

    rows.append({
        "asof_ts": ts,
        "spot": spot,
        "atm_strike": atm_k,
        "atm_iv": atm_iv,
        "k90": down_k,
        "iv_90": down_iv,
        "k110": up_k,
        "iv_110": up_iv,
        "skew_90_110": down_iv - up_iv
    })

metrics = pd.DataFrame(rows).sort_values("asof_ts").reset_index(drop=True)
metrics

We query the history table for one expiry and keep only calls, then group by snapshot timestamp. For each snapshot, we use the median u_prc as a spot proxy and pick the closest available strike to spot. That gives ATM IV. We repeat the same approach for 0.9 times spot and 1.1 times spot and compute a skew proxy as the difference.

The table also stores the actual strikes used (atm_strike, k90, k110). Options strikes are discrete, so the nearest strike can change between snapshots. Keeping the chosen strikes visible makes the metric explainable when it moves.

The output is a table with one row per snapshot timestamp and the computed metrics.

Now that we have a clean time series table, we can visualize the two metrics. First, ATM IV. Then, the skew proxy.

plt.plot(metrics["asof_ts"], metrics["atm_iv"])
plt.title(f"{symbol} ATM IV over time | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.ylabel("ATM IV (s_vol)")
plt.grid(True)
plt.show()

plt.plot(metrics["asof_ts"], metrics["skew_90_110"])
plt.title(f"{symbol} Skew proxy (IV@0.9S - IV@1.1S) | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.ylabel("Skew proxy")
plt.grid(True)
plt.show()

Here is the first chart, ATM IV over time.

ATM IV tends to move slowly over short windows unless there is a sharp repricing event. In this run, it stays fairly stable, which is a realistic outcome for a short capture. The value here is that the database turns "fairly stable" into something you can quantify and compare later, rather than a vague impression.

Here is the second chart, Skew proxy over time.

The skew proxy is more sensitive because it's based on wing points. If it changes, it usually means the downside is being repriced differently from the upside for that expiry. One nuance is that the nearest available strike can change between snapshots, which can create step-like moves even when the surface isn't moving dramatically. That's why we keep k90 and k110 in the metrics table. It keeps the skew plot explainable.
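One way to avoid those steps is to interpolate IV at fixed moneyness points instead of snapping to the nearest strike. The sketch below is a minimal illustration using linear interpolation with np.interp. The helper name iv_at_moneyness and the sample numbers are made up for this example, and linear interpolation across strikes is a simplifying assumption, not how the surface itself is fit.

```python
import numpy as np

def iv_at_moneyness(strikes, ivs, spot, moneyness):
    """Linearly interpolate IV at strike = moneyness * spot.

    Hypothetical helper: assumes strikes and ivs are equal-length sequences.
    np.interp needs increasing x values, so we sort by strike first.
    Targets outside the strike range are clamped to the edge values.
    """
    strikes = np.asarray(strikes, dtype=float)
    ivs = np.asarray(ivs, dtype=float)
    order = np.argsort(strikes)
    return float(np.interp(moneyness * spot, strikes[order], ivs[order]))

# Made-up snapshot: five strikes and their surface IVs
strikes = [300, 350, 400, 450, 500]
ivs = [0.62, 0.56, 0.52, 0.50, 0.49]
spot = 400.0

iv_90 = iv_at_moneyness(strikes, ivs, spot, 0.9)   # target strike 360
iv_110 = iv_at_moneyness(strikes, ivs, spot, 1.1)  # target strike 440
skew_interp = iv_90 - iv_110
```

Because the target strikes are tied to spot rather than to whichever listed strike is closest, the resulting skew series changes smoothly as spot moves.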

Alert-Style Thresholds

Once you have a metrics table per snapshot, adding a monitoring layer is straightforward. The idea isn't to generate trades. It's to flag when the surface moves enough that someone should look closer.

Here we do two checks:

ATM IV change alert: Flag if ATM IV changes more than a small threshold between snapshots.
Skew change alert: Flag if the skew proxy changes more than a threshold between snapshots.

alerts = metrics.copy()

alerts["atm_iv_change"] = alerts["atm_iv"].diff()
alerts["skew_change"] = alerts["skew_90_110"].diff()

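atm_thresh = 0.002   # minimum absolute ATM IV change between snapshots that raises an alert
skew_thresh = 0.003  # minimum absolute skew proxy change between snapshots that raises an alert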

alerts["atm_alert"] = alerts["atm_iv_change"].abs() >= atm_thresh
alerts["skew_alert"] = alerts["skew_change"].abs() >= skew_thresh

alerts[[
    "asof_ts",
    "atm_iv", "atm_iv_change", "atm_alert",
    "skew_90_110", "skew_change", "skew_alert",
    "atm_strike", "k90", "k110"
]]

We take the per-snapshot metrics table and compute first differences. Then we compare those changes to thresholds and store boolean flags. The output table keeps both the metrics and the strikes used for the calculations, so any alert is explainable rather than a black box.

In this run, the ATM IV alerts are all false, while the skew alert triggers once.

The skew alert fires because the skew proxy jumps by more than the threshold between two snapshots. The jump is explainable: the table shows that the strikes used for the proxy changed at around the same time (k90 shifts from 340 to 315). Because strikes are discrete, nearest-strike metrics can step even when the surface is not moving dramatically.
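Since a switch in the selected strike can trigger an alert on its own, one option is to record whether k90 or k110 changed between snapshots and review those alerts separately. Here is a minimal sketch on a made-up metrics table. The columns strike_changed and skew_alert_clean are hypothetical additions, not part of the original table.

```python
import pandas as pd

# Made-up metrics table with the same column names as the real one
metrics = pd.DataFrame({
    "asof_ts": ["t1", "t2", "t3", "t4"],
    "skew_90_110": [0.040, 0.041, 0.052, 0.060],
    "k90": [340, 340, 315, 315],
    "k110": [415, 415, 415, 415],
})

skew_thresh = 0.003
alerts = metrics.copy()
alerts["skew_change"] = alerts["skew_90_110"].diff()
alerts["skew_alert"] = alerts["skew_change"].abs() >= skew_thresh

# True when either wing strike differs from the previous snapshot.
# The first row has no previous snapshot, so fillna(0) makes it False.
changed = (alerts[["k90", "k110"]].diff().fillna(0) != 0).any(axis=1)
alerts["strike_changed"] = changed

# Alerts that cannot be attributed to a strike switch
alerts["skew_alert_clean"] = alerts["skew_alert"] & ~alerts["strike_changed"]
```

In this toy data, the third snapshot raises an alert that coincides with a strike switch, while the fourth raises one with the strikes unchanged, so only the fourth survives the filter.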

To make this easier to read, we also plot the two series and mark alert points.

plt.plot(alerts["asof_ts"], alerts["atm_iv"])
for i, r in alerts[alerts["atm_alert"]].iterrows():
    plt.scatter(r["asof_ts"], r["atm_iv"],  s=30, edgecolors="r", alpha=0.6, linewidth=2)
plt.title(f"{symbol} ATM IV with alerts | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.grid(True)
plt.show()

plt.plot(alerts["asof_ts"], alerts["skew_90_110"])
for i, r in alerts[alerts["skew_alert"]].iterrows():
    plt.scatter(r["asof_ts"], r["skew_90_110"], s=30, edgecolors="r", alpha=0.6, linewidth=2)
plt.title(f"{symbol} Skew proxy with alerts | Expiry {chosen_expiry}")
plt.xticks(rotation=30, ha="right")
plt.grid(True)
plt.show()

Both plots use the same pattern. Plot the metric as a line, then overlay a marker on any timestamp where the corresponding alert flag is true. This makes it obvious when something crossed the threshold.

Here is the skew proxy chart with alerts.

This chart shows one alert marker, which matches what we saw in the table.

The ATM IV plot isn't featured since there are no alert points.

Wrapping Up

In this walkthrough, we used SpiderRock MLink's LiveImpliedQuote feed for TSLA and turned it into a small internal database you can query. We stored every snapshot in an append-only history table, maintained a latest view keyed by a stable option identifier, then used that stored data to rebuild a smile, track ATM surface IV and a simple skew proxy, and add a basic alert rule on top.

This fits well in B2B workflows because it turns live analytics into something operational: a dataset you can audit, replay, and monitor. The same pattern works whether you're building an internal dashboard, running routine surface checks for a desk, or doing a quick post-event review without relying on screenshots and one-off notebook runs.

If you want to extend it, the most practical next steps are longer capture windows, tracking multiple symbols, and moving from SQLite to Postgres once the data volume grows. If metric stability becomes important, you can also standardize the slice you track per poll or interpolate IV to fixed moneyness points so skew measures don't step when nearest strikes change.

With that being said, you've reached the end of the article. Hope you learned something new and useful.

How to Build a Market Research Copilot with MCP and Python [Full Handbook]

Nikhil Adithyan — Wed, 06 May 2026 18:11:37 +0000

Most financial AI tools are good at one thing: summarizing a stock. You ask about Apple, NVIDIA, or Tesla, and they give you a clean overview of price action, a few ratios, and maybe some company context. That can be useful, but it falls short the moment the task becomes more like real research.

Real research usually starts with a view. Not a ticker. A trader, analyst, or product team is more likely to ask something like, “Apple looks attractive because downside has been controlled and business quality remains high. Does the data actually support that?” That's a different problem. A summary can't answer it properly because the system needs to test the claim itself, not just describe the company around it.

In this tutorial, we're going to build a financial research copilot that does exactly that. It takes a natural-language thesis, pulls historical prices and fundamentals through EODHD’s MCP server, turns those inputs into structured evidence, and returns a short research memo with a verdict.

Prerequisites
What This Copilot Actually Produces
What Makes This Different from a Normal Stock Assistant
The Workflow
Building the MCP Client
Setting Up core.py
Parsing a Research Prompt into a Structured Request
Fetching the Two Data Sources: Historical & Fundamental Data
Building the First Evidence Layer from Price Data
Building the Second Evidence Layer from Fundamentals
What do we have so far?
Classifying the Thesis
Turning Signals into Support, Contradiction, and Missing Evidence
- Sanity Check (Jupyter Notebook)
Assigning a Verdict
Building the Facts Object
Writing the Final Memo
- Sanity Check (Jupyter Notebook)
Stitching Everything Together
Demo Time! (Jupyter Notebook)
- Demo 1. Testing Whether a Premium Is Actually Justified
- Demo 2. Testing Whether Volatility Is Too High for the Underlying Business
Final Thoughts

Prerequisites

Before starting, make sure you have the following in place.

You will need Python 3.9 or later, along with these libraries: mcp, openai, numpy, and pandas. Install them with pip before running any code.

You will also need two API keys. One from EODHD for historical prices and fundamentals data, and one from OpenAI for parsing and memo generation. If you don't have an EODHD key, you can get one by registering for a developer account at eodhd.com.

The tutorial assumes basic familiarity with Python and async programming. You don't need a background in finance, but it helps to understand what a P/E ratio and drawdown mean before reading the evidence-building sections.

A Jupyter notebook environment is recommended for running the sanity checks, though any Python environment that supports await will work.

What This Copilot Actually Produces

Before getting into the pipeline, it helps to see the kind of output we're building toward. The easiest way to understand this project is to look at one real example.

Suppose the user gives the system this prompt:

I think Apple looks attractive because downside has been controlled and business quality remains high. Can you test that for AAPL over the last 180 days?

The copilot doesn't respond with a loose summary of Apple. It turns that into a structured research memo:

1. Thesis under review  

Apple appears attractive due to controlled downside and sustained high business quality.

2. Supporting evidence  

Over the past 180 days, maximum drawdown was limited to -13.82%, suggesting relatively contained downside. Profitability metrics are strong, with a 35.37% operating margin and 27.04% profit margin. Returns on capital are high, with ROA at 24.38% and ROE at 152.02%, indicating efficient asset use and strong capital efficiency. Growth metrics support ongoing business strength, with quarterly revenue growth of 15.70% and earnings growth of 18.30% year-over-year. Forward estimates also remain positive, with expected earnings growth of 9.68% and revenue growth of 6.87%.

3. Evidence that weakens the thesis  

Net EPS revisions over the past 30 days are negative (-3), indicating some deterioration in analyst sentiment.

4. Missing evidence  

No material gaps in the provided dataset.

5. Verdict  

partially_supported - There is more supporting evidence than contradicting evidence, but the thesis is not fully confirmed.

6. Bottom-line assessment  

Apple demonstrates strong and consistent business quality supported by high margins, returns, and continued growth. Downside has been relatively contained over the observed period, though not negligible. However, negative earnings revisions introduce some caution, leaving the thesis supported but not conclusively established.

This example makes the goal of the project much clearer. We're not building a system that simply tells us what happened to Apple. We're building one that takes a claim, checks it against market and fundamentals data, and returns a structured judgment.

That distinction matters because the memo is only the final surface. Underneath it, the system first parses the thesis, pulls prices and fundamentals through EODHD’s MCP server, computes the relevant signals, builds support and contradiction, assigns a verdict, and only then writes the final note. That's what gives the output its structure.

In this first part, we’ll build everything up to the evidence layers that power this kind of output.

What Makes This Different from a Normal Stock Assistant

A normal stock assistant starts with a ticker and tries to explain what happened. It may summarize price action, mention a few ratios, and add some company context. That is useful when the question is broad, but it's not enough when the input is a specific investment view.

This project starts from the opposite direction. The input is not “tell me about Apple.” The input is a claim, like Apple looks attractive because downside has been controlled and business quality remains high. That changes the job of the system. It now has to test each part of that claim, decide what supports it, decide what weakens it, and be clear about what's still missing.

That one shift is what shapes the whole workflow. Instead of ending at retrieval and summarization, the pipeline has to parse the thesis, map the data to the right kind of evidence, and return a verdict. That's what makes this feel like a research copilot rather than a better stock summary tool.

The Workflow

At a high level, the copilot follows a simple sequence:

parse the user’s thesis into a structured request
fetch historical prices and fundamentals through MCP
turn those inputs into market and business signals
map those signals into support, contradiction, and missing evidence
assign a verdict
write the final memo

That's the full loop. The output may look like a short research note, but it sits on top of a more controlled pipeline in core.py.

Project structure:

project/
├── client.py
├── core.py
└── test.ipynb

client.py is the MCP access layer. It connects to EODHD, lists tools, calls them with retries and timeouts, and returns metadata for each request. core.py contains the actual thesis-testing logic, including parsing, data fetching, signal computation, evidence building, verdict assignment, and memo generation. test.ipynb is where the quality checks and end-to-end demos are run.

This split is useful because it keeps the tutorial easy to follow. When we move into code, each block has a clear place. MCP access stays in client.py, while the research workflow stays in core.py.

Building the MCP Client

We’ll start with the thinnest part of the project, which is the MCP access layer.

This file only does one job. It connects to EODHD’s MCP server, lists available tools, calls a tool with retries and a timeout, and returns a small metadata object alongside the response. The actual thesis logic doesn't belong here. Keeping this layer small makes the rest of the project much easier to reason about later.

Create a file called client.py and add this:

import time
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

class EODHDMCP:
    def __init__(self, apikey, base_url=None):
        self.apikey = apikey
        self.base_url = base_url or "https://mcp.eodhd.dev/mcp"
        self._tools = None

    def _url(self):
        return f"{self.base_url}?apikey={self.apikey}"

    def _open(self):
        return streamablehttp_client(self._url())

    async def list_tools(self):
        if self._tools is not None:
            return self._tools

        async with self._open() as (read, write, _):
            async with ClientSession(read, write) as s:
                await s.initialize()
                resp = await s.list_tools()
                self._tools = [t.name for t in resp.tools]
                return self._tools

    async def call_tool(self, name, args, trace_id, timeout_s=25, retries=2):
        last = None

        for attempt in range(retries + 1):
            t0 = time.time()
            try:
                async with self._open() as (read, write, _):
                    async with ClientSession(read, write) as s:
                        await s.initialize()
                        out = await asyncio.wait_for(s.call_tool(name, args), timeout=timeout_s)
                        dt = time.time() - t0
                        meta = {
                            "trace_id": trace_id,
                            "tool": name,
                            "args": args,
                            "latency_s": round(dt, 3),
                        }
                        return out, meta
            except Exception as e:
                last = e
                if attempt < retries:
                    await asyncio.sleep(0.5 * (attempt + 1))

        raise last

There are only two methods that really matter here. list_tools() is just a quick way to inspect and cache the tools exposed by the MCP server. call_tool() is the method the rest of the project will actually use. It makes the request, applies timeout and retry handling, and returns both the raw output and a small metadata object.

That metadata becomes useful later because the workflow stays traceable. When the copilot returns a memo, we still know which tool was called, with what arguments, and how long it took. So even though this file is small, it gives the rest of the system a clean and inspectable access layer.
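To see the retry behavior without touching the network, the sketch below applies the same pattern to a stand-in coroutine that fails twice before succeeding. flaky_tool and call_with_retries are hypothetical names used only for this illustration and are not part of the mcp library. The extra attempt field in the metadata is also an addition for the demo.

```python
import asyncio
import time

attempts = {"count": 0}

async def flaky_tool(name, args):
    """Stand-in for a remote tool call: fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return {"tool": name, "echo": args}

async def call_with_retries(name, args, trace_id, timeout_s=1.0, retries=2):
    last = None
    for attempt in range(retries + 1):
        t0 = time.time()
        try:
            out = await asyncio.wait_for(flaky_tool(name, args), timeout=timeout_s)
            meta = {
                "trace_id": trace_id,
                "tool": name,
                "args": args,
                "attempt": attempt + 1,  # extra field, not in the original meta
                "latency_s": round(time.time() - t0, 3),
            }
            return out, meta
        except Exception as e:
            last = e
            if attempt < retries:
                # Short linear backoff, scaled down so the demo runs quickly
                await asyncio.sleep(0.01 * (attempt + 1))
    raise last

out, meta = asyncio.run(call_with_retries("demo_tool", {"ticker": "AAPL"}, "trace-1"))
```

With retries=2 there are three attempts in total, so the third one succeeds here. In a notebook, where an event loop is already running, use await call_with_retries(...) instead of asyncio.run().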

Setting Up `core.py`

Now that the MCP client is ready, we can start building the main workflow in core.py.

This file will hold the actual thesis-testing logic, so the first step is to set up the imports, API clients, a few limits, and some small helper functions that the rest of the pipeline will reuse.

Create a file called core.py and start with this:

import json
import re
import time
import uuid
import asyncio
from datetime import date, timedelta

import numpy as np
import pandas as pd
from openai import OpenAI

from client import EODHDMCP

eodhd_api_key = "your eodhd api key"
mcp_base_url = "https://mcp.eodhd.dev/mcp"

openai_api_key = "your openai api key"
model_name = "gpt-5.3-chat-latest"

max_lookback_days = 365
max_tool_calls = 10
max_tickers = 5

mcp = EODHDMCP(eodhd_api_key, base_url=mcp_base_url)
oa = OpenAI(api_key=openai_api_key)

def log_event(event, trace_id, **extra):
    payload = {
        "event": event,
        "trace_id": trace_id,
        "ts": round(time.time(), 3),
    }
    payload.update(extra)
    print(json.dumps(payload, default=str))

def get_dates_from_lookback(days):
    end = date.today()
    start = end - timedelta(days=int(days))
    return start.isoformat(), end.isoformat()

def make_state():
    return {
        "tool_calls": 0,
        "tool_trace": [],
    }

def bump_tool_call(state, meta):
    state["tool_calls"] += 1
    state["tool_trace"].append(meta)

    if state["tool_calls"] > max_tool_calls:
        raise RuntimeError("tool call budget exceeded")

def to_text(out):
    if isinstance(out, str):
        return out.strip()

    if hasattr(out, "content"):
        try:
            parts = []
            for item in out.content:
                if hasattr(item, "text") and item.text is not None:
                    parts.append(item.text)
                else:
                    parts.append(str(item))
            return "\n".join(parts).strip()
        except Exception:
            pass

    return str(out).strip()

Note: Replace "your eodhd api key" and "your openai api key" with your actual keys. If you don’t have an EODHD key, you can obtain one by opening an EODHD developer account.

This block does three things:

First, it sets up the two clients we need. mcp is the EODHD MCP client from client.py, and oa is the OpenAI client that will be used for parsing and memo generation later.
Second, it defines a few small limits for the workflow. These help keep the system controlled by capping the lookback window, the number of tickers, and the number of tool calls in a single run.
Third, it adds helper functions that the rest of the file depends on. log_event() gives us lightweight tracing, get_dates_from_lookback() converts a lookback window into start and end dates, make_state() and bump_tool_call() help track MCP usage, and to_text() safely converts tool output into plain text before we parse it.
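One detail worth noting is that bump_tool_call() runs after a tool call has returned, so the budget error surfaces on the call that exceeds the limit, not before it. The sketch below reproduces the two state helpers with a small limit to show this. The limit of 3 and the fake metadata are for illustration only.

```python
max_tool_calls = 3  # small limit for the demo; the article uses 10

def make_state():
    return {"tool_calls": 0, "tool_trace": []}

def bump_tool_call(state, meta):
    state["tool_calls"] += 1
    state["tool_trace"].append(meta)
    if state["tool_calls"] > max_tool_calls:
        raise RuntimeError("tool call budget exceeded")

state = make_state()
error = None
try:
    for i in range(5):
        # In the real workflow, meta comes back from mcp.call_tool()
        bump_tool_call(state, {"tool": "demo_tool", "call": i + 1})
except RuntimeError as e:
    error = str(e)
```

The fourth call is recorded in the trace before the error is raised. If you would rather stop before spending the call, check the counter before invoking the tool.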

Parsing a Research Prompt into a Structured Request

The first thing this copilot needs to do is clean up the input. A user isn't going to send a perfectly formatted request every time. They're more likely to write a research thought in plain English and mix the thesis, ticker, and timeframe into one prompt.

That is why the system starts by turning the raw prompt into four fields:

ticker
lookback window
thesis
mode

This logic goes into core.py.

def parse_request(text):
    prompt = f"""
You are extracting fields for a financial thesis-testing copilot.

Return only valid JSON with this exact shape:
{{
  "tickers": ["AAPL"],
  "lookback_days": 180,
  "thesis": "the actual thesis statement",
  "mode": "single"
}}

Rules:
- Extract only tickers explicitly mentioned or strongly implied.
- Do not invent tickers.
- If there are multiple tickers, mode must be "watchlist".
- If there is one ticker, mode must be "single".
- If no timeframe is mentioned, use 180.
- Convert months to days using 30 days per month.
- Convert years to days using 365 days per year.
- Keep the thesis concise but faithful to the user's intent.
- Return JSON only. No markdown. No explanation.

User request:
{text}
""".strip()

    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )

    raw = r.output_text.strip()

    try:
        parsed = json.loads(raw)
    except Exception:
        raise RuntimeError(f"parser returned non-json text: {raw[:500]}")

    return parsed

This function gives the model one very narrow job. It's not asking for an opinion or analysis. It's only asking for structured extraction. That matters because we want flexibility at the input layer, but we don't want the whole workflow to become fuzzy.
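Even with a "JSON only" instruction, a model can occasionally wrap its output in a markdown code fence, which would make json.loads() fail. A small cleaning step can make the parser more tolerant. This is an optional sketch, and extract_json is a hypothetical helper that isn't part of the original code.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence marker

def extract_json(raw):
    """Parse JSON from model output, tolerating a surrounding code fence."""
    text = raw.strip()
    # Drop an opening fence (optionally followed by a language tag)
    text = re.sub(r"^" + FENCE + r"[a-zA-Z]*\s*", "", text)
    # Drop a closing fence
    text = re.sub(r"\s*" + FENCE + r"$", "", text)
    return json.loads(text)

wrapped = FENCE + 'json\n{"tickers": ["AAPL"], "lookback_days": 180}\n' + FENCE
plain = '{"tickers": ["MSFT"], "lookback_days": 90}'

parsed_wrapped = extract_json(wrapped)
parsed_plain = extract_json(plain)
```

Plain JSON passes through unchanged, so a helper like this could stand in for the json.loads(raw) call without affecting well-formed responses.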

Once the model returns that JSON, Python takes over and tightens it up.

def enforce_limits(parsed):
    tickers = parsed.get("tickers", [])
    if not isinstance(tickers, list):
        tickers = []

    tickers = [str(x).upper().strip() for x in tickers if str(x).strip()]
    tickers = tickers[:max_tickers]

    lookback_days = parsed.get("lookback_days", 180)
    try:
        lookback_days = int(lookback_days)
    except Exception:
        lookback_days = 180

    if lookback_days < 1:
        lookback_days = 1
    if lookback_days > max_lookback_days:
        lookback_days = max_lookback_days

    thesis = str(parsed.get("thesis", "")).strip()
    if not thesis:
        thesis = "No thesis provided."

    # The mode is derived from the ticker count, so the model's value is ignored
    mode = "watchlist" if len(tickers) > 1 else "single"

    return {
        "tickers": tickers,
        "lookback_days": lookback_days,
        "thesis": thesis,
        "mode": mode,
    }

This second function is what keeps the workflow controlled. It cleans the tickers, caps how many we allow in one request, clamps the time window, and makes sure the mode matches the number of tickers. So the model gives us flexibility, while the code gives us boundaries. That combination is important for a build like this.

Fetching the Two Data Sources: Historical & Fundamental Data

Once the request is parsed, the next step is to pull the data that will feed the rest of the workflow. For this version, we only use two sources from EODHD: historical prices and fundamentals. That's enough to test a surprising number of thesis types without making the build unnecessarily wide.

Add these two functions to core.py:

async def fetch_prices(ticker, start_date, end_date, trace_id, state):
    args = {
        "ticker": ticker,
        "start_date": start_date,
        "end_date": end_date,
        "period": "d",
        "order": "a",
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_historical_stock_prices", args, trace_id)
    text = to_text(out)

    bump_tool_call(state, meta)

    if not text:
        raise RuntimeError("empty response from get_historical_stock_prices")

    try:
        data = json.loads(text)
    except Exception:
        raise RuntimeError(f"price tool returned non-json text: {text[:300]}")

    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    df = pd.DataFrame(data)
    if df.empty:
        return df

    keep = [c for c in ["date", "close"] if c in df.columns]
    df = df[keep].copy()
    df["ticker"] = ticker

    return df

async def fetch_fundamentals(ticker, trace_id, state):
    args = {
        "ticker": ticker,
        "include_financials": False,
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_fundamentals_data", args, trace_id)
    text = to_text(out)

    bump_tool_call(state, meta)

    if not text:
        raise RuntimeError("empty response from get_fundamentals_data")

    try:
        data = json.loads(text)
    except Exception:
        raise RuntimeError(f"fundamentals tool returned non-json text: {text[:300]}")

    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    return data

fetch_prices() pulls daily historical data for the requested window and reduces it to the fields we actually need right now: date, close, and the ticker itself. That trimmed DataFrame is what we'll later use for return, drawdown, volatility, trend, and other market signals.
fetch_fundamentals() keeps the fundamentals payload as JSON because we'll extract different categories from it in the next sections, including margins, growth, valuation, revisions, and beta.

A couple of details matter here. Both functions run through the same MCP wrapper, so they automatically inherit the timeout, retry, and metadata handling we already built in client.py. Both also call bump_tool_call(), which lets us track how many external calls were made during a single run. That becomes useful later when we want the workflow to stay inspectable rather than feel like a black box.

Building the First Evidence Layer from Price Data

Once the price data is in, the next step is to turn that raw series into something we can actually reason with. For this copilot, price history isn't the final answer, but it is still the first evidence layer. It helps us test claims around downside control, risk, momentum, and the quality of returns.

Add this to core.py:

def compute_price_signals(prices_df):
    if prices_df is None or prices_df.empty:
        return {}

    df = prices_df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["close"] = pd.to_numeric(df["close"], errors="coerce")

    df = df.dropna(subset=["date", "close"]).sort_values("date")
    if df.empty:
        return {}

    close = df["close"]
    rets = close.pct_change().dropna()

    out = {
        "n_points": int(len(close)),
        "start_price": float(close.iloc[0]),
        "end_price": float(close.iloc[-1]),
    }

    if len(close) >= 2:
        out["ret_total"] = float(close.iloc[-1] / close.iloc[0] - 1)

    if not rets.empty:
        vol_daily = float(rets.std())
        vol_annualized = float(vol_daily * np.sqrt(252))

        out["vol_daily"] = vol_daily
        out["vol_annualized"] = vol_annualized

        if vol_annualized > 0 and "ret_total" in out:
            out["ret_to_vol"] = float(out["ret_total"] / vol_annualized)

    peak = close.cummax()
    drawdown = close / peak - 1
    out["max_drawdown"] = float(drawdown.min())

    logp = np.log(close.values)
    x = np.arange(len(logp))
    if len(logp) >= 3:
        out["trend_slope"] = float(np.polyfit(x, logp, 1)[0])
    else:
        out["trend_slope"] = 0.0

    return out

This function gives us a compact set of market signals from a plain close-price series. ret_total tells us how the stock moved over the full window. vol_annualized tells us how noisy that move was. max_drawdown is useful when the thesis talks about downside control. trend_slope gives us a simple directional measure, and ret_to_vol helps us judge return quality instead of looking at raw return alone.

The important point here is that we aren't asking the model to infer all of this from raw prices. We compute it first in Python, so the later reasoning step starts from explicit signals rather than vague interpretation. That makes the whole workflow much more stable.
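To make these definitions concrete, here is the same arithmetic on a four-point toy series. The prices are made up and only meant to be easy to check by hand.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 105.0])

# Total return over the window: last close relative to first close
ret_total = float(close.iloc[-1] / close.iloc[0] - 1)

peak = close.cummax()            # running high: 100, 110, 110, 110
drawdown = close / peak - 1      # distance below the running high
max_drawdown = float(drawdown.min())

# Slope of a straight line fitted to log prices (per-step log growth)
x = np.arange(len(close))
trend_slope = float(np.polyfit(x, np.log(close.values), 1)[0])
```

The series ends 5% above where it started, but the worst peak-to-trough move is the drop from 110 to 99, a 10% drawdown. That is why a thesis about downside control needs max_drawdown and not just ret_total.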

Building the Second Evidence Layer from Fundamentals

Price data gives us one side of the thesis. The second side comes from fundamentals. This is the part that makes the project stop sounding generic. Once the copilot starts treating fundamentals as actual evidence, instead of just company profile data, the outputs become much more useful.

Add this helper first in core.py:

def _to_float(x):
    if x in (None, "", "NA"):
        return None
    try:
        return float(x)
    except Exception:
        return None

This small function just cleans values before we use them. Fundamentals payloads often contain strings, nulls, or "NA", so it helps to normalize everything early.

Now add the main function:

def compute_fundamental_signals(fundamentals):
    if not isinstance(fundamentals, dict):
        return {}

    general = fundamentals.get("General", {}) or {}
    highlights = fundamentals.get("Highlights", {}) or {}
    valuation = fundamentals.get("Valuation", {}) or {}
    technicals = fundamentals.get("Technicals", {}) or {}

    earnings = fundamentals.get("Earnings", {}) or {}
    trend = earnings.get("Trend", {}) or {}

    latest_trend = None
    if isinstance(trend, dict) and trend:
        latest_key = sorted(trend.keys())[-1]
        latest_trend = trend.get(latest_key, {}) or {}
    else:
        latest_trend = {}

    out = {
        "sector": general.get("Sector"),
        "industry": general.get("Industry"),
        "employees": _to_float(general.get("FullTimeEmployees")),

        "market_cap": _to_float(highlights.get("MarketCapitalization")),
        "pe_ratio": _to_float(highlights.get("PERatio")),
        "peg_ratio": _to_float(highlights.get("PEGRatio")),
        "profit_margin": _to_float(highlights.get("ProfitMargin")),
        "operating_margin": _to_float(highlights.get("OperatingMarginTTM")),
        "roa": _to_float(highlights.get("ReturnOnAssetsTTM")),
        "roe": _to_float(highlights.get("ReturnOnEquityTTM")),
        "revenue_ttm": _to_float(highlights.get("RevenueTTM")),
        "revenue_growth_yoy": _to_float(highlights.get("QuarterlyRevenueGrowthYOY")),
        "earnings_growth_yoy": _to_float(highlights.get("QuarterlyEarningsGrowthYOY")),
        "dividend_yield": _to_float(highlights.get("DividendYield")),

        "trailing_pe": _to_float(valuation.get("TrailingPE")),
        "forward_pe": _to_float(valuation.get("ForwardPE")),
        "price_sales": _to_float(valuation.get("PriceSalesTTM")),
        "price_book": _to_float(valuation.get("PriceBookMRQ")),
        "ev_revenue": _to_float(valuation.get("EnterpriseValueRevenue")),
        "ev_ebitda": _to_float(valuation.get("EnterpriseValueEbitda")),

        "beta": _to_float(technicals.get("Beta")),

        "earnings_estimate_growth": _to_float(latest_trend.get("earningsEstimateGrowth")),
        "revenue_estimate_growth": _to_float(latest_trend.get("revenueEstimateGrowth")),
        "eps_revisions_up_30d": _to_float(latest_trend.get("epsRevisionsUpLast30days")),
        "eps_revisions_down_30d": _to_float(latest_trend.get("epsRevisionsDownLast30days")),
    }

    if out["trailing_pe"] is not None and out["forward_pe"] is not None:
        out["forward_vs_trailing_pe_change"] = out["forward_pe"] - out["trailing_pe"]

    if out["eps_revisions_up_30d"] is not None and out["eps_revisions_down_30d"] is not None:
        out["net_eps_revisions_30d"] = out["eps_revisions_up_30d"] - out["eps_revisions_down_30d"]

    return out

This function pulls together the parts of the fundamentals payload that matter most for thesis testing.

From Highlights, we get profitability, returns on capital, growth, and market cap.
From Valuation, we get multiples like trailing P/E, forward P/E, price-to-sales, and EV-based ratios.
From Technicals, we take beta.
From Earnings.Trend, we pick up forward estimate growth and revision data.

These are the fields that let us test claims around business quality, premium justification, valuation, and forward expectations in a much more concrete way.

The last two derived fields are also useful. The gap between forward P/E and trailing P/E gives us a quick way to see whether valuation is easing or staying stretched. Net EPS revisions over the last 30 days tell us whether analyst expectations are improving or deteriorating.
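As a quick worked example of those two derived fields, the snippet below applies the same subtraction to a set of hypothetical numbers. None of these values come from a real payload.

```python
# Hypothetical signal values, for illustration only (not real data).
signals = {
    "trailing_pe": 32.0,
    "forward_pe": 27.5,
    "eps_revisions_up_30d": 9.0,
    "eps_revisions_down_30d": 4.0,
}

# Forward P/E minus trailing P/E: negative means the forward multiple is lower.
pe_gap = signals["forward_pe"] - signals["trailing_pe"]

# Upward revisions minus downward revisions over the last 30 days.
net_revisions = signals["eps_revisions_up_30d"] - signals["eps_revisions_down_30d"]

print(pe_gap)         # -4.5
print(net_revisions)  # 5.0
```

A negative gap like this one means the forward multiple sits below the trailing one, which at an unchanged price implies that expected earnings are higher than trailing earnings. A positive net revision count means more analysts raised estimates than cut them.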

What Do We Have So Far?

At this point, the copilot can parse a thesis, fetch prices and fundamentals, and convert both into two reusable signal layers:

Price signals cover return, volatility, drawdown, trend, and return quality.
Fundamentals signals cover margins, returns on capital, growth, valuation, revisions, and beta.

Next, we’ll turn those signals into what a real research workflow needs: supporting evidence, weakening evidence, what’s missing, a verdict, and the final memo.

Classifying the Thesis

Before the copilot can judge a thesis, it first needs to understand what kind of claim is being made.

This matters because not every thesis should be tested the same way. A claim about controlled downside should care more about drawdown and volatility. A claim about business quality should lean more on margins, returns on capital, and growth. A claim about premium justification may need both business quality and valuation context.

So instead of jumping straight from signals to a verdict, we'll add a small classification step. This gives the system a short list of claim types to work with and a cleaner summary of the thesis.
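One way to picture why the claim type matters is a simple routing table from claim type to the signals it leans on. This is only an illustrative sketch: the table and the signals_for() helper are hypothetical and are not part of core.py, although the signal names match the keys used in this article.

```python
# Hypothetical routing table: which signal names each claim type would lean on.
SIGNALS_BY_CLAIM = {
    "controlled_downside": ["max_drawdown"],
    "momentum_strength": ["trend_slope", "ret_total"],
    "low_risk": ["vol_annualized"],
    "business_quality": ["operating_margin", "profit_margin", "roa", "roe"],
    "valuation_attractive": ["pe_ratio", "forward_pe"],
}

def signals_for(claim_types):
    """Return the de-duplicated signals needed for a list of claim types."""
    needed = []
    for claim in claim_types:
        # Unknown claim types simply contribute no signals.
        for name in SIGNALS_BY_CLAIM.get(claim, []):
            if name not in needed:
                needed.append(name)
    return needed

print(signals_for(["controlled_downside", "business_quality"]))
# ['max_drawdown', 'operating_margin', 'profit_margin', 'roa', 'roe']
```

The point of the sketch is just that two theses about the same stock can call for very different evidence. The classifier itself comes next.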

Add this to core.py:

def classify_thesis(thesis):
    prompt = f"""
You are classifying a stock thesis into a few broad claim types.

Return only valid JSON like this:
{{
  "claim_types": ["controlled_downside", "business_quality"],
  "summary": "short restatement of the thesis"
}}

Allowed claim types:
- controlled_downside
- momentum_strength
- low_risk
- high_risk
- valuation_attractive
- valuation_expensive
- business_quality
- weak_business_quality
- premium_justified
- premium_not_justified

Rules:
- pick only the claim types that are clearly relevant
- do not invent extra labels
- if nothing fits strongly, return an empty list
- summary should be short and faithful

Thesis:
{thesis}
""".strip()

    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )

    raw = r.output_text.strip()

    try:
        out = json.loads(raw)
    except Exception:
        raise RuntimeError(f"thesis classifier returned non-json text: {raw[:500]}")

    claim_types = out.get("claim_types", [])
    if not isinstance(claim_types, list):
        claim_types = []

    clean = []
    allowed = {
        "controlled_downside",
        "momentum_strength",
        "low_risk",
        "high_risk",
        "valuation_attractive",
        "valuation_expensive",
        "business_quality",
        "weak_business_quality",
        "premium_justified",
        "premium_not_justified",
    }

    for x in claim_types:
        x = str(x).strip()
        if x in allowed and x not in clean:
            clean.append(x)

    return {
        "claim_types": clean,
        "summary": str(out.get("summary", "")).strip(),
    }

This function keeps the model’s job narrow. It's not being asked to decide whether the thesis is right or wrong. It's only being asked to identify the kind of thesis it's dealing with. That makes the next step much cleaner, because the evidence engine no longer has to treat every prompt the same way.

The validation at the bottom is important too. Even though the model returns the labels, Python still filters them through an allowed set and removes anything unexpected. That keeps this step flexible, but still controlled.
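You can exercise that filtering step without calling the model at all. The sketch below isolates it in a hypothetical helper, filter_claim_types(), and feeds it a made-up response string that contains an invented label, stray whitespace, and a duplicate.

```python
import json

ALLOWED = {
    "controlled_downside", "momentum_strength", "low_risk", "high_risk",
    "valuation_attractive", "valuation_expensive", "business_quality",
    "weak_business_quality", "premium_justified", "premium_not_justified",
}

def filter_claim_types(raw):
    """Hypothetical helper: parse model text, keep only allowed, de-duplicated labels."""
    out = json.loads(raw)
    labels = out.get("claim_types", []) if isinstance(out, dict) else []
    if not isinstance(labels, list):
        labels = []
    clean = []
    for x in labels:
        x = str(x).strip()
        if x in ALLOWED and x not in clean:
            clean.append(x)
    return clean

# Made-up model output: one invented label, one padded label, one duplicate.
raw = '{"claim_types": ["business_quality", "moat_strength", " low_risk ", "business_quality"], "summary": "..."}'
print(filter_claim_types(raw))  # ['business_quality', 'low_risk']
```

The invented label is dropped, the padded one is kept after stripping, and the duplicate is ignored, so downstream code only ever sees labels it knows how to handle.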

Turning Signals into Support, Contradiction, and Missing Evidence

This is the step where the copilot actually starts reasoning.

Up to this point, we have three things in hand. We have the thesis, we have the claim types, and we have the signal layers built from price data and fundamentals. But none of that is useful on its own unless the system can turn it into a clear argument.

That means it needs to answer three questions for every thesis:

What in the data supports this claim?
What in the data weakens it?
What is still missing before we can judge it properly?

That's exactly what build_evidence_blocks() does. It takes the classified thesis, checks the relevant price and fundamentals signals, and sorts them into three buckets: support, contradiction, and missing evidence.

Add this to core.py:

def build_evidence_blocks(thesis, thesis_tags, price_signals, fundamental_signals):
    evidence_for = []
    evidence_against = []
    missing_evidence = []

    ret_total = price_signals.get("ret_total")
    vol = price_signals.get("vol_annualized")
    dd = price_signals.get("max_drawdown")
    trend = price_signals.get("trend_slope")
    ret_to_vol = price_signals.get("ret_to_vol")

    pe = fundamental_signals.get("pe_ratio") or fundamental_signals.get("trailing_pe")
    forward_pe = fundamental_signals.get("forward_pe")
    beta = fundamental_signals.get("beta")

    profit_margin = fundamental_signals.get("profit_margin")
    operating_margin = fundamental_signals.get("operating_margin")
    roa = fundamental_signals.get("roa")
    roe = fundamental_signals.get("roe")
    revenue_growth = fundamental_signals.get("revenue_growth_yoy")
    earnings_growth = fundamental_signals.get("earnings_growth_yoy")
    earnings_estimate_growth = fundamental_signals.get("earnings_estimate_growth")
    revenue_estimate_growth = fundamental_signals.get("revenue_estimate_growth")
    net_eps_revisions = fundamental_signals.get("net_eps_revisions_30d")

    claim_types = thesis_tags.get("claim_types", [])

    if "controlled_downside" in claim_types:
        if dd is not None:
            if dd > -0.15:
                evidence_for.append(f"Maximum drawdown was relatively contained at {dd:.2%}.")
            else:
                evidence_against.append(f"Maximum drawdown reached {dd:.2%}, which weakens the controlled-downside claim.")
        else:
            missing_evidence.append("No drawdown signal available to test downside control.")

    if "momentum_strength" in claim_types:
        if trend is not None and ret_total is not None:
            if trend > 0 and ret_total > 0:
                evidence_for.append(f"Trend was positive and total return over the window was {ret_total:.2%}.")
            else:
                evidence_against.append("Trend and total return do not strongly support a momentum-strength view.")
        else:
            missing_evidence.append("No usable trend or return signal available to test momentum.")

    if "low_risk" in claim_types:
        if vol is not None:
            if vol < 0.30:
                evidence_for.append(f"Annualized volatility was {vol:.2%}, which supports a lower-risk view.")
            else:
                evidence_against.append(f"Annualized volatility was {vol:.2%}, which weakens a low-risk thesis.")
        else:
            missing_evidence.append("No volatility signal available to test risk.")

    if "high_risk" in claim_types:
        if vol is not None:
            if vol >= 0.30:
                evidence_for.append(f"Annualized volatility was {vol:.2%}, which supports a higher-risk view.")
            else:
                evidence_against.append(f"Annualized volatility was only {vol:.2%}, which does not strongly support a high-risk thesis.")
        else:
            missing_evidence.append("No volatility signal available to test risk.")

    if "valuation_attractive" in claim_types:
        if pe is not None:
            if pe < 20:
                evidence_for.append(f"P/E is {pe:.2f}, which supports a more attractive valuation view.")
            elif pe > 30:
                evidence_against.append(f"P/E is {pe:.2f}, which weakens the attractive-valuation claim.")
        else:
            missing_evidence.append("No P/E metric available to test valuation attractiveness.")

        if forward_pe is not None and pe is not None:
            if forward_pe < pe:
                evidence_for.append(f"Forward P/E ({forward_pe:.2f}) is below trailing P/E ({pe:.2f}), which can support an improving earnings setup.")

    if "valuation_expensive" in claim_types or "premium_not_justified" in claim_types:
        if pe is not None:
            if pe > 30:
                evidence_for.append(f"P/E is {pe:.2f}, which supports an expensive-valuation view.")
            else:
                evidence_against.append(f"P/E is {pe:.2f}, which does not strongly support an expensive-valuation claim.")
        else:
            missing_evidence.append("No P/E metric available to test whether valuation looks expensive.")

    if "business_quality" in claim_types or "premium_justified" in claim_types:
        quality_hits = 0

        if operating_margin is not None:
            if operating_margin >= 0.25:
                evidence_for.append(f"Operating margin is {operating_margin:.2%}, which supports strong business quality.")
                quality_hits += 1
            else:
                evidence_against.append(f"Operating margin is {operating_margin:.2%}, which is not especially strong for a quality claim.")

        if profit_margin is not None:
            if profit_margin >= 0.20:
                evidence_for.append(f"Profit margin is {profit_margin:.2%}, which supports business quality.")
                quality_hits += 1
            else:
                evidence_against.append(f"Profit margin is {profit_margin:.2%}, which weakens a strong-quality thesis.")

        if roa is not None:
            if roa >= 0.10:
                evidence_for.append(f"ROA is {roa:.2%}, which supports efficient asset use.")
                quality_hits += 1
            else:
                evidence_against.append(f"ROA is {roa:.2%}, which does not strongly support a quality claim.")

        if roe is not None:
            if roe >= 0.20:
                evidence_for.append(f"ROE is {roe:.2%}, which supports strong capital efficiency.")
                quality_hits += 1
            else:
                evidence_against.append(f"ROE is {roe:.2%}, which is weaker than expected for a strong-quality thesis.")

        if revenue_growth is not None:
            if revenue_growth > 0:
                evidence_for.append(f"Quarterly revenue growth was {revenue_growth:.2%} YoY, which supports business momentum.")
                quality_hits += 1
            else:
                evidence_against.append(f"Quarterly revenue growth was {revenue_growth:.2%} YoY, which weakens the quality claim.")

        if earnings_growth is not None:
            if earnings_growth > 0:
                evidence_for.append(f"Quarterly earnings growth was {earnings_growth:.2%} YoY, which supports operating strength.")
                quality_hits += 1
            else:
                evidence_against.append(f"Quarterly earnings growth was {earnings_growth:.2%} YoY, which weakens the quality claim.")

        if earnings_estimate_growth is not None:
            if earnings_estimate_growth > 0:
                evidence_for.append(f"Forward earnings estimate growth is {earnings_estimate_growth:.2%}, which supports a healthier forward outlook.")
            else:
                evidence_against.append(f"Forward earnings estimate growth is {earnings_estimate_growth:.2%}, which weakens the quality argument.")

        if revenue_estimate_growth is not None:
            if revenue_estimate_growth > 0:
                evidence_for.append(f"Forward revenue estimate growth is {revenue_estimate_growth:.2%}, which supports ongoing business strength.")
            else:
                evidence_against.append(f"Forward revenue estimate growth is {revenue_estimate_growth:.2%}, which weakens the quality argument.")

        if net_eps_revisions is not None:
            if net_eps_revisions > 0:
                evidence_for.append(f"Net EPS revisions over the last 30 days are positive ({net_eps_revisions:.0f}), which supports improving expectations.")
            elif net_eps_revisions < 0:
                evidence_against.append(f"Net EPS revisions over the last 30 days are negative ({net_eps_revisions:.0f}), which weakens the thesis.")

        if quality_hits == 0:
            missing_evidence.append("This version could not extract enough direct business-quality metrics to test the quality claim.")

    if "weak_business_quality" in claim_types:
        if operating_margin is not None and operating_margin < 0.15:
            evidence_for.append(f"Operating margin is only {operating_margin:.2%}, which supports a weaker-quality view.")
        if profit_margin is not None and profit_margin < 0.10:
            evidence_for.append(f"Profit margin is only {profit_margin:.2%}, which supports a weaker-quality view.")
        if revenue_growth is not None and revenue_growth <= 0:
            evidence_for.append(f"Revenue growth is {revenue_growth:.2%} YoY, which supports a weaker-quality view.")
        if earnings_growth is not None and earnings_growth <= 0:
            evidence_for.append(f"Earnings growth is {earnings_growth:.2%} YoY, which supports a weaker-quality view.")

    if beta is not None:
        if beta > 1.2:
            evidence_against.append(f"Beta is {beta:.2f}, which suggests above-market sensitivity.")
        elif beta < 0.9:
            evidence_for.append(f"Beta is {beta:.2f}, which suggests below-market sensitivity.")
    else:
        missing_evidence.append("No beta value available.")

    if ret_to_vol is None:
        missing_evidence.append("No return-to-volatility signal available.")

    if not evidence_for and not evidence_against:
        missing_evidence.append("The current data is not enough to strongly support or reject the thesis.")

    return {
        "thesis": thesis,
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": claim_types,
        "evidence_for": evidence_for,
        "evidence_against": evidence_against,
        "missing_evidence": list(dict.fromkeys(missing_evidence)),
    }

The function looks long, but the logic is simple once you break it down.

It starts by pulling the signals it needs from the two evidence layers that we built earlier. Then it checks the thesis tags one by one. If the thesis is about controlled downside, it looks at drawdown. If it's about risk, it looks at volatility and beta. If it's about business quality, it leans on margins, returns on capital, growth, and revisions. If it's about valuation, it checks multiples like P/E and the relationship between forward and trailing valuation.

That's the key shift in this project. The copilot is no longer just collecting data. It's deciding which parts of the EODHD-backed signal set actually matter for the thesis in front of it.

The three output buckets are what make this useful.

evidence_for holds the points that support the claim.
evidence_against holds the points that weaken it.
missing_evidence makes the gaps explicit instead of letting the system sound more confident than it should.

That's what makes this feel like a thesis-testing workflow rather than a polished stock summary.
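To make the three buckets concrete, here is the drawdown check pulled out into a tiny standalone function. The name check_controlled_downside() is hypothetical, and the input values are made up. The threshold and messages mirror the controlled-downside branch of build_evidence_blocks().

```python
def check_controlled_downside(dd, threshold=-0.15):
    """Hypothetical single-claim version of the drawdown check."""
    evidence_for, evidence_against, missing = [], [], []
    if dd is None:
        # No signal at all: record the gap instead of guessing.
        missing.append("No drawdown signal available to test downside control.")
    elif dd > threshold:
        evidence_for.append(f"Maximum drawdown was relatively contained at {dd:.2%}.")
    else:
        evidence_against.append(
            f"Maximum drawdown reached {dd:.2%}, which weakens the controlled-downside claim."
        )
    return evidence_for, evidence_against, missing

# Three made-up cases: shallow drawdown, deep drawdown, no data.
for dd in (-0.08, -0.22, None):
    print(check_controlled_downside(dd))
```

Each input lands in exactly one bucket: the shallow drawdown supports the claim, the deep one weakens it, and the missing value is reported as a gap.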

Sanity Check (Jupyter Notebook)

Run this code inside test.ipynb for a quick sanity check:

import json
import uuid

from core import (
    fetch_prices,
    fetch_fundamentals,
    compute_price_signals,
    compute_fundamental_signals,
    classify_thesis,
    build_evidence_blocks,
    make_state,
)

trace_id = uuid.uuid4().hex[:10]
state = make_state()

thesis = "Apple looks attractive because downside has been controlled and business quality remains high."

prices = await fetch_prices("AAPL.US", "2026-01-01", "2026-04-01", trace_id, state)
funds = await fetch_fundamentals("AAPL.US", trace_id, state)

signals = compute_price_signals(prices)
# build_evidence_blocks() reads flattened keys such as "roe" and "beta",
# so pass the computed fundamental signals rather than the raw payload.
fund_signals = compute_fundamental_signals(funds)
tags = classify_thesis(thesis)
evidence = build_evidence_blocks(thesis, tags, signals, fund_signals)

print(tags)
print(json.dumps(evidence, indent=2))

Expected Output:

Assigning a Verdict

Once the evidence is structured, the copilot still needs one more layer before it can write a memo. It needs a controlled way to label the thesis.

That's the job of decide_verdict(). It looks at how much evidence supports the thesis, how much weakens it, and whether the claim still depends on missing business-quality or valuation evidence. The goal here isn't to create a perfect scoring model. It's to make sure the system doesn't jump from a few evidence strings straight into a confident conclusion.

Add this to core.py:

def decide_verdict(evidence, claim_types=None):
    claim_types = claim_types or []

    evidence_for = evidence.get("evidence_for", [])
    evidence_against = evidence.get("evidence_against", [])
    missing = evidence.get("missing_evidence", [])

    n_for = len(evidence_for)
    n_against = len(evidence_against)
    n_missing = len(missing)

    quality_claim = any(x in claim_types for x in ["business_quality", "weak_business_quality", "premium_justified", "premium_not_justified"])
    valuation_claim = any(x in claim_types for x in ["valuation_attractive", "valuation_expensive", "premium_justified", "premium_not_justified"])

    if n_for == 0 and n_against == 0:
        return {
            "verdict": "unresolved_due_to_missing_evidence",
            "reason": "There is not enough usable evidence to test the thesis.",
        }

    if quality_claim and n_missing >= 1:
        if n_against > 0:
            return {
                "verdict": "weakly_supported",
                "reason": "Some evidence supports the thesis, but direct business-quality evidence is missing and contradictory signals remain.",
            }
        return {
            "verdict": "partially_supported",
            "reason": "Part of the thesis is supported, but direct business-quality evidence is missing.",
        }

    if valuation_claim and n_missing >= 1:
        return {
            "verdict": "unresolved_due_to_missing_evidence",
            "reason": "The thesis depends on valuation evidence that is not available in this version.",
        }

    if n_for > 0 and n_against == 0:
        if n_missing >= 2:
            return {
                "verdict": "partially_supported",
                "reason": "The available evidence supports the thesis, but important evidence is still missing.",
            }
        return {
            "verdict": "supported",
            "reason": "The available evidence mainly supports the thesis.",
        }

    if n_against > 0 and n_for == 0:
        return {
            "verdict": "not_supported",
            "reason": "The available evidence mainly weakens the thesis.",
        }

    if n_for > n_against:
        return {
            "verdict": "partially_supported",
            "reason": "There is more supporting evidence than contradicting evidence, but the thesis is not fully confirmed.",
        }

    if n_against >= n_for:
        return {
            "verdict": "weakly_supported",
            "reason": "Contradicting evidence is meaningful enough that the thesis is only weakly supported.",
        }

    return {
        "verdict": "unresolved_due_to_missing_evidence",
        "reason": "The evidence is mixed and does not clearly resolve the thesis.",
    }

The logic here is intentionally simple. It doesn't try to do fine-grained scoring. Instead, it uses the shape of the evidence to decide whether the thesis is supported, partially supported, weakly supported, not supported, or still unresolved.

A couple of checks matter more than the rest. If the thesis makes a business-quality or valuation claim and missing_evidence is not empty, the verdict gets capped early instead of sounding stronger than it should. The check is deliberately coarse: it looks at whether anything is missing, not at which item is missing. That's important because a thesis can look convincing on price behavior alone, but still be incomplete if the claim depends on fundamentals that aren't actually present.

The other useful thing about this function is that it returns both a short label and a reason. That makes the final output easier to understand later, and it also gives the memo-writing step something cleaner to work from than a bare category.
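Because the verdict depends only on three counts and two flags, the decision order can be restated in a few lines. The function below is a condensed, hypothetical restatement (verdict_label() is not part of core.py) that returns only the label, without the reason string.

```python
def verdict_label(n_for, n_against, n_missing, quality_claim=False, valuation_claim=False):
    """Condensed restatement of the count-based checks, in the same order."""
    if n_for == 0 and n_against == 0:
        return "unresolved_due_to_missing_evidence"
    # Early caps: a quality or valuation claim with any gap cannot reach "supported".
    if quality_claim and n_missing >= 1:
        return "weakly_supported" if n_against > 0 else "partially_supported"
    if valuation_claim and n_missing >= 1:
        return "unresolved_due_to_missing_evidence"
    if n_against == 0:
        return "partially_supported" if n_missing >= 2 else "supported"
    if n_for == 0:
        return "not_supported"
    return "partially_supported" if n_for > n_against else "weakly_supported"

print(verdict_label(1, 0, 0))                      # supported
print(verdict_label(1, 0, 3, quality_claim=True))  # partially_supported
print(verdict_label(0, 2, 0))                      # not_supported
print(verdict_label(1, 2, 0))                      # weakly_supported
```

The second call shows the cap at work: one supporting point and no contradictions would normally read as supported, but a quality claim with open gaps is held at partially_supported.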

Building the Facts Object

Before the memo gets written, the system first puts everything into one structured object. That object becomes the single source of truth for the final output. Instead of handing the model a mix of scattered variables, we'll give it one clean package containing the thesis, signals, company context, evidence, and verdict.

1. Company Context

We’ll start with a small helper that pulls the basic company context from the fundamentals payload.

Add this to core.py:

def extract_company_context(fundamentals):
    if not isinstance(fundamentals, dict):
        return {}

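    gen = fundamentals.get("General", {}) or {}
    # Market cap, P/E, and dividend yield sit under "Highlights", and beta sits
    # under "Technicals" (the same sections compute_fundamental_signals() reads).
    highlights = fundamentals.get("Highlights", {}) or {}
    technicals = fundamentals.get("Technicals", {}) or {}

    out = {
        "name": gen.get("Name"),
        "code": gen.get("Code"),
        "exchange": gen.get("Exchange"),
        "sector": gen.get("Sector"),
        "industry": gen.get("Industry"),
        "country": gen.get("CountryName"),
        "market_cap": highlights.get("MarketCapitalization"),
        "pe_ratio": highlights.get("PERatio"),
        "beta": technicals.get("Beta"),
        "dividend_yield": highlights.get("DividendYield"),
        "description": gen.get("Description"),
    }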

    clean = {}
    for k, v in out.items():
        if v not in (None, "", "NA"):
            clean[k] = v

    return clean

This function is just a cleanup step. It gives us a compact company context block that can later sit alongside the price and fundamentals signals without dragging the full fundamentals payload into the memo layer.

2. Single-Stock Facts Builder

Now add the single-stock facts builder:

def build_thesis_facts(parsed, ticker, signals, fundamentals, thesis_tags, evidence):
    company = extract_company_context(fundamentals)

    facts = {
        "type": "single_name_thesis_test",
        "ticker": ticker,
        "lookback_days": parsed["lookback_days"],
        "thesis": parsed["thesis"],
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": thesis_tags.get("claim_types", []),
        "market_signals": {
            "ret_total": signals.get("ret_total"),
            "vol_annualized": signals.get("vol_annualized"),
            "max_drawdown": signals.get("max_drawdown"),
            "trend_slope": signals.get("trend_slope"),
            "ret_to_vol": signals.get("ret_to_vol"),
            "start_price": signals.get("start_price"),
            "end_price": signals.get("end_price"),
            "n_points": signals.get("n_points"),
        },
        "company_context": {
            "name": company.get("name"),
            "exchange": company.get("exchange"),
            "sector": company.get("sector"),
            "industry": company.get("industry"),
            "country": company.get("country"),
            "market_cap": company.get("market_cap"),
            "pe_ratio": company.get("pe_ratio"),
            "beta": company.get("beta"),
            "dividend_yield": company.get("dividend_yield"),
        },
        "description": company.get("description"),
        "evidence_for": evidence.get("evidence_for", []),
        "evidence_against": evidence.get("evidence_against", []),
        "missing_evidence": evidence.get("missing_evidence", []),
    }

    facts["verdict"] = decide_verdict(evidence, thesis_tags.get("claim_types", []))
    return facts

This is the main facts object for a single-stock thesis. It pulls together the parsed thesis, the market signals, the basic company context, the evidence buckets, and the verdict. At this point, the copilot has already done the reasoning work. The memo isn't deciding anything new. It's just writing from this object.

3. Watchlist Facts Builder

Now add the watchlist version:

def build_watchlist_facts(parsed, tickers, signals_by_ticker, fundamentals_by_ticker, thesis_tags, evidence_by_ticker):
    per_ticker = {}

    for t in tickers:
        company = extract_company_context(fundamentals_by_ticker.get(t, {}))
        signals = signals_by_ticker.get(t, {})
        evidence = evidence_by_ticker.get(t, {})

        per_ticker[t] = {
            "company_context": {
                "name": company.get("name"),
                "sector": company.get("sector"),
                "industry": company.get("industry"),
                "market_cap": company.get("market_cap"),
                "pe_ratio": company.get("pe_ratio"),
                "beta": company.get("beta"),
            },
            "market_signals": {
                "ret_total": signals.get("ret_total"),
                "vol_annualized": signals.get("vol_annualized"),
                "max_drawdown": signals.get("max_drawdown"),
                "trend_slope": signals.get("trend_slope"),
                "ret_to_vol": signals.get("ret_to_vol"),
            },
            "evidence_for": evidence.get("evidence_for", []),
            "evidence_against": evidence.get("evidence_against", []),
            "missing_evidence": evidence.get("missing_evidence", []),
            "verdict": decide_verdict(evidence, thesis_tags.get("claim_types", []))
        }

    facts = {
        "type": "watchlist_thesis_test",
        "tickers": tickers,
        "lookback_days": parsed["lookback_days"],
        "thesis": parsed["thesis"],
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": thesis_tags.get("claim_types", []),
        "per_ticker": per_ticker,
    }

    return facts

This version does the same thing, but across multiple tickers. Instead of one top-level evidence block, it stores a per-ticker structure so the memo layer can later compare names without needing to reconstruct anything.

That is the main reason this section matters. By the time we reach the memo step, we no longer want to pass loose values around. We want one structured object that already contains:

the thesis
the relevant signals
the company context
the evidence buckets
the verdict

That keeps the final writing step much cleaner and makes the whole workflow easier to debug.
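As an illustration of why a single structured object is easier to debug, the sketch below reads a watchlist facts object and reduces it to one row per ticker. The facts dict is hypothetical (made-up tickers, evidence, and verdicts), and summarize_watchlist() is an illustrative helper rather than part of core.py.

```python
# Hypothetical watchlist facts object. Tickers, evidence, and verdicts are made up.
facts = {
    "type": "watchlist_thesis_test",
    "tickers": ["AAA.US", "BBB.US"],
    "per_ticker": {
        "AAA.US": {
            "evidence_for": ["point 1", "point 2"],
            "evidence_against": [],
            "verdict": {"verdict": "supported", "reason": "..."},
        },
        "BBB.US": {
            "evidence_for": ["point 1"],
            "evidence_against": ["point 2", "point 3"],
            "verdict": {"verdict": "weakly_supported", "reason": "..."},
        },
    },
}

def summarize_watchlist(facts):
    """Build one compact row per ticker, reading only from the facts object."""
    rows = []
    for t in facts["tickers"]:
        block = facts["per_ticker"].get(t, {})
        rows.append({
            "ticker": t,
            "verdict": (block.get("verdict") or {}).get("verdict"),
            "n_for": len(block.get("evidence_for", [])),
            "n_against": len(block.get("evidence_against", [])),
        })
    return rows

for row in summarize_watchlist(facts):
    print(row)
```

Nothing here needs the raw prices or the raw fundamentals payload. Everything a comparison needs is already in the per-ticker blocks.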

Sanity Check (Jupyter Notebook)

Run this code inside test.ipynb for a quick sanity check:

from core import build_thesis_facts, extract_company_context

facts = build_thesis_facts(
    parsed={
        "tickers": ["AAPL"],
        "lookback_days": 180,
        "thesis": "Apple looks attractive because downside has been controlled and business quality remains high.",
        "mode": "single"
    },
    ticker="AAPL.US",
    signals=signals,
    fundamentals=funds,
    thesis_tags=tags,
    evidence=evidence
)

print(json.dumps(facts, indent=2))

Expected Output:

{
  "type": "single_name_thesis_test",
  "ticker": "AAPL.US",
  "lookback_days": 180,
  "thesis": "Apple looks attractive because downside has been controlled and business quality remains high.",
  "thesis_summary": "Apple is attractive due to controlled downside and strong business quality",
  "claim_types": [
    "controlled_downside",
    "business_quality"
  ],
  "market_signals": {
    "ret_total": -0.05675067340688533,
    "vol_annualized": 0.2504818805125429,
    "max_drawdown": -0.11322450740687473,
    "trend_slope": -0.0005437843809243782,
    "ret_to_vol": -0.22656598270006817,
    "start_price": 271.01,
    "end_price": 255.63,
    "n_points": 62
  },
  "company_context": {
    "name": "Apple Inc",
    "exchange": "NASDAQ",
    "sector": "Technology",
    "industry": "Consumer Electronics",
    "country": "USA",
    "market_cap": null,
    "pe_ratio": null,
    "beta": null,
    "dividend_yield": null
  },
  "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple Vision Pro, Apple TV, Apple Watch, Beats products, and HomePod, as well as Apple branded and third-party accessories. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV, which offers exclusive original content and live sports; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It distributes third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers and resellers. The company was formerly known as Apple Computer, Inc. and changed its name to Apple Inc. in January 2007. Apple Inc. was founded in 1976 and is headquartered in Cupertino, California.",
  "evidence_for": [
    "Maximum drawdown was relatively contained at -11.32%."
  ],
  "evidence_against": [],
  "missing_evidence": [
    "This version does not include direct business-quality metrics such as margins, growth, cash flow, or return on capital.",
    "Only basic company context is available, which is not enough on its own to confirm business quality.",
    "No beta value available."
  ],
  "verdict": {
    "verdict": "partially_supported",
    "reason": "Part of the thesis is supported, but direct business-quality evidence is missing."
  }
}

Writing the Final Memo

At this point, the hard part is already done.

By the time we reach the memo step, the copilot already has a structured facts object with the thesis, claim types, market signals, company context, evidence buckets, and verdict. So this final function isn't where the reasoning happens. It's just the presentation layer that turns that structured judgment into something readable.

Add this to core.py:

def write_thesis_memo(facts):
    prompt = f"""
You are writing a short financial research memo.

Write using only the facts provided below.
Do not invent numbers, events, comparisons, or opinions beyond the supplied evidence.
If evidence is missing, say so clearly.

Use this exact structure:

1. Thesis under review
2. Supporting evidence
3. Evidence that weakens the thesis
4. Missing evidence
5. Verdict
6. Bottom-line assessment

Style rules:
- Keep it concise
- Keep it analytical and professional
- No bullet points unless necessary
- No hype
- No generic investment disclaimer language
- The bottom-line assessment should be balanced and evidence-based
- The verdict section must explicitly use the supplied verdict

Facts:
{json.dumps(facts, indent=2, default=str)}
""".strip()

    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )

    return r.output_text.strip()

This function keeps the model boxed into one narrow task. It's not being asked to look at raw price history, raw fundamentals, or scattered variables. It's being asked to write from one clean facts object that already contains the judgment.

That separation matters because it keeps the final memo grounded. The model isn't deciding what it thinks about the stock at the last second. It's simply turning the structured output of the earlier steps into a short research note.

The prompt is also deliberately strict. It fixes the memo structure, tells the model not to invent anything, and makes the verdict explicit instead of leaving it implied. That helps the final output stay consistent even when the underlying thesis changes.
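Because the prompt fixes the structure, the memo can also be checked mechanically after it comes back. The sketch below is one way to do that, not part of the original core.py: missing_memo_sections and mentions_verdict are hypothetical helper names, and the section titles are copied from the prompt above.

```python
# Section titles copied from the prompt in write_thesis_memo.
REQUIRED_SECTIONS = [
    "Thesis under review",
    "Supporting evidence",
    "Evidence that weakens the thesis",
    "Missing evidence",
    "Verdict",
    "Bottom-line assessment",
]


def missing_memo_sections(memo):
    """Return the required section titles that do not appear, in order, in the memo.

    Hypothetical helper: a plain case-insensitive substring search, nothing smarter.
    """
    missing = []
    lowered = memo.lower()
    pos = 0
    for title in REQUIRED_SECTIONS:
        idx = lowered.find(title.lower(), pos)
        if idx == -1:
            missing.append(title)
        else:
            pos = idx + len(title)
    return missing


def mentions_verdict(memo, facts):
    """Check that the supplied verdict label shows up in the memo text.

    Underscores are treated as spaces, so "partially_supported" matches
    "partially supported".
    """
    verdict = facts["verdict"]["verdict"].replace("_", " ").lower()
    return verdict in memo.lower().replace("_", " ")
```

A check like this doesn't judge the quality of the memo. It only confirms that the model respected the structure and used the supplied verdict, which is enough to catch the most obvious prompt drift.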

Sanity Check (Jupyter Notebook)

You can test it with a facts object from the previous section:

from core import write_thesis_memo

memo = write_thesis_memo(facts)
print(memo)

Expected Output:

Stitching Everything Together

At this point, all the individual pieces are ready. We have the parser, the data fetchers, the signal builders, the thesis classifier, the evidence engine, the verdict layer, and the memo writer. The only thing left is to connect them into one end-to-end function.

Add this to core.py:

async def run_thesis_copilot(user_text):
    trace_id = uuid.uuid4().hex[:10]
    log_event("request_started", trace_id, text=user_text)

    parsed = enforce_limits(parse_request(user_text))
    tickers = parsed["tickers"]

    if not tickers:
        return {
            "memo": "No valid ticker was found in the request.",
            "facts": {},
            "data_used": {},
            "tool_trace_id": trace_id,
        }

    log_event(
        "parsed",
        trace_id,
        tickers=tickers,
        lookback_days=parsed["lookback_days"],
        mode=parsed["mode"],
        thesis=parsed["thesis"],
    )

    start_date, end_date = get_dates_from_lookback(parsed["lookback_days"])
    state = make_state()

    try:
        thesis_tags = classify_thesis(parsed["thesis"])

        if parsed["mode"] == "single":
            ticker = tickers[0]
            ticker_full = ticker if "." in ticker else f"{ticker}.US"

            log_event(
                "tool_phase",
                trace_id,
                mode="single",
                ticker=ticker_full,
                start_date=start_date,
                end_date=end_date,
            )

            prices = await fetch_prices(ticker_full, start_date, end_date, trace_id, state)
            funds = await fetch_fundamentals(ticker_full, trace_id, state)

            price_signals = compute_price_signals(prices)
            fundamental_signals = compute_fundamental_signals(funds)

            evidence = build_evidence_blocks(
                parsed["thesis"],
                thesis_tags,
                price_signals,
                fundamental_signals
            )

            facts = build_thesis_facts(
                parsed,
                ticker_full,
                price_signals,
                funds,
                thesis_tags,
                evidence
            )

            facts["fundamental_signals"] = fundamental_signals

            memo = write_thesis_memo(facts)

            out = {
                "memo": memo,
                "facts": facts,
                "data_used": {
                    "tickers": [ticker_full],
                    "date_range": [start_date, end_date],
                    "tools_called": [x.get("tool") for x in state["tool_trace"]],
                    "tool_calls": state["tool_calls"],
                },
                "tool_trace_id": trace_id,
            }

            log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
            return out

        ticker_full = [x if "." in x else f"{x}.US" for x in tickers]

        log_event(
            "tool_phase",
            trace_id,
            mode="watchlist",
            tickers=ticker_full,
            start_date=start_date,
            end_date=end_date,
        )

        signals_by_ticker = {}
        funds_by_ticker = {}
        evidence_by_ticker = {}

        for t in ticker_full:
            prices = await fetch_prices(t, start_date, end_date, trace_id, state)
            funds = await fetch_fundamentals(t, trace_id, state)

            price_signals = compute_price_signals(prices)
            fundamental_signals = compute_fundamental_signals(funds)

            evidence = build_evidence_blocks(
                parsed["thesis"],
                thesis_tags,
                price_signals,
                fundamental_signals
            )

            signals_by_ticker[t] = {
                **price_signals,
                "fundamental_signals": fundamental_signals
            }
            funds_by_ticker[t] = funds
            evidence_by_ticker[t] = evidence

        facts = build_watchlist_facts(
            parsed,
            ticker_full,
            signals_by_ticker,
            funds_by_ticker,
            thesis_tags,
            evidence_by_ticker,
        )

        memo = write_thesis_memo(facts)

        out = {
            "memo": memo,
            "facts": facts,
            "data_used": {
                "tickers": ticker_full,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

        log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
        return out

    except Exception as e:
        detail = repr(e)
        if hasattr(e, "exceptions"):
            detail = detail + " | " + " ; ".join([repr(x) for x in e.exceptions])

        log_event("request_failed", trace_id, err=detail)

        return {
            "memo": f"failed: {e}",
            "facts": {},
            "data_used": {
                "tickers": tickers,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

This function is just the full workflow in one place. It parses the request, fetches the data, computes the two signal layers, builds the evidence, assembles the facts object, writes the memo, and returns everything in a clean output.
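One small detail that appears in both branches is the ticker normalization: a bare symbol gets a ".US" suffix, while anything that already carries an exchange suffix is left alone. If you want that rule in one place, a minimal sketch could look like this. The helper name with_exchange_suffix and the default_exchange parameter are assumptions for illustration and are not part of the original code.

```python
def with_exchange_suffix(ticker, default_exchange="US"):
    """Append an exchange suffix only when the ticker doesn't already have one.

    Mirrors the inline rule in run_thesis_copilot:
    ticker if "." in ticker else f"{ticker}.US"
    """
    return ticker if "." in ticker else f"{ticker}.{default_exchange}"


# Single mode and watchlist mode can then share the same rule.
single = with_exchange_suffix("AAPL")
watchlist = [with_exchange_suffix(t) for t in ["NVDA", "AMD", "BMW.XETRA"]]
print(single, watchlist)
```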

The useful part is that it returns more than just the memo. It also returns the structured facts object, the tools that were used, the date range, and the trace ID. That keeps the final result inspectable instead of turning the copilot into a black box.

Demo Time! (Jupyter Notebook)

Demo 1: Testing Whether a Premium Is Actually Justified

This is a good first demo because it pushes the copilot beyond a basic single-stock check. The prompt isn't asking whether NVIDIA is a good company in general. It's asking whether NVIDIA’s premium over AMD can actually be defended using market behavior and business quality.

Here's the prompt:

from core import run_thesis_copilot

q = """
Between NVDA and AMD, I think NVDA's premium is still justified by stronger market behavior and business quality.
Check that over the last 6 months.
""".strip()

result = await run_thesis_copilot(q)

print(result["memo"])
print(result["data_used"])

And here's the output:

What makes this output useful is that it doesn't flatten the result into a simple yes or no. NVIDIA clearly looks stronger on business quality, but market behavior isn't as convincing, and the lack of direct valuation data stops the copilot from overclaiming.

This is the kind of behavior we want. The system isn't just comparing two companies. It's testing whether the specific claim about a premium actually holds up.
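A practical note on running the demos: the top-level await used here works in Jupyter because the notebook already runs an event loop. In a plain Python script, the coroutine would need to go through asyncio.run instead. The sketch below shows that pattern with a stand-in coroutine, run_thesis_copilot_stub, which is hypothetical and exists only so the example runs without API keys. In a real script you would call run_thesis_copilot from core.py in its place.

```python
import asyncio


async def run_thesis_copilot_stub(user_text):
    # Stand-in for core.run_thesis_copilot (hypothetical, no network calls).
    await asyncio.sleep(0)
    return {"memo": f"(memo for: {user_text})", "data_used": {}}


def main():
    q = "TSLA feels too volatile for the underlying business quality."
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(run_thesis_copilot_stub(q))


result = main()
print(result["memo"])
```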

Demo 2: Testing Whether Volatility Is Too High for the Underlying Business

The second demo shifts back to a single-stock thesis, but the claim is different. This time, the question isn't whether the company looks attractive. It's whether the stock is more volatile than the underlying business quality would justify.

Here's the prompt:

q = """
TSLA feels too volatile for the underlying business quality.
Test that thesis over the last year.
""".strip()

result = await run_thesis_copilot(q)

print(result["memo"])
print(result["data_used"])

And here's the output:

This result is useful because it shows a more conflicted thesis. Tesla’s recent returns and forward growth expectations offer some support, but the current profitability, recent operating trends, revisions, and volatility profile all push back against the idea that the business quality is strong enough to fully justify that risk.

So the final verdict lands where it should: not as a clean confirmation, but as a weakly supported thesis.

Final Thoughts

At this point, the copilot already does the most important part well. It can take a natural-language thesis, pull the right market and fundamentals data through EODHD’s MCP layer, turn those inputs into structured evidence, and return a research memo that's much more disciplined than a normal stock summary.

At the same time, this version still has clear limits. It doesn't yet go deeper into statement-level accounting logic, it doesn't use news or catalyst context, and its handling of relative valuation can still be stronger for more demanding comparison cases.

But even with those limits, the shift here is already meaningful. The real change wasn't just connecting a model to financial data. It was moving from summarizing stocks to testing claims.

How to Build a Positioning-Based Crude Oil Strategy in Python [Full Handbook]

Nikhil Adithyan — Fri, 10 Apr 2026 15:57:19 +0000

Commitment of Traders (COT) data gets referenced a lot in commodity trading, especially when people talk about crowded positioning, speculative sentiment, or reversal risk. But most of that discussion stays at the idea level. It rarely becomes a rule that can actually be tested.

That was the starting point for this project.

I wanted to see whether crude oil positioning data could be turned into something more useful than a vague market read. Not a polished macro narrative. An actual strategy framework that could be coded, tested, and challenged.

The goal here was not to begin with a finished strategy. It was to start with a reasonable hypothesis, build the signal step by step, and see what survived once the data was involved.

For this, I used FinancialModelingPrep’s Commitment of Traders data along with historical West Texas Intermediate (WTI) crude oil prices. The first idea was simple: if speculative positioning becomes extreme, maybe that tells us something about what crude oil might do next. But as the build progressed, that idea had to be narrowed, filtered, and reworked before it became usable.

So this article is not a clean showcase of a strategy that worked on the first try. It's the full process of getting there.

Prerequisites
The Initial Idea: Use Positioning Extremes to Define Market Regimes
Importing Packages
Pulling the Data: COT + WTI Crude Prices using FMP APIs
Turning Raw COT Data Into Usable Features
Building the First Version of the Regime Model
First Test: What Happens After Each Regime?
Looking at the Regimes More Closely
Narrowing the Focus: Keeping Two Extra Variants for Comparison
Building the First Trade Rules
Comparing Bullish Unwind Against Buy-and-Hold
Adding a Trend Filter
Stress-Testing the Setup
The Final Strategy
Further Improvements
Conclusion

Prerequisites

To follow along with this article, you'll need a basic familiarity with Python and the pandas library, as we'll do most of the data manipulation and analysis using DataFrames. The following packages should be installed in your environment: requests, numpy, pandas, and matplotlib.

You'll also need a FinancialModelingPrep API key to pull both the COT and WTI crude oil price data. If you don't have one, you can register for a free account on the FinancialModelingPrep website.

Finally, a general understanding of what the Commitment of Traders report is and what non-commercial positioning represents will help you follow the reasoning behind the signal construction, though it's not strictly necessary to get value from the code itself.

This article also assumes some baseline familiarity with financial markets and trading concepts. If terms like long and short positioning, open interest, or speculative sentiment are unfamiliar, it may be worth spending a little time with those before diving in.

The Initial Idea: Use Positioning Extremes to Define Market Regimes

The first version of the idea was not a trading rule. It was a framework.

If speculative positioning in crude oil becomes extreme, that probably means different things depending on what happens next. A market that is heavily long and still getting more crowded is not the same as a market that is heavily long but starting to unwind. The same logic applies on the bearish side too.

So instead of forcing one blunt signal like “extreme long means short” or “extreme short means buy,” I started by splitting the market into regimes.

The two variables I used were simple. First, how extreme positioning is relative to recent history. Second, whether that positioning is still building or starting to reverse.

That gave me four possible states:

bullish buildup
bullish unwind
bearish buildup
bearish unwind

This felt like a better starting point than jumping straight into a strategy. It let me treat COT data as a way to describe market state first, then test whether any of those states actually led to useful price behavior.

At this stage, I still didn't know whether any of these regimes would hold up. The point was just to create a structure that could be tested properly.

Importing Packages

We’ll keep the package imports minimal.

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (14,6)
plt.style.use("ggplot")

api_key = "YOUR FMP API KEY"
base_url = "https://financialmodelingprep.com/stable"

Nothing fancy here. Make sure to replace YOUR FMP API KEY with your actual FMP API key. If you don’t have one, you can obtain it by opening an FMP developer account.
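If you'd rather not paste the key into the notebook, one option is to read it from an environment variable and build the request URLs through a small helper. This is only a sketch: the variable name FMP_API_KEY and the helper build_fmp_url are assumptions for illustration, not something FMP requires. The endpoint names are the ones used later in this article.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://financialmodelingprep.com/stable"


def build_fmp_url(endpoint, api_key, **params):
    """Build a request URL with properly encoded query parameters (hypothetical helper)."""
    query = urlencode({**params, "apikey": api_key})
    return f"{BASE_URL}/{endpoint}?{query}"


# Falls back to the placeholder if the (assumed) environment variable is not set.
api_key = os.environ.get("FMP_API_KEY", "YOUR FMP API KEY")

# "from" is a Python keyword, so date parameters are passed via dict unpacking.
url = build_fmp_url(
    "commitment-of-traders-report",
    api_key,
    symbol="CL",
    **{"from": "2010-01-01", "to": "2026-03-20"},
)
```

The rest of the article keeps the simpler f-string URLs, so this is purely optional.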

Pulling the Data: COT + WTI Crude Prices using FMP APIs

To build this strategy, I needed two datasets. First, I needed COT data for crude oil. Second, I needed historical WTI crude oil prices.

I started with the COT market list to identify the correct crude oil contract.

url = f"{base_url}/commitment-of-traders-list?apikey={api_key}"
r = requests.get(url)
cot_list = pd.DataFrame(r.json())

crude_candidates = cot_list[
    cot_list.astype(str)
    .apply(lambda col: col.str.contains("crude", case=False, na=False))
    .any(axis=1)
]

crude_candidates

This gives a filtered list of crude-related contracts from the COT universe. In this case, the key contract I used was CL.

cot_symbol = "CL"
start_date = "2010-01-01"
end_date = "2026-03-20"

url = f"{base_url}/commitment-of-traders-report?symbol={cot_symbol}&from={start_date}&to={end_date}&apikey={api_key}"
r = requests.get(url)

cot_df = pd.DataFrame(r.json())
cot_df["date"] = pd.to_datetime(cot_df["date"])
cot_df = cot_df.sort_values("date").drop_duplicates(subset="date").reset_index(drop=True)
cot_df = cot_df.rename(columns={"date": "cot_date"})

cot_df.head()

This returns the weekly COT records for crude oil:

The main fields I needed later were:

date (renamed to cot_date in the code above)
openInterestAll
noncommPositionsLongAll
noncommPositionsShortAll

Next, I pulled the WTI crude oil price data using FMP’s commodity price endpoint.

price_symbol = "CLUSD"
start_date = "2010-01-01"
end_date = "2026-03-20"

url = f"{base_url}/historical-price-eod/full?symbol={price_symbol}&from={start_date}&to={end_date}&apikey={api_key}"
r = requests.get(url)

price_df = pd.DataFrame(r.json())
price_df["date"] = pd.to_datetime(price_df["date"])
price_df = price_df.sort_values("date").drop_duplicates(subset="date").reset_index(drop=True)

price_df

Since the COT dataset is weekly, I converted the price series into weekly bars using the Friday close.

price_df["date"] = pd.to_datetime(price_df["date"])
price_df = price_df.sort_values("date").drop_duplicates(subset="date").reset_index(drop=True)

weekly_price = price_df.set_index("date").resample("W-FRI").agg({
    "symbol": "last",
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
    "vwap": "mean"
}).dropna().reset_index()

weekly_price["weekly_return"] = weekly_price["close"].pct_change()
weekly_price = weekly_price.rename(columns={"date": "price_date"})

weekly_price

This step matters because the two datasets need to live on the same time scale. If I kept prices daily while COT stayed weekly, the signal alignment would become messy very quickly.
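To see what the W-FRI resample does, here is a small self-contained sketch on made-up daily bars covering two full weeks. The numbers are synthetic and only there to make the aggregation easy to follow.

```python
import pandas as pd

# Ten weekdays: Mon 2024-01-01 through Fri 2024-01-12 (synthetic values).
dates = pd.bdate_range("2024-01-01", "2024-01-12")
daily = pd.DataFrame({
    "date": dates,
    "open": range(1, 11),
    "high": range(2, 12),
    "low": range(0, 10),
    "close": range(1, 11),
})

# Each weekly bar is labeled with its Friday and built from that week's daily bars.
weekly = daily.set_index("date").resample("W-FRI").agg({
    "open": "first",   # first open of the week
    "high": "max",     # highest high of the week
    "low": "min",      # lowest low of the week
    "close": "last",   # Friday close (or the last available close)
}).reset_index()

print(weekly)
```

The result has two rows, dated 2024-01-05 and 2024-01-12, with each weekly close equal to that Friday's daily close.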

Finally, I aligned each COT observation with the first weekly WTI price bar dated on or after the report date.

merged_df = pd.merge_asof(
    cot_df.sort_values("cot_date"),
    weekly_price.sort_values("price_date"),
    left_on="cot_date",
    right_on="price_date",
    direction="forward"
)

merged_df[["cot_date", "price_date", "close", "weekly_return", "openInterestAll", "noncommPositionsLongAll", "noncommPositionsShortAll"]]

The output is one clean working table with:

the COT report date
the matched WTI weekly price date
weekly crude price data
the main positioning fields needed for feature engineering

That is the full base dataset for the strategy. With this in place, the next step is to turn the raw positioning data into something more useful.
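The forward merge is easier to trust once you see it on a tiny example. The sketch below uses two made-up COT dates (both Tuesdays) and two made-up weekly bars (both Fridays), then checks the gap between each report date and its matched price date. All values are synthetic.

```python
import pandas as pd

# Synthetic COT report dates (Tuesdays) and weekly price bars (Fridays).
cot = pd.DataFrame({
    "cot_date": pd.to_datetime(["2024-01-02", "2024-01-09"]),
    "net_position": [200, 220],
})
weekly = pd.DataFrame({
    "price_date": pd.to_datetime(["2024-01-05", "2024-01-12"]),
    "close": [73.0, 72.0],
})

# direction="forward" picks the first price_date that is >= cot_date.
merged = pd.merge_asof(
    cot.sort_values("cot_date"),
    weekly.sort_values("price_date"),
    left_on="cot_date",
    right_on="price_date",
    direction="forward",
)

# Days between the report date and the weekly bar it was matched to.
merged["lag_days"] = (merged["price_date"] - merged["cot_date"]).dt.days
print(merged)
```

Running the same lag_days check on the real merged_df is a quick way to confirm that no COT row was matched to a bar far in the future, which would usually point to a gap in the price data.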

Turning Raw COT Data Into Usable Features

At this point, the raw data was ready, but it still wasn't useful as a signal. The COT report gives positioning numbers, but those numbers by themselves don't say much unless they're turned into something comparable over time.

So the next step was to build a few features that could describe positioning in a more meaningful way.

I started with the net non-commercial position. This is just the difference between non-commercial longs and non-commercial shorts.

merged_df["net_position"] = merged_df["noncommPositionsLongAll"] - merged_df["noncommPositionsShortAll"]

This gives the raw speculative bias. A positive value means non-commercial traders are net long. A negative value means they're net short.

But raw net positioning has a problem. The size of the market changes over time, so a value that looked extreme in one period may not mean the same thing in another. To fix that, I normalized it by open interest.

merged_df["net_position_ratio"] = merged_df["net_position"] / merged_df["openInterestAll"]

This made the signal much more useful. Instead of looking at absolute positioning, I was now looking at positioning as a share of the total market.

Next, I needed to know whether that positioning was still building or starting to unwind. For that, I calculated the week-over-week change in the ratio.

merged_df["net_position_ratio_change"] = merged_df["net_position_ratio"].diff()

This was important because the direction of change adds context. An extreme long position that's still increasing isn't the same as an extreme long position that has started to fall.
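Here is the same arithmetic on three made-up weeks, so the three features can be checked by hand. The column names match the FMP fields used above, while the numbers are purely illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "noncommPositionsLongAll": [300, 320, 310],
    "noncommPositionsShortAll": [100, 100, 120],
    "openInterestAll": [1000, 1000, 1000],
})

# Net speculative position: longs minus shorts -> 200, 220, 190
toy["net_position"] = toy["noncommPositionsLongAll"] - toy["noncommPositionsShortAll"]

# Share of open interest -> 0.20, 0.22, 0.19
toy["net_position_ratio"] = toy["net_position"] / toy["openInterestAll"]

# Week-over-week change -> NaN, +0.02 (still building), -0.03 (starting to unwind)
toy["net_position_ratio_change"] = toy["net_position_ratio"].diff()

print(toy)
```

In week three the market is still clearly net long (0.19 of open interest), but the change has turned negative. That combination is exactly what the regime model later separates from a long position that is still growing.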

The last feature was the most important one: a rolling percentile of the positioning ratio. I used a 104-week window.

def rolling_percentile(x):
    return pd.Series(x).rank(pct=True).iloc[-1]

merged_df["position_percentile_104"] = merged_df["net_position_ratio"].rolling(104).apply(rolling_percentile)

This tells us how extreme the current positioning is relative to the last two years. A value above 0.80 means the market is in the top 20% of bullish positioning relative to that recent history. A value below 0.20 means the market is in the bottom 20%.
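A short window makes the percentile logic easier to verify. The sketch below uses a 4-period window on made-up values instead of 104 weeks. It also includes a NumPy variant, rolling_percentile_np, which is my own addition rather than the article's method. It gives the same result when there are no tied values in the window and tends to be faster on long series, because rolling.apply can pass raw arrays.

```python
import pandas as pd


def rolling_percentile(x):
    # Same function as above: percentile rank of the latest value within the window.
    return pd.Series(x).rank(pct=True).iloc[-1]


def rolling_percentile_np(x):
    # Assumed alternative: share of window values <= the latest value.
    # Matches rank(pct=True) only when the window has no ties.
    return (x <= x[-1]).mean()


s = pd.Series([1.0, 2.0, 3.0, 4.0, 1.5])

pct_rank = s.rolling(4).apply(rolling_percentile)
pct_np = s.rolling(4).apply(rolling_percentile_np, raw=True)

print(pd.DataFrame({"value": s, "pct_rank": pct_rank, "pct_np": pct_np}))
```

The first three rows are NaN because the window isn't full yet. At the fourth row the latest value (4.0) is the highest in its window, so the percentile is 1.0. At the fifth row the latest value (1.5) is the lowest of the four, so the percentile is 0.25. Note that the minimum possible value is 1 divided by the window length, not 0.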

After adding all four features, I checked the output.

merged_df[["cot_date","price_date","net_position","net_position_ratio","net_position_ratio_change","position_percentile_104"]]

The first row of net_position_ratio_change was NaN, which is expected since the first row has no prior week to compare with. The first 103 rows of position_percentile_104 were also NaN because the rolling window needs 104 weeks of history before it can calculate the percentile.

That was fine. What mattered was that the dataset now had four usable pieces:

raw speculative positioning
normalized positioning
weekly change in positioning
a rolling measure of how extreme that positioning is

This was the point where the COT data stopped being just a table of trader positions and started becoming something that could be turned into a regime model.

Building the First Version of the Regime Model

Once the features were ready, the next step was to turn them into actual market states.

The main idea was simple: positioning extremes on their own aren't enough. A market can stay heavily long or heavily short for a long time. What matters more is what happens while positioning is extreme. Is it still building, or has it started to reverse?

That's why I used two dimensions:

the 104-week positioning percentile
the weekly change in the positioning ratio

With those two variables, I defined four regimes.

merged_df["regime"] = "neutral"

merged_df.loc[(merged_df["position_percentile_104"] > 0.8) & (merged_df["net_position_ratio_change"] > 0), "regime"] = "bullish_buildup"
merged_df.loc[(merged_df["position_percentile_104"] > 0.8) & (merged_df["net_position_ratio_change"] < 0), "regime"] = "bullish_unwind"
merged_df.loc[(merged_df["position_percentile_104"] < 0.2) & (merged_df["net_position_ratio_change"] < 0), "regime"] = "bearish_buildup"
merged_df.loc[(merged_df["position_percentile_104"] < 0.2) & (merged_df["net_position_ratio_change"] > 0), "regime"] = "bearish_unwind"

Here's what each one means:

bullish buildup: positioning is already very bullish, and it's still getting more bullish
bullish unwind: positioning is very bullish, but that bullishness has started to fade
bearish buildup: positioning is already very bearish, and it's still getting more bearish
bearish unwind: positioning is very bearish, but that bearishness has started to ease

Anything that didn't meet one of those extreme conditions stayed in the neutral bucket.
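The same four rules can also be written with np.select, which keeps the conditions and labels side by side. This is an equivalent sketch on made-up rows, not a change to the logic above. It also shows what happens to rows where the percentile is still NaN, or where the change is exactly zero: no condition matches, so they fall into neutral.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "position_percentile_104": [0.90, 0.90, 0.10, 0.10, 0.50, np.nan, 0.95],
    "net_position_ratio_change": [0.01, -0.01, -0.01, 0.01, 0.01, 0.01, 0.00],
})

pct = toy["position_percentile_104"]
chg = toy["net_position_ratio_change"]

conditions = [
    (pct > 0.8) & (chg > 0),   # very bullish and still building
    (pct > 0.8) & (chg < 0),   # very bullish but fading
    (pct < 0.2) & (chg < 0),   # very bearish and still building
    (pct < 0.2) & (chg > 0),   # very bearish but easing
]
labels = ["bullish_buildup", "bullish_unwind", "bearish_buildup", "bearish_unwind"]

# Rows matching none of the conditions (including NaN percentiles) stay neutral.
toy["regime"] = np.select(conditions, labels, default="neutral")
print(toy)
```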

After assigning the regimes, I checked how many observations fell into each one.

print(merged_df["regime"].value_counts())

This output matters because it tells us whether the framework is usable or too sparse. In this case, neutral was still the largest group, which is expected. Most weeks shouldn't be extreme. The four regime buckets were smaller, but still had enough observations to test properly.

I also looked at a sample of the classified rows.

merged_df[["cot_date","price_date","net_position_ratio","net_position_ratio_change","position_percentile_104","regime"]].tail(10)

At this point, the raw COT data had been turned into a regime model. The next question was whether any of these regimes actually led to useful price behavior.

First Test: What Happens After Each Regime?

At this point, I had a regime framework, but not a strategy. Before turning any of these states into trades, I wanted to know what crude oil actually did after each one.

So the next step was to measure forward returns after every regime over four holding windows:

1 week
2 weeks
4 weeks
8 weeks

I started by creating the forward return columns from the weekly close series.

merged_df["fwd_return_1w"] = merged_df["close"].shift(-1) / merged_df["close"] - 1
merged_df["fwd_return_2w"] = merged_df["close"].shift(-2) / merged_df["close"] - 1
merged_df["fwd_return_4w"] = merged_df["close"].shift(-4) / merged_df["close"] - 1
merged_df["fwd_return_8w"] = merged_df["close"].shift(-8) / merged_df["close"] - 1

merged_df[["cot_date","price_date","close","regime","fwd_return_1w","fwd_return_2w","fwd_return_4w","fwd_return_8w"]].tail(12)

Each of these columns answers a simple question. If crude oil is in a given regime this week, what happens over the next 1, 2, 4, or 8 weeks?

The last few rows had NaN values, which is normal. There is no future price data available beyond the end of the dataset, so the longest horizons drop off first.
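The four lines above follow one pattern, so they can also be generated in a loop. The sketch below does that on a short made-up price series that rises 10% per week, with only two horizons so the NaN tail is visible.

```python
import pandas as pd

# Synthetic weekly closes, each 10% above the previous one.
toy = pd.DataFrame({"close": [100.0, 110.0, 121.0, 133.1]})

for h in (1, 2):
    # Close h weeks ahead divided by the current close, minus 1.
    toy[f"fwd_return_{h}w"] = toy["close"].shift(-h) / toy["close"] - 1

print(toy)
```

The 1-week column is about 0.10 for the first three rows and NaN for the last. The 2-week column is about 0.21 for the first two rows and NaN for the last two, which is the "longest horizons drop off first" effect.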

Next, I grouped the data by regime and calculated a few summary statistics:

count
average forward return
median forward return
hit rate

regime_summary = merged_df.groupby("regime").agg(
    count=("regime", "size"),
    avg_1w=("fwd_return_1w", "mean"),
    median_1w=("fwd_return_1w", "median"),
    hit_rate_1w=("fwd_return_1w", lambda x: (x > 0).mean()),
    avg_2w=("fwd_return_2w", "mean"),
    median_2w=("fwd_return_2w", "median"),
    hit_rate_2w=("fwd_return_2w", lambda x: (x > 0).mean()),
    avg_4w=("fwd_return_4w", "mean"),
    median_4w=("fwd_return_4w", "median"),
    hit_rate_4w=("fwd_return_4w", lambda x: (x > 0).mean()),
    avg_8w=("fwd_return_8w", "mean"),
    median_8w=("fwd_return_8w", "median"),
    hit_rate_8w=("fwd_return_8w", lambda x: (x > 0).mean())
).reset_index()

regime_summary
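One detail in the aggregation above is worth knowing about. The expression (x > 0).mean() treats NaN forward returns as misses, because NaN > 0 evaluates to False. Only the last few rows of the dataset are affected, so the effect is small, but it can nudge the hit rate down slightly for whichever regime those rows belong to. A variant that ignores missing values could look like this. hit_rate is a hypothetical helper name, and the numbers are made up.

```python
import numpy as np
import pandas as pd


def hit_rate(x):
    """Share of positive returns among the non-missing observations."""
    x = x.dropna()
    return (x > 0).mean() if len(x) else np.nan


# Two real observations (one win, one loss) and two rows with no future data.
fwd = pd.Series([0.10, -0.05, np.nan, np.nan])

naive = (fwd > 0).mean()   # NaN rows count as misses
adjusted = hit_rate(fwd)   # NaN rows are ignored

print(naive, adjusted)
```

Here the naive version reports 0.25 while the adjusted one reports 0.5. Passing hit_rate in place of the lambda in the agg call would apply the same correction to the summary table.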

This table was the first real test of the framework, and it immediately ruled out some of the original ideas.

The results weren't great for the raw regime model. In fact, they were weaker than I expected.

A few things stood out:

neutral often outperformed the regime buckets
bullish_buildup looked consistently weak
bearish_buildup also looked weak
bearish_unwind looked stronger at first glance, but some of that came from a few large upside outliers
bullish_unwind was the only regime that looked somewhat stable across multiple horizons

That changed the direction of the project.

Up to this point, the plan was to build a full four-regime framework and maybe convert multiple states into trade rules. After looking at the forward returns, that no longer made sense. Most of the regimes were not adding much value.

So instead of carrying all four forward, I started focusing on the one regime that still looked promising: bullish unwind.

Before making that decision, I wanted to look at the distributions visually and see whether the averages were hiding anything important.

Looking at the Regimes More Closely

The summary table already told me that most of the raw regime framework was weak, but I still wanted to look at the behavior visually before dropping anything.

I started with a simple chart that places WTI crude oil next to the speculative net positioning ratio.

plt.plot(merged_df["price_date"], merged_df["close"], label="wti close")
plt.plot(merged_df["price_date"], merged_df["net_position_ratio"] * 100, label="net position ratio x 100")
plt.title("WTI crude oil price vs speculative net positioning")
plt.xlabel("date")
plt.ylabel("value")
plt.legend()
plt.show()

This chart isn't meant to compare the two series on the same scale. It's just a quick way to see whether large moves in crude oil tend to happen when speculative positioning is becoming stretched.

Next, I plotted the 104-week positioning percentile itself.

plt.plot(merged_df["price_date"], merged_df["position_percentile_104"])
plt.axhline(0.8, linestyle="--", color="b")
plt.axhline(0.2, linestyle="--", color="b")
plt.title("104-week positioning percentile")
plt.xlabel("date")
plt.ylabel("percentile")
plt.show()

This made the regime logic easier to understand. Any time the percentile moved above 0.80, the market entered the bullish extreme zone. Any time it dropped below 0.20, the market entered the bearish extreme zone.

Then I looked at how many observations actually fell into each regime.

regime_counts = merged_df["regime"].value_counts()

plt.bar(regime_counts.index, regime_counts.values)
plt.title("Regime counts")
plt.xlabel("regime")
plt.ylabel("count")
plt.xticks(rotation=30)
plt.show()

The regime counts looked reasonable. Neutral was still the largest bucket, and the four signal regimes had enough observations to test without being too sparse.

After that, I plotted the average 4-week forward return by regime.

avg_4w = regime_summary.set_index("regime")["avg_4w"].sort_values()

plt.bar(avg_4w.index, avg_4w.values)
plt.title("Average 4-week forward return by regime")
plt.xlabel("regime")
plt.ylabel("average return")
plt.xticks(rotation=30)
plt.show()

This was the first strong sign that the original framework was too broad. Both buildup regimes looked weak. bullish_unwind was slightly positive, but not by much. bearish_unwind looked strongest on average, which was interesting, but I still didn't trust that result without checking the distribution.

So I looked at the 4-week hit rate next.

hit_4w = regime_summary.set_index("regime")["hit_rate_4w"].sort_values()

plt.bar(hit_4w.index, hit_4w.values)
plt.title("4-week hit rate by regime")
plt.xlabel("regime")
plt.ylabel("hit rate")
plt.xticks(rotation=30)
plt.show()

The hit rates told a similar story. bullish_unwind was one of the better regimes, but still not strong enough to justify calling it a strategy. neutral was still doing too well, which meant the regime filter wasn't creating a very clean edge yet.

At that point, I wanted to check whether the averages were being distorted by a few large moves. So I plotted the 4-week return distribution for each regime.

plot_df = merged_df[["regime", "fwd_return_4w"]].dropna()

plot_df.boxplot(column="fwd_return_4w", by="regime", grid=False)
plt.title("4-week forward return distribution by regime")
plt.suptitle("")
plt.xlabel("regime")
plt.ylabel("4-week forward return")
plt.xticks(rotation=30)
plt.show()

This chart made the problem much clearer.

bearish_unwind looked strong on average, but that strength came from a few very large upside outliers. That made it less convincing as a base strategy.

bullish_buildup and bearish_buildup were weak both in the summary table and in the distribution.

bullish_unwind was the only regime that looked somewhat stable without depending too much on a handful of extreme observations.
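The same outlier check can be done numerically by comparing the mean and the median within each group. When a few large moves drive the average, the mean sits well above the median. The sketch below shows the idea on made-up data with two hypothetical group labels, outlier_heavy and steady. It is not output from the actual dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "regime": ["outlier_heavy"] * 5 + ["steady"] * 5,
    "fwd_return_4w": [0.01, 0.02, -0.01, 0.00, 0.40,
                      0.02, 0.03, 0.01, 0.02, 0.02],
})

check = toy.groupby("regime")["fwd_return_4w"].agg(mean="mean", median="median")

# A large positive gap suggests the average depends on a few big observations.
check["mean_minus_median"] = check["mean"] - check["median"]
print(check)
```

In this toy example the outlier_heavy group has a mean of 0.084 but a median of only 0.01, while the steady group has the same mean and median. Running the same groupby on merged_df would give a numeric companion to the boxplot.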

That changed the direction of the build.

Up to this point, the idea was to test a full regime framework and maybe keep multiple paths. After these charts, that no longer made sense. Most of the framework had already done its job by showing what not to use.

So instead of carrying all four regimes forward, I narrowed the focus to just one: bullish unwind.

Narrowing the Focus: Keeping Two Extra Variants for Comparison

At this point, bullish_unwind was already the main regime worth paying attention to. The buildup regimes were weak, and bearish_unwind was less convincing because a big part of its strength came from a few outsized moves.

So the focus was already shifting toward bullish_unwind.

Still, before fully committing to it, I kept two additional unwind-based variants in the next step just for comparison:

a long signal based on bearish_unwind
a combined long signal that fires on either unwind regime

That way, the first round of backtests could show whether bullish_unwind was actually better in practice, or whether the broader unwind logic worked better as a whole.

merged_df["long_bullish_unwind"] = (merged_df["regime"] == "bullish_unwind").astype(int)
merged_df["long_bearish_unwind"] = (merged_df["regime"] == "bearish_unwind").astype(int)
merged_df["long_any_unwind"] = merged_df["regime"].isin(["bullish_unwind", "bearish_unwind"]).astype(int)

print("number of trades:\n", merged_df[["long_bullish_unwind", "long_bearish_unwind", "long_any_unwind"]].sum())
merged_df[["cot_date","price_date","regime","long_bullish_unwind","long_bearish_unwind","long_any_unwind"]].tail()

This creates three simple binary signals:

long_bullish_unwind is 1 only when the regime is bullish_unwind
long_bearish_unwind is 1 only when the regime is bearish_unwind
long_any_unwind is 1 when either unwind regime appears

The output also gives the number of signal occurrences for each one, which matters because the next step is a proper backtest. A signal can look interesting conceptually, but if it barely appears, there isn't much to test.

So going into the strategy layer, bullish_unwind was already the main path. The other two were still kept around, but mainly to compare how much weaker or stronger they looked once the trades were actually executed.

Building the First Trade Rules

Once the three unwind-based signals were ready, the next step was to turn them into actual trades.

I kept the backtest simple on purpose:

long-only
4-week holding period
non-overlapping trades

The non-overlapping part matters. If a new signal appeared while a current trade was still active, I skipped it. That kept the trade list cleaner and avoided inflating the strategy by stacking overlapping positions on top of each other.

Here is the backtest function I used.

def run_fixed_hold_backtest(df, signal_col, hold_weeks=4):
    """Long-only, fixed-hold, non-overlapping backtest on weekly rows sorted by date."""
    trades = []
    i = 0

    # stop early enough that every entry still has an exit bar hold_weeks ahead
    while i < len(df) - hold_weeks:
        if df.iloc[i][signal_col] == 1:
            entry_date = df.iloc[i]["price_date"]
            exit_date = df.iloc[i + hold_weeks]["price_date"]
            entry_price = df.iloc[i]["close"]
            exit_price = df.iloc[i + hold_weeks]["close"]
            trade_return = exit_price / entry_price - 1

            trades.append({
                "signal": signal_col,
                "entry_index": i,
                "exit_index": i + hold_weeks,
                "entry_date": entry_date,
                "exit_date": exit_date,
                "entry_price": entry_price,
                "exit_price": exit_price,
                "trade_return": trade_return
            })

            # jump to the exit bar so signals inside the holding window are skipped
            i += hold_weeks
        else:
            i += 1

    return pd.DataFrame(trades)

This function scans through the dataset, checks whether a signal is active, enters at the current weekly bar, exits four weeks later, and records the trade result.
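To make the skipping behaviour concrete, here is a stripped-down version of the same loop that works on a plain list of 0/1 signals instead of a DataFrame. The function name non_overlapping_entries and the toy signal list are illustrative assumptions; only the index logic mirrors the backtest function above.

```python
def non_overlapping_entries(signals, hold):
    """Return (entry_index, exit_index) pairs using the same stepping rule as the backtest."""
    entries = []
    i = 0
    # an entry needs an exit bar `hold` steps ahead, so the last `hold` bars cannot open a trade
    while i < len(signals) - hold:
        if signals[i] == 1:
            entries.append((i, i + hold))
            i += hold  # jump to the exit bar; signals in between are ignored
        else:
            i += 1
    return entries

# Toy signal series: fires at bars 0, 2, 4 and 10
signals = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
entries = non_overlapping_entries(signals, hold=4)
print(entries)
```

With these inputs the signal at bar 2 is skipped because the first trade is still open, and the signal at bar 10 is dropped because there is no bar 14 to exit on. The signal at bar 4 is taken, which shows that a new trade can open on the same bar the previous one exits.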

Then I ran it for all three unwind-based signals.

bullish_unwind_trades = run_fixed_hold_backtest(merged_df, "long_bullish_unwind", hold_weeks=4)
bearish_unwind_trades = run_fixed_hold_backtest(merged_df, "long_bearish_unwind", hold_weeks=4)
any_unwind_trades = run_fixed_hold_backtest(merged_df, "long_any_unwind", hold_weeks=4)

After that, I checked how many trades were actually executed.

print("executed bullish_unwind trades:", len(bullish_unwind_trades))
print("executed bearish_unwind trades:", len(bearish_unwind_trades))
print("executed any_unwind trades:", len(any_unwind_trades))

These counts were lower than the raw signal counts from the previous section, which is expected because any signal that fired while a trade was still open was skipped.

Next, I built a small helper function to summarize the trade results and applied it to all three strategies.

def summarize_trades(trades):
    return pd.Series({
        "trades": len(trades),
        "win_rate": (trades["trade_return"] > 0).mean(),
        "avg_trade_return": trades["trade_return"].mean(),
        "median_trade_return": trades["trade_return"].median(),
        "cumulative_return": (1 + trades["trade_return"]).prod() - 1
    })

trade_summary = pd.DataFrame({
    "bullish_unwind": summarize_trades(bullish_unwind_trades),
    "bearish_unwind": summarize_trades(bearish_unwind_trades),
    "any_unwind": summarize_trades(any_unwind_trades)
}).T

trade_summary

This was the first full strategy result, and it cleared up the hierarchy very quickly.

bullish_unwind was still the best of the three. It wasn't strong yet, but it was clearly better than the other two.

A few things stood out:

bullish_unwind had the best win rate
bullish_unwind had the best average and median trade return
bearish_unwind and any_unwind both performed badly on a cumulative basis
Combining the two unwind regimes didn't help. It just diluted the stronger one

I also wanted to see how these strategies behaved over time, not just in a summary table. So I added simple equity curves for each one.


bullish_unwind_trades["equity_curve"] = (1 + bullish_unwind_trades["trade_return"]).cumprod()
bearish_unwind_trades["equity_curve"] = (1 + bearish_unwind_trades["trade_return"]).cumprod()
any_unwind_trades["equity_curve"] = (1 + any_unwind_trades["trade_return"]).cumprod()

plt.plot(bullish_unwind_trades["exit_date"], bullish_unwind_trades["equity_curve"], label="bullish unwind")
plt.plot(bearish_unwind_trades["exit_date"], bearish_unwind_trades["equity_curve"], label="bearish unwind")
plt.plot(any_unwind_trades["exit_date"], any_unwind_trades["equity_curve"], label="any unwind")
plt.title("Equity curves for 4-week unwind strategies")
plt.xlabel("date")
plt.ylabel("equity multiple")
plt.legend()
plt.show()

This chart made the same point more clearly. bullish_unwind was still weak in absolute terms, but it held up much better than the other two. bearish_unwind didn't survive the conversion from regime idea to actual strategy, and any_unwind was even worse because it inherited the weakness of both.
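The equity curves above are running products of (1 + trade_return), so the last point of each curve minus 1 is the same cumulative_return reported in the summary table. A minimal sketch with three made-up trade returns shows the link, and also why the compounded figure is not the same as simply adding the returns up:

```python
import pandas as pd

# Hypothetical trade returns, not taken from the article's results
trade_returns = pd.Series([0.10, -0.05, 0.02])

equity_curve = (1 + trade_returns).cumprod()         # 1.10, 1.045, 1.0659
cumulative_return = (1 + trade_returns).prod() - 1   # compounded result
simple_sum = trade_returns.sum()                     # naive sum, for comparison

print(equity_curve.round(4).tolist())
print(round(cumulative_return, 4), round(simple_sum, 4))
```

An equity curve that ends below 1.0 therefore means the strategy lost money over the sample, even if its average trade return looks close to zero.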

So by the end of this step, the picture was much clearer.

The broader unwind idea didn't work well as a whole. bearish_unwind wasn't holding up in a clean backtest. any_unwind was even worse. That left only one regime worth carrying further: bullish unwind.

Still, even that result wasn't strong enough yet. The strategy was better than the alternatives, but not good enough to stop here. In fact, it hadn’t even turned a profit yet.

The next step was to compare it against buy-and-hold and see whether it actually added anything useful.

Comparing Bullish Unwind Against Buy-and-Hold

By this point, bullish_unwind had already beaten the other regime-based variants. But that still did not mean much on its own.

A strategy can look decent relative to weaker alternatives and still fail the most basic test: does it do anything better than just holding crude oil?

So the next step was to compare the raw bullish_unwind strategy against a simple buy-and-hold benchmark.

I started by building the buy-and-hold curve from the weekly WTI price series.

buy_hold_df = weekly_price.copy()
buy_hold_df = buy_hold_df.sort_values("price_date").reset_index(drop=True)
buy_hold_df["buy_hold_curve"] = buy_hold_df["close"] / buy_hold_df["close"].iloc[0]

buy_hold_df[["price_date", "close", "buy_hold_curve"]].tail()

Then I plotted buy-and-hold against the raw bullish_unwind strategy.

plt.plot(buy_hold_df["price_date"], buy_hold_df["buy_hold_curve"], label="buy and hold wti", linewidth=2, alpha=0.5)
plt.plot(bullish_unwind_trades["exit_date"], bullish_unwind_trades["equity_curve"], label="bullish unwind strategy", color="b")
plt.title("Bullish unwind strategy vs buy and hold crude oil")
plt.xlabel("date")
plt.ylabel("equity multiple")
plt.legend()
plt.show()

The chart was useful because it showed the exact problem with the raw signal. bullish_unwind was more selective than buy-and-hold, but that selectivity was not creating a real edge. The strategy had some decent stretches, but it still lagged the simpler benchmark overall.

To make that comparison more explicit, I calculated the full buy-and-hold return over the sample, then I put both results into one small summary table.

buy_hold_return = buy_hold_df["buy_hold_curve"].iloc[-1] - 1

comparison_summary = pd.DataFrame({
    "strategy": ["bullish_unwind", "buy_and_hold"],
    "trades": [len(bullish_unwind_trades), np.nan],
    "win_rate": [(bullish_unwind_trades["trade_return"] > 0).mean(), np.nan],
    "avg_trade_return": [bullish_unwind_trades["trade_return"].mean(), np.nan],
    "cumulative_return": [
        (1 + bullish_unwind_trades["trade_return"]).prod() - 1,
        buy_hold_return
    ]
})

comparison_summary

This was the real turning point in the article.

Even though bullish_unwind was the best regime-based candidate so far, it still underperformed buy-and-hold. That made the conclusion very clear: the raw signal wasn't strong enough yet.

So this was no longer a question of choosing between regimes. That part was already settled. The real question now was whether the bullish_unwind setup could be improved without turning the strategy into something over-engineered.

That's what led to the next step: adding a simple trend filter.

Adding a Trend Filter

At this point, the core signal had been narrowed to bullish_unwind, but the raw version still wasn't good enough. It underperformed buy-and-hold, which meant the signal needed more context.

The next idea was simple: not every bullish unwind should be treated the same way. If speculative positioning is starting to unwind while crude oil is already in a weak broader trend, that long signal may not be worth taking. So I added one basic filter: only take the bullish_unwind trade when WTI is above its 26-week moving average.

First, I created the moving average and a binary trend flag. Then I combined that filter with the existing bullish_unwind regime.

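# 26-week simple moving average of the weekly close
merged_df["ma_26"] = merged_df["close"].rolling(26).mean()
# 1 when the close is above the moving average, 0 otherwise (including the warm-up rows)
merged_df["above_ma_26"] = (merged_df["close"] > merged_df["ma_26"]).astype(int)
merged_df["long_bullish_unwind_tf"] = ((merged_df["regime"] == "bullish_unwind") & (merged_df["above_ma_26"] == 1)).astype(int)

print("filtered signals:", merged_df["long_bullish_unwind_tf"].sum())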

This creates a filtered version of the original signal. The output also shows how many trade opportunities remain after applying the trend filter. As expected, the number drops. That isn't a problem if the remaining trades are better.
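One detail of this filter is easy to miss: rolling(26).mean() is NaN for the first 25 rows, and a comparison against NaN is False, so the trend flag is 0 during the warm-up period and no filtered signal can fire there. A small sketch on a made-up, steadily rising price series shows the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly closes: 40 bars rising by 1 each week
close = pd.Series(np.arange(50.0, 90.0))

ma_26 = close.rolling(26).mean()
above_ma_26 = (close > ma_26).astype(int)

warmup_rows = int(ma_26.isna().sum())
flags_in_warmup = int(above_ma_26.iloc[:25].sum())
flags_after_warmup = int(above_ma_26.iloc[25:].sum())

print(warmup_rows, flags_in_warmup, flags_after_warmup)
```

Even though this series rises on every bar, the flag stays at 0 until the moving average has a full 26 weeks of data. Part of the drop in signal count after filtering can come from this warm-up window rather than from the trend condition itself.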

Next, I ran the same 4-week non-overlapping backtest on the filtered signal.

bullish_unwind_tf_trades = run_fixed_hold_backtest(
    merged_df,
    "long_bullish_unwind_tf",
    hold_weeks=4
)

filtered_summary = pd.DataFrame({
    "bullish_unwind": summarize_trades(bullish_unwind_trades),
    "bullish_unwind_tf": summarize_trades(bullish_unwind_tf_trades)
}).T

filtered_summary

This was the first major improvement in the process.

The filtered version didn't just look slightly better. It changed the profile of the strategy in a meaningful way:

fewer trades
higher win rate
higher average trade return
much stronger cumulative return

That was exactly what I wanted from a filter. It made the signal more selective, but it also made it much cleaner.

To visualize the difference, I added equity curves for the raw strategy, the filtered version, and buy-and-hold.

bullish_unwind_tf_trades["equity_curve"] = (1 + bullish_unwind_tf_trades["trade_return"]).cumprod()

plt.plot(bullish_unwind_trades["exit_date"], bullish_unwind_trades["equity_curve"], label="bullish unwind")
plt.plot(bullish_unwind_tf_trades["exit_date"], bullish_unwind_tf_trades["equity_curve"], label="bullish unwind + trend filter")
plt.plot(buy_hold_df["price_date"], buy_hold_df["buy_hold_curve"], label="buy and hold wti")
plt.title("Bullish unwind strategy with and without trend filter")
plt.xlabel("date")
plt.ylabel("equity multiple")
plt.legend()
plt.show()

This chart made the change easy to see. The raw strategy was drifting, while the filtered version was much more stable and clearly stronger over the full sample.

So this was the point where the strategy started becoming usable. The signal was no longer just “extreme bullish positioning is starting to unwind.” It was: extreme bullish positioning is starting to unwind, while crude oil is still in a broader uptrend.

That was much more specific, and much more effective.

The next question was whether this improved version was actually stable, or whether it only worked because of one lucky parameter choice.

Stress-Testing the Setup

Once the trend filter improved the strategy, I still didn't want to treat that version as final without checking how fragile it was.

A setup can look strong simply because one exact combination of parameters happened to work. So the next step was to test nearby variations and see whether the result still held up.

I kept the core idea the same:

bullish unwind
long-only
trend filter stays on

Then I varied three things:

the percentile window
the threshold that defines an extreme
the holding period

First, I created a helper function that builds bullish unwind signals from different percentile columns and threshold levels. Then I added a second percentile series that uses a shorter 52-week window.

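def add_bullish_unwind_signal(df, percentile_col, high_threshold, signal_name):
    # adds the signal column to df in place:
    # positioning is extreme, has started to fall, and price is above the 26-week average
    df[signal_name] = (
        (df[percentile_col] > high_threshold) &
        (df["net_position_ratio_change"] < 0) &
        (df["above_ma_26"] == 1)
    ).astype(int)

def rolling_percentile(x):
    # percentile rank of the most recent value within its own window
    return pd.Series(x).rank(pct=True).iloc[-1]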

merged_df["position_percentile_52"] = merged_df["net_position_ratio"].rolling(52).apply(rolling_percentile)
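To see what the rolling percentile actually returns, here is the same helper applied to a tiny made-up series with a 3-period window instead of 52. The input values are hypothetical; rolling_percentile is repeated only so the snippet runs on its own.

```python
import pandas as pd

def rolling_percentile(x):
    # percentile rank of the latest value within its window
    return pd.Series(x).rank(pct=True).iloc[-1]

# Hypothetical positioning ratios
ratio = pd.Series([10.0, 20.0, 30.0, 25.0, 5.0])
pct = ratio.rolling(3).apply(rolling_percentile)

print(pct.round(4).tolist())
```

The first two values are NaN because the window is not full yet. After that, 30 is the highest of [10, 20, 30] and gets 1.0, 25 is the middle value of [20, 30, 25] and gets about 0.667, and 5 is the lowest of [30, 25, 5] and gets about 0.333. A threshold such as 0.80 is compared against this number, so the shorter the window, the coarser the percentile steps become.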

With that in place, I built four signal variants:

104-week percentile with an 80th percentile threshold
104-week percentile with an 85th percentile threshold
52-week percentile with an 80th percentile threshold
52-week percentile with an 85th percentile threshold

add_bullish_unwind_signal(merged_df, "position_percentile_104", 0.80, "sig_104_80")
add_bullish_unwind_signal(merged_df, "position_percentile_104", 0.85, "sig_104_85")
add_bullish_unwind_signal(merged_df, "position_percentile_52", 0.80, "sig_52_80")
add_bullish_unwind_signal(merged_df, "position_percentile_52", 0.85, "sig_52_85")

After that, I ran the same backtest across three holding periods:

2 weeks
4 weeks
8 weeks

results = []

for signal_col in ["sig_104_80", "sig_104_85", "sig_52_80", "sig_52_85"]:
    for hold_weeks in [2, 4, 8]:
        trades = run_fixed_hold_backtest(merged_df, signal_col, hold_weeks=hold_weeks)

        if len(trades) == 0:
            continue

        results.append({
            "signal": signal_col,
            "hold_weeks": hold_weeks,
            "trades": len(trades),
            "win_rate": (trades["trade_return"] > 0).mean(),
            "avg_trade_return": trades["trade_return"].mean(),
            "median_trade_return": trades["trade_return"].median(),
            "cumulative_return": (1 + trades["trade_return"]).prod() - 1
        })

stress_test = pd.DataFrame(results)
stress_test

This output was one of the most important parts of the entire article. It showed whether the improved strategy was actually stable, or whether it only worked in one narrow version.

A few things stood out immediately.

The 104-week / 80th percentile version was clearly the strongest family. It held up across all three holding periods:

2-week hold: cumulative return 38.16%
4-week hold: cumulative return 45.95%
8-week hold: cumulative return 19.02%

That consistency mattered. It meant the signal wasn't collapsing the moment the hold period changed.
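A long results table like stress_test is easier to scan as a grid with one row per signal and one column per holding period. The sketch below pivots only the three sig_104_80 cumulative returns quoted above. The names stress_demo, grid, and all_positive are illustrative, and the other signal rows are left out on purpose rather than filled with invented numbers.

```python
import pandas as pd

# Cumulative returns reported above for the 104-week / 80th percentile signal
rows = [
    {"signal": "sig_104_80", "hold_weeks": 2, "cumulative_return": 0.3816},
    {"signal": "sig_104_80", "hold_weeks": 4, "cumulative_return": 0.4595},
    {"signal": "sig_104_80", "hold_weeks": 8, "cumulative_return": 0.1902},
]
stress_demo = pd.DataFrame(rows)

# One row per signal, one column per holding period
grid = stress_demo.pivot(index="signal", columns="hold_weeks", values="cumulative_return")

# A simple robustness flag: is the signal positive for every holding period tested?
all_positive = (grid > 0).all(axis=1)

print(grid)
print(all_positive)
```

Running the same pivot on the full stress_test frame would put all four signal variants side by side, which makes it quick to spot a variant that only works for one holding period.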

The 4-week hold stood out as the best overall choice. It had:

26 trades
65.38% win rate
1.84% average trade return
3.69% median trade return
45.95% cumulative return

The 8-week hold had a slightly higher average trade return in some cases, but it came with fewer trades. That made it thinner and harder to treat as the main version.

The 104-week / 85th percentile setup was too restrictive for the shorter holds. Its 2-week and 4-week versions turned negative, even though the 8-week hold still worked reasonably well.

The 52-week variants were much less convincing overall. A few of them were positive, but they were not nearly as stable as the 104-week / 80th percentile version.

So by the end of this step, the final structure wasn't just the version that happened to look good once. It was the version that kept holding up even after nearby variations were tested.

That gave me a clear final setup:

104-week percentile
80th percentile threshold
bullish unwind
26-week moving average filter
4-week hold

The Final Strategy

By this stage, the process had already done most of the filtering.

The raw four-regime framework didn't work well as a strategy. The broader unwind idea didn't work either. The raw bullish_unwind signal was better than the alternatives, but still weaker than buy-and-hold.

The only version that held up after all of that was this one:

bullish unwind
104-week positioning percentile
80th percentile threshold
26-week moving average filter
4-week hold
non-overlapping trades

So now it made sense to stop iterating and show the final result clearly. I first locked the final signal and reran the backtest using the chosen setup.

final_signal = "sig_104_80"
final_hold = 4
final_trades = run_fixed_hold_backtest(merged_df, final_signal, hold_weeks=final_hold)
final_trades["equity_curve"] = (1 + final_trades["trade_return"]).cumprod()

final_summary = pd.DataFrame({
    "metric": [
        "trades",
        "win_rate",
        "avg_trade_return",
        "median_trade_return",
        "cumulative_return"
    ],
    "value": [
        len(final_trades),
        (final_trades["trade_return"] > 0).mean(),
        final_trades["trade_return"].mean(),
        final_trades["trade_return"].median(),
        (1 + final_trades["trade_return"]).prod() - 1
    ]
})

final_summary

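That output gives the final performance profile, which matches the 4-week sig_104_80 row from the stress test:

26 trades
65.38% win rate
1.84% average trade return
3.69% median trade return
45.95% cumulative return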

Those numbers were already a big improvement over the earlier raw versions, but I still wanted the comparison in one place. So I built a final table against the two reference points:

buy-and-hold
raw bullish unwind

final_comparison = pd.DataFrame({
    "strategy": ["buy_and_hold", "bullish_unwind_raw", "bullish_unwind_filtered"],
    "trades": [
        np.nan,
        len(bullish_unwind_trades),
        len(final_trades)
    ],
    "win_rate": [
        np.nan,
        (bullish_unwind_trades["trade_return"] > 0).mean(),
        (final_trades["trade_return"] > 0).mean()
    ],
    "avg_trade_return": [
        np.nan,
        bullish_unwind_trades["trade_return"].mean(),
        final_trades["trade_return"].mean()
    ],
    "cumulative_return": [
        buy_hold_return,
        (1 + bullish_unwind_trades["trade_return"]).prod() - 1,
        (1 + final_trades["trade_return"]).prod() - 1
    ]
})

final_comparison

This was the full payoff of the build:

buy-and-hold: 13.67%
raw bullish unwind: -2.13%
filtered bullish unwind: 45.95%

The trend filter didn't just smooth the strategy a bit. It changed the result completely.

To make that visible, I plotted the three curves together.

plt.plot(buy_hold_df["price_date"], buy_hold_df["buy_hold_curve"], label="buy and hold wti", linewidth=2, alpha=0.5)
plt.plot(bullish_unwind_trades["exit_date"], bullish_unwind_trades["equity_curve"], label="raw bullish unwind", color="indigo")
plt.plot(final_trades["exit_date"], final_trades["equity_curve"], label="filtered bullish unwind", color="b")
plt.title("Crude oil strategy comparison")
plt.xlabel("date")
plt.ylabel("equity multiple")
plt.legend()
plt.show()

This chart says the same thing as the table, but more directly. The raw signal drifts. Buy-and-hold is positive over the full sample, but much noisier. The filtered version is the only one that compounds in a cleaner way.
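"Noisier" can be made measurable with a maximum drawdown, the largest peak-to-trough fall along an equity curve. This metric is not part of the article's own analysis. The sketch below is one possible way to compute it, shown on two made-up curves, and the function name max_drawdown is an assumption.

```python
import pandas as pd

def max_drawdown(curve):
    """Largest percentage fall from a running peak (returned as a negative number)."""
    running_peak = curve.cummax()
    drawdown = curve / running_peak - 1
    return drawdown.min()

# Hypothetical equity multiples
smooth = pd.Series([1.00, 1.02, 1.01, 1.05, 1.08])
choppy = pd.Series([1.00, 1.30, 0.80, 1.20, 1.10])

dd_smooth = max_drawdown(smooth)
dd_choppy = max_drawdown(choppy)

print(round(dd_smooth, 4), round(dd_choppy, 4))
```

One caveat if this is applied to the curves in this article: the strategy curves only have one point per closed trade, while buy_hold_curve has one point per week, so the strategy's drawdown would be understated relative to the benchmark unless both are measured on the same weekly grid.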

I also wanted to show where these filtered trades actually appear on the WTI chart.

plt.plot(merged_df["price_date"], merged_df["close"], label="wti close", linewidth=2, alpha=0.5)
plt.scatter(merged_df.loc[merged_df[final_signal] == 1, "price_date"], merged_df.loc[merged_df[final_signal] == 1, "close"],
            s=25, label="filtered bullish unwind signal", color="b")
plt.title("Filtered bullish unwind signals on WTI crude oil")
plt.xlabel("date")
plt.ylabel("price")
plt.legend()
plt.show()

This is useful because it shows the strategy is selective. It doesn't fire all the time. It only activates when positioning is in an extreme bullish zone, has started to unwind, and the broader price trend is still intact.

I did the same on the positioning side.

plt.plot(merged_df["price_date"], merged_df["position_percentile_104"], label="104-week percentile", linewidth=2, alpha=0.5)
plt.axhline(0.8, linestyle="--", label="80th percentile")
plt.scatter(merged_df.loc[merged_df[final_signal] == 1, "price_date"], merged_df.loc[merged_df[final_signal] == 1, "position_percentile_104"],
            s=25, label="trade signals", color="indigo")
plt.title("Bullish unwind signals from COT positioning extremes")
plt.xlabel("date")
plt.ylabel("percentile")
plt.legend()
plt.show()

This final chart ties everything together. The trades only appear when the percentile is already in the extreme zone, which means the signal is still doing what it was originally designed to do. It's just doing it in a much more disciplined way than the raw regime framework.

Further Improvements

There are still a few places where this can be pushed further.

The first is execution realism. Right now the strategy uses a clean weekly entry and exit rule, but it doesn't include slippage, spreads, or any contract-level execution constraints. Adding those would make the result stricter.

The second is signal depth. This version only uses non-commercial positioning, a trend filter, and a fixed hold period. It would be worth testing whether commercial positioning, volatility filters, or dynamic exits can improve the setup without overcomplicating it.
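As a rough sketch of the execution-realism point, a flat cost can be charged on entry and on exit of every trade before the returns are compounded. The 0.10% per-side cost below is a placeholder assumption, not an estimate of real crude oil trading costs, and apply_costs is a hypothetical helper rather than part of the backtest above.

```python
import pandas as pd

def apply_costs(trade_returns, cost_per_side=0.001):
    """Net trade return after paying cost_per_side once on entry and once on exit."""
    return (1 + trade_returns) * (1 - cost_per_side) ** 2 - 1

# Hypothetical gross trade returns
gross = pd.Series([0.03, -0.01, 0.05])
net = apply_costs(gross)

gross_cumulative = (1 + gross).prod() - 1
net_cumulative = (1 + net).prod() - 1

print(net.round(5).tolist())
print(round(gross_cumulative, 5), round(net_cumulative, 5))
```

Passing final_trades["trade_return"] through a function like this and recomputing the summary would show how much of the edge survives a given cost assumption.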

Conclusion

This started as a broad COT idea, not a finished strategy. The first regime framework looked reasonable, but most of it didn't hold up once the data was tested. That part was important, because it made the final signal much narrower and much cleaner.

What survived was a very specific setup: extreme bullish positioning that starts to unwind, while WTI is still above its 26-week moving average. That version ended up outperforming both the raw signal and buy-and-hold over the tested sample.

The nice part is that the whole thing can be built from scratch with FinancialModelingPrep’s COT and commodity price data APIs, without needing to patch together multiple data sources. That made it much easier to go from idea to actual testing.

With that being said, you’ve reached the end of the article. Hope you learned something new and useful. Thank you for your time.

How to Build a Market Pulse App in Python: Real-Time & Multi-Asset

Nikhil Adithyan — Mon, 06 Apr 2026 14:30:29 +0000

A “market pulse” screen is basically the tab you keep open when you don’t want to stare at charts all day. It tells you what’s moving right now, what’s unusually volatile, and which names are starting to move together.

Not in a research-paper way. In a product way. The kind of feed you could drop into a media platform or investing app and have it feel instantly useful.

In this tutorial, we’ll build a minimal version of that in Python using Streamlit. The dashboard has three parts:

a Pulse table that ranks the biggest movers across your watchlist
a Stress feed that emits event-style alerts instead of raw tick spam
a small Correlation card that updates based on the current volatility regime

The data for the dashboard will be powered by EODHD’s real-time WebSocket feeds.

Quick expectation setting: this isn’t TradingView, and it’s not a backtester. It’s a lightweight real-time system that streams prices, maintains rolling buffers, computes a few live metrics, and turns them into UI-ready widgets.

The goal here is to build something you can actually ship as a “market pulse” feature, not a one-off notebook demo.

Prerequisites
The App We’re Building
The App Architecture
- Code File Structure
Streaming Layer: One Queue, Many Feeds
- feeds.py
- Why the Watchlist is Curated
Rolling State: Buffers, Returns, Volatility, Trend
- pulse_store.py
Turning Live Stats Into Events (Stress Feed)
- events.py
Regime Tagging (Small but Important)
- Add This to pulse_store.py
- Attach Regime Inside snapshot() in pulse_store.py
Correlation Card (Stocks Only, Regime-aware Window)
- correlation.py
Building the Streamlit App
Final Output
What I’d Improve Next
Conclusion

Prerequisites

Before we get into the build, make sure you have a few basics ready.

You should be comfortable running Python scripts, installing packages with pip, and working with a small multi-file project.

This tutorial isn't notebook-based. We’ll be building a lightweight real-time app with separate files for streaming, state, events, correlation logic, and the Streamlit UI.

You’ll need Python 3.10+ and these packages installed:

pip install streamlit pandas websockets

You’ll also need an EODHD API key with access to their real-time WebSocket feeds, since the dashboard depends on live stock, forex, and crypto data.

To follow along smoothly, create these files in your project folder before starting:

feeds.py
pulse_store.py
events.py
correlation.py
app.py

One quick note before we begin: Since this app runs on live market data, what you see will depend on when you open it. During weekends or off-market hours, crypto will usually dominate the dashboard while stocks and most forex pairs stay relatively quiet. That is expected.

The App We’re Building

Before we touch any code, here’s what the finished dashboard looks like:

https://gumlet.tv/watch/69b99df9554f0fb510c28ce6/

Let's go over its main features:

Pulse Table

This is the main screen. It’s your ranked list of movers across the watchlist. Each row is one symbol, and the columns are the small set of signals we compute live: last price, 1-minute return, 5-minute return when available, 15-minute volatility, and a simple regime label.

If you open the app and only want one thing, it’s this table. You can glance at it and immediately know what deserves attention.

Stress Feed

This is where the app stops feeling like a live ticker and starts feeling like a product feature. Instead of printing every update, we only emit events when something crosses a threshold, like a sharp 1-minute move or a volatility spike. Those events become “cards” in a feed. The point is to reduce noise, not create more of it.

Correlation Card

This is intentionally small and conservative. Correlation in real time gets messy fast because different symbols tick at different frequencies and you need alignment. For this build, we keep it to stocks only and compute correlation off time buckets.

It’s not meant to be a full correlation matrix. It’s just a quick “what’s moving with my base symbol right now” view, and it adapts its lookback window depending on whether the base symbol is in a normal or high-vol regime.

Control Panel

At the top, you have a few controls that make the demo feel interactive without turning it into a settings page. Top movers lets you pick how many rows you want in the Pulse table. Correlation base switches which stock you’re anchoring correlation around. Correlation bucket changes the time bucket size used for alignment, which is useful when the feed is sparse and you want correlation to stabilize.

The App Architecture

If you’ve ever tried to build a live Streamlit app, you’ve probably hit the same wall. Streamlit reruns your script constantly. Any time a widget changes, any time you call st.rerun(), the whole file executes again from the top.

That’s great for normal dashboards, but it’s a terrible place to run an infinite WebSocket loop. If you do that in the main thread, the UI either freezes or you end up reconnecting to feeds on every rerun.

So the architecture here is intentionally split into two roles.

One background worker owns the real-time work. It connects to the WebSocket feeds, ingests ticks, updates rolling buffers, computes metrics, and emits stress events. That worker runs continuously, and it keeps the latest state in memory. That’s the engine of the app.

Streamlit itself stays dumb on purpose. On every rerun, it only reads whatever state the worker has produced and renders tables and a small correlation card. There's no data fetching in the UI loop. No heavy computation. Just display. That separation is the reason the app stays stable even when you keep refreshing the page or tweaking controls.

In practice, the simplest way to do this in Python is a background thread that runs an async loop. Streamlit starts that thread once using st.session_state as a guard, and then the UI code just keeps rerendering from the shared state.

It’s not fancy. But it’s the difference between a “works for 30 seconds” demo and something that can sit open like a real market pulse screen.
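The start-once pattern can be sketched without Streamlit at all. In the snippet below a plain dict stands in for st.session_state, and a short async producer stands in for the WebSocket loop. The names start_worker_once, state, and shared are illustrative assumptions, not the app's actual code.

```python
import asyncio
import threading

def start_worker_once(state, shared):
    """Start the background worker only if this state object has not started one yet."""
    if state.get("worker_started"):
        return False  # a rerun lands here and does nothing

    async def produce():
        # stand-in for the real streaming loop: emit three fake ticks, then stop
        for i in range(3):
            shared.append({"symbol": "DEMO", "price": 100.0 + i})
            await asyncio.sleep(0.01)

    # the thread owns its own event loop via asyncio.run
    thread = threading.Thread(target=lambda: asyncio.run(produce()), daemon=True)
    thread.start()
    state["worker_started"] = True
    state["worker_thread"] = thread
    return True

state = {}    # stand-in for st.session_state
shared = []   # stand-in for the shared store the UI reads from

first_call = start_worker_once(state, shared)
second_call = start_worker_once(state, shared)  # simulates a Streamlit rerun
state["worker_thread"].join()

print(first_call, second_call, len(shared))
```

The second call returns immediately without creating another thread, which is what keeps reruns from reconnecting to the feeds. In the real app the worker never finishes, so the UI would read from the shared state instead of joining the thread.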

Code File Structure

To keep this build readable, I split the app into five small files. Each file has one job, and the Streamlit UI doesn’t touch the WebSocket logic directly.

feeds.py handles WebSocket connections and normalizes every incoming message into the same tick format.
pulse_store.py keeps rolling buffers per symbol and computes pulse metrics (returns, vol, trend, regime). This is the core state.
events.py turns the live metrics into a stress feed with cooldowns and asset-aware thresholds.
correlation.py builds the correlation card by bucketing and aligning returns, then changing the lookback window based on regime.
app.py is the Streamlit dashboard. It starts the background worker once, then keeps rerendering from shared state.

That split is what makes the app stable. The background worker can run forever. Streamlit can rerun as often as it wants without reconnecting to feeds or recomputing everything from scratch.

Streaming Layer: One Queue, Many Feeds

The first step is getting real-time ticks into the system. We connect to EODHD’s WebSocket feeds for stocks, forex, and crypto, subscribe to a small watchlist, then normalize every message into one tick schema: {symbol, asset, ts, price}.

Once we have that, everything downstream becomes predictable.

`feeds.py:`

import asyncio
import json
import time
import websockets

API_KEY = "YOUR EODHD API KEY"

WS = {
    "stocks": "wss://ws.eodhistoricaldata.com/ws/us?api_token=",
    "forex":  "wss://ws.eodhistoricaldata.com/ws/forex?api_token=",
    "crypto": "wss://ws.eodhistoricaldata.com/ws/crypto?api_token=",
}

def _tick(symbol, asset, price):
    return {"symbol": symbol, "asset": asset, "ts": time.time(), "price": float(price)}

def _parse(asset, msg):
    s = msg.get("s")
    p = msg.get("p")
    if s is None or p is None:
        return None
    return _tick(s, asset, p)

async def _stream(asset, symbols, q):
    url = WS[asset] + API_KEY

    while True:
        try:
            async with websockets.connect(url, ping_interval=20, ping_timeout=20) as ws:
                sub = {"action": "subscribe", "symbols": ",".join(symbols)}
                await ws.send(json.dumps(sub))

                async for raw in ws:
                    try:
                        msg = json.loads(raw)
                    except Exception:
                        continue

                    t = _parse(asset, msg)
                    if t:
                        await q.put(t)

        except Exception:
            await asyncio.sleep(1.0)

async def start_streams(q):
    tasks = []
    tasks.append(asyncio.create_task(_stream("stocks", ["AAPL","TSLA","NVDA","AMZN","MSFT","META","GOOGL"], q)))
    tasks.append(asyncio.create_task(_stream("forex", ["EURUSD","USDINR","USDJPY","GBPUSD","AUDUSD"], q)))
    tasks.append(asyncio.create_task(_stream("crypto", ["BTC-USD","ETH-USD","BTC-USDT","ETH-USDT","SOL-USDT"], q)))
    return tasks

Note: Replace YOUR EODHD API KEY with your actual EODHD API key. If you don’t have one, you can obtain it by opening an EODHD developer account.

What this code is doing is simple. Each feed runs in its own async task, pushes normalized ticks into a single shared queue, and reconnects if the socket drops. We don’t try to do anything smart here. This layer is just plumbing.
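To illustrate the normalization step, the snippet below runs the same _parse logic on two hand-written messages. Both message shapes are hypothetical examples and are not copied from the feed. The helpers are repeated only so the snippet is self-contained.

```python
import time

def _tick(symbol, asset, price):
    return {"symbol": symbol, "asset": asset, "ts": time.time(), "price": float(price)}

def _parse(asset, msg):
    s = msg.get("s")
    p = msg.get("p")
    if s is None or p is None:
        return None  # anything without both a symbol and a price is dropped
    return _tick(s, asset, p)

# Hypothetical messages: one price update and one non-price status message
trade_msg = {"s": "AAPL", "p": "189.50", "v": 100}
status_msg = {"status_code": 200, "message": "Authorized"}

tick = _parse("stocks", trade_msg)
ignored = _parse("stocks", status_msg)

print(tick["symbol"], tick["asset"], tick["price"], ignored)
```

Two things follow from this design. The timestamp is the local receive time rather than any timestamp carried in the message, and a feed that reports prices under field names other than s and p would have its ticks silently skipped, so it is worth printing a few raw messages per feed when debugging an empty table.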

Why the Watchlist is Curated

A bigger watchlist makes the demo look impressive, but it also makes debugging and alignment harder. For the article, you want a list that’s small enough to reason about, but diverse enough to show multi-asset behavior.

One thing that will skew what you see is weekends. Stocks and most forex won’t meaningfully tick when markets are closed, while crypto runs 24/7. So if you run the app on a Sunday, crypto will naturally dominate the pulse table. That’s not a bug. It’s just what happens when only one asset class is actually moving.

In a real product, you’d solve this by ranking movers per asset class or rendering separate sections. For this build, we'll keep it simple and accept that the output depends on when you run it.

Rolling State: Buffers, Returns, Volatility, Trend

This is the core of the app. We keep a rolling buffer per symbol, compute a few live signals from it, and expose everything as a compact snapshot that the UI and the event system can consume.

`pulse_store.py`

import time
import math
import threading
from collections import deque

class PulseStore:
    def __init__(self, window_sec=3600):
        self.window_sec = window_sec
        self.buffers = {}
        self.latest = {}
        self.asset = {}
        self.vol_hist = {}
        self.lock = threading.Lock()

    def _buf(self, symbol):
        if symbol not in self.buffers:
            self.buffers[symbol] = deque()
        return self.buffers[symbol]

    def update(self, tick):
        symbol = tick["symbol"]
        ts = tick["ts"]
        px = tick["price"]

        with self.lock:
            b = self._buf(symbol)
            b.append((ts, px))
            self.latest[symbol] = px
            self.asset[symbol] = tick.get("asset")

            cutoff = ts - self.window_sec
            while b and b[0][0] < cutoff:
                b.popleft()

        return len(b)

    def _price_at_or_before(self, b, target_ts):
        with self.lock:
            data = list(b)

        for i in range(len(data) - 1, -1, -1):
            if data[i][0] <= target_ts:
                return data[i][1]
        return None

    def ret(self, symbol, window_sec):
        b = self.buffers.get(symbol)
        if not b:
            return None

        with self.lock:
            if len(b) < 2:
                return None
            now_ts, now_px = b[-1]

        px0 = self._price_at_or_before(b, now_ts - window_sec)
        if px0 is None:
            return None

        return (now_px / px0) - 1.0

    def ret_1m(self, symbol):
        return self.ret(symbol, 60)

    def ret_5m(self, symbol):
        return self.ret(symbol, 300)

    def ret_15m(self, symbol):
        return self.ret(symbol, 900)

    def _recent_prices(self, b, window_sec):
        with self.lock:
            data = list(b)

        if not data:
            return []

        cutoff = data[-1][0] - window_sec
        out = []
        for ts, px in data:
            if ts >= cutoff:
                out.append(px)
        return out

    def vol_15m(self, symbol):
        b = self.buffers.get(symbol)
        if not b:
            return None

        prices = self._recent_prices(b, 900)
        if len(prices) < 6:
            return None

        rets = []
        for i in range(1, len(prices)):
            rets.append(prices[i] / prices[i-1] - 1.0)

        if len(rets) < 3:
            return None

        m = sum(rets) / len(rets)
        var = sum((x - m) ** 2 for x in rets) / len(rets)
        return var ** 0.5

    def trend_15m(self, symbol):
        b = self.buffers.get(symbol)
        if not b:
            return None

        prices = self._recent_prices(b, 900)
        if len(prices) < 8:
            return None

        lp = []
        for p in prices:
            if p > 0:
                lp.append(math.log(p))

        if len(lp) < 8:
            return None

        n = len(lp)
        xs = list(range(n))

        xbar = sum(xs) / n
        ybar = sum(lp) / n

        num = 0.0
        den = 0.0
        for i in range(n):
            dx = xs[i] - xbar
            dy = lp[i] - ybar
            num += dx * dy
            den += dx * dx

        if den == 0:
            return None

        return num / den

    def _vh(self, symbol):
        if symbol not in self.vol_hist:
            self.vol_hist[symbol] = deque(maxlen=200)
        return self.vol_hist[symbol]

    def update_vol_history(self, symbol):
        v = self.vol_15m(symbol)
        if v is None:
            return None
        self._vh(symbol).append(v)
        return v

    def regime(self, symbol):
        h = self.vol_hist.get(symbol)
        if not h or len(h) < 30:
            return "unknown"

        cur = h[-1]
        hs = sorted(h)
        p80 = hs[int(0.8 * (len(hs) - 1))]

        if cur >= p80:
            return "high_vol"
        return "normal"

    def snapshot(self, symbol):
        last = self.latest.get(symbol)
        if last is None:
            return None

        out = {"symbol": symbol, "asset": self.asset.get(symbol), "last": last}

        r1 = self.ret_1m(symbol)
        r5 = self.ret_5m(symbol)
        r15 = self.ret_15m(symbol)
        v15 = self.vol_15m(symbol)
        tr = self.trend_15m(symbol)

        if r1 is not None:
            out["ret_1m"] = r1
        if r5 is not None:
            out["ret_5m"] = r5
        if r15 is not None:
            out["ret_15m"] = r15
        if v15 is not None:
            out["vol_15m"] = v15
        if tr is not None:
            out["trend_15m"] = tr

        v = self.update_vol_history(symbol)
        if v is not None:
            out["regime"] = self.regime(symbol)

        return out

    def snapshots(self):
        with self.lock:
            syms = list(self.buffers.keys())

        out = []
        for s in syms:
            snap = self.snapshot(s)
            if snap:
                out.append(snap)
        return out

update() is the entry point. Every incoming tick gets appended to that symbol’s deque, and old points get pruned so the buffer never grows unbounded.

Returns are computed using a small trick: we don’t assume we have a price exactly 60 seconds ago or 300 seconds ago. We scan backwards and grab the most recent price at or before the target timestamp. That keeps returns stable even when ticks come in unevenly.
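Here is a small worked example of that lookup, written as a standalone function rather than a method. The tick values are made up for illustration.

```python
def price_at_or_before(data, target_ts):
    # data is a list of (ts, price) pairs in ascending time order.
    for ts, px in reversed(data):
        if ts <= target_ts:
            return px
    return None

# Uneven ticks: nothing lands exactly 60 seconds before the last one.
ticks = [(0, 100.0), (47, 100.5), (55, 101.0), (118, 102.0)]
now_ts, now_px = ticks[-1]

# Target timestamp is 118 - 60 = 58, so the tick at ts=55 is used.
px0 = price_at_or_before(ticks, now_ts - 60)
ret_1m = now_px / px0 - 1.0
print(px0, round(ret_1m, 5))
```

If the buffer doesn't reach back far enough, the lookup returns None and the return is simply not reported for that window.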

Volatility is computed from short returns inside the last 15 minutes of prices. It’s not annualized. It’s just a live noise meter. Trend is a tiny slope on log prices over that same window, which gives a directional hint without doing anything heavy.
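As a sanity check on both formulas, here is a condensed sketch that applies them to a synthetic series growing by exactly 0.1% per tick. The helper name `vol_and_trend` is hypothetical. For such a series, the volatility should come out near zero and the slope should be close to log(1.001).

```python
import math

def vol_and_trend(prices):
    # Population standard deviation of tick-to-tick simple returns.
    rets = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
    m = sum(rets) / len(rets)
    vol = (sum((r - m) ** 2 for r in rets) / len(rets)) ** 0.5

    # Least-squares slope of log(price) against the tick index.
    lp = [math.log(p) for p in prices]
    n = len(lp)
    xbar = (n - 1) / 2
    ybar = sum(lp) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(lp))
    den = sum((i - xbar) ** 2 for i in range(n))
    return vol, num / den

steady = [100.0 * 1.001 ** i for i in range(10)]
vol, slope = vol_and_trend(steady)
print(vol, slope, math.log(1.001))
```

Note that the slope is measured per tick index, not per second, so it is best read as a direction and rough strength rather than a rate.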

The vol_hist deque is used to label regimes. We store a rolling history of recent volatility values per symbol (up to 200), then call the current state high_vol if it’s at or above the 80th percentile of that recent history. Until a symbol has 30 readings, the regime is reported as unknown. It’s intentionally simple, but it’s good enough to drive the correlation window logic later.
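The percentile rule is easy to check on a plain list. The sketch below mirrors the logic of regime() as a free function, with made-up volatility readings in arbitrary units.

```python
def regime(history, min_points=30):
    if len(history) < min_points:
        return "unknown"
    cur = history[-1]
    hs = sorted(history)
    # Nearest-rank style 80th percentile of the stored readings.
    p80 = hs[int(0.8 * (len(hs) - 1))]
    return "high_vol" if cur >= p80 else "normal"

calm = [1.0 + 0.01 * i for i in range(29)]

labels = [
    regime(calm),           # only 29 readings so far
    regime(calm + [5.0]),   # latest reading is a spike
    regime(calm + [1.0]),   # latest reading is at the bottom of the range
]
print(labels)
```

Because the current reading is part of the history it is ranked against, a sustained spike gradually raises its own threshold.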

The concurrency issue is the reason the lock exists. The background thread is writing to deques while Streamlit is reading them. If you iterate a deque while it’s being mutated, Python will throw an error. So every place where we iterate, we first take a snapshot copy of the deque under the lock and iterate that list instead. That keeps reads safe without making the writer slow.
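The copy-under-lock pattern can be exercised on its own. This is a minimal sketch with a module-level deque and lock standing in for the store; the names are illustrative only.

```python
import threading
from collections import deque

buf = deque()
lock = threading.Lock()

def writer(n):
    for i in range(n):
        with lock:
            buf.append((i, 100.0 + i))

def read_copy():
    # Hold the lock only long enough to copy, then iterate the copy freely.
    with lock:
        data = list(buf)
    return sum(px for _, px in data)

t = threading.Thread(target=writer, args=(50_000,))
t.start()
totals = [read_copy() for _ in range(200)]  # reads overlap with the writer
t.join()
print(len(buf), totals[-1] <= read_copy())
```

The writer is only ever blocked for the duration of a list copy, which is why reads stay safe without slowing ingestion noticeably.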

Turning Live Stats Into Events (Stress Feed)

Once you have live metrics, the next question is what you do with them. If you stream raw ticks into a UI, you’ll drown the user in noise. What we want instead is an event feed. Small cards that only show up when something crosses a threshold.

That’s what the stress feed does. It watches the snapshot coming out of PulseStore and emits one of three event types.

move_1m when the 1-minute move is large enough
move_5m when the 5-minute move is large enough
vol_spike when 15-minute volatility crosses a threshold

Two practical features make this usable in a real dashboard. First, cooldowns. If TSLA crosses the 1-minute threshold, we don’t want 50 duplicate events on every tick. Second, asset-aware thresholds. Crypto naturally moves more than equities, so if you use one global threshold, BTC will dominate your stress feed all day.

`events.py`

import time
from collections import deque

class EventStore:
    def __init__(self, max_events=25):
        self.max_events = max_events
        self.events = deque(maxlen=max_events)
        
    def add(self, e):
        self.events.appendleft(e)

    def latest(self):
        return list(self.events)


class StressDetector:
    def __init__(self, move_thr_1m=0.0015, move_thr_5m=0.004, vol_thr=0.00025):
        self.move_thr_1m = move_thr_1m
        self.move_thr_5m = move_thr_5m
        self.vol_thr = vol_thr
        self.cooldown_sec = 30
        self.last_emit = {}
        self.thr = {
            "stocks": {"move_1m": 0.0012, "move_5m": 0.0040, "vol": 0.00006},
            "crypto": {"move_1m": 0.0025, "move_5m": 0.0080, "vol": 0.00045},
            "forex":  {"move_1m": 0.0006, "move_5m": 0.0018, "vol": 0.00015},
        }

    def _can_emit(self, symbol, etype, now):
        k = (symbol, etype)
        prev = self.last_emit.get(k)
        if prev is None:
            self.last_emit[k] = now
            return True
        if now - prev >= self.cooldown_sec:
            self.last_emit[k] = now
            return True
        return False

    def check(self, snap):
        if not snap:
            return None

        sym = snap.get("symbol")
        asset = snap.get("asset", None)
        thr = self.thr.get(asset, {"move_1m": self.move_thr_1m, "move_5m": self.move_thr_5m, "vol": self.vol_thr})
        move_thr_1m = thr["move_1m"]
        move_thr_5m = thr["move_5m"]
        vol_thr = thr["vol"]
        now = time.time()

        r5 = snap.get("ret_5m")
        r1 = snap.get("ret_1m")
        v15 = snap.get("vol_15m")

        if r5 is not None and abs(r5) >= move_thr_5m:
            if self._can_emit(sym, "move_5m", now):
                return {"ts": now, "type": "move_5m", "symbol": sym, "asset": asset, "value": float(r5)}
            return None

        if r1 is not None and abs(r1) >= move_thr_1m:
            if self._can_emit(sym, "move_1m", now):
                return {"ts": now, "type": "move_1m", "symbol": sym, "asset": asset, "value": float(r1)}
            return None

        if v15 is not None and v15 >= vol_thr:
            if self._can_emit(sym, "vol_spike", now):
                return {"ts": now, "type": "vol_spike", "symbol": sym, "asset": asset, "value": float(v15)}
            return None

        return None

EventStore is just a rolling feed. It keeps the last N events so Streamlit can render them as a table.

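StressDetector.check() is the filter. It looks at the latest snapshot and decides whether it’s worth creating an event. It returns at most one event per call and checks in a fixed order: move_5m first, then move_1m, then vol_spike. The cooldown logic is what stops spam. Once a symbol emits a move_1m event, it won’t emit another move_1m for 30 seconds. Note that a threshold that is crossed but still in cooldown ends the check, so lower-priority events for that symbol are skipped on that tick.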
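To make the cooldown behavior concrete, here is a stripped-down sketch of the gate with timestamps passed in by hand. The `Cooldown` class is a hypothetical extraction of the `_can_emit` logic, not a separate part of the app.

```python
class Cooldown:
    def __init__(self, cooldown_sec=30):
        self.cooldown_sec = cooldown_sec
        self.last_emit = {}

    def can_emit(self, symbol, etype, now):
        # One timer per (symbol, event type) pair.
        key = (symbol, etype)
        prev = self.last_emit.get(key)
        if prev is None or now - prev >= self.cooldown_sec:
            self.last_emit[key] = now
            return True
        return False

gate = Cooldown(30)
results = [
    gate.can_emit("TSLA", "move_1m", 0),    # first crossing
    gate.can_emit("TSLA", "move_1m", 10),   # inside the cooldown
    gate.can_emit("TSLA", "move_5m", 10),   # different event type, own timer
    gate.can_emit("TSLA", "move_1m", 30),   # cooldown elapsed
]
print(results)
```

Keying the timer on both symbol and event type means a 5-minute event never silences a 1-minute event's timer, and vice versa.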

The thresholds are intentionally different per asset class. Crypto needs wider bands for both moves and volatility. Otherwise, even a quiet BTC session will look like constant stress relative to equities. This one change makes the feed feel balanced and product-like.
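A quick way to see the effect is to run the same move through each asset class. This sketch reuses the 1-minute thresholds from the listing above; the "futures" key is a made-up asset class used only to show the fallback to the default.

```python
THR_1M = {"stocks": 0.0012, "crypto": 0.0025, "forex": 0.0006}
DEFAULT_1M = 0.0015  # used when the asset class has no entry

def crosses_1m(asset, ret_1m):
    return abs(ret_1m) >= THR_1M.get(asset, DEFAULT_1M)

move = 0.002  # a 0.2% move in one minute
flags = {a: crosses_1m(a, move) for a in ("stocks", "crypto", "forex", "futures")}
print(flags)
```

The same 0.2% move counts as stress for a stock or a currency pair but not for crypto, which is the balance the per-asset table is meant to provide.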

Regime Tagging (Small but Important)

Regime is just a lightweight context label. We keep a short history of vol_15m per symbol and classify the current state as high_vol if it’s above the recent 80th percentile, otherwise normal. This gives us a stable switch we can use later. Most importantly, we use it to change the correlation lookback window depending on conditions.

Add this to `pulse_store.py`

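You already have PulseStore in pulse_store.py. If you copied the full listing above, these methods and the snapshot() block shown further down are already in place; they’re repeated here so you can see exactly what regime tagging adds. Otherwise, insert the following methods inside the PulseStore class, right after vol_15m() and trend_15m(). Placement isn’t critical; it just keeps the file readable.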

    def _vh(self, symbol):
        if symbol not in self.vol_hist:
            self.vol_hist[symbol] = deque(maxlen=200)
        return self.vol_hist[symbol]

    def update_vol_history(self, symbol):
        v = self.vol_15m(symbol)
        if v is None:
            return None
        self._vh(symbol).append(v)
        return v

    def regime(self, symbol):
        h = self.vol_hist.get(symbol)
        if not h or len(h) < 30:
            return "unknown"

        cur = h[-1]
        hs = sorted(h)
        p80 = hs[int(0.8 * (len(hs) - 1))]

        if cur >= p80:
            return "high_vol"
        return "normal"

Attach regime inside `snapshot()` in `pulse_store.py`

In the same file, inside snapshot(self, symbol), add this block near the end of the function, right before return out:

    v = self.update_vol_history(symbol)
    if v is not None:
        out["regime"] = self.regime(symbol)

That’s it for regime tagging.

Why this matters later:

Once snapshot() includes regime, the rest of the app can use it without recomputing anything. In the next section, the correlation card reads store.regime(base_symbol) and uses that to decide whether it should look back 60 minutes (normal) or just 15 minutes (high volatility). This is what stops correlation from feeling stale during spikes and overly jumpy during calm periods.

Correlation Card (Stocks Only, Regime-aware Window)

Correlation sounds simple until you try to do it live. In real-time feeds, different symbols tick at different moments. If you just correlate raw tick-to-tick returns, you’re basically correlating noise and timing gaps.

So we do two things to make it usable.

First, we align prices by time. We bucket ticks into fixed time bins (like 10s, 20s, 30s) and treat the last price inside each bin as the price for that bin. That gives every symbol a comparable timeline.

Second, we make the correlation window regime-aware. If the base symbol is in high_vol, we compute correlation on a shorter recent slice so the card reacts faster. If the regime is normal, we use a longer lookback so it doesn’t flip wildly every refresh.

We also keep this card stocks-only in the app. Multi-asset correlation is doable, but alignment becomes much harder when tick frequency differs massively across assets. This article is about building something shippable. A stable stocks card beats a flaky multi-asset one.

`correlation.py`

import math

def _bucket(ts, bin_sec):
    return int(ts // bin_sec) * bin_sec

def build_price_table(store, symbols, window_sec=1800, bin_sec=10):
    table = {}
    now = None

    for s in symbols:
        b = store.buffers.get(s)
        if not b:
            continue
        if now is None:
            now = b[-1][0]
        else:
            now = max(now, b[-1][0])

    if now is None:
        return {}

    cutoff = now - window_sec

    for s in symbols:
        b = store.buffers.get(s)
        if not b:
            continue

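        # Copy under the lock so the writer thread can't mutate the deque mid-iteration.
        with store.lock:
            data = list(b)

        for ts, px in data: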
            if ts < cutoff:
                continue
            k = _bucket(ts, bin_sec)
            row = table.get(k)
            if row is None:
                row = {}
                table[k] = row
            row[s] = px

    return table

def to_return_matrix(price_table, symbols):
    buckets = sorted(price_table.keys())
    if len(buckets) < 3:
        return []

    last_prices = None
    rows = []

    for bt in buckets:
        rowp = price_table[bt]
        if any(s not in rowp for s in symbols):
            continue

        prices = [float(rowp[s]) for s in symbols]

        if last_prices is None:
            last_prices = prices
            continue

        rets = []
        ok = True
        for i in range(len(symbols)):
            p0 = last_prices[i]
            p1 = prices[i]
            if p0 <= 0 or p1 <= 0:
                ok = False
                break
            rets.append(p1 / p0 - 1.0)

        last_prices = prices
        if ok:
            rows.append(rets)

    return rows

def corr(a, b):
    n = len(a)
    if n < 5:
        return None
    am = sum(a) / n
    bm = sum(b) / n
    num = 0.0
    da = 0.0
    db = 0.0
    for i in range(n):
        x = a[i] - am
        y = b[i] - bm
        num += x * y
        da += x * x
        db += y * y
    if da == 0 or db == 0:
        return None
    return num / math.sqrt(da * db)

def corr_card(store, symbols, base_symbol, bin_sec=10):
    reg = store.regime(base_symbol)
    win = 900 if reg == "high_vol" else 3600

    pt = build_price_table(store, symbols, window_sec=win, bin_sec=bin_sec)
    mat = to_return_matrix(pt, symbols)
    if not mat:
        return {"base": base_symbol, "regime": reg, "window_sec": win, "top": []}

    cols = list(zip(*mat))
    if base_symbol not in symbols:
        return {"base": base_symbol, "regime": reg, "window_sec": win, "top": []}

    bi = symbols.index(base_symbol)
    base = list(cols[bi])

    scores = []
    for i, s in enumerate(symbols):
        if s == base_symbol:
            continue
        c = corr(base, list(cols[i]))
        if c is None:
            continue
        scores.append((s, c))

    scores.sort(key=lambda x: abs(x[1]), reverse=True)
    top = [{"symbol": s, "corr": float(v)} for s, v in scores[:3]]

    return {"base": base_symbol, "regime": reg, "window_sec": win, "top": top}

build_price_table() creates the aligned timeline. It scans each symbol’s rolling buffer, buckets timestamps into fixed bins, and stores the last price per bucket.
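The bucketing step on its own looks like this. It is a single-symbol sketch with invented ticks; `last_price_per_bucket` is a hypothetical helper, not a function from correlation.py.

```python
def bucket(ts, bin_sec):
    return int(ts // bin_sec) * bin_sec

def last_price_per_bucket(ticks, bin_sec=10):
    # ticks are (ts, price) in arrival order; later ticks overwrite earlier ones.
    out = {}
    for ts, px in ticks:
        out[bucket(ts, bin_sec)] = px
    return out

aapl = [(100.2, 190.00), (103.9, 190.05), (111.0, 190.10), (127.5, 190.20)]
binned = last_price_per_bucket(aapl)
print(binned)
```

The first two ticks share the 100-second bin, so only the later price survives. That is what "last price per bucket" means in practice.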

to_return_matrix() converts those bucketed prices into returns, but only when every symbol has a price in the same bucket. That’s the alignment step that keeps correlation meaningful.
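Here is a compact sketch of that alignment rule on a tiny hand-built table. The `aligned_returns` name is hypothetical and the positive-price check from the real function is left out for brevity.

```python
def aligned_returns(table, symbols):
    rows, last = [], None
    for k in sorted(table):
        row = table[k]
        if any(s not in row for s in symbols):
            continue  # skip buckets where a symbol did not tick
        prices = [row[s] for s in symbols]
        if last is not None:
            rows.append([p1 / p0 - 1.0 for p0, p1 in zip(last, prices)])
        last = prices
    return rows

table = {
    0:  {"AAPL": 100.0, "TSLA": 200.0},
    10: {"AAPL": 101.0},                  # TSLA missing, bucket skipped
    20: {"AAPL": 102.0, "TSLA": 198.0},
}
rows = aligned_returns(table, ["AAPL", "TSLA"])
print(rows)
```

One consequence worth knowing: when a bucket is skipped, the next return spans the gap (20 seconds here rather than 10), so sparse symbols produce fewer rows over longer intervals.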

corr_card() is the actual widget output. It checks the base symbol’s regime, chooses a lookback window (15m for high-vol, 60m for normal), then computes correlations against the base symbol and returns the top matches.

Next, we’ll wire all of this into Streamlit and render the dashboard. That’s where the build starts to feel like a real app.

Building the Streamlit App

At this point, we have all the moving parts. A streaming layer that produces ticks, a state engine that produces snapshots, a stress detector that emits events, and a correlation function that can generate a small card. Now we just need to wrap it in a Streamlit app without breaking everything.

The key trick is to start the real-time worker once and keep it running in the background. Streamlit reruns the script constantly, so the UI code should never reconnect to WebSockets or spin up new loops. It should only read shared state and render tables.

import asyncio
import threading
import time

import pandas as pd
import streamlit as st

from feeds import start_streams
from pulse_store import PulseStore
from events import StressDetector, EventStore
from correlation import corr_card

st.set_page_config(page_title="Market Pulse", layout="wide")

st.markdown("""

""", unsafe_allow_html=True)

def _runner(state):
    async def _main():
        q = asyncio.Queue()
        tasks = await start_streams(q)  # keep references so the stream tasks aren't garbage-collected

        store = PulseStore(window_sec=3600)
        detector = StressDetector()
        ev = EventStore(max_events=50)

        state["store"] = store
        state["events"] = ev
        state["detector"] = detector
        state["started_at"] = time.time()

        while True:
            t = await q.get()
            store.update(t)
            snap = store.snapshot(t["symbol"])
            e = detector.check(snap)
            if e:
                ev.add(e)

    asyncio.run(_main())

if "bg_started" not in st.session_state:
    st.session_state.bg_started = True
    st.session_state.state = {}
    th = threading.Thread(target=_runner, args=(st.session_state.state,), daemon=True)
    th.start()

state = st.session_state.state

st.title("Market Pulse")

col1, col2, col3 = st.columns([2, 2, 1])
with col1:
    st.caption("Real-time multi-asset pulse. Moves, stress events, and a simple correlation card.")
with col3:
    up = 0
    if "started_at" in state:
        up = int(time.time() - state["started_at"])
    st.metric("Uptime (s)", up)

if "store" not in state:
    st.info("Connecting to feeds and warming up buffers...")
    st.stop()

store = state["store"]
ev = state["events"]

c1, c2, c3 = st.columns(3)
with c1:
    top_k = st.slider("Top movers", 3, 10, 5)
with c2:
    base = st.selectbox("Correlation base (stocks)", ["TSLA", "AAPL"], index=0)
with c3:
    bin_sec = st.selectbox("Correlation bucket (sec)", [10, 20, 30], index=2)

snaps = store.snapshots()

def score(x):
    r1 = x.get("ret_1m")
    r5 = x.get("ret_5m")
    if r1 is not None:
        return abs(r1)
    if r5 is not None:
        return abs(r5)
    return 0.0

snaps.sort(key=score, reverse=True)
top = snaps[:top_k]

pulse_df = pd.DataFrame(top)
keep_cols = ["symbol", "asset", "last", "ret_1m", "ret_5m", "vol_15m", "regime"]
pulse_df = pulse_df[[c for c in keep_cols if c in pulse_df.columns]]

st.subheader("Pulse")
st.dataframe(pulse_df, use_container_width=True, height=260)

st.subheader("Stress feed")
events = ev.latest()[:15]
if events:
    ev_df = pd.DataFrame(events)
    ev_df["time"] = pd.to_datetime(ev_df["ts"], unit="s").dt.strftime("%H:%M:%S")
    ev_df = ev_df[["time", "type", "symbol", "asset", "value"]]
    st.dataframe(ev_df, use_container_width=True, height=260)
else:
    st.caption("No events yet.")

st.subheader("Correlation card (stocks)")
corr_symbols = ["AAPL", "TSLA"]
card = corr_card(store, corr_symbols, base_symbol=base, bin_sec=bin_sec)

st.write(card)

time.sleep(2.0)
st.rerun()

The background worker starts once per browser session, inside a daemon thread, because the guard lives in st.session_state. It owns the async WebSocket loop and keeps updating store and events in memory. Streamlit’s reruns never touch the sockets.
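The start-once guard can be tried outside Streamlit. In this sketch a plain dict stands in for st.session_state, and the loop simulates three script reruns; `ensure_worker` and `worker` are hypothetical names.

```python
import threading
import time

def ensure_worker(session, target):
    # `session` stands in for st.session_state; any dict-like object works here.
    if "bg_started" not in session:
        session["bg_started"] = True
        session["state"] = {}
        th = threading.Thread(target=target, args=(session["state"],), daemon=True)
        th.start()
        session["thread"] = th
    return session["state"]

def worker(state):
    state["started_at"] = time.time()
    state["runs"] = state.get("runs", 0) + 1

session = {}
for _ in range(3):          # three "reruns" of the script
    state = ensure_worker(session, worker)
session["thread"].join()
print(state["runs"])
```

Every rerun gets the same shared dict back, so the UI code only ever reads state that the worker thread writes.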

The Pulse table comes straight from store.snapshots(). We sort by absolute 1-minute return when available, and fall back to absolute 5-minute return when the 1-minute value is missing.

The stress feed is rendered as a simple table, but we convert the raw epoch timestamp into a readable time string so it looks like a real UI.

The correlation card is a small JSON-ish object. It includes the base symbol, current regime, the window used, and the top correlations.

Finally, the refresh loop is intentionally basic. Sleep for two seconds, rerun, render the latest state. The heavy work continues in the worker thread.

Final Output

The final app: https://gumlet.tv/watch/69b99df9554f0fb510c28ce6/

What I’d Improve Next

If you want to take this beyond a demo, I’d start with a few practical upgrades.

First, split the Pulse table by asset class. A single global ranking is fine, but crypto will often dominate simply because it trades all the time and moves more. Separate tables for stocks, forex, and crypto make the dashboard feel more balanced and closer to how a real product would present it.

Second, add light persistence. Even a tiny SQLite file or parquet dump every few minutes is enough to replay the last hour and debug issues without leaving the app running all day.

Third, route stress events somewhere useful. A webhook, a queue, or a small database table. Once events leave the UI and become part of a system, you can power alerts, newsletters, and internal monitoring.

Finally, if you want correlation to truly be multi-asset, you’ll need a stronger alignment approach. Bucketing works well for liquid equities, but for mixed tick rates you’ll want resampling logic, missing-data handling, and probably different bucket sizes per asset class.
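As one possible starting point for the persistence idea above, here is a minimal SQLite sketch using only the standard library. The table layout and the function names are assumptions made for illustration; the example uses an in-memory database, and a file path would be used in practice.

```python
import sqlite3

def open_store(path=":memory:"):
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS ticks (symbol TEXT, asset TEXT, ts REAL, price REAL)"
    )
    return con

def save_ticks(con, ticks):
    # Each tick is a dict shaped like the ones produced by the streaming layer.
    con.executemany("INSERT INTO ticks VALUES (:symbol, :asset, :ts, :price)", ticks)
    con.commit()

def replay(con, symbol, since_ts):
    cur = con.execute(
        "SELECT ts, price FROM ticks WHERE symbol = ? AND ts >= ? ORDER BY ts",
        (symbol, since_ts),
    )
    return cur.fetchall()

con = open_store()
save_ticks(con, [
    {"symbol": "AAPL", "asset": "stocks", "ts": 10.0, "price": 190.0},
    {"symbol": "AAPL", "asset": "stocks", "ts": 70.0, "price": 190.4},
    {"symbol": "BTC-USD", "asset": "crypto", "ts": 71.0, "price": 64000.0},
])
rows = replay(con, "AAPL", since_ts=60.0)
print(rows)
```

Batching writes every few minutes, rather than on every tick, would keep this off the hot path.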

Conclusion

That’s the full build: a live market pulse screen that streams multi-asset prices, maintains rolling state in memory, converts noisy ticks into usable signals, and surfaces everything through a simple Streamlit dashboard.

The main takeaway is the pattern. Keep streaming, state, and UI separated. Compute a small set of metrics that update smoothly. Then turn those metrics into event cards and widgets that a product team can actually use.

If you already use a multi-asset feed like EODHD for pricing and coverage, this kind of dashboard becomes a straightforward extension. Not a giant engineering project, just a clean way to ship real-time market context.

How to Use the Model Context Protocol to Build a Personal Financial Assistant

Nikhil Adithyan — Wed, 25 Mar 2026 16:41:36 +0000

LLMs are great at writing market commentary. The problem is they can sound confident even when they haven't looked at any data. That’s fine for casual chat, but it’s not fine if you’re building a feature for a product, an internal tool, or anything a user might rely on.

In this guide, we’ll build a small financial assistant that fetches real data by calling tools exposed via the MCP protocol (Model Context Protocol), then computes the numbers in Python. The LLM’s job is only to narrate the computed facts. It doesn't invent metrics, and it doesn't do the math.

By the end, you’ll have two outputs you can actually plug into a product flow: a single-ticker market brief, and a watchlist snapshot that compares multiple tickers on volatility and drawdown, with the tool calls traced so you can see exactly what data was used.

What is MCP, and How Does it Change the Integration Story?
Architecture: The “Narrator” Pattern
Step 1: MCP Client Wrapper (client.py)
Step 2: The Assistant Core (core.py)
Demo 1: Market Brief for One Ticker
Demo 2: Watchlist Snapshot
What Makes this Shippable, and What Can Be Improved?
Conclusion

Prerequisites

This is a code-first guide. I won’t explain every line of Python, so you should be comfortable reading pandas code, basic async/await patterns, and calling APIs from Python.

Before you start, you’ll need:

Python 3.10+
An EODHD API key (to access the EODHD MCP server)
An OpenAI API key (for the narration step)
The MCP Python client installed, plus the usual data stack: numpy and pandas
A local environment where you can run async Python code (Jupyter or a normal script both work)

If you’ve never worked with async code before, you can still follow along. Just treat the async functions as "network calls" and focus on how the data flows from tool calls, to deterministic metrics, to narration.

What is MCP, and How Does it Change the Integration Story?

MCP (Model Context Protocol) is a protocol for how an LLM application can discover and call external tools exposed by an MCP server. Instead of hardcoding a bunch of function schemas or building custom connectors per framework, you plug into an MCP server and the tools become “available” in a consistent format.

For product teams, this matters because it reduces integration churn. Tool discovery is predictable, you’re not rewriting wrappers every time your stack changes, and you get a clean separation between the model and the data layer.

In our case, that data layer is EOD Historical Data (EODHD), a market data provider. We’ll use EODHD’s MCP server, which exposes market data tools the assistant can call whenever it needs prices or fundamentals.

One important clarification for this tutorial: we’re using an MCP server purely as the data access layer. The model doesn’t decide which MCP tools to call or what parameters to pass. We'll do that deterministically in Python, then hand the model a facts object and let it write the narrative. This keeps the output grounded and makes the system much easier to trust and debug.

Architecture: The “Narrator” Pattern

Here’s the architecture we’re using in this guide:

The idea is simple: we'll separate “getting facts” from “writing words”. The model only does the second part.

First, the user asks a question like “Give me a 30-day brief for AAPL” or “Compare TSLA, NVDA, AMZN over the last 60 days”. That raw text goes into a tiny parser. The parser is intentionally boring. It only extracts what the system needs to operate: a list of tickers and a lookback window.
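A parser of this kind can be only a few lines. The following is a rough sketch, not the parser used later in this guide: it assumes tickers are written in uppercase and that the window is stated as a number followed by "day" or "days".

```python
import re

def parse_query(text, default_days=30):
    # Tickers: 2-5 consecutive uppercase letters, de-duplicated in order of appearance.
    tickers = list(dict.fromkeys(re.findall(r"\b[A-Z]{2,5}\b", text)))
    # Lookback: the first number that is followed by "day" or "days".
    m = re.search(r"(\d+)\s*-?\s*days?\b", text, flags=re.IGNORECASE)
    days = int(m.group(1)) if m else default_days
    return {"tickers": tickers, "lookback_days": days}

q1 = parse_query("Give me a 30-day brief for AAPL")
q2 = parse_query("Compare TSLA, NVDA, AMZN over the last 60 days")
q3 = parse_query("How is MSFT doing?")
print(q1, q2, q3, sep="\n")
```

A rule this simple will also pick up uppercase words that aren't tickers (for example "USD" or "CEO"), so a real version would validate candidates against a known symbol list.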

Once we have tickers and dates, our MCP client connects to the EODHD MCP server and fetches the data by calling its tools. So instead of the assistant guessing prices or fundamentals, it calls tools like “get historical prices” and “get fundamentals”. At this point we have raw data. Nothing has been computed yet, and the model has not written a single sentence.

Then Python takes over. This is where we compute everything deterministically: returns, volatility, max drawdown, trend slope, and a simple volatility regime label. For watchlists, we align returns and compute correlation. These numbers are the backbone of the output. If you rerun the same query with the same window, you should get the same metrics.
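To illustrate what "deterministic" means here, this is a minimal sketch of two of those metrics on a hand-written list of closes. It uses plain Python rather than pandas, and the function names are illustrative rather than taken from core.py.

```python
def total_return(closes):
    return closes[-1] / closes[0] - 1.0

def max_drawdown(closes):
    # Largest peak-to-trough decline, returned as a negative fraction.
    peak = closes[0]
    worst = 0.0
    for px in closes:
        peak = max(peak, px)
        worst = min(worst, px / peak - 1.0)
    return worst

closes = [100.0, 110.0, 99.0, 104.5, 121.0]
tr = total_return(closes)
mdd = max_drawdown(closes)
print(round(tr, 4), round(mdd, 4))
```

The series ends 21% higher, yet it fell about 10% from its 110 peak along the way. Both numbers come out the same on every run, which is the property the narration step depends on.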

Only after that do we involve the LLM. We pass it a compact facts object. It contains the metrics we computed, plus a few clean fundamentals fields. The prompt is strict. Use only these facts – no extra numbers and no guessing. The model’s job is to turn the facts into a clean note that feels like something a product would show.

Finally, the assistant returns a structured response object. Not just text. You get:

answer (the narrative)
metrics (the exact computed numbers)
data_used (tickers, date range, and which tools were called)
tool_trace_id (a trace id you can log, debug, or attach to monitoring)
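The four fields listed above might be assembled like this. It is only a sketch of the shape: the keys inside `data_used`, the sample metric values, and the tool label are placeholders, not the actual schema or EODHD tool names.

```python
import uuid

def build_response(answer, metrics, tickers, start, end, tools):
    return {
        "answer": answer,
        "metrics": metrics,
        # Placeholder layout for the audit trail of what was fetched.
        "data_used": {"tickers": tickers, "start": start, "end": end, "tools": tools},
        "tool_trace_id": uuid.uuid4().hex,
    }

resp = build_response(
    answer="AAPL rose over the window with a shallow drawdown.",
    metrics={"total_return": 0.042, "max_drawdown": -0.031},   # made-up numbers
    tickers=["AAPL"],
    start="2026-02-01",
    end="2026-03-01",
    tools=["historical_prices_tool"],                          # placeholder label
)
print(sorted(resp))
```

Keeping the narrative and the numbers in separate fields lets a caller render the text while still validating or logging the metrics independently.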

This pattern is B2B-friendly for a very practical reason. It reduces hallucinations because the model isn’t doing analysis. It makes numbers repeatable because Python computes them. And it’s easy to audit because you can always show what data was fetched, what window was used, and which tool calls happened.

Step 1: MCP Client Wrapper (`client.py`)

Before we touch any “assistant logic”, we need one thing: a tiny MCP client wrapper that opens MCP sessions to the EODHD MCP server and calls tools reliably. That’s it.

This file does three jobs:

opens a streamable HTTP MCP session
calls a tool with a timeout and a small retry loop
returns the tool output plus a small metadata object we can later attach to logs and traces

Here’s the complete client.py:

import time
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamable_http_client

class EODHDMCP:
    def __init__(self, apikey, base_url=None):
        self.apikey = apikey
        self.base_url = base_url or "https://mcp.eodhd.dev/mcp"
        self._tools = None

    def _url(self):
        return f"{self.base_url}?apikey={self.apikey}"

    def _open(self):
        return streamable_http_client(self._url())

    async def list_tools(self):
        if self._tools is not None:
            return self._tools

        async with self._open() as (read, write, _):
            async with ClientSession(read, write) as s:
                await s.initialize()
                resp = await s.list_tools()
                self._tools = [t.name for t in resp.tools]
                return self._tools

    async def call_tool(self, name, args, trace_id, timeout_s=25, retries=1):
        last = None

        for attempt in range(retries + 1):
            t0 = time.time()
            try:
                async with self._open() as (read, write, _):
                    async with ClientSession(read, write) as s:
                        await s.initialize()
                        out = await asyncio.wait_for(s.call_tool(name, args), timeout=timeout_s)
                        dt = time.time() - t0
                        meta = {"trace_id": trace_id, "tool": name, "args": args, "latency_s": round(dt, 3)}
                        return out, meta
            except Exception as e:
                last = e
                if attempt < retries:
                    await asyncio.sleep(0.25)

        raise last

How this works:

streamable_http_client(self._url()) opens an MCP session over streamable HTTP. The URL includes your API key as a query param, so the server can authenticate.
list_tools() is just a convenience. It asks the server which tools exist and caches the names in memory so you don’t fetch them repeatedly.
call_tool() is the workhorse. It opens a session, initializes it, calls a tool with call_tool(name, args), and wraps the result with a meta object.
That meta object is important later. It lets you trace which tool was called, with which params, how long it took, and which request it belonged to (trace_id).
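The timeout-and-retry shape of call_tool() can be tried without a server. In this sketch a hypothetical `flaky_tool` coroutine replaces the MCP call, and `call_with_retry` is an illustrative helper rather than part of client.py.

```python
import asyncio
import time

async def call_with_retry(make_call, trace_id, timeout_s=1.0, retries=1, backoff_s=0.01):
    last = None
    for attempt in range(retries + 1):
        t0 = time.time()
        try:
            # wait_for raises a timeout error if the call takes too long.
            out = await asyncio.wait_for(make_call(), timeout=timeout_s)
            meta = {"trace_id": trace_id, "attempt": attempt,
                    "latency_s": round(time.time() - t0, 3)}
            return out, meta
        except Exception as e:
            last = e
            if attempt < retries:
                await asyncio.sleep(backoff_s)
    raise last

calls = {"n": 0}

async def flaky_tool():
    # Hypothetical tool: fails on the first call, succeeds on the second.
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient failure")
    return {"ok": True}

out, meta = asyncio.run(call_with_retry(flaky_tool, trace_id="demo"))
print(out, meta["attempt"])
```

With retries=1 the call is attempted at most twice, and the last exception is re-raised if both attempts fail, so failures stay visible to the caller.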

Next, we’ll build the core runner in core.py. This is where we parse the user’s request, fetch prices and fundamentals via MCP, compute metrics in Python, and then hand the facts to the LLM for narration.

Step 2: The Assistant Core (`core.py`)

This is where the assistant actually becomes “real”. client.py was just a connector. Here we decide what data to fetch, how much to fetch, how to compute the numbers, and what we hand to the model for narration.

1. Budgets and Trace Logging

When you build anything that calls real tools, you want limits. Not because you don’t trust your code, but because without limits, one messy prompt can easily turn into an expensive, slow request.

In our case, we cap:

how far back we’ll fetch data (MAX_LOOKBACK_DAYS)
how many tool calls we allow per request (MAX_TOOL_CALLS)
how many tickers we’ll accept in one query (MAX_TICKERS)

And we log a few events so we can always debug what happened later.

Here’s the top part of core.py for that:

import json
import re
import time
import uuid
from datetime import date, timedelta
from openai import OpenAI
import numpy as np
import pandas as pd
import asyncio
from client import EODHDMCP

EODHD_API_KEY = "YOUR EODHD API KEY"
MCP_BASE_URL = "https://mcp.eodhd.dev/mcp"

MAX_LOOKBACK_DAYS = 365
MAX_TOOL_CALLS = 6
MAX_TICKERS = 5

mcp = EODHDMCP(EODHD_API_KEY, base_url=MCP_BASE_URL)
oa = OpenAI(api_key="OPENAI API KEY")
NARRATION_MODEL = "gpt-5.3-chat-latest"

def log_event(event, trace_id, **k):
    payload = {"event": event, "trace_id": trace_id, "ts": round(time.time(), 3)}
    payload.update(k)
    print(json.dumps(payload, default=str))

What’s going on here:

MAX_LOOKBACK_DAYS, MAX_TOOL_CALLS, MAX_TICKERS are basically your safety rails. We’ll enforce the lookback and ticker caps right after parsing the user query, and the tool-call cap every time a tool is called.
trace_id is a small id we generate per request. Every log line includes it, so when something breaks, you can reconstruct the exact flow for that request.
log_event() prints one JSON line. Nothing fancy – but it’s enough for debugging and it also looks very similar to how real systems emit traces.
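
Because every line is standalone JSON, reconstructing one request’s flow is just a filter on trace_id. Here’s a small sketch of that idea. It repeats log_event() so it runs on its own, and the trace ids and messages are made up for illustration.

```python
import io
import json
import time
from contextlib import redirect_stdout

def log_event(event, trace_id, **k):
    payload = {"event": event, "trace_id": trace_id, "ts": round(time.time(), 3)}
    payload.update(k)
    print(json.dumps(payload, default=str))

# Capture the printed lines so we can treat them like a log file.
buf = io.StringIO()
with redirect_stdout(buf):
    log_event("request_started", "abc123", text="brief for AAPL")
    log_event("request_started", "zzz999", text="brief for MSFT")
    log_event("request_finished", "abc123", tool_calls=2)

# Parse each line back into a dict and keep only one request's events.
lines = [json.loads(line) for line in buf.getvalue().splitlines()]
flow = [row["event"] for row in lines if row["trace_id"] == "abc123"]
print(flow)
```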

Note: Make sure to replace YOUR EODHD API KEY with your actual EODHD API key. If you don’t have one, you can obtain it by creating an EODHD developer account. The same goes for OPENAI API KEY, which needs your OpenAI API key.

2. Parsing the Request

This part is intentionally not “smart”. We’re not doing NLP. We’re not letting the model interpret the query. We just want to extract two things in a predictable way:

tickers
lookback window

That’s it.

The benefit of keeping it dumb is that the behavior is stable. If the query is messy, we still do something consistent, and the rest of the pipeline remains controllable.

Here are the two functions:

def parse_request(text):
    t = (text or "").upper()

    raw = re.findall(r"\b[A-Z]{1,5}\b", t)

    bad = {
        "I","A","AN","THE","AND","OR","TO","FOR","OF","IN","ON","BY","WITH","ME","WE","US",
        "GIVE","DAY","DAYS","BRIEF","COMPARE","RANK","OVER","LAST","TREND","VOL","VOLATILITY",
        "DRAWDOWN","FLAG","RISKS","RISK","PLUS","MAX","MIN","LOOKBACK"
    }

    tickers = []
    for x in raw:
        if x in bad:
            continue
        if len(x) < 2:
            continue
        if x not in tickers:
            tickers.append(x)

    days = 30

    if "LAST" in t:
        after = t.split("LAST", 1)[1]
        m = re.search(r"\d{1,4}", after)
        if m:
            days = int(m.group(0))
    
    return tickers, days

def enforce_budgets(tickers, lookback_days):
    if lookback_days < 1:
        lookback_days = 1
    if lookback_days > MAX_LOOKBACK_DAYS:
        lookback_days = MAX_LOOKBACK_DAYS

    tickers = tickers[:MAX_TICKERS]

    return tickers, lookback_days

How to read this:

re.findall(r"\b[A-Z]{1,5}\b", t) pulls out every word of one to five letters. Since the text is uppercased first, this matches ordinary words as well as tickers. That’s our crude “ticker candidate” list.
The bad set is just a blacklist of common words that show up in prompts but are obviously not tickers.
We keep unique tickers in order, because the first ticker becomes the “base” for correlation in the watchlist demo.
Lookback is simple: the default is 30 days. If the query contains “last …”, we grab the first number after “LAST”. That avoids regex edge cases with punctuation.

Then enforce_budgets() clamps everything so one request can’t ask for 500 tickers or a 10-year window.
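
To see what the heuristic does with real input, here’s a condensed copy of the same parsing logic run against three prompts. The second prompt is a made-up example that shows the limits of the approach: any short word missing from the blacklist is treated as a ticker.

```python
import re

# Same blacklist as parse_request() above.
STOP = {
    "I","A","AN","THE","AND","OR","TO","FOR","OF","IN","ON","BY","WITH","ME","WE","US",
    "GIVE","DAY","DAYS","BRIEF","COMPARE","RANK","OVER","LAST","TREND","VOL","VOLATILITY",
    "DRAWDOWN","FLAG","RISKS","RISK","PLUS","MAX","MIN","LOOKBACK",
}
MAX_LOOKBACK_DAYS = 365

def parse_request(text):
    t = (text or "").upper()
    tickers = []
    for tok in re.findall(r"\b[A-Z]{1,5}\b", t):
        if tok in STOP or len(tok) < 2 or tok in tickers:
            continue
        tickers.append(tok)
    days = 30  # default when the prompt has no "last N"
    if "LAST" in t:
        m = re.search(r"\d{1,4}", t.split("LAST", 1)[1])
        if m:
            days = int(m.group(0))
    return tickers, days

clean = parse_request("Compare TSLA, NVDA, AMZN over the last 60 days.")
noisy = parse_request("Show AAPL for the last 90 days")   # "SHOW" is not in the blacklist
huge = parse_request("AAPL over the last 5000 days")
clamped_days = min(max(huge[1], 1), MAX_LOOKBACK_DAYS)     # what enforce_budgets() does

print(clean)
print(noisy)
print(huge, "->", clamped_days)
```

The noisy case is the one to keep in mind: the parser stays predictable, but it relies on the blacklist covering the words your users actually type.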

Next, we’ll wire these parsed values into a request state and start making actual MCP calls for prices and fundamentals.

3. Tool Wrappers: Prices and Fundamentals

Now we’re at the point where the assistant actually touches data.

These two functions do the same job in different ways:

fetch_prices() calls the historical prices tool on the EODHD MCP server, then normalizes the output into a tiny DataFrame with just date and price.
fetch_fundamentals() calls the fundamentals tool on the EODHD MCP server.

We also keep a small state object per request. It tracks tool calls and keeps a trace of what was called. That’s how we later produce the data_used block in the final response.

Here’s the code:

def new_state():
    return {"tool_calls": 0, "tool_trace": [], "rows": {}}

def _bump(state, meta):
    state["tool_calls"] += 1
    state["tool_trace"].append(meta)
    if state["tool_calls"] > MAX_TOOL_CALLS:
        raise RuntimeError("tool call budget exceeded")

def _as_json_text(out):
    if isinstance(out, str):
        return out
    if hasattr(out, "content"):
        try:
            return out.content[0].text
        except Exception:
            pass
    return str(out)

async def fetch_prices(ticker, start_date, end_date, trace_id, state):
    args = {
        "ticker": ticker,
        "start_date": start_date,
        "end_date": end_date,
        "period": "d",
        "order": "a",
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_historical_stock_prices", args, trace_id)
    txt = _as_json_text(out)

    _bump(state, meta)

    data = json.loads(txt)
    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    df = pd.DataFrame(data)
    if df.empty:
        return df

    cols = [c for c in ["date", "adjusted_close", "close"] if c in df.columns]
    df = df[cols].copy()

    if "adjusted_close" in df.columns:
        # Prefer the adjusted close and drop the raw close so only one price column remains
        df = df.rename(columns={"adjusted_close": "price"}).drop(columns=["close"], errors="ignore")
    elif "close" in df.columns:
        df = df.rename(columns={"close": "price"})
    else:
        return pd.DataFrame()

    df["ticker"] = ticker

    state["rows"][f"{meta['tool']}:{ticker}"] = len(df)
    return df

async def fetch_fundamentals(ticker, trace_id, state):
    args = {
        "ticker": ticker,
        "include_financials": False,
        "fmt": "json",
    }

    out, meta = await mcp.call_tool("get_fundamentals_data", args, trace_id)
    txt = _as_json_text(out)

    _bump(state, meta)

    data = json.loads(txt)
    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])

    return data

What’s happening here:

_bump() is the budget guard. Every time we make a tool call, we increment the counter and store the tool metadata. If we cross the budget, we fail fast.
meta comes from client.py. It contains trace_id, tool, args, and latency_s. That’s enough to trace “what did we call and how long did it take”.
_as_json_text() is there because the tool results returned by the MCP server are not always plain strings. Sometimes the result is an object with a .content list. This helper just tries to extract the text cleanly.
In fetch_prices(), we intentionally keep only date and price. That’s not because OHLC is useless. It’s because this tutorial’s metrics only need adjusted closes. Fewer columns means simpler code, smaller payloads, and fewer chances to break.
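
Here’s a small offline sketch of that normalization step. The result object is faked with SimpleNamespace and the rows are invented values, so treat the payload shape as an assumption rather than the server’s documented format.

```python
import json
from types import SimpleNamespace

import pandas as pd

def as_json_text(out):
    # Same idea as _as_json_text(): accept a plain string or an object with .content
    if isinstance(out, str):
        return out
    if hasattr(out, "content"):
        try:
            return out.content[0].text
        except Exception:
            pass
    return str(out)

# Hypothetical tool result: the first content item carries the JSON text.
rows = [
    {"date": "2026-03-04", "open": 10.0, "close": 10.5, "adjusted_close": 10.4, "volume": 1000},
    {"date": "2026-03-05", "open": 10.5, "close": 10.8, "adjusted_close": 10.7, "volume": 1200},
]
fake_result = SimpleNamespace(content=[SimpleNamespace(text=json.dumps(rows))])

df = pd.DataFrame(json.loads(as_json_text(fake_result)))
df = df[["date", "adjusted_close"]].rename(columns={"adjusted_close": "price"})
df["ticker"] = "DEMO.US"
print(df)
```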

Next, we’ll compute the actual metrics. This is where the assistant stops being “an API caller” and starts producing something useful.

4. Deterministic Metrics

This is the most important design choice in the whole build. The model never computes numbers. Python does.

So for every ticker, we compute a small set of metrics that are easy to explain and are actually useful in a “market brief” style output:

total return over the window
realized volatility (daily and annualized)
max drawdown (worst peak-to-trough fall)
a simple trend slope (so we can say “mild uptrend” or “downtrend” without vibes)
a lightweight regime label (low, mid, high volatility)

Here’s the code:

def compute_metrics(prices_df):
    if prices_df is None or prices_df.empty:
        return {}

    df = prices_df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"]).sort_values("date")

    close = pd.to_numeric(df["price"], errors="coerce").dropna()
    if close.empty:
        return {}

    rets = close.pct_change().dropna()

    out = {}

    # realized vol (daily), annualize with sqrt(252)
    if not rets.empty:
        out["vol_daily"] = float(rets.std())
        out["vol_annualized"] = float(rets.std() * np.sqrt(252))
        out["ret_total"] = float((close.iloc[-1] / close.iloc[0]) - 1.0)

    # max drawdown
    peak = close.cummax()
    dd = (close / peak) - 1.0
    out["max_drawdown"] = float(dd.min())

    # simple trend score
    logp = np.log(close.values)
    x = np.arange(len(logp))
    if len(logp) >= 3:
        slope = np.polyfit(x, logp, 1)[0]
        out["trend_slope"] = float(slope)
    else:
        out["trend_slope"] = 0.0

    # basic helpers
    out["n_points"] = int(len(close))
    out["start_close"] = float(close.iloc[0])
    out["end_close"] = float(close.iloc[-1])

    return out

def compute_regime(prices_df, window=20):
    # cheap regime label, based on rolling vol percentile
    if prices_df is None or prices_df.empty:
        return {"regime": "unknown"}

    df = prices_df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"]).sort_values("date")

    close = pd.to_numeric(df["price"], errors="coerce").dropna()
    if close.empty:
        return {"regime": "unknown"}

    rets = close.pct_change()
    rv = rets.rolling(window).std()

    last = rv.dropna()
    if last.empty:
        return {"regime": "unknown"}

    cur = float(last.iloc[-1])
    p80 = float(last.quantile(0.8))
    p50 = float(last.quantile(0.5))

    if cur >= p80:
        reg = "high_vol"
    elif cur >= p50:
        reg = "mid_vol"
    else:
        reg = "low_vol"

    return {"regime": reg, "rolling_vol": cur, "window": int(window)}

How to think about these calculations:

Total return is just end / start - 1. It’s the simplest “did it go up or down” number.
Volatility here is realized volatility of daily returns. That’s just the standard deviation of daily % changes. We annualize it using sqrt(252) because markets have roughly 252 trading days.
Max drawdown tells you how bad the worst dip was during the window. It’s often more meaningful than return when you’re writing a quick risk note.
Trend slope is intentionally simple. We fit a straight line to log prices. If the slope is positive, it’s generally drifting up. If it’s negative, it’s drifting down.
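Regime label is not a fancy model. It just says “compared to its own recent rolling volatility, are we currently in a high, medium, or low vol phase”. The comparison needs enough history to mean anything: with the default 20-day window and a 30-calendar-day lookback (about 21 closes), only one rolling value exists, so the label always comes out as high_vol.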
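
The regime label depends heavily on how many rolling windows fit into the lookback. The sketch below re-implements the percentile logic on synthetic prices (a sine wave, purely for illustration) and returns the number of rolling values next to the label. regime_label() is a hypothetical condensed version of compute_regime().

```python
import numpy as np
import pandas as pd

def regime_label(close, window=20):
    # Rolling std of daily returns; the first full window needs `window` returns
    rv = close.pct_change().rolling(window).std().dropna()
    if rv.empty:
        return "unknown", 0
    cur = rv.iloc[-1]
    if cur >= rv.quantile(0.8):
        label = "high_vol"
    elif cur >= rv.quantile(0.5):
        label = "mid_vol"
    else:
        label = "low_vol"
    return label, len(rv)

short = pd.Series(100 + np.sin(np.arange(21)))     # 21 closes -> 20 returns -> 1 rolling value
longer = pd.Series(100 + np.sin(np.arange(120)))   # 120 closes -> 100 rolling values

print(regime_label(short))
print(regime_label(longer))
```

With a single rolling value, the current reading is its own 80th percentile, so the short series is always labeled high_vol. If you rely on the label, a longer lookback or a shorter window gives it something to compare against.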

The main point is this: these numbers are deterministic. If the assistant says “max drawdown was -13%”, you can trace it back to the exact adjusted close series that produced it.
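
A quick way to convince yourself of that traceability is to run the same formulas on a series small enough to check by hand. The four prices below are made up for the purpose.

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 110.0, 99.0, 105.0])

ret_total = close.iloc[-1] / close.iloc[0] - 1.0       # 105 / 100 - 1 = 0.05
rets = close.pct_change().dropna()
vol_daily = rets.std()                                 # sample std of daily returns
vol_annualized = vol_daily * np.sqrt(252)
max_drawdown = (close / close.cummax() - 1.0).min()    # worst point: 99 / 110 - 1 = -0.10
trend_slope = np.polyfit(np.arange(len(close)), np.log(close.values), 1)[0]

print(round(ret_total, 4), round(max_drawdown, 4))
print(round(vol_daily, 4), round(vol_annualized, 4), round(trend_slope, 5))
```

The running peak is 110 from the second day onward, so the drawdown is measured against 110, not against the starting price of 100.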

Next, we’ll handle the watchlist side. That means aligning returns across tickers, computing correlation, and generating a ranked snapshot.

5. Watchlist Utilities

Once you have more than one ticker, you want two extra things:

a quick ranking so you can say “this is the riskiest name in the basket”
a correlation snapshot so you can see what’s moving together

The only “gotcha” with correlation is dates. If TSLA has 41 price points and NVDA has 39 because of missing days, you can’t just correlate blindly. You need the returns lined up on the same dates first. That’s what align_returns() does.

Here’s the code:

def align_returns(price_frames):
    if not price_frames:
        return pd.DataFrame()

    parts = []
    for df in price_frames:
        if df is None or df.empty:
            continue
        x = df.copy()
        x["date"] = pd.to_datetime(x["date"], errors="coerce")
        x = x.dropna(subset=["date"])
        x["price"] = pd.to_numeric(x["price"], errors="coerce")
        x = x.dropna(subset=["price"])
        x = x.sort_values("date")
        x["ret"] = x["price"].pct_change()
        x = x.dropna(subset=["ret"])
        parts.append(x[["date", "ticker", "ret"]])

    if not parts:
        return pd.DataFrame()

    allr = pd.concat(parts, ignore_index=True)
    wide = allr.pivot(index="date", columns="ticker", values="ret").dropna(how="any")
    return wide


def corr_summary(ret_wide, base_ticker, top_n=3):
    if ret_wide is None or ret_wide.empty:
        return []

    if base_ticker not in ret_wide.columns:
        return []

    c = ret_wide.corr()[base_ticker].dropna()
    c = c.drop(labels=[base_ticker], errors="ignore")
    if c.empty:
        return []

    out = []
    for k, v in c.sort_values(ascending=False).head(top_n).items():
        out.append({"ticker": k, "corr": float(v)})

    return out


def rank_watchlist(metrics_by_ticker):
    rows = []
    for t, m in metrics_by_ticker.items():
        if not m:
            continue
        rows.append({
            "ticker": t,
            "vol_annualized": m.get("vol_annualized"),
            "max_drawdown": m.get("max_drawdown"),
            "ret_total": m.get("ret_total"),
            "trend_slope": m.get("trend_slope"),
        })

    if not rows:
        return pd.DataFrame()

    df = pd.DataFrame(rows)
    df = df.sort_values(["vol_annualized", "max_drawdown"], ascending=[False, True])
    return df.reset_index(drop=True)

What’s happening here:

align_returns() takes a list of price DataFrames, computes daily returns for each, then pivots them into a wide table like: date -> TSLA.US, NVDA.US, AMZN.US.
We drop rows where any ticker is missing, because correlation only makes sense when the returns are aligned on the same dates.
corr_summary() is a compact “who moves with whom” helper. We pick one base ticker, compute correlations against everything else, then grab the top few. For a watchlist widget, that’s usually enough.
rank_watchlist() is the ranking logic for the snapshot. We sort primarily by annualized volatility, and use drawdown as a secondary risk indicator. You could choose different ranking logic. The point is to keep it deterministic and explainable.
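
Here’s a small sketch of the alignment step on two invented tickers, one of which is missing a trading day. It follows the same order of operations as align_returns(): returns first, then pivot, then drop incomplete dates.

```python
import pandas as pd

a = pd.DataFrame({
    "date": pd.to_datetime(["2026-03-02", "2026-03-03", "2026-03-04", "2026-03-05", "2026-03-06"]),
    "price": [100.0, 101.0, 99.0, 102.0, 103.0],
    "ticker": "AAA.US",
})
b = pd.DataFrame({
    # 2026-03-04 is missing for this ticker
    "date": pd.to_datetime(["2026-03-02", "2026-03-03", "2026-03-05", "2026-03-06"]),
    "price": [50.0, 50.5, 51.5, 51.0],
    "ticker": "BBB.US",
})

parts = []
for df in (a, b):
    x = df.sort_values("date").copy()
    x["ret"] = x["price"].pct_change()
    parts.append(x.dropna(subset=["ret"])[["date", "ticker", "ret"]])

wide = pd.concat(parts, ignore_index=True).pivot(index="date", columns="ticker", values="ret")
aligned = wide.dropna(how="any")          # keep only dates where both tickers have a return
corr = aligned.corr().loc["AAA.US", "BBB.US"]

print(wide)
print(len(aligned), round(corr, 3))
```

One thing this makes visible: because returns are computed per ticker before alignment, BBB.US’s return on 2026-03-05 spans two trading days. For an occasional gap that’s usually tolerable, but it’s worth knowing when the data is patchy.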

Next, we’ll build the facts objects and narration layer. That’s where we enforce the “model is just a narrator” contract.

6. Facts Object and Narration

This is where the “narrator pattern” becomes real.

Up to this point, we’ve done everything with MCP and Python. We fetched prices and fundamentals from EODHD, we computed metrics, and we aligned returns. Now we need one clean object that represents “the truth” for this request.

That’s what the facts object is.

The rule is simple.

facts contains only things we actually fetched or computed.
The model never sees raw market data. It sees the cleaned facts.
The model is told to write using only those facts, and not to invent any numbers.

Here are the functions that build those facts objects for the two demos, plus the narration function.

def build_facts_single(ticker, lookback_days, metrics, regime, fundamentals):
    # keep this compact. LLM will narrate from this later
    out = {
        "type": "single_ticker_brief",
        "ticker": ticker,
        "lookback_days": int(lookback_days),
        "metrics": metrics,
        "regime": regime,
    }

    if isinstance(fundamentals, dict):
        gen = fundamentals.get("General", {}) or {}
        hi = fundamentals.get("Highlights", {}) or {}
        val = fundamentals.get("Valuation", {}) or {}
        tech = fundamentals.get("Technicals", {}) or {}

        base = {
            "name": gen.get("Name"),
            "exchange": gen.get("Exchange"),
            "sector": gen.get("Sector"),
            "industry": gen.get("Industry"),
        }

        # Named fund_metrics so it doesn't shadow the metrics argument above
        fund_metrics = {
            "market_cap": hi.get("MarketCapitalization"),
            "pe": hi.get("PERatio") or val.get("TrailingPE") or val.get("PERatio"),
            "beta": tech.get("Beta"),
            "div_yield": hi.get("DividendYield"),
        }

        out["fundamentals"] = {k: v for k, v in {**base, **fund_metrics}.items() if v is not None}

    return out


def build_facts_watchlist(tickers, lookback_days, rank_df, corr_bits, metrics_by_ticker):
    out = {
        "type": "watchlist_snapshot",
        "tickers": tickers,
        "lookback_days": int(lookback_days),
        "ranking": rank_df.to_dict(orient="records") if isinstance(rank_df, pd.DataFrame) else [],
        "correlation": corr_bits,
        "metrics_by_ticker": metrics_by_ticker,
    }
    return out


def narrate(facts):
    prompt = (
        "Write a short, product-ready market note using ONLY the facts below.\n"
        "No guessing. No extra numbers. If something is missing, say it's missing.\n"
        "Keep it tight and readable.\n\n"
        f"FACTS:\n{json.dumps(facts, indent=2, default=str)}"
    )

    r = oa.responses.create(
        model=NARRATION_MODEL,
        input=[{"role": "user", "content": prompt}],
    )

    try:
        return r.output_text
    except Exception:
        return str(r)

What’s happening here:

build_facts_single() takes the ticker, window, computed metrics, the vol regime label, and the fundamentals payload. But it doesn’t dump the entire fundamentals JSON. It picks a handful of fields from the General, Highlights, Valuation, and Technicals sections and only keeps what exists. That keeps the prompt tight and the output predictable.
build_facts_watchlist() is the same idea but for multiple tickers. It passes the ranking table, correlation notes, and per-ticker metrics.
narrate() is basically “convert this facts object into human-friendly text”. The prompt is strict on purpose. If the model can only see these facts, it has nothing else to draw numbers from, and any figure in the narrative can be checked against the facts object.

One small implementation detail: narrate() is a normal blocking function, while everything else is async. That’s why later, inside run_assistant(), we call it with await asyncio.to_thread(...) so it doesn’t block the async flow.
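
The effect of asyncio.to_thread() is easiest to see with a stand-in for the blocking call. In this sketch, slow_narrate() is a hypothetical placeholder that sleeps instead of calling an API, and heartbeat() is a second coroutine that can only make progress if the event loop stays free.

```python
import asyncio
import time

def slow_narrate(facts):
    # Stand-in for a blocking SDK call: sleeps instead of hitting an API.
    time.sleep(0.2)
    return f"note for {facts['ticker']}"

async def heartbeat(ticks):
    # Ticks three times; this only progresses while the event loop is not blocked.
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.perf_counter())

async def main():
    ticks = []
    note, _ = await asyncio.gather(
        asyncio.to_thread(slow_narrate, {"ticker": "DEMO.US"}),  # runs in a worker thread
        heartbeat(ticks),
    )
    return note, len(ticks)

note, n_ticks = asyncio.run(main())
print(note, n_ticks)
```

In a Jupyter notebook, where an event loop is already running, you’d use await main() instead of asyncio.run(main()).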

7. The Orchestration Function (`run_assistant()`)

This is the piece that ties everything together. It does four things in order:

create a trace id and log the request
parse tickers and lookback, then clamp them to budgets
fetch EODHD data via MCP and compute metrics in Python
call the model to narrate the facts, then return a structured response

Here’s the function:

def _dates_from_lookback(lookback_days):
    end = date.today()
    start = end - timedelta(days=int(lookback_days))
    return start.isoformat(), end.isoformat()

async def run_assistant(user_text, mode="auto"):
    trace_id = uuid.uuid4().hex[:10]
    log_event("request_started", trace_id, text=user_text, mode=mode)

    tickers, lookback = parse_request(user_text)
    tickers, lookback = enforce_budgets(tickers, lookback)

    if not tickers:
        return {
            "answer": "no tickers found in request",
            "metrics": {},
            "data_used": {},
            "tool_trace_id": trace_id,
        }

    log_event("parsed", trace_id, tickers=tickers, lookback_days=lookback)
    
    start_date, end_date = _dates_from_lookback(lookback)
    state = new_state()
        
    if mode == "auto":
        mode = "watchlist" if len(tickers) > 1 else "single"

    try:
        if mode == "single":
            t = tickers[0]
            t_full = t if "." in t else f"{t}.US"

            log_event("tool_phase", trace_id, mode="single", ticker=t_full, start_date=start_date, end_date=end_date)

            prices = await fetch_prices(t_full, start_date, end_date, trace_id, state)
            metrics = compute_metrics(prices)
            regime = compute_regime(prices)

            fundamentals = await fetch_fundamentals(t_full, trace_id, state)

            facts = build_facts_single(t_full, lookback, metrics, regime, fundamentals)
            answer = await asyncio.to_thread(narrate, facts)

            resp = {
                "answer": answer,
                "metrics": metrics,
                "data_used": {
                    "tickers": [t_full],
                    "date_range": [start_date, end_date],
                    "tools_called": [x.get("tool") for x in state["tool_trace"]],
                    "tool_calls": state["tool_calls"],
                },
                "tool_trace_id": trace_id,
            }

            log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
            return resp

        # watchlist
        full = [x if "." in x else f"{x}.US" for x in tickers]

        log_event("tool_phase", trace_id, mode="watchlist", tickers=full, start_date=start_date, end_date=end_date)

        frames = []
        metrics_by = {}

        for t in full:
            prices = await fetch_prices(t, start_date, end_date, trace_id, state)
            frames.append(prices)
            metrics_by[t] = compute_metrics(prices)

        ret_wide = align_returns(frames)

        base = full[0]
        corr_bits = []
        top = corr_summary(ret_wide, base, top_n=3)
        if top:
            corr_bits.append({"base": base, "top": top})

        rank_df = rank_watchlist(metrics_by)
        facts = build_facts_watchlist(full, lookback, rank_df, corr_bits, metrics_by)
        answer = await asyncio.to_thread(narrate, facts)

        resp = {
            "answer": answer,
            "metrics": {"by_ticker": metrics_by},
            "data_used": {
                "tickers": full,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

        log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
        return resp

    except Exception as e:
        detail = repr(e)
        if hasattr(e, "exceptions"):
            detail = detail + " | " + " ; ".join([repr(x) for x in e.exceptions])

        log_event("request_failed", trace_id, err=detail)
        
        return {
            "answer": f"failed: {e}",
            "metrics": {},
            "data_used": {
                "tickers": tickers,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }

This function is the glue. It creates a trace_id, logs the request, extracts tickers and a lookback window, then clamps both to your budgets so the assistant can’t over-fetch or spam tool calls.

After that, it turns the lookback into a start_date and end_date, initializes a fresh state, and picks a mode. In single mode, it fetches prices and fundamentals for one ticker via EODHD’s MCP tools, computes the metrics in Python, packs everything into a facts object, and asks the LLM to only narrate those facts. In watchlist mode it does the same across multiple tickers, then aligns returns so correlation is computed on matching dates, and builds a ranked snapshot.

The response is always structured the same way. You get the narrative answer, the raw computed metrics, a data_used block that shows tickers, date range, and tools called, plus a tool_trace_id so you can trace any output back to logs.

That structure is the difference between “a chat response” and “a shippable assistant output”. You can plug the same response into a UI card, a Slack alert, or a dashboard without changing anything.

Demo 1: Market Brief for One Ticker

Let’s start with the simplest flow. One ticker, one lookback window, and a market brief that looks like something you could show inside a product.

Prompt used:

“Give me a 30-day brief for AAPL. trend, volatility, max drawdown, plus 3 fundamental highlights.”

Code (Jupyter Notebook):

import asyncio
import json
from core import run_assistant

q1 = "Give me a 30-day brief for AAPL. trend, volatility, max drawdown, plus 3 fundamental highlights."

r1 = await run_assistant(q1, mode="single")
print(json.dumps(r1, indent=2, ensure_ascii=False))

Output:

{"event": "request_started", "trace_id": "2af550173f", "ts": 1772735388.777, "text": "Give me a 30-day brief for AAPL. trend, volatility, max drawdown, plus 3 fundamental highlights.", "mode": "single"}
{"event": "parsed", "trace_id": "2af550173f", "ts": 1772735388.778, "tickers": ["AAPL"], "lookback_days": 30}
{"event": "tool_phase", "trace_id": "2af550173f", "ts": 1772735388.778, "mode": "single", "ticker": "AAPL.US", "start_date": "2026-02-03", "end_date": "2026-03-05"}
{"event": "request_finished", "trace_id": "2af550173f", "ts": 1772735404.392, "tool_calls": 2}
{
  "answer": "Apple Inc (AAPL.US) | NASDAQ | Technology — Consumer Electronics\n
\nOver the past 30 days, Apple shares declined 2.58%, falling from 269.48 to 
262.52 across 21 trading observations. The trend slope over the period was 
negative (-0.00175), indicating a modest downward drift.\n\nRealized daily 
volatility was 1.93%, equivalent to about 30.65% annualized. The stock is currently 
classified in a high‑volatility regime based on a 20‑day rolling volatility measure.
\n\nMaximum drawdown during the period reached -8.03%.\n\nAdditional fundamentals 
or valuation metrics were not provided.",
  "metrics": {
    "vol_daily": 0.01930981768788001,
    "vol_annualized": 0.3065338527847606,
    "ret_total": -0.02582751966750796,
    "max_drawdown": -0.08032503955127279,
    "trend_slope": -0.0017498633497641184,
    "n_points": 21,
    "start_close": 269.48,
    "end_close": 262.52
  },
  "data_used": {
    "tickers": [
      "AAPL.US"
    ],
    "date_range": [
      "2026-02-03",
      "2026-03-05"
    ],
    "tools_called": [
      "get_historical_stock_prices",
      "get_fundamentals_data"
    ],
    "tool_calls": 2
  },
  "tool_trace_id": "2af550173f"
}

First, you’ll see the log events. They’re not part of the final response. They’re just the trace trail.

request_started shows the raw prompt and that we forced mode="single".
parsed confirms the parser extracted AAPL and a 30-day lookback. The 30 here is the parser’s default, since the prompt doesn’t contain the word “last”.
tool_phase shows what we actually fetched: AAPL.US from 2026-02-03 to 2026-03-05.
request_finished confirms we made exactly 2 tool calls.

Now the actual response JSON:

answer is the narrative. In this run it summarizes:

return of -2.58% (269.48 to 262.52)
21 price observations in that window
negative trend slope (-0.00175) meaning mild downward drift
daily vol 1.93% and annualized vol 30.65%
max drawdown -8.03%
and it labels the regime as high volatility using the rolling vol logic.

metrics is where those numbers come from. This is the deterministic part. ret_total, vol_daily, vol_annualized, max_drawdown, and trend_slope were computed directly from the fetched closes. start_close, end_close, and n_points explain the exact series used.

data_used is the audit block for this specific output. It shows:

ticker normalized to AAPL.US
the exact date range pulled
the exact tools called on the MCP server: get_historical_stock_prices and get_fundamentals_data
and again, tool_calls: 2 so you can quickly spot runaway calls.

tool_trace_id (2af550173f) is your handle for debugging. Every log line above carries the same id, so you can trace this brief back to the exact tool calls and parameters.

Demo 2: Watchlist Snapshot

Now let’s switch to the watchlist flow. Same assistant core. The only difference is we pass multiple tickers and a longer window, so the output becomes a comparative risk snapshot.

Prompt used:

“Compare TSLA, NVDA, AMZN over the last 60 days. rank by volatility and drawdown, and flag valuation risks.”

Code:

q2 = "Compare TSLA, NVDA, AMZN over the last 60 days. rank by volatility and drawdown, and flag valuation risks."

r2 = await run_assistant(q2, mode="watchlist")
print(json.dumps(r2, indent=2, ensure_ascii=False))

Output:

{"event": "request_started", "trace_id": "1b67bb47d6", "ts": 1772735404.394, "text": "Compare TSLA, NVDA, AMZN over the last 60 days. rank by volatility and drawdown, and flag valuation risks.", "mode": "watchlist"}
{"event": "parsed", "trace_id": "1b67bb47d6", "ts": 1772735404.394, "tickers": ["TSLA", "NVDA", "AMZN"], "lookback_days": 60}
{"event": "tool_phase", "trace_id": "1b67bb47d6", "ts": 1772735404.394, "mode": "watchlist", "tickers": ["TSLA.US", "NVDA.US", "AMZN.US"], "start_date": "2026-01-05", "end_date": "2026-03-06"}
{"event": "request_finished", "trace_id": "1b67bb47d6", "ts": 1772735423.004, "tool_calls": 3}
{
  "answer": "Market Watchlist Snapshot (last 60 days)\n\nAll three names show 
negative total returns and downward trend slopes over the period.\n\nNVDA.US 
ranks highest in the group despite a small decline. Total return is -0.027. 
Price moved from 188.12 to 183.04 across 41 observations. Annualized volatility is 
0.3808 and maximum drawdown is -0.107.\n\nTSLA.US shows the second‑highest volatility 
profile with annualized volatility of 0.3561. Total return is -0.101, with price 
falling from 451.67 to 405.94. Maximum drawdown reached -0.131. Trend slope is negative.
\n\nAMZN.US has the lowest volatility in the set (annualized 0.3196) but the deepest 
drawdown at -0.196. Total return is -0.0697, with price moving from 233.06 to 
216.82. Trend slope is also negative.\n\nCorrelation: TSLA shows a stronger 
relationship with NVDA (0.533) than with AMZN (0.177).\n\nMissing from the 
data: trading volume, catalysts, sector context, and forward-looking indicators.",
  "metrics": {
    "by_ticker": {
      "TSLA.US": {
        "vol_daily": 0.02243518393199404,
        "vol_annualized": 0.3561475038122908,
        "ret_total": -0.10124648526579139,
        "max_drawdown": -0.13115770363318358,
        "trend_slope": -0.0026452119688441023,
        "n_points": 41,
        "start_close": 451.67,
        "end_close": 405.94
      },
      "NVDA.US": {
        "vol_daily": 0.023987861378298222,
        "vol_annualized": 0.3807954941476091,
        "ret_total": -0.027004039974484417,
        "max_drawdown": -0.10716326424601319,
        "trend_slope": -4.3573704505466623e-05,
        "n_points": 41,
        "start_close": 188.12,
        "end_close": 183.04
      },
      "AMZN.US": {
        "vol_daily": 0.020129905817481322,
        "vol_annualized": 0.31955234824924766,
        "ret_total": -0.06968162704882863,
        "max_drawdown": -0.1964184655186353,
        "trend_slope": -0.00520436173926906,
        "n_points": 41,
        "start_close": 233.06,
        "end_close": 216.82
      }
    }
  },
  "data_used": {
    "tickers": [
      "TSLA.US",
      "NVDA.US",
      "AMZN.US"
    ],
    "date_range": [
      "2026-01-05",
      "2026-03-06"
    ],
    "tools_called": [
      "get_historical_stock_prices",
      "get_historical_stock_prices",
      "get_historical_stock_prices"
    ],
    "tool_calls": 3
  },
  "tool_trace_id": "1b67bb47d6"
}

The logs show the assistant correctly extracted TSLA, NVDA, AMZN and a 60-day lookback, then fetched TSLA.US, NVDA.US, and AMZN.US from 2026-01-05 to 2026-03-06. Since this is a watchlist request, it made exactly 3 tool calls: one get_historical_stock_prices call per ticker. Watchlist mode doesn’t fetch fundamentals, so the “flag valuation risks” part of the prompt has no data behind it.

Inside answer, the model is basically summarizing what Python computed. In this run, all three names had negative returns and negative trend slopes.

NVDA had the highest annualized volatility at 0.3808 with a relatively small decline of -2.7%.
TSLA was next in volatility (0.3561) with a larger decline (-10.1%) and drawdown of about -13.1%.
AMZN had the lowest volatility (0.3196) but the deepest drawdown at around -19.6%.
The answer also includes a correlation note derived from the aligned returns table: TSLA’s return series correlated more with NVDA (0.533) than with AMZN (0.177) in this window.

metrics.by_ticker is where the snapshot really lives. It contains the full computed metric set per ticker, including observation count (n_points=41) and the start and end closes used for the return calculation. data_used shows exactly what we fetched, including the tickers, the date range, and the three price tool calls. And tool_trace_id is the id that links this output back to the full trace logs.
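As a quick sanity check, the headline numbers can be recomputed by hand from the fields shown above. This is a minimal sketch that assumes ret_total is the simple return between start_close and end_close, and that vol_annualized is vol_daily scaled by the square root of 252 trading days. Both assumptions reproduce the NVDA.US values in the output, but the 252-day convention is inferred from the numbers rather than stated anywhere in the logs.

```python
import math

# NVDA.US values copied from metrics.by_ticker in the output above
nvda = {
    "vol_daily": 0.023987861378298222,
    "vol_annualized": 0.3807954941476091,
    "ret_total": -0.027004039974484417,
    "start_close": 188.12,
    "end_close": 183.04,
}

# Simple return over the window: end / start - 1
ret_total = nvda["end_close"] / nvda["start_close"] - 1

# Annualized volatility, assuming 252 trading days per year (an inferred convention)
vol_annualized = nvda["vol_daily"] * math.sqrt(252)

print(f"ret_total      = {ret_total:.6f}")
print(f"vol_annualized = {vol_annualized:.6f}")
```

Because the stored values and the recomputed ones agree, a UI can show either the raw closes or the derived metrics without the two drifting apart.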

So how would a product team use this? Well, this output is already shaped like a widget backend. You can render the ranking as a watchlist “risk card”, show the top volatility and drawdown names, and drop the narrative into a compact summary box. Since you also get deterministic metrics, you can build UI elements without parsing text, and still keep the narration as a layer on top.

What Makes this Shippable, and What Can Be Improved?

The core reason this works in a real product setting is that the numbers are deterministic. Prices and fundamentals come from EODHD via MCP, metrics are computed in Python, and the model only writes narrative from a facts object.

On top of that, every run is traceable. You get tool logs, data_used, and a tool_trace_id, plus hard limits on lookback, tickers, and tool calls so the system can’t spiral.
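To make the idea of hard limits concrete, here is a small sketch of how a request could be clamped before any tool call runs. The cap values, the apply_limits function, and the ToolBudget class are all hypothetical names chosen for illustration. The actual limits and their implementation in the assistant may differ.

```python
# Hypothetical caps; the real values used by the assistant are not shown here
MAX_LOOKBACK_DAYS = 365
MAX_TICKERS = 5
MAX_TOOL_CALLS = 10


def apply_limits(tickers, lookback_days):
    """Clamp a parsed request to the configured caps."""
    # Drop duplicates while preserving order, then truncate to the ticker cap
    unique = list(dict.fromkeys(tickers))[:MAX_TICKERS]
    lookback = max(1, min(int(lookback_days), MAX_LOOKBACK_DAYS))
    return unique, lookback


class ToolBudget:
    """Counts tool calls and refuses to go past the cap."""

    def __init__(self, max_calls=MAX_TOOL_CALLS):
        self.max_calls = max_calls
        self.calls = 0

    def spend(self, tool_name):
        if self.calls >= self.max_calls:
            raise RuntimeError(f"tool call budget exhausted before {tool_name}")
        self.calls += 1


tickers, lookback = apply_limits(["TSLA.US", "NVDA.US", "AMZN.US", "TSLA.US"], 9999)
budget = ToolBudget()
for ticker in tickers:
    budget.spend("get_historical_stock_prices")  # one price call per ticker

print(tickers, lookback, budget.calls)
```

Keeping the caps in plain Python rather than in the prompt means the model cannot talk its way past them.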

At the same time, this is still an MVP. The parsing is a simple heuristic, the metric set is intentionally small, and fundamentals are only lightly extracted.

If you want to take this further, the next upgrades are straightforward: you can add volume and a couple more data tools like earnings calendar and news, introduce caching for repeated requests, build a tiny evaluation harness with fixed prompts and expected outputs, then wrap run_assistant() behind a small API so it can power an actual UI or internal service.

Conclusion

The main takeaway is simple. If you want a financial assistant to be usable beyond casual chat, you need to separate facts from narrative. MCP gives you a clean way to connect to tool providers via an MCP server. Python gives you deterministic metrics, and the model becomes the last-mile layer that turns those facts into readable output.

This is still a small build, but it’s already shaped like something you can ship. The response format is structured, traceable, and easy to plug into a UI. If you extend it with a few more tools and add basic caching, it can quickly move from a Jupyter notebook demo to a real feature.

If you want to try the same approach with a full market data tool layer out of the box, EODHD’s MCP server is a solid starting point.

With that being said, you’ve reached the end of the article. Hope you learned something new and useful today. Thank you very much for your time.

A Comprehensive Guide to Financial Storytelling using Data Visualization

Nikhil Adithyan — Wed, 11 Mar 2026 20:21:36 +0000

In any analysis project, raw tables of numbers often don’t tell the full story. Visualisations simplify complexity by transforming data into shapes that our brains can quickly understand, emphasising trends, outliers, and regime shifts that might be overlooked in raw data.

This is especially vital in finance and trading, where clear visuals can uncover risks, opportunities, and patterns, directly affecting decisions on position sizing, timing, and confidence.

Today, we'll use FMP APIs to interpret earnings data: extracting announcements, surprises, and price reactions across almost 1,000 stocks to identify actionable patterns in post‑earnings movements.

Here’s exactly what we’ll build:

Sector heatmap: Maps strongest 3/10-day post-earnings reactions by sector/market-cap buckets.
EPS scatter: Tests if earnings beats drive returns (sector-colored, with regression).
Return violins: Shows 3-day post-earnings volatility/skew by sector and market-cap.
Mega-tech time series: Tracks AAPL/MSFT/NVDA post-earnings patterns over time.
Monthly seasonality: Reveals calendar edges in post-earnings returns/surprises.
Regime cross-section: Tests sector robustness across bull/bear/sideways markets.

What we'll cover:

Prerequisites
Data Extraction
Storytelling with Charts and Visuals
What Did We Get Out of All This Storyline?
Final Thoughts

Prerequisites

To follow along, you should be comfortable with Python and basic data manipulation in pandas.

This is a code-first guide. I’ll focus on the workflow and the story the charts reveal, and I won’t explain every line of Python. You should be comfortable reading pandas code, loops, and basic plotting logic so you can follow along without needing a step-by-step breakdown of each block.

You’ll need:

Python 3.10+
A Financial Modeling Prep (FMP) API key
pandas, numpy, matplotlib, seaborn, scipy installed
Enough local compute and patience to run API loops across a large stock universe

Data Extraction

In the first part of this article, we need to collect all the data required for our visualisation exercise. Using FMP’s Stock Screener API, we will retrieve NASDAQ stocks. The first API call will return 1,000 stocks.

import requests
import pandas as pd
import numpy as np
import json
from datetime import datetime, timedelta
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

token = 'YOUR FMP TOKEN'

url = 'https://financialmodelingprep.com/stable/company-screener'
querystring = {"apikey":token,"country":"US", "exchange": "NASDAQ", "isActiveTrading": True, "isEtf": False, "isFund": False}
resp = requests.get(url, params=querystring).json()

df_universe = pd.DataFrame(resp)
# .copy() so later column assignments modify this frame rather than a view of the original
df_universe = df_universe[df_universe['exchangeShortName'] == 'NASDAQ'].copy()
df_universe

This will give us 1,000 stocks! Next, we'll bin the market capitalisation to gain a better understanding of the results later on, and we will keep only four columns that are necessary: the symbol, name, market cap, and sector.

bins = [0,
        250_000_000,    # 250M
        2_000_000_000,  # 2B
        10_000_000_000, # 10B
        200_000_000_000,# 200B
        float("inf")]

labels = ["Micro", "Small", "Mid", "Large", "Mega"]

df_universe["marketCap"] = pd.cut(df_universe["marketCap"], bins=bins, labels=labels, right=False)
df_universe = df_universe[['symbol', 'companyName', 'marketCap', 'sector']]
df_universe
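To see how the bin edges behave, here is a short sketch that runs the same bins and labels over a few made-up market caps (the numbers are illustrative, not real companies). With right=False each bucket includes its lower edge and excludes its upper edge, so a cap of exactly 250M lands in "Small" rather than "Micro".

```python
import pandas as pd

bins = [0, 250_000_000, 2_000_000_000, 10_000_000_000, 200_000_000_000, float("inf")]
labels = ["Micro", "Small", "Mid", "Large", "Mega"]

# Made-up market caps chosen to sit on or near the bin edges
caps = pd.Series([100_000_000, 250_000_000, 1_999_999_999, 10_000_000_000, 3_000_000_000_000])

# right=False makes each interval closed on the left: [lower, upper)
buckets = pd.cut(caps, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"marketCap": caps, "bucket": buckets}))
```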

Now it is time to retrieve the earnings using FMP’s Earnings Report API. We'll loop through each symbol and collect all the earnings the endpoint provides to us.

symbols = df_universe['symbol'].to_list()

all_dfs = []

for symbol in symbols:
    url = f"https://financialmodelingprep.com/stable/earnings?symbol={symbol}"
    params = {"apikey": token}
    resp = requests.get(url, params=params)

    if resp.status_code != 200:
        print(f"Error for {symbol}: {resp.status_code} - {resp.text}")
        continue

    data = resp.json()
    if not data:
        print(f"No data for {symbol}")
        continue

    df_symbol = pd.DataFrame(data)
    df_symbol["symbol"] = symbol
    all_dfs.append(df_symbol)

# Single DataFrame with all earnings
df_earnings = pd.concat(all_dfs, ignore_index=True)
df_earnings = df_earnings.dropna(subset=['epsActual', 'epsEstimated', 'revenueActual','revenueEstimated'])
df_earnings

Now we'll calculate the surprise, both for earnings and revenue in percentage terms, so we can later compare apples with apples! We'll keep everything from 2010 onwards.

df_earnings["eps_surprise"] = ((df_earnings["epsActual"] - df_earnings["epsEstimated"]) /
                               abs(df_earnings["epsEstimated"]) * 100).round(2)

df_earnings["revenue_surprise"] = ((df_earnings["revenueActual"] - df_earnings["revenueEstimated"]) /
                                   abs(df_earnings["revenueEstimated"]) * 100).round(2)

df_earnings = df_earnings[['symbol', 'date', 'eps_surprise', 'revenue_surprise']].copy()

df_earnings["date"] = pd.to_datetime(df_earnings["date"])
df_earnings = df_earnings[df_earnings["date"] > "2009-12-31"]
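Here is a small worked example of the surprise formula on made-up numbers. It shows why the denominator uses the absolute value of the estimate: when the estimate is negative, a smaller loss still comes out as a positive surprise. The zero-estimate guard is an addition for this sketch and is not part of the code above, where a zero estimate would produce an infinite value.

```python
import numpy as np
import pandas as pd

# Illustrative numbers only, not real company data
sample = pd.DataFrame({
    "epsActual":    [1.10, -0.50, 0.05],
    "epsEstimated": [1.00, -1.00, 0.00],
})

# Treat a zero estimate as missing so the division cannot produce inf
denom = sample["epsEstimated"].abs().replace(0, np.nan)

sample["eps_surprise"] = (
    (sample["epsActual"] - sample["epsEstimated"]) / denom * 100
).round(2)

print(sample)
```

The first row is a 10% beat, the second is a 50% beat on a negative estimate, and the third is left as missing because no percentage surprise is defined against a zero estimate.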

Lastly, as a final step in gathering the data needed for visualization, we'll use FMP’s historical end-of-day price endpoint (historical-price-eod/full). We'll loop through the stocks in our dataframe, retrieve the historical daily prices, and calculate each stock's return over the 3 and 10 trading days before and after the earnings announcement.

unique_symbols = df_earnings["symbol"].unique()

price_results = []

print(f"Processing {len(unique_symbols)} symbols...")

for symbol in unique_symbols:
    # Fetch full historical prices
    url = f"https://financialmodelingprep.com/stable/historical-price-eod/full"
    params = {"apikey":token, "symbol":symbol, "from":'2009-10-01'}
    resp = requests.get(url, params=params)

    if resp.status_code != 200:
        print(f"Error for {symbol}: {resp.status_code}")
        continue

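    data = resp.json()
    if not data:
        # An empty response would otherwise fail on the "date" column below
        print(f"No price data for {symbol}")
        continue

    hist_df = pd.DataFrame(data)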
    hist_df["date"] = pd.to_datetime(hist_df["date"])
    hist_df = hist_df.sort_values("date").reset_index(drop=True)

    # Get matching earnings rows
    earnings_symbol = df_earnings[df_earnings["symbol"] == symbol].copy()

    for _, row in earnings_symbol.iterrows():
        earn_date = pd.to_datetime(row["date"]).date()

        # === 3-DAY WINDOWS ===
        pre3_mask = (hist_df["date"].dt.date < earn_date) & \
                    (hist_df["date"].dt.date >= earn_date - timedelta(days=10))
        pre3 = hist_df[pre3_mask].tail(3)

        post3_mask = (hist_df["date"].dt.date > earn_date) & \
                     (hist_df["date"].dt.date <= earn_date + timedelta(days=10))
        post3 = hist_df[post3_mask].head(3)

        pre3_start = pre3["close"].iloc[0] if len(pre3) >= 3 else None
        pre3_end = pre3["close"].iloc[-1] if len(pre3) >= 1 else None
        post3_end = post3["close"].iloc[-1] if len(post3) >= 3 else None

        pct_pre_3d = ((pre3_end - pre3_start) / pre3_start * 100) if pre3_start and pre3_end else None
        pct_post_3d = ((post3_end - pre3_end) / pre3_end * 100) if pre3_end and post3_end else None

        # === 10-DAY WINDOWS ===
        pre10_mask = (hist_df["date"].dt.date < earn_date) & \
                     (hist_df["date"].dt.date >= earn_date - timedelta(days=20))
        pre10 = hist_df[pre10_mask].tail(10)

        post10_mask = (hist_df["date"].dt.date > earn_date) & \
                      (hist_df["date"].dt.date <= earn_date + timedelta(days=20))
        post10 = hist_df[post10_mask].head(10)

        pre10_start = pre10["close"].iloc[0] if len(pre10) >= 10 else None
        pre10_end = pre10["close"].iloc[-1] if len(pre10) >= 1 else None
        post10_end = post10["close"].iloc[-1] if len(post10) >= 10 else None

        pct_pre_10d = ((pre10_end - pre10_start) / pre10_start * 100) if pre10_start and pre10_end else None
        pct_post_10d = ((post10_end - pre10_end) / pre10_end * 100) if pre10_end and post10_end else None

        # Compare against None explicitly so a return of exactly 0.0 is kept
        price_results.append({
            "symbol": symbol,
            "earn_date": earn_date,
            "month": earn_date.month,
            "pct_pre_3d": round(pct_pre_3d, 2) if pct_pre_3d is not None else None,
            "pct_post_3d": round(pct_post_3d, 2) if pct_post_3d is not None else None,
            "pct_pre_10d": round(pct_pre_10d, 2) if pct_pre_10d is not None else None,
            "pct_post_10d": round(pct_post_10d, 2) if pct_post_10d is not None else None,
            "eps_surprise": row["eps_surprise"],
            "revenue_surprise": row["revenue_surprise"]
        })



df_earnings = pd.DataFrame(price_results)
df_earnings.dropna(inplace=True)
df_earnings = df_universe.merge(df_earnings, on="symbol")
df_earnings

As you can see, at the end of the code, we have also merged the initial dataset, so all the information, such as name, marketCap, and sector, is now in a single dataset.
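The window logic is easier to follow on a tiny synthetic series. The sketch below is an alternative formulation, not the code used above: instead of calendar-day masks followed by head() and tail(), it finds the position of the earnings date in the price index and slices by trading-day position. The event_returns helper and the price data are made up for illustration.

```python
import pandas as pd

# Synthetic closes on consecutive business days (illustrative only)
dates = pd.bdate_range("2024-01-01", periods=12)
closes = pd.Series(
    [100, 101, 102, 103, 104, 105, 110, 111, 112, 113, 114, 115],
    index=dates, dtype=float,
)


def event_returns(closes, earn_date, n=3):
    """Return (pre, post) percentage returns over n trading days around earn_date."""
    earn_date = pd.Timestamp(earn_date)
    # Position of the first bar on or after the earnings date
    pos = closes.index.searchsorted(earn_date, side="left")
    has_bar = pos < len(closes) and closes.index[pos] == earn_date
    pre = closes.iloc[max(pos - n, 0):pos]        # last n bars strictly before the date
    post_start = pos + 1 if has_bar else pos      # skip the earnings-day bar itself
    post = closes.iloc[post_start:post_start + n]
    if len(pre) < n or len(post) < n:
        return None, None
    base = pre.iloc[-1]                           # last close before earnings
    pct_pre = (base - pre.iloc[0]) / pre.iloc[0] * 100
    pct_post = (post.iloc[-1] - base) / base * 100
    return round(pct_pre, 2), round(pct_post, 2)


pre3, post3 = event_returns(closes, "2024-01-08")
print(pre3, post3)
```

As in the loop above, the post-earnings return is measured from the last close before the announcement, so it includes the earnings-day move itself.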

Storytelling with Charts and Visuals

Sector Heatmap

First, we'll present the Sector Heatmap of average 3-day post-earnings returns segmented by sector and market-cap category. This basic visualisation highlights areas with the most significant reactions, enabling traders to swiftly identify high-alpha sectors and market caps for earnings strategies.

# Aggregate: average post-earnings returns and EPS surprise
agg = (
    df_earnings
    .dropna(subset=['pct_post_3d', 'pct_post_10d', 'eps_surprise', 'marketCap', 'sector'])
    .groupby(['sector', 'marketCap'])
    .agg(
        avg_post3d=('pct_post_3d', 'mean'),
        avg_post10d=('pct_post_10d', 'mean'),
        avg_eps_surprise=('eps_surprise', 'mean')
    )
    .reset_index()
)

# Heatmap: average 3-day post-earnings return
heatmap_3d = agg.pivot(index='sector', columns='marketCap', values='avg_post3d')

plt.figure(figsize=(12, 8))
sns.heatmap(
    heatmap_3d,
    annot=True,
    fmt='.2f',
    cmap='RdYlGn',
    center=0,
    linewidths=0.5,
    linecolor='grey'
)
plt.title('Average 3-Day Post-Earnings Return by Sector and Market-Cap Bucket')
plt.xlabel('Market-cap bucket')
plt.ylabel('Sector')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Consumer Cyclical and Materials are performing really well, with small and mid caps seeing positive reactions over 1.1%. Real Estate is also doing great, jumping up to +4.0% in mid caps. Energy and Financials are holding steady, staying close to zero. Technology, on the other hand, is showing more muted gains, under 1.1%, indicating there might be limited immediate upside from the big tech earnings.

Building on the 3‑day heatmap, we'll now look at the Sector Heatmap for average 10‑day post‑earnings returns by sector and market‑cap category. This extends the timeframe to capture momentum persistence, revealing which sectors maintain or reverse short‑term reactions.

# Heatmap: average 10-day post-earnings return
heatmap_10d = agg.pivot(index='sector', columns='marketCap', values='avg_post10d')

plt.figure(figsize=(12, 8))
sns.heatmap(
    heatmap_10d,
    annot=True,
    fmt='.2f',
    cmap='RdYlGn',
    center=0,
    linewidths=0.5,
    linecolor='grey'
)
plt.title('Average 10-Day Post-Earnings Return by Sector and Market-Cap Bucket')
plt.xlabel('Market-cap bucket')
plt.ylabel('Sector')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Consumer Cyclical stands out with peaks at 3.2% (mega caps), and Industrials and Health Care show consistent gains in mid and large caps around 1.1%. Real Estate has eased after its 3-day surge. Technology has seen a small boost in mega caps (+1.8%) but remains less active overall compared to cyclicals.

Mega‑Cap Tech Time Series

Extending the heatmaps, we’ll now look at a Mega-Cap Tech time series. It tracks 10-day post-earnings returns over time for AAPL, MSFT, NVDA, and a few other mega-cap tech names.

A bubble chart works well here because it encodes more than one thing at once. The x-axis is the earnings date, the y-axis is the 10-day post-earnings return, the bubble size scales with the absolute EPS surprise magnitude, and the color shows whether the surprise was a beat or a miss. This makes it easy to spot outlier quarters and see whether big surprises consistently lead to bigger post-earnings moves.

# Define mega-cap tech tickers (top ones from data: AAPL, MSFT, NVDA, AMZN, GOOG/GOOGL, META)
tech_tickers = ['AAPL', 'MSFT', 'NVDA', 'AMZN', 'GOOG', 'GOOGL', 'META']

# Filter data for mega-cap tech
df_tech = (
    df_earnings[df_earnings['symbol'].isin(tech_tickers)]
    .dropna(subset=['earn_date', 'pct_post_10d', 'eps_surprise'])
    .sort_values('earn_date')
    .assign(
        earn_date=lambda x: pd.to_datetime(x['earn_date'])
    )
)

# Create time-series plot: pct_post_10d vs earn_date, sized/color by eps_surprise
plt.figure(figsize=(14, 8))

# Scatter plot
scatter = plt.scatter(
    df_tech['earn_date'],
    df_tech['pct_post_10d'],
    s=np.abs(df_tech['eps_surprise']) * 50 + 20,  # Size by abs(eps_surprise)
    c=df_tech['eps_surprise'],
    cmap='RdYlBu_r',
    alpha=0.7,
    edgecolors='black',
    linewidth=0.5
)

plt.colorbar(scatter, label='EPS Surprise (%)')
plt.xlabel('Earnings Date')
plt.ylabel('10-Day Post-Earnings Return (%)')
plt.title('Mega-Cap Tech: 10-Day Post-Earnings Returns vs Time\n(Point size/color by EPS Surprise)')
plt.grid(True, alpha=0.3)

# Add trend line (fit against days since the first event so the slope reads as % per day, not per nanosecond)
x_days = (df_tech['earn_date'] - df_tech['earn_date'].min()).dt.days
z = np.polyfit(x_days, df_tech['pct_post_10d'], 1)
p = np.poly1d(z)
plt.plot(df_tech['earn_date'], p(x_days), "r--", alpha=0.8, linewidth=2, label=f'Trend: {z[0]:.4f}% per day')

plt.legend()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

That large red bubble around 2018 is almost certainly AAPL’s report for the December 2018 quarter (announced in January 2019, which makes it Apple’s fiscal Q1 2019, since its fiscal year ends in September). It stands out because:

Large size = massive EPS surprise magnitude (Apple cut guidance dramatically, ~10% miss)
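Red colour = negative surprise (note that the RdYlBu_r colormap used in the code maps red to the highest values, so switch to RdYlBu if you want red to mark misses)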
Low Y position = poor 10‑day return (~-10% range visible)

This was Apple’s infamous “iPhone demand warning” that triggered the January 2019 market panic. It's a good example of how one outlier event can drag the whole trend line downward in a visualisation.
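The effect of a single outlier on a fitted trend line is easy to demonstrate. The sketch below uses synthetic quarterly returns with no real trend, fits a straight line, then replaces one late quarter with a large negative value and fits again. The numbers are invented, and the outlier is placed near the end of the sample to make the tilt obvious.

```python
import numpy as np

# Synthetic 10-day returns (%) for 12 quarters with no real trend
x = np.arange(12, dtype=float)
y = np.array([2.0, 1.5, 2.5, 2.0, 1.8, 2.2, 2.1, 1.9, 2.0, 2.3, 1.7, 2.0])

slope_clean, _ = np.polyfit(x, y, 1)

# Inject one large negative quarter near the end of the sample
y_outlier = y.copy()
y_outlier[10] = -10.0
slope_outlier, _ = np.polyfit(x, y_outlier, 1)

print(f"slope without outlier: {slope_clean:+.3f} per quarter")
print(f"slope with outlier:    {slope_outlier:+.3f} per quarter")
```

One point is enough to turn a flat line into a visibly falling one, which is why it's worth checking a trend line against the individual bubbles before reading much into it.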

EPS Surprise Scatter Plot

After identifying major tech trends, let's now look at the EPS Surprise Scatter plots. This plot checks a simple hypothesis. Do earnings beats lead to positive returns, and do misses lead to negative returns? We plot EPS surprise on the x-axis and post-earnings returns on the y-axis, then add a regression line to show the average relationship.

# Prepare data: drop NaNs and convert earn_date if needed (not used here)
df_plot = (
    df_earnings
    .dropna(subset=['eps_surprise', 'pct_post_3d', 'pct_post_10d', 'sector'])
    .copy()
)

# 1. Scatter: EPS Surprise vs 3-Day Post-Return, colored by sector
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.scatterplot(
    data=df_plot,
    x='eps_surprise',
    y='pct_post_3d',
    hue='sector',
    alpha=0.6,
    s=40
)

# Regression line (overall)
slope, intercept, r_value, p_value, std_err = stats.linregress(df_plot['eps_surprise'], df_plot['pct_post_3d'])
line = slope * df_plot['eps_surprise'] + intercept
plt.plot(df_plot['eps_surprise'], line, 'red', linestyle='--', linewidth=2,
         label=f'y = {slope:.3f}x + {intercept:.2f}\nR²={r_value**2:.3f}')
plt.xlabel('EPS Surprise (%)')
plt.ylabel('3-Day Post-Earnings Return (%)')
plt.title('EPS Surprise vs 3-Day Post-Return by Sector')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)

# 2. Scatter: EPS Surprise vs 10-Day Post-Return, colored by sector
plt.subplot(1, 2, 2)
sns.scatterplot(
    data=df_plot,
    x='eps_surprise',
    y='pct_post_10d',
    hue='sector',
    alpha=0.6,
    s=40
)

# Regression line (overall)
slope10, intercept10, r_value10, p_value10, std_err10 = stats.linregress(df_plot['eps_surprise'], df_plot['pct_post_10d'])
line10 = slope10 * df_plot['eps_surprise'] + intercept10
plt.plot(df_plot['eps_surprise'], line10, 'red', linestyle='--', linewidth=2,
         label=f'y = {slope10:.3f}x + {intercept10:.2f}\nR²={r_value10**2:.3f}')
plt.xlabel('EPS Surprise (%)')
plt.ylabel('10-Day Post-Earnings Return (%)')
plt.title('EPS Surprise vs 10-Day Post-Return by Sector')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Optional: Summary table of correlations by sector
corr_3d = df_plot.groupby('sector')[['eps_surprise', 'pct_post_3d']].corr().unstack().xs('pct_post_3d', level=1, axis=1)['eps_surprise']
corr_10d = df_plot.groupby('sector')[['eps_surprise', 'pct_post_10d']].corr().unstack().xs('pct_post_10d', level=1, axis=1)['eps_surprise']

corr_df = pd.DataFrame({
    'Corr_EPS_3Day': corr_3d.round(3),
    'Corr_EPS_10Day': corr_10d.round(3)
}).sort_values('Corr_EPS_10Day', ascending=False)
print(corr_df)

The red dashed trend line illustrates the typical relationship: for every 1 percentage point of EPS beat, stocks tend to gain about 0.05–0.1% over 3 to 10 days. The gentle slope suggests that while surprises can give a little boost, they don’t guarantee large moves.
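To get a feel for what a slope in that range means in practice, here is a sketch on synthetic data. It assumes a true slope of 0.08 buried in much larger noise, fits a line, and converts the fitted slope into the average extra return implied by a 10-point beat. All the numbers are simulated, so only the reading of the slope carries over, not the values themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a weak true slope of 0.08 buried in much larger noise
eps_surprise = rng.normal(loc=5.0, scale=20.0, size=2000)
post_return = 0.08 * eps_surprise + rng.normal(loc=0.0, scale=6.0, size=2000)

slope, intercept = np.polyfit(eps_surprise, post_return, 1)
r_squared = np.corrcoef(eps_surprise, post_return)[0, 1] ** 2

# What the fitted line implies for a 10-point beat versus no surprise
implied_move = slope * 10

print(f"slope = {slope:.3f}, R^2 = {r_squared:.3f}")
print(f"a 10-point beat implies about {implied_move:.2f}% extra return on average")
```

A low R² alongside a positive slope is the same picture as the scatter plot: the average relationship exists, but any single earnings event can land far from the line.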

You’ll notice that Consumer Cyclical dots mainly cluster in the upper right (beats leading to gains), and Real Estate shows a steeper increase. The wide spread around the line indicates that other factors often influence stock movements beyond surprises.

Return Distribution Violins

Heatmaps show averages, but averages can hide risk. Violin plots show the full distribution of returns, including how wide the outcomes are and whether the tails are heavy. Here we plot 3-day post-earnings return distributions by sector and by market-cap bucket.
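Here is a quick numeric illustration of that point, using two synthetic groups of returns built to share the same average but very different spreads. The describe helper is a made-up name for this sketch. It reports the percentiles a violin plot makes visible and a mean does not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic groups of 3-day returns (%) built to share the same average
calm = rng.normal(loc=0.5, scale=2.0, size=5000)
wild = rng.normal(loc=0.5, scale=8.0, size=5000)


def describe(returns):
    """Summarize the parts of a distribution that a mean alone leaves out."""
    q05, q50, q95 = np.percentile(returns, [5, 50, 95])
    return {
        "mean": returns.mean(),
        "std": returns.std(ddof=1),
        "p05": q05,
        "median": q50,
        "p95": q95,
    }


for name, group in [("calm", calm), ("wild", wild)]:
    summary = describe(group)
    print(name, {k: round(float(v), 2) for k, v in summary.items()})
```

In a heatmap of averages these two groups would get nearly the same colour, even though the second one carries a much worse 5th-percentile outcome.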

# Prepare data
df_plot = (
    df_earnings
    .dropna(subset=['pct_post_3d', 'sector', 'marketCap'])
    .copy()
)

# 1. Violin plot: 3-day post-returns by sector
plt.figure(figsize=(15, 6))

plt.subplot(1, 2, 1)
sns.violinplot(
    data=df_plot,
    x='sector',
    y='pct_post_3d',
    inner='quartile',
    palette='Set2'
)
plt.title('Distribution of 3-Day Post-Earnings Returns by Sector (Violin)')
plt.xlabel('Sector')
plt.ylabel('3-Day Post-Earnings Return (%)')
plt.xticks(rotation=45, ha='right')
plt.grid(True, alpha=0.3)

# 2. Violin plot: 3-day post-returns by market-cap group
plt.subplot(1, 2, 2)
sns.violinplot(
    data=df_plot,
    x='marketCap',
    y='pct_post_3d',
    inner='quartile',
    palette='Set3'
)
plt.title('Distribution of 3-Day Post-Earnings Returns by Market-Cap (Violin)')
plt.xlabel('Market-cap bucket')
plt.ylabel('3-Day Post-Earnings Return (%)')
plt.xticks(rotation=45, ha='right')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summary statistics table
summary = df_plot.groupby(['sector', 'marketCap'])['pct_post_3d'].agg(['mean', 'median', 'std', 'count']).round(2)
print("Summary Statistics: Mean/Median/Std/Count of 3-Day Returns by Sector & Market-Cap")
print(summary)

All violins concentrate near zero with modest variations (±5%), indicating that post-earnings reactions are generally noisy and lack a clear direction. Markets efficiently incorporate expectations, resulting in little predictable advantage. Consumer Cyclical and Materials sectors display slightly more frequent upside surprises, while small caps exhibit the greatest variability, reflecting higher risk and occasional gains. Not every visualization reveals alpha; this one honestly illustrates the difficulty involved.

Monthly Seasonality

After observing narrow return distributions near zero, let's now look at Monthly Seasonality in four panels: average 3/10‑day post‑returns, EPS surprises, and event counts by month. This reveals calendar effects (systematic seasonal biases) that can influence the timing of entries despite noisy individual responses.

# 1. Ensure earn_date is datetime
df_month = (
    df_earnings
    .dropna(subset=['earn_date', 'pct_post_3d', 'pct_post_10d', 'eps_surprise'])
    .copy()
)

df_month['earn_date'] = pd.to_datetime(df_month['earn_date'])

# 2. Derive month number and name
df_month['month_num'] = df_month['earn_date'].dt.month
df_month['month_name'] = df_month['earn_date'].dt.strftime('%b')

# 3. Aggregate averages by month
monthly_agg = (
    df_month
    .groupby('month_num')
    .agg(
        pct_post_3d_mean=('pct_post_3d', 'mean'),
        pct_post_10d_mean=('pct_post_10d', 'mean'),
        eps_surprise_mean=('eps_surprise', 'mean'),
        n_obs=('earn_date', 'count')
    )
    .reset_index()
    .sort_values('month_num')
)

# Keep a stable month order and names
month_order = monthly_agg['month_num'].tolist()
month_labels = df_month.drop_duplicates('month_num').set_index('month_num')['month_name'].reindex(month_order)

monthly_agg['month_name'] = month_labels.values

# 4. Plot bar charts
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Monthly Seasonality of Post-Earnings Returns and EPS Surprise', fontsize=16)

# Avg 3-day return
axes[0, 0].bar(monthly_agg['month_name'], monthly_agg['pct_post_3d_mean'], color='skyblue')
axes[0, 0].set_title('Avg 3-Day Post-Earnings Return by Month')
axes[0, 0].set_ylabel('Return (%)')
axes[0, 0].grid(alpha=0.3)

# Avg 10-day return
axes[0, 1].bar(monthly_agg['month_name'], monthly_agg['pct_post_10d_mean'], color='lightgreen')
axes[0, 1].set_title('Avg 10-Day Post-Earnings Return by Month')
axes[0, 1].set_ylabel('Return (%)')
axes[0, 1].grid(alpha=0.3)

# Avg EPS surprise
axes[1, 0].bar(monthly_agg['month_name'], monthly_agg['eps_surprise_mean'], color='salmon')
axes[1, 0].set_title('Avg EPS Surprise by Month')
axes[1, 0].set_ylabel('EPS Surprise')
axes[1, 0].grid(alpha=0.3)

# Number of observations
axes[1, 1].bar(monthly_agg['month_name'], monthly_agg['n_obs'], color='gold')
axes[1, 1].set_title('Number of Earnings Events by Month')
axes[1, 1].set_ylabel('Count')
axes[1, 1].grid(alpha=0.3)

for ax in axes.ravel():
    ax.set_xlabel('Month')
    ax.tick_params(axis='x', rotation=0)

plt.tight_layout()
plt.show()

Jan/Oct tend to have the best 3‑day returns, about 0.8%, while May/Jul usually see weaker results. The 10‑day trends show a similar but gentler pattern, with February and August reaching peaks. EPS surprises are slightly negative in January and May, possibly due to tough comparisons, and there are fewer events in July, August, and December because of holidays. While there’s a hint of seasonality, its impact is quite small, around 0.5%.

Regime Cross-Section

Finally, after subtle monthly patterns, we'll look at the Regime Cross‑Section: sector 10‑day post‑earnings returns by market regime (heatmap at the top, bars below). This stress‑tests earlier findings (do patterns persist across bull, bear, and COVID eras?), revealing rotation opportunities and regime dependence.

# Prepare data with year extraction
df_regimes = (
    df_earnings
    .dropna(subset=['earn_date', 'pct_post_10d', 'sector'])
    .copy()
)

df_regimes['earn_date'] = pd.to_datetime(df_regimes['earn_date'])
df_regimes['year'] = df_regimes['earn_date'].dt.year

# Define market regimes (adjust years based on your data/market history)
# Example: Bull (2023-2025), Bear/Transition (2022), COVID (2020-2021), etc.
def assign_regime(year):
    if year >= 2023:
        return 'Bull (2023+)'
    elif year == 2022:
        return 'Bear (2022)'
    elif 2020 <= year <= 2021:
        return 'COVID Recovery'
    elif 2018 <= year <= 2019:
        return 'Pre-COVID'
    else:
        return 'Earlier'

df_regimes['market_regime'] = df_regimes['year'].apply(assign_regime)

# 1. Aggregate: average 10-day returns by sector and regime/year
agg_data = (
    df_regimes
    .groupby(['sector', 'market_regime'])['pct_post_10d']
    .agg(['mean', 'count'])
    .reset_index()
    .query('count >= 5')  # Filter low-sample regimes
)

# 2. Visualization: Heatmap first (quick overview)
plt.figure(figsize=(12, 8))

plt.subplot(2, 1, 1)
pivot_heatmap = agg_data.pivot(index='sector', columns='market_regime', values='mean')
sns.heatmap(pivot_heatmap, annot=True, fmt='.2f', cmap='RdYlGn', center=0, linewidths=0.5)
plt.title('Average 10-Day Post-Earnings Returns: Sector x Market Regime Heatmap')

# 3. Bar charts: By regime (grouped by sector)
plt.subplot(2, 1, 2)
regime_order = agg_data.groupby('market_regime')['mean'].mean().sort_values(ascending=False).index
sns.barplot(data=agg_data, x='market_regime', y='mean', hue='sector',
            palette='Set2', order=regime_order)
plt.title('Average 10-Day Returns by Market Regime (Colored by Sector)')
plt.ylabel('10-Day Post-Return (%)')
plt.xlabel('Market Regime')
plt.xticks(rotation=45, ha='right')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

# 5. Summary tables
print("Average Returns by Sector x Market Regime (min 5 obs):")
print(agg_data.pivot(index='sector', columns='market_regime', values='mean').round(2))

# 6. Ranking: Best/worst performing sectors by regime
print("\nTop/Bottom Sectors by Regime:")
for regime in regime_order:
    regime_data = agg_data[agg_data['market_regime'] == regime].sort_values('mean', ascending=False)
    print(f"\n{regime}:")
    print(regime_data[['sector', 'mean', 'count']].round(2).head(3))

Consumer Cyclical does well during Bull (2023+) and COVID Recovery (~1.5–2%), but it’s less favorable in Bear 2022. Utilities turned negative before COVID. The bottom bars show the COVID era led overall gains (~1%), with Basic Materials and Industrials being the strongest. The recent Bull remains positive but less so. Sector leadership shifts depending on the market regime, and there are no consistent winners.
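Since the regime boundaries are the one part of this analysis you're most likely to adjust, it can help to keep them in a single list. The sketch below is an alternative to the assign_regime function, not a replacement for it: it uses pd.cut with year edges and checks that both approaches give the same labels on a handful of sample years.

```python
import pandas as pd


def assign_regime(year):
    # Same rules as the function used in the article
    if year >= 2023:
        return 'Bull (2023+)'
    elif year == 2022:
        return 'Bear (2022)'
    elif 2020 <= year <= 2021:
        return 'COVID Recovery'
    elif 2018 <= year <= 2019:
        return 'Pre-COVID'
    return 'Earlier'


years = pd.Series([2010, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2026])

# Right-closed bins: (-inf, 2017], (2017, 2019], (2019, 2021], (2021, 2022], (2022, inf]
edges = [-float("inf"), 2017, 2019, 2021, 2022, float("inf")]
names = ['Earlier', 'Pre-COVID', 'COVID Recovery', 'Bear (2022)', 'Bull (2023+)']
regime_cut = pd.cut(years, bins=edges, labels=names)

regime_apply = years.apply(assign_regime)
print(pd.DataFrame({"year": years, "regime": regime_cut}))
```

Changing a regime definition then means editing one number in edges rather than a chain of conditions.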

What Did We Get Out of All This Storyline?

Guiding you through six interconnected visualizations, we’ve turned 15 years of earnings data into a clear and engaging story.

Each chart responds to a specific question, yet together, they paint a bigger picture: earnings surprises influence markets, but not in the same way everywhere. Some sectors, periods, and regimes often provide consistent advantages, while others don’t.

Here’s what the data shows us:

No definitive alpha here, but specific opportunities are present: Markets are mostly efficient, with returns hovering near zero and weak surprise correlations, yet Consumer Cyclicals and Materials consistently show upside potential across different timeframes and market-cap buckets. Timing your sector choice is important.
Timing windows alter the story: 3-day reactions benefit Real Estate mid-caps (+4%), while 10-day reactions shift leadership to Consumer Cyclical mega-caps (+3.2%). Don’t assume all earnings reactions occur at the same pace.
Mega-tech hype isn’t eternal: The bubble chart shows AAPL/MSFT/NVDA delivered strong returns from 2020–2022, but the falling trend since then indicates waning market enthusiasm. Don’t chase yesterday’s overhyped stocks.
Calendar patterns reward patience: January and October deliver slightly stronger 3-day post-earnings returns (~0.8%), while May and July tend to be weaker. Combine seasonal timing with sector choices for additional gains.
Market regimes change winners: Consumer Cyclicals did well during the COVID recovery and the bull run (2023+) but were less favorable in the 2022 bear market, while Basic Materials and Industrials led during the COVID era. There are no universal “best performers,” only the best performers for now. Adjust to the regime.
The actionable setup: Small to mid-cap cyclical longs in January during bull markets combine all these signals for maximum conviction, the point where sector timing, seasonality, and regime alignment converge.

Final Thoughts

This exercise shows why visualization is important in finance: raw tables of returns and surprises wouldn’t reveal these patterns.

Heatmaps instantly highlighted sector winners.
Scatter plots demonstrated the weak surprise‑return connection.
Bubble charts narrated the mega‑tech story over time.
Violins unveiled the harsh truth that markets are noisy.
Cross‑sectional regime analysis reminded us that yesterday’s approach doesn’t ensure tomorrow’s returns.

The effort to interpret this data pays off: you shift from passive observation to active pattern recognition. You see not just what occurred, but where and when it happened. In trading and analysis, understanding the shape of complexity often surpasses having a perfect formula.

Visual storytelling turns data into intuition. And intuition, based on evidence, outperforms guesswork every time.

How to Build an LLM Market Copilot MVP with LangChain, APIs, and Streamlit

Nikhil Adithyan — Tue, 24 Feb 2026 20:00:00 +0000

In fintech or wealthtech products, people constantly need quick market context. They need to know why a particular stock moved, what changed recently, or what they should watch next.

This usually becomes manual work. Someone pulls recent returns, checks a couple fundamentals, scans headlines, then writes a short Slack or Notion note. It works, but it doesn't scale, and everyone formats it differently.

Dashboards help when the question is predefined. Pure LLM answers are flexible, but they're not something you can trust unless the numbers are tool-backed.

In this handbook, we’ll build a market copilot MVP. Think of it as a lightweight “market note generator” for a single stock.

A stock question could be anything like: “What happened to AAPL over the last 60 days?”, “Is this move unusually risky?”, or “What changed in the news this week?” A market brief is the short, structured write-up you’d paste into Slack. It includes a snapshot, a few key metrics, and a compact interpretation backed by real data.

We'll keep the product logic separate from the UI. The engine lives in copilot.py. It fetches facts through EODHD-backed tools, which are just Python functions that call EODHD endpoints and return small, predictable outputs. The Streamlit app in app.py is only a shell that calls the engine, then renders the brief and the tool-backed metrics side by side.

One quick clarification: When I say “copilot” in this handbook, I’m not referring to GitHub Copilot. This will serve as an in-product assistant that helps generate repeatable market context by calling data tools and writing a brief from those tool outputs.

Prerequisites and Tools

You’ll get the most out of this handbook if you’re comfortable with Python basics and have built at least one small script that calls an API.

Before you start, make sure you have:

Python 3.10+ installed
An EODHD API key
An OpenAI API key
A working local environment (venv or conda is fine)

Tools used in this build:

EODHD. You'll use it as the data layer for end-of-day prices, fundamentals, and news.
OpenAI. You'll use a chat model to write the brief, but only after tools return the underlying facts.
LangChain + LangGraph. You'll use these to define the tools and run a ReAct-style agent, so the model can decide which data functions to call and then compose a short brief.
Streamlit. You'll use it only as the quickest way to demo the copilot as a clickable product surface.

Prerequisites and Tools
What the MVP Does
- Non-negotiables
Architecture
copilot.py : Build the Engine
Demo Runs (Outside copilot.py )
Build the Streamlit MVP
App Demo
Practical Notes
- Things that will break in real usage
- Small extensions that fit this MVP
Conclusion

What the MVP Does

At a high level, this MVP has one job: to turn a stock question into a short, repeatable market brief.

You give it:

A ticker (like AAPL.US)
A recent window in trading days (like 60 or 120)
A free-form query (what you actually want to know)
Optional parameters that force certain parts to be included, like fundamentals, risk, or headlines

In practice, the query drives the brief. The optional parameters are there for consistency when a team wants a standard format.

Then it returns two things:

A short brief in Markdown with a consistent structure you can read quickly.
A set of tool-backed artifacts, basically the raw metrics the UI can render without re-calling the APIs.

That second output is important. It keeps the app fast and makes the “numbers” auditable.

Non-negotiables

This MVP is designed like a product feature, not a chat demo.

Metrics are tool-first. The model doesn’t guess.
If data is missing, it says so.
No raw price dumps, no giant news lists.
It computes only what the query asked for.
The output reads like an internal note you’d paste into Slack or a weekly memo.

Once you have this pattern working, a few useful things happen.

First, you get consistent briefs that PMs, research, and sales can all reuse. You can also generate weekly market notes faster. And demos become simple. Type a query, get a brief, show the metrics next to it.

Architecture

We’ll keep this simple: two files with two clear responsibilities.

copilot.py – the engine

This file holds everything that actually makes the copilot work:

The EODHD data tools (prices, fundamentals, news, risk)
The agent setup and prompt rules
A single run_brief() function that takes inputs and returns:
the markdown brief
the structured artifacts for the UI

If you want to reuse this copilot anywhere else later, this is the file you keep.

app.py – the MVP shell

This is just the Streamlit layer:

Sidebar inputs (ticker, window, query, optional parameters)
A two-pane layout: left side shows the brief, right side shows tool-backed metrics and headlines

No data logic lives here. It only calls run_brief() and renders what comes back.

Why this split matters

If everything is mixed into one Streamlit script, you’re stuck with Streamlit forever.

With this split, you can replace Streamlit with FastAPI later without rewriting the core logic. You also keep “product logic” in one place, which makes testing and iteration much easier. And you avoid the notebook trap where UI code and data code become impossible to maintain.

copilot.py: Build the Engine

This section is where we build the backend engine. By the end of it, you’ll have a single callable function that takes a query, pulls the required facts using EODHD tools, and returns two things: a Markdown brief for humans, and a structured artifacts dictionary for the UI.

1. Import packages

We’re keeping the stack minimal. The goal is not to show off tooling – it’s to ship something that works and is easy to maintain.


import json
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional, Tuple
import numpy as np
import pandas as pd
import requests

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

eodhd_api_key = 'YOUR EODHD API KEY'
openai_api_key = 'YOUR OPENAI API KEY'

Apart from importing the packages, we also define eodhd_api_key and openai_api_key at the top so the file can run as-is. In a real deployment, you’d move these to environment variables.
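As a minimal sketch of that move, the snippet below reads a key from the environment and fails loudly when it is missing. The `require_env` helper and the variable names `EODHD_API_KEY` and `OPENAI_API_KEY` are assumptions for illustration, not part of the original file.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail loudly if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# In copilot.py, these two lines could replace the hard-coded strings
# (the variable names are an assumption):
# eodhd_api_key = require_env("EODHD_API_KEY")
# openai_api_key = require_env("OPENAI_API_KEY")
```

Failing at import time is a deliberate choice here: a missing key surfaces immediately instead of as a confusing API error halfway through an agent run.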

2. Helper Functions

Before we touch tools or the agent, we add three small helpers. None of them are “AI-related”, but they’re the difference between a demo that works once and a feature that keeps working.


def normalize_ticker(t: str) -> str:
    t = (t or "").strip().upper()
    if not t:
        return t
    if "." in t:
        return t
    return f"{t}.US"

def _safe_json_loads(x: Any) -> Optional[Any]:
    if x is None:
        return None
    if isinstance(x, (dict, list)):
        return x
    if not isinstance(x, str):
        return None
    try:
        return json.loads(x)
    except Exception:
        return None
    
def get_eod_prices_raw(ticker: str, start: str, end: str) -> pd.DataFrame:
    url = f"https://eodhd.com/api/eod/{ticker}"
    params = {"from": start, "to": end, "api_token": eodhd_api_key, "fmt": "json"}
    r = requests.get(url, params=params)
    data = r.json()

    if not isinstance(data, list) or not data:
        return pd.DataFrame(columns=["date", "open", "high", "low", "close", "volume", "ticker"])

    df = pd.DataFrame(data)
    keep = [c for c in ["date", "open", "high", "low", "close", "volume"] if c in df.columns]
    df = df[keep].copy()
    df["ticker"] = ticker
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date", "close"]).sort_values("date").reset_index(drop=True)
    return df

Here’s a brief explanation of the three helper functions in the code:

normalize_ticker() fixes user input. People will type aapl, AAPL, or AAPL.US, sometimes with stray spaces. EODHD expects a consistent symbol format, so this function forces that consistency before any API call.
_safe_json_loads() is there because when we read tool outputs from the agent messages, the payload might already be a Python dict/list, or it might be a JSON string. This helper lets us handle both without throwing errors.
get_eod_prices_raw() is the base price fetcher. Every tool that needs OHLCV uses this instead of re-writing request + cleaning logic each time. It returns a cleaned DataFrame built from EODHD’s end-of-day historical data API, sorted by date, with rows that lack a date or close dropped, so the rest of the tools can assume they’re working with sane data.

That’s it. Nothing fancy. It just keeps the rest of the code predictable.

3. Data tools

Before the agent, we need a reliable data layer.

If you’re building this as a product, the tools are your “internal API”. They decide what the copilot can and cannot say. The agent is just calling them and turning their outputs into a brief.

In this MVP, each tool has a narrow job and returns compact outputs. That’s intentional. You want predictable shapes for the UI. You also want to avoid dumping raw data into the model unless you genuinely need it.


@tool
def last_n_days_prices(ticker: str, n: int = 60) -> Dict[str, Any]:
    """
    Quick return window over last N trading days.
    Returns a compact summary. No raw rows.
    """
    ticker = normalize_ticker(ticker)

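    # Calendar-day buffer that grows with n: 240 calendar days hold only roughly 165 trading days.
    today = datetime.utcnow().date()
    end = today.isoformat()
    start = (today - timedelta(days=max(240, int(n) * 2))).isoformat()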

    df = get_eod_prices_raw(ticker, start, end)
    if df.empty:
        return {"ticker": ticker, "error": "no_price_data"}

    df = df.tail(int(n)).reset_index(drop=True)
    if df.empty:
        return {"ticker": ticker, "error": "no_price_data"}

    first_close = float(df.loc[0, "close"])
    last_close = float(df.loc[len(df) - 1, "close"])
    total_return = float((last_close / first_close) - 1.0)

    return {
        "ticker": ticker,
        "n": int(len(df)),  # trading days actually used, which can be fewer than requested
        "start_date": str(df.loc[0, "date"].date()),
        "end_date": str(df.loc[len(df) - 1, "date"].date()),
        "first_close": first_close,
        "last_close": last_close,
        "total_return": total_return,
    }

@tool
def fundamentals_snapshot(ticker: str) -> Dict[str, Any]:
    """
    Lightweight fundamentals snapshot.
    Returns a flat dict.
    """
    ticker = normalize_ticker(ticker)

    url = f"https://eodhd.com/api/fundamentals/{ticker}"
    params = {"api_token": eodhd_api_key, "fmt": "json"}
    r = requests.get(url, params=params)
    data = r.json()

    if not isinstance(data, dict) or not data:
        return {"ticker": ticker, "error": "no_data"}

    highlights = data.get("Highlights", {}) or {}
    general = data.get("General", {}) or {}
    valuation = data.get("Valuation", {}) or {}
    technicals = data.get("Technicals", {}) or {}

    return {
        "ticker": ticker,
        "name": general.get("Name"),
        "sector": general.get("Sector"),
        "industry": general.get("Industry"),
        "market_cap": highlights.get("MarketCapitalization"),
        "pe": valuation.get("TrailingPE"),
        "pb": valuation.get("PriceBookMRQ"),
        "profit_margin": highlights.get("ProfitMargin"),
        "dividend_yield": highlights.get("DividendYield"),
        "beta": technicals.get("Beta"),
    }

@tool
def latest_news(ticker: str, limit: int = 5) -> List[Dict[str, Any]]:
    """
    Latest headlines for a ticker.
    Returns a compact list of dicts.
    """
    ticker = normalize_ticker(ticker)

    url = "https://eodhd.com/api/news"
    params = {"s": ticker, "limit": int(limit), "offset": 0, "api_token": eodhd_api_key, "fmt": "json"}
    r = requests.get(url, params=params)
    data = r.json()

    if not isinstance(data, list) or not data:
        return []

    df = pd.DataFrame(data)
    keep = [c for c in ["date", "title", "link", "source"] if c in df.columns]
    df = df[keep].copy()

    if "date" in df.columns:
        df["date"] = pd.to_datetime(df["date"], errors="coerce")
        df = df.sort_values("date", ascending=False)

    out = df.head(int(limit)).reset_index(drop=True).to_dict(orient="records")
    for row in out:
        dt = row.get("date")
        if isinstance(dt, (pd.Timestamp, datetime)):
            row["date"] = dt.isoformat()
    return out

@tool
def risk_metrics(ticker: str, start: str, end: str) -> Dict[str, Any]:
    """
    Risk metrics from daily close prices over a window.
    volatility_ann: annualized vol from daily returns
    max_drawdown: max drawdown over the window
    """
    ticker = normalize_ticker(ticker)

    df = get_eod_prices_raw(ticker, start, end)
    if df.empty:
        return {"ticker": ticker, "error": "no_price_data"}

    df = df.sort_values("date").reset_index(drop=True)
    df["ret"] = df["close"].pct_change().fillna(0.0)

    vol_ann = float(df["ret"].std(ddof=0) * np.sqrt(252))

    cummax = df["close"].cummax()
    dd = (df["close"] / cummax) - 1.0
    max_dd = float(dd.min())

    first_close = float(df.loc[0, "close"])
    last_close = float(df.loc[len(df) - 1, "close"])
    total_return = float((last_close / first_close) - 1.0)

    return {
        "ticker": ticker,
        "start_date": str(df.loc[0, "date"].date()),
        "end_date": str(df.loc[len(df) - 1, "date"].date()),
        "n": int(len(df)),
        "total_return": total_return,
        "volatility_ann": vol_ann,
        "max_drawdown": max_dd,
    }

@tool
def eod_prices(ticker: str, start: str, end: str) -> List[Dict[str, Any]]:
    """
    Raw OHLCV rows. Use only for custom calculations that cannot be done with other tools.
    """
    ticker = normalize_ticker(ticker)
    df = get_eod_prices_raw(ticker, start, end)
    return json.loads(df.to_json(orient="records"))

Let’s go through the key parts of this code.

1. last_n_days_prices – Price window

Most real requests start with something like: “what happened recently?”

So this tool does one thing: it pulls enough daily bars to safely cover the last N trading days (using a buffer window), then returns a small summary:

start and end dates for the window
first and last close
total return
number of trading days used

It doesn’t return raw rows. That keeps the agent from flooding output, and it keeps the UI fast.
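The total return in that summary is plain arithmetic on two closes. As a quick worked example, using the AAPL first and last closes that appear in the test run later in this article (the `total_return` helper name is only for illustration):

```python
def total_return(first_close: float, last_close: float) -> float:
    """Simple price return between the first and last close of a window."""
    return (last_close / first_close) - 1.0

# First and last close from the AAPL test run shown later in this article.
r = total_return(269.0, 248.04)
print(f"{r:.2%}")  # -7.79%
```

This is a price-only return. Dividends are not included.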

2. fundamentals_snapshot – Fundamentals snapshot

This tool is for quick context. You usually want a rough valuation anchor in the brief, but you don’t want to turn the MVP into a full fundamentals pipeline.

So we’ll keep it simple. It fetches the EODHD fundamentals data API once and extracts a handful of fields that are commonly useful in a brief:

PE, PB
market cap
sector and beta
a couple of optional extras like dividend yield and profit margin

If a field is missing, it just returns None for that field. No guessing.
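The "return None, don't guess" behavior comes from chaining `dict.get` with an `or {}` fallback. A minimal sketch of the pattern, using a made-up partial payload (the `pick_fields` helper is hypothetical and only extracts two fields):

```python
from typing import Any, Dict

def pick_fields(data: Dict[str, Any]) -> Dict[str, Any]:
    # "or {}" also covers the case where a section is present but null.
    highlights = data.get("Highlights", {}) or {}
    valuation = data.get("Valuation", {}) or {}
    return {
        "market_cap": highlights.get("MarketCapitalization"),
        "pe": valuation.get("TrailingPE"),
    }

# Hypothetical partial payload: "Valuation" is null.
sample = {"Highlights": {"MarketCapitalization": 1000}, "Valuation": None}
print(pick_fields(sample))  # {'market_cap': 1000, 'pe': None}
```

Without the `or {}`, a null section would raise an AttributeError on the second `.get` call instead of producing a clean None.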

3. latest_news – Headlines

Price moves without context aren’t helpful.

This tool pulls the latest headlines for a ticker via EODHD Financial News API, sorts them by date when available, and returns a compact list with only what we actually need in the app:

date
title
link
source

We’re not doing sentiment here. The point is simply to ground the brief in real narrative context.

4. risk_metrics – Risk metrics

Sometimes the question isn’t “what happened?”. It’s “how extreme was this move?”

That’s where volatility and drawdown are useful. This tool takes a start and end date, pulls daily closes, then calculates:

annualized volatility from daily returns
max drawdown over the window
and it also returns total return again for the same window, so everything stays consistent

In the product, this tool should only run when the user asks for risk. It’s extra compute and extra API calls.
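To see what those two numbers mean without pandas, here is a pure-Python sketch of the same calculations. It mirrors the tool's choices (first-day return set to 0.0, population standard deviation, 252 trading days per year), assumes a non-empty list of closes, and uses made-up prices. The `vol_and_drawdown` name is illustrative.

```python
import math
import statistics
from typing import List, Tuple

def vol_and_drawdown(closes: List[float]) -> Tuple[float, float]:
    """Annualized volatility and max drawdown from a non-empty list of daily closes."""
    # Daily simple returns; the first day has no prior close, so it gets 0.0
    # (this mirrors pct_change().fillna(0.0) in risk_metrics).
    rets = [0.0] + [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
    # Population standard deviation (ddof=0), scaled by sqrt(252) trading days.
    vol_ann = statistics.pstdev(rets) * math.sqrt(252)

    # Drawdown = distance of each close from the running peak; keep the worst one.
    peak = closes[0]
    max_dd = 0.0
    for c in closes:
        peak = max(peak, c)
        max_dd = min(max_dd, c / peak - 1.0)
    return vol_ann, max_dd

# Made-up closes, purely for illustration.
vol, dd = vol_and_drawdown([100.0, 110.0, 99.0, 104.0])
print(f"max drawdown: {dd:.1%}")  # max drawdown: -10.0%
```

The drawdown is measured from the running peak (110), not from the first close, which is why the result is -10% even though the series ends above where it started.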

5. eod_prices – Escape Hatch

This is the tool you keep around for later extensions.

Most of the time, the MVP doesn’t need raw OHLCV rows. But as soon as you want custom metrics (rolling indicators, ATR, custom signals, pattern detection), you’ll need raw bars.

So eod_prices returns the full daily rows as a list of dicts.

The rule is simple: don’t call it unless you have to. It’s heavier, and it’s the easiest way to accidentally blow up token usage or slow down the app.
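As an example of the kind of custom metric this escape hatch enables, here is a sketch of a simple average true range computed from rows in the shape eod_prices returns. It is a plain, unsmoothed average rather than Wilder's smoothed ATR, and the function name and sample bars are made up for illustration.

```python
from typing import Any, Dict, List

def average_true_range(rows: List[Dict[str, Any]], period: int = 14) -> float:
    """Simple (unsmoothed) average of the last `period` true ranges."""
    true_ranges = []
    for prev, cur in zip(rows, rows[1:]):
        # True range: the widest of today's range and the gaps from yesterday's close.
        tr = max(
            cur["high"] - cur["low"],
            abs(cur["high"] - prev["close"]),
            abs(cur["low"] - prev["close"]),
        )
        true_ranges.append(tr)
    if not true_ranges:
        raise ValueError("need at least two rows")
    window = true_ranges[-period:]
    return sum(window) / len(window)

# Made-up bars using the same keys as eod_prices rows (other keys omitted for brevity).
rows = [
    {"high": 102.0, "low": 98.0, "close": 100.0},
    {"high": 105.0, "low": 101.0, "close": 104.0},
    {"high": 104.0, "low": 99.0, "close": 100.0},
]
print(average_true_range(rows, period=2))  # 5.0
```

In the app, a calculation like this would live in its own tool that returns one number, so the raw rows never reach the model.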

4. Testing the Data Tools (outside copilot.py)

Before the agent writes anything, you’ll want to know that the data layer is behaving. This isn’t “testing for fun”. It’s a quick sanity check that answers three questions:

Can we fetch data for a normal ticker without errors?
Are the fields we depend on actually present?
Do the outputs look roughly reasonable, so the brief won’t be garbage?

Here’s the exact test block I ran. One call per tool. I printed the key parts and kept the code block small.


print("\n--- last_n_days_prices ---")
out_price = last_n_days_prices.invoke({"ticker": "AAPL.US", "n": 60})
print(out_price)

print("\n--- fundamentals_snapshot ---")
out_fund = fundamentals_snapshot.invoke({"ticker": "AAPL.US"})
print(out_fund)

print("\n--- latest_news ---")
out_news = latest_news.invoke({"ticker": "AAPL.US", "limit": 5})
print(f"news rows: {len(out_news)}")
print(out_news[:2])

print("\n--- risk_metrics ---")
end = datetime.utcnow().date()
start = (end - timedelta(days=180)).isoformat()
end = end.isoformat()
out_risk = risk_metrics.invoke({"ticker": "AAPL.US", "start": start, "end": end})
print(out_risk)

print("\n--- eod_prices (raw rows, small window) ---")
raw_rows = eod_prices.invoke({"ticker": "AAPL.US", "start": "2025-12-01", "end": "2026-01-15"})
print(f"rows: {len(raw_rows)}")
print(raw_rows[:2])

This output is basically confirming that the data layer works.

last_n_days_prices gave you a clean 60-trading-day window (2025-10-28 to 2026-01-23) with a first close of 269.0, a last close of 248.04, and a total return of around -7.79%. fundamentals_snapshot also returned the key fields you want for a brief: PE 33.2048, PB 49.4443, market cap ~3.665T, beta 1.093, plus sector and industry.

latest_news returned 5 items in a consistent shape (date, title, link). risk_metrics worked too, but it used a different window: the last 180 calendar days became 123 trading days. Its total return (+18.65%) therefore won’t match the 60-day tool. This is why we later force risk metrics to use the same start and end dates as the return window.
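The gap between calendar days and trading days is easy to underestimate. As a rough illustration, the sketch below counts weekdays in a date range. It is only an upper bound on trading days, because exchange holidays are not removed, and the `count_weekdays` helper is hypothetical.

```python
from datetime import date, timedelta

def count_weekdays(start: date, end: date) -> int:
    """Count Monday-Friday dates from start to end, inclusive (holidays not removed)."""
    days = (end - start).days + 1
    return sum(1 for i in range(days) if (start + timedelta(days=i)).weekday() < 5)

# Two full calendar weeks (14 days) contain only 10 weekdays.
print(count_weekdays(date(2026, 1, 5), date(2026, 1, 18)))  # 10
```

Roughly five out of every seven calendar days are weekdays, and holidays trim that further, so a calendar-day window always yields noticeably fewer bars than its length suggests.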

eod_prices returned 32 raw rows as expected. The date field shows up as an epoch-style number here, which is fine since this tool is meant for internal calculations, not direct display.
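If you ever do need to display those dates, converting them back is a one-liner. This sketch assumes the number is an epoch timestamp in milliseconds, which is what the output looked like here; the helper name is made up.

```python
from datetime import datetime, timezone

def epoch_ms_to_date(ms: int) -> str:
    """Convert an epoch timestamp in milliseconds to an ISO date string (UTC)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()

print(epoch_ms_to_date(1764547200000))  # 2025-12-01
```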

5. Creating the agent

This is where the whole thing becomes a copilot instead of a bunch of loose functions. We define how the agent should behave, give it the only tools it’s allowed to use, then set up a clean way to capture tool outputs for the UI.


system_prompt = (
    "You are a market brief copilot embedded in a product.\n"
    "Rules:\n"
    "1) Use tools for facts. Never invent numbers.\n"
    "2) Do not dump raw price rows or long news lists.\n"
    "3) If the user didn't ask for something, don't compute it.\n"
    "4) Output in clean Markdown with sections.\n"
    "5) Keep it short and useful, like an internal dashboard note.\n"
    "Tool guidance:\n"
    "- Use last_n_days_prices for return windows.\n"
    "- Use fundamentals_snapshot for PE/PB/market cap/sector/beta.\n"
    "- Use latest_news for headlines.\n"
    "- Use risk_metrics only if asked for vol/drawdown.\n"
    "- Use eod_prices only if absolutely required for custom calcs.\n"
)

def _build_agent() -> Any:
    llm = ChatOpenAI(
        model='gpt-5-nano',
        temperature=0,
        api_key=openai_api_key,
    )
    tools = [last_n_days_prices, fundamentals_snapshot, latest_news, risk_metrics, eod_prices]
    return create_react_agent(model=llm, tools=tools)

AGENT = _build_agent()

def _extract_artifacts(messages: List[Any]) -> Dict[str, Any]:
    """
    Pull tool outputs from the LangGraph message list.
    This avoids calling the endpoints twice in Streamlit.
    """
    out: Dict[str, Any] = {}
    for m in messages:
        name = getattr(m, "name", None)
        content = getattr(m, "content", None)

        if not name:
            continue

        payload = _safe_json_loads(content)
        if payload is None:
            continue

        if name.endswith("last_n_days_prices"):
            out["price"] = payload
        elif name.endswith("fundamentals_snapshot"):
            out["valuation"] = payload
        elif name.endswith("risk_metrics"):
            out["risk"] = payload
        elif name.endswith("latest_news"):
            out["headlines"] = payload

    return out

The system prompt is basically a contract. If you don’t spell this out, the agent will eventually drift. It will start guessing numbers, dumping long outputs, or doing work you didn’t ask for. This prompt keeps it in the “internal brief writer” lane, and the tool guidance reduces tool misuse.

_build_agent() is just wiring. One model, a fixed toolset, and a ReAct agent that can decide when to call what. The other important piece here is _extract_artifacts(). We’re not building this just to print a nice paragraph. We also want structured outputs that the UI can render. So instead of calling the endpoints again inside Streamlit, we reuse the tool results that already happened during the agent run.
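To see the artifact extraction idea in isolation, here is a simplified, standalone sketch. It uses `SimpleNamespace` objects as stand-ins for LangGraph messages (only `.name` and `.content` matter), and it matches tool names exactly rather than with `endswith`, so treat it as an illustration of the pattern rather than a drop-in replacement.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List

# Maps a tool name to the artifact key the UI reads (same pairs as _extract_artifacts).
TOOL_TO_KEY = {
    "last_n_days_prices": "price",
    "fundamentals_snapshot": "valuation",
    "risk_metrics": "risk",
    "latest_news": "headlines",
}

def extract_artifacts(messages: List[Any]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for m in messages:
        key = TOOL_TO_KEY.get(getattr(m, "name", None) or "")
        if key is None:
            continue  # human/AI messages and unmapped tools are skipped
        content = getattr(m, "content", None)
        if isinstance(content, str):
            try:
                payload = json.loads(content)
            except json.JSONDecodeError:
                continue  # non-JSON tool output is ignored
        else:
            payload = content
        if payload is not None:
            out[key] = payload
    return out

# Stand-ins for the message list returned by the agent run.
messages = [
    SimpleNamespace(name=None, content="Ticker: AAPL.US ..."),
    SimpleNamespace(name="last_n_days_prices",
                    content='{"ticker": "AAPL.US", "total_return": -0.0779}'),
    SimpleNamespace(name=None, content="### Snapshot ..."),
]
print(extract_artifacts(messages))
# {'price': {'ticker': 'AAPL.US', 'total_return': -0.0779}}
```

If the same tool is called twice in one run, the later result overwrites the earlier one, which matches the behavior of the original function.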

6. Turning the agent into a callable backend

Up to now, we’ve built tools and an agent. This is the piece that turns it into something your app can call like a regular backend function. One input in, one brief out, plus the structured data you need to render the UI.


def run_brief(
    ticker: str,
    n_days: int = 60,
    include_fundamentals: bool = True,
    include_risk: bool = False,
    include_news: bool = True,
    news_limit: int = 5,
) -> Tuple[str, Dict[str, Any]]:
    """
    Returns:
      - markdown brief (string)
      - artifacts dict with keys like price/valuation/risk/headlines when tools were used
    """
    t = normalize_ticker(ticker)

    request_parts = [
        f"Ticker: {t}.",
        f"Compute total return over the last {int(n_days)} trading days.",
    ]
    if include_fundamentals:
        request_parts.append("Fetch fundamentals and report PE, PB, market cap, sector, beta.")
    if include_risk:
        request_parts.append("Compute annualized volatility and max drawdown over the same window.")
        request_parts.append("Use the same start_date and end_date as the return window.")
    if include_news:
        request_parts.append(f"Pull {int(news_limit)} latest headlines and reference them briefly.")
    request_parts.append(
        "Write a short market brief with sections: Snapshot, Metrics, What it might mean, Caveats."
    )
    request_parts.append("Keep it concise. Do not paste raw rows.")

    user_prompt = " ".join(request_parts)

    response = AGENT.invoke(
        {"messages": [("system", system_prompt), ("user", user_prompt)]}
    )

    messages = response.get("messages", [])
    final_msg = messages[-1]
    brief_md = getattr(final_msg, "content", "") or ""

    artifacts = _extract_artifacts(messages)
    return brief_md, artifacts

The run_brief function is doing two jobs. First, it translates “what the user wants” into a very specific instruction set that keeps the agent on rails. That’s why it builds request_parts instead of handing the agent a loose prompt and hoping for the best.

Second, it returns two outputs. brief_md is what you show on the left side of the app. artifacts is what you render on the right side. Those artifacts come from _extract_artifacts(messages), which is just a clean way to reuse the tool outputs that already happened during the run, instead of re-calling EODHD again just to populate the UI.
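One way to check the prompt assembly without calling the model is to factor it into its own function. The `build_request` helper below is hypothetical (it is not in the original copilot.py), but it reproduces the same request_parts logic so you can inspect exactly what the agent would receive.

```python
from typing import List

def build_request(
    ticker: str,
    n_days: int = 60,
    include_fundamentals: bool = True,
    include_risk: bool = False,
    include_news: bool = True,
    news_limit: int = 5,
) -> str:
    """Assemble the instruction string that run_brief sends as the user message."""
    parts: List[str] = [
        f"Ticker: {ticker}.",
        f"Compute total return over the last {int(n_days)} trading days.",
    ]
    if include_fundamentals:
        parts.append("Fetch fundamentals and report PE, PB, market cap, sector, beta.")
    if include_risk:
        parts.append("Compute annualized volatility and max drawdown over the same window.")
        parts.append("Use the same start_date and end_date as the return window.")
    if include_news:
        parts.append(f"Pull {int(news_limit)} latest headlines and reference them briefly.")
    parts.append(
        "Write a short market brief with sections: Snapshot, Metrics, What it might mean, Caveats."
    )
    parts.append("Keep it concise. Do not paste raw rows.")
    return " ".join(parts)

# A risk-only request: no fundamentals, no headlines.
print(build_request("MSFT.US", n_days=90, include_fundamentals=False,
                    include_risk=True, include_news=False))
```

Because the function is pure, it can be unit-tested cheaply, and any change to the brief format shows up as a plain string diff.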

Demo Runs (Outside copilot.py)

Below are three runs that map to how a PM, founder, or analyst would actually use this in a product. Each demo has a short setup line, the exact code you run, the output, then a tight interpretation tied to what the output actually says.

Demo 1: Baseline brief (return + fundamentals + headlines)

This is the default “give me the situation” request. In the output, you want to see one window, one return, key valuation fields, and a short headline-backed story.


def run_agent(query: str):
    resp = AGENT.invoke({"messages": [("system", system_prompt), ("user", query)]})
    msgs = resp.get("messages", [])
    final = msgs[-1].content if msgs else ""
    print("\n" + "=" * 80)
    print("QUERY:")
    print(query)
    print("\nANSWER:")
    print(final)
    return resp

resp = run_agent(
    "Ticker: AAPL.US. Compute total return over the last 60 trading days. "
    "Fetch fundamentals and report PE, PB, market cap, sector, beta. "
    "Pull 5 latest headlines and reference them briefly. "
    "Write a short market brief with sections: Snapshot, Metrics, What it might mean, Caveats. "
    "Keep it concise. Do not paste raw rows."
)

Output:


================================================================================
QUERY:
Ticker: AAPL.US. Compute total return over the last 60 trading days. Fetch fundamentals and report PE, PB, market cap, sector, beta. Pull 5 latest headlines and reference them briefly. Write a short market brief with sections: Snapshot, Metrics, What it might mean, Caveats. Keep it concise. Do not paste raw rows.

ANSWER:
### Snapshot
- Window: last 60 trading days (2025-10-28 to 2026-01-23)
- Price path: 269.00 → 248.04
- Total return: -7.79%

### Metrics
- Sector: Technology
- Market cap: $3.665T (3,665,126,490,112)
- P/E: 33.20
- P/B: 49.44
- Beta: 1.09

### What it might mean
- The 60-day horizon shows a ~7.8% decline alongside a tech-focused mega-cap backdrop. Elevated P/B suggests high balance-sheet or growth expectations reflected in asset valuation. Macro headlines point to ongoing risk factors (Fed policy, geopolitical/economic headlines) shaping near-term moves.

Headlines reference (brief)
- IWO vs. MGK: How Small-Cap Diversification Compares to Mega-Cap Growth - context on growth tilts and diversification
- Stock Futures Are Falling Ahead of Fed Meeting as Shutdown Fears Rise - macro risk backdrop
- This founder cracked firefighting - now he's creating an AI gold mine - AI/tech narrative
- Dow Jones Futures Fall; Trump Tariffs, Government Shutdown, Big Earnings In Focus - earnings/macroe attention
- SPDR's SPTM Offers Broad Market Reach, While Vanguard's VTV Targets Value Stocks. Which Is the Better Buy? - market breadth/value debate

### Caveats
- Data reflect the latest available snapshot; updates can shift returns, multiples, and fundamentals.
- PB is unusually high here; interpret as market perception of value/assets rather than a strict earnings-based metric.
- This is not investment advice; consider broader context and your risk tolerance.

The core signal is the -7.79% move across a clearly stated window (2025-10-28 to 2026-01-23). The “Metrics” section does what we want for an internal note: it gives sector and a compact valuation context (P/E, P/B, beta, market cap) without turning into a fundamentals report.

The headlines are intentionally brief and more macro-tilted, which is useful if you just need “what’s in the air” rather than a deep Apple-only news breakdown.

Demo 2: Risk-first brief (volatility + drawdown on the same window)

This is the “tell me how ugly it got” workflow. The output should show a return, volatility, and max drawdown all computed on the same dates.


resp = run_agent(
    "Ticker: MSFT.US. Compute total return over the last 90 trading days. "
    "Compute annualized volatility and max drawdown over the same window. "
    "Use the same start_date and end_date as the return window. "
    "Write a short market brief with sections: Snapshot, Metrics, What it might mean, Caveats."
)

Output:


================================================================================
QUERY:
Ticker: MSFT.US. Compute total return over the last 90 trading days. Compute annualized volatility and max drawdown over the same window. Use the same start_date and end_date as the return window. Write a short market brief with sections: Snapshot, Metrics, What it might mean, Caveats.

ANSWER:
## Market Brief - MSFT.US

Snapshot
- Window: 2025-09-16 to 2026-01-23 (90 trading days)
- Start price: 509.04
- End price: 465.95
- Total return (price only): -8.46% over the window (dividends not included)

Metrics
- Annualized volatility: 19.30%
- Max drawdown: -18.07% (peak-to-trough during the window)

What it might mean
- The period shows meaningful downside with elevated volatility, ending the 90-day window in a weaker price regime.
- A roughly 18% drawdown alongside ~19% annualized volatility suggests higher-than-average price swings for this horizon.
- If you're evaluating risk, this window points to downside risk exposure that could matter for near-term positioning.

Caveats
- This total return reflects price movement only; dividends are not included.
- Results depend on the exact window; future performance can differ materially.
- Data as of 2026-01-23; markets and fundamentals can change quickly. If you want, I can add a quick fundamentals snapshot or news context.

The return (-8.46%) tells you direction. The drawdown (-18.07%) tells you the worst pain inside the window, which is usually what stakeholders actually care about when they say “risk”. Volatility (19.30% annualized) gives you a scale for how noisy the ride was. Also notice the window is explicitly pinned (2025-09-16 to 2026-01-23), which is what makes these metrics comparable and repeatable.

Demo 3: News-only “what changed” (no metrics unless required)

This is the “give me context fast” workflow. The output should stay narrative and not sneak in extra metrics since the query explicitly says not to.


resp = run_agent(
    "Ticker: AAPL.US. Pull 7 latest headlines. "
    "Give a short 'what changed' summary in 6-8 lines, referencing themes, not every headline. "
    "Do not compute returns unless needed."
)

Output:


================================================================================
QUERY:
Ticker: AAPL.US. Pull 7 latest headlines. Give a short 'what changed' summary in 6-8 lines, referencing themes, not every headline. Do not compute returns unless needed.

ANSWER:
## AAPL.US – 7 latest headlines: thematic quick read

What changed (themes)
- Macro risk is back in focus: futures drift lower ahead of the Fed meeting, with shutdown fears weighing sentiment.
- Policy risk remains a thread: tariffs and government shutdown headlines keep policy uncertainty front and center.
- Allocation debates persist: small-cap vs mega-cap discussions (IWO vs MGK) drive diversification talk.
- Growth vs value framing broadens: SPTM vs VTV and VOOG vs IWO highlight different exposure bets.
- Earnings season adds collateral volatility alongside macro noise.
- AI narrative gains traction: a founder profile signals growing interest in AI-enabled investment theme

This is doing the right kind of compression. It doesn’t list seven headlines and call it a day. It clusters them into themes (macro, policy, allocation, style drift, earnings, AI narrative). Also important, it respected the constraint. No return or risk metrics were pulled “just because”, which is exactly what you want if this is meant to be a quick context panel inside a product.

Build the Streamlit MVP

At this stage, the goal is not a perfect UI. It's a working product surface you can show to someone on your team.

A notebook is fine when you're the only user. The moment you want feedback from a PM, a founder, or anyone non-technical, you need something they can click through. Streamlit is the fastest way to wrap your copilot function into that kind of experience, without building a frontend stack.

UI Design: Query First, with Optional Parameters

The biggest change in the UI is making the query the primary input. That’s how people actually think. They don’t start with “60 trading days + fundamentals + headlines”. They start with a question.

So the sidebar should lead with a Query box where someone can type something like:

“Give me a 60-day brief on AAPL. Include fundamentals and 5 headlines.”

Then we keep the other controls as optional parameters. These aren’t the “main input”. They’re enforcement knobs. If your team wants every brief to always include fundamentals, you can force that. If you’re doing a risk-focused workflow, you can keep risk always on. If headlines are too noisy for your use case, you can switch them off.

Two-Pane Layout: Brief on the Left, Numbers on the Right

Once you hit “Generate”, you want the output to feel like a product screen, not a chat window.

Left side is the brief. It’s the thing you’d copy into Slack or drop into a weekly memo. It’s narrative and compressed.

Right side is the tool-backed artifacts. That’s where the trust comes from. You can scan the return window, the key fundamentals, the risk metrics, and the headline list without hunting through paragraphs. It also makes it obvious what the model actually pulled from tools versus what it wrote as interpretation.

i. App Skeleton

We’re not building logic here. We’re just defining the outer shell so the app feels like a small product surface instead of a notebook cell.

import streamlit as st
import pandas as pd
from copilot import run_query

st.set_page_config(page_title="Market Brief Copilot", layout="wide")

st.title("Market Brief Copilot")
st.caption("LangChain + EODHD. Minimal internal-style brief, with tool-backed metrics.")

The important line here is from copilot import run_query. This keeps the boundary clean. Streamlit stays a UI layer, and the copilot logic stays in copilot.py. That separation is what makes this reusable later if you decide to wrap the same backend inside FastAPI or a different internal UI.

st.set_page_config(..., layout="wide") is mostly a UX decision. Since we’re going to render a brief on the left and tool-backed metrics on the right, you want the wide layout so the output doesn’t feel cramped.

ii. Inputs Panel

This is the most important part of the UI, because it defines how the copilot is used.

The whole point of moving to a query-first design is that this matches how people actually ask for market context. They don’t think in terms of “checkboxes first”. They think in terms of “here’s my question”. The ticker and window still exist, but only as defaults. They’re there as guardrails when the query doesn’t specify them.

Then we add “Optional parameters” as a forcing layer. This is not for normal usage. This is for teams that want consistency. For example, you might want fundamentals always included in every brief, even if the query forgot to ask. Same for risk, or headlines.

with st.sidebar:
    st.header("Inputs")

    query = st.text_area("Query", value="For AAPL.US, compute total return over the last 60 trading days. Fetch PE and PB. Pull 5 latest headlines. Brief interpretation.")
    default_ticker = st.text_input("Default ticker (used only if query doesn't mention one)", value="AAPL.US")
    default_n_days = st.slider("Default trading days window (used only if query doesn't mention one)", min_value=20, max_value=180, value=60, step=5)

    st.divider()
    
    with st.expander("Optional parameters (force include)"):
        include_fund = st.checkbox("Fundamentals (PE, PB, etc.)", value=False)
        include_risk = st.checkbox("Risk metrics (volatility, drawdown)", value=False)
        include_news = st.checkbox("Headlines", value=False)
        news_limit = st.slider("Headline count", min_value=3, max_value=10, value=5, step=1, disabled=not include_news)

    run_btn = st.button("Generate brief", type="primary")

The query text area is the primary input. In the demo, you can literally paste the same kind of prompts you used in the agent test runs. That’s intentional. It keeps the product surface aligned with the real workflows this tool is meant for.

The default_ticker and default_n_days are secondary. They only matter when the query is vague. In a product setting, this matters more than it sounds. People will type “give me a 60-day brief” and forget to mention the ticker because they assume the context is already set. Defaults prevent the whole run from failing.

The expander is where the “team enforcement” idea lives. By keeping it collapsed by default, you’re not cluttering the UI for normal users. But the controls are still there when you want to run a consistent template, like “always include fundamentals and headlines for every brief”.

iii. Metrics Rendering

The brief is useful, but in a product you also need the numbers to be scannable and reusable.

So we treat the output as two layers:

Narrative (the markdown brief).
Structured artifacts (price window, fundamentals, risk, headlines).

The key is that we don’t want Streamlit to call EODHD again just to show metrics. The agent already called the tools once. So we extract those tool outputs from the agent messages and pass them straight to the UI.

Extracting tool outputs inside `copilot.py`

This helper walks through the LangGraph message list and pulls out anything that came from our tools. It gives us a single artifacts dict with consistent keys that the UI can render.

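from typing import Any, Dict, List

def _extract_artifacts(messages: List[Any]) -> Dict[str, Any]:
    # Keep only named (tool) messages and store each JSON payload under a stable key.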
    out: Dict[str, Any] = {}
    for m in messages:
        name = getattr(m, "name", None)
        content = getattr(m, "content", None)

        if not name:
            continue

        payload = _safe_json_loads(content)
        if payload is None:
            continue

        if name.endswith("last_n_days_prices"):
            out["price"] = payload
        elif name.endswith("fundamentals_snapshot"):
            out["valuation"] = payload
        elif name.endswith("risk_metrics"):
            out["risk"] = payload
        elif name.endswith("latest_news"):
            out["headlines"] = payload

    return out

This is the bridge between “agent world” and “UI world”. run_query() just calls this at the end and returns both brief_md and artifacts.

Rendering artifacts in `app.py`

On the Streamlit side, we keep rendering logic in one place. _render_metrics() takes the artifacts dict and turns it into a clean right-hand panel.

def _render_metrics(artifacts: dict):
    cols = st.columns(3)

    price = artifacts.get("price")
    valuation = artifacts.get("valuation")
    risk = artifacts.get("risk")
    headlines = artifacts.get("headlines")

    with cols[0]:
        st.subheader("Price window")
        if isinstance(price, dict) and "error" not in price:
            # "or 0.0" also covers a key that is present but null.
            st.metric("Total return", f"{(price.get('total_return') or 0.0) * 100:.2f}%")
            st.caption(f"{price.get('start_date')} to {price.get('end_date')} . N={price.get('n')}")
            # Show the raw payload as a vertical table, labeling the return as a decimal.
            st.write(
                pd.DataFrame([price]).rename(
                    columns={"total_return": "total_return (decimal)"}
                ).T
            )
        elif isinstance(price, dict) and "error" in price:
            st.warning(price["error"])
        else:
            st.info("No price tool output (not requested or tool not used).")

    with cols[1]:
        st.subheader("Fundamentals")
        if isinstance(valuation, dict) and "error" not in valuation:
            df = pd.DataFrame([valuation])
            keep = ["ticker", "name", "sector", "market_cap", "pe", "pb", "beta", "dividend_yield", "profit_margin"]
            keep = [c for c in keep if c in df.columns]
            st.write(df[keep].T)
        elif isinstance(valuation, dict) and "error" in valuation:
            st.warning(valuation["error"])
        else:
            st.info("No fundamentals tool output (not requested or tool not used).")

    with cols[2]:
        st.subheader("Risk")
        if isinstance(risk, dict) and "error" not in risk:
            # "or 0.0" also covers keys that are present but null.
            st.metric("Volatility (ann.)", f"{(risk.get('volatility_ann') or 0.0) * 100:.2f}%")
            st.metric("Max drawdown", f"{(risk.get('max_drawdown') or 0.0) * 100:.2f}%")
            st.caption(f"{risk.get('start_date')} to {risk.get('end_date')} . N={risk.get('n')}")
            st.write(pd.DataFrame([risk]).T)
        elif isinstance(risk, dict) and "error" in risk:
            st.warning(risk["error"])
        else:
            st.info("No risk tool output (not requested or tool not used).")

    st.subheader("Headlines")
    if isinstance(headlines, list) and len(headlines) > 0:
        for h in headlines:
            title = h.get("title", "Untitled")
            link = h.get("link")
            src = h.get("source")
            dt = h.get("date")
            line = f"- {title}"
            if src:
                line += f" ({src})"
            if dt:
                line += f" . {dt}"
            if link:
                st.markdown(f"{line}  \n  {link}")
            else:
                st.markdown(line)
    else:
        st.info("No headlines tool output (not requested or tool not used).")

This is why the whole app feels “product-ish”. The model can write a brief, but the UI can still show hard numbers in a predictable layout. Also, we’re not re-fetching anything. We’re only rendering what the tools already returned during the agent run.

iv. Wiring the UI to the Engine

At this point, the Streamlit app shouldn’t “think”. It should just collect inputs, call one function, and render whatever comes back.

Originally, copilot.py exposed run_brief(ticker, n_days, …). Once we moved to a query-first UI, that shape stopped making sense. So we updated the backend function to run_query(query, default_ticker, default_n_days, force_..., …). The app stays simple, but the engine becomes flexible enough to handle real product-style prompts.

This is the updated run_query function in copilot.py:

def run_query(
    query: str,
    default_ticker: str = "AAPL.US",
    default_n_days: int = 60,
    force_fundamentals: bool = True,
    force_risk: bool = False,
    force_news: bool = True,
    news_limit: int = 5,
) -> Tuple[str, Dict[str, Any]]:
    
    q = (query or "").strip()

    if not q:
        q = f"For {default_ticker}, compute total return over the last {int(default_n_days)} trading days."

    constraints = [
        "Constraints:",
        "1) Use tools for facts. Never invent numbers.",
        "2) Do not dump raw price rows or long news lists.",
        "3) Output in clean Markdown with sections: Snapshot, Metrics, What it might mean, Caveats.",
        "4) Keep it short and useful.",
        f"5) If the query does not specify a window, assume last {int(default_n_days)} trading days.",
        f"6) If the query does not specify a ticker, assume {normalize_ticker(default_ticker)}.",
    ]

    if force_fundamentals:
        constraints.append("7) You must include fundamentals (PE, PB, market cap, sector, beta). Use fundamentals_snapshot.")
    if force_risk:
        constraints.append("8) You must include risk metrics (annualized volatility and max drawdown). Use risk_metrics.")
        constraints.append("   Use the same start_date and end_date as the return window.")
    if force_news:
        constraints.append(f"9) You must include headlines. Pull exactly {int(news_limit)}. Use latest_news.")

    user_prompt = "User query:\n" + q + "\n\n" + "\n".join(constraints)

    response = AGENT.invoke(
        {"messages": [("system", system_prompt), ("user", user_prompt)]}
    )

    messages = response.get("messages", [])
    final_msg = messages[-1] if messages else None
    brief_md = getattr(final_msg, "content", "") or ""

    artifacts = _extract_artifacts(messages)
    return brief_md, artifacts

Here’s the core wiring inside app.py. It only runs when the user clicks the button.

if run_btn:
    with st.spinner("Running tools and generating brief..."):
        brief_md, artifacts = run_query(
            query=query,
            default_ticker=default_ticker,
            default_n_days=default_n_days,
            force_fundamentals=include_fund,
            force_risk=include_risk,
            force_news=include_news,
            news_limit=news_limit,
        )

    left, right = st.columns([1.2, 1])

    with left:
        st.subheader("Market brief")
        st.markdown(brief_md)

    with right:
        st.subheader("Tool-backed metrics")
        _render_metrics(artifacts)

else:
    st.info("Set inputs on the left and click **Generate brief**.")

The call returns two things, same idea as before. brief_md is the markdown brief you show on the left. artifacts are the tool outputs you render on the right without making extra API calls.

The important change is what the engine now expects. Instead of the UI building a “request_parts” prompt itself, the UI just passes the raw query and the enforcement flags. The enforcement logic lives inside run_query(), not inside Streamlit. That’s a cleaner separation. Your UI can evolve, but the product behavior stays consistent in one place.
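One small side effect of hard-coding the rule numbers (7, 8, 9) is that the list has gaps whenever a flag is off. If that bothers you, the enforcement rules can be built by a small pure function that numbers them in order. This is a sketch only: build_constraints is a hypothetical helper, not part of the original copilot.py, and it carries a trimmed subset of the rule text.

```python
from typing import List


def build_constraints(
    default_ticker: str,
    default_n_days: int,
    force_fundamentals: bool = False,
    force_risk: bool = False,
    force_news: bool = False,
    news_limit: int = 5,
) -> List[str]:
    """Hypothetical helper: collect the active rules, then number them sequentially."""
    rules = [
        "Use tools for facts. Never invent numbers.",
        f"If the query does not specify a window, assume last {int(default_n_days)} trading days.",
        f"If the query does not specify a ticker, assume {default_ticker}.",
    ]
    if force_fundamentals:
        rules.append("You must include fundamentals (PE, PB, market cap, sector, beta). Use fundamentals_snapshot.")
    if force_risk:
        rules.append("You must include risk metrics (annualized volatility and max drawdown). Use risk_metrics.")
    if force_news:
        rules.append(f"You must include headlines. Pull exactly {int(news_limit)}. Use latest_news.")
    # Number only the rules that are active, so there are never gaps.
    return ["Constraints:"] + [f"{i}) {rule}" for i, rule in enumerate(rules, start=1)]


print("\n".join(build_constraints("AAPL.US", 60, force_news=True, news_limit=3)))
```

Because the function has no side effects, you can unit test the enforcement behavior without invoking the agent at all.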

App Demo

This section shows a demo of the Streamlit MVP. These are example queries you can paste into the app to validate that the UI, tool calls, and brief output behave the way you expect.

Demo 1. Baseline brief (return + valuation + headlines)

This is the default “tell me what’s going on” query. It forces the copilot to combine price movement, a small fundamentals snapshot, and a few headlines to add context.

Query:

For AAPL.US, compute total return over the last 60 trading days. Fetch PE and PB. Pull 5 latest headlines. Brief interpretation.

Demo video: https://gumlet.tv/watch/6986e2cb4db88a967f4169a0/

Demo 2. Risk-first workflow (volatility + drawdown, no news)

This is the “how ugly did it get?” workflow. It’s useful when someone is checking risk exposure or explaining why a position feels painful even if the end-to-end return is not extreme.

Query:

For MSFT.US, last 90 trading days. Compute annualized volatility and max drawdown. Keep it short. No headlines.

Demo video: https://gumlet.tv/watch/6986e4b54db88a967f4190e4/

Demo 3. News-only context panel (themes, no extra metrics)

This is the fastest “what changed?” workflow. The point is narrative compression. It should not sneak in returns or risk metrics unless the query genuinely requires it.

Query:

For NVDA.US, pull 7 latest headlines. Summarize what changed in 6–8 lines. Reference themes, not every headline. Don’t compute returns unless needed.

Demo video: https://gumlet.tv/watch/6986e794924a60df4b1298c9/

Practical Notes

Things that will break in real usage

People will type messy symbols. Some will type aapl, some will type AAPL, some will paste AAPL.US with stray spaces around it. If you don’t normalize that upfront, you’ll spend time debugging “random” API failures. That’s why normalize_ticker() exists.
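As a reminder of the kind of cleanup involved, here's a minimal sketch. It assumes US listings by default and treats any dot as an existing exchange suffix. The function body is an illustrative stand-in, and the version in your copilot.py may differ.

```python
def normalize_ticker(raw: str, default_exchange: str = "US") -> str:
    """Illustrative stand-in: trim, uppercase, and append an exchange suffix if missing."""
    symbol = (raw or "").strip().upper()
    if not symbol:
        raise ValueError("Empty ticker")
    # Assumption: a dot means the user already supplied an exchange suffix.
    if "." not in symbol:
        symbol = f"{symbol}.{default_exchange}"
    return symbol


for raw in ["aapl", "AAPL", " AAPL.US "]:
    print(repr(raw), "->", normalize_ticker(raw))
```

The dot check is deliberately naive. Share-class symbols written with a dot would be mistaken for a ticker plus exchange, so a production version would need an explicit list of valid exchange codes.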

You’ll also hit missing data. Some tickers won’t have news. Some fundamentals fields will be null. Sometimes the price API returns nothing for the window. The tools already return small error objects. The Streamlit UI should surface that as warnings instead of crashing or silently showing blanks.

The biggest silent killer is tool cost. eod_prices is useful, but it’s the easiest way to slow down the app and bloat what the model sees. Keep it as an escape hatch. Default to compact tools like the 60-day summary, fundamentals snapshot, and headline list.

Finally, output drift is real. If you let the agent freestyle, it will start doing extra work and the format will slowly degrade. The fix is boring but effective. Keep the prompt strict, keep the toolset small, and keep the output format consistent.

Small extensions that fit this MVP

A simple next step is multi-ticker compare. Same query pattern, but for two or three tickers, then return a short side-by-side summary.
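A rough sketch of that loop is below. It assumes run_query keeps the signature shown earlier and that the price artifact exposes total_return (or an error key). compare_tickers and fake_run_query are hypothetical names used only for illustration, and the fake stands in for the real engine so the example runs offline with made-up numbers.

```python
from typing import Any, Callable, Dict, List, Tuple

RunQuery = Callable[..., Tuple[str, Dict[str, Any]]]


def compare_tickers(tickers: List[str], n_days: int, run_query: RunQuery) -> List[Dict[str, Any]]:
    """Hypothetical helper: run the same query per ticker and collect one row each."""
    rows = []
    for ticker in tickers:
        query = f"For {ticker}, compute total return over the last {n_days} trading days."
        _brief, artifacts = run_query(
            query=query,
            default_ticker=ticker,
            default_n_days=n_days,
            force_fundamentals=False,
            force_risk=False,
            force_news=False,
        )
        price = artifacts.get("price")
        price = price if isinstance(price, dict) else {}
        rows.append({
            "ticker": ticker,
            "total_return": price.get("total_return"),
            "error": price.get("error"),
        })
    return rows


def fake_run_query(query, default_ticker, **_flags):
    """Offline stand-in for copilot.run_query, returning made-up returns."""
    returns = {"AAPL.US": 0.042, "MSFT.US": -0.013}
    if default_ticker not in returns:
        return "", {"price": {"error": f"No data for {default_ticker}"}}
    return "brief", {"price": {"total_return": returns[default_ticker]}}


for row in compare_tickers(["AAPL.US", "MSFT.US", "XXXX.US"], 60, fake_run_query):
    print(row)
```

Passing run_query in as an argument keeps the comparison logic testable without an API key. In the app you'd pass the real function from copilot.py and render the rows with st.dataframe.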

You can also schedule briefs. Run a daily or weekly query for a watchlist and push the output to Slack or email. The core pattern stays the same.

Caching is another quick win. Cache tool results by (ticker, window) so repeated demos don’t keep hitting the APIs and the UI stays snappy.
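Here's one way that could look using only the standard library. It's a sketch under assumptions: fetch_price_summary is a hypothetical stand-in for a tool that calls the data API, and the call counter exists only to show that the second request never reaches it.

```python
from functools import lru_cache
from typing import Any, Dict

CALLS = {"count": 0}  # counts how often the "API" is actually hit


def fetch_price_summary(ticker: str, n_days: int) -> Dict[str, Any]:
    """Hypothetical stand-in for a tool that calls the data API."""
    CALLS["count"] += 1
    return {"ticker": ticker, "n": n_days, "total_return": 0.0}


@lru_cache(maxsize=256)
def cached_price_summary(ticker: str, n_days: int) -> tuple:
    # Store an immutable snapshot so callers can't mutate the cached value.
    return tuple(sorted(fetch_price_summary(ticker, n_days).items()))


def get_price_summary(ticker: str, n_days: int) -> Dict[str, Any]:
    # Normalize the key first, so "aapl.us" and "AAPL.US" share one cache entry.
    return dict(cached_price_summary(ticker.strip().upper(), int(n_days)))


get_price_summary("AAPL.US", 60)
get_price_summary(" aapl.us ", 60)  # served from the cache
print("API calls made:", CALLS["count"])
```

Note that lru_cache never expires entries, so a long-running process would keep serving stale prices. Inside Streamlit, decorating the fetch function with st.cache_data and a ttl is likely the more natural fit, since it handles expiry for you.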

If you want this to live inside a real product, wrap run_query() behind a FastAPI endpoint. Streamlit can stay as the demo shell, and your app can call the backend like any other internal service.

Conclusion

At this point, you have a working Market Copilot MVP. It takes a natural-language query, pulls the relevant facts through tools, and returns a short brief plus the underlying metrics that the UI can display. The main win is not the model response – it’s the repeatable workflow and the clean split between the engine and the app.

If you’re building a fintech product, this pattern maps well to a common need. Teams often already have the raw ingredients (like EODHD’s prices, fundamentals, news), but they sit across endpoints and dashboards. A small copilot layer can turn that into a consistent “market note” output that a PM, analyst, or sales team can reuse. It’s also a practical internal demo artifact because the numbers are visible and traceable, not buried behind a chat response.

From here, the best next step is to run it with real internal questions for a week and see what people keep asking for. That will tell you whether to add caching, multi-ticker comparisons, scheduled briefs, or an API wrapper. The MVP is already enough to test that loop without overbuilding.
