Memo 2: Market Models

Introduction

Financial markets generate enormous amounts of data.
The core problem in quantitative finance is simple to state but hard to solve:

How do we turn past information into useful predictions about future prices?

Different mathematical frameworks answer this question in different ways.

Broadly, three perspectives dominate modern quantitative finance:

Linear statistical models – treat returns as weighted sums of observable variables
Nonlinear machine learning models – learn complex patterns directly from data
Stochastic process models – describe price paths as random processes obeying statistical laws

Each framework reflects a different view of markets:

Linear models: relationships are simple and additive.
Machine learning: relationships may be complex, nonlinear, and regime‑dependent.
Stochastic models: prices are fundamentally random, but that randomness follows well-defined statistical laws.

Understanding these three perspectives helps connect:

how quantitative trading systems are built,
how machine learning models are trained, and
how derivative pricing and risk models are constructed.

1. Linear Models in Finance

Linear models are the simplest mathematical approach to predicting returns. They assume that future returns can be approximated by adding together the effects of several observable variables.

Typical inputs include:

recent returns
momentum indicators
volatility measures
macroeconomic variables

Each variable contributes some amount to the predicted return.

Linear prediction equation

y = w1*x1 + w2*x2 + ... + wn*xn + b

Where:

y = predicted future return
x1, ..., xn = input variables (features)
w1, ..., wn = weights assigned to each variable
b = baseline constant (intercept)

In simple terms:

predicted return = weighted sum of inputs + baseline level

To estimate the model, we choose weights that make the model’s predictions as close as possible to historical returns.

Least squares objective

min Σ (y_i − ŷ_i)²

This means selecting the weights and baseline constant that minimize the sum of squared differences between the observed returns y_i and the predicted returns ŷ_i.

Matrix form

y = Xβ + ε

Where:

y = vector of observed returns
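X = matrix of input variables (one row per observation, one column per variable, plus a column of ones when the baseline constant is included)
β = vector of coefficients (the weights, and the baseline constant if included)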
ε = random noise

The noise term represents the unpredictable component of market movements.
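As a minimal sketch of the estimation step, the snippet below fits β by least squares on synthetic data. The data, the "true" weights, and all variable names are illustrative assumptions rather than values from this memo, and NumPy's `lstsq` is just one common way to solve the problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (assumption): 500 observations of 3 hypothetical features.
n_obs, n_features = 500, 3
features = rng.normal(size=(n_obs, n_features))
true_weights = np.array([0.5, -0.2, 0.1])    # hypothetical w
true_intercept = 0.01                         # hypothetical b
noise = rng.normal(scale=0.05, size=n_obs)    # the epsilon term
returns = features @ true_weights + true_intercept + noise

# Prepend a column of ones so the intercept is estimated as part of beta.
X = np.column_stack([np.ones(n_obs), features])

# Solve min ||y - X beta||^2 for beta.
beta_hat, *_ = np.linalg.lstsq(X, returns, rcond=None)

residuals = returns - X @ beta_hat
print("estimated intercept:", beta_hat[0])
print("estimated weights:  ", beta_hat[1:])
print("residual std:       ", residuals.std())
```

Because the data are generated from a known linear rule, the estimated coefficients should land close to the hypothetical ones, and the residuals play the role of ε.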

Key Idea
Linear models treat returns as a weighted combination of observable variables (the expected return) plus an irreducible noise term.

In words:

the signal lives in the weighted sum Xβ, and
the unexplained randomness lives in ε.

Common financial examples include:

Autoregressive models:

R_t = α + β1*R_(t−1) + β2*R_(t−2) + ε_t

These models use past returns to estimate whether returns tend to persist or revert.
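The autoregressive equation above can be estimated with the same least squares machinery. The following sketch simulates a return series with hypothetical coefficients and then recovers them by regressing R_t on its two lags. The coefficient values, noise scale, and series length are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(2) return series with hypothetical coefficients.
alpha, beta1, beta2 = 0.0, 0.3, -0.1
n = 5000
r = np.zeros(n)
shocks = rng.normal(scale=0.01, size=n)
for t in range(2, n):
    r[t] = alpha + beta1 * r[t - 1] + beta2 * r[t - 2] + shocks[t]

# Regress R_t on a constant, R_(t-1), and R_(t-2).
y = r[2:]
X = np.column_stack([np.ones(n - 2), r[1:-1], r[:-2]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta1_hat, beta2_hat = coef

# A positive lag coefficient suggests persistence, a negative one reversion.
print(f"alpha={alpha_hat:.5f}, beta1={beta1_hat:.3f}, beta2={beta2_hat:.3f}")
```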

Factor models:

R_i = α + β1*F1 + β2*F2 + ... + βk*Fk + ε

Where factors might include market, value, size, or momentum.

These models are widely used in:

statistical arbitrage
factor investing
cross-sectional stock ranking
mean reversion strategies

Limitation:

Linear models cannot capture regime shifts or nonlinear relationships between variables unless those effects are added by hand, for example as interaction terms or regime indicators.

2. Nonlinear Machine Learning Models

Real markets often exhibit nonlinear behavior. For example:

momentum may work only in certain volatility regimes
signals may interact with one another
relationships may change across market environments

Machine learning models are designed to learn these complex patterns directly from data without assuming a simple linear form.

The general prediction function becomes:

y = f(x1, x2, ..., xn)

Where f(·) is an unknown function learned from data.

Decision tree models divide the feature space into regions and assign predictions to each region.

Conceptually this looks like:

IF volatility < threshold
AND momentum > threshold
→ positive expected return

Mathematically:

f(x) = Σ γ_m * I(x ∈ R_m)

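Where R_m is a region of the feature space defined by a set of conditions, γ_m is the prediction assigned to that region, and I(·) equals 1 when x falls inside R_m and 0 otherwise.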
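To make the region formula concrete, here is a small hand-built example in Python. The thresholds, the region values, and the feature names `volatility` and `momentum` are hypothetical; a real tree would learn them from data rather than have them written by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]

@dataclass
class Region:
    contains: Callable[[Features], bool]  # the indicator I(x in R_m)
    value: float                          # the prediction gamma_m

VOL_THRESHOLD = 0.20  # hypothetical volatility threshold
MOM_THRESHOLD = 0.00  # hypothetical momentum threshold

# Three disjoint regions that together cover the whole feature space.
regions: List[Region] = [
    Region(lambda x: x["volatility"] < VOL_THRESHOLD
           and x["momentum"] > MOM_THRESHOLD, 0.002),
    Region(lambda x: x["volatility"] < VOL_THRESHOLD
           and x["momentum"] <= MOM_THRESHOLD, -0.001),
    Region(lambda x: x["volatility"] >= VOL_THRESHOLD, 0.0),
]

def tree_predict(x: Features) -> float:
    """f(x) = sum over m of gamma_m * I(x in R_m)."""
    return sum(region.value * float(region.contains(x)) for region in regions)

calm_uptrend = {"volatility": 0.10, "momentum": 0.05}
print(tree_predict(calm_uptrend))  # falls in the first region
```

Since the regions do not overlap, exactly one indicator is 1 for any input, so the sum simply picks out the value of the region the input falls in.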

Modern machine learning models typically combine many trees.

Random Forest:

f(x) = (1/M) Σ T_m(x)

The model averages the predictions T_m(x) of M trees, each built on a different random resample of the data (and typically a random subset of the features).
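The averaging idea can be sketched in a few lines of plain Python. This is a simplified illustration, not a full Random Forest: it uses depth-1 trees (stumps) on a single synthetic feature, so only the bootstrap resampling is shown and the per-split feature subsampling is omitted. The data and the helper names `fit_stump`, `fit_forest`, and `forest_predict` are assumptions.

```python
import random
from statistics import fmean

random.seed(0)

# Synthetic 1-D data (assumption): the target steps from -1 to +1 at x = 0.5.
xs = [random.random() for _ in range(200)]
ys = [(1.0 if x > 0.5 else -1.0) + random.gauss(0, 0.3) for x in xs]

def fit_stump(xs, ys):
    """Fit a depth-1 tree: choose the split with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, split, low, high = best
    return lambda x: low if x < split else high

def fit_forest(xs, ys, n_trees=25):
    """Train each tree on its own bootstrap sample (drawn with replacement)."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def forest_predict(trees, x):
    """f(x) = (1/M) * sum of T_m(x)."""
    return fmean(tree(x) for tree in trees)

trees = fit_forest(xs, ys)
print(forest_predict(trees, 0.1), forest_predict(trees, 0.9))
```

Each stump sees slightly different data and therefore picks a slightly different split, and averaging smooths out those individual differences.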

Gradient Boosting:

f(x) = Σ γ_m * h_m(x)

Each new model h_m is fit to the errors left by the models before it, and γ_m controls how much of that correction is added to the running prediction.
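The error-correcting loop can be sketched as follows. This is a toy version under stated assumptions: squared-error loss, depth-1 trees on one synthetic feature, a constant starting prediction, and a fixed learning rate standing in for every γ_m. Libraries such as XGBoost and LightGBM add many refinements that are not shown here.

```python
import math
from statistics import fmean

# Synthetic 1-D data (assumption): a nonlinear target no single line can fit.
xs = [i / 100 for i in range(100)]
ys = [math.sin(2 * math.pi * x) for x in xs]

def fit_stump(xs, targets):
    """Fit a depth-1 regression tree by scanning every candidate split."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, split, low, high = best
    return lambda x: low if x < split else high

def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Additive model: f(x) = base + sum over m of learning_rate * h_m(x)."""
    base = fmean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # With squared loss, the errors of the current model are its residuals.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def boost_predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * h(x) for h in stumps)

def mse(predict):
    return fmean((y - predict(x)) ** 2 for x, y in zip(xs, ys))

model = fit_boosting(xs, ys)
mse_constant = mse(lambda x: model[0])
mse_boosted = mse(lambda x: boost_predict(model, x))
print(f"MSE of constant model: {mse_constant:.4f}")
print(f"MSE after boosting:    {mse_boosted:.4f}")
```

The errors reported here are in-sample. In practice the number of rounds and the learning rate would be chosen on held-out data to limit overfitting.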

Popular implementations include XGBoost and LightGBM.

These models are widely used for:

alpha factor discovery
market regime classification
alternative data modeling
order flow prediction

Key Idea
Machine learning models approximate an unknown function f(x) that may be highly nonlinear and interaction-heavy, using large historical datasets.

Advantages:

capture nonlinear relationships and interactions between variables
flexible functional form

Disadvantages:

require large datasets
need careful validation to avoid overfitting
can be less interpretable than linear models

At a high level, linear models answer “How much does each variable matter on average?”
Machine learning models answer “What complex combinations of variables explain returns?”

3. Stochastic / Brownian Motion Models

While machine learning focuses on prediction, classical financial theory focuses on how prices evolve through time.

The central idea is that price changes are largely driven by random shocks.
These shocks are modeled using stochastic processes.

The most important example is Brownian motion, which represents a continuous stream of random fluctuations.

You can think of price movements as the accumulation of many small unpredictable shocks.

Stochastic differential equation for prices

dS = μ*S*dt + σ*S*dW

Where:

S = asset price
μ = average growth rate (drift)
σ = volatility
dW = increment of Brownian motion over the time step dt (the random shock)

Increments of Brownian motion have two key properties:

E[dW] = 0
Var(dW) = dt

This means shocks average to zero, while their variance grows in proportion to the length of the time step.
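These two properties can be checked numerically. The sketch below draws many increments as normal random variables with mean 0 and variance dt, then compares the sample moments with the theoretical ones. The step size of one trading day out of 252 and the number of draws are assumptions.

```python
import math
import random
from statistics import fmean

random.seed(0)

dt = 1 / 252        # one trading day measured in years (assumption)
n_draws = 100_000

# Each increment dW is normal with mean 0 and standard deviation sqrt(dt).
increments = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n_draws)]

sample_mean = fmean(increments)
sample_var = fmean((dw - sample_mean) ** 2 for dw in increments)

print(f"sample mean:     {sample_mean:.6f} (theory: 0)")
print(f"sample variance: {sample_var:.6f} (theory: dt = {dt:.6f})")
```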

This equation defines Geometric Brownian Motion. Applying Itô's lemma to ln S and integrating gives the closed-form solution:

S(t) = S(0) * exp((μ − σ²/2)t + σW_t)
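For readers who want the intermediate steps, the solution can be derived in a few lines, assuming W_0 = 0. The −σ²/2 term is the Itô correction, which arises because (dW)² behaves like dt rather than vanishing.

```latex
\begin{aligned}
dS &= \mu S\,dt + \sigma S\,dW \\
d(\ln S) &= \frac{1}{S}\,dS - \frac{1}{2S^{2}}\,(dS)^{2}
          && \text{(It\^o's lemma)} \\
         &= \left(\mu - \tfrac{1}{2}\sigma^{2}\right)dt + \sigma\,dW
          && \text{since } (dS)^{2} = \sigma^{2}S^{2}\,dt \\
\ln S(t) - \ln S(0) &= \left(\mu - \tfrac{1}{2}\sigma^{2}\right)t + \sigma W_t \\
S(t) &= S(0)\,\exp\!\left(\left(\mu - \tfrac{1}{2}\sigma^{2}\right)t + \sigma W_t\right)
\end{aligned}
```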

Under this assumption:

log returns are normally distributed
prices follow a log-normal distribution
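A short Monte Carlo sketch illustrates both statements. It samples terminal prices directly from the closed-form solution and compares the sample averages with their theoretical values. The starting price, drift, volatility, horizon, and number of paths are all hypothetical.

```python
import math
import random
from statistics import fmean

random.seed(0)

s0, mu, sigma = 100.0, 0.08, 0.20   # hypothetical price, drift, volatility
horizon = 1.0                        # one year
n_paths = 50_000

def terminal_price(s0, mu, sigma, t):
    """Sample S(t) = S(0) * exp((mu - sigma^2 / 2) * t + sigma * W_t)."""
    w_t = random.gauss(0.0, math.sqrt(t))  # W_t is normal with variance t
    return s0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * w_t)

prices = [terminal_price(s0, mu, sigma, horizon) for _ in range(n_paths)]
log_returns = [math.log(p / s0) for p in prices]

mean_log_return = fmean(log_returns)
theory_log_return = (mu - 0.5 * sigma**2) * horizon
mean_price = fmean(prices)
theory_price = s0 * math.exp(mu * horizon)

print(f"mean log return: {mean_log_return:.4f} (theory {theory_log_return:.4f})")
print(f"mean price:      {mean_price:.2f} (theory {theory_price:.2f})")
```

The mean log return sits below μ by σ²/2, while the mean price still grows at rate μ. That gap is a direct consequence of the log-normal shape.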

This framework forms the mathematical foundation of the Black-Scholes option pricing model.
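To show how the GBM assumption feeds into pricing, here is a minimal sketch of the standard Black-Scholes formula for a European call on a non-dividend-paying asset. The formula is not stated in this memo and is included only as an illustration. Note that for pricing, the drift μ is replaced by the risk-free rate. The function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

def black_scholes_call(spot, strike, rate, sigma, maturity):
    """European call price on a non-dividend-paying asset under GBM."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * sigma**2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    discounted_strike = strike * math.exp(-rate * maturity)
    return spot * std_normal.cdf(d1) - discounted_strike * std_normal.cdf(d2)

# Hypothetical inputs: at-the-money call, 5% rate, 20% volatility, one year.
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05,
                           sigma=0.20, maturity=1.0)
print(f"call price: {price:.4f}")
```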

Interpretation
Brownian motion models price changes as the accumulation of many small random shocks with well-defined statistical properties.

4. Comparison of the Three Frameworks

Assumptions about markets

Linear models
Returns are linear combinations of observable features, plus noise.

Machine learning models
Returns depend on complex nonlinear interactions in data.

Stochastic models
Prices evolve as random processes with statistical structure.

Interpretability

Linear models: high interpretability
Machine learning: lower interpretability
Stochastic models: interpretable parameters for pricing and risk

Applications

Linear models
Factor investing, statistical arbitrage, mean reversion.

Machine learning
Alpha discovery, regime detection, alternative data.

Stochastic models
Derivative pricing, risk modeling, Monte Carlo simulation.

In simple terms:

Linear models assume markets behave like weighted sums of signals.
Machine learning assumes markets contain complex patterns hidden in data.
Stochastic models assume markets behave like random processes governed by probability.

Decision-tree view: choosing a framework

A useful way to summarize the three perspectives is as a simple decision tree:

Start: What problem are you solving?

  │
  ├── Predict next-period returns on many instruments?
  │     │
  │     ├── Need interpretability and stable relationships?
  │     │       → Use LINEAR MODELS (factor / regression)
  │     │
  │     └── Expect nonlinear interactions and regime effects?
  │             → Use MACHINE LEARNING MODELS (trees / boosting)
  │
  └── Price derivatives or quantify path-dependent risk?
        │
        └── Model price dynamics as a stochastic process
                → Use STOCHASTIC MODELS (e.g., GBM, diffusion models)

This decision tree is not prescriptive, but it highlights that:

forecasting tasks often start with linear or machine learning models, and
pricing and risk tasks often lean on stochastic process models.

Conclusion

Modern quantitative trading rarely relies on just one framework.

Instead, successful systems often combine them.

Linear models provide interpretable baseline signals.

Machine learning models capture complex nonlinear patterns.

Stochastic models underpin derivative pricing, risk management, and portfolio simulation.

Choosing the right framework depends on the problem being solved.

Prediction problems often combine linear and machine learning models.

Pricing and risk management rely heavily on stochastic process models.

Understanding these perspectives provides a foundation for modern quantitative finance.