Quant Memo #2
Memo 2: Market Models
Written by
Priyanshu Priyank
Overview of linear, machine learning, and stochastic process models for understanding and modeling financial markets.
Reading Note
This memo introduces several mathematical frameworks used in quantitative finance. The goal is conceptual understanding rather than full mathematical detail. Equations are included for completeness, but the key ideas can be understood without following every derivation.
Introduction
Financial markets generate enormous amounts of data.
The core problem in quantitative finance is simple to state but hard to solve:
How do we turn past information into useful predictions about future prices?
Different mathematical frameworks answer this question in different ways.
Broadly, three perspectives dominate modern quantitative finance:
- Linear statistical models – treat returns as weighted sums of observable variables
- Nonlinear machine learning models – learn complex patterns directly from data
- Stochastic process models – describe price paths as random processes obeying statistical laws
Each framework reflects a different view of markets:
- Linear models: relationships are simple and additive.
- Machine learning: relationships may be complex, nonlinear, and regime‑dependent.
- Stochastic models: prices are fundamentally random, but the randomness follows well-defined distributions and dynamics.
Understanding these three perspectives helps connect:
- how quantitative trading systems are built,
- how machine learning models are trained, and
- how derivative pricing and risk models are constructed.
1. Linear Models in Finance
Linear models are the simplest mathematical approach to predicting returns. They assume that future returns can be approximated by adding together the effects of several observable variables.
Typical inputs include:
- recent returns
- momentum indicators
- volatility measures
- macroeconomic variables
Each variable contributes some amount to the predicted return.
Linear prediction equation
y = w1*x1 + w2*x2 + ... + wn*xn + b
Where:
- y = predicted future return
- x = input variables (features)
- w = weight assigned to each variable
- b = baseline constant
In simple terms:
predicted return = weighted sum of inputs + baseline level
To estimate the model, we choose weights that make the model’s predictions as close as possible to historical returns.
Least squares objective
min Σ (y_i − ŷ_i)²
This means selecting the parameters that minimize prediction error.
Matrix form
y = Xβ + ε
Where:
- y = vector of observed returns
- X = matrix of input variables
- β = vector of coefficients
- ε = random noise
The noise term represents the unpredictable component of market movements.
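As a minimal sketch of the least squares fit described above, the following Python snippet simulates data from y = Xβ + ε and recovers the weights with NumPy. The number of features, the "true" weights, and the noise level are illustrative assumptions, not values taken from any real market.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 500 observations of 3 hypothetical features.
n_obs, n_features = 500, 3
X = rng.normal(size=(n_obs, n_features))
true_w = np.array([0.5, -0.2, 0.1])   # assumed "true" weights
true_b = 0.01                          # assumed baseline constant
noise = rng.normal(scale=0.05, size=n_obs)
y = X @ true_w + true_b + noise

# Append a column of ones so the baseline b is estimated as part of beta.
X_design = np.column_stack([X, np.ones(n_obs)])

# Least squares: choose beta to minimise sum((y - X_design @ beta) ** 2).
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
w_hat, b_hat = beta[:-1], beta[-1]

# What the model cannot explain is the estimated noise term.
residuals = y - X_design @ beta
print("estimated weights: ", w_hat)
print("estimated baseline:", b_hat)
print("residual std:      ", residuals.std())
```

With real returns the noise term is usually far larger relative to the signal than in this toy setup, so estimated weights tend to be much less precise.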
Key Idea
Linear models treat expected returns as a weighted combination of observable variables, plus an irreducible noise term.
In words:
- the signal lives in the weighted sum Xβ, and
- the unexplained randomness lives in ε.
Common financial examples include:
Autoregressive models:
R_t = α + β1*R_(t−1) + β2*R_(t−2) + ε_t
These models use past returns to estimate whether returns tend to persist or revert.
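The autoregressive equation above can be estimated with the same least squares machinery by treating lagged returns as features. The sketch below simulates an AR(2) series with assumed coefficients and then recovers them. The coefficient values and noise scale are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate R_t = alpha + beta1 * R_(t-1) + beta2 * R_(t-2) + eps_t
# with assumed (illustrative) coefficients.
alpha, beta1, beta2 = 0.0, 0.3, -0.1
n = 5000
r = np.zeros(n)
eps = rng.normal(scale=0.01, size=n)
for t in range(2, n):
    r[t] = alpha + beta1 * r[t - 1] + beta2 * r[t - 2] + eps[t]

# Build the regression: target R_t against [1, R_(t-1), R_(t-2)].
target = r[2:]
lags = np.column_stack([np.ones(n - 2), r[1:-1], r[:-2]])
coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
alpha_hat, beta1_hat, beta2_hat = coef

print("alpha:", alpha_hat, "beta1:", beta1_hat, "beta2:", beta2_hat)
```

A positive estimated coefficient on a lag suggests persistence at that lag, while a negative one suggests reversion.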
Factor models:
R_i = α + β1*F1 + β2*F2 + ... + βk*Fk + ε
Where factors might include market, value, size, or momentum.
These models are widely used in:
- statistical arbitrage
- factor investing
- cross-sectional stock ranking
- mean reversion strategies
Limitation:
Linear models cannot easily capture regime shifts or nonlinear relationships between variables.
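To make this limitation concrete, the following sketch builds a toy target that depends only on the product of two signals. A linear model on the raw signals explains almost none of it, while adding the interaction term by hand fixes the fit. The data-generating rule is an assumption chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical target driven purely by an interaction between the signals.
y = x1 * x2 + rng.normal(scale=0.1, size=n)

def r_squared(features, target):
    """Fit least squares with an intercept and return the in-sample R^2."""
    design = np.column_stack([features, np.ones(len(target))])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return 1.0 - resid.var() / target.var()

r2_linear = r_squared(np.column_stack([x1, x2]), y)
r2_with_interaction = r_squared(np.column_stack([x1, x2, x1 * x2]), y)

print("R^2, raw signals only:     ", r2_linear)
print("R^2, with interaction term:", r2_with_interaction)
```

The linear model only recovers the relationship once the interaction is supplied as an explicit feature, which requires knowing in advance that it exists.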
2. Nonlinear Machine Learning Models
Real markets often exhibit nonlinear behavior. For example:
- momentum may work only in certain volatility regimes
- signals may interact with one another
- relationships may change across market environments
Machine learning models are designed to learn these complex patterns directly from data without assuming a simple linear form.
The general prediction function becomes:
y = f(x1, x2, ..., xn)
Where f(·) is an unknown function learned from data.
Decision tree models divide the feature space into regions and assign predictions to each region.
Conceptually this looks like:
IF volatility < threshold
AND momentum > threshold
→ positive expected return
Mathematically:
f(x) = Σ γ_m * I(x ∈ R_m)
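Where R_m is a region defined by a set of conditions on the inputs, γ_m is the prediction assigned to that region, and I(·) equals 1 when x falls in R_m and 0 otherwise.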
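The region-based formula can be written out directly in Python. The sketch below hard-codes three regions rather than learning them from data. The thresholds and the γ values are hypothetical numbers chosen only to mirror the volatility and momentum example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Region:
    """One region R_m together with its constant prediction gamma_m."""
    contains: Callable[[Dict[str, float]], bool]  # the indicator I(x in R_m)
    gamma: float

# Hypothetical thresholds and predictions, for illustration only.
VOL_THRESHOLD = 0.20
MOM_THRESHOLD = 0.0

regions: List[Region] = [
    Region(lambda x: x["volatility"] < VOL_THRESHOLD and x["momentum"] > MOM_THRESHOLD,
           gamma=0.002),
    Region(lambda x: x["volatility"] < VOL_THRESHOLD and x["momentum"] <= MOM_THRESHOLD,
           gamma=0.0),
    Region(lambda x: x["volatility"] >= VOL_THRESHOLD,
           gamma=-0.001),
]

def predict(x: Dict[str, float]) -> float:
    # f(x) = sum over m of gamma_m * I(x in R_m).
    # The regions are disjoint and cover every input, so exactly one
    # indicator equals 1 for any x.
    return sum(region.gamma * region.contains(x) for region in regions)

print(predict({"volatility": 0.10, "momentum": 0.05}))
print(predict({"volatility": 0.30, "momentum": 0.05}))
```

A fitted decision tree has the same form, except that the split thresholds and region values are chosen by the training algorithm instead of by hand.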
Modern machine learning models typically combine many trees.
Random Forest:
f(x) = (1/M) Σ T_m(x)
The model averages the predictions of M trees, each built on a different bootstrap sample of the data (and typically a random subset of the features).
Gradient Boosting:
f(x) = Σ γ_m * h_m(x)
Each new model h_m is fit to the errors left by the models before it, and γ_m scales its contribution to the total.
Popular implementations include XGBoost and LightGBM.
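The two ensemble rules above can be compared on toy data using the same weak learner. The sketch below is a deliberately stripped-down stand-in: it uses depth-1 trees ("stumps"), plain bootstrap averaging in place of a full random forest, and a constant γ_m (a fixed learning rate) for boosting. It is not how XGBoost or LightGBM are implemented, and the data-generating rule and all parameter values are assumptions.

```python
import numpy as np

def fit_stump(X, y):
    """Depth-1 regression tree: the single split that minimises squared error."""
    best = None
    for j in range(X.shape[1]):
        for threshold in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= threshold
            if left.all() or not left.any():
                continue
            left_mean, right_mean = y[left].mean(), y[~left].mean()
            sse = ((y[left] - left_mean) ** 2).sum() + ((y[~left] - right_mean) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # degenerate sample: predict the mean everywhere
        return 0, np.inf, y.mean(), y.mean()
    return best[1:]

def predict_stump(stump, X):
    j, threshold, left_mean, right_mean = stump
    return np.where(X[:, j] <= threshold, left_mean, right_mean)

def mse(pred, y):
    return float(np.mean((y - pred) ** 2))

rng = np.random.default_rng(3)
n, M = 1000, 50
X = rng.normal(size=(n, 2))
# Hypothetical target: two additive step effects plus noise.
y = np.sign(X[:, 0]) + np.sign(X[:, 1]) + rng.normal(scale=0.5, size=n)

# Baseline: a single stump fit once on all the data.
single_pred = predict_stump(fit_stump(X, y), X)

# Bagging: f(x) = (1/M) * sum of T_m(x), each T_m fit on a bootstrap sample.
forest = []
for _ in range(M):
    idx = rng.integers(0, n, size=n)
    forest.append(fit_stump(X[idx], y[idx]))
forest_pred = np.mean([predict_stump(t, X) for t in forest], axis=0)

# Boosting: f(x) = mean(y) + sum of gamma * h_m(x), each h_m fit to residuals.
learning_rate = 0.3          # constant gamma_m, an assumed value
boost_pred = np.full(n, y.mean())
for _ in range(M):
    residual = y - boost_pred
    boost_pred += learning_rate * predict_stump(fit_stump(X, residual), X)

print("single stump MSE:  ", mse(single_pred, y))
print("bagged stumps MSE: ", mse(forest_pred, y))
print("boosted stumps MSE:", mse(boost_pred, y))
```

These are in-sample errors, so they say nothing about overfitting. In practice the comparison should be made on held-out data.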
These models are widely used for:
- alpha factor discovery
- market regime classification
- alternative data modeling
- order flow prediction
Key Idea
Machine learning models approximate an unknown function f(x) that may be highly nonlinear and interaction-heavy, using large historical datasets.
Advantages:
- capture nonlinear relationships and interactions between variables
- flexible functional form
Disadvantages:
- require large datasets
- need careful validation to avoid overfitting
- can be less interpretable than linear models
At a high level, linear models answer “How much does each variable matter on average?”
Machine learning models answer “What complex combinations of variables explain returns?”
3. Stochastic / Brownian Motion Models
While machine learning focuses on prediction, classical financial theory focuses on how prices evolve through time.
The central idea is that price changes are largely driven by random shocks.
These shocks are modeled using stochastic processes.
The most important example is Brownian motion, which represents a continuous stream of random fluctuations.
You can think of price movements as the accumulation of many small unpredictable shocks.
Stochastic differential equation for prices
dS = μ*S*dt + σ*S*dW
Where:
- S = asset price
- μ = average growth rate (drift)
- σ = volatility
- dW = Brownian motion shock
The Brownian increment dW has two key properties:
E[dW] = 0
Var(dW) = dt
This means each shock has zero mean and a variance proportional to the length of the time step.
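These two properties can be checked numerically by drawing Brownian increments as normal random variables with standard deviation sqrt(dt). The daily step size of 1/252 years is an assumed convention, and the sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

dt = 1 / 252            # one trading day in years (assumed convention)
n_steps = 200_000

# Each increment dW is drawn as Normal(mean 0, variance dt).
dW = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n_steps)

print("sample mean of dW:    ", dW.mean())
print("sample variance of dW:", dW.var())
print("dt:                   ", dt)

# Summing the increments gives one simulated Brownian path W_t.
W = np.cumsum(dW)
```

The sample mean should sit close to zero and the sample variance close to dt, with the gap shrinking as more increments are drawn.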
Solving this equation produces the Geometric Brownian Motion model:
S(t) = S(0) * exp((μ − σ²/2)t + σW_t)
Under this assumption:
- log returns are normally distributed
- prices follow a log-normal distribution
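The closed-form solution makes simulation straightforward, because terminal prices can be sampled directly without stepping through time. The sketch below draws many terminal prices and compares the sample statistics with the values implied by the formula. The starting price, drift, and volatility are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed illustrative parameters (annualised).
S0, mu, sigma = 100.0, 0.08, 0.20
T, n_paths = 1.0, 100_000

# W_T is Normal(0, T), so S(T) can be sampled in a single step.
W_T = rng.normal(scale=np.sqrt(T), size=n_paths)
S_T = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)

# Log returns should be normal with mean (mu - sigma^2 / 2) * T
# and standard deviation sigma * sqrt(T).
log_ret = np.log(S_T / S0)
print("mean log return:", log_ret.mean(), "vs", (mu - 0.5 * sigma**2) * T)
print("std log return: ", log_ret.std(), "vs", sigma * np.sqrt(T))

# The average price grows at the drift rate: E[S(T)] = S0 * exp(mu * T).
print("mean price:     ", S_T.mean(), "vs", S0 * np.exp(mu * T))
```

The mean log return is lower than μT by σ²T/2, which is the correction term that appears in the exponent of the solution.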
This framework forms the mathematical foundation of the Black-Scholes option pricing model.
Interpretation
Brownian motion models price changes as the accumulation of many small random shocks with well-defined statistical properties.
4. Comparison of the Three Frameworks
Assumptions about markets
Linear models
Returns are linear combinations of observable features.
Machine learning models
Returns depend on complex nonlinear interactions in data.
Stochastic models
Prices evolve as random processes with statistical structure.
Interpretability
- Linear models: high interpretability
- Machine learning: lower interpretability
- Stochastic models: interpretable parameters for pricing and risk
Applications
Linear models
Factor investing, statistical arbitrage, mean reversion.
Machine learning
Alpha discovery, regime detection, alternative data.
Stochastic models
Derivative pricing, risk modeling, Monte Carlo simulation.
In simple terms:
- Linear models assume returns behave like weighted sums of signals.
- Machine learning assumes markets contain complex patterns hidden in data.
- Stochastic models assume markets behave like random processes governed by probability.
Decision-tree view: choosing a framework
A useful way to summarize the three perspectives is as a simple decision tree:
Start: What problem are you solving?
│
├── Predict next-period returns on many instruments?
│   │
│   ├── Need interpretability and stable relationships?
│   │       → Use LINEAR MODELS (factor / regression)
│   │
│   └── Expect nonlinear interactions and regime effects?
│           → Use MACHINE LEARNING MODELS (trees / boosting)
│
└── Price derivatives or quantify path-dependent risk?
    │
    └── Model price dynamics as a stochastic process
            → Use STOCHASTIC MODELS (e.g., GBM, diffusion models)
This decision tree is not prescriptive, but it highlights that:
- forecasting tasks often start with linear or machine learning models, and
- pricing and risk tasks often lean on stochastic process models.
Conclusion
Modern quantitative trading rarely relies on just one framework.
Instead, successful systems often combine them.
Linear models provide interpretable baseline signals.
Machine learning models capture complex nonlinear patterns.
Stochastic models underpin derivative pricing, risk management, and portfolio simulation.
Choosing the right framework depends on the problem being solved.
Prediction problems often combine linear and machine learning models.
Pricing and risk management rely heavily on stochastic process models.
Understanding these perspectives provides a foundation for modern quantitative finance.
Key Takeaway
Modern quantitative finance combines multiple perspectives:
- Linear models provide interpretable signals and factor exposures.
- Machine learning captures nonlinear interactions and complex patterns.
- Stochastic models underpin derivative pricing and risk management.
Different problems require different frameworks.