using Plots
using LinearAlgebra
using Statistics
using Distributions
using RandomRust (1987): Optimal Engine Replacement
1 Introduction
1.1 From McCall to Rust
In the McCall search model, an unemployed worker receives random wage offers each period and faces a binary choice: accept the offer (become employed) or reject it (keep searching). We solved this problem using dynamic programming and the Bellman equation, and the key insight was the existence of a reservation wage—the worker accepts any offer above it and rejects anything below.
Rust (1987) poses a structurally similar question in a very different setting: when should a bus superintendent stop maintaining an engine and replace it? Both problems are dynamic discrete choice problems solved by dynamic programming, but Rust’s contribution goes further—he shows how to estimate the structural parameters of such a model from observed data.
The parallels between the two models are instructive:
| McCall | Rust | |
|---|---|---|
| Agent | Unemployed worker | Bus superintendent |
| State | Wage offer | Accumulated mileage |
| Choice | Accept / Reject | Replace / Keep |
| Tradeoff | Good wage now vs. better offer later | Replacement cost vs. rising maintenance |
1.2 Harold Zurcher
The agent in Rust’s model is Harold Zurcher, the bus maintenance superintendent at the Madison Metropolitan Bus Company. Zurcher manages a fleet of GMC buses, and each period he must decide: replace the engine or keep running it?
Replacing is expensive but resets the engine to new condition. Keeping is cheap today but maintenance costs rise with mileage.
1.3 Why This Paper Matters
Rust (1987, Econometrica) is a foundational paper in structural econometrics. It shows how to estimate a dynamic programming model from observed data and introduced the Nested Fixed Point (NFXP) algorithm. The framework has been applied across economics to study firm entry/exit decisions, investment under uncertainty, health decisions, migration, and education.
1.4 Roadmap
These notes cover four main topics:
- The Environment — states, actions, payoffs, and mileage transitions
- Choices with Extreme Value Shocks — how unobserved shocks lead to logit probabilities
- Solving the DP — Bellman equation and value function iteration
- Estimation — recovering structural parameters from data using NFXP
2 The Environment
2.1 The Agent’s Problem
Zurcher manages a fleet of buses, each characterized by its accumulated mileage \(x_t\). Mileage is discretized into \(S\) bins (e.g., each bin = 5,000 miles). Each period, Zurcher observes \(x_t\) and chooses:
- \(d_t = 0\): Keep the current engine
- \(d_t = 1\): Replace the engine (mileage resets to 0)
2.1.1 Per-Period Payoffs
The per-period payoffs depend on the action chosen. If the agent keeps the engine (\(d_t = 0\)), the payoff is the negative of the maintenance cost:
\[ u(x_t, 0) = -c(x_t, \theta_1) \]
If the agent replaces the engine (\(d_t = 1\)), the payoff is the negative of the replacement cost, and the engine resets to new:
\[ u(x_t, 1) = -RC \]
These are the payoffs the econometrician can compute from data on mileage and costs.
2.1.2 Unobserved Components
In reality, the agent also considers factors we cannot see: weather conditions, driver complaints, parts availability, gut feeling, and so on. We model these as random shocks \(\varepsilon_t(d)\) added to each option:
\[ \tilde{u}(x_t, d) = u(x_t, d) + \varepsilon_t(d) \]
The key assumptions are:
- \(\varepsilon_t(0)\) and \(\varepsilon_t(1)\) are iid across options and time
- Independent of the observed state \(x_t\) (Conditional Independence)
What distribution should we assume for \(\varepsilon\)? This is a modeling choice—and it turns out to matter a lot. We will tackle this in the next section.
2.1.3 The Cost Function
We use a linear specification for the maintenance cost function: \(c(x, \theta_1) = \theta_1 \cdot x\). Higher mileage means higher maintenance costs, and at \(x = 0\) (a new engine) the maintenance cost is zero. The model has two key structural parameters to estimate:
- \(\theta_1\): how fast maintenance costs rise with mileage
- \(RC\): the one-time cost of engine replacement
2.1.4 Mileage Transitions
Each period, mileage increases by a random increment \(\Delta x \in \{0, 1, 2\}\) bins, with probabilities:
\[ \Pr(\Delta x = k) = \theta_{3,k+1} \quad \text{for } k = 0, 1, 2 \]
If the agent keeps the engine (\(d_t = 0\)), the next state is \(x_{t+1} = x_t + \Delta x\). If the agent replaces (\(d_t = 1\)), mileage resets and then accumulates: \(x_{t+1} = 0 + \Delta x\). At the boundary (\(x = S\)), mileage is capped.
2.2 Setup in Julia
We begin by loading the packages we will need:
2.2.1 The RustModel Struct
We bundle all model parameters into a struct using the @kwdef macro, which provides default values for each field. This is the same approach we used with the McCall model:
@kwdef struct RustModel
β::Float64 = 0.95 # discount factor
S::Int = 50 # number of mileage bins
RC::Float64 = 20.0 # replacement cost
θ₁::Float64 = 0.04 # maintenance cost param
θ₃::Vector{Float64} = [0.36, 0.48, 0.16]# transition probs
x̄::Vector{Float64} = Float64.(0:S-1) # mileage states
c::Vector{Float64} = θ₁ * x̄ # cost at each mileage
endRustModel
The struct stores the discount factor \(\beta\), the number of mileage bins \(S\), the replacement cost \(RC\), the maintenance cost parameter \(\theta_1\), the transition probabilities \(\theta_3\), and two derived fields: the vector of mileage states \(\bar{x}\) and the cost at each mileage level \(c\).
2.2.2 Creating an Instance
Calling RustModel() with no arguments creates an instance with all default values. We use Julia’s destructuring syntax to extract the fields into local variables, so we can write β instead of m.β throughout our code:
m = RustModel()
(; x̄, c, S, β, RC, θ₃) = mRustModel(0.95, 50, 20.0, 0.04, [0.36, 0.48, 0.16], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 … 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0], [0.0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.24, 0.28, 0.32, 0.36 … 1.6, 1.6400000000000001, 1.68, 1.72, 1.76, 1.8, 1.84, 1.8800000000000001, 1.92, 1.96])
2.2.3 Plotting the Cost Function
With the default parameter \(\theta_1 = 0.04\), let us visualize how maintenance costs increase with mileage:
scatter(x̄, c, label="Maintenance Cost", legend=:topleft)
xlabel!("Mileage State (x)")
ylabel!("Per-Period Cost c(x)")Maintenance costs rise linearly—the engine becomes more expensive to maintain as mileage accumulates. At mileage state 0 (a brand-new engine), the cost is zero. By state 49 (the highest mileage bin), the per-period cost has risen to nearly 2.0. This steady increase in maintenance costs is what eventually makes replacement worthwhile, despite its high upfront cost.
2.2.4 Building Transition Matrices
We need two transition matrices: \(F_0\) for transitions when the agent keeps the engine, and \(F_1\) for transitions when the agent replaces the engine. The function buildTransition constructs both from the transition probabilities \(\theta_3\) and the number of states \(S\):
function buildTransition(θ₃, S)
K = length(θ₃)
F0 = zeros(S, S) # keep: mileage goes up
F1 = zeros(S, S) # replace: mileage resets
for i in 1:S
for k in 1:K
# keep: state i → state i + (k-1)
j_keep = min(i + k - 1, S)
F0[i, j_keep] += θ₃[k]
# replace: any state → state (k)
j_replace = min(k, S)
F1[i, j_replace] += θ₃[k]
end
end
return F0, F1
endbuildTransition (generic function with 1 method)
2.2.5 Computing and Inspecting the Transition Matrices
With our transition probabilities \(\theta_3 = (0.36, 0.48, 0.16)\), we build the two \(S \times S\) transition matrices:
F0, F1 = buildTransition(θ₃, S)([0.36 0.48 … 0.0 0.0; 0.0 0.36 … 0.0 0.0; … ; 0.0 0.0 … 0.36 0.64; 0.0 0.0 … 0.0 1.0], [0.36 0.48 … 0.0 0.0; 0.36 0.48 … 0.0 0.0; … ; 0.36 0.48 … 0.0 0.0; 0.36 0.48 … 0.0 0.0])
Row \(i\) of \(F_0\) gives the distribution of the next state given current state \(i\) and the decision to keep. Row \(i\) of \(F_1\) gives the distribution given replace (which is the same for all \(i\), since replacing resets the engine). Let us inspect a few rows to build intuition:
F0[1, 1:5] # from state 1 (new engine): can go to states 1, 2, or 35-element Vector{Float64}:
0.36
0.48
0.16
0.0
0.0
F1[25, 1:5] # after replacement from state 25: same as starting fresh5-element Vector{Float64}:
0.36
0.48
0.16
0.0
0.0
2.2.6 Transition Matrix Structure
We can visualize the structure of \(F_0\) as a heatmap, where brighter cells indicate higher transition probabilities:
heatmap(F0[1:20, 1:20], xlabel="Next State", ylabel="Current State",
title="Transition Matrix F₀ (Keep)", color=:viridis, yflip=true)The heatmap reveals a characteristic band structure: from any state, the bus can only move to one of three adjacent states (stay, +1, or +2 bins). This is a direct consequence of the transition probabilities \(\theta_3\) allowing increments of 0, 1, or 2. The matrix is sparse—most entries are zero—which reflects the physical reality that mileage cannot jump by large amounts in a single period.
3 Choices with Extreme Value Shocks
3.1 The Static Choice Problem
Before solving the full dynamic programming problem, it is helpful to understand a simpler, static version of the choice problem. An agent chooses between two options with payoffs:
\[ \text{Option 0: } v_0 + \varepsilon_0 \qquad \text{Option 1: } v_1 + \varepsilon_1 \]
Here \(v_0, v_1\) are the observable (deterministic) payoff components, and \(\varepsilon_0, \varepsilon_1\) are unobservable (random) shocks known to the agent but not to the econometrician. The agent picks the option with the highest total payoff:
\[ d^* = \arg\max_d \left\{ v_d + \varepsilon_d \right\} \]
3.1.1 Why Model Unobserved Shocks?
Without shocks, the model predicts deterministic choices—everyone at the same mileage makes the same decision. But in the data, we see buses at similar mileage making different choices. With shocks, the model generates probabilistic choices, which is consistent with the data.
We need to choose a distribution for \(\varepsilon\). This is a modeling assumption that affects both the math and the economics. Rust chose the Type I Extreme Value distribution.
3.1.2 The Expected Maximum
The value of facing this choice (before seeing the shocks) is:
\[ \mathbb{E}\left[\max_d \left\{ v_d + \varepsilon_d \right\}\right] \]
This is the expected payoff from the best option—it matters for the Bellman equation because today’s value depends on what you expect to do optimally tomorrow. Can we compute this in closed form? It depends on the distribution of \(\varepsilon\).
3.2 The Type I Extreme Value Distribution
The Type I Extreme Value (Gumbel) distribution has CDF:
\[ F(\varepsilon) = e^{-e^{-\varepsilon}} \]
Its key properties are: mean \(\gamma \approx 0.5772\) (the Euler–Mascheroni constant), variance \(\pi^2 / 6 \approx 1.645\), and right-skew (extreme positive shocks are possible but rare).
3.2.1 Generating Extreme Value Draws
If \(U \sim \text{Uniform}(0,1)\), then \(\varepsilon = -\log(-\log(U))\) has the Type I EV distribution. We can verify this with simulation:
N = 100_000
ε_draws = -log.(-log.(rand(N)))
println("Sample mean: ", round(mean(ε_draws), digits=3), " (theory: 0.577)")
println("Sample var: ", round(var(ε_draws), digits=3), " (theory: 1.645)")Sample mean: 0.578 (theory: 0.577)
Sample var: 1.646 (theory: 1.645)
3.2.2 Plotting the Distribution
We can overlay the histogram of our simulated draws with the theoretical PDF, \(f(\varepsilon) = \exp(-\varepsilon - \exp(-\varepsilon))\), to confirm the inverse-CDF method is working:
histogram(ε_draws, bins=100, normalize=true, label="Simulated Draws",
alpha=0.7, color=:steelblue)
ε_grid = LinRange(-3, 8, 200)
# f(ε) = exp(-ε - exp(-ε))
pdf_ev = exp.(-ε_grid .- exp.(-ε_grid))
plot!(ε_grid, pdf_ev, linewidth=2, color=:red, label="Theoretical PDF")
xlabel!("ε")
ylabel!("Density")The distribution is right-skewed: most draws cluster near the mean (\(\approx 0.58\)), but the agent occasionally gets a very favorable unobserved shock. This right tail is important economically—it means that even when the deterministic payoffs strongly favor one option, there is always some probability that a large positive shock tips the balance toward the other.
3.3 Two Key Results
The Type I Extreme Value assumption delivers two powerful closed-form results that make the Rust model tractable.
3.3.1 Result 1: The Expected Maximum (Log-Sum-Exp)
If \(\varepsilon_0, \varepsilon_1\) are iid Type I Extreme Value, then:
\[ \mathbb{E}\left[\max\{v_0 + \varepsilon_0,\; v_1 + \varepsilon_1\}\right] = \log\left(e^{v_0} + e^{v_1}\right) + \gamma \]
where \(\gamma \approx 0.5772\) is Euler’s constant.
This extends to \(J\) options: \(\mathbb{E}\left[\max_j\{v_j + \varepsilon_j\}\right] = \log\left(\sum_j e^{v_j}\right) + \gamma\).
3.3.2 Monte Carlo: Verifying the Expected Maximum
Let’s check this result with simulation. We set \(v_0 = 2\) and \(v_1 = 3\):
v0_val = 2.0
v1_val = 3.0
# N extreme value draws
ε0 = -log.(-log.(rand(N)))
ε1 = -log.(-log.(rand(N)))
# simulated max for each draw
max_payoffs = max.(v0_val .+ ε0, v1_val .+ ε1)
# Euler's constant ≈ 0.5772
const γ_euler = Base.MathConstants.eulergamma
theoretical = log(exp(v0_val) + exp(v1_val)) + γ_euler
println("Simulated E[max]: ", round(mean(max_payoffs), digits=4))
println("Theoretical: ", round(theoretical, digits=4))Simulated E[max]: 3.8982
Theoretical: 3.8905
We can wrap this into a reusable function to explore how the expected maximum varies with the gap between \(v_0\) and \(v_1\):
function verifyExpectedMaximum(v0_val, v1_val, N)
#N extreme value draws for ε0 and ε1
ε0 = -log.(-log.(rand(N)))
ε1 = -log.(-log.(rand(N)))
#simulate max payoff for each draw
max_payoffs = max.(v0_val .+ ε0, v1_val .+ ε1)
#compute theoretical expected maximum
theoretical = log(exp(v0_val) + exp(v1_val)) + γ_euler
#return simulated and theoretical expected maximum
return mean(max_payoffs), theoretical
endverifyExpectedMaximum (generic function with 1 method)
3.3.3 Varying the Gap
An important property of the expected maximum is how it responds to the gap \(v_1 - v_0\). When the two options are far apart, the expected maximum is close to the better option (the shocks rarely overturn the ranking). When the options are close, the expected maximum is higher than either option alone—the agent benefits from having the option value of picking whichever shock happens to be favorable.
Δv_grid = LinRange(-4, 4, 50)
Emax_sim = zeros(length(Δv_grid))
Emax_theory = zeros(length(Δv_grid))
for (i, Δv) in enumerate(Δv_grid)
Emax_sim[i], Emax_theory[i] = verifyExpectedMaximum(0.0, Δv, N)
end
plot(Δv_grid, Emax_theory, linewidth=2, label="log(eᵛ⁰ + eᵛ¹) + γ", legend=:topleft)
scatter!(Δv_grid, Emax_sim, label="Monte Carlo", markersize=3)
xlabel!("v₁ - v₀")
ylabel!("E[max{v₀ + ε₀, v₁ + ε₁}]")The formula matches the simulation perfectly across the full range. Notice the smooth, convex shape: the expected maximum is always at least as large as the maximum of the deterministic values, and the gap is largest when the two options are close in value.
3.3.4 Result 2: Choice Probabilities (Logit)
If \(\varepsilon_0, \varepsilon_1\) are iid Type I Extreme Value, then:
\[ P(d = 1) = \Pr(v_1 + \varepsilon_1 > v_0 + \varepsilon_0) = \frac{e^{v_1}}{e^{v_0} + e^{v_1}} = \frac{1}{1 + e^{v_0 - v_1}} \]
This is the logit formula—the same one from discrete choice econometrics. The probability of choosing option 1 depends only on the difference \(v_1 - v_0\).
3.3.5 Monte Carlo: Verifying Choice Probabilities
We can also verify the logit formula by simulation. For each draw, we check which option has the higher total payoff and record the fraction of times option 1 is chosen:
# 1 if option 1 chosen
choices = (v1_val .+ ε1) .> (v0_val .+ ε0)
simulated_prob = mean(choices)
theoretical_prob = exp(v1_val) / (exp(v0_val) + exp(v1_val))
println("Simulated P(d=1): ", round(simulated_prob, digits=4))
println("Theoretical P(d=1): ", round(theoretical_prob, digits=4))Simulated P(d=1): 0.7311
Theoretical P(d=1): 0.7311
With \(v_0 = 2\) and \(v_1 = 3\), option 1 has the higher deterministic value, so \(P(d=1) > 0.5\). But it is not 1.0—the shocks sometimes make option 0 the better choice. This is the essence of the random utility framework: observed characteristics favor one option, but unobserved factors can overturn the ranking.
We can also wrap this into a function and trace out the full logit curve:
function verifyChoiceProbabilities(v0_val, v1_val, N)
ε0 = -log.(-log.(rand(N)))
ε1 = -log.(-log.(rand(N)))
choices = (v1_val .+ ε1) .> (v0_val .+ ε0)
theoretical = exp(v1_val) / (exp(v0_val) + exp(v1_val))
return mean(choices), theoretical
endverifyChoiceProbabilities (generic function with 1 method)
3.3.6 The Logit Curve
We can trace out the full relationship between the deterministic advantage \(v_1 - v_0\) and the choice probability \(P(d = 1)\):
Δv_points = LinRange(-5, 5, 25)
P_mc = zeros(length(Δv_points))
P_theory = zeros(length(Δv_points))
for (i, Δv) in enumerate(Δv_points)
P_mc[i], P_theory[i] = verifyChoiceProbabilities(0.0, Δv, N)
end
plot(Δv_points, P_theory, linewidth=2, label="Logit Formula", legend=:topleft)
scatter!(Δv_points, P_mc, label="Monte Carlo", markersize=5)
xlabel!("v₁ - v₀")
ylabel!("P(d = 1)")The S-shaped logit curve shows a smooth transition from “never choose option 1” (when \(v_1 \ll v_0\)) to “always choose option 1” (when \(v_1 \gg v_0\)). At \(v_1 - v_0 = 0\), the probability is exactly \(0.5\)—the agent is equally likely to pick either option, with the tie broken by the shocks. The smoothness of this curve is a direct consequence of the extreme value distributional assumption; a different distribution for \(\varepsilon\) would yield a different shape.
3.4 From Static Choices to Dynamic Programming
We now have two powerful tools from the extreme value assumption:
- Log-sum-exp: \(\mathbb{E}[\max\{v_0 + \varepsilon_0, v_1 + \varepsilon_1\}] = \log(e^{v_0} + e^{v_1}) + \gamma\)
- Logit: \(P(d=1) = e^{v_1} / (e^{v_0} + e^{v_1})\)
These hold for any values \(v_0, v_1\). Next, we define \(v^{\text{keep}}\) and \(v^{\text{replace}}\) for the Rust model and apply these results to build the Bellman equation.
4 Solving the Dynamic Problem
4.1 The Bellman Equation
4.1.1 Zurcher’s Optimization Problem
Zurcher maximizes expected discounted lifetime utility:
\[ \max_{d_t} \; \mathbb{E} \left[ \sum_{t=0}^{\infty} \beta^t \left( u(x_t, d_t, \theta_1) + \varepsilon_t(d_t) \right) \right] \]
This has the same structure as the McCall model, but the state evolves stochastically regardless of the action (mileage always goes up). As always, the problem can be tackled with a “let” statement: let \(V(x)\) be the value of having a bus in state \(x\) before the shocks \(\varepsilon\) are realized.
4.1.2 Choice-Specific Value Functions
Just like in the McCall model, we define the value of each option. The value of keeping at mileage \(x\) is the flow payoff from maintaining plus the discounted expected continuation value:
\[ v^{\text{keep}}(x) = -c(x, \theta_1) + \beta \sum_{x'} F_0(x, x') \cdot V(x') \]
The value of replacing at mileage \(x\) is the replacement cost plus the discounted expected continuation value starting from a new engine:
\[ v^{\text{replace}}(x) = -RC + \beta \sum_{x'} F_1(x, x') \cdot V(x') \]
Flow payoff today plus discounted continuation—the same structure as McCall’s \(V_{\text{accept}}\) and \(V_{\text{reject}}\). These are the \(v_0\) and \(v_1\) from our extreme value analysis.
4.1.3 Defining \(V(x)\)
With \(v^{\text{keep}}(x)\) and \(v^{\text{replace}}(x)\) defined, the agent at state \(x\) faces exactly the choice we studied in the static problem:
\[ \max\left\{ v^{\text{keep}}(x) + \varepsilon_0, \;\; v^{\text{replace}}(x) + \varepsilon_1 \right\} \]
We define \(V(x)\) as the value of having a bus in state \(x\) before the shocks are realized:
\[ \boxed{V(x) \equiv \mathbb{E}_\varepsilon\left[\max\left\{ v^{\text{keep}}(x) + \varepsilon_0, \;\; v^{\text{replace}}(x) + \varepsilon_1 \right\}\right]} \]
Compare to McCall: \(V_s = \max\{V_{\text{accept},s},\; V_{\text{reject},s}\}\). Here we take an expected max because of the unobserved shocks.
4.1.4 The Bellman Equation
Applying Result 1 (log-sum-exp):
\[ V(x) = \log\left(e^{v^{\text{keep}}(x)} + e^{v^{\text{replace}}(x)}\right) + \gamma \]
The Euler constant \(\gamma\) is just a level shift—dropping it does not affect choices or convergence:
\[ \boxed{V(x) = \log\left(\exp(v^{\text{keep}}(x)) + \exp(v^{\text{replace}}(x))\right)} \]
\(V\) appears on both sides: on the left directly, and on the right inside \(v^{\text{keep}}\) and \(v^{\text{replace}}\). Just like in McCall, we solve by iterating until convergence.
4.1.5 Choice Probabilities
Applying Result 2 (logit), the probability of replacing at mileage \(x\) is:
\[ P(d = 1 \mid x) = \frac{\exp(v^{\text{replace}}(x))}{\exp(v^{\text{keep}}(x)) + \exp(v^{\text{replace}}(x))} = \frac{1}{1 + \exp(v^{\text{keep}}(x) - v^{\text{replace}}(x))} \]
This is the same logit formula we verified with Monte Carlo. The difference: \(v^{\text{keep}}\) and \(v^{\text{replace}}\) incorporate the entire future path of costs and decisions—the agent is forward-looking, not myopic.
4.1.6 Matrix Notation
Using vectors and matrices, the Bellman equation becomes:
\[ v^{\text{keep}} = -c + \beta \, F_0 \, V \] \[ v^{\text{replace}} = -RC \cdot \mathbf{1} + \beta \, F_1 \, V \]
Note that since \(F_1\) has identical rows, \(F_1 \cdot V\) is the same scalar for every \(x\). The value of replacing does not depend on current mileage—you are getting a fresh engine regardless.
4.2 Value Function Iteration
4.2.1 The Bellman Operator
The iterateBellman function takes the current guess \(V\) and returns the updated value \(V'\) from one application of the Bellman equation:
function iterateBellman(m::RustModel, V, F0, F1)
(; β, RC, c) = m
v_keep = -c + β * (F0 * V)
v_replace = (-RC) .+ β * (F1 * V)
vmax = max.(v_keep, v_replace)
# log-sum-exp trick: prevents overflow
V_new = vmax + log.(exp.(v_keep - vmax)
+ exp.(v_replace - vmax))
return V_new
enditerateBellman (generic function with 1 method)
We compute \(\log(e^a + e^b) = m + \log(e^{a-m} + e^{b-m})\) where \(m = \max(a,b)\). This prevents numerical overflow when \(a\) or \(b\) are large.
4.2.2 Iterating to Convergence
Just as in the McCall model, we solve the Bellman equation by repeatedly applying the Bellman operator until convergence. We start from an initial guess of all zeros and iterate until the maximum absolute change between successive value functions falls below our tolerance (here \(10^{-10}\)). The contraction mapping theorem guarantees that this process converges to the unique fixed point regardless of the starting guess:
V = zeros(S) # initial guess
dist = 1.0
niter = 0
while dist > 1e-10
V_new = iterateBellman(m, V, F0, F1)
dist = norm(V - V_new, Inf)
V = V_new
niter += 1
end
println("Converged in $niter iterations")Converged in 455 iterations
4.2.3 The Value Function
With the converged value function in hand, we can visualize how the value of having a bus depends on its mileage:
scatter(x̄, V, label="V(x)", legend=:topright)
xlabel!("Mileage State (x)")
ylabel!("Value")Value is decreasing in mileage—higher mileage means higher costs ahead, whether the agent keeps the engine (paying rising maintenance) or replaces it (paying the fixed cost \(RC\)). However, the decline is not catastrophic: even at high mileage the agent always has the option to replace, which puts a floor on how bad things can get.
4.2.4 Choice-Specific Values
To understand the agent’s decision, we plot the value of each option separately. The value of keeping includes today’s maintenance cost plus the discounted future, while the value of replacing includes the replacement cost plus the discounted future starting fresh:
v_keep = -c + β * (F0 * V)
v_replace = (-RC) .+ β * (F1 * V)
scatter(x̄, v_keep, label="v_keep", legend=:topright)
scatter!(x̄, v_replace, label="v_replace")
xlabel!("Mileage State (x)")
ylabel!("Choice-Specific Value")At low mileage, \(v^{\text{keep}} > v^{\text{replace}}\): the engine is cheap to maintain, so keeping dominates. As mileage rises, \(v^{\text{keep}}\) falls (maintenance costs are increasing) while \(v^{\text{replace}}\) stays nearly flat (replacement always resets you to the same fresh engine). The crossing point is where the agent becomes indifferent between the two options—this is the analog of the reservation wage in the McCall model.
4.2.5 Replacement Probability
Applying the logit formula to the choice-specific values gives us the probability of replacement at each mileage state:
# logit formula
P_rep = 1 ./ (1 .+ exp.(v_keep - v_replace))
scatter(x̄, P_rep, label="P(Replace)", legend=:topleft)
xlabel!("Mileage State (x)")
ylabel!("Probability of Replacement")The S-shaped curve shows a smooth transition from “never replace” at low mileage to “always replace” at high mileage. This is analogous to the reservation wage in McCall, but here it is a reservation mileage. The transition is gradual rather than a sharp cutoff because of the unobserved shocks—even at moderate mileage, a particularly bad unobserved shock can tip the balance toward replacement.
4.2.6 The solveBellman and solveRust Functions
For convenience, we wrap the VFI loop into a reusable function solveBellman that takes a model and transition matrices and returns the converged value function. This modular design will be important later when we need to solve the DP repeatedly inside the estimation routine:
function solveBellman(m::RustModel, F0, F1; tol=1e-10)
V = zeros(m.S)
dist = 1.0
while dist > tol
V_new = iterateBellman(m, V, F0, F1)
dist = norm(V - V_new, Inf)
V = V_new
end
return V
endsolveBellman (generic function with 1 method)
The solveRust function is our main workhorse: given a RustModel, it builds the transition matrices, solves for the value function, and returns a named tuple with the value function V, replacement probabilities P, and the choice-specific values v_keep and v_replace:
function solveRust(m::RustModel)
(; β, RC, c, θ₃, S) = m
F0, F1 = buildTransition(θ₃, S)
V = solveBellman(m, F0, F1)
v_keep = -c + β * (F0 * V)
v_replace = (-RC) .+ β * (F1 * V)
P_rep = 1 ./ (1 .+ exp.(v_keep - v_replace))
return (V=V, P=P_rep, v_keep=v_keep, v_replace=v_replace)
endsolveRust (generic function with 1 method)
We can verify that the replacement probabilities at low mileage are near zero—a brand-new engine should almost never be replaced:
sol = solveRust(m)
# replacement probability at low mileage
sol.P[1:5]5-element Vector{Float64}:
2.0611536181902037e-9
4.202144358547094e-9
8.518414382484044e-9
1.716378823590513e-8
3.436078646097847e-8
4.3 Comparative Statics
4.3.1 Effect of Replacement Cost (\(RC\))
What happens when replacement becomes more or less expensive? A lower \(RC\) makes replacement cheaper, so the agent should be willing to replace at lower mileage. A higher \(RC\) makes the agent tolerate more wear before pulling the trigger:
plot()
for RC_val in [10.0, 15.0, 20.0, 25.0, 30.0]
sol = solveRust(RustModel(RC=RC_val))
scatter!(x̄, sol.P, label="RC = $(Int(RC_val))", markersize=3)
end
xlabel!("Mileage State (x)")
ylabel!("P(Replace)")As expected, higher \(RC\) shifts the replacement probability curve to the right—the agent waits longer before replacing because the upfront cost is harder to justify. This is directly relevant for policy: if a government subsidy reduces \(RC\), we would expect to see earlier replacement and a fleet of newer, cleaner engines.
4.3.2 Effect of Maintenance Cost (\(\theta_1\))
Now consider changes in the rate at which maintenance costs grow. A higher \(\theta_1\) means that each additional unit of mileage adds more to the maintenance bill, making replacement relatively more attractive at every mileage level:
plot()
for θ₁_val in [0.02, 0.03, 0.04, 0.05, 0.06]
sol = solveRust(RustModel(θ₁=θ₁_val))
scatter!(x̄, sol.P, label="θ₁ = $θ₁_val", markersize=3)
end
xlabel!("Mileage State (x)")
ylabel!("P(Replace)")Higher \(\theta_1\) shifts the replacement probability curve to the left—the agent replaces sooner because the cost of continuing to maintain the engine grows more steeply. Conversely, if a technological improvement reduces \(\theta_1\) (e.g., better engine parts), the agent can afford to run the engine longer before replacing.
4.3.3 Comparing to McCall
The comparative statics in the Rust model mirror those in McCall in a satisfying way. Both models feature a forward-looking agent trading off a current cost against future outcomes, and the direction of each effect follows the same economic logic:
| Comparative Static | McCall | Rust |
|---|---|---|
| Higher benefits / lower RC | More selective (higher reservation wage) | Less replacement (higher threshold mileage) |
| Higher firing rate / higher \(\theta_1\) | Less selective (lower reservation wage) | More replacement (lower threshold mileage) |
| Higher \(\beta\) | More selective (patient, wait for good offers) | More replacement (forward-looking, avoid future costs) |
5 Estimation
5.1 What We Observe
We now turn to the central contribution of Rust (1987): showing how to estimate the structural parameters of the model from data. The econometrician observes a panel of buses over time: the mileage state \(x_t\) at each period and the replacement decision \(d_t \in \{0, 1\}\). We do not observe the cost parameters \(\theta_1\) and \(RC\), nor the unobserved shocks \(\varepsilon_t\). The challenge is to infer the cost parameters from the pattern of observed choices.
Recover the structural parameters \((\theta_1, RC, \theta_3)\) from the observed data \((x_t, d_t)\).
5.1.1 The Log-Likelihood Function
The standard approach in structural econometrics is maximum likelihood estimation: find the parameter values that make the observed data most probable. The log-likelihood function sums the log-probabilities of each observed choice:
\[ \ell(\theta) = \sum_{t=1}^{T} \left[ d_t \log P(d_t=1 \mid x_t, \theta) + (1-d_t) \log P(d_t=0 \mid x_t, \theta) \right] \]
Each term measures how well the model’s predicted choice probability matches the actual decision. The choice probability \(P(d=1 \mid x, \theta)\) comes from the logit formula—but requires solving the entire DP to compute. This is the key computational challenge: the likelihood is not a simple closed-form expression, but depends on the solution to a fixed-point problem.
5.1.2 Two-Step Estimation
The transition probabilities \(\theta_3\) can be estimated separately from \((RC, \theta_1)\):
- Step 1: Estimate \(\hat\theta_3\) directly from mileage transitions (just counting)
- Step 2: Given \(\hat\theta_3\), estimate \((RC, \theta_1)\) by maximizing the choice likelihood using NFXP
This works because \(\theta_3\) only affects the transition part of the likelihood, while \((RC, \theta_1)\) only affect the choice part (through the DP solution).
5.2 Simulating Data
Before estimating, we simulate data from a model with known parameters so we can verify that our estimator recovers the truth. The simulateRust function takes the solved replacement probabilities and generates a time series of mileage states and replacement decisions:
function simulateRust(P_rep, θ₃, S, T; seed=42)
Random.seed!(seed)
x_sim = zeros(Int, T)
d_sim = zeros(Int, T)
# start at state 1 (0 mileage)
x_sim[1] = 1
for t in 1:T
# Bernoulli draw
d_sim[t] = rand() < P_rep[x_sim[t]] ? 1 : 0
if t < T
# mileage increment: 0, 1, or 2
Δ = rand(Categorical(θ₃)) - 1
if d_sim[t] == 1
# replace: reset then increment
x_sim[t+1] = min(1 + Δ, S)
else
# keep: current + increment
x_sim[t+1] = min(x_sim[t] + Δ, S)
end
end
end
return x_sim, d_sim
endsimulateRust (generic function with 1 method)
5.2.1 Generating Simulated Data
We generate \(T = 5000\) periods of simulated data using the true model parameters. The replacement rate tells us what fraction of periods involve a replacement—this gives a quick sanity check that the model is behaving reasonably:
m = RustModel()
sol = solveRust(m)
T = 5000
x_sim, d_sim = simulateRust(sol.P, m.θ₃, m.S, T)
println("Replacement rate: ", round(mean(d_sim), digits=3))Replacement rate: 0.022
A replacement rate of a few percent is typical—most periods the bus is kept running, with occasional replacements when mileage gets too high.
5.2.2 Simulated Mileage Path
Plotting the first 300 periods reveals the dynamics of the model:
plot(x̄[x_sim[1:300]], linewidth=1, color=:steelblue, label="Mileage",
title="Simulated Mileage (First 300 Periods)", size=(700, 350))
xlabel!("Period")
ylabel!("Mileage State")The sawtooth pattern is the signature of the Rust model: mileage gradually rises over time as the bus accumulates miles, then drops sharply back to zero when the engine is replaced. The height of each “tooth” varies because of the stochastic mileage increments and the probabilistic replacement rule—sometimes the agent replaces early (due to a bad unobserved shock), and sometimes the engine runs to high mileage before being replaced.
5.2.3 Mileage Distribution
The stationary distribution of mileage across the simulation tells us where buses spend most of their time:
histogram(x̄[x_sim], bins=S, normalize=true, label="Mileage Distribution",
color=:steelblue, alpha=0.7)
xlabel!("Mileage State")
ylabel!("Frequency")The distribution is heavily skewed toward low mileage: buses are frequently reset by replacement, so they spend most of their life at relatively low mileage states. The tail thins out quickly at higher mileage because the replacement probability rises sharply, making it unlikely for any bus to accumulate very high mileage.
5.3 The NFXP Algorithm
5.3.1 Step 1: Estimate Transition Probabilities
The first step is to estimate the transition probabilities \(\theta_3\) directly from the data by counting mileage increments. This is straightforward: we look at every period where we observe a “keep” decision (and the bus is not at the boundary), compute the mileage change, and tally up how often each increment \(\Delta \in \{0, 1, 2\}\) occurs. For replacement periods, we can also infer the increment since we know that mileage resets to 0 before adding \(\Delta\). The estimated probabilities are simply the normalized counts:
counts = zeros(3) # for Δ = 0, 1, 2
for t in 1:T-1
# keep, away from boundary
if d_sim[t] == 0 && x_sim[t] <= S - 2
Δ = x_sim[t+1] - x_sim[t]
counts[Δ + 1] += 1
elseif d_sim[t] == 1
Δ = x_sim[t+1] - 1 # next state = 1 + Δ
counts[Δ + 1] += 1
end
end
θ₃_hat = counts / sum(counts)
println("True θ₃: ", round.(m.θ₃, digits=3))
println("Estimated θ₃: ", round.(θ₃_hat, digits=3))True θ₃: [0.36, 0.48, 0.16]
Estimated θ₃: [0.351, 0.486, 0.163]
5.3.2 Step 2: The NFXP Concept
The second step uses the Nested Fixed Point algorithm. For each candidate \(\theta = (RC, \theta_1)\):
- Inner loop: Solve the DP via VFI to get \(EV_\theta\)
- Compute choice probabilities \(P(d \mid x, \theta)\)
- Evaluate the log-likelihood:
\[ \ell(\theta) = \sum_{t=1}^{T} \left[ d_t \log P(d_t=1 \mid x_t, \theta) + (1-d_t) \log P(d_t=0 \mid x_t, \theta) \right] \]
The outer loop then searches over \(\theta\) to maximize \(\ell(\theta)\).
Every single evaluation of the likelihood requires solving the entire dynamic program. The fixed point (VFI) is nested inside the optimization.
5.3.3 The NFXP Objective Function
The nfxpObjective function implements the inner loop. Given candidate parameters, it constructs a trial model, solves the DP, computes choice probabilities, and evaluates the negative log-likelihood (negative because Optim.jl minimizes by default):
function nfxpObjective(params, x_data, d_data, β, θ₃_hat, S)
# unpack candidate parameters (RC, θ₁)
RC_est, θ₁_est = params
# check parameter bounds
if RC_est < 0 || θ₁_est < 0
return 1e10
end
# build trial model and solve the DP
m_trial = RustModel(β=β, S=S, RC=RC_est,
θ₁=θ₁_est, θ₃=θ₃_hat)
sol = solveRust(m_trial)
# compute log-likelihood from choice probabilities
ll = 0.0
for t in eachindex(d_data)
s = x_data[t]
ll += d_data[t] * log(max(sol.P[s], 1e-15)) +
(1 - d_data[t]) * log(max(1 - sol.P[s], 1e-15))
end
# return negative log-likelihood (for minimization)
return -ll
endnfxpObjective (generic function with 1 method)
5.3.4 Running the Estimation
We use Optim.jl with the Nelder-Mead algorithm to minimize the negative log-likelihood. Nelder-Mead is a derivative-free optimizer, which is useful here because the objective function involves solving a fixed-point problem (VFI) at each evaluation—computing analytical gradients would be cumbersome. The initial guess [15.0, 0.03] places us in a reasonable region of the parameter space:
using Optimresult = optimize(
p -> nfxpObjective(p, x_sim, d_sim, β, θ₃_hat, S),
[15.0, 0.03], # initial guess: [RC, θ₁]
NelderMead(), # derivative-free optimizer
Optim.Options(iterations=5000)
)
RC_hat, θ₁_hat = Optim.minimizer(result)2-element Vector{Float64}:
21.220120311603008
0.04306776468765347
Note what happens at each iteration of the optimizer: it proposes new values of \((RC, \theta_1)\), the nfxpObjective function builds a fresh RustModel, solves the entire Bellman equation via VFI, computes choice probabilities, and evaluates the log-likelihood. This is why the algorithm is called “nested”—there is a full fixed-point computation inside every likelihood evaluation.
5.3.5 Comparing Estimates to Truth
Since we simulated the data from known parameters, we can directly compare the estimates to the truth:
println("────────────────────────────────────────")
println("Parameter │ True │ Estimated")
println("────────────────────────────────────────")
println(" RC │ $(lpad(m.RC, 6)) │ $(lpad(round(RC_hat, digits=3), 6))")
println(" θ₁ │ $(lpad(m.θ₁, 6)) │ $(lpad(round(θ₁_hat, digits=4), 6))")
println("────────────────────────────────────────")────────────────────────────────────────
Parameter │ True │ Estimated
────────────────────────────────────────
RC │ 20.0 │ 21.22
θ₁ │ 0.04 │ 0.0431
────────────────────────────────────────
With \(T = 5000\) observations, the estimates should be close to the true values, though not exact—there is always sampling variation. With more data, the estimates would converge to the truth (the estimator is consistent).
5.3.6 Estimated vs. True Replacement Probability
The ultimate test of our estimation is whether the implied policy function matches the true one. We construct a model using the estimated parameters and compare the predicted replacement probabilities:
# estimated model
m_hat = RustModel(RC=RC_hat, θ₁=θ₁_hat, θ₃=θ₃_hat)
sol_hat = solveRust(m_hat)
scatter(x̄, sol.P, label="True P(Replace)", markersize=4)
scatter!(x̄, sol_hat.P, label="Estimated P(Replace)", markersize=4,
markershape=:diamond)
xlabel!("Mileage State (x)")
ylabel!("P(Replace)")The estimated choice probabilities should closely track the true ones across the full range of mileage states. Any discrepancy reflects estimation error in \(\hat{RC}\) and \(\hat{\theta}_1\), which propagates through the entire DP solution and into the implied choice probabilities.
5.3.7 Why Structural Estimation?
With the estimated structural parameters, we can perform counterfactual analysis:
- What if the government subsidizes replacement (lowers \(RC\))?
- What if a new technology reduces maintenance costs (lowers \(\theta_1\))?
- What if buses are driven more intensively (changes \(\theta_3\))?
A reduced-form model (e.g., a logit on current mileage) cannot answer these questions—it would break under the policy change. The structural model accounts for how agents re-optimize under new conditions.
6 NFXP Convergence Properties
6.1 The Two Loops
The NFXP algorithm nests two optimization problems: an inner loop that solves the dynamic program via value function iteration, and an outer loop that searches over structural parameters to maximize the likelihood. Understanding the convergence properties of each loop is essential for reliable estimation.
6.1.1 Inner Loop: Value Function Iteration
The inner loop solves the Bellman equation for a given set of parameters \((\theta_1, RC)\). The Bellman operator \(T\) is a contraction mapping with modulus \(\beta\)—each application brings the value function at least a fraction \((1-\beta)\) closer to the fixed point. This means that the number of iterations required to reach a tolerance \(\epsilon\) is approximately:
\[ n_{\text{inner}} \approx \frac{\log(\epsilon)}{\log(\beta)} \]
With \(\beta = 0.95\) and a tolerance of \(10^{-10}\), this works out to roughly \(\log(10^{-10}) / \log(0.95) \approx 449\) iterations. Let us verify this by instrumenting our solver to count iterations:
function solveBellmanCounting(m::RustModel, F0, F1; tol=1e-10)
V = zeros(m.S)
dist = 1.0
niter = 0
while dist > tol
V_new = iterateBellman(m, V, F0, F1)
dist = norm(V - V_new, Inf)
V = V_new
niter += 1
end
return V, niter
end
m = RustModel()
F0, F1 = buildTransition(m.θ₃, m.S)
V_sol, n_iter = solveBellmanCounting(m, F0, F1; tol=1e-10)
n_theory = ceil(Int, log(1e-10) / log(m.β))
println("Actual inner iterations: $n_iter")
println("Theoretical approximation: $n_theory")Actual inner iterations: 455
Theoretical approximation: 449
The theoretical formula gives a good approximation. The exact count may differ slightly because the formula assumes worst-case contraction and a specific initial distance, but the order of magnitude is reliable.
6.1.2 Effect of Tolerance on Inner Iterations
We can trace out how the number of inner iterations changes with the tolerance level. Tighter tolerances require more iterations, but the relationship is logarithmic—going from \(10^{-6}\) to \(10^{-10}\) (four more orders of magnitude) only adds a modest number of iterations:
tols = [1e-4, 1e-6, 1e-8, 1e-10, 1e-12]
iters_by_tol = Int[]
for tol in tols
_, ni = solveBellmanCounting(m, F0, F1; tol=tol)
push!(iters_by_tol, ni)
end
scatter(log10.(tols), iters_by_tol, label="Inner Iterations",
markersize=6, legend=:topright)
xlabel!("log₁₀(tolerance)")
ylabel!("Number of VFI Iterations")6.1.3 Outer Loop: Maximum Likelihood
The outer loop uses Nelder-Mead (a derivative-free simplex method) to search over \((RC, \theta_1)\). Each evaluation of the objective function requires a full inner-loop solve. The total computational cost is therefore:
\[ \text{Total cost} \approx n_{\text{outer}} \times n_{\text{inner}} \times S \]
where \(S\) is the number of states. With \(n_{\text{outer}} \approx 100\)–\(500\) evaluations (typical for Nelder-Mead), \(n_{\text{inner}} \approx 450\) iterations, and \(S = 50\) states, the total number of Bellman operator applications is on the order of millions. This is manageable for a small model but can become prohibitive with larger state spaces.
6.1.4 Tolerance and Estimation Accuracy
A critical question is whether the inner-loop tolerance affects the quality of the estimates. If the tolerance is too loose, the value function is not fully converged, the implied choice probabilities contain errors, and the likelihood surface becomes noisy. The outer optimizer may then fail to converge or converge to the wrong point. Let us investigate by running the estimation with different inner-loop tolerances:
function nfxpObjectiveTol(params, x_data, d_data, β, θ₃_hat, S; tol=1e-10)
RC_est, θ₁_est = params
if RC_est < 0 || θ₁_est < 0
return 1e10
end
m_trial = RustModel(β=β, S=S, RC=RC_est, θ₁=θ₁_est, θ₃=θ₃_hat)
F0_t, F1_t = buildTransition(m_trial.θ₃, m_trial.S)
V_t = solveBellman(m_trial, F0_t, F1_t; tol=tol)
v_keep = -m_trial.c + β * (F0_t * V_t)
v_replace = (-RC_est) .+ β * (F1_t * V_t)
P_t = 1 ./ (1 .+ exp.(v_keep - v_replace))
ll = 0.0
for t in eachindex(d_data)
s = x_data[t]
ll += d_data[t] * log(max(P_t[s], 1e-15)) +
(1 - d_data[t]) * log(max(1 - P_t[s], 1e-15))
end
return -ll
endnfxpObjectiveTol (generic function with 1 method)
println("─────────────────────────────────────────────────────────")
println(" Tolerance │ RC_hat │ θ₁_hat │ Converged?")
println("─────────────────────────────────────────────────────────")
for tol in [1e-6, 1e-8, 1e-10]
res = optimize(
p -> nfxpObjectiveTol(p, x_sim, d_sim, β, θ₃_hat, S; tol=tol),
[15.0, 0.03],
NelderMead(),
Optim.Options(iterations=5000)
)
RC_t, θ₁_t = Optim.minimizer(res)
conv = Optim.converged(res) ? "Yes" : "No"
println(" 1e-$(Int(-log10(tol))) │ $(lpad(round(RC_t, digits=4), 8)) │ $(lpad(round(θ₁_t, digits=5), 9)) │ $conv")
end
println("─────────────────────────────────────────────────────────")
println(" True │ $(lpad(m.RC, 8)) │ $(lpad(m.θ₁, 9)) │")
println("─────────────────────────────────────────────────────────")─────────────────────────────────────────────────────────
Tolerance │ RC_hat │ θ₁_hat │ Converged?
─────────────────────────────────────────────────────────
1e-6 │ 21.2201 │ 0.04307 │ Yes
1e-8 │ 21.2201 │ 0.04307 │ Yes
1e-10 │ 21.2201 │ 0.04307 │ Yes
─────────────────────────────────────────────────────────
True │ 20.0 │ 0.04 │
─────────────────────────────────────────────────────────
The results should show that the estimates stabilize as the tolerance tightens. With \(10^{-6}\), the estimates may exhibit noticeable error; by \(10^{-10}\), the estimates are essentially identical to what a tighter tolerance would give. The practical recommendation is to use at least \(10^{-8}\) for the inner loop.
If the inner-loop tolerance is too loose, the computed value function is not fully converged. This introduces small errors into the choice probabilities, which make the likelihood surface noisy and non-smooth. The outer optimizer—which relies on the likelihood surface being well-behaved—may fail to converge or converge to the wrong point. Always ensure that the inner loop is solved to high precision relative to the outer loop’s convergence criterion.
An alternative to NFXP is the Mathematical Programming with Equilibrium Constraints (MPEC) approach. Instead of nesting VFI inside MLE, MPEC formulates the problem as a single constrained optimization: maximize the likelihood subject to the constraint that the Bellman equation holds at the solution. This avoids repeated inner-loop solves and can be faster for large models. The tradeoff is that the constrained optimization problem is much larger (parameters + value function as variables) and requires a good constrained optimizer.
7 Identification
7.1 What Does Identification Mean?
Before estimating a model, we must ask whether the parameters can be recovered from the data in principle—even with an infinite sample. A parameter is identified if different values of that parameter produce different observable implications. If two different parameter vectors generate exactly the same data distribution, the data cannot distinguish them, and the parameters are not identified.
In the Rust model, we need to establish that the structural parameters \((\theta_1, RC, \theta_3)\) are identified from the observables \((x_t, d_t)\)—that is, from the sequence of mileage states and replacement decisions.
Without identification, no amount of data can pin down the structural parameters. The optimizer may find many parameter combinations that fit the data equally well, and counterfactual predictions would be meaningless because they depend on which (arbitrary) combination we happen to find.
7.2 Transition Probabilities \(\theta_3\)
The transition probabilities \(\theta_3 = (\theta_{3,1}, \theta_{3,2}, \theta_{3,3})\) are the easiest to identify. They govern the distribution of mileage increments \(\Delta x \in \{0, 1, 2\}\), which we observe directly in the data. Given any “keep” period where the bus is not at the boundary, we see the mileage change and can count how often each increment occurs. With enough observations, the sample frequencies converge to the true probabilities by the law of large numbers. No model solution is needed—\(\theta_3\) is identified nonparametrically from the transition data.
7.3 Cost Parameters \((RC, \theta_1)\)
Identifying the cost parameters is more subtle because we never observe costs directly—we only observe binary choices (replace or keep). The identification argument works through the mapping from parameters to choice probabilities.
The replacement probability at each mileage state \(x\) is:
\[ P(\text{replace} \mid x) = \frac{1}{1 + \exp(v^{\text{keep}}(x) - v^{\text{replace}}(x))} \]
Both \(v^{\text{keep}}(x)\) and \(v^{\text{replace}}(x)\) depend on \((RC, \theta_1)\) through the Bellman equation. The key identification argument is that different \((RC, \theta_1)\) pairs produce different replacement probability curves \(P(\text{replace} \mid x)\). Since the choice probabilities are observable (we can estimate them from the data), and since the mapping from parameters to choice probabilities is one-to-one, the parameters are identified.
7.3.1 The Role of \(RC\) and \(\theta_1\)
The two parameters play distinct roles in shaping the replacement probability curve:
- Replacement cost \(RC\): shifts the entire curve horizontally. A higher \(RC\) makes replacement more expensive at every mileage level, pushing the curve to the right (the agent waits longer before replacing).
- Maintenance cost slope \(\theta_1\): affects the steepness of the curve. A higher \(\theta_1\) means maintenance costs rise faster with mileage, making the transition from “never replace” to “always replace” sharper.
Because \(RC\) and \(\theta_1\) affect different features of the replacement probability curve (location vs. steepness), they are separately identified. Let us verify this visually by plotting replacement probability curves for several \((RC, \theta_1)\) combinations:
plot(legend=:topleft, xlabel="Mileage State (x)",
ylabel="P(Replace)", title="Identification: Different Parameters → Different Curves")
param_combos = [
(RC=15.0, θ₁=0.04, label="RC=15, θ₁=0.04"),
(RC=25.0, θ₁=0.04, label="RC=25, θ₁=0.04"),
(RC=20.0, θ₁=0.03, label="RC=20, θ₁=0.03"),
(RC=20.0, θ₁=0.05, label="RC=20, θ₁=0.05"),
]
for pc in param_combos
sol_id = solveRust(RustModel(RC=pc.RC, θ₁=pc.θ₁))
plot!(x̄, sol_id.P, linewidth=2, label=pc.label)
endEach parameter combination produces a visually distinct curve. The two curves with \(\theta_1 = 0.04\) but different \(RC\) values are horizontally shifted versions of each other. The two curves with \(RC = 20\) but different \(\theta_1\) values differ in steepness. This is the graphical manifestation of identification: an econometrician observing the replacement probability curve (estimated from data) can distinguish which parameters generated it.
7.3.2 The Normalization Issue
In discrete choice models, only differences in utility matter. Adding a constant \(k\) to all payoffs does not change the agent’s behavior—if option A was better than option B before, it is still better after adding \(k\) to both. This means the level of the utility function is not identified from choice data.
In the Rust model, this is handled by the structure of the payoffs. The replacement payoff \(u(x, 1) = -RC\) does not depend on mileage—replacement gives a fresh engine regardless of the current state. The keep payoff \(u(x, 0) = -\theta_1 x\) depends on mileage. The difference \(u(x, 0) - u(x, 1) = RC - \theta_1 x\) is what drives the choice, and both \(RC\) and \(\theta_1\) appear in this difference. This means we do not need an additional normalization beyond what the model structure already provides.
However, note that \(\beta\) (the discount factor) is typically not estimated but rather fixed at a known value (here \(\beta = 0.95\)). This is because \(\beta\) is weakly identified—many combinations of \((\beta, RC, \theta_1)\) can produce similar choice probabilities. Fixing \(\beta\) resolves this issue and is standard practice in the literature.
8 Counterfactual Policy Analysis
8.1 Why Structural Models?
The central motivation for structural estimation is the ability to perform counterfactual analysis—predicting how agents would behave under conditions that have never been observed. A reduced-form model (for example, a logit regression of replacement on current mileage) can describe the pattern in the data, but it cannot predict what would happen if a policy changed the economic environment. This is because the reduced-form coefficients are functions of the structural parameters and the policy environment—when the environment changes, the coefficients change too.
Robert Lucas (1976) argued that econometric models estimated under one policy regime cannot be used to evaluate alternative policies, because the estimated parameters are not “structural”—they depend on the policy itself. In the Rust model, a logit regression of replacement on mileage would capture the current relationship, but this relationship changes when costs change because agents re-optimize. The structural model, by contrast, estimates parameters that are invariant to policy (\(\theta_1\) is a technology parameter, \(RC\) is a physical cost), so it remains valid under counterfactual scenarios.
We now demonstrate this advantage with two policy experiments, using our estimated model as the baseline.
8.2 Example 1: Replacement Subsidy
Suppose the government introduces a subsidy that covers 50% of the engine replacement cost, perhaps to encourage cleaner, newer engines in public transit fleets. Under this policy, the effective replacement cost falls from \(RC\) to \(RC/2\):
# Baseline model (true parameters)
m_base = RustModel()
sol_base = solveRust(m_base)
# Counterfactual: 50% replacement subsidy
m_subsidy = RustModel(RC = m_base.RC / 2)
sol_subsidy = solveRust(m_subsidy)(V = [-8.645993922810403, -9.217824810837758, -9.774876884590402, -10.316219027296103, -10.84087830835213, -11.347847736377693, -11.83609913210201, -12.304602612042613, -12.752353981995665, -13.1784106975836 … -17.984502696900503, -18.017826077585855, -18.04910365538607, -18.07848973071031, -18.10611394558626, -18.1320704956432, -18.156372301956424, -18.178888106276126, -18.19876638606832, -18.214949562036004], P = [4.5397868702434395e-5, 8.042266561520066e-5, 0.00014037931067170743, 0.00024121538873677655, 0.00040762577876720475, 0.0006767615347186445, 0.0011027588544250957, 0.0017617692667621465, 0.0027567982149082097, 0.004221230422472427 … 0.516057737794833, 0.5335442631922888, 0.5504959574158075, 0.5669129065622586, 0.5827917409303051, 0.5981170394329958, 0.6128304206388532, 0.6267853036345465, 0.6393693777669371, 0.6498005822688334], v_keep = [-8.646039321800917, -9.217905236828742, -9.775017273846464, -10.316460271873245, -10.841286017324162, -11.348524727110163, -11.83720249953366, -12.306365935141303, -12.755114587268368, -13.182640862642552 … -18.710292369337804, -18.780418224458376, -18.848714087075045, -18.915306162428024, -18.980283705301176, -19.043664871356075, -19.105264794413397, -19.16448953782306, -19.218667437753254, -19.264202083934205], v_replace = [-18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917 … -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917, -18.646039321800917])
With lower replacement costs, the agent should be willing to replace at lower mileage. Let us compare the replacement probability curves:
plot(x̄, sol_base.P, linewidth=2, label="Baseline (RC = $(m_base.RC))",
legend=:topleft, xlabel="Mileage State (x)", ylabel="P(Replace)")
plot!(x̄, sol_subsidy.P, linewidth=2, label="Subsidy (RC = $(m_base.RC/2))",
linestyle=:dash)The subsidy shifts the replacement probability curve sharply to the left—the agent now replaces at much lower mileage. This is the economically intuitive result: when replacement is cheaper, the agent is less willing to tolerate the rising maintenance costs of an aging engine.
8.2.1 Simulating the Policy Effect
To quantify the impact, we simulate bus operations under both regimes and compare average mileage at replacement:
function avgMileageAtReplacement(P_rep, θ₃, S; T=50_000, seed=123)
Random.seed!(seed)
replace_miles = Float64[]
x = 1
for t in 1:T
if rand() < P_rep[x]
push!(replace_miles, x̄[x])
Δ = rand(Categorical(θ₃)) - 1
x = min(1 + Δ, S)
else
Δ = rand(Categorical(θ₃)) - 1
x = min(x + Δ, S)
end
end
return mean(replace_miles), length(replace_miles) / T
end
avg_mile_base, rate_base = avgMileageAtReplacement(sol_base.P, m_base.θ₃, m_base.S)
avg_mile_sub, rate_sub = avgMileageAtReplacement(sol_subsidy.P, m_subsidy.θ₃, m_subsidy.S)
println("─────────────────────────────────────────────────────────────")
println(" Scenario │ Avg Mileage at Replace │ Replace Rate")
println("─────────────────────────────────────────────────────────────")
println(" Baseline │ $(lpad(round(avg_mile_base, digits=1), 10)) │ $(round(rate_base, digits=4))")
println(" 50% Subsidy │ $(lpad(round(avg_mile_sub, digits=1), 10)) │ $(round(rate_sub, digits=4))")
println("─────────────────────────────────────────────────────────────")─────────────────────────────────────────────────────────────
Scenario │ Avg Mileage at Replace │ Replace Rate
─────────────────────────────────────────────────────────────
Baseline │ 36.5 │ 0.0219
50% Subsidy │ 21.3 │ 0.0376
─────────────────────────────────────────────────────────────
The subsidy reduces average mileage at replacement (agents replace sooner) and increases the replacement rate (replacements happen more frequently). These are the kind of quantitative predictions that a policymaker needs to evaluate the cost and environmental impact of a subsidy program.
8.3 Example 2: Technological Improvement
Now consider a different scenario: a new engine technology reduces maintenance costs by 25%. This might correspond to better materials, improved lubrication, or more reliable components. The effect is captured by lowering \(\theta_1\):
# Counterfactual: 25% reduction in maintenance costs
m_tech = RustModel(θ₁ = 0.75 * m_base.θ₁)
sol_tech = solveRust(m_tech)(V = [-8.481850343923773, -9.04040515525516, -9.596268274075914, -10.149264887338024, -10.69920882954404, -11.245901846119558, -11.78913280940811, -12.328676884523631, -12.864294642318637, -13.395731116876156 … -25.507617590072154, -25.678792048114513, -25.835006280152992, -25.976461535169577, -26.10315242615236, -26.21476435626719, -26.310474285810308, -26.38896356162325, -26.4472781987127, -26.484240007946013], P = [2.0611536181902037e-9, 3.6031938986831e-9, 6.281970876218972e-9, 1.0920921774073837e-8, 1.8927659550227073e-8, 3.2698111929625606e-8, 5.6291767584056204e-8, 9.65530284937492e-8, 1.6496117336654304e-7, 2.8066073968450824e-7 … 0.05108661487477952, 0.060624371434404474, 0.07087453584838653, 0.08164384780224557, 0.09267116427157308, 0.10361367189992095, 0.11402060968982947, 0.12333059260731256, 0.13073640646767717, 0.1356590755379333], v_keep = [-8.481850346077822, -9.04040515895126, -9.596268280450795, -10.14926489835186, -10.699208848564611, -11.245901878910578, -11.789132865792782, -12.328676981169556, -12.864294807372703, -13.395731397629799 … -25.56005534432815, -25.7413318975507, -25.90851777662481, -26.061631533506084, -26.20040276764511, -26.324148145539613, -26.431535876051658, -26.520588877895936, -26.58738706879092, -26.63002800754871], v_replace = [-28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782 … -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782, -28.48185034607782])
The direction of the effect is interesting to think about before looking at the results. On one hand, lower maintenance costs mean the agent can afford to run the engine longer—the cost of keeping is lower at every mileage level. On the other hand, the agent is also better off overall, which might affect the option value of replacement. Let us see which effect dominates:
plot(x̄, sol_base.P, linewidth=2, label="Baseline (θ₁ = $(m_base.θ₁))",
legend=:topleft, xlabel="Mileage State (x)", ylabel="P(Replace)")
plot!(x̄, sol_tech.P, linewidth=2, label="Better Tech (θ₁ = $(round(m_tech.θ₁, digits=3)))",
linestyle=:dash)Better technology shifts the replacement probability curve to the right—the agent replaces less frequently because the engine is cheaper to maintain at every mileage level. The agent runs each engine longer before the accumulated maintenance costs justify replacement.
avg_mile_tech, rate_tech = avgMileageAtReplacement(sol_tech.P, m_tech.θ₃, m_tech.S)
println("─────────────────────────────────────────────────────────────")
println(" Scenario │ Avg Mileage at Replace │ Replace Rate")
println("─────────────────────────────────────────────────────────────")
println(" Baseline │ $(lpad(round(avg_mile_base, digits=1), 10)) │ $(round(rate_base, digits=4))")
println(" Better Tech │ $(lpad(round(avg_mile_tech, digits=1), 10)) │ $(round(rate_tech, digits=4))")
println("─────────────────────────────────────────────────────────────")─────────────────────────────────────────────────────────────
Scenario │ Avg Mileage at Replace │ Replace Rate
─────────────────────────────────────────────────────────────
Baseline │ 36.5 │ 0.0219
Better Tech │ 43.9 │ 0.0176
─────────────────────────────────────────────────────────────
8.4 Comparing Counterfactuals
We can now bring all three scenarios together into a single comparison. The table below summarizes how each counterfactual policy changes the equilibrium behavior relative to the baseline:
println("═══════════════════════════════════════════════════════════════════════")
println(" Scenario │ RC │ θ₁ │ Avg Mile at Replace │ Rate ")
println("═══════════════════════════════════════════════════════════════════════")
println(" Baseline │ $(lpad(m_base.RC, 5)) │ $(lpad(m_base.θ₁, 6)) │ $(lpad(round(avg_mile_base, digits=1), 10)) │ $(round(rate_base, digits=4))")
println(" 50% Subsidy │ $(lpad(m_base.RC/2, 5)) │ $(lpad(m_base.θ₁, 6)) │ $(lpad(round(avg_mile_sub, digits=1), 10)) │ $(round(rate_sub, digits=4))")
println(" Better Tech │ $(lpad(m_base.RC, 5)) │ $(lpad(round(m_tech.θ₁, digits=3), 6)) │ $(lpad(round(avg_mile_tech, digits=1), 10)) │ $(round(rate_tech, digits=4))")
println("═══════════════════════════════════════════════════════════════════════")═══════════════════════════════════════════════════════════════════════
Scenario │ RC │ θ₁ │ Avg Mile at Replace │ Rate
═══════════════════════════════════════════════════════════════════════
Baseline │ 20.0 │ 0.04 │ 36.5 │ 0.0219
50% Subsidy │ 10.0 │ 0.04 │ 21.3 │ 0.0376
Better Tech │ 20.0 │ 0.03 │ 43.9 │ 0.0176
═══════════════════════════════════════════════════════════════════════
The two policies have opposite effects on replacement behavior. The subsidy encourages more frequent replacement (newer fleet, lower average mileage), while the technology improvement encourages less frequent replacement (longer engine life, higher average mileage). A policymaker choosing between subsidizing replacement and investing in better engine technology would draw very different conclusions from these two experiments.
8.4.1 Why Reduced-Form Fails
A reduced-form logit regression of replacement on mileage would estimate coefficients that reflect the current policy environment. Under a different policy, agents re-optimize: the replacement probability at each mileage level changes because the underlying value functions change. The reduced-form model has no mechanism to capture this re-optimization—it would predict the same replacement probabilities regardless of the policy, which is exactly the error that Lucas (1976) warned about.
The structural model avoids this problem because it estimates primitives—the maintenance cost technology (\(\theta_1\)) and the replacement cost (\(RC\))—that are invariant to policy. When we change \(RC\) (subsidy) or \(\theta_1\) (technology), the model re-solves the Bellman equation and produces new optimal behavior, exactly as a rational agent would.
9 Summary
These notes covered Rust’s (1987) model of optimal engine replacement, a foundational paper in structural econometrics.
The Environment. Harold Zurcher manages a fleet of buses and each period decides whether to replace or keep running the engine. Maintenance costs rise linearly with mileage (\(c(x) = \theta_1 x\)), while replacement costs a fixed amount \(RC\) but resets the engine to new. Mileage transitions follow a discrete distribution with probabilities \(\theta_3\).
Extreme Value Shocks. Unobserved shocks to each option are assumed to follow the Type I Extreme Value (Gumbel) distribution. This assumption delivers two closed-form results: the log-sum-exp formula for the expected maximum and the logit formula for choice probabilities. We verified both results with Monte Carlo simulations.
Solving the Dynamic Problem. The Bellman equation combines the log-sum-exp formula with choice-specific value functions. We solve it by value function iteration, converging to a fixed point. The solution yields an S-shaped replacement probability curve—analogous to the reservation wage in McCall.
Estimation via NFXP. The Nested Fixed Point algorithm recovers structural parameters from observed data. Transition probabilities are estimated in a first step by counting. Cost parameters \((RC, \theta_1)\) are then estimated by maximum likelihood, where each likelihood evaluation requires solving the entire dynamic program. We verified the estimator recovers the true parameters from simulated data.
Rust (1987) showed that dynamic programming models can be estimated from data. The key ingredients: (1) a structural model of forward-looking behavior, (2) distributional assumptions on unobservables, and (3) the NFXP algorithm to solve the model inside the estimator. With estimated structural parameters, we can perform counterfactual policy analysis that reduced-form models cannot support.