Primer on using SDE for noise in numerical simulations
Physics, Numerical Analysis, Nonlinear Dynamics
Author
Anil Radhakrishnan
Published
October 12, 2023
Motivation
Stochastic calculus can be confusing to newcomers with identical physical processes giving conflicting results. My goal is to provide a high level intuitive overview of the choices involved in modelling stochastic differential equations and the results that arise from them. This work is not novel but a summarization of a few results in a format I best understand.
Stochastic differential equations
SDEs are extentions to differential equations with the goal of describing noise in physical processes. Given a system described by an ODE:
\frac{dx}{dt}=f(x,t)
we can add intrinsic noise to it by adding a noise term \xi with a strength \sigma:
\frac{dx}{dt}=f(x,t)+ σ(x,t)ξ
The deterministic component can be solved easily to find:
x(t)=x(0)+\int_0^T f(x(t),t)dt
To get a similar interpretation for the stochastic component we need to define what we mean by the integral of a stochastic process. To do this let’s first characterize the noise term \xi with the properties we expect.
Let the noise have a mean of zero: 𝔼[ξ]=0 This can be assumed without loss of generality since a non-zero mean can be absorbed into the deterministic component as a drift.
Let the noise be uncorrelated: 𝔼[ξ(t)ξ(t')]=δ(t-t')
Higher moments are gives by the rules of Gaussian processes: 𝔼[ξ(t)ξ(t')ξ(t'')]=δ(t-t')δ(t-t'') i.e the generating functional is
⟨\exp\left(i\int_{-∞}^∞ k(t)ξ(t) dt\right)⟩=\exp\left(\frac{1}{2}\int_{-∞}^∞ dt[k(t)]^2\right)
where k is an arbitrary test function.
These properties give a guassian white noise or “Langevin process” which is a stationary stochastic “process”.
THe first problem we have is that there is no properly defined stochastic process with these properties. The Gaussian white noise is a singular object. Its integral is a the well defined Wiener process or Brownian motion given by
W(t)=\int_0^t ξ(t')dt'
which is not stationary and nowhere differentiable.
We can visulalize ξ as a random sequence of small positve and negative pulses. In the limit of short duration and small heights, we arrive at a dense succession which is the Wiener process.
Since brownian motion is the only process that satisfies our critieria, we will use it as the basis for our stochastic calculus.
For much of this dicussion we will use discretized versions with computation in mind. Rigorously developing this requires measure theory to formally show that the continous limit exists and is unique and is beyond the scope of this discussion.
Now consider the discrete equation:
X_T = X_0 + \sum_{t=1}^N f(X_{t},t)Δt + \sum_{t=1}^N σ(X_{t},t)ΔW_t
where ΔW_t is the increment of the random process W_t between t+1 and t.
The choice
Recall that the Reimann(standard) integral is defined as the limit of the sum of the areas of rectangles under a curve.
\int_a^b f(x)dx = \lim_{n\to\infty} \sum_{i=1}^n f(\hat{x_i})Δx_i
where Δx_i is the width of the rectangle and x_i is the height.
For ordinary smooth functions, the choice of where \hat{x_i} lies in the rectangle does not matter.
We can understand this more precisely by defining a characteristic function χ_{Δx_i} which is 1 if x is in the rectangle and 0 otherwise. Then the rectangular construction is equivalent to:
f(t)≈\sum_{i=1}^n f(\hat{x_i})χ_{Δx_i}
where f(\hat{x_i}) are constants. This approximation is exact in the limit of n\to\infty.
Now lets try to extend this to the case of a stochastic process. We can define a characteristic function χ_{ΔW_i} which is 1 if W is in the rectangle and 0 otherwise. Then the rectangular construction is equivalent to:
\phi(t)≈\sum_{i=1}^n \hat{e}_iχ_{Δx_i}
Then the integral is:
\int_0^T \phi(t) dW_t=\sum_{i=1}^n \hat{e}_i ΔW_i
Here, the coeffients \hat{e}_i are not constants but random variables since we are allowed to integrate σ which depends on X_t.
So, instead of a finite sum of simple functions we have a sum of random variables multiplied by increments of a brownian walk.
Since proceeding the same way as before clearly would not work, lets try to find the extent to which this matters.
Let the evaluation point be \tau_i\in [t_i,t_{i+1}), so \hat{e}_i=W_{\tau_i}. Then the expectation is:
\begin{align*}
𝔼\left[\int_0^T W_tdW_t\right] &= 𝔼\left[\sum_i W_{\tau_i}ΔW_i\right]\\
&= \sum_i 𝔼\left[ W_{\tau_i}(W_{t_{i+1}}-W_{t_i})\right]\\
&= \sum_i 𝔼\left[ W_{\tau_i}W_{t_{i+1}}\right]-𝔼\left[ W_{\tau_i}W_{t_i}\right]\\
&= \sum_i \min(\tau_i,t_{i+1})- \min(\tau_i,t_i)\\
&= \sum_i \tau_i-t_i\\
\end{align*}
where we have used the fact that 𝔼[W_t]=0 and 𝔼[W_tW_s]=\min(t,s). This can be any value between 0 (\tau_i=t_i) and T (\tau_i=t_{i+1}).
So it is imperative that we make a choice for \tau to define the integral of any stochastic process.
There are two choices that are commonly used: Itô and Stratonovich. Itô evaluates the integrad at the left endpoint corresponding to \tau_i=t_i and Stratonovich evaluates it at the midpoint corresponding to \tau_i=(t_i+t_{i+1})/2.
Itô integrals
Since we evaluate the integrand at the left endpoint, the Itô formulation ensures that the evaluations of the function are completely uncorrelated with the increment. That is if we evlaute the function f(x,W_t,t) at the left endpont,the limiting sum is:
\int_0^T f(x,W_t,t)dW_t=\lim_{n\to\infty} \sum_{i=1}^n f(x,W_{t_i},t_i)(W_{t_{i+1}}-W_{t_i})
Since the increment (W_{t_{i+1}}-W_{t_i}) is uncorrelated with W_{t_k}, it is also uncorrelated with f(x,W_{t_k},t_k) for any k.
This gives the property that that an Itô integral,
\int_S^T f(x,t)dW_t,
always has a zero mean and is martingale, i.e. the conditional expectation of the integral at any time t is the value at that time:
𝔼\left[\int_S^T f(x,t)dW_t|W_s\right]=\int_S^s f(x,t)dW_t
More intuitively, this means that the best guess for the future value of the integral is the current value.
Informally, Itô’s choice allows is to think of the stochastic process as the limit of a discrete one where time per round is inifinitesimal. Since in a discrete process, the value of the function is evaluated at the beginning of the round, we have the martingale property.
But Itô integrals require a modification of the chain rule, If we have an Itô process X_t,
dX_t = f(X_t,t)dt + σ(X_t,t)dW_t
amd another random process Y_t, such that Y_t=g(X_t,t), where g is twice differentiable, then the Itô stochastic differential equation for Y_t is:
dY_t = \frac{\partial g}{\partial t}(X_t,t)dt + f\frac{\partial g}{\partial x}(X_t,t)dX_t + \frac{1}{2}σ^2\frac{\partial^2 g}{\partial x^2}(X_t,t)\sigma(X_t,t)^2dt
which is the famous Itô lemma which adds the second order noise tern to the standard chain rule.
Intuitively we can understand this by considering that brownian motion has 𝔼[ΔW_i^2]=Δt_i which in the limit gives (dW_t)^2=dt
So the second order term in X_t is relevant in the infinitesimal dY_t since dt^2 and dtdW_t will diminish faster than dW_t and dt.
Stratonovich integrals
While Itô integrals have the nice property of being martingales, the deviation from the chain rule makes them difficult to use in practice. Stranovich integrals preserve the standard chain rule at the cost of the martingale property.
While Itô’s choice gleaned intuition from the discrete time process, Stratonovich’s choice can be motivated as the limit of smooth noise.
Concretely, consider a guassian process with covariance,
\Sigma_{ij}=\exp\left(-\frac{(|t_i-t_j|)^2}{τ^2}\right)
As τ\to0, the process becomes a white noise process \xi. So for some process W_t^{\tau} such that W_t^{\tau}\to W_t as τ\to0, we have:
\frac{dX_t^{\tau}}{dt} = f + σ\frac{dW_t^{\tau}}{dt}
the solution to this converges to the Stratanovich equation(can be formally shown with measure theory):
dX_t= fdt + σ∘dW_t
in the limit of white noise. With this argument, it is easy to see the Stratonovich process as the limit of a continous time process with colored noise. This also lends intuition to why it preserves the normal chain rule, since it is a limit of a series of ODEs, all of which obey the same chain rule.
We do however lose the martingale property since with the midpoint evaluation, the function is correlated with the increment.
Numerical Comparison
Code
import numpy as npimport matplotlib.pyplot as pltplt.style.use("seaborn-v0_8-whitegrid")# ParametersT =1.0# Total timeN =1000# Number of time stepsdt = T / N # Time step sizet = np.linspace(0, T, N+1) # Time arraymu =0.1# Drift coefficientsigma =0.2# Diffusion coefficient# Initialize Brownian motiondW = np.sqrt(dt) * np.random.randn(N)W = np.cumsum(dW)# Itô interpretationX_ito = np.zeros(N+1)X_ito[0] =1for i inrange(1, N+1): X_ito[i] = X_ito[i-1] + mu * X_ito[i-1] * dt + sigma * X_ito[i-1] * dW[i-1]# Stratonovich interpretationX_strat = np.zeros(N+1)X_strat[0] =1for i inrange(1, N+1): X_strat[i] = X_strat[i-1] + mu * X_strat[i-1] * dt + sigma *0.5* (X_strat[i-1] + X_strat[i-1]) * dW[i-1]# Plot the pathsplt.figure(figsize=(8, 5))plt.plot(t, X_ito, label="Itô Interpretation", alpha=0.5)plt.plot(t, X_strat, label="Stratonovich Interpretation", alpha=0.5)plt.xlabel("Time")plt.ylabel("Value")plt.legend()plt.title("Brownian Motion under Itô and Stratonovich Interpretations")plt.grid()plt.show()
From the mathematics and the intuition, we expect to see a difference between the two interpretations, where Itô’s formulation is more volatile than Stratonovich’s. But on a simple implementation on a brownian motion, we see that the two interpretations are surprisingly indistinguishable.
Transformations and purpose
While there are intuitive and mechanical differences between the two choices, the two techniques are mathematically equivalent in the following sense. Any Stratonovich SDE,
dX_t = f(X_t,t)dt + σ(X_t,t)∘dW_t
can be transformed into an equivalent Itô SDE,
dX_t = f(X_t,t)dt + σ(X_t,t)dW_t+ \frac{1}{2}\frac{∂σ}{∂x}(X_t,t)σ(X_t,t)dt
and vice versa. So given a well defined SDE, the choice of formalism is a matter of convenience since the transformation simply modifies the drift term.
The formalism is truly relevant only in a regime where we have a prior noise free model and need to choose a “correct” interpretation of the noise and how to add it. So our formulation of the noise should be guided by the physical process we are modelling since while it may not affect the behaviour of the system, it will affect the interpretation of the drift and diffusion coefficients of the model.