MSc in Financial Mathematics, FM50
Stochastic Control for Optimal Trading
Roxana Dumitrescu and Leandro Sánchez-Betancourt
Department of Mathematics
King’s College London
1 Part 1: Literature review
A stochastic optimal control problem deals with uncertainties when making decisions to maximize or
minimize an objective function. With a given objective function, decision makers need to determine a
strategy, which is the stochastic control, to optimize the objective function in a random environment.
The decision-making problem is the so-called stochastic control problem. One powerful tool for studying
stochastic control problems is the dynamic programming principle and the associated
Hamilton-Jacobi-Bellman (HJB) equation. Optimal controls are obtained by solving the HJB equation. For an
overview of this approach, students are referred to [Pha09, FS06, Tou12]. Stochastic control theory
has applications to a wide range of areas, from engineering to financial mathematics and economics.
For Part 1, students are expected to present an overview of possible applications of stochastic
control problems arising in financial mathematics. You should describe some examples of problems
to which these techniques can be applied and you should try to write down some HJB equations for
the problems you consider and the solutions if known. You should be proactive in researching the
literature, which involves published journal papers and books. Working papers should be used mostly
for orientation, given that their content has not been peer reviewed. It is particularly important that
you synthesize the information gathered from these sources and present it as a flowing story that is
consistent in notation and in mathematical and financial content.
2 Part 2: An optimal trading problem
In this part, we consider an extension of the standard optimal trading problem. We refer the student
to Chapter 6 in [CJP15] for an introduction to the subject.
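Consider a complete filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t \in \mathfrak{T}}, \mathbb{P})$, where $\mathbb{F}$ is the natural filtration
generated by the two-dimensional Brownian motion $W = (W^\alpha, W^S)$, with $W^\alpha$ and $W^S$ independent,
and $\mathfrak{T} = [0, T]$, where $T > 0$ is the trading horizon. We let the mid-price process $(S_t)_{t \in \mathfrak{T}}$ of the traded asset follow
\[
dS_t = \kappa^s \alpha_t \, dt + \sigma^s \, dW^S_t , \tag{1}
\]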
where $\kappa^s, \sigma^s \in \mathbb{R}_+$ and $\alpha_t$ is the informed trader's signal, which we assume follows
\[
d\alpha_t = -\kappa^\alpha \alpha_t \, dt + \sigma^\alpha \, dW^\alpha_t , \tag{2}
\]
for $\kappa^\alpha, \sigma^\alpha \in \mathbb{R}_+$. The process $(\nu_t)_{t \in [0,T]}$ is the speed at which the trader trades in the
market: when $\nu_t > 0$ the trader is purchasing the security, and when $\nu_t < 0$ the trader is
selling the security.
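The dynamics (1)-(2) can be explored numerically before any control is introduced. The following is a minimal Euler-Maruyama sketch in Python with NumPy; the function name `simulate_signal_and_price`, the initial signal value `alpha0 = 0`, and the seed are assumptions made for illustration, and the default parameter values are the ones quoted in the model parameters of this part.

```python
import numpy as np

def simulate_signal_and_price(n_paths, n_steps, T=1.0, S0=100.0, alpha0=0.0,
                              kappa_s=1.0, sigma_s=2.0,
                              kappa_alpha=10.0, sigma_alpha=5.0, seed=0):
    """Euler-Maruyama paths of the signal alpha and the mid-price S.

    alpha0 is an assumed starting value; the text does not specify it.
    Returns two arrays of shape (n_paths, n_steps + 1).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    alpha = np.empty((n_paths, n_steps + 1))
    S = np.empty((n_paths, n_steps + 1))
    alpha[:, 0] = alpha0
    S[:, 0] = S0
    # Independent Brownian increments for W^alpha and W^S.
    dW_alpha = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    dW_S = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    for i in range(n_steps):
        # Equation (1): the drift of the price is proportional to the signal.
        S[:, i + 1] = S[:, i] + kappa_s * alpha[:, i] * dt + sigma_s * dW_S[:, i]
        # Equation (2): the signal mean-reverts to zero.
        alpha[:, i + 1] = (alpha[:, i] - kappa_alpha * alpha[:, i] * dt
                           + sigma_alpha * dW_alpha[:, i])
    return alpha, S

# Five sample paths on a grid of 1000 steps.
alpha, S = simulate_signal_and_price(n_paths=5, n_steps=1000)
print(alpha.shape, S.shape)
```

The explicit scheme is adequate here because the step size is small relative to the mean-reversion speed of the signal; a coarser grid might call for the exact Ornstein-Uhlenbeck transition instead.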
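We denote by $\mathcal{A}$ the set of admissible strategies for the informed trader, defined as
\[
\mathcal{A} := \left\{ \nu = (\nu_t)_{t \in [0,T]} \;\middle|\; \nu \text{ is } \mathbb{F}\text{-progressively measurable and } \mathbb{E}\left[ \int_0^T (\nu_t)^2 \, dt \right] < \infty \right\} , \tag{3}
\]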
and we let $\nu \in \mathcal{A}$ be a given trading strategy. Given $\nu$, the inventory process of the informed trader,
denoted by $(Q^\nu_t)_{t \in [0,T]}$, satisfies
\[
dQ^\nu_t = \nu_t \, dt , \qquad Q^\nu_0 = 0 . \tag{4}
\]
Similarly, we define the controlled cash process $(X^\nu_t)_{t \in [0,T]}$ of the informed trader, which follows
\[
dX^\nu_t = -\nu_t \, (S_t + \kappa \, \nu_t) \, dt , \qquad X^\nu_0 = 0 , \tag{5}
\]
where κ is the temporary price impact parameter that captures the quality of the liquidity that the
broker offers to their clients.
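To make the bookkeeping in (4)-(5) concrete, here is a small sketch of a left-point discretization of the inventory and cash processes for a given sequence of trading speeds. The helper name `inventory_and_cash` and the toy inputs are hypothetical and serve only as an illustration.

```python
import numpy as np

def inventory_and_cash(nu, S, dt, kappa):
    """Left-point Euler discretization of the inventory (4) and cash (5).

    nu : trading speeds on each interval [t_i, t_{i+1}), shape (n_steps,)
    S  : mid-prices on the grid, shape (n_steps + 1,)
    Returns (Q, X), each of shape (n_steps + 1,), both starting at zero.
    """
    nu = np.asarray(nu, dtype=float)
    S = np.asarray(S, dtype=float)
    # Inventory accumulates the traded quantity nu * dt.
    Q = np.concatenate(([0.0], np.cumsum(nu * dt)))
    # Each trade is executed at the mid-price plus the temporary impact kappa * nu.
    X = np.concatenate(([0.0], np.cumsum(-nu * (S[:-1] + kappa * nu) * dt)))
    return Q, X

# Toy example: buy at unit speed for two steps at a flat mid-price of 100.
Q, X = inventory_and_cash([1.0, 1.0], [100.0, 100.0, 100.0], dt=0.5, kappa=1e-3)
print(Q, X)
```

The cash spent per unit exceeds the mid-price by `kappa * nu`, which is how the temporary impact penalizes fast trading.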
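The performance criterion of the informed trader is given by
\[
H^\nu(t, \alpha, q, S, x) = \mathbb{E}_{t,\alpha,x,S,q}\left[ X^\nu_T + Q^\nu_T S_T - a \, (Q^\nu_T)^2 - \phi \int_t^T (Q^\nu_s)^2 \, ds \right] , \tag{6}
\]
where $a$ penalizes terminal inventory and $\phi$ penalizes running inventory, with the value function given by
\[
H(t, \alpha, q, S, x) = \sup_{\nu \in \mathcal{A}} H^\nu(t, \alpha, q, S, x) , \tag{7}
\]
and the notation $\mathbb{E}_{t,\alpha,x,S,q}$ means
\[
\mathbb{E}_{t,\alpha,x,S,q}[\, \cdot \,] = \mathbb{E}[\, \cdot \mid \alpha_t = \alpha, \ X^\nu_t = x, \ S_t = S, \ Q^\nu_t = q \,] . \tag{8}
\]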
Task 1. Find the explicit solution to the control problem described above, i.e., find the value function
$H$ and the optimal control $\nu^*$ in closed form. What happens to the optimal trading strategy if the
Brownian motions $W^\alpha$ and $W^S$ have a correlation $\rho \neq 0$?
To accomplish this task, you can follow the standard approach, which consists in (formally) proving
the dynamic programming principle satisfied by the value function H and deriving the associated HJB
equation (see e.g. [CJP15, Pha09]). Then, compute the optimal control in feedback form and substitute
it back in the HJB equation and derive the PDE satisfied by the value function.
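As a rough guide to the steps above, and assuming $H$ is smooth enough for Itô's formula to apply, a formal application of the dynamic programming principle to the value function would lead to an HJB equation of the following form. This is a sketch to be checked against [CJP15, Pha09], not a verified derivation; the independence of $W^\alpha$ and $W^S$ is the reason no mixed $\partial_{\alpha S}$ term appears.

\[
\begin{aligned}
0 = {} & \partial_t H - \kappa^\alpha \alpha \, \partial_\alpha H + \tfrac{1}{2} (\sigma^\alpha)^2 \, \partial_{\alpha\alpha} H + \kappa^s \alpha \, \partial_S H + \tfrac{1}{2} (\sigma^s)^2 \, \partial_{SS} H - \phi \, q^2 \\
& + \sup_{\nu \in \mathbb{R}} \left\{ \nu \, \partial_q H - \nu \, (S + \kappa \, \nu) \, \partial_x H \right\} ,
\end{aligned}
\]
with terminal condition
\[
H(T, \alpha, q, S, x) = x + q \, S - a \, q^2 .
\]
Provided $\partial_x H > 0$, the first-order condition $\partial_q H - (S + 2 \kappa \nu) \, \partial_x H = 0$ gives the candidate feedback control
\[
\nu^* = \frac{\partial_q H - S \, \partial_x H}{2 \kappa \, \partial_x H} .
\]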
Using the PDE satisfied by the value function, one can propose the ansatz $H(t, \alpha, q, S, x) =
x + S q + h(t, \alpha, q)$ and derive the PDE satisfied by $h$. Then, propose a linear-quadratic ansatz (in $q$)
for the function $h$ and deduce a system of ODEs, which you should then solve. Use the solution for $H$
to find a closed-form solution to the optimal trading strategy.
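Under the ansatz $H = x + S q + h(t, \alpha, q)$ one has $\partial_x H = 1$, $\partial_q H = S + \partial_q h$, $\partial_S H = q$ and $\partial_{SS} H = 0$, so a formal substitution into the HJB equation suggests the reduction below. The coefficient names $h_0$, $h_1$, $h_2$ and $g$ are notation introduced here for illustration, and the further guess $h_1(t, \alpha) = g(t) \, \alpha$ is an assumption that should be verified.

\[
\nu^* = \frac{\partial_q h}{2 \kappa} , \qquad
\partial_t h - \kappa^\alpha \alpha \, \partial_\alpha h + \tfrac{1}{2} (\sigma^\alpha)^2 \, \partial_{\alpha\alpha} h + \kappa^s \alpha \, q - \phi \, q^2 + \frac{(\partial_q h)^2}{4 \kappa} = 0 , \qquad
h(T, \alpha, q) = -a \, q^2 .
\]
Writing $h(t, \alpha, q) = h_0(t, \alpha) + h_1(t, \alpha) \, q + h_2(t) \, q^2$ and matching powers of $q$:
\[
\begin{aligned}
q^2: \quad & h_2' + \frac{h_2^2}{\kappa} - \phi = 0 , && h_2(T) = -a , \\
q^1: \quad & \partial_t h_1 - \kappa^\alpha \alpha \, \partial_\alpha h_1 + \tfrac{1}{2} (\sigma^\alpha)^2 \, \partial_{\alpha\alpha} h_1 + \frac{h_2 \, h_1}{\kappa} + \kappa^s \alpha = 0 , && h_1(T, \alpha) = 0 , \\
q^0: \quad & \partial_t h_0 - \kappa^\alpha \alpha \, \partial_\alpha h_0 + \tfrac{1}{2} (\sigma^\alpha)^2 \, \partial_{\alpha\alpha} h_0 + \frac{h_1^2}{4 \kappa} = 0 , && h_0(T, \alpha) = 0 .
\end{aligned}
\]
With $h_1 = g(t) \, \alpha$, the middle equation becomes the linear ODE
\[
g' - \left( \kappa^\alpha - \frac{h_2}{\kappa} \right) g + \kappa^s = 0 , \qquad g(T) = 0 ,
\]
and the candidate control reads
\[
\nu^*(t, \alpha, q) = \frac{g(t) \, \alpha + 2 \, h_2(t) \, q}{2 \kappa} .
\]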
Consider the following model parameters: $S_0 = 100$, $\kappa^s = 1$, $\sigma^s = 2$, $T = 1$, $\kappa^\alpha = 10$, $\sigma^\alpha = 5$,
$\kappa = 1 \times 10^{-3}$, $a = 1$, $b = 0$, and $\phi = 0$. Consider the discrete version of the model using time steps of
$\Delta = T/1000$ and implement the optimal trading strategy ($\nu^*$).
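One possible implementation, restricted to the case $\phi = 0$ used in the parameter set above, is sketched below. It assumes a candidate feedback control of the form $\nu^* = (g(t)\,\alpha + 2 h_2(t)\, q)/(2\kappa)$, where $h_2$ solves $h_2' + h_2^2/\kappa = 0$ with $h_2(T) = -a$ and $g$ solves $g' = (\kappa^\alpha - h_2/\kappa)\, g - \kappa^s$ with $g(T) = 0$. The closed-form expressions in the code come from a formal calculation and should be checked against your own derivation; the function names, the initial signal `alpha0 = 0`, and the seed are assumptions.

```python
import numpy as np

def h2(tau, a, kappa):
    """Quadratic coefficient for phi = 0, as a function of tau = T - t."""
    return -a * kappa / (kappa + a * tau)

def g(tau, a, kappa, kappa_s, kappa_alpha):
    """Coefficient of alpha in the linear term (phi = 0), with tau = T - t."""
    decay = 1.0 - np.exp(-kappa_alpha * tau)
    num = (kappa * decay + a * tau) / kappa_alpha - a * decay / kappa_alpha**2
    return kappa_s * num / (kappa + a * tau)

def optimal_speed(tau, alpha, q, a, kappa, kappa_s, kappa_alpha):
    """Candidate feedback control (g * alpha + 2 * h2 * q) / (2 * kappa)."""
    return (g(tau, a, kappa, kappa_s, kappa_alpha) * alpha
            + 2.0 * h2(tau, a, kappa) * q) / (2.0 * kappa)

def simulate_optimal(n_paths, n_steps=1000, T=1.0, S0=100.0, alpha0=0.0,
                     sigma_s=2.0, sigma_alpha=5.0, a=1.0, kappa=1e-3,
                     kappa_s=1.0, kappa_alpha=10.0, seed=0):
    """Simulate the model under the candidate strategy on a uniform grid.

    alpha0 is an assumed starting value for the signal.
    Returns a dict of path arrays; nu has n_steps columns, the rest n_steps + 1.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    shape = (n_paths, n_steps + 1)
    alpha, S, Q, X = (np.zeros(shape) for _ in range(4))
    nu = np.zeros((n_paths, n_steps))
    alpha[:, 0] = alpha0
    S[:, 0] = S0
    for i in range(n_steps):
        tau = T - i * dt  # time to maturity at the start of the step
        nu[:, i] = optimal_speed(tau, alpha[:, i], Q[:, i],
                                 a, kappa, kappa_s, kappa_alpha)
        Q[:, i + 1] = Q[:, i] + nu[:, i] * dt
        X[:, i + 1] = X[:, i] - nu[:, i] * (S[:, i] + kappa * nu[:, i]) * dt
        S[:, i + 1] = (S[:, i] + kappa_s * alpha[:, i] * dt
                       + sigma_s * sqrt_dt * rng.standard_normal(n_paths))
        alpha[:, i + 1] = (alpha[:, i] - kappa_alpha * alpha[:, i] * dt
                           + sigma_alpha * sqrt_dt * rng.standard_normal(n_paths))
    return {"alpha": alpha, "S": S, "nu": nu, "Q": Q, "X": X}

# A handful of paths, e.g. for plotting trajectories.
paths = simulate_optimal(n_paths=5)
print(paths["Q"][:, -1])
```

Storing full paths for 10,000 simulations would use several hundred megabytes, so for large runs it may be preferable to keep only terminal values. The closed forms are used instead of a numerical ODE solver because the equation for $h_2$ is stiff near $T$ when $\kappa$ is small.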
Task 2. Based on 10,000 simulations using the above parameters, fill in the following table and
produce histograms of the four random variables below:
Random variable        Expected value        Standard deviation
[...]
[...]
[...]
[...] T + Q [...]
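Once the simulated terminal values are available, the table and the histogram data can be produced along the following lines. This is a minimal sketch: the helper names `summarize` and `format_table` are hypothetical, and the two arrays labelled `Y1` and `Y2` are placeholder draws standing in for the simulated random variables, not results of the model.

```python
import numpy as np

def summarize(samples):
    """Map each name to (sample mean, sample standard deviation)."""
    return {name: (float(np.mean(v)), float(np.std(v, ddof=1)))
            for name, v in samples.items()}

def format_table(stats):
    """Render the statistics as a fixed-width plain-text table."""
    lines = [f"{'Variable':<12}{'Expected value':>18}{'Standard deviation':>22}"]
    for name, (mean, std) in stats.items():
        lines.append(f"{name:<12}{mean:>18.4f}{std:>22.4f}")
    return "\n".join(lines)

# Placeholder draws standing in for 10,000 simulated terminal values.
rng = np.random.default_rng(0)
demo = {"Y1": rng.normal(0.0, 1.0, 10_000), "Y2": rng.normal(5.0, 2.0, 10_000)}
print(format_table(summarize(demo)))

# Histogram counts and bin edges, ready to be drawn with any plotting library.
counts, edges = np.histogram(demo["Y1"], bins=50)
```

Using `ddof=1` gives the unbiased sample variance; with 10,000 draws the difference from `ddof=0` is negligible, but it is worth stating which convention is reported.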
Task 3. Study the optimal strategy and its sensitivity to the model parameters. What can be said about
the limiting behaviour of the optimal strategy $\nu^*$ as $a \to \infty$?
To accomplish this task, you can plot trajectories of $(\alpha_t)_{0 \le t \le T}$, $(\nu_t)_{0 \le t \le T}$, $(Q^I_t)_{0 \le t \le T}$, and $(X^I_t)_{0 \le t \le T}$
for a few sample paths and comment on what the optimal strategy does and why one observes
the plotted behaviour. To analyse the sensitivity of the optimal strategy with respect to model
parameters, one can consider different values of $a$ and $\phi$ and comment on their influence on the optimal
strategy.
Task 4. Consider now a benchmark strategy that trades following $\alpha_t$, i.e., $\nu^B_t = \alpha_t$. Why would this
be a good benchmark? Produce histograms for
\[
P^\nu_T := X^\nu_T + Q^\nu_T S_T - a \, (Q^\nu_T)^2 - \phi \int_0^T (Q^\nu_s)^2 \, ds
\]
under strategies $\nu^B$ and $\nu^{I*}$, and compare the means for the quantity $P^\nu_T$ under both strategies. Explain your
findings.
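A comparison of this kind can be organized around a single routine that simulates the model under any feedback rule and returns samples of the terminal performance. The sketch below is one way to do this; the function name `performance_samples`, the callable signature `nu_fn(t, alpha, q)`, the initial signal `alpha0 = 0`, and the reduced path and step counts in the demo are assumptions. The terminal inventory is marked at the mid-price $S_T$, in line with the performance criterion of Part 2.

```python
import numpy as np

def performance_samples(nu_fn, n_paths=10_000, n_steps=1000, T=1.0, S0=100.0,
                        alpha0=0.0, kappa_s=1.0, sigma_s=2.0, kappa_alpha=10.0,
                        sigma_alpha=5.0, kappa=1e-3, a=1.0, phi=0.0, seed=0):
    """Samples of X_T + Q_T S_T - a Q_T^2 - phi * int_0^T Q_s^2 ds.

    nu_fn(t, alpha, q) must return the trading speed for each path.
    Only current states are kept, so memory use is O(n_paths).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    alpha = np.full(n_paths, float(alpha0))
    S = np.full(n_paths, float(S0))
    Q = np.zeros(n_paths)
    X = np.zeros(n_paths)
    running = np.zeros(n_paths)  # left-point approximation of int Q_s^2 ds
    for i in range(n_steps):
        nu = np.asarray(nu_fn(i * dt, alpha, Q), dtype=float)
        running = running + Q**2 * dt
        X = X - nu * (S + kappa * nu) * dt
        Q = Q + nu * dt
        S = S + kappa_s * alpha * dt + sigma_s * sqrt_dt * rng.standard_normal(n_paths)
        alpha = (alpha - kappa_alpha * alpha * dt
                 + sigma_alpha * sqrt_dt * rng.standard_normal(n_paths))
    return X + Q * S - a * Q**2 - phi * running

# Benchmark: trade at a speed equal to the signal (smaller run for a quick check).
P_bench = performance_samples(lambda t, alpha, q: alpha, n_paths=1000, n_steps=200)
print(P_bench.mean(), P_bench.std(ddof=1))
```

Passing the optimal feedback rule as `nu_fn` with the same `seed` reuses the same random draws for both strategies, which reduces the noise in the difference of the two sample means.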
3 Part 3: Original contribution
In this part, you should develop your own ideas either on an extension of the model proposed in Part
2, or on an independent control problem. Some possible ideas could be:
• More sophisticated price process models, e.g., jump-diffusion models, transient price impact,
permanent price impact, impacting the signal.
• Use neural networks to find the optimal feedback control following Section 2.1 in [GPW21].
Once you have managed to replicate the results obtained in Part 2 of the thesis, then use neural
networks to solve more general versions of the current problem, e.g., other utility functions,
different impact functions, more sophisticated models for the asset price, etc.
• Develop a reinforcement learning framework to solve a discrete-time version of the problem in
Part 2 and implement it. In particular, discuss advantages of doing so; see [HXY21, CJSB22].
• Study the case where there are two (or more) signals. Study the tradeoff between following a
short-term α-signal versus a longer term signal.
• Consider ambiguity aversion within the framework.
• Consider a model with two correlated assets, and an α-signal that enters both assets. Set up an
optimal trading problem of your design within this framework.
References
[CJP15] Álvaro Cartea, Sebastian Jaimungal, and José Penalva. Algorithmic and high-frequency
trading. Cambridge University Press, 2015.
[CJSB22] Álvaro Cartea, Sebastian Jaimungal, and Leandro Sánchez-Betancourt. Deep reinforcement
learning for algorithmic trading. In Machine Learning in Financial Markets: A guide to
contemporary practices (to appear). Edited by C.-A. Lehalle and A. Capponi. Cambridge
University Press, 2022.
[FS06] Wendell H. Fleming and Halil Mete Soner. Controlled Markov processes and viscosity
solutions, volume 25. Springer Science & Business Media, 2006.
[GPW21] Maximilien Germain, Huyên Pham, and Xavier Warin. Neural networks-based algorithms
for stochastic control and PDEs in finance. arXiv preprint arXiv:2101.08068, 2021.
[HXY21] Ben Hambly, Renyuan Xu, and Huining Yang. Recent advances in reinforcement learning
in finance. arXiv preprint arXiv:2112.04553, 2021.
[Pha09] Huyên Pham. Continuous-time stochastic control and optimization with financial
applications, volume 61. Springer Science & Business Media, 2009.
[Tou12] Nizar Touzi. Optimal stochastic control, stochastic target problems, and backward SDE,
volume 29. Springer Science & Business Media, 2012.
