STAT0025 Optimisation Algorithms for
Operational Research: solutions
Department of Statistical Science, University College London
2021/22, main exam session
Intended learning outcomes for this course:
1. understand the theoretical concepts of linear programming, dynamic programming
and finite Markov programming
2. set up correct models of real life problems
3. interpret results correctly
4. check the validity of assumptions
Notes
A1. Seen similar all parts.
A2. Seen similar all parts.
B1. (a) Seen similar. (b) Seen similar. (c) Seen similar. (d) Unseen.
B2. (a) Unseen. (b) Unseen. (c) Seen similar. (d) Unseen.
B3. Seen similar.
Section A
A1 (a) The graphical solution is as follows:
The optimal solution is x∗ = (2, 3) with optimal value z∗ = 13.
[1 mark] for feasible region F , [1 mark] for contour (dotted/thick solid
line, arrow not necessary), [1 mark] for graphical identification of optimal
solution, [1 mark] for numerical identification of optimal solution, [1 mark]
for optimal value.
(b) First consider fixing c2 = 5 and varying c1. The objective function contour
is x2 = z/5 − (c1/5)x1. We are still maximising the intercept regardless of the
value of c1. Considering the figure we see that the gradient of the contour
must lie between the gradients of constraints 2 and 4. That is, −1/2 ≤
−c1/5 ≤ 1/4, [1 mark] or −5/4 ≤ c1 ≤ 5/2, which is the tolerance interval.
[2 marks]
Now consider fixing c1 = −1 and varying c2. The objective function contour
is x2 = z/c2 + x1/c2. If c2 < 0, then we should minimise the intercept, and
then (2, 3) is never optimal. [1 mark] If c2 > 0, then we must meet the
same gradient condition as above, meaning that we need −1/2 ≤ 1/c2 and
1/c2 ≤ 1/4. This simplifies to c2 ≥ 4. [2 marks] Finally, if c2 = 0, then
we should maximise z = −x1, and then (2, 3) is not optimal. [1 mark] In
conclusion, the tolerance interval of c2 is [4,∞).
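The tolerance intervals can be sanity-checked numerically. The exam's constraint set is not restated in these solutions, so the sketch below uses a hypothetical reconstruction: two binding constraints with the stated gradients −1/2 and 1/4, both passing through the optimum (2, 3). Since an LP optimum lies at a vertex, comparing objective values over the vertices suffices.

```python
# Hypothetical reconstruction (the exam's constraints are not reproduced
# here): assume the two binding constraints have the stated gradients
# -1/2 and 1/4 and pass through the optimum (2, 3), e.g.
#   constraint 2:  x1 + 2*x2 <= 8
#   constraint 4: -x1 + 4*x2 <= 10,  with x1, x2 >= 0.
# The vertices of this feasible region are:
vertices = [(0.0, 0.0), (8.0, 0.0), (0.0, 2.5), (2.0, 3.0)]

def argmax_vertex(c1, c2):
    """Vertex maximising c1*x1 + c2*x2 (an LP optimum lies at a vertex)."""
    return max(vertices, key=lambda v: c1 * v[0] + c2 * v[1])

# c1 strictly inside the tolerance interval (-5/4, 5/2): (2, 3) stays optimal.
for c1 in (-1.0, 0.0, 1.0, 2.0):
    assert argmax_vertex(c1, 5.0) == (2.0, 3.0)
# Just outside, the optimum moves to a neighbouring vertex.
assert argmax_vertex(-1.5, 5.0) == (0.0, 2.5)
assert argmax_vertex(3.0, 5.0) == (8.0, 0.0)

# Varying c2 with c1 = -1: (2, 3) is optimal for c2 > 4 but not below 4.
assert argmax_vertex(-1.0, 6.0) == (2.0, 3.0)
assert argmax_vertex(-1.0, 2.0) == (0.0, 2.5)
```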
(c) The expected values of the rows are 25/4, 23/4, 10/4. [1 mark] The optimal
strategy is A1 with expected payoff 25/4. [1 mark]
(d) The regret matrix is as follows, with ‘EV’ and ‘max’ columns used to compute
the optimal solutions.

                   Nature
                N1   N2   N3   EV     max
Player A   A1    9    0    0   9/4     9
           A2    0    9    1   11/4    9
           A3    6    8    5   6       8

[2 marks] for regret matrix. According to the EV criterion, the optimal
strategy is A1 with expected regret 9/4. [1 mark] for EVs, [1 mark] for
conclusion. According to the minimax regret criterion, the optimal strategy
is A3 with maximal regret 8. [1 mark] for maxima, [1 mark] for conclusion.
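Both criteria can be checked mechanically. The prior over states of nature is not restated here; the probabilities (1/4, 1/4, 1/2) below are inferred from the quoted EVs of 9/4, 11/4 and 6, so treat them as an assumption.

```python
from fractions import Fraction as F

# Regret matrix from the solution (rows A1-A3, columns N1-N3).
regret = [[9, 0, 0],
          [0, 9, 1],
          [6, 8, 5]]
# Prior over states of nature, inferred from the quoted EVs (an assumption).
prior = [F(1, 4), F(1, 4), F(1, 2)]

evs = [sum(p * r for p, r in zip(prior, row)) for row in regret]
maxima = [max(row) for row in regret]

assert evs == [F(9, 4), F(11, 4), F(6)]
assert maxima == [9, 9, 8]
# EV criterion: minimise expected regret -> A1; minimax regret -> A3.
assert min(range(3), key=lambda i: evs[i]) == 0
assert min(range(3), key=lambda i: maxima[i]) == 2
```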
A2 (a) Month 3. We know that I4 = I3 − V3 + Q3 = 0 since we require an empty
warehouse at the end. Since we need to have V3 ≤ I3 and Q3 ≥ 0, the only
(and hence optimal) choice is V∗3 = I3 and Q∗3 = 0.
The optimal profit over month 3 is
π∗3 = 55V∗3 − 50Q∗3 = 55I3.
[2 marks]
Month 2. We know that I3 = I2 − V2 + Q2. The profit over months 2 and 3 is
π2 = 45V2 − 52Q2 + π∗3
   = 45V2 − 52Q2 + 55I3
   = 45V2 − 52Q2 + 55I2 − 55V2 + 55Q2
   = −10V2 + 3Q2 + 55I2.
We seek to maximise this over the constraints V2 ≤ I2, Q2 − V2 ≤ K − I2 and
V2, Q2 ≥ 0. Since the coefficient of V2 is negative and that of Q2 is positive,
the optimal solution is V∗2 = 0, Q∗2 = K − I2, giving profit π∗2 = 3K + 52I2.
[2 marks]
Month 1. We know that I2 = I1 − V1 + Q1. The profit over months 1, 2 and
3 is given by
π1 = 54V1 − 46Q1 + π∗2
   = 54V1 − 46Q1 + 52I2 + 3K
   = 54V1 − 46Q1 + 52I1 − 52V1 + 52Q1 + 3K
   = 2V1 + 6Q1 + 52I1 + 3K.
We seek to maximise this over the constraints V1 ≤ I1, Q1 − V1 ≤ K − I1
and V1, Q1 ≥ 0. The contour of the objective function is
Q1 = π1/6 − V1/3 − (26/3)I1 − K/2.
Hence we are maximising the intercept, and we see that the optimal solution
is V∗1 = I1, Q∗1 = K, giving profit
π∗1 = 2I1 + 6K + 52I1 + 3K = 54I1 + 9K.
[2 marks]
We are now done because we know that I1 = 400. We roll the solution
forward to complete the following summary table.

t    It     V∗t         Q∗t           π∗t
1    400    I1 = 400    K = 500       54I1 + 9K = 26100
2    500    0           K − I2 = 0    52I2 + 3K = 27500
3    500    I3 = 500    0             55I3 = 27500

[3 marks] for table.
The total profit over the period is π∗1 = 26100. [1 mark] for separately
reporting profit.
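A minimal sketch of rolling the optimal policy forward, using the monthly selling/buying prices that appear in the derivation (54/46, 45/52, 55/50) with I1 = 400 and K = 500:

```python
# Prices per unit: (sell, buy) for months 1-3, taken from the derivation.
prices = [(54, 46), (45, 52), (55, 50)]
I, K = 400, 500  # initial stock and warehouse capacity

# Optimal policy found by the month-by-month argument above:
# month 1: sell everything, buy to capacity; month 2: sell nothing,
# top up to capacity (here K - I2 = 0); month 3: sell everything.
def policy(month, stock):
    if month == 1:
        return stock, K          # V1 = I1, Q1 = K
    if month == 2:
        return 0, K - stock      # V2 = 0, Q2 = K - I2
    return stock, 0              # V3 = I3, Q3 = 0

profit = 0
for month, (sell, buy) in enumerate(prices, start=1):
    V, Q = policy(month, I)
    profit += sell * V - buy * Q
    I = I - V + Q  # stock carried into the next month

assert profit == 26100  # matches 54*I1 + 9*K
assert I == 0           # warehouse empty at the end
```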
(b) For A3 to be the row minimum, we need min(a, b, 5) = a, which is equivalent
to the conditions a ≤ 5, a ≤ b. [2 marks] For B2 to be the column maximum,
we need max(a, b − 1, −1) = a, which is equivalent to a ≥ −1, a ≥ b − 1. [2
marks] If these conditions are met then the point (A3, B2) is a pure Nash
equilibrium and hence a saddle point. In conclusion, the point is a saddle
point iff −1 ≤ a ≤ 5 and a ≤ b ≤ a + 1. [1 mark] for succinct conclusion
with correct argument.
Alternative formulation: the point is a saddle point iff −1 ≤ a ≤ 5 and
b − 1 ≤ a ≤ b.
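The equivalence between the raw min/max conditions and the interval form can be checked by brute force over a grid of (a, b) values:

```python
def raw(a, b):
    # a is the row minimum of (a, b, 5) and the column maximum of (a, b-1, -1)
    return min(a, b, 5) == a and max(a, b - 1, -1) == a

def interval(a, b):
    # the stated characterisation of the saddle point
    return -1 <= a <= 5 and a <= b <= a + 1

# Scan a grid in steps of 0.25, comfortably covering the boundary cases.
grid = [i / 4 for i in range(-20, 41)]
assert all(raw(a, b) == interval(a, b) for a in grid for b in grid)
```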
(c) The pure maximin payoff is 5 [2 marks] and the pure minimax payoff is 7
[2 marks]. The value is the mixed maximin payoff, which is at least as large
as the pure maximin payoff because every pure strategy is also a mixed
strategy. Hence the value is at least 5. Also, the value is the mixed minimax
payoff, which is at most the pure minimax payoff, and hence at most 7.
[1 mark] for appropriate argument about value. This completes the proof.
Section B
B1 (a) For an edge (i, j), denote by cij its capacity and dij its demand. The LPP
is as follows. Implicitly, quantifiers over edges include only those where an
edge exists in the network.

Minimise z = ∑i,j 500(dij − xij) + 250(x13 + x14 + x23 + x24 + x58 − x35)

subject to
∑j xij = ∑j xji,  i = 3, 4, 7, 8,
x35 ≤ x58,
x36 + x46 ≥ x68,
xij ≤ cij,  i, j = 1, . . . , 10,
xij ≥ 0,  i, j = 1, . . . , 10.

The first term in the objective function is the network's unmet-demand
penalty. The second term is the cost of running the buses.
The first three constraints are flow constraints for, respectively, normal bus
stops, the intermediate bus garage 5, and the intermediate bus depot 6. The
fourth constraint ensures we do not exceed capacity. The fifth is non-negativity.
[1 mark] for objective function, [1 mark] for explanation of objective func-
tion, [4 marks] for first four constraints, [4 marks] for explanation of first
four constraints.
(b) Example possibilities: look at the shadow prices of the capacity constraints
to decide whether to widen a road or install a bus lane; look at changing
the objective function to see if fluctuating demand will change the transport
plan (it won’t in this formulation); ditto to see if fluctuations in running costs
will affect the transport plan. [2 marks] for first suggestion, [1 mark] for
second.
(c) If x > d then the ‘penalty’ 500(d − x) is negative, which is favourable for
the LPP. This component of the objective function incentivises us to run as
many buses as possible on each route, regardless of the actual demand. [1
mark]
You could take an L2 sum of the penalties instead, to try to match them
closely. If you prefer to match without exceeding, you could replace the
penalty with 500max{d − x, 0}. With either of these, we can no longer use
linear programming. [1 mark] for any sensible suggestion about changes to
objective, [1 mark] for correct assessment about linearity.
(d) A naive answer is to simply remove the flow constraint through stop 6, since
buses may now stop or start at will. But in this formulation one cannot tell
how many buses are run through the network, since some might stop and
others start at 6. Even if you argue that stopping and starting buses at the
same point will never be optimal, it does not fix this issue, because then we
need to have a term in the objective 250max{x68 − x36 − x46, 0}, and this is
nonlinear.
A better answer is to install a phantom node 6′ which is the garage, and retain
6 as the depot. Link previous nodes to node 6, future nodes to 6′ and link 6
to 6′ by an edge which is exempt from the usual capacity and demand rules.
Then x6′8−x66′ is the number of buses started at 6, so 250(x6′8−x66′) should
be added to the objective function. [4 marks] for any suitable intervention.
Can be written out as LPP or explained unambiguously.
B2 (a) This is a resource allocation problem with a twist, which is that the accumu-
lation of values is multiplicative instead of additive.
Stages are: 0: no allocation; 1: allocation to team 1; 2: allocation to teams
1 and 2; 3: allocation to teams 1, 2 and 3. States are the cumulative budget
allocation to date. Denote by p(n)ij the probability of failure of team n + 1
when it is allocated j − i resource units, i.e. when the cumulative allocation
moves from i at stage n to j at stage n + 1.
Let fn(s) be the minimal probability of failure from the start to stage n.
Assign f0(0) = 1. Then the forward recursion equation is
fn(s) = min{p(n−1)rs fn−1(r) : r ≤ s}.
Alternatively, let gn(s) be the minimal probability of failure from stage n to
the end, and assign g3(s) = 1 for any state s at stage 3. The backward
recursion equation is
gn(s) = min{p(n)st gn+1(t) : s ≤ t}.
[1 mark] for stages, [1 mark] for states, [2 marks] for suitable notation
for probabilities, [1 mark] for suitable notation for value function, [1 mark]
for boundaries of value function, [4 marks] for either forward or backward
recursion equation.
An alternative solution (which I'm not expecting to see) is to take logs and
treat the log probabilities as transition/terminal costs.
(b) It may be easiest to do this graphically. In the diagram below, the edge
annotations are values of p(n)ij and the vertex annotations are the values of
fn(s).
[Diagram: states s = 0, 1, 2 at stages 0–3. Edge labels p(n)ij : team 1 fails
with probability 0.3, 0.15, 0.1 when allocated 0, 1, 2 units; team 2 with
0.5, 0.3, 0.2; team 3 with 0.7, 0.4, 0.25. Vertex labels fn(s): f0(0) = 1;
f1 = 0.3, 0.15, 0.1; f2 = 0.15, 0.075, 0.045; f3 = 0.105, 0.0525, 0.03.]
The thick lines represent (forwards) optimal routes, and the thick dashed line
represents the globally optimal route. In other words, we should allocate 1
unit to team 1, 0 units to team 2, and 1 unit to team 3. This gives a failure
probability of 0.03.
[3 marks] for value function, 1 for each stage, [1 mark] for solution ex-
plained in terms of allocation, [1 mark] for final failure probability.
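As a check, the forward recursion can be run in a few lines of code; the failure probabilities below are read off the edge labels of the diagram.

```python
# Failure probability of each team as a function of units allocated (0, 1, 2),
# read off the edge labels of the diagram.
p = [
    [0.3, 0.15, 0.1],   # team 1
    [0.5, 0.3, 0.2],    # team 2
    [0.7, 0.4, 0.25],   # team 3
]
BUDGET = 2

# Forward recursion: f[s] = min over r <= s of p(team, s - r) * f[r].
# We also record the chosen allocation s - r for backtracking.
f = {0: 1.0}
choices = []
for team in p:
    new_f, choice = {}, {}
    for s in range(BUDGET + 1):
        new_f[s], choice[s] = min(
            (team[s - r] * f[r], s - r) for r in f if r <= s
        )
    f = new_f
    choices.append(choice)

# Pick the best final state and backtrack the allocation.
s = min(f, key=f.get)
value = f[s]
alloc = []
for choice in reversed(choices):
    alloc.append(choice[s])
    s -= choice[s]
alloc.reverse()

assert abs(value - 0.03) < 1e-12
assert alloc == [1, 0, 1]  # 1 unit to team 1, none to team 2, 1 to team 3
```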
Alternatively, using backward recursion, we have the following diagram. Here
the vertex annotations are the values of gn(s).
[Diagram: the same network and edge labels as above. Vertex labels gn(s):
g3 = 1, 1, 1; g2 = 0.25, 0.4, 0.7; g1 = 0.12, 0.2, 0.35; g0(0) = 0.03.]
The thick lines represent the (backwards) optimal routes, and the thick
dashed line again represents the globally optimal route. We obtain the same
solution as above.
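The backward recursion admits an equally short check, with the same probability table as above:

```python
# Failure probability of each team for allocations of 0, 1, 2 units.
p = [
    [0.3, 0.15, 0.1],   # team 1
    [0.5, 0.3, 0.2],    # team 2
    [0.7, 0.4, 0.25],   # team 3
]
BUDGET = 2

# Backward recursion: g[s] = min over t >= s of p(team, t - s) * g_next[t],
# starting from g3(s) = 1 for every state s.
g = {s: 1.0 for s in range(BUDGET + 1)}
for team in reversed(p):
    g = {
        s: min(team[t - s] * g[t] for t in range(s, BUDGET + 1))
        for s in range(BUDGET + 1)
    }

# g now holds g0; only state 0 is reachable at stage 0.
assert abs(g[0] - 0.03) < 1e-12
```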
Alternatively, one can use a tabular solution.
(c) As proposed, the stages are 0: time 0, 1: after one year, 2: after two years,
3: after three years.
States. In stage 0, we just have state F0. In later stages, we have S, F0, F1
and F2, where S means success and Fi means failure with budget i consumed
to date. We set terminal costs to 1 for Fi and 0 for S and transition costs to
zero.
Let action a ∈ {0, 1, 2} represent allocating a resource units at the current
stage, i.e. to team n+ 1. We insist that, if we are currently in state Fi, then
we only allow actions a such that i + a ≤ 2. If we are currently in state S,
we only allow action 0.
Let p(n)st(a) now denote the transition probability from state s to state t
with n stages to go(!), i.e. from stage 3 − n to stage 3 − n + 1, under action
a. Here, p(n)Fi,Fi+a(a) is the relevant failure probability and p(n)Fi,S(a) is
its complement, with all other transitions having zero probability.
We were asked to give transition matrices explicitly for the first two stages,
so here they are.
For stage 0, rows are indexed by F0 and columns by S, F0, F1, F2.

P(3)(0) = ( 0.7   0.3   0     0   )
P(3)(1) = ( 0.85  0     0.15  0   )
P(3)(2) = ( 0.9   0     0     0.1 )

For stage 1, rows are indexed by S, F0, F1, F2, and the same for columns.

P(2)(0) = | 1    0    0    0   |
          | 0.5  0.5  0    0   |
          | 0.5  0    0.5  0   |
          | 0.5  0    0    0.5 |

P(2)(1) = | 1    0    0    0   |
          | 0.7  0    0.3  0   |
          | 0.7  0    0    0.3 |
          | 0    0    0    0   |

P(2)(2) = | 1    0    0    0   |
          | 0.8  0    0    0.2 |
          | 0    0    0    0   |
          | 0    0    0    0   |
Note that one must handle exceeding the budget in some way but the exact
manner is not important. I have written substochastic matrices down with
the implication that the chain is killed in this case. One could equally in-
troduce a constraint violation state X, transition there and give it a high
terminal cost; or make some arbitrary transition but introduce a high transi-
tion cost; or simply allocate the highest acceptable budget and then interpret
appropriately later.
[2 marks] for suitable states, [1 mark] for suitable actions, [2 marks]
for stage 0 transition matrices, [2 marks] for stage 1 transition matrices
ignoring budget, [1 mark] for any sensible attempt at dealing with budget
(successful or not).
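A sketch of how such matrices can be generated mechanically, under the conventions chosen above (state order S, F0, F1, F2; budget violations kill the chain, giving substochastic rows; S kept absorbing in every matrix):

```python
# States in the order S, F0, F1, F2 (index 0 is success).
# fail[a] is the failure probability of the relevant team when allocated
# a units; for the stage-1 matrices (two stages to go) this is team 2.
def transition_matrices(fail, budget=2):
    """Build 4x4 substochastic matrices P(a); budget violations kill the chain."""
    matrices = {}
    for a in range(len(fail)):
        P = [[0.0] * 4 for _ in range(4)]
        P[0][0] = 1.0  # S is absorbing (only action 0 is ever taken from S)
        for i in range(budget + 1):          # current state F_i is row 1 + i
            if i + a <= budget:
                P[1 + i][1 + i + a] = fail[a]   # fail: move to F_{i+a}
                P[1 + i][0] = 1.0 - fail[a]     # succeed: move to S
            # otherwise the row stays zero: the chain is killed
        matrices[a] = P
    return matrices

def close(A, B):
    return all(abs(x - y) < 1e-12 for ra, rb in zip(A, B) for x, y in zip(ra, rb))

P2 = transition_matrices([0.5, 0.3, 0.2])  # team 2 with 0, 1, 2 units
assert close(P2[0], [[1, 0, 0, 0], [0.5, 0.5, 0, 0], [0.5, 0, 0.5, 0], [0.5, 0, 0, 0.5]])
assert close(P2[1], [[1, 0, 0, 0], [0.7, 0, 0.3, 0], [0.7, 0, 0, 0.3], [0, 0, 0, 0]])
assert close(P2[2], [[1, 0, 0, 0], [0.8, 0, 0, 0.2], [0, 0, 0, 0], [0, 0, 0, 0]])
```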
(d) The outcome of team 1 will affect the allocation to teams 2 and 3, and the
outcome of team 2 will affect the allocation to team 3. Since actions must be
taken sequentially, as outcomes are revealed, we cannot treat this the same
as the non-Markov version. [2 marks]
B3 We start by transforming the payoff matrix to remove negative entries. Add 3:
Player B
B1 B2 B3
Player A A1 2 6 3
A2 8 1 4
[2 marks] for offset.
The following linear programming problem determines B's strategy.
Maximise Z = Y1 + Y2 + Y3
subject to 2Y1 + 6Y2 + 3Y3 ≤ 1
8Y1 + Y2 + 4Y3 ≤ 1
Y1, Y2, Y3 ≥ 0.
[2 marks] for conversion to LPP.
The initial tableau is

       Y1   Y2   Y3   Y4   Y5   Sol
Y4      2    6    3    1    0     1
Y5      8    1    4    0    1     1
Z      −1   −1   −1    0    0     0
[2 marks] for initial tableau. Enter Y3, exit Y5. (This hint speeds up the algo-
rithm.)
       Y1     Y2    Y3   Y4    Y5     Sol
Y4     −4   21/4    0    1    −3/4    1/4
Y3      2    1/4    1    0     1/4    1/4
Z       1   −3/4    0    0     1/4    1/4
[2 marks] for updated tableau. Enter Y2, exit Y4.
        Y1     Y2   Y3     Y4     Y5    Sol
Y2   −16/21    1    0    4/21   −1/7   1/21
Y3    46/21    0    1   −1/21    2/7   5/21
Z      3/7     0    0     1/7    1/7    2/7
[2 marks] for correct final tableau.
The solution of the LPP is Y1 = 0, Y2 = 1/21, Y3 = 5/21, Z = 2/7. Player B's
strategy is y1 = Y1/Z = 0, y2 = Y2/Z = 1/6, y3 = Y3/Z = 5/6. The value of the
game is v = 1/Z − 3 = 1/2.
The solution of the dual LPP, read from the shadow prices in the final simplex
tableau, is X1 = 1/7, X2 = 1/7. Player A's strategy is hence x1 = X1/Z = 1/2,
x2 = X2/Z = 1/2.
[2 marks] for one player's strategy, [2 marks] for the other one, [1 mark] for
value.
Alternatively, one can ignore the hint, at the cost of having four tableaux. This is
acceptable but the third tableau attracts no marks.
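As a sanity check, both mixed strategies can be verified against the original (unshifted) payoff matrix, recovered by subtracting the offset 3:

```python
from fractions import Fraction as F

# Original payoff matrix to player A (the shifted matrix minus the offset 3).
A = [[-1, 3, 0],
     [5, -2, 1]]
x = [F(1, 2), F(1, 2)]          # player A's mixed strategy
y = [F(0), F(1, 6), F(5, 6)]    # player B's mixed strategy
v = F(1, 2)                     # claimed value of the game

# A's strategy guarantees at least v against every column of B...
col_payoffs = [sum(x[i] * A[i][j] for i in range(2)) for j in range(3)]
assert all(c >= v for c in col_payoffs)
# ...and B's strategy concedes at most v against every row of A.
row_payoffs = [sum(A[i][j] * y[j] for j in range(3)) for i in range(2)]
assert all(r <= v for r in row_payoffs)
# Together these bounds pin the value of the game at exactly v = 1/2.
```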