MATH 368
Dr. Andreas Alpers
Project 1
Submit by: 30.04.2021
Breaking permutation ciphers using
Markov Chain Monte Carlo
1. Introduction
Supposedly already used by the ancient Spartans, permutation ciphers have been around since
many centuries. Breaking of such ciphers is commonly considered to be rather difficult. In this
project we want to implement an MCMC method to break permutation ciphers. In particular, we
want to decipher a specific text that you should download from the course webpage.
We will assume that the plain and ciphertext contain only the 27 characters A,B,C, . . . ,Z, ,
with ‘ ’ denoting the space character. The set alph := {A,B,C, . . . ,Z, } is called alphabet. There
is no linebreak, punctuation, or anything else.
In permutation ciphers all the 27 characters in the plain text remain the same but their positions
are rearranged in a different order. The plain text is first split into blocks of a prescribed size ` (the
so-called key length). Characters in each block are permuted according to a same permutation σ
(the so-called key).
For example, for ` = 5 and key
σ = (σ(1), σ(2), σ(3), σ(4), σ(5)) = (2, 4, 5, 1, 3)
one obtains1
plaintext = VOLLE︸ ︷︷ ︸ YBALL︸ ︷︷ ︸
ciphertext =
︷ ︸︸ ︷
LVEOL
︷ ︸︸ ︷
LYLBA
Decryption therefore amounts to finding the inverse σ−1 of the (unknown) permutation σ.
2. Swap and slide moves
Using Metropolis-Hastings the aim is to generate a Markov Chain whose states are the permu-
tations σ of ` elements. Two different types of proposals/moves should be considered in the
proposal step: swap moves and slide moves.
A swap move swaps two randomly selected entries of the current key σ. For instance, if for the
above σ the selected entries are 2 and 5 then the new key will be
σ′ = (5, 4,2, 1, 3) .
For a slide move one first needs to draw (uniformly at random) three numbers b ∈ [1, ` − 2],
k1 ∈ [1, `− b+ 1], and k2 ∈ [0, `− b]. Then, from σ the entries at positions k1, k1 + 1, . . . , k1 + b− 1
are deleted and again inserted (in the same order) after position k2. For instance, for the above σ
with b = 2, k1 = 3, k2 = 0, one obtains
σ = (2, 4,5,1, 3) → (2, 4, 3) → σ′ = (5,1, 2, 4, 3) .
1In this notation for σ, the character in each block is moved from position i to σ(i).
Page 1 of 3
3. Steps of the project
We can break this project down into several steps.
(S0) Download the file ciphertext.txt from the canvas course page https://liverpool.
instructure.com/courses/19760 (see link in the top announcement). The file contains
a single line of 756 characters, all in alph. This is the ciphertext to be decrypted. Assume
that the plain text was written in English and encrypted with a permutation cipher with key
length ` = 9.
(S1) Download a large English text, such as Tolstoy’s War and Peace from http://www.
gutenberg.org/files/2600/2600-0.txt, and compute the following:
(a) freq(c), the relative frequencies of the characters c ∈ alph in English language (i.e., set
freq(c) to be the number of occurrences of c in your English text and normalize this
such that
∑
c∈alph freq(c) = 1).
(b) B(x, y), a matrix that records the number of character transitions from x ∈ alph to
y ∈ alph in your English text (in other words, B(x, y) holds the value how often x
is followed by y in the English text). By dividing each row B(x, ·) by ∑y∈alphB(x, y)
obtain the matrix TransB.
(S2) For numerical stability we want to work with the logarithm version of the Metropolis-
Hastings algorithm discussed in class. Therefore, compute logfreq(c) := log(freq(c)) and
logTransB(x, y) := log(TransB(x, y)) for all c, x, y ∈ alph (with log denoting the natural
logarithm).
(S3) To any string f of n characters from alph we can associate a plausibility
P (f) := logfreq(f(1)) +
n−1∑
i=1
logTransB(f(i), f(i+ 1)),
with f(i) denoting the ith character in f. Further, let f−j denote the string that results
from deleting the first j blocks (i.e., j` characters) from the ciphertext.
Give a matlab implementation of the Metropolis-Hastings algorithm that takes f−j as input
(j drawn uniformly at random from the interval [0, 30]) and, starting with the identity
permutation, performs a prescribed number MaxIt of iterations. Only slide moves should
be proposed in the first MaxIt/2 iterations, whereas only swap moves are to be used in the
final MaxIt/2 iterations.
(Note that the acceptance probability for accepting the move that changes the current string f
to the new string f ∗ is given by exp(min{0, P (f ∗)− P (f)}) as we work with logarithms.)
(S4) Run the Metropolis-Hastings algorithm 20 times (each run consisting of MaxIt = 6, 000
Metropolis-Hastings iterations, i.e., new proposals of f ∗). Return for the most plausible f
(the f with largest P (f) value obtained over all iterations and runs of the algorithm) the
corresponding permutation, and decrypt with it the whole ciphertext. If your code contains
no bug you will obtain the plain text (or, at least, a good approximation to it).
Page 2 of 3
4. Hints for debugging
(i) Before analyzing War and Peace you should convert all the letters to upper case. Delete
the first 842 and the last 364 lines of the text file (they just contain the table of contents
and technical comments inserted by Project Gutenberg; the remaining text should start with
the words ‘CHAPTER I’ and end with the last sentence of the book). Moreover delete all
characters that are not in alph.
(ii) To improve convergence, it may be helpful to replace −∞ values in logTransB by a small
value, say, −12.
(iii) You might want to test your implementation by decrypting a text that you encrypted yourself.
(iv) It might be helpful to write (and test separately) a function that takes a permutation and a
string as input and returns the permuted string.
(v) If your code (including the text analysis) takes more than a few minutes on a standard laptop,
then there is probably somewhere a bug.
5. What and how to submit
Each group is expected to submit a single PDF containing answers/code for Q1-Q4 stated
below (submit via Canvas as group submission). Each member of the group is expected to take
part in the peer moderation on Canvas using Buddycheck upon submission (failure to do so,
will result in a subtraction of 5 points from your individual grade after moderation).
Q1: [15 points] What is the best decryption of the ciphertext obtained by your MCMC
method? State also the corresponding P (f) value and the σ−1 that you use for the
decryption.
(Can you guess the original author of the lines of the ciphertext?)
Q2: [20 points] What are the five least frequent characters in War and Peace?
(State them in increasing order.)
Q3: [20 points] What are the five most frequent character transitions in War and Peace?
(State them in decreasing order.)
Q4: [45 points] Submit your complete, commented matlab code for decrypting the text (in-
cluding your MCMC implementation and code for analyzing English language texts).
Make sure that the part containing the implementation of the swap and slide moves is
well documented.
Academic Integrity
The work submitted must be academically sound, without elements of plagiarism or
other features of poor practice. More resources on Academic Integrity, and the conse-
quences of Academic Misconduct, can be found here, and the examiner expects that all
students complete the KnowHow Academic Integrity module (to be found on Canvas,
under KnowHow: Study for Success) before submitting any assignment.
Advice and responsibilities for students working in groups are given in Paragraph 5 of the
Code of Practice on Assessments Appendix G. The examiner expects that all students
familiarise themselves with this section of the document before commencing group work.
Page 3 of 3
学霸联盟