MXB361
Module 1: Complex Systems and Networks
Assessment Part A: Complex Systems and Computation
This assignment counts for 60% of the MXB361 final score and focusses on the
material covered in weeks 1-6 of Module 1. It is due at midnight on the Tuesday of the week
following the last week of teaching of Semester 1. This is to be completed individually. The
answers will be run through plagiarism checking software (please don’t even think about it!).
Your submissions should be in the form of a report answering the questions below. You must
provide a code sample for each question where code writing is involved; and images as
requested. Give all answers to at least 2 decimal points.
Part I (Complex Systems Quantitative Analysis – 40 marks in total)
You have just landed a dream job as the head of strategy at an evil but old-fashioned,
non-quantitative foreign exchange trading hedge fund, having out-muscled, out-witted and out-
computed your rivals. You have been given a remit by the boss of the firm, Mr Fatman Rich,
to revolutionise their inept investment strategies. This you are determined to accomplish, in
order to get a massive bonus so you can buy yourself a huge jacuzzi and fill it with chocolate
and gold, or whatever other material goods you desire.
The fund invests only in the top 22 most traded “tickers”, which includes 19 national
currencies, as well as bitcoin (BTC), gold (XAU) and silver (XAG). Because you took
MXB361, your first step is to download the firm’s stock market data with a view to devising a
new, computationally sound investment strategy. The data set (‘data’ matrix) has the past 7110
days (about 30 years) of ticker prices for the 22 tickers – these are the prices daily of each
currency against the US dollar. The ticker prices, corresponding to the codes for each currency
are in the ‘tickers’ matrix and the dates are in the ‘times’ matrix.
1. Load the data into MATLAB. Look only at the last 504 days of trading prices.
Divide each ticker’s price by its initial opening price 504 days ago so that they all
start at 1. Plot these corrected prices against time (for the past two years) for all
tickers. Plot, in a 22 x 22 plot array, all the corrected prices against one another.
Provide the code you used and the respective images in an image format of your
choice. Briefly describe 5 things you see. (5 marks).
2. Using your corrected prices over the past two years of data, compute the corrected
daily returns (i.e. the % daily change in the value) of each ticker. Using this,
compute the correlation matrix of daily returns. Check in MATLAB whether the
matrix is symmetric and provide evidence whether it is or not. Visualise the
correlations matrix appropriately. What is the determinant of this matrix? In 10
words or less, give a geometric interpretation of the determinant of a matrix (5
marks).
3. The Marcenko-Pastur distribution describes asymptotic behaviour of the
eigenvalues of random matrices. Plot this distribution as a family of curves for
different parameter values, N and T, respectively, where N is the number of series
and T is their length. Give expressions for the two points along the x-axis at which
the MP distribution curves departs from the horizontal axis (i.e. where the MP
distribution is > 0), as functions of N and T. What is the significance of any of the
eigenvalues of a given non-random matrix derived from time series data falling
between these two values? What about those falling outside the range? (5 marks).
4. Study the network derived from the correlation matrix you just computed. Is it a
directed or undirected matrix? Plot a histogram of its eigenvalues and comment on
it (100 words). Plot the distribution of the node degrees and comment on it (100
words). What is the mean correlation between any two randomly chosen ticker price
returns in the past year? What is the distribution of the correlations between any
two randomly chosen ticker price returns in that time? What is the standard
deviation? Comment on the above in 250 words or less (5 marks).
5. Your evil employer’s investment strategy focusses around choosing currencies that
its analysts believe are geographic regions (Asia, Europe, etc) that are “on the rise”
and avoiding those that are “falling” relative to the USD. Naturally, having taken
MXB361, you are sceptical of such an approach absent evidence in its favour. You
decide to use community detection methods to see if there are any distinct
communities in the ticker returns over the last year and if so, if they overlap with
the GICS sectors. Your first step is to write your own community detection code.
Choose an algorithm you’ve come across (such as Potts, Louvain, etc.) or devise
your own and write a piece of code in MATLAB that takes as input an adjacency
matrix (for example, a correlation matrix) and maximises the modularity, outputting
the communities within the network and a value for the estimated maximum
modularity. Provide evidence that the method works on a small network you have
devised (5 marks).
6. In order to feed a noise-cleaned network into your community detection code, you
need to eliminate some eigenvectors from the correlations matrix you computed.
Based on you answer to Q5 above, set the eigenvalues that you think are
indistinguishable from noise to 0 and reconstitute the matrix using only the non-
noise eigenvalues/vectors. Study the degree distribution of this new
matrix/network. How, if at all, is it different from your answer in Q5? What
proportion of the eigenvalues were eliminated in this way? What does this
proportion tell you about the level of noise in the initial correlation matrix? How
would this all change if the values of T and N changes? In particular, for what
combination(s) of T and N is the correlation matrix indistinguishable from a random
matrix, and justify the reason for this claim using the expressions you reported in
your answer from Q3 (5 marks).
7. Apply your community detection code to the new, noise-cleaned matrix in order to
find modules in your data. Are there any and if so, how many? What is the
modularity maximum your code found? Plot on a world map bubbles/dots
corresponding to each currency placed over the geographic coordinates of each
country’s capital city (for bitcoin, gold and silver, just place the dots somewhere
“empty” to the side, e.g. over the Pacific). Provide the image and code. Does the
membership of these modules overlap with the geographic regions? If not, how big
is the difference? Do their “locations” make any sense, with respect to their
economic activity (you may need to do a bit of research on that latter aspect)? Where
do bitcoin, gold and silver localise to (obviously, they don’t belong to any country),
module membership-wise? Any comments on these findings? Finally, what is the
implication of these findings for your firm’s current investment strategy in Q6? (5
marks).
8. Based on the above, devise a new, computationally driven investment strategy for
your firm. Explain the outlines of your strategy to your CEO in a one-page
document that includes one image, along with an estimate of the % return your
strategy would make yearly. You may wish to discuss the effect of over-fitting
and/or keep some data for model fitting and some for validation. (5 marks).
Part II (Computational Complexity and OtherAreas of Module I – 20 marks in total)
9. “We are living in a period in human history in which new data is generated at such
a rate and mis-analysed so badly that most of us can’t make sense of the world
around us. This is similar to the 1500s when, following the invention of the printing
press but before literacy was widespread, there followed perhaps the bloodiest
period in human history”. Do you agree or disagree? Why or why not? Discuss in
500 words or so (10 marks).
10. The permanent of a matrix is defined exactly like the determinant but instead of
alternating signs between the terms, it’s all +’s, like this:
Write code to compute the permanent for a given n x n real-valued matrix. What
is the computational complexity (in time) of your algorithm? Can you find a way
to do any better? It’s thought that the permanent is computationally intractable –
i.e. exponential time (in n) is required for its computation, a problem related to
“P?=NP” in ways that we don’t yet understand. Why would the determinant be
polynomial time computable (it is, otherwise we couldn’t solve systems of linear
equations or do pretty much anything useful with a computer), but the permanent,
so similar in its definition, be intractable to compute? Discuss in 500 words or so
(10 marks).
学霸联盟