CSYS5040: Criticality in Dynamical Systems Assignment 3
Part 1:
Due date: Sunday, October 20th, midnight: End of Week 12.
Submit via: Video submission in Turnitin
Weighting: 20% of final mark
Length: 5 minutes minimum, 10 minutes maximum (deductions for over/under time)
You will present a brief (5-10min) summary of the data set you have
chosen to analyse, why you think it may have “complex” characteristics,
and a proposal for how you intend to approach the analysis of the data.
If you have preliminary results of your analysis, you can present them
during this session.
Part 2:
Due date: Sunday, November 3rd, midnight: the end of Week 13.
Submit via: Turnitin Assignment 3 section on our Canvas site
Weighting: 40% of final mark
Length: 2,000 words max. (hard limit – there are penalties for exceeding)
Format:
There is a Mathematica document in the style of a report posted on the Canvas Assignment 3 page; you should use it as a guideline for the style of the final report. The final report is due to be submitted to Turnitin on the 3rd of November, and it will be the implementation of what you proposed in Part 1 of this assignment.

The emphasis of this assignment is on using the tools we’ve covered in class to explore the properties of a complex data set: collection, analysis, interpretation, and conclusions. You can include mathematical/theoretical analysis if you like, but it is not necessary for achieving high marks. To achieve top marks you need to have chosen an appropriate data set, applied a variety of different tools to explore the data set, drawn appropriate conclusions, and (in the Summary section) explained the significance of your findings in a simple manner understandable to a work colleague. You will get higher marks for more readable and engaging content.

Your data needs to be a time series in order for it to be a dynamical system: if it isn’t, you risk a failing grade for this assignment. Except for those doing Project 9 below, everyone needs to choose a dataset or theoretical model to use for their project. In the introduction you need to explain what the dataset is, where you got it from, why you think the data is likely to have “complex” properties, and why other people might find this data interesting; i.e. you need to place your data in a context that explains its relevance and interest. If you are doing Project 9, you instead need to explain the purpose of the workflow analysis you are describing: why the workflow is relevant, who it is relevant for, and what types of data it is relevant for. All reports are to be submitted in Mathematica notebook form, with all code executable directly from the notebook.
New from 2021: You will nominate one of two types of report to write, a qualitative report or a quantitative report; the difference between the two approaches is described below. At the top of your report you have to state which of these two approaches you are taking, so there needs to be a declaration like:
This is a qualitative report of data XYZ, or
This is a quantitative report of data ABC.
Qualitative report: Your task is to find a large number of ways (6-8) in which you would ask a data analyst to explore this time series, explaining in detail what each method does, what algorithm should be used (pseudocode or plain English is fine), and what the interpretation of the method should be in the context of the data you are using. Where possible, illustrate the methods with simple pieces of executable code (i.e. only when a simple piece of code is possible or helpful) that make clear what would be expected of an analyst who needs to implement them. By definition, Project 9 is a qualitative project only. You are still expected to be able to write correct equations where necessary, even if you don’t use MMA to try to solve or simulate them.
Quantitative report: Your task is to find a small number of ways (2-4) in which to analyse your data, but to do so in a computationally thorough way, going into as much detail and extending each method as far as you can to extract as much information as possible from the timeseries. You then need to draw coherent, insightful conclusions from your analysis and place them in the context of the application you sourced the data from. Project 9 cannot be a quantitative project.

Both approaches require you to demonstrate some computational ability and some ability to describe what you’re doing and why. The key difference between the two is a narrow focus on detailed computational methods versus a broader, more discussion-oriented approach to the analysis. In the qualitative report I expect to see relatively simple computations from our class work illustrating some key ideas, but also a much broader exploration of methods and how they fit together in a coherent piece of analysis, where the highest marks are awarded according to how far you extend beyond the material in class. In the quantitative report I expect fewer methods to be used and therefore fewer explanations of techniques, but the depth of the computational analysis needs to be significant, with the highest grades going to computational methods that extend beyond the relatively simple code from class. See the rubric below for the details.

NOTE: The numbers of methods given for you to explore (2-4 for quantitative and 6-8 for qualitative) are only indicative. As with any assessment, the real measure is not the number but how good your work is: you can use more methods for either type of report and possibly score higher for having extended the work, but you might not do as well if the analysis is less thorough than it would have been with a smaller number. My suggestion is that you find a sweet spot where you think you will be comfortable, and work to maximise the quality of the analysis you then carry out.

Potential sources of data:
1. Use HCTSA’s comp-engine data repository (https://www.comp-engine.org, browse at https://www.comp-engine.org/#!browse) to get data to explore.
2. “Synthetic” data: Simulate a time series using the Mathematica algorithms I’ve shown you in class; there are also many ways to do this that I haven’t covered, including chaotic dynamical simulations, discrete maps, and bifurcations. (A minimal sketch follows this list.)
3. Mathematica’s library of data is vast:
https://reference.wolfram.com/language/ref/FinancialData.html
https://reference.wolfram.com/language/ref/CountryData.html
https://reference.wolfram.com/language/ref/WikipediaData.html
https://reference.wolfram.com/language/ref/WeatherData.html
https://reference.wolfram.com/language/ref/EarthquakeData.html
See the references in these articles for other data and tools.
4. You can source your own data, but you have to believe that it will have some complex-like behaviour, and you will need to justify this belief in your presentation and final report.
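A minimal Wolfram Language sketch of option 2 above; the logistic map is an illustrative choice of chaotic system, and the parameter values and series length are not requirements:

(* Generate a synthetic chaotic time series from the logistic map. *)
(* r = 4.0 puts the map in its chaotic regime; x0 = 0.2 is arbitrary. *)
logistic[r_][x_] := r x (1 - x);
series = NestList[logistic[4.0], 0.2, 500];
ListLinePlot[series, AxesLabel -> {"t", "x(t)"}]
(* The same exploration applies to real data from option 3, e.g. *)
(* DateListPlot[FinancialData["GE", {{2019, 1, 1}, {2020, 1, 1}}]]; requires internet access *)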
Potential projects that are predefined (these are only suggestions with broad guidelines; it’s up to you to decide how to carry out each piece of analysis in detail):

1. Chaos prediction using artificial neural networks: Convert the simple echo-state neural network discussed in class into a Mathematica notebook and test it on chaotic time series and on timeseries with a tipping point. The code is available in Matlab here: https://mantas.info/code/simple_esn/ and there is a brief discussion of the approach at the end of the Week 5 MMA notebook.

2. Climate tipping points analysis: Using the data available in the references of this article, analyse the timeseries for non-linearities and tipping points: “Past abrupt changes, tipping points and cascading impacts in the Earth system” https://www.nature.com/articles/s41561-021-00790-5 There are a lot of suggestions in the article for what to look for in the data, as well as lots of different time series available in the references.

3. Climate tipping points analysis: As for the above project, but using the data available in this article: “The anatomy of past abrupt warmings recorded in Greenland ice” https://www.nature.com/articles/s41467-021-22241-w

4. Neural dynamics analysis: In this theoretical project you will study the nonlinear dynamics, phase portraits, and timeseries of bifurcations of a single neuron. MMA code is available here: https://demonstrations.wolfram.com/PhasePlaneDynamicsOfFitzHughNagumoModelOfNeuronalExcitation/ and here: https://demonstrations.wolfram.com/TwoDimensionalSodiumPlusPotassiumNeuronModel/ You will also need to do further research on the use and interpretation of these models. (A minimal simulation sketch follows this item.)
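To give a feel for Project 4, here is a minimal NDSolve sketch of the FitzHugh-Nagumo model; the parameter values (input current I = 0.5, a = 0.7, b = 0.8, eps = 0.08) are common illustrative choices, not values prescribed by the demonstrations linked above:

(* FitzHugh-Nagumo neuron: fast voltage variable v, slow recovery variable w. *)
fhn = NDSolve[{
    v'[t] == v[t] - v[t]^3/3 - w[t] + 0.5,
    w'[t] == 0.08 (v[t] + 0.7 - 0.8 w[t]),
    v[0] == -1, w[0] == 1}, {v, w}, {t, 0, 200}];
Plot[Evaluate[v[t] /. fhn], {t, 0, 200}, AxesLabel -> {"t", "v"}]             (* spiking timeseries *)
ParametricPlot[Evaluate[{v[t], w[t]} /. fhn], {t, 0, 200}, AspectRatio -> 1]  (* phase portrait *)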
5. Structural breaks in timeseries: One way to measure non-linearities is to test whether there are “structural breaks”; these are related to “tipping points”, and they are points of discontinuity in a timeseries. This project requires you to apply these methods to a timeseries of your choice; you can see a complete worked example in MMA for financial timeseries here: https://community.wolfram.com/groups/-/m/t/1749226 (A toy break-detection sketch follows this item.)
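As a toy illustration of Project 5 (the linked worked example is far more thorough), assume the simplest kind of break, a single level shift, and scan candidate break points for the one minimising the within-segment sum of squares:

(* Synthetic series with one level shift at t = 100. *)
SeedRandom[1];
data = Join[RandomVariate[NormalDistribution[0, 1], 100],
    RandomVariate[NormalDistribution[3, 1], 100]];
(* Residual sum of squares if the series is split at position k. *)
rss[k_] := Total[(data[[;; k]] - Mean[data[[;; k]]])^2] +
    Total[(data[[k + 1 ;;]] - Mean[data[[k + 1 ;;]]])^2];
candidates = Range[10, Length[data] - 10];            (* keep both segments non-trivial *)
breakPoint = candidates[[First[Ordering[rss /@ candidates, 1]]]]
ListLinePlot[data, GridLines -> {{breakPoint}, None}] (* break shown as a vertical gridline *)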
6. Classifying stocks by their “fingerprints”: In this project you will replicate and then extend the analysis that has already been carried out in this post on using MMA for this purpose: “In the stock market, the returns (% change in price of the stock between days) of any particular stock create a fingerprint unique to that stock. For example, the returns of Tesla and Johnson & Johnson are very different, since they are shaped by variable factors such as industry, size, and volatility. In this project, I used machine learning to analyze the ‘return fingerprint’ of stocks in the S&P 500. Would the computer tell me that Facebook and Twitter are similar, if I gave it no context? To isolate the return fingerprint from the rest of the variable factors, I trained the computer on pure Date List Plots. I could analyze the impact of the variable factors, how people can use what they know about particular stocks to get a better overview of the market, and the accuracy and precision of computer results. I aimed to correctly group stocks by fingerprint (within the time frame of a year), and analyze the correlation within the subgroups.” https://community.wolfram.com/groups/-/m/t/1732654
7. The visibility graph for timeseries analysis: This project is based on the ideas in the paper “From time series to complex networks: The visibility graph” https://www.pnas.org/content/pnas/105/13/4972.full.pdf You can apply these techniques to any timeseries data you like. (A small construction sketch follows this item.)
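A small Wolfram Language sketch of the natural visibility criterion from the paper: points i and j are linked if every intermediate point lies strictly below the straight line joining them. This brute-force version is only meant for short series:

(* Natural visibility: can points i and j "see" each other over the points between them? *)
visibleQ[y_, i_, j_] := AllTrue[Range[i + 1, j - 1],
    y[[#]] < y[[j]] + (y[[i]] - y[[j]]) (j - #)/(j - i) &];
visibilityGraph[y_List] := Graph[Range[Length[y]],
    UndirectedEdge @@@ Select[Subsets[Range[Length[y]], {2}],
        visibleQ[y, #[[1]], #[[2]]] &]];
SeedRandom[2];
g = visibilityGraph[RandomReal[1, 30]]
VertexDegree[g]   (* the degree distribution is the usual first statistic to examine *)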
analysis of the S&P500: Using either (or both) the visibility graph
or the RQA network analysis analyse the S&P500 before and after a
structural break in the timeseries. The structural break may be based on
known market events (like the Global Financial Crisis) or detected
using methods described in Project 5 above. Are there differences in the
networks before and after a significant event, and what does this tell
us about the underlying properties of the market? 9. Timeseries
anomalies, breaks, and outliers detection: This project this project is
very general and based on the diagram of relations you can see here:
https://tinyurl.com/u6trcjw2 This project is to “unpack” these measures
of timeseries anomalies and breaks into a workflow for the analysis of
timeseries that an analyst could follow in order to test a timeseries
for interesting properties related to nonlinearities, breaks etc.
Starting from what you consider to be the most important tests to apply
(from class, the list in the diagram at the url above, your own
research) you will need to briefly and accurately explain each method
and then use illustrative (but not complex) pieces of code where
possible to help explain the methods and their implementation (i.e. not
every method needs to have code, but every method has to have a good
description). You will also find the workflow analysis in Week 7 useful
as well as the diagrams showing the relationship between different
aspects of the work we’ve covered at the end of Week 5. 10. Using OECD
housing market data: Take the housing OECD housing data available here:
https://data.oecd.org/price/housing-prices.htm and look for interesting
behaviour I the time series data. I have previously analysed this data
for a Germany conference and you can recreate the results I got, and
then extend this work using some other methods that aren’t in the
summary report I wrote. 11. Theoretical data: Simulate a time “complex”
series and analyse it using the tools covered in class. Any and all of
the methods that are relevant from the 10projects listed above can also
be applied in this project, providing they are suitable to the task of
exploring/analysing your data. You will need to explain what fields of
research make use of these theoretical simulations. These are only
These are only guidelines: you can use your own data and develop your own path for analysing that data, and you can also take data from one project listed above and apply the techniques used in another. For example, you could apply the break-point analysis methods to climate change data or to simulated data. The key aspects are nonlinear data and the application of methods to detect, analyse, and explore those nonlinearities, using methods covered in class or discovered through your own research of the literature.
To let you know what will be covered later so you can plan your project, here are the methods we’ve already looked at and those we will cover later (starred entries will be covered later):
1. *Autocorrelation/variance/kurtosis/skewness as measures for predicting tipping points
2. *Hurst exponents
3. Lyapunov exponents
4. RQA analysis (see the measures described in the book referenced for this course)
5. *RQA network analysis
6. *Visibility graph
7. *Bimodal distributions
8. Fractal dimensions
9. *Moving averages
Methods we haven’t covered but that you might consider in your project (there are many others for you to look into if you choose to):
10. Structural breaks
11. Detrended Fluctuation Analysis (DFA)
12. Entropy (covered in CSYS5030)
13. Mutual information at critical points (covered in CSYS5030)
14. 0-1 test for chaos (as mentioned in the workflow diagram of Week 7)
(A minimal sketch of items 1 and 9 follows this list.)
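As a minimal Wolfram Language sketch of items 1 and 9 above, applied to a toy random-walk series standing in for your own data (the series, lag range, and window size are illustrative choices):

(* Summary moments, autocorrelation, and a moving average for a toy series. *)
SeedRandom[3];
data = Accumulate[RandomReal[{-1, 1}, 500]];
{Variance[data], Skewness[data], Kurtosis[data]}
acf = CorrelationFunction[data, {20}];          (* autocorrelation estimates up to lag 20 *)
ListPlot[acf, Filling -> Axis, AxesLabel -> {"lag", "ACF"}]
ListLinePlot[{data, MovingAverage[data, 10]}]   (* raw series vs 10-point moving average *)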
I suggest you set up an initial plan, based on the workflow analysis discussed in Week 7, in order to make clear what you intend to do and how you intend to go about doing it. Of course, the questions that are asked in that workflow analysis are the sorts of questions you need to communicate answers to in your report: Who, Why, What, and How.
Rubrics

In the rubric, the blue text refers to the qualitative report assessment criteria and the red text refers to the quantitative report assessment criteria. Everything in black is common to both reports.
Part 1: Oral Report
Pass:
1. Hasn’t quite kept to time, but within reasonable bounds.
2. Reasonable connection from one idea to the next in the flow of the talk.
3. Describes a potential data set, and it’s a time series, but there is no discussion of content.
4. Some attempt made to justify the data as being of interest.
5. Has a moderate understanding of the methods that will be implemented.
6. The justification for the data being complex is present but unclear.

Credit, as for a Pass but including:
1. Time is within bounds.
2. Good use of slides/presentation (slides not cluttered, well structured, communicates ideas to supplement the talk).
3. Explains the background to the data: where it’s come from, why it’s interesting, and who might be interested in this sort of data/analysis.
4. Has some justification for the complexity/criticality of the data chosen.
5. Has a preliminary look at the data with some plots (1st law of data analysis: plot your data); both qualitative and quantitative reports should have preliminary plots that you are able to discuss.

Distinction, as for Credit and Pass but including:
1. Is able to interpret the initial plots of the data clearly and in such a way that preliminary insights are demonstrated.
2. Has a clear analytical approach to how the next stage of the workflow is to proceed, and makes clear the reasons for the approach adopted.
3. Has read the literature on this dataset or a similar dataset and uses this literature to justify their expectations regarding the complexity of the data.

High Distinction, as for Distinction, Credit and Pass but including:
1. Initial analysis of the plots includes insights that are then used to guide how the next phase of the work is to be carried out.
2. A detailed, specific and rigorous plan for the next steps is presented, along with sound justifications based on material covered in class or from the literature.
3. Has a sound grasp of the literature around the dataset being used; cites the relevant literature and recent results, and explains why these results are significant within both complexity/non-linearity theory and the applied field the data came from (or, if it’s synthetic data, the field(s) that would be interested in the analysis).
4. Preliminary results are presented and discussed, along with a discussion of progress to date: what worked, what didn’t, lessons learned.
Part 2: Final Written Report
Pass:
1. Has kept approximately to the word limit.
2. Has an Introduction to the data, a main body of the article possibly split into multiple sections, and a conclusion.
3. The English is readable.
4. The background material describes the data and where it came from, and what field it’s relevant to.
5. Some methods are applied or discussed in the context of the time series data, but with little comprehension of what they mean or how to interpret them.
6. The conclusion is a list of disconnected and unrelated results from the analysis with no clear insights apparent.

Credit, as for a Pass but including or improving in the following ways:
1. Has kept below the maximum word limit and is above 1,000 words.
2. Well structured sections and subsections, appropriately titled, with a suitable flow from one idea to the next.
3. The English is suitable for a professional report.
4. Background to the data is presented early, familiarising the reader with the data, including some preliminary plots. Both qualitative and quantitative reports should have preliminary plots.
5. The analysis of the data looks for criticality/non-linearity, uses some appropriate measures, and reports the results in a sensible and readable fashion. Alternatively, the discussion of the methods that are covered is factually correct, with some pseudocode or executable code.
6. The interpretation of the analysis is clear, functional, and shows some understanding of the multifaceted aspects of the data.

Distinction, as for Credit and Pass but including:
1. The introduction is a clear, accurate summary of the data, the field(s) that are relevant to the analysis, and why it’s relevant for complex/non-linear systems theory in general.
2. Background to the data includes suitable plots of the data that communicate interesting aspects that suggest the non-linear nature of the data.
3. The data analysis sections demonstrate a clear understanding of the methods used and how to interpret the results from those methods. The tools used explore the multi-faceted aspects of the data in such a way that the non-linearities are made explicit to the reader. The presentation of the results in the form of tables, plots, etc. is clear and appropriate. Alternatively, the discussion of the methods that are covered is factually correct, clear, and understandable, with appropriate pseudocode or executable code where appropriate, and with some clear insights into the relationship between different methods and how they can be combined to make the multifaceted aspects of the data clearer.
4. The summary and conclusions draw together the results or methods in a coherent fashion.

High Distinction, as for Distinction, Credit and Pass but including:
1. The introduction includes a referenced review of the field from which the data comes (or, if it’s synthetic data, the field(s) that would be interested in the analysis). This includes the relevance for non-linear/complex systems theory. The references are not to be included in the word count.
2. Background to the data includes suitable plots of the data that communicate interesting aspects that suggest the non-linear nature of the data, together with a comparative analysis against linear data (or some other alternative) that makes clear precisely why the reader should interpret these plots as non-linear or critical, clearly highlighting the points of similarity and difference. The data analysis code extends beyond what has been used in class. Alternatively, background to the data includes suitable plots of the data that communicate interesting aspects that suggest the non-linear nature of the data; multiple methods are described in a clear and insightful way, highlighting their relevance for the specifics of the data being analysed, and the simple illustrative pieces of code are well chosen to emphasise key aspects of algorithms in relation to the data. At least one method is introduced that is not in the course syllabus, and it is clearly and concisely described. Comparisons with linear methods that might not work on your data are provided.
3. The summary and conclusions show further insight into the data set and suggest other areas for further analysis, justified by a discussion of the results of the analysis.