CSYS5040: Criticality in Dynamical Systems Assignment 3
Part 1:
Due date: Sunday, October 20th, midnight: End of Week 12.
Submit via: Video submission in Turnitin
Weighting: 20% of final mark
Length: 5 minutes minimum, 10 minutes maximum (deductions for over/under time)
You will present a brief (5-10min) summary of the data set you have
chosen to analyse, why you think it may have “complex” characteristics,
and a proposal for how you intend to approach the analysis of the data.
If you have preliminary results of your analysis, you can present them
during this session.
Part 2:
Due date: Sunday, November 3rd, midnight: the end of Week 13.
Submit via: Turnitin Assignment 3 section on our Canvas site
Weighting: 40% of final mark
Length: 2,000 words max. (hard limit – there are penalties for exceeding)
Format:
There is a Mathematica document in the style of a report posted on the Canvas Assignment 3 page; you should use it as a guideline for the style of the final report. The final report is due to be submitted to Turnitin on the 3rd of November, and it will be the implementation of what you proposed in Part 1 of this assignment.

The emphasis of this assignment is on using the tools we’ve covered in class to explore the properties of a complex data set: collection, analysis, interpretation, and conclusions. You can include mathematical/theoretical analysis if you like, but it is not necessary for achieving high marks. To achieve top marks you need to have chosen an appropriate data set, applied a variety of different tools to explore the data set, drawn appropriate conclusions, and (in the Summary section) explained the significance of your findings in a simple manner understandable to a work colleague. You will get higher marks for more readable and engaging content.

Your data needs to be a time series in order for it to be a dynamical system: if it isn’t, you risk a failing grade for this assignment. Except for those doing Project 9 below, everyone needs to choose a dataset or theoretical model to use for their project. In the introduction you need to explain what the dataset is, where you got it from, why you think the data is likely to have “complex” properties, and why other people might find this data interesting; i.e. you need to place your data in a context that explains its relevance and interest. If you are doing Project 9, you instead need to explain the purpose of the workflow analysis you are describing: why the workflow is relevant, who it is relevant for, and what types of data it is relevant for. All reports are to be submitted in Mathematica notebook form, with all code executable directly from the notebook.
New from 2021: You will nominate one of two types of report to write, a qualitative report or a quantitative report; the difference between the two approaches is described below. At the top of your report you have to state which of these two approaches you are taking, so there needs to be a declaration like:
This is a qualitative report of data XYZ, or
This is a quantitative report of data ABC.
Qualitative report: Your task is to find a large number of ways (6-8) in which you would ask a data analyst to explore this time series, explaining in detail what each method does, what algorithm should be used (pseudocode or plain English is fine), and what the interpretation of the method should be in the context of the data you are using. Where possible, illustrate the methods with simple pieces of executable code (i.e. only when a simple piece of code is possible or helpful) that make clear what would be expected of an analyst who needs to implement them. By definition, Project 9 is a qualitative project only. You are still expected to be able to write correct equations where necessary, even if you don’t use MMA to try to solve or simulate them.
Quantitative report: Your task is to find a small number of ways (2-4) in which to analyse your data, but to do so in a computationally thorough way, going into as much detail and extending each method as far as you can to extract as much information as possible from the timeseries. You then need to draw coherent, insightful conclusions from your analysis and place them in the context of the application you sourced the data from. Project 9 cannot be a quantitative project.

Both approaches require you to demonstrate some computational ability and some ability to describe what you’re doing and why. The key difference between the two is a narrow focus on detailed computational methods versus a broader, more discussion-oriented approach to the analysis. In the qualitative report I expect to see relatively simple computations from our class work illustrating some key ideas, but also a much broader exploration of methods and how they fit together in a coherent piece of analysis, where the highest marks are awarded according to how far you extend beyond the material in class. In the quantitative report I expect fewer methods to be used and therefore fewer explanations of techniques, but the depth of the computational analysis needs to be significant, with the highest grades going to computational methods that extend beyond the relatively simple code from class. See the rubric below for the details.

NOTE: The numbers of methods given for you to explore (2-4 for quantitative and 6-8 for qualitative) are only indicative. As with any assessment, the real measure is not the number but how good your work is: you can use more methods for either type of report and possibly score higher for having extended the work, but you might not do as well if the analysis is less thorough than it would have been with a smaller number. My suggestion is that you find a sweet spot where you think you will be comfortable, and work to maximise the quality of the analysis you then carry out.

Potential sources of data:
1. Use HCTSA’s comp-engine data repository (https://www.comp-engine.org, browse at https://www.comp-engine.org/#!browse) to get data to explore.
2. “Synthetic” data: Simulate a time series using the Mathematica algorithms I’ve shown you in class; there are also many ways to do this that I haven’t covered, including chaotic dynamical simulations, discrete maps, and bifurcations. (A minimal sketch follows this list.)
3. Mathematica’s library of data is vast:
https://reference.wolfram.com/language/ref/FinancialData.html
https://reference.wolfram.com/language/ref/CountryData.html
https://reference.wolfram.com/language/ref/WikipediaData.html
https://reference.wolfram.com/language/ref/WeatherData.html
https://reference.wolfram.com/language/ref/EarthquakeData.html
See the references in these articles for other data and tools.
4. You can source your own data, but you have to believe that it will have some complex-like behaviour, and you will need to justify this belief in your presentation and final report.
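A minimal Wolfram Language sketch of option 2 above; the logistic map is an illustrative choice of chaotic system, and the parameter values and series length are not requirements:

(* Generate a synthetic chaotic time series from the logistic map. *)
(* r = 4.0 puts the map in its chaotic regime; x0 = 0.2 is arbitrary. *)
logistic[r_][x_] := r x (1 - x);
series = NestList[logistic[4.0], 0.2, 500];
ListLinePlot[series, AxesLabel -> {"t", "x(t)"}]
(* The same exploration applies to real data from option 3, e.g. *)
(* DateListPlot[FinancialData["GE", {{2019, 1, 1}, {2020, 1, 1}}]]; requires internet access *)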
Potential projects that are predefined (these are only suggestions with broad guidelines; it’s up to you to decide how to carry out each piece of analysis in detail):

1. Chaos prediction using artificial neural networks: Convert the simple echo-state neural network discussed in class into a Mathematica notebook and test it on chaotic time series and on timeseries with a tipping point. The code is available in Matlab here: https://mantas.info/code/simple_esn/ and there is a brief discussion of the approach at the end of the Week 5 MMA notebook.

2. Climate tipping points analysis: Using the data available in the references of this article, analyse the timeseries for non-linearities and tipping points: “Past abrupt changes, tipping points and cascading impacts in the Earth system” https://www.nature.com/articles/s41561-021-00790-5 There are a lot of suggestions in the article for what to look for in the data, as well as lots of different time series available in the references.

3. Climate tipping points analysis: As for the above project, but using the data available in this article: “The anatomy of past abrupt warmings recorded in Greenland ice” https://www.nature.com/articles/s41467-021-22241-w

4. Neural dynamics analysis: In this theoretical project you will study the nonlinear dynamics, phase portraits, and timeseries of bifurcations of a single neuron. MMA code is available here: https://demonstrations.wolfram.com/PhasePlaneDynamicsOfFitzHughNagumoModelOfNeuronalExcitation/ and here: https://demonstrations.wolfram.com/TwoDimensionalSodiumPlusPotassiumNeuronModel/ You will also need to do further research on the use and interpretation of these models. (A minimal simulation sketch follows this item.)
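To give a feel for Project 4, here is a minimal NDSolve sketch of the FitzHugh-Nagumo model; the parameter values (input current I = 0.5, a = 0.7, b = 0.8, eps = 0.08) are common illustrative choices, not values prescribed by the demonstrations linked above:

(* FitzHugh-Nagumo neuron: fast voltage variable v, slow recovery variable w. *)
fhn = NDSolve[{
    v'[t] == v[t] - v[t]^3/3 - w[t] + 0.5,
    w'[t] == 0.08 (v[t] + 0.7 - 0.8 w[t]),
    v[0] == -1, w[0] == 1}, {v, w}, {t, 0, 200}];
Plot[Evaluate[v[t] /. fhn], {t, 0, 200}, AxesLabel -> {"t", "v"}]             (* spiking timeseries *)
ParametricPlot[Evaluate[{v[t], w[t]} /. fhn], {t, 0, 200}, AspectRatio -> 1]  (* phase portrait *)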
5. Structural breaks in timeseries: One way to measure non-linearities is to test whether there are “structural breaks”; these are related to “tipping points”, and they are points of discontinuity in a timeseries. This project requires you to apply these methods to a timeseries of your choice; you can see a complete worked example in MMA for financial timeseries here: https://community.wolfram.com/groups/-/m/t/1749226 (A toy break-detection sketch follows this item.)
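As a toy illustration of Project 5 (the linked worked example is far more thorough), assume the simplest kind of break, a single level shift, and scan candidate break points for the one minimising the within-segment sum of squares:

(* Synthetic series with one level shift at t = 100. *)
SeedRandom[1];
data = Join[RandomVariate[NormalDistribution[0, 1], 100],
    RandomVariate[NormalDistribution[3, 1], 100]];
(* Residual sum of squares if the series is split at position k. *)
rss[k_] := Total[(data[[;; k]] - Mean[data[[;; k]]])^2] +
    Total[(data[[k + 1 ;;]] - Mean[data[[k + 1 ;;]]])^2];
candidates = Range[10, Length[data] - 10];            (* keep both segments non-trivial *)
breakPoint = candidates[[First[Ordering[rss /@ candidates, 1]]]]
ListLinePlot[data, GridLines -> {{breakPoint}, None}] (* break shown as a vertical gridline *)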
6. Classifying stocks by their “fingerprints”: In this project you will replicate and then extend the analysis that has already been carried out in this post on using MMA for this purpose: “In the stock market, the returns (% change in price of the stock between days) of any particular stock create a fingerprint unique to that stock. For example, the returns of Tesla and Johnson & Johnson are very different, since they are shaped by variable factors such as industry, size, and volatility. In this project, I used machine learning to analyze the ‘return fingerprint’ of stocks in the S&P 500. Would the computer tell me that Facebook and Twitter are similar, if I gave it no context? To isolate the return fingerprint from the rest of the variable factors, I trained the computer on pure Date List Plots. I could analyze the impact of the variable factors, how people can use what they know about particular stocks to get a better overview of the market, and the accuracy and precision of computer results. I aimed to correctly group stocks by fingerprint (within the time frame of a year), and analyze the correlation within the subgroups.” https://community.wolfram.com/groups/-/m/t/1732654
7. The visibility graph for timeseries analysis: This project is based on the ideas in the paper “From time series to complex networks: The visibility graph” https://www.pnas.org/content/pnas/105/13/4972.full.pdf You can apply these techniques to any timeseries data you like. (A small construction sketch follows this item.)
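A small Wolfram Language sketch of the natural visibility criterion from the paper: points i and j are linked if every intermediate point lies strictly below the straight line joining them. This brute-force version is only meant for short series:

(* Natural visibility: can points i and j "see" each other over the points between them? *)
visibleQ[y_, i_, j_] := AllTrue[Range[i + 1, j - 1],
    y[[#]] < y[[j]] + (y[[i]] - y[[j]]) (j - #)/(j - i) &];
visibilityGraph[y_List] := Graph[Range[Length[y]],
    UndirectedEdge @@@ Select[Subsets[Range[Length[y]], {2}],
        visibleQ[y, #[[1]], #[[2]]] &]];
SeedRandom[2];
g = visibilityGraph[RandomReal[1, 30]]
VertexDegree[g]   (* the degree distribution is the usual first statistic to examine *)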
analysis of the S&P500: Using either (or both) the visibility graph
or the RQA network analysis analyse the S&P500 before and after a
structural break in the timeseries. The structural break may be based on
known market events (like the Global Financial Crisis) or detected
using methods described in Project 5 above. Are there differences in the
networks before and after a significant event, and what does this tell
us about the underlying properties of the market? 9. Timeseries
anomalies, breaks, and outliers detection: This project this project is
very general and based on the diagram of relations you can see here:
https://tinyurl.com/u6trcjw2 This project is to “unpack” these measures
of timeseries anomalies and breaks into a workflow for the analysis of
timeseries that an analyst could follow in order to test a timeseries
for interesting properties related to nonlinearities, breaks etc.
Starting from what you consider to be the most important tests to apply
(from class, the list in the diagram at the url above, your own
research) you will need to briefly and accurately explain each method
and then use illustrative (but not complex) pieces of code where
possible to help explain the methods and their implementation (i.e. not
every method needs to have code, but every method has to have a good
description). You will also find the workflow analysis in Week 7 useful
as well as the diagrams showing the relationship between different
aspects of the work we’ve covered at the end of Week 5. 10. Using OECD
housing market data: Take the housing OECD housing data available here:
https://data.oecd.org/price/housing-prices.htm and look for interesting
behaviour I the time series data. I have previously analysed this data
for a Germany conference and you can recreate the results I got, and
then extend this work using some other methods that aren’t in the
summary report I wrote. 11. Theoretical data: Simulate a time “complex”
series and analyse it using the tools covered in class. Any and all of
the methods that are relevant from the 10projects listed above can also
be applied in this project, providing they are suitable to the task of
exploring/analysing your data. You will need to explain what fields of
research make use of these theoretical simulations. These are only
These are only guidelines: you can use your own data and develop your own path for analysing that data, and you can also take data from one project listed above and apply the techniques used in another. For example, you could apply the break-point analysis methods to climate change data or to simulated data. The key aspects are nonlinear data and the application of methods to detect, analyse, and explore those nonlinearities, using methods covered in class or discovered through your own research of the literature.
To let you know what will be covered later so you can plan your project, here are the methods we’ve already looked at and those we will cover later (starred entries will be covered later):
1. *Autocorrelation/variance/kurtosis/skewness as measures for predicting tipping points
2. *Hurst exponents
3. Lyapunov exponents
4. RQA analysis (see the measures described in the book referenced for this course)
5. *RQA network analysis
6. *Visibility graph
7. *Bimodal distributions
8. Fractal dimensions
9. *Moving averages
Methods we haven’t covered but that you might consider in your project (there are many others for you to look into if you choose to):
10. Structural breaks
11. Detrended Fluctuation Analysis (DFA)
12. Entropy (covered in CSYS5030)
13. Mutual information at critical points (covered in CSYS5030)
14. 0-1 test for chaos (as mentioned in the workflow diagram of Week 7)
(A minimal sketch of items 1 and 9 follows this list.)
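As a minimal Wolfram Language sketch of items 1 and 9 above, applied to a toy random-walk series standing in for your own data (the series, lag range, and window size are illustrative choices):

(* Summary moments, autocorrelation, and a moving average for a toy series. *)
SeedRandom[3];
data = Accumulate[RandomReal[{-1, 1}, 500]];
{Variance[data], Skewness[data], Kurtosis[data]}
acf = CorrelationFunction[data, {20}];          (* autocorrelation estimates up to lag 20 *)
ListPlot[acf, Filling -> Axis, AxesLabel -> {"lag", "ACF"}]
ListLinePlot[{data, MovingAverage[data, 10]}]   (* raw series vs 10-point moving average *)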
I suggest you set up an initial plan, based on the workflow analysis discussed in Week 7, in order to make clear what you intend to do and how you intend to go about doing it. Of course, the questions that are asked in that workflow analysis are the sorts of questions you need to communicate answers to in your report: Who, Why, What, and How.
Rubrics

In the rubric, the blue text refers to the qualitative report assessment criteria and the red text refers to the quantitative report assessment criteria. Everything in black is common to both reports.
Part 1: Oral Report
Pass:
1. Hasn’t quite kept to time, but within reasonable bounds.
2. Reasonable connection from one idea to the next in the flow of the talk.
3. Describes a potential data set, and it’s a time series, but there is no discussion of content.
4. Some attempt made to justify the data as being of interest.
5. Has a moderate understanding of the methods that will be implemented.
6. The justification for the data being complex is present but unclear.

Credit, as for a Pass but including:
1. Time is within bounds.
2. Good use of slides/presentation (slides not cluttered, well structured, communicates ideas to supplement the talk).
3. Explains the background to the data: where it’s come from, why it’s interesting, and who might be interested in this sort of data/analysis.
4. Has some justification for the complexity/criticality of the data chosen.
5. Has a preliminary look at the data with some plots (1st law of data analysis: plot your data); both qualitative and quantitative reports should have preliminary plots that you are able to discuss.

Distinction, as for Credit and Pass but including:
1. Is able to interpret the initial plots of the data clearly and in such a way that preliminary insights are demonstrated.
2. Has a clear analytical approach to how the next stage of the workflow is to proceed, and makes clear the reasons for the approach adopted.
3. Has read the literature on this dataset or a similar dataset and uses this literature to justify their expectations regarding the complexity of the data.

High Distinction, as for Distinction, Credit and Pass but including:
1. Initial analysis of the plots includes insights that are then used to guide how the next phase of the work is to be carried out.
2. A detailed, specific and rigorous plan for the next steps is presented, along with sound justifications based on material covered in class or from the literature.
3. Has a sound grasp of the literature around the dataset being used; cites the relevant literature and recent results, and explains why these results are significant within both complexity/non-linearity theory and the applied field the data came from (or, if it’s synthetic data, the field(s) that would be interested in the analysis).
4. Preliminary results are presented and discussed, along with a discussion of progress to date: what worked, what didn’t, lessons learned.
Part 2: Final Written Report
Pass:
1. Has kept approximately to the word limit.
2. Has an Introduction to the data, a main body of the article possibly split into multiple sections, and a conclusion.
3. The English is readable.
4. The background material describes the data and where it came from, and what field it’s relevant to.
5. Some methods are applied or discussed in the context of the time series data, but with little comprehension of what they mean or how to interpret them.
6. The conclusion is a list of disconnected and unrelated results from the analysis with no clear insights apparent.

Credit, as for a Pass but including or improving in the following ways:
1. Has kept below the maximum word limit and is above 1,000 words.
2. Well structured sections and subsections, appropriately titled, with a suitable flow from one idea to the next.
3. The English is suitable for a professional report.
4. Background to the data is presented early, familiarising the reader with the data, including some preliminary plots. Both qualitative and quantitative reports should have preliminary plots.
5. The analysis of the data looks for criticality/non-linearity, uses some appropriate measures, and reports the results in a sensible and readable fashion. Alternatively, the discussion of the methods that are covered is factually correct, with some pseudocode or executable code.
6. The interpretation of the analysis is clear, functional, and shows some understanding of the multifaceted aspects of the data.

Distinction, as for Credit and Pass but including:
1. The introduction is a clear, accurate summary of the data, the field(s) that are relevant to the analysis, and why it’s relevant for complex/non-linear systems theory in general.
2. Background to the data includes suitable plots of the data that communicate interesting aspects that suggest the non-linear nature of the data.
3. The data analysis sections demonstrate a clear understanding of the methods used and how to interpret the results from those methods. The tools used explore the multi-faceted aspects of the data in such a way that the non-linearities are made explicit to the reader. The presentation of the results in the form of tables, plots, etc. is clear and appropriate. Alternatively, the discussion of the methods that are covered is factually correct, clear, and understandable, with appropriate pseudocode or executable code where appropriate, and with some clear insights into the relationship between different methods and how they can be combined to make the multifaceted aspects of the data clearer.
4. The summary and conclusions draw together the results or methods in a coherent fashion.

High Distinction, as for Distinction, Credit and Pass but including:
1. The introduction includes a referenced review of the field from which the data comes (or, if it’s synthetic data, the field(s) that would be interested in the analysis). This includes the relevance for non-linear/complex systems theory. The references are not to be included in the word count.
2. Background to the data includes suitable plots of the data that communicate interesting aspects that suggest the non-linear nature of the data, together with a comparative analysis against linear data (or some other alternative) that makes clear precisely why the reader should interpret these plots as non-linear or critical, clearly highlighting the points of similarity and difference. The data analysis code extends beyond what has been used in class. Alternatively, background to the data includes suitable plots of the data that communicate interesting aspects that suggest the non-linear nature of the data; multiple methods are described in a clear and insightful way, highlighting their relevance for the specifics of the data being analysed, and the simple illustrative pieces of code are well chosen to emphasise key aspects of algorithms in relation to the data. At least one method is introduced that is not in the course syllabus, and it is clearly and concisely described. Comparisons with linear methods that might not work on your data are provided.
3. The summary and conclusions show further insight into the data set and suggest other areas for further analysis, justified by a discussion of the results of the analysis.