Date: 2022-02-20
IIDS67682 Machine Learning and Advanced Data Methods
Assignment 1: Unsupervised Learning

Deadline for submission: Tuesday March 1st, 12 noon.

For this assignment you should write a short report (maximum 500 words, not including figures and
tables) presenting the analytical methods you have used and your results. You should also
submit a Jupyter notebook containing all the code required to produce the results and figures
presented in your report. A complete submission comprises both the report and the Jupyter
notebook, uploaded via the course Blackboard site.

In the Assignment1.ipynb file you will find code to read in data from the paper "Cytokine
Responses to Rhinovirus and Development of Asthma, Allergic Sensitization, and Respiratory
Infections during Childhood". This dataset contains the logged fold change in cytokine
concentrations after stimulation of PBMCs by rhinovirus. The PBMCs were extracted from blood
samples taken from children in the MAAS birth cohort. The cluster assignments from the paper,
obtained using a Gaussian mixture model, are also included (GMM1-6).
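
As a reminder of what a logged fold change is, here is a minimal sketch. The concentrations are made-up numbers, and the base-2 logarithm and the variable names are assumptions; the brief does not state which base or normalisation the paper used.

```python
import numpy as np

# Made-up concentrations of one cytokine in three samples (arbitrary units).
unstimulated = np.array([10.0, 20.0, 5.0])
stimulated = np.array([40.0, 20.0, 2.5])

# Fold change is the ratio of stimulated to unstimulated concentration.
# Taking log2 is an assumption here; other bases only rescale the values.
log_fold_change = np.log2(stimulated / unstimulated)

print(log_fold_change)  # values are 2.0, 0.0 and -1.0
```

A positive value means the concentration rose after stimulation, zero means no change, and a negative value means it fell.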

Tasks

1. Use PCA to visualise the GMM clusters in 2D and to display the cytokines that contribute most
to the first two principal components. How much variance in the data is explained by the first
two principal components?

2. Use the PCA visualisation of the data to suggest cytokines that are likely to be useful indicators
of cluster membership. Are your results consistent with the patterns observed in Figure 1 of the
paper?

3. Use the k-means algorithm to cluster the children into 6 clusters based on their cytokine
profiles. Compare your new clustering to the original cluster assignments from the paper.
Which of your clusters (if any) agree well with the original clustering?

4. Use the distortion elbow method (given in the final cell of the clustering lab solution notebook)
to select the optimal number of clusters. How are the clustering results affected?

5. Use hierarchical clustering to cluster the cytokines based on their levels across children.
Compare the clustering you obtain with the cytokine categories (IFN, Inflam, TH2-chem and
Reg) given in Figure 1 of the paper. (Hint: to cluster features, it may be useful to transpose your
DataFrame using data_transposed = data.T.)
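
The tasks above could be approached roughly as in the following sketch. It is illustrative only: it runs on synthetic data with three groups rather than the cytokine data and its six GMM clusters, and the names `data`, `reference` and the `cytokine_*` column labels are placeholders, not the ones used in Assignment1.ipynb. It also assumes that scikit-learn's `inertia_` is an acceptable stand-in for the distortion used in the lab solution notebook, which may define it differently. Plotting is left out.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assignment data: 120 "children" x 8 "cytokines",
# drawn from 3 well-separated groups. All names and values are made up.
rng = np.random.default_rng(0)
n_per_group, n_features = 40, 8
centres = rng.normal(scale=4.0, size=(3, n_features))
values = np.vstack(
    [c + rng.normal(size=(n_per_group, n_features)) for c in centres]
)
data = pd.DataFrame(values, columns=[f"cytokine_{i}" for i in range(n_features)])
reference = pd.Series(np.repeat(["A", "B", "C"], n_per_group), name="reference")

# Tasks 1-2: PCA on standardised features (standardising is a design choice).
scaled = StandardScaler().fit_transform(data)
pca = PCA(n_components=2)
scores = pca.fit_transform(scaled)  # 2D coordinates, one row per child
explained = pca.explained_variance_ratio_.sum()  # variance explained by PC1+PC2
loadings = pd.DataFrame(
    pca.components_.T, index=data.columns, columns=["PC1", "PC2"]
)
# Features with the largest absolute loading contribute most to PC1.
top_pc1 = loadings["PC1"].abs().sort_values(ascending=False).head(3)

# Task 3: k-means, then compare against the reference labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
table = pd.crosstab(reference, pd.Series(kmeans.labels_, name="kmeans"))
ari = adjusted_rand_score(reference, kmeans.labels_)

# Task 4: distortion (here: inertia, the within-cluster sum of squares)
# for a range of k; the "elbow" is where the curve stops dropping sharply.
distortions = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled).inertia_
    for k in range(1, 9)
}

# Task 5: hierarchical clustering of the features, using the transposed frame.
data_transposed = data.T
Z = linkage(data_transposed.values, method="average", metric="correlation")
feature_clusters = fcluster(Z, t=2, criterion="maxclust")

print(f"Variance explained by PC1+PC2: {explained:.2f}")
print(table)
print(f"Adjusted Rand index: {ari:.2f}")
```

The contingency table shows which k-means clusters line up with which reference clusters, and the adjusted Rand index summarises the agreement in a single number. The linkage method and the correlation distance in the last step are one reasonable choice among several, not a prescribed one.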

Marking Scheme
Report (80% of marks): Your report should briefly describe your methods and the results that you
obtain for the five tasks above. It should not exceed 500 words excluding figures and tables.
Figures should be numbered and have captions. Figures should be referred to in the main text.
Code (20% of marks): Your Jupyter notebook should produce all the figures and results in your
report. Markdown should be used to explain what the code is doing in each step.



