Date: 2021-06-15
BINF90002 Semester 1 2020 Final exam

Academic Integrity Declaration

By commencing and/or submitting this assessment I agree that I have read and understood the University’s policy on academic integrity. I also agree that:

1. Unless paragraph 2 applies, the work I submit will be original and solely my own work (cheating);
2. I will not seek or receive any assistance from any other person (collusion) except where the work is for a designated collaborative task, in which case the individual contributions will be indicated; and,
3. I will not use any sources without proper acknowledgment or referencing (plagiarism).
4. Where the work I submit is a computer program or code, I will ensure that:
   a. any code I have copied is clearly noted by identifying the source of that code at the start of the program or in a header file or, that comments inline identify the start and end of the copied code; and
   b. any modifications to code sourced from elsewhere will be commented upon to show the nature of the modification.

This exam opens at 9.00 AM Australian Eastern Standard Time (AEST) on Thursday 02/07/2020 in Canvas (lms.unimelb.edu.au). The exam must be completed by 3.30 PM AEST on Thursday 02/07/2020. This exam has 30 minutes of reading time, and 120 minutes of writing time. You have a 6 hour window in which to complete and submit the exam.

Number of pages: This paper has 7 pages, including this cover page

Authorised Materials: This is an open book exam. All material delivered during the teaching period and student notes are permitted.

Instructions to Students:
The total number of marks for the examination is 100. It accounts for 50% of your final result for the subject. You should attempt all questions.

Write your answers in a separate Word document and upload your document as a Word or PDF document as your submission in the Assignment for the Semester 1 exam.
You may include scanned drawings to illustrate your answer(s) if you wish but they must be appropriately embedded within the document that is your final submission.

Section 1. Short-answer questions (50 marks total)

1. Five possible variants have been summarised by a variant finding program. The locations (marked in red vertical lines) of the variants and the affected codon are shown on the diagram above. For each variant (1-5), use the VCF entry and codon table to comment on the type of mutation that may have occurred, and the effect the variant may have on the gene transcript or its protein product.
(5 marks)

2. In your work for the Pathology department in a major public cancer hospital, you have been asked to design a new whole genome sequencing test that will be applied to every cancer patient who visits the hospital.

a) How deep will you sequence the tumour and matched normal samples?
b) Would you check the data for sample swaps and, if so, how?
c) What types of mutations should the assay cover? For each mutation type, what signal in the sequencing reads would you expect the caller to use?
d) What other downstream analyses might you include?
(5 marks)

3. You have been tasked with developing two diagnostic tests to identify bacterial pathogens.

Test_1 must be able to detect the bacterial pathogen species B. pathogenesis in patient throat swab samples. The samples are expected to contain many bacterial species due to host microbiome contamination. B. pathogenesis has a very limited accessory genome compared to the other species which may be present in a sample. There is a reasonable evolutionary distance in terms of SNPs between B. pathogenesis and other bacteria.

Test_2 must be able to detect and differentiate between strains of a pathogenic species B. nauseous. Different strains of B. nauseous have high levels of sequence homology, but each strain has unique genetic regions compared to the other strains within a large accessory genome.

For each of the two tests above state:

a) which of the following metagenomics sequencing approaches would be the most appropriate: 16S amplicon, MLST, or WGS.
b) whether short-read or long-read sequencing would be most suited to that test, and explain why.
(4 marks)

4. A study was performed to look for gene expression changes in cultured cells when treated with a new drug, melbuximab. Bulk RNA-seq was performed on samples from control (C) and drug treated (T) cell cultures of an immortalized cell line to detect differentially expressed genes. The experiment was performed using cell cultures prepared on three different days. The library sizes for each sample after NGS sequencing are shown below.

A standard RNA-seq protocol was used, consisting of 1) sample preparation, 2) mRNA capture & size selection, 3) reverse transcription and PCR amplification, 4) short read Illumina sequencing, and 5) differential expression (DE) analysis. For each of the 5 steps, describe one possible source of bias or error that could impact the final results of the DE analysis.
(5 marks)

5. The figure above shows a view of the UCSC genome browser showing some of the Tabula Muris single-cell RNA-seq expression data. The data has been collapsed into cell types and the coverage plots for a few selected cell types are shown. The region of the mouse genome shown is the location of the Interleukin 4 gene (IL4) and the data ranges for the coverage plots are autoscaled for each cell type.

a) From examination of the features in the gene models, what can you deduce about the IL4 gene, its transcripts and possible protein products from this image?
b) From examination of the coverage plots, what can you deduce about expression of IL4 transcripts in these cell types?
(6 marks)

6. Cell clustering is an important technique for interpreting scRNA-seq data. Discuss why clustering is necessary, the visual representation which is produced, and how the output can be informative. Mention the name of a commonly used scRNA-seq clustering tool, and the name of a common normalisation method.
(5 marks)

7. 3D structure prediction is an important technique for characterising the functionality that a novel protein may possess. Discuss when 3D structure prediction is the best option for functional annotation over DNA or amino acid sequence homology. Additionally, mention 2 considerations about the 3D structure which indicate whether the predicted protein may perform the same function as the most similar annotated protein.
(5 marks)

8. What are post-translational modifications?

Briefly describe how the potential presence of post-translational modifications contributes to the complexity in assigning peptide sequences from fragmentation data in mass spectrometry experiments.
(5 marks)

9. In clinical genomics, deciding whether a variant is likely to be pathogenic involves multiple stages. Discuss 5 considerations when deciding if a variant is likely to be pathogenic.
(5 marks)

10. Benchmarking is important to understand the strengths and weaknesses of bioinformatics tools, and to compare their performance in an unbiased manner. You are writing a review paper concerned with comparing software tools which perform cell clustering from scRNA-seq data. Discuss how you would benchmark a group of tools, commenting on:

a) Why benchmarking is necessary for scRNA-seq analysis.
b) The kinds of datasets which are appropriate for use in benchmarking.
c) Which metrics you would choose to assess tool performance.
(5 marks)

Section 2. Long-answer question (50 marks total)

Part A: (20 marks)
Bowel cancer is one of the leading causes of cancer-related mortality among young people.
Recent studies have shown that some genotoxic bacteria found in the intestinal microbiome produce a toxin that is associated with risk of developing bowel cancer. Genotoxins are toxins that can damage the DNA of a cell.

Of particular interest is a specific strain of Escherichia coli that produces the toxin colibactin, encoded by the clb gene. The clb gene is contained within the 50kb pks genomic island, which is part of the bacterium’s accessory genome. The colibactin-producing strain of E. coli is therefore called pks+ E. coli. The pks+ E. coli reference genome is available from public databases.

For this study, 1000 tumour specimens are available from young people with bowel cancer. The first aim of the study is to determine the frequency of pks+ E. coli presence in the tumour environment. DNA was extracted from each of the tumour samples and subjected to whole genome sequencing for a metagenomics analysis using Illumina reads. You have been provided with the FASTQ read sets for each tumour sample.

Describe a bioinformatics workflow to investigate the presence of pks+ E. coli in the tumour samples using these FASTQ reads. Your answer should describe, in detail, important steps for processing the data, tools used, and expected outputs of the analysis.

Part B: (20 marks)
Your analysis determined that approximately 10% of the tumours were associated with the presence of pks+ E. coli. However, it is likely that not all these tumours are caused by pks+ E. coli. The genotoxic activity of pks+ E. coli is known to result in a specific pattern of somatic mutations in the tumour. To further investigate which of these 100 tumours are likely to be a result of colibactin toxin you must design a further experiment. For this aim you are provided with Illumina FASTQ sequencing reads from both the tumours and matched normal (non-cancerous) tissue from the same patient.

1. Describe a bioinformatics workflow to identify somatic mutations in the tumours using this data. Your answer should detail important steps for processing the data, tools used, and expected outputs.
2. Suggest how you could compare the patterns of mutations between the samples.

Part C: (10 marks)
From the analysis, you confirmed that some of the tumour samples had somatic DNA mutations characteristic of colibactin toxin (e.g. T>A base pair substitution), and that these patterns were more prevalent in tumours associated with pks+ E. coli.

Given this collection of data, explore how genomics tools (e.g. metagenomics and microbiome analysis) can be used to identify people with a greater risk of developing bowel cancer.

Section 2 is an opportunity for you to demonstrate your overall understanding of the subject material. You should try to draw upon as many different parts of the course material as you can. Creative (but scientifically motivated) uses of the course material are welcome.

-- END OF EXAMINATION --