Date: 2021-07-14

Elements of Bioinformatics

BINF90002 Semester 1 2021 Final exam

Section 1. Short-answer questions 

Question 1.

 For each of the following example mutations, select the most appropriate statement from the options below:

Mutation 1: Single nucleotide mutation in the anticodon loop of a mitochondrial tRNA.

 Mutation 2: DNA insertion in the upstream region of a gene encoding a collagen protein.

Mutation 3: Non-conservative amino acid mutation in a conserved domain of a transcription factor.

A. The mutation is likely to impact the gene transcript level and the cellular consequence is likely to affect the transcript level or function of only a single gene. 

B. The mutation is likely to impact the gene transcript level and the cellular consequence is likely to affect the transcript level or function of multiple transcripts or gene products. 

C. The mutation is likely to impact the function of the gene product and the cellular consequence is likely to affect the transcript level or functional product of only a single gene.

 D. The mutation is likely to impact the function of the gene product and the cellular consequence is likely to affect the transcript level or function of multiple transcripts or gene products.

You may write your answers in this simple format, e.g.: “Mutation 4 answer is E”

Question 2.

The image below is a view in IGV of a region of the human genome loaded with 3 sequencing data tracks and a gene annotation track.

• For each of the 3 data tracks (A, B and C), state whether it is WGS, WES or RNA-seq data.

 • For each datatype, describe a feature that is apparent in the image that distinguishes it from each of the other datatypes.


Question 3.

Read the methods section below and write out the bioinformatics analysis workflow as a bullet point list in the correct order. For each step, include the following details where available: workflow step, sequencing technology, tool name, file type for data inputs and outputs.

Total RNA was extracted using the Qiagen RNeasy Plant Mini kit. RNA integrity numbers of the extracted RNA, measured using an Agilent 2100 Bioanalyzer, were between 8.6 and 10. 400 ng of total RNA from each sample was used for RNA-seq library preparation with the TruSeq Stranded Total RNA with Ribo-Zero Plant kit. 125-base paired-end reads were generated on an Illumina HiSeq 2500. Quality was assessed using FastQC, and reads were trimmed and adaptors removed using cutadapt. Reads were mapped to the Oryza sativa ssp. japonica cv. Nipponbare reference genome, using the MSU v7 annotation, with STAR in 2-pass mode, which was also used to count reads per gene. Lowly expressed genes were filtered out, and DESeq2 was used for normalisation and differential expression analysis of genes, with a DE threshold of log2 >2.0 and FDR <0.05.
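The final thresholding step in the methods text above can be illustrated with a short sketch. This is a minimal Python illustration only, not the authors' code: the record layout, the field names `log2_fold_change` and `padj`, the gene identifiers, and the reading of "log2 >2.0" as an absolute log2 fold change are all assumptions made for the example.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential expression results table (hypothetical layout)."""
    gene_id: str
    log2_fold_change: float
    padj: float  # p-value after FDR adjustment


def is_differentially_expressed(result: GeneResult,
                                lfc_cutoff: float = 2.0,
                                fdr_cutoff: float = 0.05) -> bool:
    # Assumption: the threshold applies to the absolute log2 fold change,
    # so strongly up- and down-regulated genes both pass.
    return abs(result.log2_fold_change) > lfc_cutoff and result.padj < fdr_cutoff


def filter_de_genes(results: List[GeneResult]) -> List[str]:
    """Return the IDs of genes passing both thresholds, in input order."""
    return [r.gene_id for r in results if is_differentially_expressed(r)]


# Toy values, invented purely to exercise the filter.
toy_results = [
    GeneResult("gene_a", 3.1, 0.001),   # passes both thresholds
    GeneResult("gene_b", -2.5, 0.020),  # passes (down-regulated)
    GeneResult("gene_c", 2.8, 0.200),   # fails the FDR threshold
    GeneResult("gene_d", 1.2, 0.001),   # fails the fold-change threshold
]
de_genes = filter_de_genes(toy_results)
```

Both cutoffs use strict inequalities, matching the ">" and "<" in the methods text.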

Question 4. 

Interpret the assembly graph below.


Repeats can cause genome assembly problems. The figure above shows an assembly graph (left) and some theoretical genome arrangements (right).

• Which genome arrangement (A-D) could have resulted in this assembly graph? 

• Explain how either paired-end short reads, or long reads could be used to untangle this assembly graph.

Question 5. 

Bulk RNA-seq and single-cell RNA-seq are often used to measure RNA expression levels as a proxy for the protein products of coding genes. 

State 3 reasons why a researcher would measure RNA levels when they are actually interested in protein levels.

Question 6.

 The mRNA expression levels of genes do not always correlate with the expression levels of their active protein products in cells and tissues. 

Name two cellular processes that could affect the correlation between cellular expression levels of mRNA and its active protein product, and describe how they could affect this correlation.

Question 7.


You have been given a draft assembly for a prokaryote genome, along with some summary information for the assembly, shown in the table above.

You wish to improve the draft assembly by performing more sequencing. 

• State which sequencing approach – short or long read – will improve the assembly most.

 • Explain why you have chosen this approach.

Question 8. 

For each of the RNA sequencing cases below, describe a different method of assigning reads to genes for read counting. 

Assume a well-annotated reference genome is available. 

State which of the two approaches described is more computationally intensive. Assume the genome sizes of the two species are comparable.

A. RNA-seq readset from tissues of a model organism.

B. RNA-seq readset from tissues from a newly discovered eukaryotic species.

Question 9. 

Your task is to predict the function of an open reading frame (ORF) from a recently discovered worm. Its closest relative is the model organism C. elegans, but the genomes of these two worms are very different.

• A BLAST search against C. elegans finds that the gene svh-2 has the best homology to your ORF:
  o 10% DNA sequence identity.
  o 30% AA sequence identity.

• You use I-TASSER to predict the tertiary structure of the protein product of the ORF, and the generated model has an RMSD of 1.31 to svh-2.

 • The figure below shows the span of your ORF compared to svh-2, with protein domains indicated.


A. Is the ORF likely to have the same function as svh-2? 

B. Do you believe that your ORF would have the same cellular localisation as svh-2?

 Explain your answers by referencing the data provided.

Question 10. 

Single-cell RNA-sequencing (scRNA-seq) data and its subsequent analysis share some similarities with bulk RNA-seq analysis, but also differ in a number of ways.

Name and discuss 3 similarities and 3 differences between bulk RNA-seq and scRNA-seq. (List a total of 6 similarities/differences.)

Section 2. Long-answer question

NB. Some facts in the scenario described have been simplified or altered for the purposes of examination.

Gene duplication resulting in increased copy number (CN) can be associated with adaptation to environmental change. Approximately 10,000 years ago the human amylase gene (AMY) locus underwent a gene duplication event that resulted in several copies of the amylase 1 gene (AMY1).

AMY1 genes produce the protein Alpha-amylase, an enzyme that is important for digestion as it breaks down starch molecules into sugars. It is mainly produced by the salivary glands in the mouth. It is thought that high AMY1 CN results in increased amylase activity in saliva and makes starchy foods taste sweeter. This may have resulted in nutritious food choices at a time when early human populations were transitioning to a more agricultural lifestyle and adapting to a starch-rich diet, thus benefiting individuals with high AMY1 CN.

There are a number of different AMY haplotypes in humans today. The locus also encodes AMY1 paralogues, the pancreatic alpha-amylases AMY2A and AMY2B. See figure below.


Human AMY locus on chr1. Human amylase haplotypes have one copy each of AMY2A and AMY2B and an odd number of AMY1 copies. Increased AMY1 CN arises from the presence of a genomic segment containing 2 copies of AMY1 (transcribed on opposite strands).
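The copy-number arithmetic implied by this caption can be sketched in a few lines of Python. The sketch assumes that each haplotype carries a single AMY1 copy plus zero or more of the two-copy segments, which is one reading consistent with the "odd number" rule; the function names are hypothetical and chosen only for illustration.

```python
def amy1_copies_on_haplotype(n_two_copy_segments: int) -> int:
    """AMY1 copies on one haplotype.

    Assumption: one AMY1 copy is always present, and each additional
    genomic segment contributes exactly 2 copies, so the count is odd.
    """
    if n_two_copy_segments < 0:
        raise ValueError("segment count cannot be negative")
    return 1 + 2 * n_two_copy_segments


def diploid_amy1_cn(segments_hap1: int, segments_hap2: int) -> int:
    """Total AMY1 copy number in a diploid cell: the sum over both haplotypes."""
    return (amy1_copies_on_haplotype(segments_hap1)
            + amy1_copies_on_haplotype(segments_hap2))
```

Under this assumption the sum of two odd per-haplotype counts is always even, which may serve as a sanity check on diploid CN estimates.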

All AMY1 gene copies encode the same amino acid sequences, but there are some amino acid differences between the alpha amylase produced by AMY1 and that produced by AMY2A and AMY2B. This is reflected in the selected region from a multiple sequence alignment of the human amylase genes shown below. Genomic copies of AMY1 are labelled A, B, and C to differentiate them.


Published studies have shown that low AMY1 CN is associated with risk of both obesity and type 2 diabetes. However, there is little understanding of the influence of AMY1 CN on diet and metabolic health.

Part A: 

The human reference genome hg38 depicts the AH3 amylase haplotype. If a person is homozygous for AH3, how many copies of the AMY1 gene would they have in somatic cells?

You are part of a research team investigating diet and genetic risk of metabolic diseases. Your job is to determine the amylase genotype for a large study group selected from students on a University campus.

Describe an appropriate sequencing technology and an associated bioinformatics analysis workflow to accurately measure the AMY1 CN and determine the AMY genotype for all participants in the study group. Your answer should include the important steps in sample collection and analysis, and any factors you might need to consider when designing the study, analysing the data and interpreting the results.

Describe (or illustrate) an appropriate data structure for presenting your results, and any other relevant QC and metadata, to the research team.

Explain how the AMY1 CN results you generated might be useful to the research team investigating genetic risk of metabolic diseases.

Part B: 

Bulk RNA-seq studies examining gene expression in the 3 main human salivary glands show that expression of all AMY paralogues is detectable (see table below). In addition, protein studies have determined that there could be more than 20 different amylase proteoforms present in saliva.


Table of normalised gene expression values for amylase genes in human salivary glands, determined by RNA-seq.

The results of your genotyping experiment above revealed that the student study population includes all possible AMY genotypes. Design an experiment to test the hypothesis that AMY1 CN correlates with levels of the AMY1 protein product in saliva, using the same study population.

Your answer should include the important steps in sample preparation, the choice of analysis method, and the tools and databases used for analysing the results. Also describe any factors you might need to consider when designing the experiment and interpreting the resulting data.

Part C: 

 Saliva is essential for maintaining oral health. It is made up of salivary gland products as well as products originating from other tissues, including blood. Saliva also includes products arising from the oral microbiome. Blood and saliva proteomes overlap significantly, and saliva is currently under investigation as a potential source of diagnostic markers for monitoring human health, disease and pathogens. 

Discuss the potential benefits and challenges of developing a clinical test based on saliva for disease detection or health monitoring. Your answer should include discussion of aspects relating to personalised medicine and population health.

You are encouraged to include examples of existing diagnostic saliva tests you are aware of and/or any potential tests you can think of, to highlight points in your discussion.


