Date: 2022-10-27
SCHOOL OF CHEMISTRY
SENIOR CHEMISTRY
CHEM3121 Chemical Biology
Generic skills plus
PROJECT L: Computing in Chemical Biology
EXPERIMENT L1: ANALYSING REACTION KINETICS – AN INTRODUCTION TO COMPUTER DATA
ANALYSIS
EXPERIMENT L2: STEADY-STATE ENZYME KINETICS
EXPERIMENT L3: BIOINFORMATICS
2022
Project L
Computing in Chemical Biology
CONTENTS
L1 Analysing Reaction Kinetics – An Introduction to Computer Data Analysis
L2 Steady-State Enzyme Kinetics
L3 Further Investigations of Computing in Chemical Biology
- Bioinformatics
- Your Own Investigations
EXPERIMENT L1: ANALYSING REACTION KINETICS - AN
INTRODUCTION TO COMPUTER DATA ANALYSIS
Aim
The aim of this experiment is to use computer methods to investigate reaction kinetics.
Introduction and Background
This experiment is designed to familiarise you with the program Excel. It also aims to introduce
you to the subject of chemical kinetics, which allows one to study how chemical reactions occur. To
complete this computer-based experiment, you will need access to Excel and a copy of the files
containing the sample data. Submission of this experiment will be in the form of an electronic
notebook submission, which can be accessed via the Canvas page for this project.
Experimental chemical kinetics usually involves measuring reactant or product concentrations as a
reaction proceeds and analysing them by the integrated rate law method. The integrated rate law expresses the
concentration of one reactant or product as a function of time. It may be derived from the differential
rate law (rate as a function of concentration) by integration. Two examples are shown below:
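First order reaction: A → B
Differential Rate Law:
$$-\frac{d[\mathrm{A}]}{dt} = k[\mathrm{A}]$$
$$\Rightarrow\; -\int_{[\mathrm{A}]_0}^{[\mathrm{A}]} \frac{d[\mathrm{A}]}{[\mathrm{A}]} = k\int_0^t dt \;\Rightarrow\; -\Big[\ln[\mathrm{A}]\Big]_{[\mathrm{A}]_0}^{[\mathrm{A}]} = k\Big[t\Big]_0^t$$
Integrated Rate Law:
$$\ln[\mathrm{A}] = \ln[\mathrm{A}]_0 - kt$$
Second order reaction: 2A → B
Differential Rate Law:
$$-\frac{1}{2}\frac{d[\mathrm{A}]}{dt} = k[\mathrm{A}]^2$$
$$\Rightarrow\; -\int_{[\mathrm{A}]_0}^{[\mathrm{A}]} \frac{d[\mathrm{A}]}{[\mathrm{A}]^2} = 2k\int_0^t dt \;\Rightarrow\; -\left[-\frac{1}{[\mathrm{A}]}\right]_{[\mathrm{A}]_0}^{[\mathrm{A}]} = 2k\Big[t\Big]_0^t$$
Integrated Rate Law:
$$\frac{1}{[\mathrm{A}]} = \frac{1}{[\mathrm{A}]_0} + 2kt$$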
Before you start the experiment answer the following question. How should the experimental kinetic
data be plotted to obtain a straight line and obtain the value of the rate constant for:
a) a first order reaction?
b) a second order reaction?
The reaction being studied is
2A + B → 2P where $\mathrm{Rate} = k[\mathrm{A}]^a[\mathrm{B}]^b$ (1)
However, in this case, the concentration of substance B is large enough that it does not change
significantly during the reaction. Hence, the rate law can be written as:
$\mathrm{Rate} = k_{\mathrm{obs}}[\mathrm{A}]^a$ (2)
where $k_{\mathrm{obs}}$ is called the pseudo-rate constant and is equal to $k[\mathrm{B}]^b$.
The aim of the exercise is to use the supplied experimental data to determine the reaction order with
respect to reagent A (that is, “a”). To do this we use the integrated rate equations given in the
introduction and analyse the data to see which form of plotting the data yields a straight line (see the
question at the end of the introduction). The graphs generated will allow a determination of the order
of reaction with respect to this species.
Objectives
1. To use the supplied “experimental data” (computer-generated) to determine the rate law for the
reaction given below.
2A + B → 2P where $\mathrm{Rate} = k[\mathrm{A}]^a[\mathrm{B}]^b$
Based on your analysis of the results, propose an acceptable mechanism for this reaction.
2. To become familiar with the use of a spreadsheet to manipulate, plot and perform statistical
analysis of experimental data.
Experimental Procedure
1. Accessing your data
Two data files, CKDAT1.XLS and CKDAT2.XLS, contain the experimental data to be used in this exercise.
These are available for download from the Experimental Data page for Project L. The sample data
were collected by adding a large excess of substance B to reagent A, and subsequently monitoring the
concentration of reagent A for the duration of the study. Each data set represents a different
concentration of substance B and contains two columns of numbers. The first column is the time (in
seconds) at which the measurement was taken, and the second column is the concentration (moles
per litre) of A at that time.
Before starting the data analysis make sure that you have the data analysis ToolPak installed in your
version of Excel. If you do, there will be a Data Analysis box to the far right of the horizontal toolbar
when you click on the Data tab. If it is not there, use the following steps to install it:
i. Click on the File tab
ii. Click More
iii. Click Options
iv. Click Add-Ins
v. Select Analysis ToolPak and click Go
vi. Click OK. Data Analysis should now appear when you are on the Data tab.
2. Generating the Concentration versus Time Plots to Calculate kobs
Procedure for More Experienced Users of Excel
1. Start Excel
2. Open the data file CKDAT1.XLS.
3. In a new column, calculate the values for the natural logarithm of the sample concentration
data.
4. In another new column, calculate the values for the inverse of the sample concentration data.
5. Make plots of [A] vs time, ln[A] vs time, and 1/[A] vs time. Ensure these plots are appropriately
labelled, including axis labels; they will need to be submitted as part of your report.
6. By examining which plot is linear, determine the order of reaction with respect to A.
7. Perform a linear regression analysis of the linear data (using Data; Data Analysis; Regression)
and plot the line of best fit through the data (using Trendline).
8. Record the slope of the line. (Note that in this case the error has no meaning because the data
have been artificially computer generated, but for your own experimental data you should
always quote errors on calculations performed using your experimental data.)
9. Determine the value of kobs for this data set, including units.
10. Repeat steps 2-9 for the other data file, CKDAT2.XLS.
11. Save your Excel files so that you can insert your charts and data analyses into your report.
Procedure for Less Experienced Users of Excel
1. Start Excel
2. Open the data file CKDAT1.XLS.
− Double click on the data file CKDAT1.XLS, found in the list of files given.
3. In a new column, calculate the values for the natural logarithm of the sample concentration
data.
− Select the first cell of the column where the new data are to go.
− Type ‘=ln(B2)’ where B2 is the cell containing the first row of sample data, and press
ENTER.
− To make the other cells of the column contain the same function as the one you just
created, you need to select the cell containing the function, grab the bottom right
corner where a small black square is visible (you will notice that the pointer changes
to a black plus sign when this is possible) and drag the pointer down the column. Each
cell “dragged over” will now contain that same function but of the sample data in the
same row as the cell dragged over.
4. In the next column, input the values for the reciprocal of the sample data.
− As before, to do this select the first cell of the column where the data are to go.
− Type ‘=1/(B2)’ where B2 is the cell containing the first row of sample data, and press
ENTER.
− Grab and drag this cell down the column as you did for the previous step to make each
cell of this column contain the same function.
5. Make plots of [A] vs time, ln[A] vs time, and 1/[A] vs time. Ensure these plots are appropriately
labelled, including axis labels; they will need to be submitted as part of your report.
− To do this, click and drag over the data values that you wish to plot. When you need
to plot x and y values which aren’t in adjacent columns, click and drag over the x values
(in this case time), press and hold the ctrl key on the keyboard down as you click and
drag over the y values. Select the Insert tab and in the Chart box, select the Scatter X-
Y option, followed by the Scatter option. The time should be in the first column from
cells 2 to 61, the concentration of A in the second column from cells 2 to 61, and so
on.
− Click on the chart (outside of the actual graph) and click the + sign at the top right-hand
corner of the chart. Click axis and chart titles as desired. Then click on the words
“Chart title” and “axis title” to edit the names (e.g., x axis = time, y axis = [A]).
− Adjust the size and placement of the chart as required for easy viewing. This is
achieved by clicking on the chart (outside the graph area). Small open circles appear
at the four corners and in the centre of each side. Grab one of these and drag it until
the desired size is obtained. To move the whole chart, click on the chart, grab the
chart (not the open circles) and drag it to its new location.
6. By examining which plot is linear, determine the order of reaction with respect to A.
7. Perform a linear regression analysis of the linear data to determine the slope of the line.
− One graph ([A], ln[A], or 1/[A]) will provide a straight line when plotted against time.
To draw a line of best fit through the data, click on a data point on the graph. (The
whole data set should light up.) Then right click on the data point and select Add
Trendline. From the Format Trendline Options window that opens select the type of
line you want (e.g., solid, dashed) and under Trendline Options select Linear, Display
Equation on Chart, and Display R-squared value on chart. Close the Format Trendline
window.
− The ADD TRENDLINE feature of the program does not perform a comprehensive linear
regression analysis of the data. To do this and obtain the value of the slope click on
the DATA tab, followed by DATA ANALYSIS and REGRESSION. Click OK. Insert the X and
Y ranges of the data you wish to fit by clicking and dragging down the respective
columns of data. Select OUTPUT RANGE and click and drag over the cells where you
wish the output to appear. Click OK. The intercept and slope (X variable) as well as
their calculated errors are given in the bottom left-hand corner of the analysis.
8. Record the slope of the line. (Note that in this case the error has no meaning because the data
have been artificially computer generated, but for your own experimental data you should
always quote errors on calculations performed using your experimental data.)
9. Determine the value of kobs for this data set, including units.
10. Repeat steps 2-9 for the other data file, CKDAT2.XLS.
11. Save your Excel files so that you can insert your charts and data analyses into your report.
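The same three-plot test can be sketched outside Excel. The Python example below is only an illustration: it builds an invented first-order data set (the constants `K_DEMO` and `A0_DEMO` are assumptions, not values from CKDAT1.XLS or CKDAT2.XLS), transforms the concentrations as in steps 3 and 4, and fits a straight line to each of [A], ln[A] and 1/[A] against time. The helper names `linear_fit` and `compare_orders` are introduced here for the sketch.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope*x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def compare_orders(times, conc):
    """Fit [A], ln[A] and 1/[A] against time; return the fits keyed by plot name."""
    transforms = {
        "[A] vs t": conc,
        "ln[A] vs t": [math.log(c) for c in conc],
        "1/[A] vs t": [1.0 / c for c in conc],
    }
    return {name: linear_fit(times, y) for name, y in transforms.items()}

# Invented demonstration data (NOT the course data): first-order decay.
K_DEMO = 0.05    # s^-1, assumed for illustration
A0_DEMO = 0.10   # mol/L, assumed for illustration
times = [float(i) for i in range(60)]                      # 60 points, 0 to 59 s
conc = [A0_DEMO * math.exp(-K_DEMO * t) for t in times]

results = compare_orders(times, conc)
for name, (slope, intercept, r2) in results.items():
    print(f"{name:11s} slope = {slope: .4e}  intercept = {intercept: .4e}  R^2 = {r2:.5f}")

# The plot with R^2 closest to 1 indicates the order with respect to A.
best = max(results, key=lambda name: results[name][2])
print("Most linear plot:", best)
```

To use real data, the `times` and `conc` lists would be replaced by the two columns of a data file. How the fitted slope maps onto kobs follows from the integrated rate laws in the introduction.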
3. Order of Reaction with Respect to Reagent B
The two pseudo-rate constants obtained in part 2 of this exercise have different values because the
concentration of B was different in each case. Both pseudo-rate constants are related to the “true”
rate constant via the order of the reaction with respect to B. In the present exercise, the concentration
of B used to collect the data in CKDAT2 was 3.00 times that used in CKDAT1.
From eqn. 2, recall that
$k_{\mathrm{obs}} = k[\mathrm{B}]^b$ (3)
From the ratio of the two kobs values determine the order of the reaction with respect to B.
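The ratio method can be written out explicitly. In the sketch below the subscripts 1 and 2 are labels introduced here for the CKDAT1 and CKDAT2 data sets. Writing eqn. 3 once for each data set and dividing cancels the unknown rate constant k:

$$\frac{k_{\mathrm{obs},2}}{k_{\mathrm{obs},1}} = \frac{k[\mathrm{B}]_2^{\,b}}{k[\mathrm{B}]_1^{\,b}} = \left(\frac{[\mathrm{B}]_2}{[\mathrm{B}]_1}\right)^{b} = (3.00)^{b} \quad\Rightarrow\quad b = \frac{\ln\left(k_{\mathrm{obs},2}/k_{\mathrm{obs},1}\right)}{\ln 3.00}$$

As a guide to reading the result, a ratio close to 1 would correspond to b = 0, a ratio close to 3 to b = 1, and a ratio close to 9 to b = 2.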
Now determine the rate law for the reaction by substituting the appropriate numbers into eqn. 1.
4. The Reaction Mechanism
One of the most important uses of kinetics is to discriminate between possible reaction mechanisms.
Based upon the experimental rate law just determined, discuss the appropriateness of the following
mechanistic schemes to this reaction. Which is most consistent?
a) A + B → P + Q slow
Q + A → P fast
b) A + B ↔ Q fast equilibrium
Q + A → 2P slow
c) A + A → Q slow
Q + B → 2P fast
d) B + B → Q slow
Q + 2A → R + P fast
R → P + B fast
e) A + A ↔ Q fast equilibrium
Q → P + R slow
R + B → P fast
This discussion should be added to the ‘Question’ tab of the electronic notebook for submission.
EXPERIMENT L2: STEADY-STATE ENZYME KINETICS
Aim
The aim of this experiment is to investigate the concept of steady-state kinetics using computational
methods.
Introduction
In Experiment L1 you considered the kinetics of fairly simple reactions. When the reaction mechanism
becomes more complex, sometimes with several transient and difficult-to-measure intermediates, the
time dependence of the concentration of any species (including the products that you may be
interested in) can be complex. One objective of kinetic analysis is, of course, to allow you to predict
and optimise the concentration of the species of interest under a variety of experimental conditions
(including temperature, initial concentrations, etc.).
Unfortunately, as the mechanism becomes more complex, so does the mathematical treatment
required. In fact, for many complex mechanisms the kinetic equations cannot be solved analytically.
Fortunately, there are several situations where the complexity is simplified, leading to a better
understanding of the important features of the reaction, and allowing prediction of optimal
conditions. You should have come across at least one of these situations in First Year Chemistry – the
“Rate Determining Step”. Other situations include the “Steady State Approximation” and the “Pre-
equilibrium Condition” (which you came across at the end of Module 1).
In this module, you will carry out computer simulations to study a two-step reaction mechanism. Via
the simulations you will further explore kinetic concepts such as the steady state approximation and
the rate-determining step and investigate under what conditions these approximations are valid.
Reactions Going to Completion
In Module L1, you considered reactions where the reverse reaction step was insignificant. The
following discussion provides a brief summary:
A reaction that goes to completion can be represented as:
$$\mathrm{S} \xrightarrow{k_1} \mathrm{P}$$
where S represents the Starting compound and P the Product(s). In this case the reverse reaction is
considered to be extremely slow in comparison to the forward reaction, so that the reverse reaction
can be neglected. The rate of loss of S and gain of P can be expressed as:
$$\frac{d[\mathrm{S}]}{dt} = -k_1[\mathrm{S}] \qquad \frac{d[\mathrm{P}]}{dt} = k_1[\mathrm{S}] \tag{1a/b}$$
If the initial concentration of S is [S]0 and the initial concentration of P is zero, then the time
dependence of [S] and [P] is obtained by integrating (1a) and considering that [P]t = [S]0 – [S]t to obtain:
$$[\mathrm{S}]_t = [\mathrm{S}]_0\,e^{-k_1 t} \qquad [\mathrm{P}]_t = [\mathrm{S}]_0\left(1 - e^{-k_1 t}\right) \tag{2a/b}$$
Reversible Reactions Approaching Equilibrium
In the previous section, only reactions where the reverse reaction step was insignificant were
considered. As you might have realised, reversible reactions are more common. Let us now consider
the reversible first-order equilibrium reaction
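$$\mathrm{S} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{P} \qquad K_c = \frac{[\mathrm{P}]}{[\mathrm{S}]}$$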
where both the forward and reverse reactions are important, and Kc is the equilibrium constant for
the reaction.
The rate of change of S has two contributions, depletion through the forward reaction and
replenishment through the reverse reaction. The net rate of change in S is therefore:
$$\frac{d[\mathrm{S}]}{dt} = -k_1[\mathrm{S}] + k_{-1}[\mathrm{P}] \tag{3}$$
where k1 and k-1 are the rate constants for the forward and reverse reactions, respectively. If the initial
concentration of S is equal to [S]0 and there are no products present at the start of the reaction, then
at all times [P] = [S]0 - [S] (providing of course that the volume remains constant). Consequently, (3)
can be written as:
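$$\frac{d[\mathrm{S}]}{dt} = -k_1[\mathrm{S}] + k_{-1}\left\{[\mathrm{S}]_0 - [\mathrm{S}]\right\} = -(k_1 + k_{-1})[\mathrm{S}] + k_{-1}[\mathrm{S}]_0 \tag{4}$$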
The solution of this first-order differential equation is
$$[\mathrm{S}] = [\mathrm{S}]_0\left(\frac{k_{-1} + k_1 e^{-(k_1 + k_{-1})t}}{k_{-1} + k_1}\right) \tag{5}$$
At equilibrium, the rate of change in the concentration of a species is zero. In other words, from (3),
$$\frac{d[\mathrm{S}]}{dt} = 0 \tag{6}$$
and therefore,
$$k_1[\mathrm{S}] = k_{-1}[\mathrm{P}] \tag{7}$$
That is, at equilibrium, the rates of the forward and reverse reactions are equal. From (7) and the
definition of the equilibrium constant, it is easily shown that Kc is related to the rate constants by a
simple expression:
$$K_c = \frac{k_1}{k_{-1}} \tag{8}$$
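The link between the time course (5) and the equilibrium result (8) can be checked numerically. The following Python sketch evaluates (5) at a long time and compares the resulting [P]/[S] ratio with k1/k-1. The rate constants and the function name `s_reversible` are arbitrary choices made for this illustration.

```python
import math

def s_reversible(t, s0, k1, k_rev):
    """[S] at time t for S <=> P, from equation (5); k_rev stands for k-1."""
    k_sum = k1 + k_rev
    return s0 * (k_rev + k1 * math.exp(-k_sum * t)) / k_sum

# Arbitrary illustrative values (assumptions, not course data).
S0 = 1.0      # M
K1 = 2.0      # s^-1
K_REV = 0.5   # s^-1

s_long = s_reversible(50.0, S0, K1, K_REV)   # many relaxation times after t = 0
p_long = S0 - s_long                         # [P] = [S]0 - [S] at constant volume
ratio = p_long / s_long

print(f"[P]/[S] at long time = {ratio:.6f}, k1/k-1 = {K1 / K_REV:.6f}")
```

At t = 0 the same function returns [S]0, as the initial condition requires.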
When an overall reaction is the sum of a sequence of reversible reactions, the overall equilibrium
constant is simply the product of the equilibrium constants for each component step.
$$K_c = \frac{k_1 k_2 k_3 \cdots}{k_{-1} k_{-2} k_{-3} \cdots} \tag{9}$$
Consecutive Reactions
Many reactions proceed through the formation of intermediates. Consider the general consecutive
first-order reaction
$$\mathrm{S} \xrightarrow{k_1} \mathrm{C} \xrightarrow{k_2} \mathrm{P}$$
The concentrations of substances S, C and P change at rates according to:
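$$\frac{d[\mathrm{S}]}{dt} = -k_1[\mathrm{S}] \tag{10}$$
$$\frac{d[\mathrm{C}]}{dt} = k_1[\mathrm{S}] - k_2[\mathrm{C}] \tag{11}$$
$$\frac{d[\mathrm{P}]}{dt} = k_2[\mathrm{C}] \tag{12}$$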
If [S]0 is the initial concentration of S, then solution of the coupled series of differential equations (10-
12) yields:
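$$[\mathrm{S}] = [\mathrm{S}]_0\,e^{-k_1 t} \tag{13}$$
$$[\mathrm{C}] = k_1[\mathrm{S}]_0\left(\frac{e^{-k_1 t} - e^{-k_2 t}}{k_2 - k_1}\right) \tag{14}$$
$$[\mathrm{P}] = [\mathrm{S}]_0\left\{1 + \left(\frac{k_1 e^{-k_2 t} - k_2 e^{-k_1 t}}{k_2 - k_1}\right)\right\} \tag{15}$$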
(You can see how quickly the kinetic equations become complex for even a minor complication in the
reaction mechanism.)
The Rate-Determining Step
When either k1 >> k2 or k2 >> k1, (15) can be approximated by a much simpler form:
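$$k_1 \gg k_2: \qquad [\mathrm{P}] = [\mathrm{S}]_0\left(1 - e^{-k_2 t}\right) \tag{16a}$$
$$k_2 \gg k_1: \qquad [\mathrm{P}] = [\mathrm{S}]_0\left(1 - e^{-k_1 t}\right) \tag{16b}$$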
Note that these equations are identical to (2b). That is, the overall kinetics of P production resemble
a simple one-step mechanism with a rate constant equal to the smaller of k1 and k2 (i.e. the rate
determining step).
Normally one of the rate constants should be at least an order of magnitude (i.e., a factor of 10)
greater than the other to be considered as the sole rate-determining step and to justify the
approximations given by (16a) or (16b). Under these conditions the true amount of product formed,
as given by the more exact expression (15), is within approximately 10% of that estimated by
equations (16a) or (16b).
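The size of the error introduced by the rate-determining-step approximation can be explored with a short calculation. The Python sketch below compares the exact expression (15) with the one-step form (16a)/(16b) for a few ratios of the rate constants. The deviation is measured here as a fraction of [S]0, which is one possible reading of "within approximately 10%"; the function names and the chosen ratios are assumptions made for this illustration.

```python
import math

def p_exact(t, s0, k1, k2):
    """[P] from the full consecutive-reaction expression, equation (15)."""
    return s0 * (1.0 + (k1 * math.exp(-k2 * t) - k2 * math.exp(-k1 * t)) / (k2 - k1))

def p_rds(t, s0, k1, k2):
    """[P] from (16a)/(16b): one-step kinetics using the smaller rate constant."""
    return s0 * (1.0 - math.exp(-min(k1, k2) * t))

def max_deviation(s0, k1, k2, n_points=2000):
    """Largest |exact - approximate| over 10 lifetimes of the slow step, as a fraction of s0."""
    t_end = 10.0 / min(k1, k2)
    worst = 0.0
    for i in range(n_points + 1):
        t = t_end * i / n_points
        worst = max(worst, abs(p_exact(t, s0, k1, k2) - p_rds(t, s0, k1, k2)))
    return worst / s0

# Keep k2 = 1 s^-1 and vary k1 (illustrative values only).
for ratio in (2.0, 10.0, 100.0):
    dev = max_deviation(1.0, ratio * 1.0, 1.0)
    print(f"k1/k2 = {ratio:6.1f}: maximum deviation = {100.0 * dev:.2f}% of [S]0")
```

Swapping the two rate constants gives the same deviation, because (15) is symmetric in k1 and k2.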
The Steady-State Approximation
The full kinetic equations of multi-step reactions can be very complex. Often, however,
approximations can be made to simplify the mathematics, allowing the important parameters in the
rate of product formation to be identified. One such simplification has already been discussed in the
recognition of the rate-determining step. Another common simplifying assumption is the steady-state
approximation (SSA).
The steady-state approximation concerns the concentration of an intermediate species, where it is
assumed that “for the major part of the duration of the reaction the concentrations of all reactive
intermediates are constant”. This assumption is used to simplify the equation of the kinetics of
product formation by excluding the intermediate concentrations from the final expression.
Mathematically, the SSA can be written as :
$$\frac{d[\mathrm{C}]}{dt} \approx 0 \tag{17}$$
Pre-Equilibria
One application of the SSA is to examine consecutive reactions where the intermediates are in
equilibrium with the starting reactants. Such a reaction can be written as follows:
$$\mathrm{S} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{C} \xrightarrow{k_2} \mathrm{P}$$
Note that S could represent more than a single reactant and P more than a single product molecule.
The rate of formation of P is given by:
$$\frac{d[\mathrm{P}]}{dt} = k_2[\mathrm{C}] \tag{18}$$
and the rate of change of [S] is given by:
$$\frac{d[\mathrm{S}]}{dt} = -k_1[\mathrm{S}] + k_{-1}[\mathrm{C}] \tag{19}$$
Often the concentration of the intermediate, [C], is difficult to measure, and so (18) and (19) are not
very useful. An alternate expression for d[P]/dt can be obtained using the SSA (17) as follows:
$$\frac{d[\mathrm{C}]}{dt} = k_1[\mathrm{S}] - k_{-1}[\mathrm{C}] - k_2[\mathrm{C}] \tag{20}$$
Applying the SSA equation means that (20) can be set to zero, and then rearranging to make [C] the
subject gives:
$$[\mathrm{C}] = \frac{k_1[\mathrm{S}]}{k_{-1} + k_2} \tag{21}$$
(21) can now be substituted into (18) to yield the expression for d[P]/dt:
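$$\frac{d[\mathrm{P}]}{dt} = \frac{k_1 k_2[\mathrm{S}]}{k_{-1} + k_2} \tag{22}$$
$$\frac{d[\mathrm{P}]}{dt} = k'[\mathrm{S}] \quad \text{where} \quad k' = \frac{k_1 k_2}{k_{-1} + k_2} \tag{23}$$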
This is in exactly the same form as (1b), with the same solution, i.e.
$$[\mathrm{P}] = [\mathrm{S}]_0\left(1 - e^{-k' t}\right) \tag{2b$'$}$$
[This is of course why we look for these approximations; so that we can simplify the maths and provide
a simpler physical explanation of the reaction. Remember that (21) will only be valid when the SSA
conditions are met (i.e. when d[C]/dt = 0)].
Experimental Procedure
1. Berkeley Madonna
To check the validity of the SSA you need to calculate the time dependence of the
concentrations of S, C and P to determine under what conditions the concentration of C is constant.
To do this you must calculate the concentrations of all of the species without making the steady state
approximation. To do that it is necessary to carry out a numerical integration of the set of
simultaneous differential equations (18), (19) and (20). This can be done via the program Berkeley
Madonna, which was developed by Robert Macey and George Oster at the University of California
Berkeley. (We don’t know the origin of the “Madonna”, but the first version of the program was
developed in the 1990s or perhaps even earlier, so one or both of the inventors could have been fans
of the singer.)
To download the latest version of Berkeley Madonna (version 10) onto your computer go to the
website https://berkeley-madonna.myshopify.com/pages/download and download the MacOS or
Windows version, whichever is appropriate for your computer. Automatically you will then have the
demo or trial version of the software. To obtain the full version of the software, you would have to
register with Berkeley Madonna and pay for a license. However, for the purposes of this exercise, the
demo version is sufficient.
The Berkeley-Madonna program incorporates different numerical integration techniques which have
been worked out by mathematicians to integrate a series of coupled differential equations. These
include the following algorithms:
1. Euler’s method (Euler)
2. Runge-Kutta 2 (RK2)
3. Runge-Kutta 4 (RK4)
4. Runge-Kutta 5 (Auto)
5. Rosenbrock (Stiff)
Methods 1-3 utilise a fixed step-size, i.e., they solve the set of coupled differential equations describing
the mechanism at fixed time intervals. The Auto and Stiff methods (4 and 5) use a variable step size
which is automatically adjusted, so that a large time interval is used when there is a slow change in
the concentrations of the various species with time and a small time interval is used when the
concentrations are changing rapidly. Stiff sets of coupled differential equations are defined as those
in which some of the variables (i.e., concentrations in this case) are changing rapidly with time and
others are changing slowly. The Rosenbrock method was designed specifically to deal with such
systems. (Note: It is not the sole purpose of Berkeley Madonna to solve differential equations. You
can use it to simulate any equation, y = f(x). You just need to replace time in the program with x. You
may find this very useful in the future.)
Before writing a program in Berkeley-Madonna to simulate the reaction given above, for the purposes
of understanding the syntax of the program language let’s consider a simpler system:
$$\mathrm{A} \underset{k_b}{\overset{k_a}{\rightleftharpoons}} \mathrm{B}$$
For this simple mechanism a typical Berkeley-Madonna program is given below to determine the
concentrations of A and B as a function of time using the Runge-Kutta 4 method.
− To test that Berkeley Madonna is working on your computer, open the program by double-
clicking on the Berkeley Madonna shortcut icon on your computer desktop.
− Select File from the top ribbon bar and New Document from the dropdown menu.
− Type the program given below into the Berkeley Madonna’s equation window on the left-hand
side of the screen. Alternatively, you can copy the equations into a Notepad text document (.txt
file) and use Paste from the dropdown menu of the Edit option on Berkeley Madonna’s ribbon
bar to enter the program. The new version of Berkeley Madonna does not allow you to copy
and paste directly from a Word document if you are using the Windows operating system.
− Once you’ve entered the program click on Run on the run window on the right-hand side of the
screen.
− If you have entered everything correctly a graph of the concentrations of A and B should appear
in the centre of the screen.
− Clicking on table on the ribbon bar of the graph window will give you a table of the data points.
If you click and drag across the table and then select Copy from the dropdown menu of Edit on
the main ribbon you can then paste the data into another program, e.g. Excel, for formatting
and exporting to your lab report.
Program
METHOD RK4
STARTTIME = 0
STOPTIME = 10
DT = 0.02
d/dt (A) = -ka*A + kb*B
d/dt (B) = -kb*B + ka*A
init A = 100
init B = 0
LIMIT A >= 0
LIMIT A <= 100
LIMIT B >= 0
LIMIT B <= 100
ka = 1
kb = 1
Many of the program lines are almost self-explanatory. The first line (METHOD RK4) defines the
method to be used to solve the differential rate equations. The commands STARTTIME and STOPTIME
simply define when the calculation should start and stop, i.e., in this case starting at time = 0 and
stopping after 10 seconds. DT defines the time interval for integration of the differential rate
equations, i.e., in this case after every 0.02 seconds. The next two lines are the differential rate
equations for the species A and B. It is important to write an equation for every single species involved
in the reaction mechanism. The command init defines the initial concentrations of each species (in
whichever concentration units you desire). However, for second order reactions it is important that
the units of the concentrations, rate constants and time are consistent. For example, if the
concentration is entered in M and the time is seconds, a second order rate constant must be in units
of M⁻¹ s⁻¹, not M⁻¹ min⁻¹ or mM⁻¹ s⁻¹. The program lines containing the command LIMIT define the upper
and lower limits of the concentrations of each of the species. These are based on the mechanism and
the law of conservation of mass, i.e., for this particular mechanism the total concentration of A and B
cannot exceed 100, because the initial concentration of A is only 100. It should also be clear that a
negative concentration makes no sense. The LIMIT lines are not absolutely necessary, but for
complicated mechanisms they can be useful in preventing the numerical method from trying values
of the concentrations which may represent a legitimate mathematical solution to the equations but
make no physical sense. The final two lines merely specify the values of the rate constants for the
mechanism.
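To make the role of METHOD, DT and the two rate equations more concrete, the sketch below reproduces the same calculation in Python with a hand-written fourth-order Runge-Kutta step. It is an illustration of the RK4 idea only and is not how Berkeley Madonna is implemented internally. The function names (`derivatives`, `rk4_step`, `simulate`) are introduced here, and the LIMIT lines are not reproduced.

```python
def derivatives(state, ka, kb):
    """Rate equations for A <=> B: returns (d[A]/dt, d[B]/dt)."""
    a, b = state
    da = -ka * a + kb * b
    db = -kb * b + ka * a
    return (da, db)

def rk4_step(state, dt, ka, kb):
    """Advance the concentrations by one fixed time step dt using classical RK4."""
    f1 = derivatives(state, ka, kb)
    f2 = derivatives(tuple(s + 0.5 * dt * f for s, f in zip(state, f1)), ka, kb)
    f3 = derivatives(tuple(s + 0.5 * dt * f for s, f in zip(state, f2)), ka, kb)
    f4 = derivatives(tuple(s + dt * f for s, f in zip(state, f3)), ka, kb)
    return tuple(
        s + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
        for s, a1, a2, a3, a4 in zip(state, f1, f2, f3, f4)
    )

def simulate(a0=100.0, b0=0.0, ka=1.0, kb=1.0, stop_time=10.0, dt=0.02):
    """Integrate from t = 0 to stop_time; returns a list of (time, [A], [B])."""
    n_steps = round(stop_time / dt)
    state = (a0, b0)
    history = [(0.0, a0, b0)]
    for i in range(1, n_steps + 1):
        state = rk4_step(state, dt, ka, kb)
        history.append((i * dt, state[0], state[1]))
    return history

# Same settings as the program above: STOPTIME = 10, DT = 0.02, ka = kb = 1.
history = simulate()
t_end, a_end, b_end = history[-1]
print(f"t = {t_end:.2f}: [A] = {a_end:.4f}, [B] = {b_end:.4f}, total = {a_end + b_end:.4f}")
```

Because ka = kb, the two concentrations converge to the same value, and their sum stays at the initial total throughout, which is the conservation condition the LIMIT lines express.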
Based on this simple example, now try and write your own Berkeley-Madonna program for the two-
step mechanism at the beginning of this section and its associated differential rate equations (18) –
(20). Use a total simulation time of 1 second and a time interval of 0.002 seconds and values of k1 =
10 s⁻¹, k2 = 0.1 s⁻¹ and [S]0 = 1 M, which have been chosen arbitrarily just so everyone uses the same
values for the calculation. The initial value of k-1 that you use isn’t important, but later we want to
consider the three situations:
− k-1 >> k2
− k-1 ≈ k2
− k-1 << k2
Once you have typed in your program you can run it by clicking on the Run button. If your program
successfully runs, a graph should appear showing the time course of the concentration of each of
species S, C and P. If you wish to change the appearance of the lines, there is an option you can click
on the toolbar of the Graph Window.
You need to show one of the demonstrators a copy of your code and that you have successfully run
this software by sharing your screen within the Zoom breakout room; this will be marked off as part
of the assessment.
Vary the value of k-1 within your program to consider the three conditions:
− k-1 >> k2
− k-1 ≈ k2
− k-1 << k2
From the time course of [C] determine under which condition the steady state approximation is
obeyed. Why does this condition yield the best agreement? For each condition, produce a plot of the
concentrations of S, C and P versus time to include in your presentation. To do this you need to click
on the option Table on the top ribbon bar of the graph window, so that you can see the concentration
values of S, C and P and the corresponding times. For the plotting you can use Excel or any other freely
available plotting program, such as SciDAVis.
Apart from looking at the time course of [C], another test of the steady-state approximation is to
choose a particular time point in your simulations and calculate [P] from (23) and (2b′), i.e., the
predicted value of [P] based on the steady-state approximation. Then compare this to the value of [P]
calculated in each of your simulations. Decide which simulation yields the best agreement, and
explain why.
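The comparison between the simulated [P] and the steady-state prediction can also be sketched in Python, as a cross-check on the Berkeley Madonna output rather than a replacement for it. The example integrates equations (18) to (20) with a fixed-step RK4 scheme and evaluates (23) and (2b′) at t = 1 s. The three k-1 values are assumptions chosen only to represent the three regimes, and the function names are introduced here.

```python
import math

def rates(s, c, p, k1, k_rev, k2):
    """Right-hand sides of equations (19), (20) and (18), with no steady-state assumption."""
    ds = -k1 * s + k_rev * c
    dc = k1 * s - k_rev * c - k2 * c
    dp = k2 * c
    return ds, dc, dp

def simulate(k_rev, k1=10.0, k2=0.1, s0=1.0, stop_time=1.0, dt=0.002):
    """Integrate S <=> C -> P with fixed-step RK4; returns ([S], [C], [P]) at stop_time."""
    y = (s0, 0.0, 0.0)
    for _ in range(round(stop_time / dt)):
        f1 = rates(*y, k1, k_rev, k2)
        f2 = rates(*[yi + 0.5 * dt * fi for yi, fi in zip(y, f1)], k1, k_rev, k2)
        f3 = rates(*[yi + 0.5 * dt * fi for yi, fi in zip(y, f2)], k1, k_rev, k2)
        f4 = rates(*[yi + dt * fi for yi, fi in zip(y, f3)], k1, k_rev, k2)
        y = tuple(
            yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, f1, f2, f3, f4)
        )
    return y

def p_steady_state(t, k_rev, k1=10.0, k2=0.1, s0=1.0):
    """[P] predicted by (23) and (2b'), using k' = k1*k2/(k-1 + k2)."""
    k_prime = k1 * k2 / (k_rev + k2)
    return s0 * (1.0 - math.exp(-k_prime * t))

# Illustrative k-1 values (s^-1) for the three regimes; these choices are assumptions.
K_REV_CASES = {"k-1 >> k2": 100.0, "k-1 ~ k2": 0.1, "k-1 << k2": 0.001}

comparison = {}
for label, k_rev in K_REV_CASES.items():
    s, c, p = simulate(k_rev)
    comparison[label] = (p, p_steady_state(1.0, k_rev))
    print(f"{label:10s} [C] = {c:.4f}  simulated [P] = {p:.5f}  SSA [P] = {comparison[label][1]:.5f}")
```

The printed [C] values give a quick indication of how much intermediate has accumulated in each case, which is useful when interpreting the differences between the two [P] columns.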
2. Michaelis-Menten Kinetics
A special case of the SSA is very frequently applied in the chemistry of living systems. Most enzyme-
catalysed reactions rely on a pre-equilibrium step; the first step is the reversible reaction between an
enzyme (E) and substrate (S) to form an activated complex (C), which then reacts irreversibly to give
the product (P).
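$$\mathrm{E} + \mathrm{S} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{C} \xrightarrow{k_2} \mathrm{E} + \mathrm{P}$$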
The change in concentration of the intermediate complex with time can be expressed as:
$$\frac{d[\mathrm{C}]}{dt} = k_1[\mathrm{E}][\mathrm{S}] - k_{-1}[\mathrm{C}] - k_2[\mathrm{C}] \tag{24}$$
Using the same steady-state approximation as previously (i.e., d[C]/dt = 0), this scheme leads to an
equation practically identical to (21), hence:
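$$[\mathrm{C}] = \frac{k_1[\mathrm{E}][\mathrm{S}]}{k_{-1} + k_2} \tag{25}$$
and
$$\frac{d[\mathrm{P}]}{dt} = k_2[\mathrm{C}] = \frac{k_1 k_2[\mathrm{E}][\mathrm{S}]}{k_{-1} + k_2} \tag{26}$$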
where [E] is the concentration of free enzyme.
Unfortunately, [E] is almost always experimentally inaccessible; concentrations of enzyme are low,
and it is impossible to determine them accurately in vivo. Biochemists customarily use a kinetic
equation expressing d[P]/dt in terms of the total enzyme concentration.
To do this, we make the substitution [E] = [Etotal] – [C] in equation 24:
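$$\frac{d[\mathrm{C}]}{dt} = k_1[\mathrm{E}_{\mathrm{total}}][\mathrm{S}] - k_1[\mathrm{C}][\mathrm{S}] - k_{-1}[\mathrm{C}] - k_2[\mathrm{C}] \tag{27}$$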
Again applying the steady-state approximation, d[C]/dt = 0:
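$$[\mathrm{C}] = \frac{k_1[\mathrm{E}_{\mathrm{total}}][\mathrm{S}]}{k_{-1} + k_2 + k_1[\mathrm{S}]} \tag{28}$$
and
$$\frac{d[\mathrm{P}]}{dt} = k_2[\mathrm{C}] = \frac{k_1 k_2[\mathrm{E}_{\mathrm{total}}][\mathrm{S}]}{k_{-1} + k_2 + k_1[\mathrm{S}]} \tag{29}$$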
Dividing the top and bottom of expression (29) by k1 gives the expression:
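$$\frac{d[\mathrm{P}]}{dt} = \frac{k_2[\mathrm{E}_{\mathrm{total}}][\mathrm{S}]}{K_M + [\mathrm{S}]} \tag{30}$$
where $K_M = \dfrac{k_{-1} + k_2}{k_1}$ is known as the Michaelis constant.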
Clearly, the maximum value of d[P]/dt will be obtained where [C] = [Etotal] (i.e. the binding sites of the
enzyme are completely saturated by the substrate), and will be simply k2[Etotal]. Our expression then
simplifies to the form familiar to biochemists:
$$\frac{d[\mathrm{P}]}{dt} = \frac{V_{\max}[\mathrm{S}]}{K_M + [\mathrm{S}]} \tag{31}$$
where Vmax is the experimental maximum rate of the reaction.
KM can be determined experimentally by finding the concentration of substrate that will give a rate of
reaction equal to half the maximum:
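$$\frac{d[\mathrm{P}]}{dt} = \frac{V_{\max}}{2} = \frac{V_{\max}[\mathrm{S}]_{1/2}}{K_M + [\mathrm{S}]_{1/2}} \tag{32}$$
Therefore, $K_M = [\mathrm{S}]_{1/2}$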
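The half-maximum property can be confirmed with a few lines of Python. The sketch below evaluates equation (31) at [S] = KM and at a saturating substrate concentration. All numerical values (rate constants and total enzyme concentration) are invented for illustration, and the function names are introduced here.

```python
import math

def michaelis_constant(k1, k_rev, k2):
    """K_M = (k-1 + k2) / k1, as defined below equation (30)."""
    return (k_rev + k2) / k1

def michaelis_menten_rate(s, v_max, k_m):
    """Rate of product formation d[P]/dt from equation (31)."""
    return v_max * s / (k_m + s)

# Hypothetical enzyme parameters (assumptions, not measured values).
K1 = 1.0e6        # M^-1 s^-1
K_REV = 50.0      # s^-1
K2 = 50.0         # s^-1
E_TOTAL = 1.0e-6  # M

k_m = michaelis_constant(K1, K_REV, K2)   # in M
v_max = K2 * E_TOTAL                      # k2[Etotal], in M s^-1

rate_at_km = michaelis_menten_rate(k_m, v_max, k_m)
rate_saturating = michaelis_menten_rate(1000.0 * k_m, v_max, k_m)

print(f"K_M = {k_m:.2e} M, Vmax = {v_max:.2e} M/s")
print(f"rate at [S] = K_M      : {rate_at_km:.3e} M/s ({rate_at_km / v_max:.3f} of Vmax)")
print(f"rate at [S] = 1000 K_M : {rate_saturating:.3e} M/s ({rate_saturating / v_max:.3f} of Vmax)")
```

Near [S] = KM the rate still responds strongly to changes in substrate concentration, whereas at saturating [S] it is almost independent of [S].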
KM for some enzymes is near the physiological concentration of their substrate – can you suggest an
advantage of this?
EXPERIMENT L3: FURTHER INVESTIGATIONS OF
COMPUTING IN CHEMICAL BIOLOGY
This experiment is in two parts. The first involves using bioinformatic software to analyse proteins.
The second part of the experiment involves “student-led enquiry” where you will come up with a
research question which can be answered by performing a literature search. A number of suggested
avenues are listed in Part 2, but you are not limited to these. Before beginning Part 2, you should
follow the investigative experiment checklist. This involves checking with a demonstrator that your
proposed research question is feasible. You should then prepare a HIRAC for approval for this
literature analysis. Only the front page of the HIRAC form needs to be completed for literature-based
investigations. An academic member of staff, not a demonstrator, must sign your HIRAC for Part 2.
L3 PART 1: BIOINFORMATICS
Aim
The aim of this experiment is to gain experience with a bioinformatic software package, MEGA X, and to
construct a phylogenetic tree showing the evolution of a selected protein.
Introduction
Proteins can be considered as the workhorses of biology. Their building blocks are the 20 naturally
occurring amino acids, which can be arranged in an infinite number of different sequences of varying
lengths with a vast range of different three-dimensional structures. Thus, protein molecules are
specifically engineered in living systems to carry out a wide range of different functions, including
catalysis (i.e., enzymes), structural support (e.g., collagen and keratin), carriers of other molecules or
ions (e.g., haemoglobin and transferrin), energy conversion (e.g., ATP synthase and the Na⁺-pump)
and transport across membranes (e.g., Na⁺ and K⁺ channels).
Since the breaking of the genetic code in the 1960s by Nirenberg, Khorana and Holley (1968 Nobel
Prize in Physiology or Medicine) and the first development of a rapid DNA sequencing method by Fred
Sanger (Nobel Prize in Chemistry 1980), it has been possible to rapidly determine protein sequences
from the complementary DNA of the encoding gene, rather than via a direct analysis of the amino acid
sequence of a protein. This is the method by which well over 90% of published protein sequences have
been determined. The speed with which sequencing can now be carried out has seen an explosion in
the number of available sequences. For example, in March 2014 the UniProt (Universal Protein)
database contained 543,000 protein sequences, comprising 198 million amino acids, from 227,000
publications. Within this massive haystack of available information valuable needles of wisdom can be
hidden. The goal of bioinformatic analysis is to discover new knowledge from the information held
within databases of protein and nucleic acid sequences. To make some sense out of this seeming glut
of data, one of the tricks is to ask the right question and another is to design the analysis in such a way
as to answer it.
In this module you will gain experience in using a bioinformatic software package, MEGA X, which
stands for Molecular Evolutionary Genetics Analysis version 10. The package was developed at the
Pennsylvania State University, with the first version being released in 1993. The software allows one to
conduct statistical analyses of molecular evolution and to construct phylogenetic trees, i.e., family
trees showing where modern day versions of a protein or the DNA encoding a gene evolved from.
Specifically, in this experiment you will align amino acid sequences of a protein from a variety of
different animal species, carry out a search for conserved amino acid residues (ones which are likely to
play a crucial role in the structure or function of that protein) and construct a phylogenetic tree showing the molecular
evolution of the protein.
In protein sequence databases, individual amino acid residues are designated by a single letter, e.g.,
glycine = G, alanine = A, leucine = L and lysine = K. A sequence written in this one-letter code and
preceded by a header line beginning with ">" is said to be in FASTA format. Although this
designation of the amino acids essentially hides the chemical properties of the individual amino acids,
it provides a useful shorthand notation for storage of the sequence information. After the alignment
of multiple sequences of a protein from different species has been carried out and the conserved
amino acid residues identified, the chemical properties of the conserved residues can
be considered, e.g., are they acidic or basic; positively charged, negatively charged or neutral;
hydrophobic or hydrophilic; small or bulky?
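The link between the one-letter code and side-chain chemistry can be made explicit with a small lookup table. The following Python sketch is illustrative only: the four groups and the names `RESIDUE_CLASSES`, `classify_residue` and `summarise` are assumptions made for this example, and textbooks differ on where borderline residues such as G, H, C and Y belong.

```python
# Illustrative grouping of the 20 standard amino acids by side-chain character
# near physiological pH. The group boundaries are an assumption of this sketch;
# borderline residues (e.g. G, H, C, Y) are classified differently by some texts.
RESIDUE_CLASSES = {
    "acidic (negatively charged)": "DE",
    "basic (positively charged)": "KRH",
    "polar uncharged": "STNQCY",
    "hydrophobic": "AVLIMFWPG",
}


def classify_residue(letter):
    """Return the illustrative chemical class for a one-letter amino acid code."""
    letter = letter.upper()
    if len(letter) != 1:
        return "unknown"
    for label, members in RESIDUE_CLASSES.items():
        if letter in members:
            return label
    return "unknown"  # gaps ('-'), 'X' or other non-standard symbols


def summarise(residues):
    """Count how many of the given residues fall into each class."""
    counts = {}
    for residue in residues:
        label = classify_residue(residue)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Passing the string of conserved residues found in an alignment to `summarise` gives a quick overview of whether the conserved positions are dominated by charged, polar or hydrophobic side chains.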
Experimental Procedure
1. Downloading the MEGA X Software
To carry out this module you will first need to download the MEGA X software package onto your
computer. You can search for it in Google or alternatively click on the hyperlink
https://www.megasoftware.net/. The software is free. If you are using the Windows operating
system, all you need to do is to click the DOWNLOAD button on the website. If you are using a
Macintosh computer you will need to change the first button to macOS in the dropdown menu before
clicking DOWNLOAD.
2. Collection of Protein Sequences
The first step in the analysis of protein sequences is to collect the sequences and collate them in a
single input file in the format required by MEGA X.
You need to show one of the demonstrators that you have successfully run this software by sharing
your screen within the Zoom breakout room; this will be marked off as part of the assessment.
Please follow these steps:
1. Select a protein which you know exists in vertebrates and that you would like to use as the
subject for your investigation. (Note: The same type of analysis can be performed on
invertebrates, plants or bacteria, but for this module we have chosen vertebrates because it’s
the group of organisms that you know best, being one yourself.)
2. Go to the protein section of the National Center for Biotechnology Information (NCBI)
website: https://www.ncbi.nlm.nih.gov/protein/
3. In the blank field at the top of the screen next to “Protein”, enter the name of your protein
and click Search. The website will now list entries of all published sequences of your protein.
4. On the right-hand side of the screen, where it says Top Organisms [Tree], click on [Tree]. This
brings up a phylogenetic tree showing the number of sequences in different classes of
organisms. To restrict your analysis to just vertebrates, click on vertebrates in the tree.
5. To see the sequence of a particular entry click on FASTA. Copy and paste the entire sequence
from the header line, i.e., from >code no. protein [species], to the end of the sequence into a
new Notepad file, i.e., a .txt file. Be careful which entries you choose, because some are only
a partial sequence of the protein. These are usually listed as “partial”, but you can also see
from the number of amino acid residues (aa), because the partial sequences have much lower
aa values than full sequences.
6. Collect more sequences from the database, copying and pasting each subsequent FASTA
sequence after the previous one in the txt file. For the analysis that follows it doesn’t really
matter which order you place your sequences in, but for displaying the results of your
alignment in a publication or report it makes sense to use a logical order. For vertebrates the
most logical sequence would be placental mammals at the top, followed by marsupial
mammals, monotremes, birds, reptiles, amphibians, bony fish and finally cartilaginous fish.
You probably know what species Homo sapiens refers to, but if you don’t know what group a
particular species belongs to you can easily enter its taxonomic name into Google and find out
what the common name is (this is your chance to learn a little bit of zoology). There are no
strict limits on the minimum or maximum number of species you should include in your
analysis, but it is good if you obtain a spread across different classes of species if you can.
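Before moving on to the alignment, it can be worth checking the collated text file for entries that are only partial sequences. The sketch below is one possible way to do this in Python and is not part of MEGA: the function names, the 0.8 length threshold and the toy sequences (which are invented and do not correspond to any real protein) are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Split multi-FASTA text into a list of (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def flag_suspect_records(records, min_fraction=0.8):
    """Return headers marked 'partial' or much shorter than the longest entry.

    min_fraction is an arbitrary threshold chosen for this sketch.
    """
    if not records:
        return []
    longest = max(len(seq) for _, seq in records)
    suspects = []
    for header, seq in records:
        if "partial" in header.lower() or len(seq) < min_fraction * longest:
            suspects.append(header)
    return suspects


# Invented toy data, for demonstration only.
TOY_FASTA = """>toy1 example protein [Species A]
MKTAYIAKQR
QISFVKSHFS
>toy2 example protein, partial [Species B]
MKTAYIAKQRQISFVKSHFS
>toy3 example protein [Species C]
MKTAY
"""

records = parse_fasta(TOY_FASTA)
suspects = flag_suspect_records(records)
```

For the toy data, `suspects` contains the second entry (its header says "partial") and the third (5 residues against 20 for the longest entry). To check your own file, read its contents into a string and pass that to `parse_fasta` in place of `TOY_FASTA`.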
3. Alignment of Protein Sequences
The next step in the analysis is to input your file of sequences into MEGA and carry out a sequence
alignment. This is done within MEGA using the MUSCLE algorithm, using the following steps:
1. In order for MEGA to open your file of FASTA sequences, the file extension needs first to be
changed from .txt to .fas so that your computer recognises that it should open your file in
MEGA rather than Notepad. If you are using Windows 10, these file extensions are hidden by
default, preventing you from quickly changing the file type. To make the file extensions visible
you need to first change the options of your Windows operating system. This can be done by
following these steps:
− Open Windows File Explorer. From the ribbon bar, select View>Options>Change folder and
search options.
− In the View tab of the Folder Options window, make sure that the “Hide extensions for known
file types” checkbox is unticked, then select the OK button to save.
− With this setting switched off, you should now be able to view extensions as a part of each file
name in Windows File Explorer.
− Right click on your .txt file with all of your sequences, select the Rename option, delete the txt
extension and replace it with fas.
2. Double click on your .fas file and select the option Align.
3. From the Alignment Explorer Window within MEGA select the Edit tab and click Select All.
Then select the Muscle icon (picture of an arm with biceps) and click Align Protein, followed
by OK.
4. On the Alignment Explorer Window select the Data tab and click on Phylogenetic Analysis. If
the main Molecular Evolutionary Genetics Analysis Window is hidden by the Alignment
Explorer Window, click on restore down so that the main window is visible. On this window
an icon TA should now have appeared. Click on the TA icon to open the Sequence Data
Explorer Window.
5. On the ribbon bar of the Sequence Data Explorer select Display and on the dropdown menu
unselect Use Identical Symbol. This will display the FASTA symbols of all amino acid residues.
6. Within the Sequence Data Explorer Window click on C from the top ribbon bar. This will
highlight all conserved amino acid residues yellow.
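The idea behind the conserved-residue highlighting can be sketched in a few lines of Python: a column of the alignment counts as conserved when every sequence carries the same residue at that position. This is a simplified illustration, not MEGA's implementation; in particular, excluding any column that contains a gap ("-") is an assumption of this sketch, and the function name and toy alignment are invented.

```python
def conserved_columns(aligned):
    """Return (position, residue) for every fully conserved alignment column.

    Positions are 1-based. Columns containing a gap are treated as not
    conserved, which is an assumption of this sketch.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        if len(column) == 1 and "-" not in column:
            result.append((i + 1, aligned[0][i]))
    return result


# Invented toy alignment of three sequences, six columns each.
toy_alignment = ["MKT-LV", "MKTALV", "MRTALV"]
conserved = conserved_columns(toy_alignment)
```

In the toy alignment, columns 1, 3, 5 and 6 are conserved (M, T, L and V); column 2 varies between K and R, and column 4 contains a gap.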
4. Constructing a Phylogenetic Tree
The final step is to construct a phylogenetic tree of the evolution of your protein.
1. In the main Molecular Evolutionary Genetics Analysis Window select Phylogeny from the top
ribbon bar and select Construct/Test Neighbor-Joining Tree. Answer Yes to the question of
whether you want to use the active data. In the window that appears, for the Test of
Phylogeny use the Bootstrap method with 1,000 as the No. of Bootstrap Replications. For the
Model/Method choose the Poisson model (the simplest model). Then click OK.
2. Now root the tree on the branch leading to the earliest-diverging group of organisms (in the case
of vertebrates this is cartilaginous fish). This is done by clicking on the branch of the tree leading to
cartilaginous fish (i.e., sharks and stingrays) and then on the second icon from the top on the
left-hand toolbar, “Root the tree on the selected branch”, in the Tree Explorer Window.
3. To export the tree as an image file or as a pdf, select Image from the ribbon bar of the Tree
Explorer Window and choose your preferred option. Your tree should be included in your
report.
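The neighbor-joining method builds the tree from a matrix of pairwise distances between the aligned sequences. Under the Poisson correction, the proportion p of sites at which two sequences differ is converted to an estimated number of substitutions per site, d = −ln(1 − p). The Python sketch below illustrates this calculation for one pair of sequences. Skipping sites where either sequence has a gap is an assumption of this sketch and may not match the gap treatment selected in MEGA; the function names are invented.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a gap are skipped (an assumption here).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when every compared site differs")
    return -math.log(1.0 - p)
```

For example, two aligned sequences of six residues that differ at one site have p = 1/6 ≈ 0.167 and d ≈ 0.182. The correction matters more as p grows, because repeated substitutions at the same site become more likely and are not visible in p.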
Now your analysis is complete. If you wish you can play around with the software to see how the
results are affected by using an alignment algorithm other than MUSCLE, or how the phylogenetic tree
is influenced by methods other than the Bootstrap method for the Test of Phylogeny and by models
other than the Poisson model.
You should discuss these different methods/models as part of your report.
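When discussing the Bootstrap method, it may help to see what a bootstrap replicate is. Each replicate is a new alignment of the same length, made by drawing columns of the original alignment at random with replacement; the tree is rebuilt for every replicate, and the support for a grouping is the fraction of replicates in which it reappears. The Python sketch below illustrates the resampling step only. As a stand-in for rebuilding a full tree, it counts how often one sequence is closer to a second than to a third; this simplification, the function names and the toy data are all assumptions of this example.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(aligned[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in columns) for seq in aligned]


def mismatches(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def support_for_pair(aligned, i, j, k, replicates=1000, seed=0):
    """Fraction of replicates in which sequence i is strictly closer to j than to k.

    A simplified stand-in for clade support: a real analysis rebuilds the
    whole tree for each replicate.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        sample = bootstrap_alignment(aligned, rng)
        if mismatches(sample[i], sample[j]) < mismatches(sample[i], sample[k]):
            hits += 1
    return hits / replicates


# Invented toy alignment: the first two sequences are identical and the third
# differs from them at every site.
toy_alignment = ["MKTALV", "MKTALV", "AAAGGG"]
support = support_for_pair(toy_alignment, 0, 1, 2, replicates=200)
```

In this extreme toy case every replicate groups the first two sequences together, so the support is 1.0. With real alignments the support values are usually lower and vary from branch to branch.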
L3 PART 2: YOUR OWN INVESTIGATIONS
In your remaining lab sessions, design and implement further investigations of computational methods
in chemical biology. Before beginning your investigative experiment, you should follow the
investigative experiment checklist on the next page. This involves checking with a demonstrator and
the service room that your proposed research question is feasible. You should then check with the
academic supervisor and get an academic to sign off on your HIRAC (only the front page of the HIRAC
form needs to be completed for literature-based projects).
Before commencing this experiment, you must complete a HIRAC form and submit it to an Academic,
along with your Name/SID, to be assessed. Show your signed HIRAC to a demonstrator to ensure the
marks are entered into the system. You may get your HIRAC assessed on any day prior to the
session/experiment that you are about to start.
A number of suggestions are given below but you are not limited to these:
− Find literature papers which have carried out a similar bioinformatic analysis and critically
analyse their results in light of your own findings.
− Devise a research question related to the phylogenetic ancestry of a particular protein and
use bioinformatic literature papers to answer your question.
− Examine the enzymatic kinetics of the protein you examined in experiment L3 part 1.
Investigative Experiment Checklist

| Question | Who to consult | Completed? |
|---|---|---|
| Formulate a question that your experiment is designed to answer (you can discuss this with a demonstrator). Suggestions are given in your laboratory manual. You may choose to do something suggested or design your own experiment. | Student | |
| Check with a demonstrator that your idea is sensible; they may want to see literature supporting your idea/question. | Demonstrator | |
| Design an experiment to answer your question. You should have literature references to support your question and experimental design. Ensure the timeframe is appropriate and the outcomes are achievable. | Student | |
| Discuss your experimental design with a demonstrator or member of academic staff. You may consult any academic in the School of Chemistry; however, it will be up to you to contact them and make an appointment for a time when they are available. As not all members of staff are involved with teaching Chem3 labs, you may need to explain to them what you are doing. | Demonstrator or Academic | |
| Check with the service room regarding any equipment and chemicals that are required for your experiment. If equipment or chemicals are required that are not in the teaching laboratory, service room staff will attempt to borrow them from research groups. If chemicals are not available in the building you will need to redesign your experiment; chemicals will not be purchased from external suppliers for your experiment. | Service Room Staff | |
| Complete a HIRAC for your additional investigation. | Student | |
| Take your completed HIRAC to an academic from the requisite discipline area (ask a demonstrator if unsure) and discuss your experimental question and plan with them. A member of academic staff must sign off your HIRAC before you commence experimental work. | Academic | |
| Show your signed HIRAC to a demonstrator to ensure you get your Experiment 3 Part 2 HIRAC mark awarded online. | Demonstrator | |