BIEN/CENG 2310
MODELING FOR CHEMICAL AND BIOLOGICAL
ENGINEERING
HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY, FALL 2021
HOMEWORK #8 (DUE NOV. 23, 2021)
1. In this problem, we will use simulation in MATLAB to verify that the confidence interval
works as intended. When we try to measure a quantity experimentally, we like to repeat the
measurements several times and report the mean of the measurements. This is analogous to
drawing a sample from a population described by an unknown probability distribution, and
trying to estimate its parameter (in this case, the mean) from the sample. Because the sample
is limited, our reported sample mean will not be exactly the population mean, and we learned
that we should put a confidence interval around our reported sample mean to represent the
uncertainty in our estimate.
(a) Write a MATLAB function that takes in the population mean, , and standard deviation,
, the sample size , and the number of times the experiment is repeated, . To simulate
one experiment, draw a sample of values from this population (these are our
measurements), calculate the sample mean together with the 95% confidence interval of
this estimate. Then, check to see if the population mean is within this confidence interval.
For this part, assume that the population is normally distributed.
By repeating this sampling experiment times, verify that approximately 5% of the
experiments will lead to a confidence interval that does not contain the population mean.
Try varying , , , and , and see what happens.
(b) We also learned that even if the population is not normally distributed, the same method
of estimating a confidence interval will also work if the sample is large enough. This is
known as the Central Limit Theorem. Modify your program in Part (a), this time assuming
your population is exponentially distributed with mean . (Unlike the normal
distribution, the exponential distribution only has one parameter ; its mean is equal to
its standard deviation.) Verify that if is large, approximately 95% of the confidence
intervals will contain , but it is no longer the case if is small.
(c) Suppose we are interested in estimating the population variance rather than the
population mean. Modify your program in Part (a) to produce an estimate of 2 with a
95% confidence interval from samples of size . Check that the estimate is unbiased, that
is, the expected value of the estimate approaches 2 . By repeating this sampling
experiment times, verify that approximately 5% of the experiments will lead to a
confidence interval that does not contain 2.
DELIVERABLES:
Part (a) will be done in class, and the program will be provided.
Submit the modified programs for Parts (b) and (c), and rename them to
verifyConfInt_exp_
_.m and
verifyConfInt_var__.m, respectively. Your programs do not
have to produce any plot.
Experiment with your programs to check that the confidence intervals behave as expected under
different parameter settings, but there is no need to submit any writeup.
2. Recall the rocket problem we studied in Module 6. The altitude of the rocket as a function of
time, () can be modeled by the second-order ODE:
2
2
= − − (
)
2
sgn (
) ; ( = 0) = 0 ;
|
=0
= 0
where is the drag coefficient per mass of the rocket, 0 is the initial speed, and is the
acceleration due to gravity at the Earth’s surface, = 9.8 ms−2. For simplicity, since our
rocket is not flying very high, we will ignore the altitude dependence of the acceleration due
to gravity.
Our unknown parameters are and 0. (Although we can adjust the initial propulsion force
on the rocket, the actual velocity is hard to control exactly, so we decided to learn it from the
data.) We install an altimeter on the rocket to measure the altitude once every second during
the flight. The data is in the provided file ‘hw8_q2.mat’: tdata contains the time points (in
seconds), and ydata1 contains the measured altitudes (in meters) at those time points.
(a) Use nlinfit, lsqcurvefit, or fitnlm to obtain the least-square estimates for and
0, and also report their 95% confidence intervals.
(b) Out of curiosity, you decided to treat the acceleration due to gravity, , as a parameter to
be fitted as well. What least-square estimates do you get for , 0, and ? How does it
compare to what you got in Part (a)? What lesson do you learn from this exercise about
choosing parameters in nonlinear regression?
(c) To get more accurate estimates, you decide to repeat the experiment, but with greater
initial propulsion force you used before. Let the initial velocity in your second attempt to
be 0
′ . You expect that the drag coefficient should be the same both times. The altitude
data of the second attempt is in the vector ydata2 in the provided data file. Perform a
least-square fit on these two sets of data with lsqcurvefit to find , 0, and 0
′ . How
does your estimate of and its confidence interval compare to what you had in Part (a)?
DELIVERABLES:
This problem will be done in class in a demo. No need to submit anything.
3. The -test is perhaps the most often used statistical test in science. It helps us check whether
there is a “statistically significant” difference in the outcomes between two experimental
conditions. In this problem, we will try to verify that it works as intended. Let’s say we are
testing a drug that promotes hair growth, and we would like to show that the mean hair
growth of people taking the drug, 1, will be higher than that of people taking a placebo, 0.
We will also assume the population standard deviation of the hair growth, , will be the same
for both groups, and both populations are normally distributed.
We define our null hypothesis to be 0 = 1, i.e., the drug has no effect on hair growth, and
our alternative hypothesis to be 1 > 0.
(a) Write a MATLAB program that takes in 0, 1, , the sample size , and the number of
times the experiment will be repeated, . To simulate one experiment, draw a sample of
values of hair growth from the population taking the drug, and values from the
population taking the placebo, and perform a one-tailed -test at a 5% significance level
to determine if we should reject the null hypothesis. Repeat the experiments times, and
return the fraction of times that the null hypothesis is rejected.
(b) Run your program with the setting 0 = 1, i.e., the null hypothesis is true. Verify that in
approximately 5% of the experiments, the -test will produce a “false positive,” i.e.,
incorrectly rejecting the null hypothesis. Try varying 0 , , , and , and see what
happens.
(c) The “power” of a statistical test is the probability that we will correctly reject the null
hypothesis when it is in fact false. The MATLAB function sampsizepwr can be used to
estimate the power of the one-tailed -test. To check that this power estimate is correct,
run your program with the setting 0 < 1 , and verify that the power estimate is
approximately equal to the fraction of times that the null hypothesis is correctly rejected.
DELIVERABLES:
Submit one MATLAB program with the following function definition:
[fracPositives, power] = verifyTTest__(mu0, mu1,
sigma, n, m)
where fracPositives is the fraction of the m experiments for which the null hypothesis is
rejected, and power is the estimated power of the test as estimated by sampsizepwr.
Experiment with your program to check that the t-test behaves as expected, under different
parameter settings, but there is no need to submit any writeup.
4. In Module 5, we learned that the “rate law” of a chemical reaction is often empirically
determined, from which we can guess the reaction mechanism. Suppose we run the reaction
→ in a solvent (water) in a continuous flow reactor, with no or in the reactor
initially, and introduced at a fixed concentration [] in the feed stream. The inlet and
outlet flow rates are equal (i.e., = = ) such that the volume of the reacting
mixture, , does not change. We measure the concentration of in the outlet stream as a
function of time, and fit the data points to the prediction of two possible models:
Reaction mechanism 1. spontaneously converts to :
= 1[]
Reaction mechanism 2. Two molecules of activate by collision, then convert to :
= 2[]
2
where is the rate of generation of per unit volume, 1, 2, are parameters to be
determined by regression, and [] is the concentration of in the reactor.
(a) In our first experiment, we set the inlet concentration of , [] to be 1, and the
residence time = / to be 1. The data is provided in hw8_q4.mat, with tvec
containing the time points of measurements, and Bvec1 containing the concentration of
in the outlet stream at those time points. Using MATLAB, fit the two models to the data
points and provide the coefficient of determination, 2, for each of the two fits. Overlay
the best-fit curves and the data points in one plot. Which model appears to better explain
the data? Why?
(b) Not satisfied with your answer in Part (a), you decide to do one more experiment with
the same setup, but increasing [] to 1.2. The data for this new experiment is in the
vector Bvec2 in the same hw8_q4.mat file (the time points are the same). Using
lsqcurvefit, fit the data of both experiments simultaneously to find the best parameters
for each model. This time, to avoid making the plot too difficult to read, provide separate
plots (including the data points of both experiments and the fitted curves) for each of the
two models. Which model appears to better explain the data? Why? Compared to Part
(a), are you now more or less confident in your decision?
(c)
(d)
(e)
DELIVERABLES:
We will demo fitting one of the two models for Part (a).
Submit any MATLAB programs you used for this analysis, including functions for solving the ODEs,
and functions for performing the nonlinear regression. Also submit a typewritten or scanned
handwritten write-up to show how you set up the ODEs, and answer the questions. You may insert
the required plots into this write-up, or submit them as separate images._.m and
verifyConfInt_var__.m, respectively. Your programs do not
have to produce any plot.
Experiment with your programs to check that the confidence intervals behave as expected under
different parameter settings, but there is no need to submit any writeup.
2. Recall the rocket problem we studied in Module 6. The altitude of the rocket as a function of
time, () can be modeled by the second-order ODE:
2
2
= − − (
)
2
sgn (
) ; ( = 0) = 0 ;
|
=0
= 0
where is the drag coefficient per mass of the rocket, 0 is the initial speed, and is the
acceleration due to gravity at the Earth’s surface, = 9.8 ms−2. For simplicity, since our
rocket is not flying very high, we will ignore the altitude dependence of the acceleration due
to gravity.
Our unknown parameters are and 0. (Although we can adjust the initial propulsion force
on the rocket, the actual velocity is hard to control exactly, so we decided to learn it from the
data.) We install an altimeter on the rocket to measure the altitude once every second during
the flight. The data is in the provided file ‘hw8_q2.mat’: tdata contains the time points (in
seconds), and ydata1 contains the measured altitudes (in meters) at those time points.
(a) Use nlinfit, lsqcurvefit, or fitnlm to obtain the least-square estimates for and
0, and also report their 95% confidence intervals.
(b) Out of curiosity, you decided to treat the acceleration due to gravity, , as a parameter to
be fitted as well. What least-square estimates do you get for , 0, and ? How does it
compare to what you got in Part (a)? What lesson do you learn from this exercise about
choosing parameters in nonlinear regression?
(c) To get more accurate estimates, you decide to repeat the experiment, but with greater
initial propulsion force you used before. Let the initial velocity in your second attempt to
be 0
′ . You expect that the drag coefficient should be the same both times. The altitude
data of the second attempt is in the vector ydata2 in the provided data file. Perform a
least-square fit on these two sets of data with lsqcurvefit to find , 0, and 0
′ . How
does your estimate of and its confidence interval compare to what you had in Part (a)?
DELIVERABLES:
This problem will be done in class in a demo. No need to submit anything.
3. The -test is perhaps the most often used statistical test in science. It helps us check whether
there is a “statistically significant” difference in the outcomes between two experimental
conditions. In this problem, we will try to verify that it works as intended. Let’s say we are
testing a drug that promotes hair growth, and we would like to show that the mean hair
growth of people taking the drug, 1, will be higher than that of people taking a placebo, 0.
We will also assume the population standard deviation of the hair growth, , will be the same
for both groups, and both populations are normally distributed.
We define our null hypothesis to be 0 = 1, i.e., the drug has no effect on hair growth, and
our alternative hypothesis to be 1 > 0.
(a) Write a MATLAB program that takes in 0, 1, , the sample size , and the number of
times the experiment will be repeated, . To simulate one experiment, draw a sample of
values of hair growth from the population taking the drug, and values from the
population taking the placebo, and perform a one-tailed -test at a 5% significance level
to determine if we should reject the null hypothesis. Repeat the experiments times, and
return the fraction of times that the null hypothesis is rejected.
(b) Run your program with the setting 0 = 1, i.e., the null hypothesis is true. Verify that in
approximately 5% of the experiments, the -test will produce a “false positive,” i.e.,
incorrectly rejecting the null hypothesis. Try varying 0 , , , and , and see what
happens.
(c) The “power” of a statistical test is the probability that we will correctly reject the null
hypothesis when it is in fact false. The MATLAB function sampsizepwr can be used to
estimate the power of the one-tailed -test. To check that this power estimate is correct,
run your program with the setting 0 < 1 , and verify that the power estimate is
approximately equal to the fraction of times that the null hypothesis is correctly rejected.
DELIVERABLES:
Submit one MATLAB program with the following function definition:
[fracPositives, power] = verifyTTest__(mu0, mu1,
sigma, n, m)
where fracPositives is the fraction of the m experiments for which the null hypothesis is
rejected, and power is the estimated power of the test as estimated by sampsizepwr.
Experiment with your program to check that the t-test behaves as expected, under different
parameter settings, but there is no need to submit any writeup.
4. In Module 5, we learned that the “rate law” of a chemical reaction is often empirically
determined, from which we can guess the reaction mechanism. Suppose we run the reaction
→ in a solvent (water) in a continuous flow reactor, with no or in the reactor
initially, and introduced at a fixed concentration [] in the feed stream. The inlet and
outlet flow rates are equal (i.e., = = ) such that the volume of the reacting
mixture, , does not change. We measure the concentration of in the outlet stream as a
function of time, and fit the data points to the prediction of two possible models:
Reaction mechanism 1. spontaneously converts to :
= 1[]
Reaction mechanism 2. Two molecules of activate by collision, then convert to :
= 2[]
2
where is the rate of generation of per unit volume, 1, 2, are parameters to be
determined by regression, and [] is the concentration of in the reactor.
(a) In our first experiment, we set the inlet concentration of , [] to be 1, and the
residence time = / to be 1. The data is provided in hw8_q4.mat, with tvec
containing the time points of measurements, and Bvec1 containing the concentration of
in the outlet stream at those time points. Using MATLAB, fit the two models to the data
points and provide the coefficient of determination, 2, for each of the two fits. Overlay
the best-fit curves and the data points in one plot. Which model appears to better explain
the data? Why?
(b) Not satisfied with your answer in Part (a), you decide to do one more experiment with
the same setup, but increasing [] to 1.2. The data for this new experiment is in the
vector Bvec2 in the same hw8_q4.mat file (the time points are the same). Using
lsqcurvefit, fit the data of both experiments simultaneously to find the best parameters
for each model. This time, to avoid making the plot too difficult to read, provide separate
plots (including the data points of both experiments and the fitted curves) for each of the
two models. Which model appears to better explain the data? Why? Compared to Part
(a), are you now more or less confident in your decision?
(c)
(d)
(e)
DELIVERABLES:
We will demo fitting one of the two models for Part (a).
Submit any MATLAB programs you used for this analysis, including functions for solving the ODEs,
and functions for performing the nonlinear regression. Also submit a typewritten or scanned
handwritten write-up to show how you set up the ODEs, and answer the questions. You may insert
the required plots into this write-up, or submit them as separate images.
学霸联盟