GEA1000 Quantitative Reasoning with Data
2021/2022 Semester 2
Midterm Test
Solutions
NOTE
1. Check your scores for the test via the LumiNUS GradeBook.
2. Log in to Examplify to view a copy of the answer script you have submitted.
Please refer to the following NUS Wiki page to find out how you may do so:
https://wiki.nus.edu.sg/x/hQ4UDQ
Should Examplify display a score, please ignore it. We implemented a grading
scheme that Examplify was not able to do (See Point 4). Thus the score on the
LumiNUS GradeBook is the only score you should care about.
3. There are two versions each of Questions 8 to 10.
Students were allocated one version of each of these questions.
4. (READ THIS CAREFULLY)
Multiple Response Questions (MRQ) are graded according to the following scheme.
Suppose a question has 3 options (ABC), and AC are the correct/intended
answers. This means that AC should be selected while B should not.
We award 1/3 mark per option for making the correct choice of selecting it or
not selecting it. That is, the student gets 1/3 mark for each correct option
chosen, and 1/3 mark for each incorrect option not chosen.
Consider the following examples:
• Student 1 chooses ABC. He chose correctly for A and C, but chose
wrongly for B. He gets 2/3.
• Student 2 chooses A. She chose correctly for A and B, but chose
wrongly for C. She gets 2/3.
• Student 3 chooses AB. She chose correctly for A, but chose wrongly for
B and C. She gets 1/3.
In general, for an MRQ with n options, each option is given a weight of 1/n. A
student gets 1/n mark for each correct option chosen, and 1/n mark for each
incorrect option not chosen.
Note that ONLY Questions 1, 4, 5 & 6 are MRQs.
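The grading scheme in Point 4 can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the script actually used for grading; the function name `mrq_score` and its parameters are made up for this example.

```python
from fractions import Fraction

def mrq_score(chosen, correct, options):
    """Score one MRQ: 1/n mark for every option that was handled correctly."""
    chosen, correct = set(chosen), set(correct)
    # An option is handled correctly when "was it chosen?" matches "is it correct?".
    right = sum((opt in chosen) == (opt in correct) for opt in options)
    return Fraction(right, len(options))

# The three students from Point 4 (options ABC, intended answers AC).
for name, chosen in [("Student 1", "ABC"), ("Student 2", "A"), ("Student 3", "AB")]:
    print(name, mrq_score(chosen, "AC", "ABC"))   # 2/3, 2/3, 1/3
```

The same function applies to a four-option MRQ: with intended answers ABD and only A chosen, it returns 1/2.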
Q1 [MRQ]
A researcher in University X wanted to conduct a survey to find out the average amount of
time spent studying weekly, by students in the university. He obtained the list of email
addresses of all 2000 students in the university and sent out a survey form to everyone. As a
token of participation, students who filled up the form received a ‘10% off’ coupon from the
university’s bookshop. 300 students responded to the survey.
Which of the following statements is/are correct? Select all that apply.
A. The study is likely to contain non-response bias.
B. The study is likely to contain selection bias.
C. The study uses a census.
Explanation:
A is true, because only 300 out of 2000 students were willing to respond to the survey.
B is false, because selection bias arises from poor sampling method or frame – neither of
which is described in the question.
C is true, because the researcher’s population of interest is all students in University X, and
he sent a survey to all students in University X.
Q2
A researcher wishes to study procrastination and social anxiety levels amongst students
majoring in architecture in University X. He wanted to collect a sample of 100 students out of
the 1000 students majoring in architecture. He took a name list of all architecture students in
University X. We can assume all the architecture students in University X are randomly
ordered in this name list, from 1 to 1000. The researcher rolled a fair six-sided die, which
landed on 3.
He then decided to pick the 3rd student in the name list, and every 10th student afterwards
until he collected his desired sample size of 100 students. That is, he selects the 3rd student,
13th student, 23rd student... until he gets his desired sample size of 100 students.
What kind of sampling method did the researcher employ?
A. Systematic sampling
B. Simple random sampling
C. Non-probability sampling
D. Stratified sampling
Explanation:
Even though the number 3 was chosen randomly using a fair die, the 7th to the 10th students
on the name list have zero chance of being selected, as a fair die has only six sides.
Likewise, the 17th to 20th, 27th to 30th, … , 997th to 1000th students have no chance of
being chosen.
As some of the architecture students have zero chance of being selected, this sampling
method is a non-probability sampling method (option C).
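The argument can be checked by enumerating every position that could be picked under any die outcome. The sketch below assumes the procedure exactly as described in the question (start at the die value, then take every 10th student); the variable names are illustrative.

```python
# Positions on the 1..1000 name list that can ever be selected when the start
# comes from a six-sided die and every 10th student is taken afterwards.
reachable = set()
for start in range(1, 7):                      # possible die outcomes 1..6
    reachable.update(range(start, 1001, 10))   # start, start+10, start+20, ...

unreachable = [p for p in range(1, 1001) if p not in reachable]
print(len(reachable), len(unreachable))        # 600 400
print(unreachable[:8])                         # [7, 8, 9, 10, 17, 18, 19, 20]
```

Under this procedure, 400 of the 1000 students can never enter the sample, whatever the die shows.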
Q3
Adam is a supervisor of the Call-Centre department of his company. He is interested to know
if there is a relationship between having mid-day naps and the average number of calls
completed per individual among all 500 workers in his department. Of the 500 workers, 400
are females and 100 are males. He uses a randomised mechanism in assigning 250 of the
workers in treatment and 250 of the workers in control. Workers in the treatment group are
given a mid-day nap between 2p.m. – 2.30p.m. every day, while those in the control group
were not given the mid-day nap. We assume that all in the treatment group took their daily
nap of 30 minutes. The number of calls cleared by each individual worker was recorded for a
month, and it is noted that there is a positive association between nap and daily average
number of completed calls among the 500 workers.
Which of the following must necessarily be true?
A. The study’s findings are generalisable to the target population of interest.
B. There will be 50 males and 200 females randomly assigned to the treatment
group, and 50 males and 200 females randomly assigned to the control group.
C. The above is an example of a randomised, double-blind controlled experiment.
D. None of the other options
Explanation:
Adam is interested in the association between mid-day naps and the number of completed
calls among the 500 workers in his department, so his target population is these 500 workers.
Since all 500 workers took part in the study, the study covers the entire target population
and its findings are generalisable to it. Therefore A is correct.
While the randomised assignment process may on average lead to 50 males and 200 females
in the treatment group and 50 males and 200 females in the control group, the actual
assignment can vary by chance. Therefore B is incorrect.
It was not stated whether the administrators or the workers are blinded to whether a worker
belongs to the treatment or control group. Therefore it is not necessarily true that the
above is an example of a double-blind experiment, and C is incorrect.
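A small simulation illustrates why option B need not hold. It is a sketch only: it assumes the assignment is a simple random draw of 250 workers out of the 500, which the question does not spell out, and the seed and number of repetitions are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
workers = ["M"] * 100 + ["F"] * 400

def males_in_treatment():
    """Randomly assign 250 of the 500 workers to treatment and count the males."""
    treatment = random.sample(workers, 250)
    return treatment.count("M")

counts = [males_in_treatment() for _ in range(1000)]
print(min(counts), max(counts))        # the count varies from draw to draw
print(sum(counts) / len(counts))       # close to 50 on average
```

The number of males in the treatment group averages about 50 but is not fixed at 50 in any single assignment.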
Q4 [MRQ]
A study was conducted to understand the relationship between a patient’s age and having
cardiovascular disease (CVD). The information on the variables ‘Age’ (Young/Old) and ‘CVD’
(Has CVD/No CVD) was collected in the table below.
Young Old Total
Has CVD 100 50 150
No CVD 100 200 300
Total 200 250 450
Furthermore, it is known that a third variable, ‘Smoking’, is associated with ‘CVD’. Using only
the information given, which of the following statements must be true? Select all that apply.
A. Young patients are positively associated with having CVD
B. ‘Smoking’ is a confounder when examining the association between ‘Age’ and ‘CVD’.
C. ‘CVD’ is a confounder when examining the association between ‘Age’ and ‘Smoking’.
Explanation:
A is true, because rate(Has CVD|Young) = 100/200 > 50/250 = rate(Has CVD|Old).
A variable is a confounder when it is associated with both of the other two
variables of interest. We are given that ‘Smoking’ is associated with ‘CVD’, and we know that
‘Age’ is associated with ‘CVD’ from the table. Thus ‘CVD’ is associated with both ‘Age’ and
‘Smoking’, and C is true.
However, we do not know if ‘Age’ and ‘Smoking’ are associated. Thus, B is false.
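The comparison of rates behind option A can be reproduced directly from the table. The snippet is a minimal sketch; the dictionary names are illustrative.

```python
# Counts from the Q4 table.
cvd = {"Young": 100, "Old": 50}
total = {"Young": 200, "Old": 250}

rate_cvd_given_young = cvd["Young"] / total["Young"]
rate_cvd_given_old = cvd["Old"] / total["Old"]
print(rate_cvd_given_young, rate_cvd_given_old)   # 0.5 0.2
```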
Q5 [MRQ]
A study was conducted to understand the relationship between a patient’s duration of
infection of Covid-19 and age. The duration of infection has 2 categories: ‘Short’ and ‘Long’.
The age of the patient also has 2 categories: ‘Young’ and ‘Old’. The data was analysed, and
it was found that younger patients are positively associated with a longer duration of
infection.
An additional variable, ‘Vaccination status’, was also collected during the study. Vaccination
status has 2 categories: ‘Vaccinated’ and ‘Unvaccinated’. After slicing according to the
vaccination status of the patients, researchers noted that they have observed Simpson’s
Paradox. It was also noted that the unvaccinated are positively associated with a longer
duration of infection.
Which of the following statements must be true? Select all that apply.
A. There is an association between the vaccination status and the age of the patient.
B. Being vaccinated is negatively associated with older patients.
C. Being vaccinated is positively associated with shorter durations of infection.
Explanation:
Since Simpson’s Paradox is observed after slicing according to the vaccination status of the
patient, it means that the vaccination status of the patient is associated with both the age of
the patient and the duration of infection. Hence, there is an association between the
vaccination status of the patient and the age of the patient, and A is true.
However, we do not know anything about the direction of the association between the
vaccination status of the patient and the age of the patient. Thus B is not necessarily true.
Lastly, since there is a positive association between unvaccinated patients and longer
duration of infection, this also means that being vaccinated is positively associated with
shorter durations of infection. Thus C is true.
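To make the situation concrete, here is a sketch with made-up counts that match the description in the question. The numbers are hypothetical and are not from the study; they are chosen only so that the overall association reverses within each vaccination group.

```python
# Hypothetical counts (NOT from the study): (long-duration patients, all patients).
data = {
    "Unvaccinated": {"Young": (48, 80), "Old": (14, 20)},
    "Vaccinated":   {"Young": (2, 20),  "Old": (16, 80)},
}

def rate_long(age, groups=data):
    """rate(Long | age), pooled over the given vaccination groups."""
    long_count = sum(groups[g][age][0] for g in groups)
    total = sum(groups[g][age][1] for g in groups)
    return long_count / total

print("Overall:", rate_long("Young"), rate_long("Old"))        # 0.5 0.3
for status in data:
    sub = {status: data[status]}
    print(status, rate_long("Young", sub), rate_long("Old", sub))
```

Overall, young patients have the higher rate of long infections (0.5 against 0.3), yet within each vaccination group the old patients have the higher rate. In these made-up numbers the vaccinated are mostly old (80 of 100), which shows that the direction stated in option B need not hold.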
Q6 [MRQ]
In this question, we are focused only on the final year students in AMAS university in a
certain year. You are given that 75% of all these students graduated, and that 60%
graduated among the female students. Which of the following statements must be true?
Select all that are true.
A. The percentage of graduates among the female students is less than the
percentage of graduates among the male students.
B. The percentage of female students among the non-graduates is greater than the
percentage of female students among the graduates.
C. The percentage of male students among the graduates is less than the
percentage of male students among the non-graduates.
D. The percentage of male students is greater than the percentage of male students
among the non-graduates.
Explanation:
Only (A), (B) and (D) are true.
Let G = graduates, NG = non-graduates, M = males and F = females.
Then we are given that rate(G) = 75% and rate(G | F) = 60%.
By the basic rule on rates, since rate(G) > rate(G | F), we must have
rate(G | M) > rate(G) > rate(G | F).
Hence (A) is true. By the symmetry rule, we have rate(F | G) < rate(F | NG), i.e., (B) is true.
By the symmetry rule, we also have rate(M | G) > rate(M | NG), implying that (C) is false.
Finally, by the basic rule on rates, we must have rate(M | G) > rate(M) > rate(M | NG). Thus
(D) is true.
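The inequalities can be checked on one concrete cohort. The counts below are hypothetical: the question gives only the two rates, so a cohort of 50 females and 50 males is assumed here purely for illustration.

```python
# Hypothetical cohort: 50 females and 50 males (the question gives only rates).
female, male = 50, 50
female_grads = 30          # rate(G | F) = 30/50 = 60%
male_grads = 45            # chosen so that rate(G) = 75/100 = 75%

grads = female_grads + male_grads
non_grads = female + male - grads

rate_g_m = male_grads / male                      # rate(G | M)
rate_f_g = female_grads / grads                   # rate(F | G)
rate_f_ng = (female - female_grads) / non_grads   # rate(F | NG)
rate_m_g = male_grads / grads                     # rate(M | G)
rate_m = male / (female + male)                   # rate(M)
rate_m_ng = (male - male_grads) / non_grads       # rate(M | NG)
print(rate_g_m, rate_f_g, rate_f_ng, rate_m_g, rate_m, rate_m_ng)
# 0.9 0.4 0.8 0.6 0.5 0.2
```

One example does not prove the general statements, but it agrees with each step of the explanation: (A), (B) and (D) hold, and (C) fails.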
Q7
Consider a data set consisting of $n$ values for a numerical variable $X$. Let the values be
$x_1, x_2, \dots, x_n$, arranged in ascending order. A value $b$ is said to be the balancing
point of $X$ in the data set if the following condition is satisfied:
$$(b - x_1) + (b - x_2) + \dots + (b - x_k) = (x_{k+1} - b) + (x_{k+2} - b) + \dots + (x_n - b)$$
where $x_1, x_2, \dots, x_k$ are the values of $X$ in the data set that are smaller than or
equal to $b$, and $x_{k+1}, x_{k+2}, \dots, x_n$ are the values of $X$ in the data set that are
larger than $b$. For example, consider a small data set {1, 3, 5, 5, 5, 7, 9}. In this case
the value 5 is the balancing point of the data set since
(5 − 5) + (5 − 5) + (5 − 5) + (5 − 3) + (5 − 1) = (7 − 5) + (9 − 5).
Which of the two statements below is/are true?
I) The median of $X$ is always the balancing point of $X$ in any data set.
II) The mode of $X$ is always the balancing point of $X$ in any data set.
A. I only
B. II only
C. Both I and II
D. Neither I nor II.
Explanation:
The answer is D. Neither the median nor the mode is always the balancing point of a
data set. Consider a small data set {1, 1, 3, 4, 5}. The median of the data set is 3. However,
(3 – 3) + (3 – 1) + (3 – 1) = 4 is not the same as (4 – 3) + (5 – 3) = 3. Similarly, the mode
is 1, but (1 – 1) + (1 – 1) = 0 is not the same as (3 – 1) + (4 – 1) + (5 – 1) = 9.
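The balancing condition is easy to test mechanically. The helper below is a sketch written for this explanation (the name `is_balancing_point` is not from the question); it uses exact fractions so that no rounding affects the comparison.

```python
from fractions import Fraction
from statistics import median, mode

def is_balancing_point(data, b):
    """True if the gaps from values <= b sum to the same total as the gaps from values > b."""
    left = sum(b - x for x in data if x <= b)
    right = sum(x - b for x in data if x > b)
    return left == right

example = [1, 3, 5, 5, 5, 7, 9]
counter = [1, 1, 3, 4, 5]
print(is_balancing_point(example, 5))                 # True
print(is_balancing_point(counter, median(counter)))   # False (median is 3)
print(is_balancing_point(counter, mode(counter)))     # False (mode is 1)
print(is_balancing_point(counter, Fraction(sum(counter), len(counter))))  # True
```

In the counterexample, the value that does satisfy the condition is the mean, 14/5.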
Q8A
An examination was given to Class A and Class B, which consisted of 20 students each. The
score of each student is between 0 and 100.
The range of scores in Class A is from 70 to 90. All the students in Class B scored less than
40 marks. Due to manpower shortages, Class A and Class B were combined to form Class
C. Hence Class C now contains 40 students, who were previously from Class A and Class
B.
Which of the following statements about the relationship between the median score in Class
C and the median score in Class A is always true?
A. The median score in Class C must be lower than median score in Class A.
B. The median score in Class C must be the same as the median score in Class A.
C. The median score in Class C must be higher than the median score in Class A.
D. There is insufficient information to deduce the relationship between the median score
of Class C and the median score of Class A.
Explanation:
Since Class A and Class B have the same number of students, and all students in Class A
scored strictly greater than the maximum score of Class B, the median for Class C will be
lower than the minimum score of Class A. Hence the median of Class C will be lower than
the median of Class A.
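A quick numerical check with one possible set of scores is given below. The scores are hypothetical, chosen only to satisfy the conditions in the question (20 students per class, Class A ranging from 70 to 90, Class B all below 40).

```python
from statistics import median

# Hypothetical scores consistent with the question (not actual exam data).
class_a = list(range(70, 89)) + [90]   # 20 scores, minimum 70, maximum 90
class_b = list(range(20, 40))          # 20 scores, all below 40
class_c = class_a + class_b            # combined class of 40 students

print(median(class_a), median(class_b), median(class_c))   # 79.5 29.5 54.5
```

With 40 students, the median of Class C is the average of the 20th and 21st scores in ascending order, i.e. the highest score in Class B and the lowest score in Class A. It therefore lies strictly between the two classes.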
Q8B
An examination was given to Class A and Class B, which consisted of 20 students each. The
score of each student is between 0 and 100.
The range of scores in Class A is from 70 to 90. All the students in Class B scored less than
40 marks. Due to manpower shortages, Class A and Class B were combined to form Class
C. Hence Class C now contains 40 students, who were previously from Class A and Class
B.
Which of the following statements about the relationship between the median score in Class
C and the median score in Class B is always true?
A. The median score in Class C must be lower than median score in Class B.
B. The median score in Class C must be the same as the median score in Class B.
C. The median score in Class C must be higher than the median score in Class B.
D. There is insufficient information to deduce the relationship between the median score
of Class C and the median score of Class B.
Explanation:
Since Class A and Class B have the same number of students, and all students in Class B
scored strictly less than the minimum score of Class A, the median for Class C will be
higher than the maximum score of Class B. Hence the median of Class C will be higher than
the median of Class B.
Q9A
The contingency table below shows the classification of hair descriptions for an international
school in Singapore.
                        Hair type
                 Straight          Curly
Hair colour    Male   Female    Male   Female    Total
Red               7        9       8        5       29
Brown            35       20      12       16       83
Blonde           51       55      38       27      171
Black            22       25      19       24       90
Total           115      109      77       72      373
The marginal rate, rate(Curly), is ________%; while the joint rate, rate(non-Black and
Female), is ________%.
Give each answer as a percentage correct to 2 decimal places.
Explanation:
To calculate the marginal rate, rate(Curly), we divide the sum of the column totals for
Curly-haired persons (both Male and Female) by the grand total of everyone in the data set, i.e.
(77+72)/373 ≈ 39.95% (2 d.p.)
Then, to calculate the joint rate, rate(non-Black and Female), we divide the count of “Females
with non-black hair” by the same grand total, i.e.
(9+20+55+5+16+27)/373 ≈ 35.39% (2 d.p.)
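The two rates can be recomputed from the table as follows. This is a minimal sketch; the dictionary layout and variable names are illustrative.

```python
# Counts from the Q9A table: (Straight Male, Straight Female, Curly Male, Curly Female).
table = {
    "Red":    (7, 9, 8, 5),
    "Brown":  (35, 20, 12, 16),
    "Blonde": (51, 55, 38, 27),
    "Black":  (22, 25, 19, 24),
}
grand_total = sum(sum(row) for row in table.values())

# Marginal count: everyone with curly hair, regardless of sex or hair colour.
curly = sum(row[2] + row[3] for row in table.values())
# Joint count: females (straight or curly) whose hair colour is not Black.
non_black_female = sum(row[1] + row[3] for colour, row in table.items() if colour != "Black")

print(f"{100 * curly / grand_total:.2f}")              # 39.95
print(f"{100 * non_black_female / grand_total:.2f}")   # 35.39
```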
Q9B
The contingency table below shows the classification of hair descriptions for an international
school in Singapore.
                        Hair type
                 Straight          Curly
Hair colour    Male   Female    Male   Female    Total
Red               7        8       9        5       29
Brown            53       36      24       36      149
Blonde           11        5      20       25       61
Black            50       55      11       18      134
Total           121      104      64       84      373
The marginal rate, rate(Curly), is ________%; while the joint rate, rate(non-Black and
Female), is ________%.
Give each answer as a percentage correct to 2 decimal places.
Explanation:
To calculate the marginal rate, rate(Curly), we divide the sum of the column totals for
Curly-haired persons (both Male and Female) by the grand total of everyone in the data set, i.e.
(64+84)/373 ≈ 39.68% (2 d.p.)
Then, to calculate the joint rate, rate(non-Black and Female), we divide the count of “Females
with non-black hair” by the same grand total, i.e.
(8+36+5+5+36+25)/373 ≈ 30.83% (2 d.p.)
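As a cross-check, the sketch below rebuilds the row and column totals from the cell counts and then obtains the joint count by subtraction (all females minus Black-haired females) instead of adding six cells. The variable names are illustrative.

```python
# Cell counts from the Q9B table, in this column order:
columns = ("Straight Male", "Straight Female", "Curly Male", "Curly Female")
rows = {
    "Red":    (7, 8, 9, 5),
    "Brown":  (53, 36, 24, 36),
    "Blonde": (11, 5, 20, 25),
    "Black":  (50, 55, 11, 18),
}
row_totals = {colour: sum(cells) for colour, cells in rows.items()}
col_totals = dict(zip(columns, (sum(col) for col in zip(*rows.values()))))
grand_total = sum(row_totals.values())

rate_curly = (col_totals["Curly Male"] + col_totals["Curly Female"]) / grand_total
female_total = col_totals["Straight Female"] + col_totals["Curly Female"]
black_female = rows["Black"][1] + rows["Black"][3]
rate_non_black_female = (female_total - black_female) / grand_total

print(row_totals, col_totals, grand_total)
print(round(100 * rate_curly, 2), round(100 * rate_non_black_female, 2))   # 39.68 30.83
```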
Q10A
The bar graph below shows the number of gamers and non-gamers among males and
females. Which of the following statements is/are true?
A. There is a negative association between being female and being a gamer since
Rate(Female|Gamer) = 0.33 is less than Rate(Female|Non-Gamer) = 0.53
B. There is a negative association between being female and being a gamer since
Rate(Gamer|Female) = 0.4 is less than Rate(Gamer|Male) = 0.67
C. There is a negative association between being female and being a gamer since
Rate(Female|Gamer) = 0.33 is less than Rate(Male|Gamer) = 0.67
Explanation:
To establish association, we should be comparing Rate(A|B) and Rate(A|NB), thus option
(C) is wrong. Based on the graph, the contingency table can be constructed as shown:
               Female   Male   Row total
Gamer              48     96         144
Non-Gamer          72     64         136
Column total      120    160         280
Rate(Gamer|Male) = 96/160 = 0.6, not 0.67. Therefore (B) is wrong.
Rate(Female|Gamer) = 48/144 = 0.33
Rate(Female|Non-Gamer) = 72/136 = 0.53
Since Rate(Female|Gamer) < Rate(Female|Non-Gamer), there is a negative association
between being female and being a gamer. (A) is true.
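The figures quoted in the options can be verified against the reconstructed table. The sketch below is illustrative; the helper names `rate` and `rate_gamer` are made up for this check.

```python
# Counts from the contingency table reconstructed in the Q10A explanation.
counts = {("Gamer", "Female"): 48, ("Gamer", "Male"): 96,
          ("Non-Gamer", "Female"): 72, ("Non-Gamer", "Male"): 64}

def rate(sex, given_status):
    """rate(sex | gaming status): share of that sex within one row of the table."""
    total = sum(n for (status, _), n in counts.items() if status == given_status)
    return counts[(given_status, sex)] / total

def rate_gamer(given_sex):
    """rate(Gamer | sex): share of gamers within one column of the table."""
    total = sum(n for (_, sex), n in counts.items() if sex == given_sex)
    return counts[("Gamer", given_sex)] / total

print(round(rate("Female", "Gamer"), 2), round(rate("Female", "Non-Gamer"), 2))  # 0.33 0.53
print(round(rate_gamer("Female"), 2), round(rate_gamer("Male"), 2))              # 0.4 0.6
```

The first line confirms the figures in option (A). The second shows that option (B) misquotes Rate(Gamer|Male), which is 0.6 rather than 0.67.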
Q10B
The bar graph below shows the number of gamers and non-gamers among males and
females. Which of the following statements is/are true?
A. There is a positive association between being male and being a gamer since
Rate(Male|Gamer) = 0.68 is more than Rate(Male|Non-Gamer) = 0.53
B. There is a positive association between being male and being a gamer since
Rate(Gamer|Male) = 0.60 is more than Rate(Gamer|Female) = 0.33
C. There is a positive association between being male and being a gamer since
Rate(Male|Gamer) = 0.68 is more than Rate(Female|Gamer) = 0.33
Explanation:
To establish association, we should be comparing Rate(A|B) and Rate(A|NB), thus option
(C) is wrong. Based on the graph, the contingency table can be constructed as shown:
               Female   Male   Row total
Gamers             52    108         160
Non-Gamers         64     72         136
Column total      116    180         296
Rate(Gamer|Female) = 52/116 = 0.45, not 0.33. Therefore (B) is wrong.
Rate(Male|Gamer) = 108/160 = 0.68
Rate(Male|Non-Gamer) = 72/136 = 0.53
Since Rate(Male|Gamer) > Rate(Male|Non-Gamer), there is a positive association between
being male and being a gamer. (A) is true.
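The association can be checked in both directions using the reconstructed table. By the symmetry rule the two comparisons should point the same way, which the sketch below confirms for these counts; the variable names are illustrative.

```python
# Counts from the contingency table reconstructed in the Q10B explanation.
female_gamers, male_gamers = 52, 108
female_non_gamers, male_non_gamers = 64, 72

# Direction 1: compare rate(Male | Gamer) with rate(Male | Non-Gamer).
rate_m_g = male_gamers / (male_gamers + female_gamers)
rate_m_ng = male_non_gamers / (male_non_gamers + female_non_gamers)

# Direction 2: compare rate(Gamer | Male) with rate(Gamer | Female).
rate_g_m = male_gamers / (male_gamers + male_non_gamers)
rate_g_f = female_gamers / (female_gamers + female_non_gamers)

print(round(rate_m_g, 3), round(rate_m_ng, 3))   # 0.675 0.529
print(round(rate_g_m, 3), round(rate_g_f, 3))    # 0.6 0.448
```

Both comparisons indicate a positive association between being male and being a gamer. Option (B) reaches the right conclusion but quotes Rate(Gamer|Female) as 0.33 instead of 0.45, which is why it is marked wrong.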
APPENDIX: How to read your Examplify report
This is what your report will look like:
Note: Questions 1, 4, 5 & 6 are MRQs. They will be scored according to Point 4 of Page 1.
For this report,
• For Q1, both the correct options A & C (highlighted in green) were chosen, and the wrong
option B was NOT chosen. This will be scored 1 mark.
• For Q2, an MCQ, an incorrect option A (highlighted in red) was chosen. This will be scored 0.
• For Q5, a correct option A (highlighted in green) and the incorrect option B (highlighted in red)
were chosen. The other correct option C was NOT chosen. This will be scored 1/3.
• For Q6, the correct options are A, B, D. Only one of the 3 correct options was chosen, and the
incorrect option C was not chosen. This will be scored 1/4 + 1/4 = 1/2.