ORBS7030 Business Statistics with Python
Individual Case Study
Note:
1. This individual case study will serve as the final assessment of the course
ORBS7030.
2. Each student will be randomly assigned a data set.
3. All the selected data sets come from the Python package “sklearn”
(scikit-learn), which ships with several small standard data sets for practice.
▪ Use the following code to import the data sets module from the package:
from sklearn import datasets
▪ Use the following code to load a specific data set:
Your_dataname = datasets.load_dataname()
(Note: replace “dataname” after “load_” with the name of your data set.)
▪ The name of the data set assigned to you will be shown in the
submission link on Moodle. Log in to your Moodle account to find
out.
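As an illustration of the loading step, here is a minimal sketch that assumes the assigned data set is iris; substitute your own data set name. The variable name df and the added “target” column are illustrative choices, not course requirements. Note that load_boston was removed in scikit-learn 1.2, so the boston data set may not load on a recent installation.

```python
from sklearn import datasets
import pandas as pd

# "iris" is used purely as an illustration; replace it with your assigned name.
iris = datasets.load_iris()

# The loader returns a Bunch object: .data holds the feature values,
# .target the labels, and .feature_names the column names.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target  # illustrative extra column holding the labels

print(df.shape)   # (number of observations, number of columns)
print(df.head())  # first five rows
```

Putting the data into a pandas DataFrame is optional, but it makes the later steps (summary statistics, plots, group comparisons) shorter to write.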
4. Each student needs to conduct a data analysis and write a short report (5 to 10
pages, single-spaced) based on the given data set.
5. The submission deadline is 11:59 pm on 30/11/2021.
Instructions:
1. Briefly introduce your data set. (10 marks)
▪ E.g., how many observations and variables are included?
▪ E.g., what does this data set represent?
▪ Or any other relevant information about the data set.
▪ (A link to information about each data set is given in the Appendix of this
file. Simply look for the data set assigned to you.)
2. Compute some descriptive statistics about the variables in your data set. (10
marks)
▪ E.g., the sample mean, standard deviation, median, mode, quartiles, etc.
▪ You may also consider using the “.describe()” method.
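A minimal sketch of this step, again assuming the iris data set as a stand-in for your own. It shows what “.describe()” reports and how to obtain the median and mode, which it does not list by name; the variable names are illustrative.

```python
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()  # illustrative data set; use your assigned one
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# count, mean, sample standard deviation (n - 1 denominator),
# min, quartiles (25%, 50%, 75%), and max for every numeric column
summary = df.describe()
print(summary)

# The 50% row above is the median; it can also be computed directly.
print(df.median())

# A column can have several modes; .iloc[0] keeps the first one per column.
print(df.mode().iloc[0])
```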
3. Generate some figures about your data. (25 marks)
▪ You may use a boxplot, histogram, bar chart, scatter plot, or any other
graphical tool covered in this course.
▪ For each plot, state the main title and the labels of the x-axis and y-axis.
▪ For each plot, briefly describe your findings about the data.
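A minimal sketch of one labelled plot, assuming the iris data set and matplotlib. The choice of variable, the number of bins, and the output file name sepal_length_hist.png are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()  # illustrative data set; use your assigned one
df = pd.DataFrame(iris.data, columns=iris.feature_names)

fig, ax = plt.subplots()
ax.hist(df["sepal length (cm)"], bins=15, edgecolor="black")

# Every plot needs a main title and labels on both axes.
ax.set_title("Distribution of sepal length (iris)")
ax.set_xlabel("Sepal length (cm)")
ax.set_ylabel("Number of observations")

fig.savefig("sepal_length_hist.png")  # hypothetical file name
```

The same three calls (set_title, set_xlabel, set_ylabel) apply to boxplots, bar charts, and scatter plots drawn on a matplotlib axes object.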
4. Conduct appropriate hypothesis tests based on two or more variables of your
data set. (40 marks)
▪ Each data set contains more than two variables, so you can formulate your
own hypothesis, conduct the test, and draw your own conclusion.
▪ When conducting the test, state the null and alternative hypotheses and
the test statistic used.
▪ Either computing the p-value or comparing the test statistic with the
critical value is acceptable. Simply choose the method that you are more
familiar with.
▪ State the conclusion after you complete the hypothesis testing.
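A minimal sketch of one possible test, not the required one. It assumes the iris data set, a two-sided Welch two-sample t-test (unequal variances) from scipy, and a significance level of 0.05; the hypothesis, the variable, and the two groups are illustrative choices.

```python
from scipy import stats
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()  # illustrative data set; use your assigned one
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target  # 0 = setosa, 1 = versicolor, 2 = virginica

setosa = df.loc[df["species"] == 0, "sepal length (cm)"]
versicolor = df.loc[df["species"] == 1, "sepal length (cm)"]

# H0: the two species have the same mean sepal length.
# H1: the two means differ (two-sided alternative).
alpha = 0.05  # assumed significance level

# Welch's t-test: does not assume equal variances in the two groups.
t_stat, p_value = stats.ttest_ind(setosa, versicolor, equal_var=False)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3g}")

# p-value approach: reject H0 when the p-value is below alpha.
if p_value < alpha:
    print("Reject H0: the mean sepal lengths differ.")
else:
    print("Fail to reject H0: no evidence of a difference in means.")
```

Whether a t-test is appropriate depends on your variables; for example, a comparison of more than two groups or a relationship between two numeric variables calls for a different test.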
5. Each student must use Python to do the above data analysis. (15 marks)
▪ Include all the code you write in your Python script (either placing the
code at each step of your analysis or placing all the code in an Appendix is
acceptable).
6. For every statistic or criterion that requires calculation, state the formula
and then show your answer.
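One way to pair a formula with its answer is sketched below for the sample mean and sample standard deviation. It assumes the iris data set and its first column; the formulas appear as comments and the result is cross-checked against NumPy.

```python
import math
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()  # illustrative data set; use your assigned one
x = iris.data[:, 0]          # first column: sepal length (cm)
n = len(x)

# Sample mean: x_bar = (1 / n) * sum of x_i
x_bar = sum(x) / n

# Sample standard deviation: s = sqrt( sum of (x_i - x_bar)^2 / (n - 1) )
s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

print(f"x_bar = {x_bar:.4f}, s = {s:.4f}")

# Cross-check against the library results (ddof=1 gives the n - 1 denominator).
print(np.isclose(x_bar, x.mean()), np.isclose(s, x.std(ddof=1)))
```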
7. Please convert your document to a PDF file before submitting it to Moodle.
8. Please note that plagiarism is strictly forbidden. If you are found copying
others’ work, both the plagiarist and the helper will receive the same penalty:
no grade for this final assessment.
Appendix
The selected data sets and links to a brief introduction of each are listed below:
Data set        Link to information
boston          https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html
breast_cancer   https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
diabetes        https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
iris            https://archive.ics.uci.edu/ml/datasets/Iris
wine            https://archive.ics.uci.edu/ml/datasets/Wine