IB9JV0 Individual Project (100%)
You are provided with a collection of datasets published by the UK government. The
datasets cover school information, KS2 final performance, pupil information and the school
workforce. All datasets are .csv files.
The datasets are downloaded from:
An explanation of the datasets can be found at:
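As a first step with any of the .csv files, it can help to check how many rows a file has and which columns contain missing values. The following is a minimal sketch using only the Python standard library. The function name `summarise_csv` and the column names `school_id`, `school_name` and `pupils` are made up for illustration and are not taken from the provided datasets.

```python
import csv
import io


def summarise_csv(handle):
    """Return (row_count, missing_counts) for a CSV file-like object.

    missing_counts maps each column name to the number of rows in which
    that column is empty or absent.
    """
    reader = csv.DictReader(handle)
    missing = {name: 0 for name in (reader.fieldnames or [])}
    row_count = 0
    for row in reader:
        row_count += 1
        for name, value in row.items():
            if name is None:
                # Extra fields beyond the header are ignored in this sketch.
                continue
            if value is None or value.strip() == "":
                missing[name] += 1
    return row_count, missing


# Made-up sample standing in for one of the provided files.
sample = io.StringIO(
    "school_id,school_name,pupils\n"
    "1001,Example Primary,210\n"
    "1002,,185\n"
    "1003,Sample Academy,\n"
)

rows, missing = summarise_csv(sample)
print(rows, missing)
```

For a real file, the same function can be called on a handle opened with `open(path, newline="")`; the encoding may need to be set explicitly. In a notebook, `pandas.read_csv` would typically be used for the same purpose.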
Set up an imaginary scenario, then identify and propose one or more practical problems,
e.g. helping the Department for Education to improve school performance, or helping a
tutoring school to improve its marketing effectiveness.
Carry out a data science project based on the imaginary scenario and the given datasets.
Write a report summarizing your scenario and work, and explaining how your work would help
to solve the proposed problem(s).
1. All code must be written in Python.
2. You should use Jupyter Notebook to work on this project and submit the .ipynb file.
3. You are required to write an “executive summary” (Word or PDF file) to present your work.
The summary should be no more than three pages (double spaced, excluding any figures,
tables, and references).
4. Code must be well documented with comments.
5. You should also include narrative alongside your code, using Markdown, to explain and
justify your steps and to describe any insights gained from each step.
6. You may search online or discuss with other students, but each student must work
1. Additional Python packages (not covered in class) are welcome, but their use should
be well documented in Markdown.
2. Code comments are different from explanations written in Markdown; both are required.
3. The relative importance of each component of your work is as follows. The percentages
are only indicative.
• Explanation/Description using Markdown (20% - 25%)
• Code, including comments (60% - 65%)
• Executive summary (15% - 20%)
4. You are not required to use all four datasets in your analysis. Depending on your imaginary
scenario and proposed problems, you may also collect additional data from other sources to
support your project. However, your analysis should include at least one of the given datasets.
5. The accuracy (or other metrics) of your final prediction model is less important than the
process to achieve and improve that value.
6. Given the size of the datasets, training your model may take some time.
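Notes 5 and 6 can be illustrated with a minimal sketch: work on a random subsample while iterating so that each experiment runs quickly, and report a model's error next to a simple baseline so that the improvement is visible. Everything here is an assumption made for illustration: the data are synthetic, the single-feature least-squares fit stands in for whatever model you choose, and the helper names `mean_absolute_error` and `fit_line` are hypothetical.

```python
import random
import statistics
import time


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return statistics.fmean(abs(a - p) for a, p in zip(actual, predicted))


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(42)  # fixed seed so the experiment is repeatable

# Synthetic (feature, target) pairs standing in for a large dataset.
data = []
for _ in range(20_000):
    x = rng.uniform(0, 10)
    data.append((x, 3.0 * x + rng.gauss(0, 1.0)))

# Iterate on a random subsample first; scale up once the pipeline works.
subset = rng.sample(data, 2_000)
train, test = subset[:1_500], subset[1_500:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

# Baseline: always predict the mean of the training target.
baseline_value = statistics.fmean(train_y)
baseline_mae = mean_absolute_error(test_y, [baseline_value] * len(test_y))

# Model: fit on the training split and time the fit.
start = time.perf_counter()
slope, intercept = fit_line(train_x, train_y)
elapsed = time.perf_counter() - start
model_mae = mean_absolute_error(test_y, [slope * x + intercept for x in test_x])

print(f"baseline MAE: {baseline_mae:.3f}")
print(f"model MAE:    {model_mae:.3f} (fit took {elapsed:.4f} s)")
```

Recording the baseline, each change made and the resulting metric in Markdown cells is one way to document the process of improving the model rather than only its final score.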