QBUS2820 Assignment 2 (25 marks)
Semester 1, 2026

1 Background and Task

Operational analytics teams often monitor the volume of failures in an automated system, such as failed scheduled jobs, pipeline interruptions, or report delivery errors. By tracking how these failure counts evolve over time, analysts can identify recurring patterns, assess operational pressure, and evaluate whether system reliability is improving or deteriorating. As a key indicator of automated process stability, this type of series provides useful insight into broader operational conditions and service quality.

In this project, your goal is to develop a predictive model that forecasts the monthly number of failures in an automated system from its historical values. The dataset system_failure_train.csv contains monthly failure-report data from January 2015 to December 2023 (108 data points). This dataset is based on a realistic operational monitoring series and has been anonymized for teaching purposes.

Warning: the data has been anonymized and should not be interpreted as representing any real system, organisation, or production environment.

The test dataset system_failure_test.csv (not provided) has the same structure as the training data and contains the monthly failure-report data for the next 12 months. Specifically, your task is to develop a time series model using system_failure_train.csv to forecast the monthly failure measures for the 12 months in the test period. Note that this is a multiple-step-ahead forecasting problem.

1.1 Test Error

Forecast accuracy is measured by the mean squared error (MSE) on the test data. The forecast horizon is 12 months, so this is a multi-step forecast.

2 Submission Instructions

1. Please submit two files via the Canvas site:

• A Jupyter Notebook named SID_implementation.ipynb, where SID is your student ID, that implements your data analysis procedure and produces the test error.
• A CSV file named system_failure_forecast.csv that lists the 12 forecast values made by your final predictive model. This file must have two columns: the first, named "Month", indicating the month, and the second, named "Failure_Reports", giving the predicted value for each corresponding month.

2. The notebook must be written in Jupyter Notebook format and assume that all necessary data files (system_failure_train.csv and system_failure_test.csv) are in the same folder as the notebook.

• If training your model involves generating random numbers, the random seed in SID_implementation.ipynb must be fixed, e.g., np.random.seed(0), so that the marker obtains the same results as you did.

• SID_implementation.ipynb must include the following code in the last code cell:

    import pandas as pd
    system_failure_test = pd.read_csv('system_failure_test.csv')
    # YOUR CODE HERE: code that produces the test error
    print(test_error)

• The idea is that when the marker runs SID_implementation.ipynb, with the test data system_failure_test.csv in the same folder as the notebook, they see the same test error that you would obtain if you were provided with the test data. A dummy test file, system_failure_test_novals.csv, has been posted to Canvas so that you can check that the format of your submission is correct.

• The notebook should contain sufficient explanation so that the marker knows how to run your code.

• Restrict yourself to methods seen in class and simple preprocessing steps. You may use auxiliary methods from publicly available Python packages.

• The Jupyter Notebook should comprehensively describe your data analysis procedure, with enough detail that fellow data scientists with relevant background knowledge can understand and replicate the task.
Please ensure that the notebook is well structured, with organized sections and subsections, uses markdown cells effectively to explain the code, includes appropriate visualizations, and follows best practices for clarity and coherence. The notebook should serve as a report and be well presented.

3 Marking Criteria

This assignment is worth a total of 25 marks: 11 marks for prediction accuracy and 14 marks for the presentation of the Jupyter Notebook (including the description of your data analysis procedure). The breakdown is as follows:

3. Forecast accuracy: Your test error will be compared against the smallest test error among all submissions, including the teaching team's. The marker first runs SID_implementation.ipynb.

• If the file runs smoothly and produces a test error, up to 11 marks will be awarded based on prediction accuracy relative to the smallest MSE and the appropriateness of your implementation.

• If the marker cannot run SID_implementation.ipynb, or if no test error is produced, partial marks (at most 3) may be awarded based on the appropriateness of the file.

4. Notebook presentation and data analysis procedure in SID_implementation.ipynb: Up to 14 marks are allocated based on:

• the readability and organization of the notebook;

• the appropriateness of the chosen forecasting method;

• the detail, discussion, and explanation of your data analysis procedure.

5. CSV file submission: Up to 2 marks will be deducted if you fail to upload the CSV file in the correct format.

6. Late submission: The late penalty is 5% of the assigned mark per calendar day. The closing date is the last date on which the assessment will be accepted for marking. Assignments submitted after the closing date will be assigned a mark of zero.

4 Special Consideration

If you have been granted special consideration with an extended due date later than the original closing date, your new closing date will be the same as your extended due date.
No submissions will be accepted after this date. Failure to submit by your extended due date will result in a mark of zero.

5 Final Notes

If you believe there are errors in the assignment, please contact the teaching team as soon as possible. We encourage you to read the instructions carefully and seek clarification early if you are unsure about any requirements.
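For reference, the forecast CSV described in Section 2 can be produced as in the sketch below. The constant forecast values are placeholders for your model's 12 predictions, and the months are assumed to be January to December 2024 (the 12 months following the training data); confirm the exact month format against the dummy test file on Canvas.

```python
import pandas as pd

# Placeholder forecasts for illustration only; replace with the 12
# predicted values from your final model.
forecasts = [100.0] * 12

# The 12 test months follow the training data (Jan 2015 - Dec 2023),
# i.e. Jan-Dec 2024. 'MS' generates month-start dates.
months = pd.date_range('2024-01', periods=12, freq='MS').strftime('%Y-%m')

# Two columns named exactly as the assignment requires.
submission = pd.DataFrame({'Month': months, 'Failure_Reports': forecasts})
submission.to_csv('system_failure_forecast.csv', index=False)
```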