QBUS3600 & QBUS6600
Data Explainer
UNICEF 2025 Semester 2
This is an example of what the data looks
like. For visualisation, only a subset of the
columns and rows is shown.
The table is organised so that information
about the donor comes first.
Then comes information about the
transaction.
These columns are helper flag columns. They can be used to organise the table
for the EDA. They are present in the training set, but not the test set. Do not use
these as features in your models.
These are the two target columns. In your EDA you will need to investigate the
relationships surrounding both these targets. In your modelling, you will choose
just one of these targets to predict.
This value is unique to each donor.
Each row represents one transaction. Some donors make multiple donations
and thus take up multiple rows (see red box). Their personal information is the
same across the rows, but the transaction data varies.
Other donors only donate once and thus have only one row.
You will use this column to tie in with the MOSAIC data, and possibly other
datasets (if you choose to use any).
Remember! Postcodes are categorical, not numerical!
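A minimal sketch of how such a join might look, assuming a MOSAIC file named 'mosaic.csv' and a postcode column named 'Postcode' (both are assumptions; use the actual names in your files):

```python
import pandas as pd

# Reading postcodes as strings keeps them categorical: leading zeros are
# preserved and they are never treated as numbers.
donations = pd.read_csv("UNICEF2025S2_TrainingSet.csv", dtype={"Postcode": str})
mosaic = pd.read_csv("mosaic.csv", dtype={"Postcode": str})  # assumed file name

# Left join so every transaction keeps its row even without a MOSAIC match.
merged = donations.merge(mosaic, on="Postcode", how="left")
```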
This donor made 5 donations within the first 3 months of becoming a donor. This is the information
on which your prediction will be based. After their initial 3 months of donating, in the next 24 months,
they made 8 more donations totalling $1042.30. This is the value that you are trying to predict.
You only make one prediction per donor (not per row), so feature engineering is essential
(see the sketch after this case study).
The dashed blue line shows you the information you will have in the test set.
Case Study: Donor C-990440981 – LifeTime Value Project (LTV)
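A minimal feature-engineering sketch for the LTV task, collapsing the transaction-level rows into one row per donor. The donor ID and amount column names ('Contact_ID', 'Donation_Amount') are assumptions; the target name comes from the brief:

```python
import pandas as pd

df = pd.read_csv("UNICEF2025S2_TrainingSet.csv")

# One row per donor: aggregate their first-3-month transactions.
donor_features = (
    df.groupby("Contact_ID")  # assumed donor ID column
      .agg(
          n_donations=("Donation_Amount", "count"),      # e.g. 5 for this donor
          total_amount=("Donation_Amount", "sum"),
          mean_amount=("Donation_Amount", "mean"),
          target=("Next_24_Month_Value_LTV", "first"),   # same value on every row
      )
      .reset_index()
)
```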
This donor made only one donation before becoming an RG.
No matter how many donations someone made before becoming an RG, you will make the
prediction based only on their first donation (see the sketch after this case study). The prediction
is the value of the 'ConvertedTo_RG_Within_6M' column.
The dashed blue line shows you the information you will have in the test set.
Case Study: Donor C-990440981 – Regular Giving Project (RG)
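A minimal sketch for the RG task, keeping only each donor's first donation as the feature row. 'Contact_ID' and 'Donation_Date' are assumed column names; the target name comes from the brief:

```python
import pandas as pd

df = pd.read_csv("UNICEF2025S2_TrainingSet.csv", parse_dates=["Donation_Date"])

# Keep the earliest transaction per donor.
first_donation = (
    df.sort_values("Donation_Date")
      .drop_duplicates("Contact_ID", keep="first")
)

X = first_donation.drop(columns=["ConvertedTo_RG_Within_6M"])
y = first_donation["ConvertedTo_RG_Within_6M"]
```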
Data Split
(Diagram: the data is split into a training set (90%) and a holdout set (10%); your own test/validation set is carved out of the training set.)
• Of the data UNICEF has given us, your training set makes up 90%. This is the
'UNICEF2025S2_TrainingSet.csv' that you have been given.
• When building your models, you will need to do your own train/test split from this data
(a minimal split sketch follows this list).
• 10% of the data has been used as a holdout set to verify your models. This is the
'UNICEF2025S2_TestingSet_XX.csv' that you have been given.
• You will feed your model the data from 'UNICEF2025S2_TestingSet_XX.csv'. You will then put the
predictions in the 'UNICEF2025S2_SubmissionTemplate.csv' to be validated.
• The holdout set is further split in two. When your submission is validated, you will see your model's
performance on one half (the "public" half) of this data – this will be on the leaderboard.
• Your model's performance will also be measured on the other half of the data (the "private"
half), but you will not receive this metric. This is to prevent overfitting to the public half. If your
model is well calibrated, there should not be much of a difference between performance on
each half.
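A minimal sketch of your own train/validation split. Splitting on unique donor IDs (rather than rows) keeps all of a donor's transactions on the same side of the split; 'Contact_ID' is an assumed column name:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("UNICEF2025S2_TrainingSet.csv")

# Split donor IDs, then select the corresponding rows.
train_ids, valid_ids = train_test_split(
    df["Contact_ID"].unique(), test_size=0.2, random_state=42
)
train_df = df[df["Contact_ID"].isin(train_ids)]
valid_df = df[df["Contact_ID"].isin(valid_ids)]
```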
This is a sample of the 'UNICEF2025S2_SubmissionTemplate.csv'.
For each donor, you will fill out one of the two right-hand columns.
This is an example of how your submission will be marked. Your predicted
values will go in one of the two right-hand columns.
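A minimal sketch of filling in the template for the LTV task, using a training-set mean as a placeholder prediction. It assumes the template's target column carries the same name as in the training set and that the donor ID column is 'Contact_ID' (assumptions; check the actual template):

```python
import pandas as pd

train = pd.read_csv("UNICEF2025S2_TrainingSet.csv")
baseline = train["Next_24_Month_Value_LTV"].mean()   # replace with real model output

template = pd.read_csv("UNICEF2025S2_SubmissionTemplate.csv")
template["Next_24_Month_Value_LTV"] = baseline       # one prediction per donor row
template.to_csv("my_submission.csv", index=False)
```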
The corresponding metric will be calculated against the values in the
'Next_24_Month_Value_LTV' or 'ConvertedTo_RG_Within_6M' column.
The metric used for the regression task will be RMSE.
The metric used for the classification task will be the Macro F1 score.
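A minimal sketch of computing these two metrics on your own validation set; the y_true/y_pred arrays below are placeholder values standing in for your validation targets and model predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, f1_score

# Regression task: RMSE.
y_true_ltv = np.array([0.0, 120.5, 1042.3])
y_pred_ltv = np.array([10.0, 100.0, 900.0])
rmse = np.sqrt(mean_squared_error(y_true_ltv, y_pred_ltv))

# Classification task: Macro F1.
y_true_rg = np.array([0, 1, 1, 0])
y_pred_rg = np.array([0, 1, 0, 0])
macro_f1 = f1_score(y_true_rg, y_pred_rg, average="macro")

print(f"RMSE: {rmse:.2f}, Macro F1: {macro_f1:.2f}")
```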
