Date: 2024-04-10
MSBA7012/MACC7022 Individual Assignment 2: Fraudulent Job Post Detection
Deadline: Sunday, April 28, 2024 11:59pm
Objective:
• Leverage Alteryx to develop a workflow that can preprocess data, engineer features, and build
a machine learning model to predict whether a job posting is fraudulent.
Dataset:
• The Balanced_Fraudulent_Job_Posts.xlsx dataset includes attributes related to job postings,
with key columns like 'title', 'company_profile', 'description', 'requirements', 'benefits', and
'fraudulent'.
Tasks:
1. Data Preprocessing:
• Use Alteryx to load the dataset and create a new column that combines the textual
data in 'title', 'company_profile', 'description', 'requirements', and 'benefits' columns.
• Perform text pre-processing on the combined text column.
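The two preprocessing steps above can be sketched in plain Python to show the intended logic. This is only an illustration, not the required Alteryx implementation: the brief does not say which cleaning steps are expected, so lowercasing, removing non-letters, and dropping stop words are assumptions, as are the helper names `combine_text` and `preprocess` and the tiny stop-word list.

```python
import re

# Text columns named in the assignment brief.
TEXT_COLUMNS = ["title", "company_profile", "description", "requirements", "benefits"]

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are", "we", "you"}


def combine_text(row):
    """Join the five text columns, treating None or empty values as missing."""
    parts = [str(row.get(col) or "") for col in TEXT_COLUMNS]
    return " ".join(p for p in parts if p)


def preprocess(text):
    """Lowercase, replace non-letters with spaces, and remove stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


# Hypothetical job posting used only to exercise the helpers.
row = {
    "title": "Data Entry Clerk",
    "company_profile": None,
    "description": "Earn $5000 a week from home!",
    "requirements": "No experience needed.",
    "benefits": "",
}
combined = combine_text(row)
cleaned = preprocess(combined)
print(cleaned)
```

Treating missing values as empty strings before concatenation matters because a single null in any of the five columns could otherwise make the whole combined field null.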
2. Feature Engineering:
• Implement TF-IDF vectorization in Alteryx using the Python Tool to convert the text
data into a numerical format suitable for machine learning.
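To make the idea of TF-IDF vectorization concrete, here is a minimal sketch in plain Python using the textbook definition (term frequency times the natural log of inverse document frequency). The function name `tfidf` and the sample documents are hypothetical. In practice, scikit-learn's `TfidfVectorizer` is a common choice inside the Python Tool; by default it applies a smoothed IDF and L2 normalization, so its numbers would differ from this unsmoothed version.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, matrix); matrix[i][j] is the TF-IDF weight of
    vocabulary[j] in documents[i].

    tf  = term count / number of tokens in the document
    idf = ln(N / number of documents containing the term)
    """
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vocabulary = sorted(doc_freq)
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        row = [
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ]
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical, already-preprocessed documents.
docs = [
    "earn money fast from home",
    "software engineer position in london",
    "earn money as software engineer",
]
vocab, weights = tfidf(docs)
for term, weight in zip(vocab, weights[0]):
    print(f"{term:10s} {weight:.4f}")
```

Each row of `weights` is one job posting and each column is one term, which is the numeric layout a downstream model expects. Terms that appear in only one document (such as "fast") receive a higher weight than terms shared across documents (such as "earn").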
3. Model Building and Evaluation:
• Split the data into a training set and a testing set with a ratio of 70:30.
• Utilize Alteryx's Forest Model tool to train a model using the training set.
• Use only the TF-IDF values as the model features.
• Evaluate the model's performance on the testing set through the Model Comparison
tool and record the metrics (accuracy, F1-score, AUC, and confusion matrix).
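The assignment asks for the split and the metrics to be produced with Alteryx tools, so the following plain-Python sketch is only meant to show what a 70:30 split and the four requested metrics compute. All function names, the 0.5 decision threshold, the seed, and the sample labels and scores are assumptions made for illustration.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 (fraudulent) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and predicted fraud probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

train, test = train_test_split(range(10))
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("F1:", f1_score(tp, fp, fn))
print("AUC:", auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds. That difference is worth keeping in mind when interpreting the recorded metrics in the report.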
4. Reporting:
• Create a report in Word to summarize the model evaluation results and insights into
the key factors that help predict fraudulent job postings.
Deliverables:
• An Alteryx workflow (.yxmd) containing the complete analysis, with annotations explaining
each tool and step. Use relative paths for workflow dependencies in Alteryx so that the grader
can run your workflow without making any changes.
• A Word document (.docx) summarizing the findings and insights from the model.
• Compress the above two files into a zip file named with your student ID, e.g., 123456.zip.
• You should not make any modifications to the input file: Balanced_Fraudulent_Job_Posts.xlsx.
Also, DO NOT include this input file in your zip file.
Evaluation Criteria:
• Correctness and completeness of the preprocessing and feature engineering steps
implemented in Alteryx.
• Accuracy and thoroughness of the model evaluation and interpretation of results within
Alteryx.
• Quality and clarity of the final report, including insights and conclusions drawn from the
analysis.