Date: 2023-03-26
ISE 535 Data Mining, Homework 4. Submit by 12:30 p.m. on March 29.
The file FlightDelays.csv contains all flights from the Washington, DC area into the New York City
area during January 2004. Each record is a single flight. The data were obtained from the Bureau of
Transportation Statistics (available at www.transtats.bts.gov).
The goal is to accurately predict whether a new flight (not in this dataset) will be delayed,
using the predictors DAY_WEEK, CRS_DEP_TIME, ORIGIN, DEST, and CARRIER. The target
(response) variable Flight.Status indicates whether the flight was delayed. It has two classes (delayed
and ontime).
Read the file into a data frame df, keeping only the columns DAY_WEEK, CRS_DEP_TIME, ORIGIN, DEST,
CARRIER, and Flight.Status. The variable CRS_DEP_TIME is the departure time scheduled by the
airline, stored in hhmm form (e.g. 1455 for 2:55 p.m.).
Transform all variables to factors. To create hourly intervals for the departure times, use
df$CRS_DEP_TIME = factor(floor(df$CRS_DEP_TIME/100)); apply this while CRS_DEP_TIME is still
numeric, because arithmetic on a factor returns NA.
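A minimal sketch of the preparation step in base R. The helper name `prep_flights` is hypothetical; it keeps the six required columns, buckets CRS_DEP_TIME into hours while the column is still numeric, and only then converts every column to a factor.

```r
# Hypothetical helper: select the assignment's columns and convert them to factors.
prep_flights <- function(raw) {
  cols <- c("DAY_WEEK", "CRS_DEP_TIME", "ORIGIN", "DEST", "CARRIER", "Flight.Status")
  df <- raw[, cols]
  # Hourly interval from hhmm time (e.g. 1455 -> 14), done before factor conversion
  df$CRS_DEP_TIME <- floor(df$CRS_DEP_TIME / 100)
  # Convert every remaining column to a factor, keeping the data frame structure
  df[] <- lapply(df, factor)
  df
}

# Usage (assumes the file is in the working directory):
# df <- prep_flights(read.csv("FlightDelays.csv"))
```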
a) (10 pts.) Find the number of delayed flights from each ORIGIN airport to each DEST airport. Show
the results in a 3-by-3 table.
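One way to build this table in base R, sketched with a hypothetical helper `delayed_by_route`. Because ORIGIN and DEST are factors, `table()` keeps every level of each, so the result stays 3-by-3 even if some route has no delayed flights.

```r
# Hypothetical helper: count delayed flights for each ORIGIN x DEST pair.
delayed_by_route <- function(df) {
  delayed <- df[df$Flight.Status == "delayed", ]
  # Factor levels are preserved after subsetting, so empty routes show as 0
  table(ORIGIN = delayed$ORIGIN, DEST = delayed$DEST)
}

# Usage: delayed_by_route(df)
```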
b) (20 pts.) Use set.seed(1) to split the data into a train set (80%) and a test set (20%). Report the
number of flights for each day of the week in the train set.
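A minimal sketch of an 80/20 split using base R's `sample()` after `set.seed(1)`. This is one possible splitting call, not necessarily the one the course expects: other functions (e.g. `caret::createDataPartition`) draw different rows, and the default sampling algorithm changed in R 3.6.0, so the exact counts depend on both the call and the R version. The helper name `split_train_test` is hypothetical.

```r
# Hypothetical helper: random 80/20 train/test split with a fixed seed.
split_train_test <- function(df, frac = 0.8, seed = 1) {
  set.seed(seed)
  # Draw row indices for the training set without replacement
  idx <- sample(nrow(df), size = round(frac * nrow(df)))
  list(train = df[idx, ], test = df[-idx, ])
}

# Usage:
# s <- split_train_test(df)
# table(s$train$DAY_WEEK)   # flights per day of the week in the train set
```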
c) (10 pts.) Use the train set to construct a Naive Bayes model. Report the model output (the
a-priori probabilities and all conditional probabilities).
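In R this is commonly done with `e1071::naiveBayes(Flight.Status ~ ., data = train)`, whose printed output lists the a-priori and conditional probabilities. The base-R sketch below shows what those numbers are by counting directly: class frequencies for the priors and row-normalized class-by-level tables for each predictor. It uses no Laplace smoothing, which matches naiveBayes's default `laplace = 0`; the helper name `fit_naive_bayes` is hypothetical.

```r
# Hypothetical helper: categorical Naive Bayes fitted by counting (no smoothing).
fit_naive_bayes <- function(train, response = "Flight.Status") {
  y <- train[[response]]
  predictors <- setdiff(names(train), response)
  # A-priori probabilities P(class)
  apriori <- prop.table(table(y))
  # Conditional probabilities P(level | class): each row of a table sums to 1
  tables <- lapply(predictors, function(p) {
    prop.table(table(y, train[[p]]), margin = 1)
  })
  names(tables) <- predictors
  list(apriori = apriori, tables = tables)
}

# Usage: m <- fit_naive_bayes(s$train); m$apriori; m$tables
```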
d) (20 pts.) For the test set, show the confusion matrix, then find the overall test accuracy rate.
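A minimal sketch of the evaluation step, assuming the predicted and actual classes are available as vectors. With the e1071 package, the predictions would typically come from `predict(model, newdata = test)` after `model <- naiveBayes(Flight.Status ~ ., data = train)`. The helper name `evaluate_predictions` is hypothetical.

```r
# Hypothetical helper: confusion matrix and overall accuracy.
evaluate_predictions <- function(predicted, actual) {
  # Use the same levels for both so the matrix is always square
  lv <- union(levels(factor(actual)), levels(factor(predicted)))
  cm <- table(Predicted = factor(predicted, levels = lv),
              Actual = factor(actual, levels = lv))
  # Accuracy = correctly classified (diagonal) / all test flights
  list(confusion = cm, accuracy = sum(diag(cm)) / sum(cm))
}

# Usage: evaluate_predictions(pred, s$test$Flight.Status)
```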
e) (20 pts.) We want to know whether a new Delta flight from DCA to LGA, scheduled to depart
between 10 a.m. and 11 a.m. on a Sunday (DAY_WEEK = "7"), will be ontime or delayed. Use
CRS_DEP_TIME = "10". What is your prediction?
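A minimal base-R sketch of scoring one new flight, assuming the fitted model is stored as a list with `apriori` (class priors named by class) and `tables` (one class-by-level probability table per predictor, rows named by class). The helper name `predict_naive_bayes` and the 0.001 floor for zero probabilities are assumptions; e1071's `predict()` method for naiveBayes objects has a similar `threshold` argument and returns posteriors with `type = "raw"`. Delta is assumed to be coded `DL` (its IATA code) in the CARRIER column.

```r
# Hypothetical helper: class and posterior probabilities for new rows.
predict_naive_bayes <- function(model, newdata, threshold = 0.001) {
  classes <- names(model$apriori)
  # Start every row from the log prior of each class
  scores <- matrix(log(as.numeric(model$apriori)), nrow = nrow(newdata),
                   ncol = length(classes), byrow = TRUE,
                   dimnames = list(NULL, classes))
  for (p in names(model$tables)) {
    tab <- unclass(model$tables[[p]])
    tab[tab == 0] <- threshold  # assumption: floor zeros to avoid log(0)
    vals <- as.character(newdata[[p]])
    # tab[classes, vals] is classes x rows; transpose to rows x classes
    scores <- scores + t(log(tab[classes, vals, drop = FALSE]))
  }
  # Normalize log scores into posteriors (subtract row max for stability)
  post <- exp(scores - apply(scores, 1, max))
  post <- post / rowSums(post)
  list(class = classes[max.col(post, ties.method = "first")], posterior = post)
}

# Usage (assumes `m` was fit on the train set):
# new_flight <- data.frame(DAY_WEEK = "7", CRS_DEP_TIME = "10",
#                          ORIGIN = "DCA", DEST = "LGA", CARRIER = "DL")
# predict_naive_bayes(m, new_flight)
```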
f) (20 pts.) What is the posterior probability that this flight will be ontime?
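Under the Naive Bayes assumption that the five predictors are conditionally independent given the class, the posterior for the flight x = (DAY_WEEK = 7, CRS_DEP_TIME = 10, ORIGIN = DCA, DEST = LGA, CARRIER = Delta) combines the a-priori probabilities and the conditional probabilities from part c):

```latex
P(\text{ontime} \mid \mathbf{x}) =
\frac{P(\text{ontime}) \prod_{j=1}^{5} P(x_j \mid \text{ontime})}
     {\sum_{c \in \{\text{delayed},\, \text{ontime}\}} P(c) \prod_{j=1}^{5} P(x_j \mid c)}
```

The denominator normalizes the two class scores so that the posteriors of delayed and ontime sum to 1. A zero conditional probability for any predictor sets that class's product to zero, which is why software often applies Laplace smoothing or a small floor.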
Submit your report (code and output) as a PDF file on Blackboard (no screen captures). The report must
include your name and USC ID.