Date: 2022-05-06
IE 7275 Spring 2022 Data Mining in Engineering - Final Exam


1. a) The following pandas DataFrame “dfCSGrade” holds 10 years of computer
science course details. Return the top 3 easiest and top 3 hardest courses in the
computer science department (Hint: the highest average passed rate is considered
easy and the lowest average passed rate is considered hard). (8 points)

Semester       Course   AveragePassedRate
Winter 2022    CS401    82.5
Winter 2022    CS300    85.3
Winter 2022    CS250    78
Summer 2022    CS401    77.1
Summer 2022    CS300    90.3
Summer 2022    CS250    89.1
Fall 2022      CS401    85
Fall 2022      CS300    70.2
Fall 2022      CS250    76.4
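
One possible approach to part a) is sketched here. It rebuilds only the nine sample rows shown in the table, and it assumes that a course's difficulty is ranked by its mean AveragePassedRate across all semesters; the exam does not say how to aggregate, so that choice and the variable names `course_means`, `easiest`, and `hardest` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the table; the full dfCSGrade would cover 10 years.
dfCSGrade = pd.DataFrame(
    {
        "Semester": [
            "Winter 2022", "Winter 2022", "Winter 2022",
            "Summer 2022", "Summer 2022", "Summer 2022",
            "Fall 2022", "Fall 2022", "Fall 2022",
        ],
        "Course": [
            "CS401", "CS300", "CS250",
            "CS401", "CS300", "CS250",
            "CS401", "CS300", "CS250",
        ],
        "AveragePassedRate": [82.5, 85.3, 78, 77.1, 90.3, 89.1, 85, 70.2, 76.4],
    }
)

# Assumption: average each course's pass rate over every semester on record.
course_means = dfCSGrade.groupby("Course")["AveragePassedRate"].mean()

# Highest mean pass rate = easiest, lowest mean pass rate = hardest.
easiest = course_means.nlargest(3)
hardest = course_means.nsmallest(3)

print("Top 3 easy courses:")
print(easiest)
print("Top 3 hard courses:")
print(hardest)
```

With only three distinct courses in the sample, both lists contain the same courses in opposite order; on the full 10-year data the two lists would generally differ.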

b) When do we use the Box-Cox transformation? (2 points)
2. a) Should we use principal component analysis (PCA) for feature selection? (2.5 points)
b) Why are scaling and centering important before performing PCA? (2.5 points)
c) How do you interpret the coefficients from the PCA components? (2.5 points)
d) How do you identify the number of components in PCA? (2.5 points)

3. a) Would it be better if an ML algorithm exhibits a greater amount of Bias or a greater
amount of Variance? (5 points)
b) How do you identify a High Variance model? (5 points)

4. a) In multiple linear regression, which metric increases when an independent
variable is significant and affects the dependent variable? (2.5 points)
b) Does adding more predictors improve the performance of your linear
regression model? (2.5 points)
c) How do you consider variables one by one and build the model by checking the
significance value and R-squared? (Hint: model selection method). (2.5 points)
d) How do you reduce the number of features and computational complexity of the
model in Linear regression? (2.5 points)

5.
a) How do you identify the value of "K" in the KNN algorithm? (2.5 points)

b) Why is the value of “K” odd in the KNN algorithm? (2.5 points)

c) Why do we call KNN a lazy learner? (2.5 points)

d) Do you recommend the KNN algorithm for large datasets? Why? (2.5 points)

6.
a) How does Naive Bayes handle imbalanced categorical data sets? (5 points)
b) Do you recommend Naive Bayes for classification problems? (5 points)

7.
a) How do you handle overfitting in a decision tree? (2.5 points)
b) How are missing values or outliers handled in decision tree models? (2.5 points)
c) How does information gain help decide the parent node and further node splits?
(2.5 points)
d) How do you select important features using a decision tree? (2.5 points)

8.
a) Given the logit regression model output below, how do you interpret the model
output and performance metrics? (5 points)
                           Logit Regression Results
==============================================================================
Dep. Variable:                OWNRENT   No. Observations:                   72
Model:                          Logit   Df Residuals:                       69
Method:                           MLE   Df Model:                            2
Date:                                   Pseudo R-squ.:                  0.2561
Time:                                   Log-Likelihood:                -35.434
converged:                       True   LL-Null:                       -47.633
                                        LLR p-value:                 5.039e-06
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
intercept     -6.0978      1.570     -3.885      0.000        -9.174    -3.021
AGE            0.1056      0.046      2.300      0.021         0.016     0.196
INCOME         0.6411      0.246      2.605      0.009         0.159     1.123
==============================================================================
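
As a hedged aid to reading the output above, the sketch below recomputes a few of the reported quantities from the printed numbers only. It assumes the summary follows the usual conventions for this kind of report: the pseudo R-squared is McFadden's (1 - LL / LL-Null) and the LLR p-value comes from a chi-square test with Df Model degrees of freedom. The variable names are illustrative, and no model is refitted because the underlying data is not given.

```python
import math

# Values copied from the summary table above.
coef = {"AGE": 0.1056, "INCOME": 0.6411}
log_likelihood = -35.434
ll_null = -47.633
df_model = 2

# exp(coef) is the multiplicative change in the odds of the outcome
# for a one-unit increase in the predictor, other predictors held fixed.
odds_ratios = {name: math.exp(b) for name, b in coef.items()}

# McFadden's pseudo R-squared (assumed to be the definition used here).
pseudo_r2 = 1 - log_likelihood / ll_null

# Likelihood-ratio statistic comparing the fitted model with the null model.
llr_stat = 2 * (log_likelihood - ll_null)

# For a chi-square distribution with exactly 2 degrees of freedom,
# the upper-tail probability has the closed form exp(-x / 2).
assert df_model == 2
llr_p_value = math.exp(-llr_stat / 2)

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.4f}")
print(f"pseudo R-squared = {pseudo_r2:.4f}")
print(f"LLR statistic = {llr_stat:.3f}, p-value = {llr_p_value:.3e}")
```

The recomputed pseudo R-squared and p-value agree with the table up to the rounding of the printed log-likelihoods, which supports reading both coefficients as positive, statistically significant effects on the odds of the outcome.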

b) Why is logistic regression called “regression” when it is not used for
regression? (2.5 points)
c) When the dataset has many outliers, should we use logistic regression for it?
(2.5 points)

9.
a) How does a neural network model obtain the optimal weight and bias values? (5 points)
b) How do you avoid overfitting an ANN? (2.5 points)
c) Why do we normalize the dataset for neural network models? (2.5 points)

10.
a) What is meant by regularization? How does it work? List a few regularization
methods you know. (2.5 points)
b) With which supervised regression and classification methods can you use
regularization? (2.5 points)
c) Should we use regularization with PCA? (2.5 points)

