EECE5644: Assignment #3
Due on March 30, 2020
Machine Learning
Solution Manual
This solution manual is based on the submission of student Anja Deric. Thank you for agreeing to share your solution.
Problem 1
Generating Data
Before starting the problem, data sets were first generated to be used throughout the assignment. Four
different sets were generated: 3 training sets with 100, 500, and 1000 samples and labels, and 1 test set with
10,000 samples and labels. Each data set consisted of 3-class multi-ring data. A sample distribution can be
seen in Figure 1 below.
Figure 1: Sample Data Distribution for 3-class Multi-Ring Data Set
Once generated, the same data sets were used throughout the entire assignment in order to keep the results
consistent and comparable across problems.
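For reference, a minimal Matlab sketch of how such a data set could be generated is given below. The exact ring radii, noise level, and class priors used in the assignment are not specified in this write-up, so the values in the sketch are assumptions.

function [X, labels] = generateMultiRing(N)
    % Hypothetical 3-class multi-ring generator; the radii, noise level,
    % and equal class priors below are assumptions, not assignment values.
    radii  = [2, 4, 6];                % assumed mean radius per class
    sigma  = 0.4;                      % assumed radial noise std. dev.
    labels = randi(3, N, 1);           % assumed equal class priors
    theta  = 2*pi*rand(N, 1);          % uniform angles around the ring
    mu     = radii(labels);            % mean radius for each sample
    r      = mu(:) + sigma*randn(N, 1);
    X      = [r.*cos(theta), r.*sin(theta)];
end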
Neural Network Model Training
For problem 1 of the exam, the data sets were used to train a 2-layer neural network to approximate the class
label posteriors. Each neural network trained for this problem consisted of 2 fully-connected layers of adaptive
weights, followed by a soft-max layer. For each trial, the number of perceptrons in the network was varied
between 1 and 6, with the activation function being varied between the sigmoid/logistic function and a soft-
ReLU function. By testing each combination of perceptron count and activation function, the most successful
combination could be established in order to achieve the best possible results. Performance in this assignment
was simply defined as the proportion of correct classifications made by the model.
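Concretely, with decisions and trueLabels as N-by-1 label vectors (the names here are illustrative, not from the original code), this metric is a one-liner:

% Proportion of correct classifications (vector names are illustrative)
accuracy = mean(decisions == trueLabels);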
The overall process used to train the networks was as follows:
1. Split the training set for 10-fold cross validation using the cvpartition() function in Matlab (this function
automatically creates 10 folds, each with its own held-out subset of validation points)
2. For each of the 10 folds, train the neural network model for 1-6 perceptrons and both
activation functions (each fold is thus used to train 12 different models)
3. Test and record each model's performance on the held-out validation points of its fold
4. Compare performance across all 10 folds to determine the best combination of perceptron count and
activation function.
5. Once the best combination is found, train on the full original training set (with 100, 500, or 1000 samples)
for that particular combination 10 times so as to minimize the chance of getting trapped in a local
minimum.
6. Record the performance of each new model, as well as the parameters for each trial.
7. Across the 10 models, find the best performing model and use its parameters to evaluate the final model
on the original 10,000-sample test set.
8. Record the final performance on the 10,000-sample test set for later comparison.
The procedure described above was repeated for all 3 training sets (100, 500, and 1000 samples). Matlab
code for this portion of the assignment can be found in Appendix B.
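To make the procedure concrete, a condensed sketch of the cross-validation and training loop is shown below. It is a minimal re-implementation under stated assumptions, not the actual Appendix B code: the network is parameterized as a 6P+3 element vector, helper names are illustrative, and for brevity each candidate is trained once per fold rather than with the 10 re-initializations described above.

function [bestP, bestA] = sketchProblem1(X, y)
    % Condensed sketch of steps 1-4: 10-fold CV over 1-6 perceptrons and
    % two activation functions. X is N-by-2, y is N-by-1 in {1,2,3}.
    acts = {@(v) 1./(1+exp(-v)), @(v) log(1+exp(v))};  % sigmoid, soft-ReLU
    cv = cvpartition(size(X,1), 'KFold', 10);          % step 1
    avgAcc = zeros(6, 2);
    for P = 1:6                                        % step 2
        for a = 1:2
            foldAcc = zeros(cv.NumTestSets, 1);
            for f = 1:cv.NumTestSets
                tr = training(cv, f);  te = test(cv, f);
                theta = trainNet(X(tr,:), y(tr), P, acts{a});
                [~, pred] = max(forwardPass(theta, X(te,:), P, acts{a}), [], 1);
                foldAcc(f) = mean(pred(:) == y(te));   % step 3
            end
            avgAcc(P, a) = mean(foldAcc);              % step 4
        end
    end
    [~, idx] = max(avgAcc(:));
    [bestP, bestA] = ind2sub([6 2], idx);
end

function theta = trainNet(X, y, P, act)
    % Train by minimizing cross-entropy with fminsearch (a local minimizer);
    % the parameter vector holds W1 (P-by-2), b1 (P), W2 (3-by-P), b2 (3).
    opts  = optimset('MaxIter', 5000, 'MaxFunEvals', 5000, 'Display', 'off');
    theta = fminsearch(@(t) crossEntropy(t, X, y, P, act), ...
                       0.1*randn(6*P + 3, 1), opts);
end

function L = crossEntropy(theta, X, y, P, act)
    S   = forwardPass(theta, X, P, act);     % 3-by-N class posteriors
    idx = sub2ind(size(S), y(:).', 1:numel(y));
    L   = -sum(log(S(idx) + eps));           % negative log-likelihood
end

function S = forwardPass(theta, X, P, act)
    W1 = reshape(theta(1:2*P), P, 2);        % first layer weights
    b1 = theta(2*P+1 : 3*P);
    W2 = reshape(theta(3*P+1 : 6*P), 3, P);  % second layer weights
    b2 = theta(6*P+1 : 6*P+3);
    Z  = act(W1*X.' + b1);                   % hidden layer, P-by-N
    A  = W2*Z + b2;                          % class scores, 3-by-N
    A  = A - max(A, [], 1);                  % stabilize the soft-max
    S  = exp(A) ./ sum(exp(A), 1);           % soft-max layer
end

Steps 5-8 then re-run trainNet on the chosen (bestP, bestA) combination over the full training set 10 times, keep the parameter vector with the lowest cross-entropy, and evaluate it on the 10,000-sample test set.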
Based on the results collected, the 100, 500, and 1000 sample training sets led to performance levels of
79.50%, 83.87%, and 88.67%, respectively. Figure 2 shows this performance summary in Matlab.
Figure 2: Neural Network Performance Summary
Additionally, for all 3 data sets, the sigmoid function outperformed the soft-ReLU activation function. In
terms of perceptrons, the best-performing models had 6, 5, and 6 perceptrons for 100, 500, and
1000 samples, respectively.
As expected, the performance of each model improved with an increasing number of samples, since more
data was available for the model to use towards finding the optimal parameters. Although the overall
performance of the models was good (with the larger training sets approaching or exceeding 85%), it was
surprising that no model reached the 90% mark. In the next section, the EM (GMM) model will be discussed,
which in my implementation achieved >95% accuracy. It is therefore possible that Matlab's fminsearch
function limited the performance of the models.
fminsearch not only limits the number of iterations (a limit that can be raised, but remains finite), but it is
also a local minimizer, meaning that it can easily get stuck in a local minimum. My implementation
for this problem attempted to avoid that by re-initializing the parameters on every run and training even the
final model 10 times. However, this did not seem to make a significant difference in performance. Another
possibility is that stochastic gradient descent might have given better results for this particular assignment,
but my implementation did not use that particular algorithm.
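For reference, the relevant caps can be raised through optimset, though fminsearch remains a derivative-free local simplex search regardless (costFun and theta0 below are hypothetical placeholders for an objective and its initializer):

% Raising fminsearch's default iteration/evaluation caps (costFun and
% theta0 are hypothetical placeholders, not names from the original code)
opts = optimset('MaxIter', 2e4, 'MaxFunEvals', 2e4);
[thetaBest, fBest] = fminsearch(costFun, theta0, opts);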
Problem 2
For the second problem on this exam, the same data sets from problem 1 were used to train Gaussian Mixture
Models (GMMs) instead. The overall training and validation procedure for this problem was as follows (a
condensed code sketch appears after the list):
1. Split each training set into 3 sets based on the original class labels
2. Estimate the class priors from the number of samples in each set
3. Run each class's data through the EM algorithm 100 times, recording which model order is
picked each time
4. For the EM algorithm itself, use Matlab's built-in fitgmdist() function with 5000 iterations, 10 replicates
per data set, and 6 candidate models (mixtures of 1-6 Gaussians). For each trial, use 10-fold cross-validation
to split the full data set into 10 smaller sets, all of which are used to train the model, and use the log-likelihood
to pick the model order for each trial.
5. After the best model order is picked for all 3 classes across all 100 experiments, pick the order that was
most frequently selected for each class.
6. Based on that model order, fit a GMM distribution to each class and record its negative log-likelihood.
Repeat this step 10 times for each class in order to find the best possible model.
7. After generating 10 models for each class, pick the model with the minimum negative log-likelihood as
the final Gaussian Mixture Model for that particular class.
8. To test the model, use the 10,000-sample test set and decide on a class label for each data point by
evaluating each class's fitted GMM at that point and multiplying it by the class prior estimate from step 2.
9. Calculate performance by counting the number of correct classifications.
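The sketch below condenses this procedure into Matlab. It is a minimal illustration under stated assumptions, not the submitted code: it performs a single round of order selection per class instead of the 100 repetitions in step 3, uses held-out log-likelihood as the selection score, and its variable names are illustrative.

function sketchProblem2(Xtrain, yTrain, Xtest, yTest)
    % Per-class GMM fitting with order selection, then MAP classification.
    K = 3;  priors = zeros(1, K);  gmms = cell(1, K);
    emOpts = statset('MaxIter', 5000);
    for c = 1:K
        Xc = Xtrain(yTrain == c, :);                  % step 1
        priors(c) = size(Xc, 1) / size(Xtrain, 1);    % step 2
        % Steps 3-5 (condensed to one round): pick the order with the
        % highest held-out log-likelihood over 10 folds.
        cv = cvpartition(size(Xc, 1), 'KFold', 10);
        bestOrder = 1;  bestLL = -inf;
        for m = 1:6
            ll = 0;
            for f = 1:cv.NumTestSets
                gm = fitgmdist(Xc(training(cv, f), :), m, 'Options', emOpts, ...
                    'Replicates', 10, 'RegularizationValue', 1e-6);
                ll = ll + sum(log(pdf(gm, Xc(test(cv, f), :))));
            end
            if ll > bestLL,  bestLL = ll;  bestOrder = m;  end
        end
        % Steps 6-7: refit on all class data at the chosen order
        % ('Replicates' keeps the best of 10 EM initializations).
        gmms{c} = fitgmdist(Xc, bestOrder, 'Options', emOpts, ...
            'Replicates', 10, 'RegularizationValue', 1e-6);
    end
    % Steps 8-9: MAP decisions via p(x|c)*P(c), then test-set accuracy
    scores = zeros(size(Xtest, 1), K);
    for c = 1:K
        scores(:, c) = priors(c) * pdf(gmms{c}, Xtest);
    end
    [~, decisions] = max(scores, [], 2);
    fprintf('Test accuracy: %.2f%%\n', 100 * mean(decisions == yTest));
end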
As with the previous problem, this procedure was repeated for all 3 training sets, and validated on the 10,000
sample test set. Figure 3 below summarizes results for problem 2.
Figure 3: Gaussian Mixture Models Performance Summary
As can be observed from these results, the more samples were used for training, the better the performance
of the model, which is the same conclusion derived from the previous problem. The results also show
that with the lower number of samples, the algorithm tended to choose 6-component GMMs for 2 of
the classes, most likely because individual points were far apart. As more samples were added, the selected
number of components decreased; for 1000 samples, the selected GMMs had 3 and 4 components. Comparing this
to the results from the previous problem, the GMMs have far better performance, with the 500- and
1000-sample GMMs achieving >95% accuracy. As explained in the previous problem, this result can most
likely be attributed to the neural network training not performing to its full capacity; fixing that would make
the two models more competitive.