1st Assignment COMPX523-22A - APPENDIX
Heitor Murilo Gomes, Albert Bifet
March 2022
1 kNN with per-class windowing (kNN-cw)
1.1 MOA tips
In MOA, look for the kNN implementation available in kNN.java. One suggestion is to learn how the standard kNN works before making any changes. Important things to look at are how the closest neighbours are found, how the instances are stored in the window, and how the window is initialized. For this assignment, you do not need to worry about the Regression part of the code.
In general, you will need to modify the following methods and attributes.
If you decide to create more methods or modify another method, that is alright
as well.
protected Instances window;
public void setModelContext(InstancesHeader context)
public void trainOnInstanceImpl(Instance inst)
public double[] getVotesForInstance(Instance inst)
You can simply create an array of windows based on the number of class labels. The method setModelContext is invoked prior to starting the learning process, so you can initialize the windows in it. Notice that the context object has a method that tells you the number of class labels, as shown below.
public void setModelContext(InstancesHeader context) {
    int numberOfClasses = context.numClasses();
    // The per-class windows can be initialized here, one per class label.
}
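To make the idea concrete, here is a minimal sketch of the same idea in Python (the names below are placeholders, not the MOA API): one bounded FIFO window per class, so that a frequent class never evicts the instances of a rare one.

from collections import deque

num_classes = 3     # placeholder: in MOA this comes from context.numClasses()
window_size = 1000  # placeholder: instances kept per class

# One bounded FIFO window per class label; appending to a full
# deque automatically drops that class's oldest instance.
windows = [deque(maxlen=window_size) for _ in range(num_classes)]

def train_on_instance(x, y):
    # Store the instance only in the window of its own class.
    windows[y].append(x)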
To get started extending MOA code, you can follow the tutorial "Building MOA from the source".
1.2 River tips
In river, look for the kNN implementation in knn_classifier.py.
You must implement a data window maintenance strategy that keeps the window balanced across the class labels. The existing implementation uses a circular buffer to keep the most recent data (for reference, it is implemented in neighbors.base_neighbors.KNeighborsBuffer). You can use the following template with the minimum elements for compatibility (the template is only a reference; the task might need more, or fewer, methods than the ones listed here):
import numpy as np
from river import base

class KNeighborsBalancedBuffer:

    def __init__(self,
                 window_size: int = 1000,
                 classes=None):
        # TODO
        ...

    def reset(self):
        # TODO
        return self

    def append(self, x: np.ndarray, y: base.typing.Target):
        # TODO
        return self

    def features_buffer(self, class_idx) -> np.ndarray:
        # TODO
        # Corner case: the number of rows to return is
        # smaller than window_size before the buffer is full.
        # This method must return the features as an
        # np.ndarray, since it will be used to search for
        # neighbors via a KDTree
        ...
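For illustration only, here is a minimal, self-contained sketch of the circular-buffer idea and of the corner case mentioned above; the class and method names are placeholders rather than part of the required interface, and the per-class bookkeeping is left out.

import numpy as np

class CircularFeatureBuffer:
    # Fixed-size buffer that overwrites its oldest row once full
    # (you would keep one of these per class label).

    def __init__(self, window_size: int, n_features: int):
        self._buffer = np.zeros((window_size, n_features))
        self._next = 0  # circular write position
        self._size = 0  # number of rows filled so far

    def append(self, x: np.ndarray):
        self._buffer[self._next, :] = x
        self._next = (self._next + 1) % len(self._buffer)
        self._size = min(self._size + 1, len(self._buffer))

    def features(self) -> np.ndarray:
        # Before the buffer is full, return only the rows
        # actually written, never the zero-padded ones.
        return self._buffer[:self._size, :]

buf = CircularFeatureBuffer(window_size=3, n_features=2)
buf.append(np.array([1.0, 2.0]))
print(buf.features().shape)  # (1, 2): only one row filled so far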
To easily test your implementation without reinstalling the framework, you can extend the existing implementation as follows:
class BalancedKNNClassifier(KNNClassifier):

    def __init__(self,
                 n_neighbors=5,
                 max_window_size=1000,
                 leaf_size=30,
                 metric='euclidean',
                 classes=None):
        super().__init__(n_neighbors=n_neighbors,
                         max_window_size=max_window_size,
                         leaf_size=leaf_size,
                         metric=metric)
        self.data_window = KNeighborsBalancedBuffer(
            window_size=max_window_size, classes=classes)

    def predict_proba_one(self, x):
        # TODO
        ...
This way BalancedKNNClassifier will inherit all functionality from KNNClassifier and you only need to focus on the methods that need to be overridden for the task at hand.
2 Evaluation and Analysis
2.1 MOA tips
Use a test-then-train evaluation task (EvaluateInterleavedTestThenTrain, or EvaluatePrequential as in the sample below) and make sure you have the Evaluator set as "BasicClassificationPerformanceEvaluator -o -r"; otherwise you won't obtain the recall for each class in your output. A sample command that you can copy and paste into the MOA GUI can be found below.
EvaluatePrequential -l trees.HoeffdingTree
-s (ArffFileStream -f GMSC_nm.arff)
-e (BasicClassificationPerformanceEvaluator -o -r)
-i 150000 -f 100
Notice that "-f 100" does not influence the end result; it only controls how often the intermediary results are printed. You can also set "-f 150000" so that you only get one row of results (the last one). If you are running it on the command line, you might use something like this:
java -Xmx4g -Xss10M -cp moa.jar moa.DoTask
"EvaluatePrequential -l trees.HoeffdingTree
-s (ArffFileStream -f GMSC_nm.arff)
-e (BasicClassificationPerformanceEvaluator -o -r)
-i 150000 -f 100"
2.2 River tips
The following code snippet shows how to set up the evaluation with the existing KNNClassifier:
from river.neighbors import KNNClassifier
from river.evaluate import progressive_val_score
from river.metrics import Accuracy
from river.stream import iter_pandas
import pandas as pd

df = pd.read_csv('GMSC_nm.csv')
stream = iter_pandas(X=df.iloc[:, :-1], y=df.iloc[:, -1])

knn = KNNClassifier()
metric = Accuracy()

progressive_val_score(dataset=stream,
                      model=knn,
                      metric=metric,
                      show_memory=True,
                      show_time=True,
                      print_every=10000)
Per-class recall is not directly available in river. However, it is easy to calculate from the confusion matrix available in the metric object, as follows:
cm = metric.cm

print("Recall per class")
for i in cm.classes:
    recall = cm.data[i][i] / cm.sum_row[i] \
        if cm.sum_row[i] != 0 else 'Ill-defined'
    # A plain {} lets the 'Ill-defined' string be printed as well
    print("Class {}: {}".format(i, recall))
Note: The code above must be placed immediately after the evaluation.
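Depending on your river version, the metrics.ClassificationReport metric may also give you per-class figures directly; treat the snippet below as an untested alternative and check that the class exists in your installation.

from river.metrics import ClassificationReport

report = ClassificationReport()
for y_true, y_pred in [(0, 0), (1, 0), (1, 1)]:  # toy predictions
    report.update(y_true, y_pred)
print(report)  # includes precision, recall and F1 per class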