The idea of k-nearest-neighbour classification is very simple. First, some training data has to be provided. Then any new data point is assigned the same label as the majority of its k closest training data points. In the simplest case, the algorithm looks like this:
1. Store all training points.
2. When a new data point x is presented, compute its distance to all training points. By default, the Euclidean distance is used, i.e. d(x, y) = sqrt(sum_j (x_j - y_j)^2).
3. Select the k training samples with the smallest distances.
4. Return the label that appears most often among the selected training samples.
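The steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the implementation documented on this page; the names `euclidean` and `knn_predict` are made up for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_points, train_labels, x, k=3, dist=euclidean):
    """Label x by majority vote among its k nearest training points."""
    # Distance from x to every stored training point.
    distances = [(dist(p, x), label) for p, label in zip(train_points, train_labels)]
    # Keep the k smallest distances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Most frequent label among them (ties fall to the first-seen label here).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# "Training" amounts to storing the points and their labels.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "a", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5), k=3))  # a
print(knn_predict(points, labels, (5.0, 5.5), k=3))  # b
```

The second query has two "b" neighbours and one "a" neighbour among its three closest points, so the majority vote returns "b".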
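If sample weights are given, the distance in the second step is modified: the distance to training point i is divided by N * weight[i], i.e. d_w(data[i], x) = dist(data[i], x) / (N * weight[i]).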
This adjustment makes it more likely for a sample with a large weight to be among the nearest neighbours. Instead of taking a majority vote on the labels, it is also possible to draw a label at random from the k nearest neighbours, either uniformly or with probabilities that depend on the distance to the new point.
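To see the effect of the weighting, the following sketch applies the scaling 1.0/(N*weight[i]) * dist(data[i], x) from the class documentation to two training points. It assumes that N is the number of training samples, which this page does not state explicitly; the function names are illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def weighted_distances(data, weights, x, dist=euclidean):
    """Return dist(data[i], x) / (N * weights[i]) for every training point."""
    n = len(data)  # assumption: N is the number of training samples
    return [dist(p, x) / (n * w) for p, w in zip(data, weights)]

data = [(1.0, 0.0), (2.0, 0.0)]
x = (0.0, 0.0)

uniform = weighted_distances(data, [0.5, 0.5], x)  # N * w = 1, distances unchanged
skewed = weighted_distances(data, [0.1, 0.9], x)   # second sample is up-weighted

print(uniform)  # [1.0, 2.0]
print(skewed)   # the second point now has the smaller distance
```

Under that assumption, weights of 1/N leave the distances untouched, while a larger weight shrinks the distance and so moves the sample towards the front of the neighbour ranking.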
Only the base algorithm is implemented here. If a more advanced implementation is required, have a look at the scikit-learn package.
Bases: ailib.fitting.model.Model
k-nearest-neighbour matching.
A data point x is assigned to the same class as the majority of its k nearest neighbours.
If weights are given, the distance is computed as
>>> 1.0/(N*weight[i]) * dist(data[i],x)
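There are three evaluation types: a majority vote over the k nearest neighbours (ties broken at random), sampling a label uniformly from the k nearest neighbours, and sampling a label with respect to the neighbours' distances.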
Several distance measurements are made available; the default is distSqNorm. Change the distance function by assignment to obj.dist. For some of them, the data should be passed as a list or a scipy.matrix.
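The following sketch illustrates what swapping the distance function by assignment can look like. `TinyKnn` is a hypothetical stand-in written for this example, not the ailib class, and `dist_sq_norm` assumes that the default distSqNorm means the squared Euclidean norm; `dist_manhattan` is likewise only an illustrative alternative.

```python
def dist_sq_norm(a, b):
    """Squared Euclidean distance (assumed meaning of the default)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def dist_manhattan(a, b):
    """Sum of absolute coordinate differences (illustrative alternative)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

class TinyKnn:
    """Hypothetical stand-in with a replaceable `dist` attribute."""

    def __init__(self, data, labels):
        self.data = list(data)
        self.labels = list(labels)
        self.dist = dist_sq_norm  # default distance, replaceable by assignment

    def nearest_label(self, x):
        """Label of the single closest training point under self.dist."""
        best = min(range(len(self.data)), key=lambda i: self.dist(self.data[i], x))
        return self.labels[best]

model = TinyKnn([(3.0, 0.0), (2.0, 2.0)], ["a", "b"])
x = (0.0, 0.0)

print(model.nearest_label(x))  # b  (squared norm: 9 vs 8)
model.dist = dist_manhattan    # swap the distance by plain assignment
print(model.nearest_label(x))  # a  (Manhattan: 3 vs 4)
```

The example is chosen so that the two distance functions disagree about the nearest point, which shows that the choice of distance can change the predicted label.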
Parameters: 


Mixin for sampling with uniform probabilities.
Samples the label from all k nearest neighbours.
All neighbours are equally likely to be drawn.
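A minimal sketch of uniform sampling, assuming the labels of the k nearest neighbours have already been collected; `sample_label_uniform` is an illustrative name, not part of the documented API.

```python
import random

def sample_label_uniform(neighbour_labels, rng=random):
    """Draw one label; every neighbour is picked with probability 1/k."""
    return rng.choice(neighbour_labels)

neighbour_labels = ["a", "a", "b"]  # labels of the k = 3 nearest neighbours
rng = random.Random(0)              # seeded generator for a repeatable demo
draws = [sample_label_uniform(neighbour_labels, rng) for _ in range(3000)]

print(draws.count("a") / len(draws))  # close to 2/3
```

Although every neighbour is equally likely, a label carried by more neighbours is drawn proportionally more often.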
Mixin for sampling with respect to distances.
Samples the label from all k nearest neighbours with respect to their distance.
The sampling weights are inversely proportional to the distance:
>>> w[i] = 1.0/(dist(data[i],x))
With weights enabled:
>>> w[i] = N * weights[i] / dist(data[i], x)
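The two formulas above can be turned into a small sampling sketch. It assumes that all distances are strictly positive (a zero distance would divide by zero) and that N is passed in explicitly as `n_train`; both function names are made up for the example.

```python
import random

def sampling_weights(distances, sample_weights=None, n_train=None):
    """Inverse-distance sampling weights for the k nearest neighbours.

    Assumes every distance is strictly positive.
    """
    if sample_weights is None:
        return [1.0 / d for d in distances]
    # With sample weights enabled: N * weights[i] / dist(data[i], x).
    return [n_train * w / d for w, d in zip(sample_weights, distances)]

def sample_label_by_distance(labels, distances, rng, sample_weights=None, n_train=None):
    """Draw one neighbour label with probability proportional to its weight."""
    w = sampling_weights(distances, sample_weights, n_train)
    return rng.choices(labels, weights=w, k=1)[0]

labels = ["a", "b"]
distances = [1.0, 3.0]
print(sampling_weights(distances))  # weights 1 and 1/3

rng = random.Random(0)  # seeded generator for a repeatable demo
draws = [sample_label_by_distance(labels, distances, rng) for _ in range(4000)]
print(draws.count("a") / len(draws))  # close to 0.75
```

Because `random.choices` normalises the weights, the constant factor N does not change the sampling probabilities; only the ratio of sample weight to distance matters.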
Return the label of the most frequent class of the k nearest neighbours of x.
If there are two classes with the same number of occurrences in the k-neighbourhood of x, one of them is chosen at random (i.e. ties are broken randomly).
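A majority vote with random tie-breaking can be sketched as follows. This is one possible way to do it, not necessarily how the method documented here is written; `majority_label` is an illustrative name.

```python
import random
from collections import Counter

def majority_label(neighbour_labels, rng=random):
    """Most frequent label; ties between top labels are broken at random."""
    counts = Counter(neighbour_labels)
    top = max(counts.values())
    tied = [label for label, c in counts.items() if c == top]
    return rng.choice(tied)

print(majority_label(["a", "b", "a"]))  # a

# With a 2:2 tie, repeated calls return both labels over time.
rng = random.Random(1)
picks = {majority_label(["a", "b", "a", "b"], rng) for _ in range(200)}
print(sorted(picks))
```

Collecting all labels that reach the top count before drawing keeps the tie-break fair even when more than two classes are tied.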