k-Nearest Neighbour
*******************

.. contents::

.. module:: ailib.fitting

The idea of k-nearest neighbour classification is very simple. First, some
training data has to be provided. Any new data point is then assigned the
same label as the majority of its k closest training points. In the simplest
case, the algorithm works as follows:

1. Store all training points.
2. When a new data point :math:`\nu` is presented, compute its distance to
   all training points. By default, the squared Euclidean distance is used,
   i.e.

   .. math::
       d_i(\nu) = \| \nu - x_i \|^2

3. Select the k training samples with the smallest distances.
4. Return the label that appears most often among the selected training
   samples.

A minimal sketch of this procedure is given in the Examples section below.

If sample weights are given, the distance in the second step is modified:

.. math::
    d_i(\nu) = \frac{1}{N w_i} \| \nu - x_i \|^2

This adjustment makes it more probable for a sample with a large weight to
be in the set of nearest neighbours.

Instead of a majority vote on the labels, it is also possible to pick the
label of one neighbour at random, weighted by the distance to that point.

Only the base algorithm is implemented here. If a more advanced
implementation is required, have a look at the
`scikit-learn <https://scikit-learn.org/>`_ package.

Interfaces
==========

.. autoclass:: kNN
    :members:
    :show-inheritance:

Examples
========
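The following is a minimal, self-contained sketch of the base algorithm
described above. It does not use the :class:`kNN` class (whose exact
interface is documented under Interfaces); the function name
``knn_classify`` and its signature are illustrative only.

.. code-block:: python

    import numpy as np
    from collections import Counter

    def knn_classify(query, points, labels, k=3, weights=None):
        """Classify `query` by majority vote among its k nearest neighbours.

        If `weights` is given, the squared Euclidean distance to sample i
        is divided by N * w_i, so heavily weighted samples are more likely
        to end up among the nearest neighbours (cf. the weighted distance
        above).
        """
        points = np.asarray(points, dtype=float)
        labels = np.asarray(labels)
        # Step 2: squared Euclidean distance to every training point.
        dist = np.sum((points - np.asarray(query, dtype=float)) ** 2, axis=1)
        if weights is not None:
            dist /= len(points) * np.asarray(weights, dtype=float)
        # Step 3: indices of the k smallest distances.
        nearest = np.argsort(dist)[:k]
        # Step 4: majority vote over the labels of the selected samples.
        return Counter(labels[nearest]).most_common(1)[0][0]

    # Example: two training points near the origin labelled 0, two near
    # (1, 1) labelled 1; the query (0.2, 0.1) should be assigned label 0.
    X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
    y = [0, 0, 1, 1]
    print(knn_classify([0.2, 0.1], X, y, k=3))  # -> 0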
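The text above also mentions picking a label at random instead of taking a
majority vote. The exact weighting scheme is not specified here; the sketch
below assumes a selection probability inversely proportional to the squared
distance, which is one common choice.

.. code-block:: python

    import numpy as np

    def knn_classify_random(query, points, labels, k=3, rng=None):
        """Return the label of one of the k nearest neighbours, drawn at
        random with probability inversely proportional to its squared
        distance (an assumed weighting, not prescribed by this module)."""
        rng = np.random.default_rng() if rng is None else rng
        points = np.asarray(points, dtype=float)
        labels = np.asarray(labels)
        dist = np.sum((points - np.asarray(query, dtype=float)) ** 2, axis=1)
        nearest = np.argsort(dist)[:k]
        # Guard against division by zero when the query coincides with a
        # training point.
        inv = 1.0 / np.maximum(dist[nearest], 1e-12)
        probs = inv / inv.sum()
        return labels[rng.choice(nearest, p=probs)]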