In maximum likelihood methods, the Bayesian and Akaike information criteria give a measure of model quality that balances goodness of fit against model complexity.
Let $\hat{L}$ be the likelihood of the most probable model parameters with respect to the training data. Furthermore, let the model have $k$ free parameters that are estimated and let there be $n$ training samples. The two information criteria are defined in the following way:
Bayesian information criterion: $\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})$
Akaike information criterion: $\mathrm{AIC} = 2k - 2 \ln(\hat{L})$
From this definition, it can be seen that the criteria add a complexity penalty to the parameter likelihood. The reasoning is as follows: the more free parameters a model has, the better it can approximate any given set of training data. For example, consider polynomial fitting, where the number of free parameters corresponds to the order of the fitted polynomial. Given a degree of one less than the number of training samples, the polynomial can interpolate the data perfectly. But as it will be a curve through all given data pairs, its generalization power will be poor. The goal is a model with a reasonable number of parameters but at the same time a low training error (i.e. a high maximum likelihood).
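The polynomial example can be made concrete. The sketch below (function and variable names are my own) fits polynomials of increasing degree and scores them, assuming i.i.d. Gaussian errors so that both criteria reduce to functions of the residual sum of squares:

```python
import numpy as np

def gaussian_ic(y, y_hat, k):
    """AIC and BIC for i.i.d. Gaussian errors, with constant terms dropped:
    AIC = n*ln(RSS/n) + 2k,  BIC = n*ln(RSS/n) + k*ln(n)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

# Noisy samples from a quadratic: the true model has 3 free parameters.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

bic_by_degree = {}
for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)
    aic, bic = gaussian_ic(y, np.polyval(coeffs, x), k=degree + 1)
    bic_by_degree[degree] = bic

best_degree = min(bic_by_degree, key=bic_by_degree.get)
print("degree selected by BIC:", best_degree)
```

Degrees above the true one lower the training error only slightly, so the complexity penalty dominates and the smaller model is preferred.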
If the errors are i.i.d. normal with variance $\sigma^2$, the criteria can be stated (dropping constant terms) as:

$\mathrm{BIC} = n \ln\left(\frac{\mathrm{RSS}}{n}\right) + k \ln(n)$

$\mathrm{AIC} = n \ln\left(\frac{\mathrm{RSS}}{n}\right) + 2k$

where $\mathrm{RSS}$ is the residual sum of squares of the fitted model on the training data.
Instead of the Akaike information criterion, one should rather employ the corrected AIC:

$\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}$
As $n$ gets large, the two criteria become identical. If $n$ is relatively small, the original AIC may suggest models with a larger number of parameters and is thus more prone to overfitting.
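The behaviour of the correction term can be checked numerically; the helper below is a minimal sketch (the name `aicc` is my own choice):

```python
def aicc(aic: float, k: int, n: int) -> float:
    """Corrected AIC: adds a penalty that vanishes as n grows.

    aic: plain AIC value, k: number of free parameters,
    n: number of training samples (requires n > k + 1).
    """
    return aic + 2 * k * (k + 1) / (n - k - 1)

# Heavy extra penalty for a small sample, negligible for a large one:
print(aicc(100.0, 5, 10))      # -> 115.0
print(aicc(100.0, 5, 10_000))  # essentially the plain AIC
```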
The cross-validation method tries to estimate the generalization error of a model, given some training data. For $K$-fold cross-validation, the samples are split into $K$ subsets of equal size. Then, $K-1$ subsets are used as training set and the remaining subset as testing set:
Split the data into $K$ subsets.
For $i = 1, \dots, K$:
 Fit the model using all subsets except the $i$-th.
 Compute the testing error on the $i$-th subset.
The estimated generalization error is the average of the errors of the testing sets.
If $K = n$, each sample is used exactly once as testing set (consisting of only this sample). This special case is called leave-one-out cross-validation.
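The procedure above can be sketched as follows; the callables `fit`, `predict`, and `error` and their signatures are assumptions made for this example, not a prescribed interface:

```python
import numpy as np

def k_fold_cv(x, y, fit, predict, error, K, seed=0):
    """K-fold cross-validation error: average testing error over the K folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), K)  # shuffled, near-equal folds
    errors = []
    for i in range(K):
        train = np.concatenate([folds[j] for j in range(K) if j != i])
        model = fit(x[train], y[train])                 # fit on the other K-1 subsets
        errors.append(error(y[folds[i]], predict(model, x[folds[i]])))  # test on the i-th
    return float(np.mean(errors))

# Example: a degree-1 polynomial on noiseless linear data generalizes perfectly.
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 1.0
fit = lambda xs, ys: np.polyfit(xs, ys, 1)
predict = lambda model, xs: np.polyval(model, xs)
mse = lambda yt, yp: np.mean((yt - yp) ** 2)

cv_err = k_fold_cv(x, y, fit, predict, mse, K=5)
loo_err = k_fold_cv(x, y, fit, predict, mse, K=len(x))  # K = n gives leave-one-out
print(cv_err, loo_err)
```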
Given a set of training data that is noisy and contains outliers (i.e. erroneous data), the aim of the RANSAC (random sample consensus) algorithm is to determine which data points are outliers and to fit the model without them.
Given the minimal number $m$ of points required to fit the model and an error threshold $t$, the algorithm does the following:
Iterate until convergence:
 Out of all data points, select $m$ points randomly (the root set).
 Fit the model to these points.
 Compute the prediction error for all data points.
 Add all points whose error is below the threshold $t$ to the consensus set of this iteration.
Fit the model using all points in the largest consensus set.
As convergence criterion, the probability $p$ that at least one outlier-free root set was chosen can be tracked. Let $N$ be the number of iterations, $m$ the size of the root set, and $w$ the ratio of inliers to all data points. Then

$p = 1 - (1 - w^m)^N$
Because $w$ is not known beforehand, it is estimated by the ratio of the largest consensus set found so far to the total number of points. The algorithm iterates until $p$ exceeds some value close to one (e.g. $0.99$).
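A minimal sketch of the loop for 2-D line fitting follows; the parameter names and the least-squares refit via `np.polyfit` are choices made for this example, not prescribed by the algorithm:

```python
import numpy as np

def ransac_line(points, m=2, threshold=0.1, target_p=0.99, max_iter=1000, seed=0):
    """RANSAC for a 2-D line: returns (coefficients, indices of the consensus set)."""
    rng = np.random.default_rng(seed)
    x, y = points[:, 0], points[:, 1]
    n = len(points)
    best_consensus = np.array([], dtype=int)
    p, iters = 0.0, 0
    while p < target_p and iters < max_iter:
        root = rng.choice(n, size=m, replace=False)     # random root set of m points
        coeffs = np.polyfit(x[root], y[root], 1)        # fit the model to the root set
        residuals = np.abs(np.polyval(coeffs, x) - y)   # prediction error on all points
        consensus = np.flatnonzero(residuals < threshold)
        if len(consensus) > len(best_consensus):
            best_consensus = consensus
        iters += 1
        w = len(best_consensus) / n                     # estimated inlier ratio
        p = 1.0 - (1.0 - w ** m) ** iters               # P(outlier-free root set seen)
    # Final fit using all points in the largest consensus set.
    return np.polyfit(x[best_consensus], y[best_consensus], 1), best_consensus

# Synthetic data: a line y = 2x + 1 with mild noise and 10 gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.02, size=50)
y[:10] += 5.0 + 5.0 * rng.random(10)                    # push 10 points far off the line
coeffs, inliers = ransac_line(np.column_stack([x, y]), threshold=0.1)
print("slope, intercept:", coeffs)
```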
[Code listings omitted: a cross-validation routine returning the cross-validation error, and a RANSAC routine returning a model instance fitted to the optimal consensus set.]