The term *model* is used in a very general manner. A model is usually a mathematical (often statistical) tool that can be adjusted to some given data points. Once set up (trained), it can be evaluated at arbitrary input values. There are roughly two types of models:

- Classification
- Regression

In classification problems, an input point has to be assigned to a cluster. If the clusters are labeled, the collection of all clusters defines a codebook, so the problem can also be viewed as finding the optimal code for each input value. Loosely speaking, regression means curve fitting, i.e. adjusting the parameters of a mathematical function such that it optimally represents the training data. Of course, it has to be defined what *optimal* means. There is no general answer to this question, as it depends on the fitting target (the model). Often, the residual sum of squares is used, i.e. the sum of squared deviations between the measurements and the predictions of the model.
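As a concrete illustration, the residual sum of squares can be computed in a few lines (the measurement and prediction values below are invented):

```python
# Residual sum of squares (RSS): sum of squared deviations between
# measurements y and model predictions y_hat (both invented here).
y = [1.0, 2.0, 3.0]       # measured target values
y_hat = [1.1, 1.9, 3.2]   # predictions from some fitted model
rss = sum((t - p) ** 2 for t, p in zip(y, y_hat))
print(rss)  # smaller values indicate a better fit
```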

To be clear, the model only specifies how the training data is explained. For learning problems, the model is incomplete (some information is missing, e.g. parameter values), and a learning algorithm is needed to determine this information. The learning algorithms strongly depend on the input they are given. Basically, there are two situations:

- Supervised learning
- Unsupervised learning

For supervised learning, the input values are measured together with the target output values. Both are presented to the fitting algorithm, so that it can train the model to fit the targets optimally. However, targets are not always available. If there are only input values but no output values, it is an unsupervised learning problem. This implies that no error measurement can be used for training or evaluation. Instead, unsupervised learning algorithms try to find some (hidden) structure in the input data. A typical example is finding the location where the density of input points is maximal.
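The distinction can be sketched with a toy example (all data invented); the unsupervised part mirrors the density-maximum example above:

```python
from collections import Counter

# Supervised: inputs are measured together with target outputs,
# so both can be handed to a fitting algorithm.
features = [[0.0], [1.0], [2.0]]
labels = [0.1, 0.9, 2.1]          # one measured target per input

# Unsupervised: only inputs are available; e.g. look for the region
# where the density of input points is maximal.
samples = [0.2, 0.9, 1.0, 1.1, 1.2, 3.7]
bins = Counter(round(s) for s in samples)   # crude binning
densest_bin = bins.most_common(1)[0][0]     # bin with the most samples
```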

There are three kinds of information:

- Features
- Labels
- Weights

A feature is the same as one input value. It may be a single value; however, the inputs are often multidimensional, so in general the features are vectors (lists). The labels are the target values for the respective inputs. In most cases, the labels are unidimensional. However, the concrete data format (of input and output values) strongly depends on the model and is thus specified by the learning algorithm. Some algorithms also allow weighted samples. The weights are usually real-valued numbers, passed as a list to the learning algorithm.
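In the common case, the three kinds of information therefore look as follows (a hypothetical layout; the exact format is dictated by the learning algorithm):

```python
# One row per sample: feature vectors, scalar labels, real-valued weights.
features = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.0]]  # multidimensional inputs
labels = [1, 0, 1]                                # unidimensional targets
weights = [0.5, 1.0, 1.0]                         # one weight per sample

# All three lists must line up, sample by sample.
assert len(features) == len(labels) == len(weights)
```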

Usually, the training data is passed to the `fit` method separately through
the respective parameters. For unsupervised learning, the *labels* argument is ignored.
Internally, there are several ways to represent training data. For details, see *Data conversion*.

In this library, the model and learning algorithm are combined in a single class. The class
`Model` defines the interface for all learning algorithms (supervised or unsupervised).

`class ailib.fitting.Model`

Interface for fitting algorithms.

- `err((x, y))`: Return the distance between the target `y` and the model prediction at the point `x`.
- `eval(x)`: Evaluate the model at data point `x`.
- `fit(features, labels=None, weights=None)`: Fit the model to the training data.

  Parameters:
  - `features` – Training samples.
  - `labels` – Target values.
  - `weights` (`[float]`) – Sample weights.

  Returns: `self`
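A minimal sketch of a concrete model implementing this interface (the class `MeanModel` is invented for illustration; in the library it would subclass `ailib.fitting.Model`):

```python
class MeanModel:
    """Toy model: always predicts the mean of the training labels."""

    def fit(self, features, labels=None, weights=None):
        # An unsupervised algorithm would ignore `labels` here.
        self.mean = sum(labels) / len(labels)
        return self  # fit() returns self, as specified above

    def eval(self, x):
        return self.mean  # constant prediction, regardless of x

    def err(self, sample):
        x, y = sample
        return abs(y - self.eval(x))  # distance between target and prediction

model = MeanModel().fit([[0], [1], [2]], labels=[1.0, 2.0, 3.0])
prediction = model.eval([5])  # evaluate at an arbitrary input
```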

Many learning algorithms use several models to make the prediction more robust. For this, a very
general interface is defined. The class `Committee` makes it possible to collect models, and it
provides different mixins for evaluation and error measurement.

`class ailib.fitting.Committee`

Bases: `ailib.fitting.model.Model`

Interface for fitting algorithms that are based on an ensemble of several models.

Typically, the committee consists of several (possibly different) submodels. All submodels are trained first; the manner in which this happens is not known here (thus the `fit` method is not overwritten). For prediction, each trained submodel is evaluated. The final result is then based on the individual predictions. How the predictions are assembled depends on the problem type. The mixin classes provide common evaluation and error functions. Use like so:

>>> class foo(Committee.Sampling, Committee): pass

`class Committee.Classification`

Mixin for classification models.

- `err((x, y))`: 0-1 loss function for classification.
- `eval(x)`: Return the most frequently predicted class of `x`.

`class Committee.Regression`

Mixin for regression models.

- `err((x, y))`: Squared residual.
- `eval(x)`: Return the average model prediction at `x`.

`class Committee.Sampling`

Mixin for classification models. The class label is sampled from all predicted labels.

- `err((x, y))`: 0-1 loss function for classification.
- `eval(x)`: Return the class of `x`, sampled from the predictions of the individual models.

`Committee.addModel(m)`: Add a model `m` to the committee.
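To illustrate the ensemble idea, here is a toy stand-in mimicking a regression committee (`AveragingCommittee` is invented; in the library one would instead combine `Committee` with one of its mixins, as in the usage example above):

```python
class AveragingCommittee:
    """Toy regression committee: the final prediction is the
    average of the individual submodel predictions."""

    def __init__(self):
        self.models = []

    def addModel(self, m):
        self.models.append(m)   # collect an already trained submodel
        return self

    def eval(self, x):
        preds = [m(x) for m in self.models]
        return sum(preds) / len(preds)

    def err(self, sample):
        x, y = sample
        return (y - self.eval(x)) ** 2  # squared residual

# Submodels are plain callables here, standing in for trained models.
committee = AveragingCommittee()
committee.addModel(lambda x: x + 1.0)
committee.addModel(lambda x: x - 1.0)
result = committee.eval(3.0)  # average of 4.0 and 2.0
```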

**Supervised learning**

- 4.3.1. Linear Regression
- 4.3.2. Non-Linear least squares
- 4.3.3. k-Nearest Neighbour
- 4.3.4. Bagging
- 4.3.5. Boosting
- 4.3.6. Trees and Random Forests

**Unsupervised learning**