The tree growing algorithm can be sketched as follows:
Grow the tree
- Given the data, determine the optimal dimension j and splitting point s. Each candidate split is evaluated using the labels of the two (hypothetical) leaves it would create.
- Grow two child trees on the respective data subsets.
- Stop if the number of data points is below some threshold.
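The growing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ailib's implementation: it assumes NumPy arrays, a squared-error cost with each hypothetical leaf labelled by the mean of its targets (as a regression tree might do), and candidate thresholds at the midpoints between distinct feature values. The names `best_split`, `grow`, and `leaf_thres` are hypothetical.

```python
import numpy as np

def best_split(X, y):
    """Return (j, s, cost) minimizing the summed squared error of both leaves."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        # Candidate thresholds: midpoints between consecutive distinct values.
        values = np.unique(X[:, j])
        for s in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= s
            y_l, y_r = y[left], y[~left]
            # Assumption: each hypothetical leaf is labelled with its mean target.
            cost = ((y_l - y_l.mean()) ** 2).sum() + ((y_r - y_r.mean()) ** 2).sum()
            if cost < best[2]:
                best = (j, s, cost)
    return best

def grow(X, y, leaf_thres=2):
    """Grow a tree of nested dicts; stop when fewer than leaf_thres points remain."""
    node = {"label": y.mean(), "n": len(y)}
    if len(y) < leaf_thres:
        return node
    j, s, _ = best_split(X, y)
    if j is None:  # all inputs identical, so no split is possible
        return node
    left = X[:, j] <= s
    node.update(dim=j, thres=s,
                left=grow(X[left], y[left], leaf_thres),
                right=grow(X[~left], y[~left], leaf_thres))
    return node

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
tree = grow(X, y, leaf_thres=3)
```

In this toy example the root is split once and both children fall below the threshold, so they remain leaves.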
Prune the tree
- Collapse the internal node that produces the smallest per-node increase in the tree cost.
- Stop if there’s only one node left (the root).
- Out of the produced sequence of trees, select the one that minimizes the total tree cost.
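The pruning steps above can be sketched as follows. This is an illustrative sketch, not the library's code: it assumes a tree of nested dicts in which every node stores `err` (the error it would have as a leaf), that the total tree cost is the summed leaf error plus `alpha` times the number of leaves, and that the "per-node increase" is the error increase divided by the number of leaves removed. All function names are hypothetical.

```python
import copy

def leaves(node):
    """List all leaves below (and including) node."""
    if "left" not in node:
        return [node]
    return leaves(node["left"]) + leaves(node["right"])

def internal_nodes(node):
    """List all nodes that still have children."""
    if "left" not in node:
        return []
    return [node] + internal_nodes(node["left"]) + internal_nodes(node["right"])

def weakest_link(tree):
    """Internal node whose collapse raises the error least per removed leaf."""
    def increase(node):
        sub = leaves(node)
        return (node["err"] - sum(l["err"] for l in sub)) / (len(sub) - 1)
    return min(internal_nodes(tree), key=increase)

def prune(tree, alpha):
    """Collapse weakest links down to the root; return the cheapest tree seen."""
    def cost(t):
        ls = leaves(t)
        return sum(l["err"] for l in ls) + alpha * len(ls)
    tree = copy.deepcopy(tree)  # leave the caller's tree untouched
    best, best_cost = copy.deepcopy(tree), cost(tree)
    while "left" in tree:
        node = weakest_link(tree)
        del node["left"], node["right"]  # the node becomes a leaf
        if cost(tree) < best_cost:
            best, best_cost = copy.deepcopy(tree), cost(tree)
    return best

full = {"err": 10,
        "left": {"err": 1},
        "right": {"err": 4, "left": {"err": 1}, "right": {"err": 2}}}
pruned = prune(full, alpha=2)
```

With a larger `alpha`, smaller trees win; with `alpha=0` the unpruned tree is kept because collapsing never lowers the error.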
Stumps are single-node trees whose leaf labels are fixed in advance. As with full trees, learning a stump consists of finding the optimal splitting dimension and threshold. Because stumps are classifiers, the 0-1 loss function is used as the error measure, so each stump has to solve the corresponding optimization problem over these two parameters.
In addition to the training input, each training sample is assigned a weight that represents how much a misclassification of that sample contributes to the classifier cost.
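A weighted stump search of this kind might look like the sketch below. It is only an illustration under assumptions not stated in the text: labels are taken to be -1 and +1, the stump predicts -1 for inputs at or below the threshold and +1 above it, and the cost is the sum of the weights of the misclassified samples. The function name `train_stump` and its signature are hypothetical.

```python
import numpy as np

def train_stump(X, y, w=None):
    """Return (j, s, err) minimizing the weighted 0-1 loss.

    Assumes labels in {-1, +1}; predicts -1 if x[j] <= s, else +1.
    """
    n, d = X.shape
    if w is None:
        w = np.full(n, 1.0 / n)  # uniform weights when none are given
    best = (None, None, np.inf)
    for j in range(d):
        values = np.unique(X[:, j])
        # Midpoints between distinct values, plus one threshold below the minimum.
        thresholds = np.concatenate(([values[0] - 1.0],
                                     (values[:-1] + values[1:]) / 2.0))
        for s in thresholds:
            pred = np.where(X[:, j] <= s, -1, 1)
            err = w[pred != y].sum()  # weighted 0-1 loss of this split
            if err < best[2]:
                best = (j, s, err)
    return best

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, 1, -1, 1])
uniform = train_stump(X, y)
weighted = train_stump(X, y, w=np.array([0.1, 0.1, 0.7, 0.1]))
```

Putting most of the weight on the third sample moves the chosen threshold so that this sample is classified correctly, which is the effect the weights are meant to have.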
Bases: ailib.fitting.model.Model
Decision tree skeleton.
Warning
Don’t use this class directly. Use RegressionTree or ClassificationTree instead.
Parameter: leafThres (int) – Minimum number of data points per node.
Remove the data from the tree.
Only the data member is cleared. The node label is recomputed first.
Collapse the children of the node.
The children’s data is collected in this node and the label is recomputed. The children are deleted, and the node becomes a leaf.
Grow a tree on training data.
Finds the optimal splitting parameters of the data and grows two child trees (if there are enough data points left).
Parameter: data (STD) – Training data.
Prune the tree to find the tree with minimal cost.
The tree is pruned by weakest-link pruning until only one node is left. Then the tree with minimal total cost is restored.
Parameter: alpha (float) – Pruning parameter. Controls how much the tree size influences its cost.
Bases: ailib.fitting.model.Model
Parameter: labelValue (float) – Label of the class.
Train the stump on the presented data.
Finds the optimal dimension and splitting point to split the data into two subsets. The optimal parameters are found with respect to the data weights. If no weights are given, a uniform distribution is assumed.