How to grow a Decision Tree
source : [3](3.html)
LearnUnprunedTree(X, Y)
Input: X, a matrix of R rows and M columns, where X_ij = the value of the j'th attribute in the i'th input datapoint. Each column consists of either all real values or all categorical values.
Input: Y, a vector of R elements, where Y_i = the output class of the i'th datapoint. The Y_i values are categorical.
Output: an unpruned decision tree
If all records in X have identical values in all their attributes (this includes the case where R < 2), return a Leaf Node predicting the majority output, breaking ties randomly.
If all values in Y are the same, return a Leaf Node predicting this value as the output
Else
    Select m variables at random out of the M variables
    For j = 1 .. m
        If the j'th attribute is categorical
            IG_j = IG(Y|X_j)  (see Information Gain)
        Else (the j'th attribute is real-valued)
            IG_j = IG*(Y|X_j), the best gain over all split thresholds  (see Information Gain)
    Let j* = argmax_j IG_j  (this is the splitting attribute we'll use)
    If the j*'th attribute is categorical then
        For each value v of the j*'th attribute
            Let X^v = the subset of rows of X in which X_ij* = v. Let Y^v = the corresponding subset of Y
            Let Child^v = LearnUnprunedTree(X^v, Y^v)
        Return a decision tree node splitting on the j*'th attribute. The number of children equals the number of values of the j*'th attribute, and the v'th child is Child^v
    Else (the j*'th attribute is real-valued); let t be the best split threshold
        Let X^LO = the subset of rows of X in which X_ij* <= t. Let Y^LO = the corresponding subset of Y
        Let Child^LO = LearnUnprunedTree(X^LO, Y^LO)
        Let X^HI = the subset of rows of X in which X_ij* > t. Let Y^HI = the corresponding subset of Y
        Let Child^HI = LearnUnprunedTree(X^HI, Y^HI)
        Return a decision tree node splitting on the j*'th attribute. It has two children corresponding to whether the j*'th attribute is at most or above the threshold t.
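For concreteness, here is a minimal Python sketch of this recursion, assuming all attributes are real-valued and that X and Y are NumPy arrays (a categorical attribute would get one child per value instead of a threshold). The names `learn_unpruned_tree` and `best_threshold` and the dictionary node layout are inventions of this sketch, not part of any library.

```python
import numpy as np
from collections import Counter

def entropy(y):
    """H(Y) = -sum_j p_j log2 p_j over the class frequencies of y."""
    counts = np.array(list(Counter(y).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_threshold(x, y):
    """Return (IG*, t): the best information gain over split thresholds t, and that t."""
    best_ig, best_t = 0.0, None
    for t in np.unique(x)[:-1]:                       # keep both sides non-empty
        lo, hi = y[x <= t], y[x > t]
        h_cond = (len(lo) * entropy(lo) + len(hi) * entropy(hi)) / len(y)
        if entropy(y) - h_cond > best_ig:
            best_ig, best_t = entropy(y) - h_cond, t
    return best_ig, best_t

def learn_unpruned_tree(X, y, m):
    """Grow an unpruned tree, trying m attributes chosen at random at each node."""
    if len(set(y)) == 1 or len(np.unique(X, axis=0)) == 1:
        return {"leaf": Counter(y).most_common(1)[0][0]}          # majority class
    candidates = np.random.choice(X.shape[1], size=m, replace=False)
    scored = [(best_threshold(X[:, j], y), j) for j in candidates]
    (ig, t), j = max(scored, key=lambda s: s[0][0])               # argmax_j IG_j
    if t is None:                                                 # no useful split found
        return {"leaf": Counter(y).most_common(1)[0][0]}
    lo, hi = X[:, j] <= t, X[:, j] > t
    return {"attr": j, "threshold": float(t),
            "lo": learn_unpruned_tree(X[lo], y[lo], m),
            "hi": learn_unpruned_tree(X[hi], y[hi], m)}
```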
Note: There are alternatives to Information Gain for splitting nodes
source : [3](3.html)
Nominal attributes
Suppose X can take one of m values V_1, V_2, ..., V_m, with
P(X=V_1) = p_1, P(X=V_2) = p_2, ..., P(X=V_m) = p_m
H(X) = -sum_{j=1..m} p_j log2 p_j  (the entropy of X)
H(Y|X=v) = the entropy of Y among only those records in which X has value v
H(Y|X) = sum_j p_j H(Y|X=v_j)
IG(Y|X) = H(Y) - H(Y|X)
- h4. real-valued attributes
suppose X is real valued
define IG(Y|X:t) as H(Y) - H(Y|X:t)
define H(Y|X:t) = H(Y|X<t) P(X<t) + H(Y|X>=t) P(X>=t)
define IG*(Y|X) = max{~}t~ IG(Y|X:t)
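A small numerical illustration of these definitions in plain Python; the function names (`entropy`, `ig_categorical`, `ig_star`) are invented here, not taken from any library.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_j p_j log2 p_j over the class frequencies in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def ig_categorical(xs, ys):
    """IG(Y|X) = H(Y) - sum_v P(X=v) H(Y|X=v) for a nominal attribute X."""
    n = len(ys)
    h_cond = 0.0
    for v in set(xs):
        subset = [y for x, y in zip(xs, ys) if x == v]
        h_cond += (len(subset) / n) * entropy(subset)
    return entropy(ys) - h_cond

def ig_star(xs, ys):
    """IG*(Y|X) = max over thresholds t of H(Y) - H(Y|X:t) for a real-valued X."""
    n, best = len(ys), 0.0
    for t in sorted(set(xs))[1:]:                     # candidate thresholds
        lo = [y for x, y in zip(xs, ys) if x < t]
        hi = [y for x, y in zip(xs, ys) if x >= t]
        h_cond = (len(lo) / n) * entropy(lo) + (len(hi) / n) * entropy(hi)
        best = max(best, entropy(ys) - h_cond)
    return best

print(ig_categorical(["sunny", "sunny", "rain", "rain"], ["no", "no", "yes", "no"]))  # ~0.311
print(ig_star([1.0, 2.0, 3.0, 4.0], ["no", "no", "yes", "yes"]))                      # 1.0
```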
How to grow a Random Forest
source : [1](1.html)
Each tree is grown as follows:
- If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
- If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest is grown.
- Each tree is grown to the largest extent possible. There is no pruning.
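A rough sketch of this loop; the names `grow_forest` and `learn_tree` are placeholders, and `learn_tree` could be the `learn_unpruned_tree` routine sketched earlier on this page.

```python
import numpy as np

def grow_forest(X, y, n_trees, m, learn_tree):
    """Grow n_trees trees; each is trained on a bootstrap sample of the N cases,
    and learn_tree(X, y, m) tries m randomly chosen attributes at every node."""
    n = X.shape[0]
    forest = []
    for _ in range(n_trees):
        idx = np.random.randint(0, n, size=n)     # sample N cases with replacement
        forest.append(learn_tree(X[idx], y[idx], m))
    return forest
```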
Random Forest parameters
source : [2](2.html)
Random Forests are easy to use; the only two parameters a user of the technique has to determine are the number of trees to be used and the number of variables (m) to be randomly selected from the available set of variables.
Breiman's recommendations are to pick a large number of trees, and to use the square root of the number of variables for m.
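By way of example only (this page does not prescribe a library), these two recommendations map directly onto scikit-learn's RandomForestClassifier parameters:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(
    n_estimators=500,       # a large number of trees
    max_features="sqrt",    # m = square root of the number of variables
)
clf.fit(X, y)
```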
How to predict the label of a case
Classify(node, V)
Input: node, a node of the decision tree; if node.attribute = j then the split is done on the j'th attribute
Input: V, a vector of M columns where V_j = the value of the j'th attribute
Output: the label of V

If node is a Leaf then
    Return the value predicted by node
Else
    Let j = node.attribute
    If the j'th attribute is categorical then
        Let v = V_j
        Let child^v = the child node corresponding to the attribute's value v
        Return Classify(child^v, V)
    Else (the j'th attribute is real-valued)
        Let t = node.threshold  (the split threshold)
        If V_j <= t then
            Let child^LO = the child node corresponding to (<= t)
            Return Classify(child^LO, V)
        Else
            Let child^HI = the child node corresponding to (> t)
            Return Classify(child^HI, V)
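A matching Python sketch of Classify for the dictionary nodes produced by the `learn_unpruned_tree` sketch given earlier (real-valued attributes only); the node layout is an assumption of that sketch.

```python
def classify(node, v):
    """Walk the tree from `node` down to a leaf and return its predicted label.
    `v` is a vector of attribute values indexed like the training columns."""
    if "leaf" in node:                        # Leaf: return the stored prediction
        return node["leaf"]
    if v[node["attr"]] <= node["threshold"]:  # at or below the split threshold
        return classify(node["lo"], v)
    return classify(node["hi"], v)
```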
The out of bag (oob) error estimation
source : [1](1.html)
In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows:
- Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.
- Put each case left out of the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob. The proportion of times that j is not equal to the true class of n, averaged over all cases, is the oob error estimate. This has proven to be unbiased in many tests.
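A rough sketch of this estimate, assuming each tree is stored together with the indices of the cases left out of its bootstrap sample and that a `classify(tree, x)` routine like the one sketched above is available; all names here are placeholders for illustration.

```python
from collections import Counter

def oob_error(trees_with_oob, X, y, classify):
    """trees_with_oob: list of (tree, oob_indices) pairs, one per tree."""
    votes = [Counter() for _ in range(len(y))]        # per-case oob vote tallies
    for tree, oob_idx in trees_with_oob:
        for i in oob_idx:
            votes[i][classify(tree, X[i])] += 1       # vote only where case i was oob
    wrong = counted = 0
    for i, tally in enumerate(votes):
        if tally:                                     # case i was oob at least once
            counted += 1
            if tally.most_common(1)[0][0] != y[i]:    # majority oob vote vs true class
                wrong += 1
    return wrong / counted
```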
Other RF uses
source : [1](1.html)
- variable importance
- gini importance
- proximities
- scaling
- prototypes
- missing values replacement for the training set
- missing values replacement for the test set
- detecting mislabeled cases
- detecting outliers
- detecting novelties
- unsupervised learning
- balancing prediction error
Please refer to [1](1.html)
for a detailed description
References
[1](1.html)
Random Forests - Classification Description
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
[2](2.html)
B. Larivière & D. Van Den Poel, 2004. "Predicting Customer Retention and Profitability by Using Random Forests and Regression Forests Techniques," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 04/282, Ghent University, Faculty of Economics and Business Administration.
Available online : http://ideas.repec.org/p/rug/rugwps/04-282.html
[3](3.html)
Decision Trees - Andrew W. Moore
http://www.cs.cmu.edu/~awm/tutorials
[4](4.html)
Information Gain - Andrew W. Moore
http://www.cs.cmu.edu/~awm/tutorials