x

Random forest is ensemble of decision trees characterized by randomness introduced inlearning process. As it can be seen from previous examples, randomness can be introduced intraining data, feature space, target variables and splitting process within nodes. Thoseapproaches are not mutually exclusive and very often are combined.Definition: A random forest is a classifier consisting of a collection of tree structuredclassifiers of tree-structured classifiers {h(x, ? k ), k = 1, … , } where {? k } are independentidentically distributed random vectors and each tree casts a unit vote for the most popularclass at input x.The main motivation for development of ensemble tree learning model is over fitting problemthat appears when only one decision tree is used. It can be shown that for large enoughnumber of trees in the forest it does not over fit. If we consider set of classifiers {h k } trainedwith dataset(X, Y)randomly sampled, generalization error is defined as :Err g = P (X,Y) (m(X, Y) < 0) (2.15)m(X, Y) = Avg k (I(h k (X) = Y)) ? max j?Y Avg k (I(h k (X) = j)) (2.16)13where m(X, Y) is margin, that is equal to difference between average number of votes forcorrect class and average of incorrect votes for most likely incorrectly chosen class. Thelarger the margin is, the classification is more accurate. It can be shown that due to treestructure and Strong Low of Large Numbers, as number of trees increases in ensemble,generalization error converges to:Err g = P (X,Y) (P ? (h(X, ?) = Y) ? max j?Y P ? (h(X, ?) = j) < 0)(2.17)Generalization error of the tree depends on the strength of each individual tree and correlationbetween them. Strength of the forest composed of set of classifiers {h(X, ?)} is defined as:s = E X,Y (m(X, Y)) (2.18)m(X, Y) = P ? (h(X, ?) = Y) ? max j?Y P ? (h(X, ?) = j) (2.19)Where m(X, Y) is margin function:Raw margin function is given as:rm(?, X, Y) = I(h(X, ?) = Y) ? max j?Y P ? (h(X, ?) = j)(2.20)And margin function represents expectation of raw margin function over?. If we have twoindependent random variables ?and ? ? , with same distribution and we denote with ?? meanvalue of correlation between them, upper bound of generalization error is given as: