Regression Trees

Regression trees are supervised learning methods that address multiple regression problems. They provide a tree-based approximation \(\hat{f}\) of an unknown regression function \(Y = f(\mathbf{x}) + \varepsilon\), with \(Y \in \mathbb{R}\) and \(\varepsilon \sim N(0, \sigma^2)\), based on a given sample of data \(D = \{\langle x_i^1, \cdots, x_i^p, y_i \rangle\}_{i=1}^n\). The obtained models consist of a hierarchy of logical tests on the values of any of the p predictor variables. The terminal nodes of these trees, known as the leaves, contain the numerical predictions of the model for the target variable Y.
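
The structure just described, a hierarchy of logical tests ending in leaves that hold numerical predictions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Split`, `Leaf`, and `predict` are hypothetical, every test is assumed to take the form "predictor value <= threshold" on a numeric predictor, and the example tree and its values are made up rather than fitted to any data.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    """Terminal node: stores the numerical prediction for the target Y."""
    prediction: float


@dataclass
class Split:
    """Internal node: a logical test on one of the p predictor variables."""
    feature: int          # index of the predictor being tested (0-based)
    threshold: float      # the test is x[feature] <= threshold
    if_true: "Node"       # subtree followed when the test holds
    if_false: "Node"      # subtree followed otherwise


Node = Union[Leaf, Split]


def predict(node: Node, x: Sequence[float]) -> float:
    """Follow the tests from the root down to a leaf and return its value."""
    while isinstance(node, Split):
        if x[node.feature] <= node.threshold:
            node = node.if_true
        else:
            node = node.if_false
    return node.prediction


# Hypothetical hand-built tree over p = 2 predictors (not learned from data).
tree = Split(
    feature=0, threshold=5.0,
    if_true=Leaf(1.5),
    if_false=Split(feature=1, threshold=2.0,
                   if_true=Leaf(3.0),
                   if_false=Leaf(7.25)),
)

print(predict(tree, [4.0, 0.0]))
print(predict(tree, [6.0, 3.0]))
```

Each call to `predict` visits at most one node per level, so the cost of a prediction grows with the depth of the tree rather than with the size of the training sample.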

Motivation and Background

Work on regression trees goes back to the AID system by Morgan and Sonquist (1963). Nonetheless, the seminal work is the book Classification and Regression Trees by Breiman and colleagues (1984). This book established several standards in many theoretical aspects of tree-based modeling.