I have implemented several predictive models using Random Forests. Here are my thoughts on why they work well.
Why are Random Forests so effective?
Ensemble Method: A random forest is an ensemble of many decision trees whose individual predictions are combined (by majority vote for classification, or by averaging for regression) to improve overall accuracy.
Foundation on Decision Trees: Decision trees, the building blocks of a random forest, classify observations by sequentially splitting the data on feature values.
Bagging (Bootstrap Aggregation): Each tree is trained on a bootstrap sample, i.e. a random sample drawn with replacement from the original dataset, which introduces variability among the trees.
Feature Randomness: At each split, a tree considers only a random subset of the features rather than all of them. This fosters tree diversity and reduces the correlation among their predictions.
Wisdom of Crowds: The collective vote of many uncorrelated trees yields more accurate and robust predictions, because the errors of individual trees tend to cancel out.
Risk Reduction Analogy: The referenced article uses a gambling analogy: splitting a bet into many independent "plays" (trees) produces more stable and reliable outcomes, even though each play has the same expected value.
Prerequisites for Success: The features must carry genuine predictive signal, and the errors of the individual trees must be largely uncorrelated; only then does the ensemble outperform any single decision tree.
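The bagging-plus-feature-randomness recipe above can be sketched by hand with scikit-learn decision trees. This is a minimal illustration, not how the library actually implements random forests: a real random forest resamples features at every split, whereas this sketch picks one feature subset per tree (the random-subspace variant). All dataset sizes and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_trees, max_features = 25, 4
trees, subsets = [], []
for i in range(n_trees):
    rows = rng.integers(0, len(X_tr), len(X_tr))  # bootstrap sample (with replacement)
    cols = rng.choice(X.shape[1], max_features, replace=False)  # random feature subset
    trees.append(DecisionTreeClassifier(random_state=i)
                 .fit(X_tr[rows][:, cols], y_tr[rows]))
    subsets.append(cols)

# Majority vote: average the 0/1 predictions of all trees, threshold at 0.5
# (n_trees is odd, so there are no ties).
votes = np.stack([t.predict(X_te[:, c]) for t, c in zip(trees, subsets)])
ensemble_acc = ((votes.mean(axis=0) > 0.5).astype(int) == y_te).mean()
print(f"ensemble accuracy: {ensemble_acc:.2f}")
```

Even though each tree sees only a noisy bootstrap sample and a slice of the features, the vote across 25 such trees is substantially more stable than any single tree.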
Feature Importances
Interpretability is Crucial: Beyond high predictive accuracy, knowing which features drive a model's decisions is essential for trust, actionable insights, and regulatory compliance in many business applications.
Multiple Methods Provide Complementary Insights:
Default Feature Importances: Quick and straightforward but can be biased toward features with more unique values.
Permutation Importance: A model-agnostic approach that measures the drop in performance when a feature's values are randomly shuffled. However, it can overstate the importance of correlated predictors.
Drop Column Importance: Retraining the model without a feature accurately measures its impact, even highlighting features that might be detrimental (negative importance).
Default feature importances are available directly from the scikit-learn and Spark MLlib implementations after training.
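The three methods above can be computed for the same scikit-learn model as follows. This is a sketch on synthetic data; the dataset shapes and hyperparameters are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=6, n_informative=3,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# 1. Default (impurity-based) importances, computed during training.
default_imp = rf.feature_importances_

# 2. Permutation importance: score drop when one feature's values are
#    shuffled; evaluated on held-out data so it reflects generalization.
perm_imp = permutation_importance(rf, X_te, y_te, n_repeats=10,
                                  random_state=0).importances_mean

# 3. Drop-column importance: retrain without each feature; a negative value
#    means the model actually scored better without that feature.
base = rf.score(X_te, y_te)
drop_imp = [base - RandomForestRegressor(n_estimators=100, random_state=0)
                   .fit(np.delete(X_tr, j, 1), y_tr)
                   .score(np.delete(X_te, j, 1), y_te)
            for j in range(X.shape[1])]
```

Comparing the three rankings side by side is a quick sanity check: features that rank highly under all three methods are the ones most safely treated as genuinely important.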
Visual Explanation for Each Prediction
Need for Interpretability: Analysts and data scientists often require clear explanations for model decisions, whether to justify a flagged fraudulent transaction or to understand shifts in model behavior over time.
Random Forests as Black Boxes: While random forests are powerful, their ensemble of many deep decision trees makes them difficult to interpret using traditional methods. Conventional feature importance metrics (like permutation importance or impurity reduction) offer only a static, overall view.
Decision Path Decomposition: Tracing the path from the root to the leaf in each decision tree allows one to express each prediction as the sum of a bias (the initial value at the root) and feature contributions accumulated along the path. This method provides a dynamic explanation for individual predictions.
Mathematical Insight: A single tree's prediction can be written as f(x) = c_full + Σₖ₌₁ᴷ contrib(x, k), where c_full is the value at the root node (the bias) and contrib(x, k) is the contribution of the feature used at the k-th split along x's decision path. For a random forest, the bias and the per-feature contributions are averaged across all trees.
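This decomposition can be verified directly on a single scikit-learn regression tree by walking the decision path and crediting each change in node value to the feature used at that split. The sketch below captures the idea behind treeinterpreter using sklearn's internal tree_ arrays (an assumption-laden illustration, not the library's code); for a full forest you would average the bias and contributions over all trees.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
t = reg.tree_

def decompose(x):
    """Split a single prediction into bias + per-feature contributions."""
    bias = t.value[0][0][0]              # root value = mean of training targets
    contrib = np.zeros(X.shape[1])
    node = 0
    while t.children_left[node] != -1:   # -1 marks a leaf node
        f = t.feature[node]              # feature used at this split
        parent_val = t.value[node][0][0]
        node = (t.children_left[node] if x[f] <= t.threshold[node]
                else t.children_right[node])
        # the change in node value along the path is credited to feature f
        contrib[f] += t.value[node][0][0] - parent_val
    return bias, contrib

bias, contrib = decompose(X[0])
pred = reg.predict(X[:1])[0]
print(f"{pred:.3f} = {bias:.3f} (bias) + {contrib.sum():.3f} (contributions)")
```

The per-feature deltas telescope from the root value to the leaf value, so bias + contrib.sum() reproduces the tree's prediction exactly, which is precisely the identity in the formula above.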
Practical Application: Tools such as the treeinterpreter library make it feasible to apply these interpretability techniques to real-world models built with scikit-learn, thereby transforming a black-box model into one that provides actionable insights.
This interpretability framework makes it possible to:
Explain to stakeholders why a specific prediction was made.
Debug unexpected model behavior by pinpointing influential features.
Compare model behavior across different datasets by analyzing changes in feature contributions.
References
Understanding Random Forest (Tony Yiu - Towards Data Science)
How the Algorithm Works and Why it Is So Effective
treeinterpreter (ando - Python Package Index)
Package for interpreting scikit-learn's decision tree and random forest predictions.
Explaining Feature Importance by example of a Random Forest (Eryk Lewinson - Towards Data Science)
Learn the most popular methods of determining feature importance in Python.