Machine Learning - Made Easy To Understand
If you are looking for a book to help you understand how the machine learning algorithms "Random Forests" and "Decision Trees" work behind the scenes, then this is a good book for you. Those two algorithms are commonly used in a variety of applications, including big data analysis in industry and data analysis competitions such as those on Kaggle.
This book explains how Decision Trees work and how they can be combined into a Random Forest to reduce many of the common problems of a single tree, such as overfitting the training data.
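To give a flavor of that comparison, here is a minimal sketch using scikit-learn and a synthetic dataset (both my own assumptions, not the book's files): a single fully grown tree scores almost perfectly on its training data but worse on held-out data, while a forest of such trees narrows that gap.

```python
# A minimal sketch (not the book's code): compare how a single, fully grown
# decision tree and a random forest of such trees perform on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single unpruned tree tends to memorize the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# A random forest averages many decorrelated trees, which usually
# reduces the overfitting of any one tree.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(f"Tree   train {tree.score(X_train, y_train):.3f}  test {tree.score(X_test, y_test):.3f}")
print(f"Forest train {forest.score(X_train, y_train):.3f}  test {forest.score(X_test, y_test):.3f}")
```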
Several Dozen Visual Examples
Equations are great for really understanding every last detail of an algorithm. But to get a basic idea of how something works, in a way that will stick with you six months later, nothing beats pictures. This book contains several dozen images that illustrate things such as how a decision tree picks which splits to make, how a decision tree can overfit its data, and how multiple decision trees can be combined to form a random forest.
This Is Not A Textbook
Most books and other material on machine learning that I have seen fall into one of two categories: they are either textbooks that explain an algorithm along the lines of "And then the algorithm optimizes this loss function," or they focus entirely on how to set up code to use the algorithm and how to tune its parameters.
This book takes a different approach. It starts with simple examples of how Decision Trees and Random Forests work and builds on those examples step by step to cover the more complicated parts of the algorithms. The actual equations behind decision trees and random forests are explained by breaking them down and showing what each part of the equation does, and how it affects the examples in question.
Python Files & Excel File For Many Of The Examples Shown In The Book
Some topics in machine learning don't lend themselves to equations in an Excel table. Things like error checking or complicated conditionals are hard to replicate outside of code. However, some topics work quite well in a spreadsheet. Entropy and information gain, the measures a decision tree uses to pick its splits, are easy to calculate there. The spreadsheet used to generate many of the examples in this book is available as a free download, as are all of the Python scripts that ran the Random Forests & Decision Trees in this book and generated many of the plots and images.
If you are someone who learns by playing with the code, and editing the data or equations to see what changes, then use those resources along with the book for a deeper understanding.
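To give a flavor of what that entropy and information gain calculation looks like, here is a minimal sketch of the standard formulas in plain Python with NumPy. This is my own illustration, not one of the book's scripts or its spreadsheet:

```python
# A sketch of the standard entropy / information gain formulas a decision
# tree uses to score a candidate split (not taken from the book's files).
import numpy as np

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

# Toy split: 8 samples in the parent node, divided into two branches.
parent = ["A", "A", "A", "A", "B", "B", "B", "B"]
left   = ["A", "A", "A", "B"]
right  = ["A", "B", "B", "B"]
print(entropy(parent))                        # 1.0 bit
print(information_gain(parent, left, right))  # about 0.19 bits
```

Changing the labels in the two branches and re-running the script is the same kind of experiment the spreadsheet invites.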
Topics Covered
The topics covered in this book are:
- An overview of decision trees and random forests
- A manual example of how a human would classify a dataset, compared to how a decision tree would work
- How a decision tree works, and why it is prone to overfitting
- How decision trees get combined to form a random forest
- How to use that random forest to classify data and make predictions
- How to determine how many trees to use in a random forest
- Just where the "randomness" comes from
- Out of Bag Errors & Cross Validation - how well did the machine learning algorithm fit the data? (see the sketch after this list)
- Gini Criterion & Entropy Criterion - how to tell which split on a decision tree is best among many possible choices
- And More
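To make a few of those topics concrete, here is a minimal sketch of using the out-of-bag error to judge how the fit improves as trees are added, and of choosing between the Gini and entropy split criteria. It uses scikit-learn and a synthetic dataset, which are my assumptions rather than the book's own scripts:

```python
# A sketch (not the book's code): track the out-of-bag (OOB) error as the
# number of trees grows, using the entropy split criterion.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)

for n_trees in (25, 100, 400):
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        criterion="entropy",   # or "gini", scikit-learn's default
        oob_score=True,        # score each sample only with trees that never saw it
        random_state=0,
    ).fit(X, y)
    print(f"{n_trees:4d} trees  OOB error: {1 - forest.oob_score_:.3f}")
```

The OOB error typically flattens out once enough trees have been added, which is one practical way to decide how many trees a forest needs.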
If you want to know more about how these machine learning algorithms work, but don't need to reinvent them, then this is a good book for you.