Make better models with ensemble techniques and show them to the world with Shiny.
In our August meetup, we were happy to welcome Valeria Fonseca Diaz to Amsterdam to tell us all about ensemble machine learning methods. Thanks go as well to our sponsor Deloitte for hosting us and providing great food and drinks.
Ensemble methods are powerful machine learning techniques that enhance predictive power by combining the (different) predictions of multiple models. “Why choose, if you can combine them all!”
During her talk, Valeria explained the four main ensemble techniques in machine learning: voting, bagging, boosting and stacking.
Depending on the type of model, regression or classification, voting generally means:

- Regression: averaging the numeric predictions of the individual models.
- Classification: taking the majority vote over the predicted classes.
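As a minimal sketch of majority voting with caret on the built-in iris data (the choice of base models is purely illustrative):

```r
library(caret)

# Train a few different classifiers on the same data
set.seed(42)
models <- list(
  knn  = train(Species ~ ., data = iris, method = "knn"),
  tree = train(Species ~ ., data = iris, method = "rpart"),
  lda  = train(Species ~ ., data = iris, method = "lda")
)

# One column of predicted classes per model
preds <- sapply(models, predict, newdata = iris)

# Majority vote: the most frequent predicted class per observation
vote <- apply(preds, 1, function(p) names(which.max(table(p))))
head(vote)
```

For a regression task you would instead average the numeric predictions, e.g. with rowMeans(preds).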
Bagging stands for bootstrap aggregating. It works well with large datasets, as it resamples smaller subsets of a large dataset, each of which may have different characteristics. You estimate a model on each subset and then combine the outcomes of all the models. The best-known bagging technique is the random forest.
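A minimal random-forest sketch with caret, assuming the randomForest package is installed (the data and resampling settings are illustrative):

```r
library(caret)

set.seed(42)
# Random forest: an ensemble of decision trees, each grown on a
# bootstrap resample with a random subset of predictors per split
rf_fit <- train(Species ~ ., data = iris, method = "rf",
                trControl = trainControl(method = "cv", number = 5))

rf_fit$results  # cross-validated accuracy per value of mtry
```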
Boosting can be viewed as a type of gradient descent. In each iteration of the learning process, a new model is trained on the errors of the previous one. The most popular algorithm is AdaBoost.
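AdaBoost itself needs an extra package, so here is a comparable sketch using stochastic gradient boosting (caret method "gbm"), which follows the same idea of fitting each new model to the errors made so far; the data and settings are illustrative:

```r
library(caret)

set.seed(42)
# Gradient boosting: each new tree is fit to the residual errors
# of the ensemble built in the previous iterations
gbm_fit <- train(Sepal.Length ~ ., data = iris, method = "gbm",
                 verbose = FALSE,
                 trControl = trainControl(method = "cv", number = 5))

gbm_fit$bestTune  # the tuning parameters chosen by cross-validation
```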
Stacking can combine models of different types; the only limits are your creativity and insight. Typically a new model, the meta-learner, is trained on the outputs of the (multiple) previous model(s).
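A minimal stacking sketch, assuming the caretEnsemble package (the base learners and the glm meta-learner are illustrative choices):

```r
library(caret)
library(caretEnsemble)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 5,
                     savePredictions = "final")

# Level 0: several base learners trained with shared resampling
base_models <- caretList(Sepal.Length ~ ., data = iris,
                         trControl = ctrl,
                         methodList = c("lm", "rf", "knn"))

# Level 1: a meta-learner trained on the base models' predictions
stack_fit <- caretStack(base_models, method = "glm")
```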
Have a look at the meetup material in our GitHub repository to learn more about how to use ensemble methods in R with the caret package, and then show them in your own Shiny app.
If you’re new to Shiny, check out our previous workshop on an intro to R Shiny, or get inspired by the R Shiny gallery.
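As a starting point, here is a minimal sketch of a Shiny app that serves predictions from a caret model; everything in it (model, inputs, data) is illustrative, and in a real app you would load a pre-trained ensemble rather than fit one at startup:

```r
library(shiny)
library(caret)

# Illustrative model fitted at startup; in practice you would load a
# pre-trained ensemble, e.g. model <- readRDS("ensemble_model.rds")
model <- train(Species ~ Petal.Length + Petal.Width,
               data = iris, method = "rf")

ui <- fluidPage(
  titlePanel("Ensemble prediction demo"),
  numericInput("plen", "Petal length (cm):", value = 4.0, min = 1, max = 7, step = 0.1),
  numericInput("pwid", "Petal width (cm):", value = 1.3, min = 0.1, max = 2.5, step = 0.1),
  textOutput("pred")
)

server <- function(input, output) {
  output$pred <- renderText({
    newdata <- data.frame(Petal.Length = input$plen, Petal.Width = input$pwid)
    paste("Predicted species:", as.character(predict(model, newdata)))
  })
}

shinyApp(ui = ui, server = server)
```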