Avoid Poller-Coaster Whiplash by Separating the Signal From the Noise

Evince's Andy Hoagland created an ensemble model to predict the outcome of the U.S. Presidential election. His algorithm eliminates a lot of the volatility seen in other prominent election prognosticators, such as FiveThirtyEight and The Upshot. Here, he explains how, and why, he has been able to accomplish this feat.

Who is the best pollster? Poll aggregators figured out how to tackle this question years ago. Nate Silver put presidential forecasting on the map in 2008, and his name became synonymous with election forecasting after his 51-for-51 electoral college call in 2012.


This presidential election season, more aggregators are in the field, each using a different approach. Add to that the advances since 2012 in cloud computing, data science, and machine learning. Naturally, we wanted to get a model out there, but there was debate over our approach.


One thing I learned from participating in several Kaggle data science competitions is that there is no silver bullet algorithm, although XGBoost is pretty close. Data prep, feature engineering, and feature selection are key. A single model put me in 18th place out of 2,257. The winner of that competition took their model one step (actually, several steps) forward.


When you look at data science competitions on Kaggle, the winners all seem to have one thing in common: ensemble models. Take your best model, then mix it with another that uses a slightly different methodology. For the past several years, each winning solution has used some variation of ensemble modeling. We decided to do the same. That is how we came up with the Aggregators Ensemble.
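To make the "mix it with another" idea concrete, here is a minimal sketch of one common way to blend forecasts: a weighted average of win probabilities. This is an illustration only, not the Aggregators Ensemble itself. The function name, the source names, the weights, and the probabilities are all hypothetical.

```python
def blend_probabilities(forecasts, weights=None):
    """Return the weighted average of win probabilities from several sources.

    forecasts: dict mapping a source name to a probability between 0 and 1.
    weights:   optional dict mapping the same names to non-negative weights;
               if omitted, every source counts equally.
    """
    names = list(forecasts)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(forecasts[name] * weights[name] for name in names)
    return weighted_sum / total_weight


# Hypothetical win probabilities for one candidate in one state.
example_forecasts = {"own_model": 0.62, "aggregator": 0.58, "market": 0.66}
blended = blend_probabilities(example_forecasts)
print(f"Blended win probability: {blended:.2f}")
```

Equal weights are the simplest choice. Giving more weight to sources with a better track record is a natural extension, but how to set those weights is a modeling decision this sketch does not address.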

Screenshot from Andy Hoagland's Aggregators Ensemble model. Visuals by Mike Cisneros.



Not only did we create our own model to predict the electoral college outcome of each state (and D.C.), but we also included forecasts from poll aggregators, prediction markets, and qualitative experts. Once we had our state probabilities, we ran over 20,000 simulations to see how often each candidate crossed the 270-electoral-vote threshold to become the next President of the United States.
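The simulation step can be sketched roughly as follows. This is a simplified illustration, not our production code: it assumes each state is decided independently by a random draw against its win probability, and the function name, the three-region toy map, and its numbers are hypothetical.

```python
import random


def simulate_win_rate(states, n_sims=20_000, threshold=270, seed=0):
    """Estimate how often a candidate reaches the electoral vote threshold.

    states: dict mapping a state name to (electoral_votes, win_probability).
    Assumption: states are simulated independently of one another.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs match
    wins = 0
    for _ in range(n_sims):
        # The candidate wins a state when the random draw falls below
        # that state's win probability.
        votes = sum(ev for ev, p in states.values() if rng.random() < p)
        if votes >= threshold:
            wins += 1
    return wins / n_sims


# Hypothetical toy map: one safe region, one toss-up, one safe for the opponent.
toy_map = {
    "safe_for": (200, 1.0),
    "toss_up": (100, 0.5),
    "safe_against": (238, 0.0),
}
win_rate = simulate_win_rate(toy_map)
print(f"Share of simulations won: {win_rate:.3f}")
```

In this toy map the candidate wins only by carrying the toss-up, so the share of simulations won lands near one half. With a full map of 50 states and D.C., the same loop yields the overall chance of crossing 270.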


Check out the interactive for yourself: http://tabsoft.co/2dByh94