Quality data makes quality models - The Presidential Election Use Case

The era of big data gave us data in high quantity, but quantity is meaningless if the data is low quality. Like other forecast models, we used polling data, but we also used qualitative reports like The Cook Political Report and Sabato’s Crystal Ball. We used these reports to smooth out polling response bias, but there were larger issues at hand.

So what went wrong?

On November 1st, we ran a “what if” alternative analysis. The premise was simple: “What if all the polls were wrong?” We looked at 471 state polls that were published after August 1st. If a state had even a single poll with a Trump lead, we gave that state to Trump. This election map may very well be the most accurate forecast in the United States. (Note - at the time of posting, the election had been called for Trump, but we were awaiting the final tally from several states.)
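For readers who want to see the rule spelled out, here is a minimal sketch of the “what if” assignment in Python. It is not our production code: the record layout, the `what_if_map` function, the "Not Trump" label, and the toy polls are all assumptions made up for illustration, and the 2016 year in the dates is assumed as well. They are not the 471 polls from the analysis.

```python
from collections import defaultdict
from datetime import date

# Hypothetical poll records: (state, publication date, Trump margin in points).
# A positive margin means Trump led that poll. All values are invented.
POLLS = [
    ("State A", date(2016, 9, 10), -4.0),
    ("State A", date(2016, 10, 2), 1.0),    # a single Trump lead
    ("State B", date(2016, 9, 20), -6.0),
    ("State B", date(2016, 10, 15), -3.0),
    ("State C", date(2016, 7, 15), 2.0),    # published before the cutoff, ignored
    ("State C", date(2016, 10, 1), -1.0),
]

# Only polls published after August 1st count (year assumed for illustration).
CUTOFF = date(2016, 8, 1)


def what_if_map(polls, cutoff=CUTOFF):
    """Give a state to Trump if any poll published after the cutoff shows a Trump lead."""
    trump_led = defaultdict(bool)
    for state, published, trump_margin in polls:
        if published > cutoff:
            # One Trump lead is enough; later polls cannot undo it.
            trump_led[state] = trump_led[state] or trump_margin > 0
    return {state: ("Trump" if led else "Not Trump") for state, led in trump_led.items()}


result = what_if_map(POLLS)
```

In this toy data, State A goes to Trump on the strength of one poll, while State C does not, because its only Trump lead was published before the cutoff.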

 

Simply put, data quality determines the quality of your insights. It is up to the pollsters and pundits to decide whether they were wrong about the polls or the voter turnout. It is up to the pundits to determine if reporting on the polling had any impact on voter turnout.

What we can say from experience is that objectively collected data on actions and behavior is far more accurate than first-person self-reporting.

In the end, good data science on low-quality data makes for low-quality output.