False Confessions: When Data Doesn't Tell a Story

In data visualization circles, “storytelling” is making a strong push to be the buzzword of 2014. The premise is that data analysis isn't just about gathering data, performing investigations, and reporting findings—it’s about having a story to tell.

Like a playwright or bard of old, your job now includes making sure there’s a narrative path for your intended audience to follow, and guiding them smoothly from point to point.

As a data visualizer, your final product is expected to accomplish this gracefully, with deft use of text and visuals in just the right proportions. Then, and only then, have you completed your analysis. The message seems to be: if your final product isn’t telling a story, then there is more work to be done.

But what happens when there's no story to tell?

Clients often come to us with a finite data set and a specific question or set of questions. These questions are loosely of the format: based on the data we provide, or the data that exists in the world, do you see evidence of anything that should concern us?

So, in service of answering the customer's question, we:

  1. gather and clean an enormous—but finite—amount of data,
  2. make sure it is internally consistent, and
  3. extract whatever insight we can from the patterns, anomalies, and relationships we find.

Many times, after all this effort, we discover...well, nothing. In some cases there truly is not anything of concern to the client. But the allure and mystique of Big Data makes it more difficult for us to tell a customer "we didn't find anything." We're afraid that the customer will respond, "What kind of story is that? I thought Big Data was supposed to find all kinds of patterns and insights for us! You must not be looking hard enough."

So with that pressure to find something, anything, of note, there's a danger of getting caught in an appeasement trap: running analysis after analysis, in permutation after permutation, including and excluding various parts of your dataset until something turns up.

Eventually, if you look hard enough, you'll come up with something to report that isn't "we didn't find anything." It may not be earth-shattering, or novel, or even especially relevant, but it's something. It may even cross the magical p-value threshold into Statistical Significance.

But that doesn't mean it's a genuine finding.

Common interrogation techniques, which use coercion and extreme conditions to wear down the person being questioned, sometimes elicit false confessions from innocent people. In the same way, extreme interrogation of your data can elicit false confessions in your analysis.

Repeated slicing and dicing and rejiggering of parameters, in the quest for a significant result, can, paradoxically, bring the analysis further away from the truth. Yes, you may find something statistically significant; but if it took 100 trials to get to that result, how confident are you that you aren't looking at something that arose from random chance? At the conventional 0.05 threshold, 100 tests run on pure noise would be expected to produce about five "significant" results all by themselves.
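To put a rough number on that risk, here is a minimal Python sketch. It rests on assumptions that are not part of any real project: the 100 analyses are treated as independent, the threshold is the conventional 0.05, and there is truly nothing to find, so each p-value is just a uniform random draw. The function names are made up for illustration. In practice, repeated slices of the same dataset are correlated, so treat the result as an illustration rather than an exact figure.

```python
import random


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests when there is no real effect anywhere in the data."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_null_projects(n_tests: int, alpha: float, n_runs: int, seed: int = 0) -> float:
    """Fraction of simulated projects that 'find something' in pure noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        # With no real effect, a p-value is uniformly distributed on [0, 1],
        # so each analysis is modeled as a single uniform draw (an assumption).
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_runs


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Stricter per-test threshold that keeps the overall false-positive
    rate at or below alpha (the Bonferroni correction)."""
    return alpha / n_tests


if __name__ == "__main__":
    print("Analytic chance of a false finding:", familywise_error_rate(0.05, 100))
    print("Simulated chance of a false finding:", simulate_null_projects(100, 0.05, 2000))
    print("Bonferroni per-test threshold:", bonferroni_threshold(0.05, 100))
```

Under these assumptions, the chance that at least one of 100 analyses crosses the 0.05 line by luck alone is about 99 percent. A correction such as Bonferroni's is one common, conservative way to account for the number of tests you ran; it is shown here as one option, not as the only defensible choice.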

The quest to find a story to tell can blind us to our real job, which is to protect and to serve...that is, to protect the integrity of the data, and serve the true needs of the customer.

Speak for the facts, as represented by the data set and by forthright analysis; avoid coercing them into saying something you think the customer wants to hear. If you’re confident in your data and confident in your methods, then you should be confident in delivering a product to your customer that says “we didn’t find anything.”

Now, the "story" you tell is the story of both your data and your methodology. Show the customer how you proceeded through your analysis. Along the way, include relevant data, and present it in the context that led you to conclude that there was "no story to tell."

With this approach, you will demonstrate to your customer both the thoroughness of your methods and the true qualities of the data. By showing both of these things, you should be able to bring your customer to the same level of comfort and confidence with your findings as you have.

Mike Cisneros is on Twitter at @mikevizneros.