Making Data Cake

Why is it so important to make sure you have quality data before you distill insight from it? Believe it or not, we are a culture obsessed with consuming data, and yet we have no idea what it means for our data to be fit for purpose. Quality data matters more than ever before, and it becomes more critical every day. As we create new ways to derive insight from our data through advances such as machine learning and deep learning, we also create new opportunities for small mistakes to have major impacts on results. It is critical, then, that we understand what quality really means when we talk about data and how we can deliver on the promise that our data is fit for purpose.

If you were to bake a cake for the love of your life, what kind of ingredients would you select for that cake? Would you use a well-regarded recipe, or would you wing it and just throw the ingredients in the bowl? How long would you bake the cake, and at what temperature? I hope the answers to these questions are obvious, so let’s look at data the same way to illustrate the importance of a data quality program. Here are the steps for baking our data cake:

  1. Accurate and Whole – When we begin baking our cake, we need the right ingredients, all of them, in measured quantities. We select only the best ingredients that have not exceeded their shelf life, and we measure them carefully so that the cake does not come out flat, too salty or bland. Similarly, with data, to derive sound insight for those in the organization who depend on us, we want to select only data that is accurate and complete. We don’t want to use data sets with missing fields or fields that have letters where only numbers should be. It is also critical that if we are expecting one hundred dollars in a field, it does not hold ten dollars instead. That mistake could lead to some unintended, and wrong, insights.

  2. Consistency – Have you ever had a cake or other food with something inconsistent in it? You eat a chicken sandwich, hit a big piece of gristle on the third bite, and you are immediately turned off. Or perhaps, with our cake, you take a bite and get a pocket of dry flour. Consistency is key to enjoyment here, and the same can be said for data: the same fact should be recorded the same way wherever it appears. Consistent data means reliable data, and when you are running a business, this is not up for debate.

  3. Duplication – Did I add salt to this? I don’t think so, let me add the salt. Then, a few hours later, you see the face of your beloved turn sour when they take their first bite. Too salty? Perhaps you doubled the amount the recipe called for. Data behaves the same way: duplicated records produce that same sour face. No executive wants to tell the board or their shareholders that they reported an erroneous number of sales in region two because some data analyst did not take the time to verify that there were no duplicates in the data.

  4. Integrity – Let’s go back to our ingredients again. How is that flour you are using? Has it been sitting in the cupboard for too long? How about those eggs? What was the date on that carton again? The integrity of our ingredients will determine the quality of our cake. In data, our ingredients are the individual data elements that make up our aggregate insights. If the individual elements are a mess, then our aggregates are only going to reflect that.

  5. Timeliness – Here is a big one for our cake. Who likes a burned cake? You can put as much icing on that bad boy as you want, but it won’t save the burned cake, my friend. Time is our friend when we respect it, but disregard its importance and you have a disaster on your hands. The timeliness of data is no different. If we want to see the sales of the prior week to determine what changes we need to make this week, it does not help us if the data arrives next week. Too late! That’s a burned data cake and no one will eat it. We need to ensure that our data arrives on time and is therefore useful to our insight machine.
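To make the recipe a little more concrete, here is a minimal sketch of what a few of these checks might look like in Python. Everything in it is an assumption made for illustration: the sample records, the field names (`order_id`, `region`, `amount`, `sold_on`), the seven-day freshness window and the `check_quality` helper are all hypothetical, and the age of a record stands in for "did the data arrive on time." A real data quality program would check far more than this.

```python
from datetime import date

# Hypothetical sales records, invented purely for illustration.
SALES = [
    {"order_id": "A1", "region": "2", "amount": "100.00", "sold_on": "2024-03-04"},
    {"order_id": "A2", "region": "2", "amount": "ten", "sold_on": "2024-03-05"},
    {"order_id": "A1", "region": "2", "amount": "100.00", "sold_on": "2024-03-04"},
    {"order_id": "A3", "region": "", "amount": "55.10", "sold_on": "2024-03-06"},
    {"order_id": "A4", "region": "1", "amount": "20.00", "sold_on": "2024-01-15"},
]

# Assumed list of fields every record must carry.
REQUIRED_FIELDS = ("order_id", "region", "amount", "sold_on")


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def check_quality(rows, as_of, max_age_days=7):
    """Return a list of (row_index, problem) pairs found in rows."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Whole: every required field must be present and non-empty.
        for field in REQUIRED_FIELDS:
            if not row.get(field):
                problems.append((index, "missing " + field))

        # Accurate (format only): no letters where numbers should be.
        amount = row.get("amount")
        if amount and not is_number(amount):
            problems.append((index, "amount is not a number"))

        # Duplication: the same order should not be counted twice.
        order_id = row.get("order_id")
        if order_id:
            if order_id in seen_ids:
                problems.append((index, "duplicate order_id"))
            seen_ids.add(order_id)

        # Timeliness: flag records older than the assumed freshness window.
        sold_on = row.get("sold_on")
        if sold_on:
            age_days = (as_of - date.fromisoformat(sold_on)).days
            if age_days > max_age_days:
                problems.append((index, "older than %d days" % max_age_days))
    return problems


for index, problem in check_quality(SALES, as_of=date(2024, 3, 8)):
    print("row %d: %s" % (index, problem))
```

Notice what this sketch cannot do. It will catch letters in a numeric field, but it has no way of knowing that a field holding ten dollars should have held one hundred. Catching that kind of mistake means comparing the data against a trusted source, which is exactly why data quality is a program and not a script.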

Okay, so I need to finish this article soon because I am hungry. The problem, though, is that the last few times I went to my favorite cake place I wound up eating bad cake! So what do you think I am going to do the next time I want some cake, like right now? I won’t be hitting that place anymore because I don’t want to risk getting something bad. We data practitioners need to think of ourselves as restaurant or bakery owners and realize that perception is everything. If we deliver a bad experience just once, it may haunt us forever. It’s like they say: you can only make a first impression once. When executives are looking for answers and they go to their data teams to get them, they expect those answers to be accurate, complete, consistent and timely, with no duplication. So the next time you hear someone complain about the bureaucracy of data quality, you need to shout loudly, “Let them eat cake!”