Data is the fuel of all analytics, whether you want to provide accurate timely information to decision-makers on the front lines, provide a comprehensive customer dashboard to your executives, do a predictive demand forecast or simply complete your annual budget.
If you don't have the right data for your goal, you won’t get it done. By the RIGHT data I mean … well-governed (and therefore trustworthy) data, data at the appropriate level of detail, data from different systems that are linked in a coherent way, and data that is readily available … by which I mean that no one should have to do any extra work for that data to be fit for purpose.
It sounds obvious, doesn’t it? In our experience, companies rarely have the right data ready to go, and about 75% of the time they are surprised to find that out!
There are three primary reasons for this disconnect:
- The finance or business function that wants the new capabilities has unrealistic expectations about the availability of the RIGHT data
  - Red-flag statement: “Jane, our financial analyst, does this already in Excel, so how hard can it possibly be?”
- The IT function that manages the systems from which the data is to be procured has an insufficient understanding of the business intricacies of how the data will be used, the levels of detail required, and the data transformations needed
  - Red-flag statement: “Just tell me what you need, and I can get it for you”
- Neither the finance or business function nor IT realizes that the data needed to achieve the goals of the project is not being collected and/or stored anywhere
  - Red-flag statement: “I am sure it is in the data warehouse”
The situation is exacerbated by over-eager (and sometimes sincere) software vendors who either claim that their solution will address this magically or take the most optimistic view possible of data requirements.
Here is our advice: if it sounds too good to be true, it probably is.
It is difficult, though. There is SO MUCH hype about the wonders of technology, and you cannot possibly research everything yourself, so it becomes natural to doubt your own judgement and be ready to take a leap of faith. After all, why would so many venture capitalists invest in this technology if it weren’t any “good”?
This is where it is important to listen to your inner voice of reason. The technology may be excellent in a narrow sense but to be successful in your context, it will need to operate within your unique environment. This includes integrating with data residing on other systems, as well as human business processes.
Ultimately, it’s all about data preparation. Here is an analogy:
I love to EAT delicious food, which is why I cook. Cooking is usually easy once all the ingredients have been prepared. Ingredient preparation, like cleaning and chopping, is the part of the process I like the least. If you are making an Asian-style stir-fry, you can easily spend 40 minutes prepping, 5 minutes cooking … and about 1 minute gobbling it all down (if you are my kid).
Data preparation is like food preparation. If you don’t prep well, there will be a price to pay. According to the internet, Abraham Lincoln once said: “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
Data scientists (the people who build the fancy models) sometimes joke that 95% of an analytics project lies in data preparation. Data preparation is not just a data engineering exercise of building pipelines that transport and transform data; it also includes iterative cycles of data exploration and hypothesis testing that inform the scope and goals of the project.
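To make the exploration side of this concrete, here is a minimal sketch of the kind of first-pass profiling check that often reshapes a project's scope before any model is built. All column names and data values are invented for illustration:

```python
# A hypothetical first-pass data exploration step: count missing values
# and surface inconsistent category labels in a column before modeling.
from collections import Counter

# Invented sample records standing in for raw source data
rows = [
    {"region": "EMEA", "revenue": 1200.0},
    {"region": "emea", "revenue": None},      # inconsistent label + missing value
    {"region": "APAC", "revenue": 980.5},
]

def profile(rows, column):
    """Return (missing-value count, normalized label frequencies) for a column."""
    values = [r.get(column) for r in rows]
    missing = sum(v is None for v in values)
    # Normalizing case/whitespace exposes labels that should be the same
    labels = Counter(str(v).strip().upper() for v in values if v is not None)
    return missing, labels

missing, labels = profile(rows, "revenue")
```

Running `profile(rows, "region")` would reveal that "EMEA" and "emea" are really one label, exactly the kind of finding that feeds back into project scope.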
In one example (described in this case study) the client was looking for a way to create a financial forecast while the COVID-19 pandemic was raging. Traditional predictive demand forecasting techniques were not applicable because historical patterns had been rendered irrelevant. Instead, they were looking to understand the causal factors driving their business. When the project began, they had “gut-feelings” about what those causal factors would be. These were ultimately proved wrong through a series of iterative explorations and statistical analyses of the data. Once they understood what the true drivers were, they were able to explore scenarios based on tweaking those drivers.
In another example, a global financial consolidation client assumed that the different accounting (ERP) systems (including Oracle Financials and JD Edwards) used by subsidiaries across the globe had consistent definitions of their charts of accounts. It turned out after some iterative data exploration that there were significant inconsistencies, and that there was a need to:
- Define and agree on a standardized global chart of accounts
- Define mappings between each of the subsidiary ERP systems to the global chart
Note that neither of these tasks is a problem that technology alone can solve, but you can solve them more quickly with the right technology tools at your disposal. Conversely, denying the existence of the problem can be costly. This example also shows the interplay between data preparation and human business processes.
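The two remediation steps above can be sketched in code. This is a hypothetical illustration, not how any particular ERP integration works; all account codes, subsidiary names, and mappings are invented:

```python
# Hypothetical sketch: reconciling subsidiary charts of accounts against
# an agreed-upon standardized global chart.

# Step 1: the standardized global chart of accounts (invented codes)
GLOBAL_CHART = {
    "4000": "Revenue",
    "5000": "Cost of Goods Sold",
    "6100": "Salaries & Wages",
}

# Step 2: per-subsidiary mappings from local account codes to global codes
SUBSIDIARY_MAPPINGS = {
    "oracle_emea": {"REV-01": "4000", "COGS-01": "5000", "PAY-10": "6100"},
    "jde_apac":    {"41000": "4000", "50100": "5000"},  # payroll not yet mapped
}

def translate(subsidiary, local_code):
    """Return the global account code for a local code, or None if unmapped."""
    return SUBSIDIARY_MAPPINGS.get(subsidiary, {}).get(local_code)

def find_unmapped(subsidiary, local_codes):
    """Data-exploration step: surface local accounts with no global mapping."""
    return [c for c in local_codes if translate(subsidiary, c) is None]
```

In practice, `find_unmapped` is where the iterative exploration happens: each unmapped account it surfaces is a conversation between finance and IT, not a coding task.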
- The primary cause of failure for analytics initiatives is a lack of data readiness
- A lack of data readiness is often caused by over-optimism, technical/business naivete or misplaced expectations
- Consequently, project plans end up being light on data readiness assessments and mitigation steps for data preparation
- This leads to later identification of serious data issues, and a higher likelihood of costly rework resulting in time and budget overruns
- We recommend that to mitigate this very real risk, you plan to identify data-related issues as early as possible and be ready to invest in whatever data preparation work is needed before you go too far. As they say, bad news doesn’t get better with age!
The sad truth about data problems is that most of the horror stories about cost overruns and delayed or failed projects could have been avoided by taking a realistic and common-sense approach to data preparation from the start.