Planning Analytics Data Modelling with Context
In the past, data to be modeled typically came from a single source in a single format, usually transactions from a general ledger system. In today’s data-driven world, project data can come from many places. Its origin can influence the data’s meaning or value, affect how you model and use it and, ultimately, determine whether it provides insights the business can actually leverage.
To model data properly and create a clean, solid design, you must first develop a deep understanding of the data. That understanding lets you establish context and then “build in” features that help the data consumer (the user of your model or solution) comprehend and use it.
Adding context should go beyond adding an “as of” date to a report or data view, segmenting the data into time periods (typically years or months) or adding supplementary dimensions.
Consider data that looks similar but can mean very different things, such as an average heart rate. The same average carries a significantly different connotation if the median age of the patients in the current view of the data is 18-25 rather than 65 or older.
In this scenario, you could expect the user to “know the data” well enough to spot this critical detail, and then take the time to surface it in a report, visualization or downstream algorithm. A better option may be to “model the context in” from the start, which makes it harder for a user to draw incorrect conclusions and avoids extra steps during use.
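To make the idea concrete, here is a minimal Python sketch (not Planning Analytics code), assuming patient records are simple dictionaries with hypothetical `age` and `heart_rate` fields. Instead of returning a bare average, the hypothetical `summarize_heart_rate` function returns the average together with the median age of the same patients, so the context travels with the number.

```python
from statistics import mean, median


def summarize_heart_rate(patients):
    """Return the average heart rate plus the context needed to interpret it.

    `patients` is assumed to be a list of dicts with hypothetical
    "age" and "heart_rate" keys.
    """
    return {
        "avg_heart_rate": mean(p["heart_rate"] for p in patients),
        # The median age of the same patients is carried alongside the average.
        "median_age": median(p["age"] for p in patients),
        "patient_count": len(patients),
    }


# Illustrative data only: two views with the same average heart rate.
young = [{"age": 19, "heart_rate": 70}, {"age": 22, "heart_rate": 74}]
older = [{"age": 67, "heart_rate": 70}, {"age": 71, "heart_rate": 74}]

print(summarize_heart_rate(young))
print(summarize_heart_rate(older))
```

Both views report an average of 72, but the attached median ages (20.5 versus 69.0) make it obvious that the two numbers should not be read the same way.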
Context and Clues
When writing a book, authors leave context clues for their readers. A context clue is a source of information that helps readers understand content that may be difficult or unusual; it offers insight into what is being read (for example: “It was an idyllic day; sunny, warm and perfect…”).
When modeling data as part of a Planning Analytics solution design, you should develop context clues through a process referred to as profiling (I’ll write about this in a future post), so that the data consumer can understand the data right away when it is visualized or consumed. Having context and perspective “built in” can also fast-track the identification of key insights within the model or data, reduce the time needed for new feature development and improve overall model performance and usability.
You can build these context points in as attributes, control objects, staging or intermediate cubes and dimensions, or even as a relational datastore where calculations, comparisons and validations are performed on the incoming data automatically.
Keep in mind, though, that adding context before and during model design can make the data more relevant and useful, but it cannot substitute for value. First and foremost, your data model must benefit those who will ultimately consume it, so establishing the appropriate context requirements is critical.
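As a rough illustration of the relational option, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a staging datastore. The `staging_patients` table, its columns and the 30-220 bpm bounds are hypothetical assumptions chosen for the example; the point is only that validations can run automatically on incoming rows before they reach the model.

```python
import sqlite3

# In-memory database standing in for a relational staging datastore.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE staging_patients (
           patient_id TEXT PRIMARY KEY,
           age        INTEGER,
           heart_rate INTEGER)"""
)
conn.executemany(
    "INSERT INTO staging_patients VALUES (?, ?, ?)",
    [("P1", 19, 68), ("P2", 71, 74), ("P3", 45, 260), ("P4", None, 72)],
)

# Validation: flag rows with a missing age or an implausible heart rate.
# The 30-220 bpm bounds are illustrative assumptions, not clinical thresholds.
issues = conn.execute(
    """SELECT patient_id,
              CASE WHEN age IS NULL THEN 'missing age'
                   ELSE 'heart rate out of range' END AS issue
       FROM staging_patients
       WHERE age IS NULL OR heart_rate NOT BETWEEN 30 AND 220
       ORDER BY patient_id"""
).fetchall()

for patient_id, issue in issues:
    print(patient_id, issue)
```

Only the flagged rows (P3 and P4 here) would need attention before loading; the rest can flow into the model unchanged.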
Establishing Context
Remember, the rule for adding context is: Before Context, Think -> Value. How? Start by considering the contextual categories that can be used to increase the value, understanding and use of the data you are modeling.
These include:
- Definitions & Explanations
- Comparisons
- Contrasts
- Tendencies
- Dispersion
Definitions & Explanations – provide additional information, or “attributes”, about a data point. In Planning Analytics, this might mean adding attributes to the data. For example, if the data contains a field named “patient ID” and we learn that each record describes an individual patient, we may choose to add height and weight, or even calculate and add each patient’s current body mass index (BMI).
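A minimal Python sketch of such a derived attribute follows, assuming height in metres and weight in kilograms (BMI is weight divided by height squared). The `Patient` class and its field names are hypothetical; in a real model the resulting values would more likely be loaded as element attributes on a patient dimension than computed in Python.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    height_m: float   # hypothetical attribute: height in metres
    weight_kg: float  # hypothetical attribute: weight in kilograms

    @property
    def bmi(self) -> float:
        """Body mass index: weight (kg) divided by height (m) squared."""
        return round(self.weight_kg / self.height_m ** 2, 1)


patients = [Patient("P001", 1.80, 81.0), Patient("P002", 1.65, 54.5)]

# Build an attribute table keyed by patient ID, similar in spirit to
# attributes on the elements of a patient dimension.
attributes = {
    p.patient_id: {"height_m": p.height_m, "weight_kg": p.weight_kg, "bmi": p.bmi}
    for p in patients
}

for pid, attrs in attributes.items():
    print(pid, attrs)
```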
Comparisons – This is the idea of adding a comparable value to a particular data point, such as computing a national ranking and adding it to each state total or to another relevant piece of information.
Contrasts – This is almost like adding an “opposite” to a data point to see whether it reveals a different perspective. An example might be comparing the average body weight of patients in one group with that of another group, i.e. those who consume alcoholic beverages versus those who do not.
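The ranking example can be sketched in a few lines of Python. The state names and totals are made-up illustrative values, and the tie-handling rule (standard competition ranking, where tied states share a rank) is an assumption; your model may need a different convention.

```python
def rank_states(totals):
    """Attach a national rank (1 = highest total) to each state's total.

    Ties share the same rank (standard competition ranking: 1, 2, 2, 4).
    """
    ordered = sorted(totals.values(), reverse=True)
    return {
        # Index of the first occurrence in the descending list gives the rank.
        state: {"total": total, "national_rank": ordered.index(total) + 1}
        for state, total in totals.items()
    }


# Hypothetical state totals, for illustration only.
state_totals = {"Ohio": 1200, "Texas": 3400, "Utah": 800, "Iowa": 1200}
ranked = rank_states(state_totals)

for state, values in ranked.items():
    print(state, values)
```

Each state total now carries its rank, so a reader looking at one state does not have to scan the others to know where it stands.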
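A simple contrast can be sketched as the difference between two group means. The `contrast_by_flag` helper, the `weight_kg` and `drinks_alcohol` field names and the weights below are hypothetical, illustrative values.

```python
from statistics import mean


def contrast_by_flag(records, flag, value):
    """Compare the mean of `value` where `flag` is True versus False.

    Both groups must be non-empty; statistics.mean raises on empty input.
    """
    with_flag = [r[value] for r in records if r[flag]]
    without_flag = [r[value] for r in records if not r[flag]]
    avg_with, avg_without = mean(with_flag), mean(without_flag)
    return {
        "with": avg_with,
        "without": avg_without,
        "difference": avg_with - avg_without,
    }


# Hypothetical patient records, for illustration only.
patients = [
    {"weight_kg": 82.0, "drinks_alcohol": True},
    {"weight_kg": 90.0, "drinks_alcohol": True},
    {"weight_kg": 70.0, "drinks_alcohol": False},
    {"weight_kg": 76.0, "drinks_alcohol": False},
]

print(contrast_by_flag(patients, "drinks_alcohol", "weight_kg"))
```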
Tendencies – These are the “typical” mathematical calculations (or aggregations) on the data as a whole or by another category within the data, such as Mean, Median and Mode. For example, you might add the median heart rate for the age group each patient in the data belongs to.
Dispersion – Again, these are mathematical calculations (or summaries), such as Range, Variance and Standard Deviation, but rather than describing the “average” of a data set (or a group within it), they describe how spread out its values are. For example, you may want to add the range for the selected value, such as the minimum and maximum number of hospital stays found in the data for each patient age group.
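Here is a minimal Python sketch of attaching a group-level tendency to each record. The age bands returned by the hypothetical `age_group` function are assumptions for illustration; substitute whatever bands your model actually uses.

```python
from collections import defaultdict
from statistics import median


def age_group(age):
    """Hypothetical age bands; adjust to the bands your model uses."""
    if age < 18:
        return "under 18"
    if age <= 25:
        return "18-25"
    if age <= 64:
        return "26-64"
    return "65+"


def add_group_median(patients):
    """Return copies of each record with its age group's median heart rate."""
    by_group = defaultdict(list)
    for p in patients:
        by_group[age_group(p["age"])].append(p["heart_rate"])
    medians = {group: median(rates) for group, rates in by_group.items()}
    return [
        {
            **p,
            "age_group": age_group(p["age"]),
            "group_median_heart_rate": medians[age_group(p["age"])],
        }
        for p in patients
    ]


# Illustrative records only.
patients = [
    {"id": "P1", "age": 19, "heart_rate": 68},
    {"id": "P2", "age": 23, "heart_rate": 75},
    {"id": "P3", "age": 24, "heart_rate": 71},
    {"id": "P4", "age": 70, "heart_rate": 80},
    {"id": "P5", "age": 68, "heart_rate": 74},
]

for record in add_group_median(patients):
    print(record)
```

Each patient's heart rate can now be read against the typical value for that patient's own age group rather than against the data set as a whole.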
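The hospital-stay example can be sketched as a per-group dispersion summary. The `dispersion_by_group` helper, field names and values are hypothetical. The sketch uses `pstdev`/`pvariance`, which treat each group as a complete population; `stdev`/`variance` would be the choice if the groups are samples.

```python
from collections import defaultdict
from statistics import pstdev, pvariance


def dispersion_by_group(records, group_key, value_key):
    """Summarize min, max, range, variance and std dev of a value per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[value_key])
    summary = {}
    for group, values in groups.items():
        lo, hi = min(values), max(values)
        summary[group] = {
            "min": lo,
            "max": hi,
            "range": hi - lo,
            "variance": pvariance(values),
            "std_dev": round(pstdev(values), 2),
        }
    return summary


# Hypothetical hospital-stay counts per patient, for illustration only.
stays = [
    {"age_group": "18-25", "hospital_stays": 0},
    {"age_group": "18-25", "hospital_stays": 1},
    {"age_group": "18-25", "hospital_stays": 2},
    {"age_group": "65+", "hospital_stays": 1},
    {"age_group": "65+", "hospital_stays": 3},
    {"age_group": "65+", "hospital_stays": 5},
    {"age_group": "65+", "hospital_stays": 7},
]

for group, stats in dispersion_by_group(stays, "age_group", "hospital_stays").items():
    print(group, stats)
```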
An Undervalued Art
The “art” of profiling data, adding context to uncover new and interesting perspectives, and using that context to drive the data modelling process is ever evolving; no doubt there are additional contextual categories today that can be investigated as you continue your work on data modelling projects.
Using this concept when designing a Planning Analytics model may be a bit of a mind shift for some, and may even raise concerns about scope creep. However, knowing your data and enhancing it with context closes the gap between analytics and advanced analytics, saves development time in other functional areas and helps ensure that you build a model your customers will embrace and use.