At QueBIT we like to describe Advanced Analytics as the powerful combination of Predictive Analytics and Prescriptive Analytics. The term Data Mining is still in wide use, however. It can be described as one of the most important methods for doing Predictive Analytics. Tom Khabaza has done more than most to clarify what is meant by Data Mining and how to do it well. Much of his great insight is available on the web.
About half of Data Miners use the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. The other half use a variety of different methods, which makes CRISP-DM the de facto standard. The document was the product of dozens of data miners working together through their participation in a special interest group (SIG) created for that purpose. Tom was one of the lead authors of the document. CRISP-DM is easy to find on the web, and it is free. It is about 80 pages long, and its combination of structured outline and prose discussion walks the reader through the entire data mining process.
Just last year, Tom founded the Society of Data Miners. Its website hosts a copy of CRISP-DM as well as Tom’s well-received Nine Laws of Data Mining. The Nine Laws have been discussed in a number of related articles over the years. They always inspire discussion and debate. Tom likes to make the point that CRISP-DM is the ‘how’, but it does not explore the ‘why’. Why is it that Data Mining projects (and especially the successful ones!) take on a particular form? The Nine Laws include statements that might, at first, seem straightforward: “Business objectives are the origin of every data mining solution”. But they also include statements that may seem surprising at first: “The value of data mining results is not determined by the accuracy or stability of predictive models”. That second statement has always been a favorite of mine, and in the hands of an expert like Tom its meaning becomes clear.
Tom collaborated with two members of the QueBIT team in writing the IBM SPSS Modeler Cookbook. QueBIT has invited Tom to be a guest speaker at the upcoming May Modeler Seminar. The Modeler Seminars are a subscription series of half-day seminars on a variety of topics around IBM SPSS Modeler. Tom will be discussing the history of Modeler and how the wisdom behind both CRISP-DM and the Nine Laws inspired certain design decisions in the creation of Modeler. Tom’s experience with Modeler goes all the way back to version 1, so I very much look forward to what he has to say this month.