Wednesday, August 5, 2009

The Mythology of Modeling

Time to take on one of the sacred cows of database marketing...the statistical model. Or - in the words of Mark Twain (or Benjamin Disraeli, depending on who you believe) - let's talk about "lies, damned lies, and statistics."

I've been involved with marketing models for more years than I care to remember. What I've learned is that modelers - even if they're not very good - seem to go unscathed when looking at the success or failure of a marketing program. Chalk it up to what I call "The Mythology of Modeling," where modelers are the equivalent of modern-day Merlins. Nobody seems to understand their "magic", so it's impossible to criticize them or hold them accountable. OK, maybe that's a bad analogy. Here's another one. Modelers are like Rasputin, marketers like the Tsarina. All they have to do is stare at you and talk about KS Stats and you're done.

While I could not develop a model if my life depended on it (aren't you glad those situations never occur?), I have come to recognize when you're about to be bamboozled by science. Here are a few things to keep an eye out for when evaluating models.

(1) Ignoring macroeconomic factors - most current-day models use the "past predicts the future" logic in determining a modeling strategy. You take a sample from a (recent) period of time, then try to glean knowledge and predict the future. The inherent assumption is that the macroeconomic conditions during your observation period will hold true for the future. Problem is, that's often not true.

A while back I had a modeling company come in and build a model for me, on spec. Their logic was "this model will be so awesome, you will BEG us to sell you more names." I mailed the file (as a test, of course). The model fell on its head. To which the modelers said "you must have a bad sales process and screwed up the implementation." While blaming the client is usually the first choice of modelers everywhere, the fact is that the underlying economy was changing - rapidly. The model was built upon a series of economic assumptions that were simply no longer true. The model was DOA, due to the failure of the modelers to recognize that the times they were a-changin'.

(2) Statistical correlation isn't everything - Modelers are in love with correlations. There are a great number of variables that are highly predictive, but cover so few names as to be almost irrelevant. If you allow these variables into your model, you'll end up with a highly predictive model that maximizes efficiency at the expense of size. While that's nice, size still matters to most marketers. Plus, some variables cover so much of the file that they can't show "high" predictive power, so they're eliminated from the model - again at the expense of actual business.

I had a team try to build a statistically predictive music model...when we ran the model, it came up with roughly 75,000 names...problem is, I needed about 200,000 to make the program work. They were quite satisfied with their work and quite surprised when I thought it was sub-standard. Being the client, we know who was to blame...
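To put the size-versus-predictiveness tradeoff in numbers, here's a minimal sketch. The function name, the flag and the 1,000-name file are all made up for illustration (none of it comes from the music model above). It measures how much of the file a yes/no variable covers and how much lift it delivers.

```python
def coverage_and_lift(flags, responded):
    """Return (coverage, lift) for one yes/no variable.

    coverage: share of the file where the flag is set
    lift: response rate among flagged names divided by the overall rate
    """
    n = len(flags)
    flagged = [r for f, r in zip(flags, responded) if f]
    if n == 0 or not flagged or sum(responded) == 0:
        raise ValueError("need names, flagged names and at least one responder")
    overall_rate = sum(responded) / n
    flagged_rate = sum(flagged) / len(flagged)
    return len(flagged) / n, flagged_rate / overall_rate


# Hypothetical file: 1,000 names, 50 responders overall.
# The flag is set on only 20 names, 10 of whom responded.
flags = [True] * 20 + [False] * 980
responded = [1] * 10 + [0] * 10 + [1] * 40 + [0] * 940

coverage, lift = coverage_and_lift(flags, responded)
print(f"coverage={coverage:.1%}  lift={lift:.1f}x")
```

In this invented file the flag looks spectacular on lift, but it only touches 2% of the names. That's the kind of variable that makes a model look great and a mail file look tiny.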

(3) Interactions are ignored - Data points are great, but the interaction between those data points can contain real value. For example, if you drive an SUV, that's good to know. If you drive an SUV and have 4 kids, that can have a different statistical impact than if you drive an SUV with only one kid. The 4 kids are about room enough to get everyone everywhere at once. The 1 kid is (more likely) about safety.

The easiest way to figure this out is to run a CHAID (or your favorite interaction detection technique) analysis, then dump those variables into your favorite statistical program. If at least one of the interaction variables does not appear in your final model, your statistician should have some serious explaining to do.
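As a rough illustration of what an interaction looks like in the data, here's a small sketch. It is NOT CHAID, just a plain cross-tab of response rates, and the field names (`suv`, `many_kids`, `responded`) and all the counts are invented for the example.

```python
from collections import defaultdict


def response_rate_by_cell(rows, keys):
    """Response rate for each combination of the given keys."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [responders, names]
    for row in rows:
        cell = tuple(row[k] for k in keys)
        totals[cell][0] += row["responded"]
        totals[cell][1] += 1
    return {cell: resp / n for cell, (resp, n) in totals.items()}


def make_rows(suv, many_kids, names, responders):
    """Build `names` hypothetical records, the first `responders` of which responded."""
    return [
        {"suv": suv, "many_kids": many_kids, "responded": 1 if i < responders else 0}
        for i in range(names)
    ]


# Made-up file: 100 names in each of the four SUV x kids cells.
rows = (
    make_rows(True, True, 100, 20)
    + make_rows(True, False, 100, 4)
    + make_rows(False, True, 100, 5)
    + make_rows(False, False, 100, 5)
)

by_suv = response_rate_by_cell(rows, ["suv"])
by_both = response_rate_by_cell(rows, ["suv", "many_kids"])
print("SUV alone:       ", by_suv[(True,)])
print("SUV + many kids: ", by_both[(True, True)])
print("SUV + few kids:  ", by_both[(True, False)])
```

With these invented counts, "drives an SUV" on its own looks like a decent variable. Split it by kids and nearly all of the response sits in one cell. A model that only sees the two variables separately can miss that.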

(4) The wrong goals are in mind - More often than not, statisticians will deliver models that deliver the maximum statistical advantage. Problem is, you know - as a marketer - that in the "imperfect" universe there are still a whole bunch of sales. Think of it this way - in most models there is a pretty steep curve. But if you only mailed, say, the top 20% (because the model told you this was the maximum point of efficiency), chances are you would be cutting out about 50% of sales. That's a bad thing. Your goal as a marketer is not to deliver maximum statistical efficiency, but to deliver maximum business efficiency. So a model should tell you - in business terms - where you really cost yourself in terms of target measurement. Where does your cost per acquired account (or whatever your key variable happens to be) go so far underwater as to make the whole effort not worthwhile? THAT's a starting point for where you want to cut off the model...
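Here's one way the cutoff logic might be sketched. Every number is an assumption made up for the example (the decile response rates, the $0.50 cost per piece, the $60 allowable cost per acquired account), and `mail_depth` is a hypothetical helper, not anybody's standard method.

```python
def mail_depth(decile_rates, cost_per_piece, max_cpa):
    """Count how many deciles to mail, best decile first.

    Keep going deeper until the marginal cost per acquired account
    (cost per piece divided by that decile's response rate) exceeds max_cpa.
    """
    depth = 0
    for rate in decile_rates:
        if rate <= 0 or cost_per_piece / rate > max_cpa:
            break
        depth += 1
    return depth


# Hypothetical response rates by model decile, best to worst.
decile_rates = [0.050, 0.030, 0.020, 0.015, 0.012, 0.010, 0.008, 0.006, 0.004, 0.002]

depth = mail_depth(decile_rates, cost_per_piece=0.50, max_cpa=60.0)
total = sum(decile_rates)
share_top2 = sum(decile_rates[:2]) / total
share_at_depth = sum(decile_rates[:depth]) / total

print(f"mail the top {depth} deciles")
print(f"top 20% captures {share_top2:.0%} of sales")
print(f"business cutoff captures {share_at_depth:.0%} of sales")
```

With these made-up rates, stopping at the top 20% leaves roughly half the sales on the table, while the cost-based cutoff keeps mailing until an account actually costs more than it's worth.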

(5) Using modelers who are inexperienced at business - Statistics are hard. Hard to create, hard to use, hard to understand. As a result, models have fallen into the hands of not the most socially ept people in a company. Not everyone, of course, but in all probability the one who is working on your model. The aggravating thing is that - the less experienced in business a modeler is - the more arrogant they are about clinging to statistical correctness. Sometimes close enough is good enough...but it takes business experience to figure that out. You may get lucky and find a youngster who is good at math, good at business and good at accurately communicating results to management. Then again, you might have a pony in your yard when you get home today.

Which leads to...

(6) Marketing is not deeply involved with the process - As a marketer who is about to use a statistical model, if you're not invested in the development of the model it's your own fault. You're the one responsible for the results - you need to roll up your sleeves and dig in. You don't need to know jack about stats - you need to be able to lend business guidance to the people developing the models. Many of the people I know who develop models do not think like marketers - they're not trained to. That's YOUR job. If you abdicate it and the model fails, bad on you. It's not all that hard to understand the basic concepts. Get in and dig around.

Statistical models are not bad. In fact, they can be quite helpful. But you have to view them with the same rigor you would apply to other aspects of your marketing programs. That way you can see your modelers not as someone whose stare can paralyze you, but someone with an interesting way of looking at you...


  1. Hi Bob, thank you for taking the plunge into blogging.

    I am one of the statisticians that you talk about in your article. I've spent more than twenty years working with business leaders, and I have been on the receiving end of the criticisms you outline in this well written piece.

    Hopefully, you'll run across more modeling folks who conduct business in the opposite way that you characterize my industry in this article. There's a large number of us who try really hard to provide good models and great customer service, and try to be patient with folks who aren't able to build models but have other really good business skills.

  2. Funny post, Bob, and very true, most times. I agree that the marketer needs to be hands-on, and really learn what went into the model. The marketer needs to bring his/her knowledge of the customer to this process.

    I would also suggest that any modeler that provides a black-box solution--who won't share the scorecard, for example--should be avoided like the plague.

    Finally, it's really important to match the data you're using to the task at hand. There's no point in analyzing a bunch of data points if those data points have no relevance to your specific project. I see the big compilers doing this all the time.