Originally Posted by
chriswufgator
You are overlooking the real issue.
Many statistical correlations are in reality totally meaningless, being the product of nothing more than coincidence, or being brought about by some third factor of which the statistician is unaware. This is a "false positive".
Any system that lacks the capacity to distinguish meaningful from meaningless correlations is completely non-functional for any purpose.
Again, if Amex added Office Depot to its nexus based on a spike in trailing cardholder defaults by OD customers, when it was caused by OD holding a closeout sale on a "DIY Foreclosure Defense Kit" for let's say 10 days, then the system is a complete failure, because it takes action that leads to real costs based on incomplete data. The only real legitimacy of that data-point was the 10 days during which that merchant's sale was attracting less creditworthy borrowers. By the time the system responds, it's already too late. This is the problem with trailing data.
The bottom line is, Amex has nowhere near enough data points to create an accurate predictive behavior model. They will *think* it's accurate based on the false positives the system finds, only because they lack sufficient data points to differentiate between meaningless and meaningful correlations.
If you mitigate risk based on meaningful correlations, then you'll achieve a real result. But if you start axing customers based on meaningless or coincidental correlations, then you're costing yourself (a lot of) money.
Once again, you are making assumptions. You assume that Amex isn't smart enough to make the best use of the data points they have. Your Office Depot example is stated as though this is what they are actually doing. Like I said before, if they found that Office Depot shoppers had a high level of defaults *and* that people who didn't default *never* shopped at Office Depot, that would neutralize your hypothetical about the special sale.
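To put rough numbers on the sample-size question, here is a minimal Python sketch. Everything in it is made up for illustration: the 5% default rate, the 200 merchants, the "flag at 10%" rule, and the function name `count_flagged` are all assumptions, and it says nothing about what Amex actually does. Every simulated merchant has exactly the same true default rate, so any merchant that gets flagged is a coincidence by construction.

```python
import random

# All values below are hypothetical, chosen only for illustration.
BASE_DEFAULT_PERCENT = 5    # every merchant's customers default at the same true rate
FLAG_PERCENT = 10           # assumed rule: flag a merchant at twice the base rate
NUM_MERCHANTS = 200         # number of merchants being screened


def count_flagged(customers_per_merchant, rng):
    """Count merchants that look 'risky' purely by chance.

    No merchant is actually riskier than any other, so every
    flagged merchant here is a coincidental correlation.
    """
    flagged = 0
    for _ in range(NUM_MERCHANTS):
        # Simulate each customer defaulting independently at the base rate.
        defaults = sum(
            rng.random() < BASE_DEFAULT_PERCENT / 100
            for _ in range(customers_per_merchant)
        )
        # Integer comparison: observed default rate >= FLAG_PERCENT.
        if defaults * 100 >= FLAG_PERCENT * customers_per_merchant:
            flagged += 1
    return flagged


rng = random.Random(0)  # fixed seed so repeated runs give the same counts
few = count_flagged(20, rng)      # 20 cardholders observed per merchant
many = count_flagged(2000, rng)   # 2000 cardholders observed per merchant

print(f"flagged with 20 customers per merchant:   {few} of {NUM_MERCHANTS}")
print(f"flagged with 2000 customers per merchant: {many} of {NUM_MERCHANTS}")
```

With only 20 cardholders per merchant, a noticeable share of merchants crosses the threshold by luck alone. With 2000 per merchant, chance flags all but disappear. So whether the correlations are meaningless or meaningful comes down largely to how many data points sit behind each merchant, which is exactly the thing neither of us can see from the outside.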