November 2001 Down in the Data Mine
But what if this methodology yields unintended results? What if the resulting portfolio actually ends up tracking a different benchmark
Data mining occurs when a quantitative analyst bases a theory on a pattern he or she notices in a series of observations. On the face of it this doesn't sound bad, but when you stop to think about it, there's no real underlying theory. Instead, the theory comes as a result of the observations rather than being tested by them.

A Picture is Worth a Thousand Observations

Nevertheless, when you see a definite pattern in the data, it's hard not to speculate there's some underlying reason for it. Consider, for example, the apparent correlation between P4 and the Nasdaq:

Chart 1 shows each series from P4's inception (June 30, 2000) through September 2001. Early in the period, P4 seemed to follow the
This pattern is even more obvious over the last three months. Chart 2 shows P4 and the Nasdaq moving almost identically over the period. As we've noted before, P4's regression was based on ten years' worth of data from the decade of the 1990s. This was a period when the growth style dominated, so P4's growth bias is quite understandable. The Nasdaq, studded with techs and financials, is predominantly a growth index. Yet over the bear market of the past 18 months, the S&P 500 has been dominated by its value names. Given that, it's no wonder P4 has looked more like the Nasdaq than the 500.
Looks Can Be Deceiving

But sometimes things aren't what they appear. It's often said that figures don't lie. In this case, the figures tell a much different story.

Correlation is a statistical measure of how two series move relative to one another. Perfect correlation exists when both move in lockstep, not unlike P4 and the Nasdaq as shown in Chart 2. Perfect negative correlation occurs when two series move in opposite directions in fixed proportion. Correlations range from +1 (perfect positive correlation) to -1 (perfect negative correlation). A value around 0, whether above or below, indicates the two series have little or no correlation.
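To make the definition concrete, here is a minimal Python sketch of the standard Pearson correlation coefficient. The article doesn't say how its figures were computed (on price levels or on periodic returns, for example), so the function name `pearson_correlation` and the sample numbers are assumptions for illustration only. They are not P4, Nasdaq, or S&P 500 data.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its series mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariation = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariation / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only (hypothetical, not market data).
series = [1.0, 2.0, 4.0, 3.0, 5.0]
scaled = [2 * v + 1 for v in series]  # moves with `series`, different size
mirrored = [-v for v in series]       # moves exactly opposite to `series`

print(pearson_correlation(series, scaled))    # close to +1
print(pearson_correlation(series, mirrored))  # close to -1
```

Note that `scaled` moves twice as far as `series` on every step, yet the correlation is still essentially +1. Correlation measures whether two series move together, not whether they move by the same amount.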
From inception, the correlation of P4 and the Nasdaq is +0.95411, while that of P4 and the S&P 500 is +0.97500. Correlation with the Nasdaq is almost perfect in the July 2001 - September 2001 period (+0.99218), but it's still not much better than with the 500 (+0.99181).

So what's up here? First of all, the model is working as it's supposed to: it's highly correlated with the S&P 500. Secondly, correlations aren't stable; they can and do change over time. The initial portfolio reflected the dominance of the S&P 500's growth stocks over the decade of the 1990s. As planned, Portfolio 4 remained unchanged for 12 months, yet during that period value began to outperform growth in the index. The Nasdaq, heavily populated with tech stocks, retained its growth bias. As a result, P4's correlation with the Nasdaq grew while falling in relation to the S&P 500.

That's the problem with data mining: appearances alone can't establish a theory. Statistical analysis is a means of testing theories, not creating them. The old adage remains true: Don't believe everything you see.