Quant View -- Investing by the Numbers -- Archives: November '01
 


November 2001
Down in the Data Mine
"Your theory is crazy, but it's not crazy enough to be true. "
-- Niels Bohr

 

Both of our quantitative models are supposed to track the S&P 500. P3 is based on a regression analysis of factors that affect the entire index. P4 uses a regression on each sector. P3 is reconstituted every other month, while P4 is only rebalanced annually.

But what if this methodology yields unintended results? What if the resulting portfolio actually ends up tracking a different benchmark more closely than the S&P 500? Is that a bad thing? Probably not, but it is data mining at its worst.

Data mining occurs when a quantitative analyst bases a theory on a pattern he or she notices in a series of observations. On the face of it this doesn't sound bad, but when you stop to think about it, there's no real underlying theory. Instead, the theory comes as a result of the observations rather than being tested by them.

A Picture is Worth a Thousand Observations

Nevertheless, when you see a definitive pattern in the data, it's hard not to speculate there's some underlying reason for it. Consider, for example, the apparent correlation between P4 and the Nasdaq:

Chart 1 shows each series from P4's inception (June 30, 2000) through September 2001. Early in the period, P4 seemed to follow the 500, yet as the bear market took hold, it looked more and more like the Nasdaq.

Chart 1:
Returns from Inception
Graph -- Portfolio 4 vs. S&P 500 and Nasdaq, July 2000 - September 2001
While the correlation certainly isn't perfect, P4 looks a lot more like the Nasdaq than the S&P 500, the index it was designed to emulate.

This pattern is even more obvious over the last three months. Chart 2 shows P4 and the Nasdaq moving almost identically over the period.

As we've noted before, P4's regression was based on ten years' worth of data from the 1990s. This was a period when the growth style dominated, so P4's growth bias is quite understandable. The Nasdaq, studded with techs and financials, is predominantly a growth index.

Yet over the bear market of the past 18 months, the S&P 500 has been dominated by its value names. Given that, it's no wonder P4 has looked more like the Nasdaq than the 500.

Chart 2:
Couldn't Be Closer
Graph -- Portfolio 4 vs. S&P 500 and Nasdaq, July 2001 - September 2001
Data Source: Baseline
P4 and the Nasdaq traded almost identically in the 3rd quarter of 2001. Their correlation appears much stronger than that of P4 and the S&P 500.

Looks Can Be Deceiving

But sometimes things aren't what they appear. It's often said that figures don't lie. In this case, the figures tell a much different story.

Correlation is a statistical measure of how two series move relative to one another. Perfect positive correlation exists when both move in lockstep -- not unlike P4 and the Nasdaq as shown in Chart 2. Perfect negative correlation occurs when two series move in exactly opposite directions, with each move in one matched by a proportionate move in the other.

Correlations range from +1 (perfect positive correlation) to -1 (perfect negative correlation). A value around 0 -- whether above or below -- indicates the two series have little or no correlation.
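As a rough illustration of the measure described above, here is a minimal Python sketch of the standard Pearson correlation coefficient. The column does not say which software or exact formula produced its figures, so the function name `pearson` and the sample numbers are assumptions made purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its own series mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative numbers, not data from this column.
a = [1.0, 2.0, 3.0, 4.0]
same_direction = [10.0, 20.0, 30.0, 40.0]  # rises whenever a rises
opposite = [4.0, 3.0, 2.0, 1.0]            # falls whenever a rises

print(pearson(a, same_direction))
print(pearson(a, opposite))
```

Note that the two series need not be on the same scale to be perfectly correlated; what matters is that one is an exact straight-line function of the other.
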
Our Quant Portfolios
Portfolio 3
  • Top 30 Stocks Based on Stepwise Regression Across All Stocks of the S&P 500
  • No Attempt is Made to Sector-Weight this Portfolio
  • Rebalanced Every 60 Days
  • Stocks Remain in the Portfolio Until Falling Below the Top 40
  • The Highest Rated Stocks Not Already in the Portfolio are Added When Existing Constituents are Removed
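
The Portfolio 3 rules above amount to a buffer rule: buy from the top 30, but sell only once a stock drops out of the top 40. A minimal Python sketch of that logic follows. It assumes the stepwise regression has already produced a ranked list of tickers; the function name `rebalance_p3`, its parameters, and the toy tickers are hypothetical and are not taken from the actual model.

```python
def rebalance_p3(holdings, ranked, size=30, keep_within=40):
    """Apply the buffer rule at a rebalance date.

    `ranked` lists tickers from highest-rated to lowest-rated.
    Existing holdings stay while ranked inside the top `keep_within`;
    vacancies are filled with the best-ranked names not already held.
    """
    rank = {ticker: i + 1 for i, ticker in enumerate(ranked)}
    # Keep existing holdings that are still ranked inside the buffer zone.
    kept = [t for t in holdings if rank.get(t, keep_within + 1) <= keep_within]
    held = set(kept)
    # Refill to the target size from the top of the ranking.
    for ticker in ranked:
        if len(kept) >= size:
            break
        if ticker not in held:
            kept.append(ticker)
            held.add(ticker)
    return kept

# Toy example: a 3-stock portfolio with a top-4 buffer (hypothetical tickers).
ranked = ["A", "B", "C", "D", "E", "F"]
print(rebalance_p3(["D", "E", "F"], ranked, size=3, keep_within=4))
# prints ['D', 'A', 'B']
```

In the toy run, D survives because it still ranks fourth, inside the buffer, while E and F are replaced by the two highest-rated stocks not already held.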

Portfolio 4
  • Top Stocks of Each Sector Based on Stepwise Regression of Each Individual Sector of the S&P 500
  • Number of Stocks Selected in Each Sector Determined by Current Sector-Weightings of the S&P 500
  • Rebalanced Every June
  • Stocks Remain in the Portfolio for 12 Months Unless Deleted for Special Circumstances (e.g., Acquisition)
  • Stocks Removed for Mergers and Acquisitions are Replaced by the Next Highest Rated Stocks in Their Specific Sector
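
The Portfolio 4 rules above can be sketched in Python as a two-step process: split the portfolio's slots across sectors in proportion to index weight, then take the top-rated names in each sector. The rules do not state the total number of holdings or how fractional slots are rounded, so the `total` parameter and the largest-remainder rounding used here are assumptions, as are the function names.

```python
import math

def sector_slots(weights, total):
    """Split `total` slots across sectors in proportion to index weight,
    using largest-remainder rounding (an assumed rounding scheme)."""
    weight_sum = sum(weights.values())
    exact = {s: total * w / weight_sum for s, w in weights.items()}
    slots = {s: math.floor(x) for s, x in exact.items()}
    leftover = total - sum(slots.values())
    # Hand the remaining slots to the sectors with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - slots[s], reverse=True)
    for s in by_remainder[:leftover]:
        slots[s] += 1
    return slots

def pick_p4(ratings_by_sector, slots):
    """Take the top-rated names in each sector, highest rating first."""
    return {
        sector: sorted(ratings, key=ratings.get, reverse=True)[: slots.get(sector, 0)]
        for sector, ratings in ratings_by_sector.items()
    }
```

Under this sketch, a stock removed for a merger or acquisition would be replaced by the next name in that sector's sorted ratings, which mirrors the replacement rule in the last bullet.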

From inception, the correlation of P4 and the Nasdaq is +0.95411, while that of P4 and the S&P 500 is +0.97500. The correlation with the Nasdaq is almost perfect in the July 2001 - September 2001 period (+0.99218), but that is barely higher than the correlation with the 500 (+0.99181).

So what's up here? First of all, the model is working as it's supposed to: it's highly correlated with the S&P 500.

Secondly, correlations aren't stable; they can and do change over time. The initial portfolio reflected the dominance of the S&P 500's growth stocks over the decade of the 90s.

As planned, Portfolio 4 remained unchanged for 12 months, yet during that period value began to outperform growth in the index. The Nasdaq, heavily populated with tech stocks, retained its growth bias. As a result, P4's correlation with the Nasdaq rose relative to its correlation with the S&P 500.

That's the problem with data mining: appearances alone can't establish a theory. Statistical analysis is a means of testing theories, not creating them.

The old adage remains true: Don't believe everything you see.


 
