Quant View -- Investing by the Numbers -- Archives: March '01 Work in Progress
 


March 2001
What We Didn't Do
"If you torture the data enough, it will confess to anything."
-- Anonymous

 

ANY PORTFOLIO IS THE RESULT OF A NUMBER of choices. You have to choose a style, a valuation methodology, a stock selection process, etc. Each of these decisions comes at the expense of some other alternative.

Our quantitative portfolios are no exception. In creating them, we focused on eight fundamental factors and their relation to annual return (see The Starting Point). By the same token, there were a number of things we didn't do.

It's often instructive to look back at the choices you didn't make in order to understand the results you are getting. With this in mind, here are some of the things we didn't do:

We Didn't Use a Longer Term Investment Horizon

Our models were based on regressions of one-year returns on the fundamental factors. In math terms, one-year return was our dependent variable and the eight factors were independent variables. In essence, our models attempt to select the stocks whose fundamental characteristics are most strongly related to higher one-year returns.
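
In symbols, the setup described above can be sketched as one linear regression. The notation is an assumption made for illustration, not taken from the models themselves: R_i is the one-year return of stock i, F_ij is that stock's value on the j-th of the eight fundamental factors, the betas are the fitted coefficients, and epsilon_i is the part of the return the factors leave unexplained.

```latex
R_i = \beta_0 + \sum_{j=1}^{8} \beta_j F_{ij} + \varepsilon_i
```

Under this sketch, stocks would be rated by their fitted value, that is, the right-hand side without the residual term.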

The results might have differed if a different time horizon had been used for return. If, for example, we had used five-year return as our dependent variable, we probably would have found different relations with the fundamental factors. We'd also have ended up with different stocks in our portfolios.

We knew results would vary by the time horizon we chose. We selected one year for several reasons. First, it's short enough that we should see results almost immediately. If a longer horizon were used, we'd have to wait longer to begin assessing the effectiveness of the model.

Secondly, with the advent of online trading and the popularity of investing, many investors use a one-year horizon -- if that long. The whole point of this exercise was to create a quantitative methodology that investors might actually use.
Our Quant Portfolios
Portfolio 3
  • Top 30 Stocks Based on Stepwise Regression Across All Stocks of the S&P 500
  • No Attempt is Made to Sector-Weight this Portfolio
  • Rebalanced Every 60 Days
  • Stocks Remain in the Portfolio Until Falling Below the Top 40
  • The Highest Rated Stocks Not Already in the Portfolio are Added When Existing Constituents are Removed
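
The Portfolio 3 rules above amount to a buffer rule: buy from the top 30, but sell only when a stock drops out of the top 40. A minimal sketch follows, assuming the model produces a list of tickers ordered best-rated first. The function name `rebalance`, its parameters, and the toy tickers are hypothetical.

```python
def rebalance(holdings, ranked, size=30, keep_within=40):
    """Apply a keep-until-below-the-buffer rule.

    holdings: tickers currently held.
    ranked: all tickers ordered best-rated first (hypothetical model output).
    """
    rank = {ticker: i + 1 for i, ticker in enumerate(ranked)}
    # Keep a holding only while it still ranks inside the buffer zone.
    kept = [t for t in holdings if rank.get(t, keep_within + 1) <= keep_within]
    # Fill the open slots with the highest-rated stocks not already held.
    for ticker in ranked:
        if len(kept) >= size:
            break
        if ticker not in kept:
            kept.append(ticker)
    return kept


# Toy example scaled down to a 3-stock portfolio with a top-4 buffer.
ranking = ["D", "A", "E", "C", "B", "F"]
new_holdings = rebalance(["A", "B", "C"], ranking, size=3, keep_within=4)
print(new_holdings)  # ['A', 'C', 'D']
```

In the toy run, C stays even though E now rates higher, because C is still inside the buffer. That gap between the buy threshold and the sell threshold is what keeps turnover down.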

Portfolio 4
  • Top Stocks of Each Sector Based on Stepwise Regression of Each Individual Sector of the S&P 500
  • Number of Stocks Selected in Each Sector Determined by Current Sector-Weightings of the S&P 500
  • Rebalanced Every June
  • Stocks Remain in the Portfolio for 12 Months Unless Deleted for Special Circumstance e.g. Acquisition
  • Stocks Removed for Mergers and Acquisitions are Replaced by the Next Highest Rated Stocks in Their Specific Sector
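
One way to turn sector weightings into a number of stocks per sector is sketched below. The rounding method (largest remainder), the portfolio size of 30, and the sector weights are all assumptions made for illustration; none of them is stated for Portfolio 4.

```python
def sector_slots(weights, total):
    """Split `total` portfolio slots across sectors in proportion to weight."""
    weight_sum = sum(weights.values())
    shares = {s: w * total / weight_sum for s, w in weights.items()}
    # Start from the whole-number part of each sector's proportional share.
    slots = {s: int(share) for s, share in shares.items()}
    leftover = total - sum(slots.values())
    # Hand any remaining slots to the sectors with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda s: shares[s] - slots[s], reverse=True)
    for s in by_remainder[:leftover]:
        slots[s] += 1
    return slots


# Hypothetical index weights in percent, not actual S&P 500 figures.
weights = {"Technology": 32, "Financials": 24, "Health Care": 20,
           "Energy": 14, "Utilities": 10}
slots = sector_slots(weights, total=30)
print(slots)
```

Each sector's slots would then be filled with that sector's top-rated stocks from its own regression.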

It would be interesting to see how portfolios would differ if a different time horizon were used for the dependent variable. Perhaps the model would work better, perhaps not.

A Lot of Fundamental Factors Weren't Used

There are literally hundreds of fundamental factors measuring prices, earnings, growth rates, margins, capitalization, etc. The most comprehensive models would start with as many as possible and weed out the less informative ones. At least that's the way it's supposed to work.

But we didn't want to do that. Instead, we set out to create a model that a normal investor could use. Because of that, it was necessary to restrict factors to those that are easily attainable. Most folks with access to Value Line or financial sites on the Web can come up with basic factors such as P/E and Return on Assets.

In selecting the factors we used, we looked for a cross section of growth, value, and even momentum metrics. We hoped to have a viable model regardless of which specific style was in vogue.

We Didn't Use More Factors

There were certainly a lot of other factors we could have included. However, the more independent variables (fundamental factors in this case) you use, the more likely the model will begin to suffer from "multicollinearity". This is a fancy term that simply means the independent factors are correlated with one another and carry overlapping information.

In a reliable regression, the independent variables show little correlation with each other. The more you add, the more likely some are to be correlated. The trick is to balance the model's predictive power against the smallest possible number of independent variables.
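
A simple way to see whether independent variables are correlated with each other is to compute pairwise correlations before running the regression. The sketch below uses made-up factor values, and the 0.8 cutoff is an arbitrary assumption, not a rule used in the models.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(factors, threshold=0.8):
    """Return factor pairs whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(sorted(factors), 2):
        r = pearson(factors[a], factors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up values for six stocks; factor names are illustrative only.
factors = {
    "pe": [10, 12, 15, 20, 25, 30],
    "price_to_book": [1.0, 1.3, 1.4, 2.1, 2.4, 3.1],
    "roa": [8, 3, 9, 4, 7, 5],
}
flagged = correlated_pairs(factors)
print(flagged)  # only the pe / price_to_book pair is flagged
```

A flagged pair suggests the two factors are telling much the same story, so one of them may add little to the model.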

We thought eight variables were enough for a first pass. After all, this is a work in progress.

Our Data Could Have Covered a Different Period

Our models were based on ten years of S&P 500 data covering the decade of the '90s. S&P data extend back much further, and a broader data set could certainly have altered our results.
Measurement Periods Matter
Graph -- Growth vs. Value, 12/89 - 12/00
Although growth has defeated value over the past decade…

Graph -- Growth vs. Value, 12/74 - 12/00
…that hasn't been the case over the past 25 years.
Source: Ibbotson Associates

Looking back over the '90s, large cap growth stocks soundly outperformed their value counterparts, so it's no surprise that growth factors dominate the model for Portfolio 3. If the same regression had been run using data from the '80s when value outperformed growth, the opposite result might have been obtained.

We knew this could affect the model, but still limited the data to ten years in an effort to keep the data to a manageable level. We also thought it was reasonable to put more weight on the most recent decade in light of the changes that had occurred in the markets over time. With increased trading volumes, ECNs, and growing internationalization, the market bears little resemblance to that of the '60s or '70s, much less the '20s or '40s.

Perhaps we'll want to expand the database once we monitor the model's performance. We'll give it a year first before we revisit this decision.

We Could Have Altered the Regression Procedure

Both of our models are the results of "stepwise multiple regressions". What that mouthful means is this: All of our fundamental factors were combined and their daily changes compared to the daily changes in the S&P 500. If a factor didn't materially contribute to the relation, it was removed and the process rerun. This procedure was repeated until only "statistically significant" factors remained. The resulting relations are captured in the equations appearing on the Portfolio 3 and Portfolio 4 pages.

The regression process could have been run in reverse order. In other words, rather than starting with all factors, we could have started by comparing the S&P's return against just one and then added (or rejected) others to build our equations.

Intuitively you'd think the results would be the same, but they aren't. So much of this process is determined by how the independent factors interact with one another rather than with the dependent variable.

We used the subtractive rather than the additive process because we wanted to test the validity of the assumption that all eight of our fundamental factors were relevant. The additive approach would never have tested all eight together.
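
The two orderings discussed above are commonly called backward elimination (subtractive) and forward selection (additive). The sketch below runs both on synthetic data. Everything in it is an assumption made for illustration: the factor names, the made-up data, and the partial F-statistic cutoff of 4.0, which stands in for whatever "statistically significant" test the actual models used.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(beta[j] * X[i][j] for j in range(k))) ** 2 for i in range(n))


def backward_eliminate(factors, y, f_min=4.0):
    """Start with every factor; repeatedly drop the weakest one below f_min."""
    kept = list(factors)
    while kept:
        full = rss([factors[f] for f in kept], y)
        dof = len(y) - len(kept) - 1
        scores = {}
        for f in kept:
            reduced = rss([factors[g] for g in kept if g != f], y)
            scores[f] = (reduced - full) / (full / dof)  # partial F-statistic
        weakest = min(scores, key=scores.get)
        if scores[weakest] >= f_min:
            break
        kept.remove(weakest)
    return kept


def forward_select(factors, y, f_min=4.0):
    """Start with no factors; repeatedly add the strongest one above f_min."""
    kept, remaining = [], list(factors)
    while remaining:
        base = rss([factors[f] for f in kept], y)
        dof = len(y) - len(kept) - 2
        scores = {}
        for f in remaining:
            bigger = rss([factors[g] for g in kept + [f]], y)
            scores[f] = (base - bigger) / (bigger / dof)  # partial F-statistic
        best = max(scores, key=scores.get)
        if scores[best] < f_min:
            break
        kept.append(best)
        remaining.remove(best)
    return kept


# Synthetic data: only "pe" and "roa" actually drive the made-up return.
rng = random.Random(0)
n = 200
factors = {name: [rng.gauss(0, 1) for _ in range(n)]
           for name in ("pe", "roa", "momentum", "noise")}
y = [2.0 * factors["pe"][i] - 1.5 * factors["roa"][i] + rng.gauss(0, 0.5)
     for i in range(n)]

backward = backward_eliminate(factors, y)
forward = forward_select(factors, y)
print("backward:", backward)
print("forward: ", forward)
```

On clean, uncorrelated data like this, both procedures should recover the two real drivers. The two tend to part ways when factors are correlated with one another, because a factor's apparent contribution then depends on which other factors are already in the equation.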

Did this choice affect the results? Probably. Would the alternative approach have yielded a better result? Maybe. Does it matter? No. While we certainly want to construct reliable models, we also want to test our theories in the process. Which brings up the final issue:

Did We Mine the Data?

"Data mining" is one of the most common problems of quantitative analysis. This occurs when an analyst devises a theory and then runs tests designed to prove it. If the theory initially fails, it's only human nature to tweak it a little and alter the tests.

While it's true that you can't simply sift through data and hope to come up with a theory, statistical tests should be designed to test your theory, not necessarily prove it. If your theory doesn't stand up to the data, change the theory; don't just seek more agreeable data.

The underlying theory in our models was that the eight fundamental factors help explain stocks' annual performance. Our regression process initially tested them all together but then went on to remove any that weren't statistically significant. The resulting equations -- especially those for Portfolio 3 -- didn't rely on all eight factors.

The data didn't completely justify the theory, but the results are still worth testing. Should the models fail to keep pace with the index, we'll make changes and start the process over again. We haven't mined the data -- at least not yet.


 
