User:Devon McCormick/Research/HoldingWinnersSellingLosers4STReturnPersistenceAvoidingLookAhead


Originally presented 6/24/2007 to the NYC Financial Engineering Meetup. It illustrates the use of J to manipulate data quickly and easily in order to explore topics in quantitative financial research. The previous work is here and the continuation of this is here.

Short-Term Return Persistence Without Look-Ahead Bias

So far, the method by which we have examined short-term return persistence has been subject to “Look-Ahead Bias”: we used the entire period of returns to establish the decile ranges, but the entire period would not have been available if we had been using this strategy during that historical period. That is, we have biased our results by looking ahead into the future.

To eliminate this bias, we have to re-build our deciles at each historical period using only the data available at that period. Let’s approach this by first working with arbitrary named values for each of our parameters. This will allow us to generalize the code more easily. For instance, let's start with these values:

npw=. 5                    NB. Number of periods for return window.
ndiv=. 10                  NB. Number of quantile divisions, e.g. 10=deciles
mdts=. 2*ndiv              NB. Minimum # observations to start quantile stats
cnobs=. npw*mdts           NB. Current # observations from which to build return quantiles
rets=. (1{seltkrs){clcrets NB. Look at only GLW returns for now
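Displaying these settings together gives a quick check of the arithmetic; in particular, npw*mdts yields the 100 observations referred to below:

   npw,ndiv,mdts,cnobs
5 10 20 100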

We’ve arbitrarily decided that we need an average of at least two observations per decile in establishing the minimum number of observations with which to start. So, looking at the earliest set of correspondences for the GLW returns only:  

   >0{mi=. ntilesOnDDayRets ndiv;npw;cnobs{.rets
0 0 0 1 0 0 0 1 0 0
1 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 1 0 0 1
0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 1 0 0 1
0 1 0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0 1 0
0 0 1 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
1 0 0 0 0 1 0 0 0 0
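Summing over the entire table counts the decile-pair observations, a total we will return to shortly:

   +/+/>0{mi
19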

To clarify the meaning of this table, study the labeled version below:

                   Subsequent Period Observations
        Decile #   1  2  3  4  5  6  7  8  9 10    Weighted Average
            1      0  0  0  1  0  0  0  1  0  0         0.63
            2      1  0  0  0  0  0  0  0  1  0         0.53
            3      0  0  0  0  0  0  1  0  0  1         0.89
Initial     4      0  0  0  1  1  0  0  0  0  0         0.47
Period      5      0  0  0  0  0  0  1  0  0  1         0.89
            6      0  1  0  0  0  1  0  0  0  0         0.42
            7      0  0  1  0  0  0  0  0  1  0         0.63
            8      0  0  1  0  1  0  0  0  0  0         0.42
            9      0  0  0  0  0  0  0  1  0  0         0.42
           10      1  0  0  0  0  1  0  0  0  0         0.37

The Weighted Average column provides a convenient way to summarize how good each starting decile turns out to be: a higher number means the subsequent returns were higher.

So, the two observations in the first (lowest-return) decile - row one - are followed by observations in the fourth and eighth deciles, indicated by ones in columns four and eight of that row. The weighted average of 0.63 is (4+8)/19 because there are a total of 19 observations.
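This calculation is short enough to express directly in J. The following is a minimal sketch under our own conventions, not code from the original session: it weights each subsequent decile by its one-based column number and divides by the total observation count, using the table >0{mi from above; the verb name wavg is hypothetical.

wavg=. 3 : '(+/"1 y *"1 >:i.{:$y) % +/+/y' NB. Row sums of column-weighted table % total observations
   0j2 ": wavg >0{mi     NB. Two-decimal format; matches the Weighted Average column
0.63 0.53 0.89 0.47 0.89 0.42 0.63 0.42 0.42 0.37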

At this point, we might pause to ask why there are only 19 decile-pair observations when we selected our initial values above to provide an average of two observations per decile, or 20 in this case. We have one fewer decile-pair observation because the last decile observed has no following decile, hence no entry in this table. Upon reflection, this might lead us to modify our initial value to be

   cnobs=. npw*1+2*ndiv  NB. Current # observations from which to build return quantiles

or 105 observation periods instead of 100 as originally defined.
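Recall that J evaluates strictly right to left with no operator precedence, so this expression is npw*(1+(2*ndiv)):

   npw*1+2*ndiv
105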

This gives us the following similar, but slightly neater, table:

                   Subsequent Period Observations
        Decile #   1  2  3  4  5  6  7  8  9 10    Weighted Average
            1      0  0  0  1  0  0  0  1  0  0         0.40
            2      1  0  0  0  0  0  0  0  1  0         0.70
            3      0  0  0  0  0  0  1  0  0  1         0.60
Initial     4      0  0  0  1  1  0  0  0  0  0         0.65
Period      5      0  0  0  0  0  0  1  0  0  1         0.70
            6      0  1  0  0  0  1  0  0  0  0         0.75
            7      0  0  1  0  0  0  0  0  1  0         0.60
            8      0  0  1  0  1  0  0  0  0  0         0.45
            9      0  0  0  0  0  0  0  1  0  0         0.45
           10      1  0  0  0  0  1  0  0  0  0         0.35

Note how the values in the Weighted Average column all end in either “0” or “5”: with 20 decile pairs, each weighted average is a multiple of 1/20 = 0.05. This is a clue to the observant reader that we don’t really have a full two digits of precision in these numbers even though they are shown as if we do, and it is further motivation for the little bit of neatening we’ve done in ensuring that we have 20 decile pairs instead of 19: the earlier table was misleading in presenting two digits of precision for the weighted averages.

So, to proceed from this point, we’d like to rebuild this table and calculate this weighted average for each day after the initial number of observations. For each additional day, moving from the past toward the present, we want to (as sketched in code after this list):

1. rebuild the decile boundaries at each period,
2. place each past return period in its appropriate decile,
3. calculate the subsequent period (i.e. future at this point) return,
4. place this forward-looking return in its appropriate decile,
5. calculate an observation table like the one above, and
6. calculate the weighted average for each initial decile.
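Here is a minimal sketch of this loop in J, under two assumptions not established by the code so far: that ntilesOnDDayRets (used above) performs steps 1 through 5 for a return series of any prefix length, returning the observation table as its first boxed item, and that wavg is the hypothetical weighted-average verb sketched earlier.

wavgByDay=. 3 : 0
 'ndiv npw cnobs rets'=. y    NB. Parameters as a boxed list, named as above
 was=. 0 ndiv$0               NB. Accumulate one row of weighted averages per day
 for_len. cnobs+i.cnobs-~#rets do.
  NB. Rebuild deciles using only the len observations known at this point
  NB. (assumes ntilesOnDDayRets accepts an arbitrary prefix length).
  was=. was,wavg >0{ntilesOnDDayRets ndiv;npw;len{.rets
 end.
 was
)
   wa=. wavgByDay ndiv;npw;cnobs;rets  NB. One ndiv-column row per subsequent day

Each pass through the loop takes only the first len returns, so no future information leaks into the decile boundaries.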

The result of this will be a set of weighted averages per period, which looks more useful than the ten-by-ten gray-scale graphs we have been using so far. However, it raises a question about how we treat the large set of numbers we will then have: what’s a good way to look at as many numbers as we’ll be generating? We'll consider this after we generate our sets of weighted averages.
