Survivorship Bias-Free Data
What is Survivorship Bias?
As defined on Wikipedia, “survivorship bias or survival bias is the logical error of concentrating on the people or things that made it past some selection process—and overlooking those that did not—typically because of their lack of visibility. This can lead to false conclusions in several different ways.”
When we run a backtest on equities, we often want to focus it on the members of a popular index such as the S&P500, FTSE100 or ASX200. We collect the securities that make up that index today and run our test on them over the last ten years. The trouble is that the members of the S&P500 today are not the same as the members ten years ago. Does Lehman Brothers ring a bell? How can we justify running a test that leaves ‘LEH’ out of the list?
Similarly, a number of current S&P500 names were not in the index ten years ago. For example, Netflix was added to the S&P500 in 2010, so we should not consider any Netflix signals from before 2010, when it was not part of the index.
Ignoring companies that are no longer in the index and including companies that had not yet ‘made it’ both lead to survivorship bias, which skews test results upward.
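To see why this skews results upward, consider the toy Python sketch below. The five stocks and their ten-year returns are invented purely for illustration: averaging only the stocks that survived to today gives a far rosier picture than averaging every stock that was in the universe at the start.

```python
"""Toy illustration of survivorship bias (all numbers are hypothetical)."""
from statistics import mean

# Hypothetical ten-year total returns for a small universe.
# "delisted" marks stocks that left the index (bankruptcy, demotion, etc.).
universe = {
    "AAA": {"return": 1.20, "delisted": False},
    "BBB": {"return": 0.80, "delisted": False},
    "CCC": {"return": 0.40, "delisted": False},
    "DDD": {"return": -1.00, "delisted": True},  # went bust, like LEH
    "EEE": {"return": -0.60, "delisted": True},  # demoted from the index
}

def average_return(stocks):
    """Equal-weighted average of the stocks' total returns."""
    return mean(s["return"] for s in stocks)

survivors = [s for s in universe.values() if not s["delisted"]]
everyone = list(universe.values())

survivor_avg = average_return(survivors)  # what a "current members" test sees
full_avg = average_return(everyone)       # what was actually investable
print(f"Survivors only: {survivor_avg:.0%}")
print(f"Full universe:  {full_avg:.0%}")
```

Here the survivors average an 80% return, while the full universe averages only 16%.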
Optuma Symbol Lists with Historical Data
We have a number of Optuma Symbol Lists with survivorship bias-free data available. How far back the historical data can be reliably used depends on the data available for each list. Below is a full list of all Optuma Symbol Lists with historical data, and the dates from which each can be reliably used:
Please note that we are still researching to make our historical membership data as accurate as possible. Each date below marks the point before which our data covers less than 90% of the expected index constituents (for example, fewer than 450 members for the S&P 500). If you have any information on historical membership, or would like to request historical data for other indices, please send an email to firstname.lastname@example.org
- Australian Equities
  - ASX 20 - August 8, 2016
  - ASX 50 - August 8, 2016
  - ASX 100 - August 8, 2016
  - ASX 200 - June 16, 2016
  - ASX 300 - September 15, 2017
  - ASX All Ords - September 13, 2018
- Canadian TSX Equities
  - TSX Composite - September 1, 2010
- Euronext Equities
  - Amsterdam Exchange Index - December 22, 2014
  - Amsterdam Midkap Index - February 5, 2015
  - Amsterdam Small Cap Index - December 19, 2014
- Indonesian Equities
  - IDX30 - August 1, 2017
  - IDX80 - February 1, 2019
  - LQ45 - August 1, 2017
- Japan Equities
  - Nikkei Average 225 - January 28, 2008
- London Equities
  - FTSE 100 - June 23, 2008
  - FTSE 250 - March 21, 2014
- Major Euro Equities
  - DAX Index - December 18, 2008
  - STOXX Europe 600 - January 6, 2014
  - STOXX Europe 50 - September 21, 2020
- NSE India Equities
  - Nifty 50 - January 1, 2010
  - Nifty Next 50 - January 1, 2009
  - Nifty 100 - January 1, 2009
  - Nifty Midcap 100 - June 9, 2008
  - Nifty Smallcap 100 - March 31, 2011
  - Nifty Midcap 50 - October 16, 2007
  - Nifty 500 - February 14, 2008
  - Nifty Microcap 250 - May 10, 2021
- US Equities
  - Dow Jones Industrial Average - October 29, 1999
  - NASDAQ 100 - December 19, 2003
  - Russell 2000 - December 17, 2013
  - S&P 400 - April 7, 2015
  - S&P 500 - December 16, 2005
  - S&P 600 - June 15, 2011
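As a rough illustration of the 90% coverage rule in the note above, the Python sketch below finds the earliest date from which a list's membership count stays at or above 90% of the expected number of constituents. The function name, the input format (a sorted list of date and member-count pairs) and the sample counts are all hypothetical; this is not how Optuma actually derives the dates in the list.

```python
from datetime import date

def first_reliable_date(member_counts, expected_members, threshold=0.9):
    """Return the earliest date from which coverage stays at or above threshold.

    member_counts: (date, count) pairs sorted by date (hypothetical format).
    Returns None if the latest count is below the threshold or there is no data.
    """
    minimum = threshold * expected_members
    reliable_from = None
    for day, count in member_counts:
        if count >= minimum:
            if reliable_from is None:
                reliable_from = day  # start of a run of good coverage
        else:
            reliable_from = None     # coverage dipped, so the run restarts
    return reliable_from

# Hypothetical yearly member counts for a 500-stock index.
counts = [
    (date(2004, 1, 2), 431),
    (date(2005, 1, 3), 447),
    (date(2006, 1, 3), 462),
    (date(2007, 1, 3), 478),
    (date(2008, 1, 2), 495),
]
print(first_reliable_date(counts, expected_members=500))
```

With these sample counts the function returns 3 January 2006, the first date on which the count reaches at least 450 of the expected 500 members and never drops back below it.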
How do I use this data in my tests?
In the test setup window (whether for a Back Test, Signal Test or Trade Test), if the S&P500 is selected from our Symbol Lists as the universe under Codes to Scan, a Membership option appears that lets you choose Current or Historical membership.
If Historical membership is selected, all previous members are included in the test (more than 950 companies have been members of the S&P500 since 2000), but each one is only included in the results for the periods when it was in the index.
For example, TSLA was added to the index in December 2020 and AIV was removed. In any S&P500 test spanning that change, AIV signals are included up to that date and TSLA signals from that date onward.
NOTE: You can see when a stock is included in the index by using the IsMember() function in a Show View, as described here.
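Conceptually, Historical membership works like a point-in-time lookup: a signal only counts if the stock was a member on the day it fired. The Python sketch below illustrates that idea with the TSLA/AIV change. The is_member() function is a hypothetical stand-in for the idea behind IsMember(), not Optuma's implementation; the 21 December 2020 effective date, the half-open interval convention and the open-ended start date for AIV are assumptions made for the example.

```python
from datetime import date

# Hypothetical membership table: symbol -> list of (joined, left) intervals.
# left=None means the stock is still a member. The TSLA/AIV change is assumed
# to take effect on 21 December 2020; AIV's real join date is not modelled.
MEMBERSHIP = {
    "TSLA": [(date(2020, 12, 21), None)],
    "AIV": [(date.min, date(2020, 12, 21))],
}

def is_member(symbol, day):
    """True if symbol was in the index on day (joined inclusive, left exclusive)."""
    for joined, left in MEMBERSHIP.get(symbol, []):
        if joined <= day and (left is None or day < left):
            return True
    return False

def filter_signals(signals):
    """Keep only (symbol, day) signals that fired while the symbol was a member."""
    return [(symbol, day) for symbol, day in signals if is_member(symbol, day)]

signals = [
    ("AIV", date(2020, 11, 2)),
    ("TSLA", date(2020, 11, 2)),
    ("AIV", date(2021, 1, 4)),
    ("TSLA", date(2021, 1, 4)),
]
for symbol, day in filter_signals(signals):
    print(symbol, day)
```

Running it prints the November AIV signal and the January TSLA signal; the other two signals fall outside each stock's membership window and are dropped.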
Current vs Historical Results
The following simple signal test, run over the last ten years on the current members, enters a stock when its 50-period moving average crosses above its 200-period moving average. Here is the script:
MA(BARS=50) CrossesAbove MA(BARS=200)
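For readers who want to reproduce the rule outside Optuma, the Python sketch below computes simple moving averages and flags the bars where the fast average crosses above the slow one. It assumes CrossesAbove means the fast average was at or below the slow average on the previous bar and above it on the current bar. Short periods (2 and 4) and made-up closing prices are used so the example is easy to check by hand, whereas the test above uses 50 and 200.

```python
def sma(values, period):
    """Simple moving average; None until `period` bars are available."""
    return [
        sum(values[i - period + 1 : i + 1]) / period if i >= period - 1 else None
        for i in range(len(values))
    ]

def crosses_above(fast, slow):
    """Bar indices where fast goes from <= slow on the prior bar to > slow."""
    hits = []
    for i in range(1, len(fast)):
        window = (fast[i - 1], slow[i - 1], fast[i], slow[i])
        if any(x is None for x in window):
            continue  # not enough history yet for one of the averages
        if fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            hits.append(i)
    return hits

# Made-up closing prices; the Optuma test above uses 50 and 200 bars.
closes = [10, 9, 8, 7, 8, 10, 12, 13]
fast = sma(closes, 2)
slow = sma(closes, 4)
print(crosses_above(fast, slow))
```

With these prices the only crossover is at bar 5, where the 2-bar average moves from 7.5 (below 8.0) to 9.0 (above 8.25).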
In this chart, the blue shaded plot is the equity curve from the test. There is not a lot of alpha, but it shows a moderate return over the index (red line). The issue is that we have only used the current 505 stocks in our 10-year test. To fix this, we rerun the test with the same formula using the Historical membership dataset, so that it includes all the companies that were ever in the index during the test period.
Suddenly the results do not look so good. Our idea did not ‘beat’ the market at all. Anyone who has tried trading an MA crossover like this knows that it looks like a great strategy in theory, but the results are very hard to replicate.
The main point here is to highlight how important survivorship bias is, and to make sure you don’t ignore it in your testing. If you have ever been frustrated by your inability to repeat test results in real life, this may help explain why.
A simple rule of thumb, when you don’t have access to survivorship bias-free data, is to subtract around 3% per annum from your results. That will give you a better idea of what to expect. Just don’t plan your trading strategy by looking only at the survivors; make sure you properly account for the securities that didn’t make it.
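To make the rule of thumb concrete, the Python sketch below subtracts 3 percentage points from an annualised backtest return and compares the compounded growth over ten years. The 10% backtest return is a made-up example, and treating the haircut as a flat deduction from the annual return is an assumption about how to apply the rule.

```python
def apply_survivorship_haircut(annual_return, years, haircut=0.03):
    """Apply a flat per-annum haircut to an annualised backtest return.

    Returns (adjusted_annual_return, raw_growth_multiple, adjusted_growth_multiple).
    """
    adjusted = annual_return - haircut
    raw_growth = (1 + annual_return) ** years
    adjusted_growth = (1 + adjusted) ** years
    return adjusted, raw_growth, adjusted_growth

# Hypothetical backtest result: 10% per annum over 10 years.
adjusted, raw, adj_growth = apply_survivorship_haircut(0.10, years=10)
print(f"Adjusted annual return: {adjusted:.1%}")
print(f"Growth of 1 unit over 10 years: {raw:.2f} (backtest) vs {adj_growth:.2f} (adjusted)")
```

With these inputs the haircut takes the growth of one unit over ten years from about 2.59 to about 1.97, which shows how much a few percent a year compounds.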