The foolproof method of avoiding overfitting

… is to not analyze past data at all. As soon as you analyze past data, you are already overfitting your strategy.

Let me explain this with a simple analogy. Let’s say I have a friend named Frog. Every day, right after they wake up, they must choose whether to drink apple juice or orange juice, and they can only drink the chosen juice for that day. They have been doing this for the past 10 years. I have no data on what they were thinking or feeling, and no data on their activities between waking up and going to sleep each day. The only data I have for each date are the date itself, the choice made that day, a full description of Frog and their room at the moment they woke up, and a full description of Frog and their room at the moment they fell into deep sleep the night before.

Now it’s 7am, and my friend is still sleeping. I bring all the data to you and 100 other people. Each of you has a supercomputer to help you analyze anything you want. The task is simple: give me your best prediction of which juice Frog is going to choose today when they wake up. You can either answer my question or choose not to answer. If your prediction is correct, I’ll give you $100. If your prediction is wrong, you have to give me $100. If you choose not to answer, we both keep our money.
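To put a number on the bet, here is a minimal sketch of its expected payoff. The function names (`expected_payoff`, `should_answer`) are mine, made up for illustration, and the sketch assumes you somehow know your real probability of being right, which is exactly the thing you can't know.

```python
def expected_payoff(p_correct: float, stake: float = 100.0) -> float:
    """Expected profit per answered question.

    You win `stake` with probability `p_correct` and lose `stake` otherwise.
    """
    return p_correct * stake - (1.0 - p_correct) * stake


def should_answer(p_correct: float) -> bool:
    """Answering only beats staying silent (payoff 0) when the expectation is positive."""
    return expected_payoff(p_correct) > 0.0


if __name__ == "__main__":
    for p in (0.40, 0.50, 0.55, 0.60):
        print(f"p={p:.2f}  expected payoff={expected_payoff(p):+.2f}  answer? {should_answer(p)}")
```

At a coin-flip accuracy of 50% the bet is worth nothing on average, so not answering is just as good. The whole game is about whether your analysis truly pushes you above 50%, or only looks like it does on past data.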

First things first: how much data do you think you need to make a good prediction? Is the previous day’s data enough? The past week? The past month? The past year? The past five years? All ten years of data? Would five years of data give you a better prediction than one week of data?

Another question: do you think recency bias affects Frog’s choices? Whatever factors made Frog choose apple juice 3 years ago, would the same factors still affect their decisions in recent days? Or are the factors completely different now?

The next question: which factors affect their decisions? The color of their bed sheets? The weather at the time they woke up? What they had for dinner the previous night? Which side of the bed they were facing when they woke up? How do you decide which factors to use in your analysis? Are you sure you are choosing the right factors? Do you believe all of Frog’s choices were logical decisions and not just the result of a coin toss?
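The coin-toss worry can be made concrete with a small simulation. This is only a sketch under assumptions I made up: Frog's choices are pure coin tosses, every recorded "factor" is also random and unrelated to the choice, and the function name `best_factor_accuracy` and all the parameter values are hypothetical. We pick the factor that matched Frog's choices best in the first year, then check how it does on the remaining years.

```python
import random


def best_factor_accuracy(n_days=3650, n_factors=200, train_days=365, seed=0):
    """Return (in-sample, out-of-sample) accuracy of the best-looking random factor."""
    rng = random.Random(seed)

    # Frog's juice choice each day: 0 = apple, 1 = orange, decided by a coin toss.
    choices = [rng.randint(0, 1) for _ in range(n_days)]

    # Binary "factors" (bed sheet color, sleeping side, ...), all pure noise.
    factors = [[rng.randint(0, 1) for _ in range(n_days)] for _ in range(n_factors)]

    def accuracy(factor, start, end):
        # Fraction of days in [start, end) where the factor value equals the choice.
        hits = sum(factor[i] == choices[i] for i in range(start, end))
        return hits / (end - start)

    # "Analysis": keep whichever factor matched the choices best during the training period.
    best = max(range(n_factors), key=lambda k: accuracy(factors[k], 0, train_days))

    in_sample = accuracy(factors[best], 0, train_days)
    out_of_sample = accuracy(factors[best], train_days, n_days)
    return in_sample, out_of_sample


if __name__ == "__main__":
    in_acc, out_acc = best_factor_accuracy()
    print(f"best factor, in-sample accuracy:     {in_acc:.3f}")
    print(f"best factor, out-of-sample accuracy: {out_acc:.3f}")
```

With enough candidate factors, one of them will look predictive on the past purely by chance, and that edge disappears on days it was not selected on. Nothing in the past data alone tells you whether your chosen factor is of this kind.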

And the last question: do you think the same analysis can be used on my other friends, Hippo, Musky, and Scooby?

Going back to what I wrote here, my personal opinion is that while the market can’t be scientifically analyzed, analyzing it is all that we (retail traders) can do. So I’m not saying you shouldn’t backtest at all. Backtesting is still important because it helps us figure out which indicators can be useful for alerting us to bad trades. It’s important to minimize risk, and one way to minimize it is to choose “not to trade at all” more often than you jump into a trade. With every trade you open, you are taking on more risk, no matter what you think about the coins or about your analysis/strategy. That’s where your backtest helps you. Since we can’t predict the future, the best we can do is make sure our analysis worked well in the past. That’s why using a larger data set is better than using a smaller one.
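One way to see why more data helps is to look at how uncertain a backtested win rate is for a given number of trades. The sketch below uses the textbook standard error of a proportion and a rough 95% interval. It assumes every trade is an independent win/lose event with the same odds, which real markets probably don't satisfy, and the function names are mine.

```python
import math


def win_rate_standard_error(win_rate: float, n_trades: int) -> float:
    """Standard error of an observed win rate, assuming independent trades."""
    return math.sqrt(win_rate * (1.0 - win_rate) / n_trades)


def win_rate_interval(win_rate: float, n_trades: int, z: float = 1.96):
    """Rough 95% interval (normal approximation) around the observed win rate."""
    se = win_rate_standard_error(win_rate, n_trades)
    return max(0.0, win_rate - z * se), min(1.0, win_rate + z * se)


if __name__ == "__main__":
    # Same observed 55% win rate, backtested over different numbers of trades.
    for n in (20, 100, 1000, 10000):
        low, high = win_rate_interval(0.55, n)
        print(f"{n:>6} trades: plausible true win rate between {low:.3f} and {high:.3f}")
```

Under these assumptions, a 55% win rate over a small number of trades is still compatible with a strategy that is no better than a coin toss, while the same number over thousands of trades is much harder to explain by luck alone.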

It’s also worth noting that since all we are doing is guessing which factors/indicators affected the past data, having more indicators won’t give you a more accurate prediction than having fewer indicators. It’s all about the quality of the indicators. Diversifying the indicators you use is the better option. Having 5 RSI indicators with different lengths won’t increase the quality of your prediction.
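To illustrate the RSI point, here is a rough sketch that computes five RSIs with different lengths on the same synthetic random-walk price series and measures how correlated they are with each other. Everything in it is an assumption for illustration: the prices are made up, the lengths (10, 12, 14, 16, 18) are arbitrary, and the RSI uses simple averages of gains and losses rather than Wilder's smoothing, which is what most charting platforms use.

```python
import random


def rsi(prices, length):
    """Simple-average RSI: 100 * gains / (gains + losses) over the last `length` changes."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    values = []
    for end in range(length, len(changes) + 1):
        window = changes[end - length:end]
        gains = sum(c for c in window if c > 0)
        losses = -sum(c for c in window if c < 0)
        total = gains + losses
        values.append(50.0 if total == 0 else 100.0 * gains / total)
    return values


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rsi_correlations(lengths=(10, 12, 14, 16, 18), n_bars=2000, seed=1):
    """Pairwise correlations between RSIs of different lengths on one random walk."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_bars):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))

    series = {n: rsi(prices, n) for n in lengths}
    # Longer RSIs start later, so keep only the most recent bars shared by all of them.
    shortest = min(len(s) for s in series.values())
    aligned = {n: s[-shortest:] for n, s in series.items()}

    return {
        (a, b): pearson(aligned[a], aligned[b])
        for i, a in enumerate(lengths)
        for b in lengths[i + 1:]
    }


if __name__ == "__main__":
    for (a, b), corr in rsi_correlations().items():
        print(f"RSI({a}) vs RSI({b}): correlation {corr:.2f}")
```

Since all five RSIs are built from overlapping windows of the same price changes, they mostly move together, so stacking them adds little information that the first one didn't already give you.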

To summarize this post: you can’t avoid overfitting when you backtest. All you can do is backtest properly so that you end up with a good overfit, one that will (hopefully) make your future decision-making more accurate and (hopefully) lower your risk in the end.
