The Limits Of Out-Of-Sample Testing


In trading system design, out-of-sample (OOS) testing is a critical step for assessing robustness. It is necessary, but not sufficient. In this post, I’ll explore some issues with OOS testing.
 

How Well Do Overfitted Trading Systems Perform Out-of-Sample?

In-sample overfitting is a serious problem when designing trading strategies. This is because a strategy that worked well in the past may not work in the future. In other words, the strategy may be too specific to the conditions that existed in the past and may not be able to adapt to changing market conditions.

One way to avoid in-sample overfitting is to use out-of-sample testing, where you test your strategy on data that was not used to develop it. Reference [1] examined how well in-sample optimized trading strategies perform out of sample.
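As a rough illustration of the mechanics, the sketch below reserves the later part of a price history as out-of-sample data: the parameters of a simple moving-average strategy are chosen on the in-sample period only, and the held-out period is touched exactly once for the final evaluation. The synthetic prices, date ranges, and parameter grid are illustrative assumptions, not details from the paper.

```python
import numpy as np
import pandas as pd

def sma_strategy_returns(close: pd.Series, fast: int, slow: int) -> pd.Series:
    """Daily returns of a long/flat moving-average crossover strategy."""
    signal = (close.rolling(fast).mean() > close.rolling(slow).mean()).astype(int)
    # Trade on the next bar so the backtest itself has no look-ahead.
    return close.pct_change() * signal.shift(1)

# Assumed input: a DataFrame with a DatetimeIndex and a 'close' column (synthetic here).
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    {"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2500)))},
    index=pd.bdate_range("2015-01-01", periods=2500),
)

in_sample = prices.loc[:"2021-12-31", "close"]   # used for parameter selection
out_sample = prices.loc["2022-01-01":, "close"]  # touched only once, at the end

# Pick parameters on the in-sample data only.
grid = [(f, s) for f in (10, 20, 50) for s in (100, 150, 200)]
best = max(grid, key=lambda p: sma_strategy_returns(in_sample, *p).sum())

# Single out-of-sample evaluation with the frozen parameters.
oos_returns = sma_strategy_returns(out_sample, *best)
print(f"best params {best}, out-of-sample total return {oos_returns.sum():.2%}")
```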
 

Findings

-In-sample overfitting occurs when trading strategies are tailored too closely to historical data, making them unreliable in adapting to future, changing market conditions and behaviors.

-The study applied support vector machines with 10 technical indicators to forecast stock price directions and explored how different hyperparameter settings impacted performance and profitability.

-Results showed that while models often performed well on training data, their out-of-sample accuracy significantly dropped—hovering around 50%—highlighting the risk of misleading in-sample success.

-Despite low out-of-sample accuracy, about 14% of tested hyperparameter combinations outperformed the traditional buy-and-hold strategy in profitability, revealing some potential value.

-The highest-performing strategies exhibited chaotic behavior; their profitability fluctuated sharply with minor changes in hyperparameters, suggesting a lack of consistency and stability.

-There was no identifiable pattern in hyperparameter configurations that led to consistently superior results, further complicating strategy selection and tuning.

-These findings align with classic financial theories like the Efficient Market Hypothesis and reflect common challenges in machine learning, such as overfitting with complex, high-dimensional data.

-The paper stresses caution in deploying overfitted strategies, as their sensitivity to settings can lead to unpredictable results and unreliable long-term performance in real markets.

The results indicated that most models achieved high in-sample accuracy but only around 50% accuracy on out-of-sample data. Nonetheless, a significant proportion of the models managed to outperform the buy-and-hold strategy in terms of profitability.

However, it’s noteworthy that the most profitable strategies are highly sensitive to the system parameters, which is a cause for concern.
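To see how this sensitivity can arise, the sketch below sweeps a small grid of SVM hyperparameters and compares in-sample with out-of-sample accuracy, in the spirit of the study’s setup. The synthetic features stand in for the 10 technical indicators, and the chronological split and grid values are assumptions, not the authors’ exact configuration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed: 10 indicator-like features and a binary next-day direction label;
# both are synthetic stand-ins for real market data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 10))
y = (rng.normal(size=1500) > 0).astype(int)

split = 1000                                 # chronological split, no shuffling
X_in, X_out = X[:split], X[split:]
y_in, y_out = y[:split], y[split:]

scaler = StandardScaler().fit(X_in)          # fit the scaler on in-sample data only
X_in, X_out = scaler.transform(X_in), scaler.transform(X_out)

# Sweep nearby hyperparameter values and compare in- vs out-of-sample accuracy.
for C in (1, 10, 100):
    for gamma in (0.01, 0.1, 1.0):
        model = SVC(C=C, gamma=gamma).fit(X_in, y_in)
        print(f"C={C:<3} gamma={gamma:<4} "
              f"in-sample={model.score(X_in, y_in):.2f} "
              f"out-of-sample={model.score(X_out, y_out):.2f}")
```

On data with little genuine signal, in-sample accuracy rises with larger C and gamma while out-of-sample accuracy hovers near 50%, and small moves in the grid can reorder which setting looks best.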

Reference

[1] Yaohao Peng, Joao Gabriel de Moraes Souza, Chaos, overfitting, and equilibrium: To what extent can machine learning beat the financial market? International Review of Financial Analysis, Volume 95, Part B, October 2024, 103474.
 

How Reliable Is Out-of-Sample Testing?

Out-of-sample testing is a crucial step in designing and evaluating trading systems, allowing traders to make more informed and effective decisions in dynamic and ever-changing financial markets. But is it free of well-known biases such as overfitting, data-snooping, and look-ahead? Reference [2] investigated these issues.
 

Findings

-Out-of-sample testing plays a vital role in evaluating trading systems by assessing their ability to generalize beyond historical data and perform well under future market conditions.

-Although useful, out-of-sample testing is not immune to biases such as overfitting, data-snooping, and especially look-ahead bias, which can distort the validity of results.

-A common issue arises when models are developed or tuned using insights gained from prior research, creating an indirect dependency between development and test data.

-Researchers found that excessively high Sharpe ratios in popular multifactor models can be largely explained by a subtle form of look-ahead bias in factor selection.

-Many out-of-sample research designs still overlap with datasets used in earlier studies, leading to results that reflect known patterns rather than genuine model performance.

-The ongoing and iterative nature of financial research makes it difficult to construct fully unbiased validation frameworks that truly represent out-of-sample conditions.

-When alternative evaluation methods were applied, Sharpe ratio estimates dropped significantly, indicating the extent to which traditional approaches may inflate performance expectations.

-This reduction in Sharpe ratios is actually encouraging, as it better reflects the realistic outcomes investors can expect when implementing these models in real time.

-Despite these findings, the paper emphasizes that multifactor models still improve on CAPM, though the improvements are smaller than widely claimed.

In short, out-of-sample testing also suffers, albeit subtly, from biases such as overfitting, data-snooping, and look-ahead.
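To make the look-ahead effect concrete, the sketch below picks the “best” of many candidate factors in two ways: once using the full sample (the subtle look-ahead) and once using only data that precedes the evaluation window, then compares the resulting out-of-sample Sharpe ratios. The synthetic factor returns, window lengths, and annualization are illustrative assumptions, not the paper’s methodology.

```python
import numpy as np

rng = np.random.default_rng(1)
n_months, n_factors = 360, 50
# Assumed: monthly returns for 50 candidate factors with no true edge.
rets = rng.normal(0.0, 0.03, size=(n_months, n_factors))

def sharpe(x: np.ndarray) -> float:
    """Annualized Sharpe ratio of a monthly return series."""
    return np.sqrt(12) * x.mean() / x.std()

split = 240                                  # first 20 years vs last 10 years

# Look-ahead selection: pick the factor with the best FULL-sample Sharpe,
# then report its performance over the "out-of-sample" window.
best_full = np.argmax([sharpe(rets[:, j]) for j in range(n_factors)])
biased = sharpe(rets[split:, best_full])

# Honest selection: pick the factor using only the first window,
# then evaluate it on the second window.
best_pre = np.argmax([sharpe(rets[:split, j]) for j in range(n_factors)])
honest = sharpe(rets[split:, best_pre])

print(f"look-ahead selection, OOS Sharpe: {biased:.2f}")
print(f"honest selection, OOS Sharpe:     {honest:.2f}")
```

Because the full-sample winner is partly chosen on the evaluation window itself, its reported out-of-sample Sharpe ratio tends to be inflated relative to the honest selection, even though none of the factors has any real edge here.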

We agree with the authors. We also believe that out-of-sample tests, such as walk-forward analysis, suffer from selection bias.
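The sketch below shows where that selection bias can creep in. Walk-forward analysis re-optimizes on each rolling training window and trades the following window, which is sound in isolation; but running several strategy variants through the same procedure and reporting only the best final result quietly re-introduces the bias. The momentum rule, window lengths, and parameter menus are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def momentum_return(close: pd.Series, lookback: int) -> float:
    """Placeholder strategy: total return of a long/flat trailing-momentum rule."""
    signal = (close.pct_change(lookback) > 0).astype(int)
    return float((close.pct_change() * signal.shift(1)).sum())

def walk_forward(close: pd.Series, lookbacks, train_len=500, test_len=125) -> float:
    """Rolling walk-forward: pick the best lookback in-sample, trade the next window."""
    total = 0.0
    for start in range(0, len(close) - train_len - test_len + 1, test_len):
        train = close.iloc[start : start + train_len]
        test = close.iloc[start + train_len : start + train_len + test_len]
        best = max(lookbacks, key=lambda lb: momentum_return(train, lb))  # re-optimize
        total += momentum_return(test, best)                              # trade it forward
    return total

# Running many variants through the same walk-forward and keeping only the best
# final result re-introduces the selection bias the procedure was meant to remove.
rng = np.random.default_rng(2)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2500))))
variants = [(5, 20, 60), (10, 40, 120), (20, 60, 180)]   # assumed parameter menus
results = [walk_forward(close, lbs) for lbs in variants]
print("walk-forward result per variant:", np.round(results, 3))
print("reporting only the best:", round(max(results), 3))
```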

Then how do we minimize these biases?

Reference

[2] Easterwood, Sara, and Paye, Bradley S., High on High Sharpe Ratios: Optimistically Biased Factor Model Assessments (2023). SSRN 4360788.
 

Closing Thoughts

Taken together, these findings show that models with strong in-sample performance often achieve only around 50% accuracy out of sample. While out-of-sample testing is an essential tool for evaluating trading strategies, it is not entirely free from biases such as overfitting, data-snooping, and look-ahead. Research shows that these biases can inflate performance metrics like Sharpe ratios, leading to overly optimistic expectations.

