Research Proves Chat GPT4 Outperforms Financial Analysts

Yes, you read the title correctly. Why am I not surprised that I am writing this article? Well, I should not be surprised, yet I partly still am.

 

Now that I have overcome my latent disbelief in AI capabilities, let’s examine the topic in detail. In May 2024, three researchers from the University of Chicago Booth School of Business, Alex G. Kim, Maximilian Muhn, and Valeri V. Nikolaev, published a paper demonstrating Chat GPT’s abilities in predicting companies’ earnings.

The researchers also created a test portfolio to examine whether GPT’s earnings forecasts more accurately predicted stock price movements. They found that the portfolio outperformed the market and generated significant alphas and Sharpe ratios.

Let’s take a closer look.

What Did the Research Measure? 

I read the research paper behind the test, which is publicly available. I liked it immediately. The study had a rigorous methodology and a large sample, as expected from research at a major institution.

The research team fed balance sheets and income statements in a standardized form to the large language model (LLM), GPT 4.0 Turbo, and asked the model to analyze them.

Based on the analysis of the two financial statements, the model had to decide whether a firm’s earnings would grow or decline in the following period.
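To make the task concrete, here is a minimal sketch of how a directional forecast like this can be scored. The function names and the earnings figures are my own made-up illustration, not anything taken from the paper.

```python
from typing import Sequence


def direction(current_earnings: float, next_earnings: float) -> str:
    """Label the realized change in earnings as 'increase' or 'decrease'."""
    return "increase" if next_earnings > current_earnings else "decrease"


def directional_accuracy(predicted: Sequence[str], realized: Sequence[str]) -> float:
    """Share of forecasts whose direction matches what actually happened."""
    if not predicted or len(predicted) != len(realized):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == r for p, r in zip(predicted, realized))
    return hits / len(predicted)


# Hypothetical (current year, following year) earnings for five firms.
earnings_pairs = [(1.00, 1.20), (2.50, 2.10), (0.80, 0.95), (3.00, 2.40), (1.10, 1.30)]
realized = [direction(cur, nxt) for cur, nxt in earnings_pairs]

# Hypothetical model calls for the same five firms.
predicted = ["increase", "decrease", "decrease", "decrease", "increase"]

accuracy = directional_accuracy(predicted, realized)
print(f"Directional accuracy: {accuracy:.0%}")
```

Because there are only two possible answers, a coin flip would land near 50% on a balanced sample, which is the baseline to keep in mind when reading accuracy figures for this kind of test.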

According to the team, “We studied whether an LLM can successfully perform financial statement analysis in a way similar to that of professional human analysts. The answer to this question has far-reaching implications for the future of financial analysis and whether financial analysts will continue to be the backbone of informed decision-making in financial markets… We focus on earnings because they are the primary variable forecasted by financial analysts and are fundamental to valuation…”

Sample Size, Methodology, and Timespan 

The full sample spans from 1968 to 2021, which is critical because it covers many different market environments: bull and bear markets, financial crises, geopolitical events, high-inflation periods, periods when value companies did better, periods when growth companies led, and the emergence of the tech sector.

The Chat GPT4 test covered over 15,000 companies and over 150,000 data points, which implies that the research probably included companies of many different sizes and sectors. For comparison, the researchers used a human analyst sample spanning 1983 to 2021, covering 3,000 companies and 40,000 data points.

Critically, the researchers anonymized the data to prevent the language model from drawing on any “memory” of a company. They omitted company names from the financial statements and replaced years with labels. This approach ensured the model did not know which company or which year it was analyzing. (The researchers even asked Chat GPT4 to guess the company and year to check.)
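The anonymization step can be sketched roughly as follows. This is only an illustration under my own assumptions: the data layout, the made-up company, and the relative labels "t" and "t-1" are hypothetical and not the researchers' actual format.

```python
def year_labels(years):
    """Map calendar years to relative labels: latest -> 't', prior -> 't-1', ..."""
    ordered = sorted(years, reverse=True)
    return {year: "t" if i == 0 else f"t-{i}" for i, year in enumerate(ordered)}


def anonymize(statement):
    """Drop identifying fields and key the line items by relative year label."""
    labels = year_labels(statement["years"].keys())
    # Only the numeric line items survive; company name and ticker are left out.
    return {labels[year]: dict(items) for year, items in sorted(statement["years"].items())}


# Made-up company and figures, purely for illustration.
raw = {
    "company": "Example Corp",
    "ticker": "EXMP",
    "years": {
        2019: {"revenue": 900.0, "net_income": 70.0},
        2020: {"revenue": 1000.0, "net_income": 80.0},
    },
}

anon = anonymize(raw)
print(anon)
```

After this step, nothing in the model's input names the firm or the calendar year, so a correct forecast cannot simply come from the model recalling what happened to that company.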

The Chain-of-Thought (CoT) Prompt 

The researchers developed a Chain-of-Thought (CoT) prompt that effectively “teaches” the model to mimic a financial analyst. Financial analysts identify notable trends in financial statement line items, compute key financial ratios (e.g., operating efficiency, liquidity, and/or leverage ratios), synthesize this information, and form expectations about future earnings. The CoT prompt implements this thought process via instructions, and the model ultimately decides whether next year’s earnings will increase or decrease compared to the current year.
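As a rough sketch of that analyst workflow, the snippet below computes three common ratios and assembles a step-by-step prompt. The prompt wording, the choice of ratios, and the figures are my own assumptions for illustration; this is not the researchers' actual prompt, and no model is called.

```python
def key_ratios(s: dict) -> dict:
    """Three textbook ratios covering efficiency, liquidity, and leverage."""
    return {
        "operating_margin": s["operating_income"] / s["revenue"],
        "current_ratio": s["current_assets"] / s["current_liabilities"],
        "debt_to_equity": s["total_liabilities"] / s["shareholders_equity"],
    }


# Hypothetical step wording that mirrors the thought process described above.
COT_STEPS = [
    "Identify notable changes in the financial statement line items.",
    "Compute key financial ratios (operating efficiency, liquidity, leverage).",
    "Interpret the trends and ratios together.",
    "State whether next year's earnings will increase or decrease, with a brief rationale.",
]


def build_cot_prompt(statement_text: str) -> str:
    """Assemble a chain-of-thought style prompt around an anonymized statement."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(COT_STEPS, start=1))
    return (
        "You are acting as a financial analyst. Work through these steps in order:\n"
        f"{steps}\n\n"
        "Financial statements (anonymized):\n"
        f"{statement_text}"
    )


# Made-up figures for a single period.
statement = {
    "revenue": 1000.0,
    "operating_income": 150.0,
    "current_assets": 400.0,
    "current_liabilities": 200.0,
    "total_liabilities": 600.0,
    "shareholders_equity": 400.0,
}

ratios = key_ratios(statement)
prompt = build_cot_prompt("\n".join(f"{name}: {value}" for name, value in statement.items()))
print(ratios)
print(prompt)
```

The point of spelling out the steps is that the model is asked to reason its way to the answer rather than jump straight to "increase" or "decrease."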

How Did Chat GPT4 Perform? 

Here are some of the key conclusions that came out of the research paper:

  1. When using the chain of thought prompt to emulate human reasoning, GPT achieves an accuracy of 60% (up to 57% without CoT), which is markedly higher than the analysts’ accuracy.
  2. When humans struggle to create future forecasts (e.g., when there is less consensus in analysts' forecasts), Chat GPT’s insights are more valuable. Similarly, when human forecasts are prone to biases or inefficiency (i.e., not incorporating information rationally), GPT’s forecasts are more useful in predicting the direction of future earnings.
  3. The earlier version, GPT3.5, showed considerably less impressive performance, demonstrating that the version of the large language model matters.
  4. Google's recently released Gemini Pro achieved a similar level of accuracy to GPT 4.
  5. Finally, the researchers explored the economic usefulness of GPT’s forecasts by testing whether they predict stock price movements. They built a long-short test portfolio that held positions for a year after the financial earnings data would have been in the market. The portfolio outperformed the market and generated significant alphas and Sharpe ratios.
  6. Chat GPT4 performed on par with, or in some cases better than, more specialized machine learning models such as artificial neural networks (ANN).
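For readers less familiar with the portfolio terms, here is a simplified sketch of a long-short return and a Sharpe ratio. The equal weighting, the zero risk-free rate, the monthly annualization, and all the return figures are my own assumptions, not the paper's portfolio construction.

```python
import statistics


def long_short_return(long_returns, short_returns):
    """Equal-weighted return of the long leg minus that of the short leg."""
    return statistics.mean(long_returns) - statistics.mean(short_returns)


def sharpe_ratio(period_returns, risk_free=0.0, periods_per_year=12):
    """Mean excess return divided by its sample standard deviation, annualized."""
    excess = [r - risk_free for r in period_returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return statistics.mean(excess) / volatility * periods_per_year ** 0.5


# Made-up returns for one period: stocks forecast to grow earnings (long leg)
# versus stocks forecast to see earnings decline (short leg).
one_period = long_short_return([0.02, 0.04], [0.01, -0.01])

# Made-up monthly long-short returns.
monthly_returns = [0.01, 0.03, -0.01, 0.02]
annualized_sharpe = sharpe_ratio(monthly_returns)

print(f"Long-short return for the period: {one_period:.2%}")
print(f"Annualized Sharpe ratio: {annualized_sharpe:.2f}")
```

Alpha, the other measure mentioned above, is the part of the return left over after accounting for exposure to known risk factors, so it needs a factor regression and is not shown here.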

What Does This Mean for Trading and Forex Traders? 

Summarizing a study is one thing, and deciding what it means is quite another. Here are some of my key thoughts.

  1. The study suggests that AI-driven equity fundamental analysis works, and more investors will likely adopt it. The equity market has the advantage of having one of the clearest drivers of asset prices: “earnings” or “earnings per share.” This is easy to replicate with data, and not only is the model’s performance at least on par with human analysis, but it is also highly efficient.
  2. The study did not tell me definitively if it beat an index, such as the S&P 500.

It’s a 55-page study, and I may have missed some details, but I could not see a mention of the model portfolio outperforming an index. Most equity trading is index-driven, so the real question is not whether it beats other humans but whether it beats index performance.

  3. AI could complement human decision-making, not replace it.

The study touched upon areas where Chat GPT4 outperformed human analysis, suggesting that AI analysis could be a complementary input rather than a direct replacement.

  4. Not all AI decision-making is the same. This is a fundamental analysis study, not a technical analysis one. People use the term “AI” a lot, and brokers now claim to have ready-made AI trading strategies. Often, these are just indicator combinations in which AI determines the indicator settings. This study should not prompt anyone to adopt any AI-driven investment strategy without a closer examination.
  5. Unlike equities, forex does not have a single clear fundamental driver such as earnings. Some important fundamental drivers do exist in forex, such as interest rates, GDP, and employment numbers. It will be interesting to see whether AI can predict macroeconomic data and how easily those predictions can inform trading decisions.

AI Apps in Forex 

There has been a steady flood of newly released AI trading systems, some Chat GPT-based. Here’s a brief overview:

  1. Some traders are using Chat GPT as well as leading AI trading platforms and apps to help build and test trading strategies. This is perfectly valid. It’s your responsibility to test the strategy on a demo or small account that does not expose you to large risks. Experimentation is key here.
  2. AI-driven rulesets have the potential to be more dynamic than static indicator settings and can help traders speed up their decision-making.
  3. At the time of writing, AI apps, especially in Forex, use standard indicators, and the AI function tweaks the indicator settings and recommends optimal timeframes.

AI for retail trading is in its infancy. I think it is an exciting space to watch, but I have not seen anything that I would rely on for my trading yet. The technology is moving incredibly fast, though, and it may emerge as a tool that helps cut through the noise while the final decision-making stays human.

Bottom Line 

The research undertaken by the University of Chicago team is a breakthrough in applying Artificial Intelligence to market analysis and trading. It is based on a specific market and specific scenarios, and it required a team to feed Chat GPT4 correctly formatted data. However, the concept was proved through rigorous methodology, and it bodes well for exciting times ahead. There is still a big step from AI suggestions to real-world trading, which must include stop-losses, take-profits, and position sizing, all of which ultimately affect profitability. No doubt, the field will continue to evolve.


