trouble abounds (Credit: Sad Dog by Ink Hog, https://www.flickr.com/photos/34191996@N07/3859862007)

Does A Worse than Average S&P 500 In August Mean September Will Also Suffer?

The simple answer is yes! The data show that a worse than average August for the S&P 500 (SPY) increases the likelihood of a worse than average September. I use average to mean the average maximum drawdown as calculated and analyzed in “How to Trade During the Stock Market’s Most Dangerous Months”. In this case, “worse” means a negative number less than the average and “better” means a negative number higher than the average. I was disappointed by this result because I was hoping that an extreme month relieves the selling pressure in the stock market for the subsequent month. Instead, it seems August’s extra selling pressure tells us to brace for greater than average selling pressure in September!

This analysis is of particular interest because August, 2024 delivered an extreme maximum drawdown of -6.1%, the 13th largest drawdown for August for the S&P 500 since 1950 (74 years of data). If you are a bull, you want to believe that the stock market let off steam, so to speak, and thus is less prone to another large drawdown in September. If you are a bear, you want to believe that August’s selling was a warning of more to come. Both the numbers and the statistics appear to be in favor of the bears.

Let’s take a look at the detailed analysis.



The Analysis of Worse Than Average

The basis of this work comes from my analysis of the monthly maximum drawdowns for the S&P 500 (SPY) referenced above. August, September, and October have the worst maximum drawdowns (note I did not examine the statistical significance of the differences across months). I went back to the core dataset and segmented the August and September data as follows:

Month | Maximum Drawdown Relative to the Average for the Month | Number of Occurrences 1950 to 2023
--- | --- | ---
August | Below | 30
August | Above | 44
September | Below | 27
September | Above | 47

I further segmented the September data conditioned on the August maximum drawdown.

 | Number of times September’s maximum drawdown is below the month’s average | Number of times September’s maximum drawdown is above the month’s average
--- | --- | ---
When the August maximum drawdown is below the month’s average drawdown (30 times) | 16 | 14

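September’s maximum drawdown is above average 64% of the time (= 47 / (27 + 47)). Yet, in years when August has a below average maximum drawdown, September’s maximum drawdown is above average only 47% of the time (= 14 / 30). Put the other way, September’s maximum drawdown is below average 53% of the time (= 16 / 30) after a below average August versus 36% of the time (= 27 / 74) across all years. Thus, a worse than average August makes a worse than average September more likely.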
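
For readers who want to check the percentages, here is a minimal Python sketch. It is not part of the original analysis; the variable and function names are my own, and the counts come straight from the tables above.

```python
# Counts from the tables above (1950 to 2023, 74 years of data).
sept_below, sept_above = 27, 47   # September, all years
cond_below, cond_above = 16, 14   # September, only years when August was below its average


def share(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


uncond_above = share(sept_above, sept_below + sept_above)
cond_above_pct = share(cond_above, cond_below + cond_above)
uncond_below = share(sept_below, sept_below + sept_above)
cond_below_pct = share(cond_below, cond_below + cond_above)

print(f"September above average, all years:         {uncond_above:.1f}%")
print(f"September above average, August below avg:  {cond_above_pct:.1f}%")
print(f"September below average, all years:         {uncond_below:.1f}%")
print(f"September below average, August below avg:  {cond_below_pct:.1f}%")
```

Rounded to whole numbers, these shares reproduce the 64% and 47% figures quoted above.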

The Statistical Significance

The analysis does not end there. Only a statistically significant difference is truly actionable. Statistical significance means that these differences are not likely the result of random chance. In other words, the differences could be predictive and not just descriptive.

At this juncture, let’s turn to the data analyst function in ChatGPT (4.0) for help. I gave ChatGPT the following prompt (emphasis mine):

“I need a test of statistical significance for the following scenarios measuring maximum monthly drawdowns in the S&P 500 since 1950. August has had a below average maximum drawdown (negative percentage) 30 times. August has had an above average maximum drawdown 44 times. September has had a below average maximum drawdown (negative percentage) 27 times. September has had an above average maximum drawdown 47 times. Conditioned on an August with a below average maximum drawdown, September has had a below average maximum drawdown 16 times and an above average maximum drawdown 14 times. Thus, a below average maximum drawdown August, decreases the likelihood of an above average maximum drawdown in September. What is the statistical significance of this finding? Show all your work in detail.”

ChatGPT executed the following steps:

  • Build a contingency table.
  • Create the hypothesis.
  • Calculate the expected frequencies (used as a baseline for the statistical test).
  • Calculate the Chi-Square statistic – the test of choice for statistical significance for this particular 2 x 2 dataset.
  • Calculate the degrees of freedom and critical value for the Chi-Square test.
  • Determine whether the null hypothesis can be rejected and thus establish statistical significance.

If you have access to ChatGPT’s data analyst, you can see the numerical details by using the prompt from above. I asked ChatGPT to show its work so I could verify the accuracy of its results and the soundness of its analytical choices. Here are the core results:

The contingency table fills out the remaining scenarios:

 | September Below Avg | September Above Avg | Total
--- | --- | --- | ---
August Below Avg | 16 | 14 | 30
August Above Avg | 27 – 16 = 11 | 47 – 14 = 33 | 44
Total | 27 | 47 | 74
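
The subtraction that fills in the August Above Avg row can be sketched in a few lines of Python. This is my own illustration rather than ChatGPT’s output, and the variable names are assumptions made for readability.

```python
# Known inputs: the totals for each month plus the row conditioned on a below-average August.
aug_below_total, aug_above_total = 30, 44
sept_below_total, sept_above_total = 27, 47
below_below, below_above = 16, 14  # August below avg -> September below / above avg

# The August-above row is whatever remains of each September column total.
above_below = sept_below_total - below_below
above_above = sept_above_total - below_above

# Rows = August (below, above); columns = September (below, above).
table = [
    [below_below, below_above],
    [above_below, above_above],
]

# Sanity checks: each row must add up to the August totals, and the grand total to 74 years.
assert sum(table[0]) == aug_below_total
assert sum(table[1]) == aug_above_total
assert sum(sum(row) for row in table) == 74

print(table)
```

The built-in checks confirm that the derived row is consistent with the August totals of 30 and 44.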

The hypotheses are as follows:

  • Null Hypothesis ($H_0$): There is no association between a below-average August drawdown and an above-average September drawdown. The probability of a September drawdown (whether below or above average) is independent of the August drawdown.
  • Alternative Hypothesis ($H_1$): A below-average August drawdown is associated with a different likelihood of a September drawdown being above or below average.

A rejection of the null hypothesis allows acceptance of the alternative hypothesis.

The Chi-Square test uses the expected frequencies under the null hypothesis that look as follows:

Scenario | Expected frequency (normalized by total observations = 74)
--- | ---
August Below Avg & September Below Avg | 10.95
August Below Avg & September Above Avg | 19.05
August Above Avg & September Below Avg | 16.05
August Above Avg & September Above Avg | 27.95
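
Each expected frequency is the row total times the column total divided by the 74 observations, which is the standard calculation under the assumption of independence. Here is a small Python sketch of that step, with names of my own choosing:

```python
# Observed counts: rows = August (below, above); columns = September (below, above).
observed = [[16, 14], [11, 33]]

row_totals = [sum(row) for row in observed]        # August totals: [30, 44]
col_totals = [sum(col) for col in zip(*observed)]  # September totals: [27, 47]
n = sum(row_totals)                                # 74 years

# Under independence, expected count = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

labels = ["August Below Avg", "August Above Avg"]
for label, row in zip(labels, expected):
    print(f"{label}: Sept below = {row[0]:.2f}, Sept above = {row[1]:.2f}")
```

Rounded to two decimals, the four values match the table above.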

The resulting Chi-Square statistic equals 6.17, higher than the 3.84 critical value for one degree of freedom at a .05 significance level, that is, a 5% tolerance for the odds that these results could be the result of random chance. (Note that the test narrowly fails at a .01 threshold, where the critical value is 6.63.) Here is ChatGPT’s interpretation and conclusion:

“This suggests that the observed relationship between a below-average August drawdown and a below-average September drawdown is statistically significant. The data provides evidence that a below-average August drawdown decreases the likelihood of an above-average September drawdown.”
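
To double-check the statistic without ChatGPT, the test can be sketched in plain Python. This is my own reconstruction, assuming a Pearson Chi-Square test with no continuity correction; the function name `chi_square_2x2` is hypothetical. The p-value uses the fact that, with one degree of freedom, the upper tail of the Chi-Square distribution equals `erfc(sqrt(x / 2))`.

```python
import math

# Observed counts: rows = August (below, above); columns = September (below, above).
observed = [[16, 14], [11, 33]]


def chi_square_2x2(obs):
    """Pearson chi-square statistic (no continuity correction) and p-value for a 2 x 2 table."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(obs):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.4f}")
print("significant at .05:", stat > 3.841)  # critical value for df = 1
print("significant at .01:", stat > 6.635)  # critical value for df = 1
```

With unrounded expected frequencies the statistic comes out near 6.18. The 6.17 reported above is consistent with rounding the expected frequencies to two decimals before summing. One caveat if you reach for a library instead: SciPy's `chi2_contingency` applies Yates' continuity correction to 2 x 2 tables by default, which produces a smaller statistic, so `correction=False` is needed to match this calculation.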

Implications

So, apparently, after a worse than average August, the odds that September’s maximum drawdown is worse than the average -2.9% are 53% (16 of 30 years). These odds are little different from a coin flip, but they are notably worse than the 36% odds (27 of 74 years) in any given year. Moreover, these results cannot predict the specific magnitude of the drawdown. I have not proposed a model for such crystal ball powers. However, these results tell me that buying protection is more likely to pay off and that I should avoid overly aggressive buys BEFORE a notable drawdown event. (I would prefer to buy during oversold conditions per the market breadth trading rules).

Currently, protection is relatively cheap again with the volatility index (VIX) at 15.0 less than one month after soaring to its third highest close ever. The VIX even looks ready to go lower before jumping higher again.

The volatility index (VIX) had a volatile month in August.

August delivered a lot of trading drama which also equals opportunity. If September manages to deliver more sharp reversals in investor sentiment, the shifts could pivot around the Federal Reserve’s next decision on monetary policy on September 18th. The market expects at least a 25 basis points (.25%) rate cut. Of course, an unexpectedly strong jobs report for August could roil the market into a state of confusion over the odds for a rate cut. At least inflation looks unlikely to surprise for now…right?

Be careful out there!

Full disclosure: long SPY put spread and puts
