Introduction

This post aims to chronicle my journey into and progress towards (hopefully) becoming a superforecaster.

Coined by Philip Tetlock and described by him and Dan Gardner in Superforecasting: The Art and Science of Prediction (2015), “superforecaster” refers to a person whose forecasts are consistently better on average than those of the general public, and often better than those of purported experts as well. The book is a fascinating read based on real-life experiments, and it details patterns and traits associated with superforecasters and their predictions.

One finding is that, while superforecasters did exhibit certain traits like open-mindedness and intelligence, forecasting is a skill that anyone can get better at by taking a systematic approach. Specialized knowledge is not necessary, and can sometimes even be a hindrance!

Beyond the book, they continue to provide an opportunity for people to train their forecasting skills and compete against each other on their platform, GoodJudgment Open (GJO). Furthermore, every year they select top participants, among those who have forecast on at least 100 questions in total, to potentially become professional Superforecasters on their service platform.

I signed up in January 2021, and have dedicated about 1 hr/week to forecasting and updating active forecasts. It’s not too much of a time commitment, though I certainly don’t research things in as much depth as I should.

As of September 23, 2021: 295 forecasts (with 35 upvotes) across 76 questions. 36 of the 76 questions have resolved, giving me an average Brier score of 0.289 versus an average median score of 0.271.

Brier score

While perhaps not the criterion used to identify professional Superforecasters, the Brier score is the basis for all leaderboards on GJO.

The Brier score of a prediction is essentially the squared error of the prediction, summed across all possible outcomes. That is, the sum of the squared differences between the predicted probabilities (e.g. 70% it rains, 30% it doesn’t) and the actual outcome (e.g. 100% it did rain, 0% it didn’t).

Of course, it can get more complicated with more than two possible outcomes or decompositions of the Brier score into component parts, but it’s a fairly straightforward concept for those familiar with mean squared error (MSE). For those not familiar, squared error measures the difference between a prediction and the actual outcome, with a disproportionately heavier “penalty” when the difference is larger, because the difference is squared.
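As a minimal sketch of the calculation just described, here is a small Python function. It assumes the standard multi-category form of the score (squared differences summed over every possible outcome, giving a range of 0 to 2), and the name `brier_score` is purely illustrative, not anything GJO exposes.

```python
def brier_score(forecast, outcome_index):
    """Multi-category Brier score: sum of squared differences between the
    forecast probabilities and the actual outcome (1 for the outcome that
    happened, 0 for every other outcome). Ranges from 0 (best) to 2 (worst)."""
    if abs(sum(forecast) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(forecast)
    )


# Rain example: 70% it rains, 30% it doesn't.
print(round(brier_score([0.7, 0.3], 0), 3))  # it rained: 0.18
print(round(brier_score([0.7, 0.3], 1), 3))  # it didn't: 0.98
```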

Below we visualize the range of possible Brier scores for the rain/no rain question, depending on your prediction and the actual outcome that occurs. Note that smaller Brier scores are better, which can be counterintuitive. The best (i.e. lowest) scores, naturally, occur when either you predict it will rain with high probability and it does, or you predict it will rain with low probability and it doesn’t.
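The same range of scores for the rain/no rain question can also be tabulated numerically. This sketch assumes the same standard definition, under which both outcomes contribute an equal squared error; the helper name `binary_brier` is hypothetical.

```python
def binary_brier(p_rain, rained):
    """Brier score for the two-outcome rain/no-rain question, where p_rain is
    the forecast probability of rain and rained says whether it actually did."""
    outcome = 1.0 if rained else 0.0
    # Squared error on "rain" plus squared error on "no rain".
    return (p_rain - outcome) ** 2 + ((1 - p_rain) - (1 - outcome)) ** 2


print("P(rain)  rained  no rain")
for pct in range(0, 101, 10):
    p = pct / 100
    print(f"{p:7.1f}  {binary_brier(p, True):6.2f}  {binary_brier(p, False):7.2f}")
```

The table is symmetric: predicting 30% rain and having it stay dry scores the same as predicting 70% rain and having it rain.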

Brier skill score

We can derive metrics based on the Brier score to answer questions like “How much better did I do than some baseline prediction?”

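The Brier skill score is defined as

$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}}$$

where the numerator is the Brier score of the forecast being evaluated and the denominator is the Brier score of the baseline (reference) forecast. A BSS of 1 means a perfect forecast, 0 means no better than the baseline, and negative values mean worse than the baseline.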

Let’s suppose we have 3 people competing against each other to predict which of 3 different outcomes will take place. Person A has no clue what will happen, and assigns each outcome a flat 1/3 probability. Person B has done some research, and thinks that Outcome 2 is most likely but not a sure bet. Person C has done some research and thinks that Outcome 1 will very likely occur, and Outcome 3 is essentially impossible.

The results roll in and… it’s Outcome 2! How did each person perform?

  • Brier score of Person A = 1.333

  • Brier score of Person B = 0.760

  • Brier score of Person C = 1.769
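Of the three, only Person A's score can be recomputed from the description alone, since the exact probabilities of B and C are not given. A sketch using exact fractions, assuming the standard multi-category definition (sum of squared errors across outcomes):

```python
from fractions import Fraction

# Person A: a flat 1/3 on each of the three outcomes.
forecast = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
# Outcome 2 occurred: 1 for the outcome that happened, 0 otherwise.
outcome = [0, 1, 0]

# Sum of squared differences across all outcomes (standard definition).
score = sum((p - o) ** 2 for p, o in zip(forecast, outcome))
print(score)         # 2/3
print(float(score))  # 0.6666666666666666
```

This gives 2/3 (about 0.667), exactly half of the 1.333 listed above, so the listed figures appear to follow a different scaling convention. If that scaling is a constant factor applied to every score, it cancels out of ratio-based skill scores.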

These numbers make sense, but it can be difficult to interpret them without any context. How low a Brier score needs to be to be considered a “good Brier score” depends on a lot of things, including how many possible outcomes there are to choose from. That’s where a derived metric like the Brier skill score (BSS) can come in:

  • BSS of Person B relative to Person A is 0.430 (B outperforms A)

  • BSS of Person C relative to Person A is -0.327 (C underperforms A)
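These skill scores use the definition BSS = 1 - BS / BS_ref with Person A as the reference. A minimal sketch that reproduces them from the listed Brier scores (the function name is hypothetical):

```python
def brier_skill_score(score, reference_score):
    """Brier skill score: 1 - BS / BS_ref. Positive means better than the
    reference forecast, 0 means no better, negative means worse."""
    return 1.0 - score / reference_score


# Brier scores listed above, with Person A as the reference forecast.
scores = {"A": 1.333, "B": 0.760, "C": 1.769}
for person in ("B", "C"):
    bss = brier_skill_score(scores[person], scores["A"])
    print(f"BSS of {person} relative to A: {bss:.3f}")
# BSS of B relative to A: 0.430
# BSS of C relative to A: -0.327
```

Because the BSS is a ratio, multiplying every Brier score by the same constant leaves it unchanged.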

My progress

In addition to Brier scores for individual questions, GJO also reports your cumulative average score (weighted by participation rate, which I will ignore here). You can see that I’ve struggled to get my average score below the average median score, though I managed to close the gap quite a bit towards the fall.
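An unweighted cumulative average (ignoring participation weighting, as above) is just a running mean of scores as questions resolve. The sketch below assumes that simplification and, for illustration only, uses the six resolved questions listed in the Best and Worst predictions tables further down, so its numbers will not match the full 36-question average; the helper name `running_mean` is hypothetical.

```python
from itertools import accumulate

# (resolution date, my score, median score) for the six questions in the
# "Best predictions" and "Worst predictions" tables, sorted by resolution date.
resolved = sorted([
    ("2021-05-19", 1.607, 0.662),  # NYC indoor dining
    ("2021-05-31", 1.250, 0.129),  # EU Own Resources Decision
    ("2021-06-21", 0.185, 0.636),  # Dogecoin
    ("2021-06-21", 1.316, 0.637),  # NCAA v. Alston
    ("2021-08-24", 0.067, 0.429),  # Congress budget resolution
    ("2021-09-01", 0.032, 0.439),  # Summer box office
])


def running_mean(values):
    """Unweighted cumulative average after each newly resolved question."""
    return [total / n for n, total in enumerate(accumulate(values), start=1)]


mine = running_mean([s for _, s, _ in resolved])
median = running_mean([m for _, _, m in resolved])
for (date, _, _), a, b in zip(resolved, mine, median):
    print(f"{date}  mine {a:.3f}  median {b:.3f}")
```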

One big lesson was that keeping up to date with news related to each forecasting question is paramount in this competition. You can see my score blow up around the end of May, and more than half of that is because of a single question that fell off my radar.

“Will restaurants in New York City (NYC) be permitted to offer indoor dining at 85% capacity or more before 1 July 2021?”

I forecasted 20% for this question and the crowd forecasted 33%, right up until an announcement on May 3rd that full capacity would begin on May 19th.

I didn’t catch that announcement. Most other people did, and ratcheted their predictions up to an average of 96%.

I placed 160th out of 182 participants for that question, which was a very sad day.

Best predictions

The 3 questions on which I outperformed the median by the most:

| Resolution Date | Question | My Score | Median Score | BSS (vs. median) |
|---|---|---|---|---|
| 2021-06-21 | Which will happen next regarding the price of Dogecoin? | 0.185 | 0.636 | 0.709 |
| 2021-09-01 | What will be the combined U.S. domestic theater box office gross for June, July, and August 2021? | 0.032 | 0.439 | 0.927 |
| 2021-08-24 | When will the US Congress agree to a new budget resolution? | 0.067 | 0.429 | 0.844 |

Worst predictions

The 3 questions on which I underperformed the median by the most:

| Resolution Date | Question | My Score | Median Score | BSS (vs. median) |
|---|---|---|---|---|
| 2021-05-31 | Will the EU amend its Own Resources Decision to help finance the EU’s proposed COVID-19 recovery package before 1 September 2021? | 1.250 | 0.129 | -8.690 |
| 2021-05-19 | Will restaurants in New York City (NYC) be permitted to offer indoor dining at 85% capacity or more before 1 July 2021? | 1.607 | 0.662 | -1.427 |
| 2021-06-21 | In NCAA v. Alston, will the Supreme Court rule that NCAA rules restricting education-related benefits for student-athletes violate federal antitrust law? | 1.316 | 0.637 | -1.066 |
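The BSS column in both tables treats the median score as the reference forecast, i.e. BSS = 1 - (my score) / (median score). A quick sketch that recomputes the column from the listed scores, using shortened question labels chosen here for readability:

```python
# (shortened question label, my score, median score) from the two tables above.
rows = [
    ("Dogecoin price", 0.185, 0.636),
    ("Summer box office", 0.032, 0.439),
    ("Budget resolution", 0.067, 0.429),
    ("EU Own Resources Decision", 1.250, 0.129),
    ("NYC indoor dining", 1.607, 0.662),
    ("NCAA v. Alston", 1.316, 0.637),
]

# Brier skill score with the median score as the reference forecast.
bss_vs_median = {label: 1 - mine / median for label, mine, median in rows}

for label, bss in bss_vs_median.items():
    print(f"{label:26s} {bss:7.3f}")
```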

The pattern seems to be that many of the questions I do poorly on tend to be those with a great deal of historical context and various dynamics at play, which I have no prior knowledge of and no real interest in researching. But I end up making forecasts anyway, because it’s fun and probably also because of the Dunning-Kruger effect.

Top upvoted explanations

The GJO platform encourages participants to learn from and engage with each other by posting comments. While doing your own research and forming your own understanding is valuable, in practice you might not have enough time, interest, or background knowledge. I’ve benefitted a great deal from reading other people’s explanations, and I likewise try to post informative explanations (at least for my first prediction on a question).

What will be the total value of assets under management by global sustainable funds at the end of 2021, according to Morningstar?

September 9, 2021:

CHANCE ANSWER
5% Less than $2,250 billion
25% Between $2,250 and $2,500 billion, inclusive
55% More than $2,500 but less than $2,750 billion
10% Between $2,750 and $3,000 billion, inclusive
4% More than $3,000 billion but less than $3,250 billion
1% $3,250 billion or more

The tremendous flows over the past 1.5 years have been partly due to QE/overall stock market returns, and partly due to the Biden administration. I would expect the effect of the latter to decrease in 2021 Q3/4 compared to 2020 Q4 and 2021 Q1/2, while the effect of QE will remain steady (for how long? no idea).

So I’m expecting quarterly inflows of about 100-200 billion, putting us around 2450-2750 billion. Uncertainty around global financial markets, new manifestations of climate change in the media, etc., will affect how these funds do. I’m leaning towards more downside than upside surprise.

Will the US civilian labor force participation rate reach or exceed 63.0% for any month in 2021?

May 30, 2021:

CHANCE ANSWER
10% Yes

https://fred.stlouisfed.org/series/CIVPART While much higher labor force participation rates have been seen historically, the rate has struggled to stay above 63% since about 2014. Also, it does not appear to be the kind of statistic that moves quickly (outside of major upheavals). Zooming out on the chart, there is not a single instance of a 1.5-percentage-point jump in the participation rate within a ~7-month period. There was a single period that came very close, however: 58.1% in Dec 1954 to 59.7% in Aug 1955. Obviously COVID is a special case, since this is also the most rapid decrease in participation rate ever seen. The loss of businesses and the additional unemployment benefits (though these may end before the end of the year) will be a damper, though.
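The check for a 1.5-point jump within about 7 months can be automated with a sliding-window comparison. A minimal sketch, assuming the monthly CIVPART values have already been loaded into a Python list (for example from FRED's CSV download, not shown); the function name and the toy series are hypothetical and are not real CIVPART data.

```python
def max_rise_within(values, window):
    """Largest increase values[j] - values[i] over all pairs with
    0 < j - i <= window (e.g. the biggest jump within `window` months)."""
    if len(values) < 2 or window < 1:
        raise ValueError("need at least two values and a window of 1 or more")
    best = float("-inf")
    for j in range(1, len(values)):
        # Compare month j against every earlier month still inside the window.
        for i in range(max(0, j - window), j):
            best = max(best, values[j] - values[i])
    return best


# Hypothetical monthly participation rates in percent (NOT real CIVPART data).
toy = [61.5, 61.4, 61.7, 61.6, 61.9, 62.0, 61.8, 62.2, 62.1]
rise = max_rise_within(toy, window=7)
print(f"largest rise within 7 months: {rise:.1f} points")
print("any 1.5-point jump?", rise >= 1.5)
```

The double loop is O(n × window), which is plenty fast for several hundred monthly observations.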

How many COVID-19 vaccines will be authorized for emergency use or approved by the US FDA as of 31 December 2021?

May 16, 2021:

CHANCE ANSWER
0% 2 or fewer
70% 3
20% 4
10% 5
0% 6 or more

April 9: https://www.ctvnews.ca/health/coronavirus/government-vaccine-advisers-say-they-don-t-foresee-astrazeneca-vaccine-being-used-in-the-u-s-1.5381038 ‘“We already have contracted for enough vaccines, from Moderna and from Pfizer and from [Johnson & Johnson],” he said. “There is no plan to immediately start utilizing the AstraZeneca [vaccine] even if it gets approved through the EUA, which it very well might.” Fauci said it wasn’t because of the AstraZeneca vaccine itself, but rather that it’s not necessary in the US right now. “It’s not any indictment against the product. We just have a lot of vaccines,” he said.’

Given the lack of need, even if there is good evidence for the safety and efficacy of AZ, Novavax, and potentially others, there’s just not a lot of pressure to rapidly approve additional vaccines. I think there would need to be some compelling reason, e.g. more effective against rising variants, etc.

Pfizer, Moderna, and J&J are already authorized for emergency use, so that’s 3. https://www.cdc.gov/coronavirus/2019-ncov/vaccines/different-vaccines.html

Reflections

After 9 months, I have mostly positive thoughts on this whole experience.

+ Done correctly, making predictions about world events you know nothing about forces you to learn about what’s happening in the world, the historical context, various influential factors, and cause-and-effect dynamics. It’s highly unlikely that you’ll have any practical use for this knowledge in your life, but if you enjoy learning about how things work, it can be rewarding in and of itself.

+ Being forced to make quantitative predictions on clearly defined, measurable outcomes is much more difficult than going “hm, yeah, something like this will probably happen eventually”. The former does have its limitations and issues, but it necessitates more rigorous thinking, and that is a useful skill.

The nature of certain questions (e.g. “how many COVID-19 cases will have been reported in X country by MM/DD/YYYY?”), combined with how your Brier score is calculated, means that you’re giving up leaderboard points if you don’t regularly update your predictions in response to news updates and trend changes. I don’t feel that I should be scored on how much time I have and am willing to spend staying on top of news across dozens of active predictions. That said, updating your beliefs as new facts come out is definitely important to good forecasting.
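To see why infrequent updating costs points, here is a simplified sketch of time-averaged scoring. It assumes that each forecast is scored every day the question is open, stays in effect until it is updated, and that the question's score is the average of the daily scores; GJO's actual rules may differ in details. The 60-day timeline is hypothetical, loosely modeled on the NYC dining question above, with the final 16 days matching the gap between the May 3 announcement and the May 19 resolution.

```python
def binary_brier(p_yes, happened):
    """Two-outcome Brier score: squared error on "yes" plus on "no"."""
    o = 1.0 if happened else 0.0
    return 2 * (p_yes - o) ** 2


def time_averaged_brier(updates, days_open, happened):
    """Average of daily Brier scores, assuming (simplification) that each
    forecast carries forward until the next update. `updates` maps a day
    index to the forecast P(yes) made that day."""
    if 0 not in updates:
        raise ValueError("this sketch assumes a forecast on day 0")
    daily = []
    current = None
    for day in range(days_open):
        current = updates.get(day, current)  # carry the last forecast forward
        daily.append(binary_brier(current, happened))
    return sum(daily) / len(daily)


# Hypothetical 60-day question that resolves "yes"; the news breaks on day 44.
never_updated = time_averaged_brier({0: 0.20}, 60, True)
updated = time_averaged_brier({0: 0.20, 44: 0.96}, 60, True)
print(f"never updated:      {never_updated:.3f}")
print(f"updated after news: {updated:.3f}")
```

In this toy setup, holding 20% scores 1.280 while updating to 96% after the news scores about 0.940, even though the two forecasts were identical for most of the question's life.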

I’ve also learned about some of my own biases and mental shortcuts that cause me to perform more poorly than I could. Obviously, it’s not possible to make perfect predictions. But we can all get better!

What about you?

Interested in improving your analytical skills and learning more about the world?

Have some time on your hands to do research and occasionally update your predictions?

Join the GoodJudgment Open!