Are You Smarter Than a Coin-Flipping Monkey?
30 years ago, a man named Philip Tetlock decided to figure out whether the people we pay to make predictions about politics were actually good at predicting things.
He picked two hundred and eighty-four people who made their living “commenting or offering advice on political and economic trends,” and he started asking them to assess the probability that various things would or would not come to pass, both in the areas of the world in which they specialized and in areas about which they were not expert. Would there be a nonviolent end to apartheid in South Africa? Would Gorbachev be ousted in a coup? Would the United States go to war in the Persian Gulf?
–Louis Menand, Everybody’s An Expert
Tetlock’s discovery: On average, the commentators were slightly less accurate than a monkey flipping a coin with “yes” printed on one face and “no” on the other. They’d have been better off if they’d made completely random predictions!
What’s more, being an expert on a topic didn’t help much. At some point, more expertise even led to more faulty predictions.
Can We Do Any Better?
There are lots of reasons we make bad guesses about the future. But Philip Tetlock’s particular interest was in figuring out how to do better.
Prediction, after all, is one of the most important things a person can ever do: Will I divorce this person if I marry them? Will I be happy in a year if I accept this job offer? It’s also an important skill for governments: How much will the Iraq War cost? Will this gun-control bill really lower the crime rate?
But if political experts aren’t good at prediction, who is?
How about gamblers? And stock-market traders? After all, to be a successful gambler, you need to be right more often than wrong — or at least bet against people who are especially wrong. The same is also true of traders: To beat the market in the long run, you need to be better at guessing what will happen to stock prices than the average person who enters the market.
(Traders can also lie to other traders, defraud investors, or participate in insider trading — but plenty of them do make money by making a lot of good predictions. Similarly, a gambler could bribe referees to fix soccer matches, but if we look at a lot of good gamblers, we’ll find some who are simply better than most at thinking through the variables that lead one team to beat another.)
Prediction Markets
A prediction market is like a factory for churning out good guesses.
–Aaron Gertler, “Quoting Yourself for Fun and Profit”
Perhaps inspired by thoughts of traders, Philip Tetlock decided to run experiments using something called a “prediction market”. In such a market, people buy and sell “shares” of predictions, which then “pay out” if the predicted event actually happens.
For example, I could create a share called “Hillary Clinton wins the 2016 presidential election”, which pays the holder $1 after the election finishes if she wins and $0 otherwise. If you think she has a 60% chance of winning, you’ll gladly buy the share for 50 cents, but sell it if someone offers you 70 cents. Eventually, if enough people are active in the market, the price of the share will settle at the average belief of the market: 50 cents, say, if the average belief is that her odds of winning are 50%.
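To make the arithmetic concrete, here's a minimal sketch (my own illustration, not any real market's code) of the decision rule a single trader might follow, assuming they simply compare the quoted price to their subjective probability:

```python
def expected_value(belief: float, payout: float = 1.0) -> float:
    """Expected payout of a share that pays `payout` dollars if the event happens."""
    return belief * payout

def decide(belief: float, price: float) -> str:
    """Buy when the share costs less than your expected value, sell when it costs more."""
    ev = expected_value(belief)
    if price < ev:
        return "buy"
    if price > ev:
        return "sell"
    return "hold"

# A hypothetical trader who gives Clinton a 60% chance of winning:
print(decide(0.60, 0.50))  # "buy": 50 cents for a share worth 60 cents to you
print(decide(0.60, 0.70))  # "sell": someone is offering more than your expected value
```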
In theory, a prediction market should make better predictions than most of the individuals within the market. This might seem weird: Why would we expect people to be smarter in groups than alone if they aren’t even cooperating to make predictions?
The secret sauce of a prediction market is the incredible power of the bet. Or, as some people call it, the Tax on Bullshit.
When you bet money on a certain prediction, you use your past winnings to influence the market’s belief: The more you bet, the more the average bet will shift in favor of your prediction. But if you are wrong, you lose money, and thus lose some of your previous influence. The more often you make bad predictions, the fewer bets you’ll be able to make, and the less the market will care about your (on average) flawed perspective.
This isn’t what happens in politics. When Tom Friedman or Nancy Pelosi makes a lousy prediction, they lose nothing except respect — and they only lose respect if someone is paying enough attention to notice the mistake. If wrong predictions aren’t “punished”, the people who make them will keep adding misinformation to the ongoing dialogue. Markets fix that problem by transferring money from bad predictors to good ones — though any other numerical point system works just as well as money, if we care about the numbers.
The Good Judgment Project
Last summer, I signed up for the fourth and final “season” of Philip Tetlock’s masterpiece: The prediction market experiment known as the Good Judgment Project.
For ten months, I made bets in a market alongside a few hundred other people, many but not all of whom had advanced degrees or worked in politics or the media. As in other seasons, the very best predictors were to be designated “superforecasters” and grouped together into a special market later on, in hopes of getting even better results. (The CIA is one sponsor of the Project, which sounds a bit sinister, but may be a good thing if we want fewer Iraq Wars in the future.)
By the way: When I say that a person or market is “better” at predicting, I mean “better-calibrated”. If you are “well-calibrated”, the more sure you are about something, the more likely it is to happen. A prediction market is very unlikely to arrive at a firm 100% conclusion: Instead, the average guess might give Hillary Clinton a 60% chance of becoming President. Being well-calibrated means that things the market thinks are 60% likely to happen will actually happen 60% of the time.
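If you want to check calibration yourself, the idea fits in a few lines (a hypothetical example, not the Project's actual scoring): group your forecasts by the probability you stated, then see how often each group actually came true.

```python
from collections import defaultdict

def calibration(forecasts):
    """For each stated probability, how often did the event actually happen?"""
    groups = defaultdict(list)
    for prob, happened in forecasts:
        groups[prob].append(happened)
    return {prob: sum(outcomes) / len(outcomes)
            for prob, outcomes in sorted(groups.items())}

# Hypothetical forecasts: (probability you gave, whether it came true).
# A well-calibrated forecaster's 60% predictions come true about 60% of the time.
sample = [
    (0.6, True), (0.6, False), (0.6, True), (0.6, True), (0.6, False),
    (0.9, True), (0.9, True), (0.9, True), (0.9, True), (0.9, False),
]
print(calibration(sample))
# {0.6: 0.6, 0.9: 0.8} -> overconfident at 90%: those only happened 80% of the time
```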
Becoming better-calibrated was one of my goals in joining the Project. I also wanted to see if studying cognitive science — and learning all about my own biases — would help me beat the market. Here’s what happened:
Stage One: Application
I don’t have any idea how selective the Good Judgment Project is. I’d put my email in a few months before, after speaking to a former participant I met at a rationality workshop. I was notified when sign-ups went live, and then I spent about 90 minutes filling out an application, including:
- Demographic questions
- Questions about my educational background and political beliefs
- Questions about my experience with prediction markets
- A short “cognitive reflection test” intended to measure clear thinking
- A personality test measuring traits like “openness” and “avoidance of ambiguity”
- A quiz about politics, including calibration questions: “How sure are you that so-and-so is the opposition leader in Great Britain?”
Some time later, I was notified of my acceptance. I was given the chance to read a series of short essays about making smart predictions, and to practice with the betting system. Finally, Tetlock’s team put $50,000 in fake money into my trading account and opened up the first questions.
Stage Two: Testing My Theories
I was taking part in the project during my senior year of college, so I didn’t want to spend lots of time reading about obscure political issues. Instead, I decided to read very little, unless I knew absolutely nothing about the topic (for example, the Japanese stock market).
Rather than studying the news, I chose three heuristics to guide my bets. They performed well: I’ll use all three again the next time I join a market.
These were the assumptions I made whenever I thought about placing a bet:
The world is boring: Many of the questions asked things like “will the government be overthrown in Country X?”
I’ve noticed that newspapers and pundits tend to assume that the most newsworthy thing will happen in any given situation. This makes sense, because exciting predictions are also exciting to listen to, and will draw viewers and readers. I assumed that the market would typically make the same mistake, giving unfair weight to the exciting answer of each question. So I resolved to give a bit more weight to the boring answers: “No, the government of Country X will be fine.”
(Postmortem: Judging from the comment threads under each prediction, people were indeed taking news stories very seriously. When you think about the sort of person who enters a political prediction tournament, this is not surprising.)
Things move slowly: Some of the questions set time limits for when a thing might happen: “Will the U.S. and Iran make such-and-such a deal by March 1st?”
When I saw those, I thought of Hofstadter’s Law: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.” Negotiation is hard, and projects often get delayed. So I gave a bit more weight to answers like: “No deal will have been made by March 1st.”
Numbers tell the story: There were several questions about the future prices of stock indices or commodities. I found this very funny, since Philip Tetlock could have made quite a lot of money off of the market had we been correct.
Anyway, I decided to more or less ignore the headlines, since mass-media financial predictions are generally useless (for reasons I don’t have the space to explain). Instead, I’d look at the past few years of the market or commodity, try to spot patterns that matched the current fluctuations, and make predictions based on that. I also applied the “boring world” heuristic: “No, this market is not about to explode or implode based on whatever random headline you saw this morning.”
In the early days of the market, I made a few sizable bets when I felt unsure — and I was quickly punished, as the market swung wildly back and forth.
Luckily, I went about 50/50 on the big bets, and made a few “easy” small bets that went my way. For example, when the market assigned 10% odds to China and Japan exchanging gunfire in the Pacific within a few months (seriously?), I bet on the same boring peace we’ve had for the past 70 years.
Sadly, I failed to check my shares for a week or so during an especially turbulent time for the TOPIX, and lost nearly $10,000 of my $50,000 in fake money on that single bet.
Most of these losses came because I was scrambling to somehow win my money back; in the process, I kept trailing the market and getting crushed by each new fluctuation. I’d read plenty of times that traders make the mistake of selling off falling stocks when they ought to hold on for the recovery, and I made that exact mistake anyway. Hopefully, losing all that fake money taught me a lesson I won’t have to learn with real money.
The chart above shows the questions I bet on most often. In general, I made money by repeating bets, because I’d “correct the market” by reversing the predictions of people who seemed to be pushing the average too far. In the case of the TOPIX, however, I did a lot of betting and lost lots of money; I suppose that other people were correcting me.
Stage Three: Slow and Steady
After the first few months, many participants dropped out of the project: Our only payment was a few Amazon gift cards, which wasn’t much of an incentive. (You’d think the CIA could afford better.)
Still, there were enough people making bets that prices did fluctuate, and fake money could be earned. I continued to profit off of questions like “Japan and China going to war” whenever the probabilities crept back up, and I checked back each week to search through new questions for anything that matched one of my heuristics:
- “No, this prime minister is not going to resign from office.”
- “No, this army is not going to invade that country.”
- “No, France is not going to deliver this controversial assault boat to Russia in the middle of the Ukraine crisis.”
The chart above shows my gains and losses over time. You can see some huge swings at the beginning, followed by lots of small wins as I learned more about the market.
Eventually, the last new questions were released. I made my final bets and left the project for the last few weeks. When I came back, I discovered that I’d actually made quite a bit of money despite a total failure to update on new evidence; people with more information than me were betting in the wrong direction.
The lesson I took away (which may not be correct in the long run): Most news about tricky political situations (at least the stuff the average person reads) isn’t very helpful. Given that this post started with a study showing that forecasting experts aren’t good at making predictions, we seem to have come full circle. Time for the conclusion!
Final Results
Over the course of ten months, I spent about 60 minutes per week on the Good Judgment Project — mostly looking at other bets, sometimes reading newspapers or checking markets. I earned about $21,000 on the $50,000 I started with; if the prediction market were a hedge fund, this would have been a good year!
I wasn’t quite a superpredictor: They only took the top few people from this market. And I was more active than most participants, which probably helped me make money when other people left bad bets unattended. In a market with actual money at stake, I wouldn’t be able to rely on the laziness of others, so I’m not making too many assumptions about my l33t prediction skillz.
It was very interesting to watch my predictions come true (or go terribly wrong). I think I got a bit better at knowing when to trust my gut feeling that a particular forecast was or wasn’t a good idea — this was probably because I’d learned to associate certain patterns of price changes with “actual important news has happened” and others with “this price is moving around for no good reason, so I should bet it back to where it just was”.
I also had a few awkward moments when I’d check email updates at dinner:
Oh no!
“What’s wrong?”
The Russian military just killed a man in the Ukraine!
“Oh no! But… why are you following that story?”
I was betting money that this wouldn’t happen! Dammit, Russia!
“…what?”
Alas, this was the last season of the GJP. I suppose the CIA now has access to enough superforecasters that we’ll never make a stupid decision to invade another country again.
Still, you can sign up for something very similar at this link — Tetlock’s team is organizing another tournament. If you decide to enter, you’ll be betting against me. Good luck!
Resources
When I’m not betting against the Russian military, you can find me on PredictionBook — like the Good Judgment Project, but for life. Place predictions on anything and see how well-calibrated you are!
Gwern, an internet essayist with an amazing blog, took part in all four seasons of the Good Judgment Project. He writes about his experience (and about prediction markets in general) right here.
In case you missed the link above, NPR published a great short article on the Project. In other media, the New Yorker wrote a profile of Tetlock in 2005, and Edge interviewed him in 2012 (right after the Nate Silver prediction extravaganza).
Want to play in a prediction market with real money? PredictIt lets you do that right now! And Augur, a fancier version, will be starting up soon.
Want to learn more about prediction markets? The Wikipedia article is good and easy to find, but I also like Scott Alexander’s essay on using prediction markets to run the government (I think this is actually my favorite political system). Also, Google ran an internal prediction market to gather bets on whether certain products would become successful: Read all about it!
Hamilton College researchers ran a Tetlock-like study analyzing the predictions of modern-day “experts” (Paul Krugman, David Brooks, etc.). The results are quite biased, because many of the predictions involved a single event (the 2008 Presidential election), but the paper is still interesting.
If you’d like to learn more about the current state of Tetlock’s work, and the common traits shared by superforecasters, this article is a good starting point. You could also buy Tetlock’s new book.
Did you enjoy this post? You might want to subscribe to see more posts in the future! I think there’s a 60% chance you won’t regret it.
Fascinating article! Several points:
“In theory, a prediction market should make better predictions than most of the individuals within the market. This might seem weird: Why would we expect people to be smarter in groups than alone if they aren’t even cooperating to make predictions?
The secret sauce of a prediction market is the incredible power of the bet. Or, as some people call it, the Tax on Bullshit.”
However, you don’t need a prediction market or a betting scheme for a group to be collectively more rational than individuals. Basically any collective decision system that minimizes groupthink will likely have very good averages, much better than most individuals. The phenomenon is known as the wisdom of the crowd (https://en.wikipedia.org/wiki/Wisdom_of_the_crowd). For a relatively extreme interpretation of said phenomenon, see “Philosophical Majoritarianism” http://www.overcomingbias.com/2007/03/on_majoritarian.html
I guess how much *better* a prediction market does than expert consensus is more instructive about its value than how it compares to a single expert (kinda how we’re taught to trust meta-analyses over single experiments and Wikipedia over individual news articles).
“There were several questions about the future prices of stock indices or commodities. I found this very funny, since Philip Tetlock could have made quite a lot of money off of the market had we been correct.”
Your intuition is correct, but I don’t think you fully understand the implications. In particular, the real market is SO much more efficient than the prediction market that basically everybody in said prediction market is batting way out of their league. It’s definitely conceivable that some people may be better at political or social predictions than others (where classically being right =/= being rich) on the relatively inefficient prediction market, but absolutely bizarre to imagine that they could predict commodity prices better than a monkey with a dartboard. If the EMH is exploitable without secret information, somebody would have already exploited it…
Now if you noticed that the prediction market *doesn’t* track reality, that might give you a slim opportunity (e.g. if the prediction market’s suggested prices do not track futures prices).
“And I was more active than most participants, which probably helped me make money when other people left bad bets unattended.”
This is actually really interesting, because a wealth of economic literature suggests that actively managed mutual funds << passively managed ones in most situations, and the more you trade the more money you lose. However, I think this is mostly due to transaction costs. I’m guessing that this pseudo-prediction market had very low or non-existent “fees”?
That’s true! But to the extent that prediction markets beat “averaging everyone’s guesses”, they do so by finding people who are the best guessers and slowly increasing the amount that the best guessers influence the entire group. I’d think this could be especially important for an issue where a few people with special knowledge are betting against a large group that is wrong for some systematic reason — but it wouldn’t be as important for something like guessing the number of jelly beans in a jar.
This is certainly true. That’s why I said “had we been correct”… I very much doubt that we actually were. I wonder if it’s convenient for the Project team to be able to compare financial to non-financial predictions as a way of judging how far other areas of prediction are from the near-perfect information of the market?
No transaction fees whatsoever. I’d have done quite a bit worse otherwise, I think.
I should mention that the bets where I traded most often were generally “decay function”-type questions, like “will China do X by March 1st?” As time wears on and no new information arrives, the probability of the thing happening by the chosen date goes down — but the market was small enough that people would often miss this, so betting against the event as the crucial date drew near was an easy way to make money.
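For anyone curious what that decay looks like, here's a rough sketch, assuming (purely for illustration, not how the Project scored anything) that the event arrives at a constant daily rate:

```python
import math

def prob_by_deadline(days_left: float, daily_rate: float) -> float:
    """P(event happens before the deadline), under a constant daily hazard rate."""
    return 1.0 - math.exp(-daily_rate * days_left)

# Suppose the market opens thinking there's a 40% chance of a deal within 90 days.
rate = -math.log(1 - 0.40) / 90  # daily rate implied by that initial 40%
for days_left in (90, 60, 30, 10, 1):
    print(days_left, round(prob_by_deadline(days_left, rate), 2))
# 90 0.4, 60 0.29, 30 0.16, 10 0.06, 1 0.01 -- if prices don't fall along with the
# calendar, betting "no" near the deadline is close to free money.
```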
Interesting! Thanks for writing up your experience. I was in a different experimental condition (no fake money or bets, just scores, working alone, given access to anonymous “tips” that we were supposed to rate), and apparently the report format at the end was different in some respects too. Like you, I relied on the world generally being boring and moving slowly – particularly when any large organization or coordination between large groups is involved. That served me well overall.
I find that I am quite capable of over-focusing on questions that have large amounts of numerical data, though – I’m happy with “numbers tell the story” in the case of infrequently updated summary data (e.g. Chinese monthly power consumption), but I have to be careful not to mistake noise for signal in other cases.