Evaluating Accuracy Effect By Percentile

By Ben Golden

In my last post analyzing my own forecasting history on Inkling Markets, I showed that I was consistently identifying long-shot bets that were more likely to pay off than their existing probability would suggest.  In this post, I'll look at how my forecasts improve the accuracy of these markets by calculating the change in component Brier score within different percentile ranges.

Whereas last time I looked at forecasts that adjusted probabilities across a percentile, in this analysis I look at the value of forecasting within each percentile range--0% to 1%, 1% to 2%, etc.  When an individual forecast included multiple percentile ranges (for instance, a forecast that moves the market probability from 35.2% to 37.4% includes the ranges 35%-36%, 36%-37%, and 37%-38%), I split the forecast into multiple components and evaluated the accuracy improvement of component forecasts individually.  This allows me to see which percentile ranges I'm most effective forecasting within.
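The splitting step described above can be sketched in Python as follows.  This is a minimal illustration, not the code behind the analysis: the function names are hypothetical, probabilities are given in percent, and it assumes the accuracy change of a component is simply its squared error against the outcome (1 if the event happened, 0 if not) at the end of the segment minus the squared error at the start.

```python
import math

def split_by_percentile(start_pct, end_pct):
    """Split a move between two market probabilities (in percent, 0-100)
    into segments that each lie within a single 1%-wide range."""
    lo, hi = min(start_pct, end_pct), max(start_pct, end_pct)
    # Whole-percent boundaries strictly between the two endpoints.
    boundaries = list(range(math.floor(lo) + 1, math.ceil(hi)))
    points = [lo, *boundaries, hi]
    if end_pct < start_pct:
        points.reverse()  # preserve the direction of a downward forecast
    return list(zip(points[:-1], points[1:]))

def brier_change(start_pct, end_pct, outcome):
    """Assumed definition: squared error after the move minus squared error
    before it. Negative values mean the move improved accuracy."""
    return (end_pct / 100 - outcome) ** 2 - (start_pct / 100 - outcome) ** 2

def component_changes(start_pct, end_pct, outcome):
    """Return (percentile, brier_change) pairs, one per 1%-wide range
    touched by the forecast. A percentile is labeled by its lower edge."""
    return [
        (math.floor(min(a, b)), brier_change(a, b, outcome))
        for a, b in split_by_percentile(start_pct, end_pct)
    ]

# The example from the text: a forecast moving the market from 35.2% to 37.4%,
# here assuming the event eventually happened (outcome = 1).
for percentile, change in component_changes(35.2, 37.4, 1):
    print(f"{percentile}%-{percentile + 1}%: {change:+.6f}")
```

Under this definition the component changes add up to the change for the whole forecast, so splitting a forecast does not alter its total effect; it only attributes that effect to the ranges the forecast passed through.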

The following chart shows the average Brier score improvement per 100 within each percentile when I bought, or forecasted upwards:

Note that negative Brier scores imply better accuracy, and I've reversed the axis on the left of the chart, so where the blue line is higher, I'm forecasting more accurately.  My per-forecast accuracy effect is generally high in the 5%-40% range, where I made a lot of forecasts, and then peaks again around 70%, where I made considerably fewer forecasts.  My forecasting is least effective at around 50%, and is actually reducing accuracy in the 82%-83% range.

Here's the same chart, but for cases where I sold, or forecasted downwards:

My downward forecasts on average are doing more to improve market accuracy, in particular when I'm forecasting downwards in the 40%-95% range, which is where I'm most likely to forecast downwards.
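Per-percentile averages like the ones plotted in the two charts could be computed along these lines.  This is a sketch under stated assumptions: the function name and the record layout (direction, percentile, Brier change) are hypothetical, the sample values are made up purely to show the mechanics and are not results from the analysis, and no "per 100" scaling is applied.

```python
from collections import defaultdict

def average_change(components):
    """components: iterable of (direction, percentile, brier_change) tuples,
    where direction is "buy" (upward) or "sell" (downward).
    Returns {(direction, percentile): mean brier_change}."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum, count] per key
    for direction, percentile, change in components:
        entry = totals[(direction, percentile)]
        entry[0] += change
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up component records, for illustration only.
sample = [
    ("buy", 35, -0.02),
    ("buy", 35, -0.04),
    ("sell", 70, 0.01),
]
averages = average_change(sample)
for (direction, percentile), mean in sorted(averages.items()):
    # Negative means the forecasts in that range improved accuracy on average.
    print(direction, f"{percentile}%-{percentile + 1}%", f"{mean:+.4f}")
```

Keeping buys and sells in separate groups matters because the same percentile range can look very different depending on the direction of the forecast.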

These analyses have largely confirmed my impression that I'm mostly playing long shots, and am consistently improving accuracy when doing so.  But they also reveal my other forecasts to be pretty effective--there are only three percentile ranges where my forecasts haven't improved market accuracy.

By applying these same analyses to other forecasters, we can uncover their strengths and weaknesses, help them improve as forecasters, and ultimately improve the accuracy of our markets.


Ben Golden/@BenGoldn is an Engineer and Data Scientist at Cultivate Labs.
