If you’re not already familiar with Brier scoring, you should first read the first two parts of our forecast scoring series.
In our previous articles, we discussed a basic forecasting question: Will
the Cubs win the World Series in 2017? Once the season is over, there’s an
unequivocal answer to the question (either they won or they did not), and we
score each allotment of probability in a forecast as correct or incorrect. But
how should we score questions when some answers are "closer" to being
correct than others?
One example would be the question "How many games will the Cubs win in
the 2017 regular season?" with the answer options:
Less than 50
50-75
76-100
More than 100
Say I make the following forecast:
| Answer | Forecast |
|---|---|
| Less than 50 | 15% |
| 50-75 | 45% |
| 76-100 | 25% |
| More than 100 | 15% |
Now say they win 80 games, making the "76-100" bucket correct. The 45% I allocated to the "50-75" bucket was not technically correct, but it was closer to being correct than the "Less than 50" bucket. Ideally, our scoring system should penalize forecasters less for allocating probabilities closer to the correct outcome. This is exactly what the ordinal scoring system does.
Similar to a normal Brier score, we calculate a daily score for each day
that the forecast was active and then average the daily scores to calculate an
overall score for the question. The difference from a normal Brier score is in
the method used for calculating the daily score.
Using a standard Brier score for this question (i.e., not an ordinal score), the daily score for my forecast would be:

| Answer | Forecast | Score |
|---|---|---|
| Less than 50 | 0.15 | (0.15 - 0)² = 0.0225 |
| 50-75 | 0.45 | (0.45 - 0)² = 0.2025 |
| 76-100 | 0.25 | (0.25 - 1)² = 0.5625 |
| More than 100 | 0.15 | (0.15 - 0)² = 0.0225 |
| Daily score (sum of answer scores) | | 0.81 |
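The arithmetic in this table can be reproduced with a short Python sketch. The function name `brier_daily_score` and the list-based representation of a forecast are illustrative assumptions, not part of any particular scoring library.

```python
def brier_daily_score(forecast, correct_index):
    """Standard (non-ordinal) Brier daily score.

    forecast: probabilities per answer option, in order (assumed to sum to 1).
    correct_index: position of the answer option that turned out to be correct.
    """
    if abs(sum(forecast) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    # Each answer contributes (forecast - outcome)^2, where the outcome is
    # 1 for the correct answer and 0 for every other answer.
    return sum(
        (p - (1.0 if i == correct_index else 0.0)) ** 2
        for i, p in enumerate(forecast)
    )


# Buckets in order: Less than 50, 50-75, 76-100, More than 100
forecast = [0.15, 0.45, 0.25, 0.15]
print(round(brier_daily_score(forecast, correct_index=2), 4))  # 0.81
```

Note that this score looks at each answer in isolation, so it would be the same 0.81 if the 45% had been placed on "Less than 50" and the 15% on "50-75".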
In ordinal questions, we use a different method for calculating daily error. To start, we create successive groupings of the answer options as follows:
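| Grouping 1 | Grouping 2 |
|---|---|
| Less than 50 | 50-75, 76-100, More than 100 |
| Less than 50, 50-75 | 76-100, More than 100 |
| Less than 50, 50-75, 76-100 | More than 100 |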
For each of these groupings, sum the probabilities in that group and calculate the squared error using that sum:
(forecast_probability_sum - final_outcome)²
The final_outcome is 1 if the grouping contains the correct answer (76-100) and 0 if it does not.
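| Grouping 1 | Grouping 1 Score | Grouping 2 | Grouping 2 Score | Total Score |
|---|---|---|---|---|
| Less than 50 | (0.15 - 0)² = 0.0225 | 50-75, 76-100, More than 100 | (0.45 + 0.25 + 0.15 - 1)² = 0.0225 | 0.0225 + 0.0225 = 0.045 |
| Less than 50, 50-75 | (0.15 + 0.45 - 0)² = 0.36 | 76-100, More than 100 | (0.25 + 0.15 - 1)² = 0.36 | 0.36 + 0.36 = 0.72 |
| Less than 50, 50-75, 76-100 | (0.15 + 0.45 + 0.25 - 1)² = 0.0225 | More than 100 | (0.15 - 0)² = 0.0225 | 0.0225 + 0.0225 = 0.045 |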
Daily score (average of the 3 total score values): (0.045 + 0.72 + 0.045) / 3 = 0.27
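The grouping procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `ordinal_daily_score` is hypothetical, and the answer options are assumed to be supplied as a list ordered from lowest bucket to highest.

```python
def ordinal_daily_score(forecast, correct_index):
    """Ordinal daily score: average, over every split point, of the summed
    squared errors of the two cumulative groupings.

    forecast: probabilities per answer option, ordered lowest to highest.
    correct_index: position of the answer option that turned out to be correct.
    """
    n = len(forecast)
    if n < 2:
        raise ValueError("need at least two answer options")
    totals = []
    for split in range(1, n):
        # Grouping 1 is every bucket before the split; Grouping 2 is the rest.
        left_sum = sum(forecast[:split])
        right_sum = sum(forecast[split:])
        # A grouping's outcome is 1 if it contains the correct bucket, else 0.
        left_outcome = 1.0 if correct_index < split else 0.0
        right_outcome = 1.0 - left_outcome
        totals.append(
            (left_sum - left_outcome) ** 2 + (right_sum - right_outcome) ** 2
        )
    return sum(totals) / len(totals)


# Buckets in order: Less than 50, 50-75, 76-100, More than 100
forecast = [0.15, 0.45, 0.25, 0.15]
print(round(ordinal_daily_score(forecast, correct_index=2), 4))  # 0.27
```

As a check on the "closeness" property, swapping the first two probabilities (45% on "Less than 50", 15% on "50-75") moves weight farther from the correct bucket, and this sketch returns a worse score of 0.39 for that forecast.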
As you can see, the ordinal score penalizes my forecast much less (remember,
lower score = less error = better score) than a standard Brier score. This
better reflects the fact that I allocated most of the probability to the
correct bucket or a bucket that was "close" but not quite correct.