HEPS challenges the wisdom of the crowds

Contributed by Massimiliano Zappa and Kaethi Liechti

The wisdom of the crowd is “the process of taking into account the collective opinion of a group of individuals rather than a single expert to answer a question.

A large group’s aggregated answers to questions involving quantity estimation, general world knowledge, and spatial reasoning has generally been found to be as good as, and often better than, the answer given by any of the individuals within the group.” (Source: Wikipedia).

Concept

The concept of the wisdom of the crowds was first established by Francis Galton, who published his findings about an ox and the crowd in Nature (Galton F. 1907. Vox populi. Nature 75: 450-451):

Once upon a time, in fall 1906 to be precise, 85-year-old Francis Galton went to see the livestock show in Plymouth (southern England). For entertainment there was a weight-judging competition where people had to estimate the weight of an ox and write down their estimate on a card. Of course there was a prize for the one who got closest to the actual weight of the ox. Francis, however, was more interested in statistics than in oxen. So he went home to analyse the 800 collected cards. To his surprise, the crowd was pretty good in its estimate. The median of all the estimates was 1207 pounds, whereas the true weight of the ox was 1198 pounds. So the error of the crowd was below 0.8 %.
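
For readers who like to check the arithmetic, here is a small sketch in Python that uses only the two figures quoted above. The variable names are ours, chosen for illustration.

```python
# Figures quoted in the text above (Galton, 1907).
median_estimate_lb = 1207  # median of the crowd's estimates, in pounds
true_weight_lb = 1198      # weight of the ox, in pounds

# Error of the crowd, in pounds and relative to the true weight.
absolute_error_lb = median_estimate_lb - true_weight_lb
relative_error_pct = 100 * absolute_error_lb / true_weight_lb

print(f"absolute error: {absolute_error_lb} lb")
print(f"relative error: {relative_error_pct:.2f} %")
```

The crowd was off by 9 pounds, which is about 0.75 % of the true weight.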

His paper was recently revisited by K. Wallis in a study where the author states that Galton’s forecasting competition was a precursor of two developments in statistical forecasting: ‘forecast combination’ and ‘two-piece distribution for symmetry description’.

Another nice example of the wisdom of the crowds is the audience assistance in the “Who Wants to Be a Millionaire?” show.

The questions come from all domains and levels of difficulty, and the poll is drawn from a random group of people who happen to spend a weekday afternoon in a TV studio. Even if only a few people know the right answer, it will stick out, because the answers of those who have no idea will spread roughly evenly over all the options. In about 91 % of cases the audience gets the right answer. Although not always:

[Figure: wisdom-crowds]
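
The “sticking out” effect can be illustrated with a toy simulation. This is only a sketch under our own assumptions: 200 people in the studio, four options, 20 % of the audience knowing the answer, and everyone else guessing at random. None of these numbers come from the show, and `poll_audience` is a made-up helper.

```python
import random
from collections import Counter

def poll_audience(n_people, share_who_know, correct="B", options="ABCD", rng=None):
    """Simulate one poll: informed people pick the correct option,
    everyone else guesses uniformly at random."""
    rng = rng or random.Random()
    votes = Counter({option: 0 for option in options})
    for _ in range(n_people):
        if rng.random() < share_who_know:
            votes[correct] += 1
        else:
            votes[rng.choice(options)] += 1
    return votes

rng = random.Random(42)  # fixed seed so the run is repeatable
votes = poll_audience(n_people=200, share_who_know=0.2, rng=rng)
winner = max(votes, key=votes.get)
print(dict(votes), "->", winner)
```

The random guesses spread over all four options, so the informed minority is enough to make the correct option the clear favourite.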

In hydrological ensemble prediction

The same concept can be translated into the world of hydrological ensemble prediction: an ensemble forecast can be seen as a crowd of members. Does a crowd of ensemble members have more wisdom than a single-valued forecast?

The Peak Box Game

The Peak-Box Game was played at the session “Ensemble hydro-meteorological forecasting” during the EGU Assembly in Vienna, on May 1st 2014 (see abstract here) and at the 10-year HEPEX workshop in Maryland last June.

It was conceived to offer an opportunity to play with the “Peak Box” approach for supporting interpretation and verification of operational ensemble peak-flow forecasts, proposed by Zappa and colleagues, and to discuss the use of ensemble predictions in operational hydrology.

The Peak-Box defines the “best estimate” of a flood event’s timing and magnitude by framing the discharge peaks of all members of an ensemble forecast and taking their median in timing and magnitude.
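
As a rough illustration of this definition, the sketch below finds the peak of each member, frames all peaks by their earliest/latest time and lowest/highest discharge, and takes the medians as the “best estimate”. It is a minimal reading of the sentence above, not the full method of Zappa et al. (2013); the function names and the 5-member ensemble are invented for the example.

```python
from statistics import median

def peak_of(member):
    """Return (time_index, discharge) of the highest value in one member's series."""
    time_index = max(range(len(member)), key=member.__getitem__)
    return time_index, member[time_index]

def peak_box(ensemble):
    """Frame the member peaks and take their median in timing and magnitude."""
    peaks = [peak_of(member) for member in ensemble]
    times = [t for t, _ in peaks]
    flows = [q for _, q in peaks]
    return {
        "time_range": (min(times), max(times)),   # earliest and latest peak
        "flow_range": (min(flows), max(flows)),   # lowest and highest peak
        "best_estimate": (median(times), median(flows)),
    }

# Hypothetical 5-member ensemble: discharge at six consecutive time steps (made-up numbers).
ensemble = [
    [10, 30, 80, 60, 40, 20],
    [10, 20, 50, 90, 70, 30],
    [10, 25, 60, 100, 50, 25],
    [10, 40, 70, 55, 35, 20],
    [10, 15, 40, 85, 110, 60],
]
box = peak_box(ensemble)
print(box)
```

For this toy ensemble the peaks fall between time steps 2 and 4 and between 70 and 110 discharge units, and the median peak sits at time step 3 with a discharge of 90.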

The game is simple: looking at the evolution in time of an ensemble prediction of streamflows with N members, one has to ‘guess’ how big the observed peak discharge will be and at what time it will occur. With the help of a worksheet, every person writes down the coordinates of her/his estimated peak, following a pre-established coordinate system:

[Figure: The Peak-Box Game]

At the end, the observed flows are given as well as the estimates made by the Peak-Box approach (Zappa et al., 2013). Everybody then compares their guesses with the location of the observed peak and the Peak-Box estimate.

Peak Box Game – The results

The following four figures show the results for the four forecast days that were played at the HEPEX workshop, together with the results from other applications of the game (at EGU, AWEL (end user), and ETH). The observed highest discharge is also indicated (red circle). The intersections of the blue rectangles are used to capture the Peak Box center.

[Figure: Peak-Box-Results-1]

Below, we show the coordinates of:

  • the observed maximum peak (Observation),
  • the median of the experts’ guess (Median out of 162 Experts), and
  • the Peak Box Center (Median of HEPS with 16 members).

[Table: Peak-Box-Table]

The yellow columns give the total error (Manhattan distance) of the experts’ guess and the Peak Box guess (against the observed coordinates). We can see that the errors of the Peak Box guess are lower than or equal to the errors of the experts’ median guess on all forecast days. The most important lesson, however, is that the collective guess of the crowds of experts and HEPS members is very good compared to most “deterministic” guesses of single experts and HEPS members. So, in short: trust in your ensembles (or hire 200 experts…)!
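
For anyone who wants to score their own worksheet, the total error can be sketched as follows. The coordinates below are made up for illustration and are not the values from the table; we simply assume that x (timing) and y (magnitude) are both read off the worksheet grid, so that adding the two differences makes sense.

```python
def manhattan_error(guess, observed):
    """Sum of the absolute differences in timing (x) and magnitude (y)."""
    return abs(guess[0] - observed[0]) + abs(guess[1] - observed[1])

# Made-up worksheet coordinates (x = timing, y = magnitude).
observed = (30, 12)
expert_median = (27, 14)
peak_box_center = (29, 13)

expert_error = manhattan_error(expert_median, observed)
peak_box_error = manhattan_error(peak_box_center, observed)
print(expert_error, peak_box_error)
```

In this invented case the expert median is off by 3 in timing and 2 in magnitude (total error 5), while the Peak Box center is off by 1 in each (total error 2).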

Conclusion

This is only one example, but it shows that the Peak Box estimate can be useful as additional information extracted from the ensemble prediction to help in the analysis of forecast events. It is also an interesting tool to train forecasters and better assess the value of ensemble forecasts in flood prediction.

If you are interested in using the Peak Box game, it is available for download on the Resources page of the HEPEX website.

You can check the paper where it is explained (here) and contact us (here) for any information and feedback.

3 comments

  1. I am not quite sure I understand this latest text, who are the experts X and Y? But more important:

    1. What can we learn from 162 “experts” who trust EPS forecasts presented to them? It would have been more illuminating to see an investigation where these experts were given additional information such as later deterministic forecasts and/or observations which were either made after the EPS was run or were not possible to include in the data assimilation, to see if this knowledge improved their forecast service.

    2. I was introduced to this idea in the late 1960’s, when it was called the “Delphi method”. http://en.wikipedia.org/wiki/Delphi_method. It did not impress me. At the time there was a commercial for the Renault Dauphine: “50 million Frenchmen cannot be wrong”, inspired by an American 1920’s musical. See http://en.wikipedia.org/wiki/Fifty_Million_Frenchmen and
    http://exmormon.org/phorum/read.php?2,1231601,1231601 Statistically it would have been better to say “Opinions of 50 million randomly selected humans from around the world, with different and uncorrelated backgrounds, prefer Renault Dauphine”. And the same applies, I think, to forecasting.

    3. It has been known, or should have been known, in the meteorological community since the 1950’s (a paper by Phil Thompson, 1957) that forecast combination yields better forecast service both deterministically and probabilistically. Today’s weather forecasters nevertheless try to pick “the model of the day”. I had actually intended to devote my next column to this topic: that the ultimate test of our service is the quality of the decisions made from it, not whether we can improve a particular ZXW-score by 4%.

    1. Dear Anders,
      We’re sorry that the table was not clear enough. We hope we can clarify this here:
      The X-Coordinate represents the timing and the Y-Coordinate represents the magnitude of the peak-flow forecast. So:
      Column “Expert X” contains the median in X (timing) of 162 expert guesses.
      Column “Expert Y” contains the median in Y (peak runoff) of 162 expert guesses.

      To your other comments:
      The aim of this game was to present the Peak-Box and its value as an instrument to support the interpretation of ensemble forecasts.
      What the Peak-Box does is mark the most probable estimate of the expected peak-flow based on the information of a 16-member HEPS. The very same task was given to the participants of the game (experts).
      We could show that the Peak-Box estimate for the peak-flow was most of the time better than the individual estimates from the experts. However, if we take all the estimates from the experts together we can come up with a median expert guess which is almost as good as the Peak-Box guess.
      While creating the game we were actually considering giving ensemble forecasts WITH the Peak-Box to one half of the participants and ensemble forecasts WITHOUT the Peak-Box to the other half. But for the sake of simplicity we decided to hand out only the ensemble forecasts without the Peak-Box.
      Your suggested setting would certainly be interesting and also more profound. The restriction we had with this game was that it should be possible to introduce and play it in an EGU session in about 20-30 minutes, so we decided to keep it simple.

      Concerning the “Renault Dauphine” example we partly agree with your opinion. Our “experts” had different backgrounds, but their backgrounds were somewhat correlated. At the HEPEX Meeting in the US we had THE EXPERTS on the topic, who were there voluntarily. We argue they did not need any additional information to complete the game. The EGU crowd was also quite correlated: among dozens of sessions, they decided to spend a couple of hours on advances in HEPS. The results of our end-users (AWEL) also originate from a training unit we had with them in April. All of them are used to the kind of forecasts we presented in the game. The ETH students were the only real lay persons involved in the game so far. They had never heard about HEPS before, and I doubt they will ever have to do with HEPS afterwards, since their curriculum is forestry.

  2. On first read I was really worried that the crowd was guessing “water” as the name of earth’s atmosphere. But following the link with the image suggests those two graphics came from different places and aren’t related to each other. My faith in humanity is now restored.
