As an operational forecaster, should I concern myself with typologies of uncertainty?
Contributed by Florian Pappenberger, Jan Verkade and Fredrik Wetterhall
You don’t need to know about uncertainty typology, but you do need to know what is in your uncertainty estimate and, more importantly, what is not.
Lots of talk!
The term “uncertainty” has become part of the standard vocabulary of many hydrologists. The analysis of uncertainty in hydrology is a thriving sub-discipline. The more a discipline advances, the greater the need to use or invent new jargon that allows for more precise definitions – new jargon is, of course, also a way to signal that progress has been made. Part of the new jargon is often a classification system, and there are now a large number of such classifications for uncertainty. The most widely used definitions (currently riding a tide of popularity in hydrology) are aleatoric uncertainty and epistemic uncertainty. In this post, we will give the basics of these definitions, argue that you don’t need to remember any of this blog post, and postulate what you do need to know.
Aleatoric? Epistemic?
Let us give you some background. Aleatoric uncertainty refers to the unknowns that differ each time we run the same experiment. As such, it is akin to the random elements affecting the outcomes of the experiment. Think of the throw of a die, of the chaotic nature of the atmosphere and of the measurement error in water level gauges. Epistemic uncertainty relates to things we could, but do not, know. Think of the sparseness of rain gauges, of the constantly changing river geometry which we only record periodically, and of the processes purposely omitted from forecasting models that need to run quickly.
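To make the distinction tangible, here is a minimal sketch in Python (all numbers are invented for illustration, not taken from any real gauge): the aleatoric scatter of individual readings does not shrink however many readings we take, whereas our epistemic uncertainty about the underlying water level does shrink as we invest in more measurements.

```python
import numpy as np

rng = np.random.default_rng(42)

true_level = 2.50         # hypothetical "true" water level in metres
gauge_noise_sd = 0.05     # aleatoric: random measurement error of the gauge

for n in (5, 50, 500):    # increasing measurement effort
    readings = true_level + rng.normal(0.0, gauge_noise_sd, size=n)

    # Aleatoric spread: scatter of individual readings; it does not shrink
    # however many readings we take.
    aleatoric_sd = readings.std(ddof=1)

    # Epistemic spread: our uncertainty about the underlying level; it shrinks
    # roughly as 1/sqrt(n) as we invest in more measurements.
    epistemic_sd = aleatoric_sd / np.sqrt(n)

    print(f"n={n:4d}   aleatoric sd ~ {aleatoric_sd:.3f} m   "
          f"epistemic sd ~ {epistemic_sd:.3f} m")
```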
By the way, the words aleatoric and epistemic stem from the classical languages (Latin and Greek, respectively), and if you cannot remember which uncertainty is meant by which term, then you are not alone (a free chocolate bar will be offered to the person who comes up with the best memory hook in the comments below!).
Both aleatoric and epistemic uncertainties are present in all components of an operational hydrologic forecasting chain. Money and effort can reduce the part of the error that stems from epistemic uncertainty, but most of that work has to be done ‘offline’, i.e., in system and model design. For all intents and purposes, in the real-time operation of a forecasting system we treat all uncertainties, regardless of their nature, as aleatoric, i.e., as random.
To put it differently: by treating everything as random, we expect future hydrologic conditions to fall within the range of our uncertainty estimates. Epistemic uncertainty, in contrast, could lead to big surprises: events that are simply not included in our uncertainty estimate. Think of the river that is suddenly blocked by a landslide.
Should I care?
So much for the theory. The question is, how much should one care about these different classifications of uncertainty in an operational forecasting chain?
The answer depends on whether you’re designing or operating a forecasting system.
If you’re designing a probabilistic forecasting system, it’s probably helpful to know which uncertainties can be reduced and which cannot. You’d probably also want to have some idea of how much uncertainty can be reduced by investing in additional knowledge, additional measurements and so on: should I invest in a new satellite, in additional rain gauges, in data assimilation, or in additional research into stem leaf hydrology?
If, on the contrary, you’re operating a forecasting system, you’d need to know what is in the uncertainty estimates and what is not. How are these estimates produced? Atmospheric ensembles only? Hydrologic uncertainties too? Similar to the world of hydrological modelling, you’d need to know what is in your predictive distribution (model) and what is not. This guides the interpretation of the estimate of the total range of uncertainty and the communication thereof to forecast users.
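As an illustration of that question, the following Python sketch (with invented numbers and a deliberately crude runoff relation, not any real forecasting system) contrasts a predictive range derived from an atmospheric ensemble alone with one in which a hydrologic error term has been added: the two can tell the forecaster rather different stories about the total range of uncertainty.

```python
import numpy as np

rng = np.random.default_rng(3)

# A hypothetical 50-member precipitation ensemble for one catchment (mm/day).
precip_ens = rng.gamma(shape=2.0, scale=5.0, size=50)

def toy_runoff(precip_mm, runoff_coeff=0.4):
    """Deliberately crude rainfall-to-flow relation, for illustration only."""
    return runoff_coeff * precip_mm

# 1) Atmospheric ensemble only: the spread reflects meteorological uncertainty.
flow_met_only = toy_runoff(precip_ens)

# 2) Hydrologic uncertainty added: each member is "dressed" with a multiplicative
#    error factor standing in for model structure, rating curve, initial states.
hydro_factor = rng.lognormal(mean=0.0, sigma=0.3, size=flow_met_only.size)
flow_full = flow_met_only * hydro_factor

for label, ens in (("atmospheric ensemble only", flow_met_only),
                   ("plus hydrologic dressing", flow_full)):
    lo, hi = np.percentile(ens, [5, 95])
    print(f"{label:28s}: 90% range {lo:5.1f} to {hi:5.1f} (arbitrary flow units)")
```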
Does the typology of uncertainties matter in decision making?
It does and it doesn’t. Often, the decision maker will simply be interested in the estimate of predictive uncertainty that is based on our current state of knowledge. For example, the blue light emergency services will want to know the range of future water levels. They are less interested in what would happen if a bridge collapsed and blocked the river altogether. Having said that, it doesn’t harm to communicate the fact that some processes that could affect future conditions are not actually included in the estimate they’re given.
Does the knowledge about the typology of uncertainty really change current practice? Is it not just the same meat with different gravy?
May 6, 2014 at 11:48
Nice post. Being a forecaster, I prefer the typology proposed by Krzysztofowicz:
– Input uncertainty of measurements and the meteorological forecast – it includes mostly aleatoric uncertainty (e.g. the chaotic atmosphere, water level measurement error) and epistemic uncertainty (network sparseness, rating curve change due to changes in channel geometry, etc.),
– Hydrologic uncertainty of models and methods – it includes mostly epistemic uncertainty (network sparseness, rating curve change due to changes in channel geometry, model structure and parameters, etc.), but some features (like the estimation of initial conditions and thus the chaotic nature of the runoff response) might be considered aleatoric from my point of view,
– Operational uncertainty of external drivers that cannot be quantified – it may cover missing data, landslides blocking the river, ice barriers, levee breaks, but also anthropogenic factors like reservoir operation based on human decisions and the effect of the hydrologist-forecaster.
The last item is a very interesting one. The trend is to use more and more automatic forecasting systems, but I prefer to have a forecaster who actively operates the model and intervenes in the computation run. Why? Here is an example: in 2003 there was a winter flood in Southern Bohemia with two comparable peaks within three days. The first peak was heavily underpredicted (really a bad forecast). You can deal with that through data assimilation and an hourly update of the forecast, or you can think about the reason for the underprediction from a hydrological perspective. Before the second peak, a hydrologist identified the problem as frozen ground (a very specific course of winter that year) and changed the model parameter defining impervious ground. The result was a perfect forecast of the second peak.
There are many other situations where human intervention is crucial – missing data, dam breaks, but also the interpretation of the forecast output (as mentioned in the article).
If I could choose between “better data”, “better model” and “better forecaster”, I would choose the forecaster (BTW, what is your choice?). But I am aware that my choice introduces a very specific source of aleatoric (operational) uncertainty :-)
May 7, 2014 at 21:36
Better forecaster, assuming that she can add skill to the automatically produced forecast. That’s a big assumption though, and one that has received some attention on these pages as well as elsewhere.
May 7, 2014 at 23:14
Like your classification 🙂
I would select better data, but it all depends a bit on the purpose of the forecasting system. I believe that there is a post from one of our HEPEX columnists coming up on the topic.
More importantly, did you fix the issue so next time the soil freezes correctly?
May 6, 2014 at 13:32
If you are interested in a longer discussion of the topic, have a look at Beven and Young (2013), which has a subsection on this subject (although written more from a modelling point of view than a forecasting one) – there is a nice table of examples for epistemic and aleatoric uncertainty.
Beven, K., and P. Young (2013), A guide to good practice in modeling semantics for authors and referees, Water Resour. Res., 49, 5092–5098, doi:10.1002/wrcr.20393. http://onlinelibrary.wiley.com/doi/10.1002/wrcr.20393/abstract
May 7, 2014 at 17:58
Of course the difference here is quite important. If I make decisions based on your forecasts and you tell me that there is uncertainty because of the constantly changing river geometry and this will always be the case, then that is something I have to accept.
If you tell me that there is uncertainty because you’ve omitted processes to make the model run quickly, I might be on your side when you are campaigning for a faster supercomputer!
May 7, 2014 at 21:23
In both cases, one might be interested in the return on investment: what is the value of the improved forecasts after an investment in reducing the uncertainty?
May 7, 2014 at 23:08
Is that more a system design issue (so it falls under “you should know”)?
May 7, 2014 at 20:30
The word “aleatoric” comes from the Latin word “alea”, which actually means “die” (dice). “Alea iacta est” (The die is cast) were Julius Caesar’s immortal words when he, with his army, crossed the river Rubicon in northern Italy in 49 BC.
“Epistemic” comes from Greek epistēmē, meaning “knowledge” or “understanding”. In English (and in my language) an “epistle” is an elegant and formal letter sent to a person or group of people. This helps me not to confuse ”aleatoric” and “epistemic”.
Having come so far I wonder if a figure I showed at my seminars and lectures at the Met Office some years ago encapsulates the ”aleatoric” and “epistemic” aspects.
1) A numerical weather forecast is presented in a grid point which actually represents an area around the grid point.
2) It differs from the Truth which is an average of an unlimited number of observations within that grid box.
3) In reality we are lucky if there is at least one observation within the gridbox.
The “aleatoric” uncertainties are
a) the errors in the forecasts of the weather systems and
b) the errors due to sub-grid small scale turbulence,
The “epistemic” uncertainties are
a) the systematic model deficiencies we are gradually coming to terms with and
b) the uncertainty because of just one observation in the box (at best) to represent the true average.
Pardon my Greek (or Latin) if I am wrong.
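To make the example above concrete, here is a toy Python sketch (all numbers invented) in which the “Truth” is the average over many points within a grid box, a single gauge provides our only observation, and the forecast of the box average carries both a systematic (epistemic) and a random (aleatoric) error. Verifying the forecast against the single gauge mixes the forecast error with the representativeness error of that gauge.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative numbers only: one grid box sampled at many points.
n_points = 10_000
point_rain = rng.gamma(shape=2.0, scale=1.5, size=n_points)   # rainfall (mm)

truth = point_rain.mean()        # the "Truth": the average over the whole box
single_obs = point_rain[0]       # what we usually have: one gauge, at best

# A hypothetical grid-point forecast of the box average:
systematic_bias = -0.5                  # epistemic: model deficiency we could fix
random_error = rng.normal(0.0, 0.8)     # aleatoric: error of the weather systems
forecast = truth + systematic_bias + random_error

print(f"true box average : {truth:5.2f} mm")
print(f"single gauge     : {single_obs:5.2f} mm  "
      f"(representativeness error {single_obs - truth:+.2f} mm)")
print(f"forecast         : {forecast:5.2f} mm  "
      f"(forecast error vs truth {forecast - truth:+.2f} mm)")
```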
May 7, 2014 at 21:39
Definitely a contender for the chocolate, but I’m not sure I follow your Met Office lecture example.
May 7, 2014 at 23:22
Image for Anders’ comment
May 8, 2014 at 14:11
In order to bring my illustration closer to my original comment, and to avoid it drifting further away with every new comment from somebody else, I make use of Florian’s new option and put it here as a “comment to myself”.
May 8, 2014 at 08:17
The upper-left sentence in my figure (“What we should do”) refers to a previous figure where the central part is omitted, reflecting what we normally are forced to do: compare NWP grid-box averages with point observations (if there are any!).
May 8, 2014 at 12:11
I would argue that the problem is not really how to remember which term stands for which type of uncertainty but rather how to grasp the concept hidden behind the term.
The mental image of rolling a die (“alea iacta est”) clearly sets the scene for the concept of aleatoric, but when does this ever apply in reality? To give an everyday example: yesterday I tried to measure the length of a curtain with a too-short ruler. The outcome was different every time, depending on many factors of which I am perfectly aware. Is this now aleatoric or not?
On the other hand, a permanently changing cross-section might well be assumed to have a perfectly random effect on discharge estimates (at certain time scales), so why is that not aleatoric? The point I am trying to make is that the distinction might be of very little use in practice, since the difference between aleatoric and epistemic is a question of scale (how deep do I dig into what creates the uncertainty?).
May 8, 2014 at 15:19
Mmmm, maybe you have a point.
We meteorologists discuss in terms of “systematic” and “non-systematic” errors. The former, what you hydrologists tend to call “epistemic”, can be remedied by e.g. improving the physics scheme in the models and/or the observational network (e.g. putting rain gauges at representative heights) thus minimizing the systematic “epistemic” uncertainties of the amount of rain actually falling in the region.
Non-systematic errors, due to uncertainties in the speed and intensities of arriving weather systems and sub-grid variability could also, if the world is deterministic, be regarded as “epistemic” since they will in due course, after huge investments in research and computers, be reduced to zero. But if the world is not deterministic there will always remain a random “aleatoric” uncertainty.
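A small Python sketch of this meteorological view, with invented numbers: a simple bias correction (standing in here for improved model physics or better gauge siting) removes the systematic part of the error, while the non-systematic spread remains, to be treated as random unless further investment reduces it.

```python
import numpy as np

rng = np.random.default_rng(7)

n_cases = 1000
obs = rng.normal(10.0, 3.0, size=n_cases)              # verifying observations
systematic_error = 1.2                                 # "systematic" / epistemic part
non_systematic = rng.normal(0.0, 2.0, size=n_cases)    # part treated as random

forecast = obs + systematic_error + non_systematic

# A simple bias correction, estimated from past forecast-observation pairs,
# removes the systematic part but leaves the random spread untouched.
bias = np.mean(forecast - obs)
corrected = forecast - bias

for label, f in (("raw forecast", forecast), ("bias-corrected", corrected)):
    err = f - obs
    print(f"{label:15s}: mean error {err.mean():+.2f}, error sd {err.std():.2f}")
```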
May 22, 2014 at 11:30
Before addressing the question that you posed (‘As an operational forecaster, should I concern myself with typologies of uncertainty?’) I would ask: ‘As an operational forecaster, should I concern myself with having an estimate of the forecast uncertainty?’ In other words, do we all agree that without an uncertainty estimate we can only produce suboptimal, unreliable forecasts?
Only God would not need it! He would perfectly know the laws that govern the state of the earth system, and would perfectly know its initial state. Not only “God does not play dice with the universe”, as Einstein said, but “God does not need uncertainty estimations”!
We need uncertainty estimates because we cannot make perfect forecasts: we do not know the true equations (we cannot avoid model uncertainties), and we do not know the initial state of the system. Once we all agree that we must have an uncertainty estimate, identifying the sources of uncertainty can help improve our forecasts. This process is not trivial, since it can be practically impossible to disentangle the sources given that the ‘truth’ is unknown; we can only estimate it by blending model first guesses and observations. Thus, does it matter whether uncertainties are epistemic or aleatory? More importantly, how can we progress in our field? As a modeller, I would aim to improve my knowledge of the processes and design reliable PROBABILISTIC forecasting systems that include all sources of uncertainty, so as to provide forecasters with forecasts that include an uncertainty estimate. As a forecaster, I would require modellers to provide me with a probabilistic forecast. Together, we should learn how to take decisions with uncertain information, using probabilistic rather than categorical forecasts.
May 24, 2014 at 21:40
Are you saying the distinction is irrelevant? I would argue that for end users, that would indeed be the case. For modelers, however, wouldn’t it be worthwhile to identify what can and what cannot be known – and make sure that these processes are modeled accordingly?