A Conversation with Martyn Clark – On Modeling, Forecasting, Life and Everything
Contributed by Andy Wood and Maria-Helena Ramos
On October 27th, 2015, we had the chance to sit at a local brewery in Boulder, Colorado, and have a casual conversation with Martyn Clark.
Martyn is a scientist in the Hydrometeorological Applications Program (HAP) at the National Center for Atmospheric Research (NCAR) in Boulder, Colorado (see more about his research interests and career here). He agreed to answer our questions and let us publish the conversation in the HEPEX blog. The interview was edited after the fact to ‘improve readability’.
We hope you will enjoy it as much as we did.
Andy Wood: What’s the connection between hydrology models being developed and used in research versus those developed and used in applications, if any? One way to think about this might be: Are research models generally suitable for applications, and/or can application models be used in research? Are some models destined only for one sphere, while others can navigate across the gap?
Martyn Clark: Is the question about what is the actual connection or the possible connection?
AW: Let’s consider both. And where does your own model [SUMMA] fit?
MC: Right. In practice, a lot of the research models aren’t being used in applications. I don’t think that needs to be that way. A lot of the models are often being developed to enhance the understanding of hydrological processes. Those types of more process-oriented models often have data requirements and computational requirements that make them difficult to use in an applied setting.
AW: And the applications can be anything from forecasting to climate change impact assessment, so, thinking broadly …
MC: I guess that it’s a bit of a mix. The VIC model, for example, is a research model. You can view that as being a research model because it was primarily developed at universities … but it was developed for applied research.
AW: Initially climate change impact assessment…
MC: Yes. And it has been used for climate change and impact assessment and for streamflow forecasting, and also research to understand hydroclimatic variability and change. So, that’s one [research model] that’s been used fairly widely in applications.
MHR: Why were you making a distinction between actual and potential?
MC: Okay, so, let’s go back and think about what we mean by applications. Initially, when we were talking about applications, I was thinking of operational forecasting. I think few of the research models have been used for operational forecasting [for instance, the HYPE-type models that have been built out of HBV].
MC: If you view a research model as a process-oriented model that is primarily used to enhance your understanding of hydrological processes, like the PARFLOW-type models or HydroGeoSphere, things like that — have they been used for applications? To some extent, they can be used in some climate sensitivity studies, but they are not widely used for operational forecasting, for example.
AW: Is there an end of the spectrum where models will really mostly be suitable for very applied questions like flow forecasting — the simpler models — and probably have lesser utility for research? Would you say that conceptual models have much utility for research or they are really just engineering tools?
MC: But it depends on what the research question is. They can be used as benchmarks, for example to understand the information content in the forcing data, or to understand spatial variability, etc. I just wrote a paper that discusses ways to narrow the gap between hydrological theory and hydrological models. We really wanted to take a model-agnostic stance, and to argue that it’s possible to increase the relationship between theory and models for all types of models. The difference among models really relates to what processes you want to emphasize and what algorithmic simplifications you’re willing to make, and not the applicability of the theory. You tailor a model by putting computational weight into certain elements, and you’re not saying one theory is applicable, and one theory is not. The issue of model choice really relates to whether you adequately represent the dominant processes – from a complexity perspective this entails asking if the algorithmic simplifications are defensible.
AW: I wonder to what extent model developers even really think about prioritizing and weighting theory rather than just putting in things they think are important and not thinking about the [associated] theory?
MC: Right. That was kind of our point. A lot of the modelling discussions start with the algorithms, and the link between the algorithms and the theory is lost. If you think about a lot of the land-surface models, the starting point is Richards’ equation, and most land-surface model developers aren’t necessarily thinking whether Richards’ equation is the correct way to simulate the storage and transmission of water through the soil matrix. [Rather], it’s just “The Way That Things Are Done”.
AW: And your approach in SUMMA is exactly the opposite — that is, starting with theory and saying, okay, what parameterizations are available?
MC: Well, we haven’t really gone that far, and that’s part of the motivation for writing this particular paper. The start of SUMMA was to identify the commonalities among models, and to try to identify where the community agrees in terms of identifying what will be the dynamical core for the model, what are some general conservation equations, and then see how different flux parameterizations could fit into that master modeling template. Even in SUMMA, in the first versions of SUMMA, the linkage between models, algorithms and theory is pretty weak.
AW: Where does SUMMA fit in the research to applications spectrum?
MC: Well, it’s really addressed to do both. I view ‘research and applications’ as somewhat of a false dichotomy. The real question is: what question (and not necessarily a research question) is the model suited to answer? Different models can answer that question in different ways. What we have been trying to do in SUMMA is to have a unifying framework, so that models are more broadly applicable. SUMMA could be used to answer research questions, such as what are appropriate ways to parameterize different physical processes, or how can we simulate the spatial scaling characteristics that are important in different river basins in the world. Such questions focus on the tie between the physics and the algorithms. SUMMA can also be used for some applied questions, such as characterizing the uncertainty in hydrological model projections of climate change. Instead of having a small ensemble of models that are selected on ad hoc considerations, you can use SUMMA to deliberately have a broader coverage of the model hypothesis space that explicitly characterizes the hydrological modelling uncertainties.
Also, you can use SUMMA for streamflow forecasting applications. I’ve argued before that the way that streamflow forecasting is done now is fairly sensitive to stationarity assumptions. A lot of people criticize statistical forecasting as being not applicable in times of a changing climate because coefficients in the statistical models are trained on historical data. But the same is true for conceptual models because the parameters are also trained on historical data and they might not necessarily be well suited to extrapolate beyond the historical conditions. So the argument there is that process-oriented approaches to simulating hydrological processes might be really important. But in order to make those effective, you’ve really got to increase the agility of your process-based models, so that you can represent what is important in reality.
The way that we have described that in previous papers is to define modeling as a continuum, in which we are gradually increasing the process complexity in terms of the number of processes that are explicitly represented, and we are gradually increasing the spatial complexity in terms of how we represent the spatial variability of hydrological processes and the spatial connections across the landscape. So, I view complexity as a continuum. But if we talk about model complexity in terms of the two camps where the community has come to organize itself, we have the conceptual models in one camp and the physics-based models, or the process-based models, in the other camp, yet they’re not that discrete. If you continue with this extreme perspective, you can view the conceptual modelers as assuming that they know absolutely nothing about environmental physics and they infer all of their knowledge through the calibration process. You can view the process-based modelers or the physics-based modelers as assuming that they know too much, to the extent that they have limited flexibility in the model parameters and process parameterizations.
MHR: In SUMMA, can you make physically-based models and conceptual models work together?
MC: That’s what we are working on at the moment. In 2008, I published a paper in Water Resources Research on the Framework for Understanding Structural Errors, the FUSE framework, and that was based on bucket-style hydrologic models, what we call the conceptual hydrologic models. It was to say, okay, let’s define a common set of state equations and have flexibility in the way that we arrange the buckets and flexibility in how we parameterize the individual fluxes. We’re currently incorporating the FUSE concepts in SUMMA.
Waiter: Another round of beers for the table? How is the food?
MC: Well I haven’t had time to touch it.
[Sorry Martyn! What a trouper!]
AW: You have asserted publicly that forecasters cannot expect to improve their predictions unless they improve their hydrological models. Do you think the operational forecasting community has neglected model development?
MC: Is another way of asking that question: has the operational forecasting community been asking the right questions? Or, there are multiple ways to improve streamflow forecasting, and has the operational forecasting community been moving in the right direction? Have they been picking the low-hanging fruit or picking the correct fruit? So, I don’t think model development is neglected. But I think the broader question is: are research investments adequately targeted towards the important questions that will result in tangible increases in forecast skill?
AW: And how would you answer that?
MC: I think we do not know whether we are asking the right questions or not, many times. Many times the question that we ask is based on the gut feeling of the person holding the purse strings, or the research interests of the people working with them in the organization. It’s rare for me to see research investments that are based on a comprehensive predictability study.
AW: And, in this day and age, do you think it is necessary for a modeler and/or forecaster to explore a watershed in person to claim to have expertise regarding its hydrological behavior?
MC: Yes. But there’s kind of a nuance there. To understand its hydrological behavior, that’s kind of a qualified yes; to provide meaningful simulations, maybe no.
AW: Given that, what fraction of watersheds that you have modeled, have you visited?
MC: (laughs) That’s why I said a qualified yes! The ones that I have intensively modeled in order to really understand their behavior, I visited. But when I am doing simulations for hundreds of watersheds across the country, no, I haven’t visited them all.
If you really want to understand the hydrological behavior, I think it’s important to visit them. If you want to infer some aspects of the hydrological behavior from the observations that are available in order to improve the fidelity of the model simulations, then no, it is not really necessary to visit them all, and it’s impractical.
(pause for some food and drinks)
AW: As you know, HEPEX worships ensembles. Given your enthusiasm about multiple working hypotheses, would you agree that ensemble techniques are the best strategy toward understanding and representing hydrologic uncertainty?
MC: Yes. (laughs)
MC: Well, we can talk more about that if you want. For me the question is: are ensembles useful? And I think the obvious answer for that is “absolutely yes”. Now, do people misuse ensembles? And the answer for that is also “absolutely yes”. Just because you’ve got ensembles doesn’t mean that you have characterized the uncertainty properly. But then there is that [ongoing] debate about characterization of uncertainty and I’m not going to get into that here. The assessment of forecast uncertainty is too complicated to do without ensembles. You might be able to get analytical solutions for specific test cases, but those are too simple to be useful. There are so many interactions among the different components of the system that you really need to use ensembles or some type of Monte Carlo sampling of the uncertainty space in order to get anything realistic.
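To make the idea of Monte Carlo sampling of the uncertainty space concrete, here is a minimal Python sketch. It is not code from SUMMA, FUSE, or any operational system: the single-bucket `linear_reservoir` model, the function `monte_carlo_ensemble`, and every distribution and number in it are assumptions chosen only for illustration. Each ensemble member draws its own initial storage, recession parameter, and future precipitation, so the spread of the resulting flows reflects all three sources jointly.

```python
import numpy as np

def linear_reservoir(storage0, precip, k):
    """Toy single-bucket model (illustrative only): outflow = k * storage."""
    storage = storage0
    flows = []
    for p in precip:
        storage = storage + p      # add precipitation to the bucket
        q = k * storage            # outflow is a fixed fraction of storage
        storage = storage - q      # drain the bucket
        flows.append(q)
    return np.array(flows)

def monte_carlo_ensemble(n_members=500, n_steps=10, seed=0):
    """Sample initial state, parameter, and forcing jointly for each member."""
    rng = np.random.default_rng(seed)
    members = np.empty((n_members, n_steps))
    for i in range(n_members):
        # All distributions below are hypothetical, not calibrated to any basin.
        storage0 = max(rng.normal(50.0, 10.0), 0.0)  # uncertain initial storage (mm)
        k = rng.uniform(0.1, 0.3)                    # uncertain recession parameter
        precip = rng.gamma(2.0, 2.5, size=n_steps)   # uncertain future forcing (mm/step)
        members[i] = linear_reservoir(storage0, precip, k)
    return members

ens = monte_carlo_ensemble()
# Summarize the ensemble as a median and a 90% band at each time step.
p05, p50, p95 = np.percentile(ens, [5, 50, 95], axis=0)
```

Even in a toy like this, the sources interact: the recession parameter scales how strongly an error in the initial storage shows up in the flow, which is the kind of interaction that is hard to treat analytically once the model becomes realistic.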
AW: On the other hand, is there also a danger that people could have a false confidence that their ensembles comprehensively define uncertainty when really they are too restrictive?
MC: Are you talking about cases where the uncertainty is unknown or a streamflow forecasting case where it is possible to hindcast and evaluate the statistical reliability of your ensembles?
AW: I am talking more about these ensemble techniques in general where researchers feel that with, let’s say, multiple models, they are somehow defining the full probability distribution of the modelling uncertainties.
MC: That’s like someone going to the bar, ordering a Budweiser, Bud Light and Miller and pretending they have drunk [a representative sample of] beer… (laughs)
MC: Just because you’re generating ensembles it doesn’t mean that your ensembles are statistically reliable — that you have characterized all the sources of uncertainty in a meaningful way. Just because people are really, really misguided in their application of ensembles, it doesn’t mean that ensembles in themselves are useless.
You want to characterize model uncertainty, so what do you do? You pick up three models off the shelf, but it is just like throwing random shots at the dart board, and they could all… miss the dart board, actually! You need to explicitly characterize all sources of uncertainty. Think of the standard approach to ensemble prediction: people run a deterministic model up to the start of the forecasting period to estimate deterministic initial conditions and then run an ensemble of future forcings and pretend they’ve characterized the uncertainty correctly. You know, that’s ridiculous (laughs). If you are missing key sources of uncertainty, like the impact of the initial conditions on forecasts, you’ve got no business being in the ensemble business.
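The point about deterministic initial conditions can be illustrated with a small synthetic hindcast experiment. The sketch below is a toy under stated assumptions, not a description of any operational practice: the one-step bucket response `k * (storage + precip)`, the error magnitudes, and the function names `interval_coverage` and `run_hindcast_experiment` are all hypothetical. It compares how often the observed flow falls inside the 5 to 95 percent ensemble band when only the forcing is sampled versus when the initial storage error is sampled as well.

```python
import numpy as np

def interval_coverage(ensemble, observed, lower=5, upper=95):
    """Fraction of cases whose observation lies inside the ensemble band."""
    lo, hi = np.percentile(ensemble, [lower, upper], axis=1)
    return float(np.mean((observed >= lo) & (observed <= hi)))

def run_hindcast_experiment(n_cases=2000, n_members=100, seed=1):
    rng = np.random.default_rng(seed)
    k = 0.2        # assumed known recession parameter (hypothetical value)
    ic_sd = 10.0   # assumed std. dev. of the initial storage error (mm)

    # The "deterministic spin-up" gives one storage estimate per case;
    # the true storage differs from it by an unknown error.
    storage_est = rng.normal(50.0, 5.0, size=n_cases)
    storage_true = storage_est + rng.normal(0.0, ic_sd, size=n_cases)
    precip_true = rng.gamma(2.0, 2.5, size=n_cases)
    observed = k * (storage_true + precip_true)

    # Ensemble A: forcing ensemble on top of a single deterministic state.
    precip_ens = rng.gamma(2.0, 2.5, size=(n_cases, n_members))
    forcing_only = k * (storage_est[:, None] + precip_ens)

    # Ensemble B: the same forcings, but the initial state is sampled too.
    storage_ens = storage_est[:, None] + rng.normal(
        0.0, ic_sd, size=(n_cases, n_members)
    )
    full = k * (storage_ens + precip_ens)

    return interval_coverage(forcing_only, observed), interval_coverage(full, observed)

cov_forcing_only, cov_full = run_hindcast_experiment()
print(f"coverage, forcing only: {cov_forcing_only:.2f}")
print(f"coverage, forcing + initial conditions: {cov_full:.2f}")
```

A statistically reliable 90 percent band should contain the observation in roughly 90 percent of the hindcasts. In this toy setup the forcing-only ensemble falls well short of that because its spread ignores the initial-state error, while the ensemble that also samples the initial state comes much closer.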
AW: So — ensembles are inherently a powerful technique, but there’s no guarantee that anybody will apply them properly or thoughtfully. Do you think we should be moving more towards uncertainty-aware approaches, uncertainty-aware science?
MC: Yes, it should be what we should always be doing. We’re just beginning to recognize the folly of the deterministic paradigm now. We’ve been revolving around the question of what is important, and we’ve been talking about what is important from a science perspective. What we haven’t been talking about is what is important from a computational perspective — where do we want to put our computational resources? Do we put our computational resources into the model physics, or do we put them into characterizing the model uncertainty? Different communities are doing that in different ways. At the extreme, some communities are putting all their effort into physics, at the expense of being only able to run a single deterministic simulation.
MHR: Yes — because they believe that the physics can give them ‘the right answer’.
MC: But that’s just their belief: a lot of [these choices] are based on personal beliefs rather than comprehensive sensitivity analysis. We need to shift the community to accept that the path forward isn’t based on banging your fist on the table the loudest [loudly bangs fist on table, spills everyone’s beer; apologizes], [and saying] “I believe in ensembles!”, or “I believe in physics!”, or “I believe in parsimony!”.
AW: Do you think there is evidence that any one belief has more credibility than another? Maybe this idea that you need to qualify and understand uncertainties from ensembles is perhaps more compelling given the evidence than the idea that physics and resolution can give you the right answers?
MC: Well, arguably ‘hydrologic prediction science’ is in its infancy – hydrologic predictions have been made for a long time but it’s really only now that it’s beginning to emerge as a science. As we’re sharing stories, trying to understand different perspectives and how they’re all married together, there’s greater awareness of different aspects of the forecasting process. We’re beginning now to see people take a more holistic view, but historically it has been one person looking at one aspect of the problem, one at another, and it’s difficult to pull all of those together and understand their relative importance.
AW: To comment on your earlier point about why have we been so deterministic for so long, I think that initially, when we didn’t have a lot of computing power, and the models and data were rudimentary, every simulation was full of so many errors that people had to fix them in some way, and the only way to do that was to work with deterministic simulations. Because it’s not practical to fix an ensemble by hand; you can just fix a single, low-dimensional simulation so as to move forward operationally. But that day is gone — now we do have resources and methods. So, there was a very rational reason for us having deterministic forecasts and simulations thirty years ago, but the rationale for that has evolved.
AW: Coming back to hydrological modelling, as a modeler, if you have to choose between great parameters, great model structure, great parameterizations, or great forcings, which would you select?
MC: In what context? What is my objective as a modeler?
AW: I meant to include that … but Helena wouldn’t let me add more words…
MC: Draconian, she is…
(laughter… MHR glares)
AW: For implementing a model for some applications — could be forecasting, could be just simulating the watershed. Well, let’s say generically for representing the watershed with the model.
MC: Well, they are all important. It depends on the context, I’d say. The question is: in a specific environment, with a specific model configuration…[and] you left something out: the quality and availability of the evaluation data. I think the point is that all those sources of uncertainty can interact in very complex ways. You can’t say one is more important than the other. You’ve got to base it on a specific model application, figure out what is lacking in it, and then begin to put attention in that area.
Let’s say that you’ve got a perfect model structure, perfect parameters, and perfect parameterizations — so you’ve got a perfect model! But with terrible forcings, how would you know that your model is perfect? Or say you’ve got a really good parameterization, but with terrible parameters, again, how would you know that it is perfect? Or you’ve got a really good architecture, but your parameterization stinks…
MHR: Suppose…. if I were a flood forecaster, I would prefer great forcings.
AW: And just for simulating hydrology, you can’t even get started without decent forcings.
MC: That’s true.
AW: I think in a way this question may be a little broader than just a question about what is important for modeling, but it is also about a person’s perspective of what is important… We’ve been talking a little bit this week about the differences between the world view of forecasters and of modelers. Hydrologic forecasters tend to look at everything except models as being the most important component — and mainly it’s the forcings, and mainly it’s the precipitation. Modelers tend to look at the model and all its aspects as being central, critical. So, beyond this facile question about how these modeling aspects interact, the answer may come down to what a person does: are they forecasters or are they modelers?
MC: This relates back to the discussion we were having before, [and an] interpretation of the question is: is the forecasting community asking the right questions? And the response to that question was: we don’t know. For the most part, the questions that we choose to answer are based on someone’s personal preferences. They’re preaching “physics versus parsimony”, or it’s based on their personal interests, their training, their experience — all that comes into this random walk instead of charting the course forward.
MHR: Back to the different views from modelers and forecasters — when modelers look at improving their models, they always work with their meteorological observations. But the forecasters have to work with meteorological forecasts. I think that’s why for them if the model is not perfect, it is not a huge problem because meteorological forecasts dominate the uncertainty. But then could a modeler analyze the simulating capacity of a model (its parameterizations, structure, representation of physics processes) without using meteorological observations, but using forecasts as forcings? That is, to take an approach closer to the ‘real world’ of a forecaster? This might open their eyes to the relative importance of other forecast challenges besides the model.
MC: This gets back to the earlier question: what is important? I think to do that properly you’ve got to decompose the forecasting problem into its individual components, and to the extent possible, look at each component in isolation. I don’t think it is appropriate to say that hydrological modelling isn’t important because the uncertainties in hydrological modeling are being dwarfed by meteorological forecasts. In many respects, that’s a cop-out. You have abdicated your responsibilities as a scientist to just say it doesn’t really matter because the uncertainties are so large, so let’s just throw up our hands. Why bother?
AW: There’s an analog in the climate change impact assessment application, where one view has been that hydrological uncertainties are so relatively small compared to downscaling and so on that we don’t even need to calibrate the models.
MC: But those decisions are not based on comprehensive sensitivity analysis, but based on ‘gut feeling’.
AW: Right, and there’s a sort of distaste for the idea that when getting into the hydrological modeling, we’ll need to calibrate. That’s a murky area for most scientists and hydrologists even. It’s too much a mixture of empiricism.
MHR: Maybe what is important to a hydrological model is to be sensitive to variations in the forcings. Maybe the models that are tested with uncertainties in the forcings, even if it is observational uncertainty, can be more robust to face forecast uncertainties.
MC: That’s a really important point, and that’s what we have been doing in some of the work in our group: it’s to recognize the uncertainty in the forcings, and recognize the uncertainty in the evaluation data. Because the problem that we see is that people can be so over-confident in the forcings or over-confident in one part of the model, that their selection of appropriate parameter values, or their selection of appropriate model structures is biased by errors in the forcings. So, if you take an ensemble framework for parameter estimations and model selection, an ensemble framework for model evaluation, then what you’ll find is that some of the decisions you’ve made are not really appropriate because they are biased by over-confidence. We are beginning to evaluate this… but it’s a different evaluation paradigm than we’re used to.
AW: Final question: as an ‘established’ hydrologist, what do you regard as having been your best decisions in your professional life?
MC: That’s a tough question!
AW: That is, if somebody’s just getting a PhD and wants to be a researcher, what are the things that you feel you’ve done right that they should be aware of?
MC: One thing that I am absolutely sure of which would be good advice is to build strong and effective collaborations. Some of the collaborations that I’ve built in the course of my career have led to a great generation of ideas and really good papers and a lot of fun and some really good science. So, they [recent PhD graduates] definitely need to look outside their group, look for some key people in the community internationally who you’ve got an affinity with — or sometimes if you think very, very differently, it can be good as well — and think about some collaborative projects.
MC: It’s also important to take risks. Some of my papers that have got the most attention have been fairly large risks that I have taken. I am not sure if that is advice that you could impart to someone — that is, to ‘go large’, to take a big risk, but generally it is worth it in the end. I’m most happy when I’m working on something that pushes the boundaries of my existing knowledge and capabilities. Perhaps it sounds a bit strange, but if I can map out a project from the beginning to the end then I kind of view it as being rather boring. It is like I know how to do it, whereas the exciting thing to me is discovery — to try and figure out how to do something I don’t know how to do. Some of the papers that I am most proud of, I was working away for ages, and it was really unclear whether all the investment would result in something tangible, but in the end they turned out to be really important papers. But, it has been a big risk. At one point, I went for four years without getting a major first author publication. That doesn’t review well in many circumstances, but it turned out okay in the end. It is a risk that you take.
Thank you Martyn for your time and insights!
And now … who will be the next victim for the HEPEX Interview Series?