4th International Conference on Integrating GIS and Environmental Modeling (GIS/EM4):
Problems, Prospects and Research Needs. Banff, Alberta, Canada, September 2 - 8, 2000.


Predictive Modeling of Ammonia Deposition from Large Numbers of Agricultural Sources


William A. Huber
W.A.S. Nijenhuis

Abstract

The deposition of ammonia by airborne emissions from agricultural products is a significant problem in the Netherlands. Predictive deposition models are crucial tools for directing governmental policy. These models have two potential drawbacks: lack of transparency in model control and long running times.

To address the first drawback we make greater use of desktop GIS facilities to streamline the user-model interface. This straightforward architectural change has enabled better user control over program execution and better monitoring of input and output data quality.

Interestingly, this change also provides opportunities to improve the efficiency of the models. One set of improvements is achieved by offering alternative algorithms. Those alternatives typically make trade-offs between execution speed and accuracy. For example, a GIS front end makes it easy to re-present spatial input data in a hybridized vector-raster format. The extreme regularity of the raster representation lends itself to tremendous speedups while the vector representation of input in key locations retains sufficient accuracy. This approach has achieved speedups of 50- to 200-fold.

We also note substantial improvement in decision making when the model can produce visible results within seconds (or just a few minutes at most) with transparent model control. Alternative decision options can then be evaluated in little time, making possible an iterative way of finding optimum policy solutions with a solid scientific basis.

Because this approach is very general, the same benefits potentially are available for almost any spatial model.

Keywords

Air emissions, agricultural ammonia sources, predictive modeling, model architecture, numerical model, optimization, ammonia, nitrification, government policy.


Introduction

The deposition of ammonia by airborne emissions from agricultural products is a significant problem in the Netherlands. Due to the presence of natural areas and intensive, widespread agricultural operations, ammonia is the main contributor to acidification and nitrification. Fast, accurate predictive modeling has become a crucial tool for directing governmental policy concerning agriculture in an environmentally beneficial fashion.

Predictive models exist and are in use (TNO 1998). The challenges before us include making these models more usable and responsive. This paper describes an approach to meeting these challenges based on adapting model accuracy and speed to decision-making needs.

The approach has two parts: implementation and policy. "Implementation" consists of a suite of alternative methods for executing a model. "Policy" consists in determining how to deploy the methods in order to meet an auxiliary objective, such as improving model accuracy, achieving rapid model turn-around time, or minimizing the use of computing resources. Policy selection is a problem in optimization.

Meaningful policy choices become possible when more than one method to compute the model results exists. Therefore our approach requires, for the implementation, acquiring or developing alternative model methods (such as approximation methods); and for the policy, solving the attendant optimization problem.

Using a GIS to carry out policy decisions creates new possibilities for model implementation. The GIS is well suited to partition spatially distributed inputs into components (either by spatial dissection or in some other way, as illustrated by the "cutoff" method described later) that are sufficiently homogeneous or sufficiently well adapted to the requirements of alternative model algorithms. This is especially true for models whose results depend linearly on at least some of their inputs, as we will illustrate.

Problem statement

Governments aim to define policy strategies that optimize both environmental and economic benefits. Quantitative instruments based on scientific models can support the establishment of optimized policy measures. These models have two potential drawbacks, however. Most current scientific ammonia dispersion/deposition models may require several hours to several days to run on a desktop workstation, because their inputs are extensive (sometimes exceeding 10,000 discrete sources) and because high-resolution information is needed over large regions. Furthermore, the standard architecture of these models is execution of a sequence of stand-alone programs controlled by parameter files. This does not allow transparent user control over program execution, does not provide effective monitoring of input and output data quality, and makes policy decisions difficult or impossible ("policy" as defined in the Introduction, not in the sense of "government policy"). These limitations inhibit any real or timely interaction between decision makers and the model, thereby reducing the model's utility in decision-making.

We note substantial improvement in decision-making when the model can produce visible results within seconds (or just a few minutes at the most) with transparent model control. Decision makers can then evaluate and understand alternative decision options in little time, making possible a practicable, iterative way of finding optimum policy solutions with a solid scientific basis.

The objective of this paper is to describe a general approach to enhancing the usability of environmental models and to illustrate this with details of its successful application to managing ammonia emissions in The Netherlands.

Background

A number of research institutes in the Netherlands use the OPS model (Operationeel Prioritaire Stoffen-model, Van Jaarsveld 1995) to estimate and predict regional ammonia depositions (TNO 1998). This model couples meteorological data with contaminant transport trajectories. Deposition is computed using a Lagrangian trajectory model with 12 fixed wind directions. Within each direction a short-term Gaussian model is applied to estimate the vertical concentration distribution. The modeling of depositional processes incorporates chemical reactions. The short-term effects are accumulated to estimate long-term average deposition over several hundred kilometers.

This level of detail in the representation of space and time causes a single OPS computation to span days of computation time on powerful workstations. Therefore TNO has developed a faster approximation to OPS, termed "OPS simulation." This method assumes that deposition depends linearly on the strengths of the emissions sources. Thus, in principle, the OPS solution for a collection of varied sources at discrete locations can be represented as a linear combination of OPS solutions developed for single unit-strength sources. A library of pre-computed OPS solutions, one per source location, would then allow OPS simulation to approximate the OPS solution for any configuration of source strengths.
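In the notation we use here (introduced for clarity; it is not taken from the OPS documentation), write s_i for the strength of source i and K_i(x) for the pre-computed unit-strength OPS solution associated with that source's location. The linearity assumption then amounts to

    D(x) ≈ s_1*K_1(x) + s_2*K_2(x) + ... + s_R*K_R(x),

where D(x) is the total deposition at location x and R is the number of sources. OPS simulation evaluates the right-hand side instead of re-running OPS.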

OPS simulation can further approximate the OPS solution by assuming that the spatial deposition pattern of one source, relative to its location, is essentially the same as the pattern of another source relative to its location. This assumption is likely valid when both sources are subject to similar meteorological conditions and when topography and land use patterns (specifically, roughness heights) around the sources are similar. This is generally true within regions of the Netherlands where meteorology, elevation, and land use tend to be similar along bands parallel to the coast and to vary primarily as a function of distance inland from the coast.

Although this similarity assumption appears crude, in practice it is unlikely to introduce major errors, because the vast majority of deposition contributed by any point source occurs within just a few hundred meters of its location. We term the deposition pattern of a unit source, expressed relative to the source location, the "emissions kernel." Emissions kernels for the Netherlands computed with OPS tend to exhibit super-exponential decrease along the radial direction. For example, regression of y = ln(deposition per unit area) against x = ln(distance from source) in one case yielded a good fit with y = 1.21 + 0.34x - 0.18x^2. A color three-dimensional representation of the original kernel was published in (Huber 1999).
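To make the rate of decrease concrete: evaluating this particular fit gives a deposition per unit area of roughly exp(1.21 + 0.34*ln(100) - 0.18*ln(100)^2) ≈ 0.35 at 100 meters from the source, falling to roughly 0.0005 at 3,000 meters, a drop of nearly three orders of magnitude and consistent with the behavior summarized in the caption of Figure 1.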

Figure 1. Quadratic approximation of ln(deposition) versus ln(distance) along a fixed direction from a single (unit) source, illustrating the rapid decrease of deposition with distance from source. The kernel is defined to a distance of 100,000 meters but attains values less than 0.001 beyond 3,000 meters.


(In some cases, especially where one wind direction prevails, the kernel maximum may be displaced slightly downwind of the emissions source. Thus the super-exponential behavior is not necessarily a good description of the kernel near the source. Note, too, that most emissions kernels are anisotropic: although they eventually exhibit rapid decrease at larger distances, their values may vary with direction from the source.)

OPS simulation is performed with a straightforward algorithm. Input consists of a list of sources. Each source is determined by its emissions kernel, its strength, and its location. Output values are computed on a grid of rectangular cells. The values (representing total deposition within each cell) are initially set to zero. The algorithm loops over the sources. The contribution of the emissions kernel at each cell is computed, multiplied by the source strength, and added to the cell's value. After all sources are processed the grid values are written to disk for further processing or visualization.
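To make the structure of this loop concrete, the following Python sketch is purely illustrative: the operational code is written in a compiled scientific language, and the kernel lookup function here stands in for interpolation into a pre-computed emissions kernel.

    import numpy as np

    def simulate_deposition(sources, kernel_value, nrows, ncols, cellsize, x0, y0):
        # sources: list of (x, y, strength, kernel_id) tuples
        # kernel_value: assumed callable (kernel_id, dx, dy) -> deposition per unit
        #   source strength, interpolated from the pre-computed emissions kernel
        grid = np.zeros((nrows, ncols))                  # total deposition per cell, initially zero
        xs = x0 + (np.arange(ncols) + 0.5) * cellsize    # cell-center x coordinates
        ys = y0 + (np.arange(nrows) + 0.5) * cellsize    # cell-center y coordinates
        for (sx, sy, strength, kid) in sources:          # loop over the emissions sources
            for i in range(nrows):
                for j in range(ncols):
                    # kernel contribution at this cell, scaled by the source strength
                    grid[i, j] += strength * kernel_value(kid, xs[j] - sx, ys[i] - sy)
        return grid                                      # written to disk by the caller

Written this way, the cost proportional to (number of cells) times (number of sources) is explicit in the triple loop; this is the cost that the alternatives described below attack.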

OPS simulation thereby reduces the computing time needed for the OPS model by pre-computing a library of emissions kernels and assembling them, as just described, into an approximate solution for any input. However, the sheer number of potential interpolation points causes even OPS simulation to consume extensive computing resources. Elementary estimates based on counting machine instructions and adjusting for floating-point processing speed agree closely with actual OPS simulation run times (Huber 1999). On a 40 by 50 kilometer grid containing 3.2 million cells, with 4,000 sources, approximately three hours were required on a dedicated PC workstation (550 MHz Pentium III processor).

The purpose of OPS simulation includes providing timely information to managers regarding the potential consequences of agricultural land use alternatives. A three-hour turn-around time is not conducive to effective, informed decision making. The principal challenge we faced was to reduce this computation to seconds while maintaining "reasonable" accuracy for comparing alternatives. A secondary challenge was to streamline the process of preparing data for input and reviewing the output.

Approach

Our work builds on existing modeling practices in two ways. First, we make greater use of desktop GIS facilities to streamline the user-model interface, thereby making user control of the model more transparent. Traditionally, dispersion/deposition models have been built from large suites of programs written in scientific languages (Fortran, C, C++). (More recently many of these have been endowed with a Windows interface.) The functions of these programs can be divided into front end, execution, and back end portions. The front end gathers information from the user such as the names of data sources and values of parameters that control program execution. The execution portion not only runs the model, but in response to parameter settings established by the front end, may also determine which algorithm to use, the extent of optimization, and how to balance the computing resources (RAM and CPU speed) immediately available to it. The back end assembles and displays the results to the user for further analysis.

GISes are especially well suited to implementing front- and back-ends, which are concerned with user interface and establishing execution "policy." GIS applications, which are often scripted in an interpreted language, can be slow, but interface support and policy determination are functions that do not need high speed or efficiency. The compiled scientific languages are best suited to running and optimizing the algorithms, where speed and efficiency are paramount.

We have, therefore, begun systematically to remove interface and policy functions from the scientific programs and to re-code these in GIS. This architectural change has enabled better user control over program execution and better monitoring of input and output data quality.

Our second contribution has been to greatly improve the algorithmic efficiency of the models, in part by capitalizing on the change in architecture. This is an interesting and unexpected side effect of using a GIS. One set of improvements is achieved by offering alternative algorithms. The GIS front end maintains a unified interface that hides the complexity of implementing many alternative algorithms. Those alternatives typically make trade-offs between execution speed and accuracy. When speed is desired, an accurate but slow algorithm (such as OPS simulation or even OPS itself) will be selected only to compute a few critical values within a deposition plume. Then, alternate algorithms are invoked to interpolate among and extrapolate beyond those critical values. This approach, implemented as described below, has achieved speedups of 50- to 200-fold compared to OPS simulation, bringing model execution times down to seconds or minutes.

Another set of improvements was obtained by changing the method of representing input data. Specifically, the GIS can rasterize the input data (emissions source locations and strengths) instead of representing them as discrete points. Depending on the resolution of the resulting grid, there may be an increase or decrease in the number of locations represented, but the extreme regularity of the gridded representation lends itself to tremendous speedups.

For example, deposition under homogeneous emission and dispersion conditions is essentially a convolution process (Press et al. 1986, Huber 1999). Let R be the number of emissions sources and let the output grid have M rows and N columns. Using fast Fourier transform (FFT) techniques changes the straightforward OPS simulation algorithm, which requires O(M*N*R) time, into an O(M*N*lg(M*N)) algorithm. Moreover, the implicit constants in these asymptotic expressions are very different: each step in the straightforward algorithm requires a conversion to polar coordinates and a logarithmic interpolation followed by an exponentiation. This is a relatively time-consuming series of operations, whereas each atomic step in the FFT is essentially a multiply-and-add. Therefore, for problems where lg(M*N) is not very much greater than R, there should be a noticeable improvement using the FFT. In our applications (R between 1 and 4,000 and lg(M*N) between 18 and 23) this has achieved speedups of up to 10,000-fold, so that computations that took hours or days now require only a few seconds (and are actually limited by the speed of data input and output operations). The modularization of the code afforded by the new architecture now lets us break any particular problem into relatively homogeneous parts (in terms of emission, dispersion, and ground cover data), rasterize the input data, apply the fast raster algorithm to each part, and reassemble the results within the GIS back end.
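For concreteness, here is a minimal sketch of the convolution step for one homogeneous part. It uses SciPy's FFT-based convolution purely as an illustration; any FFT convolution routine with zero padding would serve, and the production implementation and its data formats differ.

    from scipy.signal import fftconvolve

    def deposition_by_convolution(source_grid, kernel_grid):
        # source_grid: 2-D array of emission strengths rasterized onto the output grid
        # kernel_grid: 2-D array sampling one shared emissions kernel at the same cell
        #   size, centered on the source cell
        # fftconvolve computes a linear (zero-padded) convolution, so sources near the
        # grid edge do not wrap around; mode="same" keeps the result aligned with the grid.
        return fftconvolve(source_grid, kernel_grid, mode="same")

The GIS back end then sums the grids returned for the separate homogeneous parts.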

Methods

GIS Front End: User control and policy analysis

The front end does what you would expect: it allows the user to define the study region, select sources, define source strengths, and specify parameters for the execution of the model. These parameters are copied into a parameter file template. The template is a standard parameter file for the model, but its values have been replaced by macros. A macro is simply an easily recognized string, such as "$scale". A GIS script identifies these strings and replaces them by corresponding parameter values collected via an input dialog from the user. We also wrote a GIS script to create input dialogs from lists of {macro, description} pairs. This is a general approach that works for any model executed by a stand-alone program and controlled by a parameter file. Therefore creating front-end functionality for a new model now requires only two simple steps. First, the values in a parameter file are replaced by macros. Then, descriptions are added to a list of the macro names.
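As an indication of how little code this requires, the essential substitution step can be sketched as follows. The file names and example value are hypothetical, and our implementation is a GIS script rather than Python; only the "$scale" macro name is taken from the description above.

    def fill_parameter_template(template_path, output_path, values):
        # values: dict mapping macro strings (e.g. "$scale") to the entries the user
        #   supplied through the input dialog
        with open(template_path) as f:
            text = f.read()
        for macro, value in values.items():
            text = text.replace(macro, str(value))       # substitute each macro
        with open(output_path, "w") as f:
            f.write(text)

    # e.g. fill_parameter_template("ops.par.template", "ops.par", {"$scale": "25"})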

The front end also collects or computes "policy" information. This concerns the best choice of model algorithm to use, what level of accuracy to attain, and how to deploy computing resources (including allocation of processors, processing time, RAM, and disk usage).

For example, when the model involves a convolution calculation, the straightforward algorithm will be fastest (and most accurate) whenever there are few sources, a small number of output grid cells, or both. In other cases the FFT algorithm will be superior. Preliminary timing tests established algebraic formulas for estimating the computational resources (computing time and RAM) required for each algorithm choice. (This is not as simple as comparing R to lg(M*N) as outlined above. Edge effects must be assessed when the ranges of the emissions kernels are smaller than the extent of the output grid; allowance must be made for scaling dimensions up to a power of two in the FFT algorithm.) The front end was then programmed to select the version of the model to run in order to achieve one of three user-specified goals: highest computational accuracy, best use of RAM, or shortest computation time.
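The selection rule has roughly the following shape. This is a sketch only: the cost and RAM constants below are placeholders standing in for the coefficients fitted to our timing tests.

    import math

    def choose_algorithm(R, M, N, goal="fastest",
                         c_direct=1.0, c_fft=0.05, ram_direct=1.0, ram_fft=4.0):
        # Pad each FFT dimension up to the next power of two, as noted above.
        Mp = 2 ** math.ceil(math.log2(M))
        Np = 2 ** math.ceil(math.log2(N))
        t_direct = c_direct * M * N * R                  # O(M*N*R) straightforward cost
        t_fft = c_fft * Mp * Np * math.log2(Mp * Np)     # O(M*N*lg(M*N)) FFT cost
        if goal == "fastest":
            return "direct" if t_direct <= t_fft else "fft"
        if goal == "least_ram":
            return "direct" if ram_direct * M * N <= ram_fft * Mp * Np else "fft"
        return "direct"                                  # "most accurate": straightforward wins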

These front-end computations could just as easily be coded into the model itself. That, however, would greatly complicate the construction and testing of the model. It is simpler and more reliable to code and test a single algorithm rather than a monolithic collection of algorithms. Furthermore, performing the "policy analysis" in the front-end provides access to the GIS interface and visualization tools. The interaction with the user occurs in this "friendly" environment, where a lot of information to support policy choices can be displayed.

The front end can readily serve another purpose. As suggested by the shape of the kernel shown above, most of the contribution to deposition from a source occurs within a few hundred meters. This is a tiny fraction (about 1/1000) of the theoretical area covered by the deposition. Beyond this short range, the deposition rate tends to be two or more orders of magnitude smaller than the peak contribution. It is tempting simply to ignore such long-range deposition effects. (That can be considered an alternative, approximate model and indeed is one we have implemented.) However, the long-range contributions cannot be entirely neglected. Within one 40 by 50 kilometer region we studied there were over 4,000 discrete emissions sources. The cumulative effect of so many sources, even at long range, can be as large as or larger than the contribution of one source at short range.

However, the linearity of the model creates optimization possibilities. Consider a situation with a relatively uniform dispersion of sources of roughly equal strength. (This is a situation a GIS is readily suited to recognize and respond to.) If, instead of ignoring the long-range deposition, we adopt a rough but unbiased approximation to it, then the law of large numbers comes to our aid. The uniform dispersion of sources guarantees that the approximate contribution from a far source is sometimes too low and sometimes too high, but the cumulative estimate from many sources will be more accurate. The relative error will scale as S^(-0.5), where S is the number of far sources.
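This scaling is easy to verify with a toy calculation, sketched below under the idealized assumption that each far-source approximation is unbiased with a common error spread.

    import numpy as np

    rng = np.random.default_rng(0)

    def mean_relative_error(S, spread=0.5, trials=2000):
        true = np.ones(S)                                # equal far-field contributions
        # unbiased approximations: true value plus zero-mean noise
        approx = true + rng.normal(0.0, spread, size=(trials, S))
        # relative error of the cumulative (summed) far-field estimate
        return np.mean(np.abs(approx.sum(axis=1) - true.sum()) / true.sum())

    for S in (10, 100, 1000, 10000):
        print(S, mean_relative_error(S))                 # shrinks roughly as S^(-0.5)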

Therefore, the front end can dissect each kernel into two (or more) pieces. The first piece, the "near field" kernel, replaces the kernel values by zero beyond a short cutoff distance. The second piece, the "far field" kernel, is the residual (the difference between the original kernel and the near field kernel). The far field kernel could also be replaced by an approximate kernel, defined in a way allowing for much more rapid calculation.

For example, we have performed the original (straightforward) OPS simulation algorithm using the near field kernels. Instead of using a fixed cutoff distance, we use a fixed cutoff value. The near field kernel for a source is defined at all locations where that source's contribution to deposition exceeds the cutoff. For sources of low strength the near field kernel can be zero; for sources of high strength the near field kernel may extend for kilometers. Nevertheless, for computations over larger regions, the limited extent of the near field kernels means that many of the calculations previously required by OPS simulation at distant points need not be performed. In our 40 x 50 km example this near-field calculation takes only about 0.001 to 0.01 of the original time: about ten to 100 seconds on the 3.2 million cell grid.
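In raster terms, the dissection by cutoff value amounts to the following sketch; the kernel array, strength, and cutoff are whatever the policy analysis supplies.

    import numpy as np

    def split_kernel(kernel, strength, cutoff):
        # kernel: 2-D array of deposition per unit source strength
        # strength: emission strength of the source this kernel will serve
        # cutoff: deposition value below which contributions are deferred to the
        #   far-field approximation
        near = np.where(strength * kernel >= cutoff, kernel, 0.0)   # keep only the large values
        far = kernel - near                                         # residual, the far-field piece
        return near, far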

There are several ways to estimate the far-field contribution to the solution. One is to use a single approximate kernel (rather than a unique approximation for each emissions source). This single approximation can be, for example, a suitable average of the different kernel types. This makes the far-field calculation a single convolution problem, which also (because of its asymptotic superiority to the straightforward method) requires only a few seconds (two to 40 in our test cases). The two solutions are then summed. (This is properly the role of the GIS back end processing.) At the cost of some inaccuracy, a computation of some 10,000 seconds has thereby been reduced to about 100 seconds.

We also implemented a more accurate but slightly slower approach. This one rasterizes each far-field kernel on a coarse grid. The coarseness is under user control during policy selection. Values are interpolated onto the output grid using a rapid nearest-neighbor algorithm. Evidently, the coarseness provides a trade-off between speed and accuracy. The speed improvement is readily computed in terms of the coarseness. The loss in accuracy depends on the kernel and the cutoff value, but can be precomputed for each kernel function in terms of the first derivatives of the kernel (which are asymptotically very small).
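A sketch of this variant appears below. It reuses the hypothetical kernel lookup from the earlier sketch, evaluates the far-field kernels only at the coarse cell centers, and assumes for simplicity that the grid dimensions are multiples of the coarsening factor.

    import numpy as np

    def far_field_on_coarse_grid(sources, far_kernel_value, nrows, ncols,
                                 cellsize, x0, y0, factor):
        # far_kernel_value: assumed callable (kernel_id, dx, dy) -> far-field deposition
        #   per unit strength (zero wherever the near-field kernel already applies)
        # factor: coarsening factor chosen during policy selection
        cm, cn = nrows // factor, ncols // factor        # coarse grid dimensions
        coarse = np.zeros((cm, cn))
        cs = cellsize * factor                           # coarse cell size
        xs = x0 + (np.arange(cn) + 0.5) * cs             # coarse cell centers
        ys = y0 + (np.arange(cm) + 0.5) * cs
        for (sx, sy, strength, kid) in sources:          # same loop as before, but over
            for i in range(cm):                          # factor**2 fewer cells
                for j in range(cn):
                    coarse[i, j] += strength * far_kernel_value(kid, xs[j] - sx, ys[i] - sy)
        # nearest-neighbor interpolation back onto the fine output grid
        return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)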

Measuring the inaccuracy is a concern. There are several ways to do this. One is to incorporate accuracy estimates into the model itself, as just described, so that changes in policy lead to predictable changes in accuracy. Another technique, which is more general, is to perform computations with "typical" or "representative" inputs using both the original model (such as OPS or OPS simulation) and the approximate model, and then to compare the results. For example, with a little experimentation (which goes quickly: the lengthy computation, which serves as a reference result, needs to be done only once) we established an empirical relationship between the cutoff value (where near field switches to far field approximation) and maximum (or mean square) error. The speedups reported just above were achieved throughout large study regions with errors of about 0.00001 times the maximum deposition within the region.
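Once the reference and approximate grids are both in hand, the comparison itself is brief; a sketch of such an error summary follows, with both measures expressed relative to the maximum deposition in the region.

    import numpy as np

    def error_summary(reference, approximate):
        # reference: deposition grid from the slow, accurate run (computed once)
        # approximate: deposition grid from the fast, approximate run
        diff = approximate - reference
        scale = reference.max()                          # maximum deposition within the region
        return {"max_error": np.abs(diff).max() / scale,
                "rms_error": np.sqrt(np.mean(diff ** 2)) / scale}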

GIS Back End: Assembling results for visualization

The back end reads the model output and presents it to the user. More interestingly, if as a result of the front-end policy analysis phase the input or the model has been broken into parts, then the back end is also responsible for assembling the partial solutions into a whole.

The preceding paragraphs described how the OPS simulation model can be broken into parts by representing the emissions kernels as sums of near field and far field parts. Dissection is also possible (although we have not yet implemented this) on a regional basis. For example, the GIS front end is well suited to analyzing a land cover map and breaking the modeling process into regions of sufficiently homogeneous land cover. Such an approach could be used to run simpler and faster models (that do not need to incorporate differences in land cover) within each region.

The overriding principle here is that (a) many models can be highly optimized when certain conditions (such as linearity) hold, especially when key variables are sufficiently homogeneous in space and (b) a GIS is particularly well suited for recognizing when such conditions may hold, for performing the dissection of inputs required to achieve homogeneous conditions, for performing the appropriate model runs, and for assembling the partial results into a coherent solution.

Findings

The natural GIS-model-GIS architecture provides tremendous opportunities to optimize spatial models and thereby make them more useful for decision making. We have described in general how that can occur and we have illustrated its practicability with a complex, important model. The highlights are:

  1. A GIS is naturally suited to segregate model inputs into pieces that meet requirements for alternative algorithms.

  2. The change from a series of programs controlled by parameter files to a dialog-driven, fill-in-the-blank front end is easily accomplished.

  3. The opportunity to use different algorithms to conduct a model or to simulate its results can afford speedups of two orders of magnitude or more.

  4. We note that the improvement from three-hour run times to four-minute run times enhances our ability as scientists to inform and advise clients and managers who want to use models for decision making, rather than merely decision affirming.

Conclusion

Our change to a hybrid GIS-plus-scientific code architecture has engendered further improvements and optimizations that, collectively, have transformed what was previously a good but cumbersome scientific tool into an effective, visually oriented environmental management decision tool. The approach is a general one that should be applicable to many models. The change to a GIS-supported front- and back-end is natural and widely applied. We encourage modelers therefore to take the next step of exploring how the GIS capabilities can enhance model utility by speeding it up and making it more flexible.

Recommendations for future research

  1. Our suggestion that GIS be used to segment model inputs into domains of relatively high homogeneity appears promising but has not yet been implemented or tested.

  2. The greatest improvement we noted in the case study involved replacing extensive grid calculations by convolutions. The FFT algorithm produced an enormous speedup and little loss of accuracy. We wonder to what extent current models (air emissions models, groundwater transport models, and other models based on propagation, transport, and diffusion) could be replaced by or approximated by a small number of convolutions.

  3. The line that divides computation best suited for GIS (in the front- and back-end processing) from computation best suited for development in a scientific language is unclear. We suspect that much, if not all, of many existing models can be directly implemented in GIS, especially in systems that support true compilation of their scripting languages. This suggests that modelers move away from coding monolithic programs or sequences of programs and consider instead implementing object modules or dynamic link libraries communicating with standard (raster and vector) GIS data files. This would effectively modularize model-specific code (partial differential equation solvers, for example) and simplify the separation of policy and implementation code.

Acknowledgements

The authors wish to express their gratitude to Marc Hoogerwerf, formerly of TNO-MEP and now with the Department of Landuse Dynamics at Alterra in Wageningen, The Netherlands, for suggesting and stimulating this research as well as providing the GIS support and development.

Dr. Roel Plant of TNO-MEP reviewed a draft of this manuscript and provided many helpful suggestions for its improvement.

Thanks also to Jan Duyzer of TNO-MEP for securing the funding that made this collaboration possible.

References

Huber WA. 1999 Oct. Convolution: Part 3 of 3. Directions Magazine. <http://www.directionsmag.com/features.asp?featureid=26>. Accessed 2000 June 23.

Press WH, Flannery BP, Teukolsky SA, Vetterling WT. 1986. Numerical Recipes: The Art of Scientific Computing. New York: Cambridge University Press.

TNO. 1998. Nieuw Nationaal Model: Verslag van het onderzoek van de projectgroep Revisie Nationaal Model. Apeldoorn, The Netherlands.

Van Jaarsveld JA. 1995. Modeling the long-term atmospheric behaviour of pollutants on various spatial scales. PhD thesis (proefschrift), Utrecht, The Netherlands.


Authors

William A. Huber, Principal, Quantitative Decisions
539 Valley View Road, Merion Station, Pennsylvania, 19066 USA.
Email: [email protected], Tel: +1-610-771-0606, Fax: +1-610-771-0607.

W.A.S. Nijenhuis, TNO Institute of Environmental Sciences, Energy Research and Process Innovation
Business Park E. T. V., Laan van Westenenk 501, P. O. Box 342, Apeldoorn 7300 AH, The Netherlands.
Email: [email protected], Tel: +31-55-549-38-84, Fax: +31-55-549-32-52.