Modeling Spatio-Temporal Precipitation using Hidden Markov Models

Project Participants

Introduction

Predicting and modeling rainfall is an important problem in atmospheric sciences and agriculture. It is often addressed using statistical learning methods, since global circulation and climate change models are too coarse and inaccurate to capture the properties of precipitation at a specific location. We consider the problem of modeling precipitation occurrence for a network of rain stations. Ideally, the model should capture several properties of the data, e.g., spatial dependencies between pairs of rain stations, the run-length distributions of wet and dry spells, and interannual variability in the number of rainy days per season. What makes the problem difficult is the variety of aspects of the data that must be modeled.

Example

Predicting seasonal rainfall in the Northeast region of Brazil is of great interest to atmospheric scientists, in particular at IRI. One of their goals is to model rainfall occurrences during the February-March-April (FMA) season for the state of Ceará (Figure 1). The data for the region consist of rainfall records from 10 rain-gauge stations for the period beginning in 1975. Once the years with a significant number of missing observations are discarded, we are left with data for 10 rain stations over 24 years, with 90 binary (rain/no rain) daily observations per station per year.
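To make the shape of the data concrete, here is a minimal sketch of how such records could be laid out and summarized. The dimensions (24 years, 90 days, 10 stations) come from the description above; the data are randomly generated placeholders rather than the Ceará records, and every function name is a hypothetical chosen for illustration.

```python
import random

N_YEARS, N_DAYS, N_STATIONS = 24, 90, 10  # dimensions quoted in the text


def make_synthetic_data(p_wet=0.4, seed=0):
    """Placeholder data: data[year][day][station] is 1 (rain) or 0 (no rain)."""
    rng = random.Random(seed)
    return [[[int(rng.random() < p_wet) for _ in range(N_STATIONS)]
             for _ in range(N_DAYS)]
            for _ in range(N_YEARS)]


def wet_days_per_season(data, station):
    """Number of rainy days in each year for one station."""
    return [sum(day[station] for day in year) for year in data]


def spell_lengths(sequence, value):
    """Lengths of maximal runs of `value` (1 = wet, 0 = dry) in a 0/1 sequence."""
    runs, current = [], 0
    for x in sequence:
        if x == value:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # close a run that reaches the end of the season
        runs.append(current)
    return runs


data = make_synthetic_data()
first_season = [day[0] for day in data[0]]  # station 0, first year
wet_spells = spell_lengths(first_season, 1)
dry_spells = spell_lengths(first_season, 0)
```

Statistics of this kind (seasonal wet-day counts and spell-length histograms) are the sort of data properties a fitted model would be checked against.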

Figure 1: Rainfall station locations with topographic contours (meters). Circle size denotes the February-April climatological daily rainfall probability (%) 1975-2002. The stations are: (1) Acopiara (317 m), (2) Aracoiaba (107 m), (3) Barbalha (405 m), (4) Boa Viagem (276 m), (5) Camocim (5 m), (6) Campos Sales (551 m), (7) Caninde (15 m), (8) Crateus (275 m), (9) Guaraciaba Do Norte (902 m), and (10) Ibiapina (878 m). One degree of longitude/latitude corresponds to about 110 km at the equator.

Methodology

Our approach is to model daily precipitation for the network conditioned on a small number of "weather" states. The states are not explicitly known and are treated as random variables. A sequence of precipitation occurrences is modeled as a hidden Markov model (HMM): the hidden weather states follow a first-order Markov chain, and the observations for different days are independent given the corresponding weather states (Figure 2). Precipitation occurrences at the individual stations on a given day are further assumed to be independent conditioned on that day's weather state.

Graphical model of HMM with conditionally independent output components

Figure 2: Graphical model of a hidden Markov model. States S₁,...,S_T correspond to latent weather states while output vectors R₁,...,R_T are daily precipitation occurrences for the network.
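The structure in Figure 2 can be illustrated with a short sketch of the likelihood computation. The code below evaluates the log-likelihood of one season using the standard scaled forward recursion, with each station's occurrence treated as an independent Bernoulli variable given the state. It is an illustration under these assumptions, not the MVNHMM Toolbox implementation; the function names and parameter layout are hypothetical.

```python
import math


def emission_prob(obs, p_rain_k):
    """P(R_t = obs | S_t = k), with stations independent given the state."""
    prob = 1.0
    for r, p in zip(obs, p_rain_k):
        prob *= p if r == 1 else 1.0 - p
    return prob


def log_likelihood(seq, init, trans, p_rain):
    """Scaled forward recursion for one observation sequence.

    seq    : list of T binary vectors, one entry per station
    init   : init[k] = P(S_1 = k)
    trans  : trans[j][k] = P(S_t = k | S_{t-1} = j)
    p_rain : p_rain[k][m] = P(station m is wet | S_t = k)
    """
    K = len(init)
    alpha = [init[k] * emission_prob(seq[0], p_rain[k]) for k in range(K)]
    loglik = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through the Markov chain, then weight by the emission.
            alpha = [
                sum(alpha[j] * trans[j][k] for j in range(K))
                * emission_prob(seq[t], p_rain[k])
                for k in range(K)
            ]
        scale = sum(alpha)          # P(R_t | R_1, ..., R_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik
```

For example, `log_likelihood([[1, 0], [0, 0]], [0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], [[0.1, 0.2], [0.9, 0.8]])` scores a two-day record from two stations under a two-state model with made-up parameters. In practice the parameters would be estimated from the data, typically with the EM (Baum-Welch) algorithm, which is not shown here.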

While this model can capture some global properties of the data, it cannot capture interannual variability due to outside atmospheric factors. For example, an HMM cannot predict whether a season from a test data set is going to be rainier than average, since the model has no mechanism to distinguish one unseen sequence from another. Without a way to use information other than historical precipitation, the model cannot be used for prediction.

Atmospheric scientists often use general circulation models (GCMs) to extrapolate the future physical state of the atmosphere. GCMs can produce, with reasonable accuracy, values for sea-surface temperatures, sea-surface pressure, wind vectors, precipitation, and other atmospheric variables on a grid of typically 2.5°×2.5° at daily (or sometimes even finer) time intervals. While these predictions are not accurate enough to predict precipitation for a particular location directly, they can be used as additional input vectors to improve the descriptive power of HMMs and to distinguish unseen data. To incorporate atmospheric variables into the HMM, we make the transition matrix representing the probability distribution P(S_t | S_{t-1}) depend on the corresponding value of the atmospheric variable (Figure 3).

Graphical model of a non-homogeneous HMM

Figure 3: Graphical model of a non-homogeneous hidden Markov model. States S₁,...,S_T correspond to latent weather states while output vectors R₁,...,R_T are daily precipitation occurrences for the network; X₁,...,X_T are vectors of atmospheric variables.
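One common way to make the transition matrix depend on the input vector is a multinomial logistic (softmax) parametrization. The text above does not specify the functional form, so the sketch below is an assumption: the parameter names `sigma` and `rho` and the function name are illustrative only. Each row of the resulting matrix gives P(S_t = k | S_{t-1} = j, X_t = x).

```python
import math


def transition_matrix(x, sigma, rho):
    """Row-stochastic matrix with entries P(S_t = k | S_{t-1} = j, X_t = x).

    sigma[j][k] : baseline score for moving from state j to state k
    rho[k]      : weight vector coupling state k to the input vector x
    (assumed softmax form; not taken from the text)
    """
    K = len(sigma)
    matrix = []
    for j in range(K):
        scores = [sigma[j][k] + sum(w * xi for w, xi in zip(rho[k], x))
                  for k in range(K)]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


# Hypothetical two-state example with a one-dimensional input.
sigma = [[0.0, 0.0], [0.0, 0.0]]
rho = [[0.0], [1.0]]            # larger x favors state 1
neutral = transition_matrix([0.0], sigma, rho)
shifted = transition_matrix([2.0], sigma, rho)
```

With all entries of `rho` set to zero the matrix no longer depends on `x`, and the model reduces to the homogeneous HMM of Figure 2. In a likelihood computation, this matrix would be recomputed for each day from that day's input vector.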

Software

MVNHMM Toolbox

Results to date

We have used this framework to train models and analyze their predictive power on the hold-out set for the Northeast Brazil region. The results are described in detail in the related paper.

Papers

S. Kirshner. Modeling of multivariate time series using hidden Markov models, Ph.D. thesis. [PDF]
S. Kirshner, P. Smyth, A.W. Robertson. Conditional Chow-Liu tree structures for modeling discrete-valued vector time series, Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI-2004), July 2004. [PDF] New method of modeling discrete-valued multivariate time series with application to modeling of multi-site precipitation occurrences.
A.W. Robertson, S. Kirshner, P. Smyth. Downscaling of daily rainfall occurrence over Northeast Brazil using a hidden Markov model, Journal of Climate, 17(22):4407-4424, November 2004. [PDF (from Allen Press)] (or a TR version) Results of modeling precipitation occurrences for the Northeast region of Brazil.
J.P. Hughes, P. Guttorp, S.P. Charles. A non-homogeneous hidden Markov model for precipitation occurrence, Journal of the Royal Statistical Society, Series C (Applied Statistics), 48(1):15-30, 1999. [PDF (from Blackwell Publishing)] Earlier paper describing NHMMs and how they can be applied to modeling precipitation occurrence for a network of stations.
J.P. Hughes, P. Guttorp. Incorporating spatial dependence and atmospheric data in a model of precipitation, Journal of Applied Meteorology, 33(12):1503-1515, December 1994. [PDF (from Allen Press)] First paper describing NHMMs and their application to precipitation occurrence.

Collaborator and Funding

This is joint work with Andrew Robertson at the International Research Institute (IRI) for Climate Prediction at Columbia University, and it is supported by the Department of Energy.

Related Web Pages of Interest

Information and Computer Science
University of California, Irvine CA 92697-3425

Last modified: December 21, 2003