Block Maxima Approach

One approach to working with extreme value data is to group the data into blocks of equal length and fit a distribution to the maximum of each block, for example, annual maxima of daily precipitation amounts. The choice of block size can be critical: blocks that are too small can lead to bias, while blocks that are too large yield too few block maxima, which leads to large estimation variance (see Coles (2001) (b) Ch. 3). The block maxima approach is closely associated with the use of the GEV family. Note that extRemes always estimates all parameters by maximum likelihood estimation (MLE), which requires iterative numerical optimization techniques. See Coles (2001) (b) section 2.6 on parametric modeling for more information on this optimization method.
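As a rough illustration of the blocking step described above (outside the toolkit itself), the following Python sketch reduces a series of labeled observations to one maximum per block. The function name `block_maxima` and the sample values are hypothetical and are not part of extRemes or the Port Jervis data.

```python
def block_maxima(records):
    """Return {block_label: maximum value} for (block_label, value) pairs."""
    maxima = {}
    for block, value in records:
        # Keep the largest value seen so far in each block.
        if block not in maxima or value > maxima[block]:
            maxima[block] = value
    return maxima


# Hypothetical daily precipitation amounts tagged by year (illustrative only).
daily = [
    (1990, 3.1), (1990, 12.4), (1990, 0.0),
    (1991, 7.8), (1991, 2.2),
    (1992, 0.5), (1992, 19.6), (1992, 4.0),
]

annual_max = block_maxima(daily)
print(annual_max)
```

Each block contributes exactly one value to the GEV fit, which is why very long blocks leave few data points and inflate the estimation variance.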

Fitting data to a GEV distribution

The general procedure for fitting data to a GEV distribution with extRemes is
Example 1: Port Jervis data
This example uses the PORT dataset (see Example 2: Loading an R source Dataset) to illustrate fitting data to a GEV using extRemes. If you have not already loaded these data, please do so before trying this example. Figure 2.1 shows a time series of the annual (winter) maximum temperatures (degrees centigrade).


Figure 2.1: Time series of Port Jervis annual (winter) maximum temperature (degrees centigrade).


An R graphics window appears displaying the probability and quantile plots, a return-level plot, and a density estimate plot as shown in Figure 2.2. In the case of a perfect fit, the data would line up on the diagonal of the probability and quantile plots. Briefly, the quantile plot compares the model quantiles against the data (empirical) quantiles. A quantile plot that deviates greatly from a straight line suggests that the model assumptions may be invalid for the data plotted. The return level plot shows the return period against the return level, together with an estimated 95% confidence interval. The return level is the level (in this case temperature) that is expected to be exceeded, on average, once every m time points (in this case years). The return period is the expected waiting time until a particular return level is exceeded. For example, in Figure 2.2, one would expect the maximum winter temperature for Port Jervis to exceed about 24 degrees centigrade on average once every 100 years. Refer to Coles (2001) (b) Ch. 3 for more details about these plots.


Figure 2.2: GEV fit diagnostics for Port Jervis winter maximum temperature dataset. Quantile and return level plots are in degrees centigrade.


In the status section of the main window, several details of the fit are displayed. The maximum likelihood estimates of each of the parameters are given, along with their respective standard errors (in parentheses). In this case, the estimates are 15.14 degrees centigrade (0.39745 degrees) for the location parameter, 2.97 degrees (0.27523 degrees) for the scale parameter and -0.22 (0.0744) for the shape parameter. The negative log-likelihood for the model (172.7426) is also displayed.
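The return level read from the plot can be cross-checked against the fitted parameters. The sketch below assumes the standard GEV quantile formula from Coles (2001) Ch. 3, with the sign convention in which a negative shape gives a bounded upper tail; the function name `gev_return_level` is illustrative and is not an extRemes function.

```python
import math


def gev_return_level(mu, sigma, xi, m):
    """m-observation return level of a GEV(mu, sigma, xi) distribution.

    The level is exceeded with probability 1/m in any one block (here, year).
    """
    y = -math.log(1.0 - 1.0 / m)
    if abs(xi) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Rounded estimates quoted in the text for the Port Jervis fit.
level_100 = gev_return_level(mu=15.14, sigma=2.97, xi=-0.22, m=100)
print(round(level_100, 1))
```

With the rounded estimates above this gives a value a little below 24 degrees, in line with the return level plot.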



Note that Figure 2.2 can be re-made in the following manner.

It may be of interest to incorporate a covariate into one or more of the parameters of the GEV. For example, the dominant mode of large-scale temperature variability in the mid-latitude Northern Hemisphere is the North Atlantic Oscillation-Arctic Oscillation (NAO-AO). Such a relationship can be investigated by including one of these indices as a covariate in the GEV. See Fitting data to a GEV distribution with a covariate for an example.


Return level and shape parameter ($\xi$) $(1-\alpha)$% confidence limits

Confidence intervals may be estimated using the toolkit for either the m-year return level or the shape parameter ($\xi$) of either the GEV distribution or the GPD. The estimates are based on the profile likelihood method: the limits are the points at which the profile log-likelihood crosses a horizontal line lying a distance $c$ below its maximum, where $c$ is half the $(1-\alpha)$ quantile of a $\chi^2_1$ distribution (see Coles (2001) (b) section 2.6.5 for more information). The general procedure for estimating confidence limits for return levels and shape parameters of the GEV distribution using extRemes is as follows.
Example: Port Jervis Data Continued
From the above GEV fit for the Port Jervis data, the MLE of the 100-year return level is found to be somewhere between 20 and 25 degrees (using the return level plot), and the MLE of the shape parameter is $\hat{\xi} \approx -0.2$ (standard error about 0.07). These values can be used to find a reasonable search range for estimating the confidence limits. For the return level, one range that finds correct confidence limits (see footnote 5) is from 22 to 28, and similarly, for the shape parameter, from -0.4 to 0.1. To find confidence limits, do the following.

Estimated confidence limits should now appear in the main toolkit dialog. In this case, the estimates are about 22.42 to 27.18 degrees for the 100-year return level and about -0.35 to -0.05 for $\xi$, indicating that this parameter is significantly below zero (i.e., Weibull type). It is also possible to find limits for other return levels (besides 100-year) by changing the value in the m-year return level field. The profile likelihoods (Figure 2.3) can be plotted by checking the Plot profile likelihoods checkbutton. In this case, the estimates are good because, in both plots, the dashed vertical lines intersect the profile likelihood at the same points as the lower horizontal line.

Figure 2.3: Profile likelihood plots for the 100-year return level (degrees centigrade) and shape parameter ($\xi$) of the GEV distribution fit to the Port Jervis dataset.
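For reference, the interval that these plots depict can be written out as follows. This is a sketch in the spirit of Coles (2001) section 2.6.5; the symbols $\ell_p$ (profile log-likelihood), $\theta$ (the parameter of interest, either the return level or $\xi$) and $c_{1,1-\alpha}$ (the $(1-\alpha)$ quantile of a $\chi^2_1$ distribution) are notation assumed here, not taken from the toolkit.

```latex
\[
  C_\alpha = \left\{ \theta : \ell_p(\theta) \ge \ell_p(\hat{\theta}) - \tfrac{1}{2}\, c_{1,1-\alpha} \right\},
  \qquad
  c_{1,0.95} \approx 3.8415
  \;\Longrightarrow\;
  \tfrac{1}{2}\, c_{1,0.95} \approx 1.92 .
\]
```

For a 95% interval, then, the lower horizontal line sits about 1.92 log-likelihood units below the maximum, and the confidence limits are the values of $\theta$ at which the profile curve crosses it.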



Fitting data to a GEV distribution with a covariate

The general procedure for fitting data to a GEV distribution with a covariate is similar to that of fitting data to a GEV without a covariate, but with two additional steps. The procedure is:
Example 2: Port Jervis data with a covariate
To demonstrate the ability of the Toolkit to use covariates, we shall continue with the Port Jervis data and fit a GEV on TMX1, but with the Arctic Oscillation index, AOindex, as a covariate with a linear link to the location parameter. See Wettstein and Mearns for more information on this index.

Analyze Generalized Extreme Value (GEV) Distribution.
The status window now displays information similar to the previous example, with one important exception. Underneath the estimate for MU (now the intercept) is the estimate for the covariate trend in mu as modeled by AOindex. In this case,

$\hat{\mu}(x) = 15.25 + 1.15\,(\mathrm{AOindex})$

Figure 2.4 shows the diagnostic plots for this fit. Note that only the probability and quantile plots are displayed and that the quantile plot is in the Gumbel scale. See the appendix for more details.


Figure 2.4: GEV fit diagnostics for Port Jervis winter maximum temperature dataset with AOindex as a covariate. Both plots are generated using transformed variables and therefore the units are not readily interpretable. See the appendix for more details.
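The transformation behind these plots is left to the appendix. As a rough sketch only, assuming the usual standardization for non-stationary GEV fits described in Coles (2001) Ch. 6, each observation is mapped to a common Gumbel scale using its own fitted location. The function names below are hypothetical. Only the location line 15.25 + 1.15(AOindex) comes from the text: the scale and shape values are stand-ins borrowed from the fit without a covariate, because the covariate fit's own estimates are not quoted here.

```python
import math


def location(ao_index):
    """Fitted location as a linear function of AOindex (values from the text)."""
    return 15.25 + 1.15 * ao_index


def gumbel_scale(z, mu, sigma, xi):
    """Map an observation z to the standard Gumbel scale."""
    if abs(xi) < 1e-12:
        return (z - mu) / sigma
    arg = 1.0 + xi * (z - mu) / sigma
    if arg <= 0.0:
        raise ValueError("observation lies outside the support of the fitted GEV")
    return math.log(arg) / xi


# Stand-in scale and shape values (assumption: taken from the stationary fit).
SIGMA, XI = 2.97, -0.22

# Hypothetical (temperature, AOindex) pairs, for illustration only.
for temp, ao in [(14.0, -1.0), (17.5, 0.5), (19.0, 1.2)]:
    print(temp, ao, round(gumbel_scale(temp, location(ao), SIGMA, XI), 3))
```

Because every observation is standardized with its own location, the transformed values share a single reference distribution, which is why the plot axes no longer carry temperature units.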


A test can be performed to determine whether this model with AOindex as a covariate is an improvement over the previous fit without a covariate. Specifically, the test compares the likelihood-ratio, $2\log(l_1/l_0)$, where $l_0$ and $l_1$ are the likelihoods for each of the two models $M_0$ and $M_1$ ($M_0$ must be nested in $M_1$; see footnote 6), to a $\chi^2_\nu$ quantile, where $\nu$ is the difference in the number of estimated parameters. In this case, we have three parameters estimated for the example without a covariate and four parameters for the case with a covariate because

$\mu(x) = \mu_0 + \mu_1\,(\mathrm{AOindex})$

giving us the new parameters: $\mu_0$, $\mu_1$, $\sigma$ and $\xi$. So, for this example, $\nu = 4 - 3 = 1$. See Coles (2001) (b) section 6.2 for details on this test. Note that the model without a covariate was stored as gev.fit1 and the model with a covariate was stored as gev.fit2; each time a GEV is fit using this data object, it will be stored as gev.fitN, where N is the N-th fit performed. The general procedure is:

For this example, the likelihood-ratio is about 11.89, which is greater than the 95% quantile of the $\chi^2_1$ distribution, 3.8415, suggesting that the model with the covariate AOindex is a significant improvement over the model without a covariate. The small p-value of 0.000565 further supports this claim.

In addition to specifying the covariate for a given parameter, the user can indicate which type of link function should relate that covariate to the parameter. The two available link functions (identity and log) are selected with the radiobuttons to the right of the covariate list boxes. This example used the identity link function (note that the log link is labeled exponential in Stuart Coles' software (ismev)). For example, modeling the scale parameter ($\sigma$) with the log link and one covariate, say x, gives

$\sigma(x) = \exp(\sigma_0 + \sigma_1 x)$, or

$\log \sigma(x) = \sigma_0 + \sigma_1 x$.
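The arithmetic of the likelihood-ratio test described above can be checked by hand. The following is a minimal sketch using only the Python standard library; it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper-tail probability is erfc(sqrt(x/2)). The function names are illustrative and are not part of extRemes.

```python
import math


def likelihood_ratio(nll_nested, nll_full):
    """Likelihood-ratio statistic from two negative log-likelihoods."""
    return 2.0 * (nll_nested - nll_full)


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


lr_stat = 11.89       # likelihood-ratio reported in the text
critical = 3.8415     # 95% quantile of chi-square(1), as quoted in the text

p_value = chi2_upper_tail_1df(lr_stat)
print(lr_stat > critical, p_value)
```

The p-value comes out close to the 0.000565 reported by the toolkit; the small difference arises because 11.89 is a rounded value. This shortcut only applies when the two models differ by a single parameter.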




5 If the Lower limit or Upper limit field is left blank, extRemes will make a reasonable guess for that value. When finding limits automatically, always check the Plot profile likelihoods checkbutton and inspect the plots to verify that the confidence intervals are correct. If they do not appear to be correct (i.e., if the dashed vertical lines do not intersect the profile likelihood at about where the lower horizontal line intersects it), the resulting plot might suggest appropriate limits to input manually.

6 If the fit from M0 has more components than that from M1, extRemes will assume that M1 is nested in M0 and will compute the likelihood-ratio accordingly.

