Generalized Pareto Distribution (GPD)
Generalized Pareto Distribution (GPD)
Sometimes using only block maximum can be wasteful if it ignores much of
the data. It is often more useful to look at exceedances over a given threshold instead of simply the maximum (or minimum)
of the data. extRemes provides for fitting data to GPD models as well as some tools for threshold selection. For
more information on the GPD see appendix.
Fitting Data to a GPD
The general procedure for fitting data to a GPD using extRemes is:
- Analyze
Generalized Pareto Distribution (GPD)
New window appears
- Select a data object from Data Object listbox. Covariates appear in various listboxes.
- Select a response variable from Response listbox. Selected response is removed from other listboxes.
- Enter a threshold (only values above this threshold will be fitted to the GPD)
other options
OK
- A GPD will be fitted and results will appear in the main toolkit window.
Example 1: Hurricane damage
For this example, load the extRemes dataset, damage.R and save it (in R) as damage. That is,
- File
Read Data
- Browse for damage.R in extRemes library data folder
OK
- Check the R source radiobutton.
- Type damage in the Save As (in R) field
OK
Figure 5.1 shows the scatter plot of these data from 1925 to 1995. The data are economic damage of individual
hurricanes in billions of U.S. dollars. These data correspond to the count data discussed in the
section: Fitting data to a Poisson distribution. To learn more
about these data, please see Pielke and Landsea (1998) or
Katz (2002). The time series shows that there was a particularly large
assessment of economic damage early on (in 1926) of over 70 billion dollars. After this time, assessments are much smaller
than this value.

Figure 5.1: Scatter plot of U.S. hurricane damage (in billions $ U.S.).
- Analyze
Generalized Pareto Distribution (GPD)
New window appears
- Select damage from the Data Object listbox. Covariates appear in various listboxes.
- Select Dam from Response listbox. Selected response is removed from other listboxes.
- Enter 6 in the Threshold field.
- optionally check Plot diagnostics
OK
- A GPD will be fitted and results will appear in main toolkit window.
- Note that the Number of obs per year is not relevant for this type of dataset.
Diagnostic plots for the GPD fit for these data with economic damage, Dam,
as the response variable and a threshold of 6 billion dollars are shown in
Figure 5.2. The fit looks pretty good considering the one
rather large outlier from 1926 and only 18 values over the threshold.
The histogram in Figure 5.2 appears to include all of the data,
and not just data above the threshold. However, this is simply a result of the
binning algorithm used; in this case the default Sturges algorithm. The same
histogram can be plotted, with this or a choice of two other algorithms: Scott
or Friedman-Diaconis in the following manner.
- Plot
Fit with Histogram
- Select damage from the Response listbox.
- Select gpd.fit1 from the Select a fit listbox.
- Select a breaks algorithm (here Friedman-Diaconis is selected) and click OK.
The histogram shown in Figure 5.3 used the Friedman-Diaconis
algorithm. Each choice of breaks algorithm is simply a different algorithm for
binning the data for the histogram. The histogram of
Figure 5.3 is still a little misleading in that it looks like
the lower end point is at 5 billion dollars instead of 6 billion dollars and
that it still does not appear to be a good fit to the GPD. In such a case, it
is a good idea to play with the histogram in order to make sure that this
appearance is not simply an artifact of the R function, hist, before
concluding that it is a bad fit. In fact, the histogram shown in
Figure 5.4 looks better. It is currently not possible to
produce this histogram directly from extRemes. This histogram was produced
in the following manner. From the R prompt:
max( damage$models$gpd.fit1$dat)
[1] 72.303
brks
seq(6, 72.303, ,15)
hist( damage$models$gpd.fit1, breaks=brks)
See the help file for the R function hist for more details about plotting
histograms in R. That is, from the R prompt type:
help( hist)

Figure 5.2: GPD fit for hurricane damage data using a threshold of 6 billion dollars.
For these data,
4.6 billion dollars (1.82 billion dollars) and
0.5 (0.340). The model has an associated negative log-likelihood of about 54.65.

Figure 5.3: Histogram for GPD fit for hurricane damage data using a threshold of 6 billion dollars and the
Friedman-Diaconis algorithm for bin breaks.

Figure 5.4: Histogram for GPD fit for hurricane damage data using a threshold of 6 billion dollars and a specialized
vector for the breaks. See text for more details.
Example 2: Fort Collins Precipitation Data
An example of a dataset where more information can be gathered using a threshold exceedance approach is the Fort Collins
precipitation dataset. Read in the file FtCoPrec.R from the data directory in the extRemes library and assign
it to an object called Fort--it may take a few seconds to load this relatively large dataset.
- File
Read Data
New window appears
- Browse to extRemes data directory and select FtCoPrec.R
New window appears
- Select common from the Data Type field
- Check the header checkbutton
- Enter Fort in Save As (in R) field
OK
- Data will be read in as an ``ev.data" object with the name Fort.
This dataset has precipitation data for a single location in Fort Collins, C.O., USA for the
time period 1900-1999. These data are of special interest because of a flood that occurred there
on July 28, 1997. See Katz et al. (2002) for more information on these data.
Figure 5.5 shows a scatter plot of the daily precipitation (by month) at this location.
Using extRemes:
- Plot
Scatter Plot
New window appears
- Select Fort from Data Object listbox. Covariates appear in other listboxes.
- Select month from x-axis listbox and Prec
from y-axis listbox
OK
- Plot in Figure 5.5 should appear.

Figure 5.5: Scatter plot of observed daily precipitation (inches) values by month for a Fort Collins, C.O. rain
gauge.
To fit a GPD model using the toolkit do the following.
- Analyze
Generalized Pareto Distribution (GPD)
New window appears
- Select Fort from Data Object listbox. Covariates appear in other listboxes.
- Select Prec from the Response listbox. Prec is removed from other listboxes.
- Check Plot diagnostics checkbutton.
- Enter 0.395 in the Threshold field
OK
- Note that unlike the hurricane damage dataset, the Number of obs per year field is appropriate in this case
because data are collected on a daily basis throughout the year.
The threshold of 0.395 inches is used as in Katz et al. (2002).
A plot similar to that of Figure 5.6 should appear along with summary statistics for the
GPD fit in the main toolkit window. This fit yields MLE's of
0.32 inches
(0.016 inches),
0.21 (0.038), and a negative log-likelihood of about 85. Note
that we are ignoring, for now, the annual cycle that is evident in Figure 5.5.

Figure 5.6: Diagnostic plots for the GPD fit of the Fort Collins, C.O. Precipitation data using a threshold of 0.395 in.
Figure 5.6 can be reproduced at any time in the following way.
- Plot
Fit Diagnostics
- Select Fort from the Data Object listbox.
- Select gpd.fit1 from the Select a fit listbox
OK.
Figure 5.7 shows a histogram of the data along with the model fit using the Friedman-Diaconis algorithm
for binning (see the help file for hist in R for more details).

Figure 5.7: Histogram of GPD fit to Fort Collins precipitation (inches) data using the Friedman-Diaconis algorithm
for determining the number of breakpoints.
The general procedure for plotting a histogram of a fitted GPD function using extRemes is (identical to that
of the GEV):
- Plot
Fit with Histogram
New window appears
- Select an object from the Data Object listbox
- Select the desired fit object from the Select a fit listbox.
- Select an algorithm from the Breaks Algorithm listbox and click OK
- Histogram is plotted.
Back to Top
Return level and shape parameter (
) (1-
)% confidence bounds
Confidence intervals may be estimated using the toolkit for both the return level and shape parameter (
)
of both the GEV and GP distributions. See Return level and shape parameter (
) (1-
)% confidence limits (for the GEV) for more information on how the
confidence intervals are obtained.
Example: Fort Collins precipitation data
To estimate the confidence limits for the GPD shape parameter using extRemes:
- Analyze
Parameter Confidence Intervals
GPD fit
- Select Fort from Data Object listbox.
- Select gpd.fit1 from Select a fit listbox.
- Leave the default value of 100 in the m-year return level field.
- enter 4 in the Lower limit field of the
Return Level Search Range8
and 7 in the Upper limit field.
- enter 0.1 in Lower limit field of the Shape Parameter (xi) Search Range
8
and enter 0.3 in the Upper limit field
OK
Confidence intervals (in this case 95%) are shown in the main toolkit dialog. For the 100-year return level they
are approximately (4.24, 6.82) inches and for the shape parameter about 0.12 to 0.27, consistent with the
shape parameter being greater than zero. Visual inspection of the dashed vertical lines in
Figure 5.8 act as a guide to the accuracy of the displayed confidence limits; here the estimates
shown appear to be accurate because the dashed vertical lines (for both parameters) appear to intersect the profile
likelihood in the same location as the (lower) horizontal line. Note that the confidence interval for the 100-year
return level includes 4.63 inches, the amount recorded for the high precipitation event of July 1997.

Figure 5.8: Profile log-likelihood plots for GPD 100-year return level (inches) and shape parameter
(
) for Fort Collins, C.O. precipitation data.
Back to Top
Threshold Selection
Threshold selection is an important topic, and still an area of active research. It is desired to find a threshold that
is high enough that the underlying theoretical development is valid, but low enough that there is sufficient data with
which to make an accurate fit. That is, selection of a threshold that is too low will give biased parameter estimates, but a
threshold that is too high will result in large variance of the parameter estimates. Some useful descriptive tools for
threshold selection are included with extRemes. Specifically, the mean excess, or mean residual life, plot and another
method involving the fitting of data to a GPD several times using a range of different thresholds.
Back to Top
Threshold Selection: Mean Residual Life Plot
Mean residual life plots, also referred to as mean excess plots in statistical literature,
can be plotted using extRemes. For more information on the mean residual
life plot (and threshold selection) see appendix. The general
procedure for plotting a mean residual life plot using extRemes is:
- Plot
Mean Residual Life Plot
New window appears
- Select an object from Data Object listbox. Variables appear in Select Variable
listbox. Select one.
- Choose other options
OK.
- Mean residual life plot appears.
Example: Fort Collins precipitation
Figure 5.9 shows the mean residual life plot for the Fort Collins, C.O. precipitation dataset. Interpretation of
a mean residual life plot is not always simple in practice. The idea is to find the lowest threshold where the plot is
nearly linear; taking into account the 95% confidence bounds. For the Fort Collins data, it is especially difficult to
interpret, which may be because of the annual cycle (seasonality) that is being ignored here. Nevertheless, the plot appears
roughly linear from about 0.3 to 2.5 inches and is erratic above 2.5 inches, so 0.395 inches is a plausible choice of
threshold.

Figure 5.9: Mean Residual Life Plot of Fort Collins precipitation data. Thresholds (u) vs Mean Excess precipitation
(in inches).
To plot Figure 5.9 using extRemes:
- Plot
Mean Residual Life Plot
- Select Fort from the Data Object listbox.
- Select Prec (the dependent variable) from the Select Variable listbox.
Notice that you may also change the confidence level and the number of thresholds to plot. Here, just leave them as
their defaults (95% and 100) and click on OK.
Back to Top
Threshold Selection: Fitting data to a GPD Over a Range of Thresholds
The second method for trying to find a threshold requires fitting data to the GPD distribution several times, each time
using a different threshold. The stability in the parameter estimates
can then be checked. The general procedure for fitting threshold ranges to a GPD is:
- Plot
Fit Threshold Ranges (GPD)
New window appears
- Select a data object from Data Object listbox. Variables appear in Select Variable
listbox. Select one
- Enter lower and upper limits and number of thresholds in remaining fields
OK.
- If successfull, plot will appear. Otherwise, try different ranges.
Example: Fort Collins precipitation
Figure 5.10 shows plots from having fit the GPD model for a range of 50 thresholds from 0.01 inches to 1
inch for the Fort Collins precipitation data (see Fitting Data to a GPD section for more
information on these data).
Figure 5.10 suggests that, for the GPD model, a threshold of 0.395 inches is appropriate.

Figure 5.10: GPD fits for a range of 50 thresholds from 0.01 inches to 1 inch for the Fort Collins precipitation dataset.
To create the plot from Figure 5.10 using extRemes, do the following.
- Plot
Fit Threshold Ranges (GPD)
- Select Fort from the Data Object listbox.
- Select Prec from the Select Variable listbox.
- Enter 0.01 in the Minimum Threshold field.
- Enter 1 in the Maximum Threshold field.
- Enter 30 in the Number of thresholds field
OK.
Note that different values may be tried here as well, but the program will fail for certain choices. Keep trying different
threshold ranges until it works.
8For the Fort Collins, C.O. precipitation data the MLE for the 100-year return level is near 5
inches and
0.19, so a good
search range for the confidence limits would include 5 and be wide enough to capture the actual limits. If any of the
search range fields are left blank, extRemes will try to find a reasonable search limit (for each field left blank)
automatically. It is a good idea to check the plot profile likelihoods checkbutton when searching for ranges
automatically. This way, the profile likelihoods with vertical dashed lines at estimated limits will be displayed; if
dashed lines intersect profile at lower horizontal line, then the estimate is reasonably accurate. For this example, 4 to 7
inches are used for the 100-year return level and 0.1 to 0.3 for the shape parameter.
Back to Top
Back to Table of Contents