, of the GEV(
,
,
)
distribution to vary cyclically with time by replacing
by
(t) =
+
sin(2
t/365.25) +
2cos(2
t/365.25). When allowing the
scale parameter to vary, it is important to ensure that
(t)
0, for all t. Often a link function that only yields
positive output is employed. The {\em log} link function is available for this purpose as an option with extRemes.
For example, the model
(x) =
exp(
+
x) can be employed
using the default linear representation
log
+
x
by checking the appropriate Link button. While it is also possible to allow the shape parameter to vary, it is
generally difficult to estimate this parameter with precision; so it is unrealistic to allow this parameter to vary as a
smooth function. One alternative is to allow it to vary on a larger scale (e.g., fit a different distribution for each
season) if enough data are available (see, for example, Coles (2001) (b)
section 6.1).
Transform Data --
Indicator Transformation
OK.
t/365.25) and cos(2
t/365.25), where t is
simply the obs column found in Fort$data (i.e., t = 1,...,36524). Using extRemes:
Transform Data
Trigonometric Transformation
OK.
) as a function of time. Specifically, we want to find
log (t) = 0 +
1sin(2 t/365.25) +
2cos(2 t/365.25) =
0 +
1. obs.sin365 +
2. obs.cos365.
| (7.1) |
Poisson Distribution
OK.
0
-3.72 (0.037),
1
0.22 (0.046) and
2
-0.85 (0.049). Note
also that the likelihood-ratio against the null model (Example 1 above) is about 355 with associated p-value
0, which indicates that the addition of an annual cycle is significant.
Next, we fit the GPD with the same annual cycle as a covariate in the scale parameter. That is, the
scale parameter is modeled by
log (t)= +
sin(2 t/365.25) +
2cos(2 t/365.25).
| (7.2) |
Generalized Pareto Distribution (GPD)
OK
0
-1.24 (0.053),
1
0.09 (0.048) and
2
-0.30 (0.069), and for the shape
parameter
0.18 (0.037). The negative log-likelihood value is about 73, and the
likelihood-ratio test between this fit and that of section~\ref{subsec:FitGPD} Example 2 is about 24 (associated p-value
nearly zero) indicating that inclusion of the annual cycle is significant.
mths
Fort$data[,"month"]
u.fortcollins
0.475+5*(-0.03*cos(2*pi*mths/12))
Figure 7.1 shows a plot of the Fort Collins precipitation data with both the previously used constant threshold
of 0.4 inches and the above cyclical threshold. The following R commands created the plot in
Figure 7.1.
prec
Fort$data[,"Prec"]
plot( mths, Fort$data[,"Prec"], xlab="Month", ylab="precipitation (inches)", xaxt="n")
axis(1, labels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), at=1:12)
abline( h=0.4)
lines( mths[order(mths)], u.fortcollins[order(mths)], col="blue")
Fitting data to a point process model using u.fortcollins to fit a nonconstant (seasonal) threshold gives parameter
estimates:
1.40 inches (0.043 inches),
0.53 inches (0.034 inches) and
0.16 (0.040); and
associated negative log-likelihood of about -619.64. The ideal model would be based on a nonconstant threshold, but it is
also possible to include annual cycles in the parameters; compare estimates to those found when including a seasonal cycle
in the scale parameter from section PP. Inspection of the diagnostic
plots (Figure 7.2) suggests that the model assumptions seem reasonable. For different
cycles in the threshold with higher peaks in the summer months resulted in rather poor fits suggesting that too much data
is lost, so the lower thresholds are necessary.


Decluster
OK.
Decluster
OK.
= nc/N, where
nc is the estimated number of clusters and N is the total number of
exceedances over the threshold, u. The second estimate is more complicated, but is made prior to declustering
the data, and is called the intervals estimator (Ferro and Segers (2003)). Please
see appendix for the definition of this estimate.
Other information given in the main toolkit dialog is a suggested run length based on the procedure of
Ferro and Segers (2003) of r=11, but this number should be disregarded here because we
are declustering by year. The procedure for determining the "best" run length employed with this software does not account
for covariates when declustering. It is important to decluster by year here because we do not want values from August of
one year to be clustered with values from July of the following year. If it were determined unnecessary to decluster by
year, then r=11 would still apply for declustering without taking into account the year.
Note that because this process reduces the number of data points, values below the threshold have been "filled in" so that
the declustered data will have the correct dimensions in order to be added to the original data matrix. Specifically, every
point not found to be a cluster maxima is converted to be the minimum of the data and the threshold--i.e.,
min(x,u). These filled-in values will not affect any POT analyses (using the same or higher
threshold) because they are less than the threshold, and subsequently discarded. The original positions of the cluster
maxima are preserved so that any covariates will not require further transformations. The optional use of the
Decluster by feature ensures that, in this case, values from one year will not be clustered with values from another
year.
The next step is to fit the declustered data to a GPD.
Generalized Pareto Distribution (GPD)
OK.
| |||||||||||||||||||||||||||||||||||
Table 7.1: Results of fitting data to the GPD to (negative) minimum temperature (degrees Fahrenheit) using a threshold
of 73 degrees at Phoenix Sky Harbor airport with: no declustering, runs declustering with run length r=1 (150
clusters), runs declustering with r=2 (138 clusters) and runs declustering with r=5 (115 clusters). Here, is the extremal index estimated by (nc)/n, and
is the extremal index estimated as in
Ferro and Segers (2003).
|
t =
0 +
1.(t-1948), which has a significant positive slope.
It is, therefore, of interest to also investigate incorporation of a trend into the GPD model. We will also make use here
of the Poisson-GP model (or see
appendix for more details). The fitted values for
the Poisson rate parameter model
log
(t) =
0 +
1t,
with t=1,...,43 (Year - 47) are:
0
-2.38 (0.160)
1
-0.04 (0.008).
) =
+
t, t as above, are given
by:
0
1.66 (0.161)
1
-0.02 (0.008)
-0.24 (0.069)
1, 0.95 critical value of 3.84, and the associated p-value
of about 0.040; indicating that the inclusion of a temporal trend in the scale parameter is statistically significant, if
only slightly. Of course, the apparent trend in Figure 6.3 is
relatively small, but nevertheless apparent so that it is reasonable to include such a trend in the Poisson-GP model.