Preliminaries
    • Starting the Extremes Toolkit
    • Data
      • Loading a dataset
      • Simulating data from a GEV distribution
      • Simulating data from a GPD
      • Loading an R Dataset from the Working Directory

    Preliminaries

    Once extRemes has been installed (see http://www.isse.ucar.edu/extremevalues/evtk.html for installation instructions), the toolkit must be loaded into R each time a new R session is started. Instructions for loading extRemes into an R session are given in Starting the Extremes Toolkit. Once the toolkit is loaded, the data to be analyzed must be read into R, or simulated, as an "ev.data" object (a dataset readable by extRemes). Instructions for reading various types of data into R are given in Loading a dataset; instructions for simulating data are given in Simulating data from a GEV distribution and Simulating data from a GPD. Finally, Loading an R Dataset from the Working Directory discusses creating an "ev.data" object from within the R session. For a quick start to test the toolkit, follow the instructions in Simulating data from a GEV distribution.

    Starting the Extremes Toolkit

    It is assumed here that extRemes is already installed and merely needs to be loaded. If extRemes has not yet been installed, please refer to the extRemes web page at http://www.esig.ucar.edu/extremevalues/evtk.html for installation instructions.

    To start the Extremes Toolkit, open an R session and from the R prompt, type

    library( extRemes)

    The main extRemes dialog should now appear. If it does not appear, please see Troubleshooting. If at any time while extRemes is loaded this main dialog is closed, it can be re-opened with the following command.

    extremes.gui()


    Data

    The Extremes Toolkit allows for both reading in existing datasets (i.e., opening a file), and for the simulation of values from the generalized extreme value (GEV) and generalized Pareto (GP) distributions.

    Loading a dataset

    The general outline for reading in a dataset to the extreme value toolkit is
    • File > Read Data > New window appears.
    • Browse for file and Select > Another new window appears.
    • Enter options and assign a Save As (in R) name.
    • OK > Status message displays.
    • The data should now be loaded in R as an ev.data list object.
    There are two general types of datasets that can be read in using the toolkit. One type is referred to here as common and the other is R source. Common data can take many forms as long as any headers do not exceed one line and the rows are the observations and the columns are the variables. For example, Table 1 represents a typical common dataset; in this case data representing U.S. flood damage. See Pielke and Downton (2000) or Katz et al. (2002) for more information on these data.

    Table 1: U.S. total economic damage (in billion $) due to floods (USDMG) by hydrologic year from 1932-1997. Also gives damage per capita (DMGPC) and damage per unit wealth (LOSSPW). See Pielke and Downton (2000) for more information.

    OBS  HYEAR   USDMG    DMGPC  LOSSPW
      1   1932  0.1212   0.9708   36.73
      2   1933  0.4387   3.4934  143.26
      3   1934  0.1168   0.9242   39.04
      4   1935  1.4177  11.1411  461.27
    ...    ...     ...      ...     ...
     64   1995  5.1108  19.4504  235.34
     65   1996  5.9774  22.5410  269.62
     66   1997  8.3576  31.2275  367.34


    An R source dataset is a dataset that has been dumped from R; that is, it is written in R source code from within R itself. These files typically have a .R or .r extension. Normally, these are not the types of files that a user would need to load. However, extRemes and many other R packages include datasets of this type as examples. It is easy to tell whether a dataset is an R source file. For example, the same dataset in Table 1 would look like the following.

    "Flood"
    structure(list(OBS = c(1, 2, 3, 4,..., 64, 65, 66),
            HYEAR = c(1932, 1933, 1934, 1935, ..., 1995, 1996, 1997),
            USDMG = c(0.1212, 0.4387, 0.1168, 1.4177, ..., 5.1108, 5.9774, 8.3576),
            DMGPC = c(0.9708, 3.4934, 0.9242, 11.1411, ..., 19.4504, 22.541, 31.2275),
            LOSSPW = c(36.73, 143.26, 39.04, 461.27, ..., 235.34, 269.62, 367.34)),
            .Names = c("OBS", "HYEAR", "USDMG", "DMGPC", "LOSSPW"),
            class = "data.frame", row.names = c("1", "2", "3", "4", ..., "64", "65", "66"))

    Apart from the Flood data, all other datasets included with the toolkit are R source datasets. Data loaded by extRemes are assigned to a list object with class attribute "ev.data". A list object is a convenient way to collect and store related information in R, because it can hold different types of objects in separate components. For example, a character vector, a matrix, a function and another matrix can all be stored as components of the same list object.

    When data are first loaded into the toolkit, the list has three components: data, name and file.path. The component data is the actual data read in (or simulated); name is a character string giving the original file name, for example "Flood.dat"; and file.path is a character string giving the full path from which the data were read. When the data are fit to a particular model, say a GEV distribution, a new component called models is added to the original list object. This new component is also a list, whose components include each fit. Specifically, each GEV fit is assigned the name "gev.fit1", "gev.fit2" and so on, where the first fit is "gev.fit1", the second "gev.fit2", etc.

    Component names of a list object can be found by using the R function names, as shown in the example below. To look at a component of a list, type the list name followed by a dollar sign followed by the component name. For example, if you have a list object called George with a component called finance, you can look at this component by typing George$finance (or George[["finance"]]) at the R prompt.
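The list mechanics described above can be tried directly at the R prompt. The following is a minimal sketch in base R; the object George and its contents are made up for illustration, and the models component only imitates the layout described in the text rather than holding a real fit.

```r
# A hypothetical list with components of different types.
George <- list(
  finance = c(savings = 1200, debt = 300),   # named numeric vector
  notes   = "illustrative values only",      # character string
  scores  = matrix(1:4, nrow = 2)            # matrix
)

names(George)          # component names
George$finance         # access a component with $
George[["finance"]]    # equivalent access with [[ ]]

# A nested list component, laid out like the "models" component described above.
George$models <- list(gev.fit1 = "placeholder for a first fit")
names(George$models)
```

Both access forms return the same object; the `[[ ]]` form is convenient when the component name is stored in a variable.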

    Example 1: Loading a Common Dataset

    Here we will load the common dataset, Flood.dat, which will be located in the extRemes data directory. From the main toolkit dialog, select File Read Data. A new window appears for file browsing. Go to the extRemes data directory and select the file Flood.dat; another new window will appear that allows you to glance at the dataset (by row) and has some additional options. That is,
    • File > Read Data > New window appears.
    • Browse for file Flood.dat > Open > Another new window appears.
    Leave the Common radiobutton checked and because the columns are separated by white space, leave the delimiter field blank; sometimes datasets are delimited by other symbols like commas "," and if that were the case it would be necessary to put a comma in this field. Check the Header checkbutton because this file has a one line header. Files with headers that are longer than one line cannot be read in by the toolkit. Enter a Save As (in R) name, say Flood, and click OK. A message in the R console should display that the file was read in correctly. The steps for this example, once again, are:
    • 1. File > Read Data > New window appears.
    • 2. Browse for file Flood.dat > Open > Another new window appears.
    • 3. Check Header.
    • 4-5. Enter Flood in Save As (in R) field > OK.
    • Message appears saying that file was successfully opened.
    Each of the above steps will look something like the following on your computer screen. Note that the appearance of the toolkit will vary depending on the operating system used.

    1. File > Read Data > New window appears.


    2. Browse for file Flood.dat [1] > Open > Another new window appears.
    Note that the window appearances are system dependent. The following two screenshots show an example from a Windows operating system (OS), and the following shows a typical example from a Linux OS. If you cannot find these datasets in your extRemes data directory (likely with the newer versions of R), you can obtain them from here.




    3. Check Header
    4-5. Enter Flood in Save As (in R) field > OK.


    Message appears saying that file was successfully opened along with summary statistics for each column of the dataset. The current R workspace is then automatically saved with the newly loaded data.



    Figure 1.1 shows a time series plot of one of the variables from these data, USDMG. Although extRemes does not currently allow for time series data in the true sense (e.g., does not facilitate objects of class "ts"), such a plot can be easily created using the toolkit.



    Figure 1.1: Time series plot of total economic damage from U.S. floods (in billion $).




    The general procedure is:
    • Plot > Scatter Plot > New dialog window appears.
    • Select Flood from Data Object listbox.
    • Select line from the Point Character (pch) radiobuttons.
    • Select HYEAR from x-axis variable listbox.
    • Select USDMG from y-axis variable listbox > OK.
    • Time series is plotted in a new window (it may be necessary to minimize other windows in order to see plot).
    To see the names of the list object created, use the R function names. That is,

    names( Flood)
    [1] "data" "name" "file.path"
    To look at a specific component, say name, do the following.
    Flood$name
    [1] "Flood.dat"
    To look at the first three rows of the flood dataset, do the following.
    Flood$data[1:3,]
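For readers who want to see what such an object looks like without the dialogs, the following base R sketch builds a list with the same three components from a white-space delimited file with a one-line header. It is an illustration under assumptions, not the toolkit's internal code: the temporary file holds only the first four rows of Table 1, and the object is assembled by hand.

```r
# Write the first rows of Table 1 to a temporary file so the sketch is self-contained.
flood_file <- file.path(tempdir(), "Flood.dat")
writeLines(c(
  "OBS HYEAR USDMG DMGPC LOSSPW",
  "1 1932 0.1212 0.9708 36.73",
  "2 1933 0.4387 3.4934 143.26",
  "3 1934 0.1168 0.9242 39.04",
  "4 1935 1.4177 11.1411 461.27"
), flood_file)

# header = TRUE plays the role of the Header checkbutton;
# the default sep = "" treats any white space as the delimiter.
flood_df <- read.table(flood_file, header = TRUE)

# Assemble the three components named in the text and set the class attribute.
Flood <- list(data = flood_df,
              name = basename(flood_file),
              file.path = flood_file)
class(Flood) <- "ev.data"

names(Flood)
Flood$data[1:3, ]
```

For a comma-delimited file, `sep = ","` would correspond to entering a comma in the delimiter field.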

    Example 2: Loading an R source Dataset

    The data used in this example were provided by Linda Mearns of NCAR. The file PORTw.R consists of maximum winter temperature values for Port Jervis, N.Y. While the file contains other details of the dataset, the maximum temperatures are in the seventh column, labeled "TMX1". See Wettstein and Mearns (2002) for more information on these data.

    The first step is to read in the data. From the main window labeled "Extremes Toolkit", select
    File > Read Data


    An additional window will appear that enables browsing of the directory tree. Find the file PORTw.R, located in the data directory of the extRemes library. Highlight it and click Open (or double click PORTw.R).
    (Windows display shown here)


    Another window will appear providing various options. Because these example data are R source data, check the radiobutton for R source under File type. R source datasets do not have headers or delimiters and these options can be ignored here. For this example, enter the name PORT into the Save As (in R) field and click OK to load the dataset.
    A message is displayed that the file was successfully read along with a summary of the data. Note that if no column names are contained in the file, each column will be labeled with "V" and a numerical index (as this is the convention in both R and S).
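To make the idea of an R source dataset concrete, the sketch below writes a small data frame to a .R file with dump() and reads it back with source(). This is base R only and is meant as an illustration: the three-row data frame is a cut-down stand-in for the Flood data, and the text produced by current versions of dump() may be laid out differently from the listing shown earlier.

```r
# A small stand-in for the Flood data (first three rows of two columns).
Flood <- data.frame(HYEAR = c(1932, 1933, 1934),
                    USDMG = c(0.1212, 0.4387, 0.1168))

# dump() writes R source code that recreates the object when sourced.
r_file <- file.path(tempdir(), "Flood.R")
dump("Flood", file = r_file)
cat(readLines(r_file), sep = "\n")

# source() evaluates the file; a separate environment keeps the copy apart.
env <- new.env()
source(r_file, local = env)
restored <- get("Flood", envir = env)
all.equal(restored, Flood)
```

Because the file is ordinary R code, column names travel with the data, which is why no header or delimiter options are needed for this file type.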


    Simulating data from a GEV distribution

    A fundamental family of distributions in extreme value theory is the generalized extreme value (GEV) family. To learn more about this class of distributions, see the appendix. The general procedure for simulating data from a GEV distribution is:
    • File > Simulate Data > Generalized Extreme Value (GEV)
    • Enter options and a Save As name > Generate > Plot of simulated data appears
    • The simulated dataset will be saved as an ev.data object.
    In order to generate a dataset by sampling from a GEV, select

    File > Simulate Data > Generalized Extreme Value (GEV)

    from the main Extremes Toolkit window. The simulation window displays several options specific to the GEV. Namely, the user is able to specify the location (mu), the scale (sigma) and shape (xi) parameters. In addition, a linear trend in the location parameter may be chosen as well as the size of the sample to be generated. As discussed in Loading a dataset, it is a good idea to enter a name in the Save As field. After entering the options, click on Generate to generate and save a simulated dataset. The status section of the main window displays the parameter settings used to sample the data and a plot of the simulated data, such as in Figure 1.2, is produced.

    Figure 1.2: Plot of data simulated from a GEV distribution using all default values: μ=0, trend=0, σ=1, ξ=0.2 and sample size=50.
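The simulation step can also be sketched by hand using the inverse of the GEV distribution function. The code below is an illustration in base R, not the toolkit's own routine: rgev_sketch is a hypothetical helper name, its defaults copy the values in the caption of Figure 1.2, and the layout of the resulting list (columns obs and gev.sim) is an assumption based on the listbox entries used later in this section.

```r
# Inverse-CDF sampler for the GEV distribution (hypothetical helper).
rgev_sketch <- function(n, mu = 0, sigma = 1, xi = 0.2, trend = 0) {
  u <- runif(n)
  loc <- mu + trend * seq_len(n)            # optional linear trend in the location
  if (abs(xi) < 1e-8) {
    loc - sigma * log(-log(u))              # Gumbel limit as xi -> 0
  } else {
    loc + sigma * ((-log(u))^(-xi) - 1) / xi
  }
}

set.seed(1)                                 # for reproducibility of the sketch
gev.sim <- rgev_sketch(50)

# Assumed layout: an index column and the simulated values, stored as "ev.data".
gevsim1 <- list(data = data.frame(obs = seq_along(gev.sim), gev.sim = gev.sim))
class(gevsim1) <- "ev.data"
summary(gevsim1$data$gev.sim)
```

Calling plot(gevsim1$data$obs, gevsim1$data$gev.sim) gives a scatter plot in the spirit of Figure 1.2, although the individual values will differ.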


    For example, simulate a dataset from a GEV distribution (using all the default values) and save it as gevsim1. That is,
    • File > Simulate Data > Generalized Extreme Value (GEV)
    • Enter gevsim1 in the Save As field > Generate
    • Plot appears, message on main toolkit window displays parameter choices and an object of class "ev.data" is saved with the name gevsim1.
    Once a dataset has been successfully loaded or simulated, work may begin on its analysis. The Extremes Toolkit provides for fitting data to the GEV, Poisson and generalized Pareto (GPD) distributions as well as fitting data to the GEV indirectly by the point process (PP) approach. For the above example, fit a GEV distribution to the simulated data. Results will differ from those shown here as the data are generated randomly each time. To fit a GEV to the simulated data, do the following.
    • Analyze > Generalized Extreme Value (GEV) Distribution > New window appears
    • Select gevsim1 from the Data Object listbox.
    • Select gev.sim from the Response listbox.
    • Check the Plot diagnostics checkbutton > OK.
    A plot similar to the one in Figure 1.3 should appear. For information on these plots please see Fitting data to a GEV distribution. Briefly, the points in the top two plots should not deviate much from the straight line, and the histogram should match up with the curve. The return level plot gives an idea of the expected return level for each return period. The maximum likelihood estimates (MLEs) of the location, scale and shape parameters for the fit shown in Figure 1.3 were -0.31 (0.15), 0.9 (0.13) and 0.36 (0.15), respectively, with a negative log-likelihood value for this model of approximately 84.07. Again, these values will differ from those obtained for other simulations. Nevertheless, the location parameter, μ, should be near zero, the scale parameter, σ, near one and the shape parameter, ξ, near 0.2, as these were the parameters of the true distribution from which the data were simulated. An inspection of the standard errors of these estimates (shown in parentheses above) reveals that the location estimate is about two standard errors below zero, the scale estimate is within one standard error of one and the shape estimate is only about one standard error above 0.2, which is quite reasonable.

    Figure 1.3: Diagnostic plots for GEV fit to a simulated dataset.
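The estimates and standard errors discussed above come from maximizing the GEV likelihood. The following base R sketch shows one way to do this with optim(); it is not the routine the toolkit calls, the function name gev_nll and the starting values are choices made for this illustration, and the numbers it prints will not match those quoted for Figure 1.3 because the simulated sample is different.

```r
# Negative log-likelihood of the GEV; theta = c(mu, sigma, xi).
gev_nll <- function(theta, x) {
  mu <- theta[1]; sigma <- theta[2]; xi <- theta[3]
  if (sigma <= 0) return(1e10)               # penalize invalid scale
  s <- (x - mu) / sigma
  if (abs(xi) < 1e-6) {                      # Gumbel limit
    return(length(x) * log(sigma) + sum(s) + sum(exp(-s)))
  }
  z <- 1 + xi * s
  if (any(z <= 0)) return(1e10)              # observation outside the support
  length(x) * log(sigma) + (1 + 1 / xi) * sum(log(z)) + sum(z^(-1 / xi))
}

set.seed(1)
x <- ((-log(runif(50)))^(-0.2) - 1) / 0.2    # sample with mu = 0, sigma = 1, xi = 0.2

# Moment-based starting values (Gumbel approximation) and a small positive shape.
sigma0 <- sqrt(6 * var(x)) / pi
start <- c(mean(x) - 0.5772 * sigma0, sigma0, 0.1)

fit <- optim(start, gev_nll, x = x, hessian = TRUE)   # Nelder-Mead by default
se <- sqrt(diag(solve(fit$hessian)))                  # approximate standard errors

print(rbind(estimate = fit$par, std.error = se))
fit$value                                             # negative log-likelihood at the MLE
```

Comparing each estimate with its true value in units of the standard error reproduces the kind of check described in the paragraph above.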


    It is also possible to incorporate a linear trend in the location parameter when simulating from a GEV distribution using this toolkit. That is, it is possible to simulate a GEV distribution with a nonconstant location parameter of the form μ(t) = μ0 + μ1 t, where μ0 = 0 and μ1 is specified by the user. For example, to simulate from a GEV with μ1 = 0.3, do the following.

    • File > Simulate Data > Generalized Extreme Value (GEV)
    • Enter 0.3 in the Trend field and gevsim2 in the Save As field > Generate.
    The trend should be evident from the scatter plot. Now, first fit the GEV without a trend in the location parameter.
    • Analyze > Generalized Extreme Value (GEV) Distribution
    • Select gevsim2 from the Data Object listbox.
    • Select gev.sim from the Response listbox.
    • Check the Plot diagnostics checkbutton > OK.
    A plot similar to that of Figure 1.4 should appear. As expected, it is not an exceptional fit.

    Figure 1.4: Simulated data from GEV distribution with trend in location parameter fit to GEV distribution without a trend.


    Next fit these data to a GEV, but with a trend in the location parameter.

    • Analyze > Generalized Extreme Value (GEV) Distribution
    • Select gevsim2 from the Data Object listbox.
    • Select gev.sim from the Response listbox.
    • Select obs from the Location Parameter (mu) listbox (leave identity as link function).
    • Check the Plot diagnostics checkbutton > OK.
    Notice that only the top two diagnostic plots are plotted when incorporating a trend into the fit as in Figure 1.5. The fit appears, not surprisingly, to be much better. In this case, the MLE for the location parameter is

    0.27 + 0.297 obs

    and the associated standard errors are 0.285 and 0.01, respectively; both estimates are within one standard error of the true values (μ0 = 0 and μ1 = 0.3) that were used to simulate this dataset. Note that these values will be slightly different for different simulations, so your results will likely differ from those shown here. For this particular simulation, the estimates of the other parameters were also within one standard error of the true values.

    Figure 1.5: Simulated data from GEV distribution with trend in location parameter fit to GEV distribution with a trend.
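The two fits compared above can be imitated in base R by letting the location depend linearly on the observation index. This is a sketch under assumptions rather than the toolkit's code: the location is written as mu0 + mu1 * t with true values 0 and 0.3, the helper gev_nll_trend is a hypothetical name, and the starting values come from an ordinary least-squares line, which is a convenience and not part of the GEV model.

```r
# GEV negative log-likelihood with location mu0 + mu1 * t;
# theta = c(mu0, mu1, sigma, xi).
gev_nll_trend <- function(theta, x, t) {
  sigma <- theta[3]; xi <- theta[4]
  if (sigma <= 0 || abs(xi) < 1e-6) return(1e10)   # penalize invalid values
  z <- 1 + xi * (x - theta[1] - theta[2] * t) / sigma
  if (any(z <= 0)) return(1e10)                    # outside the support
  length(x) * log(sigma) + (1 + 1 / xi) * sum(log(z)) + sum(z^(-1 / xi))
}

set.seed(2)
obs <- 1:50
x <- 0.3 * obs + ((-log(runif(50)))^(-0.2) - 1) / 0.2   # trend 0.3, sigma = 1, xi = 0.2

# Model with trend: start from the least-squares line and residual spread.
ls_fit <- lm(x ~ obs)
start <- unname(c(coef(ls_fit), sd(resid(ls_fit)), 0.1))
fit_trend <- optim(start, gev_nll_trend, x = x, t = obs,
                   control = list(maxit = 2000))

# Model without trend: the same likelihood with the slope fixed at zero.
nll_const <- function(p) gev_nll_trend(c(p[1], 0, p[2], p[3]), x, obs)
fit_const <- optim(c(mean(x), sd(x), 0.1), nll_const)

fit_trend$par                                # mu0, mu1, sigma, xi
lr <- 2 * (fit_const$value - fit_trend$value)
lr                                           # likelihood-ratio statistic
```

The statistic lr is twice the drop in negative log-likelihood gained by adding the slope, and it is the quantity that a likelihood-ratio test compares with a chi-square quantile.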


    A more analytic method of determining the better fit is a likelihood-ratio test. Using the toolkit try the following.
    • Analyze > Likelihood-ratio test
    • Select gevsim2 from the Data Object listbox.
    • Select gev.fit1 from the Select base fit (M0) listbox.
    • Select gev.fit2 from the Select comparison fit (M1) listbox > OK.
    In the case of the data simulated here, the likelihood-ratio test overwhelmingly supports, as expected, the model incorporating a trend in the location parameter: the likelihood-ratio statistic is about 117, compared with a 0.95 quantile of the χ² distribution (one degree of freedom) of only 3.8415, and the p-value is approximately zero.
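The arithmetic behind this comparison is short enough to check at the R prompt. The sketch below takes the statistic of about 117 quoted above as given and assumes one degree of freedom, since the two models differ only by the trend coefficient.

```r
lr_stat <- 117                                  # statistic reported in the text
df <- 1                                         # one extra parameter (the trend coefficient)

crit <- qchisq(0.95, df = df)                   # 0.95 quantile of the chi-square distribution
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

c(statistic = lr_stat, critical = crit, p.value = p_value)
lr_stat > crit                                  # TRUE means the trend model is preferred
```

Using lower.tail = FALSE avoids the loss of precision that 1 - pchisq(...) would suffer for a statistic this large.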


    Simulating data from a GPD

    It is also possible to sample from a Generalized Pareto Distribution (GPD) using the toolkit. For more information on the GPD please see Fitting Data to a GPD. The general procedure for simulating from a GPD is as follows.
    • File > Simulate Data > Generalized Pareto (GP)
    • Enter options and a Save As name > Generate
    • A scatter plot of the simulated data appears, a message on the main toolkit window displays chosen parameter values and an object of class "ev.data" is created.
    Figure 1.6 shows the scatter plot for one such simulation. As an example, simulate a GP dataset in the following manner.
    • File > Simulate Data > Generalized Pareto (GP)


    • Leave the parameters on their defaults and enter gpdsim1 in the Save As field > Generate
    • A scatter plot of the simulated data appears and a message on main toolkit window displays chosen parameter values and an object of class "ev.data" is created.
    You should see a plot similar to that of Figure 1.6, but not the same because each simulation will yield different values. The next logical step would be to fit a GPD to these simulated data.

    Figure 1.6: Scatter plot of one simulation from a GPD using the default values for parameters.
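As with the GEV, a GPD sample can be drawn by inverting the distribution function. The following base R sketch is illustrative only: rgpd_sketch is a hypothetical helper, and its default scale, shape and threshold are assumed values chosen for this example, since the toolkit's defaults are not listed here.

```r
# Inverse-CDF sampler for the generalized Pareto distribution (hypothetical helper).
rgpd_sketch <- function(n, sigma = 1, xi = 0.2, threshold = 0) {
  u <- runif(n)
  if (abs(xi) < 1e-8) {
    threshold - sigma * log(u)                  # exponential limit as xi -> 0
  } else {
    threshold + sigma * (u^(-xi) - 1) / xi
  }
}

set.seed(3)
gpd.sim <- rgpd_sketch(50)

# Assumed layout: an index column and the simulated values, stored as "ev.data".
gpdsim1 <- list(data = data.frame(obs = seq_along(gpd.sim), gpd.sim = gpd.sim))
class(gpdsim1) <- "ev.data"
summary(gpdsim1$data$gpd.sim)
```

All simulated values lie above the threshold, and a nonzero threshold simply shifts the whole sample by that constant.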


    To fit a GPD to these data, do the following.
    • Analyze > Generalized Pareto Distribution (GPD)
    • Select gpdsim1 from the Data Object listbox.
    • Select gpd.sim from the Response listbox.
    • Check the Plot diagnostics checkbutton.
    • Enter 0 (zero) in the Threshold field > OK.
    Plots similar to those in Figure 1.7 should appear, but again, results will vary for each simulated set of data. Results from one simulation had the following MLEs for the scale and shape parameters (with standard errors in parentheses): 1.14 (0.252) and 0.035 (0.170). As with the GEV example, these values should be close to the default values chosen for the simulation. In this case, the scale estimate is well within one standard error of the true value and the shape estimate is nearly one standard error below its true value. Note that we used the default selection of a threshold of zero. It is possible to use a different threshold by entering it in the Threshold field. The result is the same as adding a constant (the threshold) to the simulated data.

    Figure 1.7: Diagnostic plots from fitting one simulation from the GP distribution to the GP distribution.
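A GPD fit of this kind can be sketched in base R by maximizing the likelihood of the excesses over the threshold. This is an illustration, not the toolkit's fitting routine: gpd_nll is a hypothetical helper, the simulated sample uses assumed true values of scale 1 and shape 0.2, and the printed estimates will not match those quoted above.

```r
# Negative log-likelihood of the GPD for excesses y; theta = c(sigma, xi).
gpd_nll <- function(theta, y) {
  sigma <- theta[1]; xi <- theta[2]
  if (sigma <= 0) return(1e10)                  # penalize invalid scale
  if (abs(xi) < 1e-6) {                         # exponential limit
    return(length(y) * log(sigma) + sum(y) / sigma)
  }
  z <- 1 + xi * y / sigma
  if (any(z <= 0)) return(1e10)                 # excess outside the support
  length(y) * log(sigma) + (1 + 1 / xi) * sum(log(z))
}

set.seed(3)
gpd.sim <- (runif(50)^(-0.2) - 1) / 0.2         # assumed truth: sigma = 1, xi = 0.2

threshold <- 0
y <- gpd.sim[gpd.sim > threshold] - threshold   # excesses over the threshold

start <- c(mean(y), 0.1)
fit <- optim(start, gpd_nll, y = y, hessian = TRUE)   # Nelder-Mead by default
se <- sqrt(diag(solve(fit$hessian)))                  # approximate standard errors

print(rbind(estimate = fit$par, std.error = se))
```

Only the excesses enter the likelihood, which is why raising the threshold and shifting the data by the same constant leaves the fit unchanged.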



    Loading an R Dataset from the Working Directory

    Occasionally, it may be of interest to load a dataset either created in the R session working directory or brought in from an R package. For example, the internal toolkit functions are primarily those of the R package ismev, which consist of Stuart Coles' functions (see Coles (2001) (b)) and example datasets. It may, therefore, be of interest to use the toolkit to analyze these datasets. Although these data could be read using the toolkit and browsing to the ismev data directory as described in Loading a dataset, this section gives an alternative method. Other times, data may need to be manipulated in a more advanced manner than extRemes will allow, but subsequently used with extRemes. An extRemes data object must be a list object with at least a component called data, which must be a matrix or data frame; the columns of which must be named. Additionally, the object must be assigned the class, "ev.data".

    Example: Loading the Wooster temperature dataset from ismev package

    From the R session window.

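    data( wooster)                                  # wooster is supplied by the ismev package
    Wooster <- list( data=wooster)                  # the list needs a component called data
    Wooster$data <- matrix( Wooster$data, ncol=1)   # data must be a matrix or data frame
    colnames( Wooster$data) <- "Temperature"        # its columns must be named
    class( Wooster) <- "ev.data"                    # assign the class expected by extRemes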
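The same requirements can be wrapped in a small helper that checks them as the object is built. The sketch below uses base R only; as_ev_data is a hypothetical function that is not part of extRemes or ismev, and the temperature values are made up so that the example runs without either package.

```r
# Build an "ev.data" object from a vector, matrix or data frame (hypothetical helper).
as_ev_data <- function(x, column_names = NULL) {
  dat <- if (is.data.frame(x)) x else as.matrix(x)   # a vector becomes a one-column matrix
  if (!is.null(column_names)) colnames(dat) <- column_names
  if (is.null(colnames(dat))) stop("columns of 'data' must be named")
  structure(list(data = dat), class = "ev.data")
}

temps <- c(23, 29, 19, 14, 27)                       # made-up values for illustration
Temps <- as_ev_data(temps, "Temperature")

class(Temps)
colnames(Temps$data)
dim(Temps$data)
```

Stopping early when the columns are unnamed catches the most common way a hand-built object can fail to meet the requirements listed above.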













    [1] Note: there is also an R source file in this directory called Flood.R.
