## NR 512 – Spatial Statistical Modeling

## Exercise #5 – Modeling Point Patterns in R

## Kernel Density Estimation and Marked Point Patterns

As discussed in class, a point pattern dataset contains a complete enumeration of events (i.e., objects of interest) occurring in a defined study region. These events could represent anything with a measurable location including trees, soil samples, animal nests, flower locations, crime occurrences, etc. **In addition to locational information, each event could have an associated continuous (e.g., tree DBH, soil Ph) or categorical measurement (e.g., Tree species, soil type). The measurements are called marks and the events with marks are called a marked point pattern.**

Marked point patterns have ** first-order properties**, which are related to the intensity (i.e., density) of events and associated marks across the study region, and

**, which are related to the spatial dependence (i.e., spatial arrangement) of the events and associated marks across the study area.**

*second-order properties*We may be interested in answering several types of questions with marked point pattern analysis. For example: Is the pattern in the occurrences of one type of event (i.e., categorical mark) related to that in the occurrences of another type of event? Does the distribution of one type of event explain another?

When analyzing marked point patterns there are more options available (and more potential traps) as compared to an un-marked point process. In general, there are three different types of null hypotheses that can be tested for marked point patterns:

** Independence of components**: The events of a particular mark are independent. In other words, marks (M) are first generated according to some random mechanism, and then they are placed at certain event (s) locations by point processes (

**Note:**this does not need to be a random point process – there is no reason why two clustered processes are not independent in terms of marks). Marks and events are essentially generated simultaneously, and each event is just as likely to have a certain mark.

** Random labeling: **Given the event locations, the marks are conditionally independent and identically distributed. In other words, events (s) locations are first generated and placed according to a spatial point process, and then marks (M) are ‘assigned’ to the points by a random mechanism. Thus, the marks are determined independently from the point process.

** Complete spatial randomness and independence (CSRI): **The event (s) locations are a uniform Poisson point process, and the marks (M) are independent and identically distributed. (This implies both random labeling and independence of components).

**Objectives: **The goal of today’s lab is to learn how to use a Poisson process to model ** first-order properties** (KDE Estimation) and

**(F-, G-, K-, and L-functions) of a marked point pattern. Remember,**

*second-order properties***are related to the density or intensity of the point process and associated marks while**

*first-order properties***are related to the interdependence and random labeling among events and associated marks.**

*second-order properties***The datasets: **Today you will be working with two separate marked point pattern datasets. The first dataset, called lansing, is a marked point pattern containing locations and species observations of trees in a forest (categorical marked point pattern). The second dataset, called longleaf, is a marked point pattern containing locations and diameter measurements of longleaf pine trees in a southern forest (continuous marked point pattern).

**Exercise 1 – Exploring marked point pattern data with categorical marks**

Today’s lab will be conducted using the spatstat package in R.

- Start R
- Load the spatstat package:

> require(spatstat)

We will be working with the lansing dataset, which is a marked point pattern because there are tree species observations for every event. To inspect the lansing dataset type:

> data(lansing)

> summary(lansing)

How many events are in the dataset? ________________________________________________

What is the total area of the study region? ____________________________________________

How many species are there? ______________________________________________________

What is the relative density of white oak? ____________________________________________

What is the actual density of white oak? _____________________________________________

To plot the data, type the following:

> plot(lansing, legend = F)

To add a legend to the plot, type the following:

> legend(-0.25, 0.5, names(plot(lansing)), pch = plot(lansing))

As you can see, the above two lines of code plots the location of each event with unique symbols for each tree species. The legend command adds a legend outside of the plot boundary (at x and y coordinates of -0.25 and 0.5, respectively).

This graph is very difficult to interpret because the events are so close together. We can use the split command to plot each species on separate panels, which may make interpretation easier.

> plot(split(lansing))

It would be useful to compute and plot a separate estimate of intensity for each type of tree. We could do this using a Kernel density estimator, which can be invoked using the functions density.splitppp and plot.listof. They are invoked simply by typing:

> plot(density(split(lansing)))

The relative proportions of intensity of each tree species could then be computed by taking ratios of these densities, using eval.im. This is done more neatly using the command relrisk (which will also select the bandwidth automatically by cross-validation if it is not specified).

*> *plot(relrisk(lansing), zlim = c(0, 1))

Include this graph with your lab assignment.

Based on the relative intensity graph, do you see any segregation (i.e., repulsion) between some of the tree species? ______________________________________________________________

How could this be formally tested? _________________________________________________

**Exercise 2 – Testing for ****Independence of components with categorical marks**

The cross-type function of G, F, K, and L can all be used to assess the independence among type of each point process.

For example, to run a cross-type G-function to see if there is independence between the locations of maple and hickory trees you could type:

> plot(Gcross(lansing, “maple”, “hickory”))

The interpretation of the cross-type summary functions is similar to that of the original functions F, G, K, and L we covered in Lab 4.

Confidence envelopes can be plotted as well:

> E <- envelope(lansing, Gcross, nsim = 99, i = “maple”, j = “hickory”)

> plot(E)

Using the example code above, use the cross-type L-function to test for segregation between hickory and maple trees.

Is there repulsion or clustering between these species?____________________________

Why do you conclude this? ________________________________________________________________________________________________________________________

Be sure to turn in the cross-type L graph with your lab.

**Note:** The Lcross and other cross-type functions only work with categorical marked point pattern datasets. We will need other methods to deal with continuous marked point patterns.

**Exercise 3 – Testing for ****Random Labeling**

In a randomization test of the random labelling null hypothesis, simulated patterns of X are generated from the dataset by holding the event (s) locations fixed, and randomly resampling the marks, either with replacement (independent random sampling) or without replacement (randomly permuting the marks). This can be done by generating the new patterns using a function like Ldot to generate an envelope of patterns and then resampling the marks by using the rlabel.

*> *Ldif <- function(X, …, i) {

Lidot <- Ldot(X, …, i = i)

L <- Lest(X, …)

dif <- eval.fv(Lidot – L)

return(dif)

}

> E <- envelope(lansing, Ldif, nsim = 39, i = “hickory”, simulate =

+ expression(rlabel(lansing)))

> plot(E)

Based on this graph is the hypothesis of random labeling accepted or rejected? ______________

Include this graph with your lab.

**Exercise 4 – Exploring marked point pattern data with continuous marks**

We will be working with the longleaf dataset, which is a marked point pattern because there are tree diameter measurements for every event. To inspect the longleaf dataset type:

> data(longleaf)

> summary(longleaf)

How many events are in the dataset? ________________________________________________

What is the total area of the study region? ____________________________________________

What is the minimum diameter? ___________________________________________________

What is the maximum diameter? ___________________________________________________

The mark values can be extracted using the marks function and inspected using the histogram function:

> hist(marks(longleaf))

Is this dataset dominated by large or small trees? ______________________________________

Include this graph with your lab.

For any continuously marked point pattern we can can graphically display the events scaled by their marks by typing:

> plot(longleaf)

Include this graph with your lab.

**Challenge:** Try and create a 2 panel graphic with both the histogram of marks and marked point pattern plot as a single row by using the par() and mfrow() command and parameter.

As we used to look at the intensity of tree species in the lansing dataset, we can estimate intensity across the longleaf dataset with a KDE function by typing:

> plot(density(longleaf))

> plot(longleaf, add = T)

This is a density estimate only taking into account event locations.

To assess any spatial trend in the marks (rather than the event locations) one could form a kernel regression smoother using the Smooth.ppp command. This will spatially average the marks across our bandwidth.

> plot(Smooth.ppp(longleaf))

> plot(longleaf, add = T)

Include this graph with your lab.

Comment on the trends you see in this graph:

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

Why is the Smooth.ppp plot different from the density plot?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

In addition to the mark density, the markmean and markvar functions can be used to plot KDEs of the mean and variance of the marks respectively.

> plot(markmean(longleaf))

> plot(longleaf, add = T)

> plot(markvar(longleaf))

> plot(longleaf, add = T)

**Exercise 5 – Marked correlation function**

The “mark correlation function” of a stationary marked point process is a measure of the dependence between the marks of two points of the process a distance (r) apart. Note that “mark correlation function” is not a “correlation” in the usual statistical sense. It can take any nonnegative real value. The value 1 suggests “lack of correlation”: under random labeling, the “mark correlation function” ≡ 1. In general, marked correlations less than 1 indicate negative association and values greater than 1 indicate positive association.

Marked correlation function can be calculated using the markcorr function in spatstat. We will run a marked correlation function on the southwest portion of the longleaf study area by typing the following:

> data(longleaf)

Subset the southwest corner of this large pattern by typing:

> swcorner <- owin(c(0,100),c(0,100))

> sub <- longleaf[ , swcorner]

> plot(sub)

To run the mark correlation function type:

> mc <- markcorr(sub)

> plot(mc)

**Challenge:** Try and create a 3 panel graphic with the marked point pattern, histogram of marks, and mark correlation plot as a single row by using the par() and mfrow() command and parameter.

Based on this graph, is there a spatial dependence between tree diameters, why or why not?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

Does this make sense from an ecological perspective? Why or why not?

______________________________________________________________________________

______________________________________________________________________________

Run the marked correlation function on the entire longleaf dataset:

> mc.full <- markcorr(longleaf)

> plot(mc.full)

Again, try plotting the three representations of the marks as a marked point pattern, histogram of marks, and the mark correlation plot to help improve interpretation.

Based on this graph, is there a spatial dependence between tree diameters, why or why not?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

What are the differences in marked correlation values between the southwest corner and the full longleaf dataset?

______________________________________________________________________________

______________________________________________________________________________

______________________________________________________________________________

Do the results still make sense from an ecological perspective? Why or why not? ____________

_____________________________________________________________________________

_____________________________________________________________________________

Nearest neighbor techniques can also be used to explore the spatial relationship among marks. For example, the nearest neighbor correlation function calculates the correlation coefficient of nearest neighbor events. This can be done in spatstat using the nncorr command. This will calculate three things for a continuous marked point pattern, 1) unnormalized is the mean product of the two nearest neighbors, 2) normalized is the mean product normalized against the expected mean from random marks (greater than 1 indicates like values are clustered), 3) correlation is the classic correlation calculated between the nearest neighbor pairs.

Calculate the nearest neighbor correlation function on the southwest corner of the longleaf dataset:

> nncorr(sub)

Report the correlation: ___________________________________________________________

Interpret this result: _____________________________________________________________

Calculate the nearest neighbor correlation function on the entire longleaf dataset:

> nncorr(longleaf)

Report the correlation: __________________________________________________________

Interpret this result: _____________________________________________________________