Occupancy Estimation Multi-scale

Estimation of occupancy rates generally assumes that a site or plot is either occupied or not occupied. This model extends the basic occupancy model in 2 directions. First, I’ll discuss the specific case given in the original work by Nichols et al. 2008, where the authors were interested in estimating the probability that a species occurred near a cluster of sampling devices (local occurrence), given that the species existed within a larger sample unit. The occupancy state of the sample unit is assumed static over a defined time period, or season, but the species’ local availability (e.g., presence at the specific location of the detection devices) may change over temporal primary samples, or visits. In this case, the collocated devices were used as secondary samples to estimate detection probabilities. Next, I discuss a case where multiple independent subunits are our primary samples within each sample unit, and again we are interested in both local and large-scale occupancy probabilities. In the latter case, timed observations at each spatial subunit represent our secondary samples.

Local Occupancy (Temporal) or Temporal Variation in Local Occupancy

Nichols et al. 2008 described the situation where multiple devices are used to detect the target species. In their example, camera traps, hair snares, and track plates were collocated at a random location within a sampled unit to detect skunks. These locations were visited 5 times over the course of a 2-week period; during each visit, cameras were checked, hair was collected, and track plates were examined and reset. Hence, for each of S sites (sampling units) there were K = 5 primary samples (visits) and L = 3 secondary samples (devices). To run this analysis in MARK, values of these parameters are entered on the initial values screen, obtained through the File | New menu choices. Notice that the label on the box normally designated as ‘Mixtures’ changes to ‘Secondaries (L)’ when you choose this multi-scale occupancy data type. In addition, the number of encounter occasions label changes to ‘Enc. History length (K*L)’. Enter the appropriate number of secondary samples (L) into the Secondaries box. The length of the encounter history is the product of the number of primary samples (K) and the number of secondary samples (L) (i.e., K*L, representing the total number of characters in the encounter history). MARK obtains the value of K by dividing the length of the encounter history by the number of secondary samples. The number of primary samples is not entered directly.

Using the simple example from Nichols et al. 2008, if there were 5 different visits (K = 5), and each of the 3 devices (L = 3) was checked on every visit, the resulting encounter history would contain 15 characters. So, specifying 3 in the Secondaries box and 15 in the Enc. History length box provides the correct values to MARK.
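The division MARK performs can be checked before building the input file. The following Python sketch is not part of MARK; the function name primary_count is hypothetical, and the check simply assumes that the history length must be an exact multiple of L.

```python
def primary_count(history_length, secondaries):
    """Return K, the number of primary samples implied by the
    encounter history length (K*L) and the number of secondaries (L)."""
    if secondaries <= 0:
        raise ValueError("secondaries (L) must be a positive integer")
    k, remainder = divmod(history_length, secondaries)
    if remainder != 0:
        # A length that is not a multiple of L cannot be split into
        # K complete primary samples.
        raise ValueError("history length is not a multiple of L")
    return k


# Nichols et al. 2008 example: 15 characters, 3 devices.
print(primary_count(15, 3))
```

A nonzero remainder usually means a device or visit was dropped from some histories without a placeholder.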

Assuming a single group, the result is a PIM for psi with only 1 entry (i.e., the probability that a sampling unit is occupied), a PIM for theta with K = 5 entries (the probability that the animal was available to be sampled by any of the sampling devices on each of the 5 visits), and 5 PIMs for p, each with L = 3 entries, corresponding to the 3 devices. That is, the PIM for visit 1 would have detection probabilities for each of the 3 devices, the next PIM for visit 2 also has detection probabilities for the 3 devices, etc. Note that the PIMs for p are labeled “Primary 1”, “Primary 2”, …, “Primary K”.

The encounter history is organized chronologically, reporting detections for each secondary sample (device) during the first primary sample (visit 1), and then repeating the process for the second primary sample (visit 2), etc. For K = 5 visits with L = 3 devices, the encounter history would first have the results for each of the 3 devices for the first visit, and then the 3 devices for the second visit, etc. The order of the detections for each device must remain consistent for each visit. So a history like 000111000111101 would tell MARK that none of the 3 devices detected the species on the first and third visits, but all 3 devices detected the species on the second and fourth visits, and on the fifth visit, the species was detected by devices 1 and 3 but not by device 2.
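To make the ordering concrete, the example history can be split into one block per visit. This is a minimal Python sketch for illustration only; split_history is a hypothetical helper, not a MARK function.

```python
def split_history(history, secondaries):
    """Split an encounter history into K blocks of L characters,
    one block per primary sample (visit)."""
    if len(history) % secondaries != 0:
        raise ValueError("history length is not a multiple of L")
    return [history[i:i + secondaries]
            for i in range(0, len(history), secondaries)]


visits = split_history("000111000111101", 3)
for visit, block in enumerate(visits, start=1):
    # Device numbers (1-based) that recorded a detection on this visit.
    devices = [d for d, y in enumerate(block, start=1) if y == "1"]
    print(f"visit {visit}: block {block}, detecting devices {devices}")
```

Because the position within each block identifies the device, the device order must be the same in every block.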

Local Occupancy (Spatial) or Spatial Variation in Local Occupancy

This model has also been used to estimate local and large scale occupancy using spatial subunits as the primary samples instead of temporal visits (e.g., Pavlacky et al. 2012). Consider a bird sampling project with a large sample unit, such as a 1-km^2 plot. Within this plot, multiple (K) independent point count locations are chosen. L replicate surveys are conducted at each location. The surveys may occur on multiple visits, or during a single visit via multiple independent observers or multiple timed intervals.

The key concept is that there is an occupancy parameter estimated for the larger scale 1-km^2 plots (psi), and a set of local occupancy parameters associated with each of the K primary samples (theta). The detection probabilities are based on the L replicate secondary samples.
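The way psi, theta, and p combine can be illustrated by computing the probability of a single encounter history. The Python sketch below is not MARK's implementation. It assumes the usual form of the multi-scale likelihood (the unit is occupied with probability psi; given occupancy, the species is locally present at primary sample k with probability theta(k); given local presence, each secondary sample detects it independently), and it handles only complete 0/1 histories without dots. The function name history_probability and its argument layout are hypothetical.

```python
from math import prod


def history_probability(history, secondaries, psi, theta, p):
    """Probability of one complete 0/1 encounter history.

    theta: list of K local-occupancy probabilities, one per primary sample.
    p: list of K lists, each holding L detection probabilities.
    """
    if len(history) % secondaries != 0:
        raise ValueError("history length is not a multiple of L")
    if set(history) - {"0", "1"}:
        raise ValueError("this sketch handles only 0/1 histories")
    n_primary = len(history) // secondaries
    occupied = 1.0
    for k in range(n_primary):
        block = history[k * secondaries:(k + 1) * secondaries]
        # Locally present: product of detection / non-detection terms.
        present = theta[k] * prod(
            p[k][l] if y == "1" else 1.0 - p[k][l]
            for l, y in enumerate(block)
        )
        # Locally absent: possible only if nothing was detected here.
        absent = (1.0 - theta[k]) if "1" not in block else 0.0
        occupied *= present + absent
    # Unit unoccupied: possible only for an all-zero history.
    unoccupied = (1.0 - psi) if "1" not in history else 0.0
    return psi * occupied + unoccupied


# Illustrative parameter values only, not estimates from any data set.
value = history_probability("000111000111101", 3, psi=0.8,
                            theta=[0.5] * 5, p=[[0.4] * 3] * 5)
print(value)
```

A block with no detections contributes two terms, because the species may have been locally absent or locally present but missed by every secondary sample; this ambiguity is what lets theta and p be separated.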

A species of bird may not be present at a particular point count location (primary sample), even though it may be present within the larger sample unit. Local occupancy, defined as the probability that the species is available for detection at point count location k, is estimated as theta(k), k = 1, …, K, allowing this probability to vary for each point count location. Individual covariates (e.g., local habitat variables) might be used to predict the probability that a point count location is occupied during the sampling period. Likewise, individual covariates (e.g., larger-scale habitat variables) might be used to estimate the probability of occupancy for each of the S 1-km^2 sampling units.

Using an example similar to that of Pavlacky et al. (2012), let’s assume that we have K = 16 independent point count locations within each sampled unit. Let’s also assume that our secondary samples consist of L = 3 consecutive timed intervals (e.g., each interval is 2 minutes).

Assuming a single group of sample units with K = 16 point count locations, each with L = 3 timed intervals, you would specify Secondaries = 3 and Enc. History length = 48 in MARK. The result is a PIM for psi with only 1 entry (i.e., the probability that a sampling unit is occupied), a PIM for theta with K = 16 entries (the probability that the species was available to be sampled at point count location k), and 16 PIMs for detection probability p, each with L = 3 entries, corresponding to each timed interval. That is, the p PIM labeled “Primary 1” corresponds to the 3 detection probabilities (one for each timed interval) at point count location 1. Within this PIM, the 3 detection probabilities correspond to the probability of detection during minutes 0-2, detection during minutes 2-4, and detection during minutes 4-6, respectively. The next two p PIMs, labeled “Primary 2” and “Primary 3”, contain entries for these same detection probabilities at point count location 2 and location 3, etc.
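When checking a long history by hand, it can help to map a character position back to its p PIM and entry. The Python sketch below only restates the ordering described in this help page; the function name locate is hypothetical and positions are counted from 1.

```python
def locate(position, secondaries):
    """Map a 1-based position in the encounter history to
    (primary sample, secondary sample), both 1-based."""
    if position < 1 or secondaries < 1:
        raise ValueError("position and secondaries must be positive")
    primary, secondary = divmod(position - 1, secondaries)
    return primary + 1, secondary + 1


# Position 7 of a 48-character history with L = 3 timed intervals.
primary, interval = locate(7, 3)
print(f"p PIM 'Primary {primary}', entry {interval}")
```

Under this layout, position 48 is the third timed interval at point count location 16.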

Again, the encounter history is organized by primary sample, here ‘spatially’ rather than chronologically, reporting detections for each secondary sample (timed interval) during the first primary sample (point count location 1), and then repeating the process for the second primary sample (point count location 2), etc. For example, with K = 5 point count locations, each with L = 3 timed intervals, the encounter history would first have the detection results for each of the 3 timed intervals at the first point count location, and then the detection results for the 3 timed intervals at the second point count location, etc. So a history like 000111000111101 would tell MARK that the species was not detected during any timed intervals at point count locations 1 and 3, the species was detected during each of the timed intervals at point count locations 2 and 4, and at the fifth point count location, the species was detected during timed intervals 1 and 3 but not during the second timed interval at that location.

Ragged Data

The number of secondary samples (L) is assumed the same for all K primary samples within each sampled unit S. However, the Multi-scale Occupancy data type allows dots in the encounter history to indicate that no data were collected. So, if a secondary sample was not collected for a particular primary sample, the encounter history would include a dot to indicate no data. For example, if a camera failed to operate during a given primary occasion, then the entry corresponding to the camera would be a dot for that primary period in the encounter history. Likewise, if a removal design was being employed in our bird point count example, and the target species was detected on the first timed interval at a given point count location, then the remainder of the secondary samples (timed intervals) for that primary sample (point count location) would be missing. Further, an entire primary sample might be missing, e.g., a bird point count location was not visited, or there were an unequal number of point count locations among sampled units. Again, these missing data can be indicated by dots. Therefore, although the constant values of L and K sound restrictive, the use of dots in the encounter history provides the flexibility needed to handle diverse situations.
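Both situations can be prepared with simple string handling before the histories are given to MARK. The Python sketch below is illustrative only: the function names are hypothetical, and the padding helper assumes that the unvisited point count locations are the last ones in the history, which matters if theta or p varies by location.

```python
def apply_removal_design(history, secondaries):
    """Within each primary sample, replace every entry after the
    first detection with a dot (no data collected)."""
    if len(history) % secondaries != 0:
        raise ValueError("history length is not a multiple of L")
    blocks = []
    for start in range(0, len(history), secondaries):
        block = history[start:start + secondaries]
        first = block.find("1")
        if first != -1:
            # Sampling stopped after the first detection.
            block = block[:first + 1] + "." * (len(block) - first - 1)
        blocks.append(block)
    return "".join(blocks)


def pad_missing_primaries(history, secondaries, primaries):
    """Pad a short history with dots so every sampled unit has
    K*L characters (assumes the missing primaries come last)."""
    target = secondaries * primaries
    if len(history) > target:
        raise ValueError("history is longer than K*L")
    return history + "." * (target - len(history))


print(apply_removal_design("000111000111101", 3))
print(pad_missing_primaries("000111", 3, 4))
```

If the unvisited locations are not the last ones, the dots should instead be placed in the blocks for those locations so that each position still lines up with the intended theta and p entries.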

Thanks to Larissa Bailey for writing most of this help page and helping clarify the PIM structure and specification in MARK.

Robust Design Extension

The single-season multi-site data type (#123) has been extended to a multi-season robust design data type (#175). As with the single-season version, ragged data are not allowed, and the user is responsible for using dots as appropriate to handle missing data. Each of the robust design primary sessions is assumed to have the same values of K and L. Now, the number of occasions in the encounter history becomes the number of primary sessions x K x L. As with the single-season data type, the L value is entered in the Secondaries data box. The value of K is obtained by dividing the number of occasions in the encounter histories by the product of L and the number of primary sessions.
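The same arithmetic check used for the single-season case extends to the robust design. The Python sketch below is not MARK code; primaries_per_session is a hypothetical name, and the example of 3 primary sessions with K = 5 and L = 3 is invented purely for illustration.

```python
def primaries_per_session(n_occasions, secondaries, sessions):
    """Return K given the total number of occasions
    (sessions x K x L), L, and the number of primary sessions."""
    if secondaries <= 0 or sessions <= 0:
        raise ValueError("L and the number of sessions must be positive")
    k, remainder = divmod(n_occasions, secondaries * sessions)
    if remainder != 0:
        raise ValueError("occasions is not a multiple of L x sessions")
    return k


# Hypothetical design: 3 sessions x K = 5 x L = 3 gives 45 occasions.
print(primaries_per_session(45, 3, 3))
```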

For each primary session in the robust design, there is a PIM for theta, and K PIMs for the p parameters. The robust design portion of the model includes psi for the first primary session, and then epsilon and gamma for each of the succeeding primary sessions.