Model Structure – Gary C. White

Program MARK provides parameter estimates for 142 data types: Cormack-Jolly-Seber models (live animal recaptures that are released alive), band (ring) recovery models (dead animal recoveries), models with both live and dead re-encounters (Burnham’s model), known fate models, nest survival models, closed capture models, band (ring) recovery models where the number of animals marked is unknown, robust design models for live recaptures, Barker’s extension to Burnham’s model, multi-strata live recapture model, Brownie et al.’s model of band or ring recoveries, Jolly-Seber models that include either seniority probability, recruitment rate, rate of population change, or probability of entry, probability of occupancy models (including robust design occupancy models), and mark-resight models.

A complete list of the currently available data types can be generated under the Help | Data Types menu selections.

Estimation in Cormack-Jolly-Seber Designs

Live Recaptures. Live recaptures are the basis of the standard Cormack-Jolly-Seber model. Marked animals are released into the population, often after being trapped from that population. Then, marked animals are encountered by catching them alive and re-releasing them. If marked animals are released into the population on occasion 1, then each succeeding capture occasion is one encounter occasion. Consider the following scenario:

Release --S(1)--> Encounter 2 --S(2)--> Encounter 3

Animals survive from initial release to the second encounter occasion with probability S(1), and from second encounter occasion to the third encounter occasion with probability S(2). The recapture probability at encounter occasion 2 is p(2), and p(3) is the recapture probability at encounter occasion 3. At least 2 re-encounter occasions are required to estimate the survival rate between the first release occasion and the first re-encounter occasion, i.e., S(1). The survival rate between the last two encounter occasions is not estimable because only the product of survival and recapture probability for this occasion is identifiable.
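The way survival and recapture probabilities combine into the probability of an encounter history can be sketched in a few lines of Python. This is an illustrative sketch only, not code from Program MARK; the function name cjs_history_probability and the zero-based list indexing are assumptions made for the example.

```python
def cjs_history_probability(history, phi, p):
    """Probability of a live-encounter history, conditional on first release.

    history: string of '1'/'0', one character per encounter occasion.
    phi[i]:  survival over the interval between occasions i+1 and i+2
             (so phi[0] plays the role of S(1) in the text).
    p[i]:    recapture probability at occasion i+2 (so p[0] is p(2)).
    All names and the indexing scheme are hypothetical, for illustration.
    """
    first = history.index("1")
    last = len(history) - 1 - history[::-1].index("1")

    # Survive each interval between first and last encounter, and be
    # either recaptured or missed at the occasion that follows it.
    prob = 1.0
    for i in range(first, last):
        prob *= phi[i]
        prob *= p[i] if history[i + 1] == "1" else (1.0 - p[i])

    # Probability of never being seen again after the last encounter:
    # either die, or survive and be missed, repeated to the end of the study.
    never_seen_again = 1.0
    for i in range(len(history) - 2, last - 1, -1):
        never_seen_again = (1.0 - phi[i]) + phi[i] * (1.0 - p[i]) * never_seen_again

    return prob * never_seen_again


# Two parameter sets that share the same product S(2) * p(3) = 0.42
set_a = dict(phi=[0.8, 0.7], p=[0.5, 0.6])
set_b = dict(phi=[0.8, 0.6], p=[0.5, 0.7])
for h in ("100", "101", "110", "111"):
    print(h, cjs_history_probability(h, **set_a), cjs_history_probability(h, **set_b))
```

Both parameter sets give the same probability for every history, which is the confounding of the final survival and recapture probabilities described above.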

Generally, the survival rates of the CJS model are labeled as Phi(1), Phi(2), etc., because the quantity estimated is the probability of remaining available for recapture. Animals that emigrate from the study area are not available for recapture, so they appear to have died in this model. Hence, Phi(i) = S(i)(1 – E(i)), where E(i) is the probability of emigrating from the study area.

Estimates of population size (N) or births and immigration (B) of the Jolly-Seber model are not provided in the CJS model of Program MARK. See the section on Jolly-Seber models below for these models in MARK. Programs that estimate population size for each occasion (where the quantity is identifiable) are POPAN-5 (Arnason and Schwarz 1995, Schwarz and Arnason 1996) or JOLLY and JOLLYAGE (Pollock et al. 1990).

The modification of the standard Cormack-Jolly-Seber model to allow estimation of different apparent survival rates for transients (Pradel et al. 1997) can be developed using age-structured PIMs. An example using the Lazuli Bunting data is provided in the transients entry.

Estimation in Band/Ring Recovery Designs

Dead Recoveries. With dead recoveries, i.e., band, fish tag, or ring recovery models, animals are captured from the population, marked, and released back into the population at each occasion. Later, marked animals are encountered as dead animals, typically from harvest or just found dead (e.g., gulls).

Marked animals are assumed to survive from one release to the next with survival probability S(i). If they die, the dead marked animals are reported during each period between releases with probability r(i). Note that r(i) is called a “reporting probability”, but this is not the probability that a hunter reports the marked animal. Rather, r(i) is the probability that a marked animal is reported conditional on its death. Animals that die of natural causes would have a very low probability of being found and the mark reported. In contrast, harvested animals have a greater chance of being reported, but still not probability 1. Don’t make the mistake of equating r(i) with the probability that an animal is harvested. However, recognize that shifts in the mortality process do affect the estimates of r(i). For example, heavily harvested populations should have higher values of r than lightly harvested populations, because the probability that an animal dies from harvest is higher, and hence the probability that the mark is reported is greater. But even though r relates to the probability of harvest, r is not the probability that a hunter reports the marked animal. Otis and White (2002) provide further discussion on the interpretation of r versus the probability that a hunter reports the band, including an equation relating the 2 variables.

The survival probability and reporting probability prior to the last release cannot be estimated individually in the full time-effects model, but only as a product. This parameterization differs from that of Brownie et al. (1985) in that their f(i) is replaced as f(i) = (1 – S(i)) r(i). The r(i) are equivalent to the lambda(i) of Seber (1970), where the original description of this model was developed, and of life table models (Anderson et al. 1985; Catchpole et al. 1995). The reason for making this change is so that the encounter process, modeled with the r(i) parameters, can be separated from the survival process, modeled with the S(i) parameters. With the f(i) parameterization, the 2 processes are both part of this parameter. Hence, developing more advanced models with the design matrix options of MARK is difficult, if not illogical, with the f(i) parameterization. However, this new parameterization has 2 drawbacks. First, the last S(i) and r(i) are confounded in the full time-effects model, as only the product (1 – S(i)) r(i) is identifiable, and hence estimable. Second, all the parameters are bounded between zero and 1, which seems like a benefit, but parameter estimates at the boundary do not have proper estimates of their standard errors. An equivalent situation occurs with the binomial distribution when either no successes occur in the data, or all successes occur in the data, and the standard error is estimated as zero. Because the Brownie et al. parameterization overcomes these difficulties, it is also included in Program MARK, and is described below.
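The relationship f(i) = (1 – S(i)) r(i) and the resulting recovery cell probabilities for a single release cohort can be illustrated numerically. The following is a minimal sketch under the assumption of one cohort and fully time-specific parameters; the function names are hypothetical and are not part of Program MARK.

```python
def brownie_recovery_rates(S, r):
    """Convert (S, r) into the Brownie et al. recovery rates f(i) = (1 - S(i)) r(i)."""
    return [(1.0 - s_i) * r_i for s_i, r_i in zip(S, r)]


def recovery_cell_probabilities(S, r):
    """Cell probabilities for one release cohort under the S, r coding.

    Returns one probability per recovery period, followed by the
    probability of never being recovered and reported.
    """
    cells = []
    alive = 1.0  # probability of surviving to the start of the current period
    for s_i, r_i in zip(S, r):
        cells.append(alive * (1.0 - s_i) * r_i)  # die in period i and be reported
        alive *= s_i                              # otherwise survive to the next period
    cells.append(1.0 - sum(cells))                # never recovered
    return cells


S = [0.6, 0.7]   # illustrative survival probabilities (made-up values)
r = [0.3, 0.4]   # illustrative reporting probabilities (made-up values)
print(brownie_recovery_rates(S, r))
print(recovery_cell_probabilities(S, r))
```

Only the product (1 – S) r enters the cell for the final recovery period, so different pairs of values with the same product give the same cell probability there.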

Brownie et al. Dead Recoveries Model. This model is the band or ring recovery model of Brownie et al. (1985) with the S and f coding. The model gives the same estimates for fully time-specific models as does the S and r coding, except when estimates of S are >1. However, different estimates will be obtained with covariates used to model survival or the recovery process. The advantage of this data type over the Dead Recoveries data type is that all the parameters are estimable under the fully time-specific model, and parameters are not constrained to the interval [0, 1], so that valid standard errors can be estimated with the identity or log link functions. More details on the differences in the S, r and S, f band recovery models are given by Otis and White (2002), in a discussion of the band reporting rate.

Recoveries Without Knowing Number Marked. The British Trust for Ornithology (BTO) does not have computerized databases of the numbers of birds ringed. Thus, BTO cannot compute the cohort size for a set of ring recoveries. To circumvent this problem, a ring recovery model is formulated where the recovery rate (r(i)) is assumed constant by age class and year. Then, the survival rate can be estimated from the observed recoveries. The cell probability for the jth year of recoveries given k years of recoveries is

S(1) S(2) … S(j-1) [1 – S(j)] / [1 – S(1) S(2) … S(k)]

where the denominator is 1 minus the probability of still being alive. Of the k survival rates, only k-1 are identifiable. Common approaches to achieve identifiability are to set S(k-1) = S(k) or to set S(k) to the mean of S(1) … S(k-1) using appropriate constraints in the Design Matrix.
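The cell probabilities above can be computed directly, which also makes it easy to check that they sum to 1 over the k recovery years. This is a sketch for illustration only; the function name is an assumption, and the survival values are made up.

```python
from math import prod


def conditional_recovery_cells(S):
    """Cell probabilities for recoveries when the number marked is unknown.

    S: list of k survival probabilities S(1) ... S(k).
    Returns, for each year j, S(1)...S(j-1) [1 - S(j)] / [1 - S(1)...S(k)].
    """
    denominator = 1.0 - prod(S)   # 1 minus the probability of still being alive
    cells = []
    alive = 1.0                   # running product S(1) ... S(j-1)
    for s_j in S:
        cells.append(alive * (1.0 - s_j) / denominator)
        alive *= s_j
    return cells


cells = conditional_recovery_cells([0.5, 0.6, 0.6])  # S(k-1) = S(k) imposed by hand
print(cells, sum(cells))
```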

This model should only be used when you do not know the number of animals marked because you cannot evaluate the assumption of constant recovery rates with this model. If you know the number of animals marked, use the dead recovery model described above.

Estimation in Live and Dead Encounters Designs

Both Live and Dead Encounters. The model for the joint live and dead encounter data type was first published by Burnham (1993), but with a slightly different parameterization than used in Program MARK. In MARK, the dead encounters are not modeled with the f(i) of Burnham (1993), but rather as f(i) = (1 – S(i))r(i), as discussed above for the dead encounter models. The method is a combination of the 2 above, but allows the estimation of fidelity (F(i) = 1 – E(i)), or the probability that the animal remains on the study area and is available for capture. As a result, the estimates of S(i) are estimates of the survival probability of the marked animals, and not the apparent survival (phi(i) = S(i) F(i)) as discussed for the live encounter model.

In the models discussed so far, live captures and resightings modeled with the p(i) parameters are assumed to occur over a short time interval, whereas dead recoveries modeled with the r(i) parameters extend over the time interval. The actual time of the dead recovery is not used in the estimation of survival for 2 reasons. First, it is often not known. Second, even if the exact time of recovery is known, little information is contributed if the recovery probability (r(i)) is varying during the time interval.

Barker’s model, discussed below, is an extension of the both live and dead encounters model that uses information on live resightings between live recapture intervals to improve estimates of survival.

Barker’s Model. Richard Barker (1997, 1999) has extended the model of Burnham (1993) by allowing resightings of marked animals during the interval between trapping occasions. The model was motivated by a brown trout study, where fish were marked at regular intervals and were then caught by fishermen. A marked trout was considered “resighted and released” if the fisherman released the fish alive, and “resighted and killed” if the fisherman kept the fish. The parameters this model adds to Burnham’s model are R, R', and F'. This model is particularly useful for situations where live sightings of marked animals are obtained between marking periods. If no dead encounters are recorded, the r(i) parameters can be set to zero. More details are provided on Barker’s model here.

Estimation in Radio-tracking and Known Fates Designs

Known Fates. Known fate data assume that there are no nuisance parameters involved with animal captures or resightings. The data typically derive from radio-tracking studies, although some radio-tracking studies fail to follow all the marked animals and so would not meet the assumptions of this model. A diagram illustrating this scenario is

Release --S(1)--> Encounter 2 --S(2)--> Encounter 3 --S(3)--> Encounter 4 …

where the probability of encounter on each occasion is 1 if the animal is alive.

The Known Fate data type is equivalent to the Kaplan-Meier estimation method if only one mortality occurs per interval. More importantly, the Known Fate data type provides important advantages over the Kaplan-Meier approach: incorporation of covariates, and selection between competing models. However, the Known Fates data type retains the advantages of the Kaplan-Meier method: left and right censoring, staggered entry, and no assumption about the hazard rate during the interval. In contrast, the Heisey-Fuller model assumes a constant hazard function within an interval, which is often a detrimental feature of the approach.
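With encounter probability fixed at 1, the simplest time-specific known fate estimate for an interval is the proportion of animals at risk that survive it, and the product of these proportions gives a Kaplan-Meier style cumulative survival curve. The sketch below illustrates that arithmetic only; it is not MARK's likelihood code, and the function name and counts are made up for the example.

```python
def known_fate_survival(at_risk, deaths):
    """Interval and cumulative survival from known-fate counts.

    at_risk[i]: animals monitored at the start of interval i (after any
                staggered entry and censoring have been accounted for).
    deaths[i]:  number of those animals that died during interval i.
    """
    interval = [1.0 - d / n for n, d in zip(at_risk, deaths)]
    cumulative = []
    running = 1.0
    for s_i in interval:
        running *= s_i            # product of interval survival probabilities
        cumulative.append(running)
    return interval, cumulative


interval, cumulative = known_fate_survival(at_risk=[20, 18, 15], deaths=[2, 3, 0])
print(interval)
print(cumulative)
```

Because the number at risk is supplied separately for each interval, animals can enter late or be censored without changing the calculation.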

Nest Survival Models. This model differs from the known fate model because the exact day that the animal (nest) dies is unknown. The nest survival data type is appropriate for known fate data where the occasions are not clearly delineated. As a result, nest survival models provide a means of analyzing ragged radio-tracking data. The data type provides a survival parameter for each day (occasion) of the study. Typically, not all of these parameters would be estimated, but rather models would be constructed that provided some structure across these parameters, such as trend or trend^2 models. More details on the format of how to enter the data and the structure of the model are given here. If data are entered as encounter histories, the LDLDLDLD format is required.

The nest survival data type is equivalent to the Mayfield estimator if the occasion-specific survival rates are all assumed to be constant. The nest survival model in MARK can emulate the Heisey-Fuller method if survival rates are set constant over the same intervals as in the Heisey-Fuller method.
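For reference, the Mayfield estimator of a constant daily survival rate is 1 minus the number of failures divided by the total exposure days. The following sketch shows that calculation under the simplifying assumption that exposure days have already been tallied; the function names and the numbers are hypothetical.

```python
def mayfield_daily_survival(failures, exposure_days):
    """Mayfield estimate of a constant daily survival rate."""
    if exposure_days <= 0:
        raise ValueError("exposure_days must be positive")
    return 1.0 - failures / exposure_days


def period_survival(daily_survival, n_days):
    """Survival over n_days when the daily rate is constant."""
    return daily_survival ** n_days


dsr = mayfield_daily_survival(failures=5, exposure_days=100)
print(dsr, period_survival(dsr, n_days=25))
```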

Estimation of Population Size in Closed Populations

Closed Captures. Closed-capture data assume that all survival probabilities are 1.0 across the short time intervals of the study (which are assumed to have zero length). Because time intervals are defined to be too small for any mortality or emigration, you are not allowed to enter Time Interval length. Thus, survival is not estimated. Rather, the probability of first capture (p(i)) and the probability of recapture (c(i)) are estimated, along with the number of animals in the population (N). This data type is the same as is analyzed with Program CAPTURE (White et al. 1982). All the likelihood models in CAPTURE can be duplicated in MARK. However, MARK allows additional models not available in CAPTURE, plus comparisons between groups and the incorporation of time-specific and/or group-specific covariates into the model. A total of 6 different closed captures models are available in MARK.

Program MARK also allows models incorporating individual heterogeneity for closed capture-recapture models. These models are developed based on a mixture distribution of first capture and recapture parameters following Norris and Pollock (1995) and Pledger (1998, 1999). Models for Mh and Mtbh are available.

Individual covariates cannot be used with the closed captures data type because animals that were never captured (and hence, whose individual covariates could never be measured) are incorporated into the likelihood as part of the estimate of population size (N). Models that can incorporate individual covariates existing in the literature (Huggins 1989, 1991; Alho 1990) have been implemented in MARK, and are described below. Estimates of population size are given for the Huggins’ models, but these estimates are not quite as efficient as the closed captures data type where the statistical models are equivalent to those in Program CAPTURE. However, the ability to incorporate individual covariates makes the Huggins’ models more appropriate if individual heterogeneity exists in the data.

Huggins’ Closed Captures. Huggins’ model (Huggins 1989, 1991; Alho 1990) allows estimation of closed population size (N) from initial capture probabilities (p) and recapture probabilities (c). The model conditions on the animal being captured at least once during the study, and so allows individual covariates to be used to model p and c. The approach used in Huggins’ model is equivalent to the Horvitz-Thompson sampling design, where animals have unequal probability of being included in the sample. Only LLLL encounter histories are required for this model.
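The Horvitz-Thompson idea can be sketched as follows: each captured animal is weighted by the inverse of its probability of being captured at least once. This is an illustrative sketch, not MARK's implementation; it assumes the initial capture probabilities have already been estimated for each captured animal (in practice, from a model with individual covariates), and the function name is hypothetical.

```python
def huggins_population_estimate(capture_probabilities):
    """Horvitz-Thompson style estimate of closed population size.

    capture_probabilities: one list per captured animal, giving that animal's
    estimated initial capture probability p on each occasion.
    """
    n_hat = 0.0
    for p_by_occasion in capture_probabilities:
        prob_never_caught = 1.0
        for p in p_by_occasion:
            prob_never_caught *= (1.0 - p)
        p_star = 1.0 - prob_never_caught   # probability of being caught at least once
        n_hat += 1.0 / p_star              # each captured animal represents 1/p* animals
    return n_hat


# Three captured animals, two occasions, p = 0.5 throughout (made-up values)
print(huggins_population_estimate([[0.5, 0.5]] * 3))
```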

Closed Captures with Heterogeneity. This model is closely linked with the closed captures models. Pertinent literature includes Norris and Pollock (1995) and Pledger (1998, 1999). More details are provided here. Only LLLL encounter histories are required for this model.

Full Closed Captures with Heterogeneity. Again, this model is closely linked with the closed captures models. Pertinent literature includes Norris and Pollock (1995) and Pledger (1998, 1999). More details are provided here. Only LLLL encounter histories are required for this model.

Mark-Resight Models. The mark-resight models allow population estimation when unmarked animals are sighted but not marked, while encounter histories are formed for a set of marked animals. The models in MARK have been developed by McClintock (in prep.) and extend the estimators in the NOREMARK software package (White 1996).

Estimation of Robust Designs

Robust Design. This model is a combination of the CJS live recapture model and the closed capture models, and is described in detail by Kendall et al. (1995, 1997) and Kendall and Nichols (1995). Instead of just 1 capture occasion between survival intervals, multiple (>1) capture occasions are used that are close together in time. These closely-spaced encounter occasions are termed “sessions”. To specify the encounter sessions, the Time Interval lengths are used. The time intervals between the encounter occasions within a session have a length of zero, whereas the time intervals between sessions have a positive (>0) length. An example will make this clearer. Assume that animals are trapped on 15 separate occasions. The first year, animals are trapped for 2 days, the second year for 2 days, the third year for 4 days, the fourth year for 5 days, and the fifth year for 2 days. The number of encounter occasions would be specified as 15. The length of the time intervals would be specified as: 0,1,0,1,0,0,0,1,0,0,0,0,1,0. That is, only 14 time intervals are needed, where the value 1 means that 1 year elapsed. This mechanism is flexible, but can be a bit tricky. Note that all sessions must have at least 2 occasions. Thus, you will never have 2 consecutive time intervals of length >0.
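Because the time interval vector is easy to get wrong by hand, it can help to generate it from the number of occasions in each session. The following is a small sketch of that bookkeeping; the function name and the gap argument are assumptions for illustration and are not part of MARK.

```python
def robust_design_time_intervals(occasions_per_session, gap=1):
    """Build the time-interval vector for a robust design.

    occasions_per_session: number of capture occasions in each session.
    gap: length assigned to the interval between consecutive sessions.
    Intervals within a session are given length 0.
    """
    intervals = []
    last_session = len(occasions_per_session) - 1
    for index, n_occasions in enumerate(occasions_per_session):
        if n_occasions < 2:
            raise ValueError("every session needs at least 2 occasions")
        intervals.extend([0] * (n_occasions - 1))  # within-session intervals
        if index < last_session:
            intervals.append(gap)                  # interval to the next session
    return intervals


# The 5-year example from the text: 2, 2, 4, 5, and 2 trapping days
print(robust_design_time_intervals([2, 2, 4, 5, 2]))
```

For the example in the text, this reproduces the 14 intervals 0,1,0,1,0,0,0,1,0,0,0,0,1,0 for 15 occasions.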

For each trapping session (j), the probability of first capture (p(ji)) and the probability of recapture (c(ji)) are estimated (where i indexes the number of trapping occasions within the session), along with the number of animals in the population (N(j)). For the intervals between sessions, the probability of survival (S(j)), the probability of emigration from the study area (gamma''(j)), and the probability of staying away from the study area (gamma'(j)) are estimated. Indexing of these parameters follows the notation of Kendall et al. (1997). Thus, gamma''(2) applies to the second trapping session, and gamma'(2) is not estimated because there are no marked animals outside the study area at that time. To provide identifiability of the parameters for the Markovian emigration model, Kendall et al. (1997) suggest setting gamma''(k – 1) = gamma''(k) and gamma'(k – 1) = gamma'(k), where k is the number of primary trapping sessions. To obtain the “No Emigration” model, set all the gamma parameters to zero. To obtain the “Random Emigration” model, set gamma'(i) = gamma''(i).

The robust design models in MARK can all incorporate the individual heterogeneity closed captures data types in the estimation of population size. Individual covariates can be used to model the parameters S, gamma'', and gamma' in the Robust Design data type. Individual covariates cannot be used with the Robust Design data type for the p's, c's, and N's because animals that were never captured (and hence, whose individual covariates could never be measured) are incorporated into the likelihood as part of the estimate of population size (N). Models that can incorporate individual covariates existing in the literature (Huggins 1989, 1991; Alho 1990) have been implemented in MARK (including the heterogeneity models), and are described below for the data type Robust Design (Huggins Est.). Estimates of population size are given for the Huggins’ models, but these estimates are not quite as efficient as the closed captures data type where the statistical models are equivalent to those in Program CAPTURE. However, the ability to incorporate individual covariates makes the Huggins’ models more appropriate if individual heterogeneity exists in the data.

More details are provided on the robust design model here.

Robust Design (Huggins Est.). The robust design model has also been extended to include Huggins’ estimator for population size (N) for each trapping session (Huggins 1989, 1991; Alho 1990). Again, individual covariates can be used to model the initial capture probabilities (p) and recapture probabilities (c) for each trapping session. Only LLLL encounter histories are required for this model.

Barker’s Model Robust Design. Bill Kendall and Richard Barker extended Barker’s model to handle the robust design. This model is an extension of the Lindberg et al. (2001) model because it uses encounter information from live recaptures, dead recoveries, and resightings between the intervals of live captures. LDLDLDLD encounter histories are required for this model, with resighting between the capture intervals given the value 2 for D.

Estimation of Multi-state Designs

Multi-state Model for Live Recaptures. The multi-state model of Brownie et al. (1993) and Hestbeck et al. (1991) allows animals to move between states with transition probabilities. At this time, only the movement model without memory is implemented. More details are provided on the multi-state model here. The multi-state model has also been extended to incorporate dead recoveries, described below, and has been extended to incorporate the robust design, both the open and closed robust design multi-state models. In addition, multi-state models with state uncertainty are available.

Live and Dead Multi-strata Model. The multi-strata model that incorporates both live and dead recoveries is available and described here.

Estimation of Jolly-Seber Designs

Jolly-Seber Models. In addition to the apparent survival and recapture probabilities of the Cormack-Jolly-Seber model (recaptures only model), the Jolly-Seber model allows estimation of the population size (N) at each trapping occasion, plus the number of new animals entering the population (B) at each occasion. Multiple parameterizations of the Jolly-Seber model are provided in MARK. For all of these parameterizations, only the LLLL encounter histories are required. The following parameterizations of the Jolly-Seber model are available in Program MARK: Burnham, Pradel (3 parameterizations: gamma, f, and lambda), POPAN from Schwarz and Arnason (1996), and Link and Barker (2003). The relationships between the parameters of these models are given here. Also, for the population change rates to be meaningful, the study area size must not change during the study. See Population Rate of Change for more discussion of this point.

Burnham’s Jolly-Seber Model. This parameterization provides the population size at the start of the study, plus the rate of population change (lambda) for each interval. Numerical convergence of the parameter estimates can be difficult to obtain with this model. Although this model has been thoroughly checked, and found to be correct, the program has difficulty obtaining numerical solutions for the parameters because of the penalty constraints required to keep the parameters consistent with each other.

Pradel Recruitment Only Model. Pradel (1996) developed a model to estimate the proportion of the population that was previously in the population. Thus, this model, labeled ‘Pradel Recruitment Only’, estimates recruitment to the population. The parameters of this model are the seniority probability, gamma (probability that an animal present at time i was already present at time i – 1), and recapture probability p. Only LLLL encounter histories are required for this model. This model can be estimated by reversing the time sequence of the live encounter histories (Pradel 1996), an idea suggested by Pollock et al. (1974:85-85), and even mentioned by R. A. Fisher in about 1939 (Box ????).

Pradel Survival and Seniority Model. Pradel (1996) extended his recruitment only model to include apparent survival (phi). In MARK, this model is labeled ‘Pradel Survival and Seniority’. Parameters of the model are apparent survival (phi), recapture probability (p), and seniority probability (gamma), which is the probability that an animal in the population at time i was also in the population at time i – 1 (i.e., the animal did not enter the population during the interval i – 1 to i). Only LLLL encounter histories are required for this model.

Pradel Survival and Lambda Model. Pradel (1996) also parameterized his model with both recruitment and apparent survival to have the parameters apparent survival (phi), recapture probability (p), and rate of population change [lambda(i) = N(i + 1)/N(i)]. This model converges quite readily compared to the Burnham parameterization of the Jolly-Seber model described above. Only LLLL encounter histories are required for this model.

Pradel Survival and Recruitment Model. Pradel (1996) also parameterized his model with both recruitment and apparent survival to have the parameters apparent survival (phi), recapture probability (p), and fecundity rate [f(i) = number of new adults at time i + 1 per adult at time i]. This model converges quite readily. Only LLLL encounter histories are required for this model.
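The three Pradel parameterizations describe the same process, so one set of parameters can be converted to another. The sketch below assumes the standard identities from Pradel (1996), lambda(i) = phi(i) + f(i) and gamma(i + 1) = phi(i)/lambda(i); the function name is hypothetical and the values are made up for illustration.

```python
def pradel_derived_parameters(phi, f):
    """Derive lambda and seniority from apparent survival and recruitment.

    phi: apparent survival over interval i.
    f:   recruitment rate over interval i (new animals at time i + 1 per
         animal present at time i).
    Returns (lambda_i, gamma_next), where gamma_next is the seniority
    probability at time i + 1.
    """
    lam = phi + f              # survivors plus recruits, per animal at time i
    gamma_next = phi / lam     # share of the population at i + 1 that are survivors
    return lam, gamma_next


lam, gamma_next = pradel_derived_parameters(phi=0.8, f=0.2)
print(lam, gamma_next)
```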

POPAN Model. Schwarz and Arnason (1996) parameterized the Jolly-Seber model in terms of a super population (N), and the probability of entry (pent in MARK, beta in the paper). The POPAN data type implements this model. The MLogit link function provides a constraint that makes the sum of the pent parameters <=1, with the probability of occurring in the population on the first occasion as 1 – sum(pent(t)). More details are given here.
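A multinomial logit transformation keeps a set of probabilities positive with a sum no greater than 1. The sketch below shows the general form of such a transformation applied to the entry probabilities; it is an illustration of the idea rather than MARK's internal code, and the function name and beta values are assumptions.

```python
import math


def mlogit_entry_probabilities(betas):
    """Map unconstrained betas to entry probabilities with a multinomial logit.

    betas: one real-valued parameter per pent (occasions 2 ... K).
    Returns (pent, first_occasion), where first_occasion = 1 - sum(pent) is
    the probability of being in the population on the first occasion.
    """
    exps = [math.exp(b) for b in betas]
    denominator = 1.0 + sum(exps)          # the "1" corresponds to the first occasion
    pent = [e / denominator for e in exps]
    first_occasion = 1.0 - sum(pent)
    return pent, first_occasion


pent, first_occasion = mlogit_entry_probabilities([0.0, -1.0, 0.5])
print(pent, first_occasion)
```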

Link-Barker Model. Link and Barker (2003) reparameterized the POPAN model from the probability of entry to the recruitment parameter (f). The reason for this reparameterization was to provide a more biologically meaningful interpretation of the parameters of the model, as part of a hierarchical modeling approach. More details are given here.

Estimation of Virtual Population Analysis

VPA — Virtual Population Analysis. A version of the virtual population analysis used by fisheries biologists has been incorporated.

Estimation of Occupancy Rates

Occupancy Estimation. This model provides estimates of the proportion of a set of sites or plots that are occupied by the species of interest when the probability of detection is <1. More details are provided here and in MacKenzie et al. (2002). The robust design occupancy model is also available in MARK, with 3 parameterizations. Other recent extensions of the occupancy model include the Royle and Nichols (2003) model to account for heterogeneity from population size, and the multiple-state occupancy model of Nichols et al. (2007). Occupancy models for 2 species are also available.
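The role of imperfect detection can be seen in the likelihood contribution of a single site under the basic single-season model of MacKenzie et al. (2002): a site with no detections is either occupied but missed on every visit, or unoccupied. The following is a minimal sketch of that calculation; the function name is hypothetical and the parameter values are made up.

```python
def occupancy_site_likelihood(detections, psi, p):
    """Likelihood contribution of one site in a single-season occupancy model.

    detections: list of 0/1 values, one per visit to the site.
    psi:        probability that the site is occupied.
    p:          list of detection probabilities, one per visit.
    """
    prob_history_if_occupied = 1.0
    for y, p_j in zip(detections, p):
        prob_history_if_occupied *= p_j if y == 1 else (1.0 - p_j)

    if any(detections):
        # At least one detection: the site must be occupied.
        return psi * prob_history_if_occupied
    # No detections: occupied but never detected, or not occupied at all.
    return psi * prob_history_if_occupied + (1.0 - psi)


print(occupancy_site_likelihood([0, 1, 0], psi=0.6, p=[0.4, 0.4, 0.4]))
print(occupancy_site_likelihood([0, 0, 0], psi=0.6, p=[0.4, 0.4, 0.4]))
```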

Parameters for all the models are specified in Parameter Index Matrices, or PIMs. See Constant Matrix for how parameters are specified constant for each occasion, Time Parameter Matrices for parameters that are specific to each occasion, Age Matrix for parameters that are age-specific, and Time and Age Matrix for an example where parameters are both time and age specific.
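As an illustration of how the index patterns differ, the sketch below builds triangular constant, time, and age index layouts for a single parameter type, with one row per release cohort. It assumes the triangular layout used for live recapture models; the function name build_pim is hypothetical and does not correspond to anything in MARK itself.

```python
def build_pim(n_intervals, structure):
    """Build a triangular parameter index layout as a list of rows.

    n_intervals: number of intervals (occasions minus 1).
    structure:   "constant", "time", or "age".
    Row k holds the indices for animals first released on occasion k + 1.
    """
    pim = []
    for row in range(n_intervals):
        width = n_intervals - row
        if structure == "constant":
            values = [1] * width                             # one shared parameter
        elif structure == "time":
            values = list(range(row + 1, n_intervals + 1))   # index follows the interval
        elif structure == "age":
            values = list(range(1, width + 1))               # index follows time since release
        else:
            raise ValueError("structure must be 'constant', 'time', or 'age'")
        pim.append(values)
    return pim


for name in ("constant", "time", "age"):
    print(name)
    for row_number, row in enumerate(build_pim(4, name)):
        print("   " * row_number + " ".join(f"{v:2d}" for v in row))
```

Cells that share an index share a single parameter, so the constant layout estimates 1 parameter while the time and age layouts each estimate 1 per interval or age class.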

Given a set of parameter matrices, the Design Matrix can be used to provide further constraints on the set of estimable parameters. In addition, covariates are specified in the Design Matrix.

The model is then “Run” (see Run Window) to obtain parameter estimates.