Data Cloning – Gary C. White

Data Cloning

Cloning the data is a numerical approach to identifying parameters that are not estimable (Lele et al. 2007, Lele et al. 2010). A parameter may not be estimable because it is confounded with 1 or more other parameters. An example is the last phi and p combination in a Cormack-Jolly-Seber model, where only the product of phi and p can be estimated, not the individual values of each.

The data are cloned by including multiple copies of each encounter history. In MARK, all that needs to be done is to multiply the encounter history frequencies of each group by the number of clones desired. Consider the example of cloning the data 100 times. An encounter history for an analysis with 2 groups and no individual covariates that looks like this:

11001010010 3 2;

could be cloned 100 times by entering the following encounter history:

11001010010 300 200;
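The frequency multiplication shown above can be sketched in a few lines of Python. This is only an illustration, not part of MARK: the function name clone_history_line is hypothetical, and the sketch assumes each line holds one encounter history followed by integer group frequencies and a closing semicolon, with no individual covariates and no embedded comments.

```python
def clone_history_line(line: str, n_clones: int) -> str:
    """Multiply every group frequency in one encounter-history line by n_clones.

    Assumed layout (hypothetical helper, not a MARK function):
    '<history> <freq group 1> <freq group 2> ...;'
    """
    body = line.strip()
    if not body.endswith(";"):
        raise ValueError("expected the line to end with ';'")
    # First token is the encounter history; the rest are group frequencies.
    history, *freqs = body[:-1].split()
    if not freqs:
        raise ValueError("expected at least one group frequency")
    cloned = [str(int(f) * n_clones) for f in freqs]
    return f"{history} {' '.join(cloned)};"


print(clone_history_line("11001010010 3 2;", 100))  # 11001010010 300 200;
```

Input files that contain individual covariates would need the covariate columns left untouched, which this sketch does not handle.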

By cloning the data, the sample size is increased without changing the parameter estimates. So, if the original estimates are compared to the cloned estimates, the values of the estimates will remain the same for parameters that are not confounded and are otherwise properly estimated. However, because the sample size has been increased, the standard errors of the cloned estimates will be smaller than the original standard errors. The expected result for parameters that are estimable is SE(original) = SE(cloned)*sqrt(number of clones). As an example, if the data are cloned 100 times, then the standard errors of the cloned data will be 1/10 of the original standard errors. A parameter whose standard errors depart markedly from this ratio is suspect and may not be estimable.
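The expected scaling of the standard errors can be checked on a toy problem. The sketch below uses a simple binomial proportion rather than a MARK model, so it only illustrates the arithmetic; the function names and the 5% tolerance in se_ratio_matches are assumptions chosen for the example, not values used by MARK.

```python
import math


def binomial_mle_and_se(successes: int, trials: int) -> tuple[float, float]:
    """Maximum likelihood estimate of a binomial proportion and its standard error."""
    p_hat = successes / trials
    se = math.sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat, se


def se_ratio_matches(se_original: float, se_cloned: float,
                     n_clones: int, rel_tol: float = 0.05) -> bool:
    """True if SE(original)/SE(cloned) is close to sqrt(n_clones).

    rel_tol is an arbitrary illustrative tolerance.
    """
    if se_cloned == 0.0:
        return False  # ratio undefined, e.g. an estimate on a boundary
    return math.isclose(se_original / se_cloned, math.sqrt(n_clones), rel_tol=rel_tol)


n_clones = 100
p_orig, se_orig = binomial_mle_and_se(12, 40)
# Cloning multiplies both the successes and the trials by n_clones.
p_clone, se_clone = binomial_mle_and_se(12 * n_clones, 40 * n_clones)

print(p_orig, p_clone)                                # the two estimates agree
print(se_orig / se_clone)                             # close to sqrt(100) = 10
print(se_ratio_matches(se_orig, se_clone, n_clones))  # True
```

The same ratio check could be applied to the original and cloned standard errors that MARK reports for each parameter.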

MARK has the option to clone the data in the Results Browser under the Output | Specific Model Output | Data Cloning menu choice. You should highlight the model you want to use for estimation before making this menu choice. When this menu choice is selected, you are asked to enter the number of copies (clones) of the data for the analysis, with the default value being 100. Once the value has been entered and the OK button pushed, MARK will generate new estimates with the cloned data to compare with the estimates from the model highlighted in the Results Browser. The original estimates and the new estimates from the cloned data are presented in an Excel spreadsheet so that you can compare the estimates and their standard errors.

The confidence intervals can also be compared, and the use of profile likelihood confidence intervals is suggested for examining parameter estimates at boundaries. That is, a parameter at a boundary, e.g., a survival estimate equal to 1, will generally have a zero (or at least unrealistically small) standard error. Cloning the data does not change this small standard error. However, if you have computed profile likelihood confidence intervals for this parameter, the profile likelihood confidence intervals for the cloned data will be considerably shorter (assuming you clone 100 copies) than those for the original data. So, data cloning is also useful for verifying that a parameter estimated at the boundary is estimable.
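The boundary behavior can be illustrated with a toy binomial case in which all n animals survive, so the survival estimate is 1 and the usual standard error is 0. This is a sketch under simplifying assumptions, not MARK's algorithm: the log-likelihood is n*log(p), and the lower 95% profile likelihood bound solves 2*(0 - n*log(p)) = 3.84, the 0.95 quantile of a chi-square distribution with 1 degree of freedom. The function name is hypothetical.

```python
import math

# 0.95 quantile of the chi-square distribution with 1 degree of freedom.
CHI2_95_1DF = 3.841458820694124


def profile_lower_bound_all_survived(n: int) -> float:
    """Lower 95% profile likelihood bound for p when all n trials succeed.

    The log-likelihood n*log(p) has its maximum (0) at p = 1, so the bound
    is the p at which 2 * (0 - n*log(p)) equals the chi-square cutoff.
    """
    return math.exp(-CHI2_95_1DF / (2.0 * n))


n, n_clones = 20, 100
lower_orig = profile_lower_bound_all_survived(n)
lower_clone = profile_lower_bound_all_survived(n * n_clones)

# The upper bound is 1 in both cases, so the interval length is 1 - lower bound.
width_orig = 1.0 - lower_orig
width_clone = 1.0 - lower_clone

print(lower_orig, lower_clone)
print(width_orig / width_clone)  # the cloned interval is far shorter
```

In this toy case the standard error is 0 both before and after cloning, while the profile likelihood interval shortens sharply, which is the pattern described above for an estimable parameter on a boundary.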