Design Matrix Functions – Gary C. White

Twelve special functions are allowed as entries in the design matrix: add, product, power, min, max, log, exp, eq (equal to), gt (greater than), ge (greater than or equal to), lt (less than), and le (less than or equal to). These names can be either upper- or lower-case. Do not include blanks within these function specifications; otherwise MARK may not properly retrieve models with these functions in their design matrix. As shown below, these functions can be nested to create quite complicated expressions, which may require setting a larger value of the design matrix cell size.

1. Add and Product functions.

These two functions require 2 arguments. The add function adds the 2 arguments together, whereas the product function multiplies the 2 arguments. The arguments for both functions must be one of the 3 types allowed: numeric constant, an individual covariate, or another function call. The following design matrix demonstrates the functionality of these 2 functions, where weight is an individual covariate.

1 1 1 weight product(1,weight) product(weight,weight)
1 1 2 weight product(2,weight) product(weight,weight)
1 1 3 weight product(3,weight) product(weight,weight)
1 0 add(0,1) weight product(1,weight) product(weight,weight)
1 0 add(1,1) weight product(2,weight) product(weight,weight)
1 0 add(1,2) weight product(3,weight) product(weight,weight)

Column 5 of the design matrix demonstrates creating an interaction between an individual covariate and another column (the first 3 rows) or a constant and an individual covariate (the last 3 rows). Column 6 of the design matrix demonstrates creating a quadratic effect for an individual covariate. Note that if the 2 arguments were different individual covariates, an interaction effect between 2 individual covariates would be created in column 6.

The add function in column 3 is used only for demonstration; it would not be used in a normal application. In each case, a continuous variable is created by adding constant values. The results are the values 1, 2, and 3, in rows 4, 5, and 6, respectively.
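
The rows above can be checked by hand with a short Python sketch. The functions add and product here are stand-ins written for illustration, not MARK's own code, and the weight value of 2.5 is an arbitrary assumption.

```python
def add(x1, x2):
    """Illustrative stand-in for the add design matrix function."""
    return x1 + x2


def product(x1, x2):
    """Illustrative stand-in for the product design matrix function."""
    return x1 * x2


def example_rows(weight):
    """Evaluate the six example rows for one animal's weight covariate."""
    w = weight
    return [
        [1, 1, 1, w, product(1, w), product(w, w)],
        [1, 1, 2, w, product(2, w), product(w, w)],
        [1, 1, 3, w, product(3, w), product(w, w)],
        [1, 0, add(0, 1), w, product(1, w), product(w, w)],
        [1, 0, add(1, 1), w, product(2, w), product(w, w)],
        [1, 0, add(1, 2), w, product(3, w), product(w, w)],
    ]


rows = example_rows(weight=2.5)  # 2.5 is a made-up covariate value
for row in rows:
    print(row)
```

Column 3 of the last three rows evaluates to 1, 2, and 3, and column 6 holds the squared weight in every row.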

2. IF functions: eq (equal to), gt (greater than), ge (greater than or equal to), lt (less than), le (less than or equal to).

These five functions require 2 arguments. The eq, gt, ge, lt, and le functions will return a zero if the operation is false and a one if the operation is true. For each of these functions, 2 arguments (x1 and x2) are compared based on the function. For example, eq(x1,x2) returns 1 if x1 equals x2, and zero otherwise; gt(x1,x2) returns 1 if x1 is greater than x2, zero otherwise; and le(x1,x2) returns 1 if x1 is less than or equal to x2, zero otherwise. The arguments for these functions must be one of the 3 types allowed: numeric constant, another design matrix function, or an individual covariate. The following design matrix demonstrates the functionality of both the add function and the IF function (eq), where age is an individual covariate.

1 add(0,age) eq(0,add(0,age))
1 add(1,age) eq(0,add(1,age))
1 add(2,age) eq(0,add(2,age))
1 add(3,age) eq(0,add(3,age))
1 add(4,age) eq(0,add(4,age))
1 add(5,age) eq(0,add(5,age))

In this particular example, the individual covariate age corresponds to the number of days before a bird fledges from its nest (fledge day 0) and subsequently enters the study. Suppose an individual fledges from its nest during the fourth survival period. Its encounter history (LDLD format) would consist of 00 00 00 10 and the individual would have -3 as its age covariate because the individual did not fledge from its nest until the fourth survival period. A bird that did not fledge from its nest until survival period 20 would have -19 as its age covariate. Think of the use of negative numbers as an accounting technique to help identify when the individual fledges.

Column 2 of the design matrix demonstrates the use of the add function to create a continuous age covariate for each individual by adding a constant to age. The value returned in the first row of the second column is -3 (0 + -3 = -3). The value returned in the second row of the second column is -2 (1 + -3 = -2). The value returned in the fourth row of the second column is zero and corresponds to fledge day 0 (3 + -3 = 0). The value returned in the fifth row of the second column is one and corresponds to fledge day 1. Thus, column 2 is producing a trend effect of age on survival, with the intercept of the trend model being age zero. A trend model therefore models a constant rate of change with age on the logit scale, so that each increase in age results in a constant change in survival, either positive or negative depending on the sign of beta2.

Now, suppose that survival is thought to be different on the first day that a bird fledges, i.e., the first day that the bird enters the encounter history. To model survival as a function of fledge day 0, use the eq function to create the necessary dummy variable. This is demonstrated in the third column. The eq function returns a value of one only when the statement is true, which occurs only on the first day the bird is fledged. Recall that the value of age for this individual is -3; therefore, the add function returns a value of -3 (0 + -3 = -3) in the first row, and the eq function in the third column returns zero because -3 is not equal to zero. In the fourth row, the eq function returns one because the add function returns 0 (3 + -3 = 0), which is equal to 0. For this particular individual, this is true only for row four; all other rows return zero because the statement is false. Thus, the eq function produces a dummy variable that allows the survival rate on the first day after fledging to differ from the trend model for age, which applies thereafter.

Note that the eq function in this example is using the same results of the add function from the preceding column, and illustrates the nesting of functions.
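
As a rough check of the walk-through above, the following Python sketch evaluates the three columns for a bird whose age covariate is -3. The helper names add, eq, and fledge_rows are assumptions made for this illustration and are not part of MARK.

```python
def add(x1, x2):
    """Illustrative stand-in for the add design matrix function."""
    return x1 + x2


def eq(x1, x2):
    """Return 1 if the two arguments are equal, 0 otherwise."""
    return 1 if x1 == x2 else 0


def fledge_rows(age, n_rows=6):
    """Build [intercept, trend, fledge-day dummy] for each design matrix row."""
    return [[1, add(k, age), eq(0, add(k, age))] for k in range(n_rows)]


rows = fledge_rows(age=-3)
for row in rows:
    print(row)
```

Only the fourth row carries a 1 in the third column, because that is the only row where add(k,age) equals zero for this individual.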

3. Power function.

This function requires 2 arguments. The first argument is raised to the power of the second argument, i.e., the result is x1^x2. As an example, to create a squared term of the individual covariate Length, you would use power(Length,2). To create a cubic term, power(Length,3).
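
A minimal Python sketch of the same arithmetic, assuming a made-up Length value of 4; power and product are illustrative stand-ins, not MARK's code. It also shows that power(Length,2) gives the same quadratic term as product(Length,Length).

```python
def power(x1, x2):
    """Raise the first argument to the power of the second."""
    return x1 ** x2


def product(x1, x2):
    """Multiply the two arguments."""
    return x1 * x2


length = 4.0  # hypothetical covariate value
squared = power(length, 2)
cubed = power(length, 3)
print(squared, cubed, squared == product(length, length))
```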

4. Min, Max functions.

The min function returns the minimum of the 2 arguments, whereas the max function returns the maximum of the 2 arguments. These functions allow the creation of thresholds with individual covariates. So, with the individual covariate Length, the function min(5,Length) would use the value of Length when Length is less than 5, but replace Length with the value 5 for all Lengths greater than 5. Similarly, max(3,Length) would replace all Lengths less than 3 with the value 3.
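
The threshold behavior can be sketched in Python with the built-in min and max, which behave the same way for two numeric arguments. The Length values below are invented for illustration, and the final line nests the two calls to bound Length on both sides, which is an extrapolation from the text rather than an example taken from it.

```python
lengths = [2.0, 3.0, 4.5, 5.0, 7.2]  # hypothetical Length covariate values

# min(5,Length): values above 5 are replaced by 5
capped = [min(5, x) for x in lengths]

# max(3,Length): values below 3 are replaced by 3
floored = [max(3, x) for x in lengths]

# Nesting the two bounds Length to the range 3 through 5
bounded = [max(3, min(5, x)) for x in lengths]

print(capped)
print(floored)
print(bounded)
```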

5. Log, Exp functions.

These functions are equivalent to the natural logarithm function and the exponential function. Each requires only one argument. So, for the individual covariate Length = 2, log(Length) returns 0.693147181, and exp(Length) returns 7.389056099.

6. PriorCapL, PriorCapD functions.

These functions allow you to determine whether an animal was previously captured on the specified occasions. For example, priorcapl(i,j) will return the value of 0 if the animal was not previously captured on occasions i, i + 1, i + 2, …, j, and 1 if the animal was captured during this set of occasions. Priorcapl(i,i) is valid — again returning 0 if the animal was not captured on occasion i, and 1 if it was captured. Priorcapd(i,j) is equivalent but tests whether an animal was detected in the D part of the LD pair of the encounter history.
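
The following Python sketch illustrates one reading of this description. It rests on several assumptions that the text does not confirm: occasions are numbered from 1, the encounter history is in LDLD format, and the history is passed in explicitly (in MARK the function takes only i and j and uses the animal's own history). The helper names are hypothetical.

```python
def _ld_pairs(history):
    """Split an LDLD encounter history such as '00 00 00 10' into (L, D) pairs."""
    digits = history.replace(" ", "")
    return [(digits[k], digits[k + 1]) for k in range(0, len(digits), 2)]


def priorcapl(history, i, j):
    """Return 1 if the L entry is nonzero on any occasion i..j (1-based), else 0."""
    pairs = _ld_pairs(history)
    return 1 if any(live != "0" for live, _ in pairs[i - 1:j]) else 0


def priorcapd(history, i, j):
    """Return 1 if the D entry is nonzero on any occasion i..j (1-based), else 0."""
    pairs = _ld_pairs(history)
    return 1 if any(dead != "0" for _, dead in pairs[i - 1:j]) else 0


history = "00 00 00 10"
print(priorcapl(history, 1, 3))  # occasions 1-3: no live capture
print(priorcapl(history, 4, 4))  # occasion 4 only
print(priorcapd(history, 1, 4))  # no D entries in this history
```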

Example

These twelve functions are useful for constructing a design matrix when using the nest survival analysis. Here, the add and ge functions are demonstrated. Stage-specific survival (egg or nestling) could be estimated only if nests were aged and frequent nest checks were done to assess stage of failure.

1 add(0,age) GE(add(0,age),15) product(add(0,age),GE(add(0,age),15))
1 add(1,age) GE(add(1,age),15) product(add(1,age),GE(add(1,age),15))
1 add(2,age) GE(add(2,age),15) product(add(2,age),GE(add(2,age),15))
1 add(3,age) GE(add(3,age),15) product(add(3,age),GE(add(3,age),15))
1 add(4,age) GE(add(4,age),15) product(add(4,age),GE(add(4,age),15))
1 add(5,age) GE(add(5,age),15) product(add(5,age),GE(add(5,age),15))
1 add(6,age) GE(add(6,age),15) product(add(6,age),GE(add(6,age),15))
1 add(7,age) GE(add(7,age),15) product(add(7,age),GE(add(7,age),15))
1 add(8,age) GE(add(8,age),15) product(add(8,age),GE(add(8,age),15))
1 add(9,age) GE(add(9,age),15) product(add(9,age),GE(add(9,age),15))
1 add(10,age) GE(add(10,age),15) product(add(10,age),GE(add(10,age),15))
1 add(11,age) GE(add(11,age),15) product(add(11,age),GE(add(11,age),15))
1 add(12,age) GE(add(12,age),15) product(add(12,age),GE(add(12,age),15))
1 add(13,age) GE(add(13,age),15) product(add(13,age),GE(add(13,age),15))
1 add(14,age) GE(add(14,age),15) product(add(14,age),GE(add(14,age),15))
1 add(15,age) GE(add(15,age),15) product(add(15,age),GE(add(15,age),15))
1 add(16,age) GE(add(16,age),15) product(add(16,age),GE(add(16,age),15))
1 add(17,age) GE(add(17,age),15) product(add(17,age),GE(add(17,age),15))
1 add(18,age) GE(add(18,age),15) product(add(18,age),GE(add(18,age),15))

In this particular example, the age covariate corresponds to the day that the first egg was laid in a nest (nest day 0). Suppose a nest is initiated during the fourth survival period. Its encounter history (LDLD format) would consist of 00 00 00 10 and the nest would have -3 as its age covariate because the first egg was not laid in the nest until the fourth survival period.

Column 2 of the design matrix demonstrates the use of the add function to create a continuous age covariate for each nest. The value returned in the first row of the second column is -3. The value returned in the second row of the second column is -2. The value returned in the fourth row of the second column is a zero and corresponds to the initiation of egg laying. The value returned in the fifth row of the second column is one (the nest is one day old).

To model survival as a function of stage, use the ge function to quickly create the necessary dummy variable. This is demonstrated in the third column. The value of 15 is used in this example because it corresponds to the number of days before a nest will hatch young Lark Buntings (Calamospiza melanocorys). Day 0 begins with the laying of the first egg, so values of 0-14 correspond to the egg stage. Values of 15-23 correspond to the nestling stage. The ge function will return a value of one (nestling stage) only when the statement is true.

Because the value of age for this nest is -3, the add function column returns a value of -3 (0 + -3 = -3) for the first row. The ge function (third column) returns a value of zero because the statement is false; age (-3) is not greater than or equal to 15. A value of one appears for the first time in row 19; here, the add function returns a value of 15 (18 + -3 = 15). The ge function returns a value of one because the statement is true; add(18,age) results in 15 which is greater than or equal to 15.

The fourth column produces an age slope variable that will be zero until the nest reaches 15 days of age, and then becomes equal to the nest's age. The result is that the age trend model of survival changes to a different intercept (column 3) and slope (column 4) once the eggs hatch.
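
The four columns of this example can be evaluated with a short Python sketch for a nest whose age covariate is -3. The functions are illustrative stand-ins for the design matrix functions, and nest_rows is a hypothetical helper name.

```python
def add(x1, x2):
    """Illustrative stand-in for the add design matrix function."""
    return x1 + x2


def ge(x1, x2):
    """Return 1 if x1 is greater than or equal to x2, 0 otherwise."""
    return 1 if x1 >= x2 else 0


def product(x1, x2):
    """Illustrative stand-in for the product design matrix function."""
    return x1 * x2


def nest_rows(age, n_rows=19, hatch_day=15):
    """Build [intercept, nest age, nestling dummy, nestling age slope] per row."""
    rows = []
    for k in range(n_rows):
        nest_age = add(k, age)
        stage = ge(nest_age, hatch_day)
        rows.append([1, nest_age, stage, product(nest_age, stage)])
    return rows


rows = nest_rows(age=-3)
for number, row in enumerate(rows, start=1):
    print(number, row)
```

For this nest, rows 1 through 18 have zeros in columns 3 and 4, and row 19 is the first with a nestling dummy of 1 and an age slope value of 15.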

Some Useful Tricks

An easy way to prepare these complicated sets of functions is to use Excel to prepare the values and then paste them into the design matrix. The following illustrates how to use the concatenate function in Excel to join a column value and a closing ")" to create a complicated column of functions like those in the above example.

Column A   Column B                          Column C                        Column D
1          =concatenate("add(age,",A2,")")   =concatenate("GE(",B2,",15)")   =concatenate("product(",B2,",",C2,")")
2          =concatenate("add(age,",A3,")")   =concatenate("GE(",B3,",15)")   =concatenate("product(",B3,",",C3,")")
3          =concatenate("add(age,",A4,")")   =concatenate("GE(",B4,",15)")   =concatenate("product(",B4,",",C4,")")
…
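
For readers who would rather script this step, the same strings can be generated outside Excel. The Python sketch below is one possible approach, not part of MARK; build_cells is a hypothetical helper, and the covariate name and threshold are taken from the nest example. The cells are built without blanks, as the function specifications require.

```python
def build_cells(n_rows, covariate="age", threshold=15):
    """Return design matrix rows as lists of text cells, one list per row."""
    rows = []
    for k in range(n_rows):
        add_cell = f"add({k},{covariate})"
        ge_cell = f"GE({add_cell},{threshold})"
        product_cell = f"product({add_cell},{ge_cell})"
        rows.append(["1", add_cell, ge_cell, product_cell])
    return rows


rows = build_cells(19)
for row in rows:
    # Cells are separated by a single blank here; adjust for pasting as needed.
    print(" ".join(row))
```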

Other Details

The design matrix values can have as many characters as specified in the design matrix cell size, and unlimited nesting of functions. The design matrix cell size is set in the Set Preferences dialog box, available from File | Preferences. As an example, the following is a very complicated way of computing a value of 1:
log(exp(log(exp(product(max(0,1),min(1,5))))))
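
To see how such a nested entry reduces to a single number, here is a small recursive evaluator written in Python. It is only a sketch of the idea and makes no claim about how MARK parses entries internally; FUNCTIONS, split_args, and evaluate are names invented for this illustration, and the entry is assumed to contain no blanks.

```python
import math

# Illustrative stand-ins for the twelve design matrix functions
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "product": lambda a, b: a * b,
    "power": lambda a, b: a ** b,
    "min": min,
    "max": max,
    "log": math.log,
    "exp": math.exp,
    "eq": lambda a, b: float(a == b),
    "gt": lambda a, b: float(a > b),
    "ge": lambda a, b: float(a >= b),
    "lt": lambda a, b: float(a < b),
    "le": lambda a, b: float(a <= b),
}


def split_args(text):
    """Split an argument list on commas that are not inside nested parentheses."""
    args, depth, start = [], 0, 0
    for pos, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(text[start:pos])
            start = pos + 1
    args.append(text[start:])
    return args


def evaluate(cell, covariates=None):
    """Evaluate one design matrix entry given a dict of individual covariates."""
    covariates = covariates or {}
    if "(" in cell and cell.endswith(")"):
        name, inner = cell.split("(", 1)
        args = [evaluate(arg, covariates) for arg in split_args(inner[:-1])]
        return FUNCTIONS[name.lower()](*args)  # names may be upper- or lower-case
    if cell in covariates:
        return covariates[cell]
    return float(cell)  # otherwise treat the entry as a numeric constant


value = evaluate("log(exp(log(exp(product(max(0,1),min(1,5))))))")
print(value)
```

The same evaluator handles entries that mix covariates and nesting, for example evaluate("GE(add(18,age),15)", {"age": -3}).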

Before the design matrix is submitted to the numerical optimizer, each entry in the design matrix is checked for a valid function name at the outermost level of nesting, and the number of “(” is checked to match the number of “)”.
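
A check along these lines can be sketched in Python as follows. This mimics the description above and is not MARK's actual code; check_cell and VALID_NAMES are hypothetical names, and including priorcapl and priorcapd in the list is an assumption.

```python
VALID_NAMES = {
    "add", "product", "power", "min", "max", "log", "exp",
    "eq", "gt", "ge", "lt", "le",
    "priorcapl", "priorcapd",  # assumed to be accepted as well
}


def check_cell(cell):
    """Return True if the entry passes the two checks described in the text."""
    if "(" not in cell and ")" not in cell:
        return True  # plain constant or covariate name; no function to check
    outer_name = cell.split("(", 1)[0].lower()
    balanced = cell.count("(") == cell.count(")")
    return outer_name in VALID_NAMES and balanced


print(check_cell("log(exp(log(exp(product(max(0,1),min(1,5))))))"))
print(check_cell("add(1,age"))   # missing closing parenthesis
print(check_cell("sqrt(age)"))   # not one of the allowed function names
```

Like the check described above, this only counts parentheses and looks at the outermost name, so a mistyped function name deeper in the nesting would not be caught by this sketch.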

In previous versions of MARK, the design matrix functions were allowed to reference a value in one of the preceding columns. This capability was removed when the ability to nest functions was installed. No flexibility was lost with the removal of the “Colxx” capability, and a considerable increase in versatility was obtained with the nested design matrix function calls. As shown in the Excel “Tricks” example above, the ability to use values from other columns is still available. The “Colxx” capability was also very error prone: a column could be inserted ahead of the column being referenced, turning the entire model into nonsense without the user realizing that a mistake had been made. Therefore, the “Colxx” capability was removed.

Help screen initially written by Amy Yaekel-Adams.