24  Mixed Effects or Multilevel Modeling

24.1 Objectives

In this module, we extend our discussion of regression modeling even further to include “mixed effects” or “multilevel” models.

24.2 Preliminaries
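Before proceeding, load the packages used in this module. This list is inferred from the code that appears below; install any packages you are missing first.

```r
# Packages used in this module (inferred from the code below)
library(tidyverse) # read_csv(), mutate(), and ggplot2 plotting
library(lme4)      # lmer() for linear mixed models
library(effects)   # predictorEffects()
library(sjPlot)    # plot_model()
library(cowplot)   # plot_grid() for arranging plots
library(redres)    # plot_resqq(), plot_redres(), plot_ranef()
library(DHARMa)    # simulation-based residual diagnostics

# {mixedup} and {car} are called below using the :: operator, so they
# need to be installed but do not need to be attached with library()
```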

24.3 Overview of “Mixed” Models

A final extension of linear regression modeling that we will talk about is so-called “multilevel” or “mixed effects” modeling. This is a very complex topic, and we will only scratch the surface!

In a (general or generalized) linear mixed model, we have a response variable, \(Y\), and observations that fall into categories defined by one or more factors, each with some set of levels (e.g., “sex” with levels “male” and “female”), and we are interested in the effects of the various factors and factor levels on the response variable. Generically, if \(\mu\) = the population mean response and \(\mu_A\) = the mean response for observations belonging to factor level A, then the effect of A is given by \(\mu_A\) - \(\mu\) (e.g., if the overall mean response were 200 units and the mean for observations at level A were 250 units, the effect of A would be +50 units). We have already dealt with factors and factor levels in our linear regression models when we looked at categorical predictors (e.g., sex, rank category) in our discussions of ANOVA and ANCOVA.

We can conceptualize these factor effects as being either fixed or random. Fixed factors are those that reflect all levels of interest in our study, while random factors are those that represent only a sample of the levels of interest. For example, if we include sex as a factor in a model with the factor levels “male” and “female”, this (typically) will cover the entire gamut of levels of interest in our study, thus we would consider sex a fixed factor. When we were doing ANOVA and ANCOVA analyses previously, we were implicitly looking at the effects of such fixed factors.

However, if our observational data were to consist of repeated observations of the same sampling unit, e.g., measurements taken on the same set of individuals on different dates, individual ID would be considered a random factor because it is unlikely that we will have collected data from all possible “levels of interest”, i.e., from all possible individual subjects. We have not yet dealt with such random factors as an additional source of variance in our modeling.

So-called mixed models, then, are those that include BOTH fixed and random effects. Including random effects in addition to fixed effects in our models has several ramifications:

  • Using random effects broadens the scope of inference. That is, we can use statistical methods to infer something about the population from which the levels of the random factor have been drawn.
  • Using random effects naturally incorporates dependence in the model and helps us account for pseudoreplication in our dataset. Observations that share the same level of the random effects are explicitly modeled as being correlated. This makes mixed effect modeling very useful for dealing with time series data, spatially correlated data, or situations where we have repeated observations/measures from the same subjects or sampling unit.
  • Using random factors often gives more accurate parameter estimates.
  • Incorporating random factors, however, does require the use of more sophisticated estimation and fitting methods.

Not surprisingly, there are several different varieties of mixed modeling approaches, which are supported in a variety of different packages in R:

  • Standard or general Linear Mixed Models (LMM), analogous to standard or general linear regression - used when we are dealing with normally distributed variables and error structures.
  • Generalized Linear Mixed Models (GLMM), analogous to generalized linear regression - used when we are dealing with various other variable types and error structures (e.g., binary, proportion, or count data).
  • Nonlinear Mixed Models (NLMM), analogous to nonlinear regression - used if we are dealing with situations where our response variable is best modeled by a nonlinear combination of predictor variables.

NOTE: We have not talked at all in this course about general or generalized NONLINEAR modeling, but it is worth knowing that such approaches are also possible. NONLINEAR modeling is where our regression equation is a nonlinear function of the model parameters.

We will explore “mixed effects” modeling using an example based on this excellent tutorial.

EXAMPLE:

Suppose we have measured the amount of grooming received by female chimpanzees when they are either in their periovulatory period (i.e., the window of 2-3 days around the likely time of ovulation) or during other portions of their reproductive cycle. We collected data on the duration of grooming bouts received and scored a female’s reproductive condition at the time as a categorical factor with two levels: “POP” versus “NONPOP”. On top of that, we also recorded data on female parity at the time of the grooming bout, i.e., whether the female had given birth previously (was “parous”, or “P”) or had not yet had an offspring (was “nulliparous”, or “N”).

If we are interested in how reproductive condition and parity influence how much grooming a female receives, our simple regression model would look like this:

\[grooming\ duration \sim reproductive\ condition + parity + \epsilon\]

Also imagine that our study design was such that we took multiple observations per subject. That is, our data set includes records of multiple grooming bouts received by each subject. This situation violates the assumption of independence of observations that we make for standard linear regression: multiple responses/measures from the same subject cannot be regarded as independent from each other.

Using a mixed effects model, we can deal with this situation by adding subject ID as a random effect in our model. Doing so allows us to address the nonindependence issue by estimating a different set of parameters for each level of the factor “subject”. We can either estimate a different intercept for each subject (which would correspond to each female having a different “baseline” level of grooming received) or estimate a different intercept and slope for each subject (where individual subjects are presumed to differ both in the baseline level of grooming received and in the strength of the relationship between grooming duration, on the one hand, and reproductive condition and parity, on the other). Our mixed effects model estimates these individual level parameters in addition to the main effects of each variable.
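In {lme4}’s formula notation, which is introduced more formally below, these two options can be sketched as follows. This is just the formula syntax, using the variable names from this example; either formula would then be passed to lmer().

```r
# Random intercept only: each subject has its own baseline level of
# grooming received, but all subjects share the same fixed effect slopes
f_intercept <- duration ~ reprocondition + parity + (1 | subject)

# Random intercept AND slope: subjects also differ in how strongly
# reproductive condition relates to grooming duration
f_slope <- duration ~ reprocondition + parity + (reprocondition | subject)
```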

This combination of fixed and random effects is what makes a “mixed effects” model mixed. The models that we have considered so far have been fixed effects only models and included only one or more “fixed” predictor variables and a general error term. We essentially divided the world into things that we somehow understand or that are systematic (the fixed effects, or the explanatory variables) and things that we could not control for or do not understand (the general error, or \(\epsilon\)). These fixed effects models did not examine possible structure within the error term.

In a mixed model, by contrast, we add one or more random effects to our fixed effects that may explain a portion of the variance in our error term.

CHALLENGE

Let’s explore these ideas using some actual data. First, load in the dataset “chimpgrooming.csv” and do some exploratory data analysis:

f <- "https://raw.githubusercontent.com/difiore/ada-datasets/main/chimpgrooming.csv"
d <- read_csv(f, col_names = TRUE)
head(d)
## # A tibble: 6 × 5
##   subject parity season reprocondition duration
##   <chr>   <chr>  <chr>  <chr>             <dbl>
## 1 Nina    P      WS     NONPOP             214.
## 2 Nina    P      WS     POP                206.
## 3 Nina    P      WS     NONPOP             294.
## 4 Nina    P      WS     POP                269.
## 5 Nina    P      WS     NONPOP             205.
## 6 Nina    P      WS     POP                287.
summary(d)
##    subject             parity             season          reprocondition    
##  Length:84          Length:84          Length:84          Length:84         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     duration     
##  Min.   : 86.34  
##  1st Qu.:132.36  
##  Median :205.41  
##  Mean   :197.81  
##  3rd Qu.:252.27  
##  Max.   :316.23
  • Convert predictors to factors…
d <- d |>
  mutate(
    reprocondition = as.factor(reprocondition),
    parity = as.factor(parity),
    subject = as.factor(subject)
  )
  • Plot duration of grooming received in relation to subject ID…
(p <- ggplot(data = d, aes(x = subject, y = duration)) +
  geom_boxplot() +
  geom_jitter(width = 0.2, alpha = 0.5) +
  xlab("subject") +
  theme(axis.text.x = element_text(angle = 90)))

  • Plot grooming received in relation to reproductive condition…
(p <- ggplot(data = d, aes(x = reprocondition, y = duration)) +
  geom_boxplot() +
  xlab("reproductive condition") +
  geom_jitter(width = 0.2, alpha = 0.5) +
  theme(axis.text.x = element_text(angle = 90)))

  • Plot grooming received in relation to reproductive condition and parity…
(p <- ggplot(
  data = d,
  aes(x = reprocondition, y = duration, fill = factor(parity))
) +
  geom_boxplot() +
  xlab("reproductive condition") +
  labs(fill = "parity") +
  theme(axis.text.x = element_text(angle = 90)))

# or

(p <- ggplot(
  data = d,
  aes(x = parity, y = duration, fill = factor(reprocondition))
) +
  geom_boxplot() +
  xlab("parity") +
  labs(fill = "reprocondition") +
  theme(axis.text.x = element_text(angle = 90)))

From these plots, we can see lots of [a] individual variation in how much grooming is received (some females seem to receive more than others, overall), [b] variation in response to reproductive condition (where grooming in the POP condition seems to be higher, overall), [c] limited variation due to parity, and [d] variation in response to this combination of fixed effects.

  • Finally, plot grooming received in relation to both a fixed effect (reprocondition) and a random effect (subject)…
(p <- ggplot(
  data = d,
  aes(x = reprocondition, y = duration, fill = factor(subject))
) +
  geom_boxplot() +
  xlab("reproductive condition") +
  labs(fill = "subject") +
  theme(axis.text.x = element_text(angle = 90)))

# or

(p <- ggplot(
  data = d,
  aes(x = subject, y = duration, fill = factor(reprocondition))
) +
  geom_boxplot() +
  xlab("subject") +
  labs(fill = "reproductive condition") +
  theme(axis.text.x = element_text(angle = 90)))

What patterns do you see? There is, again, lots of apparent variation, and some of it seems to be associated with female ID and/or reproductive condition.

24.4 Random Intercept Models

We will now perform an initial mixed effects analysis where we look in more detail at how reproductive condition and parity (as fixed effects) affect grooming duration, where we include individual subject ID as a random effect.

Here is a first mixed effects model that we will fit, using one extension of formula notation that is commonly used in R:

\[grooming\ duration \sim reproductive\ condition + parity + (1|subject) + \epsilon\]

Here, the 1 refers to the fact that we want to estimate an intercept, and the pipe operator (“|”) separates this term from the grouping factor, signifying that we want to estimate a different intercept for each subject. Note that this generic formula still contains a general error term, \(\epsilon\), to highlight that there will still be unexplained “error” variance after accounting for both fixed and random effects in the model.

We can think of this formula as saying that we expect our dataset to include multiple observations of the response variable per subject, and these responses will depend, in part, on each subject’s baseline level. This effectively accounts for the nonindependence that stems from having multiple responses by the same subject.

Because our response variable is a continuous numerical variable, the model we want to run is known as a standard (or general) “Linear Mixed Model” (LMM) (we will contrast this with “Generalized Linear Mixed Models”, or GLMMs, in a moment). The {lme4} package in R is commonly used for linear mixed effects modeling, and the function lmer() is the mixed model equivalent of the function lm().

In the formula syntax for mixed effects models using the {lme4} package, fixed effects are included without parentheses while random effects are included inside of parentheses (the error, \(\epsilon\), is understood and is not included explicitly).

NOTE: We could also use the package {nlme} for mixed effects modeling (which requires a slightly different formula syntax than that used here). That same package also allows us to do nonlinear mixed effects modeling, which we will not be talking about. It is important to note that, unless otherwise specified, {lme4} fits models using “restricted maximum likelihood” (REML) rather than ordinary maximum likelihood (ML) estimation, although both approaches are available. In practice, these often give very similar parameter estimates. We will see below that when we want to compare different models using {lme4}, we will need to tell {lme4} to use ordinary maximum likelihood.

The code block below implements this first “mixed effects” model:

m <- lmer(
  data = d,
  duration ~
    reprocondition +
    parity +
    (1 | subject)
)
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: duration ~ reprocondition + parity + (1 | subject)
##    Data: d
## 
## REML criterion at convergence: 796.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2611 -0.5349 -0.1890  0.3918  3.1994 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 611.9    24.74   
##  Residual             847.3    29.11   
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)        132.841     15.305   8.680
## reproconditionPOP   20.293      6.352   3.195
## parityP            109.650     21.173   5.179
## 
## Correlation of Fixed Effects:
##             (Intr) rprPOP
## rprcndtnPOP -0.208       
## parityP     -0.692  0.000

Let’s focus on the output for the random effects first. Have a look at the column Std.Dev.. The entry for subject shows us how much variability in grooming duration (apart from that explained by the fixed effects) is due to subject ID. The entry for Residual summarizes the remaining variability in grooming duration that is not due to subject or to our fixed effects. The latter is our \(\epsilon\), the “random” deviations from the predicted values that are not due to either subject or our fixed effects.

The fixed effects output mirrors the coefficient tables that we have seen previously in our linear models that have focused only on fixed effects. The coefficient “reproconditionPOP” is the \(\beta\) coefficient for the categorical effect of reproductive condition. The positive sign for the coefficient means that grooming duration is GREATER by 20.293 units for POP than for NONPOP females. Then, there is a standard error associated with this slope, and a t value, which is simply the estimate divided by the standard error.

The coefficient “parityP” is the \(\beta\) coefficient for the categorical effect of parity. The grooming duration associated with being parous versus nulliparous is GREATER by 109.65 units.

The INTERCEPT in this case is the grooming duration associated with being an average, nulliparous, NONPOP female. Like the lm() function, lmer() took whatever factor level came first in the alphabet to be the reference level for each fixed effect variable.
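To make these interpretations concrete, we can add up the relevant coefficients by hand. The sketch below just uses the rounded estimates copied from the summary output above; in practice, fixef(m) would return these values programmatically.

```r
# Fixed effect estimates from the summary output above
b_intercept <- 132.841 # average nulliparous, NONPOP female
b_POP <- 20.293        # added duration when in POP condition
b_parous <- 109.650    # added duration when parous

# Expected grooming duration for an average parous female in POP condition
b_intercept + b_POP + b_parous
## [1] 262.784
```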

Let’s also look at the coefficients coming out of the model and their confidence intervals.

coefficients(m)
## $subject
##       (Intercept) reproconditionPOP parityP
## Luna    142.76558          20.29286  109.65
## Maya    137.37902          20.29286  109.65
## Nina    118.37755          20.29286  109.65
## Nipa    160.80050          20.29286  109.65
## Sofia    99.80117          20.29286  109.65
## Vita    137.92047          20.29286  109.65
## 
## attr(,"class")
## [1] "coef.mer"
# or, for just the fixed effects...
# coefficients(summary(m))
confint(m, level = 0.95, method = "boot")
## Computing bootstrap confidence intervals ...
## 
## 12 message(s): boundary (singular) fit: see help('isSingular')
##                         2.5 %    97.5 %
## .sig01              0.4888872  41.14605
## .sigma             24.2111693  33.44090
## (Intercept)       101.9116187 164.24551
## reproconditionPOP   8.7387238  30.82684
## parityP            65.5227845 155.79046

NOTE: For the confint() function, we can set the method= argument either to “boot” (to generate CIs for the fixed effects based on bootstrapping), “profile” (to generate CIs based on likelihoods), or “Wald” (which generates CIs parametrically by assuming the likelihood surface is quadratic… in general, it is better to use “boot” or “profile”, which run more slowly than “Wald”). Also, in the output from running confint(), “.sigXX” is the square root of the variance associated with each of the random effects (here, “subject”) and “.sigma” is the square root of the variance associated with the residuals of the model. When method = "Wald" is used, the variance components are not estimated.

Looking at the resultant coefficients table, we can see separate intercepts (i.e., “baseline” levels of grooming received) for each female when they are nulliparous and in a NONPOP reproductive condition.

NOTE: Some females may never be observed in both parity states or both reproductive conditions!

24.4.1 Visualizing Model Results

As in earlier modules, we can use the predictorEffects() function from the {effects} package to visualize our mixed model results in terms of how the response variable is expected to differ across levels of each predictor variable when the other predictor and the random effect of subject are controlled for.

plot(predictorEffects(m, partial.residuals = TRUE))

Similar effects plots for our predictors can be created using plot_model() from the {sjPlot} package.

p1 <- plot_model(m, type = "eff", terms = "reprocondition")
p2 <- plot_model(m, type = "eff", terms = "parity")
cowplot::plot_grid(p1, p2, ncol = 2)

# or plot in same figure...
p1 <- plot_model(m, type = "eff", terms = c("parity", "reprocondition"))
p2 <- plot_model(m, type = "eff", terms = c("reprocondition", "parity"))
cowplot::plot_grid(p1, p2, ncol = 2)
## Ignoring unknown labels:
## • linetype : "reprocondition"
## • shape : "reprocondition"
## Ignoring unknown labels:
## • linetype : "parity"
## • shape : "parity"

We can also use plot_model() to visualize estimates and confidence intervals for the fixed effect coefficients and for the random effects.

p1 <- plot_model(m, type = "est")
p2 <- plot_model(m, type = "re")
cowplot::plot_grid(p1, p2, ncol = 2)

The package {mixedup}, installable from here, is a useful package that provides several functions for extracting relevant summary information from mixed models.

NOTE: {mixedup} functions may throw some warnings about deprecated argument names as the package has not been updated in several years.

mixedup::summarize_model(m) # clean model summary
##     Group    Effect Variance    SD SD_2.5 SD_97.5 Var_prop
##   subject Intercept   611.94 24.74  10.57   41.75     0.42
##  Residual             847.28 29.11  24.91   34.13     0.58
##               Term  Value    SE    t P_value Lower_2.5 Upper_97.5
##          Intercept 132.84 15.30 8.68    0.00    102.84     162.84
##  reproconditionPOP  20.29  6.35 3.19    0.00      7.84      32.74
##            parityP 109.65 21.17 5.18    0.00     68.15     151.15
mixedup::extract_fixed_effects(m) # table of fixed effects coefficients
## # A tibble: 3 × 7
##   term              value    se     t p_value lower_2.5 upper_97.5
##   <chr>             <dbl> <dbl> <dbl>   <dbl>     <dbl>      <dbl>
## 1 Intercept         133.  15.3   8.68   0        103.        163. 
## 2 reproconditionPOP  20.3  6.35  3.20   0.001      7.84       32.7
## 3 parityP           110.  21.2   5.18   0         68.2       151.
mixedup::extract_random_coefficients(m) # table of coefficients for random effects... returns actual intercept (and slopes) for each group
## # A tibble: 6 × 7
##   group_var effect    group value    se lower_2.5 upper_97.5
##   <chr>     <chr>     <fct> <dbl> <dbl>     <dbl>      <dbl>
## 1 subject   Intercept Luna  143.   17.0     109.        176.
## 2 subject   Intercept Maya  137.   17.0     104.        171.
## 3 subject   Intercept Nina  118.   17.0      85.0       152.
## 4 subject   Intercept Nipa  161.   17.0     127.        194.
## 5 subject   Intercept Sofia  99.8  17.0      66.5       133.
## 6 subject   Intercept Vita  138.   17.0     105.        171.
mixedup::extract_random_effects(m) # table of coefficients for each *level* of random effect (i.e., how much each level deviates from the baseline)
## # A tibble: 6 × 7
##   group_var effect    group  value    se lower_2.5 upper_97.5
##   <chr>     <chr>     <fct>  <dbl> <dbl>     <dbl>      <dbl>
## 1 subject   Intercept Luna    9.93  7.42     -4.62     24.5  
## 2 subject   Intercept Maya    4.54  7.42    -10.0      19.1  
## 3 subject   Intercept Nina  -14.5   7.42    -29.0       0.082
## 4 subject   Intercept Nipa   28.0   7.42     13.4      42.5  
## 5 subject   Intercept Sofia -33.0   7.42    -47.6     -18.5  
## 6 subject   Intercept Vita    5.08  7.42     -9.46     19.6

24.4.2 Residual Analysis

As with any regression, it is important to conduct residual analyses after running a model. Several R packages are useful for residual analysis of mixed models, including {redres} and {DHARMa}.

24.4.2.1 Using {redres}

We can use functions from the {redres} package to conduct some analyses on the residuals resulting from a mixed effects model. The plot_resqq() function creates a QQ plot of the overall residuals from the model. Ideally, these should fall along a line, and we can see that ours do not…

plot_resqq(m)
## Loading required namespace: testthat
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the redres package.
##   Please report the issue to the authors.

# or
qqnorm(compute_redres(m))
qqline(compute_redres(m))

# or
car::qqPlot(compute_redres(m), id = FALSE)
## Registered S3 method overwritten by 'car':
##   method           from
##   na.action.merMod lme4

… rather, the residuals from this model appear to deviate from a normal distribution, thus the appropriateness of using linear regression may be in question.

We can also pass an “xvar =” argument to the function plot_redres() to view our residuals in relation to our different predictor variables, e.g., reprocondition, parity, or subject.

The code below generates and prints all of these plots…

p1 <- plot_redres(m)
p2 <- plot_redres(m, xvar = "reprocondition")
p3 <- plot_redres(m, xvar = "parity")
p4 <- plot_redres(m, xvar = "subject")
cowplot::plot_grid(p1, p2, p3, p4, ncol = 2)

The plot_redres() function plots the raw residuals with respect to fitted values of the response variable.

In the plot of residuals by fitted values (upper left), the residuals should roughly form a horizontal belt around the zero line, with the points randomly scattered above and below 0. Such a pattern would be consistent with “homoscedasticity”, or equality of variance in the error term across values of the fitted response, based on our model. Here, things look pretty good (although perhaps the width of the “belt” of points looks like it may be wider at larger values of the fitted response).

In the other plots, we are looking to see whether the scatter (variance) above and below the zero line is similar across levels of the predictor variables.

Finally, we can use the plot_ranef() function to plot estimated modal values for the random effects (i.e., in our model, for subject conditioned on the fixed effects) against quantiles of the normal distribution. For linear regression to be appropriate, the random effects should not deviate from normal expectations.

plot_ranef(m)

This plot looks good, with all estimated intercepts for different subjects not seeming to deviate significantly from normal expectations.

NOTE: We could also potentially use the {sjPlot} function plot_model() with the “type =” argument set to “diag” to produce a similar set of diagnostic plots. However, {sjPlot} treats the mixed model somewhat like a standard linear model diagnostically (e.g., does not distinguish between marginal and conditional residuals), so the residual analysis is less robust than if we use {redres} or {DHARMa}, which are designed for working with mixed models. This is especially important for when we extend to running GLMMs instead of LMMs.

p <- plot_model(m, type = "diag")
cowplot::plot_grid(p[[1]], p[[2]]$subject, p[[3]], p[[4]], ncol = 2)

24.4.2.2 Using {DHARMa}

As mentioned above, {DHARMa} is another useful package for residual analysis. It is specifically designed for working with mixed models (both LMMs and, especially, GLMMs), and it performs residual diagnostics using a simulation-based approach. In general, {DHARMa} is considered more powerful than {redres} because it simulates from a fitted model to create residuals that should be uniformly distributed regardless of the model family. This means the same diagnostic approach works for residuals that follow distributions other than the normal ({redres} residuals are more tied to the assumption of normality). Also, because {DHARMa} simulates from the full fitted model, it naturally accounts for the correlation structure among observations of the response variable (induced by both fixed and random effects).

To use {DHARMa}, we start by simulating a large number of residual values under our model…

sim <- simulateResiduals(m, n = 1000)

We then can examine the results of this simulation to evaluate our model. We can first test our residuals overall…

testUniformity(sim)

## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.072238, p-value = 0.7731
## alternative hypothesis: two-sided

Note that this plot is a QQ plot, which can be used to detect overall deviations of our observed residuals in comparison to those simulated under our model, with added tests for correct distribution (KS test), dispersion, and outliers.

Then, since our predictors are categorical, we can use testCategorical() to test whether the residuals are uniformly distributed across the levels of each predictor. This process performs a Levene test for homogeneity of variance across levels of a predictor and a KS test for each level of the predictor. The function also plots the distribution of the scaled residuals under our model in relation to levels of our categorical predictors.

testCategorical(sim, catPred = d$reprocondition)

## $uniformity
## $uniformity$details
## catPred: NONPOP
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.075238, p-value = 0.9569
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: POP
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.084143, p-value = 0.9275
## alternative hypothesis: two-sided
## 
## 
## $uniformity$p.value
## [1] 0.9569396 0.9274508
## 
## $uniformity$p.value.cor
## [1] 1 1
## 
## 
## $homogeneity
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.0508 0.8222
##       82
testCategorical(sim, catPred = d$parity)

## $uniformity
## $uniformity$details
## catPred: N
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.15557, p-value = 0.2613
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: P
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.11729, p-value = 0.5699
## alternative hypothesis: two-sided
## 
## 
## $uniformity$p.value
## [1] 0.2612943 0.5699420
## 
## $uniformity$p.value.cor
## [1] 0.5225887 0.5699420
## 
## 
## $homogeneity
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  2.0355 0.1575
##       82
testCategorical(sim, catPred = d$subject)

## $uniformity
## $uniformity$details
## catPred: Luna
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.21729, p-value = 0.4595
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Maya
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.26314, p-value = 0.2408
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Nina
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.227, p-value = 0.4056
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Nipa
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.6, p-value = 8.382e-05
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Sofia
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.7, p-value = 2.119e-07
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Vita
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.22329, p-value = 0.4258
## alternative hypothesis: two-sided
## 
## 
## $uniformity$p.value
## [1] 4.594611e-01 2.407827e-01 4.056400e-01 8.381883e-05 2.118936e-07
## [6] 4.257937e-01
## 
## $uniformity$p.value.cor
## [1] 1.000000e+00 9.631310e-01 1.000000e+00 4.190942e-04 1.271361e-06
## [6] 1.000000e+00
## 
## 
## $homogeneity
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value  Pr(>F)    
## group  5   4.937 0.00056 ***
##       78                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We could also create just the visualizations using the function plotResiduals().

plotResiduals(sim) # plots residuals relative to model prediction
plotResiduals(sim, form = d$reprocondition)
plotResiduals(sim, form = d$parity)
plotResiduals(sim, form = d$subject)

For our particular model, m, we do not see evidence of either over- or under-dispersion in our residuals using these diagnostics, except that the plot by subject shows that two females (Nipa and Sofia) may be poorly fitted by the model. With this information, we might want to check for data entry errors or consider whether/how those individuals may represent outliers.

{DHARMa} also contains several additional overdispersion tests that compare the dispersion of observed to simulated residuals more robustly, by directly comparing the variance of observed to simulated residuals.

testDispersion(sim)

## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.90678, p-value = 0.872
## alternative hypothesis: two.sided

A common case of overdispersion in a dataset is “zero-inflation”, which is the situation when more zeros appear in our response variable than is expected under a fitted model. {DHARMa} has a special test for zero-inflation, which compares the distribution of expected zeros in the data against the observed zeros. This test can be implemented as follows…

testZeroInflation(sim)

## 
##  DHARMa zero-inflation test via comparison to expected zeros with
##  simulation under H0 = fitted model
## 
## data:  simulationOutput
## ratioObsSim = NaN, p-value = 1
## alternative hypothesis: two.sided

For our particular model, we actually do not have any “zeros” in our responses (in our dataset, all chimpanzee females received >0 minutes of grooming during each observation session), so this particular test is moot.

We can also ask {DHARMa} to evaluate whether any of our residuals are seemingly outliers under the model.

testOutliers(sim)

## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  sim
## outliers at both margin(s) = 0, observations = 84, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.001998002
## 95 percent confidence interval:
##  0.00000000 0.04296492
## sample estimates:
## frequency of outliers (expected: 0.001998001998002 ) 
##                                                    0

Inference using LRTs

In mixed effects models, determining the p values associated with either overall models or individual coefficients is not as straightforward as it is for standard linear models. However, one common approach is to use likelihood ratio tests (LRTs), which we previously used for comparing among generalized linear models. Likelihood is the probability of seeing the data we have actually collected given a particular model. The logic of the likelihood ratio test is to compare the likelihoods of two models, i.e., a model that includes the factor we are interested in versus a reduced, nested model with that factor excluded.

So… if we are interested in the effect of reproductive condition on grooming duration, we could compare a more complex model…

\[grooming\ duration \sim reproductive\ condition + parity + (1|subject) + \epsilon\]

to a nested, less complex model…

\[grooming\ duration \sim parity + (1|subject) + \epsilon\]

In R, we would do this as follows:

full <- lmer(
  data = d,
  duration ~
    reprocondition +
    parity +
    (1 | subject),
  REML = FALSE
)
# note the additional `REML=` argument
summary(full)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: duration ~ reprocondition + parity + (1 | subject)
##    Data: d
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     825.7     837.9    -407.9     815.7        79 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2996 -0.5283 -0.1783  0.4032  3.2285 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 388.6    19.71   
##  Residual             836.4    28.92   
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)        132.841     12.625  10.522
## reproconditionPOP   20.293      6.311   3.215
## parityP            109.650     17.288   6.343
## 
## Correlation of Fixed Effects:
##             (Intr) rprPOP
## rprcndtnPOP -0.250       
## parityP     -0.685  0.000
reduced <- lmer(
  data = d,
  duration ~
    parity +
    (1 | subject),
  REML = FALSE
)
summary(reduced)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: duration ~ parity + (1 | subject)
##    Data: d
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     833.4     843.2    -412.7     825.4        80 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4996 -0.6492 -0.1198  0.6594  2.7072 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 380.6    19.51   
##  Residual             947.3    30.78   
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   142.99      12.22  11.697
## parityP       109.65      17.29   6.343
## 
## Correlation of Fixed Effects:
##         (Intr)
## parityP -0.707

NOTE: Here, we added the argument “REML = FALSE” to the lmer() function. This is necessary when we want to compare models using a likelihood ratio test. Basically, REML (restricted maximum likelihood) computes likelihoods differently than ordinary maximum likelihood does, and the likelihoods used in an LRT comparing models with different fixed effects need to be ordinary (full) maximum likelihoods. See this site for a more complete explanation of this issue.

We then perform the likelihood ratio test using the anova() function or the lrtest() function from {lmtest}.

anova(reduced, full)
## Data: d
## Models:
## reduced: duration ~ parity + (1 | subject)
## full: duration ~ reprocondition + parity + (1 | subject)
##         npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)   
## reduced    4 833.43 843.15 -412.72    825.43                        
## full       5 825.72 837.88 -407.86    815.72 9.7089  1   0.001834 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# or
lmtest::lrtest(reduced, full)
## Likelihood ratio test
## 
## Model 1: duration ~ parity + (1 | subject)
## Model 2: duration ~ reprocondition + parity + (1 | subject)
##   #Df  LogLik Df  Chisq Pr(>Chisq)   
## 1   4 -412.72                        
## 2   5 -407.86  1 9.7089   0.001834 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results tell us that the model containing both reproductive condition and parity as fixed effects fits the data better than a model lacking reproductive condition and containing only parity as a fixed effect, while still accounting for subject as a random effect.
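As a check on the arithmetic, we can reproduce the test statistic and p value by hand from the (rounded) log-likelihoods reported in the two model summaries above; small discrepancies with the anova() table are due to rounding.

```r
# the LRT statistic is twice the difference in log-likelihoods; its null
# distribution is chi-square with df = the difference in number of parameters
logLik_full    <- -407.86 # rounded, from summary(full)
logLik_reduced <- -412.72 # rounded, from summary(reduced)
(lrt <- 2 * (logLik_full - logLik_reduced)) # ~9.7, as in the anova() table
(p <- pchisq(lrt, df = 1, lower.tail = FALSE)) # ~0.0018
```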

CHALLENGE

Now, compare a model containing reproductive condition and parity to one containing just reproductive condition.

Show Code
full <- lmer(
  data = d,
  duration ~
    reprocondition +
    parity +
    (1 | subject),
  REML = FALSE
)

reduced <- lmer(
  data = d,
  duration ~ reprocondition +
    (1 | subject),
  REML = FALSE
)

anova(reduced, full)
Show Output
## Data: d
## Models:
## reduced: duration ~ reprocondition + (1 | subject)
## full: duration ~ reprocondition + parity + (1 | subject)
##         npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)    
## reduced    4 835.97 845.70 -413.99    827.97                         
## full       5 825.72 837.88 -407.86    815.72 12.251  1   0.000465 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on this result, including parity as well as reproductive condition as fixed effects also significantly improves the fit of our model.

CHALLENGE

Construct a model that includes an interaction of reproductive condition and parity and compare it to a model without the interaction term. Is the interaction of these two fixed effects significant?

Show Code
full <- lmer(
  data = d,
  duration ~
    reprocondition * parity +
    (1 | subject),
  REML = FALSE
)

reduced <- lmer(
  data = d,
  duration ~
    reprocondition +
    parity +
    (1 | subject),
  REML = FALSE
)

anova(reduced, full)
Show Output
## Data: d
## Models:
## reduced: duration ~ reprocondition + parity + (1 | subject)
## full: duration ~ reprocondition * parity + (1 | subject)
##         npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)
## reduced    5 825.72 837.88 -407.86    815.72                     
## full       6 826.41 840.99 -407.20    814.41 1.3124  1      0.252

In this case, adding the interaction of reproductive condition and parity does not significantly improve the explanatory power of the model.

24.5 Random Slope Models

In the exercise above, we estimated only a separate INTERCEPT for each female and presumed that the same relationships (SLOPES, or regression coefficients) between grooming duration (our response variable) and reproductive condition and parity (our predictors) held for all females. But we can also allow those relationships between response and predictors to vary from subject to subject. We would indicate this model in formula notation as follows:

\[grooming\ duration \sim reproductive\ condition + parity\ +\] \[(1 + reproductive\ condition|subject) + (1 + parity|subject) + \epsilon\]

NOTE: In this particular example, we are using the term “slopes” loosely, because our predictors (reprocondition and parity) are categorical, not continuous. So, our regression coefficients for these fixed predictors are not “slopes” per se but rather differences in average grooming between levels of the predictor (or, more specifically, between each level and whatever level we set as baseline).
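To see why, consider the design matrix for a two-level factor. A minimal base-R sketch (the factor levels here simply mirror those in our data) shows that the “slope” multiplies a 0/1 indicator column:

```r
# a two-level factor is coded as a 0/1 indicator column, so its "slope"
# coefficient is the difference in mean response between the two levels
x <- factor(c("NONPOP", "POP", "NONPOP", "POP"))
model.matrix(~ x) # an intercept column of 1s plus a 0/1 column named "xPOP"
```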

m <- lmer(
  data = d,
  duration ~
    reprocondition +
    parity +
    (1 + reprocondition | subject) +
    (1 + parity | subject),
  REML = FALSE
)
summary(m)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: duration ~ reprocondition + parity + (1 + reprocondition | subject) +  
##     (1 + parity | subject)
##    Data: d
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     833.6     857.9    -406.8     813.6        74 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4515 -0.6093 -0.1559  0.3851  3.2152 
## 
## Random effects:
##  Groups    Name              Variance Std.Dev. Corr  
##  subject   (Intercept)        83.987   9.164         
##            reproconditionPOP   1.465   1.210   -1.00 
##  subject.1 (Intercept)       628.564  25.071         
##            parityP           628.289  25.066   -1.00 
##  Residual                    835.800  28.910         
## Number of obs: 84, groups:  subject, 6
## 
## Fixed effects:
##                   Estimate Std. Error t value
## (Intercept)        132.530     16.294   8.134
## reproconditionPOP   20.293      6.328   3.207
## parityP            110.271     17.259   6.389
## 
## Correlation of Fixed Effects:
##             (Intr) rprPOP
## rprcndtnPOP -0.211       
## parityP     -0.902  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Here, we have changed the random effects portion of the formula, which now looks a little more complicated. The notation “(1 + reprocondition | subject)” tells the model to estimate differing baseline levels of grooming duration (the intercept, represented by 1) as well as differing responses to the fixed factor in question, which is reproductive condition in this case. We do the same for parity.

NOTE: It is equivalent to use just “(reprocondition | subject)” rather than “(1 + reprocondition | subject)” to specify that we want our analysis to estimate both an intercept and a slope coefficient for each subject; we use the latter to be explicit. Alternatively, we could use “(0 + reprocondition | subject)” to return a single intercept but different slopes for each subject, which is sometimes referred to as a “fixed intercept” model.
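For reference, the alternative specifications discussed in the note can be written out as formula objects (no data are needed just to construct a formula in R):

```r
# three ways of specifying the random-effects structure for one predictor
f1 <- duration ~ reprocondition + (1 + reprocondition | subject) # intercept + slope
f2 <- duration ~ reprocondition + (reprocondition | subject)     # equivalent to f1
f3 <- duration ~ reprocondition + (0 + reprocondition | subject) # "fixed intercept"
```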

Looking at the coefficients of our new model, we can see the result: each female now has a different intercept and different coefficients for the relationships between grooming duration and both reproductive condition and parity.

coefficients(m)
## $subject
##       (Intercept) reproconditionPOP   parityP
## Luna     144.8585          19.47879 110.25970
## Maya     137.6459          19.95506 110.26557
## Nina     113.0923          21.57640 110.28723
## Nipa     139.1931          19.85290  84.77452
## Sofia    125.3352          20.76797 140.07016
## Vita     135.0568          20.12603 105.96782
## 
## attr(,"class")
## [1] "coef.mer"
confint(m, level = 0.95, method = "boot")
##                           2.5 %       97.5 %
## .sig01             0.000000e+00  20.86954215
## .sig02             1.686692e-04  18.28716882
## .sig03            -1.000000e+00   1.00000000
## .sig04             0.000000e+00  45.50094307
## .sig05             1.312578e-04  45.73388789
## .sig06            -1.000000e+00  -0.08662861
## .sigma             2.377688e+01  32.80935105
## (Intercept)        1.008340e+02 166.86320660
## reproconditionPOP  7.511362e+00  32.92579065
## parityP            7.442400e+01 147.55329805

Finally, effects plots of how the expected amount of grooming received varies with the fixed effects (reprocondition and parity) under our new random slope model are essentially the same as those under the random intercept model.

plot(predictorEffects(m, partial.residuals = TRUE))

# or

plot_model(m, type = "eff", term = "reprocondition")

plot_model(m, type = "eff", term = "parity")

mixedup::summarize_model(m)
##      Group            Effect Variance    SD SD_2.5 SD_97.5 Var_prop
##    subject         Intercept    83.99  9.16   0.00    0.62     0.04
##    subject reproconditionPOP     1.46  1.21   0.00   18.66     0.00
##  subject.1         Intercept   628.56 25.07   0.00    0.62     0.29
##  subject.1           parityP   628.29 25.07   0.00   78.87     0.29
##   Residual                     835.80 28.91  24.90   34.12     0.38
##               Term  Value    SE    t P_value Lower_2.5 Upper_97.5
##          Intercept 132.53 16.29 8.13    0.00    100.59     164.47
##  reproconditionPOP  20.29  6.33 3.21    0.00      7.89      32.70
##            parityP 110.27 17.26 6.39    0.00     76.44     144.10
mixedup::extract_fixed_effects(m)
## # A tibble: 3 × 7
##   term              value    se     t p_value lower_2.5 upper_97.5
##   <chr>             <dbl> <dbl> <dbl>   <dbl>     <dbl>      <dbl>
## 1 Intercept         133.  16.3   8.13   0        101.        164. 
## 2 reproconditionPOP  20.3  6.33  3.21   0.001      7.89       32.7
## 3 parityP           110.  17.3   6.39   0         76.4       144.

NOTE: Here, running the functions extract_random_coefficients() and extract_random_effect() from {mixedup} would throw an error because we would be generating two (Intercept) terms per random effect level, one for each of two predictors.

24.5.1 Residual Analysis

Once again, we can use the {redres} or {DHARMa} packages to do some visual residual analysis…

24.5.1.1 Using {redres}

plot_resqq(m)

plot_redres(m)

plot_redres(m, xvar = "subject")

NOTE: Here, running the function plot_ranef() would throw an error because we would be generating two (Intercept) terms per random effect level, one for each of two predictors. If we wanted, we could run the ranef() function on the model, look at the $subject dataframe, and pull out the columns we wanted to plot against normal quantiles using qqPlot() or qqnorm().

24.5.1.2 Using {DHARMa}

sim <- simulateResiduals(m, n = 1000)
testUniformity(sim)

## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.056333, p-value = 0.9388
## alternative hypothesis: two-sided
testCategorical(sim, catPred = d$reprocondition)

## $uniformity
## $uniformity$details
## catPred: NONPOP
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.092048, p-value = 0.8369
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: POP
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.089952, p-value = 0.8559
## alternative hypothesis: two-sided
## 
## 
## $uniformity$p.value
## [1] 0.8368682 0.8558761
## 
## $uniformity$p.value.cor
## [1] 1 1
## 
## 
## $homogeneity
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.0024 0.9611
##       82
testCategorical(sim, catPred = d$parity)

## $uniformity
## $uniformity$details
## catPred: N
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.132, p-value = 0.4208
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: P
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.085476, p-value = 0.8931
## alternative hypothesis: two-sided
## 
## 
## $uniformity$p.value
## [1] 0.4207942 0.8930671
## 
## $uniformity$p.value.cor
## [1] 0.8415884 0.8930671
## 
## 
## $homogeneity
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.0112  0.916
##       82
testCategorical(sim, catPred = d$subject)

## $uniformity
## $uniformity$details
## catPred: Luna
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.24014, p-value = 0.339
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Maya
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.25214, p-value = 0.2848
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Nina
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.296, p-value = 0.1394
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Nipa
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.579, p-value = 5.555e-05
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Sofia
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.691, p-value = 3.419e-07
## alternative hypothesis: two-sided
## 
## ------------------------------------------------------------ 
## catPred: Vita
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  dd[x, ]
## D = 0.22829, p-value = 0.3988
## alternative hypothesis: two-sided
## 
## 
## $uniformity$p.value
## [1] 3.389582e-01 2.847659e-01 1.393667e-01 5.555176e-05 3.419318e-07
## [6] 3.987929e-01
## 
## $uniformity$p.value.cor
## [1] 8.542976e-01 8.542976e-01 5.574670e-01 2.777588e-04 2.051591e-06
## [6] 8.542976e-01
## 
## 
## $homogeneity
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value    Pr(>F)    
## group  5  5.8287 0.0001262 ***
##       78                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference using LRTs

To then get p values associated with each of the fixed factors, we can use likelihood ratio tests…

# random effects only
null <- lmer(
  data = d,
  duration ~
    (1 + reprocondition | subject) +
    (1 + parity | subject),
  REML = FALSE
)

# full model with both fixed effects
full <- lmer(
  data = d,
  duration ~
    reprocondition +
    parity +
    (1 + reprocondition | subject) +
    (1 + parity | subject),
  REML = FALSE
)

# model without reproductive condition as a fixed effect
minusRC <- lmer(
  data = d,
  duration ~
    parity +
    (1 + reprocondition | subject) +
    (1 + parity | subject),
  REML = FALSE
)

# model without parity as a fixed effect
minusP <- lmer(
  data = d,
  duration ~
    reprocondition +
    (1 + reprocondition | subject) +
    (1 + parity | subject),
  REML = FALSE
)

# p value for reproductive condition as a fixed effect
anova(minusRC, full)
## Data: d
## Models:
## minusRC: duration ~ parity + (1 + reprocondition | subject) + (1 + parity | subject)
## full: duration ~ reprocondition + parity + (1 + reprocondition | subject) + (1 + parity | subject)
##         npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)   
## minusRC    9 840.23 862.11 -411.11    822.23                        
## full      10 833.63 857.94 -406.82    813.63 8.5963  1   0.003368 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p value for parity as a fixed effect
anova(minusP, full)
## Data: d
## Models:
## minusP: duration ~ reprocondition + (1 + reprocondition | subject) + (1 + parity | subject)
## full: duration ~ reprocondition + parity + (1 + reprocondition | subject) + (1 + parity | subject)
##        npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)   
## minusP    9 840.08 861.96 -411.04    822.08                        
## full     10 833.63 857.94 -406.82    813.63 8.4476  1   0.003655 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(predictorEffects(full, partial.residuals = TRUE))

Inference using AIC

For a long time, the appropriateness of our mixed models was assessed as above – i.e., by evaluating the significance of each fixed effect using LRTs. As information theoretic approaches have become more popular, it is increasingly common to assess model fit by comparing the AIC values of different models, acknowledging the caveat that AIC can only tell us about the relative fit of alternative models, but not whether a particular model is a good fit, overall. Recall that AIC values are a way of inverting and scaling model log-likelihoods that penalizes models with greater numbers of parameters.
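For a model fit by maximum likelihood, AIC = -2 × logLik + 2K, where K is the number of estimated parameters. As a sketch, we can recover the AIC of our full random slope model from the (rounded) log-likelihood reported above:

```r
# AIC = -2 * logLik + 2 * K; for the full random slope model,
# logLik ~ -406.82 (rounded) with K = 10 estimated parameters
(aic_full <- -2 * (-406.82) + 2 * 10) # ~833.6, matching the anova() output
```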

The aictab() function from {AICcmodavg} neatly prints out tables with AIC, Delta AIC, and log-likelihood values, along with AIC “weights”.

library(AICcmodavg)
(aic_table <- aictab(list(full, minusRC, minusP, null),
  modnames = c("full", "minusRC", "minusP", "null")
))
## 
## Model selection based on AICc:
## 
##          K   AICc Delta_AICc AICcWt Cum.Wt      LL
## full    10 836.65       0.00   0.91   0.91 -406.82
## minusP   9 842.51       5.87   0.05   0.95 -411.04
## minusRC  9 842.66       6.02   0.04   1.00 -411.11
## null     8 851.47      14.82   0.00   1.00 -416.77
detach(package:AICcmodavg)

NOTE: Here we are printing out AICc values, rather than AIC values. AICc is simply a version of AIC with a correction added for small sample sizes. To print the uncorrected AIC values, which appear in the anova() LRT output, we can add the argument “second.ord = FALSE” to the aictab() function.

\[AIC_c = AIC + \frac{2K^2 + 2K}{n-K-1}\]

where \(K\) is the number of parameters and \(n\) is the sample size.
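Applying this correction by hand to the full model (AIC = 833.63 from the LRT table, K = 10 parameters, n = 84 observations) reproduces, up to rounding, the AICc reported in the table:

```r
# small-sample correction: AICc = AIC + (2K^2 + 2K) / (n - K - 1)
aic <- 833.63 # from the anova() LRT output
K   <- 10     # number of estimated parameters
n   <- 84     # number of observations
(aicc <- aic + (2 * K^2 + 2 * K) / (n - K - 1)) # ~836.6, as in the AICc table
```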

In the table, for each model, \(K\) is the number of model parameters, the \(Delta\ AICc\) value is the difference between that model’s AICc and the best model’s AICc (again, here, that is the full model), and the \(AICc\ weight\) is the relative likelihood of that model. [The weights for a particular set of models sum to 1, with each weight equal to the model’s likelihood divided by the summed likelihoods across all models.]
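As a check on this definition, the Akaike weights in the table can be reproduced directly from the (rounded) AICc values themselves:

```r
# relative likelihood of model i is exp(-delta_i / 2); weights normalize these
aicc  <- c(full = 836.65, minusP = 842.51, minusRC = 842.66, null = 851.47)
delta <- aicc - min(aicc)
rel_L <- exp(-delta / 2)
(w <- rel_L / sum(rel_L)) # ~0.91, 0.05, 0.04, 0.00, as in the table
```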

Where the best model has a very high Akaike weight, e.g., >0.9, it is reasonable to base inferences about the included variables on that single most parsimonious model, but when several models rank highly (e.g., several models have Delta AICc values <2 to 4), it is common to model-average effect sizes for the variables that have the most support across that set of models. That is, “model averaging” means making inferences based on a set of candidate models, instead of on a single “best” model.

Here, note that the full model containing both reproductive condition and parity has the highest likelihood (the least negative log-likelihood value) and a much lower AICc than any of the less complex alternative models tested. It also has a very high Akaike weight (0.91).

NOTE: When running a number of the above models and/or doing the likelihood ratio tests, we saw a significant result, but we also got either a warning that our null models “failed to converge” or a warning about a “boundary (singular) fit”. Both of these warnings result from trying to estimate a LOT of parameters relative to the number of observations we have. Dealing with lack of convergence in fitting maximum likelihood models is beyond what we can cover here, but I encourage you to explore that on your own!

Other Methods for Assessing Fit

Using AIC values for evaluating and expressing how well a particular model fits a dataset has some critical limitations:

  • While AIC provides an estimate of the relative fit of various models, it does not say anything about the absolute fit
  • AIC does not address the amount of variance in a response variable explained by a model
  • AIC is not comparable across datasets, and so fit is not generalizable

Nakagawa & Schielzeth (2013) and Nakagawa et al. (2017) have published a simple and effective method for calculating a type of pseudo-\(R^2\) (or coefficient of determination) value for generalized linear mixed models, and because linear mixed models are a specific type of GLMM, this method can be used with LMMs as well.

See:

  • Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4, 133–142.

  • Nakagawa, S., Johnson, P. C. D., & Schielzeth, H. (2017). The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface, 14, article 20170213.

In these papers, two measures are proposed for characterizing the amount of “variance explained” for mixed effects models:

  • \(marginal\ R^2_{GLMM(m)}\) is the amount of variance explained by the model on the latent (or link) scale rather than the original scale. We can interpret this as a measure of the variance explained by only the fixed effects in the model.
  • \(conditional\ R^2_{GLMM(c)}\) is the amount of variance explained by both fixed and random effects, i.e., by the entire model.

There is an easy way to calculate these two \(R^2_{GLMM}\) values in R using the r.squaredGLMM() function in the package {MuMIn} or the function r2_nakagawa() from the {performance} package.
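The arithmetic behind these measures is simple. A sketch of the Nakagawa & Schielzeth calculation for an LMM, using hypothetical variance components (NOT values from our fitted models):

```r
# hypothetical variance components for illustration only
var_fixed  <- 2000 # variance of the fixed-effects predictions
var_random <- 400  # summed random-effect variances
var_resid  <- 850  # residual variance

total <- var_fixed + var_random + var_resid
(R2_marginal    <- var_fixed / total)                # fixed effects only
(R2_conditional <- (var_fixed + var_random) / total) # fixed + random effects
```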

CHALLENGE

Compare the full, the two reduced, and the null mixed effects models from our random slope exercise using an information theoretic approach. Is the best model (full) the one that explains the greatest amount of variance in the dataset? In the full model, how much more of the total variance is explained by the random effects than by the fixed effects alone?

aic_table # re-print the AIC table
## 
## Model selection based on AICc:
## 
##          K   AICc Delta_AICc AICcWt Cum.Wt      LL
## full    10 836.65       0.00   0.91   0.91 -406.82
## minusP   9 842.51       5.87   0.05   0.95 -411.04
## minusRC  9 842.66       6.02   0.04   1.00 -411.11
## null     8 851.47      14.82   0.00   1.00 -416.77
library(MuMIn)
r.squaredGLMM(full)
##            R2m       R2c
## [1,] 0.7221645 0.8102369
r.squaredGLMM(minusRC)
##            R2m    R2c
## [1,] 0.6778132 0.7898
r.squaredGLMM(minusP)
##             R2m       R2c
## [1,] 0.01443998 0.8840765
r.squaredGLMM(null)
##      R2m       R2c
## [1,]   0 0.8714861
detach(package:MuMIn)

# or
library(performance)
r2_nakagawa(full)
## Random effect variances not available. Returned R2 does not account for random effects.
## # R2 for Mixed Models
## 
##   Conditional R2: NA
##      Marginal R2: 0.792
r2_nakagawa(minusRC)
## Random effect variances not available. Returned R2 does not account for random effects.
## # R2 for Mixed Models
## 
##   Conditional R2: NA
##      Marginal R2: 0.763
r2_nakagawa(minusP)
## Random effect variances not available. Returned R2 does not account for random effects.
## # R2 for Mixed Models
## 
##   Conditional R2: NA
##      Marginal R2: 0.111
r2_nakagawa(null)
## Random effect variances not available. Returned R2 does not account for random effects.
## # R2 for Mixed Models
## 
##   Conditional R2: NA
##      Marginal R2: 0.000
detach(package:performance)

24.6 Generalized LMMs

Just as we extended our standard linear modeling approach to include non-normally distributed response variables/error structures, so too can we extend our mixed effects modeling to such situations. This is referred to as generalized linear mixed modeling, or GLMM. There are several R packages we can use to do this under either a maximum likelihood (e.g., {lme4}, {glmmTMB}, {glmmML}, the no-longer-maintained {glmmboot}) or Bayesian (e.g., {MCMCglmm}, {brms}) framework. The methods for generating maximum likelihood and Bayesian parameter estimates under GLMMs are more complicated, but conceptually, the process is an extension of what we have talked about already. Below, we explore such a scenario.

CHALLENGE

Bowden-Parry et al. (2020) studied the effect of food type and abundance on the begging and food sharing behavior of otters in captivity. Here, we work with the dataset provided with the article below and replicate one of their models.

  • Bowden-Parry, M.; Postma, E.; and Boogert, N.J. (2020). Effects of food type and abundance on begging and sharing in Asian small-clawed otters (Aonyx cinereus). PeerJ 8: e10369.

Step 1

  • Read in the data set and look at the variables included.
f <- "https://raw.githubusercontent.com/difiore/ada-datasets/main/Bowden-ParryOtterdata.csv"
d <- read_csv(f, col_names = TRUE)
head(d)
## # A tibble: 6 × 22
##   zoo   ID    fooddensity foodtype trialorder forcedclaim unsuccessfulforcedcl…¹
##   <chr> <chr> <chr>       <chr>         <dbl>       <dbl>                  <dbl>
## 1 T     Feet  medium      trout             7           0                      0
## 2 T     Feet  a_low       trout             8           0                      0
## 3 T     Feet  high        trout             9           0                      0
## 4 T     Feet  medium      crab              1           0                      0
## 5 T     Feet  a_low       crab              2           0                      0
## 6 T     Feet  high        crab              3           0                      0
## # ℹ abbreviated name: ¹​unsuccessfulforcedclaim
## # ℹ 15 more variables: activebeg <dbl>, passivebeg <dbl>, totalbeg <dbl>,
## #   collectnear <dbl>, relaxedclaim <dbl>, foodgiving <dbl>,
## #   totalshareproonly <dbl>, attackfromowner <dbl>, trialduration <dbl>,
## #   agecategory <chr>, sex <chr>, begreceived <dbl>,
## #   ownersharebecauseharassed <dbl>, TotalSharebothharasspro <dbl>,
## #   sharingbybegonly <dbl>

Step 2

  • Create a new random variable, trial, that joins together the variables zoo and trialorder. This will be used as a random effect.
  • Create a new random variable, obs, that is simply an observation number. This will also be used as a random effect.

HINT: Check out the useful function rowid_to_column()

  • Rename the variable TotalSharebothharasspro as Shared.
  • Rename the variable begreceived as BegRec.
Show Code
d <- d |>
  mutate(trial = paste0(zoo, trialorder))
d <- rowid_to_column(d, var = "obs")
d <- d |>
  rename(
    Shared = TotalSharebothharasspro,
    BegRec = begreceived
  )

Step 3

  • Plot total food transferred (Shared) in relation to begging received (BegRec).
Show Code
ggplot(d, aes(x = BegRec, y = Shared)) +
  geom_jitter() +
  xlab("Begging Received (total number)") +
  ylab("Food Transfers (total number)") +
  theme(text = element_text(size = 12))

Step 4

  • Does the amount of begging received influence sharing frequency?

To test this, the researchers used a GLMM with Poisson error distribution to examine how the number of “food transfers” between otters (Shared) in trials was related to the total number of instances of “begging received” (BegRec) as a fixed effect.

We also need to consider that the response variable, Shared, is implicitly expressed as a rate, i.e., it represents a count of the number of events that occurred over a particular length of time (the variable trialduration), and that observation time varies among records in the dataset. Our regression model, then, needs to contain time on the right-hand side of the regression formula in some way to acknowledge that counts depend on the time over which they are collected (i.e., counts are expected to increase with observation time).

Additionally, recall that, in Poisson regression, given that our response variable is a count of events that occur within a given amount of time, what we are actually modeling as a function of our predictors is log(count)… so we need to include log(time) as what is known as an offset term on the right-hand side of the formula.

In R, we use the function offset() to indicate this in our regression formula. The offset term is, essentially, modeled as an additional predictor, but with its coefficient fixed at 1 rather than estimated.
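A minimal simulated sketch makes the idea concrete (the data and variable names here are invented for illustration and are not part of the otter dataset): when counts are generated at a rate per unit time, a Poisson GLM with a log(time) offset recovers the generating coefficients.

```r
# simulate counts whose expectation scales with observation time:
# E[count] = time * exp(b0 + b1 * x), i.e.,
# log(E[count]) = log(time) + b0 + b1 * x
set.seed(1)
time  <- runif(500, 1, 10) # observation time per record
x     <- rnorm(500)        # a continuous predictor
count <- rpois(500, lambda = time * exp(0.5 + 0.3 * x))

# the offset enters the linear predictor with its coefficient fixed at 1
m_sketch <- glm(count ~ x + offset(log(time)), family = poisson)
coef(m_sketch) # estimates should be near the generating values 0.5 and 0.3
```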

\[log(count) \sim offset(log(time))\ +\ ...\ other\ predictors\]

Finally, the researchers included three random effects in their model: otter ID, trial, and an observation-level random effect. The first of these appears in the authors’ data frame, and the latter two are our variables trial and obs.

NOTE: Including the third of these random effects may seem strange, because each observation forms its own level and there is thus no replication within levels. To replicate the researchers’ model, though, we include it. Conceptually, it is defensible to do so because an observation-level random effect is a common device for accounting for overdispersion in Poisson models: there are lots of unmeasured variables that could vary between observations and could impact the response that we are not including in our model (e.g., time of day when sampling was done or visitor density at the zoo when the animals were sampled), and we want to account for this additional source of random variation.
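A small simulation can illustrate what an observation-level random effect does. The data below are made up (not the otter dataset): counts are generated with extra, unmeasured observation-level variation, a plain Poisson GLM shows overdispersion, and a glmer() model with a one-level-per-observation random intercept recovers that extra variance. This assumes {lme4} is installed, as elsewhere in this module.

```r
# Sketch with simulated, overdispersed counts (made-up data, not the otters)
library(lme4)
set.seed(42)
n <- 200
sim <- data.frame(
  obs = factor(1:n), # one factor level per observation
  extra = rnorm(n, 0, 0.5) # unmeasured observation-level variation
)
sim$y <- rpois(n, lambda = exp(1 + sim$extra))
# a plain Poisson GLM cannot absorb the extra variation: the ratio of
# residual deviance to residual df is well above 1 (overdispersion)
g <- glm(y ~ 1, data = sim, family = poisson)
deviance(g) / df.residual(g)
# an observation-level random intercept soaks up that extra variance
g_olre <- glmer(y ~ 1 + (1 | obs), data = sim, family = poisson)
VarCorr(g_olre) # estimated SD should be near the true value of 0.5
```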

m1 <- glmer(
  Shared ~
    BegRec +
    offset(log(trialduration / 60)) +
    (1 | ID) +
    (1 | trial) +
    (1 | obs),
  data = d,
  family = poisson(link = "log")
)
summary(m1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: Shared ~ BegRec + offset(log(trialduration/60)) + (1 | ID) +  
##     (1 | trial) + (1 | obs)
##    Data: d
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     666.9     682.9    -328.5     656.9       175 
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.34821 -0.80346 -0.04455  0.48868  2.15070 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  obs    (Intercept) 0.23312  0.4828  
##  ID     (Intercept) 0.06723  0.2593  
##  trial  (Intercept) 0.14429  0.3799  
## Number of obs: 180, groups:  obs, 180; ID, 20; trial, 18
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.65526    0.14568 -18.226  < 2e-16 ***
## BegRec       0.09057    0.02543   3.562 0.000368 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## BegRec -0.372
m2 <- glmer(
  Shared ~
    offset(log(trialduration / 60)) +
    (1 | ID) +
    (1 | trial) +
    (1 | obs),
  data = d,
  family = poisson(link = log)
)

anova(m1, m2, test = "F")
## Data: d
## Models:
## m2: Shared ~ offset(log(trialduration/60)) + (1 | ID) + (1 | trial) + (1 | obs)
## m1: Shared ~ BegRec + offset(log(trialduration/60)) + (1 | ID) + (1 | trial) + (1 | obs)
##    npar    AIC    BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)    
## m2    4 678.11 690.88 -335.05    670.11                         
## m1    5 666.91 682.88 -328.46    656.91 13.193  1   0.000281 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# or
lmtest::lrtest(m1, m2)
## Likelihood ratio test
## 
## Model 1: Shared ~ BegRec + offset(log(trialduration/60)) + (1 | ID) + 
##     (1 | trial) + (1 | obs)
## Model 2: Shared ~ offset(log(trialduration/60)) + (1 | ID) + (1 | trial) + 
##     (1 | obs)
##   #Df  LogLik Df  Chisq Pr(>Chisq)    
## 1   5 -328.46                         
## 2   4 -335.05 -1 13.193   0.000281 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Looking at these results, we would conclude that the model with BegRec has a lower AIC, a higher log likelihood, and a lower deviance than the model without BegRec; thus, variance in food sharing is associated with the amount of begging received. The coefficient for BegRec (~0.09) is positive, indicating that sharing increases with the amount of begging received.
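Because the model uses a log link, the BegRec coefficient is on the log scale; exponentiating it turns it into a multiplicative rate ratio (a quick base R calculation using the estimate reported above):

```r
# Back-transform the BegRec estimate from the summary above (log link) into a
# rate ratio: the multiplicative change in the expected sharing rate per
# additional instance of begging received
exp(0.09057) # ~1.09, i.e., roughly a 9% increase per beg received
# equivalently, from the fitted model object: exp(fixef(m1)["BegRec"])
```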

Again, the {mixedup} package allows us to neatly extract summary information from mixed effects models.

mixedup::summarize_model(m1)
##  Group    Effect Variance   SD SD_2.5 SD_97.5 Var_prop
##    obs Intercept     0.23 0.48   0.27    0.70     0.52
##     ID Intercept     0.07 0.26   0.00    0.51     0.15
##  trial Intercept     0.14 0.38   0.19    0.64     0.32
##       Term Value   SE      Z P_value Lower_2.5 Upper_97.5
##  Intercept -2.66 0.15 -18.23    0.00     -2.94      -2.37
##     BegRec  0.09 0.03   3.56    0.00      0.04       0.14
mixedup::extract_fixed_effects(m1)
## # A tibble: 2 × 7
##   term       value    se      z p_value lower_2.5 upper_97.5
##   <chr>      <dbl> <dbl>  <dbl>   <dbl>     <dbl>      <dbl>
## 1 Intercept -2.66  0.146 -18.2        0    -2.94       -2.37
## 2 BegRec     0.091 0.025   3.56       0     0.041       0.14
mixedup::extract_random_coefficients(m1) |> head()
## # A tibble: 6 × 7
##   group_var effect    group value    se lower_2.5 upper_97.5
##   <chr>     <chr>     <fct> <dbl> <dbl>     <dbl>      <dbl>
## 1 obs       Intercept 1     -2.87 0.64      -4.12      -1.61
## 2 obs       Intercept 2     -2.86 0.641     -4.12      -1.61
## 3 obs       Intercept 3     -2.38 0.616     -3.59      -1.18
## 4 obs       Intercept 4     -2.99 0.62      -4.20      -1.77
## 5 obs       Intercept 5     -2.84 0.646     -4.10      -1.57
## 6 obs       Intercept 6     -2.81 0.65      -4.09      -1.54
mixedup::extract_random_effects(m1) |> head()
## # A tibble: 6 × 7
##   group_var effect    group  value    se lower_2.5 upper_97.5
##   <chr>     <chr>     <fct>  <dbl> <dbl>     <dbl>      <dbl>
## 1 obs       Intercept 1     -0.212 0.623     -1.43      1.01 
## 2 obs       Intercept 2     -0.207 0.624     -1.43      1.02 
## 3 obs       Intercept 3      0.273 0.598     -0.9       1.45 
## 4 obs       Intercept 4     -0.332 0.603     -1.52      0.851
## 5 obs       Intercept 5     -0.18  0.629     -1.41      1.05 
## 6 obs       Intercept 6     -0.158 0.633     -1.40      1.08

The glmmTMB() function from {glmmTMB} can also be used for mixed effects modeling and gives very similar results…

m3 <- glmmTMB(
  Shared ~
    BegRec +
    offset(log(trialduration / 60)) +
    (1 | ID) +
    (1 | trial) +
    (1 | obs),
  data = d,
  family = poisson(link = "log")
)

mixedup::summarise_model(m3)
## 
## Variance Components:
##  Group    Effect Variance   SD SD_2.5 SD_97.5 Var_prop
##     ID Intercept     0.07 0.26   0.11    0.61     0.15
##  trial Intercept     0.14 0.38   0.22    0.65     0.32
##    obs Intercept     0.23 0.48   0.32    0.74     0.52
## 
## Fixed Effects:
##       Term Value   SE      Z P_value Lower_2.5 Upper_97.5
##  Intercept -2.66 0.15 -18.23    0.00     -2.94      -2.37
##     BegRec  0.09 0.03   3.56    0.00      0.04       0.14
m4 <- glmmTMB(
  Shared ~
    offset(log(trialduration / 60)) +
    (1 | ID) +
    (1 | trial) +
    (1 | obs),
  data = d,
  family = poisson(link = "log")
)

anova(m3, m4, test = "F")
## Data: d
## Models:
## m4: Shared ~ offset(log(trialduration/60)) + (1 | ID) + (1 | trial) + , zi=~0, disp=~1
## m4:     (1 | obs), zi=~0, disp=~1
## m3: Shared ~ BegRec + offset(log(trialduration/60)) + (1 | ID) + , zi=~0, disp=~1
## m3:     (1 | trial) + (1 | obs), zi=~0, disp=~1
##    Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## m4  4 678.10 690.87 -335.05   670.10                             
## m3  5 666.91 682.88 -328.46   656.91 13.185      1  0.0002823 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finally, the {MASS} package provides yet another function, glmmPQL(), that can be used for mixed effects modeling. The structure for specifying the model is different from what we have used before: the random effects are presented as a list in a separate argument from the formula containing the fixed effects. Note the absence of obs as a random effect in the model below… including it causes the model to fail to run due to singular convergence, i.e., the estimated variance for that random effect is essentially zero.

m5 <- MASS::glmmPQL(
  Shared ~
    BegRec +
    offset(log(trialduration / 60)),
  random = list(ID = ~1, trial = ~1),
  data = d,
  family = poisson(link = "log")
)

mixedup::summarise_model(m5)
##     Group    Effect Variance   SD SD_2.5       SD_97.5 Var_prop
##        ID Intercept     0.00 0.00   0.00 1.837003e+136     0.00
##     trial Intercept     0.00 0.00   0.00  5.348731e+13     0.00
##  Residual        NA     2.08 1.44   1.30  1.600000e+00     1.00
##       Term Value   SE      Z P_value Lower_2.5 Upper_97.5
##  Intercept -2.51 0.09 -27.76    0.00     -2.69      -2.33
##     BegRec  0.08 0.02   4.83    0.00      0.05       0.11

NOTE: The coefficient values estimated by glmmPQL() are similar to, but not the same as, those estimated by glmer() and glmmTMB().


Concept Review

  • “Mixed” modeling extends general and generalized linear modeling to consider cases where we have additional “random” factors that are another source of possible variation in our response variable
  • The approach allows us to estimate either different intercepts or different slopes and intercepts for each level of the random factor
  • As in generalized linear modeling, likelihood ratio tests and information criteria approaches can be used to compare the explanatory power of different models
  • Though we do not address it in this class, regression modeling can be extended further to consider NONLINEAR relationships between response and predictor variables
    • The R packages {lme4} and {nlme} include functions for “nonlinear mixed effects modeling”
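As an illustration of the likelihood ratio test idea summarized above, the m1 versus m2 comparison from earlier in the module can be reproduced by hand from the two models’ log likelihoods:

```r
# Likelihood ratio test "by hand", using the log likelihoods reported for the
# m1 (with BegRec) vs m2 (without BegRec) comparison above
logLik_full <- -328.46
logLik_reduced <- -335.05
chisq <- 2 * (logLik_full - logLik_reduced) # test statistic, here ~13.18
# the models differ by one parameter (BegRec), so df = 1
pchisq(chisq, df = 1, lower.tail = FALSE) # ~0.00028, matching anova()/lrtest()
```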

The website Mixed Models with R is a great resource for information about mixed effects modeling.