**Original link:** tecdat.cn/?p=3059

## Original source: Tuoduan Data Tribe Official Account

# Introduction

Analysts who deal with grouped data and complex hierarchies, such as measurements nested within participants, counties nested within states, or students nested within classrooms, often need modeling tools that reflect this kind of data structure. In R, there are two main ways to fit multi-level models that take this structure into account.

This tutorial will show how to:

- Construct varying-intercept, varying-slope, and varying-slope-and-intercept models in R
- Generate predictions and interpret parameters from mixed-effects models
- Fit generalized and nonlinear multi-level models
- Fit fully Bayesian multi-level models with rstan or other MCMC methods

# Set up the environment

It is easy to start multi-level modeling in R.

To install lme4:

```r
# Install the current release from CRAN
install.packages("lme4")

# Or install the development version from GitHub
library(devtools)
install_github("lme4", user = "lme4")
```

# Read in data

Multi-level models are suitable for a particular type of data structure, where units are nested within groups (usually more than 5 groups) and we want to model the group structure of the data. For our introductory example, we will start by loading the lmm.data dataset and inspecting its first few rows.

```r
library(lme4)  # load the mixed-effects modeling library
library(arm)   # provides display() and other regression utilities
# summary(lmm.data)
head(lmm.data)
```

```
##   id extro  open agree social class school
## 1  1 63.69 43.43 38.03  75.06     d     IV
## 2  2 69.48 46.87 31.49  98.13     a     VI
## 3  3 79.74 32.27 40.21 116.34     d     VI
## 4  4 62.97 44.41 30.51  90.47     c     IV
## 5  5 64.25 36.86 37.44  98.52     d     IV
## 6  6 50.97 46.26 38.83  75.22     d      I
```

# Model

Let us first fit a simple OLS regression.

```r
OLSexamp <- lm(extro ~ open + agree + social, data = lmm.data)
display(OLSexamp)
```

```
## lm(formula = extro ~ open + agree + social, data = lmm.data)
##             coef.est coef.se
## (Intercept) 57.84    3.15
## open         0.02    0.05
## agree        0.03    0.05
## social       0.01    0.02
## ---
## n = 1200, k = 4
## residual sd = 9.34, R-Squared = 0.00
```

The R model formula interface is very simple: first specify the dependent variable, then a tilde, then the predictors, and finally the data. However, if we want to extract metrics such as AIC, we may prefer to fit the same model with `glm`, which fits by maximum likelihood.

```r
MLexamp <- glm(extro ~ open + agree + social, data = lmm.data)
display(MLexamp)
```

```
## glm(formula = extro ~ open + agree + social, data = lmm.data)
##             coef.est coef.se
## (Intercept) 57.84    3.15
## open         0.02    0.05
## agree        0.03    0.05
## social       0.01    0.02
## ---
## n = 1200, k = 4
## residual deviance = 104378.2, null deviance = 104432.7 (difference = 54.5)
## overdispersion parameter = 87.3
## residual sd is sqrt(overdispersion) = 9.34
```

```r
AIC(MLexamp)
```

```
## [1] 8774
```

This simple model fits the data poorly (note the R-squared of essentially 0 above). Let us now look at models that make use of the grouping structure.
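As a quick, self-contained illustration of comparing models by AIC, here is a minimal sketch using R's built-in `mtcars` data rather than `lmm.data` (the variables here are for illustration only):

```r
# Compare two nested OLS models by AIC; a lower AIC indicates a better
# trade-off between fit and model complexity
m1 <- lm(mpg ~ wt, data = mtcars)       # fuel economy predicted by weight
m2 <- lm(mpg ~ wt + hp, data = mtcars)  # add horsepower as a predictor
AIC(m1, m2)  # returns a data frame with df and AIC for each model
```

Here the richer model `m2` achieves the lower AIC, so the extra predictor earns its added complexity.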

# Fit different models

Our next step may be to use grouping variables (such as school or class) to fit different models.

```r
MLexamp.2 <- glm(extro ~ open + agree + social + class, data = lmm.data)
display(MLexamp.2)
```

```
## glm(formula = extro ~ open + agree + social + class, data = lmm.data)
##             coef.est coef.se
## (Intercept) 56.05    3.09
## open         0.03    0.05
## agree       -0.01    0.05
## social       0.01    0.02
## classb       2.06    0.75
## classc       3.70    0.75
## classd       5.67    0.75
## ---
## n = 1200, k = 7
## residual deviance = 99187.7, null deviance = 104432.7 (difference = 5245.0)
## overdispersion parameter = 83.1
## residual sd is sqrt(overdispersion) = 9.12
```

```r
AIC(MLexamp.2)
```

```
## [1] 8719
```

```r
anova(MLexamp, MLexamp.2, test = "F")
```

```
## Analysis of Deviance Table
##
## Model 1: extro ~ open + agree + social
## Model 2: extro ~ open + agree + social + class
##   Resid. Df Resid. Dev Df Deviance    F  Pr(>F)
## 1      1196     104378
## 2      1193      99188  3     5190 20.8 3.8e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Including a grouping variable as a dummy-coded predictor in this way is often called a fixed effect. We can do the same with school:

```r
MLexamp.3 <- glm(extro ~ open + agree + social + school, data = lmm.data)
display(MLexamp.3)
```

```
## glm(formula = extro ~ open + agree + social + school, data = lmm.data)
##             coef.est coef.se
## (Intercept) 45.02    0.92
## open         0.01    0.01
## agree        0.03    0.02
## social       0.00    0.00
## schoolII     7.91    0.27
## schoolIII   12.12    0.27
## schoolIV    16.06    0.27
## schoolV     20.43    0.27
## schoolVI    28.05    0.27
## ---
## n = 1200, k = 9
## residual deviance = 8496.2, null deviance = 104432.7 (difference = 95936.5)
## overdispersion parameter = 7.1
## residual sd is sqrt(overdispersion) = 2.67
```

```r
AIC(MLexamp.3)
```

```
## [1] 5774
```

```r
anova(MLexamp, MLexamp.3, test = "F")
```

```
## Analysis of Deviance Table
##
## Model 1: extro ~ open + agree + social
## Model 2: extro ~ open + agree + social + school
##   Resid. Df Resid. Dev Df Deviance    F Pr(>F)
## 1      1196     104378
## 2      1191       8496  5    95882 2688 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The school factor greatly improves the fit of our model. But how do we explain these effects?

```r
table(lmm.data$school, lmm.data$class)
```

```
##
##        a  b  c  d
##   I   50 50 50 50
##   II  50 50 50 50
##   III 50 50 50 50
##   IV  50 50 50 50
##   V   50 50 50 50
##   VI  50 50 50 50
```

Here, we can see that we have a perfectly balanced design with 50 observations in each combination of classroom and school.
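To see how R's dummy coding represents such group effects, here is a minimal sketch using the built-in `iris` data (the dataset and variable names are illustrative, not from `lmm.data`):

```r
# With a factor predictor, R drops the reference level: the intercept is the
# mean of the reference group, and each factor coefficient is an offset from it
fit <- lm(Sepal.Length ~ Species, data = iris)
coef(fit)
# The intercept equals the mean Sepal.Length of setosa (the reference level):
mean(iris$Sepal.Length[iris$Species == "setosa"])
```

This is why, in the models above, `classb` through `classd` (and `schoolII` through `schoolVI`) appear as offsets relative to an omitted reference category absorbed into the intercept.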

Let's try modeling these unique school-class combinations.

```r
MLexamp.4 <- glm(extro ~ open + agree + social + school:class, data = lmm.data)
display(MLexamp.4)
```

```
## glm(formula = extro ~ open + agree + social + school:class, data = lmm.data)
##                  coef.est coef.se
## (Intercept)       80.36    0.37
## open               0.01    0.00
## agree             -0.01    0.01
## social             0.00    0.00
## schoolI:classa   -40.39    0.20
## schoolII:classa  -28.15    0.20
## schoolIII:classa -23.58    0.20
## schoolIV:classa  -19.76    0.20
## schoolV:classa   -15.50    0.20
## schoolVI:classa  -10.46    0.20
## schoolI:classb   -34.60    0.20
## schoolII:classb  -26.76    0.20
## schoolIII:classb -22.59    0.20
## schoolIV:classb  -18.71    0.20
## schoolV:classb   -14.31    0.20
## schoolVI:classb   -8.54    0.20
## schoolI:classc   -31.86    0.20
## schoolII:classc  -25.64    0.20
## schoolIII:classc -21.58    0.20
## schoolIV:classc  -17.58    0.20
## schoolV:classc   -13.38    0.20
## schoolVI:classc   -5.58    0.20
## schoolI:classd   -30.00    0.20
## schoolII:classd  -24.57    0.20
## schoolIII:classd -20.64    0.20
## schoolIV:classd  -16.60    0.20
## schoolV:classd   -12.04    0.20
## ---
## n = 1200, k = 27
## residual deviance = 1135.9, null deviance = 104432.7 (difference = 103296.8)
## overdispersion parameter = 1.0
## residual sd is sqrt(overdispersion) = 0.98
```

```r
AIC(MLexamp.4)
```

```
## [1] 3396
```

This is very useful, but what if we want to separate the effect of the school, the effect of the class, and the effect of their combination?

```r
MLexamp.5 <- glm(extro ~ open + agree + social + school * class - 1, data = lmm.data)
display(MLexamp.5)
```

```
## glm(formula = extro ~ open + agree + social + school * class -
##     1, data = lmm.data)
##                  coef.est coef.se
## open              0.01     0.00
## agree            -0.01     0.01
## social            0.00     0.00
## schoolI          39.96     0.36
## schoolII         52.21     0.36
## schoolIII        56.78     0.36
## schoolIV         60.60     0.36
## schoolV          64.86     0.36
## schoolVI         69.90     0.36
## classb            5.79     0.20
## classc            8.53     0.20
## classd           10.39     0.20
## schoolII:classb  -4.40     0.28
## schoolIII:classb -4.80     0.28
## schoolIV:classb  -4.74     0.28
## schoolV:classb   -4.60     0.28
## schoolVI:classb  -3.87     0.28
## schoolII:classc  -6.02     0.28
## schoolIII:classc -6.54     0.28
## schoolIV:classc  -6.36     0.28
## schoolV:classc   -6.41     0.28
## schoolVI:classc  -3.65     0.28
## schoolII:classd  -6.81     0.28
## schoolIII:classd -7.45     0.28
## schoolIV:classd  -7.24     0.28
## schoolV:classd   -6.93     0.28
## schoolVI:classd   0.06     0.28
## ---
## n = 1200, k = 27
## residual deviance = 1135.9, null deviance = 4463029.9 (difference = 4461894.0)
## overdispersion parameter = 1.0
## residual sd is sqrt(overdispersion) = 0.98
```

```r
AIC(MLexamp.5)
```

```
## [1] 3396
```
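The two parameterizations describe the same fit, which can be checked on any crossed design; here is a sketch with the built-in `warpbreaks` data, with `wool` and `tension` standing in for school and class (an assumption for illustration only):

```r
# breaks ~ wool:tension fits one mean per wool-by-tension cell, while
# breaks ~ wool * tension splits the same fit into main effects plus
# interactions; the fitted values are identical, only the coefficients differ
cellfit  <- lm(breaks ~ wool:tension, data = warpbreaks)
crossfit <- lm(breaks ~ wool * tension, data = warpbreaks)
max(abs(fitted(cellfit) - fitted(crossfit)))  # effectively zero
```

This mirrors why MLexamp.4 and MLexamp.5 report the same residual deviance (1135.9) and the same AIC (3396): they span the same cell means, just labeled differently.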

# Explore random slopes

Another option is to build a separate model for each school and class combination. If we think that the relationship between our variables may be highly dependent on the school and class combination, we can simply fit a series of models and explore the parameter changes between them:

```r
require(plyr)  # for dlply(), which splits data and returns a list of models
# Fit a separate OLS regression within each school-class combination
modellist <- dlply(lmm.data, .(school, class), function(x)
  glm(extro ~ open + agree + social, data = x))
display(modellist[[1]])
```

```
## glm(formula = extro ~ open + agree + social, data = x)
##             coef.est coef.se
## (Intercept) 35.87    5.90
## open         0.05    0.09
## agree        0.02    0.10
## social       0.01    0.03
## ---
## n = 50, k = 4
## residual deviance = 500.2, null deviance = 506.2 (difference = 5.9)
## overdispersion parameter = 10.9
## residual sd is sqrt(overdispersion) = 3.30
```

```r
display(modellist[[2]])
```

```
## glm(formula = extro ~ open + agree + social, data = x)
##             coef.est coef.se
## (Intercept) 47.96    2.16
## open        -0.01    0.03
## agree       -0.03    0.03
## social      -0.01    0.01
## ---
## n = 50, k = 4
## residual deviance = 47.9, null deviance = 49.1 (difference = 1.2)
## overdispersion parameter = 1.0
## residual sd is sqrt(overdispersion) = 1.02
```

We will discuss this strategy in more depth in a future tutorial, including how to perform inference on the list of models generated by this command.
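The same per-group strategy can also be written in base R with `split()` and `lapply()`; here is a minimal sketch on the built-in `iris` data (the grouping variable and predictors are illustrative, not from `lmm.data`):

```r
# Fit one regression per group and collect the slopes; large variation
# across groups is informal evidence that a varying-slope model is warranted
fits <- lapply(split(iris, iris$Species),
               function(d) lm(Sepal.Length ~ Petal.Length, data = d))
slopes <- sapply(fits, function(m) coef(m)[["Petal.Length"]])
slopes  # one estimated slope per group
```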

# Fit a varying intercept model with lmer

Although all of the techniques above are valid ways to approach this problem, they are not necessarily the best approach when we are explicitly interested in variation between groups. This is where the mixed-effects modeling framework is useful. Now we use `lmer` with the familiar formula interface, but with group-level terms specified in parentheses:

```r
MLexamp.6 <- lmer(extro ~ open + agree + social + (1 | school), data = lmm.data)
display(MLexamp.6)
```

```
## lmer(formula = extro ~ open + agree + social + (1 | school),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 59.12    4.10
## open         0.01    0.01
## agree        0.03    0.02
## social       0.00    0.00
##
## Error terms:
##  Groups   Name        Std.Dev.
##  school   (Intercept) 9.79
##  Residual             2.67
## ---
## number of obs: 1200, groups: school, 6
## AIC = 5836.1, DIC = 5789
## deviance = 5806.5
```

We can include multiple group effects by adding multiple grouping terms:

```r
MLexamp.7 <- lmer(extro ~ open + agree + social + (1 | school) + (1 | class),
                  data = lmm.data)
display(MLexamp.7)
```

```
## lmer(formula = extro ~ open + agree + social + (1 | school) +
##     (1 | class), data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.20    4.21
## open         0.01    0.01
## agree       -0.01    0.01
## social       0.00    0.00
##
## Error terms:
##  Groups   Name        Std.Dev.
##  school   (Intercept) 9.79
##  class    (Intercept) 2.41
##  Residual             1.67
## ---
## number of obs: 1200, groups: school, 6; class, 4
## AIC = 4737.9, DIC = 4683
## deviance = 4703.6
```

Finally, we can fit nested group effects with the following syntax:

```r
MLexamp.8 <- lmer(extro ~ open + agree + social + (1 | school/class),
                  data = lmm.data)
display(MLexamp.8)
```

```
## lmer(formula = extro ~ open + agree + social + (1 | school/class),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.24    4.01
## open         0.01    0.00
## agree       -0.01    0.01
## social       0.00    0.00
##
## Error terms:
##  Groups       Name        Std.Dev.
##  class:school (Intercept) 2.86
##  school       (Intercept) 9.69
##  Residual                 0.98
## ---
## number of obs: 1200, groups: class:school, 24; school, 6
## AIC = 3568.6, DIC = 3508
## deviance = 3531.1
```


Here the `school/class` term expands to `(1 | school) + (1 | class:school)`, so the class effects are nested within schools; note the `class:school` grouping with 24 levels in the output above.

# Use lmer to fit a varying slope model

But what if we want to explore the effect of a student-level predictor that may vary from classroom to classroom? Then we can fit a varying slope model rather than only varying intercepts by school (or school/class). Here, we modify our random-effects term to include a variable before the grouping terms:

```r
MLexamp.9 <- lmer(extro ~ open + agree + social + (1 + open | school/class),
                  data = lmm.data)
display(MLexamp.9)
```

```
## lmer(formula = extro ~ open + agree + social + (1 + open | school/class),
##     data = lmm.data)
##             coef.est coef.se
## (Intercept) 60.26    3.93
## open         0.01    0.01
## agree       -0.01    0.01
## social       0.00    0.00
##
## Error terms:
##  Groups       Name        Std.Dev. Corr
##  class:school (Intercept) 2.62
##               open        0.01     1.00
##  school       (Intercept) 9.51
##               open        0.00     1.00
##  Residual                 0.98
## ---
## number of obs: 1200, groups: class:school, 24; school, 6
## AIC = 3574.7, DIC = 3506
## deviance = 3529.3
```

# Conclusion

In R and its ecosystem, fitting mixed-effects models and exploring group variation is straightforward. In future tutorials, we will cover model comparison, inference with mixed-effects models, and graphical representations of mixed-effects models to understand their effects.

# Appendix

```
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] plyr_1.8        arm_1.6-10      MASS_7.3-29     lme4_1.0-5
## [5] Matrix_1.1-0    lattice_0.20-24 knitr_1.5
##
## loaded via a namespace (and not attached):
## [1] abind_1.4-0    coda_0.16-1    evaluate_0.5.1 formatR_0.10
## [5] grid_3.0.1     minqa_1.2.1    nlme_3.1-113   splines_3.0.1
## [9] stringr_0.6.2  tools_3.0.1
```

