Announcing a major update to merTools

merTools is an R package designed to make working with multilevel models from lme4, particularly large models with many random effects, fast and easy. With merTools you can generate prediction intervals that incorporate various components of uncertainty (fixed effect, random effect, and model uncertainty); you can get the expected rank of individual random effect levels (a combination of the magnitude and precision of the estimate); and you can interactively explore the substantive effect of variables in the model using a Shiny application!

Recently, we've updated the package to significantly improve performance and accuracy. You can get it on CRAN now. 

Below are some updates from the package release notes. To learn more, check out the package development on GitHub. You can also read a previous blog entry discussing the package and its uses.

merTools 0.3.0

  • Improve handling of formulas. If the original merMod has functions specified in the formula, the draw and wiggle functions will check for this and attempt to respect these variable transformations. Where this is not possible a warning will be issued. Most common transformations are respected as long as the original variable is passed untransformed to the model.
  • Change the calculation of the residual variance. Previously, residual variance was used to inflate both the variance around the fixed parameters and the variance around the predicted values themselves. This was incorrect and resulted in overly conservative estimates. Now the residual variance is appropriately applied only around the final predictions.
  • New option for predictInterval that allows the user to return the full interval, the fixed component, the random component, or the fixed and each random component separately for each observation
  • Fixed a bug with slope+intercept random terms that caused a miscalculation of the random component
  • Add comparison to rstanarm to the Vignette
  • Make expectedRank output tidier and allow the function to calculate expected rank for all terms at once
    • Note, this breaks the API by changing the names of the columns in the output of this function
  • Remove tests that test for timing to avoid issues with R-devel JIT compiler
  • Remove plyr and replace with dplyr
  • Fix issue #62 varList will now throw an error if == is used instead of =
  • Fix issue #54 predictInterval did not include random effects in calculations when newdata had more than 1000 rows and/or the user specified parallel=TRUE. Note: the fix was to disable the .paropts option for predictInterval ... the user can still specify it for temporary backward compatibility, but this should be either removed or fixed in the permanent solution.
  • Fix issue #53 about problems with predictInterval when only specific levels of a grouping factor are in newdata with the colon specification of interactions
  • Fix issue #52 ICC wrong calculations ... we just needed to square the standard deviations that we pulled
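As a quick illustration of the new component option, a sketch like the following returns the fixed and each random component separately (the argument is named which in the current merTools API; names should be checked against your installed version):

```r
library(lme4)
library(merTools)

# A small model on the InstEval data shipped with lme4
m1 <- lmer(y ~ service + (1 | d) + (1 | s), data = InstEval)

# which = "full" (the default) returns the combined interval;
# which = "all" returns the fixed component and each random
# component separately for every observation
predictInterval(m1, newdata = InstEval[1:5, ], which = "all",
                n.sims = 200, level = 0.9)
```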

Introducing the R Data Science Livestream

Have you ever watched a livestream? Have you ever wondered what the actual minute-to-minute of doing data science looks like? Do you wonder if other R users have the same frustrations as you? If yes -- then read on!

I'm off on a new professional adventure where I am doing public facing work for the first time in years. While working at home the other day I thought it would be a great idea to keep myself on-task and document my decisions if I recorded myself working with my webcam. Then, I thought, why stop there -- why not livestream my work? 

And thus, the R Data Science Livestream was born. The idea is that every day for an hour or two I will livestream myself doing some data science tasks related to my current project -- which is to analyze dozens of years of FBI Uniform Crime reports (read more). I haven't done much R coding in the last 4 months, so it's also a good way to shake off the rust of being out of the game for so long.

So if you are at all interested or curious why someone would do this, check out the landing page I put up to document the project and if you are really curious, maybe even tune in or watch the archives on YouTube!


Explore multilevel models faster with the new merTools R package

Update 1: Package is now available on CRAN

Update 2: Development version also supports models from the blme package.

By far the most popular content on this blog is my pair of tutorials on how to fit and use multilevel models in R. Since publishing those tutorials I have received numerous questions and comments, as well as a steady stream of visitors looking for more information about multilevel models in R. Since my day job involves fitting and exploring multilevel models, as well as explaining them to a non-technical audience, I began working with my colleague Carl Frederick on an R package to make these tasks easier. Today, I'm happy to announce that our solution, merTools, is now available. Below, I reproduce the package README file, but you can find out more on GitHub. There are two extensive vignettes that describe how to make use of the package, as well as a shiny app that allows interactive model exploration. The package should be available on CRAN within the next few days.

Working with generalized linear mixed models (GLMM) and linear mixed models (LMM) has become increasingly easy with advances in the lme4 package. As we have found ourselves using these models more and more within our work, we, the authors, have developed a set of tools for simplifying and speeding up common tasks for interacting with merMod objects from lme4. This package provides those tools.


# development version
library(devtools)
install_github("jknowles/merTools")

# CRAN version -- coming soon
install.packages("merTools")

Shiny App and Demo

The easiest way to demo the features of the package is to use the bundled Shiny application, which launches a number of the metrics here to aid in exploring the model. To do this:

library(lme4)
library(merTools)
m1 <- lmer(y ~ service + lectage + studage + (1|d) + (1|s), data = InstEval)
shinyMer(m1, simData = InstEval[1:100, ]) # just try the first 100 rows of data

On the first tab, the function presents the prediction intervals for the data selected by the user, which are calculated using the predictInterval function within the package. This function calculates prediction intervals quickly by sampling from the simulated distribution of the fixed effect and random effect terms and combining these simulated estimates to produce a distribution of predictions for each observation. This allows prediction intervals to be generated for very large models where the use of bootMer would not be computationally feasible.

On the next tab the distribution of the fixed effect and group-level effects is depicted on confidence interval plots. These are useful for diagnostics and provide a way to inspect the relative magnitudes of various parameters. This tab makes use of four related functions in merTools: FEsim, plotFEsim, REsim and plotREsim which are available to be used on their own as well.

On the third tab are some convenient ways to show the influence or magnitude of effects by leveraging the power of predictInterval. For each case in the selected data (up to 12), the user can view the impact of changing either one of the fixed effects or one of the grouping-level terms. Using the REimpact function, each case is simulated with the model's prediction as if all else were held equal, but the observation were moved through the distribution of the fixed effect or the random effect term. This is plotted on the scale of the dependent variable, which allows the user to compare the magnitude of effects across variables, and also between models on the same data.


A standard prediction looks like this:

predict(m1, newdata = InstEval[1:10, ])
#>        1        2        3        4        5        6        7        8
#> 3.146336 3.165211 3.398499 3.114248 3.320686 3.252670 4.180896 3.845218
#>        9       10
#> 3.779336 3.331012

With predictInterval we obtain predictions that are more like the standard objects produced by lm and glm:

#predictInterval(m1, newdata = InstEval[1:10, ]) # all other parameters are optional
predictInterval(m1, newdata = InstEval[1:10, ], n.sims = 500, level = 0.9,
                stat = 'median')
#>         fit      lwr      upr
#> 1  3.074148 1.112255 4.903116
#> 2  3.243587 1.271725 5.200187
#> 3  3.529055 1.409372 5.304214
#> 4  3.072788 1.079944 5.142912
#> 5  3.395598 1.268169 5.327549
#> 6  3.262092 1.333713 5.304931
#> 7  4.215371 2.136654 6.078790
#> 8  3.816399 1.860071 5.769248
#> 9  3.811090 1.697161 5.775237
#> 10 3.337685 1.417322 5.341484

Note that predictInterval is slower because it is computing simulations. It can also return all of the simulated yhat values as an attribute of the prediction object itself.
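For example, a sketch of retrieving those simulated values (the returnSims argument and the sim.results attribute name are taken from the package documentation and should be checked against your installed version):

```r
# Keep the raw simulated yhat draws alongside the summary intervals
preds <- predictInterval(m1, newdata = InstEval[1:10, ],
                         n.sims = 500, returnSims = TRUE)

# The simulations are stored as an attribute of the returned object:
# one row per observation, one column per simulation
yhat.sims <- attr(preds, "sim.results")
dim(yhat.sims)
```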

predictInterval uses the sim function from the arm package heavily to draw the distributions of the parameters of the model. It then combines these simulated values to create a distribution of the yhat for each observation.


merTools also provides functionality for inspecting merMod objects visually. The easiest of these is getting the simulated distributions of both fixed and random effect parameters.

feSims <- FEsim(m1, n.sims = 100)
head(feSims)
#>          term        mean      median         sd
#> 1 (Intercept)  3.22673524  3.22793168 0.01798444
#> 2    service1 -0.07331857 -0.07482390 0.01304097
#> 3   lectage.L -0.18419526 -0.18451731 0.01726253
#> 4   lectage.Q  0.02287717  0.02187172 0.01328641
#> 5   lectage.C -0.02282755 -0.02117014 0.01324410
#> 6   lectage^4 -0.01940499 -0.02041036 0.01196718

And we can also plot this:

plotFEsim(FEsim(m1, n.sims = 100), level = 0.9, stat = 'median', intercept = FALSE)

We can also quickly make caterpillar plots for the random-effect terms:

reSims <- REsim(m1, n.sims = 100)
head(reSims)
#>   groupFctr groupID        term        mean      median        sd
#> 1         s       1 (Intercept)  0.15317316  0.11665654 0.3255914
#> 2         s       2 (Intercept) -0.08744824 -0.03964493 0.2940082
#> 3         s       3 (Intercept)  0.29063126  0.30065450 0.2882751
#> 4         s       4 (Intercept)  0.26176515  0.26428522 0.2972536
#> 5         s       5 (Intercept)  0.06069458  0.06518977 0.3105805
#> 6         s       6 (Intercept)  0.08055309  0.05872426 0.2182059
plotREsim(REsim(m1, n.sims = 100), stat = 'median', sd = TRUE)

Note that plotREsim highlights group levels that have a simulated distribution that does not overlap 0 – these appear darker. The lighter bars represent grouping levels that are not distinguishable from 0 in the data.

Sometimes the random effects can be hard to interpret, and not all of them are meaningfully different from zero. To help with this, merTools provides the expectedRank function, which provides the percentile ranks for the observed groups in the random effect distribution, taking into account both the magnitude and the uncertainty of the estimated effect for each group.

ranks <- expectedRank(m1, groupFctr = "d")
head(ranks)
#>      d (Intercept) (Intercept)_var       ER pctER
#> 1 1866   1.2553613     0.012755634 1123.806   100
#> 2 1258   1.1674852     0.034291228 1115.766    99
#> 3  240   1.0933372     0.008761218 1115.090    99
#> 4   79   1.0998653     0.023095979 1112.315    99
#> 5  676   1.0169070     0.026562174 1101.553    98
#> 6   66   0.9568607     0.008602823 1098.049    97

Effect Simulation

It can still be difficult to interpret the results of LMM and GLMM models, especially the relative influence of varying parameters on the predicted outcome. This is where the REimpact and the wiggle functions in merTools can be handy.

impSim <- REimpact(m1, InstEval[7, ], groupFctr = "d", breaks = 5,
                   n.sims = 300, level = 0.9)
impSim
#>   case bin   AvgFit     AvgFitSE nobs
#> 1    1   1 2.787033 2.801368e-04  193
#> 2    1   2 3.260565 5.389196e-05  240
#> 3    1   3 3.561137 5.976653e-05  254
#> 4    1   4 3.840941 6.266748e-05  265
#> 5    1   5 4.235376 1.881360e-04  176

The result of REimpact shows the change in the yhat as the case we supplied to newdata is moved from the first to the fifth quintile in terms of the magnitude of the group factor coefficient. We can see here that the individual professor effect has a strong impact on the outcome variable. This can be shown graphically as well:

library(ggplot2) # for plotting
ggplot(impSim, aes(x = factor(bin), y = AvgFit, ymin = AvgFit - 1.96*AvgFitSE,
                   ymax = AvgFit + 1.96*AvgFitSE)) +
  geom_pointrange() + theme_bw() + labs(x = "Bin of `d` term", y = "Predicted Fit")

Here the standard error is a bit different – it is the weighted standard error of the mean effect within the bin. It does not take into account the variability within the effects of each observation in the bin – accounting for this variation will be a future addition to merTools.
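The wiggle function mentioned above can be sketched in a similar spirit: it duplicates a case across a set of counterfactual values of a predictor so the resulting predictions can be compared. This assumes the varlist/valueslist interface of recent merTools versions; check your installed version's help page:

```r
# Duplicate case 7 once for each counterfactual value of `service`
newData <- wiggle(InstEval[7, ], varlist = "service",
                  valueslist = list(c("0", "1")))

# Compare predictions across the substituted values
predictInterval(m1, newdata = newData, n.sims = 300, level = 0.9)
```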

Version 0.9.0 of eeptools released!

A long overdue overhaul of my eeptools package for R was released to CRAN today and should be showing up in the mirrors soon. The release notes for this version are extensive as this represents a modernization of the package infrastructure and the reimagining of many of the utility functions contained in the package. From the release notes:

This is a major update including removing little used functions and renaming and restructuring functions.

New Functionality

  • A new package vignette is now included
  • nth_max function for finding the nth highest value in a vector
  • retained_calc now accepts user specified values for sid and grade
  • destring function deprecated and renamed to makenum to better reflect the use of the function
  • crosstabs function exported to allow the user to generate the data behind crosstabplot but not draw the plot


Deprecated

  • dropbox_source deprecated, use the rdrop2 package
  • plotForWord function deprecated in favor of packages like knitr and rmarkdown
  • mapmerge2 has been deprecated in favor of a tested mapmerge
  • mosaictabs.labels has been deprecated in favor of crosstabplot

Bug Fixes

  • nsims in gelmansim was renamed to n.sims to align with the arm package
  • Fixed bug in retained_calc where user specified sid resulted in wrong ids being returned
  • Inserted a meaningful error in age_calc when the enddate is before the date of birth
  • Fixed issue with age_calc which lead to wrong fraction of age during leap years
  • lag_data now can do leads and lags and includes proper error messages
  • fix major bugs for statamode including faulty default to method and returning objects of the wrong class
  • add unit tests and continuous integration support for better package updating
  • fix behavior of max_mis in cases when it is passed an empty vector or a vector of NA
  • leading_zero function made robust to negative values
  • added NA handling options to cutoff and thresh
  • Codebase is now tested with lintr to improve readability
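To illustrate what nth_max does, here is the idea in base R; the packaged eeptools::nth_max should behave equivalently for simple numeric vectors, assuming a signature along the lines of nth_max(x, n):

```r
# The nth highest value is the nth element of the vector sorted descending
nth_max_sketch <- function(x, n = 1) {
  sort(x, decreasing = TRUE)[n]
}

x <- c(10, 45, 23, 45, 8)
nth_max_sketch(x, 1) # 45
nth_max_sketch(x, 3) # 23
```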

Announcing the caretEnsemble R package

Last week version 1.0 of the caretEnsemble package was released to CRAN. I have co-authored this package with Zach Mayer, who had the original idea of allowing for ensembles of train objects in the caret package. The package is designed to make it easy for the user to optimally combine models of various types together to produce a meta-model with better fit than the sub-models.

From the vignette:

"caretEnsemble has 3 primary functions: caretList, caretEnsemble and caretStack. caretList is used to build lists of caret models on the same training data, with the same re-sampling parameters. caretEnsemble and caretStack are used to create ensemble models from such lists of caret models. caretEnsemble uses greedy optimization to create a simple linear blend of models and caretStack uses a caret model to combine the outputs from several component caret models."

I am excited about this package because the ensembling features in caretEnsemble are used to provide additional predictive power in the Wisconsin Dropout Early Warning System (DEWS). I've written about this system before, but it is a large-scale machine learning system used to provide schools with a prediction on the likely graduation of their middle grade students. It is easy to implement and provides additional predictive power for the cost of some CPU cycles. 

Additionally, Zach and I have worked hard to make ensembling models *easy*. For example, you can automatically build lists of models -- a library of models -- for ensembling using the caretList function. This caretList can then be used directly in either the caretEnsemble or caretStack mode, depending on how you want to combine the predictions from the submodels. These new caret objects also come with their own S3 methods (adding more in future releases) to allow you to interact with them and explore the results of ensembling -- including summary, print, plot, and variable importance calculations. They also include the all important predict method allowing you to generate predictions for use elsewhere. 
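A minimal sketch of that workflow, using a built-in dataset and arbitrary example methods (the model choices and object names here are illustrative, not from DEWS):

```r
library(caret)
library(caretEnsemble)

# Binary classification problem for illustration
dat <- iris[iris$Species != "setosa", ]
dat$Species <- factor(dat$Species)

# Shared resampling scheme for all sub-models
ctrl <- trainControl(method = "cv", number = 5,
                     savePredictions = "final", classProbs = TRUE)

# Build a library of models on the same data
models <- caretList(Species ~ ., data = dat, trControl = ctrl,
                    methodList = c("glm", "rpart"))

# Greedy linear blend of the sub-models
ens <- caretEnsemble(models)
summary(ens)

# The predict method works as usual
head(predict(ens, newdata = dat))
```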

Zach has written a great vignette that should give you a feel for how caretEnsemble works. And, we are actively improving caretEnsemble over on GitHub. Drop by and let us know if you find a bug, have a feature request, or want to let us know how it is working for you!

Of Needles and Haystacks: Building an Accurate Statewide Dropout Early Warning System in Wisconsin

For the past two years I have been working on the Wisconsin Dropout Early Warning System, a predictive model of on time high school graduation for students in grades 6-9 in Wisconsin. The goal of this project is to help schools and educators have an early indication of the likely graduation of each of their students, early enough to allow time for individualized intervention. The result is that nearly 225,000 students receive an individualized prediction at the start and end of the school year. The workflow for the system is mapped out in the diagram below:

The system is moving into its second year of use this fall and I recently completed a research paper describing the predictive analytic approach taken within DEWS. The research paper is intended to serve as a description and guide of the decisions made in developing an automated prediction system using administrative data. The paper covers both the data preparation and model building process as well as a review of the results. A preview is shown below which demonstrates how the EWS models trained in Wisconsin compare to the accuracy reported in the research literature - represented by the points on the graph. The accuracy is measured using the ROC curve. The article is now available via figshare.

The colored lines represent different types of ensembled statistical models and their accuracy across various thresholds of their predicted probabilities. The points represent the accuracy of comparable models in the research literature, using reported accuracy from a paper by Alex Bowers:

Bowers, A.J., Sprott, R., & Taff, S.A. (2013). Do we Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The High School Journal, 96(2), 77-100. doi:10.1353/hsj.2013.0000.

This article serves as good background and grounds the benchmarking of the models built in Wisconsin, and for others benchmarking their own models.

Article Abstract:

The state of Wisconsin has one of the highest four year graduation rates in the nation, but deep disparities among student subgroups remain. To address this the state has created the Wisconsin Dropout Early Warning System (DEWS), a predictive model of student dropout risk for students in grades six through nine. The Wisconsin DEWS is in use statewide and currently provides predictions on the likelihood of graduation for over 225,000 students. DEWS represents a novel statistical learning based approach to the challenge of assessing the risk of non-graduation for students and provides highly accurate predictions for students in the middle grades without expanding beyond mandated administrative data collections.

Similar dropout early warning systems are in place in many jurisdictions across the country. Prior research has shown that in many cases the indicators used by such systems do a poor job of balancing the trade-off between correct classification of likely dropouts and false alarms (Bowers et al., 2013). Building on this work, DEWS uses the receiver-operating characteristic (ROC) metric to identify the best possible set of statistical models for making predictions about individual students.

This paper describes the DEWS approach and the software behind it, which leverages the open source statistical language R (R Core Team, 2013). As a result DEWS is a flexible series of software modules that can adapt to new data, new algorithms, and new outcome variables to not only predict dropout, but also impute key predictors as well. The design and implementation of each of these modules is described in detail as well as the open-source R package, EWStools, that serves as the core of DEWS (Knowles, 2014). 


The code that powers the EWS is an open source R extension of the caret package which is available on GitHub: EWStools on GitHub

Mixed Effects Tutorial 2: Fun with merMod Objects

Update: Since this post was released I have co-authored an R package to make some of the items in this post easier to do. This package is called merTools and is available on CRAN and on GitHub. To read more about it, read my new post here and check out the package on GitHub.

Fun with merMod Objects


First of all, be warned, the terminology surrounding multilevel models is vastly inconsistent. For example, multilevel models themselves may be referred to as hierarchical linear models, random effects models, multilevel models, random intercept models, random slope models, or pooling models. Depending on the discipline, software used, and the academic literature many of these terms may be referring to the same general modeling strategy. In this tutorial I will attempt to provide a user guide to multilevel modeling by demonstrating how to fit multilevel models in R and by attempting to connect the model fitting procedure to commonly used terminology used regarding these models.

We will cover the following topics:

  • The structure and methods of merMod objects
  • Extracting random effects of merMod objects
  • Plotting and interpreting merMod objects

If you haven’t already, make sure you head over to the Getting Started With Multilevel Models tutorial in order to ensure you have set up your environment correctly and installed all the necessary packages. The tl;dr is that you will need:

  • A current version of R (2.15 or greater)
  • The lme4 package (install.packages("lme4"))

Read in the data

Multilevel models are appropriate for a particular kind of data structure where units are nested within groups (generally 5+ groups) and where we want to model the group structure of the data. We will use data from Jon Starkweather at the University of North Texas on student social attributes within schools and classes. Visit the excellent tutorial available here for more.

library(lme4) # load library
library(arm)  # convenience functions for regression in R
lmm.data <- read.table("", header = TRUE, sep = ",",
                       na.strings = "NA", dec = ".", strip.white = TRUE)
head(lmm.data)
##   id extro  open agree social class school
## 1  1 63.69 43.43 38.03  75.06     d     IV
## 2  2 69.48 46.87 31.49  98.13     a     VI
## 3  3 79.74 32.27 40.21 116.34     d     VI
## 4  4 62.97 44.41 30.51  90.47     c     IV
## 5  5 64.25 36.86 37.44  98.52     d     IV
## 6  6 50.97 46.26 38.83  75.22     d      I

Here we have data on the extroversion of subjects nested within classes and within schools.

Let’s understand the structure of the data a bit before we begin:

str(lmm.data)
## 'data.frame':    1200 obs. of  7 variables:
##  $ id    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ extro : num  63.7 69.5 79.7 63 64.2 ...
##  $ open  : num  43.4 46.9 32.3 44.4 36.9 ...
##  $ agree : num  38 31.5 40.2 30.5 37.4 ...
##  $ social: num  75.1 98.1 116.3 90.5 98.5 ...
##  $ class : Factor w/ 4 levels "a","b","c","d": 4 1 4 3 4 4 4 4 1 2 ...
##  $ school: Factor w/ 6 levels "I","II","III",..: 4 6 6 4 4 1 3 4 3 1 ...

Here we see we have two possible grouping variables – class and school. Let’s explore them a bit further:

table(lmm.data$class)
##   a   b   c   d 
## 300 300 300 300
table(lmm.data$school)
##   I  II III  IV   V  VI 
## 200 200 200 200 200 200
table(lmm.data$class, lmm.data$school)
##      I II III IV  V VI
##   a 50 50  50 50 50 50
##   b 50 50  50 50 50 50
##   c 50 50  50 50 50 50
##   d 50 50  50 50 50 50

This is a perfectly balanced dataset. In all likelihood you aren’t working with a perfectly balanced dataset, but we’ll explore the implications for that in the future. For now, let’s plot the data a bit. Using the excellent xyplot function in the lattice package, we can explore the relationship between schools and classes across our variables.

library(lattice) # for xyplot
xyplot(extro ~ open + social + agree | class, data = lmm.data,
       auto.key = list(x = .85, y = .035, corner = c(0, 0)),
       layout = c(4, 1), main = "Extroversion by Class")
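With the data explored, a first multilevel model can be fit. A minimal sketch, assuming the data frame read in above is named lmm.data (MLexamp is an arbitrary object name):

```r
library(lme4)

# A first model: random intercept for school,
# with fixed effects for the covariates
MLexamp <- lmer(extro ~ open + agree + social + (1 | school),
                data = lmm.data)
summary(MLexamp)
```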