On Early Warning Systems in Education

Recently the public radio program Marketplace ran a story about the growing use of dropout early warning systems in public schools, which you can read or listen to online. I was lucky enough to be interviewed for the piece because of the role I have played in creating the Wisconsin Dropout Early Warning System. Marketplace did a great job explaining the nuances of how these systems fit into the ways schools and districts work. I wanted to use this as an opportunity to write a few thoughts about early warning systems based on my work in this area.

Not discussed in the story was the wonkier but important question of how these predictions are obtained. While much academic research discusses the merits of various models in terms of their ability to correctly identify students, far less work addresses the choice of which system to use in practice. By its nature, the problem of identifying dropouts early presents a fundamental trade-off between simplicity and accuracy. When deploying an EWS to educators in the field, then, analysts should focus not on how accurate a model is, but on whether it is accurate enough to be useful and actionable. Unfortunately, most of the research literature on early warning systems focuses on the accuracy of a specific model and not on the question of sufficient accuracy.

Part of the reason for this focus is that each model has tended to have its own definition of accuracy. A welcome recent shift in the field toward using ROC curves to measure the trade-off between false positives and false negatives now allows these discussions of simple vs. complex models to proceed on a common and robust accuracy metric. (Hat tip to Alex Bowers for working to provide these metrics for dozens of published early warning indicators.) For example, a recent report by the Chicago Consortium on School Research (CCSR) demonstrates how simple indicators such as grade 8 GPA and attendance can be used to accurately project whether a student will be on-track in grade 9. Using ROC curves, the CCSR can show on a common scale how accurate these indicators are relative to more complex ones, and it makes a compelling case that in Chicago Public Schools these indicators are sufficiently accurate to merit use.
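To make the ROC idea concrete, here is a minimal sketch of how an indicator can be scored this way. The data are invented, and the example is written in Python for brevity rather than in any particular EWS's codebase: it sweeps every threshold of a "low GPA" risk score, records the true-positive and false-positive rates, and computes the area under the resulting curve.

```python
# Illustrative sketch: scoring a simple "low GPA" dropout indicator with
# an ROC curve. All data here are invented for the example.

def roc_points(scores, labels):
    """Sweep every threshold over `scores` (higher = higher risk) and
    return (false_positive_rate, true_positive_rate) pairs."""
    pairs = sorted(zip(scores, labels), reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Risk score is simply "how far below a 4.0 GPA the student sits".
gpas    = [3.8, 3.5, 3.1, 2.9, 2.4, 2.0, 1.7, 1.2]
dropout = [0,   0,   0,   0,   1,   0,   1,   1]   # 1 = did not graduate
scores  = [4.0 - g for g in gpas]

points = roc_points(scores, dropout)
print(round(auc(points), 3))  # prints 0.933 for this invented data
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect classification, which is what makes the metric a common scale for comparing very different indicators.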

However, in many cases these simple approaches will not be sufficiently accurate to merit use in school decision making. Many middle school indicators in the published literature have true dropout identification rates that are quite low and false-positive rates that are quite high (Bowers, Sprott, and Taff 2013). Furthermore, local conditions may mean that a linkage between GPA and dropout that holds in Chicago Public Schools is not nearly as predictive in another context. Additionally, though not empirically testable in most cases, many EWS indicator systems simply provide a numeric account of information that is apparent to schools in other ways -- that is, the indicators selected identify only "obvious" cases of students at risk of dropping out. In this case the overhead of collecting data and running the model does not pay off in new actionable information with which to intervene.

More complex models have begun to see use, perhaps in part as a response to the challenge of providing value beyond simple checklist indicators. Unlike checklist or indicator systems, machine learning approaches determine the risk factors empirically from historical data. Instead of asserting that an attendance rate above 95% is necessary to be on track to graduate, a machine learning algorithm identifies the attendance rate cutoff that best predicts successful graduation. Better still, the algorithm can do this while considering several other factors simultaneously. This is the approach I have previously written about taking in Wisconsin, and it has also been developed in Montgomery County Public Schools by Data Science for Social Good fellows.
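As a toy illustration of what "determining the cutoff empirically" means, the sketch below (in Python, with invented data) tries every observed attendance rate as a candidate cutoff and keeps the one that misclassifies the fewest students, much as a single split in a decision tree would. A real system would of course search many variables jointly.

```python
# Illustrative sketch (invented data): search for the attendance-rate
# cutoff that best separates graduates from non-graduates.

attendance = [0.99, 0.97, 0.96, 0.94, 0.92, 0.90, 0.85, 0.80]
graduated  = [1,    1,    1,    1,    0,    1,    0,    0]

def best_cutoff(rates, outcomes):
    """Try a cutoff at each observed rate; classify students at or above
    the cutoff as on-track, and keep the cutoff with the fewest errors."""
    best = (None, len(outcomes) + 1)
    for cutoff in sorted(set(rates)):
        errors = sum(
            (rate >= cutoff) != bool(grad)
            for rate, grad in zip(rates, outcomes)
        )
        if errors < best[1]:
            best = (cutoff, errors)
    return best

cutoff, errors = best_cutoff(attendance, graduated)
print(cutoff, errors)  # prints 0.9 1
```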

In fact, the machine learning model is much more flexible than a checklist approach. Once you have moved away from the desire to provide simple indicators that can be applied by users on the fly, and are willing to deliver analytics much like another piece of data, the sky is the limit. Perhaps the biggest advantage to users is that machine learning approaches allow analysts to help schools understand the degree of student risk. Instead of providing a simple yes or no indicator, these approaches can assign probabilities to student completion, allowing the school to use this information to decide on the appropriate level of response. 
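A minimal sketch of what this looks like in practice: the tier boundaries below are invented placeholders (a district would choose its own), but they show how a predicted probability supports a graded response in a way a binary flag cannot.

```python
# Illustrative sketch: turning predicted completion probabilities into
# tiered responses instead of a yes/no flag. Tier boundaries are
# invented placeholders for the example.

def risk_tier(p_graduate):
    """Map a predicted probability of graduating to an intervention tier."""
    if p_graduate < 0.50:
        return "high risk: intensive intervention"
    elif p_graduate < 0.80:
        return "moderate risk: targeted monitoring"
    return "low risk: universal supports"

predictions = {"A": 0.35, "B": 0.72, "C": 0.95}
for student, p in predictions.items():
    print(student, risk_tier(p))
```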

This concept of degree is important because not all dropouts are simply the lowest-performing students in their classes. While low-performing students do represent a majority of dropouts in many schools, these students are often already identified and being served because of their low performance. A true early warning system, then, should identify not only the students schools already know about, but also likely non-completers who may not yet be receiving intervention services. To live up to their name, early warning systems should identify students before they begin showing acute signs of low performance or disengagement in school. This is where the most value can be delivered to schools.

Despite the improvements possible with a machine learning approach, a lot of work remains to be done. One issue raised in the Marketplace story is understanding how schools put this information to work. An EWS alone will not improve outcomes for students -- it only gives schools more time to make changes. There has not been much research on how schools use information like an early warning system to make decisions about students. More work needs to be done to understand how schools as organizations respond to analytics like early warning indicators. What are their misconceptions? How do they work together? What are the barriers to trusting these more complex calculations and the data that underlie them?

The drawback of the machine learning approach, as the authors of the CCSR report note, is that the results are not intuitive to school staff, which makes the resulting intervention strategy seem less clear. This trade-off strikes at the heart of the changing ways in which data analysis is assisting humans in making decisions. The lack of transparency in the approach must be balanced by an effort on the part of the analysts providing the prediction to communicate the results. Communication can make the results easier to interpret, build trust in the underlying data, and build capacity within organizations to create the feedback loops necessary to sustain the system. Analysts must actively seek out feedback on the performance of the model, learn where users are struggling to understand it, and learn where users find it clashing with their own observations. This is a critical piece in ensuring that the trade-off in complexity does not undermine the usefulness of the entire system.

EWS work represents just the beginning of meaningful analytics replacing the deluge of data in K-12 schools. Schools don't need more data; they need actionable information that reduces the time pulled away from instruction and student services. Analysts don't need more student data; they need meaningful feedback loops with the educators who are tasked with interpreting these analyses and applying the interventions to drive real change. As more work is done to integrate machine learning and eased data collection into the school system, much more work must be done to understand the interface between school organizations, individual educators, and analytics. Analysts and educators must work together to continually refine what information schools and teachers need to be successful and how best to deliver that information, in an easy-to-use form, at the right time.

Further Reading

Read about the machine learning approach applied in Montgomery County Public Schools

Learn about the ROC metric and how various early warning indicators have performed relative to one another in this paper by Bowers, Sprott, and Taff. 

Learn about the Wisconsin DEWS machine learning system and how it was developed

Read the comparison of many early warning indicators and their performance within Chicago Public Schools. 

Of Needles and Haystacks: Building an Accurate Statewide Dropout Early Warning System in Wisconsin

For the past two years I have been working on the Wisconsin Dropout Early Warning System (DEWS), a predictive model of on-time high school graduation for students in grades 6-9 in Wisconsin. The goal of the project is to give schools and educators an early indication of the likely graduation of each of their students, early enough to allow time for individualized intervention. The result is that nearly 225,000 students receive an individualized prediction at the start and end of the school year. The workflow for the system is mapped out in the diagram below:

The system is moving into its second year of use this fall, and I recently completed a research paper describing the predictive analytic approach taken within DEWS. The paper is intended to serve as a description of, and guide to, the decisions made in developing an automated prediction system using administrative data. It covers both the data preparation and model building process and a review of the results. A preview is shown below, demonstrating how the EWS models trained in Wisconsin compare to the accuracy reported in the research literature, represented by the points on the graph. Accuracy is measured using the ROC curve. The article is now available via figshare.

The colored lines represent different types of ensembled statistical models and their accuracy across various thresholds of their predicted probabilities. The points represent the accuracy of comparable models in the research literature, using reported accuracy from a paper by Alex Bowers:

Bowers, A.J., Sprott, R.*, Taff, S.A.* (2013). Do We Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The High School Journal, 96(2), 77-100. doi:10.1353/hsj.2013.0000. This article provides good background and grounds the benchmarking of the models built in Wisconsin; it can do the same for others benchmarking their own models.

Article Abstract:

The state of Wisconsin has one of the highest four-year graduation rates in the nation, but deep disparities among student subgroups remain. To address this, the state has created the Wisconsin Dropout Early Warning System (DEWS), a predictive model of student dropout risk for students in grades six through nine. The Wisconsin DEWS is in use statewide and currently provides predictions on the likelihood of graduation for over 225,000 students. DEWS represents a novel statistical learning based approach to the challenge of assessing the risk of non-graduation for students and provides highly accurate predictions for students in the middle grades without expanding beyond mandated administrative data collections.

Similar dropout early warning systems are in place in many jurisdictions across the country. Prior research has shown that in many cases the indicators used by such systems do a poor job of balancing the trade-off between correct classification of likely dropouts and false alarms (Bowers et al., 2013). Building on this work, DEWS uses the receiver operating characteristic (ROC) metric to identify the best possible set of statistical models for making predictions about individual students.

This paper describes the DEWS approach and the software behind it, which leverages the open source statistical language R (R Core Team, 2013). As a result, DEWS is a flexible series of software modules that can adapt to new data, new algorithms, and new outcome variables, not only to predict dropout but also to impute key predictors. The design and implementation of each of these modules is described in detail, as is the open-source R package, EWStools, that serves as the core of DEWS (Knowles, 2014).

Code:

The code that powers the EWS is an open source R extension of the caret package, which is available on GitHub: EWStools on GitHub

On School Boards and Policy Shocks

The dissertation process has many steps, and the prospectus or proposal is one of the last. A while ago I was lucky enough to have my dissertation proposal defense and pass! My project seeks to understand the linkage between political activity at the state level and voter and candidate participation at the local level. To evaluate this, I take the case of Wisconsin--an extreme example of domain-specific policy activity--and ask whether the events of the last two years in Wisconsin, particularly around education reform, drove more candidates and more voters to participate in school board elections statewide.

There are over 14,000 school boards in the United States, and they are responsible for annual expenditures equal to those of the US Department of Defense. However, little is known about the democratic process by which individuals on school boards come into office. Some work has been done on large urban school boards, but it has largely concerned itself with either the question of mayoral control or of district-wide vs. regional board electoral districts. The broader question of whether school boards are democratic institutions that respond to community pressures and feature meaningful participation has been studied only intermittently since the 1970s. Worse, the political dynamics between state and federal policymaking and local participation in school board elections have received little or no attention over this same period, despite a large increase in both state and federal involvement.

Classic model of dissatisfaction theory as presented by Wu 1995

The dissatisfaction model is a nice theoretical model, but it leaves something to be desired in terms of generating predictions and helping us understand the school board as part of a democratic system. It describes only the actions of the board, not the decisions of voters to turn out or the congruence between voter beliefs and voter turnout.

In my research I found a political science dissertation out of Stanford that helped with this. Wu 1995 proposed a much more fully developed game-theoretic model of the interaction between voters, board members, and policy. The model is depicted below. 

Wu 1995 Model of Political Game for School Boards

This is a classic game-theoretic model, simplified here, explaining the conditions under which various actors undertake certain actions. Its most important feature is that voters decide to vote based on policy decisions, and board members base policy decisions on the likelihood of voters turning out and defeating them. This paints a much more comprehensive system than dissatisfaction theory, but builds on that theory nicely. Wu's work also dovetails well with that of other scholars trying to incorporate public choice models (Rada 1987, 1988).

These innovations are necessary to help understand the Wisconsin political context. The main puzzle of school board elections is whether they retain the features of a democratic entity. Rational choice theory does a lot to move the discussion away from requiring school boards to have high-turnout elections to be considered democratic--indeed, if the school board is passing policy in line with voters' preferences, there is no need for voters to vote. However, no serious empirical tests of the models above have been conducted to understand their predictive power and how closely they reflect reality.

The timeline below gives a picture of political activity in education at the state level in Wisconsin. This historic and unprecedented political and policy activity, focused very closely on issues related to education--school budget cuts, reduction in collective bargaining rights for public workers, etc.--allows a test of the democratic linkages between state and local policy. There is no doubt that state-level politics in Wisconsin have never seen a more active electorate.

Wisconsin Political Timeline 2011-Present

If state and local politics are linked in their activity levels, then we should see corresponding increases in the activity levels of citizens participating in local elections. In particular, in policy areas of high contestation at the state level, we would expect strong increases in political activity at the local level as well. The figure below sketches how state and local politics might be linked.

Knowles Model of Wisconsin School Boards

The essential idea is that the number of challengers in a school board election is influenced by the congruence of voter preferences vis-à-vis the state policy changes, the overall support for Governor Walker's controversial reforms, and the strength of independent interest groups--particularly teachers' unions--in the school district. All of these shape the policy decisions of school boards, both in making budget reductions in response to the fiscal tightening in the new state budget and in deciding whether to suspend or extend union contracts in response to the new local authority given to districts in bargaining with public employees.

This is in turn linked to voter participation. Voters will only turn out in elections that present serious choices among candidates, but they may not uniformly turn out even in these cases. Finally, the results of the election have an impact on the makeup of the board, and possibly on its policy direction, depending on the number of incumbents defeated. The table below summarizes the expected relationships between the dependent and independent variables in the study.

Dependent Variable      | Independent Variable       | Expected Relationship
----------------------- | -------------------------- | ---------------------
Candidate Participation | Unity on State Policy      | Negative
Candidate Participation | Prior Challenger Emergence | Null
Voter Participation     | Prior Turnout              | Positive
Voter Participation     | Number of Challengers      | Positive
Voter Participation     | Policy Divergence          | Positive
Union Policy            | Union Strength             | Policy Resistance
Union Policy            | Walker Support             | Policy Support
Budget Policy           | Budgetary Health (up)      | Fewer Cuts
Budget Policy           | Walker Support             | Greater Cuts

A few rows need explanation. In the first row, the more unified a community is in support of Governor Walker's policies, the less likely it is that a higher-than-usual number of candidates will emerge, because of policy stability in the district. Prior challenger emergence, then, should have little predictive effect because of the changed policy environment.

For voter participation, policy divergence is a critical variable. Here, policy divergence means the split among voters in their support of the policies enacted at the state level over the last two years. The wider the divergence, the more likely the election will be contested and voters have reason to mobilize and participate.

For union policy, the stronger the union in a district--not surprisingly--the more likely the school board is to resist the policies enacted at the state level, where possible. This will be counteracted by the strength of support for the governor among the electorate.

Finally, for budget policy, the healthier the budget of a district, the fewer cuts in the budget should be experienced. Net of that, districts with high support for the Governor should experience greater cuts.

Conclusion

That, in a nutshell, is the dissertation proposal. You can read more about it by reading the official abstract submitted to my political science department, or the full proposal, both available here. I encourage you to do so, and to check back here as the project progresses!

All Your Source Code Are Belong to... Nature?

The journal Nature recently published an interesting op-ed discussing the need to make source code available for scientific articles that require statistical computation to produce their results.
The article hits on a point that is absolutely critical: statistical computing is difficult. Honest mistakes get made. A lot. The peer review process catches theoretical flaws, omitted bibliographic references, and some methodological problems, based on the amount of detail provided in the article itself. But all of those flaws could be absent and an article could still be fatally flawed, drawing completely false conclusions simply due to an error in the code--and it would still be published if that code was never reviewed or made public.
A big concern here is transparency, as the authors state so well:
Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility.
And of course, R and Sweave are mentioned as an elegant solution to this problem:
There are a number of tools that enable code, data and the text of the article that depends on them to be packaged up. Two examples here are Sweave, associated with the programming language R and the text-processing systems LaTeX and LyX, and GenePattern-Word RRS, a system specific to genomic research. Sweave allows text documents, figures, experimental data and computer programs to be combined in such a way that, for example, a change in a data file will result in the regeneration of all the research outputs.
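For readers who have not seen it, a minimal Sweave document interleaves LaTeX with R code chunks in a single file; the file and variable names below are invented for illustration:

```latex
% example.Rnw -- a minimal Sweave source file (illustrative only).
% Running Sweave("example.Rnw") in R executes the chunk and weaves its
% results into the LaTeX before compilation, so a changed data file
% regenerates the reported numbers automatically.
\documentclass{article}
\begin{document}

<<echo=FALSE>>=
x <- read.csv("measurements.csv")$value  # re-read on every run
@

The mean of the \Sexpr{length(x)} measurements is \Sexpr{round(mean(x), 2)}.

\end{document}
```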
Technology has changed the tools necessary to ensure rigor and replicability in science, but not the principle behind it. It is great to see a journal such as Nature making the case for this level of scrutiny to be applied to the computational routines used to derive results.