Ability grouping of students doesn’t work

Academic achievement in England is strongly influenced by class, with those of a higher socioeconomic status (S.E.S.) more likely to achieve than those of a lower S.E.S. (Clifton & Cook, 2012). These gaps can be seen as early as three years old (Feinstein, 2003) and continue to widen as the children age (Feinstein, 2004). One of the historical measures to reduce these inequalities is ability grouping: students are placed into groups based on their test scores so they can be taught with peers of similar ability. ‘Streaming’ (called ‘tracking’ in the U.S.) divides students into groups based on their test scores across all or most of their subjects, meaning they stay with the same students across those subjects. This is similar to ‘banding’. ‘Setting’ occurs when students are put into ability groups for specific subjects, and these groups are not necessarily consistent across subjects, e.g. a student could be placed in the top set for maths but the middle set for English (Francis et al., 2017). Data on the prevalence of ability grouping is inconsistent, but the evidence suggests it is prevalent in secondary schools and, to a lesser extent, primary schools in the U.K. (Dracup, 2014). It is becoming more common in the U.S. after a drop in popularity during the 1990s (Steenbergen-Hu, Makel, & Olszewski-Kubilius, 2016).

Why are students grouped according to test scores?

Setting students is popular with parents, especially middle-class parents (Boaler, Wiliam, & Brown, 2000), so schools are incentivised to structure their classrooms this way. But just because it is popular doesn’t mean it is beneficial to the students or even that it should be used. The putative benefits of ability grouping include: allowing teachers to go at a pace suitable for students of different abilities; giving more able students greater opportunities to push themselves; establishing smaller classes for lower-attaining pupils (Ireson, 1999); and children in the lower sets receiving greater support (Francis et al., 2017). There is some evidence to suggest it works, e.g. Steenbergen-Hu, Makel, & Olszewski-Kubilius (2016), but there are methodological concerns with the supporting evidence which undermine its conclusions.

Steenbergen-Hu, Makel, & Olszewski-Kubilius (2016) conducted a meta-analysis of meta-analyses studying the effects of different types of ability grouping. They identified four types, of which I will focus on two: between-class (same-age students are placed into high, average, or low classes based on prior achievement across subjects) and within-class (teachers assign students to sub-groups within the classroom based on ability). Of the meta-analyses, seven were rated as having low methodological quality (characterised as having “major weaknesses”) and six as having moderate methodological quality. None were rated as high quality. Most of the low-quality meta-analyses lacked fundamental details like effect sizes and research design, and all the others were missing important information. The results for between-class grouping suggested a weak to nil effect (all confidence intervals included 0), with a small positive impact of within-class grouping. But the spectre of publication bias hangs over these results: although the authors assessed for it, they used a sub-optimal method. The trim and fill method has a false positive rate of close to 100% given likely values of publication bias (Carter, Schönbrodt, Hilgard, & Gervais, 2017). Publication bias also inflates effect sizes: when typical sample sizes are small and measures are noisy, the statistical significance filter means the effect sizes that get published are likely to be larger than the true effect size (Gelman, 2011). This leaves us even less confident in the results. Even for the mini meta-analysis of 12 randomised controlled trials, the highest quality studies, the effect size of between-class grouping couldn’t break free of 0 and the evidence for within-class grouping was mixed (with a meta-analytic result of a small positive effect size).
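The significance filter can be illustrated with a small simulation. This is only a rough sketch under assumptions of my own, not a reanalysis of any study cited here: the true effect (0.1 standard deviations), the sample size (30 pupils per group), the known unit variance, and the rule that only significant positive results get published are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_significance_filter(true_effect=0.1, n_per_group=30,
                                 n_studies=2000, seed=0):
    """Simulate many small two-group studies and 'publish' only the
    significant positive ones. All parameter values are illustrative
    assumptions, not figures from the literature."""
    rng = random.Random(seed)
    # Standard error of a difference in means, assuming a known SD of 1
    # in both groups (a simplification: a z-test rather than a t-test).
    se = (2 / n_per_group) ** 0.5
    all_estimates = []
    published = []
    for _ in range(n_studies):
        grouped = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        estimate = statistics.fmean(grouped) - statistics.fmean(control)
        all_estimates.append(estimate)
        # Hypothetical publication rule: positive and significant at
        # the conventional two-sided 5% level.
        if estimate / se > 1.96:
            published.append(estimate)
    return (statistics.fmean(all_estimates),
            statistics.fmean(published),
            len(published))


mean_all, mean_published, n_published = simulate_significance_filter()
print(f"Mean estimate, all studies:       {mean_all:.2f}")
print(f"Mean estimate, published studies: {mean_published:.2f}")
print(f"Studies published: {n_published} of 2000")
```

Under these assumptions a study can only reach significance if its estimate exceeds roughly 0.5, so every published estimate is several times the true effect of 0.1, even though the average across all studies sits close to the truth. A meta-analysis drawing only on the published subset would inherit that inflation.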

An example of streaming

The evidence against

Whilst ability grouping is supposed to reduce disparities between students of different S.E.S., it can widen them (Higgins et al., 2012). It can also promote social segregation (OECD, 2014), with working-class pupils, and students from some minority ethnic groups, disproportionately represented in low sets and streams (Kutnick et al., 2005). Pupils in some mathematics sets are taught as if they were identical in ability, given the same tasks at the same pace. Pupils in lower mathematics sets report being, and are observed to be, insufficiently challenged, and are expected to spend more time copying off the board than pupils in higher sets. Consequently (although this isn’t the only reason for this finding), children placed in lower sets show lower motivation (Sukhnandan & Lee, 1998).

For primary and secondary school students, there is a paucity of strong evidence showing a positive effect of ability grouping (either setting or streaming) on academic outcomes on average (Kutnick et al., 2005). Results from the PISA survey (OECD, 2009) show ability grouping within schools is related to lower performance at the system level. Looking at individual subjects, Ireson, Hallam, & Hurley (2005) investigated the effect of setting in Maths, Science, and English at G.C.S.E. and found no significant effect for any of the three subjects. However, this isn’t to say there is no evidence of its benefit for certain sub-groups. Pupils in higher sets achieve more than children in schools that did not stream their pupils, even after controlling for variations between children (Parsons & Hallam, 2014), not only because teachers can move at a faster pace but because there is a greater chance they will be taught by a more experienced and qualified teacher (see note 1). The corollary of this is that children in lower sets are taught by less experienced teachers and by those who are less likely to be subject specialists. They are also more likely to have a higher number of teachers, as their educators are more likely to leave (Kutnick et al., 2005).

As Boaler & Wiliam (2001) summarise, “streaming [appears to have] no academic benefits whatsoever, while setting confers small academic benefits on some high-attaining students, at the expense of large disadvantages for lower attainers”.

Where do these negative effects come from?

There are multiple interrelated factors which negatively affect students’ outcomes when they are grouped according to test scores. One of the main reasons is that a student’s test score is not the only variable that determines which set they will be placed in; class and Special Educational Needs diagnoses are also significant predictors (Muijs & Dunne, 2010). Groups are unable to account for changes in children’s results or for misallocation, as there is a lack of fluidity between them (Dunne, Humphreys, Sebba, Dyson, Gallannaugh, & Muijs, 2007). Lower-set classes are taught by poorer quality teachers (Slavin, 1990), in addition to the other deficiencies mentioned earlier. Placing students in sets can create an artificial ceiling, where students are excluded from higher tier study and therefore higher grades (e.g. being limited to a ‘C’ grade by taking the Foundation level paper for a subject). Pupils in lower sets often express dissatisfaction with their set and often with the school as a whole (Ball, 1981). This “anti-school” attitude has been shown to have a negative impact on outcomes (Baines & Blatchford, 2010). Students’ self-perception of ability and their confidence in being able to succeed in a subject can also be reduced by being placed in a lower set. All these factors combine to create self-fulfilling prophecies for the students: placement in lower sets due to a multitude of factors (of which previous attainment is only one) leads to reduced motivation and self-efficacy (Bandura, 1994). Coupled with poorer quality teaching and artificial ceilings, these conspire to limit student achievement and are an example of the Matthew Effect: the rich get richer and the poor get poorer (Kerckhoff & Glennie, 1999).

What’s an alternative?

With the results of numerous studies pointing to limited positive effects and wide-ranging negative effects, the evidence leads one to conclude ability grouping should no longer be practised in schools. But what should replace it? Mixed-ability grouping appears to be the fairest option given the evidence, as it benefits the most students without punishing others as strongly as ability grouping does (Taylor et al., 2017). However, there are a number of factors deterring schools from adopting mixed-ability grouping, e.g. the difficulty of changing teachers’ minds about mixed-ability teaching and the lack of exemplars to follow (Taylor et al., 2017).


  1. However, there is evidence that even students in the top set don’t benefit from this placement, as they can be disadvantaged by the fast pace and high expectations (Boaler, 1997).

