Brexit in graphs

This is a collection of graphs showing how people voted and other interesting statistics. Some you might not have seen and others you definitely will have. I’m not going to include every graph, especially the most common ones, as you will almost certainly have seen them. Please remember to take all the polls with a pinch of salt (only a small sample of people can be asked and it may not be representative, people may have given socially desirable answers, lied, etc.). If there are any graphs you feel I have missed, please comment below and I will add them.

*Please note that while many of these graphs focus on immigration I do not think it’s the only reason (nor, by extension, fear of foreigners) people voted Leave. These graphs are meant to show people’s views and the related data.*


Before the referendum, much was made of the difference between Leave and Remain voters in their rating of the importance of immigration for their decision.

For all those surveyed, immigration was the most important issue but it was closely followed by the impact on the economy.

Important issues

Prior to the referendum, Leave voters were far more likely to say immigration has had a negative impact on “Britain as a whole”. When asked if they had been personally affected, that number dropped.

Some people have argued that the belief that there are too many immigrants in Britain is new. But British people have thought there were too many immigrants for decades. If anything, the belief that there are too many immigrants is in decline.

But when voters were surveyed after they had voted, a different story emerged. 49% of Leave voters said the biggest single reason for wanting to leave the EU was “the principle that decisions about the UK should be taken in the UK”. 33% said the main reason was that leaving “offered the best chance for the UK to regain control over immigration and its own borders.” 13% said remaining would mean having no choice “about how the EU expanded its membership or its powers in the years ahead.” (Ashcroft, 2016).

This was supported by a ComRes poll conducted on the 24th which found: “the ability of Britain to make its own laws is cited by Leave voters as the most important issue when deciding which way to vote (53%), ahead of immigration (34%).” (ComRes, 2016).

But what does the data tell us about the impact of immigration?

Immigration has dramatically increased in the last decade or so.

Yet there appears to be no negative effect on people’s wages or employment due to increased immigration.


Many on the Leave side have argued people voted Leave because they had been adversely affected by immigration. If voters backed Leave because they had suffered from increased immigration, you would expect to see a correlation between voting Leave and a fall in hourly earnings. But there is no such correlation. This is evidence against (though not a refutation of) the idea that people voted Leave as a rational response to the negative economic effects they had suffered as a result of immigration.

Education and voting patterns:

Whilst education level was the strongest correlate of voting Remain, it’s not as simple as “stupid people voted to leave”. Areas with lower education levels are also the areas that have borne the brunt of economic hardship. They are therefore more likely to hold unfavourable views of the status quo (which has not helped them in the past) and, by extension, of the Remain campaign.

Dependency on the EU and voting patterns:

The graph below shows which areas were given funding by the EU over different time periods.

Ciaran Jenkins (2016).

Income and voting patterns:

There was a negative correlation between income and voting Leave: those who earned less were more likely to vote Leave.


Personality and voting patterns:

A strong negative correlation (r = -0.67) was found between openness (being open to new experiences, “having wide interests, and being imaginative and insightful”; Srivastava, 2016) and voting Leave: areas with a higher concentration of people scoring highly on openness were more likely to vote Remain.

Correlations between certain personality factors and voting behaviour were also found by Eric Kaufmann. He analysed participants’ voting behaviour and compared it with their answers to questions that examined their authoritarianism (roughly, how in favour someone is of obeying authority). There was almost no correlation with income, but there was a correlation between voting Leave and agreeing that the death penalty is appropriate for certain crimes (among white voters only).

Views on social issues:

For this graph, people were asked whether they thought different social issues were a force for “good” or “ill”. After that, they stated which way they voted (Leave or Remain). So it shows what percentage of people voted Leave or Remain, given their views on different issues. It is not a poll showing how people who voted Leave or Remain view these issues. For example, it doesn’t show that 81% of Leave voters think multiculturalism is a “force for ill”. It shows that of those who think multiculturalism is a “force for ill”, 81% voted Leave. So those who hold that view were more likely to vote Leave.

Why so many scientists are anti-Brexit:

Britain receives a lot of science funding from the EU, and it is uncertain how much we would receive after leaving (though it will almost certainly be less).

EU science funding

Voter turnout and satisfaction:

These two aren’t graphs (yet…) but they are important, especially the first. Whilst it’s true the elderly overwhelmingly voted Leave and the young voted Remain, the (estimated) turnout among young people was very low. So the meme that “it’s completely the old people’s fault!” isn’t totally accurate.

This was further supported by this graph which shows a correlation between age and voter turnout for different areas.


Rather unsurprisingly, Leave voters were happier with the result than Remain voters. Indeed, the vast majority of Leave voters were happy, with only 1% (of those sampled) stating they were unhappy with the result. This puts the anecdotes of people voting Leave without properly thinking it through, and then worrying about the consequences, in context.


Despite the startling drop in the FTSE 100, it ended no lower than it was 7 days earlier (though it got there in a more eye-catching way). As some have correctly pointed out, the FTSE 100 has recovered significantly since the initial drop. But that is only because the pound has fallen in value, so it is an artificial recovery.

The drop in the value of the pound, though, was more serious when compared with long-term trends, as it fell to its second-lowest value on record.

Compared with the euro, the pound is not doing as badly, though the euro has been struggling for years and the climb seen at the start of the graph is the result of recovery from the 2008 financial crash.


Ashcroft, M. (2016). How the United Kingdom voted on Thursday… and why. [online] Available at:

Burn-Murdoch, J. (2016). Brexit: voter turnout by age. [online] Available at:

ComRes. (2016). Sunday Mirror Post Referendum Poll. [online] Available at:

The Economist. (2016). The European experiment. [online] Available at:

Ipsos-MORI (2016). Final Referendum Poll. [online] Available at:

Ipsos-MORI (2016). Just one in five Britons say EU immigration has had a negative effect on them personally. [online] Available at:

Jenkins, C. (2016). [online] Available at:

Krueger, J. I. (2016). The Personality of Brexit Voters. [online] Available at:

Kaufmann, E. (2016). [online] Available at:

Sky Data (2016). [online] Available at:

Srivastava, S. (2016). Measuring the Big Five Personality Factors. Retrieved [2016] from

Taub, A. (2016). Making Sense of ‘Brexit’ in 4 Charts. [online] Available at:

Vox (2016). Brexit was fueled by irrational xenophobia, not real economic grievances. [online] Available at:

Notes on Paul Meehl’s “Philosophical Psychology Session” #02

These are the notes I made whilst watching the video recordings of Paul Meehl’s philosophy of science lectures. This is the second episode (a list of all the videos can be found here). Please note that these posts are not designed to replace or be used instead of the actual videos (I highly recommend you watch them). They are to be read alongside to help you understand what was said. I also do not include everything that he said (just the main/most complex points).

  • Popper did not accept the verifiability criterion of meaning. Popper never said falsifiability was a criterion of meaning; it was his criterion of demarcation (separating science from non-science).
  • There is no experimental/quantitative evidence for Freud (though there is empirical data). Popper rejected induction completely.
  • Unscientific theories don’t give you examples of things that would show them to be wrong, just things that will confirm them.
  • P -> Q (the conditional) is equivalent to ~P v Q (P is false or Q is true), which is equivalent to ~(P & ~Q) (it is not the case that P is true and Q is false). P is sufficient for Q, and Q is necessary for P.
  • If there is a semantic connection between the propositions, use the stronger entailment notation (⊢) rather than the material conditional.


  • Implicative syllogism: P -> Q, P therefore Q. Valid figure. Used when a theory is predicting an event. Modus ponens.
  • P -> Q, ~P therefore ~Q. Invalid (denying the antecedent). If Nixon is honest I’ll eat my hat; Nixon isn’t honest; you can’t conclude I won’t eat my hat.
  • Q -> P, P therefore Q. Invalid (affirming the consequent). All inductive reasoning is formally invalid (if the theory is true then the facts will be so; the facts are so, therefore the theory is true). Hence all empirical reasoning is merely probable, and hence a theory can never be proved in the sense of Euclid. Used when a piece of evidence is taken to support a law.
  • P -> Q, ~Q therefore ~P. Valid. If Newton is right, then the star will be here; the star is not here; therefore Newton is wrong. Used to refute a scientific law or theory. Modus tollens (destructive mood), the 4th figure of the implicative syllogism.
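The four figures above can be checked mechanically by enumerating truth assignments. A minimal sketch in Python (the function names are mine; the labels match the bullets above):

```python
from itertools import product

# Material conditional: P -> Q is false only when P is true and Q is false.
def implies(p, q):
    return (not p) or q

# Check the equivalences P -> Q == ~P v Q == ~(P & ~Q) for every assignment.
for p, q in product([True, False], repeat=2):
    assert implies(p, q) == ((not p) or q)
    assert implies(p, q) == (not (p and not q))

def valid(premises, conclusion):
    """An argument form is valid iff no truth assignment makes all
    premises true while making the conclusion false."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(prem(p, q) for prem in premises)
    )

# Modus ponens: P -> Q, P, therefore Q -- valid.
assert valid([implies, lambda p, q: p], lambda p, q: q)
# Denying the antecedent: P -> Q, ~P, therefore ~Q -- invalid.
assert not valid([implies, lambda p, q: not p], lambda p, q: not q)
# Affirming the consequent (the form of inductive reasoning): invalid.
assert not valid([implies, lambda p, q: q], lambda p, q: p)
# Modus tollens: P -> Q, ~Q, therefore ~P -- valid.
assert valid([implies, lambda p, q: not q], lambda p, q: not p)
```

The invalid figures fail precisely because some assignment (e.g. P false, Q true) makes all premises true while the conclusion is false.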


  • Facts control the theory collectively over the long-haul (rather than just being dismissed after one piece of counter evidence). If the theory is robust enough/substantiated enough, allowed to roll with a piece of counter-evidence. There’s no specified point where this theoretical tenacity becomes unscientific.
  • Empirical science cannot be like formal set theory/mathematics, as it deals with probabilities.
  • Demarcation of scientific theory from non-science.
  • We don’t just state whether a theory has been “slain” or not. There is some implicit hierarchy (based on evidence). Popper developed the idea of corroboration: a theory is corroborated when it has been subjected to a test and not refuted, and the riskier the test (the greater the chance of falsification, because it makes more precise predictions), the better the theory is corroborated. A test is risky if it carves a narrow interval out of a larger interval.
  • You need to calculate the prior probability
  • Look at theories that predict the most unlikely results.
  • Main problem with NHST as a way of evaluating theories: within parameters (set by previous evidence or common sense), you merely say the result will fall within half the plausible range (so a 50% chance of a “hit”). Not impressive.
  • Salmon’s principle: the principle of the damn strange coincidence (a highly improbable coincidence). Absent the theory, knowing roughly what range of values occurs, being able to pick out one narrow number would be a strange coincidence. So if a theory picks out that narrow number and it comes up true, the theory is strongly corroborated.
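The contrast between an NHST-style directional prediction and a Salmon-style risky prediction can be put in numbers. A toy sketch (the uniform range and the interval widths are my own illustrative assumptions, not Meehl’s):

```python
# Toy illustration of test riskiness (illustrative numbers only).
# Absent any theory, suppose the outcome is equally likely anywhere in [0, 100].
plausible_range = 100.0

# NHST-style directional prediction: "the value will fall in the upper half".
p_directional = 50.0 / plausible_range   # 0.5 -- an unimpressive "hit"

# Risky prediction: "the value will fall between 41.5 and 42.5".
p_risky = 1.0 / plausible_range          # 0.01 -- a damn strange coincidence

# The narrower the predicted interval, the less probable a hit is absent the
# theory -- so a hit corroborates the theory much more strongly.
print(p_directional, p_risky)
```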


  • Salmon believes you can attach probability numbers to theories. He talked about confirmation (which Popper rejected), but his approach gives the same numbers as Popper’s way of doing things. Salmon does this by using Bayes’ formula.


  • Bayes’ theorem (a criticism of the Neyman–Pearson and Fisherian approaches): picking white marbles from urns (you don’t know which urn the marble came from).
  • P = prior probability that we picked urn 1 (here 1/3); Q = prior probability that we picked urn 2.
  • p1= probability that I draw a white marble from urn 1 (conditional)
  • p2= probability that I draw a white marble from urn 2
  • Posterior probability/inverse probability/probability of causes: probability that if I got a white marble, I got it from urn 1
  • Posterior = (P × p1) / (P × p1 + Q × p2), where P × p1 is the probability that you drew from urn 1 and got a white marble, and the denominator P × p1 + Q × p2 (drew from urn 1 and got white PLUS drew from urn 2 and got white) is the probability that you got a white marble, period.
  • Clinical example:
  • p1 = probability of a certain symptom, conditional on having schizophrenia.
  • Prior probability- what’s the probability that someone has schizophrenia?
  • Posterior probability- what’s the probability that someone producing this Rorschach response has schizophrenia?
  • You have a certain prior on the theory, and the theory strongly implies a certain fact (p1 is large: a good chance of the fact occurring). Without a theory, the Q × p2 term is blank; it gets filled in as “fairly small” IF you used a precise/risky test, as it’s unlikely you could guess with that precision. That means P × p1 is quite big relative to the denominator, so the ratio is big, so the theory is well supported.
  • Salmon says you want large priors (Popper says small), but both recommend risky tests that are more likely to falsify your theory (due to their precise predictions).
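The urn example can be run end to end with Bayes’ formula. A minimal sketch (the prior of 1/3 for urn 1 is from the lecture; the values of p1 and p2 are my own illustrative assumptions):

```python
# Priors: probability that we picked urn 1 vs urn 2 (as in the lecture, P = 1/3).
P, Q = 1/3, 2/3

# Conditionals: probability of drawing a white marble from each urn
# (illustrative numbers).
p1, p2 = 0.8, 0.1

# Posterior ("inverse probability", "probability of causes"): the probability
# that the marble came from urn 1, given that we drew a white marble.
posterior_urn1 = (P * p1) / (P * p1 + Q * p2)
print(round(posterior_urn1, 3))  # 0.8
```

Even though urn 1 had the smaller prior (1/3), the white marble is so much more likely under urn 1 that the posterior rises to 0.8, which is exactly the structure of Meehl’s clinical example with symptoms and schizophrenia.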


  • Lakatos: research programmes (amended theories that started out with leading programs). Kuhn: revising certain things about the theory until it has died and then you have a paradigm shift.
  • Popper says it’s far more important to predict new results than to explain old ones.


Yonce, J. L., 2016. Philosophical Psychology Seminar (1989) Videos & Audio, [online] (Last updated 05/25/2016) Available at: [Accessed on: 06/06/2016]

Notes on Paul Meehl’s “Philosophical Psychology Session” #01

These are the notes I made whilst watching the video recording of Paul Meehl’s philosophy of science lectures. This is the first episode (a list of all the videos can be found here). Please note that these posts are not designed to replace or be used instead of the actual videos (I highly recommend you watch them). They are to be read alongside to help you understand what was said. I also do not include everything that was discussed (just the main/most complex points).

  • The power of the hard sciences doesn’t come from operational verbal definitions but from the tools of measurement and the mathematics.
  • A subset of the concepts must be operationally defined otherwise it doesn’t connect with the facts.
  • Methodological remarks = remarks in the meta-language (statements about science and the relations between them: properties of statements, relations between statements, relations between beliefs and evidence, e.g. true, false, rational, unknown, confirmed by data, fallacious, deducible, valid, probable) rather than in the object language (language that speaks about the subject matter, e.g. protons, libido, atom, g, reinforce).
  • Hans Reichenbach was wrong about induction
  • Pure observations are infected by theory (FALSE for psychology). If the protocol you record is infected by theory, you are a bad scientist, e.g. choosing to look at one thing rather than another just because of a theory, OR falsifying data to fit your theory.
  • Watson’s theory that learning took place in the muscles (via proprioceptive feedback) was falsified by rats being able to negotiate a maze almost as quickly after the neural pathways carrying proprioceptive feedback were severed, or when the maze was flooded.
  • Operationalism (we only know a concept if we can measure it, and all necessary steps for demonstrating meaning or truth must be specified) sparked psychology’s obsession with operationalising our terms (even though the harder sciences we are trying to emulate are not as rigorous about it), but Carnap suggests it is folly.
  • Logical positivism- taking things that could not be doubted by any sane person and building up from there a justification for science, and with the math and logic on top of the protocols you “coerce them” into believing in science. Urge for certainty.
  • Analyse science and rationally reconstruct (justify) it, show why a rational person should believe in science. Negative aim: liquidation of metaphysics (by creating meaning criterion).
  • A statement is cognitively meaningless if you don’t know how to verify it (either empirically or logically)- the criterion of meaning. The meaning of a sentence is the method of its verification (a statement of affirmative meaning). A sentence’s meaning is derived from the evidence that supports it (“the meaning of a sentence is to be found entirely in the conditions under which it could be verified by some possible experience”*). Rejected because the sentence “Caesar crossed the Rubicon” would mean COMPLETELY different things to us and to a centurion at Caesar’s side, because we have different evidence.
  • Lots of our information comes from “authorities” (even though it’s a logical fallacy). We have to calibrate the authority and often we presume someone has done it for us so we trust it.



Mattey, G.J., 2005. Schlick on Meaning and Verification. [pdf] G.J. Mattey. Available at: <> [Accessed on: 06/06/2016]

Yonce, J. L., 2016. Philosophical Psychology Seminar (1989) Videos & Audio, [online] (Last updated 05/25/2016) Available at: [Accessed on: 06/06/2016]

In defence of preregistration

This post is a response to “Pre-Registration of Analysis of Experiments is Dangerous for Science” by Mel Slater (2016). Preregistration is stating what you’re going to do and how you’re going to do it before you collect data (for more detail, read this). Slater gives a few examples of hypothetical (but highly plausible) experiments and explains why preregistering the analyses of the studies (not preregistration of the studies themselves) would not have worked. I will reply to his comments and attempt to show why he is wrong.

Slater describes an experiment with a between-groups design: 2 conditions (experimental & control), 1 response variable, and no covariates. You find the expected result, but it’s not exactly as you predicted. It turns out the result is totally explained by the gender of the participants (a variable you weren’t initially analysing but which was balanced by chance). So it’s gone from a 2-group analysis to a 2×2 analysis (with the experimental & control conditions as one factor and male & female as the other).

Slater then argues that (according to preregistration) you must preregister and conduct a new experiment because you have not preregistered those new analyses (which examine the role of gender). The example steadily gets more detailed (with other covariates discovered along the way) until the final analysis is very different from what you initially expected. He states that you would need to throw out the data and start again every time you find a new covariate or factor, because it wasn’t initially preregistered. The reason you would need to restart your experiment is that doing a “post hoc analysis is not supposed to be valid in the classical statistical framework”. So because you didn’t preregister the analyses you now want to perform, you need to restart the whole process. This can lead to wrong conclusions being drawn, as complex (but non-predicted) relationships may be missed: the original finding will be published (often it’s too expensive, too time-consuming, or simply not possible to run the experiment again with the new analyses) and the role of gender (and the other covariates) won’t be explored.

This is, however, a fundamental misunderstanding of what preregistration of analyses is. If you perform new analyses on your data that weren’t preregistered, you don’t need to set up another study. You can perform these new analyses (which you didn’t predict before the experiment began), but you have to be explicit in the Results section that this was the case (Chambers, Feredoes, Muthukumaraswamy, & Etchells; 2014). Post hoc analyses of data are very common (Ioannidis, 2005), and preregistration is directly trying to counter their undisclosed use.

Later in the post, he argues “discovery is out the window” because this occurs when “you get results that are not those that were predicted by the experiment.” Preregistration would therefore stifle discovery as you have to conduct a new study for each new analysis you want to perform. He states “Registrationists” argue for an ‘input-output’ model of science where “thought is eliminated”.

This is a fair concern, but it has already been answered by the FAQ page for Registered Reports (link here) and many other places. To summarise, discovery will not be stifled because you can perform the non-predicted analyses but you have to clearly state they weren’t predicted. The only thing you aren’t allowed to do is pretend you were going to conduct that analysis initially which is called HARKing, or hypothesising after results known (Kerr, 1998).

Slater argues that because data in the life and social sciences is so messy (as compared with physics) it is much harder to make the distinction between ‘exploratory’ and ‘confirmatory’ experiments. He implies preregistration requires a harsh divide between them so confirmatory experiments cannot become exploratory (which often happens in the real world) because they weren’t preregistered. Whilst there would be a clearer divide between exploratory and confirmatory experiments, preregistration does not forbid the latter becoming the former (merely that you are open about what you’ve done). Having a clear divide between the two is very important for maintaining the value of both types of experiments (de Groot, 2014).

He argues that (due to the pressure to publish positive results) researchers could “run their experiment, fish for some result, and then register it”. But this is not “possible without committing fraud” (Chambers, Feredoes, Muthukumaraswamy, & Etchells; 2014). You have to share the time-stamped raw data files that were used in the study, so it can be seen when the data was collected. This helps reduce the chance of fraud and ensures studies are performed properly.

He argues that currently there is not enough thought put into the analysis process. He states this based on the fact results sections start with F-tests and t-tests rather than presenting the data in tables and graphs and discussing it. Researchers look straight for the result they were expecting and only focus on those, potentially missing other important aspects. Preregistration, he believes, would exacerbate this problem.

Whilst I agree there is an over-emphasis on getting P<0.05 in the literature, preregistration will not make this problem worse. If anything, preregistration could help reduce the collective obsession with P<0.05, because if a study is preregistered and accepted for publication (based on the quality of the methods) then it doesn’t rely on a significant value to be published (see here for a diagram of the registration and publication process). It also makes replications of previous findings more attractive to researchers because publication doesn’t depend on the results; we know that dependence on results has led to the neglect of replications (Nosek, Spies, & Motyl, 2012).

Could preregistration increase the likelihood that researchers focus solely on their preregistered analyses and ignore other potential findings? Maybe, but this worry is very abstract. This is contrasted with the very real (and very damaging) problem of questionable research practices (QRPs) which we know plague the literature (John, Loewenstein, & Prelec; 2012) and have a negative impact (Simmons, Nelson, & Simonsohn; 2011). Preregistration can help limit these QRPs.

Is preregistration the panacea for psychology’s replication crisis? No, but then it never claimed to be. It’s one of the (many) tools to help improve psychology.


Bowman, S.; Chambers, D.C.; & Nosek, B.A. (2014). FAQ 5: Scientific Creativity and Exploration.  [OSF open-ended registration] Available at: [Accessed on 19/05/2016].

Chambers, D.C. (2015). Cortex’s Registered Reports: How Cortex’s Registered Reports initiative is making reform a reality. Available at: [Accessed on 16/05/2016]

Chambers, D.C.; Feredoes, E.; Muthukumaraswamy, S.D.; & Etchells, P.J. (2014). Instead of “playing the game” it is time to change the rules: Registered Reports at AIMS Neuroscience and beyond. AIMS Neuroscience, 1 (1), 4-17.

de Groot AD. (2014) The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han L. J. van der Maas]. Acta Psychologica (Amst), 148, 188-194.

Ioannidis, J.P.A. (2005) Why Most Published Research Findings Are False. PLoS Med 2: e124.

John, L.K.; Loewenstein, G.; & Prelec, D. (2012). Measuring the Prevalence of Questionable
Research Practices With Incentives for Truth Telling. Psychological Science, 23 (5), 524–532.

Kerr, N.L. (1998). HARKing: Hypothesising After the Results are Known. Personality and Social Psychology Review, 2 (3), 196-217.

Nosek, B.A.; J.R. Spies; & Motyl, M. (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science,  7 (6), 615-631.

PsychBrief (2016). Keep Calm and Preregister. [Online]. Available at: [Accessed on 23/05/2016]

Slater, M. (2016). Pre-Registration of Analysis of Experiments is Dangerous for Science [Online]. Available at: [Accessed on 15/05/2016].

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.

Podcast list
I’ve recently discovered podcasts and they are awesome. They’re a great way to learn interesting new things, especially when you’re travelling. So this post is a collection of fantastic podcasts that I listen to and would recommend you pick up. Any suggestions are welcome so please let me know if there are any you like. (*= my favourites).

Social and Life sciences:

*Everything Hertz: Discussions about biological psychiatry, psychology, and the process of science with heavy sarcasm. (iTunes) (Soundcloud)

*The Black Goat: Three psychologists discuss how to perform science and the various issues that face scientists e.g. publication pressure, how to be an open scientist etc. (website) (iTunes)

BOLD Signals: Interviews with a wide variety of scientists such as neuroscientists, science informationists, and cognitive neuroscientists (and many others) on a huge range of topics. (iTunes) (Soundcloud)

Science VS: A podcast that takes an interesting topic e.g. the “gay gene”, and examines the evidence for and against it. (iTunes)

*Say Why to Drugs: Cutting through the hype and hyperbole about different drugs and examining what the research actually tells us about them. (website) (iTunes)

Invisibilia: Fascinating episodes on broad ranging topics related to how we experience the world e.g. is there a “solution” to mental health, and is thinking like this part of the problem? (website)

Stuff You Should Know: One topic or idea is examined in great detail and explained in each episode with topics ranging from sleep terrors to nitrous oxide. (website) (iTunes)

Unsupervised Thinking: Discussions about specific areas within neuroscience and AI e.g. brain size, the Connectome, etc. (website) (iTunes)


Not So Standard Deviations: Informal talks about statistics in academia and industry, covering a huge variety of topics in an entertaining and engaging way. (website) (iTunes)

More or Less: Tim Harford explains – and sometimes debunks – the numbers and statistics used in political debate, the news and everyday life. (website)


The Partially Examined Life: In-depth analysis of famous philosophical books or ideas with no presumed prior knowledge. (website) (iTunes)

*Very Bad Wizards: Discussions between a philosopher and a moral psychologist (with occasional guests) about current events, social issues, and research from their fields. (website) (iTunes)


*PhDivas: A cancer scientist and literary critic talk about life in academia, science, and how social issues affect academia. (website) (iTunes)

Intelligence Squared: Hour long discussions or debates about an interesting topic featuring prominent thinkers. (website) (iTunes)

Video games cause violence

For almost as long as there have been video games, there have been people arguing that they are bad for you. There also seems to be a wealth of experimental evidence behind it (Hasan et al., 2013, to name just one of many). But there have been suggestions that these negative outcomes are oversold.


Problems with the literature:

One of the strongest pieces of evidence for the negative effects of video games is a meta-analysis by Anderson et al. (2010). They found strong evidence that “exposure to violent video games is a causal risk factor for increased aggressive behaviour, aggressive cognition, and aggressive affect and for decreased empathy and prosocial behaviour”. However, there were immediate questions about the methodology of this meta-analysis. Ferguson & Kilburn (2010) commented that many of the studies do not relate well to aggression and that the authors do not consider the impact of unstandardised aggression measures (differences between studies in how they measured aggressive behaviour), among other things. They argue the studies analysed in Anderson et al. (2010) only show weak evidence for their conclusion. A more recent reanalysis by Hilgard, Engelhardt, and Rouder (2016) used more advanced tools to adjust for research bias and found that the short-term effects of game play on aggressive feelings and behaviour were badly overestimated by bias. The bias-adjusted estimates from Hilgard et al. (2016) were mostly substantially lower than those reported by Anderson et al. (2010), though some adjustments were smaller; in some cases, e.g. aggressive affect, the estimate was adjusted to zero. This does not completely eliminate the original findings, but I feel we should adjust our estimate of the strength of the causal association downwards.

Ferguson (2007a) conducted a meta-analysis to examine the relationship between violent video games and aggression demonstrated in a lab environment. He found strong evidence of publication bias, showing that the inclusion of unpublished or suppressed studies made the result of the meta-analysis non-significant and/or trivial. It was also shown that participant age was a significant moderator, which research showing a positive association, e.g. Anderson & Dill (2000) (one of the most cited studies), did not control for. Anderson & Dill (2000) also didn’t control for family background, which may lead both to a preference for aggressive video games and to aggressive behaviour.

Variety when there shouldn’t be:

Most of these studies use a tool called the “Competitive Reaction Time Task”, also known as the “Taylor Aggression Paradigm” (TAP). It is supposed to measure aggressive behaviour in the laboratory. But there are a huge number of ways it is used: 120 studies were found to have used the TAP, yet in 147 different ways (at the time of writing), e.g. some use electric shocks, others use noise blasts. This makes comparing results between studies difficult and raises questions about why there is so much variety. Are researchers using a variety of methods and reporting only the measure that produces a positive result? For more detail, check out this excellent website by Malte Elson (2016). An analysis by Elson et al. (2014) found that not only was there a wide variety of ways the TAP was being implemented, there were also many different data analysis strategies. This raises further questions about researchers looking for positive results and publishing only those methods that produce them. Elson et al. analysed data from 3 studies and found that, depending on which strategy was used, p-values and effect sizes showed very different results (and sometimes even a reversed effect). This further undermines the credibility of research using the TAP. Ferguson (2007a) points out that studies use the TAP even though “none of these indices of “aggression” [have] been linked with actual criminally violent behavior” (these “indices” being giving noise blasts to other participants and other methods). This further questions the ability of studies using the TAP to show how violent video games make people more violent: even if games make people more aggressive in the TAP, that might not correspond to an increase in real-world aggression.
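The flexibility problem is easy to demonstrate with simulated data. The sketch below is entirely hypothetical (the data, group labels, and the four scoring strategies are mine for illustration; this is not Elson et al.'s dataset): the same raw trials yield different group differences depending on how the trials are reduced to one score.

```python
import random
import statistics

# Hypothetical simulated TAP-style data: each participant delivers 25
# trial-level "noise blast" intensities from 1 to 10.
random.seed(1)

def participant():
    return [random.randint(1, 10) for _ in range(25)]

violent_game = [participant() for _ in range(30)]
neutral_game = [participant() for _ in range(30)]

# A few of the many published ways to reduce 25 trials to one "aggression" score:
strategies = {
    "mean of all trials":   statistics.mean,
    "first trial only":     lambda t: t[0],
    "count of high blasts": lambda t: sum(x >= 8 for x in t),
    "maximum intensity":    max,
}

# Same raw data, different scoring rules -> different group differences,
# which is what makes undisclosed analytic flexibility so dangerous.
for name, score in strategies.items():
    diff = (statistics.mean(score(p) for p in violent_game)
            - statistics.mean(score(p) for p in neutral_game))
    print(f"{name:>20}: group difference = {diff:+.2f}")
```

With 147 documented quantification variants, a researcher who tries several and reports only the one that "works" can manufacture a positive result from null data, which is exactly the concern raised above.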

There have also been concerns about the validity of this tool for measuring aggression in the laboratory. Tedeschi & Quigley (1996) first identified the problem of demand characteristics for this method (participants acting as they think the researcher wants them to), which would undermine the real-world relevance of findings that use it. They developed their criticism (Tedeschi & Quigley, 2000) to include the difficulty of measuring the concept of aggression with the TAP, and the way researchers have selected evidence supporting the TAP's convergent validity whilst ignoring evidence that doesn't.

As for research demonstrating a link between video game violence and actual aggressive behaviour, Ferguson (2007b) found only a very weak relationship. However, he again found evidence of publication bias which, once corrected for, reduced the relationship to almost zero. The link between video games and real-life aggression has been further questioned by Ferguson (2014): his paper showed that as video game sales increase, youth violence decreases, suggesting (on a macro scale) that there isn't a clear relationship between video game violence and real-world violence.
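The publication-bias mechanism Ferguson corrects for can be sketched in a few lines of Python (all numbers are invented for illustration): even when the true effect is zero, averaging only the studies that "got published" produces a spurious positive effect.

```python
import random

random.seed(1)

TRUE_EFFECT = 0.0            # assume violent games have no real effect
N_PER_STUDY = 50
SE = 1 / N_PER_STUDY ** 0.5  # rough standard error of each study's estimate

# Simulate 1,000 small studies, each estimating the (null) effect with noise
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(1000)]

# Publication bias: only significant positive results (z > 1.96) get published
published = [d for d in estimates if d / SE > 1.96]

mean_all = sum(estimates) / len(estimates)
mean_published = sum(published) / len(published)
print(f"True effect: {TRUE_EFFECT}")
print(f"Mean of all studies:       {mean_all:+.3f}")
print(f"Mean of published studies: {mean_published:+.3f}  (n = {len(published)})")
```

A naive meta-analysis of the "published" studies alone would report a clear effect that does not exist; correcting for the missing studies pulls the estimate back towards zero, exactly the pattern Ferguson describes.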

Minimal impact:

Etchells et al. (2016) analysed 1,815 children's video game exposure at age 8/9 and their rates of conduct disorder (CD) and depression at 15. They found a very weak association between video game exposure and CD when controlling for confounders (e.g. sex, bullying, peer problems, etc.), but only 26 children met the criteria for CD and the measure only just reached significance: p=0.05 fully adjusted (Table 2). They found no association between video game exposure and depression when controlling for confounders (Table 2). There was also only a weak association between children playing more violent games (shoot-em-ups vs competitive games) and an increased risk of CD (Table 4). Given the weak evidence and the fact they did not consider the impact of other media exposure, I agree with their conclusion that assuming this association is causal is "inappropriate".

A meta-analysis found that video games have a minimal impact on increased aggression, reduced prosocial behaviour, reduced academic performance, depressive symptoms, and attention deficit symptoms, even after controlling for publication bias (Ferguson, 2015). A re-analysis of the study by Furuya-Kanamori & Doi (2016) replicated the statistically significant but very small impact video games have on several outcomes. A longitudinal study by Ferguson et al. (2012) found that video game exposure did not relate to any of the negative outcomes analysed, whilst exposure to family violence and antisocial personality traits did predict aggressive behaviour. However, this study used self-report measures of aggressive behaviour and video game exposure, so caution is warranted.

Drummond & Sauer (2014) analysed data from 192,000 students across 22 countries who took part in the Programme for International Student Assessment (PISA) and found video game use had a negligible impact on academic success in Science, Mathematics, and Reading. The study's large sample size, psychometrically valid tests, and natural environment are strengths, whilst the reliance on self-report for the frequency of video game playing is a weakness. McCarthy et al. (2016) found playing video games (either violent or nonviolent) had no impact on participants' aggressive inclinations. This study was preregistered (for a detailed description of what study preregistration is and why it's a good thing, click here), so we can be more confident that the results aren't the product of p-hacking. Together, these studies indicate that video games have less of an impact than a quick read of the literature would imply.

It’s (always) more complicated:

A Bayesian reanalysis of many of the studies and meta-analyses in the violent video game literature was conducted by Hilgard, Engelhardt, Bartholow, & Rouder (2016). They found the evidence varied hugely: some studies strongly supported the null hypothesis (violent video games had no impact on the dependent variable), some found only weak evidence for the null, and some may even have found evidence for the alternative hypothesis. Their results also suggest many of the "matched" non-violent and violent video games used in studies weren't in fact well matched, which means comparisons between the two types of game cannot rule out confounding variables as an explanation for any differences. They recommend larger sample sizes and Bayesian analysis for future studies to improve the strength of the evidence.
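To make Hilgard et al.'s recommendation concrete, here is a minimal sketch (invented t-values and sample sizes, not figures from any cited study) of one crude way to quantify evidence for the null from a reported t-test: the BIC approximation to the Bayes factor. BF01 > 1 favours the null; BF01 < 1 favours the alternative.

```python
import math

def bf01_from_t(t, n):
    """BIC approximation to the Bayes factor favouring the null (H0)
    for a two-sample t-test with n participants in total:
    BF01 = sqrt(n) * (1 + t^2/df)^(-n/2), where df = n - 2."""
    df = n - 2
    return math.sqrt(n) * (1 + t * t / df) ** (-n / 2)

# The same small t-statistic from increasingly large (hypothetical) studies:
for n in (40, 400, 4000):
    print(f"n = {n:4d}, t = 0.5 -> BF01 = {bf01_from_t(0.5, n):6.1f}")
```

Note how the same unimpressive t = 0.5 yields progressively stronger evidence for the null as the sample grows, which is why larger samples plus Bayesian analysis can distinguish "no effect" from "not enough power" in a way p > .05 alone cannot.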


Does this mean that you can do whatever you want, play for 7 hours a day, 6 days a week, and everything will be rosy? No. Some research has found that playing more than 3 hours a day is linked with higher levels of hyperactivity and conduct issues (Przybylski & Mishkin, 2015) and lower levels of prosocial behaviour (Przybylski, 2014). But the same research found a positive impact of playing roughly 1 hour a day on hyperactivity and conduct issues (Przybylski & Mishkin, 2015) and prosocial behaviour (Przybylski, 2014). The idea of an ideal amount of time to spend playing video games was corroborated by Przybylski & Weinstein (2017), who conducted a preregistered study of just over 120,000 children examining the link between video game use and mental health. They found that "moderate use of digital technology is not intrinsically harmful", running counter to many previous studies suggesting it has a negative impact. There is more and more research demonstrating the potential positives of engaging in video games (e.g. Eichenbaum, Bavelier, & Green, 2014), but that is another post for another time. I hope I have demonstrated that the evidence for the negative impacts of video games (especially with regards to causing real-world violence) is overstated.


2DJimmyN. (2008). Angry Video Game Nerd Artwork. Available at: [Accessed: 5th April 2016].

Anderson, C.A. & Dill, K.E. (2000). Video Games and Aggressive Thoughts, Feelings, and Behaviour in the Laboratory and in Life. Journal of Personality and Social Psychology, 78 (4), 772-790.

Anderson, C.A.; Shibuya, A.; Ihori, N.; Swing, E.L.; Bushman, B.J.; Sakamoto, A.; Rothstein, H.R.; & Saleem, M. (2010). Violent Video Game Effects on Aggression, Empathy, and Prosocial Behaviour in Eastern and Western Countries: A Meta-Analytic Review. Psychological Bulletin, 136 (2), 151-173.

Chambers, C. (2012). Changing the culture of scientific publishing from within [online] Available at: [Accessed on: 19/09/2016]

Drummond, A. & Sauer, J.D. (2014) Video-Games Do Not Negatively Impact Adolescent Academic Performance in Science, Mathematics or Reading. PLoS ONE 9(4): e87943. doi: 10.1371/journal.pone.0087943

Eichenbaum, A.; Bavelier, D.; & Green, C.S. (2014). Video Games: Play that Can Do Serious Good. American Journal of Play, 7, 50-72.

Elson, M. (2016). Flexibility in Methods and Measures of Social Science, [online] 3 April. Available at: [Accessed on: 05/04/2016].

Elson, M.; Mohseni, M.R.; Breuer, J.; Scharkow, M.; & Quandt, T. (2014). Press CRTT to measure aggressive behavior: The unstandardized use of the competitive reaction time task in aggression research. Psychological Assessment, 26 (2), 419-432.

Etchells, P.J.; Gage, S.H.; Rutherford, A.D.; & Munafo, M.R. (2016). Prospective Investigation of Video Game Use in Children and Subsequent Conduct Disorder and Depression Using Data from the Avon Longitudinal Study of Parents and Children. PLOS ONE. Available at: [Accessed on 05/04/2016].

Ferguson, C.J. (2007a). Evidence for publication bias in video game violence effects literature: A meta-analytic review. Aggression and Violent Behaviour, 12 (4), 470-482.

Ferguson, C.J. (2007b). The Good, the Bad, and the Ugly: A Meta-analytic Review of Positive and Negative Effects of Violent Video Games. Psychiatric Quarterly, 78, 309-316.

Ferguson, C.J. & Kilburn, J. (2010). Much Ado About Nothing: The Misestimation and Overinterpretation of Violent Video Game Effects in Eastern and Western Countries: Comment on Anderson et al. (2010). Psychological Bulletin, 136 (2), 174-178.

Ferguson, C.J.; Miguel, S.M.; Garza, A.; & Jerabeck, J.M. (2012). A longitudinal test of video game violence influences on dating and aggression: A 3-year longitudinal study of adolescents. Journal of Psychiatric Research, 46 (2), 141-146.

Ferguson, C.J. (2014). Does Media Violence Predict Societal Violence? It Depends What You Look At and When. Journal of Communication, 65 (1), E1-E22.

Ferguson, C.J. (2015). Do Angry Birds Make for Angry Children? A Meta-Analysis of Video Game Influences on Children’s and Adolescents’ Aggression, Mental Health, Prosocial Behavior, and Academic Performance. Perspectives on Psychological Science, 10 (5), 646-666.

Furuya-Kanamori, L. & Doi, S.A.R. (2016). Angry Birds, Angry Children, and Angry Meta-Analysts: A Reanalysis. Perspectives on Psychological Science, 11 (3), 408-414.

Hasan, Y.; Begue, L.; Scharkow, M.; & Busman, B.J. (2013). The more you play, the more aggressive you become: A long-term experimental study of cumulative violent video games effects on hostile expectations and aggressive behaviour. Journal of Experimental Social Psychology, 49, 224-227.

Higgins, J.P.T. & Green, S. (2011). Trim and fill. Cochrane Handbook for Systematic Reviews of Interventions [online] Available at: [Accessed on 05/04/2016].

Hilgard, J.; Engelhardt, C.R.; Bartholow, B.D.; & Rouder, J.N. (2016). How Much Evidence Is p > .05? Stimulus Pre-Testing and Null Primary Outcomes in Violent Video Games Research. [online] Available at:

Hilgard, J.; Engelhardt, C.R.; & Rouder, J.N. (2016). Overestimated Effects of Violent Video Games on Aggressive Outcomes in Anderson et al. (2010). Early view article. Published 11/03/2016. Available at: [Accessed: 5th April 2016].

Przybylski, A.K. (2014). Electronic Gaming and Psychosocial Adjustment. Pediatrics, 134 (3). Available through: PsychBrief’s DropBox <> [Accessed 05/04/2016].

Przybylski, A.K. & Mishkin, A.F. (2015). How the Quantity and Quality of Electronic Gaming Relates to Adolescents’ Academic Engagement and Psychosocial Adjustment. Psychology of Popular Media Culture. Advance online publication

Przybylski, A.K. & Weinstein, N. (2017). A Large-Scale Test of the Goldilocks Hypothesis: Quantifying the Relations Between Digital-Screen Use and the Mental Well-Being of Adolescents. Psychological Science, 28 (2), 204-215.

Tedeschi, J.M. & Quigley, B.M. (1996). Limitations of laboratory aggression paradigms for studying aggression. Aggression and Violent Behaviour, 1 (2), 163-177. 

Tedeschi, J.M. & Quigley, B.M. (2000). A further comment on the construct validity of laboratory aggression paradigms: A response to Giancola and Chermack. Aggression and Violent Behaviour, 5 (2), 127-136.

Collection of criticisms of Adam Perkins’ ‘The Welfare Trait’

In late 2015, Dr Adam Perkins published his book 'The Welfare Trait'. The crux of his argument was that each generation supported by the welfare state becomes more work-shy. He also argued that the welfare state increases the number of children born into households where neither parent works. His solution is to change the welfare state to limit the number of children that each non-working household can have.

His book caused quite a storm when it was first released. Some people argued that it was crudely-disguised eugenics, others argued that those who were dismissing it were refusing to face the facts. Over time, more and more criticisms of and problems with Perkins’ work have come to light (e.g. basic statistical errors and incorrect conclusions from papers). Below is a collection of some (but not all) of the criticisms levelled at Perkins’ book.

Storify. (2016). Criticisms of Adam Perkins and ‘The Welfare Trait’ (with images, tweets) · PsychologyBrief. [online] Available at: [Accessed 11 Mar. 2016].

Stereotype threat

Don't you just love being wrong? Of course you don't, no one does. But there is a grim satisfaction in no longer believing something there isn't good enough evidence for. This is what I experienced after examining the phenomenon known as 'stereotype threat'. In short, it's the idea that groups with negative stereotypes about them feel anxiety when those stereotypes are made salient (and are therefore more likely to confirm them), e.g. the stereotype that women are inferior to men at maths.

I believed in it (as evidenced by the fact I've written about it before) because there appeared to be a lot of evidence in favour of it. But there have been some significant failed replications (Ganley et al., 2013; Stricker & Ward, 2004; Stafford, 2016; Finnigan & Corker, 2016; Wei, 2012, available here). These were large-scale replications and (in the case of Stricker & Ward and Wei) field experiments: they used data from actual tests with stereotype threat manipulations. This is in contrast with the positive replications and initial studies, which typically involved low numbers of participants taking a zero-consequence test in a laboratory. The fact that the largest studies found no effect of stereotype threat is significant. There have also been suggestions of publication bias, with evidence from Flore & Wicherts (2015). This was also suggested by Ganley et al. (2013), who found that 80% of published articles reported at least one instance of stereotype threat but none of the unpublished articles did (though that doesn't necessarily mean this result was due to publication bias).

The original conclusion by Steele & Aronson (1995) has been called into question by Sackett, Hardison, & Cullen (2004). They point out that Steele & Aronson statistically adjusted the students' test results to control for differences in prior SAT performance. So the (small) differences seen between the baseline test and the "threat" condition may have been due to the differences in prior SAT performance.

There have also been issues raised with some of the positive replications. Stoet & Geary (2012) analysed 20 attempted replications of stereotype threat (for women in mathematics). They found only 55% of the studies replicated it but almost all of those (8/11) had adjusted the scores of participants to control for differences in maths ability (using a prior test performance e.g. SAT, as a covariate).

The two results above are a problem because the variable being examined is the difference in mathematics scores. The experimental hypothesis attributes that difference to stereotype threat, on the assumption that the groups don't differ on the covariate of prior mathematical ability; but if they do differ on it, then the covariate itself might explain the difference, making stereotype threat irrelevant (see Jussim, Crawford, Anglin, Stevens, & Duarte, 2016, for an elaboration).
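The covariate problem can be demonstrated with a toy simulation (hypothetical data with no threat effect built in at all): if the groups differ in prior ability, a raw score gap appears that covariate adjustment then removes, so which analysis a study reports determines whether an "effect" seems to exist.

```python
import random
import statistics as st

random.seed(0)

# Hypothetical data: two groups whose test scores depend ONLY on prior
# ability -- there is no "threat" effect anywhere in the simulation.
def make_group(mean_prior, n=200):
    prior = [random.gauss(mean_prior, 1.0) for _ in range(n)]
    score = [p + random.gauss(0, 0.3) for p in prior]  # score tracks prior ability
    return prior, score

prior_a, score_a = make_group(0.0)   # group A: higher prior ability
prior_b, score_b = make_group(-0.8)  # group B: lower prior ability

raw_gap = st.mean(score_a) - st.mean(score_b)

# Adjust for the covariate: regress score on prior ability (pooled OLS),
# then compare the groups' mean residuals.
prior = prior_a + prior_b
score = score_a + score_b
mp, ms = st.mean(prior), st.mean(score)
slope = (sum((p - mp) * (s - ms) for p, s in zip(prior, score))
         / sum((p - mp) ** 2 for p in prior))
resid = [s - (ms + slope * (p - mp)) for p, s in zip(prior, score)]
adjusted_gap = st.mean(resid[:200]) - st.mean(resid[200:])

print(f"Raw score gap:      {raw_gap:+.2f}")
print(f"Covariate-adjusted: {adjusted_gap:+.2f}")
```

The raw gap here is entirely an artefact of prior ability, and adjustment makes it vanish; in a real study, where we cannot be sure the groups are matched on the covariate, the choice between the raw and adjusted analysis can therefore manufacture or erase an apparent threat effect.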

I don't think stereotype threat doesn't exist at all, but it seems to be a lot more complicated than simply "making stereotypes salient will reduce performance". I think its relevance in the real world is hugely over-played and there are a lot of interacting variables at work which affect its impact. Whilst even small effect sizes (as found in most of the studies) can have impacts in the real world, I am unconvinced that spending time and resources on reducing stereotype threat is worthwhile when other factors are known to negatively affect mathematics performance (Ceci & Williams, 2010). I am willing to be convinced of its importance, but there needs to be much stronger evidence for it.

For a nuanced discussion of the possible reasons for positive effects in the lab but null effects in the field, I recommend Stricker (2008). For an excellent in-depth statistical analysis as to why it’s difficult to replicate the original stereotype threat finding, I suggest you read this post by Replication Index (2015).


Ceci, S. J., & Williams, W. M. (2010). Sex differences in math-intensive fields. Current Directions in Psychological Science, 19, 275-279. doi: 10.1177/0963721410383241

(2015). Salient. [Online] Available at: [Accessed 21/12/2015]

Finnigan, K.M.; Corker, K.S. (2016). Do performance avoidance goals moderate the effect of different types of stereotype threat on women’s math performance? Journal of Research in Personality, 63, 36-43.

Flore, P.C. & Wicherts, J.M. (2015). Does stereotype threat influence performance of girls in stereotyped domains? A meta-analysis. Journal of School Psychology, 53 (1), 25-44.

Ganley, C. M., Mingle, L. A., Ryan, A. M., Ryan, K., Vasilyeva, M., & Perry, M. (2013). An Examination of Stereotype Threat Effects on Girls’ Mathematics Performance. Developmental Psychology. Advance online publication. doi: 10.1037/a0031412

Jussim, L.; Crawford, J. T.; Anglin, S. M.; Stevens, S. T.; & Duarte, J. L. (2016). Interpretations and methods: Towards a more effectively self-correcting social psychology. Journal of Experimental Social Psychology, 66, 116-133.

PsychBrief (2014). Overcoming stereotype threat. [Online] Available from: [Accessed: 21/12/2015]

Replication-Index (2015). Why are Stereotype-Threat Effects on Women’s Math Performance Difficult to Replicate? [Online] Available at: [Accessed: 21/12/2015]

Sackett, P.R.; Hardison, C.M.; & Cullen, M.J. (2004). On Interpreting Stereotype Threat as Accounting for African American–White Differences on Cognitive Tests. American Psychologist, 59 (1), 7-13.

Stafford, T. (2016). No stereotype threat effect in international chess, Annual Conference of the Cognitive Science Society, 10-13th August 2016, Philadelphia, USA.

Stoet, G. & Geary, D.C. (2012). Can Stereotype Threat Explain the Gender Gap in Mathematics Performance and Achievement? Review of General Psychology, 16 (1), 93-102.

Steele, C.M. & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69 (5), 797-811.

Stricker, L.J. (2008). The Challenge of Stereotype Threat for the Testing Community. [Online] Available from: [Accessed: 21/12/2015]

Stricker, L.J. & Ward, W.C. (2004). Stereotype Threat, Inquiring About Test Takers’ Ethnicity and Gender, and Standardized Test Performance. Journal of Applied Social Psychology, 34 (4), 665–693

Wei, T.E. (2012). Sticks, Stones, Words, and Broken Bones; New Field and Lab Evidence on Stereotype Threat. Educational Evaluation and Policy Analysis, 34 (4), 465-488

How biased are you? The role of intelligence in protecting you from thinking biases.

People generally like to believe they are rational (Greenberg, 2015). Unfortunately, this isn't usually the case (Tversky & Kahneman, 1974). People very easily fall prey to thinking biases which stop them from making a purely rational judgement (whether always making a rational judgement is a good thing is a discussion for another time). These are flaws in thinking, e.g. the availability bias, where you judge the likelihood of an event or the frequency of a class by how easily you can recall an example (Tversky & Kahneman, 1974). So after seeing a shark attack in the news, people think the probability of a shark attack is much higher than it is (because they can easily recall an example of one).

But what are some of the factors that protect you against falling for these thinking biases? You would think that the smarter someone is, the less likely they are to be affected by them. However, the evidence paints a different picture.

The first bias we are going to look at is called the “myside bias”, which is defined as the predisposition to “evaluate evidence, generate evidence, and test hypotheses in a manner biased toward their own prior beliefs” (Stanovich, West, & Toplak, 2013). The ability to view an argument from both sides and decouple your prior opinions from the situation is seen as a crucial skill for being rational (Baron, 1991; Baron, 2000). Interestingly, there have been multiple experiments showing that susceptibility to myside bias is independent of cognitive ability (Stanovich & West, 2007; Stanovich & West, 2008; Stanovich, West, & Toplak, 2013); it doesn’t matter how smart you are, you are just as likely to evaluate something from your perspective if you aren’t told to do otherwise.

Not only is there evidence that the myside bias is uncorrelated with intelligence, there is further evidence that a whole host of thinking biases are unrelated to intelligence (Stanovich & West, 2008), including but not limited to the anchoring effect, framing effects, and the sunk-cost effect. Further evidence from Teovanovic, Knezevic, & Stankov (2015) supports the idea that intelligence doesn't protect you from these biases: intelligence was only weakly correlated, or not correlated at all, with performing well and therefore avoiding them.

It has even been shown that the more intelligent someone is, the more likely they are to feel that others are more biased than they are and that they themselves are more rational by comparison (West, Meserve, & Stanovich, 2012). This is called the "bias blind spot" (people are blind to their own biases). Another study (Scopelliti et al., 2016) found susceptibility to the bias blind spot is largely independent of intelligence, cognitive ability, and cognitive reflection.

However, it's not a completely level playing field. On some tests where people might fall prey to thinking biases, e.g. the selection task (Wason, 1966), intelligence was correlated with success; the more intelligent a participant was, the more likely they were to get it right (Stanovich & West, 1998).

You would think being an expert in a field would also be a factor that helps you resist biases, but for the hindsight bias it appears not to matter whether you are an expert or not; you are just as likely to fall for it (Guilbault et al., 2004).

Some have argued that these biases aren’t actually biases at all (Gigerenzer, 1991) or that they are just performance errors or “mere mistakes” rather than systematic irrationality (Stein, 1996). However, these views have been argued against by Kahneman & Tversky (1996) and Stanovich & West (2000) respectively. Stanovich and West conducted a series of experiments testing some of the most famous biases and found that performance errors accounted for very little of the variation in answers, whereas computational limitations (the fact people aren’t purely rational) accounted for most of the times people fell for biases.

So it seems that being intelligent or an expert doesn’t always protect you against cognitive biases (and can even make you less aware of your own shortcomings). But what can? I’ll be exploring the techniques to protect yourself from biases in my next blog post.


Baron, J. (1991). Beliefs about thinking. In Voss, J.; Perkins, D.; & Segal, J. (Eds.), Informal reasoning and education (p.169-186). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Baron, J. (2000), Thinking and Deciding (3rd Ed.). Cambridge, UK: Cambridge University Press.

Gigerenzer, G. (1991). How to Make Cognitive Illusions Disappear: Beyond “Heuristics and Biases”. European Review of Social Psychology, 2, 83-115.

Greenberg, S. (2015). How rational do people think they are, and do they care one way or another? [Online] Available from:!How-rational-do-people-think-they-are-and-do-they-care-one-way-or-another/c1toj/5516a8030cf220353060d241
[Accessed: 21st July 2015].

Guilbault, R.L.; Bryant, F.B.; Brockway, J.H.; & Posavac, E.J. (2004). A Meta-Analysis of Research on Hindsight Bias. Basic and Applied Social Psychology, 26 (2&3), 113-117.

Kahneman, D. & Tversky, A. (1983). Choices, Values, and Frames. American Psychologist, 39 (4), 341-350.

Kahneman, D. & Tversky, A. (1996). On the Reality of Cognitive Illusions. Psychological Review, 103 (3), 582-591.

Scopelliti, I.; Morewedge, C.K.; McCormick, E.; Min, H.L.; Lebrecht, S.; & Kassam, K.S. (2016). Bias Blind Spot: Structure, Measurement, and Consequences. Management Science,  61 (10) 2468-2486.

Stanovich, K.E. & West, R.F. (1998). Individual Differences in Rational Thought. Journal of Experimental Psychology: General, 127 (2), 161-188.

Stanovich, K.E. & West, R.F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioural and Brain Sciences, 23, 645-726.

Stanovich, K.E. & West, R.F. (2007). Natural Myside Bias is Independent of Cognitive Ability. Thinking and Reasoning, 13 (3), 225-247.

Stanovich, K.E. & West, R.F. (2008). On the failure of cognitive ability to predict myside and one-sided thinking biases. Thinking and Reasoning, 14 (2), 129-167.

Stanovich, K.E. & West, R.F. (2008). On the Relative Independence of Thinking Biases and Cognitive Ability. Personality Processes and Individual Differences, 94 (4), 672-695.

Stanovich, K.E.; West, R.F.; & Toplak, M.E. (2013). Myside Bias, Rational Thinking, and Intelligence. Association for Psychological Science, 22 (4), 259-264.

Staw. B.M. (1976). Knee-Deep in the Big Muddy: A Study of Escalating Commitment to a Chosen Course of Action. Organisational Behaviour and Human Performance, 16, 27-44.

Stein, E. (1996). Without Good Reason: The Rationality Debate in Philosophy and Cognitive Science. Oxford University Press [rKES].

Teovanovic, P.; Knezevic, G.; & Stankov, L. (2015). Individual differences in cognitive biases: Evidence against one-factor theory of rationality. Intelligence, 50, 75-86.

Tversky, A. & Kahneman, D. (1974). Judgements under Uncertainty: Heuristics and Biases. Science, 185 (4157), 1124-1131.

Wason, P.C. (1966). Reasoning. In B. Foss (Ed.), New Horizons in Psychology (pp. 135-151). Harmondsworth, England: Penguin.

West, R.F.; Meserve, R.J.; & Stanovich, K.E. (2012). Cognitive Sophistication Does Not Attenuate the Bias Blind Spot. Journal of Personality and Social Psychology, 103 (3), 506-519.

Image credit:

The benefits of single-sex schooling

Many people claim that single-sex (SS) education is better for students than co-educational (CE), e.g. Jackson (2016). There have been criticisms of this idea, e.g. Halpern et al. (2011), but it is generally believed to be beneficial. But what does the evidence suggest? A large-scale meta-analysis by Pahlke et al. (2014), involving 184 studies and 1,663,662 students, compared the two on a variety of variables (mathematics performance; mathematics attitudes; science performance; science attitudes; attitudes about school; gender stereotyping; self-concept; interpersonal relations; aggression; victimisation; and body image) to see if attending a SS school benefited males, females, or both.

They looked at studies involving kindergarten to college level students (4-19 year olds). This wide age range (and the use of moderator analyses) allowed the authors to see if the effect differed across ages. They also used moderator analyses to tease out the effects of socioeconomic status (SES) and of the dosage of SS instruction (is the whole school SS, or is it just one class in a CE school?). But other factors weren't examined, e.g. whether the school was public or private. So it's good that they sought to examine some of the reasons for possible differences, but unfortunate that they ignored factors that may affect student outcomes (though the evidence on those factors is contradictory: Dronkers & Robert, 2003; Braun, Jenkins, & Grigg, 2006; Cobbold, 2015).

They addressed the "file-drawer" effect, the tendency for non-significant results to remain unpublished (Rosenthal, 1979), which is a good thing. Some studies were classified as controlled if they controlled for at least ONE of the possible selection effects, e.g. random assignment of participants to CE or SS schooling; parental SES; initial performance in the target domain; or checking for initial differences between the CE and SS groups. Both weighted and unweighted effect sizes were calculated (so the authors could see the impact that studies with enormous sample sizes had on the results).

I’ve included some of the results below:

For controlled studies on mathematics performance, the effect sizes in favour of SS were very low for both boys and girls (weighted and unweighted). The same pattern was seen for uncontrolled, unweighted studies (though slightly stronger). However, for uncontrolled, weighted studies there was a moderate positive effect for SS. This result is due to a few studies with very large samples reporting large effect sizes, e.g. Jackson (2012). There were some differences in outcomes across age: there was a weak effect size in favour of SS for females in middle school and trivial effect sizes in elementary and high school. There was also a weak effect size in favour of SS for females from middle/upper SES backgrounds. For boys, there was a small effect size favouring SS in elementary school and a small effect size favouring CE in middle school.
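The weighted/unweighted distinction is easy to see with made-up numbers: one huge study reporting a large effect barely moves the unweighted mean but dominates a sample-size-weighted one.

```python
# Hypothetical effect sizes (Cohen's d) and sample sizes for five studies;
# the last mimics one enormous sample reporting a large effect.
effects = [0.05, 0.02, -0.01, 0.04, 0.35]
sizes   = [120,  200,  150,   180,  80_000]

# Unweighted: every study counts equally, regardless of size
unweighted = sum(effects) / len(effects)

# Weighted: each study counts in proportion to its sample size
weighted = sum(d * n for d, n in zip(effects, sizes)) / sum(sizes)

print(f"Unweighted mean d: {unweighted:.3f}")  # 0.090
print(f"Weighted mean d:   {weighted:.3f}")    # 0.347
```

(Real meta-analyses usually weight by inverse variance rather than raw sample size, but the distortion works the same way: a moderate weighted effect in the uncontrolled studies can be driven almost entirely by a couple of very large samples.)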

For science performance, controlled-weighted, controlled-unweighted, and uncontrolled-unweighted studies showed almost no effect for males or females. Only uncontrolled, weighted studies showed a positive effect size for SS.

Across 8 controlled studies, they found that girls in CE schools were significantly more likely to endorse gender stereotypes (e.g. that women are not as good at maths as men). However, there was large variation between weighted and unweighted studies, so the authors urge caution when interpreting these results.

For most of the variables, there were too few studies to see what moderating effect SES had on outcomes. It would have been interesting to know what moderating effect (if any) SES had on the differences between SS and CE schools. Another criticism of the study is that it didn't examine confidence intervals (though this is almost certainly because most of the underlying studies didn't report them). This also isn't as big a problem as it might be, because the results indicate there is no real positive effect of SS schooling.

This meta-analysis indicates that the better quality the study, the lower the advantage conferred by attending a SS school. Studies that used at least one control generally found very little to no benefit of SS schooling. A few studies suggested a slight benefit of SS schooling but overall this meta-analysis suggests there is no significant benefit of attending a SS school.

This was further supported by Sohn (2016), who found the benefits of SS schools were very small once teacher- and parental-sorting were controlled for (though the effect was larger for average-performing students). These findings were contradicted by Jackson (2016), who made use of 20 low-performing pilot secondary schools in Trinidad and Tobago being converted into SS schools. After controlling for student selection, he found a small increase in grades for academic subjects on national exams. He also found the male cohorts were less likely to have arrests. This suggests there may be some nuance to the benefits of SS education, but the gains seem to be quite small even in studies that find a positive impact.

Braun, H.; Jenkins, F.; & Grigg, W. (2006). Comparing Public and Private Schools Using Hierarchical Linear Modeling. National Assessment of Educational Progress, 1-54.

Cobbold, T. (2015). A Review of Academic Studies of Public and Private School Outcomes in Australia. Education Research Brief.

Dronkers, J. & Robert, P. (2003). The Effectiveness of Public and Private Schools from a Comparative Perspective. EUI Research Repository, 1-63.

Halpern, D.F.; Eliot, L.; Bigler, R.S.; Fabes, R.A.; Hanish, L.D.; Hyde, J.; Liben, L.S.; & Martin, C.L. (2011). The Pseudoscience of Single-Sex Schooling. Science, 333 (6050), 1706-1707.

Jackson, C.K. (2012).  Single-sex Schools, School Achievement, and Course Selection: Evidence from Rule-Based Student Assignments in Trinidad & Tobago. Journal of Public Economics, 96 (1-2), 173-187.

Jackson, C.K. (2016). The Effect of Single-Sex Education on Academic Outcomes and Crime: Fresh Evidence from Low-Performing Schools in Trinidad and Tobago. NBER Working Paper No. 22222. DOI: 10.3386/w22222

Pahlke, E.; Shibley Hyde, J. & Allison, C.M. (2014). The Effects of Single-Sex Compared With Coeducational Schooling On Students’ Performance and Attitudes: A Meta-Analysis. Psychological Bulletin, 140 (4), 1042-1072.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638-641.

Sohn, H. (2016). Mean and distributional impact of single-sex high schools on students’ cognitive achievement, major choice, and test-taking behavior: Evidence from a random assignment policy in Seoul, Korea. Economics of Education Review, 52, 155-175.