[Guest post] How Twitter made me a better scientist

I’m a big fan of Twitter and have learned so much from the people on there, so I’m always happy to share someone singing its praises. This article was written by Jean-Jacques Orban de Xivry for the University of Leuven’s blog. He talks about how he uses Twitter to find out about interesting papers, along with a whole host of other benefits. The article can be found here. His Twitter account is @jjodx.

Improving the psychological methods feed

The issue of diversity has once again been raised in relation to online discussions of psychology. Others have talked about why this happens and what its consequences are. I have nothing to add on those fronts, so I’m not going to discuss them. The purpose of this post is to analyse the diversity of the one contribution to social media discussions I have total control over: the psychological methods blog feed. How many blogs by women are featured? How many non-white authors are there? How many early-career researchers (ECRs) are shared?

Looking at gender first (simply because that was the issue that started this), I coded the gender of each blog’s authors as male or female. If a blog had a wide collection of authors I excluded it from the analysis (I’m looking at you, JEPS). If there were multiple authors, I coded them individually (which is why the total number of authors is greater than the number of blogs in the feed). So how many male and female authors are there? Compared to the very low bar set by a previous analysis it’s not terrible (mainly because n>0), but it could (and should) be better. Here are my results:

For the ethnicity of the author (white or not), I judged whether they were white from their social media profiles and photos. Again, blogs with a large collection of authors were excluded, and multiple authors were coded individually. The results aren’t great:

To code whether an author was an ECR or not, I used the definition provided by the Research Excellence Framework, which states that an ECR is someone who has been hired to teach and perform research within the past 5 years (REF, 2014). To ascertain whether a blog author was an ECR I consulted their public CVs or asked them whether they believed they qualified under that definition. The descriptive statistics are:

So what does this tell us? That the vast majority of blog authors in my feed are male, white, and not ECRs. Not particularly diverse. As I said, the purpose of this isn’t to show that I should be marched naked through the streets with a nun chanting “Shame!” behind me while ringing a bell. It’s to recognise that I could do better and to ask for your help. I want to increase the variety of voices people hear through my blog feed, so do you have any suggestions? The blogs don’t need to focus exclusively on psychological methods, but they do need to discuss them. Feel free to comment on this post, contact me via Twitter (@psybrief) or Facebook (search PsychBrief and you’ll find me), or send me an email (psychbrief@gmail.com). Any names put forward are much appreciated. Please check the blog list (http://psychbrief.com/psychological-methods-blog-feed/) first to see whether I’ve already included your suggestion.



Ledgerwood, A. (2017). Why the F*ck I Waste My Time Worrying about Equality. [online] Incurably Nuanced. Available at: http://incurablynuanced.blogspot.co.uk/2017/01/inequality-in-science.html. Accessed on 25/01/2017

Ledgerwood, A.; Haines, E.; & Ratliff, K. (2015). Guest Post: Not Nutting Up or Shutting Up. [online] sometimes i’m wrong. Available at: http://sometimesimwrong.typepad.com/wrong/2015/03/guest-post-not-nutting-up-or-shutting-up.html. Accessed on 25/01/2017

Research Excellence Framework (2014). FAQ’s. [online] Available at: http://www.ref.ac.uk/about/guidance/faq/all/. Accessed on 25/01/2017



#Load ggplot2 for plotting
library(ggplot2)
#Create a character vector called "BlogName" with the names of the different blogs
BlogName<-c("Brown", "Coyne", "Allen", "Neurobonkers", "Sakaluk", "Heino", "Kruschke", "Giner-Sorolla", "Magnusson", "Zwaan", "CogTales", "Campbell", "Vanderkerckhove", "Mayo", "Funder", "Schonbrodt", "Fried", "Coyne", "Yarkoni", "Neuroskeptic", "JEPS", "Morey", "PsychBrief", "DataColada", "Innes-Ker", "Schwarzkopf", "PIG-E", "Rousselet", "Gelman", "Bishop", "Srivastava", "Vazire", "Etz", "Bastian", "Zee", "Schimmack", "Hilgard", "Rouder", "Lakens")
#Create a numeric vector called "BlogGender" coding each author as 1 (female), 2 (male), or 3 (N/a); the coded values themselves were not included in the post
#Turn BlogGender into a factor where 1 is labelled Female, 2 Male, and 3 N/a
BlogGender<-factor(BlogGender, levels=1:3, labels=c("Female", "Male", "N/a"))
#Create a data frame of the variable BlogName by the variable BlogGender
Blogs<-data.frame(Name=BlogName, Gender=BlogGender)
#Because I couldn't work out how to build the graph straight from the Blogs data frame, I counted the female and male authors by hand and stored the labels and totals in two vectors
Gender<-c("Female", "Male")
#Frequency holds the hand-counted totals corresponding to each entry of Gender
#Data frame of the two vectors
Blogsdata<-data.frame(Gender=Gender, Frequency=Frequency)
#Graph object of the data frame with gender on the x axis and frequency on the y, coloured according to the variable Gender
Gender_Graph<-ggplot(Blogsdata, aes(Gender, Frequency, fill=Gender))
#Put bars on the graph object and give it a title
Gender_Graph+geom_bar(stat="identity")+ggtitle("Number of female blog authors compared to male blog authors")

#Same blog names as above
BlogName<-c("Brown", "Coyne", "Allen", "Neurobonkers", "Sakaluk", "Heino", "Kruschke", "Giner-Sorolla", "Magnusson", "Zwaan", "CogTales", "Campbell", "Vanderkerckhove", "Mayo", "Funder", "Schonbrodt", "Fried", "Coyne", "Yarkoni", "Neuroskeptic", "JEPS", "Morey", "PsychBrief", "DataColada", "Innes-Ker", "Schwarzkopf", "PIG-E", "Rousselet", "Gelman", "Bishop", "Srivastava", "Vazire", "Etz", "Bastian", "Zee", "Schimmack", "Hilgard", "Rouder", "Lakens")
#Ethnlist codes each author as 1 (white), 2 (non-white), or 3 (N/a); turn it into a labelled factor
BlogEthn<-factor(Ethnlist, levels=1:3, labels=c("White", "Non-white", "N/a"))
#Labels and hand-counted totals for the graph
Ethn<-c("White", "Non-white")
Frequency<-c(39, 2)
Ethndata<-data.frame(Ethn=Ethn, Frequency=Frequency)
EthnGraph<-ggplot(Ethndata, aes(Ethn, Frequency, fill=Ethn))
EthnGraph+geom_bar(stat="identity")+ggtitle("Number of non-white blog authors compared to white blog authors")


#Same blog names as above
BlogName<-c("Brown", "Coyne", "Allen", "Neurobonkers", "Sakaluk", "Heino", "Kruschke", "Giner-Sorolla", "Magnusson", "Zwaan", "CogTales", "Campbell", "Vanderkerckhove", "Mayo", "Funder", "Schonbrodt", "Fried", "Coyne", "Yarkoni", "Neuroskeptic", "JEPS", "Morey", "PsychBrief", "DataColada", "Innes-Ker", "Schwarzkopf", "PIG-E", "Rousselet", "Gelman", "Bishop", "Srivastava", "Vazire", "Etz", "Bastian", "Zee", "Schimmack", "Hilgard", "Rouder", "Lakens")
#ECRlist codes each author as 1 (ECR), 2 (not an ECR), or 3 (N/a); turn it into a labelled factor
BlogECR<-factor(ECRlist, levels=1:3, labels=c("Yes", "No", "N/a"))
#Labels and hand-counted totals for the graph
ECR<-c("Yes", "No")
Frequency<-c(9, 31)
ECRdata<-data.frame(ECR=ECR, Frequency=Frequency)
ECRGraph<-ggplot(ECRdata, aes(ECR, Frequency, fill=ECR))
ECRGraph+geom_bar(stat="identity")+ggtitle("Number of non-ECR blog authors compared to ECR blog authors")

The best papers and articles of 2016

These are some of the best scientific papers and articles I’ve read this year. They’re in no particular order, not all of them were written this year, and I don’t necessarily agree with all of them. I’ve divided them into categories for convenience.


Current Incentives for Scientists Lead to Underpowered Studies with Erroneous Conclusions by Andrew Higginson and Marcus Munafò. How the current way of doing things in science encourages scientists to run lots of small scale studies with low evidentiary value.

Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy by Erick Turner, Annette Matthews, Eftihia Linardatos, Robert Tell, and Robert Rosenthal. The paper that drove home for me the suppression of negative trials for the efficacy of antidepressants and how this affected our perception of them.

Why Does the Replication Crisis Seem Worse in Psychology? by Andrew Gelman. Why psychology is at the forefront of the replication crisis.

False positive psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant by Joseph Simmons, Leif D. Nelson, and Uri Simonsohn. Essential reading for everyone. An excellent demonstration of how damaging some standard research practices can be.

Power failure: why small sample size undermines the reliability of neuroscience by Katherine Button, John Ioannidis, Claire Mokrysz, Brian Nosek, Jonathan Flint, Emma Robinson, & Marcus Munafò. A discussion of the average power of neuroscience studies, what this means, and how to improve the situation. Another must read.

Recommendations for Increasing Replicability in Psychology by Jens B. Asendorpf, Mark Conner, Filip De Fruyt, Jan De Houwer, Jaap Denissen, Klaus Fiedler, Susann Fiedler, David Funder, Reinhold Kliegl, Brian Nosek, Marco Perugini, Brent Roberts, Manfred Schmitt, Marcel van Aken, Hannelore Weber, Jelte M. Wicherts.  A list of how to improve psychology.

Do Learners Really Know Best? Urban Legends in Education by Paul Kirschner & Jeroen van Merriënboer. A critique of some of the myths in education, such as “digital natives” and learning styles.

My position on “Power Poses” by Dana Carney. Dana Carney explains why she no longer believes in the well known phenomenon “power posing”. A rare and important document that should be encouraged and celebrated.

Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking by Jelte Wicherts, Coosje  Veldkamp, Hilde Augusteijn, Marjan Bakker, Robbie van Aert, and Marcel van Assen. A checklist to consult when reading or designing a study to make sure the authors haven’t engaged in p-hacking. A very useful resource.

Instead of “playing the game” it is time to change the rules: Registered Reports at AIMS Neuroscience and beyond by Christopher Chambers, Eva Feredoes, Suresh Muthukumaraswamy, and Peter Etchells. What Registered Reports are & are not and why they are important for improving psychology.

Replication initiatives will not salvage the trustworthiness of psychology by James Coyne. Why replications, though important, are not enough to save psychology (open data & research methods are also essential)

Saving Science by Daniel Sarewitz. Why scientists need to make their research not only accessible to the public but also applicable, so as to stop science from “self-destructing”.

A Multilab Preregistered Replication of the Ego-Depletion Effect by Martin Hagger and Nikos Chatzisarantis. The paper that undermined the previously rock solid idea of ego-depletion and brought the replication crisis to the public.

Everything is Fucked: The Syllabus by Sanjay Srivastava. A collection of articles demonstrating many of the problems in psychology and associated methodologies.

Why summaries of research on psychological theories are often uninterpretable by Paul Meehl. Seminal paper by Meehl which discusses how ten obfuscating factors undermine psychological theories.


Donald Trump: Moosbrugger for President by David Auerbach. The best analysis of Trump’s personality I’ve read this year.

Hillary Clinton, ‘Smart Power’ and a Dictator’s Fall by Jo Becker and Scott Shane. An exposé on the Libyan intervention and the role Hillary Clinton played.

Your App Isn’t Helping The People Of Saudi Arabia by Felix Biederman. A brief history of how religion came to dominate life in Saudi Arabia, interviews with some of the people negatively affected by this, and how the involvement of tech innovations won’t help.

The Right Has Its Own Version of Political Correctness. It’s Just as Stifling by Alex Nowrasteh. A welcome antidote to the constant message that the left are the only ones who censor others.

Too Much Stigma, Not Enough Persuasion by Conor Friedersdorf. Why the left’s habit of tearing our own apart is so counterproductive.

When and Why Nationalism Beats Globalism by Jonathan Haidt. When and why globalism loses to nationalism in Western politics.

Democracies end when they are too democratic. And right now, America is a breeding ground for tyranny by Andrew Sullivan. The more democratic a nation becomes, the more vulnerable it is to a demagogue.

How Half Of America Lost Its F**king Mind by David Wong and Trump: Tribune Of Poor White People by Rod Dreher. People who went and spoke to Trump supporters explain why he appeals to them.

It’s NOT the economy, stupid: Brexit as a story of personal values by Eric Kaufmann. How personality affected voting patterns in the British referendum.


A crisis of politics, not economics: complexity, ignorance, and policy failure by Jeffrey Friedman. The libertarian explanation for the financial crash of ’08.

Capitalist Fools by Joseph Stiglitz. The more typical explanation for the ’08 crash.


P-Curve: A Key to the File-Drawer by Uri Simonsohn, Leif D. Nelson, and Joseph P. Simmons. A useful tool to test for publication bias in a series of results.

Statistical points and pitfalls by Jimmie Leppink, Patricia O’Sullivan, and Kal Winston. A series of publications on common statistical errors. Read them so you can avoid these mistakes.

Improving your statistical inferences by Daniel Lakens. I’m kind of cheating with this one (well, totally cheating) but this is the best resource I’ve used to develop my understanding of statistical tests and results.

The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant by Andrew Gelman and Hal Stern. A relatively simple statistical concept that isn’t as well known as it should be.

Small Telescopes by Uri Simonsohn. A helpful way to interpret studies and design suitably powered replications.


Book Review: Albion’s Seed by Scott Alexander. Review of a book that (kind of) explains the differences in American geopolitics by looking at the different groups of people who settled in America.

There is no language instinct by Vyvyan Evans. A dismantling of the pervasive idea that humans are born with an innate ability to interpret language.

The Failed Promise of Legal Pot by Tom James. How and why decriminalisation of marijuana can fail, as well as the way you need to approach legalisation in order for it to succeed.

Clean eating and dirty burgers: how food became a matter of morals by Julian Baggini. How we moralise food and the negative consequences of this.

The truth about the gender wage gap by Sarah Kliff. The best explanation of why there is a gap in pay between the genders that I’ve read.

Gamble for science!

Do you think you know which studies will replicate? Do you want to help improve the replicability of science? Do you want to make some money? Then take part in this study on predicting replications!

But why are researchers gambling on whether a study will successfully replicate (defined as finding “an effect in the same direction as the original study and a p-value<0.05 in a two-sided test” for this study)? Because there is some evidence to suggest that a prediction market can be a good predictor of replicability, even better than individual psychology researchers.

What is a prediction market?

A prediction market is a forum where people can bet on whether something will happen or not by buying or selling stocks in that outcome. In this study, if you think a study will replicate you can buy stocks in that study. This raises the value of the stock because it is in demand. Because the transactions are public, this shows other buyers that you (though you remain anonymous) think the study will replicate. They might agree with you and also buy, further raising the value of the stock. If you don’t think a study will replicate, you won’t buy stocks in it (or you will sell the stocks you have) and the price will go down. If you’re feeling confident, you can short sell a stock: you borrow stocks of something you think will devalue (a study you don’t think will replicate, where you believe others will come to agree with you), sell them to someone else, then buy the stocks back after their value has dropped to return them to the person you borrowed from, keeping the margin.

This allows the price to be interpreted as the “predicted probability of the outcome occurring”. For each study, participants could see the current market prediction for the probability of successful replication.
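The mechanics above can be sketched with a toy example (the numbers and the flat-payoff assumption are mine, not the study’s actual market rules): on a 0–1 price scale the price reads as a probability, and a short sale profits when the price falls.

```python
# Toy short-sale arithmetic for a prediction market priced on a 0-1 scale.

def short_sale_profit(sell_price, buyback_price, n_shares):
    """Borrow shares, sell at sell_price, buy them back at buyback_price
    to return to the lender; the margin is the profit."""
    return (sell_price - buyback_price) * n_shares

# You think a study trading at 0.70 (an implied 70% chance of replicating)
# is overpriced, so you short 10 shares and the price later falls to 0.40.
print(round(short_sale_profit(0.70, 0.40, 10), 2))  # 3.0
```

If the price instead rises, the same margin is a loss, which is what keeps confident but wrong traders from distorting the market price for long.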

A brief history of prediction markets for science:

Placing bets on whether a study will replicate is a relatively new idea; it was first tested by Dreber et al. (2015) with a sample of studies from the Reproducibility Project: Psychology (RP:P). They took 44 of the studies in the RP:P and, before the replications were conducted, asked participants (researchers who took part in the RP:P, though they weren’t allowed to bet on studies they were involved in) to rate how likely they thought each study was to replicate. They then created a market where participants could trade stocks on each study’s chances of replicating, and compared how accurately the individual researchers and the market predicted which studies would replicate. The market significantly outperformed the researchers’ prior predictions: the researchers’ predictions were not significantly better than chance (and, when weighted by self-rated expertise, performed exactly as well as flipping a coin).

This result was not replicated in a sample of 18 economics papers by Camerer et al. (2016a), who found no significant difference between either method’s predicted replication rate (market or prior prediction) and the actual replication rate. However, they noted that the smaller sample size in their study (among other factors) may have contributed to this null finding.

What the current study will do:

Camerer et al. (2016b) will take 21 studies and attempt to replicate them. Before the results are published, a sample of participants will trade stocks on the chance of each study replicating. To calculate the number of participants (n) needed for a worthwhile replication, the researchers looked at the p-value, n, and standardised effect size (r) of each original study. They then calculated how many participants they would need for “90% power to detect 75% of the original, standardised effect size”. This means that if the original study had an r of 0.525, an n of 51, and a p-value of 0.000054, you would need 65 participants for a 90% chance of detecting 75% of the original effect. If the first replication isn’t successful, the researchers will conduct another replication with a larger sample size (pooled from the first replication sample and a second sample) that has 90% power to detect 50% of the original effect.
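The sample-size logic can be sketched as follows. This is my own approximation (treating the effect as a correlation and using the Fisher z transformation), not necessarily the authors’ exact procedure, but it lands close to the worked example above.

```python
import math
from statistics import NormalDist

def n_for_power(r_original, fraction, power=0.90, alpha=0.05):
    """Smallest n giving the desired power to detect a stated fraction of
    the original correlation r, two-sided test at the given alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    target = math.atanh(fraction * r_original)     # Fisher z of the effect sought
    # the standard error of Fisher's z is 1/sqrt(n - 3)
    return math.ceil(((z_crit + z_power) / target) ** 2 + 3)

# The post's example: original r = 0.525 with n = 51; 90% power for 75% of r.
print(n_for_power(0.525, 0.75))  # 64 under this approximation (the post quotes 65)
```

Note how quickly the requirement grows as the target effect shrinks: aiming for 50% of the same original effect pushes the required n well past double.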

But there’s one problem: you shouldn’t use the effect sizes from previous results to determine your sample size.

Using the past to inform the future:

Morey & Lakens (2016) argue that most psychological studies are statistically unfalsifiable because the statistical power (usually defined as the probability of correctly rejecting a false null hypothesis) is often too low. This is mainly due to small sample sizes, which is a ubiquitous problem in psychological research (though as Sam Schwarzkopf notes here, measurement error can add to this problem). This threatens the evidentiary value of a study as it might mean it wasn’t suitably powered to detect a real effect. Coupled with publication bias (Kühberger, 2014) and the fact that low sample size studies are the most susceptible to p-hacking (Head et al., 2015), this makes judging the evidential weight of the initial finding difficult.

So any replication is going to have to take into account the weaknesses of the original studies. This is why basing the required n on previous effect sizes is flawed, because they are likely to be over-inflated. Morey and Lakens give the example of a fire alarm: “powering a study based on a previous result is like buying a smoke alarm sensitive enough to detect only the raging fires that make the evening news. It is likely, however, that such an alarm would fail to wake you if there were a moderately-sized fire in your home.” Because of publication bias, the researchers will expect there to be a larger effect size than may actually exist. This may mean they don’t get enough participants to discover an effect.

But these researchers have taken steps to try to avoid this problem. They have powered their studies to detect 50% of the original effect size (if they are unsuccessful with their first attempt at 75%). Why 50%? Because the average effect size found in the RP:P was 50% of the original. So they are prepared to discover an effect size half as large as was reported in the initial study. This of course leaves the possibility that they miss a true effect because it is less than half of the originally reported effect (and given the average effect size was 50%, we know there will very likely be smaller effects). But they have considered the problem of publication bias and attempted to mitigate it. They have also eliminated the threat of p-hacking, as these are Registered Reports with preregistered protocols, so there is almost no chance of questionable research practices (QRPs) occurring.

But is their attempt to eliminate the problem of publication bias enough? Are there better methods for interpreting replication results than theirs?

There is another way:

There have been a variety of approaches to interpreting a replication attempt published in the literature. I am going to focus on two: the ‘Small Telescopes’ approach (Simonsohn, 2015) and the Bayesian approach (Verhagen & Wagenmakers, 2014). The ‘Small Telescopes’ approach focuses on whether the initial study was suitably powered to detect the effect. If the replication (with its larger n) finds an effect statistically significantly smaller than one that would have given the first study 33% power (i.e. if you were to take the replication effect size and put it into a power calculation for the original study), this suggests the initial study’s sample was too small to detect the effect. In the same paper Simonsohn also recommends using 2.5 times the number of participants of the initial study, which ensures 80% power regardless of publication bias, the power of the original studies, and study design. Looking at the replications in the Social Sciences Replication Project, 18 of the 21 studies have at least 2.5 times the original number of participants by the second phase of data collection, leaving just 3 that don’t (studies 2, 6, and 11).
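The 33%-power benchmark can be made concrete with a sketch under simplifying assumptions of my own (a correlation effect size and the Fisher z approximation, which is not how Simonsohn's paper computes it for every design): r33 is the effect that would have given the original study only 33% power, and a replication estimate significantly below it suggests the original sample was too small.

```python
import math
from statistics import NormalDist

def r_33(n_original, alpha=0.05):
    """Correlation that would give a study of size n_original 33% power
    in a two-sided test at the given alpha (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_33 = NormalDist().inv_cdf(0.33)             # quantile giving 33% power
    # solve the power equation sqrt(n - 3) * atanh(r) = z_crit + z_33 for r
    return math.tanh((z_crit + z_33) / math.sqrt(n_original - 3))

print(round(r_33(51), 3))  # ~0.216 for an original study with n = 51
```

A larger original study implies a smaller r33, so bigger originals set a stricter benchmark for what counts as a telling shortfall in the replication.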

The Bayesian test to quantify whether a replication was successful compares two hypotheses: that the effect is spurious and not greater than 0 (the null hypothesis); and that the effect is consistent with the one found in the initial study (the posterior distribution of the replication is similar to the posterior distribution of the original). The larger the effect size in the first study, the larger (and further from zero) we expect the effect size of the replication to be. We can therefore quantify how much the replication data supports the null hypothesis or the alternative hypothesis (that the replication finds a similar effect to the original study).
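The comparison between those two hypotheses can be sketched numerically. The assumptions here are mine, not Verhagen & Wagenmakers’ exact computation: correlations are compared on the Fisher z scale, and the original study’s posterior for z is approximated as normal with mean atanh(r_orig) and standard deviation 1/sqrt(n_orig − 3).

```python
import math
import random
from statistics import NormalDist

def replication_bf(r_orig, n_orig, r_rep, n_rep, draws=50_000, seed=1):
    """Monte Carlo replication Bayes factor: likelihood of the replication
    under the original's posterior versus under the null of no effect."""
    rng = random.Random(seed)
    se_orig = 1 / math.sqrt(n_orig - 3)
    se_rep = 1 / math.sqrt(n_rep - 3)
    z_rep = math.atanh(r_rep)
    # marginal likelihood of the replication under the original's posterior...
    like_orig = sum(
        NormalDist(rng.gauss(math.atanh(r_orig), se_orig), se_rep).pdf(z_rep)
        for _ in range(draws)
    ) / draws
    # ...versus its likelihood under the null hypothesis of no effect
    like_null = NormalDist(0.0, se_rep).pdf(z_rep)
    return like_orig / like_null  # > 1 favours the original effect

# A larger replication finding a similar effect supports the original:
print(replication_bf(0.40, 50, 0.35, 100) > 1)  # True
```

The same function returns a value well below 1 when the replication estimate sits near zero, which is the sense in which the test can quantify evidence for the null rather than merely fail to reject it.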

Simonsohn argues that using the somewhat arbitrary criterion of 90% power to detect 75% or 50% of the original effect is inferior to the ‘Small Telescopes’ or Bayesian approaches (personal communication, October 24, 2016). For those two approaches, it has been thoroughly analysed what the results mean and how often you get which results under which underlying effect. The same has not been done for the method used in this study, so we can’t be as confident in its results.


I would recommend getting involved in this study (if you can), and if you can’t, look for the results when they are published, as they will be very interesting. But try to avoid using previous effect sizes to calculate future sample sizes, because of the problem of publication bias. You also need to consider how you interpret replication results and choose the best means of doing so (does the method have a solid theoretical justification and a clear meaning, and have its statistical properties been thoroughly examined?).

Author feedback:

I contacted all the researchers from Camerer (2016b) prior to publication to see if they had any comments. Anna Dreber had no criticisms of the post and Colin Camerer expressed familiarity with the issues raised by Morey & Lakens. Magnus Johannesson made the point that they used a 50% chance of finding the original effect size precisely because of publication bias and the fact the average effect size in the RP:P was 50%. I updated my post to reflect this. I contacted Richard Morey to clarify about calculating the required n for a study using previous effect sizes. I contacted Uri Simonsohn about whether the actions of the researchers had overcome the problems he outlined in his ‘Small Telescopes’ paper. He argued they were worse than previously explored methods and I added this information.


Almenberg, J.; Kittlitz, K.; & Pfeiffer, T. (2009). An Experiment on Prediction Markets in Science. PLoS ONE 4(12): e8500. doi:10.1371/journal.pone.0008500

Camerer, C.; Dreber, A.; Ho, T.H.; Holzmeister, F.; Huber, J.; Johannesson, M.; Kirchler, M.; Almenberg, J.; Altmejd, A.; Buttrick, N.; Chan, T.; Forsell, E.; Heikensten, E.; Hummer, L.; Imai, T.; Isaksson, S.; Nave, G.; Pfeiffer, T.; Razen, M.; & Wu, H. (2016a). Evaluating replicability of laboratory experiments in economics. Science, DOI: 10.1126/science.aaf0918

Camerer, C.; Dreber, A.; Ho, T.H.; Holzmeister, F.; Huber, J.; Johannesson, M.; Kirchler, M.; Nosek, B.; Altmejd, A.; Buttrick, N.; Chan, T.; Chen, Y.; Forsell, E.; Gampa, A.; Heikensten, E.; Hummer, L.; Imai, T.; Isaksson, S.; Manfredi, D.; Nave, G.; Pfeiffer, T.; Rose, J.; & Wu, H. (2016b). Social Sciences Replication Project. [online] Available at: http://www.socialsciencesreplicationproject.com/

Dreber, A.; Pfeiffer, T.; Almenbergd, J.; Isakssona, S.; Wilsone, B.; Chen, Y.; Nosek, B.A.; & Johannesson, M. (2015) Using prediction markets to estimate the reproducibility of scientific research. Proceedings of the National Academy of Sciences, 112 (50), 15343–15347.

Hanson, R. (1995). Could gambling save science? Encouraging an honest consensus. Social Epistemology, 9 (1).

Head, M.L.; Holman, L.; Lanfear, R.; Kahn, A.T.; & Jennions, M.D. (2015). The Extent and Consequences of P-Hacking in Science. PLoS Biol 13(3): e1002106. doi:10.1371/journal.pbio.1002106

Investopedia (2016). Short Selling. [online] Available at: http://www.investopedia.com/terms/s/shortselling.asp. Accessed on: 21/10/2016.

Kühberger, A.; Fritz, A.; & Scherndl, T. (2014). Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size. PLoS ONE 9(9): e105825. doi:10.1371/journal.pone.0105825

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349 (6251): aac4716.

Open Science Foundation (2016). Registered Reports. [online] Available at: https://osf.io/8mpji/wiki/home/

Simonsohn, U. (2015). Small Telescopes: Detectability and the Evaluation of Replication Results. Psychological Science, 26 (5), 559–569

Morey, R. & Lakens, D. (2016). Why most of psychology is statistically unfalsifiable. [online] Available at: https://github.com/richarddmorey/psychology_resolution/blob/master/paper/response.pdf

Schwarzkopf, S. (2016). Boosting power with better experiments. [online] Available at: https://neuroneurotic.net/2016/09/18/boosting-power-with-better-experiments/

Verhagen, J. & Wagenmakers, E.J. (2014). Bayesian Tests to Quantify the Result of a Replication Attempt. Journal of Experimental Psychology: General, 143 (4), 1457-1475.

Credit where credit is due

There has been a lot of tension in the psychological community recently. Replications are becoming more prevalent, and many of them are finding much smaller effects or none at all. This raises a lot of uncomfortable questions: is the studied effect real? How was the original result achieved? Were less than honest methods used (p-hacking etc.)? The original researchers can sometimes feel that these questions go beyond valid criticism to full-blown attacks on their integrity and/or their abilities as scientists. This has led to heated exchanges and some choice pejoratives being thrown about by both “sides”.

This post isn’t here to pass judgement on those defending the original studies or the replications (and all the associated behaviour and comments). It’s here to celebrate the behaviour of a researcher whose example I think many of us should follow.

Dana Carney was one of the original researchers who investigated a phenomenon called “power posing” (Carney, Cuddy, & Yap, 2010). They reported that “high-power nonverbal displays” affected your hormone levels and feelings of confidence. But after a large-scale failed replication and a re-analysis, it appears there is no effect.

So, as one of the main researchers behind this effect, what should you do when faced with this evidence? All the incentive structures currently in place would encourage you to hand-wave away these issues: the replication wasn’t conducted properly, there are hidden moderators that were not present in the replication, the replicators were looking to find no effect, etc. But Carney has written an article stating that she does not believe “that ‘power pose’ effects are real.” She goes into further detail about the problems with the original study, admitting to using “researcher degrees of freedom” to fish for a significant result and to analysing “subjects in chunks”, stopping when a significant result was found.

I find this honesty commendable and wish all researchers whose work is shown to be false were able to admit past mistakes and wrongdoing. Psychology cannot improve as a science unless we update our beliefs in the face of new evidence. As someone quite early in their science career, I’ve never had the experience of someone failing to replicate a finding of mine, but I imagine it is quite hard to take (for more detail I recommend this post by Daniel Lakens). Admitting that something you discovered isn’t real, whilst difficult, helps us build a clearer picture of reality. Hopefully this acknowledgement will encourage others to be more honest about their work.

But there’s a reason why few have taken this step: the potential negative consequences can be daunting, including loss of credibility from admissions of p-hacking and the undermining of key publications (which may affect job and tenure applications), to name a few. I understand, and am (slightly) sympathetic to, why it is so rare. This is why I like Tal Yarkoni’s suggestion of an “amnesty paper”, where authors could freely admit they have lost confidence in a finding of theirs and explain why. They could do so without fear of repercussions, and because many others would be doing it, it would be less daunting. Until journals are willing to publish these kinds of articles, I would suggest creating a website or repository dedicated to them, so there is a publicly available record of researchers’ expressions of doubt about their findings. I also think it is important to celebrate those who do decide to publicly disavow one of their findings, as it encourages scientists to see this admission as a sign of strength, not weakness. That will help change the culture around failed replications and past findings, which will hopefully encourage scientists to write these articles and journals to publish them. I believe psychology will only improve if this behaviour becomes the norm.

Notes: I contacted Dana Carney prior to publication and she had no corrections to make.


Carney, D.R.; Cuddy, A.J.C.; & Yap, A.J. (2010). Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance. Psychological Science, 21 (10) 1363–1368.

Lakens, D. (2016). Why scientific criticism sometimes needs to hurt [online] Available at: http://daniellakens.blogspot.co.uk/2016/09/why-scientific-criticism-sometimes.html

Ranehill, E.; Dreber, A.; Johannesson, M.; Leiberg, S.; Sul, S.; & Weber, R.A. (2015). Assessing the Robustness of Power Posing: No Effect on Hormones and Risk Tolerance in a Large Sample of Men and Women. Psychological Science, 1–4.

Simmons, J. & Simonsohn, U. (2015). [37] Power Posing: Reassessing The Evidence Behind The Most Popular TED Talk [online] Available at: http://datacolada.org/37

Brexit in graphs

This is a collection of graphs showing how people voted and other interesting statistics. Some you might not have seen and others you definitely will have. I’m not going to include every graph, especially the most common ones, as you will almost certainly have seen them. Please remember to take all the polls with a pinch of salt (only a small sample of people can be asked and it may not be representative, people may have given socially desirable answers, lied, etc.). If there are any graphs you feel I have missed, please comment below and I will add them.

*Please note that while many of these graphs focus on immigration I do not think it’s the only reason (nor, by extension, fear of foreigners) people voted Leave. These graphs are meant to show people’s views and the related data.*


Before the referendum, much was made of the difference between Leave and Remain voters in their rating of the importance of immigration for their decision.

For all those surveyed, immigration was the most important issue but it was closely followed by the impact on the economy.

Important issues

Prior to the referendum, Leave voters were far more likely to say immigration has had a negative impact on “Britain as a whole”. When asked if they had been personally affected, that number dropped.

Some people have argued that the belief that there are too many immigrants in Britain is new. But British people have thought there were too many immigrants for decades. If anything, the belief that there are too many immigrants is in decline.

But when surveyed after they had voted, a different story was presented. 49% of Leave voters said the biggest single reason for wanting to leave the EU was “the principle that decisions about the UK should be taken in the UK”. 33% said the main reason was that leaving “offered the best chance for the UK to regain control over immigration and its own borders.” 13% said remaining would mean having no choice “about how the EU expanded its membership or its powers in the years ahead.” (Ashcroft, 2016). This was supported by a ComRes poll conducted on the 24th, which found: “the ability of Britain to make its own laws is cited by Leave voters as the most important issue when deciding which way to vote (53%), ahead of immigration (34%).” (ComRes, 2016).

But what does the data tell us about the impact of immigration?

Immigration has dramatically increased in the last decade or so.

Yet there appears to be no negative effect on people’s wages or employment due to increased immigration.


Many on the Leave side have argued people voted Leave because they had been adversely affected by immigration. If voters backed Leave because they had suffered from increased immigration, you would expect to see a correlation between voting Leave and a decrease in hourly earnings. But there is no such correlation. This is evidence against (though not a refutation of) the idea that people voted Leave as a rational response to the negative economic effects they had suffered as a result of immigration.
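The check described above can be sketched in a few lines: compute the correlation between each area’s change in hourly earnings and its Leave vote share, and see whether it is meaningfully negative. The numbers below are invented purely for illustration; the real analysis used local-authority earnings data and referendum results.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical local-authority data: % change in real hourly wages and
# % voting Leave. These values are invented for illustration only.
wage_change = [-2.1, 0.5, -0.3, 1.8, -1.2, 0.9, -0.7, 2.0]
leave_share = [55.0, 48.0, 61.0, 44.0, 52.0, 58.0, 47.0, 50.0]

# A value near zero is consistent with "no correlation"; a clearly negative
# value would support the wages-drove-the-vote explanation.
print(round(pearson_r(wage_change, leave_share), 3))
```

If wage losses explained the vote, areas with larger wage falls (more negative values) would show systematically higher Leave shares, pushing r strongly negative.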

Education and voting patterns:

Whilst education level was the strongest correlate of voting Remain, it’s not as simple as “stupid people voted to leave”. Areas with lower education levels are also the areas that have borne the brunt of economic hardship. They are therefore more likely to hold unfavourable views of the status quo (which has not helped them in the past) and, by extension, the Remain campaign.

Dependency on the EU and voting patterns:

The graph below shows which areas were given funding by the EU over different time periods.

Source: Ciaran Jenkins (2016).

Income and voting patterns:

There was a negative correlation between income and voting Leave: those who earned less were more likely to vote Leave.


Personality and voting patterns:

A strong negative correlation (r = -0.67) was found between openness (being open to new experiences, “having wide interests, and being imaginative and insightful”; Srivastava, 2016) and voting Leave. Areas with a higher concentration of people scoring highly on openness were more likely to vote Remain.

Correlations between certain personality factors and voting behaviour were also found by Eric Kaufmann. He analysed participants’ voting behaviour and compared it with their answers to questions measuring authoritarianism (roughly, how much someone favours obeying authority). There was almost no correlation with income, but there was a correlation between voting Leave and agreeing that the death penalty is appropriate for certain crimes (among white voters only).

Views on social issues:

For this graph, people were asked whether they thought different social issues were a force for “good” or “ill”. After that, they stated which way they voted (Leave or Remain). So it shows what percentage of people voted Leave or Remain, given their views on different issues. It is not a poll showing how people who voted Leave or Remain view these issues. E.g. it doesn’t show that 81% of Leave voters think multiculturalism is a “force for ill”; it shows that of those who think multiculturalism is a “force for ill”, 81% voted Leave. So those who hold that view were more likely to vote Leave.

Why so many scientists are anti-Brexit:

Britain receives a lot of funding from the EU and it is uncertain how much we would receive afterwards (though it will almost certainly decrease).

EU science funding

Voter turnout and satisfaction:

These two aren’t graphs (yet…) but they are important, especially the first. Whilst it’s true the elderly overwhelmingly voted Leave and the young voted Remain, the (estimated) turnout among young people was very low. So the meme of “it’s completely the old people’s fault!” isn’t totally accurate.

This was further supported by this graph which shows a correlation between age and voter turnout for different areas.


Rather unsurprisingly, Leave voters were happier with the result than Remain voters: the vast majority of Leave voters were happy, with only 1% (of those sampled) stating they were unhappy with the result. This puts the anecdotes of people voting Leave without properly thinking it through and then worrying about the consequences in context.


Despite the startling drop in the FTSE 100, it wasn’t any lower than 7 days earlier (though it got there in a more eye-catching way). As some have correctly pointed out, the FTSE 100 has recovered significantly since the initial drop. But that’s only because the pound has been devalued, so it is an artificial recovery.

The drop in the value of the pound, though, was more serious when compared with long-term trends, as it fell to its second-lowest level ever.

Compared with the Euro it’s not doing as badly, though the Euro has been struggling for years, and the climb seen at the start of the graph is the result of recovery from the 2008 financial crash.


Ashcroft, M. (2016). How the United Kingdom voted on Thursday… and why. [online] Available at: http://lordashcroftpolls.com/2016/06/how-the-united-kingdom-voted-and-why/

Burn-Murdoch, J. (2016). Brexit: voter turnout by age. [online] Available at: http://blogs.ft.com/ftdata/2016/06/24/brexit-demographic-divide-eu-referendum-results/

ComRes. (2016). Sunday Mirror Post Referendum Poll. [online] Available at: http://www.comres.co.uk/polls/sunday-mirror-post-referendum-poll/

The Economist. (2016). The European experiment. [online] Available at: http://www.economist.com/news/britain/21699504-most-scientists-want-stay-eu-european-experiment

Ipsos-MORI (2016). Final Referendum Poll. [online] Available at: https://www.ipsos-mori.com/Assets/Docs/Polls/eu-referendum-charts-23-june-2016.pdf

Ipsos-MORI (2016). Just one in five Britons say EU immigration has had a negative effect on them personally. [online] Available at: https://www.ipsos-mori.com/Assets/Docs/Polls/EU%20immigration_FINAL%20SLIDES%2020.06.16%20V3.pdf

Jenkins, C. (2016). [online] Available at: https://twitter.com/C4Ciaran/status/747092548343181312

Krueger, J. I. (2016). The Personality of Brexit Voters. [online] Available at: https://www.psychologytoday.com/blog/one-among-many/201606/the-personality-brexit-voters

Kaufmann, E. (2016). [online] Available at: http://www.fabians.org.uk/brexit-voters-not-the-left-behind/comment-page-1/#comment-33662

Sky Data (2016). [online] Available at: https://twitter.com/SkyData/status/746700869656256512

Srivastava, S. (2016). Measuring the Big Five Personality Factors. Retrieved [2016] from http://psdlab.uoregon.edu/bigfive.html.

Taub, A. (2016). Making Sense of ‘Brexit’ in 4 Charts. [online] Available at: http://www.nytimes.com/2016/06/24/world/europe/making-sense-of-brexit-in-4-charts.html?_r=0

Vox (2016). Brexit was fueled by irrational xenophobia, not real economic grievances. [online] Available at: http://www.vox.com/2016/6/25/12029786/brexit-uk-eu-immigration-xenophobia

In defence of preregistration

This post is a response to “Pre-Registration of Analysis of Experiments is Dangerous for Science” by Mel Slater (2016). Preregistration is stating what you’re going to do and how you’re going to do it before you collect data (for more detail, read this). Slater gives a few examples of hypothetical (but highly plausible) experiments and explains why preregistering the analyses of the studies (not preregistration of the studies themselves) would not have worked. I will reply to his comments and attempt to show why he is wrong.

Slater describes an experiment with a between-groups design: 2 conditions (experimental & control), 1 response variable, and no covariates. You find the expected result but it’s not exactly as you predicted. It turns out the result is totally explained by the gender of the participants (a variable you weren’t initially analysing but which was balanced by chance). So it’s gone from a 2-group analysis to a 2×2 analysis (with the experimental & control conditions as one factor and male & female as the other).

Slater then argues that (according to preregistration) you must preregister and conduct a new experiment because you have not preregistered those new analyses (which analyse the role of gender). The example steadily gets more detailed (with other covariates discovered along the way) until the final analysis is very different from what you initially expected. He states that you would need to throw out the data and start again every time you find a new covariate or factor, because it wasn’t initially preregistered. The reason you would need to restart your experiment is that doing a “post hoc analysis is not supposed to be valid in the classical statistical framework”. So because you didn’t preregister the analyses you now want to perform, you need to restart the whole process. This can result in wrong conclusions being drawn: complex but unpredicted relationships will be missed, because the original finding will be published (running the experiment again with the new analyses is often too expensive, too time-consuming, or simply not possible) and the role of gender (and the other covariates) won’t be explored.

This is, however, a fundamental misunderstanding of what preregistration of analyses is. If you perform any new analyses on your data that weren’t preregistered, you don’t need to set up another study. You can perform these new analyses (which you didn’t predict before the experiment began), but you have to be explicit in the Results section that this was the case (Chambers, Feredoes, Muthukumaraswamy, & Etchells, 2014). Post hoc analyses of data are very common (Ioannidis, 2005), and preregistration directly counters the problem of their being presented as if they were planned.

Later in the post, he argues “discovery is out the window” because this occurs when “you get results that are not those that were predicted by the experiment.” Preregistration would therefore stifle discovery as you have to conduct a new study for each new analysis you want to perform. He states “Registrationists” argue for an ‘input-output’ model of science where “thought is eliminated”.

This is a fair concern, but it has already been answered by the FAQ page for Registered Reports (link here) and in many other places. To summarise: discovery will not be stifled, because you can perform the non-predicted analyses; you just have to clearly state they weren’t predicted. The only thing you aren’t allowed to do is pretend you were going to conduct that analysis all along, which is called HARKing, or hypothesising after the results are known (Kerr, 1998).

Slater argues that because data in the life and social sciences are so messy (compared with physics), it is much harder to make the distinction between ‘exploratory’ and ‘confirmatory’ experiments. He implies preregistration requires a harsh divide between them, so that confirmatory experiments cannot become exploratory (which often happens in the real world) because the exploratory analyses weren’t preregistered. Whilst there would be a clearer divide between exploratory and confirmatory experiments, preregistration does not forbid the latter becoming the former; it merely requires that you are open about what you’ve done. Having a clear divide between the two is very important for maintaining the value of both types of experiment (de Groot, 2014).

He argues that (due to the pressure to publish positive results) researchers could “run their experiment, fish for some result, and then register it”. But this is not “possible without committing fraud” (Chambers, Feredoes, Muthukumaraswamy, & Etchells, 2014). You have to share the time-stamped raw data files that were used in the study, so reviewers can see when the data were collected. This helps reduce the chance of fraud and ensures preregistered studies are conducted as described.
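As a concrete illustration of why time-stamped data make this kind of fraud detectable: a registry can store a cryptographic fingerprint of the raw data file at submission time, so any later alteration of the data is evident. This is a minimal sketch of the general idea, not any specific registry’s mechanism; the filename is hypothetical.

```python
import hashlib

def fingerprint(path: str) -> str:
    """SHA-256 hash of a file's contents; any later edit changes this value."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo: write a tiny stand-in for a raw data file, then fingerprint it.
# In practice you would hash the real dataset at registration time and
# store the digest (with a timestamp) in the preregistration document.
with open("study1_raw.csv", "w") as f:   # hypothetical filename
    f.write("participant,condition,score\n1,control,4\n")

print(fingerprint("study1_raw.csv"))
```

If the registered digest no longer matches the shared data file, the data has changed since registration, which is exactly the tampering the time-stamp requirement is meant to expose.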

He also argues that currently not enough thought is put into the analysis process, basing this on the fact that results sections start with F-tests and t-tests rather than presenting the data in tables and graphs and discussing it. Researchers look straight for the result they were expecting and focus only on that, potentially missing other important aspects. Preregistration, he believes, would exacerbate this problem.

Whilst I agree there is an over-emphasis on getting P<0.05 in the literature, preregistration will not make this problem worse. If anything, preregistration could help reduce the collective obsession with P<0.05, because if a study is preregistered and accepted for publication (based on the quality of the methods) then it won’t rely on a significant result to be published (see here for a diagram of the registration and publication process). It also makes replications of previous findings more attractive to researchers, because publication doesn’t depend on the results; we know that the current dependence on results has led to the neglect of replications (Nosek, Spies, & Motyl, 2012).

Could preregistration increase the likelihood that researchers focus solely on their preregistered analyses and ignore other potential findings? Maybe, but this worry is very abstract. It contrasts with the very real (and very damaging) problem of questionable research practices (QRPs), which we know plague the literature (John, Loewenstein, & Prelec, 2012) and have a negative impact (Simmons, Nelson, & Simonsohn, 2011). Preregistration can help limit these QRPs.

Is preregistration the panacea for psychology’s replication crisis? No, but then it never claimed to be. It’s one of the (many) tools to help improve psychology.


Bowman, S.; Chambers, C.D.; & Nosek, B.A. (2014). FAQ 5: Scientific Creativity and Exploration. [OSF open-ended registration] Available at: https://osf.io/8mpji/wiki/FAQ%205:%20Scientific%20Creativity%20and%20Exploration/ [Accessed on 19/05/2016].

Chambers, C.D. (2015). Cortex’s Registered Reports: How Cortex’s Registered Reports initiative is making reform a reality. Available at: https://www.elsevier.com/editors-update/story/peer-review/cortexs-registered-reports [Accessed on 16/05/2016]

Chambers, C.D.; Feredoes, E.; Muthukumaraswamy, S.D.; & Etchells, P.J. (2014). Instead of “playing the game” it is time to change the rules: Registered Reports at AIMS Neuroscience and beyond. AIMS Neuroscience, 1 (1), 4-17.

de Groot AD. (2014) The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han L. J. van der Maas]. Acta Psychologica (Amst), 148, 188-194.

Ioannidis, J.P.A. (2005) Why Most Published Research Findings Are False. PLoS Med 2: e124.

John, L.K.; Loewenstein, G.; & Prelec, D. (2012). Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychological Science, 23 (5), 524–532.

Kerr, N.L. (1998). HARKing: Hypothesising After the Results are Known. Personality and Social Psychology Review, 2 (3), 196-217.

Nosek, B.A.; J.R. Spies; & Motyl, M. (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science,  7 (6), 615-631.

PsychBrief (2016). Keep Calm and Preregister. [Online]. Available at: http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-preregister/ [Accessed on 23/05/2016]

Slater, M. (2016). Pre-Registration of Analysis of Experiments is Dangerous for Science [Online]. Available at: http://presence-thoughts.blogspot.co.uk/2016/04/pre-registration-of-analysis-of.html [Accessed on 15/05/2016].

Simmons, J.P.; Nelson, L.D.; & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.

Podcast list

I’ve recently discovered podcasts and they are awesome. They’re a great way to learn interesting new things, especially when you’re travelling. So this post is a collection of fantastic podcasts that I listen to and would recommend you pick up. Any suggestions are welcome so please let me know if there are any you like. (*= my favourites).

Social and Life sciences:

*Everything Hertz: Discussions about biological psychiatry, psychology, and the process of science with heavy sarcasm. (iTunes) (Soundcloud)

*The Black Goat: Three psychologists discuss how to perform science and the various issues that face scientists e.g. publication pressure, how to be an open scientist etc. (website) (iTunes)

BOLD Signals: Interviews with a wide variety of scientists such as neuroscientists, science informationists, and cognitive neuroscientists (and many others) on a huge range of topics. (iTunes) (Soundcloud)

Science VS: A podcast that takes an interesting topic e.g. the “gay gene”, and examines the evidence for and against it. (iTunes)

*Say Why to Drugs: Cutting through the hype and hyperbole about different drugs and examining what the research actually tells us about them. (website) (iTunes)

Invisibilia: Fascinating episodes on broad ranging topics related to how we experience the world e.g. is there a “solution” to mental health, and is thinking like this part of the problem? (website)

Stuff You Should Know: One topic or idea is examined in great detail and explained in each episode with topics ranging from sleep terrors to nitrous oxide. (website) (iTunes)

Unsupervised Thinking: Discussions about specific areas within neuroscience and AI e.g. brain size, the Connectome, etc. (website) (iTunes)


Not So Standard Deviations: Informal talks about statistics in academia and industry, covering a huge variety of topics in an entertaining and engaging way. (website) (iTunes)

More or Less: Tim Harford explains – and sometimes debunks – the numbers and statistics used in political debate, the news and everyday life. (website)


The Partially Examined Life: In-depth analysis of famous philosophical books or ideas with no presumed prior knowledge. (website) (iTunes)

*Very Bad Wizards: Discussions between a philosopher and a moral psychologist (with occasional guests) about current events, social issues, and research from their fields. (website) (iTunes)


*PhDivas: A cancer scientist and literary critic talk about life in academia, science, and how social issues affect academia. (website) (iTunes)

Intelligence Squared: Hour long discussions or debates about an interesting topic featuring prominent thinkers. (website) (iTunes)

Collection of criticisms of Adam Perkins’ ‘The Welfare Trait’

In late 2015, Dr Adam Perkins published his book ‘The Welfare Trait’. The crux of his argument was that each generation supported by the welfare state becomes more work-shy. He also argued that the welfare state increases the number of children born to households where neither parent works. His solution is to change the welfare state to limit the number of children each non-working household can have.

His book caused quite a storm when it was first released. Some people argued that it was crudely-disguised eugenics, others argued that those who were dismissing it were refusing to face the facts. Over time, more and more criticisms of and problems with Perkins’ work have come to light (e.g. basic statistical errors and incorrect conclusions from papers). Below is a collection of some (but not all) of the criticisms levelled at Perkins’ book.


Storify. (2016). Criticisms of Adam Perkins and ‘The Welfare Trait’ (with images, tweets) · PsychologyBrief. [online] Available at: https://storify.com/PsychologyBrief/criticisms-of-adam-perkins-and-the-welfare-trait [Accessed 11 Mar. 2016].

How biased are you? The role of intelligence in protecting you from thinking biases.

People generally like to believe they are rational (Greenberg, 2015). Unfortunately, this isn’t usually the case (Tversky & Kahneman, 1974). People very easily fall prey to thinking biases, which stop them from making a purely rational judgement (whether always making a rational judgement is a good thing is a discussion for another time). These are flaws in thinking, e.g. the availability bias, where you judge the likelihood of an event or the frequency of a class by how easily you can recall an example of it (Tversky & Kahneman, 1974). So after seeing a shark attack in the news, people think the probability of a shark attack is much higher than it is (because they can easily recall an example of one).

But what are some of the factors that protect you against falling for these thinking biases? You would think that the smarter someone is, the less likely they are to be affected by them. However, the evidence paints a different picture.

The first bias we are going to look at is called the “myside bias”, which is defined as the predisposition to “evaluate evidence, generate evidence, and test hypotheses in a manner biased toward their own prior beliefs” (Stanovich, West, & Toplak, 2013). The ability to view an argument from both sides and decouple your prior opinions from the situation is seen as a crucial skill for being rational (Baron, 1991; Baron, 2000). Interestingly, there have been multiple experiments showing that susceptibility to myside bias is independent of cognitive ability (Stanovich & West, 2007; Stanovich & West, 2008; Stanovich, West, & Toplak, 2013); it doesn’t matter how smart you are, you are just as likely to evaluate something from your perspective if you aren’t told to do otherwise.

Not only is there evidence to suggest the myside bias is uncorrelated with intelligence, there is further evidence that a whole host of thinking biases are unrelated to intelligence (Stanovich & West, 2008), including the anchoring effect, framing effects, and the sunk-cost effect. Further evidence from Teovanovic, Knezevic, & Stankov (2015) supports the idea that intelligence doesn’t protect you from these biases (intelligence was only weakly correlated, or not correlated at all, with performing well and therefore avoiding them).

It has even been shown that the more intelligent someone is, the more likely they are to believe that others are more biased than they are and that they themselves are comparatively rational (West, Meserve, & Stanovich, 2012). This is called the “bias blind spot” (they are blind to their own biases). Another study (Scopelliti et al., 2016) found that susceptibility to the bias blind spot is largely independent of intelligence, cognitive ability, and cognitive reflection.

However, it’s not a completely level playing field. On some tests where people might fall prey to thinking biases, e.g. the selection task (Wason, 1966), intelligence was correlated with success; the more intelligent a participant was, the more likely they were to get it right (Stanovich & West, 1998).
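For readers unfamiliar with it, the standard Wason selection task shows four cards (e.g. E, K, 4, 7) and a rule such as “if a card has a vowel on one side, it has an even number on the other”. The question is which cards must be turned over to test the rule. The normative answer follows mechanically from asking which visible faces could conceal a violation, as this small sketch shows:

```python
# Wason selection task: cards show E, K, 4, 7; rule: "if a card has a
# vowel on one side, it has an even number on the other side".
# A card must be turned only if its visible face could hide a violation.

VOWELS = set("AEIOU")

def must_turn(face: str) -> bool:
    if face.isalpha():
        return face.upper() in VOWELS   # a vowel might hide an odd number
    return int(face) % 2 == 1           # an odd number might hide a vowel

cards = ["E", "K", "4", "7"]
print([c for c in cards if must_turn(c)])   # → ['E', '7']
```

Most participants pick E and 4; turning 4 cannot falsify the rule (the rule says nothing about what even numbers must have on the back), while 7 can, which is why the task trips so many people up.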

You would think being an expert in a field would also help you resist biases, but for the hindsight bias it appears not to matter whether you are an expert; you are just as likely to fall for it (Guilbault et al., 2004).

Some have argued that these biases aren’t actually biases at all (Gigerenzer, 1991), or that they are just performance errors or “mere mistakes” rather than systematic irrationality (Stein, 1996). However, these views have been argued against by Kahneman & Tversky (1996) and Stanovich & West (2000) respectively. Stanovich and West conducted a series of experiments testing some of the most famous biases and found that performance errors accounted for very little of the variation in answers, whereas computational limitations (the fact that people aren’t purely rational reasoners) accounted for most of the instances of people falling for biases.

So it seems that being intelligent or an expert doesn’t always protect you against cognitive biases (and can even make you less aware of your own shortcomings). But what can? I’ll be exploring the techniques to protect yourself from biases in my next blog post.


Baron, J. (1991). Beliefs about thinking. In Voss, J.; Perkins, D.; & Segal, J. (Eds.), Informal reasoning and education (p.169-186). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Baron, J. (2000), Thinking and Deciding (3rd Ed.). Cambridge, UK: Cambridge University Press.

Gigerenzer, G. (1991). How to Make Cognitive Illusions Disappear: Beyond “Heuristics and Biases”. European Review of Social Psychology, 2, 83-115.

Greenberg, S. (2015). How rational do people think they are, and do they care one way or another? [Online] Available from: http://www.clearerthinking.org/#!How-rational-do-people-think-they-are-and-do-they-care-one-way-or-another/c1toj/5516a8030cf220353060d241 [Accessed: 21st July 2015].

Guilbault, R.L.; Bryant, F.B.; Brockway, J.H.; & Posavac, E.J. (2004). A Meta-Analysis of Research on Hindsight Bias. Basic and Applied Social Psychology, 26 (2&3), 113-117.

Kahneman, D. & Tversky, A. (1984). Choices, Values, and Frames. American Psychologist, 39 (4), 341-350.

Kahneman, D. & Tversky, A. (1996). On the Reality of Cognitive Illusions. Psychological Review, 103 (3), 582-591.

Scopelliti, I.; Morewedge, C.K.; McCormick, E.; Min, H.L.; Lebrecht, S.; & Kassam, K.S. (2016). Bias Blind Spot: Structure, Measurement, and Consequences. Management Science, 61 (10), 2468-2486.

Stanovich, K.E. & West, R.F. (1998). Individual Differences in Rational Thought. Journal of Experimental Psychology: General, 127 (2), 161-188.

Stanovich, K.E. & West, R.F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioural and Brain Sciences, 23, 645-726.

Stanovich, K.E. & West, R.F. (2007). Natural Myside Bias is Independent of Cognitive Ability. Thinking and Reasoning, 13 (3), 225-247.

Stanovich, K.E. & West, R.F. (2008). On the failure of cognitive ability to predict myside and one-sided thinking biases. Thinking and Reasoning, 14 (2), 129-167.

Stanovich, K.E. & West, R.F. (2008). On the Relative Independence of Thinking Biases and Cognitive Ability. Journal of Personality and Social Psychology, 94 (4), 672-695.

Stanovich, K.E.; West, R.F.; & Toplak, M.E. (2013). Myside Bias, Rational Thinking, and Intelligence. Current Directions in Psychological Science, 22 (4), 259-264.

Staw. B.M. (1976). Knee-Deep in the Big Muddy: A Study of Escalating Commitment to a Chosen Course of Action. Organisational Behaviour and Human Performance, 16, 27-44.

Stein, E. (1996). Without Good Reason: The Rationality Debate in Philosophy and Cognitive Science. Oxford University Press [rKES].

Teovanovic, P.; Knezevic, G.; & Stankov, L. (2015). Individual differences in cognitive biases: Evidence against one-factor theory of rationality. Intelligence, 50, 75-86.

Tversky, A. & Kahneman, D. (1974). Judgements under Uncertainty: Heuristics and Biases. Science, 185 (4157), 1124-1131.

Wason, P.C. (1966). Reasoning. In B. Foss (Ed.), New Horizons in Psychology, 135-151. Harmondsworth, England: Penguin.

West, R.F.; Meserve, R.J.; & Stanovich, K.E. (2012). Cognitive Sophistication Does Not Attenuate the Bias Blind Spot. Journal of Personality and Social Psychology, 103 (3), 506-519.

Image credit: www.anthraxinvestigation.com