Interruptions are common in most conversations, but is there a general trend as to who interrupts whom? Recently David Schmitt tweeted about a paper called “Influence of Communication Partner’s Gender on Language” (Hancock & Rubin, 2014). It gained some traction as it featured a graph purporting to show women interrupt other women more than men. The graph implies there is an interaction between the gender of the speaker and the gender of the communication partner, with women interrupting other women a lot more than they do men, and men interrupting either gender about as frequently. Given the widely held belief that men interrupt women more than other men, this stirred quite a bit of interest as it seemed to provide evidence against this idea. But what do the results show?
There are some immediate questions that can be asked of the graph. Why was a line graph used when the variable is not continuous[note]I’m steering clear of the discussion about sex as non-dichotomous as I’m not knowledgeable enough to have an informed opinion.[/note]? Why are there no error bars? Whilst not fatal, these choices are confusing and leave out valuable information. Was this graph taken from the paper itself? Reading the paper, you can see it isn’t. It was created using the data from Table I which shows women interrupted other women more than men did.
The problem with this, as the authors note in the article, is that these results don’t control for individual differences (e.g. rate of speech and total time spoken). Doing so gives us a different picture.
When controlling for individual differences (by calculating the ratio of linguistic markers to total words) the means are very close to one another*. Not only that, but the standard deviations completely encompass the differences between the groups, showing the variation is as large as the observed difference. This undermines the presentation of the original graph, which shows a clear difference between the genders.
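To make the two ideas in this paragraph concrete, here is a minimal Python sketch. It is not the authors' analysis, and every number in it is made up purely for illustration. The function names are my own. It shows how dividing a marker count by total words can remove a difference that comes only from one person talking more, and how to check descriptively whether the gap between two group means is smaller than the spread within the groups.

```python
from statistics import mean, stdev


def marker_ratio(marker_count: int, total_words: int) -> float:
    """Ratio of a linguistic marker (e.g. interruptions) to total words spoken."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return marker_count / total_words


def gap_smaller_than_spread(group_a: list[float], group_b: list[float]) -> bool:
    """True if the difference in means is smaller than both sample standard deviations.

    This is a descriptive check only, not a significance test.
    """
    gap = abs(mean(group_a) - mean(group_b))
    return gap < min(stdev(group_a), stdev(group_b))


# HYPOTHETICAL speakers: one talks three times as much as the other.
talkative = {"interruptions": 12, "total_words": 1200}
quiet = {"interruptions": 4, "total_words": 400}

talkative_ratio = marker_ratio(talkative["interruptions"], talkative["total_words"])
quiet_ratio = marker_ratio(quiet["interruptions"], quiet["total_words"])
print("raw counts:", talkative["interruptions"], "vs", quiet["interruptions"])
print("ratios:    ", talkative_ratio, "vs", quiet_ratio)

# HYPOTHETICAL per-speaker ratios for two groups (not taken from the paper).
group_a = [0.010, 0.014, 0.006, 0.012]
group_b = [0.011, 0.015, 0.007, 0.013]
print("gap smaller than spread:", gap_smaller_than_spread(group_a, group_b))
```

The raw counts make the talkative speaker look like a far heavier interrupter, while the ratios are the same. The second check only describes overlap; it says nothing about statistical significance on its own.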
Inferential statistics to the rescue?
But the graph’s original message (women interrupt women more than men interrupt other men) may be supported by the inferential statistics reported in the paper. The authors analysed the standardised results using a “repeated measures 2 × 2 MANOVA (Pillai’s trace)”[note]They performed a multivariate analysis of variance (MANOVA) with 2 independent variables: speaker gender as one (the means of M/M and M/F for each marker are combined and compared to the combined means of F/M and F/F for each marker) and communication partner gender as the other (the means of M/M and F/M for each marker are combined and compared to the combined means of M/F and F/F for each marker), with a specific type of test used.[/note]. The use of a repeated measures MANOVA when the paper clearly states one variable is between (speaker gender) and one is within (communication partner gender) raises questions that cannot be answered without the raw data, so I will put them to one side. There were no overall significant effects of either independent variable, nor a significant interaction between them. Therefore the above graph is demonstrating a non-significant interaction using unstandardised results. Looking at the individual markers, we can see there were 2 significant results in the communication partner condition: dependent clauses and interruptions. The results suggest “[speakers] used more dependent clauses and interruptions when they were talking to females compared with when they were talking to males.”
But I am not convinced by these results. The two significant p-values (dependent clauses and interruptions) are 0.02 and 0.028. Given how many variables were analysed, these are very likely spurious results caused by multiple testing (Noble, 2009). On top of this, the authors didn’t correct for multiple comparisons*, which inflates the risk of a type I error (Goldman, 2008). The flexibility in the design and analysis of these results, e.g. including data from participants that wasn’t directly about the predetermined topics, further increases the chances of false positives (Simmons, Nelson, & Simonsohn, 2011). And fundamentally, the experiment involved a small number of participants (40 in total) discussing reality TV or cell phones in the laboratory with total strangers. Therefore the ability of this paper to tell us anything is, in my opinion, very limited.
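The multiple testing point can be sketched in a few lines of Python. This is only an illustration, not a re-analysis. The two p-values are the ones reported above, and the significance level of 0.05 is the conventional one. The numbers of tests in the loop are assumptions, since I don't give the exact count of markers analysed here. The family-wise error rate formula assumes the tests are independent, which is a simplification. I've used the Bonferroni correction because it is the simplest one, not because the authors should necessarily have chosen it.

```python
ALPHA = 0.05  # conventional significance level (assumed)

# The two significant p-values reported for the communication partner condition.
REPORTED_P_VALUES = {"dependent clauses": 0.02, "interruptions": 0.028}


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def survives_bonferroni(p_value: float, alpha: float, m: int) -> bool:
    """True if p_value is still significant after a Bonferroni correction for m tests."""
    return p_value < bonferroni_threshold(alpha, m)


# ILLUSTRATIVE numbers of tests; the real count is whatever the paper analysed.
for m in (1, 2, 3, 5, 10):
    fwer = familywise_error_rate(ALPHA, m)
    threshold = bonferroni_threshold(ALPHA, m)
    still_significant = [
        name for name, p in REPORTED_P_VALUES.items()
        if survives_bonferroni(p, ALPHA, m)
    ]
    print(f"m={m:2d}  FWER={fwer:.3f}  threshold={threshold:.4f}  "
          f"still significant: {still_significant or 'none'}")
```

With as few as three tests, the Bonferroni threshold (0.05 / 3, roughly 0.0167) is already below both reported p-values, so neither result would survive this particular correction.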
So why all the fuss?
The graph that kicked this discussion off has been retweeted more than 600 times across two accounts (at the time of writing). A lot of people have seen this graph and believe it provides evidence that women interrupt other women more than men interrupt either gender. But even the quickest look at the abstract of the paper will reveal this isn’t the case. And after examining the article, I don’t believe it provides support for the actual reported findings[note]This does not mean there aren’t studies out there demonstrating women are interrupted more than men; this just isn’t one of them.[/note]. A lot of people retweeted the graph without critically examining it. This mental shorthand is understandable but undesirable for obvious reasons. Of course, I am not above this kind of thinking. I have shared articles that, upon later examination, didn’t provide evidence for the claims. I read the article after seeing concerns raised about the accuracy of the graph and, on first reading, found it confirmed my prior belief that women are interrupted more than men. But having analysed the design and results, I no longer believe the paper supports this phenomenon.
Being sceptical of results is difficult, especially if we want them to be true. I’ve fallen prey to this cognitive shortcut in the past and I’m sure I will in the future. But we aren’t going to improve our understanding without this critical inquiry. Hopefully we can use this episode as a reminder of the value of this necessary challenge. And crucially, we should not disregard someone’s valid criticisms of another’s work simply because we agree with the ideas initially presented.
All points marked with an asterisk (*) were raised by others on social media. Those that are not come from my own reading of the original paper.
Chira, S. (2017). The Universal Phenomenon of Men Interrupting Women. Retrieved from: https://www.nytimes.com/2017/06/14/business/women-sexism-work-huffington-kamala-harris.html
Glen, S. (2016). Pillai’s Trace. Retrieved from: http://www.statisticshowto.com/pillais-trace/
Goldman, M. (2008). Statistics for Bioinformatics. Retrieved from: https://www.stat.berkeley.edu/~mgoldman/Section0402.pdf
Hancock, A.B. & Rubin, B.A. (2014). Influence of Communication Partner’s Gender on Language. Journal of Language and Social Psychology, 1–19. DOI: 10.1177/0261927X14533197
Noble, W. S. (2009). How does multiple testing correction work? Nature Biotechnology, 27(12), 1135–1137. http://doi.org/10.1038/nbt1209-1135
Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366.