Is there good evidence for the ‘paradox of choice’?

The paradox of choice seems especially relevant in our modern consumer culture. Though it goes by many names,[note]Choice overload, overchoice.[/note] it was popularised by Barry Schwartz in his 2004 book ‘The Paradox of Choice – Why More Is Less’. The concept as initially proposed is straightforward: when presented with a large number of options, consumers are overwhelmed and thus less likely to make a decision and, when they do, are less satisfied with their choice.

When there is too much choice

Research into the effects of overwhelming choice started in earnest at the beginning of the 21st century. [zotpressInText item=”{KDZ3P8QW}”][note]Open access version here.[/note] theoretically proposed the idea and [zotpressInText item=”{CFUWNXWQ}” format=”%a% (%d%, %p%)”][note]Open access version here.[/note] first empirically tested the phenomenon. Iyengar and Lepper examined whether presenting participants with a large number of choices demotivated them from making a purchase or performing a related task, and whether they were happy with their decision. The authors ran 3 studies looking at the effect of greater choice in 2 domains: food (specifically jam and chocolates) and university assignments.

Mm.. Food

The 2 studies looking at the effect of choice on food examined whether having a larger number of options affected purchasing behaviour and satisfaction ratings. Study 1 used a real-world setting: a supermarket with a selection of jams on display. Motivation was measured by the participants’ initial attraction to the booth and whether they bought any jam. 60% of the 242 participants who passed the extensive-selection display (which contained 24 jams) stopped at the stall, compared with 40% of the 260 participants who passed the limited-selection display (which contained 6). But of the 104 who stopped at the 6-jam display, 30% bought a jam, whereas only 3% of the 145 who stopped at the 24-jam display did the same.

In Study 3, 67 participants were presented with either 6 different chocolates (the limited-choice condition, 33 participants) or 30 (the extensive-choice condition, 34 participants) and chose which chocolate they would like, after which they were allowed to taste it. A further 67 participants were in a control group, where they could choose a chocolate but were given a different chocolate to taste[note]Half the participants in the no-choice condition saw 6 chocolates and the other half saw 30.[/note]. Participants could then choose either $5 or $5 worth of chocolates as reimbursement for their time. The relevant dependent variables were the participants’ satisfaction with their choice (3 questions relating to their experience of enjoyment and 2 relating to regret, each on a 7-point scale) and their purchasing behaviour (whether they took the chocolates or not).

After collapsing all 5 questions related to satisfaction into 1 composite number[note]The questions relating to satisfaction had an average correlation of r=0.62 and the questions about regret were correlated with each other r=0.41, so were transformed to z-scores and combined into 2 composite measures of “enjoyment” and “regret”. These 2 were correlated r=-0.55 so were also combined, with “regret” being reverse scored.[/note] the researchers found participants in the limited-choice group were more satisfied than those in the extensive-choice group (M=6.28, SD = 0.54 versus M=5.46, SD = 0.82). They also found these participants were more likely to choose chocolates as compensation (48% versus 12%).
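The compositing procedure described in the footnote can be sketched in a few lines. The ratings below are hypothetical, not the study’s data; the point is the mechanics of z-scoring each item and reverse-scoring regret.

```python
from statistics import mean, pstdev

# Hypothetical ratings: rows = participants, columns = 5 items
# (first 3 = enjoyment, last 2 = regret), each on a 1-7 scale.
ratings = [
    [6, 7, 6, 2, 1],
    [5, 5, 6, 3, 2],
    [3, 4, 2, 5, 6],
    [7, 6, 7, 1, 2],
]

def zscore_column(col):
    m, s = mean(col), pstdev(col)
    return [(x - m) / s for x in col]

# z-score each item across participants, then rebuild the rows
cols = list(zip(*ratings))
z = list(zip(*[zscore_column(list(c)) for c in cols]))

composites = []
for row in z:
    enjoyment = mean(row[:3])
    regret = mean(row[3:])
    # regret is reverse scored before combining with enjoyment
    composites.append(mean([enjoyment, -regret]))
```

Under this scheme the participant with high enjoyment and low regret ratings ends up with the highest composite, as expected.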

Possible problems

Whilst statistically significant, these results rest on a potentially meaningful mistake: analysing ordinal data as if it were interval or ratio data. One of the assumptions of an Analysis of Variance (ANOVA) is metric data, i.e. data with equal intervals between adjacent units. Ordinal data (like the Likert scales used in this experiment) are not metric, so this assumption is violated. Many have argued these tests are robust to such violations, but there is strong evidence this isn’t the case[note]For a more detailed explanation, please read these 2 blog posts I wrote here and here.[/note]. This can greatly distort the results and lead to incorrect inferences, although you would have to reanalyse the raw data using a non-metric model to see whether the results differ. Analysing ordinal data with a metric model is very common in psychology [zotpressInText item=”{5421944:VD8XETGZ}”].
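To see why treating Likert codes as metric can mislead, here is a minimal sketch with toy data (not from the study): two groups whose numeric means are identical under the usual 1–5 coding, yet differ once you allow for the possibility that the categories are not equally spaced.

```python
from statistics import mean

# Hypothetical Likert responses (1-5) from two groups.
group_a = [3, 3, 3, 3]          # everyone answers "neutral"
group_b = [1, 5, 1, 5]          # split between the extremes

# Treated as metric data, the groups look identical:
assert mean(group_a) == mean(group_b) == 3

# But ordinal codes say nothing about the distance between
# categories. Under one plausible unequal spacing (a large jump
# from "agree" to "strongly agree"), the conclusion changes:
spacing = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 10.0}
mean_a = mean(spacing[x] for x in group_a)
mean_b = mean(spacing[x] for x in group_b)
```

The ordering of the group means now depends entirely on an assumption about spacing that the raw codes cannot justify, which is exactly why a non-metric model is the safer choice.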

Another theoretically risky but common practice is assuming the philosophical stance of operationalism. Operationalism holds that theoretical constructs, e.g. motivation or satisfaction, are identical to the measurements of those constructs [zotpressInText item=”{5421944:B8YAYQXS}”]. This stance was rejected as deeply flawed by philosophers of science over 50 years ago but is still the pervading view in psychology [zotpressInText item=”{5421944:5LSA4RWL}”]. If the assumption that the measurement is the same as the construct is violated, the validity of the conclusions is undermined because there is an unknown amount of measurement error [zotpressInText item=”{5421944:7BMPFF3L}”]. Given that the authors apparently subscribe to this belief without exploring other models, their conclusions could be unjustified. There are also questions about whether taking chocolates as compensation is a suitable proxy for ‘purchasing behaviour’: is choosing chocolate as reimbursement for one’s time equivalent to spending one’s money in a store? Whilst none of these criticisms prima facie invalidates the results, they do raise concerns about the validity of the conclusions. Additional analyses testing these assumptions and exploring alternative models would be needed to address them.

You don’t understand my words, but you must choose

Study 2 used 193 first-year Stanford psychology students to see whether being presented with more or fewer essay titles affected their motivation to write an extra-credit essay. Motivation was measured by the percentage of students in each group who completed the essay and by the quality of those essays[note]Quality was measured according to two measures on 10-point scales: content and form.[/note].

Of the 70 students assigned to the limited-choice condition (6 essay titles), 74% completed the assignment. Of the 123 students assigned to the extensive-choice condition (30 essay titles), 60% submitted an essay. The quality of the essays was statistically significantly different, with the limited-choice essays rated as higher quality in both content (p<0.05) and form (p<0.03). P-values near the statistical significance threshold are weaker evidence for a phenomenon than smaller values [zotpressInText item=”{5421944:5ITNRVRK}”], so you should be cautious when interpreting this as evidence for a difference. Compounding this issue is the inappropriate analysis method: the researchers tested for a difference in content and form (ordinal data) using a metric model which, as explained earlier, is risky and can distort results. The participants were also explicitly told their performance on the assignment was irrelevant to receiving the two extra-credit points. Thus, an external motivator was likely removed and only internal motivation remained, which may have differed between the groups.

Large-scale paradox of choice

[zotpressInText item=”{BXUZW6GB}”][note]Open access version here.[/note] explored the paradox of choice in relation to contributions to employees’ 401(k) plans[note]A type of American retirement plan.[/note]. Using a regression model which controlled for various individual and plan characteristics, e.g. age and number of employees, they found that as the number of funds offered by a plan increased, the percentage of employees participating in the 401(k) decreased. For every ten funds added, participation dropped by 1.5–2%. This real-world example is arguably more persuasive because of its greater external validity. However, the lack of experimental control means a causal relationship cannot be established. This is why both experimental evidence and evidence from more naturalistic environments are often needed to support a claim.
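The shape of the reported relationship can be sketched with a simple least-squares slope. The plan-level data below are synthetic, constructed to show roughly the magnitude the study reports, and the single-predictor fit stands in for the study’s far richer regression with controls.

```python
from statistics import mean

# Synthetic plan-level data: (number of funds offered, participation rate %),
# constructed so participation drops roughly 2 points per 10 extra funds.
plans = [(10, 76.0), (20, 74.5), (30, 71.8), (40, 70.1), (50, 68.2)]

funds = [f for f, _ in plans]
rate = [r for _, r in plans]

# Simple least-squares slope: cov(x, y) / var(x).
fx, fy = mean(funds), mean(rate)
slope = sum((x - fx) * (y - fy) for x, y in zip(funds, rate)) / \
        sum((x - fx) ** 2 for x in funds)

# Expressed the way the study reports it: percentage-point drop per ten funds.
drop_per_ten_funds = -slope * 10
```

On this toy data the fitted slope implies a 2-point drop per ten funds, at the upper end of the 1.5–2% range reported.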

Do options overwhelm us?

Whilst these individual studies may pique interest in the phenomenon, they aren’t in and of themselves enough to convince us of the choice overload hypothesis. [zotpressInText item=”{245MPM65}” format=”%a% (%d%, %p%)”][note]Open access version here.[/note] tried to provide a more comprehensive answer. The authors performed a meta-analysis of all the studies they could find exploring the phenomenon (including unpublished work). They included 63 results across a variety of domains, including food and dating, and a number of different items, e.g. mobile phones, chocolates, and pens. To standardise the results, they transformed all of them into Cohen’s d (weighted by sample size). The mean effect size was 0.02 (95% confidence interval = -0.09 to 0.12). There was high heterogeneity in the results, ranging from d=-1.89 to d=1.21, with larger sample sizes typically finding smaller effect sizes[note]The notable exception being Study 1 from Iyengar & Lepper (2000).[/note]. They fitted a meta-regression to the results to see what impact different moderators, e.g. different dependent variables and publication status, had on the estimate.
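The pooling step can be sketched as a fixed-effect meta-analysis: each study’s d is weighted by the inverse of its variance (which shrinks as sample size grows), and the weighted average is the overall estimate. The d values and sample sizes below are hypothetical, not the actual 63 results.

```python
import math

# Hypothetical study results: (cohens_d, n_group1, n_group2).
studies = [(-0.40, 50, 50), (0.10, 120, 115), (0.55, 30, 32), (-0.05, 200, 210)]

def d_variance(d, n1, n2):
    # Standard large-sample approximation for the variance of Cohen's d
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
pooled_d = sum(w * d for w, (d, _, _) in zip(weights, studies)) / sum(weights)

# 95% confidence interval around the pooled estimate
se = math.sqrt(1 / sum(weights))
ci = (pooled_d - 1.96 * se, pooled_d + 1.96 * se)
```

Note how the large studies with near-zero effects dominate the weighting, pulling the pooled estimate towards zero despite sizeable effects in the small studies, which mirrors the pattern described above.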

There is one possible problem with the mean effect size: converting results to Cohen’s d assumes the data are normally distributed and continuous, and that the variance between the groups is homogeneous. Ordinal data are neither normally distributed [zotpressInText item=”{VD8XETGZ}”] nor continuous. Of the 63 results, 29 were conversions of ordinal data from the original papers into Cohen’s d. Common effect size estimates like Cohen’s d are sensitive to departures from normality and homogeneity [zotpressInText item=”{UPT8QFKI}”], so there is a risk of distorted results when using tests whose assumptions are violated[note]Cliff’s delta is the non-parametric equivalent of Cohen’s d. Another alternative is the probability-based effect size estimator (A) (Ruscio & Mullen, 2012).[/note]. This reduces our confidence in the mean effect size. Rerunning the analyses using a non-parametric equivalent would let readers know whether the violations of assumptions affected the final estimate. The remaining results were binary data converted into Cohen’s d, e.g. Study 1 from Iyengar and Lepper (2000), but the authors didn’t explain which transformation they used. Looking across the results, they identified some feasibly important preconditions for choice overload to occur, but unfortunately the data didn’t give a clear picture as to which of these mattered.
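Cliff’s delta, mentioned in the footnote as the non-parametric alternative, is simple enough to sketch directly. It compares every value in one group with every value in the other, so it needs no distributional assumptions at all; the satisfaction ratings below are made up for illustration.

```python
# Cliff's delta: the probability that a value from x exceeds one from y,
# minus the probability of the reverse. Ranges from -1 to 1.
def cliffs_delta(x, y):
    greater = sum(1 for a in x for b in y if a > b)
    lesser = sum(1 for a in x for b in y if a < b)
    return (greater - lesser) / (len(x) * len(y))

# Hypothetical 7-point satisfaction ratings for two conditions.
limited = [6, 7, 6, 5, 7, 6]
extensive = [5, 4, 6, 5, 3, 4]
delta = cliffs_delta(limited, extensive)
```

Because it only uses the ordering of the ratings, any strictly increasing relabelling of the scale (the unequal-spacing worry from earlier) leaves delta unchanged.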

Moderating factors

A more recent attempt to summarise the literature was published by [zotpressInText item=”{HHLU3FGU}” format=”%a% (%d%, %p%)”][note]Open access version here.[/note]. They set out to better explain the variation in results by positing 4 moderating factors: the difficulty of the decision task; the complexity of the choice set; consumers’ preference uncertainty (how well consumers can evaluate the choices and whether they have a clear preference); and consumers’ decision goal (whether consumers aim to minimise the cognitive load of the decision). Each of these moderators had a statistically significant effect on choice overload, with estimates in the meta-regression model ranging from 0.32 (preference uncertainty) to 0.56 (decision goal). Their model accounted for 68% of the residual variance of the reported effect sizes, compared with 36% for the [zotpressInText item=”{245MPM65}” format=”%a% (%d%, %p%)”] model.

Distribution difficulties

However, this meta-analysis suffers from the same issues identified above. 32 of the 99 results had ordinal data converted into Cohen’s d without any apparent consideration of the underlying distribution. There are also questions about how the authors transformed the binary results into this effect size. The authors used 2 transformations to check whether they produced similar results, which they should be commended for, but the methods themselves have been criticised. The first was an arcsine transformation, which has been harshly criticised for its inferiority to other methods [zotpressInText item=”{5421944:EFIGAML9},{5421944:U3F2CQ7D}”][note]Both sets of authors recommend using a logistic regression instead.[/note].

The other method of transformation was a log-odds ratio. This approach assumes an underlying continuous trait exists with a logistic distribution[note]This distribution is very similar to a normal distribution but has fatter tails (Robb, 2008).[/note] and homogeneous variance [zotpressInText item=”{5421944:TKGPB2WN}”]. It may be reasonable to assume a logistic distribution for the dichotomised continuous variable (classifying the number of items seen into “small” or “large” categories). But it is not reasonable to assume that whether a purchase was made or not (which forms a binomial distribution) reflects a latent continuous trait with a logistic distribution[note]Some may think the Central Limit Theorem would likely lead to a normal distribution, but I explain why that isn’t necessarily the case here.[/note]. The paper the authors cite to justify their use of a log-odds ratio explicitly states that none of the effect indices examined (including the log-odds ratio) is suitable for estimating the population effect size with non-normal distributions [zotpressInText item=”{5421944:NLBN7F6L}”]. As with the points raised earlier, these concerns aren’t enough to negate the results. But they expose potential uncontrolled errors which could undermine the conclusions.
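For the binary outcomes, the log-odds route can be sketched as follows, using the jam-study purchase rates of 30% versus 3% as a worked example. The √3/π scaling in the final step is the standard conversion and is exactly where the latent-logistic assumption enters: a logistic distribution has standard deviation π/√3.

```python
import math

# Purchase rates in the two jam-study conditions: 30% versus 3%.
p1, p2 = 0.30, 0.03

# Log-odds ratio between the two conditions
log_or = math.log((p1 / (1 - p1)) / (p2 / (1 - p2)))

# Converting to Cohen's d divides by the standard deviation of the
# assumed latent logistic distribution (pi / sqrt(3)).
d = log_or * math.sqrt(3) / math.pi
```

The resulting d is large (around 1.45), which shows how strongly the final effect size depends on a distributional assumption that, for a simple buy/don’t-buy outcome, is hard to defend.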

Measuring what is missing

The authors of the meta-analysis decided to only include published studies and are thus missing the “grey literature”. Grey literature refers to unpublished papers and work published outside mainstream avenues such as peer-reviewed journals [zotpressInText item=”{5421944:8JTL9NF8}”]. Meta-analyses that exclude grey literature have been found to report effect sizes a third larger than those that don’t [zotpressInText item=”{5421944:UMLZ7HJC}”].

The authors did test for publication bias by inspecting the funnel plot and testing whether the residual effect size estimates could be predicted by the standard errors[note]Standard errors are a function of the sample size: the larger the sample size, the smaller the standard error.[/note]. If the funnel plot is symmetrical and the effect sizes cannot be predicted by the standard errors[note]The authors used a “weighted regression model with a multiplicative dispersion term and the studies’ standard errors as predictor”.[/note], this is taken as evidence for the absence of publication bias. Across all results reported in the meta-analysis, these checks found no evidence of such bias. But as [zotpressInText item=”{5421944:4PPNIY8A}” format=”%a% (%d%, %p%)”] highlight, different tests of publication bias have differing strengths in the face of questionable research practices, heterogeneity, etc. They recommend authors run a variety of methods and compare the results. Depending on the plausible conditions of the literature, different tests may produce more accurate results and therefore different conclusions may be warranted.
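The regression idea can be sketched with the classic Egger variant, a close relative of the authors’ weighted-regression check: regress the standardised effect on precision, and read funnel asymmetry off the intercept. The effect sizes and standard errors below are made up for illustration.

```python
from statistics import mean

# Hypothetical study results: (effect size d, standard error).
results = [(0.05, 0.05), (-0.10, 0.10), (0.20, 0.15), (-0.15, 0.20), (0.10, 0.08)]

# Egger-style test: regress the standardised effect (d / se)
# on precision (1 / se); an intercept far from zero signals asymmetry.
y = [d / se for d, se in results]
x = [1 / se for _, se in results]

mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
```

In a full analysis the intercept would be compared against its standard error; this sketch only shows where the asymmetry estimate comes from.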


The evidence regarding the paradox of choice hypothesis is equivocal. Whilst there is evidence for the phenomenon, it suffers from potentially serious flaws. The summaries of the literature reach different conclusions, and each has issues which may undermine the validity of those conclusions. In my opinion, what is needed is a large-scale, preregistered, multi-site study with a design agreed upon by both proponents and sceptics, akin to the preregistered replication of ego depletion [zotpressInText item=”{5421944:EWC7J273}”]. If operationalism is rejected, then the difference between the measure and the construct should be modelled, allowing readers to see how variation in the construct is reflected in changes in the scores and whether this affects the conclusions. The data should be analysed using more theoretically appropriate statistical methods, which can then be contrasted with more typical analysis plans to check for any differences. As with many results in psychology, the findings aren’t clear cut and there are legitimate questions about the validity of the methods. As our field reckons with its past, perhaps new data will come to light answering questions around the paradox of choice. Until then, I’m going to remain sceptical.


[zotpressInTextBib style=”apa” sort=”ASC”]

[zotpress items=”{5421944:M7WKTFLU}” style=”apa”]
