Statistical power is often defined as “the probability of correctly rejecting H0 when a true association is present”, where H0 is the null hypothesis, often an association or effect size of zero (Sham & Purcell, 2014). It is determined by the effect size you want to detect^{1}, the size of your sample (N), and the alpha level, which is typically 0.05 but can be set to whatever you want (Lakens et al., 2017). I always thought of power as a static value for your study.

But this is wrong.

##### I’ve got the power

Say you were to run an independent samples t-test and you want to have an 80% chance of finding an effect when one is genuinely there^{2}. To reliably capture a Cohen’s δ of 0.5 you will need 64 participants per group. The image below is the power curve for your hypothetical test and is taken from the Shiny App by Daniel Lakens which can be found here^{3}.

As you can see, whilst your test appears to have a single power value, it actually doesn’t. Its power changes with the effect size you want to capture. If you want to find a smaller effect size but don’t change, or are unable to change, the number of participants, your test will have a much lower chance of detecting that effect.
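The numbers above can be reproduced with base R's `power.t.test()`. The sketch below is illustrative rather than the code behind the Shiny app: it assumes a two-sided independent samples t-test, alpha = 0.05, and an SD of 1 (so the raw mean difference equals the standardised effect size). The variable names and the particular effect sizes evaluated are arbitrary choices.

```r
# Assumptions: two-sided independent samples t-test, alpha = 0.05, SD = 1
# (so the raw mean difference equals the standardised effect size).

# Step 1: solve for the per-group n that gives 80% power at d = 0.5.
required <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(required$n)  # round up to whole participants

# Step 2: keep that n fixed and evaluate power at other hypothetical effect sizes.
effect_sizes <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.8)
power_at <- sapply(effect_sizes, function(d) {
  power.t.test(n = n_per_group, delta = d, sd = 1, sig.level = 0.05)$power
})

print(n_per_group)
print(data.frame(d = effect_sizes, power = round(power_at, 2)))
```

Step 1 rounds up to 64 participants per group, and step 2 shows the same 64-per-group design giving a different power for each effect size you might care about.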

The power of your test is a function of the effect size you are looking for, so it varies along a continuum based on that (among other things). It can also vary with the variability in your sample, independent of the effect size you are trying to find^{4}. This emphasises the importance of separating the effect size from variability rather than combining them into standardised effect sizes. Variability is captured by the standard deviation (SD): for a given raw mean difference, a larger SD (and thus higher variability) means lower power.
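To see the role of variability on its own, here is a small sketch that holds the design and the raw mean difference fixed and changes only the population SD. The specific values (64 per group, a raw difference of 0.5, SDs from 1 to 2) are assumptions chosen for illustration.

```r
# Same design and same raw mean difference, but different population SDs.
# Assumptions: 64 per group, raw mean difference of 0.5, alpha = 0.05,
# two-sided independent samples t-test.
sds <- seq(1, 2, by = 0.2)

power_by_sd <- sapply(sds, function(s) {
  power.t.test(n = 64, delta = 0.5, sd = s, sig.level = 0.05)$power
})

print(data.frame(sd = sds, power = round(power_by_sd, 2)))
```

Power falls as the SD grows, even though the raw difference you are looking for has not changed.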

The power of your test is also totally unrelated to the “true” effect size of the population. This is because power is a measure of the sensitivity of your test, not of what is actually happening in the world. As a result, studies don’t “have” power because it is just a property of the test you’re performing. In sum: power is a hypothetical value that helps you determine a priori the number of participants you need, based on your alpha level, and the effect size you want to capture. As such, a better definition is: statistical power is “an evaluation of the ability of a test/design combination to detect various hypothetical departures from the null” (Morey, 2019).
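One way to make the “test/design combination” idea concrete is to tabulate power for a few designs against a few hypothetical effect sizes, without any data at all. This is only a sketch: the sample sizes, alpha levels, and effect sizes below are illustrative assumptions, and the SD is fixed at 1.

```r
# Power is computed from the design alone: no data are needed.
# Assumptions: SD = 1, two-sided independent samples t-test; the sample sizes,
# alpha levels, and hypothetical effect sizes are illustrative.
designs <- expand.grid(n = c(32, 64),
                       alpha = c(0.01, 0.05),
                       d = c(0.2, 0.5, 0.8))

designs$power <- mapply(function(n, alpha, d) {
  power.t.test(n = n, delta = d, sd = 1, sig.level = alpha)$power
}, designs$n, designs$alpha, designs$d)

print(designs[order(designs$n, designs$alpha, designs$d), ])
```

Each row describes what a design could detect if a departure of that size existed. None of it says anything about the effect that actually exists in the population.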

##### Author feedback

Daniel Lakens corrected an error in one of the captions.

##### References

Lakens, D. (2017). Distribution of Cohen’s d, p-values, and power curves for an independent two-tailed t-test. Available at: https://lakens.shinyapps.io/p-curves/. [Accessed on: 05/12/2017]

Lakens, D. (2017). How a power analysis implicitly reveals the smallest effect size you care about. Available at: http://daniellakens.blogspot.co.uk/2017/05/how-power-analysis-implicitly-reveals.html [Accessed on: 05/12/2017]

Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., … Zwaan, R. A. (2017, September 18). Justify Your Alpha: A Response to “Redefine Statistical Significance”. Retrieved from psyarxiv.com/9s3y6

Morey, R. (2019). Statistical issues facing the reform movement. Available at: http://richarddmorey.org/content/OpenScienceTrier2019/#1. [Accessed on: 14/03/2019]

Morey, R. & D. Lakens. (2016). Why most of psychology is statistically unfalsifiable. *Medium*. Available at: https://medium.com/@richarddmorey/new-paper-why-most-of-psychology-is-statistically-unfalsifiable-4c3b6126365a [Accessed on: 11/09/2017]

Sham, P.C. & S.M. Purcell. (2014). Statistical power and significance testing in large-scale genetic studies. *Nature Reviews Genetics*, 15, 335–346.

##### Code

    library(tidyverse)

    # All combinations of per-group sample size, raw mean difference, and population SD
    parameter_grid <- expand.grid(n = c(32, 64),
                                  d = seq(0.05, 2, by = 0.05),
                                  sd = seq(1, 2, by = 0.2))

    # Power of a two-sided, two-sample t-test (alpha = 0.05) for each combination
    parameter_grid$power <- power.t.test(n = parameter_grid$n,
                                         delta = parameter_grid$d,
                                         sd = parameter_grid$sd)$power

    # Turn n into a facet label
    parameter_grid$n <- paste(parameter_grid$n, "per group")

    # One power curve per population SD, one panel per sample size
    ggplot(parameter_grid,
           aes(x = d,
               y = power,
               colour = factor(sd))) +
      geom_line() +
      xlab("mean difference in population") +
      scale_colour_brewer(name = "Population SD",
                          palette = "Dark2") +
      facet_wrap(~ n) +
      theme_bw(12) +
      theme(legend.position = "bottom")

^{1} Many believe that a power analysis should be based on the “true” effect size or the effect size previously found in the literature. However, this is a mistake. We cannot know what the “real” effect size is; we only have estimates, and due to publication bias the previously reported effect sizes are likely to be inflated (Morey & Lakens, 2016).

^{2} This is often phrased as wanting your test to have 80% power. But your test doesn’t technically “have” power, as Richard Morey explains here and here.

^{3} I strongly recommend you go and play about with the values. It has helped me understand not only the idea of power as a curve but also the idea of your smallest effect size of interest.

^{4} Credit to Jan Vanhove for pointing this out and creating the graphs below.
