Why you should think of statistical power as a curve

Statistical power is defined as “the probability of correctly rejecting H0 when a true association is present”, where H0 is the null hypothesis, often an association or effect size of zero (Sham & Purcell, 2014). It is determined by the effect size you want to detect1, the size of your sample (N), and the alpha level, which is conventionally 0.05 but can be set to whatever value you can justify (Lakens et al., 2017). I always thought of power as a static value for your study.
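To make that definition concrete, here is a minimal sketch using Python’s statsmodels (my own illustration, not code from the original post) that computes the power of an independent-samples t-test from exactly those three inputs:

    # Power of an independent-samples t-test as a function of
    # effect size (Cohen's d), per-group sample size, and alpha.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    power = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05,
                           alternative='two-sided')
    print(round(power, 3))  # ~0.80; change any input and the power changes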

But this is wrong.

I’ve got the power:

Say you are going to run an independent-samples t-test and want an 80% chance of finding an effect when one is genuinely there.2 To reliably capture a Cohen’s δ of 0.5, you will need 64 participants per group. The image below shows the power curve for your hypothetical test; it is taken from the Shiny app by Daniel Lakens, which can be found here.3

Your hypothetical test has an 80% chance of finding an effect when it is present, with a Cohen’s δ of 0.5 and 64 participants per group.
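If you want to reproduce that figure without the app, here is a sketch with statsmodels (again my own illustration, not the code behind the Shiny app): it solves for the per-group sample size and then sweeps the effect size to trace the same curve:

    # Solve for the per-group N that gives 80% power at d = 0.5,
    # then sweep effect sizes to trace the power curve for that N.
    import numpy as np
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
    print(np.ceil(n))  # 64.0 participants per group

    for d in np.arange(0.1, 1.01, 0.1):
        p = analysis.power(effect_size=d, nobs1=64, alpha=0.05)
        print(f'd = {d:.1f}: power = {p:.2f}')  # power is a curve over d, not a point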

As you can see, whilst your test appears to have a single power value, it actually doesn’t. It changes based on the effect size you want to capture. If you want to find a smaller effect size but cannot, or do not, change the number of participants, your test will have a much lower chance of detecting an effect.

This is the same hypothetical test as above, but the effect size is a Cohen’s δ of 0.2.
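Re-running the same hypothetical test at δ = 0.2 makes the drop explicit (again my own statsmodels illustration):

    # Same test, same 64 participants per group, smaller target effect.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    p = analysis.power(effect_size=0.2, nobs1=64, alpha=0.05)
    print(f'{p:.2f}')  # roughly 0.20: a one-in-five chance of detecting the effect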

The power of your test is a function of the effect size you are looking for, so it varies along a continuum based on that (among other things). It can also vary due to differing variability in your sample, independent of the effect size you are trying to find.4 This emphasises the importance of separating the effect size from variability and not combining them into standardised effect sizes. The variability of the study is affected by the standard deviation (SD). If a study has a large SD (and thus higher variability), this reduces the power estimate, independent of the effect size you are looking for.
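To see this, it helps to think of the effect size here as a raw (unstandardised) difference. A worked illustration (my own, with made-up numbers): suppose the effect you care about is a raw difference of 5 points between two groups of 64. The SD alone then determines the standardised effect the power function sees, and with it the power:

    # A fixed raw difference of 5 points under two different SDs.
    # Cohen's d = raw difference / SD, so the SD alone moves the power.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for sd in (10, 20):
        d = 5 / sd
        p = analysis.power(effect_size=d, nobs1=64, alpha=0.05)
        print(f'SD = {sd}: d = {d:.2f}, power = {p:.2f}')
    # SD = 10 gives power ~0.80; SD = 20 gives power ~0.29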

The power of your test is also totally unrelated to the “true” effect size in the population. This is because power is a measure of the sensitivity of your test, not of what is actually happening in the world. As a result, studies don’t “have” power; it is a property of the test you perform, not of the study. In sum: power is a hypothetical value that helps you determine a priori the number of participants you need, based on your alpha level and the effect size you want to capture.

Author feedback: Daniel Lakens corrected an error in one of the captions.

References:

Morey, R. D., & Lakens, D. (2016). Why most of psychology is statistically unfalsifiable. Medium. Available at: https://medium.com/@richarddmorey/new-paper-why-most-of-psychology-is-statistically-unfalsifiable-4c3b6126365a [Accessed on: 11/09/2017]

Lakens, D. (2017). Distribution of Cohen’s d, p-values, and power curves for an independent two-tailed t-test. Available at: https://lakens.shinyapps.io/p-curves/ [Accessed on: 05/12/2017]

Lakens, D. (2017). How a power analysis implicitly reveals the smallest effect size you care about. Available at: http://daniellakens.blogspot.co.uk/2017/05/how-power-analysis-implicitly-reveals.html [Accessed on: 05/12/2017]

Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., … Zwaan, R. A. (2017). Justify Your Alpha: A Response to “Redefine Statistical Significance”. Retrieved from psyarxiv.com/9s3y6

Sham, P. C., & Purcell, S. M. (2014). Statistical power and significance testing in large-scale genetic studies. Nature Reviews Genetics, 15, 335–346.

Footnotes:

  1. Many believe that a power analysis should be based on the “true” effect size or on an effect size previously found in the literature. However, this is a mistake: we cannot know what the “real” effect size is, only estimates of it, and due to publication bias the previously reported effect sizes are likely to be inflated (Morey & Lakens, 2016).
  2. This is often phrased as wanting your test to “have” 80% power. But your test doesn’t technically “have” power, as Richard Morey explains here and here.
  3. I strongly recommend you go and play about with the values; it helped elucidate for me not only the idea of power as a curve but also the idea of your smallest effect size of interest.
  4. Credit to Jan Vanhove for pointing this out and creating the graphs below.

One Comment

  1. I’m confused by this statement:

    “This emphasises the importance of separating the effect size from variability and not combining them into standardised effect sizes. The variability of the study is affected by the standard deviation (SD). If a study has a large SD (and thus higher variability) this reduces the power estimate, independent of the effect size you are looking for.”

    Surely the effect size (if we’re still talking about Cohen’s d) is already a function of the smallest detectable difference and the standard deviation, so for a given effect size isn’t power independent of variation by definition? Or have I misunderstood what you mean by effect size in that section?
