Notes on Paul Meehl’s “Philosophical Psychology Session” #12

These are the notes I made whilst watching the video recording of Paul Meehl’s philosophy of science lectures. This is the twelfth and final episode (a list of all the videos can be found here). Please note that these posts are not designed to replace the actual videos (I highly recommend you watch them). They are meant to be read alongside the videos to help you understand what was said. I also do not include everything that he said (just the main/most complex points).

The weakness of significance tests and the corroboration index

There is a meaningful distinction between saying “there is no convincing quantitative or experimental evidence for this theory so I’m going to fall back on my clinical experience” (rational) and “the quantitative and experimental evidence tends against the theory but I’m going to fall back on my clinical experience anyway” (irrational).

Refuting a null hypothesis of 0 difference is not a powerful test of a theory (though it is not worthless at an exploratory stage).

Weak use of significance tests: conventional tests. The theory is so weak that nothing more than “the boys should be taller than the girls” can be predicted. This creates the illusion of a strong test of a theory. It is an illusion because the crud factor and the large set of alternative theories can also predict the vague result (a mere difference between groups). Epistemologically it is a small risk, even if it is a big risk for the statistician (the smaller you make alpha, the bigger the statistical risk).

Strong use of significance tests: the reverse of the weak use. It is not available unless you specify the theory. You specify a point value or interval and the significance test asks whether the results are consistent with the theory’s prediction. The original conception of chi squared used this kind of testing: the theory predicts a point difference or interval, and the test asks whether the results found differ significantly from it. The current conception starts from the belief that chance is the main driver of the difference. A “falsification” of this then means the data is unlikely assuming the null is true, and this is therefore believed to provide evidence for the theory. The math is identical even if the framework is different.

Though the strong use may be more appealing, it is still a “pointless exercise” as it refutes the theory only on the assumption that the theory is true as stated, the auxiliaries are true as stated, and the ceteris paribus clause holds (the odds of all these things being true are so vanishingly small that a falsification of the conjunction tells us little).

A falsification of the weak test shows the two variables aren’t completely unrelated (which is a priori very unlikely) and a falsification of the strong test shows the variables aren’t related precisely as the theory and auxiliaries specify.
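The contrast between the weak and the strong use can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something from the lecture: it uses a simple z-test with a normal approximation, the function name `two_sided_p` is made up, and the observed difference, standard error and predicted values are invented numbers. The point it shows is that the arithmetic is the same in both uses; only the hypothesised value changes (0 in the weak use, the theory's own prediction in the strong use).

```python
from statistics import NormalDist


def two_sided_p(observed_diff, standard_error, hypothesised_diff):
    """Two-sided p-value of a z-test of observed_diff against hypothesised_diff.

    Assumes a normal sampling distribution (an assumption of this sketch).
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    z = (observed_diff - hypothesised_diff) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Invented numbers, purely for illustration.
observed_diff = 5.0   # observed difference between the two group means
standard_error = 1.0  # standard error of that difference

# Weak use: the null says "no difference at all"; rejecting it is taken
# as support for the theory.
p_weak = two_sided_p(observed_diff, standard_error, 0.0)

# Strong use: the hypothesised value is what the theory itself predicts;
# rejecting it counts against the theory.
p_strong_far = two_sided_p(observed_diff, standard_error, 8.0)   # theory predicted 8.0
p_strong_near = two_sided_p(observed_diff, standard_error, 5.5)  # theory predicted 5.5

print("weak use, null of 0:", p_weak)
print("strong use, prediction 8.0:", p_strong_far)
print("strong use, prediction 5.5:", p_strong_near)
```

With these numbers the same data reject the null of 0 difference, reject the theory that predicted 8.0, and do not reject the theory that predicted 5.5. The weak use would count all three theories' vague "there is a difference" as confirmed.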

Popper doesn’t talk about the risk of a prediction.

Even if a prediction misses (because the results came outside the predicted interval) it may have more verisimilitude than a hit (because the test was very weak) and therefore it would be more fruitful to refine the former miss than the latter hit.

Case studies/anecdotes can only refute a null statement, nothing more.

There are more properties of theories than just predicting and explaining facts e.g. where they fit in Comte’s pyramid.

2 components for an index of corroboration.

1) How detailed or precise the theory is in its forecasts: what the theory will tolerate (T) divided by the spielraum (S), which is the range of possible values for a particular type of experiment that prior knowledge, common sense, etc. allows (it can be thought of as a margin of error). The spielraum is necessary as otherwise you won’t know if the prediction made is a good or bad one. Intolerance = 1 - T/S.

2) “Closeness” (one minus the relative error) = 1 - D/S, where D is the deviation of the observed value from the prediction. If there is no error, D = 0 and closeness = 1.

Corroboration index (Ci) for a particular experiment: Ci = closeness × intolerance = (1 - D/S)(1 - T/S).
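The two components above can be combined in a short Python sketch. It is a minimal illustration rather than Meehl's own procedure: the function name and arguments are hypothetical, the numbers are invented, and it assumes that D is measured from the nearest edge of the tolerated interval (so D = 0 whenever the observation falls inside it).

```python
def corroboration_index(tol_low, tol_high, observed, spiel_low, spiel_high):
    """Ci = closeness * intolerance for a single experiment.

    tol_low..tol_high     interval the theory tolerates (its width is T)
    spiel_low..spiel_high spielraum, the range of a priori possible values (S)
    observed              the value actually found
    """
    S = spiel_high - spiel_low
    if S <= 0:
        raise ValueError("spielraum must have positive width")
    if not (spiel_low <= tol_low <= tol_high <= spiel_high):
        raise ValueError("tolerated interval must lie inside the spielraum")

    T = tol_high - tol_low

    # Assumption of this sketch: deviation is the distance from the observed
    # value to the nearest edge of the tolerated interval (0 if inside it).
    if observed < tol_low:
        D = tol_low - observed
    elif observed > tol_high:
        D = observed - tol_high
    else:
        D = 0.0

    intolerance = 1 - T / S
    closeness = 1 - D / S
    return closeness * intolerance


# Invented example on a 0-100 spielraum.
precise_hit = corroboration_index(40, 42, 41, 0, 100)   # narrow interval, hit
vague_hit = corroboration_index(0, 50, 41, 0, 100)      # wide interval, hit
precise_miss = corroboration_index(40, 42, 45, 0, 100)  # narrow interval, near miss

print(precise_hit, vague_hit, precise_miss)
```

In this made-up example the near miss of the precise prediction scores higher than the hit of the vague one, which is the situation described earlier where refining a miss can be more fruitful than celebrating a weak hit.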

“Pureness” is individual differences, variance, and measurement error. Measured by the dispersion from the means.

“Lack of fit” means the fitted model did not match the approximation perfectly.

Your model can be a worse fit than one totally determined by chance.

4 indexes of theory performance: 1) point or interval goodness or accuracy; 2) predicting function forms; 3) embedment in Comte’s pyramid (predicting things higher up in the pyramid whilst being derived from lower levels)/reducibility; 4) a measure of the qualitative diversity that the theory speaks of e.g. it predicts things that (on the surface) are distinct.

Can look to the history of a theory and see which index was the most accurate/the best at predicting whether a theory would stand the test of time.

Whilst a working scientist doesn’t have to worry about Hume’s induction problem, they do have to worry about the relationship between corroboration and verisimilitude (about how the track record relates to the long-term success).


Wikipedia. Comte’s Theory of Science. Available at: [Accessed on: 18/07/2018]

Yonce, J. L., 2016. Philosophical Psychology Seminar (1989) Videos & Audio, [online] (Last updated 05/25/2016) Available at: [Accessed on: 18/07/2018]
