Is Cohen’s d only applicable to normal data?

In our two previous posts on Cohen’s d and standardized effect size measures [1, 2], we learned why we might want to use such a measure, how to calculate it for two independent groups, and why we should always be mindful of which standardizer (i.e., the denominator in d = effect size / standardizer) is used to calculate Cohen’s d.

But how do we interpret Cohen’s d?

First a tangent: bias in Cohen’s d

Most statistical analyses try to inform us about the population we are studying, not the sample of that population we happen to have tested for our study. With Cohen’s d we want to estimate the standardized effect size for a given population.

If our standardizer is an estimate, which it almost always will be, d will be a biased measure that tends to overestimate the population effect size. As pointed out by Geoff Cumming and Robert Calin-Jageman in their book Introduction to the New Statistics: Estimation, Open Science, and Beyond, if our sample size is less than 50 we should report d_unbiased. The unbiased version of Cohen’s d is often referred to as Hedges’ g and can easily be calculated by various statistical packages, including R.
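To make the size of the correction concrete, here is a minimal Python sketch. It assumes two independent groups, a pooled standard deviation as the standardizer, and the widely used approximate correction factor 1 - 3/(4·df - 1), where df = n1 + n2 - 2. The function names and the data are hypothetical and only for illustration; statistical packages may use the exact correction instead of this approximation.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, standardized by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)


def hedges_g(group_a, group_b):
    """Bias-corrected d, using the common approximate correction factor."""
    df = len(group_a) + len(group_b) - 2
    correction = 1 - 3 / (4 * df - 1)  # approximation, always below 1
    return correction * cohens_d(group_a, group_b)


# Hypothetical data, for illustration only.
control = [48, 52, 55, 43, 50, 47, 58, 45]
treated = [55, 60, 52, 63, 58, 49, 66, 57]

d = cohens_d(control, treated)
g = hedges_g(control, treated)
print(f"d = {d:.3f}, d_unbiased = {g:.3f}")
```

Because the correction factor is always slightly below 1, d_unbiased is always a little closer to zero than d, and the two converge as the sample size grows.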

r-based family of effect size values

There is another family of standardized effect size measures based on r, which is often used in correlation and regression analysis. As explained in this article, eta squared (η²) is the biased version and omega squared (ω²) is the bias-corrected version.
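As a rough illustration of how the two measures differ, the sketch below computes both for a one-way, between-subjects design using the standard textbook formulas: eta squared is SS_between / SS_total, and omega squared is (SS_between - df_between × MS_within) / (SS_total + MS_within). The function name and the data are hypothetical, and other designs (e.g., repeated measures) require different formulas.

```python
from statistics import mean


def eta_and_omega_squared(groups):
    """Return (eta squared, omega squared) for a one-way between-subjects design."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Sums of squares between and within groups.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_total = ss_between + ss_within

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_within = ss_within / df_within

    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq


# Hypothetical data, for illustration only.
groups = [
    [4.1, 5.0, 4.6, 5.3, 4.8],
    [5.2, 5.9, 6.1, 5.5, 6.4],
    [5.0, 4.7, 5.6, 5.1, 5.8],
]
eta_sq, omega_sq = eta_and_omega_squared(groups)
print(f"eta squared = {eta_sq:.3f}, omega squared = {omega_sq:.3f}")
```

Omega squared comes out smaller than eta squared, with the gap shrinking as sample size increases.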

Thinking about Cohen’s d: Cohen’s reference values

Cohen was reluctant to provide reference values for his standardized effect size measures. Although he stated that d = 0.2, 0.5 and 0.8 correspond to small, medium and large effects, he specified that these values provide a conventional frame of reference which is recommended when no other basis is available.

Thinking about Cohen’s d: overlap pictures

If we can assume that our data comes from a population with a normal distribution, it is helpful to picture the amount of overlap between two distributions associated with various values of Cohen’s d. Below is a figure illustrating the amount of overlap associated with the three d values identified by Cohen (code used to generate figure is available here:

Figure 1: Examples of overlap between two normally distributed groups for different Cohen’s d values. The mean of the pink population is 50. The standardizer (i.e., the standard deviation) of the between-group difference is 15. Thus, for a standardized between-group difference of 0.5, the between-group difference (effect size; ES) in original units is found from 0.5 = ES/15, which gives ES = 7.5. So the difference between the means of the two distributions is half a standard deviation, or 7.5 (figure panel 2).

[Figure 1 image: cohen_example]

As you can see, there is considerable overlap between the two distributions even when Cohen’s d indicates a large effect. This means that even for large effects there will be many individuals who go against the population-level pattern. Always keep these types of figures in mind when trying to interpret effect size measures.
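The amount of overlap can also be put into numbers. The sketch below uses the values from Figure 1 (reference mean of 50, standard deviation of 15) and Python’s standard-library NormalDist.overlap, which returns the overlapping area of two normal density curves. The function name overlap_for_d is hypothetical, and this overlapping coefficient is only one of several ways to quantify overlap; it assumes both populations are normal with equal standard deviations.

```python
from statistics import NormalDist


def overlap_for_d(d, mean_ref=50, sd=15):
    """Return (difference in original units, overlapping area) for a given d."""
    shift = d * sd  # effect size in original units
    reference = NormalDist(mean_ref, sd)
    shifted = NormalDist(mean_ref + shift, sd)
    return shift, reference.overlap(shifted)


for d in (0.2, 0.5, 0.8):
    shift, overlap = overlap_for_d(d)
    print(f"d = {d}: difference = {shift:.1f} units, overlap = {overlap:.1%}")
```

Under these assumptions the two curves share roughly 92%, 80% and 69% of their area for d = 0.2, 0.5 and 0.8.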

Thinking about Cohen’s d: effect size in original units

This is often the first approach to use when interpreting results. The outcome measure used to compute Cohen’s d may have known reference values (e.g., BMI) or a meaningful scale (e.g., hours of sleep per night).

Thinking about Cohen’s d: the standardizer and the reference population

Cohen’s d is a number of standard deviation units. It is important to ask yourself what standard deviation these units are based on. As was discussed in the previous post, if available it is always better to use an estimate of the population standard deviation rather than the standard deviation of the studied sample. If such a value is not available and the sample standard deviation is used, be aware that, as the denominator in the formula, the standardizer can have a large influence on the value of d.
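A small sketch shows how strongly the choice of standardizer matters. It takes a single raw difference of 7.5 units and divides it by three candidate standard deviations; all numbers and names here are hypothetical and chosen only to illustrate the arithmetic.

```python
def standardized_effect(raw_difference, standardizer):
    """d = effect size in original units / standardizer."""
    return raw_difference / standardizer


raw_difference = 7.5  # hypothetical difference in original units

# Hypothetical standard deviations, e.g. a small sample versus a population norm.
for sd in (10, 15, 20):
    d = standardized_effect(raw_difference, sd)
    print(f"standardizer = {sd}: d = {d}")
```

The same raw difference gives d = 0.75, 0.5 or 0.375 depending on the standardizer, which is why a reported d is hard to interpret without knowing which standard deviation was used.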

Thinking about Cohen’s d: values of d across disciplines

In any discipline there is a wide range of effect sizes reported. However, as highlighted by Cumming and Calin-Jageman, researchers in various fields have reported on what range of d values can be expected.

The mean effect size in psychology is d = 0.4, with 30% of effects below 0.2 and 17% greater than 0.8. In education research, the average effect size is also d = 0.4, with 0.2, 0.4 and 0.6 considered small, medium and large effects. In contrast, medical research is often associated with small effect sizes, often in the 0.05 to 0.2 range. Despite being small, these effects often represent meaningful outcomes such as saving lives. For example, being fit decreases mortality risk in the next 8 years by d = 0.08. Finally, effects as large as d = 5 are common in fields such as pharmacology.

Summary

There is no straightforward way to interpret standardized effect size measures. While they are increasingly being reported in published manuscripts, Cohen’s d and other such measures should not be glossed over. As pointed out in this and previous posts [1, 2], numerous things need to be considered when interpreting these values.