Degrees of Freedom
The number of values in a calculation that are free to vary independently.
Example: Choosing Hats
You have 4 hats (blue, gold, red and green) and want to wear a different one every day.
- On the first day you can choose any hat
- On the 2nd day you have 3 choices left
- On the 3rd day you have 2 choices left
- On the 4th day you have only 1 hat left, so no choice at all really
So your "degrees of freedom" turned out to be only 3: by the 4th day you had no freedom left to choose.
So, depending on the situation, the degrees of freedom can be less (but never more) than the number of items you are dealing with:
df = n − r
- df = Degrees of Freedom
- n = sample size
- r = number of independent restrictions
In the hats example, n is the number of hats (4), and r = 1 because the final day's hat is forced on you, so df = 4 − 1 = 3
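The hat-choosing steps above can be sketched in a few lines of Python (the hat names are just the ones from the example):

```python
# Count the choices available each day when wearing 4 hats, one per day.
hats = ["blue", "gold", "red", "green"]
n = len(hats)

choices_per_day = [n - day for day in range(n)]  # 4, 3, 2, 1 options

# Only days with more than one option involve a real choice,
# which matches df = n - r with r = 1 (the forced final day):
df = sum(1 for choices in choices_per_day if choices > 1)

print(choices_per_day)  # [4, 3, 2, 1]
print(df)               # 3
```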
Why Do Degrees of Freedom Matter?
Degrees of freedom tell us how much independent information we really have.
Example: The Mean Uses One Degree of Freedom
Suppose we have 4 numbers with a mean of 10.
If we choose the first three numbers freely, the fourth number is already decided.
Why? Because the total must be 40:
Mean = 10, so the Total = 4 × 10 = 40
If the first three add up to 32, the last one must be 8.
So only 3 values were free to vary.
df = 4 − 1 = 3
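A quick sketch of the "fourth value is already decided" idea, using three freely chosen numbers that happen to add up to 32 (as in the example):

```python
# Pick the first three values freely; the mean then forces the fourth.
mean = 10
n = 4
free_values = [12, 9, 11]  # chosen freely; they sum to 32

total = mean * n                 # the total must be 4 * 10 = 40
last = total - sum(free_values)  # the fourth value is decided for us

print(last)  # 8
```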
That's why sample variance uses n − 1 instead of n: once we calculate the mean, one value is no longer free.
In general, every time we estimate something from the data (like a mean), we lose one degree of freedom.
Each estimated parameter adds a restriction.
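The n − 1 divisor can be seen directly with Python's standard `statistics` module, which offers both the population version (divide by n) and the sample version (divide by n − 1). The data values here are just an illustration with mean 10:

```python
import statistics

data = [12, 9, 11, 8]  # mean is 10; squared deviations sum to 10

# Population variance divides by n; sample variance divides by n - 1,
# because estimating the mean used up one degree of freedom.
pop_var = statistics.pvariance(data)  # 10 / 4 = 2.5
samp_var = statistics.variance(data)  # 10 / 3 ≈ 3.333
```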
Here are some dfs by topic:
| Topic | Formula | Restrictions |
|---|---|---|
| Sample Variance | df = n − 1 | the 1 restriction is the mean |
| Independent Student's t-test | df = n1 + n2 − 2 | r = 2 because of two separate means |
| Paired Student's t-test | df = n − 1 | we have one (overall) mean |
| Chi-Square Test | df = (rows − 1) × (cols − 1) | row and column totals are fixed |
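The Chi-Square formula from the table only needs the dimensions of the contingency table, so it is easy to compute directly. The counts below are made up purely for illustration:

```python
# df for a chi-square test of independence on a contingency table:
# df = (rows - 1) * (cols - 1), because once the row and column totals
# are fixed, only that many cells can vary freely.
table = [
    [10, 20, 30],
    [15, 25, 35],
]  # 2 rows x 3 columns of (made-up) observed counts

rows, cols = len(table), len(table[0])
df = (rows - 1) * (cols - 1)
print(df)  # 2
```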