Confidence Intervals
An interval of 4 plus or minus 2
A Confidence Interval is a range of values we are fairly sure our true value lies in.
Example: Average Height
We measure the heights of 40 randomly chosen men, and get a:
- mean height of 175cm,
- with a standard deviation of 20cm.
The 95% Confidence Interval (we show how to calculate it later) is:
175cm ± 6.2cm
This says the true mean of ALL men (if we could measure their heights) is likely to be between 168.8cm and 181.2cm.
But it might not be!
The "95%" says that 95% of experiments like the one we just did will produce an interval that includes the true mean, but 5% won't.
So there is a 1-in-20 chance (5%) that our Confidence Interval does NOT include the true mean.
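The "95% of experiments" idea can be checked with a small simulation. This sketch relies on an assumption the article does not make: it pretends the heights of ALL men are exactly normally distributed, with a true mean of 175cm and a standard deviation of 20cm. It then draws many samples of 40, builds the interval with the formula from Step 3 below, and counts how often the interval contains 175. The function name `coverage` and its parameters are made up for illustration.

```python
import random
import statistics

def coverage(true_mean=175.0, true_sd=20.0, n=40, z=1.960, trials=5_000, seed=1):
    """Fraction of simulated samples whose interval X ± Z*s/sqrt(n) contains true_mean.

    Assumes (for illustration only) a normally distributed population.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x = statistics.mean(sample)   # sample mean X
        s = statistics.stdev(sample)  # sample standard deviation s
        margin = z * s / n ** 0.5
        if x - margin <= true_mean <= x + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"{rate:.3f} of the intervals contained the true mean")
```

The printed fraction should be close to 0.95, usually a little under, because s is itself estimated from only 40 values.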
Calculating the Confidence Interval
Step 1: note down the number of samples n, and calculate the mean X and standard deviation s of those samples:
- Number of samples: n = 40
- Mean: X = 175
- Standard Deviation: s = 20
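Step 1 can be sketched with Python's standard `statistics` module. The five heights below are hypothetical values chosen only to show the calculation, because the article does not list its 40 measurements. We assume s is the usual sample standard deviation, which divides by n - 1. That is what `statistics.stdev` computes, and the hand-written line checks it against that formula.

```python
import statistics

# Hypothetical heights in cm, made up for illustration (not the 40 men in the example)
heights = [168.0, 181.0, 175.0, 190.0, 161.0]

n = len(heights)                    # number of samples
mean = statistics.mean(heights)     # X
s = statistics.stdev(heights)       # sample standard deviation (divides by n - 1)

# The same s computed by hand from the n - 1 formula
s_manual = (sum((h - mean) ** 2 for h in heights) / (n - 1)) ** 0.5

print(n, mean, round(s, 2))  # 5 175.0 11.25
```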
Step 2: decide what Confidence Interval we want. 90%, 95% and 99% are common choices. Then find the "Z" value for that Confidence Interval here:
Confidence Interval | Z
80% | 1.282
85% | 1.440
90% | 1.645
95% | 1.960
99% | 2.576
99.5% | 2.807
99.9% | 3.291
For 95% the Z value is 1.960
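The Z values in the table come from the standard normal distribution. For a confidence level C, we leave (1 - C)/2 in each tail, so Z is the point with (1 + C)/2 of the area to its left. Here is a minimal sketch using `statistics.NormalDist` (available since Python 3.8). The helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided Z for a confidence level given as a fraction, e.g. 0.95."""
    # (1 - confidence)/2 sits in each tail, so take the (1 + confidence)/2 quantile
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.80, 0.85, 0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"{level:.1%}  {z_value(level):.3f}")
```

Rounded to three decimal places, this reproduces every row of the table.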
Step 3: use that Z in this formula for the Confidence Interval
$$X \pm Z \frac{s}{\sqrt{n}}$$
Where:
- X is the mean
- Z is the chosen Z-value from the table above
- s is the standard deviation
- n is the number of samples
And we have:
$$175 \pm 1.960 \times \frac{20}{\sqrt{40}}$$
Which is:
175cm ± 6.20cm
In other words: from 168.8cm to 181.2cm
The value after the ± is called the margin of error.
The margin of error in the previous example is 6.20cm.
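Steps 1 to 3 fit into two small functions. This is a sketch only. The names `margin_of_error` and `confidence_interval` are illustrative and do not come from any library, and Z defaults to 1.960 for a 95% interval.

```python
import math

def margin_of_error(s, n, z=1.960):
    """Return Z * s / sqrt(n), the value after the ± sign."""
    return z * s / math.sqrt(n)

def confidence_interval(mean, s, n, z=1.960):
    """Return (mean - margin, mean + margin)."""
    m = margin_of_error(s, n, z)
    return mean - m, mean + m

# Height example: n = 40, X = 175cm, s = 20cm, 95% level (Z = 1.960)
moe = margin_of_error(20, 40)
low, high = confidence_interval(175, 20, 40)
print(f"175 ± {moe:.2f} -> {low:.1f} to {high:.1f}")  # 175 ± 6.20 -> 168.8 to 181.2

# Higher confidence -> larger Z -> wider interval (Z values from the table)
for level, z in (("90%", 1.645), ("95%", 1.960), ("99%", 2.576)):
    print(level, round(margin_of_error(20, 40, z), 2))
```

The loop at the end shows the trade-off behind Step 2. Choosing a higher confidence level means a larger Z, and therefore a wider interval for the same data.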
Calculator
We have a Confidence Interval Calculator to make life easier for you.
Another Example
Example: Apple Orchard
Are the apples big enough?
There are hundreds of apples on the trees, so you randomly choose just 30 and get these results:
- Mean: 86
- Standard Deviation: 5
Let's calculate:
$$X \pm Z \frac{s}{\sqrt{n}}$$
We know:
- X is the mean = 86
- Z is the Z-value = 1.960 (from the table above for 95%)
- s is the standard deviation = 5
- n is the number of samples = 30
$$86 \pm 1.960 \times \frac{5}{\sqrt{30}} = 86 \pm 1.79$$
So the true mean (of all the hundreds of apples) is likely to be between 84.21 and 87.79
True Mean
Now imagine we get to pick ALL the apples straight away, and get them ALL measured by the packing machine (this is a luxury not normally found in statistics!)
And the true mean turns out to be 84.9
Let's lay all the apples on the ground from smallest to largest:
Each apple is a green dot, except our samples, which are blue
Our result was not exact (it is random, after all), but the true mean is inside our confidence interval of 86 ± 1.79 (in other words, 84.21 to 87.79).
The true mean might not have been inside the confidence interval, but 95% of the time it will be!
95% of all "95% Confidence Intervals" will include the true mean.
Maybe we had this sample, with a mean of 83.5 and a Standard Deviation of 3.5:
Each apple is a green dot; our samples are marked purple
That interval, 83.5 ± 1.25 (from 82.25 to 84.75), does not include the true mean of 84.9. Expect that to happen 5% of the time for a 95% confidence interval.
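We can repeat the check for both apple samples: the blue one (mean 86, standard deviation 5) and the purple one (mean 83.5, standard deviation 3.5), each of 30 apples, against the true mean of 84.9. This is a short sketch, and `interval` is a hypothetical helper.

```python
import math

TRUE_MEAN = 84.9  # the true mean of ALL the apples, from the story above

def interval(mean, s, n, z=1.960):
    """Return (low, high) for mean ± z * s / sqrt(n)."""
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin

# (mean, standard deviation) of the two 30-apple samples described above
samples = {"blue": (86, 5), "purple": (83.5, 3.5)}

for name, (mean, s) in samples.items():
    low, high = interval(mean, s, n=30)
    contains = low <= TRUE_MEAN <= high
    print(f"{name}: {low:.2f} to {high:.2f}, contains {TRUE_MEAN}? {contains}")
```

The blue interval (84.21 to 87.79) contains 84.9, but the purple one (82.25 to 84.75) misses it.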
So how do we know if the sample we took is one of the "lucky" 95% or the unlucky 5%? Unless we get to measure the whole population, like above, we simply don't know.
This is the risk in sampling: we might have a bad sample.
Example in Research
Here is a Confidence Interval used in research on extra exercise for older people:
Example: the "Male" line says there were:
- 1,226 Men (47.6% of all people)
- who had an "HR" (Hazard Ratio*) of 0.92,
- and a 95% Confidence Interval (95% CI) of 0.88 to 0.97 (roughly 0.92 ± 0.05, though not exactly symmetric)
In other words, the true Hazard Ratio (for the wider population of men) is likely to be between 0.88 and 0.97, because intervals built this way capture the true value 95% of the time.
* Note for the curious: "HR" is used in research and means "Hazard Ratio", where lower is better. So an HR of 0.92 means the subjects were better off, and an HR of 1.03 would mean slightly worse off.
Standard Normal Distribution
It is all based on the idea of the Standard Normal Distribution, where the Z value is the "Z-score".
For example, the Z for 95% is 1.960, and here we see that the range from -1.96 to +1.96 includes 95% of all values:
From -1.96 to +1.96 standard deviations is 95%
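The 95% figure can be confirmed numerically with `statistics.NormalDist`. Its `cdf` method gives the area under the standard normal curve to the left of a point, so subtracting two of these values gives the area between them.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the standard normal curve between -1.96 and +1.96
area = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(f"{area:.4f}")  # 0.9500
```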
Applying that to our sample looks like this:
Also from -1.96 to +1.96 standard deviations of the sample mean (each one is s/√n), so includes 95%
Conclusion
The Confidence Interval formula is
$$X \pm Z \frac{s}{\sqrt{n}}$$
Where:
- X is the mean
- Z is the Z-value from the table below
- s is the standard deviation
- n is the number of samples
Confidence Interval | Z
80% | 1.282
85% | 1.440
90% | 1.645
95% | 1.960
99% | 2.576
99.5% | 2.807
99.9% | 3.291