# Mean, Median and Mode from Grouped Frequencies

Let's start off with some raw data (**not a grouped frequency**) ...

### Example: Alex did a survey of how many games each of 20 friends owned, and got this:

9, 15, 11, 12, 3, 5, 10, 20, 14, 6, 8, 8, 12, 12, 18, 15, 6, 9, 18, 11

To find the Mean, add up all the numbers, then divide by how many numbers there are:

$$\text{Mean} = \frac{9+15+11+12+3+5+10+20+14+6+8+8+12+12+18+15+6+9+18+11}{20} = 11.1$$

To find the Median, place the numbers in value order and find the middle number (or the mean of the middle two numbers, when there is an even number of values). In this case, it is the mean of the 10th and 11th values:

3, 5, 6, 6, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 14, 15, 15, 18, 18, 20:

$$\text{Median} = \frac{11 + 11}{2} = 11$$

To find the Mode, or modal value, place the numbers in value order, then count how many times each number appears. The Mode is the number which appears most often (you can have more than one mode):

3, 5, 6, 6, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 14, 15, 15, 18, 18, 20:

12 appears three times, more often than the other values, so **Mode = 12**
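To check these raw-data results by computer, here is a minimal Python sketch using the standard-library `statistics` module (`multimode` needs Python 3.8 or later). The variable names are illustrative only. The median prints as `11.0` because it is computed by dividing by 2.

```python
import statistics

# Alex's survey: number of games owned by each of the 20 friends
games = [9, 15, 11, 12, 3, 5, 10, 20, 14, 6, 8, 8, 12, 12, 18, 15, 6, 9, 18, 11]

mean = statistics.mean(games)        # add up all values, divide by how many there are
median = statistics.median(games)    # with 20 values: mean of the 10th and 11th sorted values
modes = statistics.multimode(games)  # list of every value tied for the highest count

print(mean, median, modes)  # 11.1 11.0 [12]
```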

## Grouped Frequency Table

Now, let's make a Grouped Frequency Table of Alex's data:

| Number of games | Frequency |
|---|---|
| 1 - 5 | 2 |
| 6 - 10 | 7 |
| 11 - 15 | 8 |
| 16 - 20 | 3 |

(This says that 2 of Alex's friends own somewhere between 1 and 5 games, 7 own between 6 and 10 games, and so on.)
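Building a grouped frequency table from raw data comes down to counting. Here is a minimal sketch, assuming the four class intervals shown above. The names `intervals` and `table` are made up for illustration.

```python
# Alex's raw survey data
games = [9, 15, 11, 12, 3, 5, 10, 20, 14, 6, 8, 8, 12, 12, 18, 15, 6, 9, 18, 11]

# Class intervals of width 5, with both ends included: 1-5, 6-10, 11-15, 16-20
intervals = [(1, 5), (6, 10), (11, 15), (16, 20)]

# Frequency of each interval = how many values fall inside it
table = {(lo, hi): sum(1 for g in games if lo <= g <= hi) for lo, hi in intervals}

for (lo, hi), f in table.items():
    print(f"{lo} - {hi} | {f}")
```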

## Oh No!

Suddenly all the original data gets lost (naughty pup!)

... can we help Alex calculate the Mean, Median and Mode from just that table?

The answer is ... no, we can't. Not accurately, anyway. But we can make estimates.

## Estimating the Mean from Grouped Data

So all we have left is:

| Number of games | Frequency |
|---|---|
| 1 - 5 | 2 |
| 6 - 10 | 7 |
| 11 - 15 | 8 |
| 16 - 20 | 3 |

- The groups (1-5, 6-10, etc.), also called class intervals, are of width 5
- The numbers 1, 6, 11 and 16 are the lower class boundaries
- The numbers 5, 10, 15 and 20 are the upper class boundaries
- The midpoints are halfway between the lower and upper class boundaries
- So the midpoints are 3, 8, 13 and 18

We can estimate the Mean by using the **midpoints**.

So, how does this work?

Think about Alex's 7 friends who are in the group **6 - 10**: all we know is that they each have between 6 and 10 games:

- Maybe all seven of them have 6 games,
- Maybe all seven of them have 10 games,
- But it is more likely that there is a spread of numbers: some have 6, some have 7, and so on

So we take an average: we **assume** that all seven of them have 8 games, which is the midpoint of the group (8 is the average of 6 and 10).

So, we could make the table in a different way:

| Midpoint | Frequency |
|---|---|
| 3 | 2 |
| 8 | 7 |
| 13 | 8 |
| 18 | 3 |

Now we think "2 people have 3 games, 7 people have 8 games, 8 people have 13 games and 3 people have 18 games", so we **imagine** the data looks like this:

3, 3, 8, 8, 8, 8, 8, 8, 8, 13, 13, 13, 13, 13, 13, 13, 13, 18, 18, 18

Now we can add them all up and divide by 20. The quick way is to multiply each midpoint x by its frequency f, then add up the fx column:

| Midpoint x | Frequency f | fx |
|---|---|---|
| 3 | 2 | 6 |
| 8 | 7 | 56 |
| 13 | 8 | 104 |
| 18 | 3 | 54 |
| **Totals:** | **20** | **220** |

So an **estimate** of the mean number of games is:

$$\text{Estimated Mean} = \frac{220}{20} = 11$$
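The midpoint method can be sketched in a few lines of Python. This is an illustration, not a standard API: `estimated_mean` is a hypothetical helper, and each group is assumed to be stored as a `(lower, upper, frequency)` tuple.

```python
def estimated_mean(groups):
    """Estimate the mean of grouped data, assuming every value sits at its group's midpoint."""
    total_f = 0
    total_fx = 0
    for lower, upper, f in groups:
        x = (lower + upper) / 2  # midpoint, e.g. (6 + 10) / 2 = 8
        total_f += f
        total_fx += f * x        # one row of the fx column
    return total_fx / total_f

# The grouped table that survived: (lower, upper, frequency)
groups = [(1, 5, 2), (6, 10, 7), (11, 15, 8), (16, 20, 3)]
print(estimated_mean(groups))  # 11.0
```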

## Estimating the Median from Grouped Data

To estimate the Median, let's look at our data again:

| Number of games | Frequency |
|---|---|
| 1 - 5 | 2 |
| 6 - 10 | 7 |
| 11 - 15 | 8 |
| 16 - 20 | 3 |

The median is the mean of the middle two numbers (the 10th and 11th values) ...

... and since the first two groups contain only 2 + 7 = 9 values, they are both in the 11 - 15 group:

We can say "the **median group** is 11 - 15"

But if we need to estimate a single **Median value** we can use this formula:

$$\text{Estimated Median} = L + \frac{(n/2) - cf_b}{f_m} \times w$$

where:

- **L** is the lower class boundary of the group containing the median
- **n** is the total number of data values
- $cf_b$ is the cumulative frequency of the groups before the median group
- $f_m$ is the frequency of the median group
- **w** is the group width

For our example:

- **L** = 11
- **n** = 20
- $cf_b$ = 2 + 7 = 9
- $f_m$ = 8
- **w** = 5

$$\text{Estimated Median} = 11 + \frac{(20/2) - 9}{8} \times 5 = 11 + \frac{1}{8} \times 5 = 11.625$$
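The median formula translates directly into a one-line function. This is a sketch: `estimated_median` is a hypothetical name, and its parameters simply mirror the symbols L, n, cf_b, f_m and w defined above.

```python
def estimated_median(L, n, cf_b, f_m, w):
    """Estimated Median = L + ((n/2) - cf_b) / f_m * w."""
    return L + ((n / 2) - cf_b) / f_m * w

# Alex's games: the median group is 11 - 15
print(estimated_median(L=11, n=20, cf_b=2 + 7, f_m=8, w=5))  # 11.625
```

The fraction ((n/2) − cf_b) / f_m says how far into the median group the middle value lies, assuming the values are spread evenly through the group. Multiplying by w turns that fraction into a distance from L.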

## Estimating the Mode from Grouped Data

Again, looking at our data:

| Number of games | Frequency |
|---|---|
| 1 - 5 | 2 |
| 6 - 10 | 7 |
| 11 - 15 | 8 |
| 16 - 20 | 3 |

We can easily identify the modal group (the group with the highest frequency), which is **11 - 15**

We can say "the **modal group** is 11 - 15"

But the actual Mode may not even be in that group! Or there may be more than one mode. Without the raw data we don't really know.

But, we can estimate the Mode using the following formula:

$$\text{Estimated Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$$

where:

- L is the lower class boundary of the modal group
- $f_{m-1}$ is the frequency of the group before the modal group
- $f_m$ is the frequency of the modal group
- $f_{m+1}$ is the frequency of the group after the modal group
- w is the group width

In this example:

- L = 11
- $f_{m-1}$ = 7
- $f_m$ = 8
- $f_{m+1}$ = 3
- w = 5

$$\text{Estimated Mode} = 11 + \frac{8 - 7}{(8 - 7) + (8 - 3)} \times 5 = 11 + \frac{1}{6} \times 5 = 11.833\ldots$$
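The mode formula can be sketched the same way. `estimated_mode` is a hypothetical helper, and its parameters stand for L, f_{m-1}, f_m, f_{m+1} and w.

```python
def estimated_mode(L, f_before, f_m, f_after, w):
    """Estimated Mode = L + (f_m - f_before) / ((f_m - f_before) + (f_m - f_after)) * w."""
    d1 = f_m - f_before  # how much the modal group exceeds the group before it
    d2 = f_m - f_after   # how much the modal group exceeds the group after it
    return L + d1 / (d1 + d2) * w

# Alex's games: the modal group is 11 - 15
print(estimated_mode(L=11, f_before=7, f_m=8, f_after=3, w=5))
```

The estimate leans toward the neighbouring group with the higher frequency. Here the group before (7) is close to 8 and the group after (3) is far below it, so the estimate lands only 1/6 of the way into the modal group.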

**Our final result is:**

- Estimated Mean: 11
- Estimated Median: 11.625
- Estimated Mode: 11.833...

(Compare that with the true Mean, Median and Mode of **11.1, 11 and 12** that we got at the very start.)

And that is how it is done.

Now let us look at two more special examples, and get some more practice along the way!

## Continuous Data

Data can be Discrete or Continuous:

- **Discrete data** can only take certain values, like our previous example (games owned)
- **Continuous data** can take any value (within a range), such as length or weight

Continuous data can be treated in exactly the same way as discrete data, but with one important difference.

The difference concerns the class boundaries.

Example: You grow fifty baby carrots using special soil. You dig them up and measure their lengths (to the nearest mm) and group the results:

| Length (mm) | Frequency |
|---|---|
| 150 - 154 | 5 |
| 155 - 159 | 2 |
| 160 - 164 | 6 |
| 165 - 169 | 8 |
| 170 - 174 | 9 |
| 175 - 179 | 11 |
| 180 - 184 | 6 |
| 185 - 189 | 3 |

Now, what does "155 - 159" mean?

The clue is "to the nearest mm".

- A length of 154.5 mm would be rounded up to 155 mm (and placed in **155 - 159**)
- Similarly, 159.49 mm would be rounded down to 159 mm (and also be placed in **155 - 159**)

So lengths from **154.5 up to (but not including) 159.5** get placed in **155 - 159**

And so for continuous data "155 - 159" has two types of numbers at the beginning and end:

- the lower class **boundary** of 155 and the upper class **boundary** of 159
- the lower class **limit** of 154.5 and the upper class **limit** of 159.5

Note that the upper class limit of one class interval is the lower class limit of the next class interval.
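The rounding rule above can be sketched in Python under two illustrative assumptions. First, the helper names `nearest_mm` and `class_limits` are hypothetical. Second, rounding is done "half up" with `math.floor(x + 0.5)`, because Python's built-in `round()` rounds halves to the nearest even number, so `round(154.5)` gives 154 rather than the 155 described above.

```python
import math

def nearest_mm(length):
    """Round a length to the nearest whole mm, with halves rounded up (154.5 -> 155)."""
    return math.floor(length + 0.5)

def class_limits(lower, upper, unit=1):
    """Start (included) and end (not included) of real lengths that land in 'lower - upper'."""
    return lower - unit / 2, upper + unit / 2

print(nearest_mm(154.5), nearest_mm(159.49))  # 155 159
print(class_limits(155, 159))                 # (154.5, 159.5)
print(class_limits(160, 164))                 # (159.5, 164.5): starts where 155 - 159 ends
```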

So, how does this affect our calculations?

- The Mean is not affected
- But for the Median and Mode, L is now the lower class **limit** (rather than the lower class boundary)

Now let's go:

### Mean

| Length (mm) | Midpoint x | Frequency f | fx |
|---|---|---|---|
| 150 - 154 | 152 | 5 | 760 |
| 155 - 159 | 157 | 2 | 314 |
| 160 - 164 | 162 | 6 | 972 |
| 165 - 169 | 167 | 8 | 1336 |
| 170 - 174 | 172 | 9 | 1548 |
| 175 - 179 | 177 | 11 | 1947 |
| 180 - 184 | 182 | 6 | 1092 |
| 185 - 189 | 187 | 3 | 561 |
| **Totals:** | | **50** | **8530** |

$$\text{Estimated Mean} = \frac{8530}{50} = 170.6 \text{ mm}$$
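Here is a quick Python check of this table. It is a sketch, and the `carrots` tuples are just one convenient way to hold the data. A group's midpoint is the same whether it is computed from 150 and 154 or from the limits 149.5 and 154.5, so this is the same calculation as for discrete data.

```python
# Carrot lengths (mm), grouped: (lower, upper, frequency)
carrots = [(150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
           (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3)]

total_f = sum(f for _, _, f in carrots)
# fx for each row, using the midpoint (lower + upper) / 2, e.g. 152 for 150 - 154
total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in carrots)

print(total_f, total_fx, total_fx / total_f)  # 50 8530.0 170.6
```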

### Median

The Median is the mean of the 25th and 26th lengths, so it is in the **170 - 174** group (the first four groups contain only 5 + 2 + 6 + 8 = 21 lengths):

- **L** = 169.5 (the lower class **limit** of the 170 - 174 group)
- **n** = 50
- $cf_b$ = 5 + 2 + 6 + 8 = 21
- $f_m$ = 9
- **w** = 5

$$\text{Estimated Median} = 169.5 + \frac{(50/2) - 21}{9} \times 5 = 169.5 + 2.22\ldots = 171.7 \text{ mm (to 1 decimal)}$$
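Finding the median group can also be automated: walk down the table, keeping a running cumulative frequency, until it reaches n/2. This sketch assumes equal-width groups recorded to the nearest mm, so L is the stated lower value minus 0.5 (the lower class limit, in this page's terms). `median_group` is a hypothetical helper name.

```python
carrots = [(150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
           (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3)]

def median_group(groups):
    """Return (index of the median group, cumulative frequency of the groups before it)."""
    n = sum(f for _, _, f in groups)
    cf = 0
    for i, (_, _, f) in enumerate(groups):
        if cf + f >= n / 2:  # this group contains the (n/2)th value
            return i, cf
        cf += f
    raise ValueError("table has no groups")

n = sum(f for _, _, f in carrots)
i, cf_b = median_group(carrots)
lower, upper, f_m = carrots[i]
L = lower - 0.5        # lower class limit: the data were rounded to the nearest mm
w = upper - lower + 1  # group width: 170 - 174 covers 5 mm
estimate = L + (n / 2 - cf_b) / f_m * w

print((lower, upper), cf_b, round(estimate, 1))  # (170, 174) 21 171.7
```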

### Mode

The modal group is the one with the highest frequency, which is **175 - 179**:

- L = 174.5 (the lower class **limit** of the 175 - 179 group)
- $f_{m-1}$ = 9
- $f_m$ = 11
- $f_{m+1}$ = 6
- w = 5

$$\text{Estimated Mode} = 174.5 + \frac{11 - 9}{(11 - 9) + (11 - 6)} \times 5 = 174.5 + 1.42\ldots = 175.9 \text{ mm (to 1 decimal)}$$
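The modal group can be picked out in the same spirit with `max`. This sketch rests on two assumptions. If two groups tie for the highest frequency, it simply takes the first one. A missing neighbour (a modal group at either end of the table) is treated as having frequency 0, a case this page does not cover.

```python
carrots = [(150, 154, 5), (155, 159, 2), (160, 164, 6), (165, 169, 8),
           (170, 174, 9), (175, 179, 11), (180, 184, 6), (185, 189, 3)]

freqs = [f for _, _, f in carrots]
m = freqs.index(max(freqs))  # first group with the highest frequency

lower, upper, f_m = carrots[m]
f_before = freqs[m - 1] if m > 0 else 0              # assumption: 0 if there is no group before
f_after = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumption: 0 if there is no group after

L = lower - 0.5        # lower class limit (nearest-mm data)
w = upper - lower + 1  # 5 mm
d1, d2 = f_m - f_before, f_m - f_after
estimate = L + d1 / (d1 + d2) * w

print((lower, upper), round(estimate, 1))  # (175, 179) 175.9
```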

## Ages

Age is a special case.

When we say "Sarah is 17", she stays "17" up until her eighteenth birthday.

She might be 17 years and 364 days old and still be called "17".

In other words, even though "age" is a continuous variable (time), we treat it as discrete.

Example: The ages of the 112 people who live on a tropical island were grouped as follows:

| Age | Number |
|---|---|
| 0 - 9 | 20 |
| 10 - 19 | 21 |
| 20 - 29 | 23 |
| 30 - 39 | 16 |
| 40 - 49 | 11 |
| 50 - 59 | 10 |
| 60 - 69 | 7 |
| 70 - 79 | 3 |
| 80 - 89 | 1 |

A child in the first group **0 - 9** could be almost 10 years old, so this group really covers ages from 0 up to (but not including) 10. The midpoint for this group is therefore **5**, not 4.5.

The midpoints are 5, 15, 25, 35, 45, 55, 65, 75 and 85.

Similarly, in the calculations of Median and Mode, we will use the class boundaries 0, 10, 20, etc.

### Mean

| Age | Midpoint x | Number f | fx |
|---|---|---|---|
| 0 - 9 | 5 | 20 | 100 |
| 10 - 19 | 15 | 21 | 315 |
| 20 - 29 | 25 | 23 | 575 |
| 30 - 39 | 35 | 16 | 560 |
| 40 - 49 | 45 | 11 | 495 |
| 50 - 59 | 55 | 10 | 550 |
| 60 - 69 | 65 | 7 | 455 |
| 70 - 79 | 75 | 3 | 225 |
| 80 - 89 | 85 | 1 | 85 |
| **Totals:** | | **112** | **3360** |

$$\text{Estimated Mean} = \frac{3360}{112} = 30$$
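In code, the only change for ages is how the midpoint is worked out. Each group runs from its lower boundary up to (but not including) the next one, so the midpoint is the lower boundary plus half the width. This sketch assumes the 10-year width shown in the table.

```python
# Island ages: (lower, upper, number of people)
ages = [(0, 9, 20), (10, 19, 21), (20, 29, 23), (30, 39, 16), (40, 49, 11),
        (50, 59, 10), (60, 69, 7), (70, 79, 3), (80, 89, 1)]

w = 10  # "0 - 9" really means 0 up to (but not including) 10
midpoints = [lower + w / 2 for lower, _, _ in ages]  # 5, 15, 25, ..., 85 (not 4.5, 14.5, ...)

n = sum(f for _, _, f in ages)
total_fx = sum(f * x for (_, _, f), x in zip(ages, midpoints))

print(n, total_fx, total_fx / n)  # 112 3360.0 30.0
```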

### Median

The Median is the mean of the ages of the 56th and 57th people, so it is in the 20 - 29 group (the first two groups contain only 20 + 21 = 41 people):

- **L** = 20 (the lower class boundary of the class interval containing the median)
- **n** = 112
- $cf_b$ = 20 + 21 = 41
- $f_m$ = 23
- **w** = 10

$$\text{Estimated Median} = 20 + \frac{(112/2) - 41}{23} \times 10 = 20 + 6.52\ldots = 26.5 \text{ (to 1 decimal)}$$
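The same cumulative-frequency walk works here. The only difference from the carrot example is that L is taken straight from the table (20), with no 0.5 shift. This sketch re-enters the table so it runs on its own.

```python
ages = [(0, 9, 20), (10, 19, 21), (20, 29, 23), (30, 39, 16), (40, 49, 11),
        (50, 59, 10), (60, 69, 7), (70, 79, 3), (80, 89, 1)]
w = 10
n = sum(f for _, _, f in ages)

# Walk down the table until the cumulative frequency reaches n/2 = 56
cf_b = 0
for lower, upper, f in ages:
    if cf_b + f >= n / 2:
        L, f_m = lower, f  # for ages, L is the stated lower boundary (20), with no 0.5 shift
        break
    cf_b += f

estimate = L + (n / 2 - cf_b) / f_m * w
print(L, cf_b, f_m, round(estimate, 1))  # 20 41 23 26.5
```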

### Mode

The Modal group is the one with the highest frequency, which is 20 - 29:

- L = 20 (the lower class boundary of the modal class)
- $f_{m-1}$ = 21
- $f_m$ = 23
- $f_{m+1}$ = 16
- w = 10

$$\text{Estimated Mode} = 20 + \frac{23 - 21}{(23 - 21) + (23 - 16)} \times 10 = 20 + 2.22\ldots = 22.2 \text{ (to 1 decimal)}$$
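The same approach works for the modal group of the ages table. As in the carrot sketch, this assumes that a tie goes to the first group and that a missing neighbour counts as frequency 0.

```python
ages = [(0, 9, 20), (10, 19, 21), (20, 29, 23), (30, 39, 16), (40, 49, 11),
        (50, 59, 10), (60, 69, 7), (70, 79, 3), (80, 89, 1)]
w = 10
freqs = [f for _, _, f in ages]
m = freqs.index(max(freqs))  # modal group: 20 - 29

L = ages[m][0]  # lower class boundary, used as-is for ages
f_m = freqs[m]
f_before = freqs[m - 1] if m > 0 else 0              # assumption: 0 if there is no group before
f_after = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumption: 0 if there is no group after

estimate = L + (f_m - f_before) / ((f_m - f_before) + (f_m - f_after)) * w
print(L, round(estimate, 1))  # 20 22.2
```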

## Summary

For grouped data, we cannot find the exact Mean, Median and Mode; we can only give estimates.

To estimate the **Mean** use the **midpoints** of the class intervals.

To estimate the **Median** use:

$$\text{Estimated Median} = L + \frac{(n/2) - cf_b}{f_m} \times w$$

where:

- **L** is the lower class boundary of the group containing the median
- **n** is the total number of data values
- $cf_b$ is the cumulative frequency of the groups before the median group
- $f_m$ is the frequency of the median group
- **w** is the group width

To estimate the **Mode** use:

$$\text{Estimated Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$$

where:

- L is the lower class boundary of the modal group
- $f_{m-1}$ is the frequency of the group before the modal group
- $f_m$ is the frequency of the modal group
- $f_{m+1}$ is the frequency of the group after the modal group
- w is the group width

For continuous data, use limits (rather than boundaries) for the median and mode.
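Putting the summary together, here is one possible all-in-one helper. It is a sketch, not a library function, and it rests on several assumptions. `grouped_estimates` is a hypothetical name. The groups are assumed to have equal width `w`. The caller passes in the L value and the midpoint for each group, so the same code covers discrete data, continuous data and ages. A modal group at either end of the table is assumed to have a neighbour of frequency 0.

```python
def grouped_estimates(L_values, midpoints, freqs, w):
    """Return (estimated mean, estimated median, estimated mode) for a grouped table.

    L_values  -- the L used for each group in the median and mode formulas
    midpoints -- the midpoint of each group (used for the mean)
    freqs     -- the frequency of each group
    w         -- the common group width
    """
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

    cf = 0  # cumulative frequency of the groups before the current one
    for L, f in zip(L_values, freqs):
        if cf + f >= n / 2:  # median group found
            median = L + (n / 2 - cf) / f * w
            break
        cf += f

    m = freqs.index(max(freqs))  # first group with the highest frequency
    f_before = freqs[m - 1] if m > 0 else 0
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = freqs[m] - f_before, freqs[m] - f_after
    mode = L_values[m] + d1 / (d1 + d2) * w
    return mean, median, mode

# Alex's games: L is the lower class boundary, midpoints 3, 8, 13, 18
print(grouped_estimates([1, 6, 11, 16], [3, 8, 13, 18], [2, 7, 8, 3], 5))

# Carrots (continuous): L is the lower class limit 149.5, 154.5, ...
carrot_L = [149.5 + 5 * i for i in range(8)]
carrot_mid = [152 + 5 * i for i in range(8)]
print(grouped_estimates(carrot_L, carrot_mid, [5, 2, 6, 8, 9, 11, 6, 3], 5))

# Ages: L is 0, 10, 20, ... and the midpoints are 5, 15, 25, ...
age_L = [10 * i for i in range(9)]
age_mid = [10 * i + 5 for i in range(9)]
print(grouped_estimates(age_L, age_mid, [20, 21, 23, 16, 11, 10, 7, 3, 1], 10))
```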