Mean, Median and Mode
from Grouped Frequencies

Let's start off with some raw data (not a grouped frequency) ...

Example: Alex did a survey of how many games each of 20 friends owned, and got this:

9, 15, 11, 12, 3, 5, 10, 20, 14, 6, 8, 8, 12, 12, 18, 15, 6, 9, 18, 11

To find the Mean, add up all the numbers, then divide by how many numbers there are:

$$\text{Mean} = \frac{9+15+11+12+3+5+10+20+14+6+8+8+12+12+18+15+6+9+18+11}{20} = \frac{222}{20} = 11.1$$

To find the Median, place the numbers in value order and find the middle number (or the mean of the middle two numbers). In this case the mean of the 10th and 11th values:

3, 5, 6, 6, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 14, 15, 15, 18, 18, 20

$$\text{Median} = \frac{11 + 11}{2} = 11$$

To find the Mode, or modal value, place the numbers in value order, then count how many times each number appears. The Mode is the number that appears most often (there can be more than one mode):

3, 5, 6, 6, 8, 8, 9, 9, 10, 11, 11, 12, 12, 12, 14, 15, 15, 18, 18, 20

12 appears three times, more often than the other values, so Mode = 12
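If you would like to check these three results yourself, here is a small sketch in Python (our choice of language, the survey itself involves no code). It uses the standard `statistics` module, and the variable names are just for illustration:

```python
from statistics import mean, median, multimode

# Alex's raw survey data, copied from the example above
games = [9, 15, 11, 12, 3, 5, 10, 20, 14, 6, 8, 8, 12, 12, 18, 15, 6, 9, 18, 11]

raw_mean = mean(games)        # sum of the values divided by how many there are
raw_median = median(games)    # mean of the two middle values when the count is even
raw_modes = multimode(games)  # list of every value tied for "appears most often"

print(raw_mean, raw_median, raw_modes)
```

`multimode` returns a list because, as noted above, there can be more than one mode.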

Grouped Frequency Table

Now, let's make a Grouped Frequency Table of Alex's data:

| Number of games | Frequency |
|-----------------|-----------|
| 1 - 5           | 2         |
| 6 - 10          | 7         |
| 11 - 15         | 8         |
| 16 - 20         | 3         |

(It says that 2 of Alex's friends own somewhere between 1 and 5 games, 7 own between 6 and 10 games, etc)
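The grouping itself can be sketched in a few lines of Python. This is only one way to do it: the helper `group_start` and its parameters are made up for this illustration, and it assumes groups of width 5 starting at 1, as in the table:

```python
from collections import Counter

games = [9, 15, 11, 12, 3, 5, 10, 20, 14, 6, 8, 8, 12, 12, 18, 15, 6, 9, 18, 11]
width = 5  # each group covers 5 values: 1-5, 6-10, ...

def group_start(value, width=5, first=1):
    """Return the lowest value of the group that `value` falls into."""
    return first + ((value - first) // width) * width

# Count how many values land in each group
table = Counter(group_start(v, width) for v in games)

for start in sorted(table):
    print(f"{start} - {start + width - 1}: {table[start]}")
```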

Oh No!

   

Suddenly all the original data gets lost (naughty pup!)


Only the Grouped Frequency Table survived ...

... can we help Alex calculate the Mean, Median and Mode from just that table?

The answer is ... no we can't. Not accurately anyway. But, we can make estimates.

 

Estimating the Mean from Grouped Data

So all we have left is:

| Number of games | Frequency |
|-----------------|-----------|
| 1 - 5           | 2         |
| 6 - 10          | 7         |
| 11 - 15         | 8         |
| 16 - 20         | 3         |
  • The groups (1-5, 6-10, etc), also called class intervals, are of width 5
  • The numbers 1, 6, 11 and 16 are the lower class boundaries
  • The numbers 5, 10, 15 and 20 are the upper class boundaries
  • The midpoints are halfway between the lower and upper class boundaries
  • So the midpoints are 3, 8, 13 and 18

We can estimate the Mean by using the midpoints.

So, how does this work?

Think about Alex's 7 friends who are in the group 6 - 10: all we know is that they each have between 6 and 10 games:

  • Maybe all seven of them have 6 games,
  • Maybe all seven of them have 10 games,
  • But it is more likely that there is a spread of numbers: some have 6, some have 7, and so on

So we take an average: we assume that all seven of them have 8 games (8 is the average of 6 and 10), which is the midpoint of the group.

So, we could make the table in a different way:

| Midpoint | Frequency |
|----------|-----------|
| 3        | 2         |
| 8        | 7         |
| 13       | 8         |
| 18       | 3         |

Now we think "2 people have 3 games, 7 people have 8 games, 8 people have 13 games and 3 people have 18 games", so we imagine the data looks like this:

3, 3, 8, 8, 8, 8, 8, 8, 8, 13, 13, 13, 13, 13, 13, 13, 13, 18, 18, 18

Now we can add them all up and divide by 20. This is the quick way to do it:

| Midpoint x | Frequency f | fx  |
|------------|-------------|-----|
| 3          | 2           | 6   |
| 8          | 7           | 56  |
| 13         | 8           | 104 |
| 18         | 3           | 54  |
| Totals:    | 20          | 220 |

So an estimate of the mean number of games is:

$$\text{Estimated Mean} = \frac{220}{20} = 11$$
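The "quick way" is easy to try out in Python. This is just a sketch with illustrative variable names: multiply each midpoint by its frequency, add up the products, and divide by the total frequency:

```python
# Midpoints (x) and frequencies (f) from the table above
midpoints = [3, 8, 13, 18]
freqs = [2, 7, 8, 3]

# The fx column: each midpoint times its frequency
fx = [x * f for x, f in zip(midpoints, freqs)]

# Estimated mean = (total of fx) / (total of f)
estimated_mean = sum(fx) / sum(freqs)

print(fx, sum(fx), sum(freqs), estimated_mean)
```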

 

Estimating the Median from Grouped Data

To estimate the Median, let's look at our data again:

| Number of games | Frequency |
|-----------------|-----------|
| 1 - 5           | 2         |
| 6 - 10          | 7         |
| 11 - 15         | 8         |
| 16 - 20         | 3         |

The median is the mean of the middle two numbers (the 10th and 11th values) ...

... and they are both in the 11 - 15 group:

We can say "the median group is 11 - 15"

But if we need to estimate a single Median value we can use this formula:

$$\text{Estimated Median} = L + \frac{(n/2) - cf_b}{f_m} \times w$$

where:

  • L is the lower class boundary of the group containing the median
  • n is the total number of values
  • cfb is the cumulative frequency of the groups before the median group
  • fm is the frequency of the median group
  • w is the group width

For our example:

  • L = 11
  • n = 20
  • cfb = 2 + 7 = 9
  • fm = 8
  • w = 5
$$
\begin{aligned}
\text{Estimated Median} &= 11 + \frac{(20/2) - 9}{8} \times 5 \\
&= 11 + \frac{1}{8} \times 5 \\
&= 11.625
\end{aligned}
$$
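Here is one way the median formula might look in Python. The function name `estimated_median` and its arguments are our own invention, and it assumes the median group is the first group whose running total reaches n/2 (which matches the example above):

```python
def estimated_median(lowers, freqs, width):
    """Estimate the median of grouped data.

    lowers: the value of L for each group, in ascending order
    freqs:  the frequency of each group
    width:  the group width w (assumed equal for all groups)
    """
    n = sum(freqs)
    cfb = 0  # cumulative frequency of the groups before the current one
    for L, fm in zip(lowers, freqs):
        if cfb + fm >= n / 2:  # this group contains the median
            return L + (n / 2 - cfb) / fm * width
        cfb += fm
    raise ValueError("no data")

print(estimated_median([1, 6, 11, 16], [2, 7, 8, 3], 5))
```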

Estimating the Mode from Grouped Data

Again, looking at our data:

| Number of games | Frequency |
|-----------------|-----------|
| 1 - 5           | 2         |
| 6 - 10          | 7         |
| 11 - 15         | 8         |
| 16 - 20         | 3         |

We can easily find the modal group (the group with the highest frequency), which is 11 - 15

We can say "the modal group is 11 - 15"

But the actual Mode may not even be in that group! Or there may be more than one mode. Without the raw data we don't really know.

But, we can estimate the Mode using the following formula:

$$\text{Estimated Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$$

where:

  • L is the lower class boundary of the modal group
  • fm-1 is the frequency of the group before the modal group
  • fm is the frequency of the modal group
  • fm+1 is the frequency of the group after the modal group
  • w is the group width

In this example:

  • L = 11
  • fm-1 = 7
  • fm = 8
  • fm+1 = 3
  • w = 5
$$
\begin{aligned}
\text{Estimated Mode} &= 11 + \frac{8 - 7}{(8 - 7) + (8 - 3)} \times 5 \\
&= 11 + \frac{1}{6} \times 5 \\
&= 11.833...
\end{aligned}
$$
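The mode formula can be sketched the same way. The function `estimated_mode` is a made-up name, and it makes two assumptions that are not part of the formula above: if several groups tie for the highest frequency it uses the first one, and a missing neighbour (when the modal group is first or last) is treated as frequency 0:

```python
def estimated_mode(lowers, freqs, width):
    """Estimate the mode of grouped data from the modal group and its neighbours."""
    m = freqs.index(max(freqs))  # index of the modal group (first one if tied)
    f_m = freqs[m]
    # Assumption: a neighbour that does not exist counts as frequency 0
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    d_prev = f_m - f_prev
    d_next = f_m - f_next
    # Note: d_prev + d_next is 0 when both neighbours equal f_m,
    # and the formula then has no answer (division by zero).
    return lowers[m] + d_prev / (d_prev + d_next) * width

print(estimated_mode([1, 6, 11, 16], [2, 7, 8, 3], 5))
```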

 

Our final result is:

  • Estimated Mean: 11
  • Estimated Median: 11.625
  • Estimated Mode: 11.833...

(Compare that with the true Mean, Median and Mode of 11.1, 11 and 12 that we got at the very start.)

 

And that is how it is done.

Now let us look at two more special examples, and get some more practice along the way!

Continuous Data

Data can be Discrete or Continuous:

  • Discrete data can only take certain values, like our previous example (games owned)
  • Continuous data can take any value (within a range), such as length or weight

Continuous data can be treated in exactly the same way as discrete data, but with one important difference.

The difference concerns the class boundaries.

 

Example: You grew fifty baby carrots using special soil. You dig them up and measure their lengths (to the nearest mm) and group the results:

| Length (mm) | Frequency |
|-------------|-----------|
| 150 - 154   | 5         |
| 155 - 159   | 2         |
| 160 - 164   | 6         |
| 165 - 169   | 8         |
| 170 - 174   | 9         |
| 175 - 179   | 11        |
| 180 - 184   | 6         |
| 185 - 189   | 3         |

Now, what does "155 - 159" mean?

The clue is "to the nearest mm".

  • A length of 154.5 mm is rounded up to 155 mm (and placed in 155 - 159),
  • Similarly, 159.49 mm is rounded down to 159 mm (and is also placed in 155 - 159).

So lengths from 154.5 up to (but not including) 159.5 get placed in 155 - 159

And so for continuous data "155 - 159" has two types of numbers at the beginning and end:

  • the lower class boundary of 155 and the upper class boundary of 159
  • the lower class limit of 154.5 and upper class limit of 159.5

Note that the upper class limit of one class interval is the lower class limit of the next class interval.
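As a quick illustration, the class limits can be worked out from the rounding precision. The helper `class_limits` below is hypothetical (not a standard function), and `precision=1` stands for "to the nearest mm":

```python
def class_limits(lower, upper, precision=1):
    """Return (lower class limit, upper class limit) for rounded measurements.

    Values are rounded to the nearest `precision`, so each end of the
    group reaches half a unit further than the numbers shown in the table.
    """
    half = precision / 2
    return lower - half, upper + half

print(class_limits(155, 159))
print(class_limits(160, 164))
```

The upper value returned for 155 - 159 is the same as the lower value returned for 160 - 164, so the groups join up with no gaps.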

So, how does this affect our calculations?

  • The Mean is not affected
  • But the Median and Mode now have L = Lower class limit (rather than Lower class boundary)

Now let's go:

Mean

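| Length (mm) | Midpoint x | Frequency f | fx   |
|-------------|------------|-------------|------|
| 150 - 154   | 152        | 5           | 760  |
| 155 - 159   | 157        | 2           | 314  |
| 160 - 164   | 162        | 6           | 972  |
| 165 - 169   | 167        | 8           | 1336 |
| 170 - 174   | 172        | 9           | 1548 |
| 175 - 179   | 177        | 11          | 1947 |
| 180 - 184   | 182        | 6           | 1092 |
| 185 - 189   | 187        | 3           | 561  |
|             | Totals:    | 50          | 8530 |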

$$\text{Estimated Mean} = \frac{8530}{50} = 170.6 \text{ mm}$$

 

Median

The Median is the mean of the 25th and the 26th lengths, so it is in the 170 - 174 group:

  • L = 169.5 (the lower class limit of the 170 - 174 group)
  • n = 50
  • cfb = 5 + 2 + 6 + 8 = 21
  • fm = 9
  • w = 5
$$
\begin{aligned}
\text{Estimated Median} &= 169.5 + \frac{(50/2) - 21}{9} \times 5 \\
&= 169.5 + 2.22... \\
&= 171.7 \text{ mm (to 1 decimal)}
\end{aligned}
$$

 

Mode

The Modal group is the one with the highest frequency, which is 175 - 179:

  • L = 174.5 (the lower class limit of the 175 - 179 group)
  • fm-1 = 9
  • fm = 11
  • fm+1 = 6
  • w = 5
$$
\begin{aligned}
\text{Estimated Mode} &= 174.5 + \frac{11 - 9}{(11 - 9) + (11 - 6)} \times 5 \\
&= 174.5 + 1.42... \\
&= 175.9 \text{ mm (to 1 decimal)}
\end{aligned}
$$
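To double-check the carrot results, here is a short Python sketch with illustrative variable names. It assumes L is half a millimetre below each group's stated start, as explained above, and that the modal group is not the first or last group (true for this table):

```python
# Stated group starts and frequencies from the carrot table
starts = [150, 155, 160, 165, 170, 175, 180, 185]
freqs = [5, 2, 6, 8, 9, 11, 6, 3]
width = 5
limits = [s - 0.5 for s in starts]  # the L values, e.g. 169.5 and 174.5

n = sum(freqs)

# Median: find the first group whose running total reaches n/2
cfb = 0
for m, f in enumerate(freqs):
    if cfb + f >= n / 2:
        break
    cfb += f
est_median = limits[m] + (n / 2 - cfb) / freqs[m] * width

# Mode: the group with the highest frequency, compared with its neighbours
k = freqs.index(max(freqs))
d_prev = freqs[k] - freqs[k - 1]
d_next = freqs[k] - freqs[k + 1]
est_mode = limits[k] + d_prev / (d_prev + d_next) * width

print(round(est_median, 1), round(est_mode, 1))
```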

Ages

Age is a special case.

When we say "Sarah is 17" she stays "17" up until her eighteenth birthday.
She might be 17 years and 364 days old and still be called "17".

In other words, even though "age" is a continuous variable (time), we treat it as discrete.

 

Example: The ages of the 112 people who live on a tropical island are grouped as follows:

| Age     | Number |
|---------|--------|
| 0 - 9   | 20     |
| 10 - 19 | 21     |
| 20 - 29 | 23     |
| 30 - 39 | 16     |
| 40 - 49 | 11     |
| 50 - 59 | 10     |
| 60 - 69 | 7      |
| 70 - 79 | 3      |
| 80 - 89 | 1      |

A child in the first group 0 - 9 could be almost 10 years old. So the midpoint for this group is 5 not 4.5

The midpoints are 5, 15, 25, 35, 45, 55, 65, 75 and 85

Similarly, in the calculations of Median and Mode, we will use the class boundaries 0, 10, 20 etc

Mean

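| Age     | Midpoint x | Number f | fx   |
|---------|------------|----------|------|
| 0 - 9   | 5          | 20       | 100  |
| 10 - 19 | 15         | 21       | 315  |
| 20 - 29 | 25         | 23       | 575  |
| 30 - 39 | 35         | 16       | 560  |
| 40 - 49 | 45         | 11       | 495  |
| 50 - 59 | 55         | 10       | 550  |
| 60 - 69 | 65         | 7        | 455  |
| 70 - 79 | 75         | 3        | 225  |
| 80 - 89 | 85         | 1        | 85   |
|         | Totals:    | 112      | 3360 |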

$$\text{Estimated Mean} = \frac{3360}{112} = 30$$

 

Median

The Median is the mean of the ages of the 56th and the 57th people, so is in the 20 - 29 group:

  • L = 20 (the lower class boundary of the class interval containing the median)
  • n = 112
  • cfb = 20 + 21 = 41
  • fm = 23
  • w = 10
$$
\begin{aligned}
\text{Estimated Median} &= 20 + \frac{(112/2) - 41}{23} \times 10 \\
&= 20 + 6.52... \\
&= 26.5 \text{ (to 1 decimal)}
\end{aligned}
$$

 

Mode

The Modal group is the one with the highest frequency, which is 20 - 29:

  • L = 20 (the lower class boundary of the modal class)
  • fm-1 = 21
  • fm = 23
  • fm+1 = 16
  • w = 10
$$
\begin{aligned}
\text{Estimated Mode} &= 20 + \frac{23 - 21}{(23 - 21) + (23 - 16)} \times 10 \\
&= 20 + 2.22... \\
&= 22.2 \text{ (to 1 decimal)}
\end{aligned}
$$
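The whole ages example can be reproduced in a few lines of Python. Again this is only a sketch with made-up variable names. The age-specific choices are that the midpoint is the group start plus half the width (5, 15, 25, ...) and that L is simply the group start (0, 10, 20, ...):

```python
# Group starts and head counts from the island table
starts = [0, 10, 20, 30, 40, 50, 60, 70, 80]
counts = [20, 21, 23, 16, 11, 10, 7, 3, 1]
width = 10

n = sum(counts)

# Mean: ages run right up to the next birthday, so the midpoint is start + 5
midpoints = [s + width / 2 for s in starts]
est_mean = sum(x * f for x, f in zip(midpoints, counts)) / n

# Median: first group whose running total reaches n/2, with L = group start
cfb = 0
for m, f in enumerate(counts):
    if cfb + f >= n / 2:
        break
    cfb += f
est_median = starts[m] + (n / 2 - cfb) / counts[m] * width

# Mode: highest-frequency group (an interior group in this table)
k = counts.index(max(counts))
d_prev = counts[k] - counts[k - 1]
d_next = counts[k] - counts[k + 1]
est_mode = starts[k] + d_prev / (d_prev + d_next) * width

print(est_mean, round(est_median, 1), round(est_mode, 1))
```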

Summary

For grouped data, we cannot find the exact Mean, Median and Mode; we can only give estimates.

To estimate the Mean use the midpoints of the class intervals.

$$\text{Estimated Median} = L + \frac{(n/2) - cf_b}{f_m} \times w$$

where:

  • L is the lower class boundary of the group containing the median
  • n is the total number of values
  • cfb is the cumulative frequency of the groups before the median group
  • fm is the frequency of the median group
  • w is the group width
$$\text{Estimated Mode} = L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$$

where:

  • L is the lower class boundary of the modal group
  • fm-1 is the frequency of the group before the modal group
  • fm is the frequency of the modal group
  • fm+1 is the frequency of the group after the modal group
  • w is the group width

For continuous data, use class limits (rather than class boundaries) as L for the median and mode.