Business Statistics Unit 3
Describing Data: Common Numerical Techniques
In this unit we will examine the most common methods and techniques used to describe data. Some of these techniques you may already be familiar with (the mean, median, and mode), while others like the midrange, variance, or standard deviation may be new.
Introduction
| How many times have you heard someone describe some data using the term average or mean? For example, if you are a baseball fan you probably are acquainted with a player's batting average. A baseball player's batting average is based upon the number of hits the person gets out of the total number of times at bat. So a batting average of .333 indicates that the player is likely to get a hit every third time at bat. |
|
We also are often exposed to information about the average income, or age. For example, a newspaper article may discuss the average income for the city you live or work in. The average or mean as it is called in statistics is one of several commonly used techniques to describe a set of data. We will discuss each type, how it is calculated, what it is used for, and the strengths and weaknesses of each type.
The Mean
The mean is a commonly used calculation which provides a central value for a set of numbers. It is found by adding a set of values together and dividing the total by the number of values added together. For example, take a look at table 3.1 below.
Table 3.1. Age of Students in a Class.
| 25 | 29 | 30 |
| 28 | 49 | 24 |
| 19 | 21 | 26 |
| 22 | 24 |
To determine the mean, we add all of the ages together. The total equals 297. Then we divide the total by the number of values, which is 11. So we divide 297 by 11 and we get 27. The number 27 represents the mean age of students in a class.
There are two formulas we use to calculate the mean. The actual calculations are identical for both formulas. The first formula is for a population mean. This formula is used when you are taking all values in a population and calculating the mean. The mean for a set of values in a population is called a parameter.
Population mean = Sum of all the values in a population / Number of values in a population
The formula for calculating the mean for a population is found in Formula Box 3.1.
Formula Box 3.1. The Population Mean.
![]()
You will notice that the symbols used in the formula may not be familiar to you. When we are referring to formulas for populations, Greek symbols are often used as part of the formulas. The definitions of the symbols used in Formula Box 3.1 are given below. You should become comfortable with the symbols and understand what they represent as they will be used frequently throughout this course.
The Greek symbol µ is used to represent the population mean. It is pronounced "mu".
The Greek symbol ∑ is used to indicate that all items following should be added together. It is pronounced "sigma"
The letter x represents any particular value.
The capital letter N represents the number of values or items in a population.
When the two symbols ∑X are together, that indicates that you are to add all of the X values together.
As in any formula, if you find a horizontal line ( _____ ) or a diagonal line ( / ) that indicates you will be dividing the value above (for a horizontal line) or to the left (for a diagonal line) by the value below (horizontal line) or to the right (diagonal line). Of course we already did this when we solved for the mean in Table 3.1.
If you have a set of values for a sample, then you will use the formula for a sample mean. The mean for a set of values in a sample is called a statistic.
Sample mean = Sum of all the values in a sample / Number of values in the sample
The formula for calculating the mean for a sample is found in Formula Box 3.2.
Formula Box 3.2. The Sample Mean
![]()
Notice that some of the symbols used in the formula for a sample mean are different from the symbols used for a population mean. Even though the calculations are identical, it is important for you to understand that the symbols refer to different types of data. Later in this course, these differences will be extremely important to recognize.
We will now examine the symbols used to calculate the mean for a sample.
is the symbol used to indicate the sample
mean. It is often pronounced "X bar" or the sample mean.
The Greek symbol ∑ is used to indicate that all items following should be added together. It is pronounced "sigma"
The letter x represents any particular value.
The letter n indicates the number of values or items in a sample. Notice that the lower case n is used for a sample, and capital N is used for a population.
CHARACTERISTICS OF THE MEAN
The mean is a measurement of central tendency. Simply stated, it provides you with a central or middle value for a set of values. The mean has the following characteristics:
It is the average or central value of a set of numbers.
The mean is often referred to as a balance point for a set of numbers. This indicates that all of the values added together above the mean are the same distance as the values added together below the mean.
The mean is influenced by large or small values. Extreme values in a set of data are called outliers. Outliers affect the mean and cause it to be larger or smaller. As a result if outliers are present, the mean may not accurately represent a set of data.
Lets discuss the problem associated with outliers in depth. Suppose we have a set of sample data with the following values:
5, 10, 8, 4, 6, 5, 7, 9, 6, 50
You will notice that all of the numbers are close to each other in value except for 50. We call the value of 50 an outlier. An outlier is any extreme value found in a set of data. It can be a high or low number value. Now lets calculate the mean. To calculate the sample mean we use formula 3.2:
5 + 10 + 8 + 4 + 6 + 5 + 7 + 9 + 6 + 50 / 10 = 110 / 10 = 11
Our mean equals 11. But does the mean actually represent the data properly? Except for the value of 50, all other data values were at 10 or below. In this case, the mean is being influenced by the large outlier value of 50. If this value did not exist, our mean would actually be much lower (6.667). Should you ignore the outlier value of 50? The answer is no. You should verify that the number is accurate and if it is it should be included with the rest of the data values.
In the business environment, outliers usually indicate a problem associated with a manufacturing process or perhaps a customer service issue. Often companies will use computerized reporting to identify outliers and bring them to the attention of quality control personnel and management. For example, if you are manufacturing cereal you have specialized equipment that fills cereal boxes by weight. If the cereal box is supposed to contain 15 ounces of cereal, your are required to fill the box with at least 15 ounces of cereal. It's OK if you put more than 15 ounces in the box, but not less than 15 ounces. In terms of cost and profitability, you really want to make sure that you can get as close to 15 ounces as possible in the box.

Therefore a cereal company will purchase specialized equipment that accurate fills the cereal boxes. However, the boxes are weighed by another piece of equipment to ensure that the minimum amount of cereal is being placed in the box. This equipment creates a database of weights for all of the cereal boxes and calculates the mean and other statistical measurements. It also identifies outliers and may actually route cereal boxes that are too light or too heavy to another location in the factory. Exception reports may be created that quality control and production personnel can examine and determine if a problem exists.
CALCULATING THE MEAN USING EXCEL
We can use Excel to calculate the mean for a set of numbers instead of manually calculating it. If you have not already done so, please open another window on your computer using Excel.
Practice Exercise 3.1. Using Excel to Calculate the Mean.
The Last National Bank has an ATM at a popular mall in the city. Bank personnel are interested in the length of time each customer transaction takes. A sample was taken during the lunch period from 11:00 AM - 2:00 PM of the transaction times for each customer during the specified time period. The times in seconds, are given below.
| 78 | 125 | 187 | 55 | 112 | 264 |
| 89 | 172 | 163 | 208 | 144 | 121 |
| 196 | 205 | 36 | 179 | 169 | 97 |
| 145 | 158 | 161 | 48 | 151 | 168 |
| 133 | 241 | 177 | 118 | 253 | 105 |
| 166 | 115 | 138 | 182 | 129 |
Whenever you have a large set of values it is much easier to use Excel to calculate the mean instead of entering the numbers into a calculator and using the formula to find the mean. Follow the steps given below to use Excel to find the mean.
Excel 2007
Make sure you have opened Excel and have a blank spreadsheet. Then highlight the data table above and copy and paste it into the worksheet starting at cell A1. In order to perform the calculations in Excel, you will need to take the table data and reorganize it into one column. The easiest way to do this is to cut and paste each column of data below the previous column. Once you are finished you should have one column of data starting at A1 and ending at A35.
Click on the Data folder.
Click on Data
Analysis. If the Data Analysis option does not
appear, you probably forgot to load Analysis Toolpak.
Instructions for loading Analysis Toolpak are found at the end of
Unit 1. If you are certain that you have loaded Analysis
ToolPak correctly and it still does not appear as a menu option, try
closing Excel and restarting it again.
Under Data
Analysis, choose Descriptive Statistics
from the choices listed. Then click OK.


The Descriptive Statistics window asks for additional information. First, you are asked for the Input Range. Type in A1:A35 or use your mouse to highlight the data values in the spreadsheet.
Your data is grouped by columns so make sure that the columns box next to Grouped By in the window is checked. For this example we are not concerned about data labels.
Under Output Options, click in the box to the left of Output Range. Then type in C1 in the blank box.
Click in the box to the left of
Summary Statistics. Then click OK.
| Column1 | |
| Mean | 148.2285714 |
| Standard Error | 9.087907351 |
| Median | 151 |
| Mode | #N/A |
| Standard Deviation | 53.76478495 |
| Sample Variance | 2890.652101 |
| Kurtosis | 0.056838277 |
| Skewness | 0.008228963 |
| Range | 228 |
| Minimum | 36 |
| Maximum | 264 |
| Sum | 5188 |
| Count | 35 |
Excel 2003
Make sure you have opened Excel and have a blank spreadsheet. Then highlight the data table above and copy and paste it into the worksheet starting at cell A1. In order to perform the calculations in Excel, you will need to take the table data and reorganize it into one column. The easiest way to do this is to cut and paste each column of data below the previous column. Once you are finished you should have one column of data starting at A1 and ending at A35.
In Excel, click on Tools and then Data Analysis. You will then see a list of options. Click on Descriptive Statistics and click OK. If the option Data Analysis is not given, you will need to load it into Excel. Simply click on Tools, Add-Ins, and in the Add-In list, select the Analysis ToolPak box and then click on OK. Then follow the first part of step 2 again.
You should see a new window titled Descriptive Statistics.
You are then asked for the Input Range. Type in A1:A35 or use your mouse to highlight the data values in the spreadsheet.
Your data is grouped by columns so make sure that the columns box next to Grouped By in the window is checked. For this example we are not concerned about data labels.
Under Output Options, click in the box to the left of Output Range. Then type in C1 in the blank box.
Click in the box to the left of Summary Statistics. Then click OK.
Starting in cell C1, you should see output identical to the example given below.
| Column1 | |
| Mean | 148.2286 |
| Standard Error | 9.087907 |
| Median | 151 |
| Mode | #N/A |
| Standard Deviation | 53.76478 |
| Sample Variance | 2890.652 |
| Kurtosis | 0.056838 |
| Skewness | 0.008229 |
| Range | 228 |
| Minimum | 36 |
| Maximum | 264 |
| Sum | 5188 |
| Count | 35 |
If your output looks like the above example, you did it correctly! If not, go back and follow the steps again. There is a substantial amount of information in the output from Excel, but at this point we are only concerned about six items.
Mean. Excel calculated the mean to be 148.2286 seconds. This is the average time customer transactions occurred during the specified time period. The symbol x bar represents the mean of a sample.
Range. The range value of 228 refers to the distance in seconds from the lowest time value of 36 to the highest time value of 264. The range is a simple and crude method used to measure dispersion, the distance between the variables.
Minimum. The minimum or lowest value within the data set is given which is 36 seconds.
Maximum. The maximum or highest data value with the data set is given which is 264 seconds.
Sum. The sum is simply all of the data values added together. In this example the sum equals 5188 seconds. The symbol ∑ indicates to sum or add.
Count. The count refers to the number of values you have included which is 35. The symbol n is used to indicate a sample sum.
If you are interested in just finding the mean using Excel, and are not interested in the additional information provided through descriptive statistics, you can find the mean of a set of data using the Excel function =average.
To use this function, enter your data into a blank Excel spreadsheet in one column.
In the blank cell below your column of date, type in =average and then in parentheses, indicate the cells you want included in your calculation.
Now press the enter key and you have the mean value.
Are you comfortable using Excel to calculate the mean? If not, lets do another sample exercise.
Practice Exercise 3.2. Using Excel to Calculate the Mean.
For this exercise, we are looking at highway driving speeds over a section of Interstate I-94 where many accidents have happened. The Michigan State Police have increased patrols and enforcement of the 70 MPH speed limit. The goal is to get the average speed of motorists at or below 70 MPH. Using the data below that contains a sample of radar checks of motorist speeds during a one hour time period, is the goal of an average of 70 MPH being achieved?
| 68 | 77 | 62 | 71 | 75 | 70 | 69 | 60 | 73 | 72 |
| 58 | 75 | 70 | 69 | 80 | 74 | 67 | 65 | 77 | 72 |
| 85 | 60 | 68 | 73 | 70 | 74 | 76 | 70 | 66 | 67 |
| 55 | 71 | 75 | 72 | 70 | 67 | 78 | 72 | 65 | 74 |
| 70 | 66 | 72 | 75 | 59 | 80 | 72 | 77 | 64 | 73 |
| 72 | 70 | 68 | 74 | 78 | 71 | 63 | 70 | 69 | 71 |
Excel 2007
Click on the
Data folder.
Click on Data Analysis. If the Data Analysis option does not appear, you probably forgot to load Analysis Toolpak. Instructions for loading Analysis Toolpak are found at the end of Unit 1.
Under Data Analysis, choose Descriptive Statistics from the choices listed. Then click OK.
The Descriptive Statistics window asks for additional information. First, you are asked for the Input Range. Type in A1:A60 or use your mouse to highlight the data values in the spreadsheet.
| Column1 | |
| Mean | 70.43 |
| Standard Error | 0.735 |
| Median | 71 |
| Mode | 70 |
| Standard Deviation | 5.691 |
| Sample Variance | 32.39 |
| Kurtosis | 0.629 |
| Skewness | -0.37 |
| Range | 30 |
| Minimum | 55 |
| Maximum | 85 |
| Sum | 4226 |
| Count | 60 |
Excel 2003
Make sure you have opened Excel and have a blank spreadsheet. Then highlight the data table above and copy and paste it into the worksheet starting at cell A1. In order to perform the calculations in Excel, you will need to take the table data and reorganize it into one column. The easiest way to do this is to cut and paste each column of data below the previous column. Once you are finished you should have one column of data starting at A1 and ending at A60.
In Excel, click on Tools and then Data Analysis. You will then see a list of options. Click on Descriptive Statistics and click OK.
You should see a new window titled Descriptive Statistics.
You are then asked for the Input Range. Type in A1:A60 or use your mouse to highlight the data values in the spreadsheet.
Your data is grouped by columns so make sure that the columns box next to Grouped By in the window is checked. Once again we do not have any labels to add.
Under Output Options, click in the box to the left of Output Range. Then type in C1 in the blank box.
Click in the box to the left of Summary Statistics. Then click OK.
Starting in cell C1, you should see output identical to the example given below.
| Column1 | |
| Mean | 70.43 |
| Standard Error | 0.735 |
| Median | 71 |
| Mode | 70 |
| Standard Deviation | 5.691 |
| Sample Variance | 32.39 |
| Kurtosis | 0.629 |
| Skewness | -0.37 |
| Range | 30 |
| Minimum | 55 |
| Maximum | 85 |
| Sum | 4226 |
| Count | 60 |
Looking at the mean which Excel has calculated for you, can you answer the question? Our goal was to have an average speed of 70 MPH or less. Are we achieving our goal? Since the mean is 70.43 which is not less than 70, the answer is no. A reminder that the mean represents an average or central value and it does not indicate that every motorist is in compliance. Clearly we have some motorists exceeding the speed limit while other motorists are well below the speed limit.
The mean can also be calculated in Excel by using a simple command called =average. Using the numbers below, let's calculate the mean using =average.
| 68 |
| 58 |
| 85 |
| 55 |
| 70 |
| 72 |
Make sure you have opened Excel and have a blank spreadsheet. Then highlight the data table above and copy and paste it into the worksheet starting at cell A1.
Immediately below the last number copied, enter the following formula into cell A7: =Average(A1:A6) and then hit your Enter key.
Cell A7 should have your answer which is 68.
If you just need to find the mean for a set of data, using the =Average formula provides you with a quick method to find the mean instead of using the Descriptive Statistics option.
THE MEDIAN
The median is the middle number or value from a set of data. The data values must be placed in a order from lowest to highest or highest to lowest in order for the median to be determined. Once you have placed all of the data in order, the median is the middle value.
For example, look at the numbers in Table 3.1 again.
| 25 | 29 | 30 |
| 28 | 49 | 24 |
| 19 | 21 | 26 |
| 22 | 24 |
How do you find the median? You must first place the data in order - lowest to highest is very common so that is what we will do.
19, 21, 22, 24, 24, 25, 26, 28, 29, 30, 49
Now you find the middle value Since we have 11 values, the middle value is the one where there are five values above it and five values below it. So our middle value is 25. The median is 25.
But what happens if you have an even set of data? To find the median you simply add the middle two values together and divide by 2. Lets use the data in Table 3.1, but add another value of 18. Now our data in order from lowest to highest looks like this:
18, 19, 21, 22, 24, 24, 25, 26, 28, 29, 30, 49
To find the median from an even number of values, you take the middle two values, add them together, and divide by 2. Here our middle two values are 24 and 25. We take 24 + 25 and divide the total (49) by 2. Our answer is 24.5. This is the median.
The median does not have a formula associated with it. It is found by using these simple steps.
Formula Box 3.3. Steps to Find the Median.
Organize the data values from lowest to highest, or highest to lowest.
If you have an odd number of data values, the middle value is your median.
If you have an even number of data values, take the two middle values, add them together, and then divide by 2. This value is the median.
CHARACTERISTICS OF THE MEDIAN
The median has the following characteristics:
It is the middle value or number in a group of values that are organized or ranked from lowest to highest, or highest to lowest.
If there is an even set of data values, the median is found by taking the middle two values and adding them together and then dividing by two.
The median is not influenced by outliers or extreme values. Outliers whether they are large or small are considered just one value in a series of values. All of the values have equal weight since each value is counted as one.
USING EXCEL TO FIND THE MEDIAN
Finding the median is not difficult unless you have a large set of data values to organize. Then it becomes much easier to use Excel to do the work for you. The steps used to calculate the median in Excel are identical to the steps used to calculate the mean. Both the mean and median values are included in the output screen in Excel.
Lets look again at Practice Exercise 3.2. Scroll up and review the steps and output for Practice Exercise 3.2. For this large set of data values, using Excel can easily simplify the calculation of the median. Regardless of how the data values are entered into Excel, it will automatically determine the median. There is no need to enter the data into Excel in a specific order.
If you are interested in finding the median only and are using Excel, there is a specific function you can use.
Simply enter all of your numbers into one column.
In the blank cell below your numbers, type in =median followed by the cell addresses you wish to include in parentheses. Press the Enter key
You now have your median value.
THE MODE
The mode is another measure of central tendency like the mean and median. The mode is found by looking for the value that appears most frequently in a set of data. If no data values are identical, then a mode does not exist. If a set of data has two values that each appears twice, then the set of data has two modes and is called bimodal.
Once again lets take a look at the data in Table 3.1 again.
| 25 | 29 | 30 |
| 28 | 49 | 24 |
| 19 | 21 | 26 |
| 22 | 24 |
The mode is found by looking for values that appear more than once in a set of data. To make it easier to find the mode, it is often helpful to organize the data from lowest to highest or highest to lowest just like we did for the median. Listed below are the data values ranked lowest to highest for Table 3.1.
19, 21, 22, 24, 24, 25, 26, 28, 29, 30, 49
Now its much easier to find the mode. Since the value 24 appears twice, the mode is 24.
Although the mode does not have a formula, the steps below outline the procedure for finding the mode.
Formula Box 3.4. Finding the Mode.
Organize the data values from lowest to highest, or highest to lowest.
Look for any data values that are repeated.
The mode is found by choosing the data value that is repeated the most times in a set of data.
If two sets of data values appear more than once and the same number of occurrences, the data set is bimodal.
CHARACTERISTICS OF THE MODE
The mode is a data value that appears most frequently in a set of data.
The mode is not affected by outliers or extreme values.
In some cases, no mode exists while in other cases two or more modes could exist.
Using Excel To Find The Mode
Excel can be used to find the mode for a set of data values. It is especially helpful to use Excel to find the mode when you have a large set of data and it would take you a significant amount of time to organize the data from lowest to highest.
The steps to use Excel to find the mode are identical to the steps used for the median and mean. Take a look at Sample Exercise 3.2 again. Notice that the output contains the Mode in addition to the median and mean. What happens if there is no mode? If you take a look at the output from Excel in Sample Exercise 3.1 you will notice that a mode was not calculated (you are given N/A). This indicates that none of the data values were repeated.
You can find the mode only in Excel by using the =mode function followed by the cells in parentheses you wish to include.
NOTE: Excel has a problem finding one mode when several modes exist. Therefore you can't rely upon Excel to find the Mode when multiple modes exist. Excel always chooses the first set of numbers that repeat the most times. However if another set of numbers repeats the same amount of times, Excel ignores the second mode.
Midrange
The midrange is simply the lowest and highest values of a data set, added together and then divided by 2. It is a rough estimate of the middle of a set of data.
For example, let's say we have the following set of numbers:
15, 25, 16, 5, 9, 21
The midrange is found by taking the highest number of 25, adding it to the lowest number 5, and dividing by 2. Here we have 25 + 5 = 30. Then we divide 30 by 2 and we get 15. 15 is the midrange.
Excel does not have a function to find the midrange. However you can use Excel to do the math for you. Using our previous example, we can write a quick formula in Excel to find the midrange value. In a cell in Excel, enter the following:
=(25+5)/2
The press Enter. The number 15 is given.
The mean, median, mode, and midrange are all measures of central tendency.
Weighted Mean
You are all familiar with weighted means. The best example of using a weighted mean occurs when the college you are attending calculates your grade point average for a semester of classes. Depending upon the number of credits, each class you have received a grade for may have more or less "weight" on your grade point average.
Take a look at the table below of grades and credit hours. To simplify things, we are using a 4 point grade scale and two possible grades.
| Course Name | Credit Hours | Grade | Weighted Grade Total |
| Economics 201 | 3 | B (3 points) | 9 (3 X 3) |
| Business Statistics 112 | 3 | A (4 points) | 12 (3 X 4) |
| Accounting 102 | 4 | B ( 3 points) | 12 (4 X 3) |
| Beginning Excel 181 | 2 | A ( 4 points) | 8 (2 X 4) |
To determine the grade point average, the point equivalent of each grade is multiplied by its "weight" which in this case relates to the credit hours. All of the weighted grade totals are added together (9+12+12+8=41). The total is then divided by the total amount of credit hours which is 12. Therefore we have 41/12 = 3.4167. The weighted mean (and grade point average) for this student is 3.4167. Notice that each grade carries a different weight depending upon the number of credits.
You use a weighted mean when you want to give a higher weight to items or classes that contain more "hits" or counts than other classes. For example, you may want to calculate a weighted mean for data that is organized into classes where the count per class differs.
Another way to use the weighted mean is to apply it based upon frequency counts. Suppose you visit the local McDonald's restaurant and observe the different sizes of soft drinks sold through the drive up window. Three sizes were available: small, medium, and large. The prices for each are $.85, $1.25, and $1.75. If 10 small, 15 medium, and 20 large drinks are sold, find the weighted mean. Here's what we do:
The weighted mean = 10($.85) + 15($1.25) + 20($1.75) / 45 = $1.3833. Notice that we divide by the total count of all drinks sold.
Symmetrical and Skewed Distributions of Data
We first examined the shape of distributions in unit 2. Let's review this again and pay extra attention to the mean, median, and mode.
When we examine a series of data values, we often look to see if the series of values have a specific shape. In order to determine the shape of a series of data values, we often construct a line chart and plot the data values on the chart.
Looking at Figure 3.1 below, the data on this chart is symmetrical. A symmetrical distribution has the same number of values above and below the mean which is represented by the peak of the curve. The mean and median in a symmetrical distribution are equal.
Figure 3.1. A Symmetrical Distribution of Data.

Sometimes we have a distribution of data that contains extreme values. When this occurs we have a skewed distribution. When the extreme values are above the mean value, we have a positively skewed distribution. The shape of the distribution is illustrated in Figure 3.2. Notice that positively skewed distributions have a tail to the right indicating that the extreme values or outliers are on the right side of the distribution. In this situation, the mean is greater than the median and mode.
Figure 3.2. A Positively Skewed Distribution of Data.

If the extreme values are below the mean then we have a negatively skewed distribution. The shape of this distribution is illustrated in Figure 3.3. Notice that for a negatively skewed distribution, the extreme values on the left produce a left sided tail. In this situation, the mean is less than the median and mode.
Figure 3.3. A Negatively Skewed Distribution of Data.

There are several statistical methods using formulas that can be used to determine the skewness of a set of data. Excel can automatically calculate the skewness of a set of data using the Descriptive Statistics option. It is commonly reported along with the mean, median, mode, and range. A positive value indicates that the data values are positively skewed. A negative value indicates that the data values are negatively skewed.
Let's take a look at the Excel output from the Descriptive Statistics function below and find the information on skewness.
| Column1 | |
| Mean | 70.43 |
| Standard Error | 0.735 |
| Median | 71 |
| Mode | 70 |
| Standard Deviation | 5.691 |
| Sample Variance | 32.39 |
| Kurtosis | 0.629 |
| Skewness | -0.37 |
| Range | 30 |
| Minimum | 55 |
| Maximum | 85 |
| Sum | 4226 |
| Count | 60 |
Notice that the skewness is being reported as -0.37. This indicates that the distribution is slightly negatively skewed.
Measures of Dispersion/Variation
A measure of dispersion or variation is used to determine the variability of data. Frequently a measure of dispersion or variability is used to determine how well the mean represents the data. If there is little variability in our data, then the mean can accurately represent the data. However if there is a lot of variability in the data, represented by data values that are scattered and distant from the mean, then using the mean should not be used to represent the data values.
Why are we concerned about dispersion and variation? If we take a look at our cereal company that we discussed in the previous unit, we know that the company wishes to have a consistent weight in its cereal boxes. If the goal is 15 ounces, then the company wants to be certain that all of its boxes weigh at least 15 ounces. If we examine 100 cereal boxes at random, weigh each box, and find the mean, we may find that the mean weight is 15.2 ounces. But, what is important is that no boxes are below 15 ounces and that very few boxes are above 15 ounces. The mean does not tell us if we are achieving our goal. In addition to the mean, we calculate a measure of dispersion to determine if we could possibly have some boxes weighing more or less than our goal. The more dispersed our data is, the more likely we will have boxes weighing below 15 ounces.
The Range
The shape of a distribution of data provides us with our first clue about the data. One of the easiest measurements used to measure the dispersion of data is called the range. The range is the difference between the highest and lowest values in a set of data. The range is often referred to as the simplest or crudest measure of dispersion.
Take a look at the data in Table 3.2. To calculate the range we simply take the highest value and subtract from it the lowest value. The result is the range.
Table 3.2. Bids Received for Main Street Road Renovations (in dollars).
| 454,000 | 495,000 | 450,000 |
| 389,500 | 417,000 | 400,000 |
| 412,700 | 365,000 | 433,000 |
| 329,500 | 429,000 | 419,000 |
To find the range in Table 3.2, the highest bid was $495,000 and the lowest bid was $329,500. Therefore the range is equal to $495,000 - $329,500 = $165,500. The steps for finding the range for a group of values is given in Formula Box 3.5.
Formula Box 3.5. Steps to Find the Range.
Find the highest and lowest values for a set of data.
Subtract the lowest value from the highest value.
The resulting number is the range.
You can also use Excel to calculate the range. The practice exercise below uses Excel to calculate the range for the values found in Table 3.2.
Practice Exercise 3.3. Using Excel to Find the Range.
Excel 2007
Make sure you have opened Excel and have a blank spreadsheet. Then highlight Table 4.1 above and copy and paste it into the worksheet starting at cell A1. In order to perform the calculations in Excel, you will need to take the table data and reorganize it into one column. The easiest way to do this is to cut and paste each column of data below the previous column. Once you are finished you should have one column of data starting at A1 and ending at A12.
Click on the Data folder.
Click on Data Analysis. If the Data Analysis option does not appear, you probably forgot to load Analysis Toolpak. Instructions for loading Analysis Toolpak are found at the end of Unit 1.
Under Data Analysis, choose Descriptive Statistics from the choices listed. Then click OK.
The Descriptive Statistics window asks for additional information. First, you are asked for the Input Range. Type in A1:A12 or use your mouse to highlight the data values in the spreadsheet.
Starting in cell C1, you
should see output identical to the example given below.
| Column1 | |
| Mean | 416141.7 |
| Standard Error | 12419.79 |
| Median | 418000 |
| Mode | #N/A |
| Standard Deviation | 43023.41 |
| Sample Variance | 1.85E+09 |
| Kurtosis | 0.837647 |
| Skewness | -0.29713 |
| Range | 165500 |
| Minimum | 329500 |
| Maximum | |