Business Statistics Unit 9
Chi-Square Analysis and Testing, Analysis of Variance (ANOVA)
Introduction
In this unit you are introduced to another new statistical technique called Chi-square analysis and testing. There are two possible tests we can use: one is called a goodness-of-fit test and the other is called a contingency table test. The Chi-square tests do not require a normal distribution and are often used when the shape of the population distribution is unknown. Chi-square tests are appropriate when you have nominal data. Nominal data is the most basic form of data where data is classified into categories with no order. Examples of nominal data include gender, state of residence, brand of toilet paper you use, or the type of car you drive.
Later in this unit you will learn about Analysis of Variance (ANOVA). The Analysis of Variance test (ANOVA) is used to compare the means between three or more normal populations to determine if they are equal or not.
Basic Thoughts and Terms
A goodness-of-fit test is used to determine under hypothesis testing, if our observed outcomes from an experiment are different from the expected outcomes from the experiment. For example, we can use a goodness-of-fit test to determine if a new fast food restaurant had an impact on the sales of surrounding existing fast food restaurants after it opened. In this case, our expected outcomes would be that no change has occurred in terms of sales at the existing fast food restaurants, and our observed outcomes are the actual sales at all fast food restaurants, including the new one.
The contingency table test is used to determine if two or more variables are independent of each other, or that one has a statistically significant effect on the other. For example, if we wanted to determine the taste preferences of men and women related to a new food product, we could use a contingency table test to determine if gender was a factor related to taste preference or not.
The Goodness-Of-Fit Test
For a goodness-of-fit test, we use the familiar five step process that we have used in earlier units. For each step however, some of the procedures are different.
State the null and alternative hypothesis. For the null hypothesis (H0 ), it is usually stated as there has been no change or preference and that the population being studied fits a specified pattern or distribution. The alternative hypothesis (H1) is that the population does not fit a specified pattern or preference, or that there has been a change.
Find the critical value. To find the critical value for a chi-square distribution using the tables in the back of your book, we must first find the degrees of freedom. For this test, the degrees of freedom are equal to k - 1, where k is the number of possible outcomes in an experiment. The possible outcomes are often found by looking at the different categories the data has been classified into. Once you have the degrees of freedom, you use the significance value and use the chi-square table to find the critical value. Note: you will only need to use the right columns in the table. For example, the .05, .01, and .10 columns are often used. All chi-square tests are considered one-tail tests. It is also a good idea to state your decision rule to guide you in what you should do after you calculate your test statistic. The same basic format is used. Reject H0 in favor of H1 if our test statistic X2 (calculated X2 value in step 3) is > the critical value from step 2, otherwise fail to reject H0. The decision rule for testing will always use the > symbol.
Compute the test statistic. The test statistic for chi-square goodness-of-fit testing can be found in Formula Box 9.1.
Formula Box 9.1. The Chi-Square Goodness-Of-Fit Formula.

O = an observed frequency from the sample
E = the expected (hypothesized) frequency if the null hypothesis is true.
Make the decision. If our calculated value for the test statistic in step 3 is greater than the critical value found in step 2, we reject the null hypothesis (H0). If our test statistic value is less than the critical value, we fail to reject the null hypothesis (H0).
Summarize the results. For this step you once again state what decision should be made in relation to the question that was posed in the problem.
Ready to try a practice problem? Lets go to Practice Exercise 9.1.
Practice Exercise 9.1. The Goodness-Of-Fit Test.
The president of a small local college is interested in determining if the number of college graduates from the college's four major schools has changed during the last 10 years. In the past, the school of art had 15% of the graduates, the school of business had 25% of the graduates, the school of education had 40%, and the school of engineering had 20% of the graduates. The number of recent graduates from each of the four schools is given below:
|
Art |
30 |
|
Business |
55 |
|
Education |
25 |
|
Engineering |
40 |
At the .05 level of significance, does a goodness-of-fit test indicate that there has been a change in the number of graduates by each school in the last 10 years?
State the null and alternative hypothesis. The null hypothesis is stated as: H0 : There has been no change in the past 10 years in the number of graduates from each school. The alternative hypothesis is stated as: H1: There has been a change.
Find the critical value. Significance (α) = .05. The critical value is based upon k - 1 degrees of freedom. There are four possible outcomes so k = 4. Our degrees of freedom equal 4 - 1 = 3. The critical value with 3 degrees of freedom and .05 significance equals 7.81. The decision rule: Reject H0 in favor of H1 if the test statistic is > 7.81. Otherwise fail to reject H0.
Compute the test statistic. Here we use the information given above along with formula 9.1 to solve. Notice that the expected (E) column is based upon the past percentages for each school, multiplied by the total number of graduates.
| Observed (O) | Expected (E) | O - E | (O - E)2 | (O - E)2 / E |
| 30 | 22.50 | 7.50 | 56.25 | 2.50 |
| 55 | 37.50 | 17.50 | 306.25 | 8.1667 |
| 25 | 60.00 | -35.00 | 1225 | 20.4167 |
| 40 | 30.00 | 10.00 | 100 | 3.3333 |
|
Total |
34.4167 |
|||
The calculated test statistic value is 34.4167.
Make the decision. Since our test statistic value of 34.4167 is greater than our critical value of 7.81, we reject H0.
Summarize the results. This indicates that a statistically significant change has occurred in the number of graduates by school for the college.
By now you may have noticed that the calculations for the goodness-of-fit test can be quite time consuming although they are not difficult to do. Is there a better way? Sure! Let's use Excel to find the same answer to Practice Exercise 9.1.
Practice Exercise 9.2. Using Excel 2007 or 2003 for the Goodness-Of-Fit Test.
The president of a small local college is interested in determining if the number of college graduates from the college's four major schools has changed during the last 10 years. In the past, the school of art had 15% of the graduates, the school of business had 25% of the graduates, the school of education had 40%, and the school of engineering had 20% of the graduates. The number of recent graduates from each of the four schools is given below:
|
Art |
30 |
|
Business |
55 |
|
Education |
25 |
|
Engineering |
40 |
At the .05 level of significance, does a goodness-of-fit test indicate that there has been a change in the number of graduates by each school in the last 10 years?
State the null and alternative hypothesis. The null hypothesis is stated as: H0 : There has been no change in the past 10 years in the number of graduates from each school. The alternative hypothesis is stated as: H1: There has been a change.
Find the critical value. Significance (α) = .05. The critical value is based upon k - 1 degrees of freedom. There are four possible outcomes so k = 4. Our degrees of freedom equal 4 - 1 = 3. The critical value with 3 degrees of freedom and .05 significance equals 7.81. The decision rule: Reject H0 in favor of H1 if the test statistic is > 7.81. Otherwise fail to reject H0.
Compute the test statistic. Here is where we can use Excel to make our job easier. First of all make sure you have a blank spreadsheet opened in Excel.
Copy the table of information from above into Excel starting at cell A1. Adjust your column widths as needed. The labels should be in column A and the numbers should be in column B.
In column C you will need to manually compute the expected values (E). The easiest way to do this is to insert a simple formula within each cell under column C starting at C1. Since the expected values are based upon the prior study, we take the total number of students from the recent study (150), and use the proper percentage for each category. For cell C1, your formula looks like this: =(150*.15). For cell C2 your formula is: =(150*.25). For cell C3 your formula is: =(150*.40). For cell C4 your formula is: =(150*.20). This will give you the expected values for each category. Therefore your worksheet should look like this:
Art 30 22.5 Business 55 37.5 Education 25 60 Engineering 40 30 Now you are ready for the next step. We will be using the next column D to place another formula within each cell starting with cell D1. This formula is essentially our (O - E)2 / E formula. In cell D1 enter the following formula: =(B1-C1)^2/C1. For cell D2 enter this formula: =(B2-C2)^2/C2. For cell D3 enter this formula: =(B3-C3)^2/C3. Finally for cell D4 use this formula: =(B4-C4)^2/C4. Your worksheet should look like this:
Art 30 22.5 2.500000 Business 55 37.5 8.166667 Education 25 60 20.416667 Engineering 40 30 3.333333 Now we are ready to compute the value of our test statistic. In cell D5 place the following formula which adds up our values: =SUM(D1:D4). Your worksheet should look like this:
Art 30 22.5 2.500000 Business 55 37.5 8.166667 Education 25 60 20.416667 Engineering 40 30 3.333333 34.416667
Make the decision. Since our test statistic value of 34.4167 is greater than our critical value of 7.81, we reject H0.
Summarize the results. This indicates that a statistically significant change has occurred in the number of graduates by school for the college.
Although Excel does make our calculations easier to complete, it is not able to directly compute the Chi-Square test statistic from a set of data variables. There is a command available in Excel called CHITEST which is able to compute a P value for a Chi-square goodness-of-fit test only. However in the past many students attempted to use this test for a contingency table test and also became confused about the P value. We are not using this method to find the Chi-square value.
The Contingency Table Test
The contingency table test is used to determine whether or not two or more variables are independent. Let's review the five step process and look at the changes for the contingency table test.
State the null and alternative hypothesis. For the null hypothesis (H0 ), it is usually stated as the populations being examined are independent of each other. In other words, for the factor being examined, all of the populations are independent of each other. The alternative hypothesis (H1) is that the populations are not independent of each other.
Find the critical value. To find the critical value for a chi-square distribution, we must first find the degrees of freedom. For this test, the degrees of freedom are found by using the following formula: df = (r - 1)(c - 1). The letter r represents the number of rows in the table and the value c represents the number of columns. Note: you will only need to use the right columns in the table. For example, the .05, .01, and .10 columns are often used. All chi-square tests are considered one-tail tests. It is also a good idea to state your decision rule to guide you in what you should do after you calculate your test statistic. The same basic format is used. Reject H0 in favor of H1 if our test statistic X2 (calculated X2 value in step 3) is > the critical value from step 2, otherwise fail to reject H0. The decision rule for testing will always use the > symbol.
Compute the test statistic. The test statistic for chi-square contingency table testing can be found in Formula Box 9.2. In Formula Box 9.3, the formula for finding the expected values (E) for each cell of a contingency table test is given.
Formula Box 9.2. The Chi-Square Contingency Table Test.

Notice that the formula for the contingency table test is the same one used for the goodness-of-fit test. The difference in our procedure is related to finding the expected values (E). In the goodness-of-fit test, the value of E for each cell was determined from prior percentages given in the problem. For a contingency table test, the value of E for each cell is determined by using Formula 9.3 below.
Formula Box 9.3. The Chi-Square Expected Value Formula for a Contingency Table Test.

This formula is applied by taking the totals from each row and column in a contingency table, multiplying them together, and dividing by the grand total of all rows and columns.
Make the decision. If our calculated value for the test statistic in step 3 is greater than the critical value found in step 2, we reject the null hypothesis (H0). If our test statistic value is less than the critical value, we fail to reject the null hypothesis (H0).
Summarize the results. Explain your decision here.
It's time to practice solving a contingency table problem.
Practice Exercise 9.3. The Chi-Square Contingency Table Test.
A survey based upon a random sample of 1000 people was taken by a political research company interested in finding out the public's opinion of the federal deficit and what should be done about it. People were asked if the government should reduce the deficit, increase the deficit, or if they had no opinion. The results of the study by gender is given below.
| Gender | Reduce the Deficit | Increase the Deficit | No Opinion | Total |
| Female | 250 | 220 | 70 | 540 |
| Male | 310 | 120 | 30 | 460 |
| Total | 560 | 340 | 100 | 1000 |
At the .05 level, test the hypothesis that the opinion of those surveyed in regards to the deficit is independent of gender.
The null hypothesis is stated as: H0 : The opinion about the budget deficit is independent of gender. The alternative hypothesis is stated as: H1: The variables are not independent.
The critical value is based upon (r - 1)(c - 1) degrees of freedom. We have two rows of data and three columns of data. Therefore the degrees of freedom equal: (2 - 1)(3 - 1) = 2. The critical value with a significance of .05 and 2 degrees of freedom equals 5.99. The decision rule: Reject H0 in favor of H1 if the test statistic is > 5.99. Otherwise fail to reject H0.
Compute the test statistic. Here we use the information given above along with formulas 9.2 and 9.3 to solve. Our first step is to create a table and list all of the observed values. You simply take the observed values from the previous table, and insert them into the table below starting with the first column of data and moving to the right.
| Observed (O) | Expected (E) | O - E | (O - E)2 | (O - E)2 / E |
| 250 | ||||
| 310 | ||||
| 220 | ||||
| 120 | ||||
| 70 | ||||
| 30 |
Our next step is to calculate the expected values (E). In order to do this, you use formula 9.3 for each cell. Notice below that the values for each cell have been inserted into the formula.
| Observed (O) | Expected (E) | O - E | (O - E)2 | (O - E)2 / E |
| 250 | (540)(560) / 1000 = 302.40 | |||
| 310 | (460)(560) / 1000 = 257.60 | |||
| 220 | (540)(340) / 1000 = 183.60 | |||
| 120 | (460)(340) / 1000 = 156.40 | |||
| 70 | (540)(100) / 1000 = 54 | |||
| 30 | (460)(100) / 1000 = 46 |
The next step we take is to subtract the expected values (E) from the observed values (O), square the result, and then divide it by E for each cell.
| Observed (O) | Expected (E) | O - E | (O - E)2 | (O - E)2 / E |
| 250 | (540)(560) / 1000 = 302.40 | -52.40 | 2745.76 | 9.0799 |
| 310 | (460)(560) / 1000 = 257.60 | 52.40 | 2745.76 | 10.6590 |
| 220 | (540)(340) / 1000 = 183.60 | 36.40 | 1324.96 | 7.2166 |
| 120 | (460)(340) / 1000 = 156.40 | -36.40 | 1324.96 | 8.4716 |
| 70 | (540)(100) / 1000 = 54 | 16.00 | 256.00 | 4.7407 |
| 30 | (460)(100) / 1000 = 46 | -16.00 | 256.00 | 5.5652 |
|
Total |
45.733 | |||
Make the decision. Since our test statistic value of 45.733 is greater than our critical value of 5.99, we reject H0.
Summarize the results. By rejecting the null hypothesis, this indicates that people's opinion about the deficit is not independent of gender.
So once again manually calculating the contingency table test is not difficult, but it is time consuming. Can Excel help us? Yes, let's take a look at this problem again but this time we will use Excel to find our test statistic value.
Practice Exercise 9.4. Using Excel 2007 or 2003 to Solve a Contingency Table Problem.
A survey based upon a random sample of 1000 people was taken by a political research company interested in finding out the public's opinion of the federal deficit and what should be done about it. People were asked if the government should reduce the deficit, increase the deficit, or if they had no opinion. The results of the study by gender is given below.
| Gender | Reduce the Deficit | Increase the Deficit | No Opinion | Total |
| Female | 250 | 220 | 70 | 540 |
| Male | 310 | 120 | 30 | 460 |
| Total | 560 | 340 | 100 | 1000 |
At the .05 level, test the hypothesis that the opinion of those surveyed in regards to the deficit is independent of gender.
The null hypothesis is stated as: H0 : The opinion about the budget deficit is independent of gender. The alternative hypothesis is stated as: H1: The variables are not independent.
The critical value is based upon (r - 1)(c - 1) degrees of freedom. We have two rows of data and three columns of data. Therefore the degrees of freedom equal: (2 - 1)(3 - 1) = 2. The critical value with a significance of .05 and 2 degrees of freedom equals 5.99. The decision rule: Reject H0 in favor of H1 if the test statistic is > 5.99. Otherwise fail to reject H0.
Compute the test statistic. Now we can use Excel to make our calculations easier. First open Excel and have a blank worksheet ready.
In your blank worksheet, develop a table similar to the table we used in practice exercise 9.3 except that we only need three columns. Or you can copy the table below. Adjust your column widths as needed.
Observed (O) Expected (E) (O - E)2 / E 250 310 220 120 70 30 You will need to once again calculate your expected values (E) using formula 9.3. Place those values in your spreadsheet in column B.
Now you are ready for the next step. We will be using column C to place another formula within each cell starting with cell C2. This formula is essentially our (O - E)2 / E formula. In cell C2 enter the following formula: =(A2-B2)^2/B2. For cell C3 enter this formula: =(A3-B3)^2/B3. Continue entering the formula for all six cells. If you prefer you can copy the formula from cell C2 to all other cells and verify that the numbers within the formula have changed. Your spreadsheet should look like this:
Observed (O) Expected (E) (O - E)2 / E 250 302.4 9.0799 310 257.6 10.6590 220 183.6 7.2166 120 156.4 8.4716 70 54 4.7407 30 46 5.5652
Now we just need to add all of our values together. In cell c8, place the following formula: =SUM(C2:C7). This should give you a total of 45.7330 which is our test statistic value.
Observed (O) Expected (E) (O - E)2 / E 250 302.4 9.0799 310 257.6 10.6590 220 183.6 7.2166 120 156.4 8.4716 70 54 4.7407 30 46 5.5652 45.7330
Make the decision. Since our test statistic value of 45.733 is greater than our critical value of 5.99, we reject H0.
Summarize the results. This indicates that people's opinion about the deficit is not independent of gender.
Analysis of Variance
Introduction
The Analysis of Variance test (ANOVA) is used to compare the means between three or more normal populations to determine if they are equal or not. We will use Excel to complete all of our calculations since doing the calculations manually is very time consuming.
Basic Thoughts and Terms
Analysis of Variance uses the F distribution as its statistical test. Tables for the F distribution can be found in the back of your textbook. The F distribution has the following characteristics:
Since the F distribution values are based upon degrees of freedom, there is a different F distribution for different degrees of freedom in the numerator and denominator.
There is an infinite number of values for the F distribution from 0 to infinity.
The F distribution is positively skewed to the right.
As the values of X increase, the curve gets closer to the x axis but it never intersects it.
There are several different ways that ANOVA can be used to compare populations. For our purposes here, we will be conducting tests based upon a single factor. This is called a one-way ANOVA test. Using statistical software packages like Excel or SPSS enable us to conduct more sophisticated tests using two or more factors without difficulties.
ANOVA testing requires that the populations under study are normally distributed, the samples are obtained randomly from independent populations, and the population variances are assumed to be equal. The number of samples from each independent population can be different. In other words, the value of n for each sample can be different. The interesting part of this test is that even though it is a test of means, variances are actually used in the calculations.
Procedural Steps Using Hypothesis Testing
We will once again use the five step hypothesis testing process for ANOVA testing.
State the null and alternative hypothesis. For ANOVA the null hypothesis is stated as: H0: All population means are equal. For example, if we are testing four populations, then we would state: H0: µ1 = µ2 = µ3 = µ4. Or we can simply state: All population means are equal. The alternative hypothesis is stated as: H1: Not all population means are equal. It is important to note that ANOVA testing does not tell you which mean does not equal the other means, or the combination of means that are not equal. It will only tell you that at least one mean does not equal the other means.
Find the critical value. The best way to find the critical value for an ANOVA problem, is to extract the number from your Excel output. It is found under the column titled F Crit. To find the critical F value manually, we must first calculate the degrees of freedom for our numerator and our denominator. The degrees of freedom for the numerator is equal to k - 1 where k is the number of samples. For example, if we have independent samples from five populations, k equals 5 and our degrees of freedom equal 4. The degrees of freedom for our denominator is equal to T - k, where T is the total number of items or variables in all samples, and k is the number of samples. So if we have 5 independent populations, our k value is 5. If each sample contained 6 items, then our value for T equals 30 (6 + 6 + 6 + 6 + 6 + 6). The F table has column values for the numerator and row values for the denominator. Simply take your numerator value and your denominator value to find the appropriate F value. For example, if our numerator is equal to 10 and our denominator is equal to 15, our F value from the table is equal to 2.54. HINT: Excel automatically provides you in its output the critical value located under the column called F Crit. The same basic format is used for the decision rule. Reject H0 in favor of H1 if our test statistic F (calculated F value in step 3) is > the critical value from step 2, otherwise fail to reject H0. The decision rule for ANOVA testing will always use the > symbol.
Compute the test statistic. The test statistic for ANOVA testing can be found in Formula Box 9.4.
Formula Box 9.4. Test Statistic for ANOVA Testing.

As you can see from the formula, we will be making two sets of computations. One set is for the numerator and is concerned with the estimated standard error between the independent populations, and the other set of computations is concerned with the estimated standard error within each independent population.
You should stop reading further and skip to the Excel instructions, unless you intend on finding the values manually. It is STRONGLY suggested that you let Excel do the work for you which saves about 15 - 20 minutes per problem!
Both of these formulas will need to be broken down into formulas that we can use to find our values. Let's start with the estimated standard error within the populations, which is the denominator of our formula in Formula Box 9.4. The formula in Formula Box 9.5 shows the computations needed.
Formula Box 9.5. The Estimated Standard Error Within.

The formula indicates that for each independent population you have, you take each item or data value, subtract the mean for each set of items or data values from each item or data value, square it, and add them together. Once again T is equal to the total number of items or data values in all samples, and k is equal to the number of independent populations that we have samples from.
At this point it is not unusual for this formula to look very intimidating and confusing. That's OK! We are going to continue our discussion by looking at the formula for the numerator, our estimated standard error between. Once we are able to apply these formulas to a problem, it will become easier for you to understand how they are used.
Formula Box 9.6. The Estimated Standard Error Between.

Wow! What a formula! It basically states that for each independent population sample, take the of sample items (n) and multiply it by the squared difference between each sample mean and the grand mean. Then add them together and divide by the number of samples (k) minus 1. But wait, what is a grand mean? The grand mean is simply the mean of all of your sample means, weighted by the number of items in each sample. The grand mean has its own formula which you must first calculate in order to find the estimated standard error between.
Formula Box 9.7. The Grand Mean.

Let's interpret the formula for the grand mean. It states that for each sample, take the number of items in the sample and multiply it by the mean for the sample. You do this for each sample you have and then add them together. Finally you divide by the total number of items from all samples. The value k in the formula represents the number of samples from independent populations that you have.
So in order to find the values for our test statistic indicated in Formula Box 9.4, we must first calculate the estimated standard error within and the estimated standard error between. In order to calculate the estimated standard error between, we must first calculate the grand mean.
Once we have all of our values, we place them into our test statistic for F. Then we can proceed to step 4.
Make the Statistical Decision. At this stage we compare our critical value of F from step number 2 with our calculated value of F from step 3. Using the decision rule, we make our decision to reject or fail to reject H0.
Summarize the Results. This is where we state what we are going to do as a result of our analysis.
In the past, many students have panicked over the thought of manually calculating ANOVA. It was very time consuming and often prone to math errors and mistakes. Excel is great at ANOVA problems so let's look at using it!
Using Excel to Calculate ANOVA
Practice Exercise 9.5. Using Excel to Calculate ANOVA.
Excel 2007
A research study was conducted looking at the ages of community college students in Michigan. A random sample of student ages was obtained from three colleges being studied. The ages can be found in the table below.
| Kellogg Community College | Kalamazoo Valley Community College | Jackson Community College |
| 24 | 18 | 31 |
| 33 | 36 | 45 |
| 19 | 29 | 23 |
| 41 | 20 | 26 |
| 22 | 19 |
Test the hypothesis at the .05 level of significance that there is no difference in ages between the three community colleges.
State the null and alternative hypothesis. Our null hypothesis is: H0 = µ1 = µ2 = µ3. The alternative hypothesis is: H1: Not all population means are equal.
Find the critical value. Our numerator value equals 3 - 1 = 2. Our denominator value equals 14 - 3 = 11. Our critical F value from the table equals 3.98. You can also get the critical value from your Excel output. The decision rule: Reject H0 in favor of H1 if our calculated test statistic value of F is > 3.98, otherwise fail to reject H0.
Compute the test statistic. Now instead of using the formula, we use Excel. Open a blank worksheet in Excel. Copy the three columns of data from the table above, including the table headings into the blank worksheet. Your first column should start in cell A1 and have the title Kellogg Community College. Adjust your column widths as needed.
From the Excel menu ribbon, choose Data, Data Analysis, ANOVA:Single Factor.
Excel will then ask for the data input range. Using your mouse highlight the entire range of data excluding your data labels at the top. Your input range should be A2:C6. For the Grouped by box, click in the circle for Columns.
In the Alpha box, enter the significance level .05. Under Output Options, click in the circle next to Output Range, and type in A10 in the box next to Output Range. Now click OK.
Your output from the analysis should appear starting in cell A10 and it should look like the example below.
Anova: Single Factor SUMMARY Groups Count Sum Average Variance Column 1 5 139 27.8 81.7 Column 2 4 103 25.75 69.58333 Column 3 5 144 28.8 101.2