Session 23 Notes, Section 10-5

Coefficient of Determination - Standard Error of the Estimate

In this unit, we have learned that, if the correlation coefficient is significant, we can determine the equation of the linear regression line.  Using this regression line and given values for the independent variable x, we can "predict" values for the dependent variable "y".  In this session we will learn other measures associated with the correlation coefficient: the coefficient of determination, the standard error of the estimate, and the prediction interval.

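At the end of this session you will be able to:

  1. Compute the coefficient of determination.
  2. Compute the standard error of estimate.
  3. Find a prediction interval.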



Vocabulary

Figure 10-17 is a nice illustration of the relationship between these variations.  You do not have to memorize each kind of variation.

Return to objectives


Compute the coefficient of determination.



Summary: Coefficient of Determination

If the correlation between two sets of data is significant, then we can determine a regression line.

A regression line is the data line of best fit for values of x and y and can be used to predict results for y given certain values of x.  To predict values of y means mathematically that y depends on x.  If you studied functions in your algebra classes, this means y is a function of x, y is the dependent variable, and x is the independent variable.

Keep in mind that the regression line is a line of best fit and, therefore, is only a predictor of y-values.  Consequently, there is a variation between the y-value on the regression line and the observed y-value for a point.  This is the "unexplained" variation.

The "explained" variation is built from the differences between the y-values on the regression line and the average y-value (it is the sum of the squares of those differences).  Note that all of this discussion assumes that y is the dependent variable, depending on the x-values.

The coefficient of determination, r², is the ratio of the explained variation to the total variation.  Therefore, the coefficient of determination measures the proportion of the variation of the dependent variable y that is accounted for by the regression line, that is, by the variation of the independent variable x.
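
The relationship between the three kinds of variation can be written compactly. The block below is a sketch in the notation of these notes (y' for the predicted value, and \bar{y} for the average y-value, which is a symbol assumed here), and it assumes the regression line is the least-squares line of best fit.

```latex
\underbrace{\sum (y - \bar{y})^2}_{\text{total variation}}
  = \underbrace{\sum (y' - \bar{y})^2}_{\text{explained variation}}
  + \underbrace{\sum (y - y')^2}_{\text{unexplained variation}}
\qquad
r^2 = \frac{\text{explained variation}}{\text{total variation}}
```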



 

Example of Coefficient of Determination

Find the coefficient of determination given r = 0.15.
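
A short sketch of the arithmetic for this example: the coefficient of determination is r squared, and the coefficient of nondetermination is taken here to be 1 minus r squared (the usual textbook definition, assumed because these notes do not state it). The variable names are illustrative only.

```python
# Coefficient of determination and nondetermination for a given r.
r = 0.15  # correlation coefficient from the example above

# Share of the variation in y explained by the regression line.
coefficient_of_determination = round(r ** 2, 4)

# Share left unexplained (assumed definition: 1 - r^2).
coefficient_of_nondetermination = round(1 - r ** 2, 4)

print(coefficient_of_determination)     # 0.0225
print(coefficient_of_nondetermination)  # 0.9775
```

In words, with r = 0.15 only about 2.25% of the variation in y is explained by the regression line.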



Example of Coefficient of Nondetermination



Compute the standard error of estimate.

The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y' values. The formula for the standard error of estimate is:

$$ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} $$



Example of Standard Error of Estimate

A researcher collects the following data and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y = 54.57 + 8.63x. Find the standard error of the estimate.

Machine       Age x (years)    Monthly cost y
A             1                63
B             2                75
C             3                72
D             4                91
E             4                94
F             6                105


The formula for the standard error of estimate is:

$$ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} $$

Make a table as shown.

x       y       y'       y - y'       (y - y')²
1       63
2       75
3       72
4       91
4       94
6       105


Using the regression line equation y' = 54.57 + 8.63x, compute the predicted value y' for each x and place the results in the column labeled y'.
x = 1, y' = 54.57 + (8.63)(1) = 63.2
x = 2, y' = 54.57 + (8.63)(2) = 71.83
x = 3, y' = 54.57 + (8.63)(3) = 80.46
x = 4, y' = 54.57 + (8.63)(4) = 89.09
x = 6, y' = 54.57 + (8.63)(6) = 106.35

For each y, subtract y' and place the answer in the column labeled y - y'.
63 - 63.2 = -0.20
75 - 71.83 = 3.17
72 - 80.46 = -8.46
91 - 89.09 = 1.91
94 - 89.09 = 4.91
105 - 106.35 = -1.35

Square the numbers found in the last step, place the squares in the column labeled (y - y')², and find the sum of the numbers in the last column. The completed table should look like this:

x       y       y'        y - y'       (y - y')²
1       63      63.20     -0.20        0.04
2       75      71.83     3.17         10.0489
3       72      80.46     -8.46        71.5716
4       91      89.09     1.91         3.6481
4       94      89.09     4.91         24.1081
6       105     106.35    -1.35        1.8225
                          Sum          111.2392

Substitute in the formula and find s_est.

$$ s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} = \sqrt{\frac{111.2392}{6 - 2}} \approx 5.27 $$

The standard error of the estimate is 5.27.
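
The table-based steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the regression coefficients are taken as given (54.57 and 8.63); the function name `standard_error_of_estimate` is hypothetical and not part of any library.

```python
import math

# (age in years, monthly cost) for machines A-F in the example above.
data = [(1, 63), (2, 75), (3, 72), (4, 91), (4, 94), (6, 105)]
a, b = 54.57, 8.63  # intercept and slope of the given regression line


def standard_error_of_estimate(points, intercept, slope):
    """Return sqrt(sum((y - y')^2) / (n - 2)) for the given line."""
    # Each residual is the observed y minus the predicted y' = a + b*x.
    squared_residuals = [(y - (intercept + slope * x)) ** 2 for x, y in points]
    return math.sqrt(sum(squared_residuals) / (len(points) - 2))


s_est = standard_error_of_estimate(data, a, b)
print(f"{s_est:.2f}")  # 5.27
```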



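Example of Standard Error of Estimate Using the Regression Line

Find the standard error of the estimate for the given data using the formula

$$ s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} $$

where a is the y-intercept and b is the slope of the regression line.

The regression equation is y' = 59.37 + 6.24x.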
Machine       Age x (years)    Monthly cost y
A             1                67
B             2                70
C             3                76
D             4                82
E             4                90
F             6                96

Round your answer to two decimal places.

  1. Make a table with columns for x, y, xy, and y².
  2. Find the product of the x and y values, and place the results in the third column.
  3. Square each y value, and place the results in the fourth column.
  4. Find the sums of the second, third, and fourth columns.

The completed table should look like this:

x       y       xy       y²
1       67      67       4489
2       70      140      4900
3       76      228      5776
4       82      328      6724
4       90      360      8100
6       96      576      9216

Σy = 481,  Σxy = 1699,  Σy² = 39205

From the regression equation y' = 59.37 + 6.24x, a = 59.37 and b = 6.24.

Substitute in the formula and find s_est.

$$ s_{est} = \sqrt{\frac{39205 - (59.37)(481) - (6.24)(1699)}{6 - 2}} = \sqrt{\frac{46.27}{4}} \approx 3.40 $$

The standard error of the estimate is 3.40.
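
The shortcut calculation can be checked with a small script. This is a sketch under the assumption that a = 59.37 and b = 6.24 are used exactly as rounded in the notes; the variable names are illustrative.

```python
import math

# (age in years, monthly cost) for machines A-F in the example above.
data = [(1, 67), (2, 70), (3, 76), (4, 82), (4, 90), (6, 96)]
a, b = 59.37, 6.24  # intercept and slope of the given regression line

n = len(data)
sum_y = sum(y for _, y in data)        # 481
sum_xy = sum(x * y for x, y in data)   # 1699
sum_y2 = sum(y * y for _, y in data)   # 39205

# Shortcut formula: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2)).
s_est = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
print(f"{s_est:.2f}")  # 3.40
```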



Find a prediction interval.

A prediction interval is an interval estimate for a value of y at a specific value of x.  It is similar to the confidence intervals that we studied earlier in the course.



Example of Prediction Interval

For the given data and regression equation y' = 58.39 + 5.28x, find the 95% prediction interval for the monthly maintenance cost of a machine that is 4 years old.  Round your answer to two decimal places.

Machine       Age x (years)    Monthly cost y
A             1                64
B             2                71
C             3                69
D             4                80
E             4                82
F             6                90

Find Σx, Σx², and X̄.

Σx = 20,  Σx² = 82,  X̄ = 20/6 = 3.3

Find y' for x = 4.

y' = 58.39 + 5.28x
y' = 58.39 + 5.28(4) = 79.51

Find Σy = 456, Σy² = 35122, and Σxy = 1601.

Find s_est:

s_est = 3.27

Substitute in the formula and solve: $t_{\alpha/2} = 2.776$, d.f. = 6 - 2 = 4 for 95%.

$$ y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} $$

$$ 79.51 - (2.776)(3.27) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}} < y < 79.51 + (2.776)(3.27) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}} $$

$$ 79.51 - (2.776)(3.27)(1.09) < y < 79.51 + (2.776)(3.27)(1.09) $$

$$ 79.51 - 9.94 < y < 79.51 + 9.94 $$

$$ 69.57 < y < 89.45 $$

The margin 9.94 comes from carrying unrounded intermediate values (including X̄ = 20/6) through the calculation; multiplying the rounded factors shown gives about 9.89.

Hence, one can be 95% confident that the interval 69.57 < y < 89.45 contains the actual value of y.
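
The whole calculation can be reproduced with a short script, which also shows the effect of not rounding the intermediate values. This is a sketch only: the t value 2.776 is hard-coded from the notes rather than looked up, and the variable names (`spread`, `margin`, and so on) are illustrative.

```python
import math

# Summary values taken from the worked example above.
n = 6
sum_x, sum_x2 = 20, 82
sum_y, sum_y2, sum_xy = 456, 35122, 1601
a, b = 58.39, 5.28   # given regression line y' = a + b*x
t_crit = 2.776       # t value for 95% with d.f. = n - 2 = 4 (from the notes)
x_new = 4            # machine age of interest, in years

y_pred = a + b * x_new
s_est = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
x_bar = sum_x / n

# Square-root factor of the prediction-interval formula.
spread = math.sqrt(1 + 1 / n + n * (x_new - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
margin = t_crit * s_est * spread

lower, upper = y_pred - margin, y_pred + margin
print(f"{lower:.2f} < y < {upper:.2f}")  # 69.57 < y < 89.45
```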



Example of Prediction Interval.

For the given data and regression equation y' = 58.67 + 5.85x, find the 95% prediction interval for the monthly maintenance cost of a machine that is 4 years old.

Machine       Age x (years)    Monthly cost y
A             1                66
B             2                74
C             3                71
D             4                77
E             4                84
F             6                97

Round your answer to two decimal places.
Find Σx, Σx², and X̄.

Σx = 20,  Σx² = 82,  X̄ = 20/6 = 3.3

Find y' for x = 4.

y' = 58.67 + 5.85x
y' = 58.67 + 5.85(4) = 82.07

Find Σy = 469, Σy² = 37267, and Σxy = 1653.

Find s_est:

s_est = 4.49

Substitute in the formula and solve: $t_{\alpha/2} = 2.776$, d.f. = 6 - 2 = 4 for 95%. (page 731)

$$ y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} $$

$$ 82.07 - (2.776)(4.49) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}} < y < 82.07 + (2.776)(4.49) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}} $$

$$ 82.07 - (2.776)(4.49)(1.09) < y < 82.07 + (2.776)(4.49)(1.09) $$

$$ 82.07 - 13.64 < y < 82.07 + 13.64 $$

$$ 68.43 < y < 95.71 $$

The margin 13.64 comes from carrying unrounded intermediate values (including X̄ = 20/6) through the calculation; multiplying the rounded factors shown gives about 13.59.

Hence, one can be 95% confident that the interval 68.43 < y < 95.71 contains the actual value of y.
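
One feature of the formula that is easy to miss is the (x - X̄)² term: the interval is narrowest near the mean age and widens as x moves away from it. The sketch below illustrates this for the data of this example, assuming the rounded values s_est = 4.49 and t = 2.776 quoted above; the helper name `margin` is hypothetical.

```python
import math

n, sum_x, sum_x2 = 6, 20, 82   # from the example above
s_est, t_crit = 4.49, 2.776    # rounded values quoted in the notes
x_bar = sum_x / n              # mean machine age, 20/6


def margin(x):
    """Half-width of the 95% prediction interval at machine age x."""
    spread = math.sqrt(1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    return t_crit * s_est * spread


# Print the half-width for each whole-number age from 1 to 6 years.
for x in range(1, 7):
    print(x, round(margin(x), 2))
```

The half-width is smallest at x = 3, the age closest to X̄, and largest at x = 6.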



Summary

