Session 23 Notes, Section 10-5
Coefficient of Determination - Standard Error of the Estimate
In this unit, we have learned that, if the correlation coefficient is
significant, we can determine the equation of the linear regression line.
Using this regression line and given values for the independent variable x, we
can "predict" values for the dependent variable y. In this session we
will learn other measures associated with the correlation coefficient: the
coefficient of determination, the standard error of the estimate, and the
prediction interval.
- Total Variation
- The total variation, $\sum (y - \bar{y})^2$, is the sum of the squares of
the vertical distances of each point from the mean.
- The total variation can be divided into two parts:
- that which is attributed to the relationship of x and y (i.e., from the
predicted y' values), which is $\sum (y' - \bar{y})^2$ and is called the
explained variation.
- that which is due to chance, found by $\sum (y - y')^2$, which is called the
unexplained variation. This variation cannot be attributed to the
relationship.
- Hence, the total variation is equal to the sum of the explained variation
and the unexplained variation:
$\sum (y - \bar{y})^2 = \sum (y' - \bar{y})^2 + \sum (y - y')^2$
- For a single point, the differences are called deviations.
Figure 10-17 is a nice illustration of the relationship between these
variations. You do not have to memorize each kind of variation.
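The split of the total variation into explained and unexplained parts can be checked numerically. The sketch below is only an illustration: it borrows the copy machine data from the worked example later in these notes and assumes the line is fitted by ordinary least squares with unrounded coefficients (with rounded coefficients the two parts add up only approximately). The variable names are made up for this sketch.

```python
# Copy machine data from the worked example later in these notes.
xs = [1, 2, 3, 4, 4, 6]            # age in years
ys = [63, 75, 72, 91, 94, 105]     # monthly cost
n = len(xs)

# Least-squares slope b and intercept a, kept unrounded (assumption:
# the regression line is the ordinary least-squares line).
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

y_bar = sum_y / n
predicted = [a + b * x for x in xs]

# The three variations defined above.
total = sum((y - y_bar) ** 2 for y in ys)
explained = sum((yp - y_bar) ** 2 for yp in predicted)
unexplained = sum((y - yp) ** 2 for y, yp in zip(ys, predicted))

print(f"total variation       = {total:.4f}")
print(f"explained variation   = {explained:.4f}")
print(f"unexplained variation = {unexplained:.4f}")
print(f"explained + unexplained = {explained + unexplained:.4f}")
```

The last two printed lines should agree with the first up to floating-point rounding.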
- The coefficient of determination is the ratio of the explained variation
to the total variation. The symbol for the coefficient of determination is
$r^2$, and it is given by
$r^2 = \frac{\text{explained variation}}{\text{total variation}}$
- The coefficient of determination is a measure of the variation of the
dependent variable that is explained by the regression line and the
independent variable.
- The coefficient of determination can also be found by squaring the
correlation coefficient r and converting the result to a percent.
- The coefficient of nondetermination is a measure of the unexplained
variation.
- The formula for the coefficient of nondetermination is: $1 - r^2$
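As a rough check that the two routes to the coefficient of determination agree, the sketch below computes r with the usual computational formula for the correlation coefficient (an assumption here, since that formula is not restated in this section) and compares its square with the ratio of explained to total variation. It reuses the copy machine data from the worked example later in these notes; all names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 4, 6]            # age in years
ys = [63, 75, 72, 91, 94, 105]     # monthly cost
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Correlation coefficient r (standard computational formula).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Unrounded least-squares line y' = a + b*x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

y_bar = sum_y / n
explained = sum((a + b * x - y_bar) ** 2 for x in xs)
total = sum((y - y_bar) ** 2 for y in ys)
ratio = explained / total

print(f"r^2 from squaring r        = {r ** 2:.4f}")
print(f"explained / total          = {ratio:.4f}")
print(f"coeff. of nondetermination = {1 - r ** 2:.4f}")
```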
Summary: Coefficient of Determination
If the correlation relationship between two sets of data is
significant, then we can make a regression line.
A regression line is the data line of best fit
for values of x and y and can be used to predict results for y given certain
values of x. To predict values of y means mathematically that y
depends on x. If you studied functions in your
algebra classes, this means y is a function of x, y is the
dependent variable, and x is the
independent variable.
Keep in mind that the regression line is a data line of best fit and,
therefore, is only a predictor of y-values. Therefore, there is a
variation between the y-value on the regression line and the
y-value for a point.
The "explained" variation is the difference between the
y-value of the regression line and the average
y-value. All of this discussion implies that y is the dependent
variable, depending on the x-values.
The coefficient of determination is a measure of
explained variation. Therefore, the coefficient of determination
is a measure of the variation of the dependent variable y (from the
regression line) accounted for by the variation of the independent
variable x.
Find the coefficient of determination given r = 0.15.
- The coefficient of determination is found by squaring r
and converting it to a percent.
- If r = 0.15, then $r^2 = (0.15)^2 = 0.0225$, or 2.25%.
- This result means that 2.25% of the variation in the
dependent variable is accounted for by the variations in the
independent variable.
- Note: it is not surprising that if the correlation
coefficient is only 0.15, the variation that is due to the
independent variable is not very great.
Find the coefficient of nondetermination given r = 0.24.
- The coefficient of nondetermination is found by
subtracting the coefficient of determination from 1.
- The symbol for the coefficient of determination is $r^2$.
- Therefore, if r = 0.24, then $r^2 = (0.24)^2 = 0.0576$.
- Subtracting from 1, we obtain $1 - r^2 = 1 - 0.0576 = 0.9424$.
- Converting this to a percent, we have 94.24%.
- This result means that 94.24% of the variation in the
dependent variable is not explained by the variations in the
independent variable.
- Note: it is not surprising that if the correlation
coefficient is only 0.24, the majority of the variation is
not due to the independent variable.
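The two small calculations above (r = 0.15 and r = 0.24) can be reproduced with a couple of helper functions. This is just a sketch; the function names are invented for illustration.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient r."""
    return r ** 2


def coefficient_of_nondetermination(r):
    """Subtract the coefficient of determination from 1."""
    return 1 - r ** 2


for r in (0.15, 0.24):
    explained = coefficient_of_determination(r)
    unexplained = coefficient_of_nondetermination(r)
    # The .2% format multiplies by 100 and appends a percent sign.
    print(f"r = {r}: r^2 = {explained:.2%}, 1 - r^2 = {unexplained:.2%}")
```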
The standard error of estimate, denoted by $s_{est}$, is the standard
deviation of the observed y values about the predicted y' values. The
formula for the standard error of estimate is:
$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$
- A second formula that can be used for the calculation of the
standard error of estimate is:
$s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$
- where a and b are from the equation of the regression line, written
as y' = a + bx, in which
b is the slope of the line and
a is the y' intercept.
A researcher collects the following data and determines that there is a
significant relationship between the age of a copy machine and its monthly
maintenance cost. The regression equation is y' = 54.57 + 8.63x. Find the
standard error of the estimate.
| Machine | Age x (years) | Monthly cost y |
| A | 1 | 63 |
| B | 2 | 75 |
| C | 3 | 72 |
| D | 4 | 91 |
| E | 4 | 94 |
| F | 6 | 105 |
Since the formula for the standard error of estimate is
$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$,
make a table with the following columns:
| x | y | y' | y - y' | $(y - y')^2$ |
| 1 | 63 | | | |
| 2 | 75 | | | |
| 3 | 72 | | | |
| 4 | 91 | | | |
| 4 | 94 | | | |
| 6 | 105 | | | |
Using the regression line equation y' = 54.57 + 8.63x, compute the predicted
value y' for each x and place the results in the column labeled y'.
x = 1, y' = 54.57 + (8.63)(1) = 63.2
x = 2, y' = 54.57 + (8.63)(2) = 71.83
x = 3, y' = 54.57 + (8.63)(3) = 80.46
x = 4, y' = 54.57 + (8.63)(4) = 89.09
x = 6, y' = 54.57 + (8.63)(6) = 106.35
For each y, subtract y' and place the answer in the column labeled y - y'.
63 - 63.2 = -0.20
75 - 71.83 = 3.17
72 - 80.46 = -8.46
91 - 89.09 = 1.91
94 - 89.09 = 4.91
105 - 106.35 = -1.35
Square the numbers found in the last step and place the squares in the column
labeled $(y - y')^2$, then find the sum of the numbers in the last column. The
completed table should look like this:
| x | y | y' | y - y' | $(y - y')^2$ |
| 1 | 63 | 63.20 | -0.20 | 0.04 |
| 2 | 75 | 71.83 | 3.17 | 10.0489 |
| 3 | 72 | 80.46 | -8.46 | 71.5716 |
| 4 | 91 | 89.09 | 1.91 | 3.6481 |
| 4 | 94 | 89.09 | 4.91 | 24.1081 |
| 6 | 105 | 106.35 | -1.35 | 1.8225 |
| | | | | $\sum (y - y')^2 = 111.2392$ |
Substitute in the formula and find $s_{est}$:
$s_{est} = \sqrt{\frac{111.2392}{6 - 2}} = \sqrt{27.8098} \approx 5.27$
The standard error of the estimate is 5.27.
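The table-based calculation above can be reproduced in a few lines. This is a minimal sketch that uses the rounded regression coefficients quoted in the example; the function and variable names are illustrative only.

```python
import math

# Data from the worked example above.
ages = [1, 2, 3, 4, 4, 6]              # x, age in years
costs = [63, 75, 72, 91, 94, 105]      # y, monthly cost

# Rounded coefficients of the regression line y' = a + b*x, as quoted.
a, b = 54.57, 8.63


def standard_error_of_estimate(xs, ys, intercept, slope):
    """Return sqrt(sum((y - y')^2) / (n - 2))."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sum_sq = sum(res * res for res in residuals)
    return math.sqrt(sum_sq / (len(xs) - 2))


s_est = standard_error_of_estimate(ages, costs, a, b)
print(f"s_est = {s_est:.2f}")  # prints s_est = 5.27
```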
Find the standard error of the estimate for the given data using the formula
$s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$
The regression equation is y' = 59.37 + 6.24x.
| Machine | Age x (years) | Monthly cost y |
| A | 1 | 67 |
| B | 2 | 70 |
| C | 3 | 76 |
| D | 4 | 82 |
| E | 4 | 90 |
| F | 6 | 96 |
Round your answer to two decimal places.
- Make a table with columns for x, y, xy, and $y^2$.
- Find the product of the x and y values, and place the results in
the third column.
- Square the y values, and place the results in the fourth column.
- Find the sums of the second, third, and fourth columns.
The completed table should look like this:
| x | y | xy | $y^2$ |
| 1 | 67 | 67 | 4489 |
| 2 | 70 | 140 | 4900 |
| 3 | 76 | 228 | 5776 |
| 4 | 82 | 328 | 6724 |
| 4 | 90 | 360 | 8100 |
| 6 | 96 | 576 | 9216 |
| | $\sum y = 481$ | $\sum xy = 1699$ | $\sum y^2 = 39205$ |
From the regression equation y' = 59.37 + 6.24x, a = 59.37 and b = 6.24.
Substitute in the formula and find $s_{est}$:
$s_{est} = \sqrt{\frac{39205 - (59.37)(481) - (6.24)(1699)}{6 - 2}} = \sqrt{\frac{46.27}{4}} \approx 3.40$
The standard error of the estimate is 3.40.
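The shortcut formula used in this example can be checked the same way. The sketch below builds the column sums from the raw data and then substitutes them, using the rounded a and b quoted in the example; the names are illustrative.

```python
import math

# Data from the example above.
ages = [1, 2, 3, 4, 4, 6]           # x, age in years
costs = [67, 70, 76, 82, 90, 96]    # y, monthly cost
n = len(costs)

# Rounded coefficients of y' = a + b*x, as quoted in the example.
a, b = 59.37, 6.24

# Column sums from the completed table.
sum_y = sum(costs)                                   # 481
sum_xy = sum(x * y for x, y in zip(ages, costs))     # 1699
sum_y2 = sum(y * y for y in costs)                   # 39205

# Shortcut formula: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2)).
s_est = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
print(f"s_est = {s_est:.2f}")  # prints s_est = 3.40
```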
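- The standard error of estimate can be used for constructing a
prediction interval about a y' value.
- The formula for the prediction interval is:
$y' - t_{\alpha/2} \, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2} \, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}$
with d.f. = n - 2.
You can see this is similar to a confidence interval that we
studied earlier in the course.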
For the given data and regression equation y' = 58.39 + 5.28x, find the 95%
prediction interval for the monthly maintenance cost of a machine that is
4 years old. Round your answer to two decimal places.
| Machine | Age x (years) | Monthly cost y |
| A | 1 | 64 |
| B | 2 | 71 |
| C | 3 | 69 |
| D | 4 | 80 |
| E | 4 | 82 |
| F | 6 | 90 |
$\sum x = 20$, $\sum x^2 = 82$, $\bar{X} = \frac{20}{6} \approx 3.3$
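Find y' for x = 4:
y' = 58.39 + 5.28(4) = 79.51
Find $\sum y = 456$, $\sum y^2 = 35122$, and $\sum xy = 1601$.
Find $s_{est}$:
$s_{est} = \sqrt{\frac{35122 - (58.39)(456) - (5.28)(1601)}{6 - 2}} \approx 3.27$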
Substitute in the formula and solve:
$t_{\alpha/2} = 2.776$, d.f. = 6 - 2 = 4 for 95%.
$y' - t_{\alpha/2} \, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2} \, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}$
$79.51 - (2.776)(3.27) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}} < y < 79.51 + (2.776)(3.27) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}}$
$79.51 - (2.776)(3.27)(1.09) < y < 79.51 + (2.776)(3.27)(1.09)$
Carrying unrounded intermediate values through the calculation gives a
margin of about 9.94:
$79.51 - 9.94 < y < 79.51 + 9.94$
$69.57 < y < 89.45$
Hence, one can be 95% confident that
the interval 69.57 < y < 89.45 contains the actual
value of y.
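The steps of this example can be followed in code as a check on the arithmetic. The sketch below starts from the summary values given above, uses the rounded a and b from the regression equation, and takes the critical value 2.776 from the t table rather than computing it. Intermediate values are left unrounded, which is why the bounds differ slightly from what the rounded factors 3.27 and 1.09 would give. Variable names are illustrative.

```python
import math

# Summary values from the example above.
n = 6
sum_x, sum_x2 = 20, 82
sum_y, sum_y2, sum_xy = 456, 35122, 1601
a, b = 58.39, 5.28        # rounded coefficients of y' = a + b*x
x0 = 4                    # age of the machine being predicted
t_crit = 2.776            # t table value, d.f. = n - 2 = 4, 95%

# Step 1: point prediction y' at x0.
y_pred = a + b * x0

# Step 2: standard error of estimate (shortcut formula).
s_est = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Step 3: the square-root factor in the prediction interval formula.
x_bar = sum_x / n
factor = math.sqrt(
    1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
)

# Step 4: margin and interval bounds.
margin = t_crit * s_est * factor
lower, upper = y_pred - margin, y_pred + margin

print(f"y' = {y_pred:.2f}, s_est = {s_est:.2f}, factor = {factor:.2f}")
print(f"{lower:.2f} < y < {upper:.2f}")  # prints 69.57 < y < 89.45
```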
For the given data and regression equation y' = 58.67 + 5.85x, find the 95%
prediction interval for the monthly maintenance cost of a machine
that is 4 years old. Round your answer to two decimal places.
| Machine | Age x (years) | Monthly cost y |
| A | 1 | 66 |
| B | 2 | 74 |
| C | 3 | 71 |
| D | 4 | 77 |
| E | 4 | 84 |
| F | 6 | 97 |
$\sum x = 20$, $\sum x^2 = 82$, $\bar{X} = \frac{20}{6} \approx 3.3$
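Find y' for x = 4:
y' = 58.67 + 5.85(4) = 82.07
Find $\sum y = 469$, $\sum y^2 = 37267$, and $\sum xy = 1653$.
Find $s_{est}$:
$s_{est} = \sqrt{\frac{37267 - (58.67)(469) - (5.85)(1653)}{6 - 2}} \approx 4.49$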
Substitute in the formula and solve:
$t_{\alpha/2} = 2.776$, d.f. = 6 - 2 = 4 for 95% (page 731).
$y' - t_{\alpha/2} \, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2} \, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}$
$82.07 - (2.776)(4.49) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}} < y < 82.07 + (2.776)(4.49) \sqrt{1 + \frac{1}{6} + \frac{6(4 - 3.3)^2}{6(82) - (20)^2}}$
$82.07 - (2.776)(4.49)(1.09) < y < 82.07 + (2.776)(4.49)(1.09)$
Carrying unrounded intermediate values through the calculation gives a
margin of about 13.64:
$82.07 - 13.64 < y < 82.07 + 13.64$
$68.43 < y < 95.71$
Hence, one can be 95% confident that
the interval 68.43 < y < 95.71 contains the actual
value of y.
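For practice problems like this one, the whole procedure can be wrapped in a single function that works from the raw data. The sketch below is one possible arrangement, not a prescribed method: `prediction_interval` is a made-up name, the rounded a and b are passed in as given, and the critical t value is supplied by the caller from the table.

```python
import math


def prediction_interval(xs, ys, a, b, x0, t_crit):
    """Return (lower, upper) bounds for y at x0.

    a, b   : intercept and slope of the regression line y' = a + b*x
    t_crit : critical t value for d.f. = n - 2, looked up in a t table
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    y_pred = a + b * x0
    # Shortcut formula for the standard error of estimate.
    s_est = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
    x_bar = sum_x / n
    factor = math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_crit * s_est * factor
    return y_pred - margin, y_pred + margin


# Data and regression equation from the problem above.
ages = [1, 2, 3, 4, 4, 6]
costs = [66, 74, 71, 77, 84, 97]
lower, upper = prediction_interval(ages, costs, 58.67, 5.85, 4, 2.776)
print(f"{lower:.2f} < y < {upper:.2f}")  # prints 68.43 < y < 95.71
```

Passing the coefficients in, rather than refitting the line inside the function, keeps the result in step with the rounded equation stated in the problem.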
- The coefficient of determination is a better indicator of
the strength of a linear relationship than the correlation
coefficient.
- It is better because it identifies the percentage of
variation of the dependent variable that is directly
attributable to the variation of the independent variable.
- The coefficient of determination is obtained by squaring
the correlation coefficient and converting the result to a
percentage.
- Another statistic used in correlation and regression is
the standard error of estimate, which is an estimate of the
standard deviation of the y values about the predicted y'
values.
- The standard error of estimate can be used to construct a
prediction interval about a specific point estimate y'
of the mean of the y values for a given x.