Session 21 Notes, Section 10-1
Scatter Plots and Correlation
In this semester we have learned about descriptive statistics and
inferential statistics.
And in the last half of the semester, we learned about two areas of
inferential statistics: hypothesis testing and confidence
intervals.
The last area of inferential statistics we are studying this semester
is determining whether a relationship exists between two or more
numerical or quantitative variables.
Many relationships among variables exist in the real world. One way to
determine whether a relationship exists is to use the statistical techniques
known as correlation and regression. In this session we will
explore graphing and study correlation to determine if a
relationship exists.
- Statistical Methods
  - Correlation is a statistical method used to determine whether a linear relationship between variables exists.
  - Regression is a statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.
- Statistical Questions: These are questions that we ask in inferential statistics:
  - Are two or more variables related?
  - If so, what is the strength of the relationship?
  - What type of relationship exists?
  - What kind of predictions can be made from the relationship?
- Vocabulary
  - A correlation coefficient is a measure of how variables are related.
  - In a simple relationship, there are only two types of variables under study.
  - In multiple relationships, many variables are under study.
When you finish this lesson you will be able to:
- Draw a scatter plot on graph paper
- Draw a scatter plot using your calculator
Draw a scatter plot on graph paper.
- A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
- A scatter plot is a visual way to describe the nature of the relationship between the independent and dependent variables. In the following scatter plot, hours of exercise is the independent variable and ounces of milk is the dependent variable.
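To make the idea of plotting ordered pairs concrete, here is a minimal Python sketch that draws a rough text-only scatter plot. It uses the hours-of-exercise and ounces-of-milk data listed in the nine-person study later in these notes; whether the original figure used the same data is an assumption. The function name `ascii_scatter` and the grid size are illustrative choices, not part of the textbook.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows that plot the (x, y) pairs with '*'."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Start with a blank grid; row 0 is the top of the plot.
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index (the "or 1" guards a zero range).
        col = round((x - x_min) / ((x_max - x_min) or 1) * (width - 1))
        row = round((y - y_min) / ((y_max - y_min) or 1) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(cells) for cells in grid]


# Independent variable x: hours of exercise; dependent variable y: ounces of milk.
hours = [3, 0, 2, 5, 8, 5, 10, 2, 1]
milk = [57, 8, 66, 61, 41, 66, 57, 50, 57]

plot_rows = ascii_scatter(hours, milk)
for line in plot_rows:
    print("|" + line)      # vertical axis (milk)
print("+" + "-" * 40)      # horizontal axis (hours)
```

Each `*` is one ordered pair. The horizontal position comes from x and the vertical position from y, just as when plotting by hand on graph paper.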

Draw a scatter plot using your graphing calculator.
Go to the end of Section 10-2 in your textbook. With your calculator in
hand, go through the steps, one step at a time, for creating a scatter plot
as shown on pages 562-563. Use Example TI10-1. (Do not go beyond preparing a scatter plot.)
1. Enter the x values in L1 and the y values in L2.
  - Press "Stat". The calculator will display EDIT, CALC, and TESTS across the top of the screen.
  - Select 1: Edit. The top of the screen should display L1, L2, L3, etc.
  - Now enter your values for x under L1 and values for y under L2.
2. Set the window size using the instructions from the bottom of page 562. (Xscl and Yscl determine how close together the "tick" marks are on the horizontal and vertical scales.)
3. The "STAT PLOT F1" key is shared with the "Y=" key. After you press 2nd [STAT PLOT F1] for Plot 1, press the ENTER key.
4. Using the right arrow key, move the cursor to On and press ENTER.
5. Step 5 is done as in the textbook. ("Mark" presents two ways the plotted points can be displayed.)
6. Step 6 is done as in the textbook. ("GRAPH" is the rightmost key directly under the screen.)
Create the scatter plot for the following set of data associated with a
study of age and systolic blood pressure of six people selected at random:
The scatter plot would look like the graph below. Using your
algebraic knowledge of plotting points, identify the point corresponding to each
ordered pair from the table above.

Notice the relationship is linear, like a straight line, and that the pressure
goes up as age goes up.
When you finish this lesson you will be able to:
- Describe the characteristics of the Pearson Correlation Coefficient
- Compute the Pearson Correlation Coefficient using the formula
- Compute the Pearson Correlation Coefficient using your calculator
- Round the Pearson Correlation Coefficient
- Interpret the Pearson Correlation Coefficient
Characteristics of the Pearson Correlation Coefficient
- The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
- The symbol for the sample correlation coefficient is r.
- The symbol for the population correlation coefficient is the Greek letter ρ (rho).
- The range of the correlation coefficient is from -1 to +1.
- If there is a strong positive linear relationship between the variables, the value of r or ρ will be close to +1.
- If there is a strong negative linear relationship between the variables, the value of r or ρ will be close to -1.
- When there is no linear relationship between the variables or only a weak relationship, the value of r or ρ will be close to 0.

Formula for the Correlation Coefficient r
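There are several formulas for the correlation coefficient. We
will use the most common one, the
Pearson Product Moment Correlation Coefficient:

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

- where n is the number of data pairs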
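As a cross-check on hand calculations, the product moment formula can be sketched in a few lines of Python. This is a minimal sketch in the usual computational form (sums of x, y, xy, x^2 and y^2); the function name `pearson_r` and the small test lines are illustrative assumptions, not textbook material.

```python
import math


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # r = [n(Σxy) - (Σx)(Σy)] / sqrt{[n(Σx²) - (Σx)²][n(Σy²) - (Σy)²]}
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points on a falling line
print(r_up, r_down)
```

Points that fall exactly on a rising line give r = 1, and points exactly on a falling line give r = -1, which are the two ends of the range. The sketch does not handle the case where all x values (or all y values) are identical, since the denominator would then be zero.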
Calculate the Correlation Coefficient r with your calculator
Go to page 520 of Section 10-4 in your textbook.
With your calculator in hand, go through the steps, one step at a time, as
outlined below. Use Example TI10-1 at the top of
page 521. (Do not go beyond finding the correlation coefficient.)
1. Enter the x values in L1 and the y values in L2.
  - Press "Stat". The calculator will display EDIT, CALC, and TESTS across the top of the screen.
  - Select 1: Edit. The top of the screen should display L1, L2, L3, etc.
  - Now enter your values for x under L1 and values for y under L2.
2. Make sure your Diagnostic Display Mode is on.
  - Press "2nd" "Catalog". ("Catalog" is under the "0" key at the bottom of your calculator.)
  - Use the down arrow key to scroll down the menu until the little arrow points to "DiagnosticsOn".
  - Press "Enter". You should see "DiagnosticsOn" displayed on the screen.
  - Press "Enter" again.
  Once you have done this, you will not have to do it again unless the calculator is reset to factory specifications.
3. Obtain the correlation coefficient.
  - Press "Stat".
  - Use the right arrow key to highlight CALC.
  - Press "8", or scroll down to "8" and press "ENTER".
  - Press "2nd LIST" and press ENTER for L1. (Or select a different list if your data is someplace else.)
  - Press ",".
  - Press "2nd LIST", press "2", and press ENTER for L2. (Or select a different list if your data is someplace else.)
Now your correlation coefficient r will be the last number displayed on your screen.
Rounding the Correlation Coefficient r
Round the correlation coefficient r
to three decimal places so that it can be compared to critical values in
Table I. Make sure that only the final answer is rounded, not
intermediate calculations.
Interpreting the Significance of Correlation Coefficient r
Describing r as "close to" 0 or 1 is vague. Thus we use this
specific criterion: If the absolute value of the computed value
of r exceeds the value in
Table I, we conclude that the correlation coefficient is significant.
Otherwise, there is not sufficient evidence to support the conclusion of a
significant correlation coefficient.
Compute the value of the correlation coefficient for the data
obtained in the study of age and blood pressure.

- Create a table to calculate the various elements of the formula

| Subject | Age x | Blood Pressure y | xy | x^2 | y^2 |
|---|---|---|---|---|---|
| A | 44 | 125 | 5500 | 1936 | 15625 |
| B | 41 | 122 | 5002 | 1681 | 14884 |
| C | 56 | 135 | 7560 | 3136 | 18225 |
| D | 61 | 144 | 8784 | 3721 | 20736 |
| E | 68 | 145 | 9860 | 4624 | 21025 |
| F | 77 | 151 | 11627 | 5929 | 22801 |
| n = 6 | Σx = 347 | Σy = 822 | Σxy = 48333 | Σx^2 = 21027 | Σy^2 = 113296 |

- Calculate r

| Numerator of r | Square of denominator | r |
|---|---|---|
| 4764 | 23541276 | 0.981876 |
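The table arithmetic above can be reproduced with a short Python sketch. It builds the xy, x^2 and y^2 columns from the age and blood pressure data, totals each column, and then forms the numerator and the square of the denominator of r. The variable names are illustrative assumptions.

```python
import math

ages = [44, 41, 56, 61, 68, 77]                # x
pressures = [125, 122, 135, 144, 145, 151]     # y

# One row per subject: x, y, xy, x^2, y^2
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(ages, pressures)]
totals = [sum(column) for column in zip(*rows)]
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
n = len(rows)

numerator = n * sum_xy - sum_x * sum_y
denominator_squared = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
r = numerator / math.sqrt(denominator_squared)

for row in rows:
    print(row)
print("totals:", totals)
print("numerator:", numerator)
print("square of denominator:", denominator_squared)
print("r:", round(r, 6), "rounded to three places:", round(r, 3))
```

Only the final value of r is rounded. The sums and the two intermediate quantities are kept exact, which matches the rounding rule given earlier in these notes.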
- Formally defined, the population correlation coefficient, ρ, is the correlation computed by using all possible pairs of data values (x, y) taken from a population.
- The sample correlation coefficient can be used as an estimator of the population correlation coefficient if the following assumptions are valid:
  - The variables x and y are linearly related.
  - The variables are random variables.
  - The two variables have a bivariate normal distribution (given any value of x, the y variable is normally distributed).
- Traditional (Classical) Hypothesis-testing procedure
  1. State the hypotheses.
  2. Find the critical values.
  3. Compute the test value.
  4. Make the decision.
  5. Summarize the results.
- In hypothesis testing, one of the following is true:
  - H0: ρ = 0. This null hypothesis means that there is no correlation between the x and y variables in the population.
  - H1: ρ ≠ 0. This alternative hypothesis means that there is a significant correlation between the variables in the population.
- Formula for the t test for the correlation coefficient:

$$ t = r\sqrt{\frac{n-2}{1-r^2}} $$

- with degrees of freedom equal to n - 2.
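The t test value for a correlation coefficient, t = r·sqrt((n - 2)/(1 - r^2)) with n - 2 degrees of freedom, is easy to check in Python. The sketch below is a minimal helper; the name `t_test_value` is an assumption made for illustration, and the sample call uses r = 0.889 with n = 6 from one of the worked problems in these notes.

```python
import math


def t_test_value(r, n):
    """Return (t, degrees_of_freedom) for testing H0: rho = 0."""
    if n < 3:
        raise ValueError("need at least three data pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


t, df = t_test_value(0.889, 6)
print(round(t, 3), df)
```

The helper rejects r = ±1 because the formula would divide by zero there. A perfect correlation needs no significance test.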
- Example
- P-value method
  1. State the hypotheses.
  2. Find the test value.
  3. Find the P-value.
  4. Make the decision.
  5. Summarize the results.
- Use the PPMC (Pearson product moment correlation coefficient) table, which gives the values of the correlation coefficient that are significant for a specific alpha level and a specific number of degrees of freedom. NOTE: Table I (p. 738) in our book is only for two-tailed alpha of .05 or .01 and gives only selected d.f. past 20. (However, you will probably prefer this table to the other approaches for hypothesis testing of H0!)
  1. State the hypotheses.
  2. Find the critical value.
  3. Make the decision.
  4. Summarize the results.
Test the significance of the correlation coefficient for the data obtained in a
study of age and systolic blood pressure of six randomly selected subjects. The
data are shown in the table.
| Subject | Age x | Pressure y |
|---|---|---|
| A | 43 | 127 |
| B | 48 | 126 |
| C | 56 | 132 |
| D | 61 | 144 |
| E | 67 | 144 |
| F | 70 | 152 |

Use α = 0.20 and r = 0.949.
- State the hypotheses.
  - H0: ρ = 0 and H1: ρ ≠ 0
- Find the critical values.
  - Since α = 0.20 and there are 6 - 2 = 4 degrees of freedom, the critical values obtained from the t Distribution table (two tailed, on page 731) are 1.533 and -1.533.
- Compute the test value.
  - $t = r\sqrt{\frac{n-2}{1-r^2}} = 0.949\sqrt{\frac{4}{1-0.949^2}} \approx 6.020$
- Make the decision.
  - Reject the null hypothesis, since the test value falls in the critical region.
- Summarize the results.
  - There is a significant relationship between the variables of age and blood pressure.
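The whole traditional-method problem can be traced from the raw data in one short Python sketch: compute r from the table, round it to three places, compute the t test value, and compare it with the critical value 1.533 quoted above. Variable names are illustrative assumptions, and the critical value is simply copied from these notes rather than computed.

```python
import math

ages = [43, 48, 56, 61, 67, 70]                 # x
pressures = [127, 126, 132, 144, 144, 152]      # y
n = len(ages)

sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# Pearson product moment correlation coefficient, rounded only at the end
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_rounded = round(r, 3)

# t test value with n - 2 degrees of freedom
t = r_rounded * math.sqrt((n - 2) / (1 - r_rounded ** 2))

critical_value = 1.533      # two-tailed, alpha = 0.20, d.f. = 4 (from the notes)
reject_null = abs(t) > critical_value

print("r =", r_rounded)
print("t =", round(t, 3))
print("reject H0:", reject_null)
```

The test value lies far beyond ±1.533, so it falls in the critical region and the null hypothesis is rejected.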
Test the significance of the correlation coefficient for the data obtained in a
study of age and systolic blood pressure of six randomly selected subjects. The
data are shown in the table.
| Subject | Age x | Pressure y |
|---|---|---|
| A | 43 | 125 |
| B | 48 | 120 |
| C | 56 | 136 |
| D | 61 | 144 |
| E | 67 | 138 |
| F | 70 | 149 |

Use α = 0.01 and r = 0.889.
- State the hypotheses.
  - H0: ρ = 0 and H1: ρ ≠ 0
- Find the test value.
  - $t = r\sqrt{\frac{n-2}{1-r^2}} = 0.889\sqrt{\frac{4}{1-0.889^2}} \approx 3.883$
  - Since there are 6 - 2 = 4 degrees of freedom, look for 3.883 in the d.f. = 4 row of the t Distribution table (two tailed, on page 731). 3.883 is greater than 3.747 and less than 4.604.
- Find the P-value.
  - 0.01 < P-value < 0.02 (read the α values from the two-tailed row above 4.604 and 3.747)
- Make the decision.
  - Since the P-value is greater than 0.01 and α = 0.01, the P-value is greater than α, so do not reject the null hypothesis. (NOTE: if our test value had been exactly 3.747 instead of 3.883, the P-value would have been exactly 0.02 rather than a range, and we still would not have rejected the null hypothesis.)
- Summarize the results.
  - There is not enough evidence at α = 0.01 to conclude that there is a significant relationship between the variables of age and blood pressure.
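Reading a P-value range from the t table can be sketched as a small lookup in Python. The row below holds the d.f. = 4 entries of a standard two-tailed t table; the notes quote 1.533, 3.747 and 4.604, while 2.132 and 2.776 are filled in from the standard table as an assumption. The function name `bracket_p_value` is illustrative.

```python
# (two-tailed alpha, critical t) for d.f. = 4, ordered by increasing t
T_ROW_DF4 = [(0.20, 1.533), (0.10, 2.132), (0.05, 2.776),
             (0.02, 3.747), (0.01, 4.604)]


def bracket_p_value(t, row):
    """Return (lower, upper) bounds on the two-tailed P-value."""
    t = abs(t)
    lower, upper = 0.0, 1.0
    for alpha, critical in row:
        if t >= critical:
            upper = alpha       # t reaches this column, so P-value <= alpha
        else:
            lower = alpha       # first column t does not reach: P-value > alpha
            break
    return lower, upper


alpha_level = 0.01
lower, upper = bracket_p_value(3.883, T_ROW_DF4)
reject_null = upper <= alpha_level   # reject only when the P-value is at most alpha

print(lower, "< P-value <", upper)
print("reject H0:", reject_null)
```

For t = 3.883 the range is 0.01 to 0.02. Every value in that range is larger than α = 0.01, so the null hypothesis is not rejected at that level. With α = 0.05 the same range would lead to rejection.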
The following data were obtained in a study on
the number of hours that nine people exercise each
week and the amount of milk (in ounces) each person
consumes each week.
| Subject | Hours x | Amount y |
|---|---|---|
| A | 3 | 57 |
| B | 0 | 8 |
| C | 2 | 66 |
| D | 5 | 61 |
| E | 8 | 41 |
| F | 5 | 66 |
| G | 10 | 57 |
| H | 2 | 50 |
| I | 1 | 57 |

Using the table for the critical values for PPMC,
test the significance of the correlation coefficient
r = 0.294 at α = 0.01.
- State the hypotheses.
  - H0: ρ = 0 and H1: ρ ≠ 0
- Find the critical value.
  - Since the sample size is 9, there are 9 - 2 = 7 degrees of freedom.
  - When α = 0.01 and with 7 degrees of freedom, the value obtained from Table I (page 738) for the critical values for PPMC is 0.798.
  - For a significant relationship, a value of r greater than +0.798 or less than -0.798 is needed.
- Decision
  - Since r = 0.294 does not fall in the critical region for a significant relationship, the null hypothesis is not rejected.
- Summary
  - There is not enough evidence to say that there is a significant linear relationship between the variables.
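The PPMC critical value is tied to the t test: solving t = r·sqrt(d.f./(1 - r^2)) for r gives r = t/sqrt(t^2 + d.f.). The Python sketch below uses that relation to recover the table value 0.798 and then applies the decision rule. The critical t of 3.499 (two-tailed, α = 0.01, 7 d.f.) comes from a standard t table and is not quoted in these notes, so treat it as an assumption; the function names are illustrative.

```python
import math


def critical_r_from_t(t_critical, df):
    """Convert a critical t value into the matching critical value of r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


def is_significant(r, critical_r):
    """PPMC table rule: significant when |r| exceeds the critical value."""
    return abs(r) > critical_r


critical_r = critical_r_from_t(3.499, 7)
significant = is_significant(0.294, critical_r)

print("critical r:", round(critical_r, 3))
print("r = 0.294 significant:", significant)
```

Because the two critical values are equivalent, the PPMC table method and the t test lead to the same decision for a given α and number of degrees of freedom.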
- The strength and direction of the linear relationship between variables is measured by the value of the correlation coefficient r.
- r can assume values from -1 to +1, inclusive.
- The closer the value of the correlation coefficient is to 1 or -1, the stronger the linear relationship is between the variables.
- A value of 1 or -1 indicates a perfect linear relationship.