## Lesson 10: Correlation and Scatterplots

#### Objectives

- Calculate correlation coefficients.
- Test the significance of correlation coefficients.
- Construct a scatterplot.
- Edit features of the scatterplot.

#### Overview

In correlational research, there is no experimental manipulation. Rather, we measure variables in their natural state. Instead of independent and dependent variables, it is useful to think of predictors and criteria. In bivariate (two-variable) correlation, we are assessing the degree of linear relationship between a predictor, *X*, and a criterion, *Y*. In multiple regression, we are assessing the degree of relationship between a linear combination of two or more predictors, *X*<sub>1</sub>, *X*<sub>2</sub>, ..., *X*<sub>k</sub>, and a criterion, *Y*. We will address correlation in the bivariate case in Lesson 10, linear regression in the bivariate case in Lesson 11, and multiple regression and correlation in Lesson 12.

The Pearson product-moment correlation coefficient, *r*, summarizes the linear relationship between two variables in a single number. It ranges from -1, a perfect negative (inverse) relationship, through 0, no linear relationship, to +1, a perfect positive (direct) relationship. When we calculate a correlation coefficient from sample data, we need to determine whether the obtained correlation is significantly different from zero. We will also want to produce a scatterplot, or scatter diagram, to examine the nature of the relationship. A correlation can be low not because the variables are unrelated but because their relationship is not linear; in such cases, the scatterplot helps reveal whether the relationship is nonlinear.
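The arithmetic behind this single number can be sketched in a few lines of Python. This is a minimal from-scratch sketch, not how SPSS is implemented; the function names `pearson_r` and `t_for_r` are our own. *r* is the sum of cross-products of deviations divided by the square root of the product of the two sums of squares, and the usual test of H0: ρ = 0 converts *r* to a *t* statistic with *n* - 2 degrees of freedom. The toy data illustrate the last point above: a perfectly predictable but U-shaped relationship still gives *r* = 0.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and the two sums of squares
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def t_for_r(r, n):
    """t statistic with n - 2 df for testing H0: rho = 0 (undefined when |r| = 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


x = [-2, -1, 0, 1, 2]
line = [2 * v + 1 for v in x]  # perfectly linear
curve = [v * v for v in x]     # perfectly predictable, but not linear

print(pearson_r(x, line))   # 1.0
print(pearson_r(x, curve))  # 0.0
```

The *t* conversion is what lets a sample *r* be compared against a *t* table; it breaks down at |*r*| = 1, where the relationship is perfect and no test is needed.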

#### Example Data

Suppose that you have collected questionnaire responses to five questions concerning dormitory conditions from 10 college freshmen. (Normally you would like to have a larger sample, but the small sample in this case is useful for illustration.) The questionnaire assesses the students' level of satisfaction with noise, furniture, study area, safety, and privacy. Assume that you have also assessed the students' family income level, and you would like to test the hypothesis that satisfaction with the college living environment is related to wealth (family income).

The questionnaire contains five questions about satisfaction with various aspects of the dormitory: noise, furniture, study area, safety, and privacy. Each is answered on a 5-point Likert-type scale ranging from very dissatisfied to very satisfied, coded 1 to 5. The data sheet for this study is shown below.

| Student | Income | Noise | Furniture | Study_Area | Safety | Privacy |
|---|---|---|---|---|---|---|
| 1 | 39 | 5 | 5 | 4 | 5 | 5 |
| 2 | 59 | 3 | 3 | 5 | 5 | 4 |
| 3 | 75 | 2 | 1 | 2 | 2 | 2 |
| 4 | 45 | 5 | 3 | 4 | 4 | 5 |
| 5 | 95 | 1 | 2 | 2 | 1 | 2 |
| 6 | 115 | 1 | 1 | 1 | 1 | 1 |
| 7 | 67 | 3 | 2 | 4 | 3 | 3 |
| 8 | 48 | 4 | 4 | 5 | 4 | 4 |
| 9 | 140 | 2 | 2 | 1 | 1 | 1 |
| 10 | 55 | 3 | 4 | 5 | 4 | 4 |
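If you want to check the numbers in this lesson outside SPSS, or prefer to import the data rather than type them into the Data Editor, the data sheet can be kept as a plain comma-separated (CSV) file, which SPSS and most other statistics packages can read. The sketch below is one way to produce that text with Python's standard `csv` module; the names `HEADER`, `ROWS`, and `to_csv_text` are hypothetical, and the range check simply enforces the 1-to-5 coding described above.

```python
import csv
import io

# Data sheet from the table above: one tuple per student
HEADER = ["Student", "Income", "Noise", "Furniture", "Study_Area", "Safety", "Privacy"]
ROWS = [
    (1, 39, 5, 5, 4, 5, 5),
    (2, 59, 3, 3, 5, 5, 4),
    (3, 75, 2, 1, 2, 2, 2),
    (4, 45, 5, 3, 4, 4, 5),
    (5, 95, 1, 2, 2, 1, 2),
    (6, 115, 1, 1, 1, 1, 1),
    (7, 67, 3, 2, 4, 3, 3),
    (8, 48, 4, 4, 5, 4, 4),
    (9, 140, 2, 2, 1, 1, 1),
    (10, 55, 3, 4, 5, 4, 4),
]


def to_csv_text(header, rows):
    """Serialize a header row plus data rows as comma-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Sanity check before saving: every item response must be a code from 1 to 5
for row in ROWS:
    if not all(1 <= code <= 5 for code in row[2:]):
        raise ValueError(f"out-of-range Likert code in row {row}")

text = to_csv_text(HEADER, ROWS)
print(text.splitlines()[1])  # 1,39,5,5,4,5,5
```

Writing `text` to a file (for example, a hypothetical `dorm_survey.csv` opened with `newline=""`) gives a file you can open in SPSS; you would still add the variable labels in Variable View.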

#### Entering the Data in SPSS

The data correctly entered in SPSS would look like the following (see Figure 10-1). Remember not only to enter the data, but to add appropriate labels in the Variable View to improve the readability of the output. If you prefer, you can download a copy of the data file.

Figure 10-1 Data entered in SPSS

#### Calculating and Testing Correlation Coefficients

To calculate and test the significance of correlation coefficients, select **Analyze**, **Correlate**, **Bivariate** (see Figure 10-2).

Figure 10-2 The bivariate correlation procedure

Move the desired variables to the Variables window, as shown in Figure 10-3.

Figure 10-3 Move desired variables to the Variables window

Click **Options**, select **Means and standard deviations**, and then click **Continue**. The output contains a table of descriptive statistics (see Figure 10-4) and a table of correlations and related significance tests (see Figure 10-5).
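As a cross-check on the descriptive statistics table, the same means and standard deviations can be computed with Python's standard `statistics` module. This is a sketch under one assumption worth stating: SPSS reports the sample standard deviation (denominator *n* - 1), which is what `statistics.stdev` computes, so the values should agree with Figure 10-4 up to rounding. The `DATA` dictionary is simply the data sheet above, keyed by our own variable names.

```python
import statistics

# Data sheet from this lesson, one list per variable (students 1-10 in order)
DATA = {
    "Income": [39, 59, 75, 45, 95, 115, 67, 48, 140, 55],
    "Noise": [5, 3, 2, 5, 1, 1, 3, 4, 2, 3],
    "Furniture": [5, 3, 1, 3, 2, 1, 2, 4, 2, 4],
    "Study_Area": [4, 5, 2, 4, 2, 1, 4, 5, 1, 5],
    "Safety": [5, 5, 2, 4, 1, 1, 3, 4, 1, 4],
    "Privacy": [5, 4, 2, 5, 2, 1, 3, 4, 1, 4],
}

means = {name: statistics.mean(values) for name, values in DATA.items()}
# statistics.stdev uses the n - 1 (sample) denominator
sds = {name: statistics.stdev(values) for name, values in DATA.items()}

print(f"{'Variable':<12}{'Mean':>8}{'SD':>8}{'N':>4}")
for name, values in DATA.items():
    print(f"{name:<12}{means[name]:>8.2f}{sds[name]:>8.2f}{len(values):>4}")
```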

Figure 10-4 Descriptive statistics

Figure 10-5 Correlation matrix

Note that SPSS flags significant correlations with asterisks: by default, one asterisk marks a correlation significant at the .05 level and two asterisks mark one significant at the .01 level (two-tailed). The correlation matrix is symmetric, so the entries above the diagonal are the same as those below it. In our survey results, family income correlates strongly and negatively with every survey item, and the items correlate strongly and positively with one another.
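The whole lower triangle of the matrix, with SPSS-style flags, can be reproduced as a sketch. One simplification to be aware of: SPSS flags from exact *p* values, whereas this sketch compares each *t* = *r*√(*n* - 2)/√(1 - *r*²) with two-tailed critical values for df = 8 taken from a standard *t* table (2.306 at .05, 3.355 at .01), so the constants are valid only for this sample of 10. The helper names are hypothetical.

```python
import math
from itertools import combinations

DATA = {
    "Income": [39, 59, 75, 45, 95, 115, 67, 48, 140, 55],
    "Noise": [5, 3, 2, 5, 1, 1, 3, 4, 2, 3],
    "Furniture": [5, 3, 1, 3, 2, 1, 2, 4, 2, 4],
    "Study_Area": [4, 5, 2, 4, 2, 1, 4, 5, 1, 5],
    "Safety": [5, 5, 2, 4, 1, 1, 3, 4, 1, 4],
    "Privacy": [5, 4, 2, 5, 2, 1, 3, 4, 1, 4],
}

# Two-tailed critical t values for df = n - 2 = 8 (from a t table; n = 10 only)
T_CRIT_05 = 2.306
T_CRIT_01 = 3.355


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def significance_flag(r, n):
    """Approximate SPSS flags: '**' if p < .01, '*' if p < .05 (two-tailed)."""
    t = abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    if t > T_CRIT_01:
        return "**"
    if t > T_CRIT_05:
        return "*"
    return ""


n = len(DATA["Income"])
results = {}
# Each unordered pair once; the diagonal (r = 1) is skipped
for a, b in combinations(DATA, 2):
    r = pearson_r(DATA[a], DATA[b])
    results[(a, b)] = (r, significance_flag(r, n))

for (a, b), (r, flag) in results.items():
    print(f"{a:>10} x {b:<10} r = {r:6.3f}{flag}")
```

For these data, Income and Noise come out near *r* = -.80, flagged at the .01 level, which is the relationship plotted in the next section.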

#### Constructing a Scatterplot

For purposes of illustration, let us produce a scatterplot of the relationship between satisfaction with noise level in the dormitory and family income. We see from the correlation matrix that this is a significant negative correlation: as family income increases, satisfaction with the dormitory noise level decreases. To build the scatterplot, select **Graphs**, **Interactive**, **Scatterplot** (see Figure 10-6). There are several different ways to construct a scatterplot in SPSS; we illustrate only one here.

Figure 10-6 Constructing a scatterplot

In the resulting dialog, enter Family Income on the *X*-axis and Noise on the *Y*-axis (see Figure 10-7).

Figure 10-7 Specifying variables for the scatterplot

The resulting scatterplot (see Figure 10-8) shows the relationship between family income and satisfaction with dormitory noise.

Figure 10-8 Scatterplot
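To see the same pattern without SPSS or a plotting library, a rough character-grid scatterplot is enough. This is an illustrative sketch only; `text_scatter` is a hypothetical helper that rescales each point to a grid cell, so points that land in the same cell would be drawn as one mark (with these data and the default 40-by-5 grid, all 10 points get their own cell). The downward drift from upper left to lower right is the negative correlation.

```python
def text_scatter(x, y, width=40, height=5, mark="*"):
    """Return the rows of a rough character-grid scatterplot (top row = largest y)."""
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    x_span = (x_hi - x_lo) or 1  # avoid dividing by zero when all x are equal
    y_span = (y_hi - y_lo) or 1
    grid = [[" "] * width for _ in range(height)]
    for a, b in zip(x, y):
        col = round((a - x_lo) / x_span * (width - 1))
        row = round((b - y_lo) / y_span * (height - 1))
        grid[height - 1 - row][col] = mark  # flip so larger y is higher up
    # Left border for the Y-axis, bottom border for the X-axis
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


income = [39, 59, 75, 45, 95, 115, 67, 48, 140, 55]
noise = [5, 3, 2, 5, 1, 1, 3, 4, 2, 3]
for line in text_scatter(income, noise):
    print(line)
```

With matplotlib installed, `matplotlib.pyplot.scatter(income, noise)` draws the graphical equivalent.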

You can edit a chart object by double-clicking it in the SPSS Viewer. Among many other options, you can change the labeling and scaling of the axes, add trend lines and other elements to the scatterplot, and change the marker types. The edited chart appears in Figure 10-9. If you like, you can save this particular combination of settings as a chart template to use again in the future.
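Assuming the trend line you add is a straight least-squares fit line, its slope and intercept can be computed directly from the data. This sketch uses a hypothetical helper, `least_squares_line`: the slope is the sum of cross-products divided by the sum of squares of *X*, and the line passes through the point of means. Lesson 11 treats this bivariate regression line in detail.

```python
def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


income = [39, 59, 75, 45, 95, 115, 67, 48, 140, 55]
noise = [5, 3, 2, 5, 1, 1, 3, 4, 2, 3]
slope, intercept = least_squares_line(income, noise)
print(f"Noise = {intercept:.3f} + ({slope:.4f}) * Income")
```

For these data the slope is about -0.035 points of noise satisfaction per unit of Income, with an intercept of about 5.48.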

Figure 10-9 Edited scatterplot