Lesson 2: Entering and Working with Data

Objectives

  1. Create a data file and data structure.
  2. Compute a new variable.
  3. Select cases.
  4. Sort cases.
  5. Split a file.

Overview

Data can be entered directly into the SPSS Data Editor or imported from a variety of file types. It is always important to check data entries carefully and ensure that the data are accurate. In this lesson you will learn how to build an SPSS data file from scratch, how to calculate a new variable, how to select and sort cases, and how to split a file into separate layers.

Creating a Data File

A common first step in working with SPSS is to create or open a data file. We will assume in this lesson that you will type data directly into the SPSS Data Editor to create a new data file. You should realize that you can also read data from many other programs, or copy and paste data from worksheets and tables to create new data files.

Launch SPSS. You will be given various options, as discussed in Lesson 1. Select Type in Data or Cancel. You should now see a screen similar to the following, which shows a blank dataset in the Data View of the SPSS Data Editor (see Figure 2-1):


Figure 2-1 SPSS Data Editor - Data View

Key Point: One Row per Participant, One Column per Variable

It is important to note that each row in the SPSS data table should be assigned to a single participant, subject, or case, and that a case's data should never be spread across more than one row. When there are multiple measures for a case, each measure should appear in a separate column (called a "variable" by SPSS). If you use a coding variable to indicate which group or condition was assigned to a case, that variable should also appear in a separate column. So if you were looking at the scores on five quizzes for each of 20 students, the data for each student would occupy a single row (line) in the data table, and the score for each quiz would occupy a separate column.
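To make the one-row-per-case rule concrete, here is a small Python sketch (Python is used only for illustration; it is not part of SPSS). It starts from a hypothetical "long" layout with one row per quiz score, collapses it into one row per student, and checks whether any student ID appears on more than one row. The function names are made up for this example.

```python
from collections import Counter

# Hypothetical "long" layout: (student_id, quiz_number, score), one row per quiz.
# This spreads each student's data over several rows, which the lesson advises against.
long_rows = [
    (1, 1, 83), (1, 2, 87), (1, 3, 81), (1, 4, 80), (1, 5, 69),
    (2, 1, 76), (2, 2, 89), (2, 3, 61), (2, 4, 85), (2, 5, 75),
]

def to_wide(rows):
    """Collapse long rows into one row per student: {id: [quiz1, ..., quiz5]}."""
    wide = {}
    for student, quiz, score in sorted(rows):  # sorted by student, then quiz number
        wide.setdefault(student, []).append(score)
    return wide

def one_row_per_case(ids):
    """Return True if no ID appears on more than one row."""
    return all(n == 1 for n in Counter(ids).values())

wide = to_wide(long_rows)
print(wide)  # {1: [83, 87, 81, 80, 69], 2: [76, 89, 61, 85, 75]}
print(one_row_per_case([r[0] for r in long_rows]))  # False
print(one_row_per_case(list(wide)))                 # True
```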

Although SPSS automatically numbers the rows of the data table, it is a very good habit to provide a separate participant (or subject) number column so that records can easily be sorted, filtered, or selected. Best practice also calls for defining the data structure (variable names, types, and labels) before you enter the data. To do this, switch to the Variable View of the Data Editor by clicking the Variable View tab at the bottom of the Data Editor window (see Figure 2-2).


Figure 2-2 SPSS Data Editor - Variable View

Example Data

Let us establish the data structure for our example of five quizzes and 20 students. We will assume that we also know the age and the sex of each student. Although we could enter "F" for female and "M" for male, most statistical procedures are easier to perform if a number is used to code such categorical variables. Let us assign the number "1" to females and the number "0" to males. The hypothetical data are shown below:

Student  Sex  Age  Quiz1  Quiz2  Quiz3  Quiz4  Quiz5
      1    0   18     83     87     81     80     69
      2    0   19     76     89     61     85     75
      3    0   17     85     86     65     64     81
      4    0   20     92     73     76     88     64
      5    1   23     82     75     96     87     78
      6    1   18     88     73     76     91     81
      7    0   21     89     71     61     70     75
      8    1   20     89     70     87     76     88
      9    1   23     92     85     95     89     62
     10    1   21     86     83     77     64     63
     11    1   23     90     71     91     86     87
     12    0   18     84     71     67     62     70
     13    0   21     83     80     89     60     60
     14    0   17     79     77     82     63     74
     15    0   19     89     80     64     94     78
     16    1   20     76     85     65     92     82
     17    1   19     92     76     76     74     91
     18    1   22     75     90     78     70     76
     19    1   22     87     87     63     73     64
     20    0   20     75     74     63     91     87
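Because it is important to check data entries carefully, the following Python sketch (an illustration outside SPSS) stores the table above as one tuple per student and runs a few simple entry checks: unique student IDs, the right number of values per row, Sex coded 0 or 1, and quiz scores within 0 to 100. The 0 to 100 score range is an assumption made for this example, and `entry_problems` is a hypothetical helper name.

```python
# The lesson's hypothetical dataset, one tuple per student (one row per case).
# Sex: 1 = female, 0 = male
COLUMNS = ("Student", "Sex", "Age", "Quiz1", "Quiz2", "Quiz3", "Quiz4", "Quiz5")
DATA = [
    (1, 0, 18, 83, 87, 81, 80, 69),
    (2, 0, 19, 76, 89, 61, 85, 75),
    (3, 0, 17, 85, 86, 65, 64, 81),
    (4, 0, 20, 92, 73, 76, 88, 64),
    (5, 1, 23, 82, 75, 96, 87, 78),
    (6, 1, 18, 88, 73, 76, 91, 81),
    (7, 0, 21, 89, 71, 61, 70, 75),
    (8, 1, 20, 89, 70, 87, 76, 88),
    (9, 1, 23, 92, 85, 95, 89, 62),
    (10, 1, 21, 86, 83, 77, 64, 63),
    (11, 1, 23, 90, 71, 91, 86, 87),
    (12, 0, 18, 84, 71, 67, 62, 70),
    (13, 0, 21, 83, 80, 89, 60, 60),
    (14, 0, 17, 79, 77, 82, 63, 74),
    (15, 0, 19, 89, 80, 64, 94, 78),
    (16, 1, 20, 76, 85, 65, 92, 82),
    (17, 1, 19, 92, 76, 76, 74, 91),
    (18, 1, 22, 75, 90, 78, 70, 76),
    (19, 1, 22, 87, 87, 63, 73, 64),
    (20, 0, 20, 75, 74, 63, 91, 87),
]

def entry_problems(rows):
    """Return messages describing suspicious entries (an empty list if none)."""
    problems = []
    ids = [row[0] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate Student ID")
    for row in rows:
        if len(row) != len(COLUMNS):
            problems.append(f"student {row[0]}: wrong number of values")
            continue
        if row[1] not in (0, 1):
            problems.append(f"student {row[0]}: Sex code not 0 or 1")
        if any(not 0 <= score <= 100 for score in row[3:]):  # assumed valid range
            problems.append(f"student {row[0]}: quiz score outside 0-100")
    return problems

print(entry_problems(DATA))  # []
print(entry_problems([(21, 2, 19, 80, 80, 80, 80, 800)]))
# ['student 21: Sex code not 0 or 1', 'student 21: quiz score outside 0-100']
```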

Specifying the Data Structure

Switch to the Variable View by clicking the Variable View tab (see Figure 2-2 above). The numbers at the left of the window now refer to variables rather than participants. For each variable you can specify the Name, the Type of variable, the Width (in total characters or digits), the number of Decimals, a descriptive Label, labels for different Values, how to deal with Missing values, the display Column width, how to Align the variable in the display, and whether the Measure is nominal, ordinal, or scale (interval or ratio). In many cases you can simply accept the default settings.

However, you will definitely want to enter a variable Name and Label, and also specify Value labels for the levels of categorical or grouping variables, such as sex or the levels of an independent variable. Variable names should be short and should not contain spaces or special characters other than underscores. Variable labels, on the other hand, can be longer and can contain spaces and special characters.
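The naming advice above can be sketched as a simple check in Python. This is an illustration only: the regular expression encodes the conservative rule given in this lesson (start with a letter, then letters, digits, or underscores, no spaces), not SPSS's complete naming rules, and the labels in `variables` are hypothetical examples of a short Name paired with a longer Label.

```python
import re

# Conservative rule from this lesson: a letter first, then letters, digits,
# or underscores only. SPSS's own rules allow somewhat more than this.
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_safe_name(name):
    """Return True if the variable name follows the lesson's naming advice."""
    return NAME_RULE.fullmatch(name) is not None

# Hypothetical Variable View entries: short names, longer descriptive labels
variables = [
    {"name": "Student", "label": "Student ID number"},
    {"name": "Sex", "label": "Sex of student"},
    {"name": "Age", "label": "Age in years"},
    {"name": "Quiz1", "label": "Score on Quiz 1"},
]

for v in variables:
    print(v["name"], is_safe_name(v["name"]))  # every name prints True

print(is_safe_name("Quiz 1"))    # False: contains a space
print(is_safe_name("Quiz_Avg"))  # True
```

Labels are free text, so they are not checked; only the names need to stay short and simple.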

Let us specify the structure of our dataset by naming the variables as follows. We will also provide information concerning the width, number of decimals, and type of measure, along with a descriptive label:

    1. Student
    2. Sex
    3. Age
    4. Quiz1
    5. Quiz2
    6. Quiz3
    7. Quiz4
    8. Quiz5

No decimals appear in our raw data, so we will set the number of decimals to zero. After we enter the desired information, the completed data structure might appear as follows:


Figure 2-3 SPSS data structure (Variable View)

Notice that we provided value labels for Sex so that we won't confuse the codes 1 and 0 later. To do this, click the Values cell in the Sex row and enter the appropriate labels for males and females (see Figure 2-4).


Figure 2-4 Adding value labels

Enter the value and label for one sex and click Add. Then enter the value and label for the other sex, click Add again, and click OK.
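Value labels are essentially a lookup from code to text. A minimal Python sketch of that idea, using the lesson's coding (1 = female, 0 = male); the `label` helper is hypothetical, and returning the bare code for an unlabeled value is a choice made for this sketch.

```python
# Value labels for the Sex coding used in this lesson
SEX_LABELS = {0: "Male", 1: "Female"}

def label(code, labels=SEX_LABELS):
    """Return the label for a code, or the code as text if no label is defined."""
    return labels.get(code, str(code))

codes = [0, 0, 0, 0, 1]           # Sex for students 1-5 in the table
print([label(c) for c in codes])  # ['Male', 'Male', 'Male', 'Male', 'Female']
print(label(9))                   # 9 (no label defined for this code)
```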

Entering the Data

Now return to the Data View (click the Data View tab) and type in the data. Save the data file with a name that will help you remember it; in this case, we used lesson_2.sav as the file name. SPSS adds the .sav extension to data files automatically. The data should appear as follows:


Figure 2-5 Completed data entry

Computing a New Variable

Now we will compute a new variable by averaging the five quiz scores for each student. When we compute this new variable, it will be added to our variable list, and a new column will be created for it. Let us call the new variable Quiz_Avg and use SPSS's built-in function called MEAN to compute it. Select Transform, then Compute. The Compute Variable dialog box appears. You may type in the new variable name, specify the type and provide a label, and enter the formula for computing the new variable. In this case, we will use the formula:

Quiz_Avg = MEAN(Quiz1, Quiz2, Quiz3, Quiz4, Quiz5)

You can enter the formula by selecting MEAN from the Functions window and then clicking on the variable names, or you can simply type in the formula, separating the variable names by commas.
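SPSS's MEAN function averages only the arguments that have valid (nonmissing) values, rather than treating a missing score as zero. The Python sketch below mimics that behavior for illustration; `mean_of_valid` is a hypothetical name, `None` stands in for a missing value, and returning `None` when every value is missing plays the role of SPSS's system-missing result.

```python
def mean_of_valid(*values):
    """Average the non-missing values; None marks a missing score.

    Returns None if every value is missing.
    """
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

# Student 1's five quiz scores from the table
print(mean_of_valid(83, 87, 81, 80, 69))  # 80.0
print(mean_of_valid(80, None, 70))        # 75.0 (the missing score is skipped)
print(mean_of_valid(None, None))          # None
```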

The initial Compute Variable dialog box, with the target variable named Quiz_Avg and the MEAN function selected, is shown below. The question marks indicate where you must supply expressions for the computation.


Figure 2-6 Compute Variable screen

The appropriate formula is as follows:


Figure 2-7 Completed expression

When you click OK, the new variable appears in both the Data View and the Variable View (see below). As discussed earlier, you can change the number of decimals (numeric variables default to two decimals) and add a descriptive label for the new variable.


Figure 2-8 New variable appears in Data View


Figure 2-9 New variable appears in Variable View

Selecting Cases

You may want to select only certain cases, such as the data for females or for individuals younger than 20. SPSS lets you select cases either by filtering (which keeps all the cases but limits further analyses to the selected cases) or by deleting the cases that do not meet your criteria. Usually you will want to filter cases, but sometimes you may want to create a separate file for additional analyses by deleting the records that do not match your selection criteria. Here we will use a filter to select the records for females; the records for males will remain in the file but will be excluded from analyses until we select them again.
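The difference between filtering and deleting can be sketched in Python (an illustration, not SPSS). The example uses the "younger than 20" condition mentioned above on the (Student, Sex, Age) columns of the lesson's data; `younger_than_20` is a hypothetical helper.

```python
# (Student, Sex, Age) for the 20 students in the lesson's dataset
cases = [
    (1, 0, 18), (2, 0, 19), (3, 0, 17), (4, 0, 20), (5, 1, 23),
    (6, 1, 18), (7, 0, 21), (8, 1, 20), (9, 1, 23), (10, 1, 21),
    (11, 1, 23), (12, 0, 18), (13, 0, 21), (14, 0, 17), (15, 0, 19),
    (16, 1, 20), (17, 1, 19), (18, 1, 22), (19, 1, 22), (20, 0, 20),
]

def younger_than_20(case):
    return case[2] < 20

# Filtering: the full dataset stays intact; a flag says which cases to analyze
filter_flags = [younger_than_20(c) for c in cases]
selected = [c for c, keep in zip(cases, filter_flags) if keep]
print(len(cases), len(selected))  # 20 8 (nothing was removed)

# Deleting: a new, smaller dataset; the excluded cases are no longer in it
reduced = [c for c in cases if younger_than_20(c)]
print(len(reduced))               # 8
```

With filtering, selecting all cases again just means ignoring the flags; with deletion, the other 12 cases would have to be restored from the original file.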

From either the Variable View or the Data View, click Data, then Select Cases. The resulting dialog box allows you to select the desired cases for further analysis, or to re-select all cases if the data were previously filtered. Choose "If condition is satisfied," because we want to select only the records for which the participant is female. See the dialog box in the following figure.


Figure 2-10 Select Cases dialog

Click the "If..." button and enter the condition for selection, in this case the expression Sex = 1. You can type the expression directly, or build it by clicking the variable names and operators in the dialog box.


Figure 2-11 Select Cases expression

Click Continue, then click OK, and then examine the Data View (see Figure 2-12). Records for males now have a diagonal line through the row number, indicating that although these records are still present, they are excluded from further analyses.


Figure 2-12 Selected and filtered data

Also notice that SPSS has automatically added a new variable called Filter_$ to your data file. If you later return to Data, Select Cases and select all cases again, you can use this filter variable (the "Use filter variable" option) to select females instead of having to re-enter the selection formula. If you do not want to keep this new variable, right-click its column heading and select Clear.
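The filter variable is a stored 0/1 column (1 for selected cases, 0 for the rest), which is why it can be reused later instead of retyping the condition. A Python sketch of the same idea; `filter_var` is a hypothetical stand-in for the SPSS column.

```python
# Sex for each student ID in the lesson's data; 1 = female, 0 = male
sex = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 1, 9: 1, 10: 1,
       11: 1, 12: 0, 13: 0, 14: 0, 15: 0, 16: 1, 17: 1, 18: 1, 19: 1, 20: 0}

# A 0/1 column analogous to the filter variable: 1 where "Sex = 1" holds
filter_var = {sid: int(s == 1) for sid, s in sex.items()}

# Later, select females from the stored column without re-entering the condition
females = [sid for sid, flag in filter_var.items() if flag == 1]
print(females)  # [5, 6, 8, 9, 10, 11, 16, 17, 18, 19]
```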


Figure 2-13 Filter variable added by SPSS

Sorting Cases

Next you will learn to sort cases. Let's return to the Data, Select Cases menu and choose "Select all cases" in order to re-select the records for males.

We can sort on one or more variables. For example, we may want to sort the records in our dataset by sex and, within each sex, by age. Select Data, Sort Cases:


Figure 2-14 Sort Cases option

Move Sex and Age to the "Sort by" window (see Figure 2-15) and then click OK.


Figure 2-15 Sort Cases dialog

Return to the Data View and confirm that the data are sorted by sex and by age within sex (see Figure 2-16).
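Sorting by Sex and then by Age within Sex is a two-key sort. The Python sketch below reproduces it on the (Student, Sex, Age) columns, in ascending order on both keys. Python's `sorted` is stable, so students with the same sex and age keep their original file order; this sketch makes no claim about how SPSS orders such ties.

```python
# (Student, Sex, Age) for the lesson's 20 students; 1 = female, 0 = male
cases = [
    (1, 0, 18), (2, 0, 19), (3, 0, 17), (4, 0, 20), (5, 1, 23),
    (6, 1, 18), (7, 0, 21), (8, 1, 20), (9, 1, 23), (10, 1, 21),
    (11, 1, 23), (12, 0, 18), (13, 0, 21), (14, 0, 17), (15, 0, 19),
    (16, 1, 20), (17, 1, 19), (18, 1, 22), (19, 1, 22), (20, 0, 20),
]

# Sort ascending by Sex first, then by Age within each Sex
sorted_cases = sorted(cases, key=lambda c: (c[1], c[2]))

for student, sex, age in sorted_cases[:4]:
    print(student, sex, age)
# 3 0 17
# 14 0 17
# 1 0 18
# 12 0 18
```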


Figure 2-16 Cases sorted by Sex and Age

Splitting a File

The last subject we will cover in this tutorial is splitting a file. Instead of filtering cases, splitting a file creates separate "layers" for each level of a grouping variable. For example, instead of selecting only one sex at a time, you may want to run several analyses separately for males and females. One convenient way to accomplish that is to split the file so that every procedure you run is automatically conducted and reported for the two groups separately. To split a file, select Data, Split File. The cases in each group need to be consecutive in the dataset, so the records must be sorted by the grouping variable. If your data are not already sorted, however, SPSS can sort them for you at the same time the file is split (see Figure 2-17).


Figure 2-17 Split File menu

Now, when you run a command, such as a table command to summarize average quiz scores, the command will be performed for each group separately and those results will be reported in the same output (see Figure 2-18).
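Python's `itertools.groupby` has the same requirement as a split file: it only groups consecutive rows with the same key. The sketch below (an illustration, not SPSS output) computes the mean Quiz1 score per sex, first on the unsorted data, where the groups come out fragmented, and then after sorting by Sex. `split_means` is a hypothetical helper.

```python
from itertools import groupby
from statistics import mean

# (Sex, Quiz1) for the 20 students in original file order; 1 = female, 0 = male
rows = [
    (0, 83), (0, 76), (0, 85), (0, 92), (1, 82),
    (1, 88), (0, 89), (1, 89), (1, 92), (1, 86),
    (1, 90), (0, 84), (0, 83), (0, 79), (0, 89),
    (1, 76), (1, 92), (1, 75), (1, 87), (0, 75),
]

def split_means(data):
    """Mean Quiz1 for each run of consecutive rows sharing the same Sex code."""
    return [(sex, mean(q for _, q in grp))
            for sex, grp in groupby(data, key=lambda r: r[0])]

# Unsorted: a new group starts at every change of Sex
print(len(split_means(rows)))  # 7 fragments instead of 2 groups

# Sorted by the grouping variable first: exactly one group per sex
for sex, avg in split_means(sorted(rows, key=lambda r: r[0])):
    print(sex, avg)
# 0 83.5
# 1 85.7
```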

Figure 2-18 Split file results in separate analysis for each group
