Research Projects SPSS Guide
created by
William B. Bergesen,
MA, ITC
About This Guide
The Statistical
Package for the Social Sciences is a comprehensive and expensive program powerful
enough to analyze large data sets like the General Social Survey Composite index
1972 to 2002 (GSS72-2002.sav), which has more than 800 variables and over 46,000
respondents. By comparison, a graduate student version of SPSS is limited to
50 variables and 400 respondents. Many research faculty members in the social
sciences use SPSS to analyze primary and secondary data, and to teach the quantitative
portion of research methods. They present SPSS as a tool for analyzing data
and for tabulating survey results that students include in their research projects.
Here at Cal State
East Bay, we are fortunate to have a site license for SPSS. In the CLASS computer
lab, SPSS is available on workstations running Windows XP. When new
versions of SPSS are released, textbooks (especially lab workbooks) frequently
lag behind and do not reflect new or changed features. This guide was created
to help fill the gap in documentation and to provide simple, step-by-step instructions
for entering survey data into SPSS, creating frequency and crosstab tables,
recoding variables into new variables, and copying the resulting tables into
a research paper.
SPSS is not intuitive
and takes considerable time to learn. There is an online tutorial, available
from the Help menu, to acquaint the student with the basic concepts and procedures.
Beginning with version 12, extensive program documentation is available in Adobe Reader format
(.PDF). In the CLASS computer lab, access the documentation in the folder C:\SPSS
Docs. This guide is an introduction to the most commonly used procedures, but not a substitute for reading the textbook, learning SPSS
through the tutorial, taking class notes, or studying.
Lab assistants
are responsible for making sure that the lab computers and printers are functioning
properly, and that students can find and use the basic features of the software
programs. Lab assistants are instructed not to answer course-related questions
or in-depth inquiries about advanced research methods.
Starting SPSS
To start SPSS,
double-click the SPSS icon on the Desktop or select SPSS from All Programs on
the Start menu. When SPSS starts, a dialog box appears with the question, “What
do you want to do?”
To open a data
source file (.SAV), scroll through the list to locate and select the file. If
the file you want is not on the list, click More Files to open a standard file
selector. If the file you want is on removable media (like a floppy disk or
ZIP disk), first copy it to the Desktop or to the My Documents folder before
you open it. Remember to copy it back to your removable media when you finish
making changes. A data source file is often called a system file, because it
contains both variable components and data in one file.
Select your file
and click OK.
Views
When SPSS opens
a data source file, three windows appear: the Data Editor, the Output Viewer,
and the Journal View.
Data Editor
Use the Data Editor
to establish variables and to enter their characteristics and contents.
To switch between the Variable View and Data View, click the tabs at the bottom
of the Data Editor window.
Variable View
Define the characteristics
of each variable in the Variable View. Use the TAB key on your keyboard to
navigate through the components from left to right and then down. Use the
Arrow keys to move the focus up or to the left.
Variable characteristics
include:
- Variable
Name Spaces are not permitted; use only letters, numbers, or
the underscore character (_). See Chapter 8, page 76 in the SPSS Basic Manual.
Earlier versions of SPSS were limited to eight characters and, for example,
EDUCATION might have been shortened to EDUC. In newer versions, variable names
can be 64 characters long. Using very long names, however, can make your
tables appear cluttered.
- Type of
Variable
Select from integer, date, currency, and so on.
- Column
Width
Set the width of the column. For example, if a variable is only one or
two characters, you can define a narrow column width.
- Column
Decimals Select the number of decimal places. For example, if
your data includes currency, designate two decimal places.
- Label The primary identifier for a variable, a label can be up to 254 characters.
- Value A variable value is a number. The number, along with its label, identifies
the answer to a question. At the right edge of the values box is an ellipsis
(...) button. Click the button to open the Value Labels dialog box. The
program cannot tally words or text strings, but it can tally values and
then display the total in a table along with the value label. For example,
if the variable "Sex" identifies the question "What is your
gender?" on a questionnaire, the answers are "Male" (value
1) and "Female" (value 2). The values are the numbers 1 and 2.
The value labels are the words "Male" and "Female."
- Missing Select how to handle the situation in which a survey respondent
does not answer a question (missing value) or in which the question does
not apply to the respondent (DK-don’t know or NA-not applicable).
- Column A number.
- Align Left, right, or center.
- Measure Scale, ordinal, or nominal.
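To see how values and value labels work together, consider this small Python sketch. It is an illustration only, not SPSS syntax, and does not describe how SPSS works internally. The names `VALUE_LABELS` and `tally` and the six sample answers are hypothetical.

```python
from collections import Counter

# Hypothetical value labels for the variable "Sex": 1 = Male, 2 = Female.
VALUE_LABELS = {1: "Male", 2: "Female"}

def tally(values, labels):
    """Count each numeric value, then report the counts under its value label."""
    counts = Counter(values)
    return {labels[value]: counts[value] for value in sorted(counts)}

# Made-up answers from six respondents, stored as numbers rather than words.
answers = [1, 2, 2, 1, 2, 2]
print(tally(answers, VALUE_LABELS))
```

The numbers are what gets counted; the labels are attached only when the totals are displayed.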
Data View
In Data View,
the Data Editor uses a spreadsheet metaphor. Each column corresponds to a
variable, and each row contains the answers from one respondent.
Output Viewer
SPSS for Windows
employs an Output Viewer (Listing) to display the results of the data entered
and the selections made in the Data Editor. After selecting statistical arguments
in the Data Editor (frequency tables, crosstab tables, means, and so on), click
OK. The Output Viewer displays tables that describe the research. The information
in the Output Viewer can be saved to a .LIS file, to be reopened later in the
Output Viewer, or you can save the data to a floppy disk or ZIP disk as a .TXT
file from which you can copy and paste the information into a research paper.
Journal View
The Journal View
displays the syntax of the commands used by the program. Familiarity with the
journal commands will help you to understand quantitative research and to use
SPSS appropriately.
Menu Commands
The three SPSS
windows (Data Editor, Output Viewer, and Journal View) share some common
menus and menu commands, including:
- File
Document management commands like New, Open, Save Data, Save as, Print, and
so on.
- Edit
Text manipulation commands, including Cut, Copy, Paste, and Select All. Note
the Options command, used to define how variables appear. Session Journal
provides a command line interface, as if the user was working on an older
mainframe computer. Choose Variable Lists to change how lists appear.
- View
Includes Status Bar, Toolbars, Fonts, Gridlines, and Value Labels.
- Data
Includes Define Variable, Define Dates, Templates, Insert Variable, Insert
Case, Sort Cases, Transpose, Merge Files, Aggregate, Orthogonal, Split File,
Select Cases, and Weight Cases.
- Transform Includes Compute, Count, Recode (note the arrow indicating
that this command opens a submenu that contains a choice between Same and
Different), Categorize, Cases, Create Time Series, Replace Missing Values,
and Run Pending Transforms.
- Analyze
Analyze is a frequently used menu that includes Reports, Descriptive Statistics
(including Frequencies and Crosstabs), and Custom.
- Graphs
Gallery, Bar, Line, Box Chart, Histogram, and so on.
- Tables
Compare Means, General Linear Model, Correlate, Regression, Log Linear, Classify,
Data Reduction, Scale, Nonparametric Tests, Survival, and Response.
- Utilities Variables, File Information User Sets, Auto New Case, Run Script,
and Menu Editor. Choose Variables to view variable labels and values.
- Add-ons
Variables, File Information User Sets, Auto New Case, Run Script, and Menu
Editor. Choose Variables to view variable labels and values.
- Windows
Switch between open windows, resize windows, and so on.
- Help Along with Topics, Case Studies, and so on, the major command is SPSS
Tutorial (we strongly recommend running the tutorial prior to using
SPSS).
Using Existing Data
On the hard disk
are several system data files, including the General Social Survey Composite
Index of GSS from 1972 to 2002 (GSS72-2002.sav), a data set of about 47,000
respondents.
To open the file:
- From the File
menu, choose Open.
- Scroll through
the list to locate GSS72-2002.sav and click to select it.
- Click Open.
- When the data
appears, from the Analyze menu, choose the statistical procedure you want
to use to analyze the variables. The results of your choice appear in the
Output Viewer.
Frequencies
To analyze one
or more variables, from the Analyze menu, choose Descriptive Statistics,
then Frequencies. Using the data is a three-step process:
- Analyze the
variables.
- Review the tables.
- Copy the relevant
tables into a word processing document.
The Frequency
dialog box contains two fields, the source variables (all the variables in the
database) on the left and the frequency field (the variables to be examined)
on the right. Between the two fields is an arrow.
- From the source
variables on the left, select the variable or variables to investigate.
- Click the arrow
to move the selected variable or variables into the frequency field on the
right.
- From the buttons
below, click Statistics. The Statistics dialog box contains several statistic
tools. Right click a statistic tool to read a brief description.
- Click to select
the statistic tools you want to use: percentile values, central tendency,
dispersion, distribution, or values as grouped midpoints.
- Click Continue.
The Statistics dialog box closes.
- In the Frequency
dialog box, click OK.
Tip: If a word
appears with an underlined letter, right click the word to see its meaning.
Tables are created
and displayed in the Output Viewer.
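If you want to check what a frequency table reports, the following Python sketch computes the count, percent, and cumulative percent for one variable. It is a rough illustration under assumed data, not the SPSS procedure itself. The function name `frequency_table` and the sample values are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent, cumulative percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows

# Illustrative codes: 1 = Very happy, 2 = Pretty happy, 3 = Not too happy.
happy = [2, 3, 1, 2, 1, 2, 2, 2, 1]
for row in frequency_table(happy):
    print(row)
```

Each printed row corresponds to one line of a frequency table: the value, how many respondents gave it, and its share of all answers.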
Copying and Pasting a Table
To copy a table
into a word processing document:
- Select the table
components (title, tables, and statistics) so the table border appears highlighted.
To select a table, hold down the mouse button while you move the mouse diagonally
across the table.
- From the Edit
menu, choose Copy Object.
- Open a word
processing document.
- In the word
processing document, click to position the insertion point (cursor) where
you want the table to appear.
- From the word
processor's Edit menu, choose Paste. The table appears in the word processing
document.
- If the rows
and columns do not line up correctly, select (highlight) the table and change
its font to Courier at 10 or 12 points.
- From the File
menu, choose Save to update the document.
Crosstabs
Use Crosstabs to
create a table that displays the anticipated relationship between two (or more)
variables. Crosstabs display the data in a table format. The components of a
crosstab are a row and a column variable. The row is the effect or dependent
variable and a column is the causal or independent variable.
To create a crosstab
table:
- From the Analyze
menu, choose Descriptive Statistics, and Crosstabs.
- Click to select
a variable in the source field on the left, and then click the top arrow to
move the selected variable into the Row Variable (Dependent) field on the
right.
- Select another
variable on the left and click the middle arrow to move that variable into
the Column Variables (Independent) field on the right.
- The third box
is for a control. For example, if you are analyzing Income by Education, controlled
by Gender, two tables are created: one for men and one for women.
- Below the columns,
click Cells and select count, row, column, and total.
- Click Statistics
and select Chi Square (for example).
- Click Continue.
- Click OK.
Crosstabs are
displayed in the Output Viewer (Listing) window. When multiple statistical
operations are run, the program concatenates the tables, showing the newest
table below the previous tables.
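The idea behind a crosstab can be sketched in a few lines of Python: count how many respondents fall into each combination of row value and column value, then express a cell as a percentage of its column. This is a simplified illustration, not SPSS output. The function names and the sample codes are hypothetical.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count how often each (row value, column value) pair occurs."""
    return Counter(zip(rows, cols))

def column_percent(table, row_value, col_value):
    """Percentage of one column's cases that fall in the given row."""
    column_total = sum(n for (r, c), n in table.items() if c == col_value)
    return 100.0 * table[(row_value, col_value)] / column_total

# Illustrative codes: happiness (dependent, rows) by gender (independent, columns).
happy = [2, 3, 1, 2, 1, 2, 2, 2, 1]
gender = [1, 2, 2, 2, 1, 1, 2, 1, 2]

table = crosstab(happy, gender)
print(table[(2, 1)])                # cell count: happy = 2, gender = 1
print(column_percent(table, 2, 1))  # that cell as a percent of the gender = 1 column
```

Column percentages are the usual choice when the independent variable is in the columns, because they let you compare the categories of the independent variable side by side.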
A word of
caution: As a test before running multiple variables within a crosstab,
multiply the number of row variables by the number of column variables.
The total number of the created tables cannot exceed 23. For instance, if
you have two row variables and two column variables, the product is four,
which is less than 24 and within the limit. If you have five row variables
and six column variables, the product is 30, which exceeds the limit of
23 and the program displays an error message.
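The arithmetic in this caution is easy to check before you run the crosstab. A minimal Python sketch follows; the limit of 23 is taken from this guide, and the function names and variable names are hypothetical.

```python
MAX_TABLES = 23  # limit as stated in this guide

def tables_requested(row_variables, column_variables):
    """Number of tables produced: row variables times column variables."""
    return len(row_variables) * len(column_variables)

def within_limit(row_variables, column_variables, limit=MAX_TABLES):
    """True when the requested number of tables does not exceed the limit."""
    return tables_requested(row_variables, column_variables) <= limit

# Two row variables and two column variables: 4 tables, within the limit.
print(within_limit(["happy", "health"], ["gender", "educ"]))

# Five row variables and six column variables: 30 tables, over the limit.
print(within_limit(["r1", "r2", "r3", "r4", "r5"],
                   ["c1", "c2", "c3", "c4", "c5", "c6"]))
```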
Recoding
Sometimes data
needs to be separated into categories instead of continuous variables. In order
to make the change, data must be transformed, or recoded, from its present value
structure to a changed value structure. For example, if ages were originally
entered as a numeric field corresponding to number of years, analysis may be
easier if age is broken into several bands, perhaps 5-year or 10-year categories.
Recoding is the tool to create a new variable based upon existing data.
Before recoding,
examine how the values are divided. Run a frequency of the selected variable
to see the current values and labels. This information will be important later.
To recode:
- From the Transform
menu, choose Recode, and then choose Into Different Variables. The GSS72-2002 database
is write protected, so it is not possible to change its existing data; you
must recode the data into a new variable.
- In the dialog
box, on the left, click to select the variable to be recoded. Then click the
arrow to move the selected variable into the Old Variable box.
- To the right
of the Old Variable box are two text boxes. Type a new variable name (up to
64 characters) and a new label (up to 254 characters).
- From the selected
variable, carefully determine the present values and a new set of values (and
labels). Click Old and New Values. A dialog box appears with an Old Value
box on the left and a New Value box on the right. Each old value or range
that you specify is mapped to a new, recoded value. In Old
Value, notice the value and range choices (variable to variable, low to ...,
... to high, and so on).
- Use the TAB
key (on your keyboard) to move from the range on the left to a new value on
the right (for example, 1), and click Add.
- Continue inserting
new ranges or values until all the data you want to consider is included,
then click Continue.
- In Output Variable,
click Change. For
the changes to take effect, click OK. The new variable is placed at the end
of the variables in the Data Editor window.
- Use Define Variable
to set the new value and labels for the transformed variable. Scroll through the data until you find the newly created variable (at the
bottom of the variables). Scroll through the new variable name until the label
and other characteristics are entered.
- Click Labels.
Type the value and then press the Tab key to move the insertion point to the
Label field.
- Type the label
and click Add. The insertion point moves back to the Value Entry field.
- Continue entering
values and labels until all the data you want to consider is included.
- Click OK. Now
you can run Frequencies or Crosstabs based on the new variable.
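The logic of recoding into a new variable can be sketched in Python as a list of old ranges, each mapped to a new value. This is an illustration of the idea only, not SPSS syntax. The age bands, labels, and names below are hypothetical.

```python
def recode(value, bands, missing=None):
    """Return the new value of the first (low, high, new_value) band containing value."""
    for low, high, new_value in bands:
        if low <= value <= high:
            return new_value
    return missing  # the value falls outside every band

# Hypothetical age bands: old range -> new value.
AGE_BANDS = [(18, 29, 1), (30, 39, 2), (40, 49, 3)]
AGE_BAND_LABELS = {1: "18-29", 2: "30-39", 3: "40-49"}

ages = [18, 21, 26, 30, 45]
age_band = [recode(age, AGE_BANDS) for age in ages]  # new variable; ages is unchanged
print(age_band)
print([AGE_BAND_LABELS[band] for band in age_band])
```

As in the steps above, the original variable is left untouched and the new variable receives its own values and labels.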
Saving after a Major Change
Whenever you make
significant changes to your data or to a document, you should save your work.
To save your work:
- From the File
menu, choose Save As.
- In the file
selector dialog box, select the Desktop or the My Documents folder as a location
and type a name for the document.
- Click Save.
Copying a Table into a Word Processing Document
To copy a table
into a word processing document:
- In a Frequency
or Crosstabs Output Viewer window, from the Edit menu, choose Select All.
- From the Edit
menu, choose Copy.
- Open your word
processing document, point and click at the location where you want the table
to appear, and from the Edit menu, choose Paste.
- If the table
is too large or the columns do not line up correctly, select the table and
change the font to Courier 10 points.
- Examine to see
if everything is aligned correctly (that is, all of a frequency or crosstab
table appears on one page and not split across two pages).
- From the File
menu, choose Save As (for a new document) or Save (for an existing document).
- In the file
selector, select a location for the document, and type a name for the document.
- Click Save.
Creating a New Data File
The transition
from a questionnaire to information that SPSS can use to tabulate tables and
perform hypothesis testing is called coding. Coding is a process whereby survey
responses are arranged in a precise order, by number, so a computer can tabulate
the responses.
Tip: When
the data is collected, put a number at the top of each questionnaire. Should
the questionnaires become shuffled, it is easy to put them back in order. The number can also
be used as the ID variable.
Sample Questionnaire
Table ID: ___
- What is your
last name?_______________________
- What is your
current age? _____
- What is the
highest year of education you have achieved? (For example, high school graduate=12)
___
- What is your
gender? Male__Female__
- What is your
current state of happiness? Very happy __Pretty happy __ Not too happy__
Completed Survey
This list illustrates
how ten people might answer the questionnaire.
| ID | Name | Age | Years of Education | Gender | Happiness Level |
| 1 | Acuna | 18 | 09 | Male | Happy |
| 2 | Ada | 21 | 12 | Female | Not Too Happy |
| 3 | Bates | 26 | 15 | Female | Very Happy |
| 4 | Beall | 19 | 12 | Female | Pretty Happy |
| 5 | Cunningham | 18 | 06 | Male | Very Happy |
| 6 | Dunham | 30 | 16 | Male | Pretty Happy |
| 7 | Estrada | 20 | 15 | Female | Pretty Happy |
| 8 | Franklin | 24 | 16 | Male | Pretty Happy |
| 9 | Graham | 19 | 18 | Female | Very Happy |
| 10 | Hadi | 21 | 14 | Male | No Answer |
Coded Data Set
Here the data has
been converted into a coded data set. Of particular note, variables are columns
and respondents are rows within an SPSS data set. After establishing the Variable
View and starting the Data Editor, a dialog box appears where variable labels,
values, and value labels are entered.
| ID | Age | Educ | Gender | Happy |
| 01 | 18 | 09 | 1 | 2 |
| 02 | 21 | 12 | 2 | 3 |
| 03 | 26 | 15 | 2 | 1 |
| 04 | 19 | 12 | 2 | 2 |
| 05 | 18 | 06 | 1 | 1 |
| 06 | 30 | 15 | 1 | 2 |
| 07 | 20 | 15 | 2 | 2 |
| 08 | 24 | 16 | 1 | 2 |
| 09 | 19 | 18 | 2 | 1 |
| 10 | 21 | 14 | 1 | |
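The step from a completed questionnaire to a coded row can be sketched in Python. The code assignments below (Male = 1, Female = 2; Very Happy = 1, Pretty Happy = 2, Not Too Happy = 3) follow the two tables above. The dictionary and function names are hypothetical, and the sketch is an illustration rather than anything SPSS does for you.

```python
GENDER_CODES = {"Male": 1, "Female": 2}
HAPPY_CODES = {"Very Happy": 1, "Pretty Happy": 2, "Not Too Happy": 3}

def code_response(response):
    """Turn one questionnaire (a dict of written answers) into a row of numbers."""
    return {
        "id": response["id"],
        "age": response["age"],
        "educ": response["educ"],
        "gender": GENDER_CODES.get(response["gender"]),
        "happy": HAPPY_CODES.get(response["happy"]),  # None when there is no coded answer
    }

bates = {"id": 3, "name": "Bates", "age": 26, "educ": 15,
         "gender": "Female", "happy": "Very Happy"}
hadi = {"id": 10, "name": "Hadi", "age": 21, "educ": 14,
        "gender": "Male", "happy": "No Answer"}

print(code_response(bates))
print(code_response(hadi))
```

The respondent's name is left out of the coded row, just as it is in the coded data set; the ID number is enough to trace a row back to its questionnaire.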
Entering Variables
Click the Variable
View tab in the lower left hand corner of the Data Editor window. The screen
changes to accept entry of the various components of a variable: Name, Type,
Width, Decimals, Label, Values, Missing value, Column, Align, and Measure characteristics.
Use the TAB key or the Arrow keys to navigate through the Variable View.
To enter variables
and values:
- On the Variable
View tab, select the first variable and type its variable name. Use the TAB
key to move to the next component of the variable.
- In the Label
field, type a descriptive label, up to 254 characters. Tip: If you are creating
a table, capitalize words that will appear as labels.
- Click the ellipsis
button. The Value Labels dialog box appears:
a. In the Value Labels dialog box, type a value.
b. Press TAB to move the focus to the Value Label field and type a label.
Click Add or press ALT+A.
c. Repeat for each value that corresponds to the variable.
d. When all the values for the current variable have been added, click OK
to close the Value Labels dialog box.
- Continue to
enter variables, labels, and other characteristics (missing values, measures,
and so on) until Variable View entries are complete.
- Save your file
(CTRL+S) at the end of each variable line.
The following example
demonstrates how two values from our example data would be entered. Repeat the
procedure for the other variables and values and their corresponding labels.
Have the variable characteristics clear in your mind.
To begin entering
variables from our example coded data set:
- On the first
line of variables, type ID. No other variable characteristics
are necessary. ID identifies a person interviewed, a survey received, or the
sheet of paper upon which the respondent answered the research questions.
- In the second
variable name box, type the variable name Gender.
- Use the Tab
key or the Arrow keys to move the insertion point (cursor) to the right, into
the Labels field, and type Gender of Respondent.
- Press the TAB
key to move the focus into the Values field. Single click in the Values field
to see the ellipsis button (...). Click the ellipsis to open the Labels dialog
box.
a. In the Labels dialog box, the insertion point appears in the Value field.
Type the number 1.
b. Press the TAB key to move the insertion point to the Label field and type Male.
c. Click Add (or press ALT+A). The value and label you typed (1, Male) are
stored and the insertion point moves back to the Value field, ready for your next
entry.
d. Type 2 and press the TAB key to move the cursor to the
Label field.
e. Type Female and then press the Enter key.
f. Click OK to close the Value Labels dialog box.
- Click Continue
to enter variable components.
- Press Enter
to move onto a new variable line.
Missing Values
In the example,
the sample did not include a value in the last column for the last respondent,
Hadi. Should a researcher encounter this situation, a decision needs to be made
on how SPSS should handle missing data (values).
In the Header of Variable View and in a column to the right of values is Missing.
Click in the missing value field to enable the ellipsis (…).
- Click the ellipsis
to open the Missing Values dialog box.
- If missing
values are given the value of 9 or 99, these values appear in the box below
Discrete missing values.
- When statistical
procedures are used and SPSS encounters a number 9 or 99, the program totals
the number of missing values and missing values are tabulated using frequency,
crosstab, and so on.
- Click OK to
exit Missing Values.
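The effect of declaring discrete missing values can be sketched in Python: answers that carry a missing-value code are counted separately and kept out of the valid answers. This is a simplified illustration, assuming 9 and 99 are the missing-value codes as in the steps above. The function name and sample data are hypothetical.

```python
def split_missing(values, missing_codes=(9, 99)):
    """Separate valid answers from answers that carry a missing-value code."""
    valid = [value for value in values if value not in missing_codes]
    n_missing = len(values) - len(valid)
    return valid, n_missing

# Illustrative happiness codes for ten respondents; the last one is coded 9 (no answer).
happy = [2, 3, 1, 2, 1, 2, 2, 2, 1, 9]
valid, n_missing = split_missing(happy)
print(len(valid), n_missing)
```

Choose a missing-value code that cannot be a real answer. For a variable such as years of education, where 9 is a legitimate value, 99 is the safer choice.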
Entering Data
After the variables
have been set up in the Variable View, it is time to enter the data into the
Data View. During data entry, keep in mind that each line (row) represents one
respondent to a questionnaire.
Use the TAB key
or the Arrow keys to navigate between columns (variables). At the end of each
line of data, save the file.
Checking for Errors
Even the most diligent
researchers make errors. There are several ways to guard against errors and
to find errors, including:
- Have a friend
assist in the data entry by reading the data to you while you type. Consider
entering the data into a word processing document or spreadsheet. You can
import the data into SPSS later. The column should fit exactly.
- Review each
line entered against the source of your data.
- Run a frequency
table:
- From the
Analyze menu, choose Descriptive Statistics, and then Frequencies.
- From the
source field on the left, select and move the variables into the variable
field.
- In the ID
variable, make sure there are not multiple entries of any number.
- Take a close
look at the ID frequencies. If there are multiple entries of ID, the data
for a respondent has been entered more than once.
- Knowing
the data and values, see if there is a number that does not correspond
to the answers. For example, a variable Gender may have a label, Gender
of Respondent. The values would be 1 for Male and 2 for Female. In the
frequencies table for Gender, look for numbers 3, 4, 5, and so on. Should
this occur, data has been entered incorrectly.
- Review the
process of running statistics from your class notes, or from the examples
of frequencies earlier in this document.
- A new feature
beginning in version 12 is the Duplicate Records Finder. Identify, flag, and filter
duplicate records with Identify Duplicate Cases. For more information,
see Identifying Duplicate Cases in Chapter 6, page 113 of the SPSS Basic
User’s Manual.
To check
for duplicates:
- From the
Data menu, choose Identify Duplicate Cases by ID.
- The left
panel lists all variables created in the variable view. Use the arrow
to move them into Define matching cases by.
- The default
settings include:
- Variable
of primary cases with the first case in each group as primary.
- Move matching
cases to the top of the data file.
- Make your
selections and click OK.
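The frequency check for unexpected values described above (for example, a 3 in the Gender variable) amounts to comparing the values that were entered against the values that were defined. A minimal Python sketch follows; the function name and sample data are hypothetical.

```python
from collections import Counter

def unexpected_values(values, valid_values):
    """Return {value: count} for every entered value that is not a defined code."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if value not in valid_values}

# Illustrative Gender column with one keying error (3 is not a defined value).
gender = [1, 2, 2, 2, 1, 1, 3, 1, 2, 1]
print(unexpected_values(gender, {1, 2}))
```

An empty result means that every entered value matches a defined code. It does not prove that the right code was entered for each respondent, so still review the entries against the questionnaires.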
To fix errors:
- Review the top
of the data after running Duplicate Records Finder. The duplicate data lines
are in sequence (1, 2, 3, and so on).
- The first field
is represented as 1 in the Primary First column. From your survey data, find
the ID for the first instance found (Primary First) and compare the values
entered in SPSS from the survey data. Delete the data lines with a zero in
the field Primary First.
- Repeat these
steps to correct other entries you find at the top of your data list after
running Duplicate Records Finder.
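The Primary First flag can be imitated in Python to show what it marks: the first case with a given ID receives 1, and every later case with the same ID receives 0. This sketch is an assumption about the flag based on the description above, not SPSS code. The function name and sample IDs are hypothetical.

```python
def primary_first(ids):
    """Return 1 for the first case with each ID and 0 for every later repeat."""
    seen = set()
    flags = []
    for case_id in ids:
        flags.append(0 if case_id in seen else 1)
        seen.add(case_id)
    return flags

# Illustrative ID column in which respondents 3 and 2 were entered twice.
ids = [1, 2, 3, 3, 4, 2]
flags = primary_first(ids)
print(flags)

# Keep only the primary cases, as when deleting the lines flagged with zero.
kept = [case_id for case_id, flag in zip(ids, flags) if flag == 1]
print(kept)
```

Before deleting a flagged line, compare both entries against the original questionnaire, because the first entry is not necessarily the correct one.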
Statistical Analysis
From your notes,
apply statistical arguments that are consistent with your data. If you need
a review, see Frequencies or Crosstabs.
If you are unsure about the terminology, see the glossary.
Saving Your Work
After entering
variable attributes (name, label, values, or line of data), save the system
file (.SAV). The output is editable. Listings saved on a disk can be copied
and pasted into your report as tables within the report, or as appendices at
the end of the report.
To save your work:
- From the File
menu, choose Save As. The Save As dialog box appears.
- Select a location.
While working on a file, save the file to the hard drive, preferably to the
desktop or to the My Documents folder.
- Type a name
for the file. A suggested name appears in the dialog box. To accept the name,
continue to the next step. To change the name, select it and type your changes.
Limit the length of your file name so you can find it easily.
- Select a type
for the file. Below the File name box is the Save as type box. If the type
displayed is not the one you want, click it to see a list of alternate types
and click to select the one you want from the list. At the end of a file name
are a period and a three character extension (for example, SAV, POR, or TXT).
The extension is important, because it denotes the type of file. Some common
extensions and their file types are:
- SAV (System
File) is an SPSS file type that contains both variable and data components.
- LIS (Listing)
is a file containing tables or statistics, or output of SPSS analysis.
- POR (Portable)
is a file type in which the data will be converted for use on another
platform (for example from a Windows PC to a Macintosh).
- TXT (Text)
is a Notepad file containing only ASCII text.
- DOC (Document)
is a word processing document.
- After selecting
a location, name, and type for your file, click Save.
- At the end of
your session in the lab, remember to copy the files you created from the Desktop
or My Documents folder to your removable media (floppy disk or ZIP disk).
Checking Spelling and Grammar
Remember to use
the spelling and grammar checker in your word processor. The lab computers include
Microsoft Word, which has a good dictionary: extensive, but not exhaustive.
Keep in mind that the spell checker is not a substitute for proofreading. For
example, the words form and from are both correct, but have completely different
meanings. Read your final document repeatedly and, if possible, have a friend
read it to make sure it is correct, before printing.
Printing
One of the strengths
of SPSS is the option to select portions of a listing output and copy important
tables into a research project document.
To copy a table
from SPSS into your word processing document:
- Select a table
in the output listing (.SPO). To select a table, click it, or drag the mouse
from an upper corner to the alternate lower corner so the table, title, and
statistics all appear highlighted. To select all the tables, from the Edit
menu choose Select All or press CTRL+A.
- From the Edit
menu, choose Copy (or press CTRL+C). The tables are copied to the computer
clipboard.
- Start Microsoft
Word and open a new document or an existing research paper.
- Move the insertion
point (cursor) to the location in the document where the tables are to be
inserted.
- From the Edit
menu, choose Paste (or press CTRL+V). The table is pasted from the clipboard
into the Microsoft Word document.
- After spell
checking and any other editing, proceed to printing.
To print:
- From the File
menu, choose Print (or press CTRL+P).
- Review the options
in the Print dialog box. The print dialog box includes several sections and
defaults:
- Printer
- the default printer is the laser printer in the lab.
- Page range
- the default is all. Other options include current, selection, or specific
pages.
- Copies –
default selections are 1 and collate.
- Zoom –
1 page per sheet and no scaling is default.
- Click OK when
you have confirmed the above options.
Cleaning Up
Please clean up
around the computer and remove your floppy disk or ZIP disk before leaving the
lab.
Copyright ©2002 Bill
Bergesen (Last Updated 04/02/2004)