Preparing SPSS Data Files Sample Assignment

The SPSS data file MUSIC CD DATA.sav contains the responses from 445 respondents who completed the Music CD Questionnaire. Before this data file can be used to conduct analyses in SPSS, it needs to be cleaned, the variables labeled, and the variable values (i.e., “codes”) inserted.

HELPFUL HINT #1: Before beginning the cleaning, labeling, etc., save the file under a different name (e.g., krs1 data.sav) so you can distinguish between the new clean data file and the original data file.

HELPFUL HINT #2: Remember to SAVE the file frequently.

HELPFUL HINT #3: It will be easier to clean the data if you label the variables and the variable values first.

Labeling the variables:

Click on the “Variable View” spreadsheet. In the column called “Label,” click on the appropriate cell and type in a description of each variable. The variable description should match the codebook. The first three variable labels have been done for you.

Inserting value codes:

In the column called “Values” in the “Variable View” spreadsheet, you can enter the appropriate codes for each value. For example, for V2: Reason for buying CD, “1” corresponds to artist, “2” corresponds to type of music, etc. Each of the values/labels for a variable can be entered by clicking on the appropriate cell under the “Values” column. A box called “Value Labels” will pop up. In the “Value” box, type the number (“1”); in the “Value Label” box, type a brief description of the value (“artist”); then click “Add.” If you make a mistake and need to remove a value/label, highlight it and click “Remove.” If you want to change a value/label, highlight it, make the change, then click “Change.” V2 has been done for you.

NOTE: Not every variable will have value codes (e.g., V3, V9).

Cleaning the data file:

Three types of errors need to be examined in this data file:

  1. Out-of-range values
  2. Logical inconsistencies
  3. Unlikely observations (extreme values)

1. Out-of-range values

The easiest way to check for out-of-range values is to run a frequency analysis on each variable (ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES). You will need to know which values are valid and which are out-of-range, so refer to the codebook or an annotated copy of the questionnaire. If you have already labeled the variables and values, the out-of-range values will show up in the frequency table as numbers instead of descriptive phrases. (If you don’t have value labels, then you will need to scan the list of numbers in the frequency table for numbers that don’t belong.)

If out-of-range values do show up in the frequency table, then go back to the “Data View” spreadsheet and look for the inappropriate values. Delete any out-of-range values (this will leave a blank cell with “.” in place of the value).

For example, ID #329, V15d has a value of “47.” The only allowable values are 1 through 7. This is an out-of-range value.
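The logic of the frequency check can also be sketched outside SPSS. The following is a minimal Python illustration, not part of the assignment itself: only the ID #329 value of 47 comes from the example above, and the other respondent IDs and values are hypothetical.

```python
from collections import Counter

# Hypothetical V15d responses keyed by respondent ID (illustrative only);
# ID 329 = 47 is the out-of-range example from the text.
v15d = {327: 4, 328: 7, 329: 47, 330: 1, 331: 4}

VALID = range(1, 8)  # allowable codes are 1 through 7

# Frequency table: how many times each value occurs.
frequencies = Counter(v15d.values())

# Values that occur in the data but are not allowable codes.
out_of_range = sorted(v for v in frequencies if v not in VALID)

# Blank out-of-range entries (None plays the role of "." in SPSS).
cleaned = {rid: (v if v in VALID else None) for rid, v in v15d.items()}

print(out_of_range)
print(cleaned[329])
```

Here `out_of_range` lists the values to look for in the “Data View” spreadsheet, and `cleaned` shows the data after those cells have been blanked.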

2. Logical inconsistencies

Check for responses that don’t make sense. For example, V25 corresponds to question 25, “Have you ever purchased something from an online retailer?” If no, the value entered in the spreadsheet is “0.” V26 (question 26) asks, “Which of the following products have you purchased from an online retailer?” V26 values correspond to the total number of product categories purchased. If a person responds “0” to V25, then there should also be a “0” for V26. Replace any logically inconsistent data with “.”

For example, ID #49 answered “0” to V25 but “3” is entered for V26. This is inconsistent.
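As a rough sketch of the same consistency rule in Python (an illustration only, not an SPSS procedure): the record for ID #49 comes from the example above, the other records are hypothetical, and blanking V26 rather than V25 is an assumption made here for the sake of the example.

```python
# Hypothetical records; only ID 49 (V25 = 0, V26 = 3) comes from the text.
records = [
    {"id": 48, "V25": 1, "V26": 2},
    {"id": 49, "V25": 0, "V26": 3},
    {"id": 50, "V25": 0, "V26": 0},
]


def is_inconsistent(row):
    """True when V25 says 'no online purchase' but V26 counts one or more categories."""
    return row["V25"] == 0 and row["V26"] not in (0, None)


# Collect the IDs to review before changing anything.
inconsistent_ids = [row["id"] for row in records if is_inconsistent(row)]

# Assumption: blank V26 for inconsistent rows (None plays the role of ".").
for row in records:
    if is_inconsistent(row):
        row["V26"] = None

print(inconsistent_ids)
```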

3. Unlikely observations

Are there values in the data file that simply don’t seem probable? For example, ID #319 has listened to the CD purchased (V11) 1000 times. This seems like a data entry mistake, or a “guess” rather than a carefully considered response. Wildly unlikely observations can skew the results in problematic ways. One way of dealing with unlikely observations is to perform an outlier analysis. A common rule of thumb is to exclude from the data analysis any observation that is more than three standard deviations from the mean.
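The three-standard-deviation rule of thumb can be illustrated with a short Python sketch. This is an assumption-laden example rather than the assignment's procedure: only the value of 1000 for ID #319 comes from the text, and the other IDs and listening counts are made up. The sketch only flags observations; it does not delete them.

```python
import statistics

# Hypothetical V11 values (times the CD was listened to) for IDs 300-319;
# only ID 319 = 1000 comes from the text.
values = [5, 10, 3, 8, 12, 20, 15, 7, 9, 4,
          6, 11, 2, 14, 30, 25, 18, 1, 13, 1000]
v11 = dict(zip(range(300, 320), values))

mean = statistics.mean(v11.values())
sd = statistics.stdev(v11.values())  # sample standard deviation (n - 1)

# Flag observations more than three standard deviations from the mean.
flagged = {rid: v for rid, v in v11.items() if abs(v - mean) > 3 * sd}

print(flagged)
```

Note that the extreme value itself inflates both the mean and the standard deviation, so in a very small sample a single outlier may not exceed the three-standard-deviation threshold at all.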

For this assignment, however, we will assume that all values are within the allowable range, and we will not delete or exclude any of the data for being an unlikely observation.

FINAL REMINDER: Save the clean data file with a new name.