DMV Assignment 2: My first SAS Program!

Assignment instructions: Following completion of your first program, create a blog entry where you post 1) your program 2) the output that displays three of your variables as frequency tables and 3) a few sentences describing your frequency distributions in terms of the values the variables take, how often they take them, the presence of missing data, etc.

(Apologies for those of you not interested in reading raw SAS code, but for those of you who are, I’m open to suggestions for how to improve. This is fairly basic code, but useful for someone who’s never used SAS.)

Behind the jump for all the details.

My program so far:

LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
/* mydata is the libref, the local nickname for the course data library */
/* Research question: Race and perception of law enforcement and opportunity for achievement between Blacks and Whites during the beginning of the #BlackLivesMatter movement
SPECIFICALLY H1: Are non-Hispanic Blacks less likely to trust the federal government, the police, and/or the legal system than non-Hispanic Whites?
H1a: Are non-Hispanic Blacks less likely to trust the federal government than non-Hispanic Whites?
H1b: Are non-Hispanic Blacks less likely to trust the police than non-Hispanic Whites?
H1c: Are non-Hispanic Blacks less likely to trust the legal system than non-Hispanic Whites?
SPECIFICALLY H2: Does income level influence levels of trust in the federal government, the police, and/or the legal system among both Blacks and Whites? */

DATA new; SET mydata.ool_pds;
LABEL ppethm="Race / Ethnicity"
ppincimp="Household Income"
pprent="Ownership Status of Living Quarters"
ppwork="Current Employment Status"
w1_h1="Society has reached the point where Blacks and Whites have equal opportunities for achievement."
w1_h6="It's really a matter of some people not trying hard enough; if Blacks would only try harder they could be just as well off as Whites."
w1_h7="Generations of slavery and discrimination have created conditions that make it difficult for Blacks to work their way out of the lower class."
w1_h8="Discrimination against Blacks is no longer a problem in the U.S."
w1_k1_a="[The government in Washington] How much do you think you can trust the following institutions?"
w1_k1_b="[The police] How much do you think you can trust the following institutions?"
w1_k1_c="[The legal system] How much do you think you can trust the following institutions?";
IF ppethm=1 OR ppethm=2;
/* The subsetting IF statement limits the cases included in the analysis to those who
indicated a race/ethnicity of "White, Non-Hispanic" or "Black, Non-Hispanic" */
RUN;

PROC FREQ DATA=new; TABLES ppethm ppincimp pprent ppwork w1_h1 w1_h6 w1_h7 w1_h8 w1_k1_a
w1_k1_b w1_k1_c;
RUN;


Output for the first four variables of interest, with notes and commentary embedded between tables:

The FREQ Procedure

Race / Ethnicity
PPETHM    Frequency   Percent   Cumulative Frequency   Cumulative Percent
1               814     38.91                    814                38.91
2              1278     61.09                   2092               100.00

I restricted the cases in the OOL dataset to respondents who reported their race/ethnicity as non-Hispanic White or non-Hispanic Black, given my specific interest in differences between Blacks and Whites in their perceptions of the police and legal system. This limits the cases available for my analysis to 2,092 of the 2,294 observations in the dataset. Because the OOL survey intentionally oversampled Blacks, it follows that 61% of the remaining respondents in my sample are Black (coded in the data as 2) and 39% are White (coded in the data as 1).

Household Income
PPINCIMP  Frequency   Percent   Cumulative Frequency   Cumulative Percent
1               121      5.78                    121                 5.78
2                61      2.92                    182                 8.70
3                61      2.92                    243                11.62
4                58      2.77                    301                14.39
5                57      2.72                    358                17.11
6                93      4.45                    451                21.56
7                98      4.68                    549                26.24
8               126      6.02                    675                32.27
9                99      4.73                    774                37.00
10              119      5.69                    893                42.69
11              149      7.12                   1042                49.81
12              156      7.46                   1198                57.27
13              214     10.23                   1412                67.50
14              114      5.45                   1526                72.94
15              132      6.31                   1658                79.25
16              183      8.75                   1841                88.00
17              114      5.45                   1955                93.45
18               61      2.92                   2016                96.37
19               76      3.63                   2092               100.00

The values for household income cover a wide range and do not increment evenly. The lowest range (coded above as 1) represents household incomes of less than $5,000, and the next four values (coded above as 2, 3, 4, and 5) each span $2,499, up to $14,999. From household incomes of $15,000 through $39,999, the values each span $4,999 (codes 6 through 10), and the next two ranges each span $9,999 (codes 11 and 12), up to household incomes of $59,999. The next three groupings break with this pattern for reasons not provided: household incomes of $60,000 to $74,999 and $85,000 to $99,999 (both $14,999 spans) are coded as 13 and 15 respectively, but household incomes between $75,000 and $84,999 (a $9,999 span) are coded as 14. The remaining values, up to household incomes of $174,999, are coded in spans of $24,999 (codes 16 through 18), with the final code (19) representing household incomes of $175,000 or more.
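One way to make these codes easier to read in the output would be to attach value labels with PROC FORMAT. The sketch below is not part of the submitted program. The format name incfmt is made up for illustration, and the labels are reconstructed from the ranges described above rather than copied from the codebook, so they should be checked against it. It assumes the data set new created by the program above.

```sas
/* Sketch only: labels reconstructed from the description above, not from the codebook. */
/* Code 1 is taken to end just below $5,000, since code 2 has to start there.         */
/* The format name incfmt is hypothetical.                                             */
PROC FORMAT;
  VALUE incfmt
     1 = 'Less than $5,000'
     2 = '$5,000 to $7,499'
     3 = '$7,500 to $9,999'
     4 = '$10,000 to $12,499'
     5 = '$12,500 to $14,999'
     6 = '$15,000 to $19,999'
     7 = '$20,000 to $24,999'
     8 = '$25,000 to $29,999'
     9 = '$30,000 to $34,999'
    10 = '$35,000 to $39,999'
    11 = '$40,000 to $49,999'
    12 = '$50,000 to $59,999'
    13 = '$60,000 to $74,999'
    14 = '$75,000 to $84,999'
    15 = '$85,000 to $99,999'
    16 = '$100,000 to $124,999'
    17 = '$125,000 to $149,999'
    18 = '$150,000 to $174,999'
    19 = '$175,000 or more';
RUN;

/* Apply the labels for display only; the stored values stay 1 through 19. */
PROC FREQ DATA=new;
  TABLES ppincimp;
  FORMAT ppincimp incfmt.;
RUN;
```

Because the format is applied only inside PROC FREQ, the underlying numeric codes remain available for later recoding.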

I will need to aggregate some of the response values to make the data easier to work with for my second research question, which asks whether persons from lower-income households have more negative perceptions of the police and legal system. I will need to investigate further to determine the family size for each respondent and assign them to a group based on the federal poverty threshold appropriate for their family size. I may also divide the sample into multiple groups to ascertain whether there are different relationships for low-income, middle-income, and high-income households.
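A minimal sketch of what collapsing the income codes into three groups might look like is below. The cut points are placeholders chosen only for illustration; they are not the poverty-based groups described above, which would also need family size. The names new_inc and incgroup are hypothetical, and the data set new from the program above is assumed to exist.

```sas
/* Sketch only: arbitrary cut points, not poverty-based groups. */
DATA new_inc;
  SET new;
  /* Handle missing first: SAS treats missing as smaller than any number, */
  /* so a bare "<=" test would otherwise put missing in the lowest group. */
  IF MISSING(ppincimp) THEN incgroup = .;
  ELSE IF ppincimp <= 7 THEN incgroup = 1;    /* codes 1-7: under $25,000        */
  ELSE IF ppincimp <= 15 THEN incgroup = 2;   /* codes 8-15: $25,000 to $99,999  */
  ELSE incgroup = 3;                          /* codes 16-19: $100,000 or more   */
  LABEL incgroup = "Household income group (illustrative cut points)";
RUN;

PROC FREQ DATA=new_inc;
  TABLES incgroup;
  /* Cross-check the recode against the original codes. */
  TABLES ppincimp*incgroup / LIST MISSING;
RUN;
```

Going by the cumulative frequencies in the table above, these particular cut points would put 549 respondents in the first group, 1,109 in the second, and 434 in the third.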

Ownership Status of Living Quarters
PPRENT    Frequency   Percent   Cumulative Frequency   Cumulative Percent
1              1330     63.58                   1330                63.58
2               705     33.70                   2035                97.28
3                57      2.72                   2092               100.00

Another option to consider when constructing my income groups may be whether respondents own or rent their living quarters. The distribution above shows that nearly two-thirds (64%; coded as 1 above) of the sample own their living quarters, with just over one-third (34%; coded as 2 above) renting. The remaining respondents (3%; coded as 3 above) reported that they neither own nor rent but instead occupy their living quarters without payment.

Current Employment Status
PPWORK    Frequency   Percent   Cumulative Frequency   Cumulative Percent
1               952     45.51                    952                45.51
2               130      6.21                   1082                51.72
3                22      1.05                   1104                52.77
4               224     10.71                   1328                63.48
5               453     21.65                   1781                85.13
6               177      8.46                   1958                93.59
7               134      6.41                   2092               100.00

It may also be worthwhile to factor current employment status into the comparison groups for my second research question. Not quite half (46%; coded above as 1) of the respondents reported working as a paid employee, with another 6% (coded above as 2) reporting that they are self-employed. The remaining respondents (48%) reported not working for a variety of reasons: on temporary layoff (1%; coded as 3 above), looking for work (11%; coded as 4 above), retired (22%; coded as 5 above), disabled (8%; coded as 6 above), or other (6%; coded as 7 above). Given the relatively large number of respondents who reported not working, particularly those who are retired, I will have to consider whether to include this variable in the analysis or instead use it to further limit the cases in my sample, to avoid unintentionally skewing the results. Given the timing of the data collection, during the slow recovery from the 2008 recession, I will also need to consider whether current household income is an appropriate proxy for social class.
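As a rough sketch of the first option, the seven employment codes could be collapsed into a working / not working flag. This is only one possible approach, not something in the submitted program. The names new_work and employed are hypothetical, and the data set new from the program above is assumed to exist.

```sas
/* Sketch only: collapse ppwork into a two-level flag. */
DATA new_work;
  SET new;
  IF MISSING(ppwork) THEN employed = .;
  ELSE IF ppwork IN (1, 2) THEN employed = 1;   /* paid employee or self-employed */
  ELSE employed = 0;                            /* codes 3 through 7: not working */
  LABEL employed = "Currently working (1=yes, 0=no)";
  /* Alternative: to drop retirees from the sample instead, a subsetting IF */
  /* such as the following could be used:                                   */
  /* IF ppwork NE 5; */
RUN;

PROC FREQ DATA=new_work;
  TABLES employed;
  /* Cross-check the recode against the original codes. */
  TABLES ppwork*employed / LIST MISSING;
RUN;
```

Based on the table above, the flag would split the sample into 1,082 working and 1,010 not working respondents.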


2 thoughts on “DMV Assignment 2: My first SAS Program!”

  1. Sara Schley says:

    Nice SAS coding!
    My only suggestion would be to add a “keep” statement, so that your working data file doesn’t have ALL the variables in the original data set. Then you can use things like var10-var20 in proc freq statements (or a double dash, --, between the first-positioned and last-positioned variables you want freq statements on), which just shorthands the typing 😉

    Cool course and cool project. And way better commented code than I usually do ha ha ha ha.

    • Teej says:

      Thanks! I’m sure my commenting will deteriorate when I’m no longer actively focused on learning. 🙂 Right now it’s how I’m taking notes, though. Excellent suggestion on the “keep” statement – I’ll have to learn more about that soon as it definitely sounds like a useful bit of code.
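
For readers new to the idea, the "keep" suggestion in this thread might look roughly like the sketch below. It is an illustration only. The data set name new_small is hypothetical, the data set new from the program above is assumed to exist, and the KEEP statement could equally sit in the original DATA step. The double-dash range also assumes that w1_k1_a, w1_k1_b, and w1_k1_c sit next to each other in the data set, which would need to be confirmed (for example with PROC CONTENTS and the VARNUM option).

```sas
/* Sketch only: keep just the variables of interest. */
DATA new_small;
  SET new;
  KEEP ppethm ppincimp pprent ppwork
       w1_h1 w1_h6 w1_h7 w1_h8
       w1_k1_a w1_k1_b w1_k1_c;
RUN;

PROC FREQ DATA=new_small;
  /* Numbered range: expands to w1_h6 w1_h7 w1_h8. */
  TABLES w1_h6-w1_h8;
  /* Name range: every variable positioned from w1_k1_a through w1_k1_c, */
  /* so the result depends on the variable order in the data set.        */
  TABLES w1_k1_a--w1_k1_c;
RUN;
```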
