(For those of you looking for something more interesting, go check out the pictures I took walking around my neighborhood after the blizzard in DC this weekend.)

**Following completion of the steps described above, create a blog entry where you submit syntax used to run a Chi-Square Test (copied and pasted from your program) along with corresponding output and a few sentences of interpretation.**


In keeping with my ongoing hypothesis, I’m examining data from the Outlook on Life survey specifically to evaluate whether there are differences in perception of/trust in law enforcement and level of opportunity between non-Hispanic Whites and non-Hispanic Blacks during the beginning of the #BlackLivesMatter movement. For the Chi Square example, I will be looking to see if there is a relationship between race/ethnicity and household income. Knowing if there is a relationship between income and race/ethnicity will tell me if I need to consider that income level may be an alternative or competing explanation for any differences in outcomes analyzed by race.

I’m really interested in approximate differences between lower-income, middle-income, and high-income households. After running a frequency distribution of PPINCIMP (results below), I decided to create a new variable (PPINCIMP_TERT) that collapses the Household Income variable (PPINCIMP) into three groups, each holding approximately one third of the cases (approximate tertiles).

Household Income IF

PPINCIMP | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 127 | 5.54 | 127 | 5.54 |
2 | 64 | 2.79 | 191 | 8.33 |
3 | 61 | 2.66 | 252 | 10.99 |
4 | 68 | 2.96 | 320 | 13.95 |
5 | 62 | 2.70 | 382 | 16.65 |
6 | 98 | 4.27 | 480 | 20.92 |
7 | 109 | 4.75 | 589 | 25.68 |
8 | 140 | 6.10 | 729 | 31.78 |
9 | 108 | 4.71 | 837 | 36.49 |
10 | 132 | 5.75 | 969 | 42.24 |
11 | 162 | 7.06 | 1131 | 49.30 |
12 | 181 | 7.89 | 1312 | 57.19 |
13 | 235 | 10.24 | 1547 | 67.44 |
14 | 129 | 5.62 | 1676 | 73.06 |
15 | 145 | 6.32 | 1821 | 79.38 |
16 | 200 | 8.72 | 2021 | 88.10 |
17 | 125 | 5.45 | 2146 | 93.55 |
18 | 63 | 2.75 | 2209 | 96.29 |
19 | 85 | 3.71 | 2294 | 100.00 |

The first tertile (coded as 1) will represent all cases where Household income is less than $30k (values of PPINCIMP between 1 and 8, inclusive). The second tertile (coded as 2) will represent all cases where Household income is greater than or equal to $30k and less than $75k (values of PPINCIMP between 9 and 13, inclusive). The third tertile (coded as 3) will represent all cases where Household income is greater than or equal to $75k (values of PPINCIMP between 14 and 19 inclusive).
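As a quick check on how even these cut points are, the recode can be replayed on the frequency table above. This is a minimal Python sketch, not part of my SAS program: the function and variable names are made up for illustration, and the counts are simply copied from the PPINCIMP frequency distribution.

```python
# Frequencies of PPINCIMP codes 1-19, copied from the frequency table above.
PPINCIMP_COUNTS = {
    1: 127, 2: 64, 3: 61, 4: 68, 5: 62, 6: 98, 7: 109, 8: 140,
    9: 108, 10: 132, 11: 162, 12: 181, 13: 235,
    14: 129, 15: 145, 16: 200, 17: 125, 18: 63, 19: 85,
}


def income_tertile(ppincimp):
    """Map a PPINCIMP code (1-19) to the approximate tertile 1, 2, or 3."""
    if not 1 <= ppincimp <= 19:
        raise ValueError(f"unexpected PPINCIMP code: {ppincimp}")
    if ppincimp <= 8:
        return 1  # household income less than $30k
    if ppincimp <= 13:
        return 2  # $30k up to, but not including, $75k
    return 3      # $75k or more


def tertile_counts(counts):
    """Add up the number of cases that fall into each tertile."""
    totals = {1: 0, 2: 0, 3: 0}
    for code, n in counts.items():
        totals[income_tertile(code)] += n
    return totals


totals = tertile_counts(PPINCIMP_COUNTS)
n_total = sum(totals.values())
for tert, n in totals.items():
    print(f"tertile {tert}: {n} cases ({100 * n / n_total:.2f}%)")
```

On the full sample of 2,294 cases the three groups hold 729, 818, and 747 cases, so each is within a few percentage points of one third. The shares shift slightly once the analysis is limited to non-Hispanic White and non-Hispanic Black respondents.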

My syntax:

LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

/* mydata is the local name for the database */

/* Research question: Race and perception of law enforcement and opportunity for
   achievement between Blacks and Whites during the beginning of the
   #BlackLivesMatter movement

   SPECIFICALLY H1: Are non-Hispanic Blacks less likely to trust the federal
   government, the police, and/or the legal system than non-Hispanic Whites?
     H1a: Are non-Hispanic Blacks less likely to trust the federal government
          than non-Hispanic Whites?
     H1b: Are non-Hispanic Blacks less likely to trust the police than
          non-Hispanic Whites?
     H1c: Are non-Hispanic Blacks less likely to trust the legal system than
          non-Hispanic Whites?

   SPECIFICALLY H2: Does income level influence levels of trust in the federal
   government, the police, and/or the legal system among both Blacks and Whites?

   For the Chi Square example, I will be looking to see if there is a relationship
   between race/ethnicity and household income. Knowing if there is a
   relationship between income and race/ethnicity will tell me if I need to
   consider that income level may be the actual cause of differences in outcomes
   analyzed by race.
*/

DATA new; set mydata.oll_pds;
LABEL ppethm="Race / Ethnicity"
      ppincimp="Household Income";

/* This subsetting IF limits the cases included in the analysis; it keeps only those who
   indicated race/ethnicity of "White, Non-Hispanic" (coded as 1) or "Black,
   Non-Hispanic" (coded as 2) */
IF ppethm=1 or ppethm = 2;

/* Recode PPINCIMP: Household Income into fewer groups for easier analysis. I'm
   most interested in comparing low-income, middle-income, and high-income households.
   Based on Frequencies, I will divide PPINCIMP into tertiles.
   The first tertile (coded as 1) will represent all cases where Household income is
   less than $30k.
   The second tertile (coded as 2) will represent all cases where Household income is
   greater than or equal to $30k and less than $75k.
   The third tertile (coded as 3) will represent all cases where Household income is
   greater than or equal to $75k. */
IF ppincimp <= 8 then ppincimp_tert = 1;
ELSE IF ppincimp <= 13 then ppincimp_tert = 2;
ELSE ppincimp_tert = 3;

PROC SORT; by CASEID;

/* Chi Square syntax is an addition to the PROC FREQ statement as follows:
   PROC FREQ; TABLES VAR1*VAR2/CHISQ;
*/
PROC FREQ; TABLES ppethm*ppincimp_tert/CHISQ;
RUN;

/* Post-hoc Chi Square tests for when multiple groups are examined: a Bonferroni
   adjustment is made to the significance threshold used for the individual comparisons.
   The adjusted threshold is equal to the desired significance level divided by the
   number of comparisons needed.
   For my analysis, I will need 3 individual comparisons, so my Bonferroni adjusted
   threshold is 0.05/3 = 0.017.
*/

/* Comparison 1: tertile 1 vs. tertile 2 */
DATA COMPARISON1; SET NEW;
IF PPINCIMP_TERT = 1 OR PPINCIMP_TERT = 2;
PROC SORT; BY CASEID;
PROC FREQ; TABLES PPETHM*PPINCIMP_TERT/CHISQ;
RUN;

/* Comparison 2: tertile 1 vs. tertile 3 */
DATA COMPARISON2; SET NEW;
IF PPINCIMP_TERT = 1 OR PPINCIMP_TERT = 3;
PROC SORT; BY CASEID;
PROC FREQ; TABLES PPETHM*PPINCIMP_TERT/CHISQ;
RUN;

/* Comparison 3: tertile 2 vs. tertile 3 */
DATA COMPARISON3; SET NEW;
IF PPINCIMP_TERT = 2 OR PPINCIMP_TERT = 3;
PROC SORT; BY CASEID;
PROC FREQ; TABLES PPETHM*PPINCIMP_TERT/CHISQ;
RUN;

My output:

The FREQ Procedure

Table of PPETHM by ppincimp_tert

Each cell shows Frequency / Percent / Row Pct / Col Pct.

PPETHM (Race / Ethnicity) | ppincimp_tert 1 | ppincimp_tert 2 | ppincimp_tert 3 | Total |
---|---|---|---|---|
1 | 170 / 8.13 / 20.88 / 25.19 | 306 / 14.63 / 37.59 / 41.52 | 338 / 16.16 / 41.52 / 49.71 | 814 / 38.91 |
2 | 505 / 24.14 / 39.51 / 74.81 | 431 / 20.60 / 33.72 / 58.48 | 342 / 16.35 / 26.76 / 50.29 | 1278 / 61.09 |
Total | 675 / 32.27 | 737 / 35.23 | 680 / 32.50 | 2092 / 100.00 |

Statistics for Table of PPETHM by ppincimp_tert

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 2 | 88.9452 | <.0001 |
Likelihood Ratio Chi-Square | 2 | 91.4144 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 85.5709 | <.0001 |
Phi Coefficient | | 0.2062 | |
Contingency Coefficient | | 0.2019 | |
Cramer’s V | | 0.2062 | |

Sample Size = 2092

A Chi-Square test of independence showed that race/ethnicity and household income tertile were significantly associated in the OOL survey, X2 = 88.95, 2 df, N = 2092, p < .0001. Only 20.88% of non-Hispanic White respondents fell in the lowest income tertile, compared with 39.51% of non-Hispanic Black respondents. However, because my household income variable has three levels, I need post hoc comparisons to determine which income groups differ, with a Bonferroni adjustment to the significance threshold. With three individual comparisons and an overall significance level of 0.05, my Bonferroni adjusted threshold for each comparison = 0.05/3 = 0.017.
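The overall statistic can be reproduced by hand from the observed counts in the table, which is a useful sanity check on the SAS output. The following is a minimal Python sketch under that assumption, using only the standard library; it is a cross-check and not part of my SAS program, and the names in it are my own. The p-value line relies on the fact that with exactly 2 degrees of freedom the upper-tail probability of the Chi-Square distribution is exp(-x/2).

```python
import math

# Observed counts copied from the PROC FREQ table above.
# Rows: 1 = White, Non-Hispanic; 2 = Black, Non-Hispanic.
# Columns: income tertiles 1, 2, 3.
OBSERVED = [
    [170, 306, 338],
    [505, 431, 342],
]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, N) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, n


chi2, dof, n = pearson_chi_square(OBSERVED)

# Closed form that is valid only because dof == 2 for this 2 x 3 table.
p_value = math.exp(-chi2 / 2)

# Cramer's V = sqrt(chi2 / (N * (k - 1))), where k is the smaller table dimension.
k = min(len(OBSERVED), len(OBSERVED[0]))
cramers_v = math.sqrt(chi2 / (n * (k - 1)))

# Bonferroni threshold for the three pairwise follow-up comparisons.
bonferroni_alpha = 0.05 / 3

print(f"chi2 = {chi2:.4f}, df = {dof}, N = {n}, p = {p_value:.3g}")
print(f"Cramer's V = {cramers_v:.4f}, adjusted alpha = {bonferroni_alpha:.3f}")
```

The statistic and Cramer's V agree with the values SAS reports (88.9452 and 0.2062), and the adjusted threshold rounds to 0.017.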

The FREQ Procedure

Table of PPETHM by ppincimp_tert

Each cell shows Frequency / Percent / Row Pct / Col Pct.

PPETHM (Race / Ethnicity) | ppincimp_tert 1 | ppincimp_tert 2 | Total |
---|---|---|---|
1 | 170 / 12.04 / 35.71 / 25.19 | 306 / 21.67 / 64.29 / 41.52 | 476 / 33.71 |
2 | 505 / 35.76 / 53.95 / 74.81 | 431 / 30.52 / 46.05 / 58.48 | 936 / 66.29 |
Total | 675 / 47.80 | 737 / 52.20 | 1412 / 100.00 |

Statistics for Table of PPETHM by ppincimp_tert

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 42.0663 | <.0001 |
Likelihood Ratio Chi-Square | 1 | 42.5372 | <.0001 |
Continuity Adj. Chi-Square | 1 | 41.3385 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 42.0365 | <.0001 |
Phi Coefficient | | -0.1726 | |
Contingency Coefficient | | 0.1701 | |
Cramer’s V | | -0.1726 | |

Fisher’s Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 170 |
Left-sided Pr <= F | <.0001 |
Right-sided Pr >= F | 1.0000 |
Table Probability (P) | <.0001 |
Two-sided Pr <= P | <.0001 |

Sample Size = 1412

The FREQ Procedure

Table of PPETHM by ppincimp_tert

Each cell shows Frequency / Percent / Row Pct / Col Pct.

PPETHM (Race / Ethnicity) | ppincimp_tert 1 | ppincimp_tert 3 | Total |
---|---|---|---|
1 | 170 / 12.55 / 33.46 / 25.19 | 338 / 24.94 / 66.54 / 49.71 | 508 / 37.49 |
2 | 505 / 37.27 / 59.62 / 74.81 | 342 / 25.24 / 40.38 / 50.29 | 847 / 62.51 |
Total | 675 / 49.82 | 680 / 50.18 | 1355 / 100.00 |

Statistics for Table of PPETHM by ppincimp_tert

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 86.9101 | <.0001 |
Likelihood Ratio Chi-Square | 1 | 88.1653 | <.0001 |
Continuity Adj. Chi-Square | 1 | 85.8670 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 86.8460 | <.0001 |
Phi Coefficient | | -0.2533 | |
Contingency Coefficient | | 0.2455 | |
Cramer’s V | | -0.2533 | |

Fisher’s Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 170 |
Left-sided Pr <= F | <.0001 |
Right-sided Pr >= F | 1.0000 |
Table Probability (P) | <.0001 |
Two-sided Pr <= P | <.0001 |

Sample Size = 1355

The FREQ Procedure

Table of PPETHM by ppincimp_tert

Each cell shows Frequency / Percent / Row Pct / Col Pct.

PPETHM (Race / Ethnicity) | ppincimp_tert 2 | ppincimp_tert 3 | Total |
---|---|---|---|
1 | 306 / 21.59 / 47.52 / 41.52 | 338 / 23.85 / 52.48 / 49.71 | 644 / 45.45 |
2 | 431 / 30.42 / 55.76 / 58.48 | 342 / 24.14 / 44.24 / 50.29 | 773 / 54.55 |
Total | 737 / 52.01 | 680 / 47.99 | 1417 / 100.00 |

Statistics for Table of PPETHM by ppincimp_tert

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 9.5597 | 0.0020 |
Likelihood Ratio Chi-Square | 1 | 9.5671 | 0.0020 |
Continuity Adj. Chi-Square | 1 | 9.2324 | 0.0024 |
Mantel-Haenszel Chi-Square | 1 | 9.5530 | 0.0020 |
Phi Coefficient | | -0.0821 | |
Contingency Coefficient | | 0.0819 | |
Cramer’s V | | -0.0821 | |

Fisher’s Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 306 |
Left-sided Pr <= F | 0.0012 |
Right-sided Pr >= F | 0.9992 |
Table Probability (P) | 0.0004 |
Two-sided Pr <= P | 0.0023 |

Sample Size = 1417

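Post hoc comparisons of household income by race/ethnicity showed that all three pairwise comparisons of income levels were statistically significant at the Bonferroni adjusted threshold of 0.017: low-income vs. middle-income (X2 = 42.07, 1 df, p < .0001), low-income vs. high-income (X2 = 86.91, 1 df, p < .0001), and middle-income vs. high-income (X2 = 9.56, 1 df, p = .0020). In other words, the racial/ethnic composition differs between every pair of income groups. This means that I will need to account for income in my analyses so that I don’t mistakenly attribute any differences in my analyses to race when income may be an alternative explanation.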
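The three pairwise tests can also be recomputed from the observed counts alone. Here is a minimal Python sketch, again a standard-library cross-check and not my SAS program, with illustrative names of my own. It uses the shortcut formula for a 2 x 2 table without the continuity correction, and the fact that a Chi-Square variable with 1 degree of freedom is a squared standard normal, so the p-value is erfc(sqrt(x/2)).

```python
import math
from itertools import combinations

# Observed counts by income tertile, copied from the PROC FREQ output above:
# tertile -> (White, Non-Hispanic count, Black, Non-Hispanic count)
COUNTS = {1: (170, 505), 2: (306, 431), 3: (338, 342)}

ALPHA = 0.05


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


pairs = list(combinations(sorted(COUNTS), 2))  # (1, 2), (1, 3), (2, 3)
adjusted_alpha = ALPHA / len(pairs)            # Bonferroni: 0.05 / 3

results = {}
for first, second in pairs:
    white_first, black_first = COUNTS[first]
    white_second, black_second = COUNTS[second]
    chi2 = chi_square_2x2(white_first, white_second, black_first, black_second)
    p = p_value_df1(chi2)
    results[(first, second)] = (chi2, p, p < adjusted_alpha)
    print(f"tertile {first} vs {second}: chi2 = {chi2:.4f}, p = {p:.4g}, "
          f"significant at {adjusted_alpha:.3f}: {p < adjusted_alpha}")
```

Dividing the significance level by the number of comparisons is the simplest correction and is conservative; with p-values this small, the conclusion would not change under a less strict adjustment.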