DAT Assignment 2: Chi Square

(For those of you looking for something more interesting, go check out the pictures I took walking around my neighborhood after the blizzard in DC this weekend.)

Following completion of the steps described above, create a blog entry where you submit syntax used to run a Chi-Square Test (copied and pasted from your program) along with corresponding output and a few sentences of interpretation.

In keeping with my ongoing hypothesis, I’m examining data from the Outlook on Life survey specifically to evaluate whether there are differences in perception of/trust in law enforcement and level of opportunity between non-Hispanic Whites and non-Hispanic Blacks during the beginning of the #BlackLivesMatter movement. For the Chi Square example, I will be looking to see if there is a relationship between race/ethnicity and household income. Knowing if there is a relationship between income and race/ethnicity will tell me if I need to consider that income level may be an alternative or competing explanation for any differences in outcomes analyzed by race.

I’m mainly interested in approximate differences between lower-income, middle-income, and high-income households. After running a frequency distribution of PPINCIMP (results below), I decided to create a new variable (PPINCIMP_TERT) that collapses the Household Income variable (PPINCIMP) into three groups, each containing approximately one third of the cases (approximate tertiles).

Household Income

PPINCIMP  Frequency  Percent  Cumulative Frequency  Cumulative Percent
       1        127     5.54                   127                5.54
       2         64     2.79                   191                8.33
       3         61     2.66                   252               10.99
       4         68     2.96                   320               13.95
       5         62     2.70                   382               16.65
       6         98     4.27                   480               20.92
       7        109     4.75                   589               25.68
       8        140     6.10                   729               31.78
       9        108     4.71                   837               36.49
      10        132     5.75                   969               42.24
      11        162     7.06                  1131               49.30
      12        181     7.89                  1312               57.19
      13        235    10.24                  1547               67.44
      14        129     5.62                  1676               73.06
      15        145     6.32                  1821               79.38
      16        200     8.72                  2021               88.10
      17        125     5.45                  2146               93.55
      18         63     2.75                  2209               96.29
      19         85     3.71                  2294              100.00

The first tertile (coded as 1) will represent all cases where Household income is less than $30k (values of PPINCIMP between 1 and 8, inclusive). The second tertile (coded as 2) will represent all cases where Household income is greater than or equal to $30k and less than $75k (values of PPINCIMP between 9 and 13, inclusive). The third tertile (coded as 3) will represent all cases where Household income is greater than or equal to $75k (values of PPINCIMP between 14 and 19, inclusive).
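
As a quick sanity check on those cut points, here is a minimal Python sketch (not part of my SAS program) that applies the same recode to the frequency counts listed above and reports how many cases land in each group. The function name to_tertile and the variable names are my own, chosen just for illustration.

```python
# Frequencies for PPINCIMP codes 1-19, copied from the frequency table above.
freq = {
    1: 127, 2: 64, 3: 61, 4: 68, 5: 62, 6: 98, 7: 109, 8: 140,
    9: 108, 10: 132, 11: 162, 12: 181, 13: 235,
    14: 129, 15: 145, 16: 200, 17: 125, 18: 63, 19: 85,
}


def to_tertile(code):
    """Map a PPINCIMP code (1-19) to an approximate income tertile (1-3)."""
    if not 1 <= code <= 19:
        raise ValueError(f"unexpected PPINCIMP code: {code}")
    if code <= 8:       # less than $30k
        return 1
    if code <= 13:      # $30k to less than $75k
        return 2
    return 3            # $75k or more


total = sum(freq.values())

# Add up the cases that fall into each tertile.
tertile_counts = {1: 0, 2: 0, 3: 0}
for code, n in freq.items():
    tertile_counts[to_tertile(code)] += n

# Share of all cases in each tertile, as a percentage.
tertile_pct = {t: round(100 * n / total, 2) for t, n in tertile_counts.items()}

print(tertile_counts)
print(tertile_pct)
```

On the 2,294 cases in the frequency table, this gives 729, 818, and 747 cases per group, or roughly 32%, 36%, and 33%. That is close enough to thirds for my purposes.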

My syntax:

LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
/* mydata is the local name for the database */
/* Research question: Race and perception of law enforcement and opportunity for
achievement between Blacks and Whites during the beginning of the
#BlackLivesMatter movement
SPECIFICALLY H1: Are non-Hispanic Blacks less likely to trust the federal
government, the police, and/or the legal system than non-Hispanic Whites?
H1a: Are non-Hispanic Blacks less likely to trust the federal government
than non-Hispanic Whites?
H1b: Are non-Hispanic Blacks less likely to trust the police than
non-Hispanic Whites?
H1c: Are non-Hispanic Blacks less likely to trust the legal system than
non-Hispanic Whites?
SPECIFICALLY H2: Does income level influence levels of trust in the federal
government, the police, and/or the legal system in both Blacks and Whites?

For the Chi Square example, I will be looking to see if there is a relationship
between race/ethnicity and household income. Knowing if there is a
relationship between income and race/ethnicity will tell me if I need to
consider that income level may be the actual cause of differences in outcomes
analyzed by race.
*/
DATA new; set mydata.oll_pds;
LABEL ppethm="Race / Ethnicity"
ppincimp="Household Income";

/* This subsetting IF statement limits the cases included in the analysis; it keeps only
those who indicated race/ethnicity of "White, Non-Hispanic" (coded as 1) or "Black,
Non-Hispanic" (coded as 2) */
IF ppethm=1 or ppethm = 2;

/* Recode PPINCIMP: Household Income into fewer groups for easier analysis. I’m
most interested in comparing low-income, middle-income, and high-income households.
Based on Frequencies, I will divide PPINCIMP into tertiles.
The first tertile (coded as 1) will represent all cases where Household income is
less than $30k.
The second tertile (coded as 2) will represent all cases where Household income is
greater than or equal to $30k and less than $75k.
The third tertile (coded as 3) will represent all cases where Household income is
greater than or equal to $75k.*/
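/* SAS treats missing as lower than any number, so keep missing income out of tertile 1 */
IF ppincimp = . then ppincimp_tert = .;
ELSE IF ppincimp <= 8 then ppincimp_tert = 1;
ELSE IF ppincimp <= 13 then ppincimp_tert = 2;
ELSE ppincimp_tert = 3;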

PROC SORT; by CASEID;

/* Chi Square syntax is an addition to the PROC FREQ statement as follows:
PROC FREQ; TABLES VAR1*VAR2/CHISQ;
*/

PROC FREQ; TABLES ppethm*ppincimp_tert/CHISQ;

/* Post hoc Chi Square tests when more than two groups are examined: a Bonferroni
adjustment is applied to the significance threshold used for each individual comparison.
The adjusted threshold equals the desired overall threshold divided by the number of
comparisons.

For my analysis, I need 3 pairwise comparisons, so my Bonferroni-adjusted threshold is
0.05/3 = 0.017.
*/

RUN;

DATA COMPARISON1; SET NEW;
IF PPINCIMP_TERT = 1 OR PPINCIMP_TERT = 2;
PROC SORT; BY CASEID;
PROC FREQ; TABLES PPETHM*PPINCIMP_TERT/CHISQ;
RUN;

DATA COMPARISON2; SET NEW;
IF PPINCIMP_TERT = 1 OR PPINCIMP_TERT = 3;
PROC SORT; BY CASEID;
PROC FREQ; TABLES PPETHM*PPINCIMP_TERT/CHISQ;
RUN;

DATA COMPARISON3; SET NEW;
IF PPINCIMP_TERT = 2 OR PPINCIMP_TERT = 3;
PROC SORT; BY CASEID;
PROC FREQ; TABLES PPETHM*PPINCIMP_TERT/CHISQ;
RUN;

My output:

The FREQ Procedure

Table of PPETHM by ppincimp_tert
Each cell lists, top to bottom: Frequency, Percent, Row Pct, Col Pct

PPETHM                       ppincimp_tert
(Race / Ethnicity)          1        2        3    Total
1                         170      306      338      814
                         8.13    14.63    16.16    38.91
                        20.88    37.59    41.52
                        25.19    41.52    49.71
2                         505      431      342     1278
                        24.14    20.60    16.35    61.09
                        39.51    33.72    26.76
                        74.81    58.48    50.29
Total                     675      737      680     2092
                        32.27    35.23    32.50   100.00

Statistics for Table of PPETHM by ppincimp_tert

Statistic DF Value Prob
Chi-Square 2 88.9452 <.0001
Likelihood Ratio Chi-Square 2 91.4144 <.0001
Mantel-Haenszel Chi-Square 1 85.5709 <.0001
Phi Coefficient 0.2062
Contingency Coefficient 0.2019
Cramer’s V 0.2062

Sample Size = 2092

A Chi Square test of independence revealed that race/ethnicity and level of household income were significantly associated in the OOL survey, χ² = 88.95, 2 df, N = 2092, p < .0001. The row percentages show the direction: 39.51% of non-Hispanic Black respondents fall in the lowest income tertile, compared with 20.88% of non-Hispanic White respondents. However, because my household income variable has three levels, the overall test does not tell me which income groups differ, so I need to run post hoc pairwise comparisons and apply a Bonferroni adjustment to the significance threshold. With three individual comparisons and an overall threshold of 0.05, my Bonferroni-adjusted threshold for each comparison is 0.05/3 = 0.017.
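
To show where the Chi Square value comes from, here is a minimal Python sketch (a cross-check outside of SAS, not part of my submitted syntax) that recomputes the Pearson statistic and Cramer's V from the cell counts in the table above. The function name pearson_chi_square is my own, and the p-value shortcut used here is only valid for 2 degrees of freedom.

```python
import math

# Observed counts from the PPETHM by ppincimp_tert table above.
# Rows: PPETHM 1 (White, Non-Hispanic), PPETHM 2 (Black, Non-Hispanic).
# Columns: income tertiles 1, 2, 3.
observed = [
    [170, 306, 338],
    [505, 431, 342],
]


def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, sample size) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if race/ethnicity and income were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, n


chi2, df, n = pearson_chi_square(observed)

# For exactly 2 degrees of freedom the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# Cramer's V: effect size scaled by the smaller table dimension minus 1.
cramers_v = math.sqrt(chi2 / (n * min(len(observed) - 1, len(observed[0]) - 1)))

print(f"chi-square = {chi2:.4f}, df = {df}, N = {n}")
print(f"p-value = {p_value:.3g}, Cramer's V = {cramers_v:.4f}")
```

The recomputed statistic and Cramer's V agree with the SAS output above (88.9452 and 0.2062), so the association is clearly significant but modest in strength.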


The FREQ Procedure

Table of PPETHM by ppincimp_tert
Each cell lists, top to bottom: Frequency, Percent, Row Pct, Col Pct

PPETHM                  ppincimp_tert
(Race / Ethnicity)          1        2    Total
1                         170      306      476
                        12.04    21.67    33.71
                        35.71    64.29
                        25.19    41.52
2                         505      431      936
                        35.76    30.52    66.29
                        53.95    46.05
                        74.81    58.48
Total                     675      737     1412
                        47.80    52.20   100.00

Statistics for Table of PPETHM by ppincimp_tert

Statistic DF Value Prob
Chi-Square 1 42.0663 <.0001
Likelihood Ratio Chi-Square 1 42.5372 <.0001
Continuity Adj. Chi-Square 1 41.3385 <.0001
Mantel-Haenszel Chi-Square 1 42.0365 <.0001
Phi Coefficient -0.1726
Contingency Coefficient 0.1701
Cramer’s V -0.1726
Fisher’s Exact Test
Cell (1,1) Frequency (F) 170
Left-sided Pr <= F <.0001
Right-sided Pr >= F 1.0000
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 1412


The FREQ Procedure

Table of PPETHM by ppincimp_tert
Each cell lists, top to bottom: Frequency, Percent, Row Pct, Col Pct

PPETHM                  ppincimp_tert
(Race / Ethnicity)          1        3    Total
1                         170      338      508
                        12.55    24.94    37.49
                        33.46    66.54
                        25.19    49.71
2                         505      342      847
                        37.27    25.24    62.51
                        59.62    40.38
                        74.81    50.29
Total                     675      680     1355
                        49.82    50.18   100.00

Statistics for Table of PPETHM by ppincimp_tert

Statistic DF Value Prob
Chi-Square 1 86.9101 <.0001
Likelihood Ratio Chi-Square 1 88.1653 <.0001
Continuity Adj. Chi-Square 1 85.8670 <.0001
Mantel-Haenszel Chi-Square 1 86.8460 <.0001
Phi Coefficient -0.2533
Contingency Coefficient 0.2455
Cramer’s V -0.2533
Fisher’s Exact Test
Cell (1,1) Frequency (F) 170
Left-sided Pr <= F <.0001
Right-sided Pr >= F 1.0000
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 1355


The FREQ Procedure

Table of PPETHM by ppincimp_tert
Each cell lists, top to bottom: Frequency, Percent, Row Pct, Col Pct

PPETHM                  ppincimp_tert
(Race / Ethnicity)          2        3    Total
1                         306      338      644
                        21.59    23.85    45.45
                        47.52    52.48
                        41.52    49.71
2                         431      342      773
                        30.42    24.14    54.55
                        55.76    44.24
                        58.48    50.29
Total                     737      680     1417
                        52.01    47.99   100.00

Statistics for Table of PPETHM by ppincimp_tert

Statistic DF Value Prob
Chi-Square 1 9.5597 0.0020
Likelihood Ratio Chi-Square 1 9.5671 0.0020
Continuity Adj. Chi-Square 1 9.2324 0.0024
Mantel-Haenszel Chi-Square 1 9.5530 0.0020
Phi Coefficient -0.0821
Contingency Coefficient 0.0819
Cramer’s V -0.0821
Fisher’s Exact Test
Cell (1,1) Frequency (F) 306
Left-sided Pr <= F 0.0012
Right-sided Pr >= F 0.9992
Table Probability (P) 0.0004
Two-sided Pr <= P 0.0023

Sample Size = 1417

 

Post hoc comparisons of household income by race/ethnicity revealed that all three pairwise comparisons of income levels differ significantly by race/ethnicity at the Bonferroni-adjusted threshold of 0.017: low-income vs. middle-income (χ² = 42.07, 1 df, p < .0001), low-income vs. high-income (χ² = 86.91, 1 df, p < .0001), and middle-income vs. high-income (χ² = 9.56, 1 df, p = .0020). This means that I will need to account for income in my analyses so that I don’t mistakenly attribute any differences to race when income may be an alternative explanation.
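
The post hoc logic can be sketched in a few lines of Python (again a cross-check outside of SAS, with names of my own choosing): take each pair of income tertiles, compute the Chi Square for the resulting 2x2 table, and compare its p-value against the Bonferroni-adjusted threshold. The p-value formula below applies only to tests with 1 degree of freedom, and no continuity correction is used.

```python
import math
from itertools import combinations

# Cell counts from the full table: key is PPETHM (1 = White, 2 = Black),
# list positions are income tertiles 1, 2, 3.
counts = {
    1: [170, 306, 338],
    2: [505, 431, 342],
}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


alpha = 0.05
pairs = list(combinations([1, 2, 3], 2))   # (1, 2), (1, 3), (2, 3)
adjusted_alpha = alpha / len(pairs)        # Bonferroni: 0.05 / 3

results = {}
for first, second in pairs:
    a, b = counts[1][first - 1], counts[1][second - 1]
    c, d = counts[2][first - 1], counts[2][second - 1]
    chi2 = chi_square_2x2(a, b, c, d)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    results[(first, second)] = (chi2, p, p < adjusted_alpha)
    print(f"tertile {first} vs {second}: chi-square = {chi2:.4f}, "
          f"p = {p:.4g}, significant = {p < adjusted_alpha}")
```

All three comparisons come out below the adjusted threshold, matching the SAS output above. The middle-income vs. high-income comparison is the weakest of the three, but it still clears 0.017 comfortably.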
