# Lab 12 - Chi-square and ANOVA

The class video is attached here so that you can watch my lecture again when you prepare for the exams.

• If you have questions about my lecture, please use the comment section at the bottom of this document.

## Chi-square test for more than two proportions

We can also use the chi-square test when a variable has more than two categories.

### The usage of Instagram

Let us assume we have the following data:

| | Freshmen | Non-Freshmen | Total |
|---|---|---|---|
| Do not use | 30 | 40 | 70 |
| Often | 180 | 180 | 360 |
| Everyday | 400 | 350 | 750 |
| Total | 610 | 570 | 1180 |

This table is about two variables: School year and Instagram usage.

The Instagram usage variable has three categories:

• Do not use (D)
• Often (O)
• Everyday (E)

Our goal is to test whether the distribution of Instagram usage differs by school year. We can test this by looking at the data as follows:

| | Proportion of freshmen |
|---|---|
| Do not use | 30/70 |
| Often | 180/360 |
| Everyday | 400/750 |

Let $$p$$ denote the population proportion of freshmen. If the two variables are independent, $$p$$ should not differ across the usage categories. Thus, our null hypothesis looks like this:

$H_0: p_D = p_O = p_E$, which can be interpreted as follows: the proportion of freshmen is the same in every usage category.
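To see what the null hypothesis implies for the table, here is a short worked calculation. It is only a sketch based on the counts above, and the symbols $$\hat{p}$$ (pooled estimate) and $$E$$ (expected count) are notation introduced here. If $$H_0$$ holds, the common proportion of freshmen is estimated by pooling all three categories, and each expected count is the row total times the matching column proportion:

$$
\hat{p} = \frac{610}{1180} \approx 0.517, \qquad
E_{D,F} = 70 \times \frac{610}{1180} \approx 36.19, \qquad
E_{D,S} = 70 \times \frac{570}{1180} \approx 33.81
$$

For comparison, the observed proportions of freshmen are $$30/70 \approx 0.429$$, $$180/360 = 0.500$$, and $$400/750 \approx 0.533$$. The test asks whether these differ from the pooled value by more than chance would explain.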

data instagram ;
input schoolyear $ frequency $ count ;
datalines ;
F D 30
F O 180
F E 400
S D 40
S O 180
S E 350
;
run ;

#### Chi-square test in SAS

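You can use the expected and chisq options together in the tables statement: expected prints the expected cell counts under independence, and chisq requests the chi-square test.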

proc freq data = instagram ;
weight count ;
tables frequency * schoolyear / expected chisq ;
run ;

Result

                                      The FREQ Procedure

Table of frequency by schoolyear

frequency     schoolyear

Frequency|
Expected |
Percent  |
Row Pct  |
Col Pct  |F       |S       |  Total
---------+--------+--------+
D        |     30 |     40 |     70
         | 36.186 | 33.814 |
         |   2.54 |   3.39 |   5.93
         |  42.86 |  57.14 |
         |   4.92 |   7.02 |
---------+--------+--------+
E        |    400 |    350 |    750
         | 387.71 | 362.29 |
         |  33.90 |  29.66 |  63.56
         |  53.33 |  46.67 |
         |  65.57 |  61.40 |
---------+--------+--------+
O        |    180 |    180 |    360
         |  186.1 |  173.9 |
         |  15.25 |  15.25 |  30.51
         |  50.00 |  50.00 |
         |  29.51 |  31.58 |
---------+--------+--------+
Total         610      570     1180
            51.69    48.31   100.00

Statistics for Table of frequency by schoolyear

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     2      3.4099    0.1818
Likelihood Ratio Chi-Square    2      3.4131    0.1815
Mantel-Haenszel Chi-Square     1      0.0001    0.9929
Phi Coefficient                       0.0538
Contingency Coefficient               0.0537
Cramer's V                            0.0538

Sample Size = 1180


Q. We want to test whether the two variables are associated with each other. What is the chi-square test statistic?

Ans: 3.4099
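The statistic can be reproduced by hand from the observed and expected counts printed in the table. This is a sketch using the rounded expected values from the output, so the last digit may differ slightly from what SAS reports; $$O$$ and $$E$$ denote the observed and expected counts.

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
= \frac{(30 - 36.186)^2}{36.186} + \frac{(40 - 33.814)^2}{33.814}
+ \frac{(400 - 387.71)^2}{387.71} + \frac{(350 - 362.29)^2}{362.29}
+ \frac{(180 - 186.1)^2}{186.1} + \frac{(180 - 173.9)^2}{173.9}
$$

$$
\chi^2 \approx 1.06 + 1.13 + 0.39 + 0.42 + 0.20 + 0.21 = 3.41
$$

The two cells in the "Do not use" row contribute the most, but the total is still small.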

Q. What is d.f.?

Ans: 2 because we have 2 columns and 3 rows, so $$(2-1) \times (3-1) = 2$$

Q. If we use $$\alpha = 0.05$$, what is the decision?

Ans: The p-value is 0.1818, which is larger than the significance level of 0.05. Thus, we cannot reject the null hypothesis: the data do not show that Instagram usage differs by school year.
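The reported p-value and the decision can be cross-checked with SAS's probability functions. This is a minimal sketch, assuming the statistic 3.4099 and 2 degrees of freedom are copied from the output above; the variable names chisq, df, pvalue, and critical are illustrative only.

```sas
data _null_ ;
chisq = 3.4099 ;                    /* test statistic copied from the PROC FREQ output */
df = 2 ;                            /* (3-1) x (2-1) */
pvalue = 1 - probchi(chisq, df) ;   /* upper-tail probability of the chi-square distribution */
critical = cinv(0.95, df) ;         /* 95th percentile of the chi-square distribution with 2 d.f. */
put pvalue= critical= ;             /* write both values to the log */
run ;
```

The log should show a p-value of about 0.18 and a critical value of about 5.99. Since 3.4099 is below the critical value, both routes lead to the same decision.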

## ANOVA: Analysis of Variance

Let us compare the population means under three different situations.

### Hot dogs data

This time, let us load the data directly from Prof. Cowles’ website by passing the URL of the data file to a filename statement.

The hotdogs dataset contains data on the sodium and calories contained in each of 54 major hot dog brands.

The variables are:

• type: Beef, Meat, or Poultry
• calories per hot dog
• sodium per hot dog

There are many other brands of hot dogs on the market besides those included in this dataset. We are interested in determining whether the mean calories per hot dog is the same for all three types of hot dogs.

filename mydata url "http://homepage.divms.uiowa.edu/~kcowles/Datasets/hotdogs.dat";
data corndog;
infile mydata;
input type $ calories sodium ;
run ;

#### Check the assumptions for ANOVA

• We have $$I$$ independent simple random samples, one from each of $$I$$ populations (here $$I = 3$$).
• Each population $$i$$ has a normal distribution with unknown mean $$\mu_i$$.
• All of the populations have the same standard deviation $$\sigma$$ (unknown).
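
/* BY-group processing requires data sorted by the BY variable */
proc sort data = corndog ;
by type ;
run ;

proc univariate plot data = corndog ;
var calories ;
by type ;
run ;

/* Compare the sample standard deviations across types */
proc means data = corndog ;
var calories ;
by type ;
run ;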
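The equal-standard-deviation assumption can also be checked with a formal test. As one possible sketch, not part of the lecture code, the MEANS statement of PROC GLM accepts a HOVTEST option; choosing Levene's test here is an assumption about which test is wanted.

```sas
proc glm data = corndog ;
class type ;
model calories = type ;
/* Levene's test of equal variances across the three types */
means type / hovtest = levene ;
run ;
quit ;
```

A large p-value from this test gives no evidence against equal variances, which supports using the ordinary one-way ANOVA.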

#### ANOVA test in SAS

proc anova data = corndog ;
class type ;
model calories = type  ;
means type / bon alpha = .05 ;
run ;

Result


The ANOVA Procedure

Dependent Variable: calories

                                       Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F

Model                        2     17692.19510      8846.09755      16.07    <.0001

Error                       51     28067.13824       550.33604

Corrected Total             53     45759.33333

The ANOVA Procedure

Bonferroni (Dunn) t Tests for calories

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher
Type II error rate than Tukey's for all pairwise comparisons.

Alpha                        0.05
Error Degrees of Freedom       51
Error Mean Square         550.336
Critical Value of t       2.47551

Comparisons significant at the 0.05 level are indicated by ***.

                     Difference
type                    Between     Simultaneous 95%
Comparison                Means    Confidence Limits

Meat    - Beef            1.856     -17.302   21.013
Meat    - Poultry        39.941      20.022   59.860  ***
Beef    - Meat           -1.856     -21.013   17.302
Beef    - Poultry        38.085      18.928   57.243  ***
Poultry - Meat          -39.941     -59.860  -20.022  ***
Poultry - Beef          -38.085     -57.243  -18.928  ***
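The key numbers in the two output tables can be reproduced by hand. The sketch below uses only values printed above, except for the group sizes $$n_{Meat} = 17$$ and $$n_{Beef} = 20$$, which do not appear in the output and are assumed here (they are consistent with the reported confidence limits). The symbol $$t^{*}$$ stands for the critical value of t in the output.

$$
F = \frac{MS_{Model}}{MS_{Error}} = \frac{17692.19510 / 2}{28067.13824 / 51} = \frac{8846.09755}{550.33604} \approx 16.07
$$

$$
\text{margin}_{Meat - Beef} = t^{*} \sqrt{MS_{Error} \left( \frac{1}{n_{Meat}} + \frac{1}{n_{Beef}} \right)}
= 2.47551 \sqrt{550.336 \left( \frac{1}{17} + \frac{1}{20} \right)} \approx 19.16
$$

The interval for Meat - Beef is therefore $$1.856 \pm 19.16$$, or about $$(-17.30, 21.01)$$, which matches the table. Because this interval contains 0, the mean calories of Meat and Beef hot dogs do not differ significantly, whereas every interval involving Poultry excludes 0. The small p-value for the F test (<.0001) leads us to reject the hypothesis that all three means are equal, and the pairwise comparisons indicate that Poultry is the type that differs.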