# Lab 12 - Chi-square and ANOVA

The class video is attached here so that you can watch my lecture again when you prepare the exams.

- If you have questions about my lecture, please use
**the comment section**at the bottom of this document.

## Chi-square more than 2 proportions

We can use Chi-square test more than two categories.

### The usage of Instagram

Let us assume we have the following data:

Freshmen | Non-Freshmen | Total | |
---|---|---|---|

Do not use | 30 | 40 | 70 |

Often | 180 | 180 | 360 |

Everyday | 400 | 350 | 750 |

———– | ——— | ————– | ——- |

Total | 610 | 570 | 1180 |

This table is about two variables: School year and Instagram usage.

The Instagram usage variable has three categories:

- Do not use (D)
- Often (O)
- Everyday (E)

Our goal is to test whether the usage distribution of Instagram defer from the school year. We can test this by looking the data as follows:

prop. F | |
---|---|

Do not use | 30/70 |

Often | 180/360 |

Everyday | 400/750 |

If we assume that the population proportion of being a freshmen \(p\), then \(p\) should not differ from each category when the two variables are indenpendent. Thus, our null hypothesis looks like this:

\[ H_0: p_D = p_O = p_E \] which can be interpreted as follows: the proportion of being a freshmen is not changed by each usage category.

#### Data load in SAS

```
data instagram ;
input schoolyear $ frequency $ count ;
datalines ;
F D 30
F O 180
F E 400
S D 40
S O 180
S E 350
;
run ;
```

#### Chi-square test in SAS

You can use `expected chisq`

option at the same time.

```
proc freq data = instagram ;
weight count ;
tables schoolyear * frequency / expected chisq;
run ;
```

**Result**

```
The FREQ Procedure
Table of frequency by schoolyear
frequency schoolyear
Frequency‚
Expected ‚
Percent ‚
Row Pct ‚
Col Pct ‚F ‚S ‚ Total
---------ˆ--------ˆ--------ˆ
D ‚ 30 ‚ 40 ‚ 70
‚ 36.186 ‚ 33.814 ‚
‚ 2.54 ‚ 3.39 ‚ 5.93
‚ 42.86 ‚ 57.14 ‚
‚ 4.92 ‚ 7.02 ‚
---------ˆ--------ˆ--------ˆ
E ‚ 400 ‚ 350 ‚ 750
‚ 387.71 ‚ 362.29 ‚
‚ 33.90 ‚ 29.66 ‚ 63.56
‚ 53.33 ‚ 46.67 ‚
‚ 65.57 ‚ 61.40 ‚
---------ˆ--------ˆ--------ˆ
O ‚ 180 ‚ 180 ‚ 360
‚ 186.1 ‚ 173.9 ‚
‚ 15.25 ‚ 15.25 ‚ 30.51
‚ 50.00 ‚ 50.00 ‚
‚ 29.51 ‚ 31.58 ‚
---------ˆ--------ˆ--------ˆ
Total 610 570 1180
51.69 48.31 100.00
Statistics for Table of frequency by schoolyear
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 2 3.4099 0.1818
Likelihood Ratio Chi-Square 2 3.4131 0.1815
Mantel-Haenszel Chi-Square 1 0.0001 0.9929
Phi Coefficient 0.0538
Contingency Coefficient 0.0537
Cramer's V 0.0538
Sample Size = 1180
```

Q. We want to test whether the two variables affects to each other. What is the Chi-square test statistic?

**Ans**: 3.4099

Q. What is d.f.?

**Ans**: 2 because we have 2 columns and 3 rows, so \((2-1) \times (3-1) = 2\)

Q. If we use \(\alpha = 0.05\), what is the decision?

**Ans**: p-value is 0.1818 which is larger than the significance level. Thus, we cannot reject the null hypothesis.

## ANOVA: Analysis of Variance

Let us compare the population means under three different situations.

### Corndogs data

Let us load data from the prof. Cowles’ website directly this time. We will feed the url of the data from our professor’s data set site as follows;

The `hotdogs`

dataset contains data on the sodium and calories contained in each
of 54 major corndog brands.

The variables are:

- type: Beef, Meat, or Poultry
- calories per corn dog
- sodium per corn dog

There are many other brands of hotdogs on the market besides those included in
this dataset. We are interested in **determining whether the mean of the calories
per hotdog is the same** in all of the three types of hotdogs.

#### Data load

```
filename mydata url "http://homepage.divms.uiowa.edu/~kcowles/Datasets/hotdogs.dat";
data corndog;
infile mydata;
input type $ calories sodium ;
run ;
```

#### Check the assumption for ANOVA

- I independent simple random samples
- Each population \(i\) has a normal distribution with unknown mean \(μ_i\).
- All of the populations have the same standard deviation \(\sigma\) (unknown)

```
proc univariate plot data = corndog ;
var calories ;
by type;
proc means data = corndog ;
var calories ;
by type ;
run ;
```

#### ANOVA test in SAS

```
proc anova data = corndog ;
class type ;
model calories = type ;
means type / bon alpha = .05 ;
run ;
```

**Result**

```
The ANOVA Procedure
Dependent Variable: calories
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 17692.19510 8846.09755 16.07 <.0001
Error 51 28067.13824 550.33604
Corrected Total 53 45759.33333
The ANOVA Procedure
Bonferroni (Dunn) t Tests for calories
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher
Type II error rate than Tukey's for all pairwise comparisons.
Alpha 0.05
Error Degrees of Freedom 51
Error Mean Square 550.336
Critical Value of t 2.47551
Comparisons significant at the 0.05 level are indicated by ***.
Difference
type Between Simultaneous 95%
Comparison Means Confidence Limits
Meat - Beef 1.856 -17.302 21.013
Meat - Poultry 39.941 20.022 59.860 ***
Beef - Meat -1.856 -21.013 17.302
Beef - Poultry 38.085 18.928 57.243 ***
Poultry - Meat -39.941 -59.860 -20.022 ***
Poultry - Beef -38.085 -57.243 -18.928 ***
```