# Lab 8 - Tests of Significance

The class video is attached here so that you can watch my lecture again when you prepare for the exams.

- If you have questions about my lecture, please use **the comment section** at the bottom of this document.

## Are you a good pitcher?

I have a friend who can throw a ball really fast. But I always doubt him, because he keeps saying,

`I can throw a fastball like 90 mph!`

Well, I don’t buy it, because

- If he can do it, he should go to the professional baseball league instead of being my friend. :/
- After I measured his 20 throws, here is what I got:

`## [1] 77 80 82 75 80 81 78 78 78 78 79 78 78 80 81 79 78 78 78 84`

Average 79 mph! Oh, come on, this is not even close! However, my friend would not admit he was wrong and said, “Well, today is a very unlucky day! But usually, my ball speed is around 90 mph!”

### One-sided test

We don’t believe what he is saying because **the probability that the average speed of his 20 throws comes out as low as 79 would be very, very low if his claim were true!**

Let us describe the situation using some statistical notation.

\[
H_0: \mu = 90 \text{ mph} \quad \text{vs.} \quad H_a: \mu < 90 \text{ mph}
\]
\(H_0\) is the null hypothesis, which represents what my friend insists about his ball speed. \(H_a\) is the alternative hypothesis, which represents my opinion.

If we assume that his pitch speed follows a Normal distribution with **UNKNOWN** mean \(\mu\) and **KNOWN** standard deviation \(\sigma\) (in this example 2), we have learned that the sample mean \(\overline{X}\) of \(n\) throws also follows a Normal distribution, with the same mean and standard deviation \(\sigma/\sqrt{n}\):
\[
\overline{X} \sim \mathcal{N} (\mu, \frac{\sigma}{\sqrt{n}} )
\]

Thus, if his claim is TRUE, that is, \(\mu = 90\), then we have
\[
\overline{X} \sim \mathcal{N} (90, \frac{2}{\sqrt{20}} )
\]
In this situation, what is the probability of seeing an observation like \(\overline{x} = 79\)? In statistical notation, I can write it as follows:
\[
P(\overline{X} \le 79) = ?
\]
To calculate this probability, I need to convert the number to the standard Normal scale, because our table gives probabilities only for the standard Normal distribution. So let me use the standardization formula!
\[
z = \frac{79 - 90}{2/\sqrt{20}} \approx -24.6
\]
Well, now I can see that this is a very, very rare event if his real ball speed is 90 mph. In our standard Normal table, the probability that corresponds to the z-score -3.4 is 0.0003; in other words,
\[
P(\overline{X} \le 79) = P(Z \le -24.6) \ \ll \ P(Z \le -3.4) = 0.0003
\]
where I use the \(\ll\) symbol to say that the left-hand side is wayyyyy less than the right-hand side. Thus, I conclude that his claim is total nonsense, but I want to say it in a more academic way, so I would say the following:

**I can reject \(H_0\) (His opinion) based on my sample.**

### One-sided test (Unknown \(\sigma\))

What if we don’t know the population \(\sigma\)? Remember, we used the \(t\) distribution to construct confidence intervals. Similarly, we will use the sample standard deviation, \(s\), in place of \(\sigma\) in the standardization formula! Thus, we have the following converted value, which I denote \(t\):
\[
t = \frac{79 - 90}{1.974/\sqrt{20}} \approx -24.93
\]
where \(1.974\) is the sample standard deviation of my friend’s throws. This converted value follows a \(t\) distribution with \(19\) degrees of freedom, which gives the probability
\[
P(T_{19} \le -24.93) \ll P(T_{19} \le -3.883) = 0.0005
\]
Note that I cannot use the notation \(Z\), since the converted value does not follow the standard Normal distribution but the \(t\) distribution with d.f. 19.
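If you would like to check this arithmetic in SAS rather than in the \(t\) table, the following is a minimal sketch. It assumes only the summary numbers quoted above; the variable names (`x_bar`, `mu0`, `s`, `n`, `df`) are made up for this illustration and are not part of the lab code.

```sas
/* Sketch: t statistic and lower-tail probability for H0: mu = 90 */
data _null_;
  x_bar = 79;       /* sample mean of the 20 throws */
  mu0   = 90;       /* mean claimed under H0 */
  s     = 1.974;    /* sample standard deviation (rounded, as in the text) */
  n     = 20;       /* number of throws */
  t  = (x_bar - mu0) / (s / sqrt(n));   /* t test statistic */
  df = n - 1;                           /* degrees of freedom */
  p  = cdf("T", t, df);                 /* P(T_df <= t), lower tail */
  put t= df= p=;                        /* write the values to the SAS log */
run;
```

The `cdf("T", ...)` call returns the lower-tail probability, which is the one we need here because the alternative is \(\mu < 90\).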

### Statistical terms

#### Test statistic

Above, we calculated the z-score assuming his claim (\(H_0\)) is true. Written in symbolic notation,
\[
z = \frac{79 - 90}{2/\sqrt{20}} \Longrightarrow z = \frac{\overline{x} - \mu_0}{\sigma / \sqrt{n}}
\]
We call this special standardization formula the **\(z\) test statistic**. It is special because it uses the hypothesized mean \(\mu_0\) from \(H_0\) in place of the true, unknown \(\mu\). Likewise, we have the **\(t\) test statistic**
\[
t = \frac{79 - 90}{1.974/\sqrt{20}} \Longrightarrow t = \frac{\overline{x} - \mu_0}{s / \sqrt{n}}
\]

#### P-value

After calculating the test statistic, what we tried to find was the probability of observing a sample mean at least as extreme as our \(\overline{x}\) under the distribution assumed by \(H_0\), which was, in our case,
\[
P(\overline{X} \le 79) \quad \text{when} \ \mu = 90
\]
We call this probability the **\(p\)-value**.
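For reference, here is how the \(p\)-value is computed from an observed \(z\) test statistic for each form of the alternative hypothesis. These are the standard textbook definitions, written out here as a summary; replace \(Z\) with \(T_{n-1}\) when you use the \(t\) test statistic.
\[
\begin{aligned}
H_a: \mu < \mu_0 &: \quad p = P(Z \le z) \\
H_a: \mu > \mu_0 &: \quad p = P(Z \ge z) \\
H_a: \mu \ne \mu_0 &: \quad p = 2\,P(Z \ge |z|)
\end{aligned}
\]
Our pitching example is the first case, since my opinion is \(\mu < 90\).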

### Testing in SAS

Let us do this in SAS now.

```
data pitch;
  input speed @@;   /* @@ reads several values from each data line */
  datalines;
77 80 82 75 80 81 78 78 78 78
79 78 78 80 81 79 78 78 78 84
;
run;
proc print data = pitch;
run;
proc means data = pitch;
  var speed;
run;
```

```
                      The MEANS Procedure

                   Analysis Variable : speed

 N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------
20    79.0000000     1.9735088    75.0000000    84.0000000
------------------------------------------------------------------
```

#### Z-Test in SAS

The above result gives us \(n\) and \(\overline{x}\). Since SAS does not provide a ready-made *z-test*, let us just write our own code for the test!

```
%let numObs = 20;    /* sample size n */
%let smean  = 79;    /* sample mean */
%let sigma  = 2;     /* known population standard deviation */
%let mu0    = 90;    /* mean under H0 */
%let Side   = L;     /* L = lower tail, U = upper tail, 2 = two-sided */
%let slevel = 0.05;  /* significance level */
title "Test for population mean";
data zTestMean;
  n = &numObs;
  x_bar = &smean;
  SE = &sigma / sqrt(n);
  z = (x_bar - &mu0) / SE;
  alpha = &slevel;
  /* p-value */
  Side = "&Side";
  if Side = "L" then       /* one-sided, lower tail: Pr(Z < z) */
    pValue = cdf("normal", z);
  else if Side = "U" then  /* one-sided, upper tail: Pr(Z > z) */
    pValue = 1 - cdf("normal", z);
  else if Side = "2" then  /* two-sided: Pr(|Z| > |z|) */
    pValue = 2*(1 - cdf("normal", abs(z)));
  /* LABEL is not executed conditionally, so pValue gets a single label */
  label pValue = "p-value";
  /* Test result */
  length result $ 16;
  if pValue <= alpha then
    result = "Reject Ho";
  else
    result = "Cannot reject Ho";
  format pValue PVALUE6.4 z 7.4 result $16.;
  label result = "Test result";
  label alpha = "Sig. level";
run;
proc print data = zTestMean label noobs;
run;
```

The result will be as follows:

```
                       Test for population mean

                                    Sig.
 n    x_bar      SE          z     level    Side    p-value    Test result
----------------------------------------------------------------------------
20      79     0.44721    -24.597   0.05     L      <.0001     Reject Ho
----------------------------------------------------------------------------
```

#### t-Test in SAS

I know the z-test in SAS is really burdensome; I would rather do it by hand. However, it will be useful for you to have the code for future use. For the *t-test*, SAS gives us the result as part of the `proc univariate` output:

```
proc univariate mu0 = 90 data = pitch;
var speed;
run;
```

```
          Tests for Location: Mu0=90

Test           -Statistic-    -----p Value------
Student's t    t  -24.9269    Pr > |t|    <.0001
Sign           M       -10    Pr >= |M|   <.0001
Signed Rank    S      -105    Pr >= |S|   <.0001
```

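Note that SAS gives you the two-sided p-value, `Pr > |t|`, so you need to adjust the result to your test situation. For our lower-tailed alternative \(H_a: \mu < 90\), the observed \(t\) is negative, so the one-sided p-value is half of the two-sided one, which is still below 0.0001.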
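If you prefer to let SAS report the one-sided p-value directly, one option is `proc ttest`, which takes the hypothesized mean through `h0=` and the direction of the alternative through `sides=`. This is a minimal sketch, assuming the `pitch` data set created earlier in this lab is still available; it is an alternative to the `proc univariate` call, not part of the original lab code.

```sas
/* Sketch: one-sample, lower-tailed t test of H0: mu = 90 */
proc ttest data = pitch h0 = 90 sides = L;
  var speed;   /* the pitch speeds in mph */
run;
```

With `sides = L` the alternative is \(H_a: \mu < 90\), so the reported p-value should already be the lower-tail probability and needs no halving. Use `sides = U` for an upper-tailed test and `sides = 2` (the default) for a two-sided test.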