Lab 9 - Two sample t-tests
The class video is attached here so that you can watch my lecture again when you prepare for the exams.
- If you have questions about my lecture, please use the comment section at the bottom of this document.
Two types of two sample t-test
We have two types of t-test in our course:
- Paired-sample: this is just a one-sample t-test disguised as a two-sample t-test
- Independent-sample: the real two-sample t-test
Suppose we have 20 people who are willing to participate in research on the impact of smoking on lung performance.
First, we measure the performance of their lungs before smoking. After that, we let them be smokers for a while, then gather them again and measure the performance of their lungs a second time.
FEV1 represents the forced expiratory volume in 1 second, which measures how much air a person can blow out in 1 second and is considered a measure of lung function.
Here is the data set:
FEV1_before  FEV1_after
3.40         3.55
4.14         3.30
4.54         3.32
2.83         3.68
4.21         3.22
4.25         2.92
3.71         3.73
3.73         3.09
3.72         3.49
3.55         3.13
3.76         3.94
3.50         3.31
3.61         3.22
4.03         3.30
4.48         2.85
3.94         3.03
3.74         2.63
3.54         2.96
3.58         3.38
5.21         3.31
We want to test
\[ H_0: \mu_1 = \mu_2 \quad vs. \quad H_1: \mu_1 \ne \mu_2 \]
where \(\mu_1\) indicates the mean of FEV1 values for non-smokers (the before measurements) and \(\mu_2\) indicates the mean of FEV1 values for smokers (the after measurements). Note that we can rewrite this as follows:
\[ H_0: \mu_1 - \mu_2 = 0 \quad vs. \quad H_1: \mu_1-\mu_2 \ne 0 \]
What is the observation for \(\mu_1 - \mu_2\)? It is the difference between the before and after FEV1 measurements of each person! Thus, we can use a one-sample t-test on these differences in SAS as follows:
data smoking;
    input before after;
    diff = after - before;
    datalines;
* note: copy and paste data in here ;
;
run;
proc univariate data = smoking;
    var diff;
run;
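The same one-sample test on the differences can also be requested from proc ttest directly. The following is a minimal sketch, assuming the smoking data set and the diff variable created in the data step above; h0 = 0 states the hypothesized mean difference explicitly (it is also the default).

```sas
/* One-sample t-test on the paired differences.        */
/* Assumes the data set smoking (with diff) exists.    */
proc ttest data = smoking h0 = 0;
    var diff;   /* diff = after - before */
run;
```

Because the test is two-sided, defining diff as after - before or before - after only flips the sign of the t value; the p-value is the same.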
An alternative method is:
proc ttest data = smoking;
    paired after*before;
run;
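To make explicit why the paired test is a one-sample test in disguise, the statistic can be sketched as below. The symbols \(d_i\), \(\bar{d}\), and \(s_d\) are notation introduced here for illustration: the difference for person \(i\), the sample mean of the differences, and their sample standard deviation.

```latex
\[ d_i = \text{after}_i - \text{before}_i, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad
df = n - 1 \]
```

With the 20 participants in this data set, \(df = 20 - 1 = 19\). The usual assumption behind this test is that the differences are approximately normally distributed.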
Independent two-sample data
What is ice seeding?
I attached this just for fun.
Our data is about cloud seeding with silver nitrate. Does it really work? Here is the data set.
rainfall  seeded
1697.8    S
29.0      U
17.3      U
274.7     S
118.3     S
40.6      S
95.0      U
321.2     U
255.0     S
7.7       S
345.5     U
334.1     S
242.5     S
17.5      S
1.0       U
244.3     U
36.6      U
41.1      U
302.8     S
129.6     S
430.0     S
21.7      U
147.8     U
24.4      U
4.9       U
372.4     U
489.1     S
115.3     S
830.1     U
274.7     S
Using this data, we want to determine whether silver nitrate actually makes a difference when it comes to rainfall.
\[ H_0: \mu_s = \mu_u \quad vs. \quad H_1: \mu_s \ne \mu_u \]
where \(\mu_s\) and \(\mu_u\) represent the mean amount of rainfall from the seeded clouds and the unseeded clouds, respectively.
Note that the null hypothesis states that the silver nitrate does not work, in other words, \(\mu_s = \mu_u\).
Load the data set in SAS
Let us load this data into SAS:
data cloud;
    input rainfall seeded $;
    lograin = log(rainfall);
    datalines;
/* paste your data here */
;
run;
lograin = log(rainfall) creates a new variable holding the natural logarithm of rainfall. Let us check the data set we made.
proc print data = cloud;
run;
You can sort the data by the seeded variable as follows:
proc sort data = cloud;
    by seeded;
run;
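One use of the sorted data is BY-group processing, which requires the data to be sorted by the BY variable. The following is a minimal sketch, assuming the cloud data set created and sorted above; it prints the sample size, mean, and standard deviation of rainfall and lograin separately for the seeded (S) and unseeded (U) clouds.

```sas
/* Summary statistics per group; relies on cloud being sorted by seeded. */
proc means data = cloud n mean std;
    by seeded;
    var rainfall lograin;
run;
```

Comparing the two standard deviations on the log scale gives a rough first look at whether an equal-variance assumption is plausible.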
Two-sample test in SAS
proc ttest will give you the beautiful plots and test results.
proc ttest data = cloud;
    class seeded;
    var lograin;
run;
Among the many tables produced by the above code, we need to focus on the following part:
Method  Variances  DF  t Value  Pr > |t|
Pooled  Equal      28  1.79     0.0841
Under the assumption that the two groups have equal variance, we cannot reject the null hypothesis at the 5% significance level, since the p-value of 0.0841 is larger than 0.05. In other words, we cannot say that the silver nitrate affects the amount of rainfall (measured on the log scale).
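For reference, the Pooled row corresponds to the standard equal-variance two-sample t statistic, sketched below. The symbols are notation introduced here rather than SAS output: \(\bar{y}_s, \bar{y}_u\) are the group means of lograin, \(s_s^2, s_u^2\) the group sample variances, \(n_s, n_u\) the group sizes, and \(s_p^2\) the pooled variance. The sign of \(t\) depends on the order in which the two groups are subtracted.

```latex
\[ s_p^2 = \frac{(n_s - 1)s_s^2 + (n_u - 1)s_u^2}{n_s + n_u - 2}, \qquad
t = \frac{\bar{y}_s - \bar{y}_u}{s_p \sqrt{\frac{1}{n_s} + \frac{1}{n_u}}}, \qquad
df = n_s + n_u - 2 \]
```

With 15 seeded and 15 unseeded clouds in the data set, \(df = 15 + 15 - 2 = 28\), which matches the DF column in the table.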