# Lab 5 - Simulation in SAS

## Class video

The class video is attached here so that you can watch my lecture again when you prepare for the exams.

- If you have questions about my lecture, please use **the comment section** at the bottom of this document.

## Creating a simulated dataset in SAS

The following code will create a simulated dataset with 1000 observations drawn from a normal distribution with mean \(\mu\) = 2 and standard deviation \(\sigma\) = 1. Note that, to generate a random sample from a normal distribution, the `rand()` function takes three arguments: the distribution name, the mean, and the standard deviation:

`rand("Normal", mu, sigma)`

Let us run the following code in SAS:

```
data symm(keep=y) ; * only keep y;
call streaminit(32542) ; * seed the generator used by rand();
do i = 1 to 1000 ;
y = rand("Normal", 2, 1);
output ;
end ;
run ;
```

To check the result, we don’t need to print the whole data set. Instead, we can use the `obs=` data set option in the print procedure to show only the first 10 observations.

```
proc print data = symm(obs = 10);
run;
```

## Proc means

We can use `proc means` to get various summary statistics in a more compact format than `proc univariate` provides. The default statistics provided are:

- n = number of observations
- mean
- std dev = standard deviation
- minimum
- maximum

```
proc means data = symm ;
var y ;
run ;
```

Output:

```
Analysis Variable : Y
N Mean Std Dev Minimum Maximum
--------------------------------------------------------------------
1000 2.0250454 0.9787607 -2.5489342 4.9129041
--------------------------------------------------------------------
```

## Drawing simple random samples from our population

We will use `proc surveyselect` to draw a simple random sample of size \(100\) from our “population” data `symm` of \(1000\) values. We can then use `proc means` to get summary statistics for our simple random sample.

```
proc surveyselect
data=symm out=sample_data method=SRS
sampsize=100 seed=1234;
run;
proc means data = sample_data ;
var y ;
run ;
```

Output:

```
N Mean Std Dev Minimum Maximum
--------------------------------------------------------------------
100 1.9783044 1.0317330 -0.1644005 4.2722121
--------------------------------------------------------------------
```

## Drawing several different samples and recording the sample statistics (mean and standard deviation) of each

Do the following:

Draw a simple random sample of size 10 from our simulated “population.” Use a different seed each time so you get different samples.

Calculate the sample mean and the sample standard deviation from the sample and record them in the table at the end of this lab worksheet.
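Rerunning `proc surveyselect` by hand with a new seed each time works, but the `reps=` option can also draw several independent samples in a single call. The following is a sketch of that alternative, not the procedure this lab prescribes; the dataset name `many_samples`, the seed, and the choice of 5 replicates are illustrative assumptions.

```sas
* Draw 5 independent simple random samples of size 10 in one call ;
proc surveyselect
data=symm out=many_samples method=SRS
sampsize=10 reps=5 seed=2468 ;
run ;

* The output data set is ordered by the Replicate variable added by reps ;
proc means data = many_samples n mean std ;
by Replicate ;
var y ;
run ;
```

Each `Replicate` value identifies one sample, so the output lists one mean and one standard deviation per sample, ready to be copied into the table.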

## Sample means from a skewed distribution

Use the code below to simulate a dataset from a skewed distribution.

`data skewed ; call streaminit(325) ; do i = 1 to 1000 ; y = rand("Gamma", 2, 1) ; output ; end ; run ;`

Use `proc univariate` to verify that you got a skewed distribution.

`proc univariate plot data = skewed ; var y ; run ;`
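As a complementary check, the sketch below draws a histogram with a fitted normal curve and prints the sample skewness. It is an optional variation under the assumption that the `skewed` dataset from the previous step exists, and it is not required by the lab.

```sas
* Histogram with a fitted normal curve to make the right skew visible ;
proc univariate data = skewed ;
var y ;
histogram y / normal ;
run ;

* Sample skewness and a mean versus median comparison ;
proc means data = skewed n mean median skewness ;
var y ;
run ;
```

For a gamma distribution with shape 2, the theoretical skewness is \(2/\sqrt{2} \approx 1.41\), so a clearly positive sample skewness and a mean larger than the median are what to expect.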

Draw a simple random sample of size 10 from the skewed population. Calculate the sample mean and the sample standard deviation from the sample and record them in the table at the end of this lab worksheet.
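This step can reuse the same pattern as for the symmetric population. A minimal sketch, assuming the `skewed` dataset above; the output name `skewed_sample` and the seed are hypothetical choices, and the seed should be changed for each new sample.

```sas
* Simple random sample of size 10 from the skewed population ;
proc surveyselect
data=skewed out=skewed_sample method=SRS
sampsize=10 seed=4321 ;
run ;

* Sample mean and standard deviation to record in the table ;
proc means data = skewed_sample n mean std ;
var y ;
run ;
```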