# Lab 14 - Confidence interval in Regression

The class video is attached here so that you can watch my lecture again when you prepare for the exams.

- If you have questions about my lecture, please use **the comment section** at the bottom of this document.

### Confidence interval for mean response

Given a particular value \(x_0\) of the independent variable, we may want to construct a confidence interval for our prediction. Once the regression line is fitted, we use it to predict the dependent variable at \(x_0\) as follows:

\[ \hat{y}_0 = \hat{a} + \hat{b} x_0 \]

One fact we need to take for granted is the following:

\[ \hat{y}_0 \sim \mathcal{N}\left(a + bx_0, \sigma^2 \left(\frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}\right)\right) \]

If you saw the previous week’s discussion section, this will sound very similar to what we did last week. We found that the estimator follows a normal distribution, and replaced \(\sigma\) with the sample standard deviation, which led us to the t-test statistic. Now, we will do the same thing.

#### Replacing \(\sigma\)

Since we don’t know \(\sigma\) here, we replace it with the following:

\[ s_{y|x}=\sqrt{\frac{1}{n-2}\sum\left(y_{i}-\hat{y}_{i}\right)^{2}} \]

Thus, we have

\[ S.E.(\hat{y}_0) = \hat{\sigma}_{\hat{y}_0} = s_{y|x} \sqrt{\frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}} \]

### C.I. for the mean response prediction \(\hat{y}_0\)

Since we replaced \(\sigma\) with \(s_{y|x}\), the C.I. uses a t critical value as follows:

\[
\hat{y}_0 \pm t_{1-\alpha/2} S.E.(\hat{y}_0)
\]
where the critical value \(t_{1-\alpha/2}\) comes from the t distribution with \(n-2\) degrees of freedom.
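The critical value can be looked up in SAS rather than in a table. The following is a minimal sketch, assuming \(n = 33\) (the number of years in the manatee data used later in this lab) and \(\alpha = 0.05\); the data set name `tcrit` and its variable names are made up for illustration.

```sas
/* Hypothetical helper step: t critical value for a 95% C.I. */
data tcrit;
  n     = 33;       /* assumed sample size (manatee data, 1977 to 2009) */
  alpha = 0.05;     /* 1 - confidence level */
  df    = n - 2;    /* degrees of freedom in simple linear regression */
  t     = quantile('T', 1 - alpha/2, df);  /* upper alpha/2 quantile */
run;

proc print data = tcrit noobs;
run;
```

With \(n = 33\) this gives \(t \approx 2.04\) on 31 degrees of freedom. For instance, the SAS output at the end of this lab reports \(\hat{y}_0 = 40.8199\) with standard error 1.5450 at powerbt = 650, so the interval is roughly \(40.8199 \pm 2.04 \times 1.5450\), which agrees with the reported \((37.6689, 43.9710)\) up to rounding.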

### C.I. for the new observation \(y_{new}\)

Now, our goal becomes a little different. We want to construct the C.I. for a new observation at a specific point on the x axis. We should distinguish this case from the previous one. The previous C.I. is about `our prediction`, which uses the fitted regression line. In this section, we want to predict a random observation \(y_{new}\) which occurs at \(x_0\).

- Our prediction given \(x_0\) (mean response): we always use formula \(\hat{a} + \hat{b} x_0\).
- Random observation \(y_{new}\): we know its center is \(a + b x_0\) but we don’t know where it will occur.

Therefore, it should have a bigger standard error than the mean response case, as follows:

\[ S.E.(\hat{y}_{new}) = \hat{\sigma}_{\hat{y}_{new}} = s_{y|x} \sqrt{1 + \frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}} \]

Note that the S.E. has the same structure as before, except for the added `1` inside the square root.
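To see what the extra term contributes, it may help to square the standard error. This is only a rearrangement of the two S.E. formulas given in this lab:

\[ S.E.(\hat{y}_{new})^2 = s_{y|x}^2 + s_{y|x}^2\left(\frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}\right) = s_{y|x}^2 + S.E.(\hat{y}_0)^2 \]

The first term reflects the scatter of a single observation around the line, and the second term reflects the uncertainty in the fitted line itself. The interval for a new observation is therefore always wider than the interval for the mean response at the same \(x_0\).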

### SAS example for C.I. of predictions

#### Load data in SAS

Let us use the same data as in our lecture. The Florida Department of Natural Resources reported data on the number of boat registrations in Florida and the number of manatees killed by boats for the years between 1977 and 2009.

The variables in the dataset are:

- year
- powerbt (boats): number of powerboat registrations in 1000’s
- killed (manatees): number of manatees killed by boats

```
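/* Point a fileref at the raw data file on the web */
filename manatee url "https://homepage.divms.uiowa.edu/~kcowles/Datasets/manatees.dat";
/* Read the three columns into the data set manatee */
data manatee ;
infile manatee ;
input year powerbt killed ;
run ;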
```

#### Plotting

```
proc sgscatter data = manatee ;
title "Scatter plot of powerbt and killed";
plot killed * powerbt /
datalabel = year reg = (nogroup) grid;
run ;
```

#### Two options for C.I. with the new \(x_0\): `clm`, `cli`

The following SAS code fits the regression and reports the confidence intervals for the parameters, as we saw last week.

```
proc reg data = manatee ;
model killed = powerbt / clb ;
run ;
```

SAS provides two options, `clm` and `cli`, for the C.I. of the mean response and of the new observation described above, respectively.

- Use the `clm` option to construct the C.I. for the mean response

```
proc reg data = manatee ;
model killed = powerbt / clm ;
run ;
```

- Use the `cli` option to construct the C.I. for the new obs. \(y_{new}\)

```
proc reg data = manatee ;
model killed = powerbt / cli ;
run ;
```

However, the result shows C.I.s only for the data points that are in the data set you have entered. Thus, you need to add \(x_0\) to the data set first to get the confidence interval for it.

#### Add data point for constructing C.I.

For example, let us assume that we want to construct the C.I. at the new point \(powerbt = 650\). The following SAS code adds one data point which has 650 for `powerbt` and a missing value for `killed`.

```
data Xvalues;
input powerbt killed;
datalines;
650 .
;
run;
/* Append the new point to the original data */
data manatee;
set manatee Xvalues;
run;
```

You can check the result with `proc print data = manatee` to see that one line has been added to the original data set `manatee`.
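As a quick check, the following sketch prints only the appended row. It assumes that no year in the original data has exactly 650 for `powerbt`; if one does, that row will be printed as well.

```sas
/* Print only the row(s) with powerbt = 650;              */
/* the appended row should show killed as missing (.)     */
proc print data = manatee ;
  where powerbt = 650 ;
run ;
```

Since `Xvalues` has no `year` variable, `year` will also be missing for the appended row.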

Now, you can use the `clm` option for the 95% C.I. of the mean response at powerbt = 650.

```
proc reg data = manatee ;
model killed = powerbt / clm alpha = 0.05;
run ;
```

Result:

```
The REG Procedure
Model: MODEL1
Dependent Variable: killed

Output Statistics

     Dependent  Predicted     Std Error
Obs   Variable      Value  Mean Predict     95% CL Mean      Residual
  1         13    14.5826        2.5893    9.3017   19.8635   -1.5826
...
 33         97    83.7303        2.3126   79.0138   88.4468   13.2697
 34          .    40.8199        1.5450   37.6689   43.9710         .
```

Observation 34 has the 95% C.I. for the mean response at powerbt = 650, \((37.6689, 43.9710)\).
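The same appended row also gives the interval for a new observation if `cli` is used in place of `clm`. The following is a minimal sketch, not part of the lecture code: it adds an `output` statement so that both kinds of limits are saved to a data set and only the row of interest is printed. The data set name `pred` and the variable names `yhat`, `se_mean`, `lo_mean`, `hi_mean`, `lo_new`, and `hi_new` are made up for illustration.

```sas
proc reg data = manatee ;
  model killed = powerbt / cli alpha = 0.05 ;
  /* Save fitted values and both sets of 95% limits (illustrative names) */
  output out = pred
         p    = yhat     stdp = se_mean
         lclm = lo_mean  uclm = hi_mean   /* limits for the mean response */
         lcl  = lo_new   ucl  = hi_new ;  /* limits for a new observation */
run ;
quit ;

/* Show only the appended point */
proc print data = pred ;
  where powerbt = 650 ;
  var powerbt yhat se_mean lo_mean hi_mean lo_new hi_new ;
run ;
```

The limits in `lo_new` and `hi_new` should be wider than those in `lo_mean` and `hi_mean`, because of the added `1` in the standard error for a new observation.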