Lab 14 - Confidence interval in Regression

The class video is attached here so that you can watch the lecture again when you prepare for the exams.

  • If you have questions about my lecture, please use the comment section at the bottom of this document.

Confidence interval for mean response

Given a particular value \(x_0\) of the independent variable, we may want to construct a confidence interval for our prediction. Once the regression line is fitted, we use it to predict the dependent variable at \(x_0\) as follows:

\[ \hat{y}_0 = \hat{a} + \hat{b} x_0 \]

One fact we need to take for granted is the sampling distribution of \(\hat{y}_0\):

\[ \hat{y}_0 \sim \mathcal{N}\left(a + bx_0, \sigma^2 \left(\frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}\right)\right) \]

If you attended last week’s discussion section, this should look very familiar. There, we found that the estimator follows a normal distribution and replaced \(\sigma\) with the sample standard deviation, which led us to the t statistic. We will do the same thing here.

Replacing \(\sigma\)

We don’t know \(\sigma\), so we replace it with the following estimate: \[ s_{y|x}=\sqrt{\frac{1}{n-2}\sum\left(y_{i}-\hat{y}_{i}\right)^{2}} \] Thus, we have \[ S.E.(\hat{y}_0) = \hat{\sigma}_{\hat{y}_0} = s_{y|x} \sqrt{\frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}} \]

C.I. for the mean response prediction \(\hat{y}_0\)

Since we replaced \(\sigma\) with \(s_{y|x}\), the C.I. uses a t critical value instead of a normal one:

\[ \hat{y}_0 \pm t_{1-\alpha/2} S.E.(\hat{y}_0) \] where \(t_{1-\alpha/2}\) is the critical value of the t distribution with \(n-2\) degrees of freedom.
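
As a quick numerical illustration of this formula, take the values that the SAS output later in this lab reports at powerbt = 650: \(\hat{y}_0 = 40.8199\) and \(S.E.(\hat{y}_0) = 1.5450\). That output lists 33 observations with a recorded response, so the degrees of freedom are \(33 - 2 = 31\). The critical value \(t_{0.975,\,31} \approx 2.0395\) is taken from a standard t table and is not part of the SAS output shown.

\[ 40.8199 \pm 2.0395 \times 1.5450 = 40.8199 \pm 3.1510 \quad\Rightarrow\quad (37.6689,\ 43.9709) \]

This agrees with the 95% limits reported by SAS, \((37.6689, 43.9710)\), up to rounding.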

C.I. for the new observation \(y_{new}\)

Now our goal is a little different. We want to construct the C.I. for a new observation at a specific point on the x axis. This case should be distinguished from the previous one. The previous C.I. concerns our prediction of the mean response, which uses the fitted regression line. In this section, we want to predict a random observation \(y_{new}\) that occurs at \(x_0\).

  • Our prediction given \(x_0\) (mean response): we always use formula \(\hat{a} + \hat{b} x_0\).
  • Random observation \(y_{new}\): we know it is centered at \(a + b x_0\), but it scatters randomly around that center, so we don’t know where it will occur.

Therefore, it has a bigger standard error than the mean response:

\[ S.E.(\hat{y}_{new}) = \hat{\sigma}_{\hat{y}_{new}} = s_{y|x} \sqrt{1 + \frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}} \]

Note that this S.E. has the same structure as before, except for the added 1 under the square root.
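
The extra 1 can be traced to the noise in the new observation itself. A sketch, assuming the usual model \(y_{new} = a + b x_0 + \varepsilon_{new}\) with \(\varepsilon_{new} \sim \mathcal{N}(0, \sigma^2)\) independent of the data used to fit the line (the symbol \(\varepsilon_{new}\) is introduced here only for illustration):

\[ Var(y_{new} - \hat{y}_0) = Var(\varepsilon_{new}) + Var(\hat{y}_0) = \sigma^2 + \sigma^2 \left(\frac{1}{n}+\frac{(x_0 - \bar{x})^2}{\sum(x_i - \bar{x})^2}\right) \]

Replacing \(\sigma\) with \(s_{y|x}\) and taking the square root gives the standard error for the new observation, and the interval takes the same form as for the mean response:

\[ \hat{y}_0 \pm t_{1-\alpha/2} S.E.(\hat{y}_{new}) \]

with \(n-2\) degrees of freedom. This interval is commonly called a prediction interval, and at the same \(x_0\) it is always wider than the C.I. for the mean response.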

SAS example for C.I. of predictions

Load data in SAS

Let us use the same data as in our lecture. The Florida Department of Natural Resources reported data on the number of boat registrations in Florida and the number of manatees killed by boats for the years between 1977 and 2009.

The variables in the dataset are:

  • year
  • boats: number of powerboat registrations in 1000’s (read as powerbt in the SAS code below)
  • manatees: number of manatees killed by boats (read as killed in the SAS code below)
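
/* Point a fileref at the remote data file */
filename manatee url "https://homepage.divms.uiowa.edu/~kcowles/Datasets/manatees.dat";

/* Read three columns: year, powerboat registrations, manatees killed */
data manatee ;
  infile manatee ;
  input year powerbt killed ;
run ;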

Plotting

proc sgscatter data = manatee ;
title "Scatter plot of powerbt and killed";
  plot killed * powerbt /
    datalabel = year reg = (nogroup) grid;
run ;

Two options for C.I. with the new \(x_0\): clm, cli

The following SAS code fits the regression and reports the confidence intervals for the parameters, as we saw last week.

proc reg data = manatee ;
model killed = powerbt / clb ;
run ;

SAS provides two options, clm and cli, for the C.I. of the mean response and of the new observation described above.

  • Use the clm option to construct the C.I. for the mean response
proc reg data = manatee ;
model killed = powerbt / clm ;
run ;
  • Use the cli option to construct the C.I. for the new observation \(y_{new}\)
proc reg data = manatee ;
model killed = powerbt / cli ;
run ;
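
Both options can also be requested in a single MODEL statement, which puts the two intervals side by side for each observation. A minimal sketch, assuming the manatee data set created above; the alpha = option sets the level, and 0.05 (95% limits) is also what SAS uses when it is omitted:

```sas
proc reg data = manatee ;
  /* clm: limits for the mean response; cli: limits for a new observation */
  model killed = powerbt / clm cli alpha = 0.05 ;
run ;
```

For every row, the limits from cli should be wider than those from clm, reflecting the added 1 under the square root of the standard error.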

However, the output shows C.I.s only for the data points that are already in the data set. Thus, you need to add \(x_0\) to the data set first to get the confidence interval at that point.

Add data point for constructing C.I.

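For example, suppose we want to construct the C.I. at \(x_0 = 650\), that is, powerbt = 650. The following SAS code creates one data point with powerbt = 650 and a missing value (.) for killed. Because killed is missing, PROC REG does not use this row to fit the line, but it still reports a predicted value and interval for it.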

data Xvalues;
input powerbt killed;
datalines;
650 .
;

/* Append the new row to the original data */
data manatee;
  set manatee Xvalues;
run;

You can check the result with proc print data = manatee to confirm that one row has been added to the original data set manatee.
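
A short sketch of that check. The firstobs = data set option is used here only to skip to the last few rows, and the value 30 is an arbitrary choice:

```sas
/* Print the tail of the combined data set; the appended row should be last */
proc print data = manatee (firstobs = 30) ;
run ;
```

The appended row should show powerbt = 650 with missing values (.) for killed and also for year, since Xvalues does not define year.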

Now you can use the clm option to get the 95% C.I. for the mean response at powerbt = 650.

proc reg data = manatee ;
model killed = powerbt / clm alpha = 0.05;
run ;

Result:

                           The REG Procedure
                             Model: MODEL1
                      Dependent Variable: killed

                           Output Statistics

                                  Std
                                Error
      Dependent  Predicted       Mean
 Obs   Variable      Value    Predict       95% CL Mean       Residual

   1         13    14.5826     2.5893     9.3017    19.8635    -1.5826
  ...
  33         97    83.7303     2.3126    79.0138    88.4468    13.2697
  34          .    40.8199     1.5450    37.6689    43.9710          .

Observation 34 has the 95% C.I. for the mean response at powerbt = 650: \((37.6689, 43.9710)\).
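
The same appended row also yields the interval for a new observation at powerbt = 650 when cli is used instead of clm. The sketch below additionally uses the OUTPUT statement of PROC REG to save both sets of limits to a data set, so that only the appended row needs to be printed. The data set name pred and the variable names yhat, se_mean, lo_mean, hi_mean, lo_new, and hi_new are made up for this illustration.

```sas
proc reg data = manatee ;
  model killed = powerbt / cli alpha = 0.05 ;
  /* Save predictions and limits; the names after each = are illustrative */
  output out = pred
         p = yhat                        /* predicted value */
         stdp = se_mean                  /* S.E. of the mean response */
         lclm = lo_mean  uclm = hi_mean  /* limits for the mean response */
         lcl  = lo_new   ucl  = hi_new ; /* limits for a new observation */
run ;

/* Keep only the appended row, where the response is missing */
proc print data = pred ;
  where killed = . ;
run ;
```

The limits lo_new and hi_new should be wider than \((37.6689, 43.9710)\), because the standard error for a new observation includes the extra 1 under the square root.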
