## The Simple Regression Model

Consider the two variable linear model

$$y = \beta_0 + \beta_1 x + u \qquad (2.1)$$

Synonyms for y: dependent variable, explained variable, response variable, predicted variable, regressand.

Synonyms for x: independent variable, explanatory variable, control variable, predictor variable, regressor.

In what sense is the equation linear?

How do we interpret $\beta_0$ and $\beta_1$?

Suppose y represents the yield of bushels of corn per acre and x represents the amount of fertilizer applied per acre. What are the specific interpretations of $\beta_0$ and $\beta_1$?

Suppose y represents compensation and x represents years of education. Now how do we interpret the coefficients?

### Some Assumptions

2.5 Unconditional mean of the disturbance: $E(u) = 0$.

2.6 The conditional mean of the error: $E(u|x) = E(u) = 0$.

Implication of 2.5 and 2.6: Suppose y represents compensation and x represents years of education in model 2.1. We know that, in addition to education, one's compensation depends on ability. Ability is not on the RHS of 2.1, so its effect on compensation must be captured by the error term. The implication is that E(ability|education) should be constant across all levels of education; otherwise we have a violation of 2.5 and 2.6. Is E(ability|education) = k plausible?
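To see why the plausibility question matters, here is a small simulation (all parameter values are invented for illustration) in which ability is correlated with education, so E(ability|education) is not constant. The simple regression slope then overstates the true effect of education:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated "ability" is correlated with education, so E(ability | education)
# is NOT constant: a violation of the zero-conditional-mean assumption.
ability = rng.normal(size=n)
education = 12 + 2 * ability + rng.normal(size=n)   # abler people get more schooling
wage = 10 + 1.0 * education + 5 * ability + rng.normal(size=n)

# OLS slope from the simple regression of wage on education alone
slope = np.cov(education, wage)[0, 1] / np.var(education)

print(slope)  # well above the true education effect of 1.0
```

Because ability is buried in the error term and correlated with education, OLS attributes part of ability's payoff to schooling.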

Diagram

### Deriving the OLS Estimator

$$E(u) = 0$$

$$\operatorname{Cov}(u, x) = E(ux) = 0$$

Now substitute $u = y - \beta_0 - \beta_1 x$ into the two assumptions to get

$$E(y - \beta_0 - \beta_1 x) = 0$$

$$E\left[x(y - \beta_0 - \beta_1 x)\right] = 0$$

Using the principle of analogies, let's construct the sample analogs of these assumptions about the population model.

$$\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0 \qquad (2.14)$$

$$\frac{1}{n}\sum_{i=1}^{n} x_i\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0 \qquad (2.15)$$

Note that this analogy approach is tantamount to the method of moments, since the parameters of interest will turn out to be functions of the moments of the underlying random variable y. Place hats on top of the betas to indicate that we want to choose values for these unknowns so that the equalities hold. Solving (2.14) for the intercept in terms of the other expressions yields

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \qquad (2.17)$$

Take (2.17), put it into (2.15) and solve for our best guess for the unknown slope parameter.

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (2.19)$$

Equation (2.17) tells us that the sample regression line must go through the point of means in a graph of the function.

Equation (2.19) tells us that the slope is the ratio of the covariance between the independent and dependent variables to the variance of the independent variable.
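Both facts can be checked with a minimal sketch on simulated data (the data-generating values 2.0 and 0.5 are arbitrary choices for illustration, not anything from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)

# (2.19): slope = sample covariance of x and y over the sample variance of x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# (2.17): intercept chosen so the line passes through the point of means
b0 = y.mean() - b1 * x.mean()

# The fitted line evaluated at x-bar equals y-bar, exactly as (2.17) requires.
print(b0 + b1 * x.mean(), y.mean())
```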

Graph showing population regression function {E(Y|X=x)}, the sample regression line, observed values, fitted values and residuals.

Your text uses a data file called CEOSAL1 to illustrate this discussion. The scatter plot, with the sample regression line in red, is pictured below:

[Scatter plots of SALARY against ROE with the sample regression line: all the data; just 15 observations]

Just to be sure that I agree with the author of the book, I have included output from EViews.

Dependent Variable: SALARY
Method: Least Squares
Date: 02/09/11   Time: 08:34
Sample: 1 209
Included observations: 209

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|----------|-------------|------------|-------------|-------|
| C        | 963.1913    | 213.2403   | 4.516930    | 0.0000 |
| ROE      | 18.50119    | 11.12325   | 1.663290    | 0.0978 |

| Statistic          | Value     | Statistic             | Value    |
|--------------------|-----------|-----------------------|----------|
| R-squared          | 0.013189  | Mean dependent var    | 1281.120 |
| Adjusted R-squared | 0.008421  | S.D. dependent var    | 1372.345 |
| S.E. of regression | 1366.555  | Akaike info criterion | 17.28750 |
| Sum squared resid  | 3.87E+08  | Schwarz criterion     | 17.31948 |
| Log likelihood     | -1804.543 | Hannan-Quinn criter.  | 17.30043 |
| F-statistic        | 2.766532  | Durbin-Watson stat    | 2.104990 |
| Prob(F-statistic)  | 0.097768  |                       |          |

## Properties of OLS Statistics

The residual: $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$

First, a couple of definitions courtesy of some algebra:

$$\sum_{i=1}^{n}\hat{u}_i = 0 \qquad (2.30)$$

The sum of the residuals is zero by construction, as the sample analog of E(u) = 0; see (2.14).

$$\sum_{i=1}^{n} x_i\hat{u}_i = 0 \qquad (2.31)$$

The sample covariance between the regressor and the residuals is zero by construction, as the sample analog of Cov(u, x) = 0; see (2.15).
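Both algebraic properties can be verified numerically; the data below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5, 2, size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

# OLS slope and intercept from (2.19) and (2.17)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x

# (2.30): residuals sum to zero; (2.31): residuals are uncorrelated with x.
print(resid.sum(), np.sum(x * resid))  # both are zero up to rounding error
```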

Sums of Squares

Total sum of squares: $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$. Note that this is the numerator of the sample variance of y. Another interpretation: if we are prepared to disregard what we know about the relationship between x and y, then our best guess for y is the sample mean, and SST measures the variation around that guess.

Explained sum of squares: $SSE = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$. Note that this is the numerator of the sample variance of the fitted values for y.

Residual sum of squares: $SSR = \sum_{i=1}^{n}\hat{u}_i^2$. This is the amount of variation in y that is left over after we account for the association between x and y. It is the variation in y that is NOT explained by our model.

SST = SSE + SSR The variation in y must be accounted for by the part that we can explain and that which we cannot.
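The decomposition can be confirmed numerically (the simulated data below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(size=100)
y = 3.0 - 1.5 * x + rng.normal(scale=0.5, size=100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

sst = np.sum((y - y.mean()) ** 2)       # total variation in y
sse = np.sum((fitted - y.mean()) ** 2)  # variation explained by the model
ssr = np.sum(resid ** 2)                # unexplained variation

print(sst, sse + ssr)  # equal up to rounding error
```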

### Goodness of Fit

How good is our model? If we are considering two different specifications, or models, which should we choose?

The coefficient of determination

R2 = SSE/SST         The ratio of explained sum of squares to the total sum of squares.

R2 = 1 - SSR/SST    This representation comes from a little algebraic manipulation. If the model could explain y perfectly then the coefficient of determination ought to be 1. It seems reasonable then to subtract the fraction of the variation in y that we can't explain in order to find the coefficient of determination.

So far we have discussed only models with a single explanatory variable on the RHS of the regression function. Looking at SSE and giving it a little thought suggests that one can make it bigger by including more explanatory variables, even if they don't have much to do with explaining y. The consequence is that one can always increase the coefficient of determination by including more variables. Later in the semester we will talk about penalizing the R2 for including variables indiscriminately.
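A quick illustration of that point, using simulated data and a throwaway noise regressor (the fitting helper here is a generic least-squares sketch, not anything from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)

def r_squared(y, X):
    """R-squared from an OLS fit of y on the columns of X (intercept included)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

r2_one = r_squared(y, x.reshape(-1, 1))

# Add a regressor of pure noise, completely unrelated to y ...
noise = rng.normal(size=n)
r2_two = r_squared(y, np.column_stack([x, noise]))

print(r2_one, r2_two)  # r2_two >= r2_one despite the junk regressor
```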

## Units of Measurement and Functional Form

Centigrade versus Fahrenheit; inches or feet or yards? Linear or log?

Measurement

1. Change the scale of only the dependent variable by the constant k: both the intercept and the slope estimates are simply multiplied by k.

2. Change the explanatory variable by a multiplicative constant, but leave the dependent variable alone. If you multiply the observations on x by k, then the slope coefficient must change by the multiple (1/k). There is no impact on the intercept.

3. Suppose you add a constant to every observation on the dependent variable. What will be the impact on intercept and slope?

4. Suppose you add a constant to each observation on the independent variable. What will be the impact on the intercept and slope?
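All four rules can be checked directly; the data and the constants k, 5.0, and 3.0 below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, size=80)
y = 4.0 + 1.5 * x + rng.normal(size=80)

def ols(x, y):
    """Intercept and slope from (2.17) and (2.19)."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = ols(x, y)
k = 12.0

# 1. Rescale y by k: both coefficients are multiplied by k.
b0_a, b1_a = ols(x, k * y)
# 2. Rescale x by k: the slope is divided by k; the intercept is unchanged.
b0_b, b1_b = ols(k * x, y)
# 3. Add a constant to y: the intercept shifts by that constant; the slope is unchanged.
b0_c, b1_c = ols(x, y + 5.0)
# 4. Add a constant to x: the slope is unchanged; the intercept falls by slope * constant.
b0_d, b1_d = ols(x + 3.0, y)

print(b1_a / b1, b1_b * k / b1, b1_c / b1, b1_d / b1)
```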

Functional Form

As long as we preserve the linearity in the unknowns we can still use OLS. The computer isn't so smart after all.

Level - level -- A straight line

Level - log -- Here $y = \beta_0 + \beta_1 \log x$, so a 1% change in x changes y by approximately $\beta_1/100$ units. Depending on the sign of the slope coefficient this produces either a concave or a convex curve.

Log - level -- Here $\log y = \beta_0 + \beta_1 x$, so a one-unit change in x changes y by approximately $100\beta_1$ percent (a semi-elasticity).

Log - log -- A constant elasticity model: $\log y = \beta_0 + \beta_1 \log x$, where $\beta_1$ is the elasticity of y with respect to x.

Level - reciprocal: $y = \beta_0 + \beta_1(1/x)$, with either $\beta_1 > 0$ or $\beta_1 < 0$.
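The constant-elasticity claim for the log-log form can be illustrated by simulating power-law data (the elasticity 0.7 and the other parameter values are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(1, 100, size=500)
# Constant-elasticity data: y = 2 * x^0.7 * exp(noise)
y = 2.0 * x ** 0.7 * np.exp(rng.normal(scale=0.1, size=500))

# Regress log y on log x; the slope estimates the elasticity directly.
lx, ly = np.log(x), np.log(y)
b1 = np.sum((lx - lx.mean()) * (ly - ly.mean())) / np.sum((lx - lx.mean()) ** 2)

print(b1)  # close to the true elasticity of 0.7
```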

Can you fill in the table below?

| Model Name | Symbolic Model | Interpretation of slope | Marginal Effect (dY/dX) | Elasticity (X/Y)(dY/dX) |
|---|---|---|---|---|
| Level - Level | | | | |
| Level - Log | | | | |
| Log - Level | | | | |
| Log - Log | | | | |
| Level - Reciprocal | | | | |
| Quadratic | | | | |
| Level - Interaction | | | | |
| Log - Reciprocal | | | | |
| Log - Quadratic | | | | |
| Logistic | | | | |

### Estimating the error variance:

Our model of the process that generated the data is

$$y = \beta_0 + \beta_1 x + u$$

in which the intercept and slope are unknown and u is an unobservable random variable. With each sample we are able to estimate the slope and intercept, and we get a set of least squares residuals that serve as "guesses" for the unobservable values taken by u.

The OLS residual is computed from the data. The researcher never sees or observes the population error, since the population regression function is unknown to the researcher. Nevertheless, there is a connection between the error, u, and the least squares residual. A bit of reflection on equation (2.1) and the graph leads us to write

$$\hat{u}_i = y_i - \hat{y}_i$$

And then substitute away from the actual and fitted values for y:

$$\hat{u}_i = (\beta_0 + \beta_1 x_i + u_i) - (\hat\beta_0 + \hat\beta_1 x_i) = u_i - (\hat\beta_0 - \beta_0) - (\hat\beta_1 - \beta_1)x_i$$

From this you can see two things. First, the least squares residual is never exactly equal to the error; the difference depends on the estimation errors in the intercept and slope and is therefore sample dependent. Second, because the estimators are unbiased, the least squares residual is equal to the error on average, and the error is in turn equal to zero on average.

To estimate the population error variance we work by analogy once again. We would like the error variance to be independent of any particular realization of $x_i$. That is,

$$E(u_i^2 \mid x_i) = \operatorname{Var}(u_i \mid x_i) = \sigma^2$$

for all i.

By analogy we would then use

$$\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2$$

But note that in this computation we have used the data twice: once to estimate the intercept and slope, and again to construct the OLS residual series from those estimates. Therefore, while it qualifies as an estimator, there ought to be a penalty, in the form of lost degrees of freedom, for previously using the data to estimate the slope and intercept. The revised estimator is then

$$\hat{\sigma}^2_{LS} = \frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2$$

The subscript LS, which stands for least squares, is added to distinguish it from what we first proposed. As it happens, the estimator we first proposed is the maximum likelihood estimator of the error variance when the errors are normally distributed.
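A small Monte Carlo (simulated data, with the true error variance set to 4.0 for illustration) shows the penalty at work: dividing SSR by n underestimates the error variance on average, while dividing by n - 2 is right on average:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, sigma2 = 10, 20_000, 4.0
x = rng.uniform(0, 5, size=n)   # regressors held fixed across replications

mle, ls = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    ssr = np.sum((y - b0 - b1 * x) ** 2)
    mle.append(ssr / n)        # the first proposal (the ML estimator under normality)
    ls.append(ssr / (n - 2))   # the degrees-of-freedom corrected estimator

print(np.mean(mle), np.mean(ls))  # roughly 3.2 versus 4.0
```

With n = 10 the divide-by-n estimator is centered at (n - 2)/n times the truth, which is why the correction matters most in small samples.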

### Properties of the LS Estimators

The intercept and slope estimators are both unbiased. They are also consistent.

The LS error variance estimator is unbiased and consistent.

### Sampling Variances of the OLS Estimators for the Intercept and the Slope

Under the homoskedasticity assumption the sampling variances are

$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad \operatorname{Var}(\hat\beta_0) = \frac{\sigma^2 \, n^{-1}\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The standard errors of the estimators are just the square roots of these variances, with $\hat{\sigma}^2_{LS}$ replacing the unknown $\sigma^2$ in practice.
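The slope-variance formula $\operatorname{Var}(\hat\beta_1) = \sigma^2/\sum(x_i - \bar{x})^2$ can be checked against the spread of slope estimates across repeated simulated samples (all parameter values below are invented for the sketch):

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps, sigma = 25, 10_000, 1.5
x = rng.uniform(0, 10, size=n)   # regressors fixed in repeated samples
sxx = np.sum((x - x.mean()) ** 2)

# Theoretical sampling variance of the slope: sigma^2 / sum of squared deviations of x
var_b1_theory = sigma ** 2 / sxx

slopes = []
for _ in range(reps):
    y = 3.0 + 0.5 * x + rng.normal(scale=sigma, size=n)
    slopes.append(np.sum((x - x.mean()) * (y - y.mean())) / sxx)

print(np.var(slopes), var_b1_theory)  # the Monte Carlo variance matches the formula
```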