A. Omission of a Relevant Variable
Suppose that the true model is
but through some error in judgement we estimate the parameters of the model
That is, we incorrectly omit the kth independent variable. What are the
consequences of having omitted the kth variable? Define
The estimate from the incorrectly specified model is
To check for bias we will substitute in for y
Let us take a closer look at P. Begin with the product of the design matrix with the
last column omitted and the correct design matrix.
Substitute back into the estimator
Now consider the second term. Except for β_k, it looks like an OLS estimator resulting from
regressing x_k on the remaining independent variables. That is,
So the bias of the least squares estimator for the omitted variables model is
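In partitioned notation the steps above can be summarized as follows. The symbols X_1 (the design matrix with the last column omitted), β_1 (the coefficients on the retained variables) and δ (the coefficients from regressing x_k on the retained variables) are labels assumed here for illustration, and the regressors are treated as fixed with E(u) = 0.

```latex
\begin{aligned}
y &= X_1\beta_1 + x_k\beta_k + u \\
\hat\beta_1 &= (X_1'X_1)^{-1}X_1'y
   = \beta_1 + (X_1'X_1)^{-1}X_1'x_k\,\beta_k + (X_1'X_1)^{-1}X_1'u \\
E(\hat\beta_1) - \beta_1 &= \delta\,\beta_k ,
   \qquad \delta = (X_1'X_1)^{-1}X_1'x_k
\end{aligned}
```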
The LS estimator is biased except when either
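1. β_k = 0, in which case the omitted variable was not relevant after all.
2. The coefficient on x_i in the regression of x_k on the remaining variables is zero for all i. If this is the case then x_k is orthogonal to all the other columns of x.
3. Suppose x_k is uncorrelated with the other columns of x; then the slopes are unbiased, but the intercept is biased unless the mean of x_k is also zero.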
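The bias result can be checked numerically. The sketch below uses simulated data with made-up coefficient values (all names and numbers are assumptions chosen for illustration). It shows that the estimate from the short regression equals the estimate from the full regression plus the omitted coefficient times the auxiliary regression coefficients, which holds exactly in any sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: xk is correlated with the retained regressor x2.
x2 = rng.normal(size=n)
xk = 0.8 * x2 + rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x2, xk])
X_short = X_full[:, :2]                      # last column omitted
beta = np.array([1.0, 2.0, 3.0])             # assumed true coefficients
y = X_full @ beta + rng.normal(size=n)

def ols(X, y):
    """Least squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols(X_full, y)       # correctly specified model
b_short = ols(X_short, y)     # model with xk omitted
delta = ols(X_short, xk)      # regression of xk on the retained columns

# Short-regression estimate = full estimate + delta * (coefficient on xk).
implied = b_full[:2] + delta * b_full[2]
print("short regression:", b_short)
print("implied by bias formula:", implied)
```

With these settings the slope on x2 in the short regression lands near 2 + 0.8 × 3 = 4.4 rather than the true value of 2.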
B. Inclusion of an Irrelevant Variable
No problem, Mon. If an irrelevant variable is included, then in expectation it takes the value of zero and has no impact on the correctly included variables. Nor, in expectation, does it change the efficiency of least squares. However, we don't live in a perfect long-run world. Since including more variables can never increase the residual sum of squares, the estimate of the coefficient variance may come out smaller. In practice we may find ourselves rejecting too many null hypotheses.
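The claim about the residual sum of squares is easy to verify. The following sketch, on simulated data with assumed coefficient values, adds a regressor z that plays no part in generating y and compares the fit with and without it. The residual sum of squares cannot rise; the variance estimate RSS/(n - k) can move in either direction in a given sample because a degree of freedom is also lost.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = rng.normal(size=n)
z = rng.normal(size=n)                       # irrelevant: never enters y
y = 1.0 + 2.0 * x + rng.normal(size=n)       # assumed true model

def fit(X, y):
    """Return coefficients, residual sum of squares, and s^2 = RSS/(n-k)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    rss = float(resid @ resid)
    return b, rss, rss / (X.shape[0] - X.shape[1])

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, z])

b_small, rss_small, s2_small = fit(X_small, y)
b_big, rss_big, s2_big = fit(X_big, y)

print("RSS without z:", rss_small, " with z:", rss_big)
print("s^2 without z:", s2_small, " with z:", s2_big)
print("coefficient on z:", b_big[2])
```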
1. Functions Linear in Parameters
Almost every modern econometrics textbook presents material on nonlinear estimation. This seems to me to be much ado about nothing. To begin with, the small sample properties of these estimators are not well understood. At best we can say something asymptotically. With OLS we can say a great deal both in large samples and in small samples.
Secondly, the nonlinear estimation routines are computationally burdensome. They rely on a hill climbing algorithm that either minimizes the sum of squared errors or maximizes the likelihood function. Most software packages offer at least two algorithms for this. In the fine print they always remind the user to choose starting values carefully. There is usually a caveat that choice of functional form is important for avoiding local minima or maxima.
A third problem is that economic data is often very flat. That is, for the observed data, the ascent on the likelihood function may not be very steep. As a result the algorithm may wander around on the likelihood function until the number of iterations is used up. The final-round estimates may be quite different depending on the chosen starting values for the model parameters.
The final reason that non-linear estimation is a great deal of noise to little effect is that there are other useful alternatives. Not only are there alternatives, but in most cases we would be quite justified in using them. We know from the Weierstrass Approximation Theorem that any function that is continuous on a closed, bounded interval can be approximated, to any desired degree of accuracy, by a polynomial of sufficiently high degree.
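To illustrate the approximation argument, the sketch below fits least squares polynomials of increasing degree to a smooth nonlinear function on [0, 1] and records the largest absolute error. The target function is an arbitrary choice made for this example, and a least squares fit is only one convenient way to build the approximating polynomial; it is not the construction used in the proof of the theorem.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical nonlinear target on a closed interval.
x = np.linspace(0.0, 1.0, 400)
f = np.exp(-3.0 * x) * np.sin(6.0 * x)

def max_error(degree):
    """Largest absolute gap between f and its degree-d least squares fit."""
    p = Polynomial.fit(x, f, degree)
    return float(np.max(np.abs(p(x) - f)))

errors = {d: max_error(d) for d in (1, 3, 5, 9)}
for d, e in errors.items():
    print(f"degree {d}: max abs error {e:.2e}")
```

Each of these fits is linear in the parameters, so OLS applies without any iterative search.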
There are exceptions to these comments. The most notable one is in time series analysis. If ARIMA models are to be estimated by maximum likelihood then nonlinear routines are the only way to go.
y = α + β ln(x)
As a first example consider the above figure. This is just the semi-log function y = α + β ln(x).
ln(y) = α + β ln(x)
The above example is the log-log function. Whether the curve is concave or convex depends on the size of β. If there were another right hand side variable or two then the function would be the familiar Cobb-Douglas function. By simply changing the sign on β we can get a curve that looks like an indifference curve or an isoquant, as in the following figure. By altering the magnitude of β it is possible, in the context of indifference curves, to show a relatively stronger preference for one good or the other.
ln(y) = α - β ln(x)
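Because the log-log form is linear in its parameters, it can be estimated by OLS after taking logs of both variables. A minimal sketch on simulated data follows; the parameter values, sample size and error scale are assumptions chosen for the example, with a negative slope so the curve has the isoquant shape.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1.0, 10.0, size=n)

# Assumed true parameters of ln(y) = alpha + beta*ln(x) + u.
alpha, beta = 0.5, -0.7
y = np.exp(alpha + beta * np.log(x) + 0.05 * rng.normal(size=n))

# Regress ln(y) on a constant and ln(x).
X = np.column_stack([np.ones(n), np.log(x)])
a_hat, b_hat = np.linalg.lstsq(X, np.log(y), rcond=None)[0]

print("estimated intercept:", a_hat)
print("estimated elasticity:", b_hat)
```

The slope is the elasticity of y with respect to x. The semi-log form is handled the same way, with y rather than ln(y) on the left-hand side.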
If one uses the inverse of the independent variable then one gets a curve as in figure 4.
Figure 4 y = α + β/x
The curve is convex and approaches an asymptote from above.
This might be useful for estimating a money demand function. Elsewhere in this chapter you will encounter the Box-Cox transformation. This is the usual model for money demand. As you will see, it is an analytically cumbersome technique.
Figure 5 y = α - β/x
In figure 5 we just change the sign on the coefficient in the 'money demand' function. The result is that it is concave and approaches the asymptote from below.
In figure 6 we can produce a sigmoidal shape by proper choice of the parameters. For example, a production function might have this shape if over the relevant range of output there is not a negative marginal product. For most problems in economics this seems quite plausible.
Figure 6 ln(y) = a +
In Figure 7 we have a parabola.
Figure 7 y = α + βx + γx²
By suitable choice of parameters we could flip it over and have an average cost curve.
Finally, although the coefficients have been chosen to exaggerate the shape, a cubic function can be chosen to create a production function or cost function that is well behaved.
Figure 8 y = α + βx + γx² + δx³
The curves in figures 7 and 8 do present their own challenges at the time of
estimation. It is often found that when the independent variable is entered on the right
hand side in polynomial form the problem of multicollinearity rears its ugly head. This
can be overcome in part, for example, by estimating an average cost curve instead of a
total cost curve. So doing entails dropping, say, the cube term from the function. Another
solution is found in the use of orthogonal polynomials. See F.A. Graybill, An Introduction
to Linear Statistical Models, McGraw-Hill, 1961, pp. 172-182.
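The multicollinearity problem and the orthogonal polynomial remedy can be seen in a small numerical sketch. Here the orthogonal columns are built with a QR decomposition of the raw polynomial design matrix, which is one convenient construction and not necessarily the one Graybill describes. The data and coefficients are hypothetical.

```python
import numpy as np

x = np.linspace(1.0, 10.0, 50)

# Raw cubic design matrix: columns 1, x, x^2, x^3.
X_raw = np.vander(x, N=4, increasing=True)

# Orthonormal columns spanning the same space as the raw polynomial terms.
Q, R = np.linalg.qr(X_raw)

cond_raw = np.linalg.cond(X_raw)   # large: powers of x are nearly collinear
cond_orth = np.linalg.cond(Q)      # about 1: columns are orthonormal

rng = np.random.default_rng(3)
y = 5.0 + 2.0 * x - 0.5 * x**2 + 0.03 * x**3 + rng.normal(size=x.size)

# Both parameterizations give the same fitted values.
fit_raw = X_raw @ np.linalg.lstsq(X_raw, y, rcond=None)[0]
fit_orth = Q @ (Q.T @ y)

print("condition number, raw powers:", cond_raw)
print("condition number, orthogonal columns:", cond_orth)
```

The fitted curve is unchanged; only the coefficients are re-expressed on a basis whose columns do not compete with one another.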
2. The Box-Cox Transformation
We begin with the observation that if we define f(z) as follows
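Assuming the standard Box-Cox form with transformation parameter λ (the symbol is an assumption, since the notation is not fixed elsewhere in this section), the definition and its two familiar special cases can be sketched as:

```latex
\begin{aligned}
f(z) &= \frac{z^{\lambda}-1}{\lambda}, \qquad \lambda \neq 0, \\
f(z) &= z - 1 \qquad \text{when } \lambda = 1, \\
\lim_{\lambda \to 0} f(z) &= \lim_{\lambda \to 0} \frac{z^{\lambda}\ln z}{1} = \ln z .
\end{aligned}
```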
The last line is found by applying L'Hopital's Rule. Given the flexibility of this
transformation it is suggested that the classical linear regression model be written as
The least squares estimates are
One would proceed by any one of several methods. The easiest would be a Taylor Series
expansion of (i). Use the OLS estimates of β as a starting
point, then conduct a grid search over the transformation parameter λ.
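A grid search of this kind might be sketched as follows. Several choices here are assumptions and not taken from the text: only the dependent variable is transformed, the parameter is called lam, and each grid point is scored by the concentrated Gaussian log-likelihood, which includes the Jacobian term (lam - 1) Σ ln y so that fits at different values of lam are comparable. For a fixed lam the model is linear, so each step is just OLS.

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transform, using the log limit when lam is (numerically) zero."""
    return np.log(z) if abs(lam) < 1e-12 else (z**lam - 1.0) / lam

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1.0, 5.0, size=n)

# Hypothetical data: box_cox(y, 0.5) = 2 + 1.5*x + u.
true_lam = 0.5
t = 2.0 + 1.5 * x + 0.1 * rng.normal(size=n)
y = (true_lam * t + 1.0) ** (1.0 / true_lam)

X = np.column_stack([np.ones(n), x])
sum_log_y = np.log(y).sum()

def concentrated_loglik(lam):
    """OLS on the transformed y, then the log-likelihood up to a constant."""
    yt = box_cox(y, lam)
    b = np.linalg.lstsq(X, yt, rcond=None)[0]
    rss = float(np.sum((yt - X @ b) ** 2))
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * sum_log_y

grid = np.linspace(-1.0, 2.0, 61)            # step of 0.05
scores = [concentrated_loglik(lam) for lam in grid]
lam_hat = float(grid[int(np.argmax(scores))])

print("grid-search estimate of lam:", lam_hat)
```

A finer grid around the best value, or a one-dimensional optimizer, would refine the estimate.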
D. Dummy Variables and Splines
The models discussed in this section are very closely related to the earlier work on restrictions and tests of hypothesis.
1. Dummy Variables
Suppose we have earnings data on workers who can be divided as
We also have a continuous variable called x, perhaps age, so the full model is
Using OLS to estimate all the parameters we can then discuss the effects of membership
in particular groups. For example, suppose that we want to predict the earnings of a
non-union, uneducated male. The predicted earnings, conditional on x, are
If we have a union member, educated, single female then the predicted earnings are
The idea is that each group has its own independent intercept, but all have the same
slope on x.
We could use dummies to model different slopes for different groups. Consider an
example with just two groups.
We have the model with one RHS variable
If the individual is a male then we get
For a female
The female differs from the male in both intercept and slope. In this case, in which
all of the RHS variables receive the dummy variable treatment, we could apply OLS to the
individual subsets of data and get the same results. In an instance where some slope is
common then we want to apply OLS to the pooled data.
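The equivalence between the fully interacted dummy variable model and separate regressions on each group can be confirmed directly. The sketch below uses simulated earnings data; the group indicator, coefficient values and sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
female = rng.integers(0, 2, size=n).astype(float)   # 1 = female, 0 = male
x = rng.uniform(20.0, 60.0, size=n)                 # e.g. age

# Hypothetical model: females differ in both intercept and slope.
y = 10.0 + 0.5 * x + female * (-2.0 + 0.1 * x) + rng.normal(size=n)

# Pooled regression with a dummy and a dummy-by-x interaction.
X_pooled = np.column_stack([np.ones(n), x, female, female * x])
b = np.linalg.lstsq(X_pooled, y, rcond=None)[0]

def ols_subset(mask):
    """OLS of y on a constant and x using only the rows where mask is True."""
    Xs = np.column_stack([np.ones(mask.sum()), x[mask]])
    return np.linalg.lstsq(Xs, y[mask], rcond=None)[0]

b_male = ols_subset(female == 0)
b_female = ols_subset(female == 1)

print("male line from pooled model:  ", b[:2])
print("male line from subset:        ", b_male)
print("female line from pooled model:", b[:2] + b[2:])
print("female line from subset:      ", b_female)
```

The point estimates agree, although the pooled regression imposes a single error variance across groups, so standard errors need not match.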
Suppose we have a quarterly time series. Instead of setting up a dummy variable for
three of the four quarters as in
we get lazy and construct a variable that takes the following form
Then, blundering along, we estimate the three parameters of
The results by quarter are
The effect of creating this funny RHS variable, S, to account for the seasonal
differences is to make the quarter-to-quarter shift the same between any pair of quarters.
Is this plausible? You will also see this kind of careless construction in cross sections
in which firms in industry 1 have a variable with a value of 1, firms in industry 2 have a
variable with a value of 2 and so on.
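The restriction implied by the single variable S is visible in a simulation. In the sketch below the seasonal effects are deliberately unequal (the values are made up), and the model is estimated once with S = 1, 2, 3, 4 and once with three quarterly dummies.

```python
import numpy as np

rng = np.random.default_rng(6)
years = 10
quarter = np.tile(np.arange(1, 5), years)          # 1, 2, 3, 4, 1, 2, ...
n = quarter.size
x = rng.normal(size=n)

# Hypothetical, unequal seasonal shifts for quarters 1 to 4.
season_effect = np.array([0.0, 3.0, -1.0, 5.0])
y = 2.0 + 1.0 * x + season_effect[quarter - 1] + 0.5 * rng.normal(size=n)

def ols(X, y):
    """Return coefficients and residual sum of squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, float(np.sum((y - X @ b) ** 2))

# Lazy version: one regressor S taking the values 1, 2, 3, 4.
X_lazy = np.column_stack([np.ones(n), x, quarter])
b_lazy, rss_lazy = ols(X_lazy, y)
lazy_intercepts = b_lazy[0] + b_lazy[2] * np.arange(1, 5)

# Dummy version: separate shifts for quarters 2, 3 and 4.
D = (quarter[:, None] == np.array([2, 3, 4])).astype(float)
X_dum = np.column_stack([np.ones(n), x, D])
b_dum, rss_dum = ols(X_dum, y)
dummy_intercepts = b_dum[0] + np.concatenate([[0.0], b_dum[2:]])

print("quarterly intercepts, S variable:", lazy_intercepts)
print("quarterly intercepts, dummies:   ", dummy_intercepts)
```

The intercepts from the S specification are equally spaced by construction, whatever the data say, while the dummy specification recovers the uneven pattern.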
2. Splines
Splines have their origin in architecture and engineering. In those disciplines a fair
curve needs to be fitted to a set of points and a French curve just won't do the trick.
Instead they use a flexible rubber edge which can be bent to any curve. It is held in
place by weights called ducks. A point where the curve changes its shape is called a knot.
Return to the earlier example in which there were two groups of wage earners differing in slope and intercept, but the slope did not change as the worker aged. The estimated model would appear
But suppose, in a new study, we think that the rate of change of wages should depend on age. The model needs to be modified so that the slope coefficient differs over prespecified intervals. There is also the stipulation that the resulting curve be continuous at the points where there is a change in slope. The new model should look like
The function parameters to be estimated are
Wage = α_0 + β_0 Age    if Age < 18
Wage = α_1 + β_1 Age    if 18 ≤ Age < 22
Wage = α_2 + β_2 Age    if 22 ≤ Age
To implement this in a form that satisfies the continuity constraint and which is
amenable to least squares let us define
The full model will be
The three slopes are
In order to ensure that the segments meet at the join points we must impose the
This linear spline can be generalized in a couple of ways. First, the segments between the knots can be polynomials. Second, there can be more RHS variables with interactions. This has the effect of fitting planes with different gradients over the space spanned by the RHS variables.
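A linear spline with knots at ages 18 and 22 can be estimated by OLS once the right regressors are built. The sketch below assumes the usual construction: indicator variables d1 and d2 that switch on at each knot, multiplied by the distance past the knot. The variable names, coefficient values and data are hypothetical. Writing the model this way builds the continuity restrictions in, so the segments meet at the knots automatically.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
age = rng.uniform(14.0, 30.0, size=n)
k1, k2 = 18.0, 22.0                           # knots

d1 = (age >= k1).astype(float)
d2 = (age >= k2).astype(float)

# Hypothetical wage profile with slopes 0.2, 1.0 and 0.4 on the three segments.
wage = (3.0 + 0.2 * age + 0.8 * d1 * (age - k1)
        - 0.6 * d2 * (age - k2) + 0.3 * rng.normal(size=n))

# Regressors: constant, Age, d1*(Age - 18), d2*(Age - 22).
X = np.column_stack([np.ones(n), age, d1 * (age - k1), d2 * (age - k2)])
b = np.linalg.lstsq(X, wage, rcond=None)[0]

# Segment slopes are cumulative sums of the slope coefficients.
slopes = np.array([b[1], b[1] + b[2], b[1] + b[2] + b[3]])

def predict(a):
    """Fitted wage at age a; continuous at both knots by construction."""
    a = np.asarray(a, dtype=float)
    return (b[0] + b[1] * a + b[2] * np.maximum(a - k1, 0.0)
            + b[3] * np.maximum(a - k2, 0.0))

print("estimated slopes by segment:", slopes)
print("fitted wage at the knots:", predict([k1, k2]))
```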
Testing Non-nested Hypotheses
Suppose we have two alternative models that we are entertaining and wish to choose
between them on statistical grounds. That is,
The models differ in the set of right hand side variables.
Then the estimates of the error variances from the two specifications are
Suppose that model (1) is the truth. Then our estimate of the error variance of model
two can be rewritten as
Upon taking expectations the middle term will drop out.
We can use a trick from our earlier examination of the unbiasedness of the error
variance estimator. Namely, we will take the trace.
We can see that in expectation the estimated error variance of the incorrect model
exceeds the error variance of the correct model. The question is whether we can detect
this difference statistically. To do this we construct the statistic
What is the probability limit of this test statistic? By Slutsky's theorem we know that
the probability limit of a continuous function of a random variable is equal to the function of the
probability limit of the random variable.
From (3) we can see that the numerator and the denominator are equal in the limit, so
plim C12 = 0.
and we can use the test statistic
There is a serious drawback to this test. Namely, it is not symmetric. We began by stating that (1) was the correct model and constructed a test statistic on this basis. If we begin by stating that (2) is the correct model and construct the test statistic then it is quite possible that we could reach a different conclusion. This is not the only time we encounter a conflict of criteria. It happens in the use of the LM, LR and Wald tests in finite samples. We also noted that using R² as a rule for variable inclusion is a bad idea since order matters.
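The earlier claim that the incorrect model has the larger expected error variance estimate can be checked by simulation. The sketch below assumes model (1) is y = Xβ + u and model (2) uses a different regressor matrix Z, that both variance estimates divide by degrees of freedom, and that σ² = 1; the names, dimensions and parameter values are hypothetical. It does not compute the test statistic itself.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors of model (1)
Z = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors of model (2)
beta = np.array([1.0, 2.0])                             # assumed truth, model (1)
sigma2 = 1.0

def annihilator(A):
    """Residual-maker matrix M = I - A (A'A)^{-1} A'."""
    return np.eye(A.shape[0]) - A @ np.linalg.solve(A.T @ A, A.T)

M1, M2 = annihilator(X), annihilator(Z)
k1, k2 = X.shape[1], Z.shape[1]

# Extra term in E[s2^2] when (1) is true: beta' X' M2 X beta / (n - k2) >= 0.
extra = float(beta @ X.T @ M2 @ X @ beta) / (n - k2)

reps = 2000
s1 = np.empty(reps)
s2 = np.empty(reps)
for r in range(reps):
    y = X @ beta + np.sqrt(sigma2) * rng.normal(size=n)
    s1[r] = y @ M1 @ y / (n - k1)     # variance estimate, correct model
    s2[r] = y @ M2 @ y / (n - k2)     # variance estimate, incorrect model

print("mean s1^2:", s1.mean(), "(theory:", sigma2, ")")
print("mean s2^2:", s2.mean(), "(theory:", sigma2 + extra, ")")
```

Swapping the roles of X and Z in the data-generating line shows the asymmetry: whichever model is taken as the truth ends up with the smaller expected variance estimate.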