Introduction to Econometrics
Chapter 6 Notes: Further Issues
The Effect of Scaling the Data
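In Chapter 2 of the text, which dealt with simple regression, we saw how rescaling the independent or dependent variable changes the regression coefficients. The story does not change in multiple regression: multiplying y by a constant c multiplies every coefficient, intercept included, by c, while multiplying x_j by c divides the coefficient on x_j by c and leaves the other coefficients unchanged.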
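The rescaling rules can be checked numerically. The sketch below is a minimal illustration, assuming NumPy is available; the data are simulated, and the helper `ols` is a hypothetical name built on `np.linalg.lstsq`, not something from the text.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients of y on the columns of X (X already holds an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simulated, purely illustrative data: two regressors plus noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(50, 10, n)
x2 = rng.normal(5, 2, n)
y = 3.0 + 0.8 * x1 - 1.5 * x2 + rng.normal(0, 4, n)
ones = np.ones(n)

b_orig = ols(np.column_stack([ones, x1, x2]), y)

# Measure x1 in units 1,000 times larger (e.g. dollars -> thousands of dollars):
# the slope on x1 is multiplied by 1,000; the intercept and x2 slope are unchanged.
b_x1 = ols(np.column_stack([ones, x1 / 1000, x2]), y)

# Multiply the dependent variable by 100: every coefficient is multiplied by 100.
b_y = ols(np.column_stack([ones, x1, x2]), 100 * y)

print(b_orig, b_x1, b_y, sep="\n")
```

Only the units move; the fitted relationship between the variables is the same in all three regressions.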
The greater concern is whether rescaling the data affects the test statistics we use to test hypotheses. To make a long story short, it does not: the standard errors change in exactly the same proportion as the coefficients, so t statistics, F statistics, and their p-values are scale invariant.
Like the test statistics, the coefficient of determination ($R^2$) is scale invariant.
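To see the invariance of the t statistics and $R^2$, the sketch below rescales a regressor and the dependent variable at the same time and compares the results. It assumes NumPy, simulated data, and conventional (homoskedasticity-only) standard errors; `ols_summary` is a hypothetical helper written for this illustration.

```python
import numpy as np


def ols_summary(X, y):
    """Coefficients, homoskedasticity-only standard errors, t statistics and R-squared.

    X must already contain the intercept column, so X.shape[1] counts it.
    """
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)  # estimate of the error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, se, coef / se, r2


# Simulated, purely illustrative data.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(50, 10, n)
x2 = rng.normal(5, 2, n)
y = 3.0 + 0.8 * x1 - 1.5 * x2 + rng.normal(0, 4, n)
ones = np.ones(n)

_, se0, t0, r2_0 = ols_summary(np.column_stack([ones, x1, x2]), y)
# Divide x1 by 1,000 and multiply y by 100 simultaneously.
_, se1, t1, r2_1 = ols_summary(np.column_stack([ones, x1 / 1000, x2]), 100 * y)

print("standard errors:", se0, se1)  # scaled along with the coefficients
print("t statistics:   ", t0, t1)    # identical
print("R-squared:      ", r2_0, r2_1)  # identical
```

The standard errors absorb exactly the same scale factors as the coefficients, which is why each ratio coefficient/standard error is unchanged.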
We use standardized, or beta, coefficients when we want to compare the relative importance of different exogenous variables in the model. With the data in raw form, the size of a coefficient does not tell us whether its variable has a larger effect on the dependent variable than another variable does, because each coefficient depends on the units in which its variable is measured. The intuition is simple given what we already know about rescaling: you could make a variable look more or less important just by moving the decimal point in the original data. So, what to do about it?
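Let's start with the estimated regression model

$$y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \cdots + \hat\beta_k x_{ik} + \hat u_i$$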
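and then subtract from each variable its own mean. Because the OLS residuals average to zero, $\bar y = \hat\beta_0 + \hat\beta_1 \bar x_1 + \cdots + \hat\beta_k \bar x_k$, so this has no impact on the slope coefficients; it just re-centers the regression at the origin and thereby gets rid of the intercept:

$$y_i - \bar y = \hat\beta_1 (x_{i1} - \bar x_1) + \hat\beta_2 (x_{i2} - \bar x_2) + \cdots + \hat\beta_k (x_{ik} - \bar x_k) + \hat u_i$$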
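The claim that demeaning leaves the slopes unchanged can be checked directly. This is a minimal sketch on simulated data, assuming NumPy: it fits the raw regression with an intercept, then fits the demeaned regression without one, and compares slopes.

```python
import numpy as np

# Simulated, purely illustrative data.
rng = np.random.default_rng(1)
n = 150
x1 = rng.normal(10, 3, n)
x2 = rng.normal(-2, 1, n)
y = 4.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(0, 1, n)

# Regression on the raw data, with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
b_raw, *_ = np.linalg.lstsq(X, y, rcond=None)

# Subtract each variable's own mean and regress WITHOUT an intercept.
Xc = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])
yc = y - y.mean()
b_centered, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

print(b_raw[1:])   # slopes from the raw regression
print(b_centered)  # the same slopes from the demeaned regression

# The intercept is recoverable from the means: y_bar = b0 + b1*x1_bar + b2*x2_bar.
b0_from_means = y.mean() - b_centered @ np.array([x1.mean(), x2.mean()])
```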
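Now divide everything by $\hat\sigma_y$, the sample standard deviation of the dependent variable:

$$\frac{y_i - \bar y}{\hat\sigma_y} = \frac{\hat\beta_1}{\hat\sigma_y}(x_{i1} - \bar x_1) + \cdots + \frac{\hat\beta_k}{\hat\sigma_y}(x_{ik} - \bar x_k) + \frac{\hat u_i}{\hat\sigma_y}$$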
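Dividing the centered dependent variable by $\hat\sigma_y$ turns it into a set of z-scores for the original data on the endogenous variable. We know from grade school that multiplying anything by 1 leaves it unchanged, and we can write 1 any way we want. For each independent variable $x_j$, let's write 1 as the ratio of its sample standard deviation $\hat\sigma_j$ to itself:

$$1 = \frac{\hat\sigma_1}{\hat\sigma_1}, \quad \ldots, \quad 1 = \frac{\hat\sigma_k}{\hat\sigma_k}$$

Multiplying the $j$-th slope term by $\hat\sigma_j/\hat\sigma_j$ and regrouping gives

$$\frac{y_i - \bar y}{\hat\sigma_y} = \left(\frac{\hat\sigma_1}{\hat\sigma_y}\hat\beta_1\right)\frac{x_{i1} - \bar x_1}{\hat\sigma_1} + \cdots + \left(\frac{\hat\sigma_k}{\hat\sigma_y}\hat\beta_k\right)\frac{x_{ik} - \bar x_k}{\hat\sigma_k} + \frac{\hat u_i}{\hat\sigma_y}$$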
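In this new representation of the model, the data used for estimating the coefficients are all in z-score form. Writing $z_y$ for the z-score of $y$ and $z_j$ for the z-score of $x_j$, the model can be restated as

$$z_y = \hat b_1 z_1 + \hat b_2 z_2 + \cdots + \hat b_k z_k + \text{error}, \qquad \hat b_j = \frac{\hat\sigma_j}{\hat\sigma_y}\hat\beta_j$$

where $\hat b_j$, the original slope rescaled by the ratio of the standard deviation of $x_j$ to that of $y$, is the standardized, or beta, coefficient on $x_j$.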
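Both routes to the beta coefficients can be compared in code: applying the formula $\hat b_j = (\hat\sigma_j/\hat\sigma_y)\hat\beta_j$ to the raw OLS slopes, and regressing the z-score of y on the z-scores of the regressors with no intercept. This is a sketch under stated assumptions: NumPy, simulated data, and the sample standard deviation with `ddof=1` (the choice cancels in the ratio as long as it is used consistently). The helpers `sd` and `zscore` are hypothetical names for this illustration.

```python
import numpy as np


def sd(v):
    """Sample standard deviation (ddof=1)."""
    return v.std(ddof=1)


def zscore(v):
    """Center v at its mean and divide by its sample standard deviation."""
    return (v - v.mean()) / sd(v)


# Simulated, purely illustrative data with regressors on very different scales.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(100, 25, n)   # measured in "large" units
x2 = rng.normal(0.5, 0.1, n)  # measured in "small" units
y = 1.0 + 0.05 * x1 + 8.0 * x2 + rng.normal(0, 1, n)

# Raw OLS slopes depend on the units of each regressor.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Route 1: rescale each raw slope by sd(x_j) / sd(y).
b_formula = np.array([sd(x1) / sd(y) * beta_hat[1],
                      sd(x2) / sd(y) * beta_hat[2]])

# Route 2: regress z-scores on z-scores; no intercept is needed after centering.
Z = np.column_stack([zscore(x1), zscore(x2)])
b_z, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)

print("raw slopes: ", beta_hat[1:])
print("beta (formula):", b_formula)
print("beta (z-scores):", b_z)
```

The two routes agree because the OLS residuals are orthogonal to the centered regressors, so dividing the residuals by $\hat\sigma_y$ leaves them orthogonal to the z-scores.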