A Multiple Regression Example

Multiple Regression

This is an example of multiple regression. It is taken from Ernst Berndt, The Practice of Econometrics: Classic and Contemporary, Addison-Wesley, Reading, MA, 1991. The specific problem is Exercise 3-4. The entirety of the chapter is worth reading. The object of the exercise is to replicate Marc Nerlove's classic work on scale economies ("Returns to Scale in Electricity Supply," Chapter 7 in C.F. Christ, ed., Measurement in Economics, Stanford, Calif., Stanford University Press, 1963, pp. 167-198).
To set the stage, consider the following presentation on factor price frontiers.

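The model being estimated is the Cobb-Douglas cost function

$$\ln C = \beta_0 + \beta_y \ln y + \beta_1 \ln p_1 + \beta_2 \ln p_2 + \beta_3 \ln p_3 + \varepsilon$$

where $C$ is total cost, $y$ is output (KWH), and $p_1$, $p_2$, $p_3$ are the prices of labor, capital and fuel (PL, PK and PF in the data).

We will impose the restriction that the cost frontier be homogeneous of degree 1 in prices, that is, $\beta_1 + \beta_2 + \beta_3 = 1$. Making this substitution and doing a bit of algebra yields:

$$\ln(C/p_3) = \beta_0 + \beta_y \ln y + \beta_1 \ln(p_1/p_3) + \beta_2 \ln(p_2/p_3) + \varepsilon$$

with the happy result that $\beta_y = 1/r$, where $r$ is the degree of returns to scale (the sum of the exponents in the underlying production function). Thus we can see if there are constant returns to scale by inverting the coefficient on output. A definitive conclusion would be contingent on a test of hypothesis.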

The code that follows is a LimDep program that reads a data file, estimates a regression model, and produces some descriptive statistics. The data file is called Nerlove and is available in the homework and data sets section of the course web. If you were so inclined, you could download the data and try this on your own. The LimDep program is illustrative of how you would use the data in most other software packages.

? Chapter 3 Exercise 4
READ ; NREC = 145 ; NVAR = 6; NAMES = 1 ; FILE=A:NERLOV $
? Part (a)
CREATE ; LNCP3 = LOG(COSTS/PF)
; LNP13 = LOG(PL/PF)
; LNP23 = LOG(PK/PF)
; LNKWH = LOG(KWH) $
? Part (b)
CRMODEL ; LHS = LNCP3 ; RHS = ONE,LNKWH,LNP13,LNP23
; RES = COSTRES $
? Part (e)
PLOT ; LHS = LNKWH ; RHS = COSTRES $
DSTATS ; RHS = LNKWH,COSTRES ; OUTPUT = 3 $
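
For readers without LimDep, the same steps (build the log-ratio variables, run the regression, save the residuals) can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, and because the layout of the Nerlove file is not documented here, the snippet runs on synthetic data generated from made-up coefficients rather than on the real data.

```python
import numpy as np

def make_regressors(costs, kwh, pl, pk, pf):
    """Mirror the CREATE command: everything is in logs, prices relative to fuel."""
    lncp3 = np.log(costs / pf)   # log of cost relative to the fuel price
    lnp13 = np.log(pl / pf)      # log of labor price relative to the fuel price
    lnp23 = np.log(pk / pf)      # log of capital price relative to the fuel price
    lnkwh = np.log(kwh)          # log of output
    X = np.column_stack([np.ones_like(lnkwh), lnkwh, lnp13, lnp23])
    return lncp3, X

def ols(y, X):
    """Least squares coefficients and residuals (the CRMODEL step)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

# Synthetic stand-in for the 145 firms (NOT the Nerlove data).
rng = np.random.default_rng(0)
n = 145
kwh = np.exp(rng.normal(6.5, 1.9, n))
pl = np.exp(rng.normal(0.7, 0.3, n))
pk = np.exp(rng.normal(5.0, 0.3, n))
pf = np.exp(rng.normal(3.0, 0.3, n))
true_b = np.array([-4.0, 0.7, 0.5, 0.1])   # assumed values, for illustration only
costs = np.exp(true_b[0] + true_b[1] * np.log(kwh)
               + true_b[2] * np.log(pl / pf) + true_b[3] * np.log(pk / pf)
               + np.log(pf) + rng.normal(0.0, 0.1, n))

y, X = make_regressors(costs, kwh, pl, pk, pf)
b, costres = ols(y, X)
print(b)
```

With the real data loaded into the five arrays, `b` would correspond to the Constant, LNKWH, LNP13 and LNP23 rows of the output, and `costres` to the saved residual COSTRES.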

The output from this regression model follows:

Ordinary least squares regression. Dep. Variable = LNCP3
Observations = 145                         Weights = ONE
Mean of LHS = -0.1484195E+01               Std.Dev of LHS = 0.1482087E+01
StdDev of residuals= 0.3917620E+00         Sum of squares = 0.2164032E+02
R-squared = 0.9315846E+00                  Adjusted R-squared= 0.9301290E+00
F[ 3, 141] = 0.6399802E+03
Log-likelihood = -0.6783837E+02            Restr.(β=0) Log-l = -0.2622948E+03
Amemiya Pr. Criter.= 0.9908741E+00         Akaike Info.Crit. = 0.1577113E+00
ANOVA
Source          Variation          Degrees of Freedom     Mean Square
Regression      0.2946676E+03           3.                  0.9822253E+02
Residual        0.2164032E+02         141.                  0.1534775E+00
Total           0.3163079E+03         144.                  0.2196583E+01
Durbin-Watson stat.= 1.0153695                  Autocorrelation = 0.4923152
Variable    Coefficient   Std. Error    t-ratio    Prob|t|>x  Mean of X    Std.Dev.of X
------------------------------------------------------------------------------- 
Constant     -4.6908      0.8849         -5.301     0.00000
LNKWH         0.72069     0.1744E-01     41.334     0.00000    6.5567        1.9128
LNP13         0.59291     0.2046          2.898     0.00435   -2.5372        0.33858
LNP23        -0.73811E-02 0.1907         -0.039     0.96919    1.9479        0.35980

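To test the hypothesis that there are constant returns to scale, that is, that the coefficient on LNKWH equals one, construct the usual test statistic

$$t = \frac{\hat\beta_y - 1}{\mathrm{se}(\hat\beta_y)} = \frac{0.72069 - 1}{0.01744} \approx -16.0$$

where $\hat\beta_y$ is the estimated coefficient on LNKWH. The coefficient is significantly less than one, so its reciprocal, 1/0.72069 = 1.39, is greater than one. Apparently there are increasing returns to scale.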
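
The arithmetic behind this test can be checked in a few lines of Python, using the coefficient and standard error printed for LNKWH in the output above. Because the printed standard error is rounded, the result agrees with the reported t only to about two decimal places. Reading returns to scale as the reciprocal of the output coefficient follows the text.

```python
b_y = 0.72069    # coefficient on LNKWH from the regression output
se_y = 0.01744   # its standard error (printed as 0.1744E-01)

# t-statistic for the null hypothesis that the output coefficient equals one
t_stat = (b_y - 1.0) / se_y

# implied returns to scale: the reciprocal of the output coefficient
returns_to_scale = 1.0 / b_y

print(round(t_stat, 2), round(returns_to_scale, 2))
```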

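There seems to be a small problem with these results. Each price coefficient equals the corresponding production-function exponent divided by returns to scale, and the output coefficient is the reciprocal of returns to scale, so the implied exponents are recovered by dividing the price coefficients by the output coefficient: a = (.59291/.72069) = .82 and c = (-.00738/.72069) = -.010. This latter estimate implies a negative marginal product for capital, clearly an undesirable result.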

MODEL COMMAND:
PLOT;LHS=LNKWH;RHS=COSTRES$

From the scattergram it appears that there is a u-shaped relationship between the residuals and output. Is this due to heteroscedasticity?

Also, go back and take a look at the Durbin-Watson statistic. Without getting into details, the DW statistic should be close to 2 when successive residuals are uncorrelated. It is conventionally applied to time series data. Nevertheless, the observed value of 1.02 is quite small. The data were arranged by firm size, so one might conclude that the DW serves as a measure of the correlation between the residuals of firms of similar size.
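
As a reminder of what the statistic measures, here is a minimal Python sketch of the Durbin-Watson calculation. The function name is hypothetical, and the residuals must be in the same order as the observations (here, by firm size). The last lines check the familiar approximation DW ≈ 2(1 - ρ) against the autocorrelation printed in the output.

```python
import numpy as np

def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

# Residuals that alternate in sign are negatively correlated, so DW is above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))

# The autocorrelation reported in the output implies the reported DW.
rho = 0.4923152
dw_from_rho = 2.0 * (1.0 - rho)
print(dw_from_rho)
```

Values well below 2, as here, indicate that neighboring residuals tend to share the same sign.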

A third possibility is that the scattergram and DW statistic suggest that the model has been misspecified.

It was Nerlove's supposition that the distinctive scattergram and low DW were caused by differing returns to scale for firms in different size classes.

MODEL COMMAND:
DSTATS;RHS=LNKWH,COSTRES;OUTPUT=3$ 
Descriptive Statistics 
Variable Mean        Std. Dev. Skew.  Kurt.  Minimum Maximum Cases
LNKWH    6.5567      1.9128    -0.958 3.627  0.6931   9.724  145
COSTRES -0.30348E-09 0.38766    1.247 7.564 -1.012    1.819  145

Covariance Matrix 
           1-LNKWH     2-COSTRES 
1-LNKWH    3.6588 
2-COSTRES  0.18204E-08 0.15028 

Correlation Matrix 
          1-LNKWH     2-COSTRES 
1-LNKWH   1.0000 
2-COSTRES 0.24550E-08 1.0000

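Although there is clearly a connection between the cost residuals and output, it is not linear. The correlation between the two variables is zero (up to rounding error) by construction: least squares residuals are orthogonal to every regressor in the model, so a correlation coefficient cannot reveal the u-shaped pattern seen in the scattergram.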
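
The point can be illustrated with a small simulation. The sketch below uses synthetic data, not the Nerlove data, and all names are hypothetical. A straight line is fitted to a relationship that is really quadratic: the residuals are uncorrelated with the regressor, yet strongly related to its square.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 200)
# The true relationship is curved, but only a straight line is fitted.
y = 1.0 + 0.5 * x + 0.2 * (x - 5.0) ** 2 + rng.normal(0.0, 0.5, 200)

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# Zero by construction: OLS residuals are orthogonal to the regressors.
corr_linear = np.corrcoef(x, resid)[0, 1]
# Far from zero: the residuals still carry the omitted curvature.
corr_curved = np.corrcoef((x - x.mean()) ** 2, resid)[0, 1]
print(corr_linear, corr_curved)
```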

To explore the possibility of returns to scale varying by firm size we will estimate a model for each of five size classes, then a model in which the price coefficients are constrained to be equal but the output coefficient and intercept can differ, and a single model that is quadratic in the output variable. The final model is another way to permit returns to scale to vary with firm size.

? Chapter 3 Exercise 5
READ ; NREC = 145 ; NVAR = 6
     ; NAMES = 1 ; FILE=A:NERLOV $
? Part (a)
CREATE ; LNCP3 = LOG(COSTS/PF)
       ; LNP13 = LOG(PL/PF)
       ; LNP23 = LOG(PK/PF)
       ; LNKWH = LOG(KWH) $
SAMPLE ; 1-29 $
CRMODEL ; LHS = LNCP3 ; RHS = ONE,LNKWH,LNP13,LNP23  $
SAMPLE ; 30-58 $
CRMODEL ; LHS = LNCP3 ; RHS = ONE,LNKWH,LNP13,LNP23  $
SAMPLE ; 59-87 $
CRMODEL ; LHS = LNCP3 ; RHS = ONE,LNKWH,LNP13,LNP23  $
SAMPLE ; 88-116 $
CRMODEL ; LHS = LNCP3 ; RHS = ONE,LNKWH,LNP13,LNP23  $
SAMPLE ; 117-145 $
CRMODEL ; LHS = LNCP3 ; RHS = ONE,LNKWH,LNP13,LNP23  $
? Part (c)
SAMPLE ; 1-145 $
CREATE ; D1 = ORDER <= 129 $
create ; if(order >= 201 & order <= 229) D2=1 $ 
CREATE ; if(order >= 301 & order <= 329) D3=1 $
CREATE ; if(order >= 401 & order <= 429) D4=1 $
CREATE ; D5 = ORDER >= 501 $
CREATE ; LNKWH1 = LNKWH*D1
       ; LNKWH2 = LNKWH*D2
       ; LNKWH3 = LNKWH*D3
       ; LNKWH4 = LNKWH*D4
       ; LNKWH5 = LNKWH*D5 $
CRMODEL ; LHS=LNCP3 ; RHS = D1,D2,D3,D4,D5,LNKWH1,LNKWH2,LNKWH3,
     LNKWH4,LNKWH5,LNP13,LNP23 $
? Part (f)
CREATE ; KWH2 = LNKWH*LNKWH $
CRMODEL ; LHS=LNCP3 ; RHS=ONE,LNKWH,KWH2,LNP13,LNP23 $
LIST ; ORDER,LNKWH $
 *-* LIMDEP *-* File created 09/02/93 / 05:13:37

Notice that a separate set of coefficients for each size class is obtained simply by resetting the sample before each regression, while the model with common price coefficients uses a set of dummy variables defined over the observation index ORDER. The output from this program is
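
The dummy-variable construction may be easier to see outside LimDep. The Python sketch below builds the twelve right-hand-side columns of the pooled model. It assumes the 145 firms are sorted by size into five classes of 29, as the SAMPLE commands imply. The function name is hypothetical and the inputs are random placeholders, not the real data.

```python
import numpy as np

n_classes, per_class = 5, 29
n = n_classes * per_class                              # 145 firms
group = np.repeat(np.arange(n_classes), per_class)     # size class 0..4 of each firm

# D1..D5: one indicator column per size class
D = (group[:, None] == np.arange(n_classes)).astype(float)

def pooled_design(lnkwh, lnp13, lnp23):
    """Columns: D1..D5, LNKWH1..LNKWH5, LNP13, LNP23 (same order as the RHS list)."""
    return np.column_stack([D, D * lnkwh[:, None], lnp13, lnp23])

# Placeholder inputs, used only to show the shape of the design matrix.
rng = np.random.default_rng(2)
X = pooled_design(rng.normal(6.5, 1.9, n),
                  rng.normal(-2.5, 0.3, n),
                  rng.normal(1.9, 0.4, n))
print(X.shape)
```

Each class gets its own intercept and output slope, while the two price columns are shared by all classes. No separate constant is included, because the five dummies already sum to one.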

Sample set to -> 1-29

MODEL COMMAND:
CRMODEL;LHS=LNCP3;RHS=ONE,LNKWH,LNP13,LNP23$ 

Ordinary least squares regression. Dep. Variable = LNCP3
Observations = 29                       Weights = ONE
Mean of LHS = -0.3713174E+01            Std.Dev of LHS = 0.8075507E+00
StdDev of residuals= 0.5961617E+00      Sum of squares = 0.8885220E+01
R-squared = 0.5134018E+00               Adjusted R-squared= 0.4550100E+00
F[ 3, 25] = 0.8792363E+01
Log-likelihood = -0.2399707E+02         Restr.(β=0) Log-l = -0.3444166E+02
Amemiya Pr. Criter.= 0.1930833E+01      Akaike Info.Crit. = 0.4044307E+00
ANOVA
Source     Variation    Degrees of Freedom     Mean Square
Regression 0.9374649E+01         3.            0.3124883E+01
Residual   0.8885220E+01        25.            0.3554088E+00
Total      0.1825987E+02        28.            0.6521382E+00
Durbin-Watson stat.= 1.7031035           Autocorrelation = 0.1484483
Variable Coefficient   Std. Error    t-ratio Prob|t|>x Mean of X Std.Dev.of X
-------------------------------------------------------------------------------
Constant    -3.3433      3.146       -1.063   0.29801
LNKWH        0.40029     0.8445E-01   4.740   0.00007   3.5015    1.3526
LNP13        0.61517     0.7293       0.843   0.40696  -2.6295    0.35401
LNP23       -0.81356E-01 0.7064      -0.115   0.90923   1.8908    0.36725
Sample set to -> 30-58

MODEL COMMAND:
CRMODEL;LHS=LNCP3;RHS=ONE,LNKWH,LNP13,LNP23$ 

Ordinary least squares regression. Dep. Variable = LNCP3
Observations = 29                     Weights = ONE
Mean of LHS = -0.2149437E+01          Std.Dev of LHS = 0.3780031E+00
StdDev of residuals= 0.2424091E+00    Sum of squares = 0.1469054E+01
R-squared = 0.6328116E+00             Adjusted R-squared= 0.5887490E+00
F[ 3, 25] = 0.1436165E+02
Log-likelihood = 0.2099601E+01        Restr.(β=0) Log-l = -0.1242766E+02
Amemiya Pr. Criter.= 0.1310620E+00    Akaike Info.Crit. = 0.6686728E-01
ANOVA
Source       Variation     Degrees of Freedom Mean Square
Regression   0.2531764E+01    3.              0.8439214E+00
Residual     0.1469054E+01   25.              0.5876216E-01
Total        0.4000818E+01   28.              0.1428864E+00
Durbin-Watson stat.= 1.8593375 Autocorrelation = 0.0703312
Variable Coefficient Std. Error t-ratio Prob|t|≥x Mean of X Std.Dev.of X
-------------------------------------------------------------------------------
Constant -6.4890      1.413   -4.593    0.00011
LNKWH     0.65815     0.1163   5.659    0.00001    5.8950    0.39491
LNP13     0.93800E-01 0.2743   0.342    0.73523   -2.6112    0.32533
LNP23     0.37794     0.2765   1.367    0.18388    1.8645    0.32247
Sample set to -> 59-87

MODEL COMMAND:
CRMODEL;LHS=LNCP3;RHS=ONE,LNKWH,LNP13,LNP23$ 

Ordinary least squares regression. Dep. Variable = LNCP3
Observations = 29                            Weights = ONE
Mean of LHS = -0.1406563E+01                 Std.Dev of LHS = 0.2863842E+00
StdDev of residuals= 0.1980064E+00           Sum of squares = 0.9801633E+00
R-squared = 0.5731824E+00                    Adjusted R-squared= 0.5219642E+00
F[ 3, 25] = 0.1119101E+02
Log-likelihood = 0.7967096E+01               Restr.(β=0) Log-l = -0.4378182E+01
Amemiya Pr. Criter.= -0.2735928E+00          Akaike Info.Crit. = 0.4461433E-01
ANOVA
Source       Variation    Degrees of Freedom Mean Square
Regression   0.1316282E+01    3.             0.4387606E+00
Residual     0.9801633E+00   25.             0.3920653E-01
Total        0.2296445E+01   28.             0.8201589E-01
Durbin-Watson stat.= 1.9969248                Autocorrelation = 0.0015376
Variable Coefficient Std. Error t-ratio Prob|t|≥x Mean of X Std.Dev.of X
-------------------------------------------------------------------------------
Constant -7.3329      1.689     -4.342   0.00021
LNKWH     0.93828     0.1980     4.740   0.00007    6.9605   0.19700
LNP13     0.40226     0.1994     2.017   0.05456   -2.6442   0.25783
LNP23     0.25001     0.1870     1.337   0.19334    1.8367   0.28288

Sample set to -> 88-116

MODEL COMMAND:
CRMODEL;LHS=LNCP3;RHS=ONE,LNKWH,LNP13,LNP23$ 

Ordinary least squares regression. Dep. Variable = LNCP3
Observations = 29                              Weights = ONE
Mean of LHS = -0.4833377E+00                   Std.Dev of LHS = 0.3192165E+00
StdDev of residuals= 0.1205973E+00             Sum of squares = 0.3635928E+00
R-squared = 0.8725656E+00                      Adjusted R-squared= 0.8572735E+00
F[ 3, 25] = 0.5705979E+02
Log-likelihood = 0.2234652E+02                 Restr.(β=0) Log-l = -0.7525705E+01
Amemiya Pr. Criter.= -0.1265277E+01            Akaike Info.Crit. = 0.1654974E-01
ANOVA
Source       Variation     Degrees of Freedom Mean Square
Regression   0.2489583E+01      3.             0.8298611E+00
Residual     0.3635928E+00     25.             0.1454371E-01
Total        0.2853176E+01     28.             0.1018992E+00
Durbin-Watson stat.= 2.0286206                   Autocorrelation = -0.0143103
Variable Coefficient Std. Error t-ratio Prob|t|>x Mean of X Std.Dev.of X
-------------------------------------------------------------------------------
Constant   -6.5460      1.165   -5.620   0.00001
LNKWH       0.91204     0.1075   8.485   0.00000   7.6855    0.21247
LNP13       0.50696     0.1875   2.704   0.01215  -2.2746    0.38257
LNP23       0.93352E-01 0.1641   0.569   0.57453   2.2106    0.43737

Sample set to -> 117-145

MODEL COMMAND:
CRMODEL;LHS=LNCP3;RHS=ONE,LNKWH,LNP13,LNP23$ 

Ordinary least squares regression. Dep. Variable = LNCP3
Observations = 29                         Weights = ONE
Mean of LHS = 0.3315352E+00               Std.Dev of LHS = 0.5052502E+00
StdDev of residuals= 0.1502536E+00        Sum of squares = 0.5644039E+00
R-squared = 0.9210378E+00                 Adjusted R-squared= 0.9115624E+00
F[ 3, 25] = 0.9720246E+02
Log-likelihood = 0.1597036E+02            Restr.(β=0) Log-l = -0.2084205E+02
Amemiya Pr. Criter.= -0.8255418E+00       Akaike Info.Crit. = 0.2569011E-01
ANOVA
Source       Variation     Degrees of Freedom Mean Square
Regression   0.6583374E+01  3.                0.2194458E+01
Residual     0.5644039E+00 25.                0.2257616E-01
Total        0.7147778E+01 28.                0.2552778E+00
Durbin-Watson stat.= 1.9237334               Autocorrelation = 0.0381333
Variable Coefficient Std. Error t-ratio Prob|t|≥x Mean of X Std.Dev.of X
-------------------------------------------------------------------------------
Constant -6.7143      1.046      -6.417 0.00000
LNKWH     1.0444      0.6498E-01 16.072 0.00000   8.7408     0.44372
LNP13     0.60259     0.1973      3.054 0.00530  -2.5264     0.21829
LNP23    -0.28944     0.1749     -1.655 0.11039   1.9370     0.24615

Sample set to -> 1-145

MODEL COMMAND:
CRMODEL;LHS=LNCP3;RHS=D1,D2,D3,D4,D5,LNKWH1,LNKWH2,LNKWH3,
LNKWH4,LNKWH5,LNP13,LNP23$

Ordinary least squares regression.            Dep. Variable = LNCP3
Observations = 145                            Weights = ONE
Mean of LHS = -0.1484195E+01                  Std.Dev of LHS = 0.1482087E+01
StdDev of residuals= 0.3075163E+00            Sum of squares = 0.1257731E+02
R-squared = 0.9602371E+00                     Adjusted R-squared= 0.9569485E+00
F[ 11, 133] = 0.2919843E+03
Log-likelihood = -0.2849526E+02               Restr.(β=0) Log-l = -0.2622948E+03
Amemiya Pr. Criter.= 0.5585553E+00            Akaike Info.Crit. = 0.1023924E+00
ANOVA
Source        Variation      Degrees of Freedom        Mean Square
Regression    0.3037306E+03           11.                0.2761187E+02
Residual      0.1257731E+02          133.                0.9456627E-01
Total         0.3163079E+03          144.                0.2196583E+01
Durbin-Watson stat.= 1.7426507                Autocorrelation = 0.1286746
Variable Coefficient Std. Error t-ratio Prob|t|≥x Mean of X Std.Dev.of X
-------------------------------------------------------------------------------
D1         -4.1798      0.7022     -5.952    0.00000   0.20000    0.40139
D2         -5.0524      1.125      -4.491    0.00002   0.20000    0.40139
D3         -6.6301      2.237      -2.963    0.00361   0.20000    0.40139
D4         -6.7286      2.225      -3.024    0.00299   0.20000    0.40139
D5         -8.0834      1.380      -5.857    0.00000   0.20000    0.40139
LNKWH1      0.39688     0.4307E-01  9.214    0.00000   0.70030    1.5268
LNKWH2      0.64816     0.1472      4.402    0.00002   1.1790     2.3726
LNKWH3      0.88478     0.2973      2.976    0.00347   1.3921     2.7952
LNKWH4      0.90874     0.2737      3.321    0.00116   1.5371     3.0863
LNKWH5      1.0627      0.1313      8.091    0.00000   1.7482     3.5139
LNP13       0.42561     0.1632      2.608    0.01014  -2.5372     0.33858
LNP23       0.10373     0.1522      0.681    0.49675   1.9479     0.35980

MODEL COMMAND:
CRMODEL;LHS=LNCP3;RHS=ONE,LNKWH,KWH2,LNP13,LNP23$ 

Ordinary least squares regression.             Dep. Variable = LNCP3
Observations = 145                             Weights = ONE
Mean of LHS = -0.1484195E+01                   Std.Dev of LHS = 0.1482087E+01
StdDev of residuals= 0.3076119E+00             Sum of squares = 0.1324751E+02
R-squared = 0.9581183E+00                      Adjusted R-squared= 0.9569217E+00
F[ 4, 140] = 0.8006872E+03
Log-likelihood = -0.3225910E+02                Restr.(β=0) Log-l = -0.2622948E+03
Amemiya Pr. Criter.= 0.5139186E+00             Akaike Info.Crit. = 0.9788802E-01
ANOVA
Source         Variation     Degrees of Freedom      Mean Square
Regression     0.3030604E+03         4.              0.7576510E+02
Residual       0.1324751E+02       140.              0.9462508E-01
Total          0.3163079E+03       144.              0.2196583E+01
Durbin-Watson stat.= 1.6652595                 Autocorrelation = 0.1673702
Variable Coefficient Std. Error   t-ratio Prob|t|>x Mean of X Std.Dev.of X
-------------------------------------------------------------------------------
Constant -3.7646       0.7017     -5.365   0.00000
LNKWH     0.15255      0.6186E-01  2.466   0.01487    6.5567    1.9128
KWH2      0.50514E-01  0.5364E-02  9.418   0.00000   46.623    22.104
LNP13     0.48059      0.1611      2.984   0.00336   -2.5372    0.33858
LNP23     0.74166E-01  0.1500      0.494   0.62181    1.9479    0.35980
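
In the quadratic model the elasticity of cost with respect to output is no longer a single number: differentiating the fitted equation with respect to LNKWH gives (coefficient on LNKWH) + 2 × (coefficient on KWH2) × LNKWH. The Python sketch below evaluates this at three output levels taken from the "Mean of X" columns above. Reading returns to scale as the reciprocal of this elasticity carries over the interpretation used for the log-linear model and is an assumption here. The function names are hypothetical.

```python
b_y, b_yy = 0.15255, 0.050514    # coefficients on LNKWH and KWH2

def cost_elasticity(lnkwh):
    """Derivative of fitted log cost with respect to log output."""
    return b_y + 2.0 * b_yy * lnkwh

def returns_to_scale(lnkwh):
    """Reciprocal of the cost elasticity (assumed interpretation)."""
    return 1.0 / cost_elasticity(lnkwh)

# Mean LNKWH of the smallest class, the full sample, and the largest class
for lnkwh in (3.5015, 6.5567, 8.7408):
    print(lnkwh, round(cost_elasticity(lnkwh), 3), round(returns_to_scale(lnkwh), 3))
```

On these estimates the elasticity rises with output, so the implied scale economies are large for small firms and are roughly exhausted for the largest ones.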

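We are now prepared to test some hypotheses. Remember that the generic F-statistic is

$$F = \frac{(RSS_R - RSS_U)/J}{RSS_U/(n-K)}$$

where $RSS_R$ and $RSS_U$ are the residual sums of squares from the restricted and unrestricted models, $J$ is the number of restrictions, $n$ is the number of observations and $K$ is the number of coefficients in the unrestricted model.

The trick is identifying or producing the requisite sums of squares. Sometimes it is easier to use

$$F = \frac{(Rb - q)'\,[R\,(X'X)^{-1}R']^{-1}\,(Rb - q)/J}{s^2}$$

which needs only the unrestricted estimates $b$, the estimated error variance $s^2 = RSS_U/(n-K)$, and the restrictions written as $R\beta = q$.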

We'll first compute a few sums of squares. The residual sum of squares when it is assumed that the firms are from a single homogeneous population is RSS₁=21.6403. When the equality restrictions on the intercepts and output coefficients are relaxed, the residual sum of squares is RSS₂=12.5773. If intercepts, output coefficients and input price coefficients are allowed to be different across all five size classes, then the residual sum of squares is the sum over the five separate regressions, RSS₃=(.5644+.3636+.9802+1.4690+8.8852)=12.2624.

To test the hypothesis that the respective coefficients are equal across all size classes against the alternative that each class has a different set of coefficients one computes F=((21.6403-12.2624)/16)/(12.2624/(145-20))=5.97. The critical F at the 5% level is about 1.7, so we reject the null of equal coefficients.

Next, test the null hypothesis that the respective coefficients are equal across all size classes against the alternative that the price coefficients are equal but the intercepts and output coefficients are not. The observed statistic is F=((21.6403-12.5773)/8)/(12.5773/(145-12))=11.98, which exceeds the 5% critical value of about 2.0, so we reject the null.

Next, test the null that the respective price coefficients are equal against the alternative that the coefficients are unique across size classes. Under both hypotheses the intercepts and output coefficients are allowed to differ. The observed statistic is F=((12.5773-12.2624)/8)/(12.2624/(145-20))=.40. We cannot reject the null. The preferred model is one in which the price coefficients are constrained to be equal, but the output coefficients and intercepts are not equal across size classes.
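
These tests all follow the same recipe, which can be wrapped in a small helper. The Python function below is a sketch with a hypothetical name. The example reproduces the test of the single pooled model against the model with class-specific intercepts and output coefficients, using the sums of squares and degrees of freedom quoted above.

```python
def f_stat(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F-statistic built from restricted and unrestricted residual sums of squares."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator

# Restricted: one set of 4 coefficients for all 145 firms (RSS = 21.6403).
# Unrestricted: 12 coefficients, common price coefficients only (RSS = 12.5773).
f_common_prices = f_stat(21.6403, 12.5773, n_restrictions=12 - 4,
                         df_unrestricted=145 - 12)
print(round(f_common_prices, 2))
```

The number of restrictions is always the difference in the number of estimated coefficients between the two models, and the denominator degrees of freedom come from the unrestricted model.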

In order to test the last hypothesis, that the intercepts and price coefficients are equal across size classes but returns to scale differ by firm size, we will use the second construction of the F-test shown above to test the joint restriction that the coefficient on the output variable is one and the coefficient on the output-squared term is zero.

This is an extremely large F, so we reject the null that returns to scale are constant. Indeed, just looking at the t-statistics in the last set of results should have been enough to know that the coefficient on the output-squared term is not zero. Since the F is constructed in MathView, you can change the values on the output and output-squared terms in b (the second and third elements) to see just how different things would have to have been in order to fail to reject the null.
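
The second construction of the F-test needs the full matrix (X'X)⁻¹, which the printed output does not show, so it cannot be reproduced from the tables alone. The Python sketch below therefore uses synthetic data with made-up coefficients (all names are hypothetical) to show the mechanics for the same pair of restrictions, and confirms that it gives the same number as the sums-of-squares construction.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients and residual sum of squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b, float(resid @ resid)

def wald_f(X, y, R, q):
    """F = (Rb - q)' [R (X'X)^-1 R']^-1 (Rb - q) / (J s^2)."""
    n, k = X.shape
    b, rss = ols(X, y)
    s2 = rss / (n - k)
    XtX_inv = np.linalg.inv(X.T @ X)
    d = R @ b - q
    J = R.shape[0]
    return float(d @ np.linalg.solve(R @ XtX_inv @ R.T, d)) / (J * s2)

# Synthetic stand-in for the quadratic model (NOT the Nerlove data).
rng = np.random.default_rng(3)
n = 145
lnkwh = rng.normal(6.5, 1.9, n)
X = np.column_stack([np.ones(n), lnkwh, lnkwh ** 2,
                     rng.normal(-2.5, 0.3, n), rng.normal(1.9, 0.4, n)])
y = X @ np.array([-3.8, 0.15, 0.05, 0.5, 0.1]) + rng.normal(0.0, 0.3, n)

# Null: coefficient on output (2nd element) = 1 and on output squared (3rd) = 0.
R = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0, 0.0]])
q = np.array([1.0, 0.0])
F_wald = wald_f(X, y, R, q)

# Same test from sums of squares: impose the restrictions and re-estimate.
_, rss_u = ols(X, y)
_, rss_r = ols(X[:, [0, 3, 4]], y - lnkwh)
F_rss = ((rss_r - rss_u) / 2) / (rss_u / (n - X.shape[1]))
print(F_wald, F_rss)
```

Changing the entries of `q` plays the same role as editing the hypothesized values in the MathView worksheet.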