## Temple University

## Introductory Econometrics

## Chapter 7 Notes

Data comes to us in several different forms. Can you explain the differences?

Cardinal

Ordinal

Categorical

| wage | educ | exper | tenure | nonwhite | female | married |
|---|---|---|---|---|---|---|
| 3.1 | 11 | 2 | 0 | 0 | 1 | 0 |
| 3.24 | 12 | 22 | 2 | 0 | 1 | 1 |
| 3 | 11 | 2 | 0 | 0 | 0 | 0 |
| 6 | 8 | 44 | 28 | 0 | 0 | 1 |
| 5.3 | 12 | 7 | 2 | 0 | 0 | 1 |
| 8.75 | 16 | 9 | 8 | 0 | 0 | 1 |
| 11.25 | 18 | 15 | 7 | 0 | 0 | 0 |
| 5 | 12 | 5 | 3 | 0 | 1 | 0 |
| 3.6 | 12 | 26 | 4 | 0 | 1 | 0 |
| 18.18 | 17 | 22 | 21 | 0 | 0 | 1 |
| 6.25 | 16 | 8 | 2 | 0 | 1 | 0 |
| 8.13 | 13 | 3 | 0 | 0 | 1 | 0 |

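Note the coding of **nonwhite**, **female**, and **married**: each is a binary (dummy) variable equal to 1 when the worker has the characteristic and 0 otherwise.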

On the other hand, a survey might ask you for the region of the country in which you reside. This cannot be answered directly in a binary fashion. The responses are inherently categorical, but not binary. Instead, we can turn the multiple categories into a set of binary responses.

| wage | educ | exper | tenure | nonwhite | female | married | numdep | smsa | northcen | south | west |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.1 | 11 | 2 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 1 |
| 3.24 | 12 | 22 | 2 | 0 | 1 | 1 | 3 | 1 | 0 | 0 | 1 |
| 3 | 11 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 |
| 6 | 8 | 44 | 28 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 5.3 | 12 | 7 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 8.75 | 16 | 9 | 8 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 11.25 | 18 | 15 | 7 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 5 | 12 | 5 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3.6 | 12 | 26 | 4 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 1 |
| 18.18 | 17 | 22 | 21 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 6.25 | 16 | 8 | 2 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |

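A few columns have been added to the data set. The columns **northcen**, **south**, and **west** turn the region of residence into a set of binary variables. A worker who lives in none of these three regions has a 0 in all three columns, so that omitted region serves as the benchmark.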

In this country we believe that there are positive returns to education. Matt Riculate is a student interested in investigating this proposition. He plans to estimate the coefficients of the following model

His results are

Good scholars that we are, we point out to him the following facts about the American workplace:

Overall average wage = $5.89, male average wage = $7.10, female average wage = $4.58
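A minimal sketch of why these group means matter: regressing wage on the female dummy alone returns the male mean as the intercept and the female-minus-male difference in means as the coefficient. It uses only the twelve workers listed in the first table, so its numbers will not match the full-sample means above; the helper name `ols_simple` is illustrative.

```python
from statistics import mean

# Excerpt only: the twelve workers in the first table (not the full sample).
wage = [3.1, 3.24, 3.0, 6.0, 5.3, 8.75, 11.25, 5.0, 3.6, 18.18, 6.25, 8.13]
female = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1]

def ols_simple(x, y):
    """Intercept and slope of the OLS fit y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = ols_simple(female, wage)
male_mean = mean(w for w, f in zip(wage, female) if f == 0)
female_mean = mean(w for w, f in zip(wage, female) if f == 1)

# With male as the benchmark, the intercept is the male mean and the
# dummy coefficient is the female-minus-male gap in means.
print(f"b0 = {b0:.4f}, male mean = {male_mean:.4f}")
print(f"b1 = {b1:.4f}, female - male = {female_mean - male_mean:.4f}")
```

Applied to the full sample, the same identity implies an intercept of 7.10 and a female coefficient of 4.58 - 7.10 = -2.52.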

Matt's response is 'whatever.' What do you explain to him about the whatever? After a long explanation that includes making *male* the benchmark, you provide the following result:

*wage* = 0.6228 - 2.2733 *female* + 0.5064 *educ* + *resid*

What do you tell him about the (dis)advantage of being a female? If you were to draw a graph to illustrate the regression result, what would it look like?
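A sketch of what such a graph would show: evaluating the fitted equation for men and women at several education levels gives two parallel lines with slope 0.5064, and the female line sits 2.2733 below the male line at every value of *educ*. The function name `predicted_wage` and the education levels are illustrative choices.

```python
# Coefficients from the fitted equation above (male is the benchmark).
B0, B_FEMALE, B_EDUC = 0.6228, -2.2733, 0.5064

def predicted_wage(educ, female):
    """Fitted wage for a given education level and value of the female dummy."""
    return B0 + B_FEMALE * female + B_EDUC * educ

for educ in (8, 12, 16):
    male = predicted_wage(educ, 0)
    fem = predicted_wage(educ, 1)
    # The gap is the same at every educ: the lines are parallel.
    print(f"educ={educ:2d}  male={male:.4f}  female={fem:.4f}  gap={fem - male:.4f}")
```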

Suppose that the specification was changed so that the wage variable is in natural-log form. The new result is

*ln(wage)* = 0.8262 - 0.3608 *female* + 0.0772 *educ* + *resid*
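In a log-wage equation, a coefficient b is approximately a proportional difference of 100·b percent; the exact proportional difference for a one-unit change in the regressor is 100·(exp(b) - 1) percent, and the two drift apart as |b| grows. A minimal sketch applied to the coefficients above (the function names are illustrative):

```python
import math

B_FEMALE, B_EDUC = -0.3608, 0.0772  # from the ln(wage) equation above

def pct_effect_approx(b):
    """Approximate % change in wage: 100 * b (reasonable when b is small)."""
    return 100 * b

def pct_effect_exact(b):
    """Exact % change implied by a one-unit change in the regressor: 100 * (exp(b) - 1)."""
    return 100 * (math.exp(b) - 1)

for name, b in (("female", B_FEMALE), ("educ", B_EDUC)):
    print(f"{name}: approx {pct_effect_approx(b):.2f}%, exact {pct_effect_exact(b):.2f}%")
```

For *female* the approximation (-36.1%) and the exact figure (about -30.3%) differ noticeably; for *educ* they are close (7.7% versus about 8.0%).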

One could also ask whether it is simply being female that matters, or whether being married carries a further penalty. With this in mind, there are now four states of the world that need consideration. They can be summarized in the following table:

| | married | not married |
|---|---|---|
| male | married male | single male |
| female | married female | single female |

Keeping in mind your discussion with Matt about benchmarks, how many binary dummy variables do you need in order to account for all four categories?
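One way to check an answer is to count linearly independent columns. This sketch builds the four marital-gender indicators for the twelve workers in the first table and computes the rank of the design matrix with exact rational arithmetic; the helper names `group_dummies` and `rank` are hypothetical. With a constant in the model, the four indicators always add up to the constant column, so one of them has to be left out and becomes the benchmark. The regression reported below uses *marrfem*, *singfem*, and *marrmale*, leaving single males as the benchmark.

```python
from fractions import Fraction

# female and married columns from the first table (twelve workers).
female = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
married = [0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]

def group_dummies(f, m):
    """Return the four mutually exclusive marital-gender indicators for one worker."""
    return {
        "marrmale": (1 - f) * m,
        "singmale": (1 - f) * (1 - m),
        "marrfem": f * m,
        "singfem": f * (1 - m),
    }

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination over the rationals."""
    mat = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(mat), len(mat[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if mat[i][c] != 0), None)
        if pivot is None:
            continue
        mat[r], mat[pivot] = mat[pivot], mat[r]
        for i in range(n_rows):
            if i != r and mat[i][c] != 0:
                factor = mat[i][c] / mat[r][c]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[r])]
        r += 1
        if r == n_rows:
            break
    return r

groups = [group_dummies(f, m) for f, m in zip(female, married)]

# Constant plus all four dummies: the dummies sum to the constant column.
all_four = [[1, g["marrmale"], g["singmale"], g["marrfem"], g["singfem"]] for g in groups]
# Constant plus three dummies, leaving single males as the benchmark.
three = [[1, g["marrfem"], g["singfem"], g["marrmale"]] for g in groups]

print(len(all_four[0]), rank(all_four))  # more columns than rank: perfect collinearity
print(len(three[0]), rank(three))        # full column rank
```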

To see whether it is even worth going down this path, we can compute the average wage of each group:

| | married | single |
|---|---|---|
| male | 7.97 | 5.16 |
| female | 4.56 | 4.61 |

There is a difference between men and women regardless of marital status. Married women earn $0.05/hour less than single women, which works out to $0.40/day or $100/year (assuming an 8-hour day and 250 working days a year). This doesn't seem like much, but, as WalMart is learning, there is the principle of the thing, and it can also be indicative of wider problems. The regression model with all marital-gender groups accounted for is

*wage* = -1.024 - 0.5567 *marrfem* - 0.3689 *singfem* + 2.6411 *marrmale* + 0.4935 *educ* + *resid*

The only statistically significant coefficients are **educ** and
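To read the four-group equation, it helps to evaluate it for each group at a common education level. This sketch uses the coefficients above at an illustrative 12 years of schooling; the dictionary `GROUP_SHIFT`, the function name, and the value 12 are choices made for the example. Single males have no dummy of their own, so their line is the benchmark and every other group's intercept is measured relative to it.

```python
# Coefficients from the four-group regression above; single males are the
# benchmark because they have no dummy of their own.
B0, B_EDUC = -1.024, 0.4935
GROUP_SHIFT = {
    "single male": 0.0,
    "married male": 2.6411,
    "married female": -0.5567,
    "single female": -0.3689,
}

def predicted_wage(group, educ):
    """Fitted wage for a marital-gender group at a given level of education."""
    return B0 + GROUP_SHIFT[group] + B_EDUC * educ

EDUC = 12  # an illustrative education level
for group in GROUP_SHIFT:
    print(f"{group:15s} {predicted_wage(group, EDUC):.4f}")
```

Differences between two non-benchmark groups come from subtracting their coefficients: holding education fixed, the implied married-minus-single gap for women is -0.5567 - (-0.3689) = -0.1878 dollars per hour.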

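This kind of data provides a rank order. There are many banners in the halls of the Fox School of Business and Management proclaiming its rank in one survey or another. If the Fox School is ranked 23rd on some survey, do we know how much better it is than the one ranked 24th, or how much worse it is than the one ranked 22nd? We can, perhaps, see if rankings matter in a measurable fashion. In the table below, an ordinal rating of appearance enters through two dummies, **belavg** (below-average looks) and **abvavg** (above-average looks), so average looks are the benchmark. Coefficients are reported with t-statistics in parentheses.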

| | Entire Sample | Separate Intercepts | Female | Male |
|---|---|---|---|---|
| Constant | 0.2719 | 0.5589 (7.0) | 0.0407 (.32) | 0.5375 (5.42) |
| female | | -0.4532 (15.51) | | |
| belavg | -0.1800 (-3.9) | -0.1542 (-3.6) | -0.1257 (-1.8) | -0.1693 (-3.2) |
| abvavg | -0.0129 (-0.3) | -0.0066 (-0.2) | 0.0433 (.86) | -0.0390 (-1.0) |
| exper | 0.0483 (10.13) | 0.0408 (9.3) | 0.0298 (4.1) | 0.0504 (8.9) |
| expersq | -0.0006 (-6.5) | -0.0006 (-6.4) | -0.0004 (-2.7) | -0.0008 (-6.5) |
| educ | 0.0687 (0.0687) | 0.0663 (12.5) | 0.0786 (8.7) | 0.0609 (9.4) |
| RSS | 339.61 | 284.88 | 94.17 | 186.84 |
| n | 1259 | 1259 | 435 | 823 |

On the basis of the table is it appearance, or gender, or both that matters?
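A rough first pass at this question is to flag which appearance coefficients clear a conventional significance threshold. The sketch below copies the belavg and abvavg estimates and t-statistics from the table and compares |t| with 1.96, assuming a two-sided 5% test and the large-sample normal approximation; both the threshold and the dictionary layout are choices made for this example.

```python
# (coefficient, t-statistic) for the appearance dummies, copied from the table above.
LOOKS = {
    "belavg": {"Entire Sample": (-0.1800, -3.9), "Separate Intercepts": (-0.1542, -3.6),
               "Female": (-0.1257, -1.8), "Male": (-0.1693, -3.2)},
    "abvavg": {"Entire Sample": (-0.0129, -0.3), "Separate Intercepts": (-0.0066, -0.2),
               "Female": (0.0433, 0.86), "Male": (-0.0390, -1.0)},
}
CRITICAL = 1.96  # two-sided 5% test, large-sample normal approximation

def significant(t, critical=CRITICAL):
    """True when |t| exceeds the critical value."""
    return abs(t) > critical

for var, columns in LOOKS.items():
    for column, (coef, t) in columns.items():
        verdict = "significant" if significant(t) else "not significant"
        print(f"{var:7s} {column:20s} {coef:+.4f}  t={t:+.2f}  {verdict}")
```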

We used interactions between dummies when we examined wages, marital status, and gender. Those interactions allowed different groups to have different intercepts while keeping the slopes the same across groups.

Let's return to women and wages and see if the return to education is the same for men and women. We'll allow the intercept to differ between the two groups and we'll allow the slope coefficient on education to differ between the two groups, but all other slopes will remain equal. This is accomplished by creating an interaction between *educ* and *female*, *femeduc* = *female* × *educ*:

ln(*wage*) = 0.6114 - 0.7161*female* + 0.0205 *femeduc* + 0.0611 *educ* + 0.0403 *exper* - 0.0006245 *expersq* + *resid*
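With the interaction, the slope on *educ* is 0.0611 for men and 0.0611 + 0.0205 for women, and the female-minus-male gap in ln(*wage*) now depends on education. A minimal sketch, holding *exper* fixed (its coefficients are shared by both groups, so they cancel from the gap); the function names and education levels are illustrative.

```python
# Coefficients from the interacted ln(wage) equation above.
B_FEMALE, B_FEMEDUC, B_EDUC = -0.7161, 0.0205, 0.0611

def educ_return(female):
    """Slope of ln(wage) with respect to educ for one group (female = 0 or 1)."""
    return B_EDUC + B_FEMEDUC * female

def female_gap(educ):
    """Female-minus-male difference in ln(wage) at a given educ, other regressors held fixed."""
    return B_FEMALE + B_FEMEDUC * educ

print(f"return to educ: men {educ_return(0):.4f}, women {educ_return(1):.4f}")
for educ in (8, 12, 16):
    print(f"educ={educ:2d}: female gap in ln(wage) = {female_gap(educ):.4f}")
print(f"lines would cross at educ = {-B_FEMALE / B_FEMEDUC:.1f}")
```

The crossing point, about 35 years of schooling, is far beyond any realistic level of education, so over the relevant range the female line lies below the male line while rising slightly faster.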

Can you use the information in the table of results about beauty to see whether the intercepts, the slopes, or both differ between men and women?
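The RSS row supports F tests that compare restricted and unrestricted models. A sketch, assuming the usual Chow-test setup (the same error variance for men and women) and K = 6 parameters in each sex-specific regression (constant, belavg, abvavg, exper, expersq, educ). The unrestricted model is the pair of separate regressions, whose RSS is 94.17 + 186.84, with degrees of freedom built from the two subsample sizes. The helper `f_stat` is illustrative.

```python
# RSS values and sample sizes from the beauty table above.
RSS_POOLED = 339.61           # entire sample: common intercept and slopes
RSS_SEP_INTERCEPTS = 284.88   # entire sample plus a female dummy
RSS_FEMALE, RSS_MALE = 94.17, 186.84
N_FEMALE, N_MALE, N_ALL = 435, 823, 1259
K = 6  # parameters per regression: constant, belavg, abvavg, exper, expersq, educ

def f_stat(rss_restricted, rss_unrestricted, q, df_unrestricted):
    """F statistic for q linear restrictions, computed from residual sums of squares."""
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_unrestricted)

rss_split = RSS_FEMALE + RSS_MALE               # unrestricted: everything differs by sex
df_split = (N_FEMALE - K) + (N_MALE - K)

# H0: equal slopes (intercepts free); the restricted model has K + 1 parameters.
f_slopes = f_stat(RSS_SEP_INTERCEPTS, rss_split, q=K - 1, df_unrestricted=df_split)
# H0: intercept and slopes all equal (the classic Chow test); restricted model has K parameters.
f_all = f_stat(RSS_POOLED, rss_split, q=K, df_unrestricted=df_split)
# H0: same intercept, given common slopes: pooled versus separate intercepts.
f_intercept = f_stat(RSS_POOLED, RSS_SEP_INTERCEPTS, q=1, df_unrestricted=N_ALL - (K + 1))

print(f"slopes only:        F({K - 1}, {df_split}) = {f_slopes:.2f}")
print(f"intercept + slopes: F({K}, {df_split}) = {f_all:.2f}")
print(f"intercept only:     F(1, {N_ALL - (K + 1)}) = {f_intercept:.2f}")
```

Both of the first two statistics exceed typical 5% critical values for these degrees of freedom (roughly 2.2 and 2.1). As a consistency check, the intercept-only statistic is about 240.5, essentially the square of the t statistic on *female* in the separate-intercepts column (15.51² ≈ 240.6).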

This is an enormous topic. A discussion of the linear probability model only scratches the surface.

Suppose that $y = 1$ with probability $p$ and $y = 0$ with probability $1 - p$. Furthermore, we think that the variable $x$ can be used to explain $y$, and we have posited the following simple regression model:

$$y = \beta_0 + \beta_1 x + u$$

What can be said about the expected value of y?

| y | probability | probability × y |
|---|---|---|
| 0 | 1 - p | 0 |
| 1 | p | p |

From the table, $E(y) = 0 \cdot (1 - p) + 1 \cdot p = p$. If the mean of the error is 0, then the implication is that $E(y) = p = \beta_0 + \beta_1 x$ and $1 - p = 1 - \beta_0 - \beta_1 x$, so we can write

What is the implication of this result?
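A small simulation can help with this question. Under hypothetical coefficients β0 = 0.2 and β1 = 0.1 (not estimates from these notes), it draws y as a Bernoulli variable with p = β0 + β1x, forms the error u = y - p, and compares the sample variance of u with p(1 - p) at a few values of x. The function names and the number of draws are choices made for the example.

```python
import random

# Hypothetical LPM coefficients (not estimates from these notes): p(x) = B0 + B1*x.
B0, B1 = 0.2, 0.1

def error_variance_theory(x):
    """Var(u | x) = p(1 - p) when y is Bernoulli with p = B0 + B1*x."""
    p = B0 + B1 * x
    return p * (1 - p)

def error_variance_simulated(x, draws=100_000, seed=0):
    """Simulate y ~ Bernoulli(p(x)), form u = y - p(x), return the sample variance of u."""
    rng = random.Random(seed)
    p = B0 + B1 * x
    errors = [(1 if rng.random() < p else 0) - p for _ in range(draws)]
    mean_u = sum(errors) / draws
    return sum((u - mean_u) ** 2 for u in errors) / (draws - 1)

for x in (0, 3, 6):
    print(f"x={x}: theory {error_variance_theory(x):.4f}, "
          f"simulated {error_variance_simulated(x):.4f}")
```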

Another small problem is that the linear probability model is almost surely wrong. First, the cumulative distribution function linking x to the probability is almost certainly sigmoidal (S-shaped) rather than a straight line. Related to that is the observation that for small or large values of x the predicted probability can be less than zero or greater than one.
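Both points can be seen numerically. The sketch below uses hypothetical coefficients for a linear probability line and, for contrast, a logistic CDF as one example of a sigmoidal curve (logit and probit are common choices; the parameters `a` and `b` are made up for illustration).

```python
import math

# Hypothetical coefficients for illustration only.
B0, B1 = 0.2, 0.1

def lpm_probability(x):
    """Linear probability model prediction: a straight line in x, unbounded."""
    return B0 + B1 * x

def logistic_probability(x, a=-1.5, b=0.5):
    """An S-shaped alternative (logistic CDF); a and b are hypothetical."""
    return 1 / (1 + math.exp(-(a + b * x)))

for x in (-5, 0, 5, 10, 15):
    print(f"x={x:3d}: LPM {lpm_probability(x):+.2f}, logistic {logistic_probability(x):.3f}")
```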