IV. PROPERTIES OF ESTIMATORS

SMALL SAMPLE PROPERTIES

UNBIASEDNESS: An estimator is said to be unbiased if in the long run it takes on the value of the population parameter. That is, if you were to draw a sample, compute the statistic, and repeat this many, many times, the average over all of the sample statistics would equal the population parameter. Formally, $\hat\theta$ is an unbiased estimator of $\theta$ if $E(\hat\theta) = \theta$.

Examples: The sample mean is unbiased for the population mean, since $E(\bar{x}) = \mu$. The sample variance defined with an $n-1$ divisor, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, is unbiased for $\sigma^2$, since $E(s^2) = \sigma^2$.
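A quick simulation makes the point. The sketch below (population values chosen arbitrarily) draws many samples and averages the resulting statistics: the sample mean and the n-1 variance center on the true parameters, while the n-divisor variance is biased downward.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 10.0, 2.0, 20, 50_000   # arbitrary illustration values

    means, var_n1, var_n = [], [], []
    for _ in range(reps):
        x = rng.normal(mu, sigma, size=n)
        means.append(x.mean())
        var_n1.append(x.var(ddof=1))   # divisor n-1: unbiased for sigma^2
        var_n.append(x.var(ddof=0))    # divisor n: biased downward

    print(np.mean(means))   # close to mu = 10
    print(np.mean(var_n1))  # close to sigma^2 = 4
    print(np.mean(var_n))   # close to (n-1)/n * sigma^2 = 3.8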

EFFICIENCY: An estimator is said to be efficient if in the class of unbiased estimators it has minimum variance.

Example: Suppose we have some prior knowledge that the population from which we are about to sample is normal. The mean of this population is, however, unknown to us. Because the population is normal we know that both the sample mean $\bar{x}$ and the sample median $\tilde{x}$ are unbiased estimators of $\mu$.

However, consider their variances: $\operatorname{var}(\bar{x}) = \frac{\sigma^2}{n}$, while for large n $\operatorname{var}(\tilde{x}) \approx \frac{\pi}{2}\cdot\frac{\sigma^2}{n} \approx 1.57\,\frac{\sigma^2}{n}$.

Clearly, $\bar{x}$ is the more efficient estimator since it has the smaller variance.
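A small simulation (again with arbitrary population values) shows the ratio of the two variances settling near $\pi/2 \approx 1.57$:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, reps = 0.0, 1.0, 101, 20_000   # odd n so the median is a single order statistic

    samples = rng.normal(mu, sigma, size=(reps, n))
    var_mean = samples.mean(axis=1).var()
    var_median = np.median(samples, axis=1).var()

    print(var_mean)               # about sigma^2 / n
    print(var_median)             # larger, about (pi/2) * sigma^2 / n
    print(var_median / var_mean)  # roughly 1.5 - 1.6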

SUFFICIENCY: We say that an estimator is sufficient if it uses all the sample information. The median, because it considers only rank, is not sufficient. The sample mean considers each member of the sample as well as its size, and so is a sufficient statistic. Put another way, given the sample mean, the distribution of no other statistic can contribute more information about the population mean. We use the factorization theorem to prove sufficiency. If the likelihood function of a random variable can be factored into a part which has as its arguments only the statistic and the population parameter, and a part which involves only the sample data, the statistic is sufficient.
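As an illustration of the factorization theorem, take an iid Poisson sample (a case chosen here simply for concreteness):

$L(x_1,\ldots,x_n;\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \Big[\lambda^{\sum_i x_i}\, e^{-n\lambda}\Big]\cdot\Big[\frac{1}{\prod_i x_i!}\Big]$

The first bracket involves only the statistic $\sum_i x_i$ and the parameter $\lambda$; the second involves only the sample data, so $\sum_i x_i$ (equivalently $\bar{x}$) is sufficient for $\lambda$.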

LARGE SAMPLE PROPERTIES

Let $\hat\theta_n$ be an estimate of $\theta$, where n denotes sample size. $\hat\theta_n$ is a random variable with density $f_n(\hat\theta_n)$, with expectation $E(\hat\theta_n)$ and variance $\operatorname{var}(\hat\theta_n)$.

As the sample size varies we have a sequence of estimates

$\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_n, \ldots$

a sequence of density functions

$f_1(\hat\theta_1), f_2(\hat\theta_2), \ldots, f_n(\hat\theta_n), \ldots$

a sequence of expectations

$E(\hat\theta_1), E(\hat\theta_2), \ldots, E(\hat\theta_n), \ldots$

and a sequence of variances

$\operatorname{var}(\hat\theta_1), \operatorname{var}(\hat\theta_2), \ldots, \operatorname{var}(\hat\theta_n), \ldots$

Asymptotic theory considers the behavior of these sequences as n becomes large.

We say $F(\hat\theta)$ is the limiting distribution of $\hat\theta_n$ if, for every point at which F is continuous, $\lim_{n\to\infty} F_n(\hat\theta) = F(\hat\theta)$, where $F_n$ denotes the distribution function of $\hat\theta_n$.

ASYMPTOTIC UNBIASEDNESS: An estimator is said to be asymptotically unbiased if the following is true

$\lim_{n\to\infty} E(\hat\theta_n) = \theta$

ASYMPTOTIC EFFICIENCY: Define the asymptotic variance as

$\operatorname{asy\,var}(\hat\theta_n) = \frac{1}{n}\lim_{n\to\infty} E\Big\{\big[\sqrt{n}\,\big(\hat\theta_n - \lim_{n\to\infty}E(\hat\theta_n)\big)\big]^2\Big\}$

An asymptotically efficient estimator is an unbiased estimator with smallest asymptotic variance.

CONSISTENCY: A sequence of estimators is said to be consistent if it converges in probability to the true value of the parameter

$\lim_{n\to\infty} P\big(|\hat\theta_n - \theta| < \epsilon\big) = 1 \quad \text{for every } \epsilon > 0$

Example: Define

$\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$

By the weak law of large numbers (via Chebyshev's inequality) we can write

$P\big(|\bar{x}_n - \mu| \ge \epsilon\big) \le \frac{\sigma^2}{n\epsilon^2}$

The right-hand side converges to zero as $n \to \infty$ for any fixed $\epsilon > 0$,

so the sequence $\bar{x}_n$ is a consistent estimator for $\mu$.
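A sketch of consistency in action: for arbitrary values of $\mu$, $\sigma$, and $\epsilon$, the proportion of sample means falling farther than $\epsilon$ from $\mu$ shrinks toward zero as n grows.

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, eps, reps = 5.0, 3.0, 0.25, 1_000   # arbitrary illustration values

    for n in (10, 100, 1_000, 10_000):
        xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
        print(n, np.mean(np.abs(xbar - mu) >= eps))   # P(|xbar_n - mu| >= eps) falls toward 0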

IV.B. METHODS OF ESTIMATION

1. METHOD OF MOMENTS

$x_1, \ldots, x_n$ are independent random variables. Take the functions $g_1, g_2, \ldots, g_n$ and look at the new random variables

$y_1 = g_1(x_1),\ y_2 = g_2(x_2),\ \ldots,\ y_n = g_n(x_n)$

then the $y_i$ are also independent. If all the $g_i$ are the same function, then the $y_i$ are iid.

Now suppose $x_1, x_2, \ldots$ are iid. Fix k a positive integer. Then $x_1^k, x_2^k, \ldots$ are iid and by the weak law of large numbers

$\frac{1}{n}\sum_{i=1}^{n} x_i^k \xrightarrow{p} E(x^k)$

which is the kth moment about the origin.

Define $m_k = E(x^k)$ as the kth moment of x, so

$\frac{1}{n}\sum_{i=1}^{n} x_i^k \xrightarrow{p} m_k$

Suppose you wish to estimate some parameter $\theta$.

We know $m_k = E(x^k)$ for k = 1, 2, ... and suppose that for some N

$\theta = g(m_1, m_2, \ldots, m_N)$

and g is continuous.

The kth sample moment is

$\hat{m}_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k$

Idea: If n is large then $\hat{m}_k$ should be close to $m_k$ for k = 1, 2, ..., N, so $\hat\theta = g(\hat{m}_1, \hat{m}_2, \ldots, \hat{m}_N)$ should be close to $\theta = g(m_1, m_2, \ldots, m_N)$.

Example: Suppose $\theta = \mu = E(x) = m_1$.

The method of moments estimator for $\mu$ is the first sample moment,

$\hat\mu = \hat{m}_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$

Example:


X is a continuous r.v. distributed uniformly on the interval [a, b]. We wish to estimate a and b. From experience we know

$E(X) = \frac{a+b}{2} \qquad \operatorname{var}(X) = \frac{(b-a)^2}{12}$

Suppose $\bar{x} = 5$ and $s^2 = 2.5$. Equating sample and population moments,

$\frac{\hat a + \hat b}{2} = 5 \qquad \frac{(\hat b - \hat a)^2}{12} = 2.5$

so $\hat b - \hat a = \sqrt{30} \approx 5.48$ and $\hat a + \hat b = 10$, giving $\hat a \approx 2.26$ and $\hat b \approx 7.74$.
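The same calculation in code; the data below are simulated only so the sketch runs end to end, with the true interval chosen to match the example.

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(2.26, 7.74, size=500)   # simulated data; true (a, b) chosen to match the example

    xbar = x.mean()                   # first sample moment
    s2 = x.var(ddof=0)                # second central sample moment
    half_width = np.sqrt(3 * s2)      # from (b - a)^2 / 12 = s2, so (b - a)/2 = sqrt(3 * s2)
    a_hat, b_hat = xbar - half_width, xbar + half_width

    print(a_hat, b_hat)               # method of moments estimates of a and b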




2. LEAST SQUARES

We have an unknown mean $\mu$. Our estimator will be $\hat\mu$. Take each $x_i$, subtract its predicted value, i.e. its mean, square the difference, and add up the squared differences

$S = \sum_{i=1}^{n} (x_i - \hat\mu)^2$

Minimizing S with respect to $\hat\mu$ gives $\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$. This is nothing more than $\bar{x}$, which we know to be the least squares estimator of $\mu$ for any distribution.
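Writing out the minimization step, which follows directly from the sum of squares above:

$\frac{dS}{d\hat\mu} = -2\sum_{i=1}^{n}(x_i - \hat\mu) = 0 \quad\Longrightarrow\quad n\hat\mu = \sum_{i=1}^{n} x_i \quad\Longrightarrow\quad \hat\mu = \bar{x}$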

3. MAXIMUM LIKELIHOOD ESTIMATORS

Suppose we have the random variable $x_i$ with the following density function

Figure 1

The position of the density function depends on $\mu$. The likelihood function is found by holding the statistic constant and letting the parameter vary.

Figure 2

While the overall picture does not look different, there are some fundamental alterations in the way it is labeled. See figure 2.

Notice that in the revised figure the domain is now $\mu$ and the range is conditional on the sample statistic $\bar{x}$. We choose the value of the parameter which makes it most likely that we would have observed the sample data.

Example: We have a binomial and we wish to estimate P. The sample size is n = 5. Choices for the probability of success are P = .5, .6, .7. We do the experiment and find x = 3. Let us vary P and compute the probability of observing our particular sample. Given the results shown in figure 3, would you ever choose .5 or .7 over .6 as the best guess for the true proportion P?

Figure 3
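The values behind figure 3 can be reproduced directly from the binomial probability function; this short computation evaluates $P(x = 3 \mid n = 5, P)$ for the three candidate values and peaks at P = .6.

    from math import comb

    n, x = 5, 3
    for p in (0.5, 0.6, 0.7):
        prob = comb(n, x) * p**x * (1 - p)**(n - x)   # probability of observing x = 3 successes
        print(p, round(prob, 4))                      # 0.5 -> 0.3125, 0.6 -> 0.3456, 0.7 -> 0.3087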



Example: You are a medieval serf and wish to marry the Queen's son. You must plunge your hand into an urn of balls numbered consecutively 1, 2, 3, ... to some unknown maximum. If you guess the number of balls in the urn then you are permitted to marry the prince. q is the number of balls in the urn. You have drawn out a ball with the number 27 on it. What is your guess for the total number of balls?

Figure 4

Would you ever guess a number greater than 27? Were you to do so, the probability of drawing a 27 would decline, since that probability is $1/q$ for any $q \ge 27$. The maximum likelihood estimate is therefore $\hat q = 27$.

Example: We don't know the particular $\lambda$ that applies to a steel rail production process. Wishing to estimate $\lambda$, we observe 5 flaws in one mile of rail, x = 5, so

$P(x = 5 \mid \lambda) = \frac{\lambda^5 e^{-\lambda}}{5!}$

One approach to estimating the rate parameter would be to compute this probability for a range of possible values of $\lambda$, i.e. to trace out the likelihood function. This is done in the figure below.


Figure 5


A more efficient way to estimate the rate parameter is as follows. Write the log of the likelihood,

$\ln L(\lambda \mid x = 5) = 5\ln\lambda - \lambda - \ln 5!$

set the derivative to zero and solve for $\lambda$:

$\frac{d\ln L}{d\lambda} = \frac{5}{\lambda} - 1 = 0 \quad\Longrightarrow\quad \hat\lambda = 5$
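As a numerical check of this example, the likelihood of x = 5 can be evaluated over a grid of candidate rates; the maximizer lands at $\lambda = 5$.

    import numpy as np
    from math import factorial

    x = 5                                         # observed number of flaws in one mile of rail
    lam = np.linspace(0.5, 15.0, 2901)            # grid of candidate rate parameters (step 0.005)
    like = lam**x * np.exp(-lam) / factorial(x)   # Poisson likelihood of observing x

    print(lam[np.argmax(like)])                   # maximum likelihood estimate, about 5.0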

IV.C. CONFIDENCE INTERVALS

It should be noted that the probability of a point estimate of a continuous parameter being exactly correct is zero! Consequently, we construct more practical interval estimates.

Consider the first sample mean problems we dealt with,

$P\left(-z_{\alpha/2} \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$

where $z_{\alpha/2}$ is the standard normal value that cuts off an area of $\alpha/2$ in the upper tail. This can be rewritten as

$P\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

This is called a $(1-\alpha)100\%$ confidence interval.

Note: 1. $\mu$ is a fixed number

2. $\bar{x}$, and hence the interval, is a random variable

As long as we do not plug in numbers we can leave this in the form of a probability statement.

Example: Suppose $\sigma$ is known. We observe $\bar{x}$ for n = 81. Then

$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{81}} = \bar{x} \pm 1.96\,\frac{\sigma}{9}$

is a 95% confidence interval.

INTERPRETATION: Of all confidence intervals calculated in a similar fashion {95%, n = 81} we would expect that 95% of them would cover $\mu$. $\mu$ does not change; only the position of the interval changes. Think of a big barrel containing 1000 different confidence intervals, different because each uses a different value of the random variable $\bar{x}$. The probability of us reaching in and grabbing a "correct" interval is 95%. But, as soon as we break open the capsule and read the numbers, the mean is either in the interval or it isn't.
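The barrel metaphor is easy to simulate: build many 95% intervals (with $\sigma$ known, n = 81, and an arbitrarily chosen true mean) and count how often they cover $\mu$.

    import numpy as np

    rng = np.random.default_rng(4)
    mu, sigma, n, reps = 50.0, 9.0, 81, 10_000   # arbitrary true mean; sigma treated as known

    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    half = 1.96 * sigma / np.sqrt(n)             # half-width of each 95% interval
    covered = (xbar - half <= mu) & (mu <= xbar + half)

    print(covered.mean())                        # close to 0.95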

In the above construction it was assumed that we knew $\sigma$. Suppose we don't know $\sigma$ or $\mu$ but still wish to construct an interval estimate.

Example: We know x is normally distributed, but $\sigma$ is unknown. From a sample of size 25 we observe $\bar{x}$ and $s$. To construct a 95% confidence interval for $\mu$ we use the t distribution with n - 1 = 24 degrees of freedom:

$\bar{x} \pm t_{.025,\,24}\,\frac{s}{\sqrt{25}} = \bar{x} \pm 2.064\,\frac{s}{5}$

is a 95% confidence interval.
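If a t table is not at hand, the critical value can be pulled from scipy; this is only a convenience check of the 2.064 used above.

    from scipy.stats import t

    df = 24                             # n - 1 for a sample of size 25
    print(round(t.ppf(0.975, df), 3))   # upper .025 critical value, about 2.064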

There are three other cases that we could encounter:

a. Distribution of x unknown, $\sigma$ known, and sample size large - rely on the CLT and use z.

b. Distribution of x unknown, $\sigma$ unknown, but n large - technically we should use t, but z is a fair approximation.

c. Distribution of x unknown, $\sigma$ unknown, n small - STOP.

To help keep the three cases straight we have another flow diagram.
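The flow diagram boils down to a small decision rule. The sketch below codes that logic; the cutoff of 30 for a "large" sample is a common rule of thumb, not something fixed by these notes.

    def interval_basis(normal_population: bool, sigma_known: bool, n: int) -> str:
        """Which sampling distribution to base the interval for the mean on."""
        large = n >= 30                                   # assumed rule of thumb for a 'large' sample
        if normal_population:
            return "z" if sigma_known else "t"
        if large:
            return "z (CLT)" if sigma_known else "t, with z as a fair approximation"   # cases a and b
        return "STOP - no valid interval"                 # case c

    print(interval_basis(normal_population=False, sigma_known=False, n=25))   # case c: STOP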

IV.C.2. CONFIDENCE INTERVALS FOR $\sigma^2$

Recall

$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$

As in our use of Z and t in constructing a confidence interval for $\mu$, we must choose two $\chi^2$ values. This is done as per the diagram below, giving the probability statement

$P\left(\chi^2_{1-\alpha/2} \le \chi^2_{n-1} \le \chi^2_{\alpha/2}\right) = 1 - \alpha$

From the above recollection

$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$

Substitute this into the above probability statement

$P\left(\chi^2_{1-\alpha/2} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) = 1 - \alpha$

Rearranging

$P\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right) = 1 - \alpha$

Note that $s^2$ is the random variable. When numbers are plugged in it is no longer appropriate to express the interval as a probability statement.
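To see the interval with numbers plugged in, the sketch below uses scipy for the chi-square critical values; the sample size and $s^2$ are made-up illustration values.

    from scipy.stats import chi2

    n, s2, alpha = 25, 2.5, 0.05                  # made-up sample size, sample variance, and alpha
    upper = chi2.ppf(1 - alpha / 2, n - 1)        # chi-square value with alpha/2 in the upper tail
    lower = chi2.ppf(alpha / 2, n - 1)            # chi-square value with alpha/2 in the lower tail

    lo = (n - 1) * s2 / upper                     # lower confidence limit for sigma^2
    hi = (n - 1) * s2 / lower                     # upper confidence limit for sigma^2
    print(round(lo, 2), round(hi, 2))             # 95% confidence interval for sigma^2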