VIII Some Distribution and Matrix Theorems
Definition: A k-dimensional random variable, y, is said to have a k-variate normal distribution if and only if every linear combination of the elements of y has a univariate normal distribution.
Definition: The density function for the univariate normal random variable, y, with mean μ and variance σ² is

    f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\}
Definition: The density function for the multivariate normal random variable, y, with n-dimensional mean vector μ and n×n covariance matrix Ω is

    f(y) = (2\pi)^{-n/2} \, |\Omega|^{-1/2} \exp\left\{ -\tfrac{1}{2} (y-\mu)' \Omega^{-1} (y-\mu) \right\}
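As a check on this formula, the density can be evaluated directly from the definition and compared with a library routine. A minimal numpy/scipy sketch, where the particular μ, Ω, and evaluation point are arbitrary illustrative values:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Arbitrary illustrative values for the mean vector and covariance matrix.
    mu = np.array([1.0, -0.5])
    Omega = np.array([[2.0, 0.3],
                      [0.3, 1.0]])
    y = np.array([0.7, 0.2])

    # Density evaluated directly from the formula
    # f(y) = (2*pi)^(-n/2) |Omega|^(-1/2) exp{-(1/2)(y-mu)' Omega^{-1} (y-mu)}.
    n = len(y)
    d = y - mu
    quad = d @ np.linalg.solve(Omega, d)
    f_formula = (2 * np.pi) ** (-n / 2) * np.linalg.det(Omega) ** (-0.5) * np.exp(-0.5 * quad)

    # The same density from scipy, for comparison.
    f_scipy = multivariate_normal(mean=mu, cov=Omega).pdf(y)
    print(f_formula, f_scipy)   # the two numbers should agree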
Theorem: If y has a k-variate normal distribution then its mean vector, Ey = μ, and its variance-covariance matrix, E(y − μ)(y − μ)' = Ω, exist. Further, μ and Ω completely specify the distribution.
Theorem: If y has a k-variate normal distribution, y ~ N_k(μ, Ω), and P is any l×k matrix, then the l-vector Py has the l-variate normal distribution N_l(Pμ, PΩP').
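A Monte Carlo sketch of this theorem (the μ, Ω, and P below are made-up values with k = 3 and l = 2): the sample mean and covariance of the transformed draws Py should be close to Pμ and PΩP'.

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up mean, covariance, and transformation matrix (k = 3, l = 2).
    mu = np.array([1.0, 2.0, -1.0])
    Omega = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])
    P = np.array([[1.0, -1.0, 0.0],
                  [0.5,  0.5, 2.0]])

    # Draw from N_k(mu, Omega) and transform each draw by P.
    y = rng.multivariate_normal(mu, Omega, size=200_000)   # 200000 x 3
    Py = y @ P.T                                           # 200000 x 2

    print(Py.mean(axis=0))           # should be close to P @ mu
    print(P @ mu)
    print(np.cov(Py, rowvar=False))  # should be close to P @ Omega @ P.T
    print(P @ Omega @ P.T)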
Theorem: If y ~ N_k(μ, D) with D a diagonal matrix, then the components of y: (y1, y2, ..., yk) are independent and each is a univariate normal.
Definition: A k-variate normal vector, y, is said to have rank M if y can be expressed as y = μ + Bz with B a k×M matrix of rank M and z a vector of independent N(0,1) random variables.
NOTE: In the above definition k ≥ M. (Why?) Also, if y has rank M it means that the variation in y is confined to an M-dimensional subspace of ℝ^k.
Theorem: If y ~ N_k(μ, Ω) is of rank M, then r(Ω) = M and there exists a k×M matrix B such that BB' = Ω.
Theorem: If y ~ N_k(μ, Ω) is of rank k, then Ω is nonsingular and there exists a nonsingular k×k matrix B such that BB' = Ω.
Theorem: If y ~ N_k(μ, Ω) is of rank k, then Ω is nonsingular and there exists a k×k nonsingular matrix B such that BB' = Ω and B'Ω⁻¹B = I. Also, if we denote C = B⁻¹ then C'C = Ω⁻¹ and CΩC' = I.
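When Ω is nonsingular, one concrete choice of such a B is the Cholesky factor of Ω. A small numpy sketch with an arbitrary positive definite Ω, verifying the identities above:

    import numpy as np

    # Arbitrary nonsingular (positive definite) covariance matrix.
    Omega = np.array([[4.0, 1.0, 0.5],
                      [1.0, 3.0, 0.2],
                      [0.5, 0.2, 2.0]])

    B = np.linalg.cholesky(Omega)   # lower triangular, with B B' = Omega
    C = np.linalg.inv(B)

    print(np.allclose(B @ B.T, Omega))                              # BB' = Omega
    print(np.allclose(B.T @ np.linalg.inv(Omega) @ B, np.eye(3)))   # B' Omega^{-1} B = I
    print(np.allclose(C.T @ C, np.linalg.inv(Omega)))               # C'C = Omega^{-1}
    print(np.allclose(C @ Omega @ C.T, np.eye(3)))                  # C Omega C' = I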
Theorem: a) If x ~ N(0,1) then x² ~ χ²_1.
b) If y1 is χ²_m and y2 is χ²_n and y1 and y2 are independent, then y1 + y2 is χ²_{m+n}.
Theorem: a) If x ~ N(0,1) and x1, x2, ..., xn is a random sample, then

    \sum_{i=1}^{n} x_i^2 \sim \chi^2_n .

b) If y ~ N(μ, σ²) and y1, y2, ..., yn is a random sample, then

    \sum_{i=1}^{n} \frac{(y_i - \mu)^2}{\sigma^2} \sim \chi^2_n .
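A simulation sketch of part (b), using made-up values μ = 2, σ = 3, and n = 5: the sum of squared standardized observations should behave like a χ²_5 variable.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    mu, sigma, n = 2.0, 3.0, 5          # made-up parameter values
    reps = 100_000

    y = rng.normal(mu, sigma, size=(reps, n))
    q = ((y - mu) ** 2).sum(axis=1) / sigma ** 2   # sum of squared standardized values

    # The chi-square(n) distribution has mean n and variance 2n.
    print(q.mean(), q.var())                        # roughly 5 and 10
    # A formal comparison against the chi-square(5) cdf.
    print(stats.kstest(q, stats.chi2(n).cdf))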
Theorem: If the n×1 vector x is N(0, I_n) then x'x is χ²_n.
Theorem: If the n×1 vector x is N(0, I_n) and A is an n×n idempotent matrix of rank r, then x'Ax ~ χ²_r.
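A standard example of such an A is the centering matrix I_n − (1/n)ιι', which is idempotent of rank n − 1. A simulation sketch (the choice n = 6 is arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 6                                 # arbitrary dimension
    A = np.eye(n) - np.ones((n, n)) / n   # centering matrix: idempotent, rank n-1

    print(np.allclose(A @ A, A))          # idempotent
    print(np.linalg.matrix_rank(A))       # n - 1

    x = rng.standard_normal((100_000, n))
    q = np.einsum('ij,jk,ik->i', x, A, x)  # x'Ax for each draw

    print(q.mean(), q.var())               # roughly n-1 and 2(n-1)
    print(stats.kstest(q, stats.chi2(n - 1).cdf))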
Theorem: Let x be an n×1 vector distributed as N(0, σ²I_n), A an n×n idempotent matrix of rank r, and B an n×n matrix such that BA = 0. Then Bx ~ N_n(0, σ²BB') is distributed independently of the quadratic form x'Ax/σ² ~ χ²_r.
Theorem: Let x ~ N_n(μ, σ²I_n), let A and B be idempotent matrices of rank r and s respectively, and let AB = 0. Then the quadratic forms x'Ax and x'Bx are distributed independently of each other (and when μ = 0, x'Ax/σ² ~ χ²_r and x'Bx/σ² ~ χ²_s).
Theorem: Let x ~ N_n(0, σ²I_n), r(A) = r, r(B) = s, AB = 0, and both A and B idempotent. Then

    \frac{x'Ax / r}{x'Bx / s} \sim F_{r,s} .
Theorem: If x ~ F_{n,m} then y = 1/x ~ F_{m,n}.
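This is the familiar justification for tabulating only upper-tail F critical values. A quick scipy check with arbitrary degrees of freedom n = 4 and m = 9: the lower q-quantile of F_{n,m} equals the reciprocal of the upper q-quantile of F_{m,n}.

    from scipy import stats

    n, m, q = 4, 9, 0.05                 # arbitrary degrees of freedom and tail probability

    lower = stats.f(n, m).ppf(q)         # lower 5% point of F(4, 9)
    upper = stats.f(m, n).ppf(1 - q)     # upper 5% point of F(9, 4)

    print(lower, 1 / upper)              # the two numbers should agree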
Theorem: If y1, y2, ..., yn are independent normal variables with means μ1, μ2, ..., μn and variances equal to unity, then Σyi² is distributed as χ² with n degrees of freedom and non-centrality parameter λ = ½Σμi².
Theorem: If the n×1 vector y is N(μ, σ²I_n), then y'Ay/σ², where A is idempotent of rank k, is distributed as χ² with k degrees of freedom and non-centrality parameter λ = μ'Aμ/2σ².
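A simulation sketch of this theorem, again using the centering matrix as the idempotent A and made-up values for μ and σ. Note that scipy parameterizes the noncentral χ² by μ'Aμ/σ², which is 2λ in the notation used here.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    n, sigma = 6, 2.0                        # made-up dimension and sigma
    mu = np.linspace(-1.0, 2.0, n)           # made-up mean vector
    A = np.eye(n) - np.ones((n, n)) / n      # idempotent, rank k = n - 1

    k = n - 1
    lam = mu @ A @ mu / (2 * sigma ** 2)     # noncentrality as defined in these notes

    y = mu + sigma * rng.standard_normal((200_000, n))
    q = np.einsum('ij,jk,ik->i', y, A, y) / sigma ** 2   # y'Ay / sigma^2 for each draw

    # scipy's ncx2 uses nc = mu'A mu / sigma^2, i.e. 2*lambda in this notation.
    print(stats.kstest(q, stats.ncx2(k, 2 * lam).cdf))
    # The mean of the noncentral chi-square is k + 2*lambda.
    print(q.mean(), k + 2 * lam)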
Theorem: If the n×1 vector y is N(μ, σ²I_n), the n×1 vector x is N(0, I_n) and independent of y, and A is idempotent of rank k, then

    \frac{y'Ay / (\sigma^2 k)}{x'x / n} \sim F_{k,n}(\lambda) ,

a non-central F with noncentrality parameter λ = μ'Aμ/2σ².
Theorem: If the n×1 vector y is N(μ, σ²I_n), A is idempotent of rank k, B is idempotent of rank m, and AB = 0, then

    \frac{y'Ay / k}{y'By / m} \sim F''_{k,m}(\lambda_1, \lambda_2) ,

a doubly non-central F with the two non-centrality parameters λ1 = μ'Aμ/2σ² and λ2 = μ'Bμ/2σ².
NOTE: The effect of the noncentrality parameters in the χ² and F distributions is to shift probability mass to the right. That is, for any given number in the domain of either the χ² or F distribution, the area in the upper tail is greater for the noncentral distribution than for the central distribution. Since the Student's t distribution has a chi-square in the denominator, it too can be non-central.
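A small scipy illustration of this NOTE, with arbitrary degrees of freedom and λ: at every cutoff the noncentral χ² leaves more area in the upper tail than the central χ².

    import numpy as np
    from scipy import stats

    df, lam = 5, 3.0                       # arbitrary degrees of freedom and lambda
    x = np.linspace(0.5, 25, 50)           # grid of cutoff points

    central_tail = stats.chi2(df).sf(x)
    noncentral_tail = stats.ncx2(df, 2 * lam).sf(x)   # scipy's nc = 2*lambda in this notation

    # The upper-tail area of the noncentral distribution exceeds the central one everywhere.
    print(np.all(noncentral_tail > central_tail))     # True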
Recall that in almost all situations we reject null hypotheses
for extreme values of the observed test statistic. Thus, central
and non-central distributions play a role in hypothesis testing.
When we test an hypothesis we transform our data so that when
the null is true our test statistics are constructed from N(0,1)
random variables. But if the null is false then our test statistics
will be constructed from random variables which are really N(μ, σ²),
not N(0,1). The consequence is that when the null is false we
are more likely to observe extreme values of the test statistics.
Theorem: If B is l×n, A is n×n, and y ~ N(μ, σ²I), then By is independent of y'Ay if BA = 0.
Proof: Suppose P: n×n is an orthogonal matrix such that P'AP = D, where D is diagonal. Let P'y = z ~ N(P'μ, σ²I). Let C = BP and partition

    D = \begin{bmatrix} D_1 & 0 \\ 0 & 0 \end{bmatrix},
    \qquad
    C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix},

where D1 has nonzero terms only on the diagonal and the column blocks of C are conformable with the blocks of D. If BA = 0 then 0 = BAP = BPP'AP = CD. Since

    CD = \begin{bmatrix} C_{11}D_1 & 0 \\ C_{21}D_1 & 0 \end{bmatrix} = 0,

it must be the case that C11D1 = 0 and C21D1 = 0. We know D1 is nonsingular (its diagonal entries are nonzero), therefore C11 and C21 are both null matrices. So

    C = \begin{bmatrix} 0 & C_{12} \\ 0 & C_{22} \end{bmatrix}.

Now By = BPP'y = Cz
and y'Ay = y'PP'APP'y = z'Dz.
Partition z = (z1', z2')' conformably with D. The elements of z are independent by construction, so z1 and z2 are independent. But By = Cz involves only z2, while y'Ay = z'Dz = z1'D1z1 involves only z1. Therefore By and y'Ay are independent.
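A numerical sketch of the theorem, taking A to be the centering matrix and B the 1×n averaging vector, so that BA = 0 by construction. Zero sample correlation between By and y'Ay is of course only a symptom of independence, but it is easy to compute:

    import numpy as np

    rng = np.random.default_rng(4)

    n, sigma = 8, 1.5                      # arbitrary dimension and sigma
    mu = np.full(n, 0.7)                   # arbitrary mean vector

    A = np.eye(n) - np.ones((n, n)) / n    # idempotent, symmetric
    B = np.ones((1, n)) / n                # 1 x n averaging vector

    print(np.allclose(B @ A, 0))           # the condition BA = 0 of the theorem

    y = mu + sigma * rng.standard_normal((200_000, n))
    By = (y @ B.T).ravel()                 # the linear form By (here, the sample mean)
    yAy = np.einsum('ij,jk,ik->i', y, A, y)  # the quadratic form y'Ay

    print(np.corrcoef(By, yAy)[0, 1])      # close to 0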
Example: In this example we consider the distribution of some sampling statistics computed from a random sample drawn from a normal distribution. Suppose that xi ~ N(μ, σ²), i = 1, 2, ..., n, but neither μ nor σ² is known to us. We do have a hypothesis about the unknown value for the mean of these random variables. Namely, we believe the mean to be μ0.
From each of the xi we construct zi = (xi − μ0)/σ. Under the null hypothesis each of these random variables has a mean of 0 and a variance of 1. In order to test our hypothesis we must examine the following two random variables

    \frac{z'Az}{n-1}, \quad \text{where } A = I_n - \frac{1}{n}\iota\iota',

and

    Bz, \quad \text{where } B = \frac{1}{\sqrt{n}}\iota',

with ι an n×1 vector of ones, so that Bz = √n(x̄ − μ0)/σ.
To show that the two random variables are independent we need to check the product BA.

    BA = \frac{1}{\sqrt{n}}\iota'\left(I_n - \frac{1}{n}\iota\iota'\right)
       = \frac{1}{\sqrt{n}}\left(\iota' - \iota'\right) = 0 .

Since BA = 0 the two random variables are independent. The first is a χ² divided by its degrees of freedom: A is idempotent of rank n − 1, and z'Az = Σ(xi − x̄)²/σ² does not depend on the hypothesized value μ0, so it is a central χ². The second is a normal random variable. Therefore

    t = \frac{Bz}{\sqrt{z'Az/(n-1)}} = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s},
    \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,

has a Student's t distribution with n − 1 degrees of freedom. Note that the unknown σ cancels in this ratio, so the statistic can be computed from the data alone.
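A simulation sketch of the example (the values of μ0, σ, and n are made up): with the null true, the statistic √n(x̄ − μ0)/s is computed directly and matches the statistic produced by scipy's one-sample t test.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)

    mu0, sigma, n = 10.0, 2.0, 12          # made-up null mean, sigma, and sample size
    x = rng.normal(mu0, sigma, size=n)     # a sample generated with the null true

    xbar = x.mean()
    s = x.std(ddof=1)                      # s^2 = sum((x_i - xbar)^2) / (n - 1)
    t = np.sqrt(n) * (xbar - mu0) / s

    print(t)
    print(stats.ttest_1samp(x, mu0).statistic)    # the same statistic from scipy

    # Two-sided p-value from the t distribution with n-1 degrees of freedom.
    print(2 * stats.t(n - 1).sf(abs(t)))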