**A Brief Introduction to Modern Time Series**

**Definition** A time series is a random function x(ω, t) defined for each elementary event ω ∈ Ω and each time point t ∈ T.

**Definition** An observed time series {x_{t}, t = 1, 2, …, T} is a single realization of the random function, one observation at each point in time.

To put things more rigorously, the time series (or random function) is a real function x(ω, t) of the two variables ω and t, where ω ∈ Ω and t ∈ T. If we fix the value of ω, we have a real function x(t|ω) of the time t, which is a realization of the time series. If we fix the value of t, then we have a random variable x(ω|t). For a given point in time there is a probability distribution over x. Thus a random function x(ω, t) can be regarded as either a family of random variables or as a family of realizations.

**Definition** We define the distribution function of the random variable x(ω|t) as F(x; t) = P{x(ω, t) ≤ x}.

The points which distinguish time series analysis from ordinary statistical analysis are the following:

(1) The dependency among observations at different chronological points in time plays an
essential role. In other words, the order of observations is important. In ordinary
statistical analysis it is assumed that the observations are mutually independent.

(2) The domain of t is infinite.

(3) We have to make an inference from one realization. The realization of the random
variable can be observed only once at each point in time. In multivariate analysis we have
many observations on a finite number of variables. This critical difference necessitates
the assumption of stationarity.

**Definition** The random function x_{t} is strictly stationary if the joint distribution of (x_{t_{1}}, x_{t_{2}}, …, x_{t_{n}}) is the same as that of (x_{t_{1}+k}, x_{t_{2}+k}, …, x_{t_{n}+k}), i.e.,

F(x_{t_{1}}, x_{t_{2}}, …, x_{t_{n}}) = F(x_{t_{1}+k}, x_{t_{2}+k}, …, x_{t_{n}+k})

for any integers t_{1}, t_{2}, ..., t_{n} and k. Graphically,
one could picture the realization of a strictly stationary series as having not only the
same level in two different intervals, but also the same distribution function, right down
to the parameters which define it.

The assumption of stationarity makes our lives simpler and less costly. Without
stationarity we would have to sample the process frequently at each time point in order to
build up a characterization of the distribution functions in the earlier definition.
Stationarity means that we can confine our attention to a few of the simplest numerical
functions, i.e., the moments of the distributions. The central moments are given by

**Definition** (i) The **mean** of the series is

μ = E[x_{t}],

i.e., the first order moment.

(ii) The **autocovariance function** is

γ(t, s) = E[(x_{t} - μ)(x_{s} - μ)],

i.e., the second moment about the mean. If t = s then you have the variance of x_{t}.
We will use γ_{k} to denote the autocovariance of a
stationary series, where k denotes the difference between t and s.

(iii) The **autocorrelation function (ACF)** of {x_{t}} is

ρ(t, s) = γ(t, s) / √(γ(t, t) γ(s, s)).

We will use ρ_{k} = γ_{k}/γ_{0} to denote the autocorrelation of a
stationary series, where k denotes the difference between t and s.

(iv) The **partial autocorrelation (PACF)**, φ_{kk}, is the correlation between x_{t} and x_{t+k} after the linear influence of the intervening observations x_{t+1}, …, x_{t+k-1} has been removed.

One simple way to compute the partial autocorrelation between z_{t} and z_{t+k} is to regress each of them on the intervening values z_{t+1}, …, z_{t+k-1} and
then compute the correlation between the two residual vectors. Or, after measuring the
variables as deviations from their means, the partial autocorrelation can be found as the
LS regression coefficient on ż_{t} in the model

ż_{t+k} = φ_{k1}ż_{t+k-1} + φ_{k2}ż_{t+k-2} + … + φ_{kk}ż_{t} + e_{t+k}    (10)

where the dot over the variable indicates that it is measured as a
deviation from its mean.
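The residual-regression recipe above can be sketched numerically (a minimal sketch using numpy; the AR(1) series and its parameter φ = 0.7 are purely illustrative, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series; phi = 0.7 is an illustrative value.
phi, T = 0.7, 5000
z = np.zeros(T)
for t in range(1, T):
    z[t] = phi * z[t - 1] + rng.standard_normal()
z = z - z.mean()                      # deviations from the mean

def pacf_by_residuals(z, k):
    """phi_kk: regress z_t and z_{t+k} each on the k-1 intervening
    values z_{t+1}, ..., z_{t+k-1}, then correlate the residuals."""
    T = len(z)
    X = np.column_stack([z[j:T - k + j] for j in range(1, k)])
    y0, yk = z[:T - k], z[k:]
    b0, *_ = np.linalg.lstsq(X, y0, rcond=None)
    bk, *_ = np.linalg.lstsq(X, yk, rcond=None)
    return np.corrcoef(y0 - X @ b0, yk - X @ bk)[0, 1]

phi22 = pacf_by_residuals(z, 2)       # near zero for a true AR(1)
```

For a true AR(1) the lag-2 partial autocorrelation estimated this way should be indistinguishable from zero.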

(v) The **Yule-Walker equations** provide an important relationship
between the partial autocorrelations and the autocorrelations. Multiply both sides of
equation 10 by ż_{t+k-j} and take expectations to obtain

γ_{j} = φ_{k1}γ_{j-1} + φ_{k2}γ_{j-2} + … + φ_{kk}γ_{j-k}

or, in terms of the autocorrelations,

ρ_{j} = φ_{k1}ρ_{j-1} + φ_{k2}ρ_{j-2} + … + φ_{kk}ρ_{j-k}

This seemingly simple representation is really a powerful result. Namely, for j = 1, 2,
..., k we can write the full system of equations, known as the Yule-Walker equations,

ρ_{1} = φ_{k1} + φ_{k2}ρ_{1} + … + φ_{kk}ρ_{k-1}
ρ_{2} = φ_{k1}ρ_{1} + φ_{k2} + … + φ_{kk}ρ_{k-2}
⋮
ρ_{k} = φ_{k1}ρ_{k-1} + φ_{k2}ρ_{k-2} + … + φ_{kk}

From linear algebra you know that the matrix of ρ's is of
full rank. Therefore it is possible to apply Cramer's rule successively for k = 1, 2, ... to
solve the system for the partial autocorrelations. The first three are

φ_{11} = ρ_{1}

φ_{22} = (ρ_{2} - ρ_{1}²) / (1 - ρ_{1}²)

φ_{33} = det[[1, ρ_{1}, ρ_{1}], [ρ_{1}, 1, ρ_{2}], [ρ_{2}, ρ_{1}, ρ_{3}]] / det[[1, ρ_{1}, ρ_{2}], [ρ_{1}, 1, ρ_{1}], [ρ_{2}, ρ_{1}, 1]]
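In practice one solves the Yule-Walker system numerically rather than by hand. A sketch (a direct linear solve, which is equivalent to successive application of Cramer's rule; the input autocorrelations ρ_{k} = 0.6^k of an AR(1) with φ = 0.6 are illustrative):

```python
import numpy as np

def pacf_yule_walker(rho, k):
    """Solve the k-th order Yule-Walker system for phi_kk.
    rho[j] holds the autocorrelation at lag j (rho[0] = 1)."""
    # R is the k x k matrix whose (i, j) entry is rho[|i - j|]
    R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
    rhs = np.array([rho[j + 1] for j in range(k)])
    phi = np.linalg.solve(R, rhs)     # R is full rank, so the solve is valid
    return phi[-1]                    # the last coefficient is phi_kk

# Illustrative input: for an AR(1) with phi = 0.6, rho_k = 0.6**k,
# so phi_11 = 0.6 and all higher partial autocorrelations vanish.
rho = [0.6 ** j for j in range(10)]
phi11 = pacf_yule_walker(rho, 1)
phi22 = pacf_yule_walker(rho, 2)
phi33 = pacf_yule_walker(rho, 3)
```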

We have three important results on strictly stationary series.

**First**, if {x_{t}} is strictly stationary and E[x_{t}²] < ∞ then

E[x_{t}] = μ, a constant independent of t.

The implication is that we can use any finite realization of the sequence to estimate
the mean.

**Second**, if {x_{t}} is strictly stationary and E[x_{t}²] < ∞ then

γ(t, s) = γ(t + k, s + k) = γ_{|t-s|} for any integer k.

The implication is that the autocovariance depends only on the difference between t and
s, not their chronological point in time. We could use any pair of intervals in the
computation of the autocovariance as long as the time between them was constant. And we
can use any finite realization of the data to estimate the autocovariances.

**Thirdly,** the autocorrelation function in the case of strict stationarity
is given by

ρ_{k} = γ_{k} / γ_{0}.

The implication is that the autocorrelation depends only on the difference between t
and s as well, and again they can be estimated by any finite realization of the data.

If our goal is to estimate parameters which are descriptive of the possible realizations
of the time series, then perhaps strict stationarity is too restrictive. For example, if
the mean and covariances of x_{t} are constant and independent of the
chronological point in time, then perhaps it is not important to us that the distribution
function be the same for different time intervals.

**Definition**
A random function is stationary in the wide sense (or weakly stationary, or
stationary in Khinchin's sense, or covariance stationary) if μ(t) = E[x_{t}] = μ is independent of t, and the autocovariance γ(t, t + k) depends only on the lag k, with γ_{0} = var(x_{t}) < ∞.

Strict stationarity does not in itself imply weak stationarity. Weak stationarity does not imply strict stationarity. Strict stationarity with E[x_{t}²] < ∞ does, however, imply weak stationarity.

Ergodic theorems are concerned with the question of the necessary and sufficient conditions for making inference from a single realization of a time series. Basically it boils down to assuming weak stationarity.

That is, for any given ε > 0 and η > 0 there exists some number T_{0} such that

P{ |(1/T) Σ_{t=1}^{T} x_{t} - μ| ≥ ε } < η

for all T > T_{0}, if and only if (1/T) Σ_{k=0}^{T-1} γ_{k} → 0 as T → ∞.

This necessary and sufficient condition is that the autocovariances die out, in which
case the sample mean is a consistent estimator for the population mean.

*Corollary* If {x_{t}} is weakly
stationary with E[(x_{t+k}x_{t})²] < ∞
for any t, and E[x_{t+k}x_{t}x_{t+s+k}x_{t+s}] is
independent of t for any integer s, then

(1/T) Σ_{t=1}^{T} x_{t+k}x_{t} → E[x_{t+k}x_{t}]

if and only if (1/T) Σ_{s=0}^{T-1} cov(x_{t+k}x_{t}, x_{t+s+k}x_{t+s}) → 0 as T → ∞.

In effect, the corollary assumes that the product series x_{t}x_{t+k} is itself
weakly stationary. The Ergodic Theorem is no more than a law of large numbers for the case in which the
observations are correlated.

One might ask at this point about the practical implications of stationarity. The most common application of time series techniques is in modelling macroeconomic data, both theoretic and atheoretic. As an example of the former, one might have a multiplier-accelerator model. For the model to be stationary, the parameters must take certain values. A test of the model is then to collect the relevant data and estimate the parameters. If the estimates are not consistent with stationarity, then one must rethink either the theoretical model or the statistical model, or both.

We now have enough machinery to begin to talk about the modeling of univariate time
series data. There are four steps in the process.

1. building models from theoretical and/or experiential knowledge

2. identifying models based on the data (observed series)

3. fitting the models (estimating the parameters of the model(s))

4. checking the model

If in the fourth step we are not satisfied we return to step one. The process is iterative
until further checking and respecification yields no further improvement in results.

**Definition** Some simple
operations include the following:

The backshift operator, Bx_{t} = x_{t-1}, so that B^{k}x_{t} = x_{t-k}.

The difference operator, ∇x_{t} = x_{t} - x_{t-1} = (1 - B)x_{t}.

The integration operator, S = ∇^{-1}, which undoes differencing by cumulative summation.
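A minimal numerical illustration of these operators (numpy; the series values are made up):

```python
import numpy as np

x = np.array([3.0, 5.0, 4.0, 7.0, 6.0])

# Backshift: (Bx)_t = x_{t-1}; the first observation has no predecessor.
Bx = x[:-1]

# Difference: x_t - x_{t-1} = (1 - B)x_t.
dx = x[1:] - x[:-1]                   # same result as np.diff(x)

# Integration (summation) undoes differencing: cumulating the differences
# from the initial value recovers the original series.
recovered = x[0] + np.concatenate(([0.0], np.cumsum(dx)))
```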

In this section we offer a brief review of the most common sort of time series models. On the basis of one's knowledge of the data generating process one picks a class of models for identification and estimation from the possibilities which follow.

**Autoregressive Models**

**Definition** Suppose that E[x_{t}] = 0. The model

x_{t} = φ_{1}x_{t-1} + φ_{2}x_{t-2} + … + φ_{p}x_{t-p} + ε_{t},

in which ε_{t} is white noise, is called the autoregressive model of order p, AR(p).

**Definition** If a time
dependent variable (stochastic process) {x_{t}} satisfies such a model, with ε_{t} white noise, then {x_{t}} is called an autoregressive process of order p.

Using the backshift operator we can write our AR model as

(1 - φ_{1}B - φ_{2}B² - … - φ_{p}B^{p})x_{t} = φ(B)x_{t} = ε_{t}.

The process is stationary if the roots of φ(B) = 0
lie outside the unit circle.

*Example 1* Consider the AR(1)

x_{t} = φ_{1}x_{t-1} + ε_{t}.

The only root of 1 - φ_{1}B = 0 is B = 1/φ_{1}, so the process is stationary provided |φ_{1}| < 1.

If φ_{1} is negative then the observed series will appear very
frenetic. E.g., consider such a series

in which the white noise term has a normal distribution with a zero mean and a variance
of one. The observations switch sign with almost every observation.

If, on the other hand, φ_{1} is positive, then the observed series
will be much smoother.

E.g., in such a series an observation tends to be above 0 if its predecessor was above zero.
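A quick simulation makes the contrast concrete (a sketch; the values φ = -0.9 and φ = 0.9 are illustrative choices, not the parameters used in the original examples, which did not survive extraction):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1(phi, T, rng):
    """x_t = phi * x_{t-1} + e_t, with standard normal white noise."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def sign_change_rate(x):
    """Fraction of adjacent pairs whose signs differ."""
    return float(np.mean(np.sign(x[1:]) != np.sign(x[:-1])))

T = 2000
rate_neg = sign_change_rate(simulate_ar1(-0.9, T, rng))  # frenetic path
rate_pos = sign_change_rate(simulate_ar1(0.9, T, rng))   # smooth path
```

With a strongly negative φ the series switches sign most of the time; with a strongly positive φ it rarely does.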

The variance of ε_{t} is σ_{ε}²
for all t. The variance of x_{t}, when it has zero mean, is given by

var(x_{t}) = E[(φ_{1}x_{t-1} + ε_{t})²] = φ_{1}² var(x_{t-1}) + σ_{ε}².

Since the series is stationary we can write var(x_{t}) = var(x_{t-1}) = γ_{0}.
Hence,

γ_{0} = σ_{ε}² / (1 - φ_{1}²).
The autocovariance function of an AR(1) series is, supposing without loss of generality μ = 0,

γ_{k} = E[x_{t}x_{t-k}].

To see what this looks like in terms of the AR parameters we will make use of the fact
that we can write x_{t} as follows:

x_{t} = Σ_{j=0}^{∞} φ_{1}^{j} ε_{t-j}.

Multiplying by x_{t-k} and taking expectations,

γ_{k} = φ_{1}^{k} σ_{ε}² / (1 - φ_{1}²) = φ_{1}^{k} γ_{0}.

Note that the autocovariances die out as k grows. The autocorrelation function is the
autocovariance divided by the variance of the series. Or, ρ_{k} = γ_{k}/γ_{0} = φ_{1}^{k}.

Using the earlier Yule-Walker formulae for the partial autocorrelations we have

φ_{11} = ρ_{1} = φ_{1}, and φ_{kk} = 0 for k > 1.

For an AR(1) the autocorrelations die out exponentially and the partial autocorrelations
exhibit a spike at one lag and are zero thereafter.
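The exponential decay ρ_{k} = φ_{1}^{k} can be checked against a simulated series (a sketch; φ_{1} = 0.7 is an illustrative value):

```python
import numpy as np

rng = np.random.default_rng(2)
phi, T = 0.7, 20000                   # illustrative parameter and sample size
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def sample_acf(x, k):
    """Sample autocorrelation r_k = c_k / c_0, in deviations from the mean."""
    xd = x - x.mean()
    return (xd[:len(x) - k] @ xd[k:]) / (xd @ xd)

# Theory for an AR(1): rho_k = phi**k, an exponential decay.
acf_theory = [phi ** k for k in range(1, 5)]
acf_sample = [sample_acf(x, k) for k in range(1, 5)]
```

In a large sample the estimated autocorrelations track φ^k closely at every lag.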

*Example 2* Consider the AR(2)

x_{t} = φ_{1}x_{t-1} + φ_{2}x_{t-2} + ε_{t}.

The associated polynomial in the lag operator is

1 - φ_{1}B - φ_{2}B² = 0.

The roots could be found using the quadratic formula. The roots are

B = [φ_{1} ± √(φ_{1}² + 4φ_{2})] / (-2φ_{2}).

When φ_{1}² + 4φ_{2} > 0 the roots are real and as a consequence the
series will decline exponentially in response to a shock. When φ_{1}² + 4φ_{2} < 0 the roots are complex and the series will appear as a
damped sine wave.

The stationarity theorem imposes the following conditions on the AR coefficients:

φ_{1} + φ_{2} < 1,  φ_{2} - φ_{1} < 1,  |φ_{2}| < 1.
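These coefficient conditions can be verified against the roots directly (a sketch; the coefficient pairs are arbitrary test cases):

```python
import numpy as np

def ar2_is_stationary(phi1, phi2):
    """Stationarity check: all roots of 1 - phi1*B - phi2*B**2 = 0 must lie
    outside the unit circle. np.roots wants coefficients highest power first."""
    roots = np.roots([-phi2, -phi1, 1.0])
    return bool(np.all(np.abs(roots) > 1.0))

def ar2_triangle(phi1, phi2):
    """The equivalent conditions on the coefficients themselves."""
    return (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)

# Arbitrary test pairs: two stationary, two not.
cases = [(0.5, 0.3), (1.2, -0.5), (0.5, 0.6), (1.5, 0.6)]
agree = [ar2_is_stationary(a, b) == ar2_triangle(a, b) for a, b in cases]
```

The root check and the coefficient conditions agree on every test pair.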

The autocovariance for an AR(2) process, with zero mean, is

γ_{k} = E[x_{t-k}x_{t}] = φ_{1}γ_{k-1} + φ_{2}γ_{k-2} for k ≥ 1.

Dividing through by the variance of x_{t} gives the autocorrelation function

ρ_{k} = φ_{1}ρ_{k-1} + φ_{2}ρ_{k-2}.

Since ρ_{0} = 1 and ρ_{-1} = ρ_{1}, we
can write

ρ_{1} = φ_{1} / (1 - φ_{2}).

Similarly for the second and third autocorrelations,

ρ_{2} = φ_{1}ρ_{1} + φ_{2},  ρ_{3} = φ_{1}ρ_{2} + φ_{2}ρ_{1}.
The other autocorrelations are solved for recursively. Their pattern is governed by the
roots of the second order linear difference equation

ρ_{k} - φ_{1}ρ_{k-1} - φ_{2}ρ_{k-2} = 0.
If the roots are real then the autocorrelations will decline exponentially. When the
roots are complex the autocorrelations will appear as a damped sine wave.

Using the Yule-Walker equations, the partial autocorrelations are

φ_{11} = ρ_{1},  φ_{22} = (ρ_{2} - ρ_{1}²)/(1 - ρ_{1}²),  φ_{kk} = 0 for k ≥ 3.

Again, the autocorrelations die out gradually. The partial autocorrelation on the other
hand is quite distinctive. It has spikes at one and two lags and is zero thereafter.

**Theorem** If x_{t} is a stationary autoregressive process, then it can be written as a convergent infinite moving average of the white noise terms, i.e., as an MA(∞).

*Example* Suppose z_{t} = φz_{t-1} + ε_{t} with |φ| < 1. By recursive substitution,

z_{t} = Σ_{j=0}^{k-1} φ^{j}ε_{t-j} + φ^{k}z_{t-k}.

Square both sides and take expectations:

E[(z_{t} - Σ_{j=0}^{k-1} φ^{j}ε_{t-j})²] = φ^{2k} E[z_{t-k}²];

the right hand side vanishes as k → ∞ since |φ| < 1. Therefore the sum converges to z_{t} in mean square.

**The Autocorrelation Function and Partial Autocorrelation Generally**

Suppose that a stationary series z_{t} satisfies φ(B)z_{t} = ε_{t}. Multiplying both sides by z_{t-k} and taking expectations gives

γ_{k} = φ_{1}γ_{k-1} + φ_{2}γ_{k-2} + … + φ_{p}γ_{k-p},

and dividing through by the variance of z_{t} yields the recursion

ρ_{k} = φ_{1}ρ_{k-1} + φ_{2}ρ_{k-2} + … + φ_{p}ρ_{k-p}.

If you have either MathCAD or MathCAD Explorer
then you can experiment interactively with some of the AR(p)
ideas presented here.

**Moving Average Models**

Consider a dynamic model in which the series of interest depends only on some part
of the history of the white noise term: the white noise input passes through a moving average filter to produce the observed series.

**Definition** Suppose a series is generated as

z_{t} = μ + ε_{t} - θ_{1}ε_{t-1} - θ_{2}ε_{t-2} - … - θ_{q}ε_{t-q},

where ε_{t} is white noise. Then {z_{t}} is a moving average process of order q, MA(q).

** Theorem:** A moving average
process is always stationary.

You can see that the mean of the random variable does not depend on time in any way. You can also see that the autocovariance depends only on the offset s, not on where in the series we start. We can prove the same result more generally by starting with z_{t} = φz_{t-1} + ε_{t}, which has the alternate moving average representation z_{t} = Σ_{j=0}^{∞} φ^{j}ε_{t-j}. Consider first the variance of z_{t},

var(z_{t}) = E[(φz_{t-1} + ε_{t})²].

By recursive substitution you can show that this is equal to

var(z_{t}) = σ_{ε}² Σ_{j=0}^{∞} φ^{2j} = σ_{ε}² / (1 - φ²).

The sum we know to be a convergent series, so the variance is finite and is independent of time. The covariances are, for example,

cov(z_{t}, z_{t-k}) = φ^{k} σ_{ε}² / (1 - φ²).

You can also see that the autocovariances depend only on the relative points in time, not the chronological point in time. Our conclusion from all this is that an MA(∞) process is stationary.

For the general MA(q) process the autocorrelation function is given by

ρ_{k} = (-θ_{k} + θ_{1}θ_{k+1} + … + θ_{q-k}θ_{q}) / (1 + θ_{1}² + … + θ_{q}²) for k = 1, …, q, and ρ_{k} = 0 for k > q.

The partial autocorrelation function will die out smoothly. You can see this by inverting the process to get an AR(∞) process.
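A simulation illustrates the lag-q cutoff for an MA(1) (a sketch; θ = 0.6 is an illustrative value, and the Box-Jenkins sign convention z_{t} = ε_{t} - θε_{t-1} is assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, T = 0.6, 20000                 # illustrative MA(1) parameter
e = rng.standard_normal(T + 1)
z = e[1:] - theta * e[:-1]            # z_t = e_t - theta * e_{t-1}

def sample_acf(z, k):
    zd = z - z.mean()
    return (zd[:len(z) - k] @ zd[k:]) / (zd @ zd)

# Theory: rho_1 = -theta/(1 + theta**2), and rho_k = 0 for every k >= 2.
rho1_theory = -theta / (1 + theta ** 2)
rho1, rho2, rho3 = (sample_acf(z, k) for k in (1, 2, 3))
```

Only the lag-1 sample autocorrelation is appreciably different from zero, exactly the MA(1) signature.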

If you have either MathCAD or MathCAD Explorer
then you can experiment interactively with some of the MA(q)
ideas presented here.

**Mixed Autoregressive - Moving Average Models**

**Definition** Suppose a series is generated as

φ(B)z_{t} = θ(B)ε_{t},

combining an AR(p) operator φ(B) and an MA(q) operator θ(B). Then {z_{t}} is a mixed autoregressive moving average process of orders p and q, ARMA(p,q).

The roots of the autoregressive operator must all lie outside the unit circle. The number of unknowns is p + q + 2. The p and q are obvious. The 2 includes the level of the process, μ, and the variance of the white noise term, σ_{ε}².

Suppose that we combine our AR and MA representations so that the model is

y_{t} = a_{0} + a_{1}y_{t-1} + … + a_{p}y_{t-p} + b_{0}ε_{t} + b_{1}ε_{t-1} + … + b_{q}ε_{t-q}    (1)

and the coefficients are normalized so that b_{0} = 1. Then this representation is
called an ARMA(p,q) if the roots of the autoregressive polynomial in (1) all lie outside the unit circle.

Suppose that the y_{t} are measured as deviations from the mean so we can drop a_{0};
then the autocovariance function is derived from

γ_{j} = E[y_{t-j}(a_{1}y_{t-1} + … + a_{p}y_{t-p} + ε_{t} + b_{1}ε_{t-1} + … + b_{q}ε_{t-q})].

If j > q then the MA terms drop out in expectation to give

γ_{j} = a_{1}γ_{j-1} + a_{2}γ_{j-2} + … + a_{p}γ_{j-p}.

That is, the autocovariance function looks like that of a typical AR for lags after q; the autocovariances die
out smoothly after lag q, but we cannot say how those at lags 1, 2, …, q will look.

We can also examine the PACF for this class of model. The model can be written as

φ(B)y_{t} = θ(B)ε_{t}.

We can write this as an MA(∞) process,

y_{t} = φ(B)^{-1}θ(B)ε_{t},

which suggests that the PACFs die out slowly. With some arithmetic we could show that
this happens only after the first p spikes contributed by the AR part.

**Empirical Law** In actuality, a stationary time series may well be
represented by p ≤ 2 and q ≤ 2. If
your business is to provide a good approximation to reality and goodness of fit is your
criterion then a prodigal model is preferred. If your interest is predictive efficiency
then the parsimonious model is preferred.

Experiment with the ARMA ideas presented above with
a MathCAD worksheet.

**Autoregressive Integrated Moving Average Models**

(Diagram: white noise → MA filter → AR filter → integration filter → observed series.)

Sometimes the process, or series, we are trying to model is not stationary in levels.
But it might be stationary in, say, first differences. That is, in its original form the
autocovariances for the series might not be independent of the chronological point in
time. However, if we construct a new series which is the first differences of the original
series, this new series satisfies the definition of stationarity. This is often the case
with economic data which is highly trended.
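A small simulation makes the point (a sketch; the random walk with drift and the drift value 0.5 are illustrative): the level series trends and so is nonstationary, while its first difference is just drift plus white noise.

```python
import numpy as np

rng = np.random.default_rng(4)
T, drift = 500, 0.5                   # illustrative drift

# A random walk with drift is nonstationary in levels: its mean grows with t.
z = np.cumsum(drift + rng.standard_normal(T))

# Its first difference is drift + white noise, which is stationary.
dz = np.diff(z)

first_half_mean = z[:T // 2].mean()
second_half_mean = z[T // 2:].mean()  # trends well above the first half
diff_mean = dz.mean()                 # hovers near the drift
```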

**Definition** Suppose that z_{t} satisfies

φ(B)∇^{d}z_{t} = θ(B)ε_{t}.

This is named an ARIMA(p,d,q) model: p identifies the order of the AR operator, d
identifies the power on ∇, and q identifies the order of the MA
operator.

If the roots of φ(B) = 0 lie outside
the unit circle then we can rewrite the differenced process in the ARIMA(p,d,q) as a linear filter. I.e., ∇^{d}z_{t} can be
written as an MA(∞). We reserve the discussion of the
detection of unit roots for another part of the lecture notes.

**Transfer Function Models**

Consider a dynamic system with x_{t} as an input series and y_{t} as an
output series: the input passes through a linear filter to produce the output.

These models are a discrete analogue of linear differential equations. We suppose the
following relation

δ(B)y_{t} = ω(B)x_{t-b},

where b indicates a pure delay. Recall that ∇ = (1 - B), so the polynomials in B can equivalently be expressed in terms of differences.

If the coefficient polynomial on y_{t} can be inverted then the model can be
written as

y_{t} = δ(B)^{-1}ω(B)x_{t-b} = V(B)x_{t}.

V(B) is known as the impulse response function. We will come across this terminology
again in our later discussion of vector autoregression, cointegration and error
correction models.

**Model Identification**

Having decided on a class of models, one must now identify the order of the
processes generating the data. That is, one must make best guesses as to the order of the
AR and MA processes driving the stationary series. A stationary series is completely
characterized by its mean and autocovariances. For analytical reasons we usually work with
the autocorrelations and partial autocorrelations. These two basic tools have unique
patterns for stationary AR and MA processes. One could compute sample estimates of the
autocorrelation and partial autocorrelation functions and compare them to tabulated
results for standard models.

*Definitions*

Sample Mean: z̄ = (1/T) Σ_{t=1}^{T} z_{t}

Sample Autocovariance Function: c_{k} = (1/T) Σ_{t=1}^{T-k} (z_{t} - z̄)(z_{t+k} - z̄)

Sample Autocorrelation Function: r_{k} = c_{k} / c_{0}

The sample partial autocorrelations will be found by substituting the r_{k} for the ρ_{k} in the Yule-Walker equations and solving as before.

Using the autocorrelations and partial autocorrelations is quite simple in principle.
Suppose that we have a series z_{t}, with zero mean, which is AR(1). If we were to
run the regression of z_{t+2} on z_{t+1} and z_{t} we would expect
to find that the coefficient on z_{t} was not different from zero since this
partial autocorrelation ought to be zero. On the other hand, the autocorrelations for this
series ought to be decreasing exponentially for increasing lags (see the AR(1) example
above).

Suppose that the series is really a first order moving average. The autocorrelation should be zero
everywhere but at the first lag. The partial autocorrelation ought to die out
exponentially.
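The regression check described above can be sketched numerically (φ = 0.7 is an illustrative AR(1) parameter):

```python
import numpy as np

rng = np.random.default_rng(5)
phi, T = 0.7, 10000                   # illustrative AR(1) parameter
z = np.zeros(T)
for t in range(1, T):
    z[t] = phi * z[t - 1] + rng.standard_normal()
z = z - z.mean()

# Regress z_{t+2} on z_{t+1} and z_t: for a true AR(1) the coefficient
# on z_t should be indistinguishable from zero.
X = np.column_stack([z[1:-1], z[:-2]])   # columns: z_{t+1}, z_t
y = z[2:]
coef_lag1, coef_lag2 = np.linalg.lstsq(X, y, rcond=None)[0]
```

The coefficient on z_{t+1} estimates φ, while the coefficient on z_{t} (the lag-2 partial) is essentially zero.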

Even from our very cursory romp through the basics of time series analysis it is apparent
that there is a duality between AR and MA processes. This duality may be summarized in the
following table.

| | AR(p) | MA(q) | ARMA(p,q) |
| --- | --- | --- | --- |
| Representation | The stationary AR(p) φ(B)z_{t} = a_{t} can be represented as an MA of infinite order. | The invertible MA(q) z_{t} = θ(B)a_{t} can be represented as an infinite order AR. | Can be represented as either an AR(∞) or an MA(∞), conditional on the roots. |
| Autocorrelation | ρ_{k} → 0 but ρ_{k} ≠ 0; the spikes in the correlogram decrease exponentially at a rate governed by the roots of φ(B) = 0. | ρ_{k} = 0 for k ≥ q+1. There are spikes until lag q. | Autocorrelations die out smoothly after q lags. |
| Partial autocorrelation | φ_{kk} = 0 for k ≥ p+1. There are spikes in the correlogram at lags 1 through p. | φ_{kk} → 0 but φ_{kk} ≠ 0; the spikes die out exponentially at a rate governed by the roots of θ(B) = 0. | PACFs die out smoothly after p lags. |
| Stationarity | All roots of φ(B) = 0 lie outside the unit circle. | No restriction. | All roots of φ(B) = 0 lie outside the unit circle. |
| Invertibility | No restriction. | All roots of θ(B) = 0 lie outside the unit circle. | All roots of θ(B) = 0 lie outside the unit circle. |

Diagrams of the theoretical correlograms for AR and MA processes can be found in
Hoff (1983, pp. 59-71) or Wei (1990, pp. 32-66). The correlograms for a stationary ARMA are
somewhat more problematic. The autocorrelations show an irregular pattern of spikes
through lag q, then the remaining pattern is the same as that for an AR process. The
partial autocorrelations are irregular through lag p, then the PACFs decay exponentially as
in an MA process.