Time Series Analysis (3 CFU)
Why should I learn Time Series Analysis? Some possible answers:
1) Much of Economics is concerned with dynamics, i.e., with time.
2) 1970s: univariate ARIMA models outperformed the predictive ability of large-size multivariate macroeconometric models.
3) 1980s: "Macroeconomics and Reality", i.e., the VAR.
Outline of the Course
1. Four Crucial Concepts of TS Analysis
2. Linear Process: ARMA
3. ARMA as Difference Equations and Lag Operators
4. ARMA: Model Selection and Estimation
5. ARMA: Prediction
6. Integrated Process (ARIMA): Unit-Root Tests (DF, ADF)
7. VAR Model: Definition, Representations, Use
8. Innovation Accounting (IRF, GIRF, VarDec)
9. Granger Causality
10. SVAR
Textbooks:
- Johnston and DiNardo, "Econometric Methods" (1997), McGraw-Hill (Chapters 7 and 9)
- Hamilton, "Time Series Analysis" (1994), Princeton University Press (Chapters 1 to 5, 11; Appendix: Math & Stat)
M. Bovi
Pag. 1
Aim of this class: to give an intuition of the concepts of stochastic (= random) process and of time series. The reason is in the following:
- Time-series analysis is concerned with evaluating the properties of the stochastic process, i.e. the probability model, which generated the observed time series.
- Time-series modeling is concerned with inferring the properties of the stochastic process, i.e. the probability model, which generated the observed time series.
Time vs Space
Cross-sectional
Data collected at a given point in time – a photo.
E.g., a sample of firms (i = 1, …, N), for each of which we measure variables like the number of employees, the market value of shares, etc.
From the econometric point of view, it is important that the observations consist of a random sample from the underlying population.
The collection of random variables Xi is said to be a random sample if the Xi are independent and identically distributed (i.i.d.), i.e., they
- are independent random variables (RV), and
- have the same distribution (density function): fi = f.
If this is the case, with i = 1, …, N, then (xi = realization of Xi):
f(x1, x2, …, xN) = f(x1) · f(x2) · … · f(xN)
That is, the density function of our random sample is just the product of the (all equal) density functions of the single realizations xi.
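The factorization above can be checked numerically. A minimal sketch, assuming (for illustration) an N(0,1) population; the `normal_pdf` helper is ours, not part of the notes:

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of a N(mu, sigma^2) random variable evaluated at x.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Draw an i.i.d. random sample x_1, ..., x_N from the same N(0,1) population.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(5)]

# Under i.i.d. sampling, the joint density of the sample is just the product
# of the (identical) marginal densities f(x_i).
joint_density = math.prod(normal_pdf(x) for x in sample)
print(joint_density)
```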
In this framework, statistical inference is based on repeated random sampling.
Interested readers may refer to Hamilton Ch. 7.
Time Series Data
Observations on a variable (or variables) over time – a movie. E.g., daily interest rates, monthly CPI, quarterly GDP, …
Definitions (all valid). A time series is:
- a part of a unique, unrepeatable realization of a number of RVs;
- the finite part (the sample path) of a particular realization of a stochastic process (just as tossing an unbiased coin is the realization of a RV with equal head/tail probability);
- a sample of T observations – consecutive and collected at regularly spaced intervals of time – indexed by the date of each observation, i.e. a sequence; Hamilton (p. 25): time series as a sequence.
In these notes I usually
- express the entire sequence of values {…, yt−2, yt−1, yt, yt+1, yt+2, …} as {yt};
- refer to any particular value in the sequence as yt.
Deterministic time series:
Stochastic time series (each observation is a realization of a RV):
Example. A time series as a unique, unrepeatable set of realizations. Solid line = what we have observed: the unique, unrepeatable set of realizations. Thin lines = other sets of realizations from the underlying stochastic process, possible only "ex ante":
All these “potential realizations” constitute an ensemble, i.e., contemporaneous multiple time-series data of the same process. Typically, we observe only one realization of the ensemble. Think about the GDP in the Figure. Today we observe the line until the red X (today’s realization). What about the future (the blue area)?1
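The ensemble idea can be sketched in a short simulation. This is an illustrative assumption, not the notes' model: we take a Gaussian random walk as a stylized stand-in for the GDP-like process, draw N ex-ante possible paths, and keep only one as "observed":

```python
import random

# An "ensemble": N ex-ante possible realizations of the same stochastic
# process (here a Gaussian random walk, an illustrative stand-in for the
# GDP paths in the figure). Ex post we observe only ONE of them.
random.seed(1)
N, T = 5, 100

def random_walk(T):
    y, path = 0.0, []
    for _ in range(T):
        y += random.gauss(0.0, 1.0)   # y_t = y_{t-1} + eps_t
        path.append(y)
    return path

ensemble = [random_walk(T) for _ in range(N)]
observed = ensemble[0]   # the single realization we actually get to see
print(len(ensemble), len(observed))
```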
It is this uncertainty – the ex-ante possibility of different realizations – that leads us to the concept of stochastic process: the idea is to model the data as a realization (or as part of a realization) of an SP.
Stochastic process (SP). Broadly speaking, it may be defined (equivalently) as:
- an arbitrary sequence of random data;
- a dynamic extension of the notion of RV (cf. the previous GWN example);
- a random process running along in time and controlled by probabilistic laws.
¹ As we will see, if the process is stationary then its moments can usually be well approximated by sufficiently long time averages based on the single set of observations/realizations.
The stochastic process is denoted as
{…, Y1, Y2, …, Yt, Yt+1, …}
A realization of an SP with T observations is the sequence of observed data:
{Y1 = y1, Y2 = y2, …, YT = yT} = {yt}, t = 1, …, T
In sum, unlike CS, in time-series analysis:
- we base our inference on a sample of dimension 1 with T observations coming from the SP;
- we have only one realization of the SP, and we call it a time series.
Some additional features of TS w.r.t. cross-section data:
1) The data frequency may require special attention (e.g. seasonality).
(Figure: the three components of a time series, the first two deterministic.)
2) MORE IMPORTANT: the ordering of the observations, which may convey important information. Crucially, the ordering preserves the persistence/dependence between observations.
In cross-section analyses, we may order the data in several ways without necessarily losing information. Trivially, this is not true with TS. Time-series analysis is concerned with techniques for the analysis of this dependence, which requires the development of stochastic/dynamic models for time-series data.

As mentioned, for a stochastic process we can define a density function. It is also possible to marginalize this density for each subsample of its components: marginal densities are defined for each yt, for pairs (yt, yt−1), and so on. If the marginal densities have moments, one can say, e.g., that E(yt) = μt, V(yt) = σt², …

Since consecutive observations are likely to be NOT independent, we cannot represent the density of the sample simply by multiplying the marginals.

Example. Consider two RVs:
π1 = the March inflation rate (time 1 = now);
π2 = the April inflation rate (time 2 = later).
We may think that π1 could be used as a predictor of π2 with some degree of uncertainty (what about vice versa?). The stochastic dependence between π1 and π2 is summarized by the joint density p(π1, π2) or, equivalently, by the conditional probability
p(π2|π1) = p(π1, π2) / p(π1)
If p(π2|π1) ≠ p(π2), then π1 and π2 are not independent; that is, knowledge of the value of π1 reduces the uncertainty about π2.
3) Stationarity and Ergodicity
Both deal with inferential issues and with time-homogeneity. These concepts are among the four key building blocks in the analysis of time series:
1) Stationarity
2) Ergodicity
3) White Noise
4) Information set
1) Stationarity
Intuition:
- Stationarity requires the SP to be in a particular state of "statistical equilibrium".
- If you observe two equal-length, separate "pieces" of an SP and these realizations exhibit similar statistical characteristics, then the SP is stationary.
Examples (from Hamilton):
WN(0,1): a stationary process
RW: a non-stationary process
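A minimal simulation contrasting the two cases (the seed and sample size are arbitrary choices): the two halves of a white-noise series have similar sample means, while the two halves of a random walk generally do not.

```python
import random
import statistics

# Compare the sample moments of two halves of a series: for a stationary
# process (WN) the halves look alike; for a random walk they typically do not.
random.seed(2)
T = 10_000
wn = [random.gauss(0.0, 1.0) for _ in range(T)]

rw, y = [], 0.0
for e in wn:
    y += e                  # random walk: y_t = y_{t-1} + eps_t
    rw.append(y)

def half_means(series):
    h = len(series) // 2
    return statistics.mean(series[:h]), statistics.mean(series[h:])

print("WN halves:", half_means(wn))   # both close to 0
print("RW halves:", half_means(rw))   # typically far apart
```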
Stationarity is a probabilistically meaningful measure of regularity of the SP. This regularity can be exploited to estimate unknown parameters and to characterize the dependence between observations across time. If the time series frequently changes in an unpredictable manner (there is not enough time-homogeneity), constructing a meaningful probabilistic model would be difficult or even impossible. More formally:

1.1 Strict Stationarity
Consider an h-size subsample of our stochastic process: St(h) = (yt, yt+1, …, yt+h−1).
St(h) is an h-dimensional RV with a density function which may (or may not) depend on time. If the density does not depend on time, then
distribution of St(h) = distribution of St+1(h) = distribution of St+2(h) = …
(the subsamples need not be consecutive).
When this time-independence holds for every h, the process is strictly stationary:
A stochastic process {Yt}, t = −∞, …, +∞, is strictly stationary if the joint distribution of (Yt, Yt+1, …, Yt+h) depends only on h and not on t.
That is, the only factor affecting the relationship between two observations is the gap (h) between them. Example of a strictly stationary stochastic process: the joint distribution of (Y1, Y5, Y7) is the same as the distribution of (Y12, Y16, Y18).
NB: stationarity only applies to unconditional (= population) moments.

1.2 Covariance (or Weak) Stationarity
Weak stationarity deals with subsamples of size 2 (h = 2), i.e. with two-dimensional RVs: St(2) = (yt, yt+1). If the first two moments exist, then the covariance exists as well. If these moments do not depend on time, then the process is covariance stationary.
More formally. A stochastic process {yt} is covariance stationary if:
E(yt) = μ for all t;
V(yt) = γ0 < ∞ for all t;
Cov(yt, yt−s) = γs for all t (it depends only on the lag s).
AUTOCOVARIANCE: the covariance of yt with itself at a different point in time. E.g., for lag s, the s-th autocovariance is:
γs = Cov(yt, yt−s) = E[(yt − μ)(yt−s − μ)]
The lag-0 autocovariance, γ0, is the same quantity as the long-run variance of yt.
The term long-run variance is used to distinguish V[yt] from the innovation variance, V[εt] = σ², also known as the short-run variance.
AUTOCORRELATION
ρs = γs/γ0
As we will see, these ρs – if ≠ 0 – describe the memory of the process: that is why stochastic processes are useful to represent persistent time series.
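The sample counterparts of γs and ρs can be sketched as follows (the helper functions are ours, not the notes'; for white noise, ρs should be near 0 for every s ≠ 0):

```python
import random
import statistics

def autocovariance(y, s):
    # Sample lag-s autocovariance: (1/T) * sum_t (y_t - ybar)(y_{t-s} - ybar).
    T = len(y)
    ybar = statistics.mean(y)
    return sum((y[t] - ybar) * (y[t - s] - ybar) for t in range(s, T)) / T

def autocorrelation(y, s):
    # rho_s = gamma_s / gamma_0; rho_0 = 1 by construction.
    return autocovariance(y, s) / autocovariance(y, 0)

# For simulated white noise, the sample rho_s are close to 0 for s != 0.
random.seed(3)
y = [random.gauss(0.0, 1.0) for _ in range(5_000)]
print([round(autocorrelation(y, s), 3) for s in (0, 1, 2, 3)])
```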
With respect to CS analysis, in TS analysis weak dependence replaces the notion of random sampling (i.i.d.), implying that the LLN and the central limit theorem still hold.
1.3 Relationships between Covariance and Strict Stationarity
The two types of stationarity are related, although neither nests the other.

Strict but not weakly stationary process. E.g., a strictly stationary process without moments. Consider the sequence of RVs yt = 1/xt, where {xt} is a sequence of independent standard normal RVs. The sequence {yt} is i.i.d. (hence strictly stationary) but has no moments.

Weak but not strictly stationary process. A process with third- or higher-order moments which depend on time (e.g. time-varying kurtosis) may be covariance stationary but not strictly stationary. In general, time-invariant moments do not imply that the marginal densities are the same.

Strict and weak stationarity coincide when the process is Gaussian: a process is Gaussian when the joint distribution of any subsample of the elements of the process is multivariate normal.
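A quick simulation of the yt = 1/xt example (seed and sample sizes are arbitrary): since the moments do not exist, the running sample mean typically never settles down as T grows.

```python
import random
import statistics

# y_t = 1/x_t with x_t i.i.d. N(0,1): strictly stationary (i.i.d.) but with
# no finite moments, so NOT covariance stationary. Occasional draws of x_t
# near 0 produce huge values of y_t, so sample averages do not stabilize.
random.seed(4)
x = [random.gauss(0.0, 1.0) for _ in range(100_000)]
y = [1.0 / v for v in x]

for T in (1_000, 10_000, 100_000):
    print(T, statistics.mean(y[:T]))   # running mean typically keeps jumping
```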
Hamilton (p. 46):
Covariance stationarity: important when modeling the mean of a process. Strict stationarity: useful in more complicated settings (e.g. non-linear models). A stationary time series is defined by its mean, variance and ACF.
2 ERGODICITY
Intuition. Ergodicity is a condition which limits the memory of the process:
- The memory of a NON-ERGODIC process is so persistent that a subsample of it – no matter how long – is simply insufficient to infer the probabilistic features of the stochastic process.
- The memory of an ERGODIC process is, in the long run, weak, so that increasing the sample size is informative about the probabilistic features of the stochastic process.
It is hard to test for ergodicity using just (part of) a single realization (we assume it). Under ergodicity we can make inference: "sit and count" is a reliable and consistent statistical procedure –
- the sample moments for finite stretches of the realization will approach, almost surely (a.s.), their population counterparts as the length of the realization becomes infinite.
More formally (Hamilton, p. 46). Let us denote a single realization of size T, say from simulation 1 out of N simulations, from the SP as:
{y1(1), y2(1), …, yT(1)}
We can obviously compute its sample mean as
ȳ = (1/T) Σt=1..T yt(1)
This is the TIME mean, not the UNCONDITIONAL² mean of the SP, which we denote E(Yt). This said, a weakly stationary SP is ergodic for the mean if ȳ converges in probability to E(Yt) as T → ∞.
² Unconditional = population = ensemble (cf. p. 5).
ERGODICITY FOR THE MEAN, AUTOCOVARIANCES (γj), SUMMABILITY
A covariance-stationary SP is ergodic for the mean provided that γj → 0 "sufficiently quickly" as j becomes large (the process has less and less memory). In particular, we will see that a covariance-stationary SP is ergodic for the mean provided that its autocovariances are absolutely summable:
Σj=0..∞ |γj| < ∞
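For illustration, consider a stationary AR(1), yt = φyt−1 + εt (an assumed model, with φ = 0.8 chosen arbitrarily): its autocovariances decay geometrically, hence are absolutely summable, and the time mean of one long realization converges to the ensemble mean E(yt) = 0.

```python
import random
import statistics

# Ergodicity for the mean: for a stationary AR(1), y_t = phi*y_{t-1} + eps_t,
# the autocovariances gamma_j = phi^j * gamma_0 die out geometrically, so the
# TIME mean computed from one long realization converges to the population
# (ensemble) mean, here E(y_t) = 0.
random.seed(5)
phi = 0.8
y, series = 0.0, []
for _ in range(200_000):
    y = phi * y + random.gauss(0.0, 1.0)
    series.append(y)

for T in (100, 10_000, 200_000):
    print(T, statistics.mean(series[:T]))   # approaches 0 as T grows
```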
ERGODIC THEOREM
We need some preliminary definitions.
Bounded function: a function whose values are bounded below and above. Example: x = cos(t), with −1 ≤ x ≤ 1.
Bounded sequence: the sequence {yt} is bounded if there is a finite number, say m, such that |yt| ≤ m for all t.

3) White Noise
A Gaussian White Noise, GWN(0, σ²), is a sequence of uncorrelated – indeed independent (i.i.d.) – RVs sharing the same N(0, σ²) distribution.
GWN is important because, under Normality, uncorrelated means independent. Example: a simulated GWN(0,1):
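A minimal GWN(0,1) simulation (seed and sample size are arbitrary): the sample mean should be near 0 and the sample variance near 1.

```python
import random
import statistics

# Simulate a GWN(0,1): an i.i.d. N(0,1) sequence. Sample mean ~ 0 and sample
# variance ~ 1; under Normality, zero correlation also means independence.
random.seed(6)
gwn = [random.gauss(0.0, 1.0) for _ in range(50_000)]
print(statistics.mean(gwn), statistics.pvariance(gwn))
```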
4. INFORMATION SET
The information set at time t, It, contains all time-t measurable events. Measurable event: any event that can have a probability assigned to it at time t. NB: uncertainty is unmeasurable risk.
It includes the realizations of all variables which have occurred on or before t. Under the Rational Expectations Hypothesis (REH), It includes all the relevant information/skills to produce model-consistent predictions: no useful information can be wasted in (certain) economic theory. Example for stock prices: the information set for April 3, 2016 contains all stock prices up to and including those which occurred on April 3, 2016. It also includes everything else known at that time, such as interest rates and foreign exchange rates but, possibly, also the number of strikes in some important sector of the economy, etc.
Information and Expectations
Many expectations are made conditional on the current information set: E[yt+h|It] ≡ Et(yt+h).
In Macroeconomics you typically read (t = current period): Et[Et+1(xt+2)] = Et(xt+2).
In words: you have to predict now the realization of x two steps ahead, Et(xt+2). Well, you are supposed to set your current best guess of your best guess next period, Et[Et+1(xt+2)], equal to your current best guess of xt+2, i.e., to Et(xt+2). This is the Law of Iterated Expectations (LIE) with nested information sets.
In general (X, Y are RVs), the LIE says (equivalently):
EY(Y) = EX[EY|X(Y|X)]
- the expected value of the conditional expected value of Y given X (hence the name LIE) is the same as the unconditional expected value of Y;
- the RV "EY|X(Y|X)" has the same expectation as the RV "Y".
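The LIE can be checked by Monte Carlo under a hypothetical model (every distributional choice below is an illustrative assumption, not part of the notes): X = 1 with probability 0.3 (else 0), and Y|X ~ N(2 + 3X, 1), so E(Y|X=0) = 2, E(Y|X=1) = 5 and EX[E(Y|X)] = 0.7·2 + 0.3·5 = 2.9.

```python
import random
import statistics

# Monte Carlo check of the LIE, E_X[E(Y|X)] = E(Y), for a hypothetical model:
# X = 1 with prob 0.3 (else 0), and Y | X ~ N(2 + 3X, 1).
random.seed(7)
draws = []
for _ in range(200_000):
    x = 1 if random.random() < 0.3 else 0
    y = random.gauss(2.0 + 3.0 * x, 1.0)
    draws.append(y)

e_y = statistics.mean(draws)        # direct simulation estimate of E(Y)
e_cond = 0.7 * 2.0 + 0.3 * 5.0      # E_X[E(Y|X)] computed analytically
print(e_y, e_cond)                  # both close to 2.9
```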
A recall of some basic statistical concepts before proving the LIE. A discrete joint probability distribution for a RV X = (X1, X2), Xi ∈ {0, 1, 2, 3}, is:
(Table: joint probabilities Pr[X1 = i, X2 = j], with the marginal distributions in the last row/column.)
Joint probabilities: the numbers in each cell of the table. Unconditional probabilities: the numbers in the margins, i.e. in the very last row/column; they sum to one and are called the marginal distributions of, respectively, X2 and X1. Marginal probabilities: by the Law of Total Probability, they can be computed as sums of joint probabilities. E.g., summing the first row (=> marginalizing out X2): Pr[X1 = 0] = (0.008 + 0.048 + 0.090 + 0.064) = 0.210. Conditional probabilities: defined as joint/marginal probabilities. Intuition: Pr[X1 = 1 | X2 = 1] = Pr[X1 = 1 ∩ X2 = 1]/Pr[X2 = 1] = (0.180/0.42) ≈ 0.43. Conditional probability provides us with a way to reason about the outcome of an experiment based on partial information.
Expectation (uppercase = RV; lowercase = realizations; cf. Hamilton, p. 740):
E[g(X)] = Σx g(x) Pr[X = x]
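The marginal/conditional computations can be sketched on a small joint table (the 2×2 numbers below are illustrative, not those of the lecture's table):

```python
# Marginal and conditional probabilities from a joint table (hypothetical
# numbers). joint[i][j] = Pr[X1 = i, X2 = j].
joint = [[0.10, 0.20],
         [0.30, 0.40]]

# Law of Total Probability: marginals are row/column sums of the joint.
pr_x1 = [sum(row) for row in joint]                             # Pr[X1 = i]
pr_x2 = [sum(joint[i][j] for i in range(2)) for j in range(2)]  # Pr[X2 = j]

# Conditional = joint / marginal, e.g. Pr[X1 = 1 | X2 = 1].
pr_x1_given_x2 = joint[1][1] / pr_x2[1]
print(pr_x1, pr_x2, pr_x1_given_x2)
```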
Thus, the expectation is a weighted average of g(x), with the probabilities as weights.
Marginal density/probability: the integral of the joint distribution, fX,Y(x, y), with respect to X (or to Y):
fY(y) = ∫ fX,Y(x, y) dx (and, symmetrically, fX(x) = ∫ fX,Y(x, y) dy)
(a) Conditional density
It is the (new) joint probability given the realized X = x:
fY|X(y|x) = fX,Y(x, y) / fX(x)
NB
1. The conditional density is a density (it integrates to one):
∫ fY|X(y|x) dy = ∫ [fX,Y(x, y)/fX(x)] dy = fX(x)/fX(x) = 1
The marginal of X, fX(x), is a constant when integrating w.r.t. Y => it goes outside the integral. The remaining integrand is the joint, which amounts to fX(x) when integrated w.r.t. y ■
2. Since the conditional is a density, we can compute its Conditional Expectation:
EY|X(Y|X = x) = ∫ y fY|X(y|x) dy
Note: • The conditional expectation is a function of the realization x: for a different realization of X, the conditional expectation will be a different number. This is hardly surprising: the conditional expectation is a statistics, and a statistics is a function of RV. (Cf., e.g., the definition of the expectation); M. Bovi
• The conditional expectation is no riskier than the unconditional one, because we have some more information: we know that X = x.
• Since EY|X(Y|X) is a function of X, it is a RV and we can compute the expectation of this expectation w.r.t. X:
EX[EY|X(Y|X)] = ∫ EY|X(Y|X = x) fX(x) dx
So, we are iterating expectations. Recall now what the LIE says: EX[EY|X(Y|X)] = EY(Y). To prove it, note that
EX[EY|X(Y|X)] = ∫ [∫ y fY|X(y|x) dy] fX(x) dx
and that fY|X(y|x) fX(x) = conditional × marginal = joint = fY,X(y, x), so that
EX[EY|X(Y|X)] = ∫ y [∫ fY,X(y, x) dx] dy
Now, let us examine the RHS:
• the inner integral in dx is the joint distribution integrated w.r.t. X, i.e. it is the marginal of Y, fY(y) (cf. the definition of the marginal density above). Thus:
EX[EY|X(Y|X)] = ∫ y fY(y) dy
• since the marginal of Y is multiplied by y, the RHS integral is nothing but the unconditional expectation of Y: ∫ y fY(y) dy = EY(Y).
Thus, we may eventually confirm that EX[EY|X(Y|X)] = EY(Y) ■
EXAMPLES
The LIE in Cross Sections
Ex. 1. Two factories supply tires to the market.
Factory X: its tires work for an average of 9000 Km; it covers 60% of the total market.
Factory Y: its tires work for an average of 7000 Km; it covers 40% of the total market.
What is the expected distance, E(D), that a purchased tire will work?
The LIE says: E(D) = E(D|X)P(X) + E(D|Y)P(Y) = 9000(0.6) + 7000(0.4) = 8200 Km.
In fact, the LIE is a reformulation of the total expectation (= total probability) theorem: "the unconditional average can be obtained by averaging the conditional averages".
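As a quick check of the arithmetic:

```python
# The tire example as the total expectation theorem in one line:
# E(D) = E(D|X) P(X) + E(D|Y) P(Y).
expected_distance = 9000 * 0.6 + 7000 * 0.4
print(expected_distance)   # about 8200 Km
```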
The LIE in Time Series
Ex. 2: Forecasting the sales of a company (the joint probability of the RVs below is assumed to be known, i.e. we have a probability model).
t0 = beginning of the year; t1 = first semester of the year; t2 = end of the year.
y1 = sales of the company in t1; x2 = sales over the entire year.
At t0:
➢ we forecast E[x2]: we use the expected value (what else?) as a forecast of x2;
➢ y1 is a RV.
At t1:
➢ y1 is known: we now live in a new "universe" where everything is conditioned on the observed value of y1;
➢ based on the knowledge of y1, we revise our forecast using the expected value in this new universe: E1(x2|y1). Thus, the revision is [E1(x2|y1) − E0(x2)].
Now, the LIE says that the EXPECTED revision must be zero: E0[E1(x2|y1)] = E0(x2). The basic logic of the LIE is that we cannot expect to be systematically wrong.
Usually, (actual revision) ≠ 0:
• it is hard to be a perfect forecaster.
If (expected revision) ≠ 0:
• our expectations are systematically biased;
• we are forming predictions that, on average, are wrong;
• we are "over-revising" (w.r.t. what uncertainty dictates): E0[E1(x2|y1)] ≠ E0(x2).
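The zero-expected-revision property can be simulated under a hypothetical joint model for (y1, x2) (all distributional choices below are assumptions for illustration): y1 ~ N(0,1) and x2 = y1 + u with u ~ N(0,1), so that E0(x2) = 0 and E1(x2|y1) = y1.

```python
import random
import statistics

# Monte Carlo check that the EXPECTED forecast revision is zero under a
# hypothetical model: y1 ~ N(0,1), x2 = y1 + u, u ~ N(0,1). At t0 the
# forecast is E0(x2) = 0; at t1, having seen y1, it is E1(x2|y1) = y1.
# The revision is y1 - 0, and E0(revision) = E(y1) = 0.
random.seed(8)
revisions = []
for _ in range(100_000):
    y1 = random.gauss(0.0, 1.0)
    revision = y1 - 0.0          # E1(x2|y1) - E0(x2)
    revisions.append(revision)

print(statistics.mean(revisions))   # close to 0
```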