Time Series Analysis - Sapienza [PDF]


Time Series Analysis (3 cfu)

Why should I learn Time Series Analysis? Some possible answers:
1) Much of Economics is concerned with dynamics, i.e., with time.
2) 1970s: univariate ARIMA models outperform the predictive ability of large-scale multivariate macroeconometric models.
3) 1980s: "Macroeconomics and Reality", i.e., VAR.

Outline of the Course

1. Four Crucial Concepts of TS Analysis
2. Linear Process: ARMA
3. ARMA as Difference Equations and Lag Operators
4. ARMA: Model Selection and Estimation
5. ARMA: Prediction
6. Integrated Process (ARIMA): Unit Root Tests (DF, ADF)
7. VAR Model: Definition, Representations, Use
8. Innovation Accounting (IRF, GIRF, VarDec)
9. Granger Causality
10. SVAR

Textbooks:
- Johnston and DiNardo, "Econometric Methods" (1997), McGraw-Hill (Chapters 7 and 9)
- Hamilton, "Time Series Analysis" (1994), Princeton University Press (Chapters 1 to 5, 11, Appendix: Math&Stat)

M. Bovi

Aim of this class: to give an intuition of the concepts of stochastic (= random) process and of time series.

The reason is in the following:

Time-series analysis is concerned with evaluating the properties of the stochastic process, i.e. the probability model, which generated the observed time series.

Time-series modeling is concerned with inferring the properties of the stochastic process, i.e. the probability model, which generated the observed time series.


Time vs Space

Cross-sectional

Data collected at a given point in time – a photo.

E.g., a sample of firms (i = 1, …, N), from each of which variables like the number of employees, the market value of shares, etc., are measured.

From the econometric point of view, it is important that the observations consist of a random sample from the underlying population.

The collection of random variables Xi is said to be a random sample if they are independent and identically distributed (i.i.d.), i.e., the Xi

- are independent random variables (RV), and
- have the same distribution (density function): fi = f for all i.

If this is the case, with i = 1, …, N, then (xi being the realization of Xi):

f(x1, x2, …, xN) = f(x1)·f(x2)·…·f(xN)

That is, the density function of our random sample is just the product of the (all equal) density functions of the single realizations xi.

In this framework, statistical inference is based on repeated random sampling.

Interested readers may refer to Hamilton Ch. 7.


Time Series Data

Observations on a variable (or variables) over time – a movie. E.g., daily interest rates, monthly CPI, quarterly GDP, …

Definitions (all valid). A time series is:
- a part of a unique, unrepeatable realization of a number of RVs;
- the finite part (the sample path) of a particular realization of a stochastic process (just like tossing an unbiased coin is the realization of a RV with equal head/tail probability);
- a sample of T observations – consecutive and collected at regularly spaced intervals of time – indexed by the date of each observation, i.e. a sequence (Hamilton, p. 25: time series as a sequence).

In these notes I usually:
- express the entire sequence of values {…, yt−2, yt−1, yt, yt+1, yt+2, …} as {yt};
- refer to any particular value in the sequence as yt.

Deterministic time series:

Stochastic time series (each observation is a realization of a RV):
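The distinction can be sketched numerically. A minimal Python illustration (not from the notes; the sine-plus-noise model is an arbitrary choice): regenerating a deterministic series reproduces it exactly, while a fresh draw of a stochastic one gives a different sample path.

```python
import math
import random

random.seed(0)

T = 12
# Deterministic series: fully determined by t, no uncertainty.
deterministic = [math.sin(0.5 * t) for t in range(T)]

# Stochastic series: each observation is a realization of a RV
# (here: the same sine path plus a Gaussian disturbance).
stochastic = [math.sin(0.5 * t) + random.gauss(0, 1) for t in range(T)]

# Re-generating the deterministic series reproduces it exactly;
# a fresh draw of the stochastic one gives a different sample path.
deterministic2 = [math.sin(0.5 * t) for t in range(T)]
stochastic2 = [math.sin(0.5 * t) + random.gauss(0, 1) for t in range(T)]

print(deterministic == deterministic2)  # True
print(stochastic == stochastic2)        # False (almost surely)
```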

M. Bovi

Pag. 4

Example. A time series as a unique, unrepeatable set of realizations. Solid line = what we have observed: a unique, unrepeatable set of realizations. Thin lines = other sets of realizations from the underlying stochastic process, possible only "ex ante":

All these “potential realizations” constitute an ensemble, i.e., contemporaneous multiple time-series data of the same process. Typically, we observe only one realization of the ensemble. Think about the GDP in the Figure. Today we observe the line until the red X (today’s realization). What about the future (the blue area)?1

It is this uncertainty - the ex-ante possibility of different realizations - that leads us to the concept of stochastic process: the idea is to model the data as a realization (or as a part of a realization) of a SP.

Stochastic process (SP)

Broadly speaking, it may be defined (equivalently) as:
- an arbitrary sequence of random data;
- a dynamic extension of the notion of RV (cf. the previous GWN example);
- a random process running along in time and controlled by probabilistic laws.

1 As we will see, if the process is stationary then its moments can usually be well approximated by sufficiently long time averages based on the single set of observations/realizations.
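The ensemble idea can be simulated: draw many realizations of the same process and compare the ensemble average at a fixed date with the time average along the single realization we actually observe. A Python sketch (not from the notes; Gaussian white noise and the sizes below are illustrative choices):

```python
import random

random.seed(1)

T, N = 200, 500  # length of each realization, number of realizations

# An ensemble: N "ex ante possible" realizations of the same
# Gaussian white-noise process (mean 0, variance 1).
ensemble = [[random.gauss(0, 1) for _ in range(T)] for _ in range(N)]

# Ensemble average at a fixed date t: average across realizations.
t = 100
ensemble_mean_at_t = sum(path[t] for path in ensemble) / N

# Time average along the single realization we actually observe.
observed = ensemble[0]
time_mean = sum(observed) / T

# For this stationary (and ergodic) process both are close to E(Y_t) = 0.
print(round(ensemble_mean_at_t, 2), round(time_mean, 2))
```

In practice we only see `observed`; the rest of the ensemble exists only "ex ante".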


The stochastic process is denoted as

{…, Y1, Y2, …, Yt, Yt+1, …}

A realization of a SP with T observations is the sequence of observed data:

{Y1 = y1, Y2 = y2, …, YT = yT} = {yt}, t = 1, …, T

In sum, unlike CS, in time series analysis:
- we base our inference on a sample of dimension 1 with T observations coming from the SP;
- we have only one observation (one realization) from the SP, and we call it a time series.

Some additional features of TS w.r.t. cross-section data:

1) The data frequency may require special attention (e.g., seasonality). (Figure: the three components of a time series, the first two deterministic.)

2) MORE IMPORTANT: the ordering of the observations, which may convey important information. Crucially, ordering preserves the persistence/dependence between observations.


In cross-section analyses, we may order the data in several ways without necessarily losing information. Trivially, this is not true with TS. Time series analysis is concerned with techniques for the analysis of this dependence. This requires the development of stochastic/dynamic models for time series data.

As mentioned, for a stochastic process we can define a density function. It is also possible to marginalize this density for each subsample of its components: marginal densities are defined for each yt, for pairs (yt, yt−1), and so on. If the marginal densities have moments one can say, e.g., that E(yt) = μ, V(yt) = σ², …

Since consecutive observations are likely to be NOT independent, we cannot represent the density of the sample simply by multiplying the marginals.

Example. Consider two RVs:
Y1 = the March inflation rate (time 1 = now),
Y2 = the April inflation rate (time 2 = later).
We may think that Y1 could be used as a predictor of Y2 with some degree of uncertainty (what about vice versa?). The stochastic dependency between Y1 and Y2 is summarized by the joint density p(Y1, Y2) or, equivalently, by the conditional probability

p(Y2|Y1) = p(Y1, Y2) / p(Y1)

If p(Y2|Y1) ≠ p(Y2), then Y1 and Y2 are not independent; that is, knowledge of the value of Y1 reduces the uncertainty about Y2.

3) Stationarity and Ergodicity

Both deal with inferential issues and with time-homogeneity. These concepts are among the four key building blocks in the analysis of time series:

1) Stationarity
2) Ergodicity
3) White Noise
4) Information set


1) Stationarity

Intuition:
- Stationarity requires the SP to be in a particular state of "statistical equilibrium".
- If you observe two equal-length separate "pieces" of a SP and these realizations exhibit similar statistical characteristics, then the SP is stationary.

Examples (from Hamilton):
WN(0,1): a stationary process.
RW (random walk): a non-stationary process.
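The two Hamilton examples are easy to reproduce. A Python sketch (the sample sizes and seed are arbitrary): equal-length pieces of a white noise look statistically alike, while the pieces of a random walk do not, since its variance grows with t.

```python
import random

random.seed(42)

T = 2000
eps = [random.gauss(0, 1) for _ in range(T)]

# White noise: y_t = eps_t (stationary).
wn = eps

# Random walk: y_t = y_{t-1} + eps_t (not stationary: variance grows with t).
rw = []
level = 0.0
for e in eps:
    level += e
    rw.append(level)

def var(x):
    """Sample variance (divisor n)."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

# Two equal-length "pieces" of each process:
first_half_wn, second_half_wn = wn[:T // 2], wn[T // 2:]
first_half_rw, second_half_rw = rw[:T // 2], rw[T // 2:]

# WN pieces look alike (both variances near 1); RW pieces typically do not.
print(var(first_half_wn), var(second_half_wn))
print(var(first_half_rw), var(second_half_rw))
```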


Stationarity is a probabilistically meaningful measure of regularity of the SP. This regularity can be exploited to estimate unknown parameters and characterize the dependence between observations across time. If the time series frequently changes in an unpredictable manner (there is not enough time homogeneity), constructing a meaningful probabilistic model would be difficult or even impossible. More formally:

1.1 Strict Stationarity

Consider an h-size subsample of our stochastic process: S_t^h = (yt, yt+1, …, yt+h−1). S_t^h is an h-dimensional RV with a density function which may (or may not) depend on time. If the density does not depend on time, then

distribution of S_t^h = distribution of S_{t+1}^h = distribution of S_{t+2}^h = …

(the subsamples need not be consecutive).

When this time-independence holds for all h, the process is strictly stationary: a stochastic process {Yt}, t = −∞, …, +∞, is strictly stationary if the joint distribution of {Yt, Yt+1, …, Yt+h} is constant over time: it depends only on h and not on t.

That is, the only factor affecting the relationship between two observations is the gap (h) between them. Example of a strictly stationary stochastic process: the joint distribution of {Y1, Y5, Y7} is the same as the distribution of {Y12, Y16, Y18}.

NB: stationarity only applies to unconditional (= population) moments.

1.2 Covariance (or Weak) Stationarity

Weak stationarity deals with subsamples of size 2 (h = 2), i.e. with bidimensional RVs: S_t^2 = (yt, yt+1). If the first two moments exist, then the covariance exists as well. If these moments do not depend on time, then the process is covariance stationary.

More formally. A stochastic process {yt} is covariance stationary if:

E[yt] = μ for all t;
V[yt] = γ0 < ∞ for all t;
Cov[yt, yt−s] = γs for all t and s (i.e., it depends on s but not on t).

AUTOCOVARIANCE: the covariance of yt with itself at a different point in time. E.g., at lag s, the sth autocovariance is:

γs = Cov(yt, yt−s) = E[(yt − μ)(yt−s − μ)]

The lag-0 autocovariance, γ0, is the same quantity as the long-run variance of yt.

The term long-run variance is used to distinguish V[yt] from the innovation variance, V[εt] = σ², also known as the short-run variance.

AUTOCORRELATION:

ρs = γs/γ0

As we will see, these ρs - if ≠ 0 - describe the memory of the process: that is why stochastic processes are useful to represent persistent time series.

With respect to CS analysis, in TS analysis weak dependence replaces the notion of random sampling (i.i.d.) in ensuring that the LLN and the central limit theorem hold.

1.3 Relationships between Covariance and Strict Stationarity

The two types of stationarity are related, although neither nests the other.

Strict but not weakly stationary process. E.g.: a strictly stationary process without moments. Consider a sequence of RVs yt = 1/xt, where {xt} is a sequence of independent standard normal RVs. In this case, the sequence {yt} has no moments.

Weak but not strictly stationary process. A process with third- or higher-order moments which depend on time (e.g. time-varying kurtosis) may be covariance stationary but not strictly stationary. In general, time-invariant moments do not imply that the marginal densities are the same.

Strict & weak coincide when the process is Gaussian: a process is Gaussian when the joint distribution of whatever subsample of the elements of the process is multivariate normal.

Hamilton p. 46:

Covariance stationarity: important when modeling the mean of a process. Strict stationarity: useful in more complicated settings (e.g. non-linear models). A stationary time series is defined by its mean, variance and ACF.


2) ERGODICITY

Intuition. Ergodicity is a condition which limits the memory of the process:
- The memory of a NON-ERGODIC process is so persistent that a subsample of it – no matter how long – is simply insufficient to infer the probabilistic features of the stochastic process.
- The memory of an ERGODIC process, in the long run, is weak and such that increasing the sample size is informative about the probabilistic features of the stochastic process.

It is hard to test for ergodicity using just (part of) a single realization, so we assume it. Under ergodicity we can make inference: "sit and count" is a reliable and consistent statistical procedure, i.e. the sample moments for finite stretches of the realization will approach, almost surely (a.s.), their population counterparts as the length of the realization becomes infinite.

More formally (Hamilton, p. 46). Let us denote a single realization of size T, say from simulation 1 out of N simulations, from the SP as:

{y1^(1), y2^(1), …, yT^(1)}

We can obviously compute its sample mean as

ȳ = (1/T) Σ_{t=1}^{T} yt^(1)

This is the TIME MEAN, not the UNCONDITIONAL² MEAN of the SP, which we denote E(Yt). This said, a weakly stationary SP is ergodic for the mean if ȳ converges in probability to E(Yt) as T → ∞.

² Unconditional = population = ensemble (cf. p. 5).
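Ergodicity for the mean can be illustrated by simulation. A Python sketch (the AR(1) specification with mean μ = 5 is an arbitrary choice, not from the notes): the time mean computed from a single realization approaches the unconditional mean E(Yt) as T grows.

```python
import random

random.seed(3)

# AR(1) around a nonzero mean: y_t = mu + phi*(y_{t-1} - mu) + eps_t.
# Stationary and ergodic for the mean, with E(Y_t) = mu = 5.
mu, phi = 5.0, 0.6

def time_mean(T):
    """Sample mean of one simulated realization of length T."""
    y, s = mu, 0.0
    for _ in range(T):
        y = mu + phi * (y - mu) + random.gauss(0, 1)
        s += y
    return s / T

# The time mean from one realization approaches E(Y_t) = 5 as T grows.
for T in (100, 10_000, 200_000):
    print(T, round(time_mean(T), 3))
```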


ERGODICITY FOR THE MEAN, AUTOCOVARIANCE (γ), SUMMABILITY

A covariance-stationary SP is ergodic for the mean provided that γj → 0 "sufficiently quickly" as j becomes large (it has less and less memory). In particular, we will see that a covariance-stationary SP is ergodic for the mean provided that its autocovariances are absolutely summable:

Σ_{j=0}^{∞} |γj| < ∞
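For a concrete case, a stationary AR(1) with coefficient φ has γj = φ^j·γ0, so its absolute autocovariances are summable. A small Python check (φ = 0.9 and γ0 = 1 are illustrative values, not from the notes):

```python
# For a stationary AR(1) with coefficient phi, gamma_j = phi**j * gamma_0,
# so the absolute autocovariances are summable: sum |gamma_j| = gamma_0/(1-phi).
phi, gamma0 = 0.9, 1.0

partial_sums = []
s = 0.0
for j in range(200):
    s += abs(phi ** j * gamma0)
    partial_sums.append(s)

print(round(partial_sums[-1], 4))  # ~ gamma_0/(1 - phi) = 10
# By contrast, if gamma_j did not die out (e.g. gamma_j = gamma_0 for all j),
# the partial sums would grow without bound and ergodicity for the mean fails.
```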

ERGODIC THEOREM

We need some preliminary definitions.

Bounded function: a function whose values are bounded by a (lower and/or upper) limit. Example: x = cos(t), whose values lie between −1 and 1.

Bounded sequence: the sequence {yt} is bounded if there is a finite number, say m, such that |yt| ≤ m for all t.

[…]

3. WHITE NOISE

A Gaussian White Noise is a white noise whose elements are not only uncorrelated but independent (=> i.i.d.), specifically with the N(0, σ²) distribution.

GWN is important because under Normality uncorrelated means independent. Example: a simulated GWN(0,1)


4. INFORMATION SET

The information set at time t, It, contains all time-t measurable events. Measurable event: any event that can have a probability assigned to it at time t. NB: uncertainty is unmeasurable risk.

It includes the realizations of all variables which have occurred on or before t. Under the REH (Rational Expectations Hypothesis), It includes all the relevant information/skills to produce model-consistent predictions: no useful info can be wasted in (certain) Economic Theory. Example for stock prices: the information set for April 3, 2016 contains all stock prices up to and including those which occurred on April 3, 2016. It also includes everything else known at this time, such as interest rates and foreign exchange rates but, possibly, also the number of strikes in some important sector of the economy, etc.

Information and Expectations

Many expectations will often be made conditional on the current information set: Et[yt+h|It].

In Macroeconomics, you typically read (t = current period): Et[Et+1(xt+2)] = Et xt+2. In words: you have to predict now the realization of x two steps ahead, Et xt+2. Well, you are supposed to set your current best guess of your best guess next period, Et[Et+1(xt+2)], equal to your current best guess of xt+2, i.e., to Et xt+2. This is the Law of Iterated Expectations (LIE) with nested information sets.

In general (X, Y are RVs), the LIE (equivalently) says: EY(Y) = EX[EY|X(Y|X)]
- the expected value of the conditional expected value of Y given X (hence the name LIE) is the same as the unconditional expected value of Y;
- the RV "EY|X(Y|X)" has the same expectation as the RV "Y".


Let us recall some basic statistical concepts before proving the LIE. Consider a discrete joint probability distribution for a RV X = (X1, X2), Xi ∈ {0, 1, 2, 3}, arranged in a two-way table:

- Joint probabilities: the numbers in each of the cells, one for each pair of values of (X1, X2).
- Unconditional probabilities: the numbers in the margins, i.e. in the very last row/column. They sum to one and are called the marginal distributions of, respectively, X2 and X1.
- Marginal probabilities: by the Law of Total Probability, they can be computed as the sum of the joint probabilities. E.g., summing the first row (=> marginalizing out X2): Pr[X1 = 0] = (0.000 + 0.048 + 0.090 + 0.064) = 0.202.
- Conditional probabilities: defined as joint/marginal probabilities. Intuition: Pr[X1 = 1 | X2 = 1] = Pr[X1 = 1 ∩ X2 = 1] / Pr[X2 = 1] = (0.180/0.42) ≈ 0.43.

Conditional probability provides us with a way to reason about the outcome of an experiment based on partial information.
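These definitions translate directly into code. A Python sketch with a small hypothetical 2×2 joint table (the numbers below are illustrative, not the table from the slides): marginals are sums of joint probabilities, and conditionals are joint/marginal.

```python
# Hypothetical discrete joint distribution Pr[X1=i, X2=j] for i, j in {0, 1}
# (illustrative numbers, not the table from the slides).
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# Marginals: sum the joint over the other variable (Law of Total Probability).
p_x1 = {i: sum(p for (a, b), p in joint.items() if a == i) for i in (0, 1)}
p_x2 = {j: sum(p for (a, b), p in joint.items() if b == j) for j in (0, 1)}

# Conditional distribution of X1 given X2 = 1: joint / marginal.
p_x1_given_x2_1 = {i: joint[(i, 1)] / p_x2[1] for i in (0, 1)}

print(p_x1)              # marginal of X1: Pr[X1=0] = 0.3, Pr[X1=1] = 0.7
print(p_x2)              # marginal of X2: Pr[X2=0] = 0.4, Pr[X2=1] = 0.6
print(p_x1_given_x2_1)   # Pr[X1=0|X2=1] = 1/3, Pr[X1=1|X2=1] = 2/3
```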

Thus, the expectation is a weighted average of g(x), with weight=probabilities. Marginal density/probability The integral of the joint distribution, (fX,Y(x,y)), with respect to X (or to Y):


(a) Conditional density. It is the (new) joint probability satisfying the realized X = x:

fY|X(y|x) = fX,Y(x, y) / fX(x)     (a)

NB:
1. The conditional density is a density (it integrates to one):

∫ fY|X(y|x) dy = [1/fX(x)] ∫ fX,Y(x, y) dy = fX(x)/fX(x) = 1

The marginal of X, fX(x), is a constant when integrating w.r.t. Y, so it goes outside the integral. The remaining integrand is the joint, which amounts to fX(x) when integrated w.r.t. y ■

2. Since the conditional is a density, we can compute its Conditional Expectation:

E[Y|X = x] = ∫ y·fY|X(y|x) dy

Note:
- The conditional expectation is a function of the realization x: for a different realization of X, the conditional expectation will be a different number. This is hardly surprising: the conditional expectation is a statistic, and a statistic is a function of RVs (cf., e.g., the definition of the expectation);

- The conditional expectation is no riskier than the unconditional one, because we have some more information: we know that X = x.
- Since E(Y|X) is a function of X, it is a RV and we can compute the expectation of this expectation w.r.t. X:

EX[EY|X(Y|X)] = ∫ E[Y|X = x]·fX(x) dx

So, we are iterating expectations. Recall now what the LIE says: EX[EY|X(Y|X)] = EY(Y). To prove it, note that

EX[EY|X(Y|X)] = ∫ [∫ y·fY|X(y|x) dy]·fX(x) dx

and conditional × marginal = joint, i.e. fY|X(y|x)·fX(x) = fY,X(y, x) =>

EX[EY|X(Y|X)] = ∫ y·[∫ fY,X(y, x) dx] dy

Now, let us examine the RHS: the inner integral in dx is the joint distribution integrated w.r.t. X, i.e. it is the marginal of Y, fY(y) (cf. equation (a)). Thus:

EX[EY|X(Y|X)] = ∫ y·fY(y) dy

Since the marginal of Y is multiplied by y, the RHS integral is nothing but the unconditional expectation of Y: ∫ y·fY(y) dy = EY(Y). Thus, we may eventually confirm that:

EX[EY|X(Y|X)] = EY(Y) ■

EXAMPLES

The LIE in Cross Sections

Ex. 1. Two factories supply tires to the market.
Factory X: its tires work for an average of 9000 km; it covers 60% of the total market.
Factory Y: its tires work for an average of 7000 km; it covers 40% of the total market.

What is the expected distance, ED, that a purchased tire will work? The LIE says:

ED = E[D|X]·P(X) + E[D|Y]·P(Y) = 9000(0.6) + 7000(0.4) = 8200 km

In fact, the LIE is a reformulation of the total expectation (= total probability) theorem: "the unconditional average can be obtained by averaging the conditional averages."
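The computation can be checked in a couple of lines of Python (using the numbers of the example):

```python
# Total (unconditional) expectation as a probability-weighted average of
# conditional expectations, with the tire numbers from the example.
e_given_factory = {"X": 9000, "Y": 7000}   # E[D | factory], in km
p_factory = {"X": 0.6, "Y": 0.4}           # market shares

expected_distance = sum(e_given_factory[f] * p_factory[f] for f in p_factory)
print(round(expected_distance))  # 8200 km
```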

The LIE in Time Series

Ex. 2: Forecasting the sales of a company (the joint probability of X and Y is assumed to be known, i.e. we have a probability model).
t0 = beginning of the year
t1 = first semester of the year
t2 = end of the year
y1 = sales of the company in t1
x2 = sales over the entire year

At t0:
- we forecast E[x2]: we use the expected value (what else?) as a forecast of x2;
- y1 is a RV.

At t1:
- y1 is known: we now live in a new "universe" where everything is conditioned on the observed value of Y;
- based on the knowledge of y1, we revise our forecast using the expected value in this new universe: E(x2|y1). Thus, the revision is [E1(x2|y1) − E0(x2)].

Now, the LIE says that the expected revision must be zero: E0[E1(x2|y1)] = E0(x2). The basic logic of the LIE is that we cannot expect to be wrong.

Usually, (actual revision) ≠ 0:
- it is hard to be a perfect forecaster.
If (expected revision) ≠ 0:
- our expectations are systematically biased;
- we are forming predictions that, on average, are wrong;
- we are "over-revising" (w.r.t. what uncertainty dictates): E0[E1(x2|y1)] ≠ E0(x2).
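The "expected revision is zero" property can be checked by Monte Carlo. A Python sketch under a hypothetical probability model (independent Gaussian semesters; all parameters are arbitrary, not from the notes): individual revisions are nonzero, but they average out to about zero, as the LIE requires.

```python
import random

random.seed(11)

# Monte Carlo version of the sales example (hypothetical probability model):
# y1 = first-semester sales, x2 = y1 + y2 = sales over the entire year.
N = 50_000
revisions = []
for _ in range(N):
    y1 = random.gauss(100, 10)   # first-semester sales
    y2 = random.gauss(100, 10)   # second-semester sales
    x2 = y1 + y2

    e0_x2 = 200.0                # E0(x2): forecast made at t0
    e1_x2 = y1 + 100.0           # E1(x2|y1): forecast revised at t1
    revisions.append(e1_x2 - e0_x2)

# Individual revisions are nonzero, but their average is ~0:
# the LIE says E0[E1(x2|y1)] = E0(x2), i.e. the expected revision is zero.
mean_revision = sum(revisions) / N
print(round(mean_revision, 2))
```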

