
Applied Econometrics
July 15, 2010

Contents

1 Introduction and Motivation
  1.1 3 motivations for CLRM
2 The Classical Linear Regression Model (CLRM): Parameter Estimation by OLS
3 Assumptions of the CLRM
4 Finite sample properties of the OLS estimator
5 Hypothesis Testing under Normality
6 Confidence intervals
  6.1 Simulation
7 Goodness-of-fit measures
8 Introduction to large sample theory
  8.1 A. Modes of stochastic convergence
  8.2 B. Law of Large Numbers (LLN)
  8.3 C. Central Limit Theorems (CLTs)
  8.4 D. Useful lemmas of large sample theory
  8.5 Large sample properties of the OLS estimator
9 Time Series Basics (Stationarity and Ergodicity)
10 Generalized Least Squares
11 Multicollinearity
12 Endogeneity
13 IV estimation
14 Questions for Review

1 Introduction and Motivation

What is econometrics?
Econometrics = economic statistics/data analysis ∩ economic theory ∩ mathematics

Conceptual view:
• Data are perceived as realizations of random variables
• Parameters are real numbers, not random variables
• Joint distributions of the random variables depend on the parameters

General regression equation:

yi = β1 xi1 + β2 xi2 + ... + βK xiK + εi (linear model)   (1)

with: yi = dependent variable (observable), xik = explanatory variable (observable), εi = unobservable component. Index of observations i = 1, 2, ..., n (for individuals, time etc.) and regressors k = 1, 2, ..., K.

In vector notation:

yi = xi'β + εi, with dimensions (1x1) = (1xK)(Kx1) + (1x1)   (2)

where β = (β1, ..., βK)' and xi = (xi1, ..., xiK)'.

Structural parameters are the parameters suggested by economic theory. The key problem of econometrics: we deal with non-experimental data (unobservable variables, interdependence, endogeneity, (reverse) causality).

1.1 3 motivations for CLRM

Theory: Glosten/Harris
Notation:
• Transaction price: Pt
• Indicator of transaction type: Qt = 1 for a buyer-initiated trade, Qt = −1 for a seller-initiated trade   (3)
• Trade volume: vt
• Drift parameter: µ
• Earnings/costs of the market maker: c (operational costs, asymmetric information costs)
• Private information: Qt zt
• Public information: εt

Efficient/"fair" price: mt = µ + mt−1 + εt (a random walk with drift) + Qt zt, with zt = z0 + z1 vt

The market maker sets:
• Sell price (ask): Pt^a = µ + mt−1 + εt + zt + c
• Buy price (bid): Pt^b = µ + mt−1 + εt − zt − c (zt + c kept for oneself)
• ⇒ Spread: the market maker anticipates the price impact

Transactions occur at the ask or the bid price:

Pt − Pt−1 = µ + mt−1 + εt + Qt zt + c Qt − [mt−1 + c Qt−1]   (4)

∆Pt = µ + z0 Qt + z1 vt Qt + c ∆Qt + εt   (5)
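The sampling model (5) can be illustrated numerically. Below is a minimal sketch (assuming numpy; all parameter values, the volume distribution and the noise scale are made up for illustration) that simulates price changes from the model and recovers β = (µ, z0, z1, c)' by OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
mu, z0, z1, c = 0.01, 0.05, 0.002, 0.03      # made-up structural parameters

Q = rng.choice([-1.0, 1.0], size=n)          # buyer/seller initiated, eq. (3)
v = rng.exponential(5.0, size=n)             # trade volume (made-up distribution)
dQ = np.diff(np.concatenate([[0.0], Q]))     # Delta Q_t = Q_t - Q_{t-1}
eps = rng.normal(0.0, 0.02, size=n)          # public-information shock
dP = mu + z0 * Q + z1 * v * Q + c * dQ + eps # price changes from eq. (5)

# OLS with x_i = [1, Q_t, v_t Q_t, Delta Q_t]' recovers beta = (mu, z0, z1, c)'
X = np.column_stack([np.ones(n), Q, v * Q, dQ])
b = np.linalg.solve(X.T @ X, X.T @ dP)
print(b)   # close to [0.01, 0.05, 0.002, 0.03]
```

With 10,000 simulated trades the estimates land very close to the true values, since all four regressors are observable.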

• Observable: ∆Pt, Qt, vt → xi = [1, Qt, vt Qt, ∆Qt]' and yi = ∆Pt
• Unobservable: εt
• Estimation of the unknown structural parameters β = (µ, z0, z1, c)'

Theory: Asset pricing
Finance theory: investors are compensated for holding risk; they demand an expected return beyond the risk-free rate.
x^{j}_{t+1}: payoff of risky asset j → f_{1,t+1}, ..., f_{K,t+1} (f: risk factor)
→ Linear (risk) factor models:

E(R^{ej}_{t+1}) = β^{j}' λ

where E(R^{ej}_{t+1}) is the expected (excess) asset return, β^{j} = (β^{j}_1, ..., β^{j}_K)' the exposures of the payoff of asset j to factor risk, and λ = (λ1, ..., λK)' the prices of factor risk.

R^{ej}_{t+1} = R^{j}_{t+1} (gross return) − R^{f}_{t+1} (risk-free rate)   (6)

with gross return = x^{j}_{t+1} / P^{j}_{t} = (P^{j}_{t+1} + d^{j}_{t+1}) / P^{j}_{t}

Single risk factor: β^{j} = Cov(R^{ej}, f) / Var(f)

Some AP models:
• CAPM: f = R^{em} = (R^{m} − R^{f}) (market risk)
• Fama-French (FF): f = (R^{em}, HML, SMB)'
• → If f1, ..., fK are excess returns themselves, then λ = [E(f1), ..., E(fK)]'

CAPM: E(R^{ej}_{t+1}) = β^{j} E(R^{em}_{t+1}); FF: E(R^{ej}_{t+1}) = β^{j}_1 E(R^{em}_{t+1}) + β^{j}_2 E(HML_{t+1}) + β^{j}_3 E(SMB_{t+1})

To estimate the risk loadings β, we formulate a sampling (regression, "compatible") model:

R^{ej}_{t+1} = β1 R^{em}_{t+1} + β2 HML_{t+1} + β3 SMB_{t+1} + ε^{j}_{t+1}   (7)

Assume E(ε^{j}_{t+1} | R^{em}_{t+1}, HML_{t+1}, SMB_{t+1}) = 0 (implies E(ε^{j}_{t+1}) = 0). This model is compatible with the theoretical model, which becomes clear when you take expected values on both sides.

This sampling model does not contain a constant: β^{j}_0 = 0 (from theory) = testable restriction.

Theory: Mincer equation

ln(WAGEi) = β1 + β2 Si + β3 TENUREi + β4 EXPRi + εi   (8)

Notation:
• Logarithm of the wage rate: ln(WAGEi)
• Years of schooling: Si
• Experience in the current job: TENUREi
• Experience in the labor market: EXPRi

→ Estimation of the parameters βk, where β2 is the return to schooling.

Statistical specification
E(yi|xi) = xi'β, a linear function of the x's, with xi = (xi1, ..., xiK)' and β = (β1, ..., βK)'.

Marginal effect:

∂E(yi|xi)/∂xiK = βK   (9)

"Compatible" regression model: yi = xi'β + εi with E(yi|xi) = xi'β
→ The LTE (law of total expectation) implies:

E(εi|xi) = 0 and Cov(εi, xik) = 0 ∀k   (10)

The specification E(yi|xi) = xi'β is ad hoc → alternative: non-parametric regression (leaves the functional form open). It is justifiable by a normality assumption:

(Y, X)' ∼ BVN( (µy, µx)', [σy², σxy; σxy, σx²] )   (11)

(Y, X1, ..., XK)' ∼ MVN( (µy, µx)', [σy², Σyx'; Σyx, Σx] ), a ((K+1)x1) random vector   (12)

with µx = E(x) = [E(x1), ..., E(xK)]' and Σx = Var(x) of dimension (KxK). Then

E(y|x) = µy + Σyx'Σx^{-1}(x − µx) = α + β̃'x (a linear conditional mean)

with β̃ = Σx^{-1}Σyx and α = µy − β̃'µx, and

Var(y|x) = σy² − Σyx'Σx^{-1}Σyx (does not depend on x: "homoscedasticity")

Rubin's causal model = regression analysis of experiments
STAR experiment = small class experiment
Treatment: binary variable Di ∈ {0, 1} (class size)
Outcome: yi (SAT scores)
Does Di → yi? (causal effect)
Potential outcomes (one of the two is always hypothetical):

y1i if Di = 1, y0i if Di = 0   (13)

Actual outcome: observed yi = y0i + (y1i − y0i) Di, where (y1i − y0i) is the causal effect (it may be identical or different across individuals i). This model uses causality explicitly, in contrast to the other models.

Yi = α + ρ Di + ηi + zi'γ   (14)

with α = E(y0i), ρ = y1i − y0i, ηi = y0i − E(y0i), and zi'γ the STAR controls (gender, race, free lunch).

ρ is constant across i. Goal: estimate ρ.
STAR is an experiment → random assignment of i to treatment: E(y0i|Di = 1) = E(y0i|Di = 0).
In a non-experiment → selection bias: E(y0i|Di = 1) − E(y0i|Di = 0) ≠ 0.

E(yi|Di = 1) = α + ρ + E(ηi|Di = 1), where α + E(ηi|Di = 1) = E(y0i|Di = 1)   (15)

and E(yi|Di = 0) = α + E(ηi|Di = 0), where α + E(ηi|Di = 0) = E(y0i|Di = 0)   (16)

→ E(yi|Di = 1) − E(yi|Di = 0) = ρ + E(y0i|Di = 1) − E(y0i|Di = 0), where the last difference is the selection bias.

Another way, apart from experiments, to avoid selection bias are natural (quasi-)experiments.
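The selection-bias decomposition above can be illustrated with a small made-up simulation: under random assignment the difference in means recovers ρ, while letting the weaker units select into treatment contaminates it with E(y0i|Di = 1) − E(y0i|Di = 0). All numbers below are invented for illustration (a sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
rho = 5.0                                # made-up constant causal effect
y0 = rng.normal(50.0, 10.0, size=n)      # potential outcome without treatment
y1 = y0 + rho                            # potential outcome with treatment

# Experiment: random assignment, so E(y0|D=1) = E(y0|D=0)
D = rng.integers(0, 2, size=n).astype(bool)
y = np.where(D, y1, y0)
diff_rand = y[D].mean() - y[~D].mean()   # recovers rho

# Non-experiment: the weaker units select into treatment, so D depends on y0
D_sel = y0 < 50.0
y = np.where(D_sel, y1, y0)
diff_sel = y[D_sel].mean() - y[~D_sel].mean()  # rho + selection bias
print(diff_rand, diff_sel)
```

In this setup the selection bias is strongly negative, so the naive difference in means even gets the sign of the effect wrong.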

2 The Classical Linear Regression Model (CLRM): Parameter Estimation by OLS

Classical linear regression model:

yi = β1 xi1 + β2 xi2 + ... + βK xiK + εi = xi'β + εi, with xi' (1xK) and β (Kx1)   (17)

y = (y1, ..., yn)': dependent variable, observed
xi' = (xi1, ..., xiK): explanatory variables, observed
β' = (β1, ..., βK): unknown parameters
εi: disturbance component, unobserved
→ b' = (b1, ..., bK): estimate of β'
→ ei = yi − xi'b: estimated residual

The model is based on i.i.d. variables, which refers to the independence of the εi's, not of the x's. Preferred technique: least squares (instead of maximum likelihood or the method of moments). Two-sidedness (ambiguity): the x's can be read as random variables or as realisations.

For convenience we introduce matrix notation:

y = X β + ε, with dimensions (nx1) = (nxK)(Kx1) + (nx1)   (18)

Constant: (x11, ..., xn1)' = (1, ..., 1)'

Writing it out extensively, the model is a system of linear equations:

y1 = β1 + β2 x12 + ... + βK x1K + ε1   (19)
y2 = β1 + β2 x22 + ... + βK x2K + ε2   (20)
...   (21)
yn = β1 + β2 xn2 + ... + βK xnK + εn   (22)

OLS estimation in detail: estimate β by choosing b to minimize the sum of squared residuals:

argmin(b) Σei² = Σ(yi − xi'b)² = Σ(yi − b1 xi1 − ... − bK xiK)² = S(b)

Differentiate with respect to b1, ..., bK → FOC:

∂S(b)/∂b1 = −2 Σ[yi − xi'b] = 0 → (1/n)Σei = 0   (24)
∂S(b)/∂b2 = −2 Σ[yi − xi'b]xi2 = 0 → (1/n)Σei xi2 = 0   (25)
...
∂S(b)/∂bK = −2 Σ[yi − xi'b]xiK = 0 → (1/n)Σei xiK = 0   (28)

→ A system of K equations in K unknowns: solve for b (the OLS estimator).

"Characteristics" (from the FOC):
ē = 0
Ĉov(ei, xi2) = 0
...
Ĉov(ei, xiK) = 0

With one regressor: yi = β1 + β2 xi2 + εi

b2 = [Σyi xi2 − (1/n)Σyi Σxi2] / [Σxi2² − (1/n)(Σxi2)²] = sample cov(x2, y) / sample var(x2)

The solution is more complicated for K ≥ 2 → the system of K equations is solved by matrix algebra: e = y − Xb.
The FOC rewritten:

Σei = 0   (29)
Σxi2 ei = 0   (30)
...   (31)
ΣxiK ei = 0   (32)
→ X'e = 0   (33)

Extensively:

[1 ... 1; x12 ... xn2; ...; x1K ... xnK] (e1, e2, ..., en)' = 0   (34)

→ X'e = X'(y − Xb) = X'y − X'X b = 0, with X'y (Kx1) and X'X (KxK).

If X'X has rank K (full rank), then [X'X]^{-1} exists and premultiplying by [X'X]^{-1} gives:
[X'X]^{-1}X'y − [X'X]^{-1}X'Xb = 0
[X'X]^{-1}X'y − Ib = 0
→ b = [X'X]^{-1}X'y

Alternative notation:

b = ((1/n)X'X)^{-1} (1/n)X'y = ((1/n)Σxi xi')^{-1} (1/n)Σxi yi   (35)

(a matrix of sample means times a vector of sample means).

Questions: properties? Unbiased, efficient, consistent?
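As a minimal numerical sketch (made-up data, assuming numpy), the OLS estimator and its FOC X'e = 0 can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
beta = np.array([1.0, 2.0, -0.5])                           # made-up true parameters
y = X @ beta + rng.normal(size=n)

# b = (X'X)^{-1} X'y; solving the normal equations is preferred to
# forming the inverse explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ b
print(b)                        # close to beta
print(np.abs(X.T @ e).max())    # FOC X'e = 0, numerically zero
```

The second printout verifies that the residuals are orthogonal to every column of X, which is exactly the system (29)-(33).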

3 Assumptions of the CLRM

The four core assumptions of the CLRM:

1.1 Linearity in parameters: yi = xi'β + εi
This is not too restrictive, because reformulation is possible, e.g. using logs or quadratics.

1.2 Strict exogeneity: E(εi|X) = E(εi|x11, ..., x1K, ..., xi1, ..., xiK, ..., xn1, ..., xnK) = 0
Implications:
a) E(εi|X) = 0 → E(εi|xik) = 0 → E(εi) = 0 (both steps by the LTE)
b) E(εi xjk) = Cov(εi, xjk) = 0 ∀ i, j, k (use the LTE, the LIE and a))
⇒ unconditional moment restrictions (compare to the OLS FOC)

Examples where strict exogeneity may be violated:
• ln(wagesi) = β1 + β2 Si + ... + εi → ability is in εi (+)
• crimei (in district i) = β1 + β2 Policei + ... + εi → social factors are in εi (−)
• unempli (in country i) = β1 + β2 Libi + ... + εi → a macro shock is in εi (−)

When E(εi|x) ≠ 0: endogeneity (prevents us from estimating the β's consistently).

Discussion: endogeneity and sample selection bias
Rubin's causal model: yi = α + ρDi + ηi; E(yi|Di = 1) − E(yi|Di = 0) = ρ + E(y0i|Di = 1) − E(y0i|Di = 0), where the last difference is the selection bias.
Di and {y0i, y1i} (partly unobservable) are assumed independent: {y0i, y1i} ⊥ Di.
With independence: E(y0i|Di = 1) = E(y0i|Di = 0) = E(y0i), which corresponds to E(ηi|Di) = E(ηi) = 0 (ηi plays the role of εi and Di the role of xik).
Independence is normally not the case, because of endogeneity and sample selection bias.
Conditional independence assumption (CIA): {y0i, y1i} ⊥ Di | xi → the selection bias vanishes when conditioning on xi.
How do we operationalize the CIA? By adding "control variables" to the right-hand side of the equation.
Example: Mincer-type regression
ln(wagei) = β1 + β2 Highschooli + β3 Teni + β4 Expi + β5 Abilityi + β6 Familyi + εi, where Abilityi and Familyi are control variables.
→ I assume εi ⊥ Highschooli | Ability, Family, ... This justifies E(εi|Highschooli, ...) = 0.
The CIA justifies the inclusion of control variables and E(εi|x) = 0.
Matching = sorting individuals into groups and then comparing the outcomes.

1.3 No exact multicollinearity: P(rank(X) = K) = 1 (the event rank(X) = K is Bernoulli distributed, i.e. a random variable)
⇒ No linear dependencies in the data matrix, otherwise (X'X)^{-1} does not exist
⇒ Does not refer to a high (but imperfect) correlation between the X's

1.4 Spherical disturbances:
Var(εi|x) = E(εi²|x) = Var(εi) = σ² ∀i → homoscedasticity (relates to the MVN)
Cov(εi, εj|x) = E(εi εj|x) = E(εi εj) = 0 → no serial correlation
For ε = (ε1, ..., εn)':

E[εε'|x] = [E(ε1²|x) ... ...; ... ... ...; ... ... E(εn²|x)] = [σ² ... 0; ... ... ...; 0 ... σ²] = σ²In = Cov(ε|x)   (36)

By the LTE, E(εε') = Var(ε) = σ²In → Var(εi) = E(εi²) = σ² and Cov(εi, εj) = E(εi εj) = 0 ∀ i ≠ j.

Interpreting the parameters β of different types of linear equations:
• Linear model: yi = β1 + β2 xi2 + ... + βK xiK + εi. A one-unit increase in the independent variable xiK changes the dependent variable by βK units.
• Semi-log form: ln(yi) = β1 + β2 xi2 + ... + βK xiK + εi. A one-unit increase in the independent variable changes the dependent variable by approximately 100·βK percent.
• Log-linear model: ln(yi) = β1 ln(xi1) + β2 ln(xi2) + ... + βK ln(xiK) + εi. A one-percent increase in xiK changes yi by approximately βK percent. E.g. yi = A xi1^α xi2^γ εi (Cobb-Douglas) → ln(yi) = ln(A) + α ln(xi1) + γ ln(xi2) + ln(εi).

Before the OLS proofs, a useful tool: the law of total expectation (LTE)

E(y|X = x) = ∫ y fy|x(y|x) dy = ∫ y [fxy(x, y)/fx(x)] dy   (37)

Using the random variable x:

E(y|x) = ∫ y [fxy(x, y)/fx(x)] dy = g(x)   (38)

Ex(g(x)) = ∫ g(x) fx(x) dx = ∫ [∫ y fxy(x, y)/fx(x) dy] fx(x) dx = ∫∫ y fxy dx dy = ∫ y fy(y) dy = E(y)   (39)

⇒ Ex[Ey|x(y|x)] = Ey(y) ↔ E[E(y|x)] = E(y)   (40)

(Ey|x(y|x) is a measurable function of x.)

Notes:
• works when X is a vector
• forecasting interpretation

LTE extension: the law of iterated expectations (LIE)

Ez|x[Ey|x,z(y|x, z)|x] = E(y|x)   (41)

(Ey|x,z(y|x, z) is a measurable function of x and z.)

Other important laws
Double expectation theorem (DET):

Ex[Ey|x(g(y)|x)] = Ey(g(y))   (42)

Generalized DET:

Ex[Ey|x(g(x, y)|x)] = Ex,y(g(x, y))   (43)

Linearity of conditional expectations:

Ey|x[g(x) y|x] = g(x) Ey|x[y|x]   (44)
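The LTE E[E(y|x)] = E(y) can also be checked numerically; a sketch with a made-up discrete x where E(y|x) = 2x is known in closed form (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1_000_000
x = rng.choice([0.0, 1.0, 2.0], size=n, p=[0.5, 0.3, 0.2])   # made-up discrete x
y = 2.0 * x + rng.normal(size=n)                              # so E(y|x) = 2x

g = 2.0 * x                      # g(x) = E(y|x), a measurable function of x
print(g.mean())                  # estimates E_x[E(y|x)]
print(y.mean())                  # estimates E(y); both ~ 2 E(x) = 1.4
```

Both sample means converge to 2·E(x) = 2·0.7 = 1.4, illustrating (40).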

4 Finite sample properties of the OLS estimator

Finite sample properties of b = (X'X)^{-1}X'y:
1. Under 1.1-1.3, and holding for any sample size: E[b|X] = β and, by the LTE, E[b] = β → unbiasedness
2. Under 1.1-1.4: Var[b|X] = σ²[X'X]^{-1} (important for testing; depends on the data) → conditional variance
3. Under 1.1-1.4 OLS is efficient: Var[b|X] ≤ Var[β̂|X] for any linear unbiased estimator β̂ → Gauss-Markov theorem
4. ⇒ OLS is BLUE

Starting point: the sampling error

b − β = (b1 − β1, ..., bK − βK)' = [X'X]^{-1}X'y − β = [X'X]^{-1}X'[Xβ + ε] − β = [X'X]^{-1}X'ε   (45)

Defining A = [X'X]^{-1}X', a (Kxn) matrix, so that b − β = Aε   (46)

where, conditional on X, A is treated as a constant.

Derive unbiasedness
Step 1:

E(b − β|X) = E(Aε|X) = A E(ε|X) = 0 (since E(ε|X) = 0 by assumption 1.2)   (47)

Step 2 ("b conditionally unbiased"): using step 1,

E(b − β|X) = E(b|X) − E(β|X) = E(b|X) − β = 0 ⇔ E(b|X) = β   (48)

Step 3 ("OLS unbiased"): by the LTE and step 2,

Ex[E(b|X)] = E(b) = β   (49)

Derive the conditional variance

Var(b|X) = Var(b − β|X) (β is a constant) = Var(Aε|X) = A Var(ε|X) A' = A E(εε'|X) A' = A σ²In A' (by 1.4) = σ² AA'   (50)

Using [BA]' = A'B' and inserting for A:

Var(b|X) = σ²[X'X]^{-1}X'X[X'X]^{-1} = σ²[X'X]^{-1}   (51)

And we have:

Var(b|X) = [Var(b1|x) Cov(b1, b2|x) ...; Cov(b2, b1|x) ... ...; ... ... Var(bK|x)]   (52)
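Equation (51) can be checked by simulation: hold X fixed, redraw ε many times, and compare the Monte Carlo covariance of b with σ²(X'X)^{-1}. A sketch with made-up sizes (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, sigma, R = 100, 2, 2.0, 20_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # X is held fixed
beta = np.array([1.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

ys = X @ beta + rng.normal(0.0, sigma, size=(R, n))     # R redraws of eps
bs = ys @ X @ XtX_inv          # row r is b = (X'X)^{-1} X' y_r (XtX_inv symmetric)
print(np.cov(bs.T))            # Monte Carlo Var(b|X)
print(sigma**2 * XtX_inv)      # theoretical sigma^2 (X'X)^{-1}
```

The two matrices agree up to Monte Carlo error, and the mean of the draws reproduces the unbiasedness result E(b|X) = β at the same time.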

Derive the Gauss-Markov theorem: Var(β̂|x) ≥ Var(b|x) for any linear unbiased estimator β̂.
This refers to the claim that the difference Var(β̂|x) − Var(b|x) is positive semi-definite.
Insertion: for A and B, separate matrices of the same size, A ≥ B if A − B is positive semi-definite. A (kxk) matrix C is psd if x'Cx ≥ 0 for all (kx1) vectors x ≠ 0. So we claim:

a'[Var(β̂|x) − Var(b|x)]a ≥ 0 ∀ a ≠ 0   (53)

If this is true for all a, it is true in particular for a = [1, 0, ..., 0]', which leads to Var(β̂1|x) ≥ Var(b1|x). This works also for b2, ..., bK with the respective a. This means that each conditional variance of an element of β̂ is at least as big as the respective conditional variance of the OLS estimate.

The proof: note that b = Ay; β̂ is linear in y and unbiased, β̂ = Cy with C being a function of X. Define D = C − A ↔ C = D + A.

Step 1:

β̂ = (D + A)y = Dy + Ay = D(Xβ + ε) + b = DXβ + Dε + b   (54)

Step 2:

E(β̂|x) = DXβ + D E(ε|x) + E(b|X) = DXβ + β (as E(ε|x) = 0)   (55)

As we are working with an unbiased estimator: DX = 0.

Step 3: going back to step 1 (with DXβ = 0):

β̂ = Dε + b → β̂ − β (sampling error of β̂) = Dε + (b − β) = Dε + Aε = [D + A]ε   (56)-(57)

(using that the sampling error of b is Aε).

Step 4:

Var(β̂|x) = Var(β̂ − β|x) = Var([D + A]ε|x) = [D + A]Var(ε|x)[D + A]' = σ²[D + A][D + A]'   (58)
= σ²[DD' + AD' + DA' + AA']   (59)

We have:
AA' = [X'X]^{-1}X'X[X'X]^{-1} = [X'X]^{-1}
AD' = [X'X]^{-1}X'D' = [X'X]^{-1}[DX]' = 0 (as DX = 0)
DA' = D[[X'X]^{-1}X']' = DX[X'X]^{-1} = 0 (as DX = 0)

Var(β̂|x) = σ²[DD' + [X'X]^{-1}]   (60)

Step 5: so we must have

a'[σ²(DD' + [X'X]^{-1}) − σ²[X'X]^{-1}]a ≥ 0   (61)
a'[σ² DD']a ≥ 0   (62)

We have to show that a'DD'a ≥ 0 ∀ a, i.e. that DD' is psd. Setting z = D'a, we get z'z = Σzi² ≥ 0, so the claim holds.

The OLS estimator is BLUE:
• OLS is linear: holds under assumption 1.1
• OLS is unbiased: holds under assumptions 1.1-1.3
• OLS is the best estimator: holds by the Gauss-Markov theorem

OLS anatomy
• ŷi ≡ xi'b; ŷ = Xb; e = y − ŷ
• P = X[X'X]^{-1}X': "projection matrix"; use: ŷ = Py = Xb (ŷ lies in the space spanned by the columns of X); P is symmetric and idempotent (PP = P)
• M = In − P: "residual maker"; use: e = My; M is symmetric and idempotent
• y = Py + My [X'e = 0 from the FOC] (e is orthogonal to the columns of X) → ŷ'e = 0
• e = Mε → e'e = Σei² = ε'Mε
• with a constant:
  – Σei = 0 [FOC]
  – ȳ = x̄'b (for x̄ and ȳ the sample means)
  – ȳ = (1/n)Σyi = (1/n)Σŷi, i.e. the mean of the fitted values equals ȳ

5 Hypothesis Testing under Normality

Economic theory provides hypotheses about parameters. E.g. in the asset pricing example R^e_t = α + β R^{em}_t + εt, the hypothesis implied by the APT is α = 0.
If the theory is right → testable implications. In order to test we need the distribution of b, so hypotheses can't be tested without distributional assumptions about ε. In addition to 1.1-1.4 we assume that ε1, ..., εn|X are normally distributed.
Distributional assumption (assumption 1.5): normality of the conditional distribution, ε|X ∼ MVN(E(ε|X) = 0, Var(ε|X) = σ²In) (this assumption can be dispensed with later). → εi|X ∼ N(0, σ²)

Useful tools/results:
• Fact 1: If xi ∼ N(0, 1) for i = 1, ..., m with the xi independent → y = Σxi² ∼ χ²(m). If x ∼ N(0, 1), y ∼ χ²(m) and x, y independent: t = x/√(y/m) ∼ t(m)
• Fact 2: omitted
• Fact 3: If x and y are independent, so are f(x) and g(y)
• Fact 4: Let x ∼ MVN(µ, Σ) with Σ nonsingular: (x − µ)'Σ^{-1}(x − µ) ∼ χ² (a random variable)
• Fact 5: W ∼ χ²(a), g ∼ χ²(b) and W, g independent: (W/a)/(g/b) ∼ F(a, b)
• Fact 6: x ∼ MVN(µ, Σ) and y = c + Ax with c, A a non-random vector/matrix → y ∼ MVN(c + Aµ, AΣA')

We use the sampling error b − β = (X'X)^{-1}X'ε. Assuming ε|X ∼ MVN(0, σ²In) from 1.5 and using Fact 6:

b − β|X ∼ MVN((X'X)^{-1}X'E(ε|X), (X'X)^{-1}X'σ²In X(X'X)^{-1})   (64)
b − β|X ∼ MVN(0, σ²(X'X)^{-1})   (65)
i.e. b − β|X ∼ MVN(E(b − β|X), Var(b − β|X))   (66)

Note that Var(bk|X) = σ²((X'X)^{-1})kk.
By the way: e|X ∼ MVN(0, σ²M) → show this using e = Mε and the fact that M is symmetric and idempotent.

Testing hypotheses about individual parameters (t-test)
Null hypothesis H0: βk = β̄k (a hypothesized value, a real number, often assumed to be 0; it is suggested by theory). Alternative hypothesis HA: βk ≠ β̄k.
We control the type 1 error (rejection when H0 is true) by fixing a significance level. We then aim for a high power (rejection when H0 is false), i.e. a low type 2 error (no rejection when false).
Construction of the test statistic: by Fact 6, bk − βk|X ∼ N(0, σ²((X'X)^{-1})kk), where ((X'X)^{-1})kk is the k-th row, k-th column element of (X'X)^{-1}. The OLS estimator is conditionally normally distributed if ε|X is multivariate normal: bk|X ∼ N(βk, σ²((X'X)^{-1})kk). Under H0: βk = β̄k we have bk|X ∼ N(β̄k, σ²((X'X)^{-1})kk), so if H0 is true

zk = (bk − β̄k)/√(σ²((X'X)^{-1})kk) ∼ N(0, 1)   (67)

• standard normally distributed under the null hypothesis
• we don't say anything about the distribution under the alternative hypothesis
• the distribution under H0 does not depend on X
• the value of the test statistic depends on X and on σ², and σ² is unknown

Call s² an unbiased estimate of σ²:

tk = (bk − β̄k)/√(s²((X'X)^{-1})kk) ∼ t(n−K), and approximately N(0, 1) for (n−K) > 30   (68)

The nuisance parameter σ² can be estimated:

σ² = E(εi²|X) = Var(εi|X) = E(εi²) = Var(εi)   (69)

We don't know εi, but we can use the estimator ei = yi − xi'b:

σ̂² = V̂ar(ei) = (1/n)Σ(ei − ē)² = (1/n)Σei² = (1/n)e'e   (70)

σ̂² is a biased estimator:

E(σ̂²|X) = ((n−K)/n) σ²   (71)

Proof: Hayashi p.30-31. Use Σei² = e'e = ε'Mε and trace(A) = Σaii (the sum of the diagonal elements of the matrix).

Note that for n → ∞, σ̂² is asymptotically unbiased, as lim(n→∞) (n−K)/n = 1   (72)

An estimator whose bias and variance vanish as n → ∞ is what we call a consistent estimator.

An unbiased estimator of σ² (see Hayashi p.30-31): for s² = (1/(n−K))Σei² = (1/(n−K))e'e we get an unbiased estimator:

E(s²|X) = (1/(n−K)) E(e'e|X) = σ²   (73)
E(E(s²|X)) = E(s²) = σ²   (74)

Using this provides an unbiased estimator of Var(b|X) = σ²(X'X)^{-1}:

V̂ar(b|X) = s²(X'X)^{-1}   (75)
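The bias in (71) and the unbiasedness of s² in (73) can be made visible in a small simulation (made-up sizes, assuming numpy); it uses e = Mε with the residual maker M, so the true β drops out:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, sigma, R = 20, 3, 1.0, 50_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
P = X @ np.linalg.inv(X.T @ X) @ X.T     # projection matrix
M = np.eye(n) - P                        # residual maker: e = M y = M eps

eps = rng.normal(0.0, sigma, size=(R, n))
E = eps @ M                              # each row is e' = eps' M (M symmetric)
ee = (E * E).sum(axis=1)                 # e'e for each draw

sig_hat2 = ee / n                        # biased: mean ~ sigma^2 (n-K)/n = 0.85
s2 = ee / (n - K)                        # unbiased: mean ~ sigma^2 = 1.0
print(sig_hat2.mean(), s2.mean())
```

With n = 20 and K = 3 the downward bias factor (n−K)/n = 0.85 is clearly visible, while s² centers on σ².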

t-statistic under H0:

tk = (bk − β̄k)/√((V̂ar(b|X))kk) = (bk − β̄k)/SE(bk) ∼ t(n−K)   (76)

Hayashi p.36-37 shows that with the unbiased estimator of σ² we have to replace the normal distribution of the t-stat by the t-distribution. Sketch of the proof:

tk = (bk − β̄k)/√(s²[(X'X)^{-1}]kk) = zk √(σ²/s²) = zk/√(q/(n−K)), with q = e'e/σ²   (77)

We need to show that q ∼ χ²(n−K) and that q and zk are independent.

Decision rule for the t-test:
1. State your null hypothesis: H0: βk = β̄k (often β̄k = 0); HA: βk ≠ β̄k
2. Given β̄k, the OLS estimate bk and s², compute tk = (bk − β̄k)/SE(bk)
3. Fix the significance level α of the two-sided test
4. Fix non-rejection and rejection regions
5. ⇒ decision

Remark:
√(σ²[(X'X)^{-1}]kk): standard deviation of bk|X
√(s²[(X'X)^{-1}]kk): standard error of bk|X

We want a test that keeps its size and at the same time has a high power. Find critical values (real numbers) such that

Prob[−tα/2(n−K) < t(n−K) < tα/2(n−K)] = 1 − α

where t(n−K) is the random t-statistic, −tα/2(n−K) the lower and tα/2(n−K) the upper critical value.
Example: n−K = 30, α = 0.05: tα/2(n−K) = t.975(30) = 2.042. The probability that the t-stat takes on a value of 2.042 or smaller is 97.5%. −tα/2(n−K) = t.025(30) = −2.042: the probability that the t-stat takes on a value of −2.042 or smaller is 2.5%.
Conventional α's: 0.01; 0.05; 0.1.
Reporting:
• Use stars
• Use p-values: compute the quantile taking your t-stat as the x (interpretation!)
• Use the t-stat
• Report the standard error
• Report confidence intervals (interpretation!)

F-test/Wald test
These model more complex ("linear") hypotheses = testing joint hypotheses.
Example 1: G/H model ∆Pi = µ + c∆Qi + z0 Qi + z1 vi Qi + εi. Hypothesis: no informational content of trades. H0: z0 = 0 and z1 = 0; HA: z0 ≠ 0 or z1 ≠ 0.
Example 2: Mincer equation ln(wagei) = β1 + β2 Si + β3 tenurei + β4 experi + εi. Hypothesis: one more year of schooling has the same effect as one more year of tenure, AND experience has no effect. H0: β2 = β3 and β4 = 0; HA: β2 ≠ β3 or β4 ≠ 0.

For the F-test, write the null hypothesis as a system of linear equations:

R β = r, with R (#r x K), β (Kx1), r (#r x 1)   (78)

with: #r = number of restrictions, R = matrix of real numbers, r = vector of real numbers.

Example 1 (parameter vector (µ, c, z0, z1)'):

[0 0 1 0; 0 0 0 1] (µ, c, z0, z1)' = (0, 0)'   (79)

Example 2:

[0 1 −1 0; 0 0 0 1] (β1, β2, β3, β4)' = (0, 0)'   (80)

Use Fact 6 (y = c + Ax → y ∼ MVN(c + Aµ, AΣA')). In our context, replace β = (β1, ..., βK)' by the estimator b = (b1, ..., bK)': r̃ = Rb (b is conditionally normally distributed under 1.1-1.5).
For the Wald test you only need the unrestricted parameter estimates. (The likelihood-ratio principle needs both the restricted and the unrestricted estimates; the Lagrange multiplier principle only the restricted ones.)
The Wald test keeps its size and gives the highest power. The restrictions are fixed by H0.

Definition of the F-test statistic
E(r̃|x) = R E(b|x) = Rβ (under 1.1-1.3)
Var(r̃|x) = R Var(b|x) R' = R σ²(X'X)^{-1} R' (under 1.1-1.4)
r̃|x = Rb|x ∼ MVN(Rβ, R σ²(X'X)^{-1} R')
Using Fact 4 to construct the test for H0: Rβ = r, HA: Rβ ≠ r. Under the null hypothesis:
E(Rb|x) = r (the hypothesized value)
Var(Rb|x) = R σ²(X'X)^{-1} R' (unaffected by the hypothesis)
Rb|x ∼ MVN(r, R σ²(X'X)^{-1} R') (this is where the hypothesis goes in)
Problem: Rb is a random vector and its distribution depends on x. ⇒ We want a random variable whose distribution under H0 does not depend on X. With Fact 4, (Rb − E(Rb|x))'(Var(Rb|x))^{-1}(Rb − E(Rb|x)) leads to the Wald statistic:

(Rb − r)'[R σ²(X'X)^{-1} R']^{-1}(Rb − r) ∼ χ²(#r)   (81)

The smallest possible value is zero (the case when the restrictions are met perfectly and the parentheses are zero). Thus we always have a one-sided test here.
Replace σ² by its unbiased estimate s² = (1/(n−K))Σei² = (1/(n−K))e'e and divide by #r (to find the distribution, using Fact 5):

F = (Rb − r)'[R s²(X'X)^{-1} R']^{-1}(Rb − r)/#r ∼ F(#r, n−K)   (82)
F = [(Rb − r)'[R(X'X)^{-1}R']^{-1}(Rb − r)/#r] / [(e'e)/(n−K)]   (83)
F = (Rb − r)'[R V̂ar(b|X) R']^{-1}(Rb − r)/#r ∼ F(#r, n−K)   (84)

The F-test is one-sided. For applied work: as n → ∞, s² (and σ̂²) get close to σ², so we can use the χ² approximation.

Decision rule for the Wald test
1. Specify H0 in the form Rβ = r and HA: Rβ ≠ r
2. Calculate the F-statistic
3. Look up the entry in the table of the F-distribution for #r and n−K at the given significance level
4. The null is not rejected at significance level α for F less than Fα(#r, n−K)

Alternative representation of the Wald/F-statistic
Minimization of the unrestricted sum of squared residuals min Σ(yi − xi'b)² → SSRU
Minimization of the restricted sum of squared residuals (constraints imposed) min Σ(yi − xi'b̃)² → SSRR
SSRU ≤ SSRR (always!)

F-ratio: F = [(SSRR − SSRU)/#r] / [SSRU/(n−K)]   (85)

This F-ratio is equivalent to the Wald ratio. Problem: for the F-ratio we need two estimates of b. This is also known as the "likelihood-ratio principle", in contrast to the Wald principle, which uses only the unrestricted estimates. (A third principle uses only the restricted estimates: the "Lagrange multiplier principle".)
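The F-ratio (85) can be computed directly from the restricted and unrestricted SSRs; a sketch on made-up data with #r = 2 restrictions (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K, n_restr = 200, 3, 2           # n_restr = #r, the number of restrictions
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.4, 0.0]) + rng.normal(size=n)

def ssr(X, y):
    """Sum of squared residuals of an OLS fit."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

ssr_u = ssr(X, y)                   # unrestricted model
ssr_r = ssr(X[:, :1], y)            # restricted: constant only (b2 = b3 = 0)
F = ((ssr_r - ssr_u) / n_restr) / (ssr_u / (n - K))
print(F)   # compare with the 5% critical value of F(2, 197), roughly 3.04
```

Since one of the excluded slopes is truly nonzero here, the restricted SSR is much larger and the F-statistic lands far above the critical value.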

6 Confidence intervals

Duality of t-test and confidence interval
Probability of non-rejection for the t-test (if H0 is true):

P(−tα/2(n−K) ≤ tk ≤ tα/2(n−K)) = 1 − α   (86)

(tk is a random variable; we do not reject if this event occurs.) In (1−α)·100% of our samples, tk lies inside the critical values.
Rewrite (86):

P(bk − SE(bk) tα/2(n−K) ≤ βk ≤ bk + SE(bk) tα/2(n−K)) = 1 − α   (87)

(bk and SE(bk) are random variables.)

The confidence interval
Confidence interval for βk:

P(bk − SE(bk) tα/2(n−K) ≤ βk ≤ bk + SE(bk) tα/2(n−K)) = 1 − α   (88)

The values inside the bounds of the confidence interval are the values for which you cannot reject the null hypothesis. In (1−α)·100% of our samples the bounds would enclose βk.
The confidence bounds are random variables!
bk − SE(bk) tα/2(n−K): lower bound of the 1−α confidence interval.
bk + SE(bk) tα/2(n−K): upper bound of the 1−α confidence interval.
Wrong interpretation: "the true parameter βk lies with probability 1−α within the bounds of the confidence interval". Problem: the confidence bounds are not fixed; they are random!
H0 is rejected at significance level α if the hypothesized value does not lie within the confidence bounds of the 1−α interval.
Count the number of times that βk is inside the confidence interval. Over many samples, this fraction approaches the coverage probability 1−α. (βk not being inside the confidence interval is equivalent to the event that we reject H0 using the t-statistic.) This works the same way if the hypothesized value β̄k ≠ βk.
On the correct interpretation of confidence intervals, given:
• b
• s² and SE(bk)
• tα/2(n−K)
⇒ compute (bk − β̄k)/SE(bk) and reject if it lies outside the critical values.
Rephrasing: for which values β̄k do we (not) reject?
bk − SE(bk) tα/2(n−K) < β̄k < bk + SE(bk) tα/2(n−K): do not reject
β̄k outside these bounds: reject
We want small confidence intervals, because they allow us to reject hypotheses (narrow range = high power). We achieve this by increasing n (decreasing the standard error) or by increasing α.

6.1 Simulation

Frequentists work with the idea that we can conduct experiments over and over again. To avoid type 2 errors we need smaller standard errors (a narrower distribution) and thus have to increase the sample size.
We run a simulation with multiple draws. To demonstrate unbiasedness, we compute the mean of the estimates over all draws and compare it to the true parameters. To get closer, we increase the number of draws.
→ Increasing n leads to a smaller standard error.
→ Increasing the number of draws leads to more precise estimates of the mean, but the standard error doesn't decrease.
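The coverage idea described above can be sketched as a simulation (made-up setup with n − K = 30, so t.975 = 2.042; assuming numpy): in roughly 95% of the draws the interval should enclose the true βk.

```python
import numpy as np

rng = np.random.default_rng(7)
n, K, R = 32, 2, 5_000
beta = np.array([1.0, 0.5])
t_crit = 2.042                    # t_{.975}(30), from the example above
cover = 0
for r in range(R):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    se = np.sqrt((e @ e / (n - K)) * XtX_inv[1, 1])   # SE(b_2)
    if b[1] - t_crit * se <= beta[1] <= b[1] + t_crit * se:
        cover += 1
print(cover / R)   # close to 0.95
```

The empirical coverage matches 1 − α, which is exactly the frequentist reading of (88): the probability statement is about the random bounds, not about βk.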

You can use a confidence interval to determine the power of a test: determine in how many percent of the cases a wrong parameter value lies inside the confidence interval (for the true parameter this fraction should be 1−α). This gives the power of the test. To increase the power, increase the sample size. The closer the hypothesized value and the true value are, the lower the power of the test.

7 Goodness-of-fit measures

Needed when comparing conflicting equations (theories).

Uncentered R²: R²uc
Variability of yi: Σ yi² = y'y
Decomposition of y'y:
y'y = (Xb + e)'(Xb + e) = (ŷ + e)'(ŷ + e) = ŷ'ŷ + 2e'ŷ + e'e = ŷ'ŷ (explained variation) + e'e (unexplained variation, SSR)
(e'ŷ = ŷ'e = (Xb)'e = b'X'e = 0, as X'e = 0)
R²uc = ŷ'ŷ / y'y ∗ 100% (% of explained variation) = [1 − e'e/y'y] ∗ 100% = [1 − Σei²/Σyi²] ∗ 100% (100% minus the % of unexplained variation)
A good model explains much and therefore the residual variation is very small compared to the explained variation.

Centered R²: R²c
Use the centered R² if there is a constant in the model (xi1 = 1).
Variance of yi: (1/n) Σ (yi − ȳ)²
Decomposition: (1/n) Σ (yi − ȳ)² = (1/n) Σ (ŷi − ȳ)² (variance of predicted values) + (1/n) Σ ei² (SSR)
Proof:
(1/n) Σ (yi − ȳ)² = (1/n) Σ (ŷi + ei − ȳ)² = (1/n)[Σ (ŷi − ȳ)² + Σ ei² + 2 Σ (ŷi − ȳ)ei]
= (1/n)[Σ (ŷi − ȳ)² + Σ ei² + 2b'X'e − 2ȳ Σ ei] = (1/n)[Σ (ŷi − ȳ)² + Σ ei²]   (X'e = 0 and Σ ei = 0)
R²c = [(1/n) Σ (ŷi − ȳ)²] / [(1/n) Σ (yi − ȳ)²] = 1 − Σ ei² / Σ (yi − ȳ)²
Both uncentered and centered R² lie in the interval [0,1] but describe different models. They are not comparable. The centered R² of a model with only a constant is 0, whereas the uncentered R² is not 0 as long as the


constant is ≠ 0 (but then we explain only the level of y and not the variation). Without a constant the centered R² can’t be used for comparison, because Σ ei = 0 only holds with a constant.
• R²: coefficient of determination; = [corr(y, ŷ)]²
• Explanatory power of the regression beyond the constant
• The test that β2 = ... = βk = 0 is the same as the test that R² = 0 (R² is a random variable/estimate)
• → Overall F-test:
SSR_R: yi = β0 + εi, giving SSR_R = Σ (yi − ȳ)²
SSR_UR: yi = β0 + β1 x1 + ... + βk xk, giving SSR_UR = Σ ei²
→ F = [(SSR_R − SSR_UR)/k] / [SSR_UR/(n − k − 1)]
Trick to increase R²: add more regressors (k↑). Model comparison:
• Conduct an F-test between the parsimonious and the extended model
• Use model selection criteria, which give a penalty for heavy parameterization
Selection criterion:
R²adj = 1 − [SSR/(n − k)] / [SST/(n − 1)] = 1 − [(n − 1)/(n − k)] ∗ SSR/SST, where (n − 1)/(n − k) > 1 for k > 1   (89)

(R²adj can become negative.)
Alternative model selection criteria implement „Occam’s razor“ (the simpler explanation, i.e. the parsimonious model, tends to be the right one):
Akaike criterion (from information theory): log[SSR/n] + 2k/n (minimize; can be negative)
Schwarz criterion (from Bayesian theory): log[SSR/n] + k·log(n)/n (minimize; can be negative)
The Schwarz criterion tends to penalize more heavily for larger n (= favors parsimonious models).
→ No single criterion is favored (all three are generally accepted), but one has to argue why one is used.
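The two criteria as defined above can be computed directly. The SSR values below are made up purely for illustration of the penalty trade-off:

```python
from math import log

def aic(ssr, n, k):
    # Akaike: log(SSR/n) + 2k/n (as defined in the notes; minimize)
    return log(ssr / n) + 2 * k / n

def sbc(ssr, n, k):
    # Schwarz: log(SSR/n) + k*log(n)/n (heavier penalty for large n)
    return log(ssr / n) + k * log(n) / n

# Hypothetical comparison: the extended model fits slightly better
# (lower SSR) but spends one more parameter.
n = 100
pars = (aic(50.0, n, 3), sbc(50.0, n, 3))   # parsimonious model
ext = (aic(49.5, n, 4), sbc(49.5, n, 4))    # extended model
print(pars, ext)   # here the parsimonious model wins on both criteria
```

With these illustrative numbers the small fit improvement does not pay for the extra parameter under either criterion; a larger SSR reduction could flip the ranking.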

8 Introduction to large sample theory
Basic concepts of large sample theory. Using large sample theory we can dispense with basic assumptions from finite sample theory:
1.2 E(εi|X) = 0: strict exogeneity
1.4 Var(ε|X) = σ²In: homoscedasticity
1.5 ε|X ∼ N(0, σ²In): normality of the error term


The approximate/asymptotic distributions of b and of the t- and F-statistics can be obtained. Contents:
A. Stochastic convergence
B. Law of large numbers
C. Central limit theorem
D. Useful lemmas
A–D: pillars and building blocks of modern applied econometrics
→ CAN (consistent asymptotically normal) property of OLS (and other estimators)

8.1 A. Modes of stochastic convergence
First: non-stochastic convergence. {cn} = (c1, c2, ...) is a sequence of real numbers. Q: can you find an n(ε) such that |cn − c| < ε for all n > n(ε)? (For almost sure convergence of a random sequence, the analogous threshold n(ε, ω) also depends on the realization ω.)

8.2 B. Law of Large Numbers (LLN)
Weak LLN: lim P[|z̄n − µ| > ε] = 0 (for every ε > 0), i.e. plim z̄n = µ; z̄n →p µ.
Extensions: the WLLN holds for
1. Multivariate extension (sequence of random vectors):
{Zi} with Z1 = [z11, ..., z1k]', Z2 = [z21, ..., z2k]', ...   (91)
E(Zi) = µ < ∞, µ being a vector of expectations of the rows, [E(z1), ..., E(zk)]'.
Compute sample means for each element of the sequence „over the rows“:
(1/n) Σ Zi = [(1/n) Σ z1i, ..., (1/n) Σ zki]'   (92)
Element-wise convergence: (1/n) Σ Zi →p µ
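The statement z̄n →p µ can be illustrated numerically (assuming exponential draws with mean µ = 3; the ε-band and sample sizes are illustrative): the probability mass outside the band around µ shrinks as n grows.

```python
import random

random.seed(1)
mu = 3.0

def fraction_outside(n, eps=0.1, reps=200):
    # Estimate P(|zbar_n - mu| > eps) over repeated samples of size n
    hits = 0
    for _ in range(reps):
        zbar = sum(random.expovariate(1 / mu) for _ in range(n)) / n
        if abs(zbar - mu) > eps:
            hits += 1
    return hits / reps

p_small = fraction_outside(10)
p_large = fraction_outside(10_000)
print(p_small, p_large)   # the second probability is near zero
```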

2. Relax independence: relax iid, allow for dependence in {Zi} through cov(Zi, Zi−j) ≠ 0, j ≠ 0 (especially important in time series; draws still from the same distribution) → ergodic theorem.
3. Functions of Zi: h(zi), a measurable function, e.g. {ln(Zi)}, defines a new sequence.   (93)
If {Zi} allows application of the LLN and E(h(Zi)) < ∞, we have: (1/n) Σ h(zi) →p E(h(zi)) (the LLN can also be applied to the new sequence).

Application: {Zi} iid, E(zi) < ∞, h(zi) = (zi − µ)². Then E(h(zi)) = Var(zi) = σ², and → (1/n) Σ (zi − µ)² →p Var(zi).
4. Vector-valued function f(zi). Vector-valued functions: one or many arguments and one or many returns. {Zi} → {f(Zi)} (with f(Zi) = f(zi1, ..., zik)'). If {Zi} allows application of the LLN and E(f(Zi)) < ∞, then (1/n) Σ f(Zi) →p E(f(Zi)).
1. If x'Ax > 0 for all nonzero x, then A is positive definite.
2. If x'Ax ≥ 0 for all nonzero x, then A is positive semi-definite.
Such matrices are symmetric by construction. X ∼ (µ, Σ), a' = [a1, ..., an] → Z = a'X with Var(Z) = a' Var(X) a, which has to be > 0.
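The application above can be checked numerically; a sketch assuming normal draws with σ² = 4 (the seed and sample sizes are arbitrary):

```python
import random

random.seed(7)
mu, sigma = 0.0, 2.0          # true variance sigma^2 = 4

def mean_h(n):
    # sample mean of h(z_i) = (z_i - mu)^2; the LLN drives it to Var(z_i)
    return sum((random.gauss(mu, sigma) - mu) ** 2 for _ in range(n)) / n

v_small = mean_h(100)
v_large = mean_h(100_000)
print(round(v_small, 2), round(v_large, 2))   # second value near 4.0
```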

6. How does the conditional independence assumption (CIA) justify the inclusion of additional regressors (control variables) in addition to the „key regressor“, i.e. the explanatory variable of most interest (e.g. small class, schooling, foreign indebtedness)? In which way can one, by invoking the CIA, get closer to the „experimental ideal“?
CIA in general: p(a|b,c) = p(a|c), or a ⊥ b | c: a is conditionally independent of b given c.
→ εi ⊥ x1 | x2, x3, ...
The more control variables, the closer we come to the ideal that εi ⊥ x1.


7. The CIA motivates the idea of a „matching estimator“ briefly discussed in the lecture. What is the simple basic idea of it?
To measure treatment effects without random assignment: matching means sorting individuals into similar groups, where one group received treatment, and then comparing the outcomes.
E(yi|xi, di = 1) − E(yi|xi, di = 0) = local average treatment effect
Sort groups according to the x’s. Replace the expectations by the sample means for the individuals; the difference gives the treatment effect for a certain group.
E_xi[E(yi|xi, di = 1) − E(yi|xi, di = 0)] = average treatment effect
Averaging over all x’s, so we get the treatment effect over all groups.
Fourth set
0. Where does the null hypothesis enter into the t-statistic?
In the numerator, as we replace βk with the hypothesized value β̄k.
1. Explain what unbiasedness, efficiency, and consistency mean. Give an explanation of how we can use these concepts to assess the quality of the OLS estimator b = (X'X)^{-1}X'y. (We will be more specific about consistency in later lectures; this question is just to make you aware of the issue.)
Unbiasedness, E(b) = β: if an estimator is unbiased, its probability distribution has an expected value equal to the parameter it is supposed to estimate.
Efficiency, Var(β̃|X) ≥ Var(b|X): we would like the distribution of the sample estimates to be as tightly distributed around the true β as possible.
Consistency, P(|b − β| > ε) → 0 as n → ∞: b is asymptotically unbiased.
⇒ b is BLUE = best (efficiency) linear unbiased estimator.

2. Refresh your basic Mathematics and Statistics: a) v = a'Z. Z is an (nx1) vector of random variables; a' is a (1xn) vector of real numbers. b) V = A·Z. Z is an (nx1) vector of random variables; A is an (mxn) matrix of real numbers. Compute mean and variance of v as well as the mean vector and variance-covariance matrix of V.
a) E(v) = E(a'Z) = a' E(Z), with a' (1xn) and E(Z) (nx1)
Var(v) = Var(a'Z) = a' Var(Z) a, with a' (1xn), Var(Z) (nxn), a (nx1)
b) E(V) = E(AZ) = A ∗ E(Z), with A (mxn) and E(Z) (nx1)
Var(V) = Var(A ∗ Z) = A ∗ Var(Z) ∗ A', with A (mxn), Var(Z) (nxn), A' (nxm)
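Both identities can be verified numerically; a sketch with a hypothetical Σ and A (NumPy used for the matrix algebra, sample size chosen large so the empirical covariance is close to the theoretical one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Z: vector of random variables with known covariance matrix Sigma
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
A = np.array([[1.0, 2.0], [0.0, 3.0]])    # (m x n) matrix of constants

# Theory: Var(AZ) = A Var(Z) A'
theory = A @ Sigma @ A.T

# Monte Carlo check with a large sample of Z draws
Z = rng.multivariate_normal(mean=[0, 0], cov=Sigma, size=200_000)
V = Z @ A.T                               # each row is one draw of AZ
empirical = np.cov(V, rowvar=False)
print(np.round(theory, 2))
print(np.round(empirical, 2))             # close to the theoretical matrix
```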


3. Under which assumptions can the OLS estimator b = (X'X)^{-1}X'y be called BLUE?
Best estimator: Gauss-Markov (1.1–1.5); Linear: 1.1; Unbiased: 1.1–1.3
4. Which additional assumption is used to establish that the OLS estimator is conditionally normally distributed? Which key results from mathematical statistics have we employed to show the normality of the OLS estimator b = (X'X)^{-1}X'y which results from this assumption?
Assumption 1.5: ε|X ∼ MVN(0, σ²In).
As we know, b − β = (X'X)^{-1}X'ε, and using Fact 6:
b − β|X ∼ MVN(E(b − β|X), Var(b − β|X))
b − β|X ∼ MVN(0, σ²(X'X)^{-1})
b|X ∼ MVN(β, σ²(X'X)^{-1})
5. Why are the elements of b random variables in the first place?
They are a function of randomly drawn data and thus the result of a random experiment.
6. For which purpose is it important to know the distribution of the parameter estimate in the first place?
Hypothesis testing (including confidence intervals, critical values, p-values).
7. What does the expression „a random variable is distributed under the null hypothesis as ...“ mean?
A random variable has a distribution, as the realizations are only one way the estimate can come out. Under the null hypothesis means that we suppose that the null hypothesis (e.g. βk = 0) is true.
8. Have we said something about the unconditional distribution of the OLS estimate b? Have we said something about the unconditional means? The unconditional variances and covariances?
Unconditional distribution: no. Unconditional means: yes (LTE). Unconditional variances: no.
9. What can we say about the conditional (on X) and unconditional distribution of the t- and z-statistic (under the null hypothesis)? What is their respective mean and variance?
The type of distribution stays the same when conditioning.
zk: conditional N(0,1), based on bk|X with mean βk and variance σ²[(X'X)^{-1}]kk
tk: unconditional t(n−K), based on bk with estimated variance s²[(X'X)^{-1}]kk
10. Explain
• what is meant by a type 1 error and a type 2 error?
• what is meant by „significance level“ = „size“?
• what is meant by the „power of a test“?


• Type 1 error α: rejecting although correct; type 2 error β: accepting although wrong
• Significance level = probability of making a type 1 error
• Power = 1 − P(type 2 error): rejecting when false
11. What are the two key properties that a good statistical test has to provide?
A test should have high power (= reject a false null) and should keep its size.
12. There are two main schools of statistics: Frequentist and Bayesian. Describe their main differences!
Frequentists: fix the probability (significance level) → H0 can be rejected ⇒ rejection approach (favored)
Bayesians: the probability changes with the evidence we get from the data → H0 cannot be rejected, but we get closer to it ⇒ confirmation approach
13. What is the null and the alternative hypothesis of the t-test presented in the lecture? How is the test statistic constructed?
H0: βk = β̄k, HA: βk ≠ β̄k
bk|X ∼ N(βk, σ²[(X'X)^{-1}]kk); under H0: bk|X ∼ N(β̄k, σ²[(X'X)^{-1}]kk)
zk = (bk − β̄k) / sqrt(σ²[(X'X)^{-1}]kk) ∼ N(0,1)

14. On which grounds could you criticize a researcher for choosing a significance level of 0.00000001 (or 0.95, respectively)? Give examples of more „conventional“ significance levels.
0.00000001: the hypothesis is almost never rejected (type 1 error probability too low, type 2 error probability too high); 0.95: the hypothesis is rejected far too often (type 1 error too high). Conventional levels: 0.01, 0.05, 0.10.
15. Assume the value of a t-test statistic equals 3. The number of observations n in the study is 105, the number of explanatory variables is 5. Give a quick interpretation of this result.
t-stat = 3; n = 105 and K = 5 leads to df = 100. We can use the standard normal distribution as df > 30. For the 5% significance level the critical values are ±2 (rule of thumb), so we reject the hypothesis.
16. Assume the value of a t-test statistic equals 0.4 and n−K = 100. Give a quick interpretation of this result.
We can’t reject H0 at the 5% significance level (the t-stat is approximately normal and thus we can use the rule of thumb for the critical values).
17. In a regression analysis with K = 4 you obtain parameter estimates b2 = 0.2 and b3 = 0.04. You are interested in testing (separately) the hypotheses β2 = 0.1 against β2 ≠ 0.1 and β3 = 0 against


β3 ≠ 0. You obtain

s²(X'X)^{-1} =
[ 0.00125  0.023    0.0017   0.0005
  0.023    0.0016   0.015    0.0097
  0.0017   0.015    0.0064   0.0006
  0.0005   0.0097   0.00066  0.001 ]   (123)

where s² is an unbiased estimate of σ². Compute the standard errors of the two estimates, the two t-statistics and the associated p-values. Note that the t-test (as we have defined it) is two-sided. I assume that you are familiar with p-values; students of Quantitative Methods definitely should be. Otherwise, refresh your knowledge!
b2 = 0.2, β̄2 = 0.1 → standard error of b2 = sqrt(0.0016) = 0.04
b3 = 0.04, β̄3 = 0 → standard error of b3 = sqrt(0.0064) = 0.08
t-stat 2: (0.2 − 0.1)/0.04 = 2.5
t-stat 3: 0.04/0.08 = 0.5
p-value 2: 2·(1 − 0.9938) = 0.0124 (two-sided)
p-value 3: 2·(1 − 0.6915) = 0.617 (two-sided)
Computing the p-value is computing the implied α.
18. List the assumptions that are necessary for the result that the z-statistic zk = (bk − β̄k)/sqrt(σ²[(X'X)^{-1}]kk) is standard normally distributed under the null hypothesis, i.e. zk ∼ N(0,1).
Required: 1.1–1.5. In addition: n−K > 30, or Var(b|X) known.
19. Is the t-statistic „nuisance parameter-free“?
Yes, as the nuisance parameter σ² is replaced by s². Definition: a nuisance parameter is any parameter which is not of immediate interest but must be accounted for in the analysis of those parameters which are of interest.
20. I argue that using the quantile table of the standard normal distribution instead of the quantile table of the t-distribution doesn’t make much of a difference in many applications: the respective quantiles are very similar anyway. Do you agree? Discuss.
This is correct for n−K > 30. For lower values of n−K the t-distribution is more spread out in the tails.
21. I argue that with respect to the distribution of the t-statistic under the null, substituting for σ² the estimator σ̂² = (1/n)·e'e (instead of the unbiased s² = e'e/(n−K)) does not make much of a difference. When would you subscribe to that argument? Discuss.
For n → ∞ the bias factor (n−K)/n of σ̂² vanishes. Thus the statement is fine for large n. For small n the unbiased estimator should be used.
22. a. Show that P = X(X'X)^{-1}X' and M = In − P are symmetric (i.e. A = A') and idempotent (i.e.
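The computations in question 17 can be reproduced with a few lines (normal approximation for the two-sided p-values; the numbers come from the diagonal of the matrix above):

```python
from math import erf, sqrt

def norm_cdf(x):
    # standard normal cdf via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

b2, b3 = 0.2, 0.04
h2, h3 = 0.1, 0.0                      # hypothesized values under H0
se2, se3 = sqrt(0.0016), sqrt(0.0064)  # from the diagonal of s^2 (X'X)^-1

t2 = (b2 - h2) / se2                   # 2.5
t3 = (b3 - h3) / se3                   # 0.5
p2 = 2 * (1 - norm_cdf(t2))            # two-sided p-value
p3 = 2 * (1 - norm_cdf(t3))
print(round(t2, 2), round(t3, 2), round(p2, 4), round(p3, 4))
```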


A = A·A). A useful result is: (BA)' = A'B'.
b. Show that ŷ = Xb = Py and e = y − Xb = My are orthogonal vectors, i.e. ŷ'e = 0.
c. Show that e = Mε and Σ ei² = e'e = ε'Mε.
d. Show that if xi1 = 1 we have Σ ei = 0 (use the OLS FOC) and that ȳ = x̄'b (i.e. the regression hyperplane passes through the sample means).
a. P symmetric: [X(X'X)^{-1}X']' = X[(X'X)^{-1}]'X' = X[(X'X)']^{-1}X' = X(X'X)^{-1}X' = P
M symmetric: [In − X(X'X)^{-1}X']' = In' − [X(X'X)^{-1}X']' = In − X(X'X)^{-1}X' = M
P idempotent: X(X'X)^{-1}X' · X(X'X)^{-1}X' = X(X'X)^{-1}(X'X)(X'X)^{-1}X' = X(X'X)^{-1}X' = P
M idempotent: [In − X(X'X)^{-1}X'][In − X(X'X)^{-1}X'] = In − X(X'X)^{-1}X' − X(X'X)^{-1}X' + X(X'X)^{-1}X'X(X'X)^{-1}X' = In − X(X'X)^{-1}X' = M
b. ŷ'e = (Py)'(My) = y'P'My = y'PMy = y'P(In − P)y = y'(P − P²)y = 0
c. e = My = M(Xβ + ε) = MXβ + Mε = Mε (since MX = 0)
e'e = (Mε)'Mε = ε'M'Mε = ε'Mε
d. Starting point: X'(y − Xb) = X'y − X'Xb = 0   (124)


Only the first row (xi1 = 1): Σ yi − [n, Σ xi2, ..., Σ xik] ∗ b = 0 → ȳ = b1 + b2 x̄2 + ... + bk x̄k   (125)
yi = xi'b + ei = ŷi + ei, so (1/n) Σ yi = (1/n) Σ ŷi + (1/n) Σ ei, where (1/n) Σ ei = 0 if xi1 = 1.

Hence ȳ equals the mean of the fitted values ŷi.
Fifth set
1. What is the difference between the standard deviation of a parameter estimate bk and the standard error of the estimate?
The standard deviation is the square root of the true variance; the standard error is the square root of the estimated variance (and thus depends on the sample).
2. Discuss the pros and cons of alternative ways to present the results of a t-test:
a) parameter estimate and *** for a significant parameter estimate (at α = 1%), ** (at α = 5%), * (at α = 10%)
b) parameter estimate and p-value
c) parameter estimate and t-statistic
d) parameter estimate and parameter standard error
e) your preferred choice
a. Pro: choice for all conventional significance levels and easy to recognize. Con: not as much information as with a p-value (condensed information).
b. Pro: the p-value carries information for any significance level. Con: not as quickly recognizable as stars; tied to a specific null hypothesis.
c. Pro: the t-stat is the basis for p-values. Con: no clear choice possible based on the information given alone (a table is needed); tied to a specific null hypothesis.
d. Pro: is the basis for t-stats and p-values, shows the variance and thus whether any significance level is useful; can be used to test different null hypotheses. Con: no clear choice possible based on the information given alone (table, t-stat needed).
e. I would take case b) because it gives the most information the fastest.
4. Consider Fama-French’s asset pricing model and its „compatible“ regression representation (see your lecture notes of the first week). Suppose you want to test the restriction that none of these three risk factors plays a role in explaining the expected excess return of the asset (that is, the parameters in the regression equation are all zero). State the null and alternative hypothesis in proper statistical terms and construct the Wald statistic for that hypothesis, i.e. define R and r.


R^{ej}_{t+1} = β1 R^{em}_{t+1} + β2 HML_{t+1} + β3 SMB_{t+1} + ε^j_{t+1}
Hypothesis: H0: β1 = β2 = β3 = 0; HA: at least one ≠ 0, i.e. H0: Rβ = r; HA: Rβ ≠ r

R = [1 0 0; 0 1 0; 0 0 1],  β = [β1; β2; β3],  r = [0; 0; 0],  b = [b1; b2; b3]   (126)

5. How is the result from multivariate statistics that (z − µ)'Σ^{-1}(z − µ) ∼ χ²(rows(z)) with z ∼ MVN(µ, Σ) used to construct the Wald statistic to test linear hypotheses about the parameters?
We compute E(Rb|X) = R·E(b|X) = Rβ and Var(Rb|X) = R·Var(b|X)·R' = Rσ²(X'X)^{-1}R', so Rb|X ∼ MVN(Rβ, Rσ²(X'X)^{-1}R').
Under the null hypothesis:
E(Rb|X) = r (= hypothesized value)
Var(Rb|X) = Rσ²(X'X)^{-1}R' (= unaffected by the hypothesis)
Rb|X ∼ MVN(r, Rσ²(X'X)^{-1}R') (= this is where the null hypothesis goes in)
Using Fact 4: [Rb − E(Rb|X)]'[Var(Rb|X)]^{-1}[Rb − E(Rb|X)] = [Rb − r]'[Rσ²(X'X)^{-1}R']^{-1}[Rb − r] ∼ χ²(#r)
Sixth set
Suppose you have estimated a parameter vector b = (0.55, 0.37, 1.46, 0.01)' with an estimated variance-covariance matrix

V̂ar(b|X) =
[ 0.01    0.023   0.0017  0.0005
  0.023   0.0025  0.015   0.0097
  0.0017  0.015   0.64    0.0006
  0.0005  0.0097  0.0006  0.001 ]   (127)

a) Compute the 95% confidence interval for each parameter bk.
b) What does the specific confidence interval computed in a) tell you?
c) Why are the bounds of a confidence interval for βk random variables?
d) Another estimation yields an estimate bk with the corresponding standard error se(bk). You conclude from computing the t-statistic tk = (bk − β̄k)/se(bk) that you can reject the null hypothesis H0: βk = β̄k on the α% significance level. Now you compute the (1−α)% confidence interval. Will β̄k lie inside or outside the confidence interval?
a. Confidence intervals (bounds as realizations of r.v.):
b1: 0.55 ± 1.96·0.1 = [0.354; 0.746]
b2: 0.37 ± 1.96·0.05 = [0.272; 0.468]
b3: 1.46 ± 1.96·0.8 = [−0.108; 3.028]
b4: 0.01 ± 1.96·0.032 = [−0.0527; 0.0727]
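The intervals in a) can be reproduced from the diagonal of the estimated variance-covariance matrix; a small sketch using the 1.96 normal critical value (so the fourth interval differs slightly from the rounded standard error used in the text):

```python
from math import sqrt

b = [0.55, 0.37, 1.46, 0.01]
var_diag = [0.01, 0.0025, 0.64, 0.001]   # diagonal of estimated Var(b|X)
z = 1.96                                  # 95% critical value

intervals = [(bk - z * sqrt(v), bk + z * sqrt(v)) for bk, v in zip(b, var_diag)]
for (lo, hi) in intervals:
    # H0: beta_k = 0 is rejected iff 0 lies outside the interval
    verdict = "reject H0: beta_k = 0" if (lo > 0 or hi < 0) else "do not reject"
    print(round(lo, 3), round(hi, 3), verdict)
```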


b. The first two reject H0: βk = 0; the other two don’t reject, at the 5% significance level for a two-sided test.
c. The bounds are a function of X (= r.v.): se(bk) = sqrt(s²[(X'X)^{-1}]kk).
d. β̄k lies outside, as we reject with the t-stat.
2. Suppose computing the lower bound of the 95% confidence interval yields bk − tα/2(n−K)·se(bk) = −0.01. The upper bound is bk + tα/2(n−K)·se(bk) = 0.01. Which of the following statements are correct?
• With probability of 5% the true parameter βk lies in the interval −0.01 and 0.01.
• The null hypothesis H0: βk = β̄k cannot be rejected for values −0.01 ≤ β̄k ≤ 0.01 at the 5% significance level.
• The null hypothesis H0: βk = 1 can be rejected at the 5% significance level.
• The true parameter βk is with probability 1 − α = 0.95 greater than −0.01 and smaller than 0.01.
• The stochastic bounds of the 1 − α confidence interval overlap the true parameter with probability 1 − α.
• If the hypothesized parameter value β̄k falls within the range of the 1 − α confidence interval computed from the estimates bk and se(bk), then we do not reject H0: βk = β̄k at the significance level of 5%.
Answers: false; correct; correct; false; correct (if we suppose that the null hypothesis is true); correct.
Seventh Set
1. a) Show that if the regression includes a constant, yi = β1 + β2 xi2 + ... + βK xiK + εi, then the variance of the dependent variable can be written as:
(1/N) Σ (yi − ȳ)² = (1/N) Σ (ŷi − ȳ)² + (1/N) Σ ei²   (128)


Hint: ȳ equals the mean of the fitted values.
b) Take your result from a) and formulate an expression for the coefficient of determination R².
c) Suppose you estimated a regression with an R² = 0.63. Interpret this value.
d) Suppose you estimate the same model as in c) without a constant. You know that you cannot compute a meaningful centered R². Therefore you compute the uncentered R²uc = ŷ'ŷ/y'y = 0.84. Compare the two goodness-of-fit measures in c) and d). Would you conclude that the constant can be excluded because R²uc > R²?
a)
(1/N) Σ (yi − ȳ)²   [= SST]   (129)
= (1/N) Σ (ŷi + ei − ȳ)² = (1/N) Σ (ŷi − ȳ + ei)²
= (1/N) Σ (ŷi − ȳ)² + (1/N) Σ ei² + (2/N) Σ (ŷi − ȳ)ei   (130)
= (1/N) Σ (ŷi − ȳ)² + (1/N) Σ ei² + (2/N) b'X'e − (2/N) ȳ Σ ei   (131)
with b'X'e = 0 by the FOC and Σ ei = 0 with a constant   (132)
= (1/N) Σ (ŷi − ȳ)² + (1/N) Σ ei²   [= SSE + SSR]   (133)
(This is our desired result, as ȳ equals the mean of the ŷi if xi1 = 1.)
b) R² = SSE/SST = 1 − SSR/SST   (134)
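The decomposition SST = SSE + SSR and the identity R² = [corr(y, ŷ)]² can be confirmed on simulated data (hypothetical DGP; the identities require the constant in X):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 5, n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, n)   # hypothetical DGP

X = np.column_stack([np.ones(n), x])      # design matrix WITH a constant
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
e = y - yhat

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((yhat - y.mean()) ** 2)      # explained variation
ssr = np.sum(e ** 2)                      # residual variation
r2 = sse / sst

print(bool(np.isclose(sst, sse + ssr)))   # decomposition holds
print(bool(np.isclose(r2, np.corrcoef(y, yhat)[0, 1] ** 2)))
```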

c) 63% of the sample variance of the dependent variable is explained by the regression.
d) R² and R²uc can never be compared, as they are based on different models.

2. In a hedonic price model the price of an asset is explained by its characteristics. In the following we assume that housing prices can be explained by the size sqrft (measured in square feet), the number of bedrooms bdrms and the size of the lot lotsize (also measured in square feet). Therefore we estimate the following equation with OLS:
log(price) = β1 + β2 log(sqrft) + β3 bdrms + β4 log(lotsize) + ε
Results of the estimation can be found in the following table:
(a) Interpret the estimates b2 and b3.
(b) Compute the missing values for Std. Error and t-Statistic in the table and comment on the statistical significance of the estimated coefficients (H0: βj = 0 vs. H1: βj ≠ 0, j = 1,2,3,4).
(c) Test the null hypothesis H0: β1 = 1 vs. H1: β1 ≠ 1.
(d) Compute the p-value for the estimate b2 and interpret the result.



(e) What is the null hypothesis of that specific F-statistic missing in the table? How does it relate to the R²? Compute the missing value of the statistic and interpret the result.
(f) Interpret the value of R-squared.
(g) An alternative specification of the model that excludes the lot size as an explanatory variable provides you with values of the Akaike information criterion of −0.313 and a Schwarz criterion of −0.229. Which specification would you prefer?
a) Everything else constant, an increase of 1% in size leads to an approximate 0.7% increase in price. Everything else constant, a one-unit increase in bedrooms leads to an approximate 3.696% increase in price.
b) Starting point: Coefficient/Std.Err = t-stat. se2 = 0.0929; t-stat3 = 1.3425. The constant, sqrft and lotsize are significant at the 5% significance level; bdrms is not significant.
c) (−1.29704 − 1)/0.65128 = −3.52696. As the critical value is ±1.98, we reject at the 5% significance level.
d) b2 = 0.70023 with t-stat 7.54031 → the two-sided p-value is almost 0 in the table for N(0,1).
e) H0: β2 = β3 = β4 = 0, i.e.
[0 1 0 0; 0 0 1 0; 0 0 0 1] ∗ [β1; β2; β3; β4] = [0; 0; 0]   (135)
F = [R²/q] / [(1 − R²)/(n − K)], with q = K − 1 = 3 slope restrictions:
F = (0.64297/3) / ((1 − 0.64297)/84) = 0.21432/0.00425 = 50.4   (136)

So we reject at any conventional significance level.
Alternative: get SSR_R = Σ (yi − ȳ)² = n · (sample variance of yi).
f) 64% of the sample variance of the dependent variable is explained by the regression.
g) −0.313 vs. −0.4968 for Akaike and −0.229 vs. −0.3842 for Schwarz. We prefer the first model (with lot size) as both AIC and SIC are smaller.
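The overall F-statistic as a function of R² can be wrapped in a helper. Note the numerator degrees of freedom: the overall test restricts the K − 1 = 3 slopes, and with n − K = 84 this implies n = 88 observations; with these inputs the statistic comes out near 50.4:

```python
def overall_f(r2, n, K):
    # Overall F-test of H0: beta_2 = ... = beta_K = 0,
    # with q = K - 1 slope restrictions
    q = K - 1
    return (r2 / q) / ((1 - r2) / (n - K))

fstat = overall_f(0.64297, 88, 4)   # n = 88 observations, K = 4 parameters
print(round(fstat, 2))              # about 50.4
```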


3. What can you do to narrow the parameter confidence bounds? Discuss the possibilities: increase α, increase n. Can you explain how the effect of an increase of the sample size on the confidence bounds works?
- Increasing α: not good, as the type 1 error increases.
- Increasing n: the standard error decreases → the bounds narrow.
4. When would you use the uncentered R²uc and when would you use the centered R²? Why is the uncentered R²uc higher than a centered R²? What is the range of R² and R²uc?
R²uc: without a constant → higher (as the level is explained too)
R²c: with a constant → lower (explains only the variance around the level)
The range for both is from 0 to 1.

5. How do you interpret an R² of 0.38?
38% of the sample variance of the dependent variable is explained by the regression.
6. Why would you use an adjusted R²adj? What is the idea behind the adjustment of the R²adj? Which values can the R²adj take?
R²adj: any value between −∞ and 1. It has a penalty term for the use of degrees of freedom (heavy parameterization) („Occam’s razor“).

7. What is the intuition behind the computation of AIC and SBC?
Find the smallest SSR, but add a penalty term for the use of degrees of freedom (= heavy parameterization).
Eighth set
1. A researcher conducts an OLS regression with K = 4 with a computer software that is unfortunately not able to report p-values. Besides the four coefficients and their standard errors the program only reports the t-statistics that test the null hypothesis H0: βk = 0 against HA: βk ≠ 0 for k = 1,2,3,4. Interpret the t-statistics below and compute the associated p-values. (Interpret the p-values for a reader who works with a significance level α = 5%.)
a) t1 = −1.99  b) t2 = 0.99  c) t3 = −3.22  d) t4 = 2.3
The t-stat shows where we are on the x-axis of the cdf. Assumption: n−K > 30. Then we reject for β1, β3, β4 using the critical value ±1.96. P-values:
a) Φ(1.99) = 0.9767 → 2·(1 − 0.9767) = 0.0466 → reject
b) Φ(0.99) = 0.8389 → 2·(1 − 0.8389) = 0.3222 → do not reject
c) Φ(3.22) is close to one, so the p-value is close to zero → reject
d) Φ(2.3) = 0.9893 → 2·(1 − 0.9893) = 0.0214 → reject


2. Explain in your own words: What is convergence in probability? What is convergence almost surely? Which concept is stronger?
Convergence in probability: the probability that the n-th element of the sequence lies within given bounds around a certain number (e.g. the mean) goes to 1 as n goes to infinity.
Almost sure convergence: the probability that the sequence converges to a certain number (e.g. the mean) is equal to 1.

→a.s. is stronger, as it requires Zn itself to converge to α (for almost every realization), whereas →p only requires Zn to lie within bounds around α with probability approaching 1.

3. Illustrate graphically the concept of convergence in probability. Illustrate graphically a random sequence that does not converge in probability.
See the lecture notes.
4. Review the notion of non-stochastic convergence. In which way do we refer to the notion of non-stochastic convergence when we consider almost sure convergence, convergence in probability and convergence in mean square? What are the non-stochastic sequences in each of the three modes of stochastic convergence?
In non-stochastic convergence we examine what happens to a series as n goes to infinity (there is then no randomness in the elements of the series). In →a.s., →p and →m.s. we have n going to infinity as the non-stochastic component; as the stochastic component we have to consider the different worlds of infinite draws. Examples of non-stochastic series: a series of probabilities (for →p), realizations of Xn (for →a.s.).
5. Explain in your own words: What does convergence in mean square mean? Does convergence in mean square imply convergence in probability? Or does convergence in probability imply convergence in mean square?
Convergence in mean square: the expected squared deviation E[(Zn − α)²] goes to 0 as n grows to infinity. Convergence in mean square implies convergence in probability.
6. Illustrate graphically the concept of convergence in distribution. What does convergence in distribution mean? Think of an example and provide a graphical illustration where the c.d.f. of the sequence of random variables does not converge in distribution.
Convergence in distribution: the sequence Zn converges to Z if the distribution function Fn of Zn converges to the cdf of Z at each point of FZ. With this mode of convergence, we increasingly expect the next outcome in a sequence of random experiments to be better and better modeled by a given probability distribution. Examples where convergence in distribution doesn’t work:
1. We draw random variables with a probability of 0.5 from a normal distribution with either mean 0


or with mean 1.
2. The sequence bn − β, as it doesn’t have a limit distribution (only a point mass).
3. We draw from a distribution for which the variance doesn’t exist; computing the means doesn’t yield a limit distribution.
7. I argue that extending convergence almost surely/in probability/in mean square to vector sequences (or matrices) does not increase complexity, as the basic concept remains the same. However, extending convergence in distribution of a random sequence to a random vector sequence entails an increase of complexity. Why?

For →a.s., →p and →m.s., extending the concepts to vectors only requires element-wise convergence.

For →d, all k elements of the vector have to be less than or equal to a constant at the same time, for every element of the sequence Zn (joint distribution).
8. Convergence in distribution implies the existence of a limit distribution. What do we mean by that?
{Zn} has a distribution Fn. The limit distribution F of Zn is the distribution to which we assume Fn converges at every point (F and Fn belong to the same class of distributions and we just adjust the parameters).
9. Suppose that the random sequence zn converges in distribution to z, where z is a χ²(2) random variable. Write this formally using two alternative notations.

zn →d z ∼ χ²(2), or zn →d χ²(2)
10. What assumptions have to be fulfilled so that you can apply Khinchin’s WLLN?
{Zi} is a sequence of iid random variables with E(Zi) = µ < ∞.
11. I argue that extending the WLLN to vector random sequences does not add complexity relative to the case of a scalar sequence; it just saves space because of notational convenience. Why?
For a vector we have to use element-wise convergence (of the means towards the expected values). We use the same argument as for convergence in probability (we just have to define the means and expectations row by row for element-wise convergence).
Ninth set
1. What does the Lindeberg-Levy (LL) central limit theorem state? What assumptions have to be fulfilled so that you can apply the LL CLT?
Formula: √n(z̄n − µ) →d N(0, σ²). The difference between the sample mean of a sequence and its expected value, scaled by √n, follows a normal distribution in the limit, and thus the sample mean itself is approximately normal: z̄n ∼a N(µ, σ²/n). Assumptions:


Applied Econometrics – 14 Questions for Review • {Zi } iid • E(Zi ) = µ < ∞ • V ar(Zi ) = σ 2 < ∞ • WLLN can be applied 2. Name the concept that is associated to the following short hand notations and explain their meaning: → a. zn d N (0, 1): Convergence in distribution b. plimn→∞ zn = α: Convergence in probability → c. zn m.s. α: Convergence in mean square → d. zn d z: Convergence in distribution → √ e. n(z n − µ) d M V N (0, Σ): Lindenberg-Levy CLT (Multivariate CLT) → f. yn p α: Convergence in probability 2 g. z n ∼a N (µ, σn ): CLZ (follows from the univariate CLT) → h. zn a.s. α: Convergence almost surely/almost sure convergence →

i. zn² →d χ²(1): convergence in distribution / Lemma 2

3. Apply the "useful lemmas" of the lecture. State the name of the respective lemma(s) that you use. Whenever matrix multiplications or summations are involved, assume that the respective operations are possible. The underscore means that we deal with vector or matrix sequences. µ and A indicate vectors or matrices of real numbers. Im is the identity matrix of dimension m:
a. z̄n →p α, then: plim exp(z̄n) = ?
b. zn →d z ∼ N(0, 1), then: zn² →d ?
c. z̄n →d z ∼ MVN(µ, Σ), An →p A, then: An(z̄n − µ) ∼a ?
d. xn →d MVN(µ, Σ), yn →p α, An →p A, then: An xn + yn →d ?
e. xn →d x, yn →p 0, then: xn + yn →d ?
f. xn →d x ∼ MVN(µ, Σ), yn →p 0, then: plim xn'yn = ?
g. xn →d MVN(0, Im), An →p A, then: zn = An xn →d ? and then: zn'(An An')⁻¹ zn ∼a ?

Answers:
a. Using Lemma 1: plim exp(z̄n) = exp(α) (if exp does not depend on n)
b. Using Lemma 2: zn² →d [N(0, 1)]² = χ²(1)
c. Using Lemma 2: z̄n − µ →d MVN(0, Σ), and then using Lemma 5: An(z̄n − µ) ∼a MVN(0, AΣA')
d. Using Lemma 5: An xn →d Ax ∼ MVN(Aµ, AΣA'), and then using Lemma 3: An xn + yn →d Ax + α ∼ MVN(Aµ + α, AΣA')
e. Using Lemma 3: xn + yn →d x + 0 = x
f. Using Lemma 4: plim xn'yn = 0
g. Using Lemma 5: zn = An xn →d Ax ∼ MVN(0, AA'), and then using Lemma 6 together with Fact 4: zn'(An An')⁻¹ zn ∼a (Ax)'(AA')⁻¹(Ax) ∼ χ²(rows(z)), since AA' = Var(Ax)

4. When large sample theory is used to derive the properties of the OLS estimator, the set of assumptions for the finite sample properties of the OLS estimator is altered. Which assumptions are retained? Which are replaced?
Two are retained:
(1.1)→(2.1) linearity
(1.3)→(2.4) rank condition
Four are replaced:
iid→(2.2) dependencies allowed (ergodic stationarity)
(1.2)→(2.3) strict exogeneity replaced by orthogonality: E(xik εi) = 0
(1.4)→deleted
(1.5)→deleted

5. What are the properties of bn = (X'X)⁻¹X'y under the "new" assumptions?
bn →p β (consistency)
bn ∼a MVN(β, Avar(b)/n) (approximate normal distribution)
→ Unbiasedness and efficiency are lost.

6. What does CAN mean?
CAN = Consistent (and) Asymptotically Normal

7. Does consistency of bn need an iid sample of dependent variable and regressors?
iid is a special case of a martingale difference sequence. Necessary is only a stationary and ergodic m.d.s., which means an identically but not necessarily independently distributed sample.

8. Where does a WLLN and where does a CLT come into play when deriving the properties of b = (X'X)⁻¹X'y?
Deriving consistency: we use Lemma 1 twice. But to apply Lemma 1 to [1/n Σ xi xi']⁻¹, we have to ensure that 1/n Σ xi xi' →p E(xi xi'), which we do by applying a WLLN.
Deriving the distribution: we define ḡ = 1/n Σ xi εi and apply a CLT to get √n(ḡ − E(gi)) →d MVN(0, E(gi gi')). This is then used together with Lemma 5.
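The two steps in question 8 can be illustrated by simulation (my own sketch; the data-generating process, an intercept plus a standard-normal regressor with t-distributed errors, is made up for illustration): the WLLN drives (1/n)Σ xi xi' towards E(xi xi'), while the CLT keeps √n·ḡ at a nondegenerate, approximately normal spread.

```python
import numpy as np

rng = np.random.default_rng(0)

def second_moment_and_g(n):
    """Draw an iid sample; return (1/n) sum x_i x_i' and sqrt(n) * g_bar."""
    x2 = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2])    # x_i = (1, x_i2)'
    eps = rng.standard_t(df=8, size=n)       # non-normal errors with E(eps_i) = 0
    Sxx = X.T @ X / n                        # -> E(x_i x_i') by the WLLN
    g_bar = X.T @ eps / n                    # g_bar = (1/n) sum x_i eps_i
    return Sxx, np.sqrt(n) * g_bar

# WLLN: the sample second-moment matrix settles down to E(x_i x_i') = I_2 here
Sxx_big, _ = second_moment_and_g(1_000_000)
print(np.round(Sxx_big, 2))

# CLT: sqrt(n) * g_bar does not shrink; its spread stays near sqrt(diag E(g_i g_i'))
draws = np.array([second_moment_and_g(500)[1] for _ in range(2000)])
print(np.round(draws.std(axis=0), 1))
```

Note that ḡ itself would converge to zero (WLLN); only after scaling by √n does a nondegenerate limit distribution appear, which is exactly why the CLT is needed for the distribution of b.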

Tenth Set

1. At which stage of the derivation of the consistency property of the OLS estimator do we have to invoke a WLLN?
2. What does it mean when an estimator has the CAN property?
3. At which stage of the derivation of the asymptotic normality of the OLS estimator do we have to invoke a WLLN and when a CLT?
1.–3.: see Ninth Set.

4. Which of the useful lemmas 1–6 is used at which stage of a) the consistency proof and b) the asymptotic normality proof?
a) We use Lemma 1 to show, together with a WLLN, that [1/n Σ xi xi']⁻¹ →p [E(xi xi')]⁻¹.
b) We know that √n(b − β) = [1/n Σ xi xi']⁻¹ √n ḡ. For An = [1/n Σ xi xi']⁻¹ we have An →p A = Σxx⁻¹, and for xn = √n ḡ we have xn →d x ∼ MVN(0, E(gi gi')).
So we can apply Lemma 5: √n(b − β) ∼a MVN(0, Σxx⁻¹ E(gi gi') Σxx⁻¹)

5. Explain the difference of the assumptions regarding the variances of the disturbances in the finite sample context and using asymptotic reasoning.
Finite sample variance: E(εi²|X) = σ² (assumption 1.4)
Large sample variance: E(εi²|xi) = σ²
→ We only condition on xi and not on all x's at the same time.

6. There is a special case when the finite sample variances of the OLS estimator based on finite sample assumptions and based on large sample theory assumptions (almost) coincide. When does this happen?
When we have only one observation i.

7. How would you counter the argument that working with the variance-covariance estimate s²(X'X)⁻¹ is quite OK, as it is mainly consistency of the parameter estimates that counts?
Using s²(X'X)⁻¹ we would have to assume conditional homoskedasticity; using the robust estimate (X'X)⁻¹ Ŝ (X'X)⁻¹ allows us to get rid of that assumption.

8. Consider the following assumptions:
(a) linearity
(b) rank condition: the K×K matrix E(xi xi') = Σxx is nonsingular
(c) predetermined regressors: E(gi) = 0, where gi = xi εi
(d) gi is a martingale difference sequence with finite second moments
i) Show that under those assumptions the OLS estimator is approximately normally distributed:
√n(b − β) →d N(0, Σxx⁻¹ E(εi² xi xi') Σxx⁻¹)   (137)
Starting point: the sampling error. We have √n(b − β) = [1/n Σ xi xi']⁻¹ √n ḡ. For:
An = [1/n Σ xi xi']⁻¹ →p A = Σxx⁻¹ (only possible if Σxx is nonsingular)
xn = √n ḡ →d x ∼ MVN(0, E(gi gi')) (as √n(ḡ − E(gi)) →d MVN(0, E(gi gi')) by the CLT)
So we can apply Lemma 5:
√n(b − β) ∼a MVN(0, Σxx⁻¹ E(gi gi') Σxx⁻¹) = MVN(0, Avar(b))
→ bn ∼a MVN(β, Avar(b)/n)
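The limiting result (137) can be checked by Monte Carlo (a sketch with an assumed DGP, not part of the original notes). With Var(εi|xi2) = 0.5 + xi2², the sandwich E(εi² xi xi') differs from σ²Σxx, and the simulated variance of √n(b − β) should match the analytic sandwich:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])
n, reps = 2000, 3000

def sqrt_n_sampling_error():
    x2 = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2])
    eps = rng.normal(size=n) * np.sqrt(0.5 + x2**2)  # conditional heteroskedasticity
    y = X @ beta + eps
    b = np.linalg.solve(X.T @ X, X.T @ y)            # OLS estimate
    return np.sqrt(n) * (b - beta)

draws = np.array([sqrt_n_sampling_error() for _ in range(reps)])
mc_var = np.cov(draws.T)   # Monte-Carlo estimate of Avar(b)

# Analytic sandwich for this DGP: Sigma_xx = I_2 and
# E(eps^2 x x') = diag(E(0.5 + x2^2), E((0.5 + x2^2) x2^2)) = diag(1.5, 3.5)
avar = np.array([[1.5, 0.0], [0.0, 3.5]])
print(np.round(mc_var, 1))
print(avar)
```

Under homoskedasticity the two matrices would collapse to σ²Σxx⁻¹; the gap between diag(1.5, 3.5) and σ²I is what the robust sandwich captures.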

ii) Further, show that assumption (d) implies that the εi are serially uncorrelated, i.e. E(εi εi−j) = 0.
Consider E(gi|gi−1, ..., g1) = 0. As xi1 = 1, we focus on the first row:
E(εi | εi−1, ..., ε1, εi−1 xi−1,2, ..., ε1 x1K) = 0
By the LIE, E[E(y|x, z)|x] = E(y|x), with x = (εi−1, ..., ε1), y = εi and z = (εi−1 xi−1,2, ..., ε1 x1K):
E(εi | εi−1, ..., ε1) = 0
By the LTE: E(εi) = 0. The second step is in exercise 19.

9. Show that the test statistic
tk = (bk − βk) / √(Âvar(bk)/n)   (138)
converges in distribution to a standard normal distribution. Note that bk is the k-th element of b and Avar(bk) is the (k,k) element of the K×K matrix Avar(b). Use the facts that √n(bk − βk) →d N(0, Avar(bk)) and Âvar(b) →p Avar(b). Why is the latter true? Hint: use continuous mapping and the Slutsky theorem (the "useful lemmas")!
tk = (bk − βk)/√(Âvar(bk)/n) = √n(bk − βk)/√(Âvar(bk)) ∼a N(0, 1)
We know:
1. √n(bk − βk) →d N(0, Avar(bk)) by the CLT and from the joint distribution of the estimators.
2. 1/√(Âvar(bk)) →p 1/√(Avar(bk)) by Lemma 1 and from the derivation of Âvar(b).
⇒ Lemma 5, with a = 1/√(Avar(bk)):
tk = √n(bk − βk)/√(Âvar(bk)) →d N(0, a² Avar(bk)) = N(0, Avar(bk)/Avar(bk)) = N(0, 1)

10. Show that the Wald statistic
W = (Rb − r)'[R (Âvar(b)/n) R']⁻¹ (Rb − r)   (139)
converges in distribution to a chi-square with degrees of freedom equal to the number of restrictions. As a hint, rewrite the equation above as W = cn'Qn⁻¹cn. Use Hayashi's Lemma 2.4(d) and the footnote on page 41.
Under the null: Rb − r = Rb − Rβ = R(b − β), so
cn = R√n(bn − β) →d c ∼ MVN(0, R Avar(b) R') (Lemma 2)
Qn = R Âvar(b) R' →p Q = R Avar(b) R' = Var(c) (derivation of Âvar(b) and Lemma 1)
⇒ Wn = cn'Qn⁻¹cn →d c'Q⁻¹c ∼ χ²(#r) (Lemma 6 + Fact 4), where #r = rows(c) is the number of restrictions.
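A numerical sketch (invented data, not from the notes) that computes the robust t-statistic (138) and the Wald statistic (139) from the heteroskedasticity-consistent Âvar(b):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 0.0, 2.0])
y = X @ beta + rng.normal(size=n) * (0.5 + 0.5 * X[:, 1]**2)  # heteroskedastic errors

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
Sxx_inv = np.linalg.inv(X.T @ X / n)
S_hat = (X * (e**2)[:, None]).T @ X / n      # (1/n) sum e_i^2 x_i x_i'
avar_hat = Sxx_inv @ S_hat @ Sxx_inv         # heteroskedasticity-consistent Avar(b)

# robust t-statistic (138) for H0: beta_2 = 0
t2 = b[1] / np.sqrt(avar_hat[1, 1] / n)

# Wald statistic (139) for the single restriction R beta = r
R = np.array([[0.0, 1.0, 0.0]])
r = np.array([0.0])
c = R @ b - r
W = float(c @ np.linalg.inv(R @ (avar_hat / n) @ R.T) @ c)
print(round(t2**2, 3), round(W, 3))  # with one restriction, W equals t^2
```

For a single restriction the Wald statistic equals the squared robust t-statistic, which is a useful consistency check on the implementation.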

11. What is an ensemble mean?
An ensemble consists of a large number of mental copies. The ensemble mean is obtained by averaging over all ensemble members. This has the effect of filtering out features that are less predictable.

12. Why do we need stationarity and ergodicity in the first place?
Especially for time series we want to get rid of the iid assumption, as we observe dependencies. Stationarity and ergodicity allow dependencies, but we still need draws from the same distribution, and the dependence should be weaker between draws that are far apart from each other. But: mean and variance must not change over time → we need this for correct estimates and tests.

13. Explain the concepts of weak stationarity and strong stationarity.
Strong stationarity: the joint distribution of (zi, zir) is the same as the joint distribution of (zj, zjr) if i − ir = j − jr → only the relative position matters.
Weak stationarity: only the first two moments are constant (not the whole distribution), and cov(zi, zi−j) depends only on j.

15. When is a stationary process ergodic?
If the dependence decreases with distance so that:
lim n→∞ E[f(zi, zi+1, ..., zi+k) · g(zi+n, zi+n+1, ..., zi+n+l)] = E[f(zi, zi+1, ..., zi+k)] · E[g(zi+n, zi+n+1, ..., zi+n+l)]

16. Which assumptions have to be fulfilled to apply Khinchin's WLLN? Which of these assumptions are weakened by the ergodic theorem? Which assumption is used instead?
Assumptions:
• {Zi} iid → instead: stationarity and ergodicity
• E(Zi) = µ < ∞

17. Which assumptions have to be fulfilled to apply the Lindeberg-Levy CLT? Are stationarity and ergodicity sufficient to apply a CLT? What property of the sequence {gi} = {εi xi} do we assume to apply a CLT?
Assumptions:
• {Zi} iid → instead: stationary and ergodic m.d.s. (stationarity and ergodicity alone are not sufficient)
• E(Zi) = µ < ∞
• Var(Zi) = σ² < ∞
• WLLN

18. How do you call a stochastic process for which E(gi|gi−1, gi−2, ...) = 0?
A martingale difference sequence (m.d.s.).

19. Show that if a constant is included in the model it follows from E(gi|gi−1, gi−2, ...) = 0 that cov(εi, εi−j) = 0 for all j ≠ 0. Hint: use the law of iterated expectations.
From exercise 8: E(εi|εi−1, ...) = E(εi) = 0.
Cov(εi, εi−j) = E(εi εi−j) − E(εi)E(εi−j), where the second term is 0. For the first term:
E(εi εi−j) = E[E(εi εi−j | εi−1, ..., εi−j, ..., ε1)] by the LTE (backwards)
= E[εi−j E(εi | εi−1, ..., εi−j, ..., ε1)] = E(0) = 0 by linearity of conditional expectations (and the result above)
→ Cov(εi, εi−j) = 0

20. When using the heteroskedasticity-consistent covariance matrix
Avar(b) = Σxx⁻¹ E(εi² xi xi') Σxx⁻¹   (140)
which assumption regarding the covariances of the disturbances εi and εi−j do we make?
Cov(εi, εi−j) = 0

Eleventh Set

1. When and why would you use GLS? Describe the limiting nature of the GLS approach.
GLS is used to estimate the unknown parameters of a linear regression model when heteroskedasticity is present or when observations are correlated in the error term (cf. WLS).
Assumptions:
• Linearity: yi = xi'β + εi
• Full rank
• Strict exogeneity
• Var(ε|X) = σ²V(X), where V(X) has to be known, symmetric and positive definite
Limits: V(X) is normally not known, and thus we have to estimate even more. If Var(ε|X) is estimated, the BLUE property is lost and β̂GLS might even be worse than OLS. If X is not strictly exogenous, GLS might be inconsistent. Large sample properties are more difficult to obtain.

2. A special case of the GLS approach is weighted least squares (WLS). What difficulties could arise in a WLS estimation? How are the weights constructed?
Observations are weighted by standard deviations (higher penalty for wrong estimates). WLS is used when all the off-diagonal entries of Σ (the variance-covariance matrix) are zero. Here: Var(εi|X) = σ²V(xi) with V(xi) = zi'α (V(xi) typically unknown).

→ So again, the variance has to be estimated, and thus our estimates might be inconsistent and even less efficient than OLS.

3. When does exact multicollinearity occur? What happens to the OLS estimator in this case?
It occurs when one regressor is a linear combination of other regressors: rank(X) ≠ K.
Then assumption 1.3/2.4 is violated and (X'X)⁻¹, and thus the OLS estimator, cannot be computed. We can still estimate linear combinations of the parameters, though.

4. How are the OLS estimator and its standard error affected by (not exact) multicollinearity?
• The BLUE result is not affected.
• Var(b|X) is affected: coefficients have high standard errors (effect: wide confidence intervals, low t-stats, and thus difficulties with rejection).
• Estimates may have the wrong sign.
• Small changes in the data produce wide swings in the parameter estimates.

5. Which steps can be taken to overcome the problem of multicollinearity?
• Increase n (and thus the variance of the x's)
• Get a smaller σ² (= a better-fitting model)
• Get smaller correlation (= exclude some regressors; but: omitted variable bias)
Other ways:
• Don't do anything (OLS is still BLUE)
• Joint hypothesis testing
• Use empirical data
• Combine the multicollinear variables

Twelfth Set

1. When does an omitted regressor not cause biased estimates?
With an omitted regressor we get:
b1 = β1 + (X1'X1)⁻¹X1'X2 β2 + (X1'X1)⁻¹X1'ε
where the last term is 0 (in expectation) under strict exogeneity. So there is no bias if:
→ β2 = 0, i.e. X2 is not part of the model in the first place, or
→ (X1'X1)⁻¹X1'X2 = 0, i.e. the regression of X2 on X1 gives you zero coefficients.

2. Explain the term endogeneity bias. Give a practical economic example when an endogeneity bias occurs.

Endogeneity = no strict exogeneity, i.e. no predetermined regressors: the x's are correlated with the error term → the estimator is biased.
Example (Haavelmo): Ci = α0 + α1 Yi + ui (Ii is part of ui and correlated with Yi).

3. What is the solution to the endogeneity problem in a linear regression framework?
Instrumental variables: use instruments that are correlated with the endogenous regressors but uncorrelated with the error term.

4. Use the same tools used to derive the CAN property of the OLS estimator to derive the CAN property of the IV estimator. Start with the sampling error and make your assumptions on the way (applicability of WLLN, CLT, ...).
Compute the sampling error:
δ̂ − δ = (X'Z)⁻¹X'y − δ
= (X'Z)⁻¹X'(Zδ + ε) − δ
= (X'Z)⁻¹X'Zδ + (X'Z)⁻¹X'ε − δ
= (X'Z)⁻¹X'ε = [1/n Σ xi zi']⁻¹ (1/n Σ xi εi)
Consistency: δ̂ = [1/n Σ xi zi']⁻¹ (1/n Σ xi yi) →p δ, because
1. 1/n Σ xi zi' →p E(xi zi'), and by Lemma 1: [1/n Σ xi zi']⁻¹ →p [E(xi zi')]⁻¹
2. 1/n Σ xi εi →p E(xi εi) = 0
⇒ δ̂ − δ = [1/n Σ xi zi']⁻¹ (1/n Σ xi εi) →p [E(xi zi')]⁻¹ · 0 = 0
Asymptotic normality: √n(δ̂ − δ) →d MVN(0, Avar(δ̂)).
√n × sampling error: √n(δ̂ − δ) = [1/n Σ xi zi']⁻¹ √n ḡ. For:
1. An = [1/n Σ xi zi']⁻¹ →p A = [E(xi zi')]⁻¹
2. xn = √n ḡ →d x ∼ MVN(0, E(gi gi')) (apply the CLT, as E(gi) = 0)
⇒ Lemma 5:
√n(δ̂ − δ) →d MVN(0, [E(xi zi')]⁻¹ E(gi gi') [E(zi xi')]⁻¹) = MVN(0, Avar(δ̂))
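The consistency argument in question 4 can be illustrated by simulation (my own sketch; the data-generating process and variable names are invented, following the convention that zi is the regressor and xi the instrument): OLS converges to δ + Cov(zi, εi)/Var(zi), while the IV estimator converges to δ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
delta = 1.0

# Hypothetical DGP: regressor z_i is endogenous, x_i is a valid instrument
u = rng.normal(size=n)                 # common shock that makes z endogenous
x = rng.normal(size=n)                 # instrument: correlated with z, not with eps
eps = u + rng.normal(size=n)           # error term, Cov(z, eps) = Var(u) = 1
z = x + u + rng.normal(size=n)         # regressor, Cov(x, z) = 1 != 0
y = delta * z + eps

ols = (z @ y) / (z @ z)                # plim = delta + Cov(z, eps)/Var(z) = 4/3
iv = (x @ y) / (x @ z)                 # delta_hat = [sum x_i z_i]^{-1} sum x_i y_i
print(round(ols, 2), round(iv, 2))
```

With Var(z) = 3 and Cov(z, ε) = 1, OLS settles near 4/3 regardless of n, while the IV estimate settles near the true δ = 1; that gap is the endogeneity bias.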

5. When would you use an IV estimator instead of an OLS estimator? (Hint: which assumption of the OLS estimation is violated, and what is the consequence?)
When xi and εi are correlated (violated: assumption 2.3).

6. Describe the basic idea of instrumental variables estimation. How are the unknown parameters related to the data generating process?
IV can be used to obtain consistent estimators in the presence of omitted variables. The regressors zi and xi are correlated; xi and ui are not correlated → xi serves as the instrument.

7. Which assumptions are necessary to derive the instrumental variables estimator and the CAN property of the IV estimator?
3.1 Linearity
3.2 Ergodic stationarity
3.3 Orthogonality conditions
3.4 Rank condition for identification (full rank)

8. Where do the assumptions enter the derivation of the instrumental variables estimator and the CAN property of the IV estimator?
Derivation: E(xi εi) = E(xi(yi − zi'δ)) = 0, using 3.1 (linearity) for the substitution and 3.3 (orthogonality) for setting the expectation to zero. Hence
E(xi yi) − E(xi zi')δ = 0
δ = [E(xi zi')]⁻¹ E(xi yi), where the inverse exists by 3.4 (rank condition)
δ̂ →p δ by 3.2 (ergodic stationarity).

9. Show that the OLS estimator can be conceived as a special case of the IV estimator.
Set xi = zi. Then E(zi εi) = 0 and
δ̂ = [1/n Σ zi zi']⁻¹ (1/n Σ zi yi) = OLS estimator.

10. What are possible sources of endogeneity?
See additional page.
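Question 9's claim can also be checked numerically (a small sketch with made-up data): plugging xi = zi into the generic IV formula reproduces the OLS estimate exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
Z = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors z_i
y = Z @ np.array([0.5, 2.0]) + rng.normal(size=n)

# generic IV formula with the instruments set equal to the regressors (x_i = z_i):
iv = np.linalg.solve(Z.T @ Z, Z.T @ y)                  # [sum z z']^{-1} sum z y
ols, *_ = np.linalg.lstsq(Z, y, rcond=None)             # OLS via least squares
print(np.allclose(iv, ols))  # → True
```

Both estimators solve the same normal equations Z'Zδ = Z'y, so the agreement holds up to floating-point precision.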
