
Robust inference in two-phase sampling with application to unit nonresponse
David Haziza and Jean-François Beaumont
Université de Montréal and Statistics Canada

International Total Survey Error Workshops 2011

Quebec, Canada

June 21, 2011

Outline of the presentation

1. Introduction
2. Measuring the influence: the conditional bias
3. Robust estimators
4. Application to unit nonresponse
5. Simulation study
6. Concluding remarks

David Haziza and Jean-François Beaumont

Robust inference in two-phase sampling

June 21, 2011

2 / 22

Influential units

- Unusual observations, possibly with large design weights
- Many survey statistics are sensitive to the presence of influential units: including or excluding an influential unit in the calculation of these statistics can have a dramatic impact on their magnitude.
- Outliers are common in business surveys because the distributions of variables such as revenue and sales are highly skewed (heavy right tail).
- Influential units are legitimate observations.
- The impact of influential units can be minimized by using a good sampling design, for example stratified sampling with a take-all stratum.

- Even with a good sampling design, influential units may still be selected in the sample (e.g., stratum jumpers).
- In the presence of influential units, survey statistics are (approximately) unbiased, but they can have a very large variance.
- Reducing the influence of large values produces stable but biased estimators.
- The treatment of influential units is therefore a trade-off between bias and variance.
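The bias-variance trade-off can be made concrete under Poisson sampling, where the exact bias and variance of both estimators have closed forms. The stabilizing device shown here caps each weighted value $y_i/\pi_i$ at a constant (a form of winsorization). It is only one illustrative choice and not the method of this presentation. The population, the function names and the cap `c` are hypothetical.

```python
import random

def ht_moments(y, pi):
    """Exact bias and variance of the Horvitz-Thompson total under Poisson sampling."""
    # I_i ~ Bernoulli(pi_i) independently: the estimator is unbiased and
    # Var = sum (1 - pi_i) / pi_i * y_i**2.
    var = sum((1 - p) / p * v * v for v, p in zip(y, pi))
    return 0.0, var

def capped_moments(y, pi, c):
    """Exact bias and variance when each weighted value y_i / pi_i is capped at c."""
    capped = [min(v / p, c) for v, p in zip(y, pi)]
    # Each capped term equals capped_i with probability pi_i and 0 otherwise.
    mean = sum(p * t for p, t in zip(pi, capped))
    var = sum(p * (1 - p) * t * t for p, t in zip(pi, capped))
    return mean - sum(y), var

random.seed(1)
N = 1000
# Hypothetical skewed (log-normal) population, each unit sampled with probability 0.05.
y = [random.lognormvariate(3.0, 1.2) for _ in range(N)]
pi = [0.05] * N
c = 2000.0  # hypothetical cap on y_i / pi_i

bias_ht, var_ht = ht_moments(y, pi)
bias_c, var_c = capped_moments(y, pi, c)
print(f"HT:     bias = {bias_ht:10.1f}, sd = {var_ht ** 0.5:10.1f}")
print(f"capped: bias = {bias_c:10.1f}, sd = {var_c ** 0.5:10.1f}")
```

Capping lowers the variance at the price of a negative bias. Whether the mean squared error improves depends on the population and on `c`.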

Two-phase designs

- $U$: finite population of size $N$
- $s_1$: first-phase sample, of size $n_1$
- $s_2$: second-phase sample, of size $n_2$, selected from $s_1$
- $I_{1i}$: first-phase sample selection indicator for unit $i$
- $I_{2i}$: second-phase sample selection indicator for unit $i$
- Vectors of indicators: $I_1 = (I_{11}, \ldots, I_{1N})'$ and $I_2 = (I_{21}, \ldots, I_{2N})'$
- First-phase inclusion probability for unit $i$: $\pi_{1i} = P(I_{1i} = 1)$
- Second-phase inclusion probability for unit $i$: $\pi_{2i}(I_1) = P(I_{2i} = 1 \mid I_1; I_{1i} = 1)$
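The notation above can be illustrated by drawing one two-phase sample. This minimal sketch assumes simple random sampling without replacement (SRSWOR) in both phases. The function name, seed and sizes are hypothetical. It returns the indicator vectors $I_1$ and $I_2$; by construction, $I_{2i} = 1$ only if $I_{1i} = 1$.

```python
import random

def draw_two_phase_srswor(N, n1, n2, rng):
    """Draw s1 ~ SRSWOR(n1) from U = {0, ..., N-1}, then s2 ~ SRSWOR(n2) from s1."""
    s1 = rng.sample(range(N), n1)
    s2 = rng.sample(s1, n2)
    I1 = [0] * N
    I2 = [0] * N
    for i in s1:
        I1[i] = 1
    for i in s2:
        I2[i] = 1
    return I1, I2

rng = random.Random(2011)
N, n1, n2 = 20, 8, 3
I1, I2 = draw_two_phase_srswor(N, n1, n2, rng)
print("I1 =", I1)
print("I2 =", I2)
# Under SRSWOR/SRSWOR: pi_1i = n1 / N and pi_2i = n2 / n1 for every unit.
pi1 = n1 / N
pi2 = n2 / n1
print("pi_1i =", pi1, " pi_2i =", pi2, " product =", pi1 * pi2)
```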

Two-phase sampling

[Figure: nested sets $s_2 \subset s_1 \subset U$, with sizes $n_2$, $n_1$ and $N$. Units in $U - s_1$ have $I_{1i} = 0, I_{2i} = 0$; units in $s_1 - s_2$ have $I_{1i} = 1, I_{2i} = 0$; units in $s_2$ have $I_{1i} = 1, I_{2i} = 1$.]

Invariance

- A two-phase sampling design possesses the invariance property if $P(I_2 \mid I_1) = P(I_2)$.
- Invariance $\Rightarrow$ $\pi_{2i}(I_1) = \pi_{2i}$.
- Example of invariance: simple random sampling without replacement in both phases, with $n_1$ and $n_2$ both fixed prior to sampling.
- Example of non-invariance: simple random sampling without replacement in the first phase and proportional-to-size sampling in the second phase; that is,
$$\pi_{2i}(I_1) = \frac{n_2\, x_i}{\sum_{k \in s_1} x_k},$$
where $x$ is a size variable available for all $i \in s_1$.

In what follows, we assume that the two-phase design satisfies the invariance property.
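To see why the proportional-to-size example violates invariance, the sketch below computes $\pi_{2i}(I_1) = n_2 x_i / \sum_{k \in s_1} x_k$ for the same unit under two different first-phase samples. The size values and both samples are hypothetical, chosen so that every probability stays below 1.

```python
def pps_second_phase_probs(x, s1, n2):
    """Second-phase inclusion probabilities pi_2i(I_1) = n2 * x_i / sum_{k in s1} x_k."""
    total = sum(x[k] for k in s1)
    probs = {i: n2 * x[i] / total for i in s1}
    if max(probs.values()) > 1:
        raise ValueError("some n2 * x_i exceeds the s1 total of x; not a valid PPS design")
    return probs

x = [4.0, 2.0, 6.0, 1.0, 3.0, 5.0, 2.0, 7.0]  # hypothetical size variable
n2 = 2
s1_a = [0, 1, 3, 6]  # one possible first-phase sample
s1_b = [0, 2, 5, 7]  # another one, also containing unit 0

p_a = pps_second_phase_probs(x, s1_a, n2)
p_b = pps_second_phase_probs(x, s1_b, n2)
print("pi_2,0 given s1_a:", p_a[0])
print("pi_2,0 given s1_b:", p_b[0])
```

Unit 0's second-phase probability is 8/9 under one first-phase sample and 4/11 under the other. It therefore cannot be written as a fixed $\pi_{2i}$.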

Point estimation

- Goal: estimate the population total of a variable of interest $y$,
$$Y = \sum_{i \in U} y_i.$$
- The $y$-values are available only for $i \in s_2$.
- Complete-data estimator: the double expansion estimator
$$\hat{Y}_{DE} = \sum_{i \in s_2} \frac{y_i}{\pi_{1i}\pi_{2i}} = \sum_{i \in s_2} \frac{y_i}{\pi_i^*},$$
where $\pi_i^* = \pi_{1i}\pi_{2i}$.
- $\hat{Y}_{DE}$ is design-unbiased for $Y$; that is, $E_1 E_2(\hat{Y}_{DE} \mid I_1) = Y$.
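Design-unbiasedness can be checked exactly on a toy population by enumerating every possible two-phase sample. This sketch assumes SRSWOR in both phases, so every pair $(s_1, s_2)$ is equally likely and $\pi_i^* = n_2/N$. The population values are hypothetical. Exact fractions avoid rounding error.

```python
from fractions import Fraction
from itertools import combinations

def double_expansion(y, s2, pi_star):
    """Double expansion estimator: sum over s2 of y_i / pi*_i."""
    return sum(Fraction(y[i]) / pi_star for i in s2)

y = [3, 10, 4, 25, 8]  # hypothetical population, N = 5
N, n1, n2 = len(y), 3, 2
pi_star = Fraction(n1, N) * Fraction(n2, n1)  # = n2 / N under SRSWOR/SRSWOR

estimates = []
for s1 in combinations(range(N), n1):
    for s2 in combinations(s1, n2):
        estimates.append(double_expansion(y, s2, pi_star))

# All (s1, s2) pairs are equally likely, so E1 E2 is a plain average.
expectation = sum(estimates) / len(estimates)
print("Y =", sum(y), " E1 E2(Y_DE) =", expectation)
```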

Total error

The total error of $\hat{Y}_{DE}$ can be decomposed as
$$\hat{Y}_{DE} - Y = \underbrace{\left(\hat{Y}_E - Y\right)}_{\text{first-phase error}} + \underbrace{\left(\hat{Y}_{DE} - \hat{Y}_E\right)}_{\text{second-phase error}}, \tag{1}$$
where $\hat{Y}_E = \sum_{i \in s_1} \pi_{1i}^{-1} y_i$ is the estimator one would have used in a single-phase sampling design.

- An influential unit may have an impact on both the first-phase and the second-phase errors.
- How can the influence (or impact) of a unit on both errors be measured? For single-phase sampling: the conditional bias; see Moreno-Rebollo, Muñoz-Reyes and Muñoz-Pichardo (1999) and Beaumont, Haziza and Ruiz-Gazen (2011).
- How can we construct an estimator that is robust to the presence of influential units? For single-phase designs: Beaumont, Haziza and Ruiz-Gazen (2011).
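Decomposition (1) is an algebraic identity, but it also shows what each phase contributes. Given $s_1$, the second-phase error averages to zero over the second phase. The sketch below checks this by enumerating every $s_2$ for one fixed $s_1$, assuming SRSWOR in both phases. The population and the chosen $s_1$ are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

y = [3, 10, 4, 25, 8]  # hypothetical population
N, n1, n2 = len(y), 3, 2
pi1 = Fraction(n1, N)   # first-phase inclusion probability (SRSWOR)
pi2 = Fraction(n2, n1)  # second-phase inclusion probability (SRSWOR within s1)
Y = sum(y)

s1 = (0, 3, 4)  # one fixed first-phase sample
Y_E = sum(Fraction(y[i]) / pi1 for i in s1)  # single-phase expansion estimator

second_phase_errors = []
for s2 in combinations(s1, n2):
    Y_DE = sum(Fraction(y[i]) / (pi1 * pi2) for i in s2)
    first_err, second_err = Y_E - Y, Y_DE - Y_E
    assert Y_DE - Y == first_err + second_err  # identity (1)
    second_phase_errors.append(second_err)

print("first-phase error:", Y_E - Y)
print("E2(second-phase error | s1):", sum(second_phase_errors) / len(second_phase_errors))
```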

Measuring the influence: the conditional bias

We distinguish between three cases:
- $i \in s_2$: sampled unit
- $i \in s_1 - s_2$: sampled in the first phase but not in the second phase
- $i \in U - s_1$: nonsampled unit

We can only reduce the influence of the sampled units (i.e., the units belonging to $s_2$); nothing can be done for the other units at the estimation stage.

Influence of a sampled unit $i \in s_2$:
$$\begin{aligned}
B_i^{DE}(I_{1i}=1, I_{2i}=1) &= E_1 E_2\left(\hat{Y}_{DE} - Y \mid I_1, I_{1i}=1, I_{2i}=1\right) \\
&= E_1\left(\hat{Y}_E - Y \mid I_{1i}=1\right) + E_1 E_2\left(\hat{Y}_{DE} - \hat{Y}_E \mid I_1, I_{1i}=1, I_{2i}=1\right)
\end{aligned}$$
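The definition can be evaluated directly on a toy population. Average $\hat{Y}_{DE} - Y$ over all two-phase samples that contain unit $i$ in $s_2$, and split the average into the two terms of (1). This sketch assumes SRSWOR in both phases, where all $(s_1, s_2)$ pairs are equally likely. The population values and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def conditional_bias_by_enumeration(y, n1, n2, i):
    """Return (first-phase part, second-phase part) of B_i^DE under SRSWOR/SRSWOR."""
    N = len(y)
    pi1, pi2 = Fraction(n1, N), Fraction(n2, n1)
    Y = sum(y)
    first, second, count = Fraction(0), Fraction(0), 0
    for s1 in combinations(range(N), n1):
        Y_E = sum(Fraction(y[k]) / pi1 for k in s1)
        for s2 in combinations(s1, n2):
            if i not in s2:
                continue  # condition on I_1i = 1 and I_2i = 1
            Y_DE = sum(Fraction(y[k]) / (pi1 * pi2) for k in s2)
            first += Y_E - Y
            second += Y_DE - Y_E
            count += 1
    # The retained samples are equally likely: the conditional expectation is a plain average.
    return first / count, second / count

y = [3, 10, 4, 25, 8]  # hypothetical population; unit 3 carries the large value
for i in range(len(y)):
    b1, b2 = conditional_bias_by_enumeration(y, 3, 2, i)
    print(f"unit {i}: y = {y[i]:2d}, first-phase = {float(b1):7.2f}, "
          f"second-phase = {float(b2):7.2f}, total = {float(b1 + b2):7.2f}")
```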

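Arbitrary two-phase design:
$$\begin{aligned}
B_i^{DE}(I_{1i}=1, I_{2i}=1) &= \underbrace{\sum_{j \in U}\left(\frac{\pi_{1ij}}{\pi_{1i}\pi_{1j}} - 1\right) y_j}_{\text{influence of unit } i \text{ on the first-phase error}} \\
&\quad + \underbrace{\sum_{j \in U}\frac{\pi_{1ij}}{\pi_{1i}\pi_{1j}}\left(\frac{\pi_{2ij}}{\pi_{2i}\pi_{2j}} - 1\right) y_j}_{\text{influence of unit } i \text{ on the second-phase error}} \\
&= \underbrace{\sum_{j \in U}\left(\frac{\pi_{ij}^*}{\pi_i^*\pi_j^*} - 1\right) y_j}_{\text{total influence of unit } i}
\end{aligned}$$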
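The last equality follows by adding the two sums term by term. Write $\pi_{ij}^* = \pi_{1ij}\pi_{2ij}$ for the overall joint inclusion probability, which holds under the invariance assumption, and $\pi_i^* = \pi_{1i}\pi_{2i}$. The $\pi_{1ij}/(\pi_{1i}\pi_{1j})$ terms then cancel:
$$\left(\frac{\pi_{1ij}}{\pi_{1i}\pi_{1j}} - 1\right) + \frac{\pi_{1ij}}{\pi_{1i}\pi_{1j}}\left(\frac{\pi_{2ij}}{\pi_{2i}\pi_{2j}} - 1\right) = \frac{\pi_{1ij}\,\pi_{2ij}}{\pi_{1i}\pi_{2i}\,\pi_{1j}\pi_{2j}} - 1 = \frac{\pi_{ij}^*}{\pi_i^*\pi_j^*} - 1.$$
For $j = i$, the conventions $\pi_{1ii} = \pi_{1i}$ and $\pi_{2ii} = \pi_{2i}$ give the coefficient $1/\pi_i^* - 1$.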

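SRSWOR/SRSWOR: $\pi_i^* = \frac{n_1}{N} \times \frac{n_2}{n_1} = \frac{n_2}{N}$ and, with $\bar{Y} = Y/N$,
$$\begin{aligned}
B_i^{DE}(I_{1i}=1, I_{2i}=1) &= \frac{N}{N-1}\left(\frac{N}{n_1} - 1\right)(y_i - \bar{Y}) + \frac{N}{N-1}\,\frac{N}{n_1}\left(\frac{n_1}{n_2} - 1\right)(y_i - \bar{Y}) \\
&= \frac{N}{N-1}\left(\frac{N}{n_2} - 1\right)(y_i - \bar{Y})
\end{aligned}$$

Poisson sampling/Poisson sampling:
$$\begin{aligned}
B_i^{DE}(I_{1i}=1, I_{2i}=1) &= \left(\frac{1}{\pi_{1i}} - 1\right) y_i + \frac{1}{\pi_{1i}}\left(\frac{1}{\pi_{2i}} - 1\right) y_i \\
&= \left(\frac{1}{\pi_i^*} - 1\right) y_i
\end{aligned}$$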
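The closed forms above can be checked numerically against the general formula for an arbitrary two-phase design, $\sum_{j\in U}\left(\pi_{ij}^*/(\pi_i^*\pi_j^*) - 1\right)y_j$ with $\pi_{ij}^* = \pi_{1ij}\pi_{2ij}$. In this sketch each design is described by a hypothetical helper that returns joint inclusion probabilities, with $\pi_{1ii} = \pi_{1i}$. The population values, probabilities and sample sizes are also hypothetical.

```python
from fractions import Fraction

def conditional_bias(y, i, pi1_joint, pi2_joint):
    """General two-phase formula sum_j (pi*_ij / (pi*_i pi*_j) - 1) y_j,
    with pi*_ij = pi_1ij * pi_2ij (invariant design)."""
    def pi_star(j, k):
        return pi1_joint(j, k) * pi2_joint(j, k)
    return sum((pi_star(i, j) / (pi_star(i, i) * pi_star(j, j)) - 1) * y[j]
               for j in range(len(y)))

def srswor_joint(n, N):
    """Joint inclusion probabilities of SRSWOR of size n from N units."""
    def joint(j, k):
        if j == k:
            return Fraction(n, N)
        return Fraction(n * (n - 1), N * (N - 1))
    return joint

def poisson_joint(p):
    """Joint inclusion probabilities of Poisson sampling with probabilities p."""
    def joint(j, k):
        return p[j] if j == k else p[j] * p[k]
    return joint

y = [3, 10, 4, 25, 8, 40]  # hypothetical population
N, n1, n2 = len(y), 4, 2
Ybar = Fraction(sum(y), N)

# SRSWOR in both phases: compare with N/(N-1) (N/n2 - 1) (y_i - Ybar).
for i in range(N):
    general = conditional_bias(y, i, srswor_joint(n1, N), srswor_joint(n2, n1))
    closed = Fraction(N, N - 1) * (Fraction(N, n2) - 1) * (y[i] - Ybar)
    assert general == closed

# Poisson in both phases: compare with (1/pi*_i - 1) y_i.
p1 = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 2), Fraction(1), Fraction(2, 3), Fraction(1, 4)]
p2 = [Fraction(4, 5), Fraction(1, 2), Fraction(3, 4), Fraction(1), Fraction(1, 2), Fraction(2, 3)]
for i in range(N):
    general = conditional_bias(y, i, poisson_joint(p1), poisson_joint(p2))
    assert general == (1 / (p1[i] * p2[i]) - 1) * y[i]
print("closed forms agree with the general formula")
```

Unit 3 has $\pi_{1i} = \pi_{2i} = 1$ under the Poisson designs, so its conditional bias is exactly zero.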

Arbitrary design/Poisson sampling:
$$B_i^{DE}(I_{1i}=1, I_{2i}=1) = \sum_{j \in U}\left(\frac{\pi_{1ij}}{\pi_{1i}\pi_{1j}} - 1\right) y_j + \pi_{1i}^{-1}\left(\pi_{2i}^{-1} - 1\right) y_i$$

The conditional bias:
- is unknown and must therefore be estimated;
- can be interpreted as the contribution of each unit (sampled or nonsampled) to the total error;
- takes full account of the sampling design: a unit may be highly influential under a given sampling design but have little or no influence under another;
- vanishes for units selected with certainty: if $\pi_i^* = 1$, then $B_i^{DE}(I_{1i}=1, I_{2i}=1) = 0$.
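The last two properties can be seen numerically. This sketch uses a stratified first phase (SRSWOR within strata, with one take-all stratum where $n_h = N_h$) and Poisson sampling in the second phase. It then evaluates the formula above for each unit. The strata, sizes, $y$-values and second-phase probabilities are all hypothetical.

```python
from fractions import Fraction

# Hypothetical population: stratum label and y-value for each unit.
strata = [0, 0, 0, 0, 1, 1]
y = [5, 8, 6, 30, 400, 250]
n_h = {0: 2, 1: 2}  # stratum 1 is a take-all stratum (n_h = N_h = 2)
N_h = {h: strata.count(h) for h in n_h}

def pi1_joint(j, k):
    """First-phase (joint) inclusion probabilities under stratified SRSWOR."""
    hj, hk = strata[j], strata[k]
    if j == k:
        return Fraction(n_h[hj], N_h[hj])
    if hj == hk:
        n, M = n_h[hj], N_h[hj]
        return Fraction(n * (n - 1), M * (M - 1))
    # Strata are sampled independently.
    return pi1_joint(j, j) * pi1_joint(k, k)

def bias_arbitrary_poisson(i, pi2):
    """B_i^DE for an arbitrary first phase and a Poisson second phase."""
    first = sum((pi1_joint(i, j) / (pi1_joint(i, i) * pi1_joint(j, j)) - 1) * y[j]
                for j in range(len(y)))
    second = (1 / pi1_joint(i, i)) * (1 / pi2[i] - 1) * y[i]
    return first + second

pi2_full = [Fraction(1)] * len(y)     # every first-phase unit kept in phase 2
pi2_half = [Fraction(1, 2)] * len(y)  # second-phase probability 1/2
for i in range(len(y)):
    print(f"unit {i} (y = {y[i]}): B with pi2 = 1: {bias_arbitrary_poisson(i, pi2_full)}, "
          f"B with pi2 = 1/2: {bias_arbitrary_poisson(i, pi2_half)}")
```

The take-all units have the largest $y$-values, yet their conditional bias is zero when $\pi_{2i} = 1$. The same units become influential once they are subsampled in the second phase.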

A robust version of the double expansion estimator

Following Beaumont, Haziza and Ruiz-Gazen (2011), we obtain
$$\hat{Y}_{DE}^R = \hat{Y}_{DE} - \sum_{i \in s_2} \hat{B}_i^{DE}(I_{1i}=1, I_{2i}=1) + \sum_{i \in s_2} \psi\left\{\hat{B}_i^{DE}(I_{1i}=1, I_{2i}=1)\right\}$$

Example of $\psi$-function:
$$\psi(t) = \begin{cases} c & \text{if } t > c \\ t & \text{if } |t| \le c \\ -c & \text{if } t < -c \end{cases}$$
where $c$ is a tuning constant.

Special case: single-phase sampling, i.e., $I_{2i} = 1$ for all $i$ $\Rightarrow$ $\hat{Y}_{DE}^R$ reduces to the robust estimator proposed by Beaumont, Haziza and Ruiz-Gazen (2011).
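Under Poisson sampling in both phases, the conditional bias of a sampled unit is $(1/\pi_i^* - 1)y_i$. It depends only on that unit's own values, so it can be evaluated directly for every $i \in s_2$. This minimal sketch plugs it in as $\hat{B}_i^{DE}$ and applies the Huber-type $\psi$ above. The sample values and the tuning constants `c` are hypothetical, and how to choose $c$ is not covered here.

```python
def huber_psi(t, c):
    """Huber-type psi: identity on [-c, c], clipped to +/- c outside."""
    return max(-c, min(c, t))

def robust_double_expansion(y, pi_star, c):
    """Y_DE^R = Y_DE - sum B_i + sum psi(B_i), with B_i = (1/pi*_i - 1) y_i
    (the Poisson/Poisson conditional bias, used here as the estimate)."""
    y_de = sum(v / p for v, p in zip(y, pi_star))
    b = [(1 / p - 1) * v for v, p in zip(y, pi_star)]
    return y_de - sum(b) + sum(huber_psi(t, c) for t in b)

# Hypothetical units in s2: one large value with a small pi*_i.
y_s2 = [12.0, 15.0, 9.0, 14.0, 250.0]
pi_star_s2 = [0.1, 0.1, 0.1, 0.1, 0.05]

y_de = sum(v / p for v, p in zip(y_s2, pi_star_s2))
for c in (10_000.0, 1_000.0, 200.0):
    print(f"c = {c:8.0f}: Y_DE = {y_de:8.1f}, Y_DE^R = "
          f"{robust_double_expansion(y_s2, pi_star_s2, c):8.1f}")
```

With a large $c$ nothing is clipped and $\hat{Y}_{DE}^R = \hat{Y}_{DE}$. As $c$ decreases, the contribution of the unit with $y_i = 250$ is increasingly downweighted, trading bias for stability.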

Unit nonresponse

- $s_2$: set of respondents
- $n_2$: number of responding units (random)
- $I_{2i}$: response indicator for unit $i$
- $\pi_{2i}$: unknown response probability for unit $i$
- We assume that sampled units respond independently of one another (similar to Poisson sampling in the second phase).
- Propensity score adjusted estimator, assuming the $\pi_{2i}$'s are known:
$$\tilde{Y}_{PSA} = \sum_{i \in s_2} \frac{y_i}{\pi_{1i}\pi_{2i}}$$
- Influence of a responding unit:
$$B_i^{PSA}(I_{1i}=1, I_{2i}=1) = \underbrace{\sum_{j \in U}\left(\frac{\pi_{1ij}}{\pi_{1i}\pi_{1j}} - 1\right) y_j}_{\text{influence of unit } i \text{ on the sampling error}} + \underbrace{\pi_{1i}^{-1}\left(\pi_{2i}^{-1} - 1\right) y_i}_{\text{influence of unit } i \text{ on the nonresponse error}}$$
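Because responses are assumed independent, the response mechanism plays the role of a Poisson second phase. With known $\pi_{2i}$, $\tilde{Y}_{PSA}$ is therefore conditionally unbiased for $\hat{Y}_E$ given $s_1$. The sketch below verifies this by enumerating all $2^{n_1}$ response patterns for one first-phase sample. The design weights, $y$-values and response probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical first-phase sample s1: design weights 1/pi_1i, y-values, response probabilities.
w1 = [Fraction(10), Fraction(10), Fraction(20), Fraction(5)]
y = [4, 7, 3, 50]
pi2 = [Fraction(4, 5), Fraction(1, 2), Fraction(9, 10), Fraction(1, 4)]

Y_E = sum(w * v for w, v in zip(w1, y))  # estimator under full response

expected = Fraction(0)
total_prob = Fraction(0)
for pattern in product((0, 1), repeat=len(y)):  # I_2i for each unit in s1
    prob = Fraction(1)
    for r, p in zip(pattern, pi2):
        prob *= p if r else 1 - p  # independent responses
    y_psa = sum(w * v / p for r, w, v, p in zip(pattern, w1, y, pi2) if r)
    expected += prob * y_psa
    total_prob += prob

print("Y_E =", Y_E, " E2(Y_PSA | s1) =", expected)
```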

Nonresponse model

- In practice, the response probability $\pi_{2i}$ is unknown.
- Parametric nonresponse model: $\pi_{2i} = m(x_i, \alpha)$, where
  - $m(\cdot)$ is a known function,
  - $x_i$ is a vector of auxiliary variables available for all the sampled units (respondents and nonrespondents),
  - $\alpha$ is a vector of unknown parameters.
- Example: logistic regression model
$$\pi_{2i} = \frac{\exp(x_i'\alpha)}{1 + \exp(x_i'\alpha)}$$
- Estimated response probability for unit $i$: $\hat{\pi}_{2i} = m(x_i, \hat{\alpha})$
- Special case: if $x_i$ is a vector of weighting-class indicators, the adjustment reduces to multiplying the weights by the inverse of the within-class response rate.
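This minimal sketch fits the logistic model by maximum likelihood with Newton-Raphson, using the response indicators of the sampled units. Unweighted maximum likelihood is an assumption here, since the slides do not say how $\hat{\alpha}$ is obtained. The data and helper names are hypothetical. With weighting-class indicators as $x_i$, the fitted $\hat{\pi}_{2i}$ coincide with the within-class response rates, which is the special case noted above.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def fit_logistic(X, resp, iters=50, tol=1e-12):
    """Maximum likelihood for pi_2i = exp(x_i'a) / (1 + exp(x_i'a)) via Newton-Raphson."""
    dim = len(X[0])
    a = [0.0] * dim
    for _ in range(iters):
        mu = [1.0 / (1.0 + math.exp(-sum(xk * ak for xk, ak in zip(x, a)))) for x in X]
        score = [sum((ri - mi) * x[k] for x, ri, mi in zip(X, resp, mu)) for k in range(dim)]
        info = [[sum(mi * (1 - mi) * x[k] * x[j] for x, mi in zip(X, mu))
                 for j in range(dim)] for k in range(dim)]
        step = solve(info, score)
        a = [ak + sk for ak, sk in zip(a, step)]
        if max(abs(s) for s in step) < tol:
            break
    return a

def predict(X, a):
    """Fitted response probabilities m(x_i, a) under the logistic model."""
    return [1.0 / (1.0 + math.exp(-sum(xk * ak for xk, ak in zip(x, a)))) for x in X]

# Hypothetical sample: two weighting classes, x_i = class indicators (no intercept).
cls = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
resp = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0]
X = [[1.0, 0.0] if c == 0 else [0.0, 1.0] for c in cls]

alpha_hat = fit_logistic(X, resp)
pi2_hat = predict(X, alpha_hat)
for c in (0, 1):
    rate = sum(r for r, k in zip(resp, cls) if k == c) / cls.count(c)
    print(f"class {c}: response rate = {rate:.4f}, fitted pi_2 = {pi2_hat[cls.index(c)]:.4f}")
```

Each respondent's weight $1/\pi_{1i}$ would then be multiplied by $1/\hat{\pi}_{2i}$, here the inverse of its class response rate (5/3 and 4/3).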

- Propensity score adjusted estimator:
$$\hat{Y}_{PSA} = \sum_{i \in s_2} \frac{y_i}{\pi_{1i}\hat{\pi}_{2i}}$$
- One can show that $\hat{Y}_{PSA} - \hat{Y}_L = O_p(n^{-1})$, where $\hat{Y}_L$ is the linearized version of $\hat{Y}_{PSA}$.
- Asymptotic conditional bias of a responding unit:
$$B_i^L(I_{1i}=1, I_{2i}=1) = E_1 E_2\left(\hat{Y}_L - Y \mid I_1, I_{1i}=1, I_{2i}=1\right)$$
- Robust version of $\hat{Y}_{PSA}$:
$$\hat{Y}_{PSA}^R = \hat{Y}_{PSA} - \sum_{i \in s_2} \hat{B}_i^{PSA}(I_{1i}=1, I_{2i}=1) + \sum_{i \in s_2} \psi\left\{\hat{B}_i^{PSA}(I_{1i}=1, I_{2i}=1)\right\}$$

Simulation study

We generated a population of size N = 10000 with two variables, y and x:
- $x \sim$ Gamma;
- mixture model: $y_i = \delta_i(100 + x_i + 5\epsilon_i) + (1 - \delta_i)(400 + x_i + 50\epsilon_i)$, with $\epsilon_i \sim N(0, 1)$;
- 5% contamination, i.e., $P(\delta_i = 1) = 0.95$.

Select R = 10000 samples of size n = 500 by simple random sampling without replacement.
Generate nonresponse by Bernoulli trials with probability $\pi_{2i}$, where
$$\pi_{2i} = \frac{1}{\exp(\alpha_0 + \alpha_1 x_i)}$$
Global response rate: 70%.

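One replicate of this design can be sketched in Python. The Gamma shape and scale, the slope $\alpha_1$ and the seed are hypothetical, because the slide does not give them. The response probability is read as $\pi_{2i} = 1/\exp(\alpha_0 + \alpha_1 x_i)$, and $\alpha_0$ is found by bisection so that the population mean of $\pi_{2i}$ is 0.70, which is one assumed way of targeting the 70% global response rate. The helpers `response_prob` and `calibrate_a0` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(2011)

N, n = 10_000, 500
# Hypothetical Gamma parameters: the slide only states x ~ Gamma.
x = rng.gamma(shape=2.0, scale=10.0, size=N)
eps = rng.standard_normal(N)                      # epsilon_i ~ N(0, 1)
delta = rng.binomial(1, 0.95, size=N)             # P(delta_i = 1) = 0.95
y = delta * (100 + x + 5 * eps) + (1 - delta) * (400 + x + 50 * eps)
Y = y.sum()                                       # population total to estimate


def response_prob(x, a0, a1):
    # Assumed reading of the slide: pi_2i = 1 / exp(a0 + a1 * x_i).
    # Stays in (0, 1] because a0 >= 0, a1 > 0 and x_i > 0 here.
    return np.exp(-(a0 + a1 * x))


def calibrate_a0(x, a1, target=0.70, lo=0.0, hi=20.0, iters=100):
    """Bisection for a0 so that the population mean of pi_2i equals target
    (the mean is decreasing in a0)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if response_prob(x, mid, a1).mean() > target:
            lo = mid          # response too high: increase a0
        else:
            hi = mid
    return 0.5 * (lo + hi)


a1 = 0.01                                         # hypothetical slope
a0 = calibrate_a0(x, a1)
pi2 = response_prob(x, a0, a1)

# One of the R Monte Carlo replicates: SRSWOR of size n, then Bernoulli nonresponse.
s1 = rng.choice(N, size=n, replace=False)
pi1 = np.full(n, n / N)                           # first-phase inclusion probabilities
responded = rng.binomial(1, pi2[s1]).astype(bool)
print(f"a0 = {a0:.3f}, sample response rate = {responded.mean():.2f}")
```

Repeating the sampling and nonresponse steps R = 10000 times yields the replicates on which the Monte Carlo measures are computed.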


Simulation study

We computed $\hat{Y}_{PSA}$ and $\hat{Y}_{PSA}^{R}$, with $\hat{\pi}_{2i}$ estimated using a logistic regression model with $x$ as a predictor.

Monte Carlo measures:
- Monte Carlo percent relative bias:
$$RB(\hat{Y}) = 100 \times \frac{1}{10000} \sum_{t=1}^{10000} \frac{\hat{Y}_t - Y}{Y}$$
- Relative efficiency with respect to the nonrobust estimator:
$$RE(\hat{Y}_{PSA}^{R}) = \frac{MSE(\hat{Y}_{PSA}^{R})}{MSE(\hat{Y}_{PSA})}$$

Note: $\hat{Y}_{PSA}$ has negligible bias.

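The two Monte Carlo measures can be computed from the R replicate estimates as in this Python sketch. It assumes MSE means the Monte Carlo mean squared error about the true total $Y$; the function names and the four-replicate toy arrays are hypothetical and only exercise the formulas.

```python
import numpy as np


def percent_relative_bias(estimates, Y):
    """RB = 100 * (1/R) * sum_t (Y_hat_t - Y) / Y over the R replicates."""
    estimates = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean((estimates - Y) / Y)


def monte_carlo_mse(estimates, Y):
    """Assumed Monte Carlo MSE: (1/R) * sum_t (Y_hat_t - Y)^2."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - Y) ** 2)


def relative_efficiency(robust, nonrobust, Y):
    """RE(Y_PSA^R) = MSE(Y_PSA^R) / MSE(Y_PSA)."""
    return monte_carlo_mse(robust, Y) / monte_carlo_mse(nonrobust, Y)


# Toy replicate estimates (hypothetical, R = 4) for a true total Y = 1000.
Y = 1000.0
nonrobust = [900.0, 1100.0, 1000.0, 1300.0]
robust = [950.0, 1050.0, 990.0, 1100.0]
print(percent_relative_bias(robust, Y), relative_efficiency(robust, nonrobust, Y))
```

Values of RE below 1 indicate that the robust estimator has a smaller Monte Carlo MSE than $\hat{Y}_{PSA}$.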

Relative bias of the robust estimator (5% contamination)

[Figure: relative bias of the robust estimator (vertical axis) plotted against the tuning constant (horizontal axis, 50000 to 200000).]

Relative efficiency with respect to the nonrobust estimator (5% contamination)

[Figure: relative efficiency RE (vertical axis) plotted against the tuning constant (horizontal axis, 50000 to 200000).]


Concluding remarks

- Conditional bias: a measure of influence that accounts for the sampling design, the parameter to be estimated and the estimator.
- If the invariance property does not hold, it is still possible to assess the influence of a sampled unit and to construct robust estimators.
- The results can be extended to calibration estimators. This is important in the unit nonresponse context, since weight adjustment by the inverse of the estimated response probabilities is generally followed by some form of calibration.
- Further investigation is required on:
  - the choice of the tuning constant;
  - MSE estimation (a reverse framework for variance estimation?).
