USING MIXTURES TO MODEL ZERO OBSERVATIONS
Richard Tiffin and Ariane Kehlbacher
Department of Food Economics and Marketing, University of Reading, UK

Abstract. Modelling household demand using microdata is complicated by zero observations, which may arise if households consume zero amounts or decide not to participate in the market. We demonstrate that, given only observations on the pooled population, it is possible to classify households according to the source to which their zero is attributable and to make statistical inferences about the properties of the distinct subpopulations. We estimate a mixture model and illustrate the method with simulated data. The results are encouraging, and we intend to test the validity of our method by applying it to a data set on meat consumption.
1. Introduction
Modelling household demand using microdata is complicated by zero observations, as not all households consume all goods in a given time period. Zero consumption may occur for different reasons: (i) a corner solution; (ii) the household’s preferences are such that it can maximise utility by choosing not to participate in the market; or (iii) the household is consuming from stocks during the period of investigation. This paper is concerned with zeros arising as in (i) or (ii).

A popular model for dealing with zero values in household-level data is the tobit model (Tobin, 1958). It assumes that zero observations arise from a standard corner solution generated by the budget constraint, and that the factors affecting the level of consumption are the same as those determining the probability of consumption. A model that allows these factors to be distinguished is the double-hurdle model of Cragg (1971). It assumes that consumption of a particular good is the outcome of a participation decision and a consumption decision, where the participation decision is whether to buy the good at all and the consumption decision is how much of it to buy. A separate latent variable is used to model each decision process, with a Probit model determining participation and a Tobit model determining the expenditure level (Blundell and Meghir, 1987). Only if consumers pass both hurdles are they observed with a positive level of consumption. Jones (1989) and Garcia and Labeaga (1996) confirm the double-hurdle model’s ability to distinguish between the different reasons that generate zeros in tobacco consumption data. The double-hurdle model is, however, not able to identify the zeros according to their source. This is a problem if it is hypothesised that the underlying demand functions differ between households that elect not to participate in the market and households that decide to consume a zero quantity.
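To make the distinction between the two sources of zeros concrete, the following is a minimal sketch in Python of how zeros arise under a tobit process and under a double-hurdle process. It is an illustration only and not the simulation design used in this paper: the coefficient values, the single covariate, the use of the same covariates in both hurdles, and the independence of the two error terms are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical design: an intercept and one household covariate.
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([0.5, 1.0])    # assumed consumption-equation coefficients
gamma = np.array([0.3, 0.8])   # assumed participation-equation coefficients
sigma = 1.0                    # assumed error standard deviation

# Latent consumption, shared by both models in this sketch.
y_star = x @ beta + sigma * rng.standard_normal(n)

# Tobit: every zero is a corner solution (latent consumption <= 0).
y_tobit = np.maximum(y_star, 0.0)

# Double hurdle: a probit participation hurdle comes first ...
d_star = x @ gamma + rng.standard_normal(n)
participates = d_star > 0
# ... and positive consumption is observed only if both hurdles are passed.
y_dh = np.where(participates, np.maximum(y_star, 0.0), 0.0)

# In simulated data the source of each zero is known by construction.
zero_corner = participates & (y_dh == 0)
zero_nonpart = ~participates

print("tobit zero share:", np.mean(y_tobit == 0))
print("double-hurdle zeros from corner solutions:", zero_corner.sum())
print("double-hurdle zeros from non-participation:", zero_nonpart.sum())
```

In the simulated sample the two kinds of zero can be told apart because `participates` is generated explicitly. In observed data only the outcome (here `y_dh`) is available, which is why the source of a given zero cannot be read off directly.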
We therefore propose a model that allows parameters to be specified that are common to households sharing the source to which their zero is attributable. A mixture model is applied to simulated data using Bayesian estimation methods.

2. Method
We use a mixture model, which allows statistical inferences to be made about the properties of unobserved homogeneous subpopulations given only observations on the pooled population (Everitt and Hand, 1981; Titterington, 1985). In our data, which are drawn from a general population, we observe zero consumption by different households, which can be due to either non-consumption or non-participation. The two sources of zeros are not separately identifiable in the data. The observable data thus constitute a mixture distribution of households that are members of two different homogeneous subpopulations. The probabilities of a household being a member of each of the two subpopulations are called mixing proportions. The mixture model allows us to estimate the mixing proportions and
uncover the underlying components, as well as the parameters of each component of the model, which are given by their conditional distributions. To demonstrate the effectiveness of this method, we apply the model to Monte Carlo data. A sample of 1000 observations is generated from a two-component mixture of a tobit and a double-hurdle process. Estimates of the demand equation parameters for both models are obtained as described in Koop (2003). Inference on the full posterior in the presence of latent data is conducted using data augmentation (Tanner and Wong, 1987). The Gibbs sampler (see Albert and Chib, 1993) is extended to include draws on the mixing distribution, for which the beta distribution is a convenient choice, as well as draws on the vector specifying each household’s component membership, which are taken from a binomial distribution.

3. Results
Our results show that the mixture model approximates the underlying shape of the true density function of the data well. This is ongoing research and, to demonstrate the validity of our approach, we intend to apply the mixture approach to a dataset on meat consumption.

References
Albert, J. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American statistical . . . 88: 669–679.
Blundell, R. and Meghir, C. (1987). Bivariate alternatives to the Tobit model. Journal of Econometrics 34: 179–200.
Cragg, J. (1971). Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica: Journal of the Econometric Society 39: 829–844.
Everitt, B. and Hand, D. (1981). Finite mixture distributions. Chapman and Hall.
Garcia, J. and Labeaga, J. (1996). Alternative approaches to modelling zero expenditure: An application to Spanish demand for tobacco. Oxford Bulletin of Economics and Statistics 58: 489–505.
Jones, A. (1989). A double-hurdle model of cigarette consumption. Journal of Applied Econometrics 4: 23–39.
Koop, G. (2003). Bayesian Econometrics. Wiley.
Tanner, M. and Wong, W. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association 82: 528–540.
Titterington, D. M. (1985). Statistical Analysis of Finite Mixture Distributions. John Wiley & Sons.
Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica: Journal of the Econometric Society 26: 24–36.