Statistical matching of sample survey data: application to integrate Iranian time use and labour force surveys. (English) Zbl 1527.62005

Summary: Survey data are still regarded as one of the main sources in official statistics. However, because of the high cost of conducting a survey and the burden on respondents, it may not be possible to collect all variables of interest in a single data set. One way to obtain a more comprehensive data source is to integrate data from different sources, such as existing data sets, administrative registers, and official surveys. This helps to minimize the shortcomings of each survey and to maximize their advantages. In this paper, a mixed method at the micro level is applied to integrate data from two surveys, the ‘Iranian Labour Force Survey’ and the ‘Iranian Time Use Survey’, both conducted in the fall of 2015. Besides increasing the coverage of variables across the two sources, this also makes it possible to study particular features of work and life quality. To this end, we develop a micro statistical matching approach by proposing the conditional predictive Dirichlet distribution and the conditional predictive multinomial distribution in the regression step of mixed methods. Finally, the quality of the matching and the similarity of the marginal distributions of the variables of interest before and after integration are assessed using several similarity measures and the Kolmogorov-Smirnov test.
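
The two-step structure of a mixed method (a regression step, followed by matching on the regression predictions) and the Kolmogorov-Smirnov comparison of marginal distributions can be illustrated with a minimal sketch. This is not the authors' procedure: the paper uses conditional predictive Dirichlet and multinomial distributions in the regression step, whereas the sketch below swaps in ordinary least squares on a single continuous target to keep the example short. All function names, variable names, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear(X, z):
    """Least-squares fit of z on X with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    return beta


def predict_linear(beta, X):
    """Predicted values for covariates X under coefficients beta."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta


def mixed_match(X_recipient, X_donor, z_donor):
    """Regression step, then nearest-neighbour matching on predicted values.

    Assumption: plain linear regression stands in for the paper's
    Dirichlet / multinomial regression step.
    """
    beta = fit_linear(X_donor, z_donor)
    pred_rec = predict_linear(beta, X_recipient)
    pred_don = predict_linear(beta, X_donor)
    # For each recipient, pick the donor whose predicted value is closest.
    idx = np.abs(pred_rec[:, None] - pred_don[None, :]).argmin(axis=1)
    # Impute the donor's observed value, not the model prediction.
    return z_donor[idx]


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a = np.sort(a)
    b = np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Synthetic stand-ins for the common variables X and the donor-only target z.
rng = np.random.default_rng(0)
X_donor = rng.normal(size=(300, 2))
z_donor = 1.0 + X_donor @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=300)
X_recipient = rng.normal(size=(200, 2))

z_imputed = mixed_match(X_recipient, X_donor, z_donor)
ks = ks_statistic(z_donor, z_imputed)
print(f"KS statistic, donor vs. imputed marginal: {ks:.3f}")
```

A small KS statistic suggests that the marginal distribution of the imputed variable in the fused file is close to its distribution in the donor file. It says nothing about whether the joint relationship with variables observed only in the recipient file has been preserved.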

MSC:

62D05 Sampling theory, sample surveys
62J12 Generalized linear models (logistic models)
