Krisztián Boros and Zoltán Kmetty's article, "Identifying missing data handling methods with text mining", has been published in the International Journal of Data Science and Analytics.
Abstract:
Missing data is an inevitable aspect of every empirical research. Researchers developed several techniques to handle missing data to avoid information loss and biases. Over the past 50 years, these methods have become more and more efficient and also more complex. Building on previous review studies, this paper aims to analyze what kind of missing data handling methods are used among various scientific disciplines. For the analysis, we used nearly 50.000 scientific articles published between 1999 and 2016. JSTOR provided the data in text format. We utilized a text-mining approach to extract the necessary information from our corpus. Our results show that the usage of advanced missing data handling methods, such as Multiple Imputation or Full Information Maximum Likelihood estimation, is steadily growing in the examination period. Additionally, simpler methods, like listwise and pairwise deletion, are still in widespread use.
The article is available here:
Boros, K., Kmetty, Z. Identifying missing data handling methods with text mining. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00582-1