Application of Binary Logistic Regression Analysis and the Synthetic Minority Oversampling Technique for Predicting Consumer Loan Default

Authors

  • Kanyanat Maksungnern, Institute of Science, Suranaree University of Technology
  • Tidarut Areeak, Institute of Science, Suranaree University of Technology

Abstract

This paper studies binary logistic regression analysis combined with the synthetic minority oversampling technique (SMOTE) for predicting consumer loan default in India in 2021. SMOTE is used to address the class imbalance in the data. The dataset is split into training and testing sets at three different ratios. For each ratio, models are constructed from both the imbalanced data and the SMOTE-balanced data, and the models are compared by their recall. At every split ratio, the model built on balanced data achieves a higher recall than the model built on imbalanced data, indicating that balancing the data with SMOTE improved the prediction of loan default.
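
The two ingredients named in the abstract, SMOTE-style oversampling and recall, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the software, the split ratios, or the number of neighbours used, so the function names `smote` and `recall`, the parameter `k`, and the toy data below are all assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (illustrative sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours are searched within the minority class only.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (default) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 8 non-default records and 3 default records.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 1.0), (11.0, 2.0), (12.0, 1.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points

print(len(majority), len(balanced_minority))
print(recall([1, 1, 1, 0, 0], [1, 0, 1, 0, 1]))
```

In a study of this kind, oversampling would normally be applied to the training portion only, so that the recall reported on the testing portion is computed from real records rather than synthetic ones.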

Published

2023-09-25