High-dimensional Linear Regression Analysis by using Genetic Algorithm

Authors

  • Panaj Abhavudhichai, National Institute of Development Administration
  • Vichit Lorchirachoonkul, National Institute of Development Administration
  • Jirawan Jitthavech, National Institute of Development Administration

Abstract

The research objective is to study the effectiveness of parameter estimation and variable selection by a genetic algorithm in high-dimensional linear regression analysis. The simulation results of the proposed method are compared with three well-known methods: lasso, elastic net, and stepwise regression. The comparison criteria are the percentages of correctly fitted, over-fitted, under-fitted, and incorrectly fitted models, together with the mean squared error and the accuracy of the parameter estimates. It can be concluded that direct selection by the genetic algorithm yields the best results among the four methods in nearly all cases.

Keywords: variable selection, genetic algorithm, high-dimensional data, linear regression analysis
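
The abstract does not spell out the algorithmic details, so the following is only a minimal sketch of how genetic-algorithm variable selection for a high-dimensional linear model can be set up: a binary chromosome marks which predictors enter the model, and the BIC of the corresponding OLS fit is minimized through tournament selection, single-point crossover, and bit-flip mutation. The population size, the crossover and mutation settings, and the BIC fitness criterion are illustrative assumptions, not the authors' specification.

```python
# Sketch only: GA-based subset selection for linear regression.
# Assumptions (not from the paper): BIC fitness, tournament selection,
# single-point crossover, bit-flip mutation, sparse random initialization.
import numpy as np

rng = np.random.default_rng(0)

def bic(X, y, mask):
    """BIC of an OLS fit on the predictors selected by the binary mask."""
    n = len(y)
    k = int(mask.sum())
    if k >= n - 1:            # guard: saturated fits give a degenerate BIC
        return np.inf
    if k == 0:
        resid = y - y.mean()
    else:
        Xs = np.column_stack([np.ones(n), X[:, mask.astype(bool)]])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    return n * np.log(resid @ resid / n) + (k + 1) * np.log(n)

def ga_select(X, y, pop_size=60, generations=200, p_mut=0.01):
    n, p = X.shape
    # start from sparse random models so fits are well-posed when p > n
    pop = (rng.random((pop_size, p)) < 0.1).astype(int)
    for _ in range(generations):
        fit = np.array([bic(X, y, ind) for ind in pop])
        # tournament selection: the better of two random individuals survives
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[i] < fit[j])[:, None], pop[i], pop[j])
        # single-point crossover between consecutive parent pairs
        children = parents.copy()
        for a in range(0, pop_size - 1, 2):
            cut = rng.integers(1, p)
            children[a, cut:], children[a + 1, cut:] = (
                parents[a + 1, cut:].copy(), parents[a, cut:].copy())
        # bit-flip mutation
        flips = rng.random(children.shape) < p_mut
        pop = np.where(flips, 1 - children, children)
    fit = np.array([bic(X, y, ind) for ind in pop])
    return pop[fit.argmin()]

# toy high-dimensional example: n = 50 observations, p = 100 candidate
# predictors, only the first three are truly active
n, p = 50, 100
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(n)
print(np.flatnonzero(ga_select(X, y)))   # indices of the selected predictors
```

The simulation study in the paper evaluates such selections by whether the fitted model matches, over-fits, under-fits, or misses the true model, alongside the mean squared error and the accuracy of the parameter estimates.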

Published

2018-01-31