Efficiency of Outlier Detection Statistics in Multiple Linear Regression

Authors

  • Vanida Pongsakchat, Department of Mathematics, Faculty of Science, Burapha University
  • Preawnapa Muansamai

Abstract

The objective of this research was to compare the performance of five outlier detection statistics in multiple linear regression: leverage value ( ), studentized deleted residual ( ), Cook’s distance ( ),  and covariance ratio ( ). Three types of outliers were considered: outliers in the independent variables, in the dependent variable, and in both the independent and dependent variables. The sample sizes were 30, 50 and 100, and the number of outliers in each dataset was either a single observation or 10, 20 or 30 percent of the sample size. The performance criterion was the proportion of 10,000 replications in which a statistic correctly detected all of the outliers in the dataset. The results were as follows. For all sample sizes in the single-outlier case,  and  had the highest performance when the outlier was in the independent variables; , ,  and  had the highest performance when the outlier was in the dependent variable; and ,  and  had the best performance when the outlier was in both the independent variables and the dependent variable. However, the performance of these statistics decreased as the number of outliers in the dataset increased.

Keywords: outliers, leverage, studentized deleted residual, Cook’s distance, covariance ratio
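As an illustration of the four statistics named in the abstract, the following is a minimal sketch that computes them with their standard textbook formulas for an ordinary least squares fit. The function name `outlier_diagnostics`, the simulated data, the planted outlier and the 2p/n leverage rule of thumb are all assumptions made for this example; they are not taken from the study and need not match the authors' simulation design or cut-off values.

```python
import numpy as np

def outlier_diagnostics(X, y):
    """Hypothetical helper: leverage, studentized deleted residual,
    Cook's distance and covariance ratio for an OLS fit of y on X.
    An intercept column is added inside the function."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = Z.shape[1]                        # number of regression coefficients

    # Leverage: diagonal of the hat matrix, from the reduced QR factor of Z
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q**2, axis=1)

    # Ordinary residuals and error sums
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ beta
    sse = e @ e
    mse = sse / (n - p)

    # Studentized deleted residual (closed form, no refitting needed)
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e**2))

    # Cook's distance
    d = e**2 * h / (p * mse * (1 - h)**2)

    # Covariance ratio: det of coefficient covariance without case i
    # divided by det of the full-data coefficient covariance
    mse_deleted = (sse - e**2 / (1 - h)) / (n - p - 1)
    covratio = (mse_deleted / mse)**p / (1 - h)

    return h, t, d, covratio

# Simulated example (assumed setup): n = 30, two predictors,
# one outlier planted in the dependent variable at index 0.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)
y[0] += 20.0

h, t, d, covratio = outlier_diagnostics(X, y)
p = X.shape[1] + 1
high_leverage = h > 2 * p / n   # common rule of thumb, an assumption here
most_extreme = int(np.argmax(np.abs(t)))
```

A simulation along the lines described in the abstract would repeat this over many generated datasets and record how often each statistic flags every planted outlier; the cut-off used for each statistic would have to be chosen separately.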

Published

2017-07-20

Section

Research articles from the 9th "Science Research" National Academic Conference