Abstract
Coşkun Yıldırım, M. Estimation of Transformation Parameter for Various Transformation Methods Via Shapiro-Wilk Test Statistic, Hacettepe University Graduate School of Health Sciences, Master's Thesis in Biostatistics, Ankara, 2023.
The assumption of normality underlies many areas of statistics, including statistical analysis, modeling, and sampling theory. Satisfying this assumption is important for analyzing data correctly and interpreting the results. When the assumption is not met, a popular remedy is to apply a transformation to the variables. In this thesis, the transformation parameters that maximize the Shapiro-Wilk test statistic are estimated for the Log Shift, Box-Cox, Bickel-Doksum, Yeo-Johnson, Square Root Shift, Manly, Modulus, Dual, and Gpower transformations. The Log, Neglog, Glog, and Reciprocal transformations were added to these nine methods, and the performance of all thirteen was examined under different scenarios in a Monte Carlo simulation study using the Shapiro-Wilk test. The Shapiro-Wilk approach was also compared with maximum likelihood estimation for parameter estimation. As expected, the simulation study showed that estimating the parameter with the Shapiro-Wilk statistic outperformed maximum likelihood estimation in terms of how well the methods transformed the data to normality. The simulation study also indicated that the Dual transformation performs better under most scenarios. This approach, in which the parameter is estimated with the Shapiro-Wilk test statistic, is made available to researchers, together with 13 transformation methods, in the Transform package for R, and its application is demonstrated on an open-access data set.
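The estimation idea described in the abstract can be illustrated with a minimal sketch for one of the nine parametric methods, the Box-Cox transformation. The sketch is written in Python with SciPy and is not the thesis's R implementation or the Transform package's interface. The function name `shapiro_wilk_boxcox_lambda`, the search interval (-3, 3), and the simulated log-normal sample are all assumptions made for illustration.

```python
import numpy as np
from scipy import optimize, stats


def shapiro_wilk_boxcox_lambda(x, bounds=(-3.0, 3.0)):
    """Hypothetical helper: Box-Cox lambda within `bounds` that maximizes Shapiro-Wilk W."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")

    def negative_w(lam):
        # With a fixed lmbda, stats.boxcox returns only the transformed data.
        transformed = stats.boxcox(x, lmbda=lam)
        # Minimizing -W is the same as maximizing W.
        return -stats.shapiro(transformed)[0]

    result = optimize.minimize_scalar(negative_w, bounds=bounds, method="bounded")
    return result.x


# Illustrative data only: a right-skewed sample (assumed, not from the thesis).
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Parameter estimated by maximizing W, and by maximum likelihood for comparison.
lam_sw = shapiro_wilk_boxcox_lambda(sample)
lam_mle = stats.boxcox_normmax(sample, method="mle")

w_raw = stats.shapiro(sample)[0]
w_sw = stats.shapiro(stats.boxcox(sample, lmbda=lam_sw))[0]
w_mle = stats.shapiro(stats.boxcox(sample, lmbda=lam_mle))[0]

print(f"W before transformation:  {w_raw:.4f}")
print(f"lambda (Shapiro-Wilk):    {lam_sw:.3f}, W = {w_sw:.4f}")
print(f"lambda (max. likelihood): {lam_mle:.3f}, W = {w_mle:.4f}")
```

The same pattern would apply to the other parametric transformations by swapping the transformation inside `negative_w`. A bounded one-dimensional search is used here for simplicity; the thesis may use a different optimizer or search range.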