This paper revisits the problem of five-year survivability prediction for breast cancer using machine learning tools. This work differs from past experiments in the size of the training data, the unbalanced distribution of data between the minority and majority classes, and modified data cleaning procedures. The experiments also follow the principles of tidy data and reproducible research. To fine-tune the predictions, a set of experiments was run using naive Bayes, decision trees, and logistic regression. Of particular interest were strategies to improve recall for the minority class, since the cost of misclassifying it is prohibitive. The main contribution of this work is showing that logistic regression with a proper setting of the class weight gives the highest precision/recall for the minority class. In addition, this work provides precise algorithms and code for determining class membership and executing the competing methods. This code can facilitate the reproduction and extension of our work by other researchers.
Keywords: Machine Learning, Big Data, Learning Algorithm, Logistic Regression, Classification, ROC
Bozorgi, M., Taghva, K., & Singh, A. (2022). Revisiting Survivability Prediction of Breast Cancer with Machine Learning Tools. Journal of Applied Statistics & Machine Learning. 1(2): pp. 89-100.
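The class-weighting idea in the Bozorgi et al. abstract can be illustrated with a small, self-contained sketch. It is not the authors' published code: it fits a plain logistic regression by batch gradient descent in pure Python, scales each sample's contribution to the loss by its class weight, and uses the inverse-frequency rule n_samples / (n_classes * class_count) that scikit-learn documents for class_weight="balanced". The toy dataset, learning rate, epoch count, and helper names are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n = len(y)
    classes = sorted(set(y))
    return {c: n / (len(classes) * y.count(c)) for c in classes}


def fit_logreg(X, y, class_weight, lr=0.1, epochs=500):
    """Weighted logistic regression fitted by batch gradient descent.

    Each sample's log-loss gradient is multiplied by the weight of its class,
    so misclassifying a heavily weighted (minority) sample costs more.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    total_weight = sum(class_weight[yi] for yi in y)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = class_weight[yi] * (p - yi)  # weighted d(loss)/dz
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / total_weight for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total_weight
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label 1 when the predicted probability reaches the threshold."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold else 0
            for xi in X]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (minority) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 23 majority (0) and 2 minority (1) samples, one feature.
X = [[0.0] for _ in range(20)] + [[1.0] for _ in range(5)]
y = [0] * 20 + [0, 0, 0, 1, 1]

unweighted = fit_logreg(X, y, {0: 1.0, 1: 1.0})
weights = balanced_class_weights(y)  # {0: 25/46, 1: 6.25}
weighted = fit_logreg(X, y, weights)

p_u, r_u = precision_recall(y, predict(X, *unweighted))
p_w, r_w = precision_recall(y, predict(X, *weighted))
print(f"unweighted: precision={p_u:.2f} recall={r_u:.2f}")
print(f"balanced:   precision={p_w:.2f} recall={r_w:.2f}")
```

On this toy data the unweighted fit never predicts the minority class (recall 0), while the balanced weights shift the decision boundary so that both minority samples are caught, at the cost of three false positives (precision 0.4). That trade-off between minority-class recall and precision is what the class weight controls.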
Structural equation modeling (SEM) is a multivariate statistical technique for analyzing structural relationships. It combines factor analysis and multiple regression analysis and is used to analyze the structural relationships between measured variables and latent constructs. Covariance-based SEM (CB-SEM) specifies a model and estimates its parameters so that the distance between the model-implied population covariance matrix and the sample covariance matrix S is minimized. In the partial least squares approach to SEM (PLS-SEM), the explained variance of the endogenous latent variables is maximized. Hospitality researchers have widely used CB-SEM, but not the soft-modeling approach of PLS-SEM. CB-SEM requires strong distributional assumptions on the data, whereas PLS-SEM is more flexible. This article compares the results of a CB-SEM model with those of the PLS-SEM method in testing the same hypothesis on a dataset from online casino gaming; the results show that PLS-SEM is more accurate than CB-SEM for this dataset. The literature also favors PLS-SEM over CB-SEM because it does not require multivariate normality of the sample and generally works well even with smaller sample sizes.
Keywords: online gambling; atmospherics; servicescape; user experience; partial least squares; structural equation modeling; bootstrap
Brett Abarbanel, Ashok K. Singh, Bo Bernhard & Anthony Lucas (2022). A Comparative Study of CB-SEM and PLS-SEM Methods using Online Casino Survey Data. Journal of Applied Statistics & Machine Learning. 1(2): pp. 101-116.
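The covariance-based criterion in the Abarbanel et al. abstract can be made concrete. Under maximum likelihood, the "distance" that CB-SEM minimizes is usually taken to be the discrepancy F_ML = ln|Σ(θ)| + tr(S Σ(θ)^-1) - ln|S| - p, where Σ(θ) is the model-implied covariance matrix and p is the number of observed variables. The pure-Python sketch below only evaluates F_ML for a hypothetical one-factor model with made-up loadings and uniquenesses; it does not estimate θ, does not implement PLS-SEM, and is not the analysis reported in the article. The helper names are illustrative.

```python
import math


def det_and_inverse(A):
    """Determinant and inverse of a small square matrix by Gauss-Jordan
    elimination with partial pivoting (pure Python, no NumPy)."""
    n = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign of the determinant
        pv = M[col][col]
        det *= pv
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return det, [row[n:] for row in M]


def one_factor_implied_cov(loadings, uniquenesses):
    """Model-implied covariance of a one-factor model with unit factor
    variance: Sigma = lambda lambda^T + diag(theta)."""
    p = len(loadings)
    return [[loadings[i] * loadings[j] + (uniquenesses[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def f_ml(S, Sigma):
    """ML discrepancy F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p.
    It is zero when Sigma equals S and positive otherwise
    (for positive-definite S and Sigma)."""
    p = len(S)
    det_sigma, sigma_inv = det_and_inverse(Sigma)
    det_s, _ = det_and_inverse(S)
    trace = sum(S[i][k] * sigma_inv[k][i] for i in range(p) for k in range(p))
    return math.log(det_sigma) + trace - math.log(det_s) - p


# Hypothetical values, not taken from the casino survey data.
Sigma = one_factor_implied_cov([0.8, 0.7, 0.6], [0.36, 0.51, 0.64])
S = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
print(f_ml(Sigma, Sigma))  # essentially 0: perfect fit
print(f_ml(S, Sigma))      # positive: S and Sigma differ
```

A CB-SEM estimator would search over the loadings and uniquenesses to make this discrepancy as small as possible; PLS-SEM instead targets the explained variance of the endogenous constructs and does not rely on this likelihood-based criterion.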
In recent years, the application of machine learning (ML) algorithms has increased rapidly across domains, most extensively in health care research for assisting diagnosis and predicting prognosis. However, researchers understand the challenges of using these methods less well. This article presents the following challenges in using ML algorithms in biomedical research: the use of 'variable importance' in prediction, since ML models do not provide coefficients or weights, and its relation to regression coefficients; predicting the diagnosis or prognosis of low-prevalence (imbalanced) diseases; and adjusting for this imbalance with techniques such as the Synthetic Minority Oversampling Technique (SMOTE). It also highlights that selecting a model by maximum accuracy or area under the curve (AUC) alone is not sufficient; predictive values at various outcome prevalences also need to be reported. Simulation studies are recommended to evaluate the usefulness of SMOTE. The results of studies with disease prevalence of 40% to 60% should be used cautiously. Examples from the literature are used to illustrate these challenges.
Keywords: Challenges in ML; Imbalanced data; Low prevalence; Machine Learning; SMOTE
S. Marimuthu, Mani Thenmozhi, Melvin Joy, Malavika Babu and L. Jeyaseelan. (2022). Challenges in the application of Machine Learning algorithms in Biomedical Research. Journal of Applied Statistics & Machine Learning. 1(2): pp. 117-126.
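Two points in the Marimuthu et al. abstract can be made concrete. First, predictive values depend on prevalence: by Bayes' theorem, PPV = sens·prev / (sens·prev + (1 - spec)(1 - prev)), so a classifier with fixed sensitivity and specificity can look very different at low prevalence. Second, SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The pure-Python sketch below is a simplified, hypothetical illustration of both ideas (the sensitivity, specificity, points, and helper names are made up); it is not the authors' code, and the original SMOTE algorithm iterates over every minority sample rather than drawing samples at random.

```python
import math
import random


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence              # true positives per person screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling: each synthetic point lies on the
    segment between a random minority sample and one of its k nearest
    minority neighbors (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Same hypothetical classifier (sensitivity = specificity = 0.90) at two prevalences.
for prev in (0.50, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")

# Hypothetical 2-D minority-class points.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
print(smote_like(minority, n_new=3))
```

With sensitivity and specificity both at 0.90, the PPV falls from 0.90 at 50% prevalence to about 0.08 at 1% prevalence, which is why accuracy or AUC alone can be misleading for rare outcomes.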
Because of its conceptual and empirical shortcomings, the traditional Capital Asset Pricing Model (CAPM) needed an extension, which led to the inclusion of higher moments in the model. Past studies showed that higher moments, i.e., skewness and kurtosis, contribute to the risk premium of an asset. The present study compares unconditional and conditional higher-moment CAPMs to find the more suitable of the two in the context of the Indian stock market. To identify the better model, data on the companies listed in the S&P BSE 500 Index were considered. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used to select between the two models. The results revealed that the conditional higher-moment model performed better than the unconditional higher-moment model.
Keywords: Skewness, kurtosis, Akaike Information Criterion, Bayesian Information Criterion.
Akash Asthana & Syed Shafi Ahmed (2022). Conditional and Unconditional Higher Moment CAPM: A Comparative Study. Journal of Applied Statistics & Machine Learning. 1(2): pp. 127-145.
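The model-selection step in the Asthana and Ahmed abstract compares fitted models by AIC and BIC. As a hedged illustration only, the pure-Python sketch below fits two ordinary least squares regressions to simulated, deterministic data and scores them with the Gaussian-likelihood forms AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The second regression adds squared and cubed market terms, one common way of bringing co-skewness and co-kurtosis into a CAPM-style regression; it is an assumed stand-in, not the conditional or unconditional specification estimated in the article, and the data and helper names are hypothetical.

```python
import math


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solving the normal equations
    (X^T X) beta = X^T y by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((yi - sum(b * xj for b, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def aic_bic(rss, n, k):
    """AIC and BIC for a Gaussian linear model with k coefficients, using the
    maximized log-likelihood ln L = -n/2 * (ln(2*pi) + ln(RSS/n) + 1)."""
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_lik, k * math.log(n) - 2 * log_lik


# Hypothetical illustration, not BSE 500 data: x stands in for the market
# excess return and y for an asset's excess return; the +/-0.02 alternation
# is a deterministic stand-in for noise.
x = [i / 10 - 1 for i in range(21)]
y = [0.01 + 1.1 * xi + 0.3 * xi ** 2 + 0.5 * xi ** 3 + 0.02 * (-1) ** i
     for i, xi in enumerate(x)]

models = {
    "CAPM-style (market only)": [[1.0, xi] for xi in x],
    "with squared and cubed market terms": [[1.0, xi, xi ** 2, xi ** 3] for xi in x],
}
results = {}
for name, X in models.items():
    rss = ols_rss(X, y)
    results[name] = aic_bic(rss, len(y), len(X[0]))
    print(f"{name}: AIC={results[name][0]:.2f}  BIC={results[name][1]:.2f}")
```

Both criteria prefer lower values; BIC penalizes each extra coefficient by ln n rather than 2, so it leans toward smaller models once n is 8 or more. Counting the error variance as an extra parameter, as some packages do, shifts every model's criteria by the same amount and leaves the comparison unchanged.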
Copyright ©2023 ESI Publications. All Rights Reserved