Breast cancer is one of the most common cancers affecting women. Early detection and effective treatment are critical to improving the chances of survival. Since invasive ductal carcinoma (IDC) accounts for 80% of all breast cancers, early detection of IDC cells plays an instrumental role in improving outcomes. While histopathological image analysis is the gold standard for detecting cancer, it is very challenging for pathologists to examine large benign regions in order to identify malignant cells. This process is not only prone to pathologists’ subjectivity but also time-consuming, laborious and expensive. Deep learning techniques, particularly convolutional neural networks (CNNs), can automate the detection process and make it more objective, precise, and faster, since they learn the predominant features automatically. However, the shortage of labelled and class-balanced data samples is a practical challenge in adopting deep learning methods for such problems. In this paper, we propose an image classification model using CNNs for IDC cell detection in histopathology slides. Further, we perform a comparative analysis of several state-of-the-art CNN architectures and apply transfer learning techniques. Through experiments with these models using transfer learning and optimization techniques, we identify the most suitable transfer learning approach, based on the EfficientNet-B7 network, which achieves an accuracy of 90%, sensitivity of 91%, specificity of 90%, F1-score of 84% and balanced accuracy of 91%. This is an improvement on some of the previous results reported in the literature on this dataset. Our approach thus shows the benefits of treating IDC detection as an image classification problem, with better accuracy and efficiency. This helps us in laying down a state-of-the-art approach for IDC detection through breast cancer histopathology image classification.
Keywords: deep learning, CNN architecture, accuracy, precision, transfer learning, hyperparameter tuning, learning rate
Vikash Sharma, Siddhartha Roy & Girdhar G. Agarwal (2023). Breast Cancer in Histopathological Data through Image Classification using Deep Learning Methods. Journal of Applied Statistics & Machine Learning. 2(1): pp. 1-37.
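The metrics quoted in the abstract above (accuracy, sensitivity, specificity, F1-score, balanced accuracy) can all be derived from the four cells of a binary confusion matrix. The sketch below shows those standard definitions. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn count positive (e.g. IDC) patches, tn/fp count negative patches.
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts for an imbalanced test set (100 positives, 300 negatives).
metrics = classification_metrics(tp=91, fp=30, tn=270, fn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes, the F1-score can sit well below accuracy because it depends on precision, which is why reporting balanced accuracy alongside accuracy is informative.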
This study deals with the problem of mean estimation in the presence of measurement errors (ME) using a logarithmic estimator under simple random sampling (SRS). The expression for the mean square error of the suggested estimator is derived to the first order of approximation. The suggested estimator is compared with the usual mean estimator and the classical ratio and product estimators, and the efficiency conditions under which the proposed estimator outperforms these conventional estimators are obtained. Subsequently, to support the theoretical findings, numerical and simulation studies are also performed.
Keywords: Mean square error, Measurement errors, Simulation study.
Mathematical subject classification: 62D05
Shashi Bhushan, Anoop Kumar & Shivam Sukla (2023). Estimation of Mean in Case of Measurement Errors using Logarithmic Type Estimator. Journal of Applied Statistics & Machine Learning. 2(1): pp. 39-50.
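To make the role of measurement errors concrete, the following is a minimal simulation sketch for the usual mean estimator only; the abstract does not give the form of the logarithmic estimator, so it is not reproduced here. The population, the additive normal error model, and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def simulate_mse(n=50, pop_size=1000, error_sd=2.0, reps=2000, seed=0):
    """Empirical MSE of the sample mean under SRS, with and without
    additive measurement error (hypothetical setup)."""
    rng = random.Random(seed)
    # Hypothetical finite population of true values.
    population = [rng.gauss(50.0, 2.0) for _ in range(pop_size)]
    true_mean = statistics.fmean(population)

    sq_err_clean = 0.0
    sq_err_noisy = 0.0
    for _ in range(reps):
        sample = rng.sample(population, n)  # SRS without replacement
        # Observed value = true value + independent measurement error.
        noisy = [y + rng.gauss(0.0, error_sd) for y in sample]
        sq_err_clean += (statistics.fmean(sample) - true_mean) ** 2
        sq_err_noisy += (statistics.fmean(noisy) - true_mean) ** 2
    return sq_err_clean / reps, sq_err_noisy / reps


mse_clean, mse_noisy = simulate_mse()
print(f"MSE without measurement error: {mse_clean:.4f}")
print(f"MSE with measurement error:    {mse_noisy:.4f}")
```

Under this setup the measurement error variance adds to the sampling variance, so the mean computed from error-contaminated observations has a larger empirical MSE.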
Home field advantage is a common phrase used throughout the world of sports. As the name implies, it suggests that a team (or, in some cases, a player) has an advantage when performing on its home field. This analysis examines whether this is the case by using the Bradley-Terry Paired Comparison Model to compare the outcomes of NBA games as well as MLB games. Matchups from the MLB and NBA were used because of a limitation of this extension of the Bradley-Terry Paired Comparison Model: it does not account for ties. These sports leagues were therefore chosen so that no data had to be omitted. The Bradley-Terry Paired Comparison Model found that there was in fact an advantage in playing at home, both for teams in the MLB and for teams in the NBA. A combination of linear and logistic regression models was also used to see how playing at home affected the response variable (points scored) and the log-odds that a team won. While all of these methods supported the existence of home field advantage, the effects were not significant and did not increase the log-odds by a substantial margin. The Bradley-Terry Paired Comparison Model is good at finding whether there is an advantage (or disadvantage) in matchups between two teams, but it is held back by some of its limitations.
Michael Gonzales & Anwar Hossain (2023). Using the Bradley-Terry Paired Comparison Model and Logistic Regression to See if there is an Advantage in Playing at Home for NBA and MLB Games. Journal of Applied Statistics & Machine Learning. 2(1): pp. 51-66.
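One common way to add a home effect to the Bradley-Terry model is to write the log-odds of a home win as alpha + beta_home - beta_away, where beta is a team's log-strength and alpha is the home advantage. The abstract does not state the exact parameterization the authors used, so the sketch below assumes this form and fits it by plain gradient ascent on a small hypothetical schedule. The team names, results, and fitting routine are illustrative only.

```python
import math


def fit_bradley_terry_home(games, teams, lr=0.5, iters=2000):
    """Fit log-strengths and a home-advantage term by gradient ascent.

    games: list of (home, away, home_won) with home_won in {0, 1}; no ties.
    """
    beta = {t: 0.0 for t in teams}
    alpha = 0.0
    n = len(games)
    for _ in range(iters):
        grad_beta = {t: 0.0 for t in teams}
        grad_alpha = 0.0
        for home, away, home_won in games:
            # Modelled probability that the home team wins.
            p = 1.0 / (1.0 + math.exp(-(alpha + beta[home] - beta[away])))
            resid = home_won - p
            grad_beta[home] += resid
            grad_beta[away] -= resid
            grad_alpha += resid
        for t in teams:
            beta[t] += lr * grad_beta[t] / n
        alpha += lr * grad_alpha / n
        # Centre the strengths so they sum to zero (identifiability).
        mean = sum(beta.values()) / len(teams)
        for t in teams:
            beta[t] -= mean
    return beta, alpha


def expand(home, away, home_wins, home_losses):
    """Turn a win/loss tally into individual game records."""
    return [(home, away, 1)] * home_wins + [(home, away, 0)] * home_losses


# Hypothetical balanced schedule: every pair meets five times at each venue.
games = (
    expand("A", "B", 4, 1) + expand("B", "A", 3, 2)
    + expand("A", "C", 4, 1) + expand("C", "A", 2, 3)
    + expand("B", "C", 4, 1) + expand("C", "B", 3, 2)
)
strengths, home_adv = fit_bradley_terry_home(games, ["A", "B", "C"])
print("log-strengths:", {t: round(b, 3) for t, b in strengths.items()})
print("home advantage (log-odds):", round(home_adv, 3))
```

A positive fitted alpha indicates that, after accounting for team strength, playing at home raises the odds of winning. Whether that effect is significant needs a separate test, such as a likelihood-ratio comparison against the model without alpha.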
Algorithms for computing service areas in the presence of facility stations in two dimensions have been considered by investigators in various fields of science and engineering, including operations research, computer science, and transportation networks. We present a novel approximation algorithm for estimating service areas when facility stations of varying capacities are distributed over a city surface environment. Our algorithm is based on modeling facility stations of various capacities by a weighted Voronoi diagram. We introduce the idea of inserting virtual stations in the proximity of a candidate station to convert an instance of the weighted Voronoi diagram into an instance of the standard Voronoi diagram. Such a transformation makes it possible to estimate the service areas of high-capacity sites by making use of the standard Voronoi diagram. The approximation algorithm runs in O(n log n) time, where n is the number of service stations.
Keywords: Service area estimation, Weighted Voronoi diagram, Approximation algorithms, Facility locations
Dara Nyknahad, Laxmi Gewali, Wolfgang Bein, Rojin Aslani & Ashok Singh (2023). Algorithms for Estimating Service Regions. Journal of Applied Statistics & Machine Learning. 2(1): pp. 67-77.
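The virtual-station idea described above can be illustrated with a small sketch: a high-capacity station is surrounded by extra sites that all map back to it, after which an ordinary nearest-site query (the standard Voronoi rule) assigns more territory to that station. The ring placement rule, the parameters `radius_per_unit` and `n_virtual`, and the brute-force lookup are assumptions made for illustration. They are not the paper's construction and do not achieve its O(n log n) bound.

```python
import math


def add_virtual_stations(stations, radius_per_unit=1.0, n_virtual=8):
    """Return a list of (x, y, owner) sites.

    stations: dict name -> (x, y, capacity). Stations with capacity > 1 get a
    ring of virtual sites whose radius grows with capacity (assumed rule).
    """
    sites = []
    for name, (x, y, capacity) in stations.items():
        sites.append((x, y, name))
        if capacity > 1:
            r = radius_per_unit * (capacity - 1)
            for k in range(n_virtual):
                angle = 2 * math.pi * k / n_virtual
                sites.append((x + r * math.cos(angle), y + r * math.sin(angle), name))
    return sites


def serving_station(point, sites):
    """Standard Voronoi rule: the owner of the nearest site serves the point."""
    px, py = point
    nearest = min(sites, key=lambda s: math.hypot(s[0] - px, s[1] - py))
    return nearest[2]


# Hypothetical layout: A has unit capacity, B has three times the capacity.
stations = {"A": (0.0, 0.0, 1), "B": (10.0, 0.0, 3)}
plain_sites = [(x, y, name) for name, (x, y, _) in stations.items()]
weighted_sites = add_virtual_stations(stations)

query = (4.5, 0.0)
print("unweighted:", serving_station(query, plain_sites))
print("with virtual stations:", serving_station(query, weighted_sites))
```

In this layout the query point is closer to A than to B, but a virtual site of B lies nearer still, so the point moves into B's service area once capacities are taken into account.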
India ranks second worldwide in potato production, and the state of Uttar Pradesh accounts for the majority of that production. The wholesale cost and availability of potatoes are crucial factors for farmers, merchants, and customers. The present research examines the market prices and arrival quantities of potatoes in the wholesale market of Lucknow district, the capital of Uttar Pradesh. The time-series data from January 2011 to December 2022 were collected from the Agricultural Marketing Information Network (AGMARKNET) website, which was started by the Union Ministry of Agriculture, Govt. of India. The compound annual growth rates and the correlation coefficient for market prices and arrival quantities are computed, and four types of time series forecasting models are applied to improve the forecasts: the Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models, which are appropriate for stationary time series having clear patterns of seasonality and trend, and the Prophet and Simple Recurrent Neural Network (RNN) models, which are usually used when the time series is non-stationary or has complex nonlinear relationships. These models are applied to the univariate time series data, and their predictive accuracies are compared based on Root Mean Square Error (RMSE). The Prophet model turned out to be the best-fitted of the four applied models; hence, it is used to forecast the market prices and arrival quantities of potatoes in Lucknow for the next two years, i.e., from January 2023 to December 2024.
Keywords: ARIMA, SARIMA, Prophet, RNN, AIC, RMSE.
Samapriya Trivedi, Shalini Jaiswal, Ashok Kumar & Shambhavi Mishra (2023). Time Series Forecasting for Market Prices and Arrival Quantities of Potatoes in Lucknow. Journal of Applied Statistics & Machine Learning. 2(1): pp. 79-96.
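Two of the quantities named in the abstract above, the compound annual growth rate (CAGR) and the RMSE used to rank forecasting models, have simple closed forms. The sketch below uses the endpoint definition of CAGR, which may differ from the regression-based version some studies use. All prices, forecasts, and model labels are hypothetical placeholders, not results from the paper.

```python
import math


def cagr(first_value, last_value, years):
    """Compound annual growth rate between two endpoints."""
    return (last_value / first_value) ** (1.0 / years) - 1.0


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hold-out prices and forecasts from two candidate models.
actual = [900.0, 950.0, 1000.0, 1100.0]
forecasts = {
    "model_1": [880.0, 960.0, 1030.0, 1060.0],
    "model_2": [700.0, 990.0, 1200.0, 1000.0],
}

scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}
best_model = min(scores, key=scores.get)

print("growth rate:", round(cagr(100.0, 121.0, 2), 4))
print("RMSE by model:", {k: round(v, 2) for k, v in scores.items()})
print("lowest RMSE:", best_model)
```

Selecting the model with the lowest hold-out RMSE mirrors the comparison step described in the abstract; the fitted models themselves would come from the respective forecasting libraries.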
Copyright ©2023 ESI Publications. All Rights Reserved