Document Type: Research Paper
Authors
1 Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran
2 Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran
Abstract
Machine learning models for breast cancer diagnosis are often hindered by the high dimensionality of clinical datasets: many features are redundant or irrelevant, and they degrade predictive performance. To address this challenge, this paper proposes a novel hybrid feature selection method, the Relevance-Based Sailfish Optimizer Feature Selection (RBSOFS), designed to identify a minimal yet highly informative subset of features. RBSOFS was implemented and evaluated on the Breast Cancer Wisconsin (Diagnostic) dataset, with the selected features fed into five established classifiers: Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), and Logistic Regression (LR). The proposed method outperforms eight other state-of-the-art metaheuristic algorithms, achieving a peak accuracy of 98.24%. Notably, this result was obtained with a subset of only 6 to 7 features, a drastic reduction that yields simpler and more computationally efficient models than those produced by competing methods.
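The evaluation protocol described in the abstract can be illustrated with a minimal sketch. This is not RBSOFS: the Sailfish Optimizer search is not reproduced here, and a simple mutual-information ranking (top-k features) stands in for the relevance-based stage as an assumption. The function name `evaluate_subset`, the choice of k = 7, the 5-fold cross-validation, and all classifier hyperparameters are illustrative choices, not taken from the paper. The sketch uses the copy of the Breast Cancer Wisconsin (Diagnostic) dataset bundled with scikit-learn.

```python
from functools import partial

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_subset(k=7, seed=0):
    """Score five classifiers on the k most relevant features.

    Assumption: mutual information is used as a stand-in relevance
    measure; the paper's RBSOFS search is NOT implemented here.
    """
    X, y = load_breast_cancer(return_X_y=True)

    # Fix the random state of the relevance estimator for repeatability.
    relevance = partial(mutual_info_classif, random_state=seed)

    # The five classifier families named in the abstract
    # (hyperparameters are illustrative defaults).
    classifiers = {
        "NB": GaussianNB(),
        "RF": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
        "DT": DecisionTreeClassifier(random_state=seed),
        "LR": LogisticRegression(max_iter=1000),
    }

    results = {}
    for name, clf in classifiers.items():
        # Selection sits inside the pipeline, so it is refit on each
        # training fold and never sees the held-out fold.
        pipe = make_pipeline(
            StandardScaler(),
            SelectKBest(relevance, k=k),
            clf,
        )
        scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
        results[name] = scores.mean()
    return results


if __name__ == "__main__":
    for name, acc in evaluate_subset(k=7).items():
        print(f"{name}: mean CV accuracy = {acc:.4f}")
```

Placing the selector inside the pipeline keeps feature selection from leaking information from the test folds into training. The accuracies from this stand-in are not expected to match the figures reported for RBSOFS.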
Keywords