Improving the Accuracy of Misclassified Breast Cancer Data using Machine Learning
Abstract
Background: Breast cancer is the most common cancer among women. Many studies have made significant progress in classifying breast cancer tumors, with much emphasis on finding the best algorithm and the highest classification accuracy but limited interest in correcting misclassified data (Type 1 and Type 2 errors). Objective: This research proposes a novel hybrid system that integrates WEKA (Waikato Environment for Knowledge Analysis) with case-based reasoning (CBR), using the myCBR plugin for Protégé, to classify breast cancer tumors and correct misclassified data (Type 1 and Type 2 errors). Methods: The Wisconsin breast cancer dataset, retrieved from the Wisconsin university repository, was used in this research. The dataset contains 699 instances, 2 classes (malignant and benign), and 9 integer-valued attributes. We applied the J48, IBK, LibSVM, JRip, and Multi-Layer Perceptron (MLP) classifiers to classify the breast cancer tumors. Next, the myCBR plugin with Protégé was used as an advanced modeling technique to correct the misclassified data and improve classification accuracy. Results: Model performance was evaluated on sensitivity, specificity, precision, and accuracy. Among the classifiers tested, IBK produced the most misclassified instances, and the integrated system improved its classification accuracy from 95.61% to 98.53%. Conclusion: The findings demonstrate that integrating WEKA with the myCBR plugin for Protégé substantially improves the handling of misclassified data, thereby supporting more accurate diagnostic procedures for distinguishing between benign and malignant tumors.
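The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WEKA or myCBR workflow. It assumes that "malignant" is treated as the positive class, so a Type 1 error is a benign tumor predicted as malignant and a Type 2 error is a malignant tumor predicted as benign. The names `ConfusionCounts`, `count_outcomes`, and `evaluate`, and the toy labels, are hypothetical and do not reproduce the study's data or results.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # malignant correctly predicted as malignant
    fp: int  # benign predicted as malignant (Type 1 error)
    tn: int  # benign correctly predicted as benign
    fn: int  # malignant predicted as benign (Type 2 error)


def count_outcomes(actual: Sequence[str], predicted: Sequence[str],
                   positive: str = "malignant") -> ConfusionCounts:
    """Tally the confusion-matrix cells for a two-class problem."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Compute sensitivity, specificity, precision, and accuracy."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of malignant cases found
        "specificity": c.tn / (c.tn + c.fp),  # share of benign cases found
        "precision": c.tp / (c.tp + c.fp),    # share of malignant calls that are right
        "accuracy": (c.tp + c.tn) / total,    # share of all calls that are right
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the Wisconsin dataset.
    actual = ["malignant"] * 10 + ["benign"] * 10
    predicted = ["malignant"] * 8 + ["benign"] * 2 + ["benign"] * 9 + ["malignant"]
    counts = count_outcomes(actual, predicted)
    for name, value in evaluate(counts).items():
        print(f"{name}: {value:.4f}")
```

Under this convention, correcting a misclassified instance moves it from the `fp` or `fn` cell into `tn` or `tp`, which is why accuracy and at least one of the other measures rise together when Type 1 and Type 2 errors are fixed.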