Improving the Accuracy of Misclassified Breast Cancer Data using Machine Learning

Rong-Ho Lin; Benjamin Kofi Kujabi; Chun-Ling Chuang; Yueh-Chung Chen; Chang-Ming Chen

Published: May 5, 2022

Keywords:

Misclassified data, Classifiers, WEKA, myCBR, Protégé

Rong-Ho Lin

Department of Industrial Engineering and Management, National Taipei University of Technology, No. 1, Sec. 3, Zhongxiao E. Rd., Taipei 10608, Taiwan, ROC

Benjamin Kofi Kujabi

Department of Industrial Engineering and Management, National Taipei University of Technology, No. 1, Sec. 3, Zhongxiao E. Rd., Taipei 10608, Taiwan, ROC

Chun-Ling Chuang

Department of Information Management, Kainan University

Yueh-Chung Chen

Division of Cardiology, Department of Internal Medicine, Taipei City Hospital, Renai Branch, Taipei

Chang-Ming Chen

Department of Radiation Oncology, Tri-Service General Hospital

Abstract

Background: Breast cancer is the most common cancer among women. Many studies have made significant progress in classifying breast cancer tumors, with much emphasis on finding the best algorithm and the highest classification accuracy, but with limited interest in correcting misclassified data (Type I and Type II errors).

Objective: This research proposes a novel hybrid system that integrates WEKA (Waikato Environment for Knowledge Analysis) with case-based reasoning (CBR), using the myCBR plugin for Protégé, to classify breast cancer tumors and correct misclassified data (Type I and Type II errors).

Methods: The Wisconsin breast cancer dataset, retrieved from the University of Wisconsin repository, was used in this research. The dataset contains 699 instances, 2 classes (malignant and benign), and 9 integer-valued attributes. We applied the J48, IBk, LibSVM, JRip, and Multi-Layer Perceptron (MLP) classifiers to classify the tumors. Next, the myCBR plugin for Protégé was used as an advanced modeling technique to correct the misclassified data and improve accuracy.

Results: The performance of the proposed model was evaluated on sensitivity, specificity, precision, and accuracy. The IBk classifier produced the most misclassified data, and the integrated system improved its classification accuracy from 95.61% to 98.53%.

Conclusion: The findings demonstrate that integrating WEKA with the myCBR plugin for Protégé substantially improves the handling of misclassified data, thus providing a more accurate diagnostic procedure for distinguishing between benign and malignant tumors.
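The evaluation measures named in the abstract follow the standard confusion-matrix definitions, and the CBR correction step rests on retrieving the most similar stored case. The Python sketch below illustrates both under stated assumptions: malignant is treated as the positive class (so a Type I error is a benign tumor predicted malignant, and a Type II error is a malignant tumor predicted benign), the 9 attributes are taken to be on the dataset's integer scale of 1 to 10, and the equal-weighted similarity and the names `ConfusionCounts`, `evaluate`, `similarity`, and `retrieve_label` are hypothetical. It is not the authors' myCBR configuration, whose local similarity functions and weights are set inside the tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class ConfusionCounts:
    # Assumption: "malignant" is the positive class.
    tp: int  # malignant predicted malignant
    tn: int  # benign predicted benign
    fp: int  # benign predicted malignant (Type I error)
    fn: int  # malignant predicted benign (Type II error)


def evaluate(c: ConfusionCounts) -> Dict[str, float]:
    """Standard confusion-matrix measures used to evaluate a classifier."""
    total = c.tp + c.tn + c.fp + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "precision": c.tp / (c.tp + c.fp),
        "accuracy": (c.tp + c.tn) / total,
    }


# Assumption: each attribute is an integer from 1 to 10, so the largest
# possible per-attribute difference is 9.
ATTR_RANGE = 9


def similarity(a: Sequence[int], b: Sequence[int],
               weights: Optional[Sequence[float]] = None) -> float:
    """Hypothetical global similarity: weighted mean of per-attribute
    local similarities 1 - |a_i - b_i| / range, so the result lies in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(a)
    local = [1 - abs(x - y) / ATTR_RANGE for x, y in zip(a, b)]
    return sum(w * s for w, s in zip(weights, local)) / sum(weights)


def retrieve_label(query: Sequence[int],
                   case_base: List[Tuple[Sequence[int], str]]) -> str:
    """Return the class label of the most similar stored case (1-NN retrieval)."""
    _best_features, best_label = max(
        case_base, key=lambda case: similarity(query, case[0])
    )
    return best_label
```

In a correction step of this kind, an instance the classifier got wrong could be re-labeled with the class of its nearest stored case. Normalizing by the sum of the weights keeps the global similarity in [0, 1], so attribute weights can be tuned without rescaling.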


How to Cite

Lin, R.-H., Kujabi, B. K., Chuang, C.-L., Chen, Y.-C., & Chen, C.-M. (2022). Improving the Accuracy of Misclassified Breast Cancer Data using Machine Learning. Eximia, 4(1), 19–32. Retrieved from https://www.eximiajournal.techniumscience.pluscommunication.eu/index.php/eximia/article/view/100

Issue

Vol. 4 No. 1 (2022): Eximia Science
