Predicting Breast Cancer Survivability: A Comparison of Three Data Mining Methods

Hussain, Omead Ibraheem (2020) Predicting Breast Cancer Survivability: A Comparison of Three Data Mining Methods. Cihan University-Erbil Journal of Humanities and Social Sciences, 4 (1). pp. 17-30. ISSN 2707-6342

Article_CUEJHSS_10-02-2020.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

This study concentrates on predicting breast cancer survivability using data mining and compares three main predictive modeling tools. Specifically, we used three popular data mining methods: two from machine learning (artificial neural networks and decision trees) and one from statistics (logistic regression). We aimed to choose the best model based on the efficiency of each model, the variables most effective in these models, and the most common important predictors. We defined the three main modeling aims and uses by demonstrating the purpose of the modeling. Data mining makes it possible to characterize and describe the trends and patterns that reside in data and information. The data set, taken from the SEER database, contained 87 variables and 457,389 records, which became 93 variables and 90,308 records in the data set used for modeling. We investigated more than three data mining techniques and ultimately focused on artificial neural networks, decision trees, and logistic regression, implemented in SAS Enterprise Miner 5.2, which in our view is a suitable system given its facilities and the results it produced. Several experiments were conducted using these algorithms, and the resulting predictions were evaluated by comparing the techniques against one another. We found that the neural network performed much better than the other two techniques. The model we chose has the highest accuracy, and specialists in the breast cancer field can use and rely on it.
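
The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the study's SAS Enterprise Miner workflow: the function names, the model names, and the toy labels below are all hypothetical, and none of the values come from SEER or from the paper's results. The sketch only shows one plausible way to rank already-fitted models by accuracy on held-out records.

```python
from typing import Dict, List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of records whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def best_model(y_true: List[int], predictions: Dict[str, List[int]]) -> str:
    """Return the name of the model with the highest accuracy."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    return max(scores, key=scores.get)


# Toy held-out labels (1 = survived, 0 = did not survive).
# Made up for illustration only; NOT SEER data.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]

# Hypothetical predictions from three already-fitted models.
toy_predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 1, 1],
    "model_b": [1, 1, 1, 0, 0, 0, 1, 1],
    "model_c": [1, 0, 1, 1, 0, 1, 0, 0],
}

if __name__ == "__main__":
    for name, y_pred in toy_predictions.items():
        print(name, accuracy(y_true, y_pred))
    print("best:", best_model(y_true, toy_predictions))
```

Accuracy is used here because the abstract names it as the deciding criterion; on survival data with unbalanced classes, sensitivity and specificity would normally be reported alongside it.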

Item Type: Article
Uncontrolled Keywords: Predicting Breast Cancer, Data mining, SEER database, Artificial Neural Network.
Subjects: H Social Sciences > H Social Sciences (General)
H Social Sciences > HG Finance
Divisions: Department of Banking and Financial Sciences > Research papers
Depositing User: ePrints Depositor
Date Deposited: 06 Oct 2024 07:48
Last Modified: 06 Oct 2024 07:48
URI: https://eprints.cihanuniversity.edu.iq/id/eprint/1709
