Data mining approach to predict BRCA1 gene mutation
Volume 1, Issue 2 (2013), pp. 155–170
Pub. online: 18 September 2013
Type: Article
Open Access
Received
23 July 2013
Accepted
21 August 2013
Published
18 September 2013
Abstract
Breast cancer is the most common cancer among women and one of the leading causes of death in women worldwide. Patients with a pathological mutation of a BRCA gene have a 65% lifetime probability of developing breast cancer. It is known that such patients have a different cause of illness. In this study, we propose a new approach for predicting BRCA mutation carriers by methodically applying knowledge discovery steps and data mining methods. An alternative BRCA risk assessment model was built using a decision tree classifier. The biggest challenge was the very small size and imbalanced nature of the initial dataset, which was collected by clinicians over 4 years of a clinical trial. Iterative optimization of the initial dataset, selection of optimal algorithms, and their parameterization resulted in higher classifier performance, with prediction accuracy acceptable for clinical use. In this study, three data mining problems were analyzed using eleven data mining algorithms.
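To illustrate why a small, imbalanced dataset is a challenge for classifier evaluation, the following minimal sketch compares plain accuracy with balanced accuracy (the mean of sensitivity and specificity). The labels are synthetic and purely illustrative: the 10-in-100 carrier ratio, the function names, and the choice of balanced accuracy are assumptions made for this example, not data or metrics taken from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    """Fraction of all cases that were labelled correctly."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; less sensitive to class imbalance."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Synthetic labels (assumption): 10 mutation carriers (1) among 100 patients.
y_true = [1] * 10 + [0] * 90

# A trivial "classifier" that always predicts the majority class (non-carrier).
majority_pred = [0] * 100

print("accuracy:", accuracy(y_true, majority_pred))
print("balanced accuracy:", balanced_accuracy(y_true, majority_pred))
```

In this synthetic case the majority-class predictor scores high on plain accuracy while detecting no carriers at all, which is why an imbalance-aware measure is usually reported alongside accuracy when judging such a model.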