<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">csat</journal-id>
      <journal-title-group>
        <journal-title>Computational Science and Techniques</journal-title>
      </journal-title-group>
      <issn pub-type="epub"/>
      <issn pub-type="ppub"/>
      <publisher>
        <publisher-name>KU</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">82_513_1_LE_NIAKSU</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Data mining approach to predict BRCA1 gene mutation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Gedminaitė</surname>
            <given-names>Jurgita</given-names>
          </name>
          <email xlink:href="mailto:jurgita.gedminaite@lmu.lt">jurgita.gedminaite@lmu.lt</email>
          <xref ref-type="aff" rid="j_csat_aff_000"/>
        </contrib>
        <aff id="j_csat_aff_000">Vilnius University</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Kurasova</surname>
            <given-names>Olga</given-names>
          </name>
          <email xlink:href="mailto:olga.kurasova@mii.vu.lt">olga.kurasova@mii.vu.lt</email>
          <xref ref-type="aff" rid="j_csat_aff_001"/>
        </contrib>
        <aff id="j_csat_aff_001">Vilnius University</aff>
        <contrib contrib-type="author">
          <name>
            <surname>Niakšu</surname>
            <given-names>Olegas</given-names>
          </name>
          <email xlink:href="mailto:olegas.niaksu@mii.vu.lt">olegas.niaksu@mii.vu.lt</email>
          <xref ref-type="aff" rid="j_csat_aff_002"/>
          <xref ref-type="corresp" rid="cor3">∗∗∗</xref>
        </contrib>
        <aff id="j_csat_aff_002">Vilnius University</aff>
      </contrib-group>
      <author-notes>
        <corresp id="cor3"><label>∗∗∗</label>Corresponding author.</corresp>
      </author-notes>
      <volume>1</volume>
      <issue>2</issue>
      <fpage>155</fpage>
      <lpage>170</lpage>
      <pub-date pub-type="epub">
        <day>18</day>
        <month>09</month>
        <year>2013</year>
      </pub-date>
      <history>
        <date date-type="received">
          <day>23</day>
          <month>07</month>
          <year>2013</year>
        </date>
        <date date-type="accepted">
          <day>21</day>
          <month>08</month>
          <year>2013</year>
        </date>
      </history>
      <permissions>
        <copyright-year>2013</copyright-year>
        <license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
          <license-p>Creative Commons Attribution 3.0 License</license-p>
        </license>
      </permissions>
      <abstract>
        <p>Breast cancer is the most common form of cancer in women and one of the leading causes of mortality among women worldwide. Patients with a pathological mutation of a BRCA gene have a 65% lifetime probability of developing breast cancer. It is known that the illness has a different cause in such patients. In this study, we propose a new approach for predicting BRCA mutation carriers by methodically applying knowledge discovery steps and data mining methods. An alternative BRCA risk assessment model has been created using a decision tree classifier. The biggest challenge was the very small size and imbalanced nature of the initial dataset, which was collected by clinicians over 4 years of a clinical trial. Iterative optimization of the initial dataset, selection of optimal algorithms, and their parameterization resulted in higher classifier performance, with prediction accuracy acceptable for clinical use. In total, three data mining problems were analyzed using eleven data mining algorithms.</p>
      </abstract>
    </article-meta>
  </front>
</article>
