Breast cancer is common in women and contributes to the mortality rate among women. Diagnosing breast cancer at an early stage is a challenging and time-consuming task. This research article focuses on diagnosing breast cancer at an early stage using machine learning (ML) and deep learning (DL) techniques. It uses the Wisconsin Breast Cancer Dataset and implements logistic regression, SVM, KNN, Random Forest, and MLP. Random Forest and SVM ranked best among these for predictive analysis, each achieving an accuracy of 96.5%. The deep learning models ANN and CNN were then used to increase the accuracy of predictive analysis and reached 99.3% and 97.3% respectively, using activation functions such as ReLU and Sigmoid.

Introduction:

Breast cancer is ranked the second most dangerous and fastest-growing cancer after lung cancer. It accounts for 11% of new cases, with nearly a quarter affecting women [1]. Oncologists diagnose the disease using mammograms, magnetic resonance imaging (MRI) of the breast, ultrasound or X-ray of the breast, tissue biopsy, etc. Once breast cancer is confirmed, a sentinel node biopsy is regularly performed on the patient, which helps to detect cancerous cells in the lymph nodes. Machine learning techniques are used to differentiate between benign and malignant tumors [2]. Timely identification is necessary for increasing the survival rate and for improving prediction results. Data mining in the medical field speeds up outcome prediction, reduces cost, and improves overall results. Combined with machine learning and deep learning techniques, early diagnosis of cancer becomes easier and more effective. Deep learning techniques give more accurate results than ML when the data is complex and exceptionally large [3-9].

About Data Set:

The Wisconsin Breast Cancer Dataset is a well-known dataset used with ML and DL techniques for the diagnosis of breast cancer. It contains features computed from digitized images of fine needle aspirates (FNA) of breast masses. These features are used to predict whether a tumor is cancerous or not. The dataset contains 569 samples with 30 attributes derived from the biopsy images. Attributes include radius, smoothness, texture, etc. Researchers have used this dataset to evaluate different ML and DL models and techniques.

Methodology:

This article deals with diagnosing breast cancer using ML and DL techniques on the publicly available Wisconsin Breast Cancer Dataset [10]. The study is divided into two main parts: data preprocessing, and model formation and evaluation.

Fig 1 describes the steps that have been followed to form and evaluate the model.

Figure 1: Block Diagram of Proposed methodology.

The first step is data exploration and preprocessing, which includes label encoding and normalization. Label Encoder is a widely used tool for converting categorical features into numerical features. In this article, the benign and malignant classes have been encoded as 0 and 1 respectively.
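The label-encoding step can be sketched in plain Python as follows. This is a minimal illustration, not the article's code: the function name and the sample diagnosis values are assumptions. Sorting the class names (as scikit-learn's LabelEncoder also does) yields benign = 0 and malignant = 1, which matches the labelling given in the results discussion.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


# Hypothetical diagnosis column: 'M' = malignant, 'B' = benign.
diagnosis = ["M", "B", "B", "M", "B"]

mapping = fit_label_encoder(diagnosis)
encoded = [mapping[label] for label in diagnosis]
print(mapping, encoded)
```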

In the normalization step, all values have been rescaled to the range 0 to 1. For normalization, we used this formula:

Figure 2: \frac{x_i}{\sqrt{x_i^2 + y_i^2 + z_i^2}}
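Reading the formula as dividing each component by the Euclidean norm of the vector it belongs to, the normalization can be sketched as below. The function name and the three-feature sample are illustrative assumptions. Note that the result stays within 0 and 1 only when the input values are non-negative, which holds for measurements such as radius or texture.

```python
import math


def l2_normalize(vector):
    """Divide each component by the Euclidean norm of the vector."""
    norm = math.sqrt(sum(value * value for value in vector))
    if norm == 0.0:
        return list(vector)  # leave an all-zero vector unchanged
    return [value / norm for value in vector]


sample = [3.0, 4.0, 0.0]  # hypothetical sample with three features
normalized = l2_normalize(sample)
print(normalized)
```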

Preprocessing also includes splitting the data into training and testing sets for the creation of a model: 75% of the data has been used for training and the rest for testing. Several ML techniques, such as Logistic Regression, SVM, and KNN, have been applied to create a model that accurately predicts breast cancer [11].
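A minimal sketch of a 75/25 split over the 569 samples, using only the standard library. The function name, the shuffling, and the seed are assumptions; the article does not say how the split was drawn.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.75, seed=0):
    """Shuffle the sample indices and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(569)
print(len(train_idx), len(test_idx))  # 426 143
```

With 569 samples this leaves 426 for training and 143 for testing.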

In this article, model outcomes are classified into two groups: M (malignant) and B (benign).

KNN is a supervised ML technique, since it is trained on labeled data. A test data point is assigned the class of its nearest neighbors [12]. SVM is an ML technique used for regression and classification tasks that works by constructing decision boundaries [13]. Random forest is applied next: it builds multiple decision trees on the dataset, obtains a prediction from each tree, and selects the final prediction by voting among them. A decision tree has also been applied to the dataset to increase the accuracy of the model. The naïve Bayes classifier, a probabilistic technique that assumes strong independence between features, has been applied next.
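To illustrate the nearest-neighbor idea, here is a small KNN classifier in plain Python. The two-feature training points, the labels (0 for benign, 1 for malignant), and the choice of k = 3 are hypothetical; the article does not report the value of k or the distance measure it used, so Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical normalized samples with two features each.
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
                (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.2, 0.2)))
print(knn_predict(train_points, train_labels, (0.7, 0.9)))
```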

The accuracy achieved with these techniques was not sufficiently high; therefore, deep learning techniques such as ANN and CNN have been employed. A convolutional neural network (CNN) takes images as input, assigns learnable weights and biases to aspects of the input, and thereby differentiates one object from another [14]. The final algorithm used is the artificial neural network (ANN), which is prevalent in science and technology for its distinctive capabilities. This technique is widely used for biomedical problems, commonly for classification and prediction [15].

Implementation:

Multiple ML and DL techniques are available for classification. Preprocessing techniques such as Label Encoder and normalization have been applied to handle the data effectively. The bar graph in Fig 3 shows the frequency of benign and malignant cells in the dataset after applying Label Encoder.

Figure 3: Number of malignant and benign.

The ML techniques used in this project are Logistic Regression, Random Forest, K-Nearest Neighbour, Naive Bayes, Decision Tree, and Support Vector Machine. The deep learning techniques used for predicting breast cancer are ANN and CNN. In this article, both have been implemented; Table 1 and Table 2 list the parameters used for the two models. The parameters are the number of neurons, the number of inputs, the number of epochs for which the model was trained, and the activation functions.

Table 1: Parameters used in CNN model.

Number of Neurons	Conv Layer 1: 36, Conv Layer 2: 64
Number of Input	30
Number of epochs	50
Activation Function	ReLU, Sigmoid
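The article does not state how the 30 tabular features are presented to the convolutional layers. One possible reading, assumed here purely for illustration, is a one-dimensional convolution that slides a small filter along the feature vector and applies ReLU. The filter width of 3, the filter weights, and the sample values below are hypothetical; as in most deep learning frameworks, the operation is implemented without flipping the kernel.

```python
def relu(value):
    return max(0.0, value)


def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal (no padding) and apply ReLU."""
    width = len(kernel)
    return [
        relu(sum(signal[start + offset] * kernel[offset] for offset in range(width)))
        for start in range(len(signal) - width + 1)
    ]


features = [i / 29 for i in range(30)]  # hypothetical normalized sample, 30 inputs
kernel = [-1.0, 0.0, 1.0]               # hypothetical filter of width 3
feature_map = conv1d_valid(features, kernel)
print(len(feature_map))  # 28
```

A layer with 36 filters would produce 36 such feature maps, one per filter.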

Table 2: Parameters used in ANN model.

Number of Neurons	15
Number of Input	30
Number of epochs	50
Activation Function	ReLU, Sigmoid
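As a rough sketch of how the parameters in Table 2 fit together, the forward pass below uses 30 inputs, 15 neurons with ReLU, and a single Sigmoid output. The single-hidden-layer layout, the random untrained weights, and the 0.5 decision threshold are assumptions; the article does not describe the layer structure in more detail.

```python
import math
import random


def relu(value):
    return max(0.0, value)


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


rng = random.Random(0)
n_inputs, n_hidden = 30, 15  # sizes taken from Table 2

# Untrained random weights, for illustration only.
hidden_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_b = [0.0] * n_hidden
output_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]]
output_b = [0.0]

sample = [rng.random() for _ in range(n_inputs)]  # hypothetical normalized sample
hidden = dense(sample, hidden_w, hidden_b, relu)
probability = dense(hidden, output_w, output_b, sigmoid)[0]
predicted_label = 1 if probability >= 0.5 else 0  # assumed threshold
print(probability, predicted_label)
```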

Results And Discussion:

Several ML techniques, namely KNN, Random Forest, SVM, Decision Tree, Naïve Bayes, and Logistic Regression, have been applied to predict breast cancer on the Wisconsin dataset. The best accuracy obtained is 96.5%, achieved by the SVM and Random Forest algorithms. To achieve higher prediction accuracy, we employed the deep learning techniques ANN and CNN.

Fig 4 and Fig 5 show the model accuracy and model loss, respectively, as a function of the number of epochs run in the ANN model. The loss falls as training proceeds: as the number of epochs increases, the loss decreases and the accuracy increases.

Figure 4: Graph plot for Model Accuracy.

Figure 5: Graph plot for Model Loss.
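The downward trend of the loss over epochs can be reproduced on a toy problem. The sketch below trains a single Sigmoid neuron with gradient descent for 50 epochs and records the binary cross-entropy at each epoch. The one-feature data, the learning rate, and the choice of loss function are assumptions; the article does not name its loss function or optimizer.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def binary_cross_entropy(targets, probabilities):
    """Mean binary cross-entropy; eps guards against log(0)."""
    eps = 1e-12
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(targets, probabilities)
    ) / len(targets)


# Hypothetical one-feature toy data: small values are class 0, large are class 1.
inputs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
targets = [0, 0, 0, 1, 1, 1]

weight, bias, learning_rate = 0.0, 0.0, 1.0
losses = []
for epoch in range(50):
    probabilities = [sigmoid(weight * x + bias) for x in inputs]
    losses.append(binary_cross_entropy(targets, probabilities))
    # Gradient of the mean cross-entropy with respect to weight and bias.
    errors = [p - t for p, t in zip(probabilities, targets)]
    weight -= learning_rate * sum(e * x for e, x in zip(errors, inputs)) / len(inputs)
    bias -= learning_rate * sum(errors) / len(inputs)

print(losses[0], losses[-1])
```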

The accuracies obtained with ANN and CNN are 99.3% and 97.3% respectively, which are superior to those of the ML techniques mentioned above. DL techniques performed better than ML in this domain, which we attribute to the use of activation functions such as ReLU and Sigmoid. By applying activation functions, we get the results as probabilities rather than just 0 and 1 labels (0 for benign and 1 for malignant) as in conventional ML techniques. Table 3 shows the accuracies obtained through the different techniques.

Table 3: Comparison of ML and DL algorithms.

Algorithm	Accuracy	Precision	Sensitivity
KNN	0.95	0.95	0.99
SVM	0.96	0.98	0.97
Decision tree	0.95	0.99	0.93
Naïve Bayes	0.92	0.93	0.94
Logistic Regression	0.94	0.96	0.96
Random Forest	0.96	0.98	0.97
CNN	0.97	0.97	0.98
ANN	0.99	0.99	0.99
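The three columns of Table 3 can be computed from true and predicted labels as sketched below. The label vectors are hypothetical, and treating malignant (1) as the positive class is an assumption, since the article does not state which class the precision and sensitivity refer to.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, sensitivity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    return accuracy, precision, sensitivity


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(classification_metrics(y_true, y_pred))  # (0.8, 0.75, 0.75)
```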

Conclusion:

This research article implements various ML and DL techniques and compares their accuracy. Among the ML techniques, SVM and Random Forest performed best, with an accuracy of 96.5%. In contrast, the DL models CNN and ANN yield higher accuracies of 97.3% and 99.3% respectively. We conclude that DL outperformed ML in terms of accuracy. Furthermore, owing to activation functions such as ReLU and Sigmoid, the DL outputs lie in the range 0 to 1 and can be read as probabilities, which was not possible with the traditional ML algorithms.

References:

[1] World Health Organization. Accessed on: Feb 13, 2020. Available: https://www.who.int/news-room/fact-sheets/detail/cancer.
[2] Yi-Sheng Sun, Zhao Zhao, Han-Ping Zhu, “Risk factors and Preventions of Breast Cancer,” International Journal of Biological Sciences.
[3] Dongdong Sun, M. Wang, H. Feng, and Ao Li, “Prognosis prediction of human breast cancer by integrating deep neural network and support vector machine: Supervised feature extraction and classification for breast cancer prognosis prediction,” 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering, and Informatics (CISP-BMEI).
[4] D. Selvathi and A. Aarthypoornila, “Performance analysis of various classifiers on deep learning network for breast cancer detection,” International Conference on Signal Processing and Communication (ICSPC).
[5] Tiancheng He, M. Puppala, R. Ogunti, J. J. Mancuso, Xiaohui Yu, J. C. Chang, T. A. Patel, and S. T. C. Wong, “Deep learning analytics for diagnostic support of breast cancer disease management,” 2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI).
[6] N. Khuriwal and N. Mishra, “Breast Cancer Diagnosis Using Deep Learning Algorithm,” 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN).
[7] Ch. Shravya, K. Pravalika, Sk. Subhani, “Prediction of Breast Cancer Using Supervised Machine Learning Techniques,” 2019 International Journal of Innovative Technology and Exploring Engineering.
[8] J. Ferlay, C. Héry, P. Autier, and R. Sankaranarayanan, “Global burden of breast cancer,” in Breast cancer epidemiology, ed: Springer, 2010, pp. 1-19.
[9] M. J. Van De Vijver, Y. D. He, L. J. Van’t Veer, H. Dai, A. A. Hart, D. W. Voskuil, et al., “A gene-expression signature as a predictor of survival in breast cancer,” New England Journal of Medicine, vol. 347, pp. 1999-2009, 2002.
[10] Breast Cancer Wisconsin Dataset, Kaggle. Accessed on: Feb 13, 2020. Available: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data.
[11] Chao-Ying Joanne Peng, Kuk Lida Lee, and Gary M. Ingersoll, “An Introduction to Logistic Regression Analysis and Reporting,” September/October 2002.
[12] Mohammad Bolandraftar and Sadegh Bafandeh Imandoust, “Application of K-Nearest Neighbour (KNN) Approach for Predicting Economic Events: Theoretical Background,” International Journal of Engineering Research and Applications, Vol. 3, Issue 5, Sep-Oct 2013.
[13] Ebrahim Edriss Ebrahim Ali and Wu Zhi Feng, “Breast Cancer Classification using Support Vector Machine and Neural Network,” International Journal of Science and Research (IJSR), Issue 3, March 2016.
[14] Madhan S, Priyadarshini P, Brindha C, Bairavi B, “Predicting Breast Cancer using Convolutional Neural Network,” SSRG International Journal of Computer Science and Engineering (SSRG-IJCSE), Special Issue ICMR, Mar 2019.
[15] Ismail Saritas, “Prediction of Breast Cancer Using Artificial Neural Networks,” Journal of Medical Systems, 36(5):2901-7, August 2011.