Está en la página 1de 45

UNIDAD ACADÉMICA DE INGENIERÍA CIVIL

CARRERA DE INGENIERÍA DE SISTEMAS

COMPARAR MODELOS DISEÑADOS MEDIANTE REDES


NEURONALES PARA RECONOCIMIENTO DE PATRONES EN
IMÁGENES MÉDICAS Y DETECCIÓN DE CÁNCER DE MAMAS

AGUILAR NIETO CARLOS ANDRES


INGENIERO DE SISTEMAS

SORIANO TABOADA DANNY OMAR


INGENIERO DE SISTEMAS

MACHALA
2017
UNIDAD ACADÉMICA DE INGENIERÍA CIVIL

CARRERA DE INGENIERÍA DE SISTEMAS

COMPARAR MODELOS DISEÑADOS MEDIANTE REDES


NEURONALES PARA RECONOCIMIENTO DE PATRONES EN
IMÁGENES MÉDICAS Y DETECCIÓN DE CÁNCER DE MAMAS

AGUILAR NIETO CARLOS ANDRES


INGENIERO DE SISTEMAS

SORIANO TABOADA DANNY OMAR


INGENIERO DE SISTEMAS

MACHALA
2017
UNIDAD ACADÉMICA DE INGENIERÍA CIVIL

CARRERA DE INGENIERÍA DE SISTEMAS

TRABAJO TITULACIÓN
ENSAYOS O ÁRTICULOS ACADÉMICOS

COMPARAR MODELOS DISEÑADOS MEDIANTE REDES NEURONALES PARA


RECONOCIMIENTO DE PATRONES EN IMÁGENES MÉDICAS Y DETECCIÓN DE
CÁNCER DE MAMAS

AGUILAR NIETO CARLOS ANDRES


INGENIERO DE SISTEMAS

SORIANO TABOADA DANNY OMAR


INGENIERO DE SISTEMAS

RIVAS ASANZA WILMER BRAULIO

MACHALA, 20 DE SEPTIEMBRE DE 2017

MACHALA
2017
EL REPORTE URKUND DE ESTE TRABAJO
NO ES NECESARIO DEBIDO QUE LA
OPCIÓN DE TITULACIÓN ES DE ARTÍCULO
ACADÉMICO Y LA REVISTA A PUBLICAR
REALIZA EL PROCEDIMIENTO RESPECTIVO
SOBRE LA VERIFICACIÓN DEL
ANTIPLAGIO
DEDICATORIA

Este trabajo está dedicado principalmente a nuestras familias que han sido nuestra
fortaleza para seguir adelante, dándonos los consejos e inculcándonos los valores
necesarios que nos han permitido convertirnos en personas responsables.

Carlos Andrés Aguilar Nieto - Danny Omar Soriano Taboada

III
AGRADECIMIENTO

Ante todo, queremos agradecer a Dios, a nuestros padres y hermanos por estar
junto a nosotros en todo momento, dándonos el soporte, el apoyo y la fortaleza
necesaria para seguir adelante, para de esta manera ir cumpliendo con todas
nuestras metas planteadas.

Así mismo agradecemos a la Universidad Técnica de Machala y en general a todo el


personal de la Unidad Académica de Ingeniería Civil - Carrera de Ingeniería de
Sistemas por brindarnos esa valiosa oportunidad de formarnos como profesionales
de excelencia, capaz de brindar soluciones que aporten y den algún beneficio a
nuestra comunidad.

Carlos Andrés Aguilar Nieto - Danny Omar Soriano Taboada

IV
RESUMEN

Este trabajo presenta la comparación de dos modelos o esquemas de clasificación


de mamografías basados en redes neuronales convolucionales (CNN). La principal
diferencia de estos esquemas de clasificación se basa en el espacio de color de los
datos de imagen, comprendidos por escala de grises (1 canal) para el modelo 1 y
RGB (3 canales) para el modelo 2, de igual manera con las técnicas de extracción
de características y el número de neuronas en la capa totalmente conectada que
difieren en 32 neuronas para el modelo 1 y 1024 para el modelo 2. Fueron utilizadas
1070 mamografías de la DDSM (Digital Database for Screening Mammography) de
la universidad del sur de la florida alojadas en un servidor FTP, que se dividen en
dos categorías: mamografías benignas y malignas. Dentro de la base de datos, se
realizó un proceso de conversión de formato de cada imagen para utilizarlas en
nuestras redes neuronales convolucionales, por razón de que las mamografías
digitalizadas se encuentran en un tipo de compresión de archivos personalizada
desarrollada por los digitalizadores. Para obtener una mayor precisión en los
resultados finales, teniendo en cuenta los aspectos no relevantes de la mamografía
se realizó una extracción de solo la zona afectada de cáncer o de interés de todas
las mamografías a utilizar para mejorar la calidad de los datos que servirán como
entrada en los modelos, de esta manera, se disminuyó la carga en el computador
haciendo que los tiempos de ejecución se reduzcan debido a que el número de
ejemplos de cáncer a utilizar es menor. Se utilizó la red neuronal convolucional
(CNN) para la clasificación mediante la aplicación de la biblioteca de aprendizaje
automático (machine learning) de google Tensorflow que se configuró para trabajar
sobre la librería de alto nivel Keras. La ejecución de los modelos se realizó en una
computadora Dell inspiron 2350 con Windows 10 SL con un procesador i7-6700 con
una velocidad de reloj de 2.60 GHz con 12GB de RAM y con una GPU Nvidia
GeForce 940M con 4GB de RAM. Al utilizar Keras se genera un código más legible
y organizado, a diferencia de utilizar Tensorflow puro, donde el extenso codigo
dificulta la construcción y análisis del mismo. Para afinar los parámetros de nuestros
modelos de clasificación para que nos arrojen resultados óptimos, se aplicaron
algoritmos de búsqueda aleatoria (random search) y de cuadrícula (grid search),
combinando el tamaño del lote (batch size), los optimizadores (Adadelta, RMSprop y
SGD), las funciones de pérdida (categorical cross entropy y mean squared error), el

V
número de capas congeladas de Inception-v3 y la tasa de aprendizaje (learning
rate). La característica resaltante del modelo que arrojó mejores resultados, fue la
utilización de pesos pre-entrenados de Inception-v3 en donde se volvió a entrenar
solo una parte de toda la red neuronal mejorando la tarea de clasificación de todo
nuestro conjunto de datos. El desempeño de nuestros algoritmos de clasificación se
evalúa por los resultados obtenidos en validación de precisión y pérdida. El modelo
con la mejor precisión tiene 85%, y un error cuadrático medio de 15%.

VI
CONTENIDO

DEDICATORIA ................................................................................................................... III


AGRADECIMIENTO ........................................................................................................... IV
RESUMEN ........................................................................................................................... V
Mammogram Classification Schemes by Using Convolutional Neural Networks ............... - 9 -
1 Introduction .......................................................................................................... - 9 -
1.1 Related work ................................................................................................................. - 10 -
2 Materials and Methods ..................................................................................... - 12 -
2.1 Dataset creation ........................................................................................................... - 12 -
2.2 Preprocessing................................................................................................................ - 13 -
2.3 Feature Extraction and Classification ........................................................................... - 14 -
2.4 Metrics for Evaluation .................................................................................................. - 18 -
3 Experimental Results ........................................................................................ - 20 -
3.1 Random search ............................................................................................................. - 20 -
3.2 Grid search .................................................................................................................... - 21 -
4 Conclusion ........................................................................................................ - 23 -
References .................................................................................................................. - 24 -
INDICE ........................................................................................................................... - 26 -
ANEXOS ......................................................................................................................... - 27 -

VII
LISTA DE GRÁFICOS O ILUSTRACIONES

Fig. 1: Example of the ROIs in the dataset for benign and malignant mammograms ....... - 13 -
Fig. 2: Stages for preprocessing ROIs obtained from the DDSM dataset ........................ - 14 -
Fig. 3: Stage of feature extraction in the model Inception-v3. Model 1 uses three size of
filters: 5 × 5, 3 × 3 and 1 × 1. The pooling layer applies the techniques: max pooling and
average pooling. Model 2 uses two size of filters: 5×5 and 3×3. The pooling lay ............. - 17 -
Fig. 4: Scheme of our proposed CNN with fully connected layer and Softmax as the output
layer. Model 1 has a fully connected layer of 1024 neurons, while Model 2 presents 32
neurons. .......................................................................................................................... - 18 -

ÍNDICE DE TABLAS

Table 1: Summary of main differences between model 1 and model 2 ............................ - 20 -


Table 2: First grid search results with categorical cross-entropy function ........................ - 21 -
Table 3: First grid search results with mean squared error function................................. - 22 -
Table 4: Second grid search results with categorical cross-entropy function ................... - 22 -
Table 5: Second grid search results with mean squared error function ........................... - 23 -

VIII
Mammogram Classification Schemes by Using Convolutional Neural Networks
Danny Soriano 1, Carlos Aguilar 1, Wilmer Rivas 1,
Ivan Ramirez-Morales2, Maritza Pinta 1, and Eduardo Tusa 1
1 Unidad Academica de Ingenierıa Civil
2 Unidad Academica de Ciencias Agropecuarias
Universidad Tecnica de Machala, Machala, El Oro 070210, Ecuador,
{dosoriano est caaguilarn est, wrivas, iramirez, mpinta,
etusa}@utmachala.edu.ec

Abstract. This work presents the comparison of two schemes of mammogram classification based on
convolutional neural networks (CNN). The main difference of these classification schemes relies on the
color space of the image data, the feature extraction techniques and the number of neurons in the fully
connected layer. We use 1070 mammograms from the Digital Database for Screening Mammography
(DDSM), which are divided into two categories: benign and malignant mammograms. We use CNN for
classification by applying the open source library TensorFlow which is configured on the high level
library Keras. In order to tune the parameters of our classification models, we apply a random and grid
search algorithms, by combining the batch size, the optimizers (Adadelta, RMSProp and SGD), the
number of layers and the learning rate. The performance of our classification algorithms is evaluated by
the accuracy and two loss functions (categorial cross-entropy and mean squared error). The model with
the best accuracy has 85%, and a mean squared error of 15%.

Keywords: Breast cancer, convolutional neural network, machine learning, mammography, TensorFlow.

1 Introduction
The effects of cancer have significant implications around the world. According to
statistics from the American Cancer Society, there were 14.1 million new cancer
cases and 8.2 million cancer deaths worldwide in 2012. In the US, there will be an
estimated 1.688.780 new cancer cases diagnosed and 600.920 cancer deaths in
2017 [1]. In Ecuador, the increase of advanced cases reflects a poor coverage of
impoverished populations, mainly, in rural areas. According to Godoy et al.[2], 30 of
100.000 people are diagnosed each year with breast cancer in Ecuador.
Approximately, half of these cases are in advance stage, which decreases the
likelihood of successful treatment and survival.
The incidence rate of breast cancer in Ecuador has been growing over the last
years, reaching up to 32.7 in 2012 [3]. The role of the early diagnostic techniques
becomes imperative for saving lives. Screening mammograms are used to detect
abnormalities in women who have no apparent symptoms. Diagnostic
mammography is applied for women who present with clinical signs and symptoms.
The recommendation of the American College of Radiology is a more

-9-
comprehensive auditing of outcomes associated with all mammograms, with
statistics prepared separately for screening and diagnostic mammography [4].
The research area of computer vision and machine learning has reported
numerous algorithms covering medical image registration, segmentation and edge
detection for medical image content analysis, computer-aided diagnosis (CAD) with
specific coverage on mammogram analysis towards breast cancer screening, and
other applications [5]. This work presents the comparison of two schemes of
mammogram classification based on convolutional neural networks (CNN). We test
our algorithms by using the Digital Database for Screening Mammography
(DDSM), in order to discriminate two categories: benign and malignant
mammograms.

1.1 Related work


The task of mammogram classification has been approached by comparing
different learning algorithms widely available in computational libraries.
Kamalakannan et al. [6], evaluate the sensitivity, specificity and accuracy on seven
classifier techniques: Decision Tree (DT), K-Nearest Neighbour (KNN), Fuzzy K-
Nearest Neighbor (FKNN), Nave Bayes (NB), Artificial Neural Network (ANN),
Ensemble and Support Vector Machines (SVM) by selecting proper region of
interest (ROI) from mammograms, whose database is not mentioned explicitly.
Don et al. [7] extract the salient features such as Fractal Dimension (FD) and
Fractal Signature (FS) as texture descriptors. For classification purposes, a
trainable multilayer feed forward Neural Network (NN) has been designed and
compared with K-Means. The algorithm reveals a good performance rate of 98% by
using the DDSM and the Mammographic Image Analysis Society (MIAS) dataset
for discriminating in four tissue density categories.
Jankulovski et al. [8] use 326 annotated images from MIAS database to classifiy
normal and abnormal tissue. While they compare three supervised classifiers:
SVM, KNN and Random Forest (RF), six different feature descriptors are
evaluated: Local Binary Pattern (LBP), Gray Level Difference Method (GLDM),
Gray Level Run Length Matrix (GLRLM), Haralick, Gabor filters and a combined
descriptor. The best results were obtained when the images were described using
GLDM together with the SVM as a classification technique.

- 10 -
Xie et al. [9] present a novel CAD system for the diagnosis of breast cancer based
on Extreme Learning Machine (ELM). They implement a model of multidimensional
feature vectors, where feature selection is done by SVM and ELM. The algorithm
classify malignant masses from benign ones from DDSM and MIAS database.
Authors compare the classification performance with SVM and Particle Swarm
Optimization-Support Vector Machine (PSO-SVM). The system achieve an average
accuracy of 96.02%.
Important contributions for mammography classification have been implemented by
different authors. Taghanaki et al. [10] introduce a novel multiobjective optimization
of deep auto-encoder networks, in which the auto-encoder optimizes two
objectives: Mean Squared Reconstruction Error (MRE) and Mean Classification
Error (MCE) for Pareto-optimal solutions. These two objectives are optimized
simultaneously by a non-dominated sorting genetic algorithm. They use 949 X-ray
mammograms categorized into 12 classes, obtaining a classification accuracy of up
to 98.45%.
Khan et al. [11] propose the concept of Cartesian Genetic Programming
(CGP) for training both differentiable and non-differentiable parameters of Wavelet
Neural Networks (WNNs). They use a representative subset of 200 areas of benign
and malignant tumors from the DDSM dataset. The proposed WNNs perform
competitively in comparison to several other methods.
Magna et al. [12] investigate the properties of a classifier based on an ensemble of
Adaptive Artificial Immune Networks (A2INET) applied to original mammography
image from DDSM and MIAS datasets in order to detect bilateral asymmetry, which
is known to be correlated with increased breast cancer risk. Classification models
were trained using a set of descriptors measuring the degree of similarity of paired
regions of the left and right breasts. The accuracy level is 0.90, sensitivity is 0.93
and specificity is 0.87.
Pratiwi et al. [13] classify various type of breast cancers into benign and malignant
abnormalities by using Radial Basis Function Neural Network (RBFNN) and GLCM
texture based features. Instead of using 14 Haralick descriptors, the most dominant
four are used: ASM, Correlation, Sum Entropy, and Sum Variance. They use the
MIAS dataset to obtain an accuracy of 93.98%.
Buciu and Gacsadi [14] use Gabor wavelets based features extracted from
mammograms representing normal tissues, or benign and malign tumors. Once

- 11 -
features are detected, Principal Component Analysis (PCA) is further employed to
reduce data dimensionality. To an end, directional properties and frequency
spectrum of those features are analyzed with respect to the classification
performance by employing multiclass support vector machines as classifier. For
comparison, as baseline, PCA was also applied directly to the data set with no
feature filtering. The directional properties of Gabor wavelets provided by their
orientation are important issues to discriminate mammogram tumor types.
Khan et al. [15] contribute to the previous work by finding the optimal parameters of
Gabor filter banks based on an incremental clustering algorithm and Particle
Swarm Optimization (PSO). They employ SVM with Gaussian kernel as a fitness
function for PSO. The effect of optimized Gabor filter bank was evaluated on 1024
ROIs extracted from DDSM dataset. The proposed method enhances the
performance and reduces the computational cost.
Srinivasan and Venkata [16], use 322 images from MIAS database to classify
mammograms in three categories: normal, benign and malignant. They select 180
images (60 images for each class) to train the algorithm of Discrete Gabor Wavelet
Transforms based on Hidden Markov Model for classification. The performance of
their technique yields 90% of accuracy.

2 Materials and Methods

2.1 Dataset creation


The dataset for training and evaluating our approach was obtained from DDSM
dataset [17], which belongs to the University of South Florida [18]. We identify two
types of cases: malignant and benign mammograms. In each downloaded
directories, there are 5 files corresponding to the mammograms digitized in two
different projections: Cranial-Caudal (CC) and Mediolateral-Oblique (MLO) in
“LJPG” format. In addition to the images, there is an “.ics” file containing detailed
information for each case, regarding patient data and image resolution. The
digitized mammography in the “LJPG” format is converted to any standard image
format by using a repository hosted in GitHub [19]. Previously, it is necessary to
install a Linux environment called Cygwin that requires two libraries: Ruby and
Imagemagick. The steps for image file format conversion are described as follows:

1. Select the file associated to the cases to convert.

- 12 -
2. Run the jpeg.exe file for each of the mammograms scanned in LJPEG with
the following command: ./jpeg.exe -d -s [filename]. A file with the same name
as the original will be created but with a “.1” extension.
3. Run ddsmraw2pnm.exe with the previously created file by using the
command: ./ddsmraw2pnm.exe [created file] [height] [width] [lumisys / dba /
howtek]. The last 3 fields of this command are extracted from the “.ics” file
corresponding to the image of the selected case. The output is a file with
“.pnm” extension.
4. Apply the command: convert [file created] -resize [width] x [height]ˆ-gravity
center -extent [width] x [height] [new file name] .jpg. The height and width
resolution are according to the “.ics” file. The last parameter correspond to
the name of the new file with its respective format.

After this process, we get mammograms digitized in the selected image format, in
this case, “JPG”. Next, we do selection of the mammograms to crop the ROI that
represents the affected tissue, as it observed in Fig. 1.

Fig. 1: Example of the ROIs in the dataset for benign and malignant mammograms

2.2 Preprocessing
In the previous stage, we obtain 1070 images, which are divided into 535
mammograms for each class: benign and malignant. Two schemes or models are
defined during preprocessing. The image data for Model 1 is defined in the RGB
color space, while the image data for Model 2 are obtained by converting the image
from RGB to grayscale color space through the OpenCV Library. These processes
are differentiated in Fig. 2.

- 13 -
Fig. 2: Stages for preprocessing ROIs obtained from the DDSM dataset

This configuration creates vectors with the pixel values for each image, as we
describe below:

1. Upload image from your directory.


2. Define the color space for Model 1 and 2.
3. Reshape all uploaded images in a list of 89401-dimensional vectors
(299×299 image), which is an array format obtained through library Numpy.
4. Define the pixel values in “float32” format and save them in other vector.
5. Normalize the pixel values of the entire vector between 0 and 1.
6. Create a vector of labels with a size proportional to the total of the images in
the training subset. A label value of 0 is assigned for benign cases and 1 for
malignant cases.
7. Divide the data into two parts, 90% for training and 10% for validation

2.3 Feature Extraction and Classification


Neural Networks (NN) are a set of interconnected artificial neurons that perform
the functions of learning, memorization, generalization or abstraction of
characteristics; by posing a computational model similar to the functioning of the
human brain. NN allow excellent results in applications where the inputs to the
model contain noise, randomness or are incomplete, being used for text-tospeech
processing, or for character and image recognition, and are also used in the
robotics area.

The basic structure of a neuron can be theoretically set by X = {xi , i = 1,2,...,n}


as the inputs to the neuron and Y represents the output. Each input is multiplied by
its weight wi, a bias b is associated with each neuron and their sum goes through a

- 14 -
transfer function f [5]. The relationship between input and output can be described
as follows in equation (1)

(1)

Several approaches have used NN for breast cancer detection in mammography


images . Lucht et al. [20] test the performance of NN for the classification of signal-
time curves obtained from breast masses by dynamic MRI. Signal-time courses
from 105 parenchyma, 162 malignant, and 102 benign tissue regions were
examined. Four neural networks corresponding to different temporal resolutions of
the signal-time curves were tested. Discrimination between malignant and benign
lesions is best if 28 measurement points are used (sensitivity: 84%, specificity:
81%). All examined networks yielded poor results for the subclassification of the
benign lesions into fibroadenomas andbenign proliferative changes.
Setiawan et al. [21], propose a mammogram classification using Law’s Texture
Energy Measure (LAWS) as texture feature extraction method. NN is used as
classifier for normal-abnormal and benign-malignant images. Training data for the
mammogram classification model is retrieved from subset of 327 images of MIAS
database. Result shows that LAWS provides better accuracy than other similar
method such as GLCM. LAWS provide 93.90% accuracy for normalabnormal and
83.30% for benign-malignant classification.
The application of NN in problem solving is necessary to establish a structure of
convolutional layers (feature extraction) and fully connected (classifier) that apply
learning algorithms and validation [22]. This is why, our approach is based on
Convolutional Neural Networks.

Convolutional Neural Networks (CNN) are special type of NN where instead of


having weights for each and every input, the weights are shared and are convolved
across the input in as a moving window [23]. This convolutional property along with
a pooling layer achieves translation invariance, which especially suits images.
Layers of a typical CNN are formed by: a convolutional layer, an activation function,
a pooling layer and an output layer. [24].

- 15 -
Convolutional layer uses a filter, which is an array of weights, to slide across the
input from the previous layer in order to compute the dot product of the weights and
the entries in the input. These weights are learnt using backpropagation of errors.
An activation function, which applies element-wise non-linearity, is applied to
produce a feature map, where each entry can be considered as an output of a
single neuron from a local small region of the input. Eq. (2) is the mathematical
expression for convolution process in case both the filter and input are two-
dimensional [25].

(2)
where F is a filter of size r × c, I the input from the previous layer l − 1, O the output
at the current layer l (after convolution and before applying the activation function)
and x and y represents the starting indices of row and column for the receptive field
of F with respect to I.
Activation function is a non-linear function such as sigmoid or ReLU, which is
applied to elements of the convolution.
Pooling layer reduces the computational complexity of a CNN when one or more
pooling layers are applied to the feature maps produced by the convolutional
layers, by decreasing the size of the maps. Two commonly used methods are max
pooling and average pooling.
Output layer consists of as many nodes as the number of classes in the dataset.
Softmax function is used for classification at the output.
The performance of extracting features in mammography images of our CNN
model presents two techniques of re-training known as Transfer-learning and Fine-
tuning on the model Inception-v3 [26] launched by Google. It contains a pre-trained
CNN to extract features of the images with a low computational cost and to work in
scenarios of handling large amounts of data [27]. Fig. 3 shows the stage of feature
extraction in Model 1 an 2.

- 16 -
Fig. 3: Stage of feature extraction in the model Inception-v3. Model 1 uses three size
of filters: 5 × 5, 3 × 3 and 1 × 1. The pooling layer applies the techniques: max
pooling and average pooling. Model 2 uses two size of filters: 5×5 and 3×3. The
pooling lay

Inception-v3 allows us to use the weights of its network to perform the task of
classification in a mammography dataset distributed in two classes: benign and
malignant cancer. Taking into account the size and similarity of training data it is
necessary to use the following techniques:
Transfer-learning is a technique used to deal with small or large amounts of
labeled information from a source domain to a destination domain, in order to build
an efficient prediction model [28]. We have taken the CNN from Inceptionv3
because it has been pre-trained by using the imageNet dataset, to freeze their
layers making them non-trainable and then, the last layer is removed fully
connected or softmax, and to add a new layer fully connected, taking the rest of the
network for feature extraction and for training the model with our classes: benign
and malignant mammograms.
Fine-tuning is a technique that allows to establish the number of layers of
Inception-v3 model, that is reentered, freezed and thawed; and the number of
layers that are established previously to obtain better results focused on a specific
problem.
These methods help to build a NN model with greater stability and consistency,
due to the fact, it modifies the values of the weights and biases until the last layer is
stabilized in order to reconvert the rest of them [29]. The scheme of classification is
presented Fig. 4.

- 17 -
Fig. 4: Scheme of our proposed CNN with fully connected layer and Softmax as the
output layer. Model 1 has a fully connected layer of 1024 neurons, while Model 2
presents 32 neurons.

2.4 Metrics for Evaluation


During multi-label learning, the prediction of a value may be correct or incorrect
depending on the number of labels of the classification. We use metrics to quantify
the algorithm performance according to the following definitions:
Accuracy computes the proportion of cases in which, a model classifies correctly
[30]. This metric evaluates the number of positive and negative cases that exist
within the model to obtain a level of accuracy. The following expression in equation
(3) describes this metric:

(3)

where TP is the number of cases of the positively classified positive class, TN is


the number of cases of the negative class correctly classified, P corresponds to the
total of cases of positive class and N corresponds to the total of negative class
cases [31].
Loss is a function used in the model compilation which helps to measure the error.
It is a parameter used for compiling the model in order to define the loss produced
during the model validation. The most common loss functions are:

- 18 -
– Categorical cross-entropy is a method used in NN models in order to avoid
over-learning. This phenomenon can occur for two reasons:
overparameterization or lack of data. Cross-validation takes advantage of the
total information available in the database without dispensing with part of its
records. In cross-validation, the database used in the learning phase (learning
set) is randomly divided in N parts (usually 10). Sequentially, each of these
subsets is reserved for using as a test set against the tree model generated
by the remaining N − 1 subsets. We obtain N different models, where we can
evaluate the accuracy of the classifications in both the learning set (N − 1)
and the test subsets (N). We can select the optimal tree when the precision is
reached in one as in another subset [32].
– Mean squared error is a mathematical method that is applied to observe the
error between the desired value and the value obtained, returning to finish the
training in the first layer [33]. The error rate for the output layer depends
entirely on the value obtained against the expected value of the equation, and
also on the network architecture that arises both in the number of layers to be
used and the number of inputs and outputs defined [34].

Optimizers are necessary arguments to elaborate a NN model in Keras, in such a


way that the gradient descent is tunned properly. These models are tested by the
following optimizers:

– Adadelta is an optimization algorithm used to control the gradient descent. It


is very adaptive dynamically over time because it only needs first order
information and occupies low computational cost, which makes it a very
useful optimizer for NN models [35].
– RMSprop has a different operation to the other optimizers, because it divides
the learning rate for a weight by a running average of the magnitudes of
recent gradients for that weight [36].
– SGM, the Stochastic Gradient Descent optimizer does not recalculate the
weights using all the samples. On the contrary, it only uses the input sample
to perform this process, allowing the weights to be updated more frequently,
providing a faster convergence [37].

- 19 -
3 Experimental Results
For the development of our schemes of classification, we use the machine learning
library of Google, Tensorflow [38], that is configured to work on the high level
library Keras [39]. The models were run on a Dell inspiron 2350 computer running
Windows 10 SL with an i7-6700 processor with a clock speed of 2.60 GHz with
12GB of RAM and an Nvidia GeForce 940M GPU with 4GB of RAM. The proposed
models were evaluated in terms of accuracy and loss function.
In Table 1, we show a summary of the main differences between model 1 and
model 2 from preprocessing to the classification stages.

Table 1: Summary of main differences between model 1 and model 2


Model Input dimension Feature extraction Clasification
3 x 3 Convolution
1024 to 2 neurons in
299 x 299 x 3 array in
Model 1 5 x 5 Convolution
3 channels (RGB)
fully conected layer
Max pooling
1 x 1 Convolution
3 x 3 Convolution
299 X 299 x 1 array in 32 to 2 neurons in
Model 2 1 channel (Gray 5 x 5 Convolution
Scale) fully conected layer
Max pooling
Avarage pooling

3.1 Random search


The search for optimal combination of hyperparameters is implemented by applying
random search, which consists of performing a random combinations of
hyperparameters for both models, taking into account their values of loss function
and accuracy. Once the model is selected based on its performance, a grid search
is focused on the hyperparameters in order to tune the model. The accuracy results
for model 1 are higher by using the optimizers SGD and Adadelta. On the other
hand, the loss function provides better results with mean squared error. The
accuracy values of model 2 are around 70%, but the loss function values obtained
by model 2 are not better for those generated in model 1. At this point, we consider
the values of accuracy for the validation of both models around 70%. However, the
loss function allows us to identify better results in the combinations of
hyperparameters in model 1. Since we have to select the best combination for both

- 20 -
models, we identify the best values of loss function and accuracy for model 1 at
0.27 and 0.78, respectively. While the best results of loss function and accuracy for
model 2 are 0.48 and 0.75, respectively. Model 1 offers potential results for
experimenting new combinations of hyperparameters by applying grid search
technique.

3.2 Grid search


For this process, we use the grid search technique [40] which consists of
performing a search through a sweep defined parameters: Number of layers to
freeze: 50, 120, 249; Batch Size (BS): 8, 16; Optimizer: RMSprop, SGD, Adadelta;
Loss Function: categorical cross-entropy, mean squared error; and Learning Rate
(LR): 0.01, 0.001. It allows to execute a set of models in sequence for each
combination of parameters. These results were mainly obtained from variations in
learning rate, optimizers and the number of Inception-v3 layers to freeze. From this,
we compare sets focused on the SGD optimizer, which obtained better results, with
different configurations in learning rate, loss function and the number of frozen
layers of Inception-v3. As shown in Table 2, the best results in the loss value is
obtained with a learning rate configuration of 0.001 and with a smaller number of
frozen layers of Inception-v3, all with the categorical cross-entropy loss function.
The results reflected in Table 3 exceed those obtained in Table 2. This
improvement is given by the function of loss mean squared error with the use of a
higher learning rate of 0.01, and a smaller number of frozen layers of Inception-v3.

Table 2: First grid search results with categorical cross-entropy function


N. BS LR Optimizer freeze hidden layer min loss max acc epoch
1 16 0,01 SGD 50 1024 0,46 0,78 6
2 8 0,01 SGD 50 1024 0,48 0,79 3
3 16 0,01 SGD 172 1024 0,50 0,76 1
4 8 0,01 SGD 172 1024 0,51 0,79 4
5 16 0,01 SGD 249 1024 0,52 0,74 3
6 8 0,01 SGD 249 1024 0,54 0,73 2
7 16 0,001 SGD 50 1024 0,46 0,77 11
8 16 0,001 SGD 249 1024 0,48 0,76 16
9 16 0,001 SGD 172 1024 0,49 0,73 13
10 8 0,001 SGD 172 1024 0,47 0,79 19
11 8 0,001 SGD 249 1024 0,49 0,75 15
12 8 0,001 SGD 50 1024 0,51 0,77 12

- 21 -
Table 3: First grid search results with mean squared error function
N. BS LR Optimizer freeze hidden layer min loss max acc epoch
1 16 0.01 SGD 50 1024 0,14 0,81 5
2 8 0,01 SGD 50 1024 0,14 0,79 7
3 16 0,01 SGD 172 1024 0,15 0,76 2
4 8 0,01 SGD 249 1024 0,16 0,76 6
5 16 0,01 SGD 249 1024 0,16 0,77 12
6 8 0,01 SGD 172 1024 0,17 0,77 11
7 16 0,001 SGD 172 1024 0,15 0,79 15
8 16 0,001 SGD 50 1024 0,16 0,74 11
9 16 0,001 SGD 249 1024 0,18 0,76 7
10 8 0,001 SGD 172 1024 0,18 0,72 6
11 8 0,001 SGD 50 1024 0,18 0,72 8
12 8 0,001 SGD 249 1024 0,19 0,68 6

Then, we apply a second grid search, maintaining the Batch Size, the Optimizer,
and fully connected layer hyperparameters for all combinations, but we vary only
learning rates: 1, 1.0, 0.5, 0.05; Loss Function: mean squared error and categorical
cross-entropy; and the number of layers to freeze from Inception-v3 model: 0, 10,
20, 30, 40, 50, 60, 70, 80, 90; obtaining a total of 80 models to test. As we see in
Table 4, better results are obtained by training more layers of the Inception-v3
model, as well as using the SGD optimizer. The categorical cross-entropy loss
function does not offer promising results.

Table 4: Second grid search results with categorical cross-entropy function


N. BS LR Optimizer freeze hidden layer min loss max acc epoch
1 8 0,05 SGD 0 1024 0,62 0,68 1
2 8 0,05 SGD 10 1024 0,49 0,78 4
3 8 0,05 SGD 20 1024 0,51 0,83 9
4 8 0,05 SGD 30 1024 0,50 0,81 8
5 8 0,05 SGD 40 1024 0,54 0,75 2
6 8 0,05 SGD 50 1024 0,37 0,83 6
7 8 0,05 SGD 60 1024 0,43 0,78 1
8 8 0,05 SGD 70 1024 0,54 0,71 1
9 8 0,05 SGD 80 1024 0,59 0,74 4
10 8 0,05 SGD 90 1024 0,50 0,78 11

- 22 -
In the same way, Table 5 presents that both the loss function and the learning rate
have a great influence on the values obtained during the training. We identify the
optimal combination of hyperparameters for this model is the combination number
7 of Table 5.
Table 5: Second grid search results with mean squared error function
N. BS LR Optimizer freeze hidden layer min loss max acc epoch
1 8 0,05 SGD 0 1024 0,15 0,82 7
2 8 0,05 SGD 10 1024 0,14 0,83 6
3 8 0,05 SGD 20 1024 0,16 0,80 7
4 8 0,05 SGD 30 1024 0,15 0,79 9
5 8 0,05 SGD 40 1024 0,13 0,83 9
6 8 0,05 SGD 50 1024 0,14 0,81 5
7 8 0,05 SGD 60 1024 0,13 0,85 7
8 8 0,05 SGD 70 1024 0,15 0,78 1
9 8 0,05 SGD 80 1024 0,15 0,81 2
10 8 0,05 SGD 90 1024 0,13 0,84 9

4 Conclusion
We compare two schemes of mammogram classification that discriminate two
categories: benign and malignant cancer, by using CNN as classifier. The algorithm
performances were evaluated by the accuracy and two loss functions:the mean
squared error and the categorical cross-entropy, from which the mean squared
error function registers the lowest loss value. The optimizers were implemented by
Adadelta, Rmsprop and SGD, being the latter the most promising of them, because
it obtains the greatest accuracy among the different combinations of
hyperparameters. The learning rate, whose values were between 0.001, 0.01, 0.1,
1, 0.05 and 0.5; generates better results when it is higher. The best results of
accuracies required a less amount of epochs. An important future work is focused
on increasing the number of categories to classify mammograms. The use of
Breast Imaging Reporting and Data System (BIRADS) from the DDSM dataset is
an important resource of breast cancer diagnosis in order to identify other
subcategories of malignant cancer. Another area to explore in this work is the
tuning of the feature descriptors by adding texture information. Several studies
show the use Gabor bank filters, GLCM, Haralick texture descriptors. These
improvements in our system can advance in the next level to identify the location of

- 23 -
regions on the mammogram for purposes of contributing on the early breast cancer
prediction.

References
[1] Society, A.C.: Cancer facts & figures 2017 (2017)
[2] Godoy, Y., Godoy, C., Reyes, J.: Social representations of gynecologic cancer screening
assessment a qualitative research on ecuadorian women. Revista da Escola de Enfermagem da
USP 50(SPE) (2016) 68–73
[3] L´opez-Cort´es, A., Echeverr´ıa, C., On˜a-Cisneros, F., Sa´nchez, M.E., Herrera, C., Cabrera-
Andrade, A., Rosales, F., Ortiz, M., Paz-y Min˜o, C.: Breast cancer risk associated with gene
expression and genotype polymorphisms of the folatemetabolizing mthfr gene: a case-control study
in a high altitude ecuadorian mestizo population. Tumor Biology 36(8) (2015) 6451–6461
[4] Sprague, B.L., Arao, R.F., Miglioretti, D.L., Henderson, L.M., Buist, D.S., Onega, T., Rauscher, G.H.,
Lee, J.M., Tosteson, A.N., Kerlikowske, K., et al.: National performance benchmarks for modern
diagnostic digital mammography: update from the breast cancer surveillance consortium. Radiology
283(1) (2017) 59–69
[5] Jiang, J., Trundle, P., Ren, J.: Medical image analysis with artificial neural networks. Computerized
Medical Imaging and Graphics 34(8) (2010) 617–631
[6] Kamalakannan, J., Thirumal, T., Vaidhyanathan, A., MukeshBhai, K.D.: Study on different
classification technique for mammogram image. In: Circuit, Power and Computing Technologies
(ICCPCT), 2015 International Conference on, IEEE (2015) 1–5
[7] Don, S., Chung, D., Revathy, K., Choi, E., Min, D.: A new approach for mammogram image
classification using fractal properties. Cybernetics and information technologies 12(2) (2012) 69–83
[8] Jankulovski, B., Kitanovski, I., Trojacanec, K., Dimitrovski, I.: Mammography image classification
using texture features. (2012)
[9] Xie, W., Li, Y., Ma, Y.: Breast mass classification in digital mammography based on extreme learning
machine. Neurocomputing 173 (2016) 930–941
[10] Taghanaki, S.A., Kawahara, J., Miles, B., Hamarneh, G.: Pareto-optimal multiobjective dimensionality
reduction deep auto-encoder for mammography classification. Computer Methods and Programs in
Biomedicine 145 (2017) 85–93
[11] Khan, M.M., Mendes, A., Zhang, P., Chalup, S.K.: Evolving multi-dimensional wavelet neural
networks for classification using cartesian genetic programming. Neurocomputing 247 (2017) 39–58
[12] Magna, G., Casti, P., Jayaraman, S.V., Salmeri, M., Mencattini, A., Martinelli, E., Di Natale, C.:
Identification of mammography anomalies for breast cancer detection by an ensemble of
classification models based on artificial immune system. Knowledge-Based Systems 101 (2016) 60–
70
[13] Pratiwi, M., Harefa, J., Nanda, S., et al.: Mammograms classification using graylevel co-occurrence
matrix and radial basis function neural network. Procedia Computer Science 59 (2015) 83–91
[14] Buciu, I., Gacsadi, A.: Gabor wavelet based features for medical image analysis and classification.
In: Applied Sciences in Biomedical and Communication Technologies, 2009. ISABEL 2009. 2nd
International Symposium on, IEEE (2009) 1–4
[15] Khan, S., Hussain, M., Aboalsamh, H., Mathkour, H., Bebis, G., Zakariah, M.: Optimized gabor
features for mass classification in mammography. Applied Soft Computing 44 (2016) 267–280
[16] Srinivasan, M., Venkata, H.P.: Towards better veracity for breast cancer detection using gabor
analysis and statistical learning. In: Control Automation Robotics & Vision (ICARCV), 2014 13th
International Conference on, IEEE (2014) 1864– 1869
[17] Heath, M., Bowyer, K., Kopans, D., Kegelmeyer Jr, P., Moore, R., Chang, K., Munishkumaran, S.:
Current status of the digital database for screening mammography. In: Digital mammography.
Springer (1998) 457–460
[18] Heath, M., Bowyer, K., Kopans, D., Moore, R., Kegelmeyer, W.P.: The digital database for screening
mammography. In: Proceedings of the 5th international workshop on digital mammography, Medical
Physics Publishing (2000) 212–218
[19] Rose, C.: Digital database for digital mammography software (2016)
[20] Lucht, R.E., Knopp, M.V., Brix, G.: Classification of signal-time curves from dynamic mr
mammography by neural networks. Magnetic Resonance Imaging 19(1) (2001) 51–57
[21] Setiawan, A.S., Wesley, J., Purnama, Y., et al.: Mammogram classification using law’s texture energy
measure and neural networks. Procedia Computer Science

- 24 -
59 (2015) 92–97
[22] Blas, M.J.D.F., Nicola´s-Sarli, J.L.: Redes neuronales artificiales aplicadas al reconocimiento de
caracteres morse
[23] Wahab, N., Khan, A., Lee, Y.S.: Two-phase deep convolutional neural network for reducing class
skewness in histopathological images based breast cancer detection. Computers in Biology and
Medicine 85 (2017) 86–97
[24] Qayyum, A., Anwar, S.M., Awais, M., Majid, M.: Medical image retrieval using deep convolutional
neural network. Neurocomputing (2017)
[25] Arevalo, J., Gonza´lez, F.A., Ramos-Polla´n, R., Oliveira, J.L., Lopez, M.A.G.: Representation
learning for mammography mass lesion classification with convolutional neural networks. Computer
methods and programs in biomedicine 127 (2016) 248–257
[26] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for
computer vision. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
(June 2016)
[27] Lee, H., Lee, B.T.: Selective inference for accelerating deep learning-based image classification. In:
Information and Communication Technology Convergence (ICTC), 2016 International Conference on,
IEEE (2016) 135–137
[28] Zhuang, F., Cheng, X., Luo, P., Pan, S.J., He, Q.: Supervised representation learning: Transfer
learning with deep autoencoders. In: IJCAI. (2015) 4119–4125
[29] Lu, Y., Chen, L., Saidi, A., Dellandrea, E., Wang, Y.: Discriminative transfer learning using similarities
and dissimilarities. IEEE transactions on neural networks and learning systems (2017)
[30] Leal, Y., Gonzalez-Abril, L., Ruiz, M., Lorencio, C., Bondia, J., Vehi, J.: Un nuevo enfoque para
detectar mediciones de glucosa erro´neas en los sistemas de monitorizacio´n continuos de glucosa.
JARCA 2012 15 (2012) 17
[31] Kaur, A., Kaur, K., Kaur, H.: An investigation of the accuracy of code and process metrics for defect
prediction of mobile applications. In: Reliability, Infocom Technologies and Optimization
(ICRITO)(Trends and Future Directions), 2015 4th International Conference on, IEEE (2015) 1–6
[32] Trujillano, J., Sarria-Santamera, A., Esquerda, A., Badia, M., Palma, M., March, J.: Aproximacio´n a
la metodolog´ıa basada en ´arboles de decisio´n (cart). mortalidad hospitalaria del infarto agudo de
miocardio. Gaceta Sanitaria 22(1) (2008) 65–72
[33] Ruiz Ca´rdenas, L.C., Amaya Hurtado, D., Jim´enez Moreno, R.: Solar insolation prediction through
deep belief network. Tecnura 20(47) (2016) 39–48
[34] Jensen, J.R., Christensen, M.G., Jakobsson, A.: Harmonic minimum mean squared error filters for
multichannel speech enhancement. In: Acoustics, Speech and Signal Processing (ICASSP), 2017
IEEE International Conference on, IEEE (2017) 501–505
[35] Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)
[36] Hinton, G.: Overview of mini-batch gradient descent
[37] Raschka, S.: Stochastic downward gradient (2017)
[38] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G.,
Isard, M., et al.: Tensorflow: A system for large-scale machine learning. In: OSDI. Volume 16. (2016)
265–283
[39] Antona Cort´es, C.: Herramientas modernas en redes neuronales: la librer´ıa keras. B.S. thesis
(2017)
[40] Pontes, F.J., Amorim, G., Balestrassi, P.P., Paiva, A., Ferreira, J.R.: Design of experiments and
focused grid search for neural network parameter optimization.
Neurocomputing 186 (2016) 22–34

- 25 -
INDICE

(LAWS, - 15 -
Adadelta, V, - 9 -, - 19 -, - 20 -, - 21 -, - 23 -, - 25 -, - 31 -, - 32 -, - 33 -, - 34 -, - 35 -, - 36 -
CC, - 12 -
CNN, V, VIII, - 9 -, - 10 -, - 15 -, - 16 -, - 17 -, - 18 -, - 23 -
Convolutional layer, - 16 -
DDSM, V, VIII, - 9 -, - 10 -, - 11 -, - 12 -, - 14 -, - 23 -
GLCM, - 11 -, - 15 -, - 23 -
network, - 9 -, - 17 -, - 19 -, - 24 -, - 25 -
NN, - 10 -, - 14 -, - 15 -, - 17 -, - 19 -
RGB, V, - 13 -, - 20 -
RMSprop, V, - 19 -, - 21 -, - 32 -
ROI, - 10 -, - 13 -
SGD, V
SVM, - 10 -, - 11 -, - 12 -
TensorFlow, - 9 -
WNNs, - 11 -

- 26 -
ANEXOS

Anexo A.
Carta Aceptación

- 27 -
- 28 -
Anexo B.
Correos

- 29 -
Anexo C.
Gráficos Tensorboard

- 30 -
Anexo D.
Random Search

Modelo 1

Random search
Parameters validation
hidde
n hiden hiden
freez layer layer layer B optimize max epoc
N. e 1 2 3 S lr r loss min loss acc h
Adadelt
1 249 1024 - - 8 0.01 a mean_squared_error 0.17 0.73 132
Adadelt categorical_crossentro
2 249 1024 - - 8 0.01 a py 0.58 0.71 49
Adadelt categorical_crossentro
3 249 2048 - - 16 0.01 a py 0.59 0.7 41
categorical_crossentro
4 172 1024 - - 32 0.01 rmsprop py 0.56 0.67 22
5 172 1024 1024 - 16 0.01 rmsprop mean_squared_error 0.58 0.71 29
Adadelt categorical_crossentro
6 172 1024 - - 8 0.01 a py 0.53 0.75 32
Adadelt
7 172 1024 2048 1024 8 0.01 a mean_squared_error 0.19 0.71 20
8 172 1024 1024 16 0.01 rmsprop mean_squared_error 0.45 0.54 29
9 172 1024 - - 32 0.01 SGD mean_squared_error 0.27 0.78 38
categorical_crossentro
10 172 1024 - - 16 0.01 SGD py 2.34 0.6 76
11 172 1024 - - 8 0.01 SGD mean_squared_error 1.49 0.57 29

Modelo 2

Random search
Parameters validation
hidde
n hiden hiden
freez layer layer layer B optimize max epoc
N. e 1 2 3 S lr r loss min loss acc h
0.00 Adadelt mean_squared_erro
1 - 32 16 - 32 1 a r 0.13 0.69 39
0.00 categorical_crossent
2 - 32 16 16 16 1 SGD ropy 0.55 0.73 49

- 31 -
0.00 Adadelt categorical_crossent
5 - 32 32 16 64 1 a ropy 0.64 0.72 37
0.00 Adadelt categorical_crossent
3 - 32 - - 32 1 a ropy 0.64 0.71 16
0.00 Adadelt categorical_crossent
4 - 32 - - 8 1 a ropy 0.55 0.75 38
0.00 categorical_crossent
5 - 32 - - 8 1 SGD ropy 0.48 0.75 19
0.00 categorical_crossent
6 - 1024 512 - 16 1 SGD ropy 0.61 0.75 35
0.00 categorical_crossent
7 - 1024 1024 - 16 1 SGD ropy 0.62 0.75 44
0.00 Adadelt categorical_crossent
8 - 128 128 - 8 1 a ropy 0.67 0.72 11
0.00 RMSpr mean_squared_erro
9 - 1024 1024 - 8 1 op r 0.25 0.61 31
0.00 RMSpr categorical_crossent
10 - 256 128 - 8 1 op ropy 0.69 0.51 3
0.00 RMSpr categorical_crossent
11 - 128 64 - 16 1 op ropy 0.69 0.66 5

Grid Search 1

optimiz freez hidden epoc


# model BS lr er loss e layer min loss max acc h

0.0 mean_squared_ 0.14198 0.81164


36 16 1 SGD error 50 1024 55356 38389 5
0.0 mean_squared_ 0.14477 0.79452
72 8 1 SGD error 50 1024 23657 05569 7
0.0 mean_squared_ 0.15341 0.75684
34 16 1 SGD error 172 1024 67379 92889 2
0.0 mean_squared_ 0.15583 0.76369
71 8 1 SGD error 249 1024 53007 86375 6
0.0 mean_squared_ 0.15620 0.77397
35 16 1 SGD error 249 1024 54455 26305 12
0.0 mean_squared_ 0.16617 0.77397
70 8 1 SGD error 172 1024 55294 26305 11
0.0 mean_squared_ 0.15458 0.79109
16 16 01 SGD error 172 1024 05037 59125 15

- 32 -
0.0 mean_squared_ 0.16037 0.74315
18 16 01 SGD error 50 1024 78452 07111 11
0.0 mean_squared_ 0.18138 0.75684
17 16 01 SGD error 249 1024 92871 92889 7
0.0 mean_squared_ 0.17676 0.72260
52 8 01 SGD error 172 1024 22679 2725 6
0.0 mean_squared_ 0.17788 0.72260
54 8 01 SGD error 50 1024 53238 2725 8
0.0 mean_squared_ 0.19275 0.68493
53 8 01 SGD error 249 1024 93052 15166 6
0.0 mean_squared_ 0.22656 0.59931
66 8 1 rmsprop error 50 1024 78942 50473 15
0.0 mean_squared_ 0.22727 0.73972
29 16 1 rmsprop error 249 1024 91415 6007 9
0.0 mean_squared_ 0.26958 0.51027
64 8 1 rmsprop error 172 1024 11391 3993 6
0.0 mean_squared_ 0.45547 0.54452
28 16 1 rmsprop error 172 1024 94431 05569 2
0.0 mean_squared_ 0.47260 0.52739
65 8 1 rmsprop error 249 1024 2725 7275 10
0.0 mean_squared_ 0.47602 0.52397
30 16 1 rmsprop error 50 1024 73993 26305 3
0.0 mean_squared_ 0.16340 0.80479
11 16 01 rmsprop error 249 1024 19166 45499 12
0.0 mean_squared_ 0.19763 0.76369
10 16 01 rmsprop error 172 1024 29237 86375 12
0.0 mean_squared_ 0.26188 0.65410
12 16 01 rmsprop error 50 1024 73715 95972 13
0.0 mean_squared_ 0.19043 0.77739
46 8 01 rmsprop error 172 1024 67656 7275 10
0.0 mean_squared_ 0.20707 0.71232
48 8 01 rmsprop error 50 1024 43442 87916 14
0.0 mean_squared_ 0.21291 0.74315
47 8 01 rmsprop error 249 1024 35579 07111 6
0.0 Adadelt mean_squared_ 0.16048 0.77739
60 8 1 a error 50 1024 67578 7275 19
0.0 Adadelt mean_squared_ 0.16128 0.73972
23 16 1 a error 249 1024 30311 6007 15
0.0 Adadelt mean_squared_ 0.16143
24 16 1 a error 50 1024 28325 0.75 10
22 16 0.0 Adadelt mean_squared_ 172 1024 0.16193 0.74315 24

- 33 -
1 a error 14104 07111

0.0 Adadelt mean_squared_ 0.16708


58 8 1 a error 172 1024 70781 0.75 6
0.0 Adadelt mean_squared_ 0.17717 0.73972
59 8 1 a error 249 1024 61179 6007 12
0.0 Adadelt mean_squared_ 0.19532 0.73630
6 16 01 a error 50 1024 37802 13625 23
0.0 Adadelt mean_squared_ 0.20448 0.69863
4 16 01 a error 172 1024 42094 01541 32
0.0 Adadelt mean_squared_ 0.23716 0.58561
5 16 01 a error 249 1024 83121 64098 3
0.0 Adadelt mean_squared_ 0.17032 0.78082
40 8 01 a error 172 1024 87214 19194 48
0.0 Adadelt mean_squared_ 0.18397 0.74657
41 8 01 a error 249 1024 98391 53555 46
0.0 Adadelt mean_squared_ 0.19399 0.72260
42 8 01 a error 50 1024 76662 2725 37
0.0 categorical_cros 0.46019 0.78424
33 16 1 SGD sentropy 50 1024 36042 65639 6
0.0 categorical_cros 0.47888 0.78767
69 8 1 SGD sentropy 50 1024 60977 12084 3
0.0 categorical_cros 0.49978 0.76369
31 16 1 SGD sentropy 172 1024 56319 86375 1
0.0 categorical_cros 0.51455 0.78767
67 8 1 SGD sentropy 172 1024 14011 12084 4
0.0 categorical_cros 0.52434 0.73972
32 16 1 SGD sentropy 249 1024 63516 6007 3
0.0 categorical_cros 0.54012 0.72602
68 8 1 SGD sentropy 249 1024 93635 73695 2
0.0 categorical_cros 0.45949 0.77397
15 16 01 SGD sentropy 50 1024 22662 26305 11
0.0 categorical_cros 0.48325 0.75684
14 16 01 SGD sentropy 249 1024 71745 92889 16
0.0 categorical_cros 0.49309 0.73287
13 16 01 SGD sentropy 172 1024 94511 6718 13
0.0 categorical_cros 0.46990 0.78767
49 8 01 SGD sentropy 172 1024 85057 12084 19
0.0 categorical_cros 0.49448 0.74657
50 8 01 SGD sentropy 249 1024 60339 53555 15
0.0 categorical_cros 0.50793 0.76712
51 8 01 SGD sentropy 50 1024 88022 3282 12

- 34 -
0.0 categorical_cros 0.89105 0.46575
61 8 1 rmsprop sentropy 172 1024 06964 34361 2
0.0 categorical_cros 8.72E+2
26 16 1 rmsprop sentropy 249 1024 7 0.5 5
0.0 categorical_cros 8.28E+2 0.48630
63 8 1 rmsprop sentropy 50 1024 8 13625 8
0.0 categorical_cros 8.06E+2
25 16 1 rmsprop sentropy 172 1024 9 0.5 7
0.0 categorical_cros 8.06E+2
27 16 1 rmsprop sentropy 50 1024 9 0.5 4
0.0 categorical_cros 8.06E+2
62 8 1 rmsprop sentropy 249 1024 9 0.5 7
0.0 categorical_cros 0.62160 0.77739
7 16 01 rmsprop sentropy 172 1024 96878 7275 7
0.0 categorical_cros 0.64114 0.58561
9 16 01 rmsprop sentropy 50 1024 59446 64098 3
0.0 categorical_cros 4.14E+2
8 16 01 rmsprop sentropy 249 1024 9 0.5 0
0.0 categorical_cros 0.69396 0.51369
45 8 01 rmsprop sentropy 50 1024 4839 86375 4
0.0 categorical_cros 0.74013 0.74657
44 8 01 rmsprop sentropy 249 1024 27491 53555 1
0.0 categorical_cros 0.80866 0.79109
43 8 01 rmsprop sentropy 172 1024 49776 59125 4
0.0 Adadelt categorical_cros 0.47455 0.78082
21 16 1 a sentropy 50 1024 96349 19194 20
0.0 Adadelt categorical_cros 0.48004 0.75684
57 8 1 a sentropy 50 1024 81498 92889 15
0.0 Adadelt categorical_cros 0.48042 0.75684
20 16 1 a sentropy 249 1024 93811 92889 27
0.0 Adadelt categorical_cros 0.48720 0.72602
19 16 1 a sentropy 172 1024 9022 73695 15
0.0 Adadelt categorical_cros 0.49006 0.75342
55 8 1 a sentropy 172 1024 95384 46445 12
0.0 Adadelt categorical_cros 0.50091 0.73287
56 8 1 a sentropy 249 1024 45141 6718 25
0.0 Adadelt categorical_cros 0.55098 0.72602
1 16 01 a sentropy 172 1024 19388 73695 48
0.0 Adadelt categorical_cros 0.55775 0.69520
3 16 01 a sentropy 50 1024 52319 54501 49
2 16 0.0 Adadelt categorical_cros 249 1024 0.56023 0.71575 36

- 35 -
01 a sentropy 1328 34361

0.0 Adadelt categorical_cros 0.56270 0.72945


37 8 01 a sentropy 172 1024 45631 20736 45
0.0 Adadelt categorical_cros 0.58118 0.68150
38 8 01 a sentropy 249 1024 1705 68722 41
0.0 Adadelt categorical_cros 0.58383 0.75684
39 8 01 a sentropy 50 1024 55422 92889 27

Grid Search 2

hidden min max


# model BS lr optimizer loss freeze layer loss acc epoch
0.0 categorical_cr 0.6195 0.6815
1 8 5 SGD ossentropy 0 1024 291877 068722 1
0.0 categorical_cr 0.4869 0.7808
2 8 5 SGD ossentropy 10 1024 236052 219194 4
0.0 categorical_cr 0.5143 0.8321
3 8 5 SGD ossentropy 20 1024 455863 917653 9
0.0 categorical_cr 0.4952 0.8116
4 8 5 SGD ossentropy 30 1024 408671 438389 8
0.0 categorical_cr 0.5387
5 8 5 SGD ossentropy 40 1024 370586 0.75 2
0.0 categorical_cr 0.3747 0.82876
6 8 5 SGD ossentropy 50 1024 598231 71208 6
0.0 categorical_cr 0.4340 0.78082
7 8 5 SGD ossentropy 60 1024 213239 19194 1
0.0 categorical_cr 0.5403 0.7123
8 8 5 SGD ossentropy 70 1024 337479 287916 1
0.0 categorical_cr 0.5924 0.7397
9 8 5 SGD ossentropy 80 1024 6099 26007 4
0.0 categorical_cr 0.5039 0.7773
10 8 5 SGD ossentropy 90 1024 066076 97275 11
0.0 mean_square 0.1462 0.8150
11 8 5 SGD d_error 0 1024 946087 684834 7
0.0 mean_square 0.1416 0.8321
12 8 5 SGD d_error 10 1024 72045 917653 6
0.0 mean_square 0.1551 0.8013
13 8 5 SGD d_error 20 1024 820338 698459 7
14 8 0.0 SGD mean_square 30 1024 0.1478 0.7876 9

- 36 -
5 d_error 767991 712084
0.0 mean_square 0.1344 0.8253
15 8 5 SGD d_error 40 1024 97568 424764 9
0.0 mean_square 0.1376 0.8082
16 8 5 SGD d_error 50 1024 123428 191944 5
0.0 mean_square 0.1282 0.8527
17 8 5 SGD d_error 60 1024 080859 397513 7
0.0 mean_square 0.1468 0.7773
18 8 5 SGD d_error 70 1024 8088 97275 1
0.0 mean_square 0.1487 0.8082
19 8 5 SGD d_error 80 1024 468034 191944 2
0.0 mean_square 0.1252 0.8356
20 8 5 SGD d_error 90 1024 509952 164098 9
categorical_cr 0.6589 0.6301
21 8 0.1 SGD ossentropy 0 1024 594483 369667 7
categorical_cr 0.4648 0.8013
22 8 0.1 SGD ossentropy 10 1024 612142 698459 5
categorical_cr 0.6378 0.5540
23 8 0.1 SGD ossentropy 20 1024 63338 540814 0
categorical_cr 0.7181 0.5890
24 8 0.1 SGD ossentropy 30 1024 922793 411139 4
categorical_cr 0.4534 0.7739
25 8 0.1 SGD ossentropy 40 1024 108937 726305 5
categorical_cr 0.6308 0.6438
26 8 0.1 SGD ossentropy 50 1024 636069 356042 14
categorical_cr 0.7555 0.6746
27 8 0.1 SGD ossentropy 60 1024 999756 575236 5
categorical_cr 0.5349 0.7636
28 8 0.1 SGD ossentropy 70 1024 808335 986375 4
categorical_cr 0.4871 0.8287
29 8 0.1 SGD ossentropy 80 1024 460497 671208 11
categorical_cr 0.4855 0.79109
30 8 0.1 SGD ossentropy 90 1024 0722 59125 3
mean_square 0.1391 0.81164
31 8 0.1 SGD d_error 0 1024 967088 38389 12
mean_square 0.1398 0.82876
32 8 0.1 SGD d_error 10 1024 780793 71208 11
mean_square 0.1568 0.75342
33 8 0.1 SGD d_error 20 1024 119228 46445 1
mean_square 0.1624 0.80136
34 8 0.1 SGD d_error 30 1024 378413 98459 5

- 37 -
mean_square 0.1665 0.78767
35 8 0.1 SGD d_error 40 1024 922105 12084 8
mean_square 0.1357 0.81849
36 8 0.1 SGD d_error 50 1024 931346 31278 4
mean_square 0.1360 0.83219
37 8 0.1 SGD d_error 60 1024 661834 17653 5
mean_square 0.1467 0.79794
38 8 0.1 SGD d_error 70 1024 000693 52014 3
mean_square 0.1377 0.83219
39 8 0.1 SGD d_error 80 1024 818286 17653 10
mean_square 0.1397 0.81849
40 8 0.1 SGD d_error 90 1024 188455 31278 9
categorical_cr 8.72E+ 0.45890
41 8 0.5 SGD ossentropy 0 1024 16 41173 2
categorical_cr 8.06E+
42 8 0.5 SGD ossentropy 10 1024 17 0.5 0
categorical_cr 8.06E+
43 8 0.5 SGD ossentropy 20 1024 17 0.5 0
categorical_cr 0.7464 0.50342
44 8 0.5 SGD ossentropy 30 1024 300394 46445 14
categorical_cr 8.06E+
45 8 0.5 SGD ossentropy 40 1024 17 0.5 0
categorical_cr 8.06E+
46 8 0.5 SGD ossentropy 50 1024 17 0.5 0
categorical_cr 0.6160 0.60958
47 8 0.5 SGD ossentropy 60 1024 191894 90403 10
categorical_cr 8.28E+ 0.48630
48 8 0.5 SGD ossentropy 70 1024 17 13625 7
categorical_cr 8.72E+ 0.45890
49 8 0.5 SGD ossentropy 80 1024 16 41173 2
categorical_cr 7.34E+ 0.54452
50 8 0.5 SGD ossentropy 90 1024 19 05569 2
mean_square 0.2658 0.49315
51 8 0.5 SGD d_error 0 1024 068836 06813 4
mean_square 0.2592 0.52397
52 8 0.5 SGD d_error 10 1024 851818 26305 13
mean_square 0.4623 0.53767
53 8 0.5 SGD d_error 20 1024 287618 12084 6
mean_square 0.1856 0.73630
54 8 0.5 SGD d_error 30 1024 712848 13625 5

55 8 0.5 SGD mean_square 40 1024 0.4760 0.52397 1

- 38 -
d_error 273993 26305

mean_square 0.1981 0.70205


56 8 0.5 SGD d_error 50 1024 195509 47986 8
mean_square 0.2097 0.69178
57 8 0.5 SGD d_error 60 1024 283155 08056 10
mean_square 0.2497 0.63356
58 8 0.5 SGD d_error 70 1024 223169 16708 1
mean_square 0.2222 0.67808
59 8 0.5 SGD d_error 80 1024 299278 21681 3
mean_square 0.2148 0.67465
60 8 0.5 SGD d_error 90 1024 738801 75236 9
categorical_cr 8.28E+ 0.48630
61 8 1 SGD ossentropy 0 1024 17 13625 7
categorical_cr 8.06E+
62 8 1 SGD ossentropy 10 1024 17 0.5 0
categorical_cr 8.06E+
63 8 1 SGD ossentropy 20 1024 17 0.5 0
categorical_cr 8.28E+ 0.48630
64 8 1 SGD ossentropy 30 1024 17 13625 7
categorical_cr 8.28E+ 0.48630
65 8 1 SGD ossentropy 40 1024 17 13625 7
categorical_cr 8.28E+ 0.48630
66 8 1 SGD ossentropy 50 1024 17 13625 7
categorical_cr 8.06E+
67 8 1 SGD ossentropy 60 1024 17 0.5 0
categorical_cr 8.06E+
68 8 1 SGD ossentropy 70 1024 17 0.5 0
categorical_cr 8.28E+ 0.48630
69 8 1 SGD ossentropy 80 1024 17 13625 7
categorical_cr 8.06E+
70 8 1 SGD ossentropy 90 1024 17 0.5 0
mean_square 0.4828 0.51712
71 8 1 SGD d_error 0 1024 76718 3282 1
mean_square 0.4726 0.52739
72 8 1 SGD d_error 10 1024 026654 7275 7
mean_square 0.2266 0.63698
73 8 1 SGD d_error 20 1024 351134 63153 13
mean_square 0.2536
74 8 1 SGD d_error 30 1024 680996 0.5 2
mean_square 0.4760 0.52397
75 8 1 SGD d_error 40 1024 26088 26305 2

- 39 -
mean_square 0.4726 0.52739
76 8 1 SGD d_error 50 1024 02725 7275 3
mean_square 0.4657 0.53424
77 8 1 SGD d_error 60 1024 534361 65639 3
mean_square 0.4589 0.54109
78 8 1 SGD d_error 70 1024 041173 59125 6

mean_squared_ 0.2383 0.50684


79 8 1 SGD error 80 1024 514196 92889 6

mean_squared_ 0.4657 0.53424


80 8 1 SGD error 90 1024 534361 65639 3

- 40 -

También podría gustarte