
Ing. Leonel D. Rozo C., M.Sc., PhD(c)

Decision tree learning is one of the most widely used and practical
methods for inductive inference. It is a method for approximating
discrete-valued functions that is robust to noisy data and capable of
learning disjunctive expressions.
1. Decision tree representation

 Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.

 Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute.
2. Appropriate problems for decision tree learning

• Instances are represented by attribute-value pairs - Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot).

• The target function has discrete output values - The decision tree assigns a boolean classification (e.g., yes or no) to each example.
• Disjunctive descriptions may be required.

• The training data may contain errors.

• The training data may contain missing attribute values.

3. The basic decision tree learning algorithm

1. The algorithm first answers the question: which attribute should be tested at the root of the tree?

2. The best attribute is selected and used as the test at the root node of the tree.

3. A descendant of the root node is then created for each possible value of this attribute, and the training examples are sorted to the appropriate descendant node.

4. The entire process is then repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree.
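The steps above can be sketched as a recursive procedure, in the style of ID3. This is an illustrative sketch, not the original pseudocode: the dictionary-based dataset format and the helper names (`entropy`, `best_attribute`, `id3`) are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a collection of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(examples, attributes, target):
    # step 1: pick the attribute with the highest information gain
    def gain(a):
        base = entropy([e[target] for e in examples])
        remainder = 0.0
        for v in {e[a] for e in examples}:
            subset = [e[target] for e in examples if e[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

def id3(examples, attributes, target):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:            # pure node -> leaf
        return labels[0]
    if not attributes:                   # no tests left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(examples, attributes, target)   # steps 1-2
    tree = {a: {}}
    for v in {e[a] for e in examples}:   # step 3: one branch per value of a
        subset = [e for e in examples if e[a] == v]
        rest = [x for x in attributes if x != a]
        tree[a][v] = id3(subset, rest, target)         # step 4: recurse
    return tree
```

Each recursive call repeats the attribute-selection step on the examples sorted to that branch, exactly as in step 4.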

3.1. Which attribute is the best classifier ?

The central choice in the algorithm is selecting which attribute to test at each node in the tree. We would like to select the attribute that is most useful for classifying examples.

 Entropy measures homogeneity of examples

Entropy is a measure commonly used in information theory that characterizes the (im)purity of an arbitrary collection of examples.

Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is:

Entropy(S) = -p⊕ log2 p⊕ - p⊖ log2 p⊖

where p⊕ is the proportion of positive examples in S and p⊖ is the proportion of negative examples in S.
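As a quick check of the definition (the function name is an illustrative choice), a collection with 9 positive and 5 negative examples has entropy of about 0.940, a maximally mixed collection has entropy 1, and a pure collection has entropy 0:

```python
import math

def entropy(p_pos, p_neg):
    # Entropy(S) = -p+ log2(p+) - p- log2(p-), with 0 * log2(0) taken as 0
    term = lambda p: 0.0 if p == 0 else -p * math.log2(p)
    return term(p_pos) + term(p_neg)

entropy(9 / 14, 5 / 14)   # ~0.940: a mixed collection
entropy(0.5, 0.5)         # 1.0: maximally impure
entropy(1.0, 0.0)         # 0.0: perfectly pure
```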

 Information gain measures the expected reduction in entropy

Information gain is simply the expected reduction in entropy caused by partitioning the examples according to a given attribute. More precisely, the information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v.
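The definition can be computed directly. A sketch, assuming examples are dictionaries mapping attribute names to values (the function and key names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a collection of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute, target):
    # Gain(S, A) = Entropy(S) - sum over v in Values(A) of |Sv|/|S| * Entropy(Sv)
    s = [e[target] for e in examples]
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        s_v = [e[target] for e in examples if e[attribute] == v]
        remainder += len(s_v) / len(s) * entropy(s_v)
    return entropy(s) - remainder
```

An attribute that separates the classes perfectly gains the full entropy of S; an attribute whose partition leaves each subset as mixed as S gains nothing.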

 An illustrative example
4. Issues in decision tree learning

4.1. Avoiding overfitting the data

The algorithm described before grows each branch of the tree just deeply enough to perfectly classify the training examples. While this is sometimes a reasonable strategy, it can lead to difficulties when:

 There is noise in the data.

 The number of training examples is too small to produce a representative sample of the true target function.

A hypothesis overfits the training examples if some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances (i.e., including instances beyond the training set).

There are several approaches to avoiding overfitting in decision tree learning. These can be grouped into two classes:

 Approaches that stop growing the tree earlier, before it reaches the
point where it perfectly classifies the training data.

 Approaches that allow the tree to overfit the data, and then post-prune
the tree.

 Reduced error pruning

Consider each of the decision nodes in the tree to be a candidate for pruning. Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node.

• Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.

• Nodes are pruned iteratively, always choosing the node whose removal most increases the decision tree's accuracy over the validation set.
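A minimal sketch of this greedy procedure, assuming trees are represented as nested dictionaries of the form {attribute: {value: subtree}} with class labels at the leaves; all helper names are illustrative:

```python
from collections import Counter

def classify(tree, example):
    # internal node: {attribute: {value: subtree}}; leaf: a class label
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, e) == e[target] for e in examples) / len(examples)

def decision_nodes(tree, path=()):
    # yield the attribute/value path to every internal (decision) node
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for v, sub in tree[attr].items():
            yield from decision_nodes(sub, path + (attr, v))

def prune_at(tree, path, label):
    # rebuild the tree with the subtree at `path` replaced by the leaf `label`
    if not path:
        return label
    attr, v = path[0], path[1]
    new = {attr: dict(tree[attr])}
    new[attr][v] = prune_at(tree[attr][v], path[2:], label)
    return new

def route(examples, path):
    # training examples that reach the node at `path`
    for attr, v in zip(path[::2], path[1::2]):
        examples = [e for e in examples if e[attr] == v]
    return examples

def reduced_error_prune(tree, train, valid, target):
    # greedily prune the node whose removal most increases validation accuracy,
    # never accepting a prune that performs worse than the current tree
    while isinstance(tree, dict):
        base = accuracy(tree, valid, target)
        best = None
        for path in decision_nodes(tree):
            reached = route(train, path)
            if not reached:
                continue
            # leaf gets the most common class among training examples at the node
            label = Counter(e[target] for e in reached).most_common(1)[0][0]
            cand = prune_at(tree, path, label)
            acc = accuracy(cand, valid, target)
            if acc >= base and (best is None or acc > best[0]):
                best = (acc, cand)
        if best is None:
            break
        tree = best[1]
    return tree
```

Note the role of the separate validation set: the pruned candidate is scored on data the tree was not grown on, which is what lets pruning undo overfitting.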

 Rule post-pruning

i. Infer the decision tree from the training set.

ii. Convert the learned tree into an equivalent set of rules.

iii. Prune (generalize) each rule by removing any preconditions that result in improving its estimated accuracy.

iv. Sort the pruned rules by their estimated accuracy, and consider them
in this sequence when classifying subsequent instances.

4.2. Incorporating continuous-valued attributes

This can be accomplished by dynamically defining new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals.

 In particular, for an attribute A that is continuous-valued, the algorithm can dynamically create a new boolean attribute Ac that is true if A < c and false otherwise.
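A sketch of how such thresholds can be picked: candidate values of c lie midway between adjacent sorted values of A where the classification changes, and each candidate defines a new boolean attribute. The function names and the temperature values below are illustrative:

```python
def candidate_thresholds(values, labels):
    # candidate cut points c lie midway between adjacent sorted values
    # of the continuous attribute where the class label changes
    pairs = sorted(zip(values, labels))
    return [(pairs[i][0] + pairs[i + 1][0]) / 2
            for i in range(len(pairs) - 1)
            if pairs[i][1] != pairs[i + 1][1]]

def make_boolean(values, c):
    # the dynamically created attribute Ac: true if A < c, false otherwise
    return [v < c for v in values]

temps = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]
candidate_thresholds(temps, play)   # midpoints where the class flips
```

Each candidate c can then be evaluated by the information gain of the corresponding boolean attribute, and the best one kept.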
Many tasks involving intelligence or pattern recognition are extremely difficult to automate, but appear to be performed very easily by animals.

 For instance, animals recognize various objects and make sense out of the large amount of visual information in their surroundings, apparently requiring very little effort.
1. Introduction

The neural network of an animal is part of its nervous system, containing a large number of interconnected neurons (nerve cells).

 “Neural” is an adjective for neuron.

 “Network” denotes a graph-like structure.

Artificial neural networks refer to computing systems whose central theme is borrowed from the analogy of biological neural networks.
2. History of neural networks

“The amount of activity at any given point in the brain cortex is the sum
of the tendencies of all other points to discharge into it, such tendencies
being proportionate…” (William James)

1. To the number of times the excitement of other points may have accompanied that of the point in question.

2. To the intensities of such excitements.

3. To the absence of any rival point functionally disconnected with the first point, into which the discharges may be diverted.

1938 Rashevsky initiated studies of neurodynamics, also known as neural field theory, representing activation and propagation in neural networks in terms of differential equations.

1943 McCulloch and Pitts invented the first artificial model for biological neurons using simple binary threshold units.

1949 Hebb introduced his famous learning rule: repeated activation of one neuron by another, across a particular synapse, increases its conductance.

1954 Gabor invented the “learning filter” that uses gradient descent to obtain “optimal” weights that minimize the MSE between the observed output signal and a signal generated based upon past information.

1958 Rosenblatt invented the “perceptron”, introducing a learning method for the McCulloch and Pitts neuron model.

1960 Widrow and Hoff introduced the “Adaline”.

1961 Rosenblatt proposed the “backpropagation” scheme for training multilayer networks.

1969 Minsky and Papert demonstrated the limits of simple perceptrons.

3. Structure and function of a single neuron

3.1. Biological neurons

A typical biological neuron is composed of a cell body, a tubular axon, and a multitude of hair-like dendrites.

The small gap between an end bulb and a dendrite is called a synapse,
across which information is propagated. The axon of a single neuron
forms synaptic connections with many other neurons.

Inhibitory or excitatory signals from other neurons are transmitted to a neuron at its dendrites’ synapses. The magnitude of the signal received by a neuron (from another) depends on the efficiency of the synaptic transmission.

 The cell membrane becomes electrically active when sufficiently excited by the neurons making synapses onto this neuron.

 A neuron will fire if sufficient signals from other neurons fall upon its dendrites in a short period of time, called the period of latent summation.

3.2. Artificial neuron models

 The position on the node (neuron) of the incoming synapse (connection) is irrelevant.

 Each node has a single output value, distributed to other nodes via outgoing links, irrespective of their positions.

 All inputs come in at the same time or remain activated at the same level long enough for the computation of f to occur.

The next level of specialization is to assume that the different weighted inputs are summed.
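Under that assumption, a single node computes f applied to a weighted sum of its inputs. A minimal sketch (function names are illustrative):

```python
def neuron(inputs, weights, f):
    # net input: weighted sum of the inputs; output: activation f applied to it
    net = sum(w * x for w, x in zip(weights, inputs))
    return f(net)

# example with a simple threshold activation
step = lambda net: 1 if net >= 0 else 0
neuron([1, -1], [0.5, 0.5], step)   # net = 0, so the unit fires
```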

Now, it is necessary to establish which function f the neuron has…

 Ramp functions

 Step functions

 Sigmoid functions

 Piecewise linear and Gaussian functions
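The four families of node functions listed above can be written down directly; the parameter names (theta, lo, hi, mu, sigma) are illustrative conventions:

```python
import math

def step(net, theta=0.0):
    # step (threshold) function: fires when the net input reaches theta
    return 1.0 if net >= theta else 0.0

def ramp(net, lo=0.0, hi=1.0):
    # ramp: linear between lo and hi, clipped outside that range
    return max(lo, min(hi, net))

def sigmoid(net):
    # logistic sigmoid: smooth, differentiable squashing into (0, 1)
    return 1.0 / (1.0 + math.exp(-net))

def gaussian(net, mu=0.0, sigma=1.0):
    # Gaussian: maximal response when the net input is near mu
    return math.exp(-((net - mu) ** 2) / (2 * sigma ** 2))
```

The sigmoid's differentiability is what later makes gradient-based training (such as backpropagation) possible, unlike the discontinuous step function.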

4. Neural net architectures

A single node is insufficient for many practical problems, and networks with a large number of nodes are frequently used. The way nodes are connected determines how computations proceed and constitutes an important early design decision by a neural network developer.

 Fully connected networks


 Layered networks

 Acyclic networks

 Feedforward networks

 Modular networks
5. Neural learning

 Correlation learning

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.” (Hebb, 1949)
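Hebb's principle is often summarized as the correlation rule Δwᵢ = η·xᵢ·y: a weight grows when the pre-synaptic input and the post-synaptic output are active together. A minimal sketch (names and the learning rate η are illustrative):

```python
def hebbian_update(weights, x, y, eta=0.1):
    # correlation (Hebbian) rule: delta w_i = eta * x_i * y, so a weight grows
    # only when pre-synaptic input x_i and post-synaptic output y co-occur
    return [w + eta * xi * y for w, xi in zip(weights, x)]
```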

 Competitive learning

Another principle for neural computation is that when an input pattern is presented to a network, different nodes compete to be “winners” with high levels of activity. The competitive process involves self-excitation and mutual inhibition among nodes, until a single winner emerges.

 The connections between input nodes and the winner node are then modified, increasing the likelihood that the same winner continues to win in future competitions.

 The converse of competition is cooperation, found in some neural network models.
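A winner-take-all sketch of this idea, under the simplifying assumption that the winner is found by direct comparison of weight vectors with the input (rather than by simulating self-excitation and mutual inhibition); all names are illustrative:

```python
def winner(x, weight_vectors):
    # the winner is the node whose weight vector best matches the input
    # (here: smallest squared Euclidean distance)
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x))
             for w in weight_vectors]
    return dists.index(min(dists))

def competitive_update(x, weight_vectors, eta=0.5):
    # move only the winner's weights toward the input, so the same winner
    # is more likely to win for similar inputs in future competitions
    j = winner(x, weight_vectors)
    weight_vectors[j] = [wi + eta * (xi - wi)
                         for wi, xi in zip(weight_vectors[j], x)]
    return j
```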

 Feedback-based weight adaptation

If increasing a particular weight leads to diminished performance or larger error, then that weight is decreased as the network is trained to perform better.

 The amount of change made at every step is very small in most networks, to ensure that a network does not stray too far from its partially evolved state, and so that the network withstands some mistakes made by the teacher, feedback, or performance evaluation mechanism.
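A sketch of such feedback-based adaptation for a single linear unit, moving each weight a small step against the error (the quadratic error criterion and all names are assumptions for illustration):

```python
def adapt(w, x, target, eta=0.1):
    # small corrective step: each weight moves opposite the gradient of the
    # squared error (y - target)^2 for a linear unit y = w . x
    y = sum(wi * xi for wi, xi in zip(w, x))
    error = target - y
    return [wi + eta * error * xi for wi, xi in zip(w, x)], error

# repeated small steps shrink the error without overshooting
w = [0.0, 0.0]
for _ in range(50):
    w, e = adapt(w, [1.0, 0.5], 1.0)
```

The small step size eta is exactly the "very small change" mentioned above: a large eta would let a single noisy feedback signal throw the weights far from their partially evolved state.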
6. What can neural networks be used for ?

 Classification

 Clustering

Clustering requires grouping together objects that are similar to each other…

 Pattern association

In pattern association, another important task that can be performed by neural networks, the presentation of an input sample should trigger the generation of a specific output pattern.

 Function approximation

Many computational models can be described as functions mapping some numerical input vectors to numerical outputs. The outputs corresponding to some input vectors may be known from training data, but we may not know the mathematical function describing the actual process that generates the outputs from the input vectors.

 Forecasting

There are many real-life problems in which future events must be predicted on the basis of past history. An example task is predicting the behavior of stock market indices.

 Control applications

Control addresses the task of determining the values for input variables in order to achieve desired values for output variables.
7. Evaluation of networks

 Quality of results

The performance of a neural network is frequently gauged in terms of an error measure:

• Euclidean distance

• Manhattan or Hamming distance

In classification problems, another possible error measure is the fraction of misclassified samples.
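The three error measures listed above, written out for a desired output vector d and an observed output vector o (function names are illustrative):

```python
import math

def euclidean(d, o):
    # Euclidean distance between desired vector d and observed output o
    return math.sqrt(sum((di - oi) ** 2 for di, oi in zip(d, o)))

def manhattan(d, o):
    # Manhattan (city-block) distance; this equals the Hamming distance
    # when both vectors are binary
    return sum(abs(di - oi) for di, oi in zip(d, o))

def misclassification_rate(predicted, actual):
    # fraction of misclassified samples, for classification problems
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)
```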

 Generalizability

It is not surprising for a system to perform well on the data on which it has been trained. But good generalizability is also necessary, i.e., the system must perform well on new test data distinct from the training data.

 Computational resources

Once training is complete, many neural networks take up very little time in their execution or application to a specific problem. However, training the networks or applying a learning algorithm can take a very long time.
8. Real applications of neural networks