Genetic Algorithms

Overview
Genetic Algorithms: a gentle introduction
What are GAs?
How do they work? Why?
Critical issues

Use in Data Mining


GAs and statistics
decile performance maximization
multi-objective models

Natural Genetics to AI
Computational models inspired by biological evolution
survival of the fittest
reproduction through cross-breeding

Population based search (parallel)

simultaneous search from multiple points in the search space
useful in complex, unstructured search spaces (less prone to getting stuck at local optima)

Population members: potential solutions

Population of solutions evolves from one generation to the next

Genetic Algorithms
Search objective
Fitness score for population members (fitness function)

Survival of the fittest


selection

Generating new solutions


Mating and reproduction of individuals (crossover, mutation)

Basic Operation

[Figure: one GA generation. Generation t holds String1 ... StringN with fitness values f1 ... fN. Selection picks strings, possibly more than once (String1, String2, String2, String4, ..., Stringx). Recombination (crossover, mutation) of the selected pairs produces Generation t+1: Offspring1(1,4), Offspring2(1,4), Offspring3(2,7), Offspring4(2,7), ..., OffspringN(x,y).]
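
The select / recombine / replace cycle on this slide can be sketched as a short loop. This is a minimal illustration, not the method of any particular GA package: the bit-string encoding, one-point crossover, the parameter defaults, and the name `evolve` are all assumptions made for the example.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=40,
           p_cross=0.7, p_mut=0.01, seed=0):
    """Minimal generational GA over bit strings (illustrative sketch only)."""
    if pop_size % 2:
        raise ValueError("pop_size must be even so parents can be paired")
    rng = random.Random(seed)
    # Generation 0: random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        # Selection: sample parents with probability proportional to fitness.
        # The tiny offset (an assumption) keeps weights valid if all scores are 0.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Recombination: pair the parents, apply one-point crossover, then mutation.
        next_pop = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < p_cross:
                cut = rng.randint(1, n_genes - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                next_pop.append([1 - g if rng.random() < p_mut else g
                                 for g in child])
        pop = next_pop  # generation t+1 replaces generation t
    return max(pop, key=fitness)

# Toy objective (assumed for the example): maximize the number of 1 bits.
best = evolve(fitness=sum, n_genes=16)
```

Because the random stream is seeded, repeated calls with the same arguments return the same individual; changing `seed` gives an independent run.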

GAs: Parallel Search

[Figure: fitness landscape (axis: Fitness) showing a hill climber]

GAs: Basic Principles


Representation of individuals
String of parameters (genes) : chromosome
eg. optimize a function F(p,q,r,s,t) Population members: p q r s t

genotype and phenotype

Binary representation?
Population members as bit strings
F(p,q,r,s,t) as: 10011010110110011010

early theory in terms of binary strings (schema theorem)
unnecessary perversity?
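
To make the genotype/phenotype distinction concrete, the sketch below decodes the 20-bit string from the slide into five parameter values. The slide does not say how the bits are allocated, so the equal split into 4-bit unsigned integers is an assumption, as is the function name `decode`.

```python
def decode(bits, n_params=5):
    """Split a bit string (genotype) into equal-width unsigned integers (phenotype).

    Assumption: every parameter uses the same number of bits.
    """
    width = len(bits) // n_params
    return [int(bits[i * width:(i + 1) * width], 2) for i in range(n_params)]

# The chromosome shown on the slide, read as p, q, r, s, t.
p, q, r, s, t = decode("10011010110110011010")
```

Under this assumed layout the five 4-bit fields are 1001, 1010, 1101, 1001, 1010. A real-valued parameter would additionally need a mapping from the integer range onto its interval.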

GAs: Basic Principles


Survival of the fittest (Fitness function)
numerical figure of merit/utility measure of an individual
tradeoff among multiple evaluation criteria
efficient evaluation
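
One common way to express a tradeoff among several criteria as a single figure of merit is a weighted sum. The slide does not prescribe this form; the sketch below, including the name `weighted_fitness` and the example numbers, is an assumption for illustration.

```python
def weighted_fitness(criteria, weights):
    """Collapse several evaluation criteria into one numerical figure of merit."""
    if len(criteria) != len(weights):
        raise ValueError("one weight per criterion is required")
    return sum(w * c for w, c in zip(weights, criteria))

# Two hypothetical criteria scored 0.5 and 0.25, weighted 2 and 4.
score = weighted_fitness(criteria=[0.5, 0.25], weights=[2.0, 4.0])
```

The weights encode how much of one criterion the modeler is willing to give up for another, so they are part of the problem formulation rather than a GA parameter.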

GAs: Basic Principles


Iterative search
population evolves over generations

Convergence
progression towards uniformity in population
premature convergence? (local optima)
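
"Progression towards uniformity" can be monitored with a simple statistic. The sketch below uses one possible measure, the fraction of gene positions on which the whole population agrees; both the measure and the name `converged_fraction` are assumptions, not something the slides define.

```python
def converged_fraction(population):
    """Fraction of gene positions at which every individual carries the same value."""
    n_genes = len(population[0])
    same = sum(1 for i in range(n_genes)
               if len({ind[i] for ind in population}) == 1)
    return same / n_genes

# Three 5-gene individuals that agree on positions 0, 1 and 3.
pop = ["10110", "10011", "10111"]
fraction = converged_fraction(pop)
```

A value that approaches 1 while the best fitness is still poor is one practical warning sign of premature convergence.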

Typical GA Run

[Figure: fitness (y-axis) versus generations (x-axis), with curves for best and average fitness]

Operators: Selection
Fitness proportionate selection (f_i / f_avg)
number of reproductive trials for individuals
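
Under fitness proportionate selection, the expected number of reproductive trials of an individual is its fitness divided by the population average. A small sketch with made-up fitness values (the name `expected_trials` is an assumption):

```python
def expected_trials(fitnesses):
    """Expected number of reproductive trials per individual: f_i / f_avg."""
    avg = sum(fitnesses) / len(fitnesses)
    return [f / avg for f in fitnesses]

# Hypothetical fitness values for a population of four.
trials = expected_trials([1.0, 2.0, 3.0, 2.0])
```

The expected trials always sum to the population size, so an above-average individual gains copies only at the expense of below-average ones.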

Selection
Roulette-wheel selection
(stochastic sampling with replacement)
wheel spaced in proportion to fitness values
N (pop size) spins of the wheel

Stochastic universal sampling
N equally spaced pins on wheel
single turn of the wheel
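
The two sampling schemes above can be sketched side by side on the same wheel. This is an illustrative sketch that assumes non-negative fitness values with a positive total; the function names are chosen for the example.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette_wheel(population, fitnesses, rng):
    """N independent spins of the wheel: stochastic sampling with replacement."""
    wheel = list(accumulate(fitnesses))   # cumulative slot boundaries
    last = len(population) - 1
    return [population[min(bisect_right(wheel, rng.random() * wheel[-1]), last)]
            for _ in population]

def stochastic_universal_sampling(population, fitnesses, rng):
    """A single turn of the wheel, read off at N equally spaced pins."""
    wheel = list(accumulate(fitnesses))
    step = wheel[-1] / len(population)    # distance between neighbouring pins
    start = rng.random() * step           # the one random turn
    last = len(population) - 1
    return [population[min(bisect_right(wheel, start + k * step), last)]
            for k in range(len(population))]

rng = random.Random(0)
pop, fit = ["a", "b", "c", "d"], [4.0, 2.0, 1.0, 1.0]
spins = roulette_wheel(pop, fit, rng)
pins = stochastic_universal_sampling(pop, fit, rng)
```

With these fitness values the expected trials are 2, 1, 0.5 and 0.5. Stochastic universal sampling always gives "a" exactly two copies and "b" exactly one, and leaves only the last slot to chance; roulette-wheel spins can deviate much further from the expected counts.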

Premature convergence
Fitness scaling: f' = f - (2*avg - max)
Ranked fitness
Elitism
Steady-state selection
Demetic grouping
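
Three of the remedies listed here are easy to sketch. The scaling function applies the slide's formula f' = f - (2*avg - max); clamping negative results to zero is an added assumption so the output can still feed fitness proportionate selection. The ranking and elitism helpers are likewise illustrative, with hypothetical names.

```python
def scale_fitness(fitnesses):
    """Apply f' = f - (2*avg - max); negatives are clamped to 0 (an assumption)."""
    avg = sum(fitnesses) / len(fitnesses)
    offset = 2 * avg - max(fitnesses)
    return [max(0.0, f - offset) for f in fitnesses]

def rank_fitness(fitnesses):
    """Replace raw fitness by rank: worst gets 1, best gets N."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    ranks = [0] * len(fitnesses)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def with_elitism(old_pop, old_fit, new_pop, n_elite=1):
    """Carry the n_elite best old individuals into the new population unchanged."""
    elite = sorted(zip(old_fit, old_pop), key=lambda pair: pair[0],
                   reverse=True)[:n_elite]
    return [ind for _, ind in elite] + new_pop[n_elite:]

scaled = scale_fitness([2.0, 2.0, 2.0, 2.0, 12.0])
ranks = rank_fitness([0.1, 100.0, 5.0])
survivors = with_elitism(["a", "b", "c"], [1, 9, 4], ["x", "y", "z"])
```

In the scaling example the raw best is three times the average, and after scaling it is exactly twice the average. Whenever no value is clamped, the formula gives a scaled maximum of 2*(max - avg) and a scaled average of max - avg, which caps the advantage of a dominant individual.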

Operators: Crossover
Parent 1: axpsqvqbtpihd
Parent 2: qzxxaycgbtphw
crossover sites

Offspring 1: azpsavcbtpphd
Offspring 2: qxxxqyqgbtihw

(Uniform crossover)

combining good building blocks
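
The offspring on this slide can be reproduced with a per-gene mask that says which parent each position is taken from. The mask below was read off from the slide's strings; where both parents carry the same letter the mask value is arbitrary. In a real run the mask would be drawn at random. The function name is an assumption.

```python
def uniform_crossover(parent1, parent2, mask):
    """Child 1 takes parent1's gene where mask is True, parent2's otherwise;
    child 2 gets the complementary choice at every position."""
    child1 = "".join(a if m else b for a, b, m in zip(parent1, parent2, mask))
    child2 = "".join(b if m else a for a, b, m in zip(parent1, parent2, mask))
    return child1, child2

# Mask inferred from the slide example (1 = Offspring 1 inherits from Parent 1).
mask = [c == "1" for c in "1011010111011"]
offspring1, offspring2 = uniform_crossover("axpsqvqbtpihd", "qzxxaycgbtphw", mask)
```

A random mask could be built as `[rng.random() < 0.5 for _ in parent1]` with `rng = random.Random(seed)`.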

Operators: Mutation
alters each gene with small probability
Before: x1yx0y0yy0x yxy
After:  x1yx0y1yy0x xxy
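
Per-gene mutation can be sketched as follows. The rule "replace the gene by a different symbol from the alphabet" is an assumption; the slide only states that each gene is altered with small probability.

```python
import random

def mutate(chromosome, alphabet, rate, rng):
    """Independently for each gene, with probability `rate`,
    replace it by a different symbol drawn from `alphabet`."""
    out = []
    for gene in chromosome:
        if rng.random() < rate:
            gene = rng.choice([s for s in alphabet if s != gene])
        out.append(gene)
    return "".join(out)

rng = random.Random(0)
unchanged = mutate("0101", "01", 0.0, rng)   # rate 0: nothing is altered
flipped = mutate("0101", "01", 1.0, rng)     # rate 1 on bits: every bit flips
```

With a small rate such as 0.01, most offspring pass through unchanged and only an occasional gene is altered.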

Non-Binary Representations
Integer, real-number, order-based, rules, ...
Binary or Real-valued?
real representations give faster, more consistent, more accurate results

High-level representation
intuitive, can utilize specialized operators
effective search over complex spaces

Real-valued representation
Parent1: 3.45 0.56 6.78 0.976 2.5
Parent2: 0.98 1.06 4.20 0.34 1.8

Offspring1: 3.22 0.56 6.78 0.65 2.12
Offspring2: 1.43 1.06 4.20 0.41 1.93

(Arithmetic crossover)
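
A common form of arithmetic crossover blends every gene of the two parents with a single weight. The slide does not state the weights it used, and its offspring copy some genes unchanged, so the sketch below is one standard variant rather than a reconstruction of the slide's numbers. The names `arithmetic_crossover` and `alpha` are assumptions.

```python
def arithmetic_crossover(parent1, parent2, alpha):
    """Blend two real-valued chromosomes gene by gene.

    child1 = alpha*p1 + (1-alpha)*p2, child2 = (1-alpha)*p1 + alpha*p2,
    with 0 <= alpha <= 1 so each child gene lies between the parents' genes.
    """
    child1 = [alpha * a + (1 - alpha) * b for a, b in zip(parent1, parent2)]
    child2 = [(1 - alpha) * a + alpha * b for a, b in zip(parent1, parent2)]
    return child1, child2

# Small made-up example with alpha = 0.25.
c1, c2 = arithmetic_crossover([1.0, 4.0], [3.0, 0.0], alpha=0.25)
```

Unlike crossover on bit strings, this operator creates gene values that neither parent had, which is one reason specialized operators are attractive for real-valued representations.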

High-level representation
Parent1: {(1.2 x1 3.4) (5.8 x2 6.0) (0.2 x7 0.61)}
Parent2: {(2.3 x6 41) (36 x2 51) (51 x4 561) (03 x3 11) (2.2 x9 2.7)}

Offspring1: {(1.2 x1 3.4) (2.2 x9 2.7) (51 x4 561)}
Offspring2: {(2.3 x6 41) [(36 x2 51) (5.8 x2 6.0)] (03 x3 11) (0.2 x7 0.61)}

High-level representation
Generalize/Specialize
{(03 x3 11) (2.2 x9 2.7)} -> {(03 x3 11) (2.2 x9 2.7) (51 x4 62)}

{(03 x3 11) (2.2 x9 2.7)} -> {(045 x3 09) (19 x9 2.9)}

Tree-structured representation (GP)


Automated learning of programs (originally)
parse tree expressions

Non-linear interaction terms

Function set: internal nodes
{+,-,*,/,log}

Terminal set: leaf nodes
{constants, variables}

[Figure: parse tree for the expression (x log(y))/5]
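
A parse tree built from this function set and terminal set can be stored as nested tuples and evaluated recursively. The tuple encoding and the names `FUNCTIONS` and `evaluate` are assumptions made for this sketch; the tree is the slide's example expression.

```python
import math

# Function set: internal nodes.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "log": lambda a: math.log(a),
}

def evaluate(node, variables):
    """Evaluate a parse tree given a dict of variable values."""
    if isinstance(node, tuple):           # internal node: (function, child, ...)
        name, *children = node
        return FUNCTIONS[name](*(evaluate(c, variables) for c in children))
    if isinstance(node, str):             # leaf node: variable
        return variables[node]
    return node                           # leaf node: constant

# (x * log(y)) / 5, the expression from the slide.
tree = ("/", ("*", "x", ("log", "y")), 5)
value = evaluate(tree, {"x": 10.0, "y": 1.0})
```

Randomly generated trees can divide by zero or take the log of a non-positive number, so an implementation would need some guard around `/` and `log`; this sketch leaves them unguarded.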

Tree-structured representation
Representing complex patterns
If (y<7) and (x>2) then 0 else 2x+y

[Figure: parse tree for this rule, with an "if" root and AND, <, > nodes]
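
The rule on this slide can be written as a tree and evaluated the same way as an arithmetic expression, once "if" and the comparison and logical functions join the function set. The tuple encoding and all names below are assumptions for illustration.

```python
import operator

# Binary functions allowed at internal nodes in this sketch.
BINARY = {
    "and": lambda a, b: a and b,
    "<": operator.lt,
    ">": operator.gt,
    "+": operator.add,
    "*": operator.mul,
}

def evaluate(node, env):
    """Evaluate a rule tree; only the chosen branch of an 'if' is evaluated."""
    if isinstance(node, str):             # leaf node: variable
        return env[node]
    if not isinstance(node, tuple):       # leaf node: constant
        return node
    op, *args = node
    if op == "if":
        condition, then_branch, else_branch = args
        chosen = then_branch if evaluate(condition, env) else else_branch
        return evaluate(chosen, env)
    left, right = (evaluate(arg, env) for arg in args)
    return BINARY[op](left, right)

# if (y < 7) and (x > 2) then 0 else 2x + y
rule = ("if",
        ("and", ("<", "y", 7), (">", "x", 2)),
        0,
        ("+", ("*", 2, "x"), "y"))

inside = evaluate(rule, {"x": 3, "y": 5})    # both conditions hold
outside = evaluate(rule, {"x": 1, "y": 5})   # x > 2 fails, so 2x + y applies
```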

Genetic search: Issues
Coding scheme, fitness function critical
the art in GA design!
General mechanism so robust that, within reasonable margins, parameter settings are not critical.

Representation to match problem, domain


utilizing domain knowledge
problem-specific crossover, mutation, selection

Flexibility in fitness function formulation


modeling business objectives

Genetic search: Issues


Stochastic search
initial populations, probabilistic operators
multiple runs with different random streams
Initializing population with known solutions
seeding initial population with solutions from multiple, independent runs
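
Seeding and independent random streams can be sketched as below. The bit-string encoding and the name `seeded_population` are assumptions; the known solution `[1, 1, 1, 1]` is made up for the example.

```python
import random

def seeded_population(known, pop_size, n_genes, rng):
    """Start from known solutions, then fill the remaining slots at random."""
    pop = [list(ind) for ind in known[:pop_size]]
    while len(pop) < pop_size:
        pop.append([rng.randint(0, 1) for _ in range(n_genes)])
    return pop

# One independent random stream per run, each seeded with the same known solution.
runs = [seeded_population([[1, 1, 1, 1]], pop_size=6, n_genes=4,
                          rng=random.Random(seed))
        for seed in (1, 2, 3)]
```

Giving each run its own `random.Random(seed)` keeps the runs reproducible and independent. The best individuals from several such runs can in turn be passed as `known` to a final run.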

Genetic search: Issues


Guarantees optimality?
But...

GAs and traditional techniques


especially useful where traditional approaches fail
in conjunction with traditional techniques

Parallelizable for large data


multi-processor, networked machines

Using GAs?
When to use a GA?
GA and traditional techniques
How long does it take?
Will it perform better?

Using GAs
population size
mutation, crossover rates
how many generations
multiple runs

Is it a black-box?

[Figure: a box marked "?" with the labels Data characteristics, Fitness function, GA parameters, and "Huh?"]

GA Application Examples
Function optimizers
difficult, discontinuous, multi-modal, noisy functions

Combinatorial optimization
layout of VLSI circuits, factory scheduling, traveling salesman problem

Design and Control


bridge structures, neural networks, communication networks design; control of chemical plants, pipelines

GA Application Examples
Machine learning
classification rules, economic modeling, scheduling strategies
Portfolio design, optimized trading models, direct marketing models, sequencing of TV advertisements, adaptive agents, data mining, etc.
