7. Advanced Optimization Methods
So far in this book, we have only studied and used Stochastic Gradient Descent (SGD) to optimize
our networks, but there are other optimization methods used in deep learning. Specifically,
these more advanced optimization techniques seek to either:
1. Reduce the amount of time (i.e., number of epochs) needed to obtain reasonable classification accuracy.
2. Make the network more "well-behaved" for a larger range of hyperparameters other than the
learning rate.
3. Ideally, obtain higher classification accuracy than what is possible with SGD.
With the latest incarnation of deep learning, there has been an explosion of new optimization
techniques, each seeking to improve on SGD and provide the concept of adaptive learning rates. As
we know, SGD modifies all parameters in a network equally in proportion to a given learning rate.
However, given that the learning rate of a network is (1) the most important hyperparameter to tune
and (2) a hard, tedious hyperparameter to set correctly, deep learning researchers have postulated
that it's possible to adaptively tune the learning rate (and in some cases, per parameter) as the
network trains.
In this chapter, we'll review adaptive learning rate methods. I'll also provide suggestions on
which optimization algorithms you should be using in your own projects.
Adaptive Learning Rate Methods
In order to understand each of the optimization algorithms in this section, we are going to examine
them in terms of pseudocode, specifically the update step. Much of this chapter has been inspired
by the excellent overviews of optimization methods by Karpathy [26] and Ruder [27]. We'll
extend (and in some cases, simplify) their explanations of these methods to make the content more
digestible.
To get started, let's take a look at an algorithm we are already familiar with: the update phase
of vanilla SGD.
W += -lr * dW
Here we have three values:
1. W: Our weight matrix.
2. lr: The learning rate.
3. dW: The gradient of W.
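As a runnable sketch, the vanilla SGD update can be written directly in NumPy. The weight, gradient, and learning rate values below are hypothetical toy numbers chosen purely for illustration:

```python
import numpy as np

# hypothetical toy values: a weight vector and its gradient
W = np.array([0.5, -0.3, 0.8])
dW = np.array([0.1, -0.2, 0.05])
lr = 0.01

# vanilla SGD update: step against the gradient,
# scaled by a fixed learning rate
W += -lr * dW
```

Every parameter is moved by the same fixed fraction of its gradient; nothing in this update adapts the step size per parameter, which is exactly what the methods below change.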
Our learning rate here is fixed and, provided it is small enough, we know our loss will decrease
during training. We've also seen extensions to SGD which incorporate momentum and Nesterov
acceleration in Chapter 7. Given this notation, let's explore common adaptive learning rate
optimizers you will encounter in your deep learning career.
Adagrad
The first adaptive learning rate method we are going to explore is Adagrad, first introduced by
Duchi et al. [28]. Adagrad adapts the learning rate to the network parameters. Larger updates are
performed on parameters that change infrequently, while smaller updates are done on parameters
that change frequently.
Below we can see a pseudocode representation of the Adagrad update:
cache += (dW ** 2)
W += -lr * dW / (np.sqrt(cache) + eps)
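To see this per-parameter behavior concretely, here is a minimal NumPy sketch of the Adagrad update run for a number of iterations on hypothetical toy values (the weights, gradients, and iteration count are illustrative assumptions, not values from the text):

```python
import numpy as np

# hypothetical toy values: three parameters with a constant gradient
W = np.zeros(3)
cache = np.zeros(3)          # per-parameter sum of squared gradients
lr = 0.01
eps = 1e-8                   # avoids division by zero

dW = np.array([1.0, 0.1, 0.0])

for _ in range(100):
    # accumulate the squared gradient for each parameter
    cache += dW ** 2
    # parameters with a larger cache receive smaller effective updates
    W += -lr * dW / (np.sqrt(cache) + eps)
```

With a constant gradient, the cache normalizes away the gradient's magnitude, so the first two parameters end up with nearly identical updates despite gradients that differ by a factor of ten, while the third parameter, whose gradient is always zero, is never touched.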