2 Stage 2
2.1 Matrix manipulation in R
2.2 Exercises 1 and 2: matrix manipulation in R
2.3 Exercise: matrix manipulation in R
2.3.1 Subsets
2.4 Matplot in R
2.5 Functions in R
2.5.1 Using functions
2.6 Handling data frames in R, vol. 1
2.6.1 Exploring the data
2.6.2 Basic operations with data frames
2.7 Handling data frames in R, vol. 2
2.8 Visualizing data frames in R
2.8.1 Viewing only what we need
2.8.2 Enrichment of data frames in R
2.8.3 Enriching data frames
2.8.4 Visualizing with a new division
2.9 Exercise: handling data frames in R
2.9.1 Filtering data frame information with TRUE/FALSE
2.10 Plotting in R
2.10.1 Aesthetics
2.10.2 Plotting by layers
2.10.3 Overriding the plot aesthetics
2.10.4 Mapping vs. setting
2.10.5 Histograms and density plots
2.11 Exercise: plotting in R
2.12 Exercise: structuring data in R
2.13 Exercise 2: structuring data in R
2.14 Introduction to data cleaning in R
2.15 Follow-up on data cleaning in R
2.15.1 Replacing missing information: fact-based analysis
3 Stage 3
3.1 Exercise: data handling in R
3.2 Exercise: data transformation in R
3.3 Graphical visualization in R: VGchartz results
1 Stage 1
1.1 Installing packages
Packages used in the Data Science course
Block 2 packages: main package, ggplot2
install.packages("ggplot2")
Packages that ggplot2 requires in order to work
install.packages("assertthat")
install.packages("bio3d")
install.packages("cli")
install.packages("colorspace")
install.packages("fansi")
install.packages("glue")
install.packages("gtable")
install.packages("labeling")
install.packages("lazyeval")
install.packages("munsell")
install.packages("pillar")
install.packages("plyr")
install.packages("RColorBrewer")
install.packages("reshape2")
install.packages("scales")
install.packages("stringi")
install.packages("stringr")
install.packages("tibble")
install.packages("utf8")
install.packages("viridisLite")
install.packages("withr")
Block 3 packages: main packages, ggplot2 and gridExtra
install.packages("ggplot2")
install.packages("gridExtra")
Block 4 packages: main packages, ggplot2 and stringr
install.packages("ggplot2")
install.packages("stringr")
Block 5 packages: main packages, ggplot2 and shiny
install.packages("ggplot2")
install.packages("shiny")
Packages that shiny requires in order to work
install.packages("BH")
install.packages("crayon")
install.packages("digest")
install.packages("htmltools")
install.packages("httpuv")
install.packages("jsonlite")
install.packages("later")
install.packages("magrittr")
install.packages("mime")
install.packages("promises")
install.packages("R6")
install.packages("Rcpp")
install.packages("rlang")
install.packages("sourcetools")
install.packages("xtable")
1.2 First steps
Unlike in many other programming languages, at the R console it is enough to type a value in quotation marks (" ") to have it printed in the terminal. R is an intuitive programming language, and it becomes a highly efficient tool that is simple to use once you master the basics.
"Hello world!"
Hello world!
1.3 Variables in R
Integer
x <- 2L
typeof(x)  # typeof() reports the type of the variable
integer
Double
y <- 2.5
typeof(y)
double
Complex
z <- 3+2i
typeof(z)
complex
Character
a <- "h"
b <- "2"
typeof(a)
character
typeof(b)
character
logical
Logical, F = FALSE
q2 <- F  # F is shorthand for FALSE
typeof(q2)
logical
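As a compact recap of this section, the following sketch checks one value of each basic type in a single pass. The names `valores` and `tipos` are illustrative and do not come from the course material, and `TRUE` is used for the logical case.

```r
# One value of each basic type (illustrative values)
valores <- list(2L, 2.5, 3+2i, "h", TRUE)

# Apply typeof() to every element; vapply() guarantees a character result
tipos <- vapply(valores, typeof, character(1))
print(tipos)
```

Note that a number without the `L` suffix, such as `2`, is stored as a double, which is why the integer example writes `2L`.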
1.4 Handling variables in R
A variable can hold any number, character string, array, function, and so on. It is essentially a space in the computer's memory where you store one or more values. The <- operator assigns a value to the variable ("=" can also be used).
A <- 5
B <- 5
A = 10
B = 10

C <- A + B
C
20
A variable name may combine letters, digits, periods and underscores, but it must start with a letter (or with a period that is not followed by a digit). Names that are simple and intuitive are strongly recommended, so that you do not lose track of the operations and functions you will run later on.
Variable 1
var1 <- 2.5
double
Variable 2
var2 <- 4
resultado <- var1 / var2
resultado
0.625
sqrt() is a function that, as its name (short for "square root") suggests, computes the square root of whatever is placed inside the "()".
resp <- sqrt(var2)
resp
2
saludo <- "Hola"
nombre <- "Bob"
Mensaje <- paste(saludo, nombre)
# paste() joins two or more elements into a single character string
?paste
Mensaje
Hola Bob
HolaBob
Mensaje3 <- paste(saludo, nombre, sep = "@")
Mensaje3
Hola@Bob
HolaBob@gmail.com
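The effect of the `sep` argument can be seen side by side in the sketch below. The variable names on the left are hypothetical. The calls with `sep = ""` and `paste0()` are an assumption about how outputs such as HolaBob and HolaBob@gmail.com can be produced, since the corresponding listings are not shown in these notes.

```r
saludo <- "Hola"
nombre <- "Bob"

con_espacio <- paste(saludo, nombre)             # default separator is a single space
sin_espacio <- paste(saludo, nombre, sep = "")   # empty separator
con_arroba  <- paste(saludo, nombre, sep = "@")
correo      <- paste0(sin_espacio, "@gmail.com") # paste0() behaves like paste() with sep = ""

print(c(con_espacio, sin_espacio, con_arroba, correo))
```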
1.5 TRUE and FALSE in R
Logical expressions are the backbone of many of the functions that we will use throughout the course routines.
4 < 5
TRUE
10 > 100
FALSE
4 == 5
FALSE
== equal to
!= not equal to
< less than
> greater than
<= less than or equal to
>= greater than or equal to
! not
| or
& and
isTRUE(x) tests whether x is a single TRUE value
TRUE
typeof(res)
logical
FALSE
FALSE
TRUE
The "&" operator returns FALSE if either of its two operands is FALSE.
res & res2
FALSE
isTRUE(res)
TRUE
isTRUE(4 > 5)
FALSE
isTRUE(4 < 5)
TRUE
FALSE
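The operators from the table can be tried together in a short sketch. The definitions of `res` and `res2` below are assumptions (their original listings are not shown), chosen so that they agree with the outputs of `res & res2` and `isTRUE(res)` in this section. The other names are hypothetical.

```r
res  <- 4 < 5      # TRUE  (assumed definition)
res2 <- 10 > 100   # FALSE (assumed definition)

y_logico <- res & res2   # "and": TRUE only if both operands are TRUE
o_logico <- res | res2   # "or": TRUE if at least one operand is TRUE
negacion <- !res         # "not": flips the value

print(c(y_logico, o_logico, negacion, isTRUE(res)))
```

`isTRUE()` is stricter than a plain comparison: it returns TRUE only for a single TRUE value, so `isTRUE(c(TRUE, TRUE))` is FALSE.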
1.6 while loop in R
While = "as long as". A while loop follows this idea: while (this is true), run this; as soon as the condition becomes false, it stops running.
while (TRUE) {
  print("Hola")
}  # prints Hola indefinitely, because the condition is always TRUE
conteo <- 1
while (conteo < 12) {
  print(conteo)
}  # conteo is never updated, so it always stays below 12 and the 1 is printed indefinitely
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10 11 12
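A loop that does stop needs to update the counter inside its body. The sketch below is one way to do that; `vistos` is a hypothetical helper vector that records each value so the result can be inspected afterwards. With the condition `conteo < 12` the loop visits 1 to 11; changing it to `conteo <= 12` would also include 12.

```r
conteo <- 1
vistos <- numeric(0)        # hypothetical helper: collects every value that was visited

while (conteo < 12) {
  vistos <- c(vistos, conteo)
  conteo <- conteo + 1      # without this update the condition never becomes FALSE
}

print(vistos)
```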
1.7 for loop in R
"for" loops run an instruction once for each element of the vector, list or table they are given.
Hola R
Hola R
Hola R
Hola R
Hola R
1 typeof ( i )
integer
Hola R
Hola R
Hola R
Hola R
Hola R
Hola R
for (i in 5:10) {
  print(i)
}
5
6
7
8
9
10
This simple exercise shows that i is a variable that we can use inside the body of the loop.
In the next example, the loop variable "i" takes the value of each element of the vector "fruta" in turn.
for (i in fruta) {
  i <- paste(i, "es una fruta", sep = ",")  # paste() joins two or more character strings, separated by "sep"
  print(i)
}
fruta[1]
fruta[4]
fruta[5]  # not an error: indexing past the end of the vector returns NA
Manzana
Plátano
NA
1234
1 4 9 16
1
4
9
16
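The two uses of "for" in this section, looping over the elements of a character vector and looping over a range of numbers, can be sketched as follows. Only the first and fourth elements of `fruta` are known from the outputs; "Pera" and "Uva" are hypothetical fillers, and `cuadrados` is an illustrative name.

```r
fruta <- c("Manzana", "Pera", "Uva", "Plátano")  # "Pera" and "Uva" are hypothetical

# Loop over the elements themselves
for (i in fruta) {
  print(paste(i, "es una fruta"))
}

# Indexing past the end of a vector yields NA rather than an error
fuera_de_rango <- fruta[5]

# Loop over a range of numbers and collect the squares
cuadrados <- numeric(0)
for (n in 1:4) {
  cuadrados <- c(cuadrados, n^2)
}
print(cuadrados)
```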
1.8 if-else in R
rnorm = generator of random numbers drawn from a normal distribution
rnorm(1) = generates one random number; changing the argument to n generates n random numbers as a vector.
-3 ---- -2 ---- -1 ---- 0 ---- 1 ---- 2 ---- 3
-0.5551363
rm = removes (a variable from the global environment)
rm(respuesta)
An "if" is a conditional statement rather than a loop, and it already has a wide range of uses on its own. Its real strength, however, shows when it is combined with "else".
Nested "if"
if (x > 1) {
  respuesta <- "mayor que 1"
} else {  # nested else { if }: defines a new rule or condition
  if (x >= -1) {
    respuesta <- "Entre -1 y 1"
  } else {  # a final else covers the case that is still missing
    respuesta <- "menor que -1"
  }
}
Chaining conditions
rm(respuesta)
x <- rnorm(1)
if (x > 1) {
  respuesta <- "Mayor que 1"
  # else if reads as: otherwise, if (this) happens { then run this }
} else if (x >= -1) {  # else if is a more elegant, single-line form of the nested else { if {} }
  respuesta <- "Entre -1 y 1"
} else {  # if neither option 1 nor option 2 applied, then { this }
  respuesta <- "Menor que -1"
}
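Wrapping the chained conditions in a function makes it easy to check the boundaries with fixed values instead of random ones. This is only a sketch; the function name `clasificar` is hypothetical and does not appear in the course material.

```r
# Classify a single number using the same chain of conditions as above
clasificar <- function(x) {
  if (x > 1) {
    "Mayor que 1"
  } else if (x >= -1) {
    "Entre -1 y 1"
  } else {
    "Menor que -1"
  }
}

print(clasificar(rnorm(1)))   # random input, as in the listing above
print(clasificar(1))          # boundary value: 1 is not greater than 1
```

Because the first test is `x > 1` and the second is `x >= -1`, both boundary values 1 and -1 fall in the middle category.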
1.9 Exercise: loops in R
Problem: I want to count how many times rnorm() lands between -1 and 1 in N draws.
N <- 10000
rnorm(N)  # generates N random numbers from a standard normal distribution; about 99.7% of them fall between -3 and 3
a <- 0
Which loop can visit each element of a vector to carry out a task?
Answer: "for"
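One possible way to finish the exercise is sketched below: a "for" loop visits every draw and increases the counter `a` when the value lies between -1 and 1. The fixed seed and the names `x`, `valor` and `fraccion` are assumptions added so that the run is repeatable; they are not part of the original exercise.

```r
set.seed(123)   # assumption: fixed seed so the result is repeatable
N <- 10000
x <- rnorm(N)   # N draws from a standard normal distribution

a <- 0
for (valor in x) {
  if (valor >= -1 && valor <= 1) {
    a <- a + 1  # count the draws that fall between -1 and 1
  }
}

fraccion <- a / N
print(fraccion)
```

The same count can be obtained without a loop as `sum(x >= -1 & x <= 1)`, which is a convenient cross-check.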
1.10 Assignment 1 in R
I want to evaluate how accurately the model used by the programming language generates random numbers with a normal distribution.
0.1%--2.1%-----13.6%--------68.2%-------13.6%-----2.1%--0.1%
<-3---- -2 -------- -1 --- 0 --- 1 -------- 2 ---- 3>
# Dividing each count by the total N gives the fraction of each event
UnoMenosUno <- a / N
MenosUnoMenosDos <- b / N
MenosDosMenosTres <- c / N
UnoDos <- d / N
DosTres <- e / N

# Multiplying the result by 100 gives the percentage of each event
UnoMenosUno * 100        # Expected 68.2%
MenosUnoMenosDos * 100   # Expected 13.6%
UnoDos * 100             # Expected 13.6%
MenosDosMenosTres * 100  # Expected 2.1%
DosTres * 100            # Expected 2.1%
Entre 1 y 2
Entre -1 y 1
Entre -1 y 1
Entre -1 y 1
Entre -1 y 1
...
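The expected percentages in the diagram can be computed from the normal distribution itself with `pnorm()`, which gives a reference to compare the simulated fractions against. This is a sketch; `limites` and `teorico` are hypothetical names.

```r
# Boundaries of the bands shown in the diagram
limites <- c(-3, -2, -1, 1, 2, 3)

# pnorm() is the cumulative probability; differences give the probability of each band
teorico <- diff(pnorm(limites)) * 100
names(teorico) <- c("-3 a -2", "-2 a -1", "-1 a 1", "1 a 2", "2 a 3")

print(round(teorico, 2))
```

The central band comes out at roughly 68.27%, which the diagram shows truncated as 68.2%.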
1.11 Vectors in R
c(*,*,*,*) = combines (numbers or "characters" into a vector).
One restriction of vectors in R is that a single vector cannot mix two types of elements.
Integer = whole number
is.numeric(MiPrimerVector)
TRUE
is.integer(MiPrimerVector)
FALSE
TRUE
TRUE
TRUE
FALSE
What will happen to the 7 in the vector X3?
X3 <- c("Pedro", "Z3", "Hola", 7)
X3
is.character(X3)
is.numeric(X3)
is.integer(X3)
TRUE
FALSE
FALSE
seq(1, 15)
1:15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
seq() lets you specify steps that ":" cannot; its arguments are essentially (from, to, step size).
seq(1, 15, 2)
z <- seq(1, 15, 4)
z
1 3 5 7 9 11 13 15
1 5 9 13
33333333333333333333333333333333333
333333333333333
“a” “a” “a” “a” “a”
40 15 40 15 40 15 40 15 40 15 40 15 40 15 40 15 40 15 40 15
“a”
“b”
“c”
w[1:3] shows elements 1 through 3 of the vector. Remember that ":" works like the "seq()" function: it generates a vector running from 1 to 3, and placing that vector inside "[ ]" selects the elements of w at those positions.
w[1:3]
w[3:5]
“a”, “b”
“a”, “b”
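The three ideas of this section (type coercion inside `c()`, sequences with a step, and selecting elements with `[ ]`) are sketched together below. The contents of `w` are an assumption, since its definition is not shown in these notes, and the names `tipo`, `primeros` and `ultimos` are hypothetical.

```r
# Mixing types: the number 7 is coerced to the character string "7"
X3 <- c("Pedro", "Z3", "Hola", 7)
tipo <- typeof(X3)

# seq(from, to, by): values from 1 to 15 in steps of 4
z <- seq(1, 15, 4)

# Selecting elements by position
w <- c("a", "b", "c", "d", "e")   # assumed contents of w
primeros <- w[1:3]
ultimos  <- w[3:5]

print(list(tipo, X3[4], z, primeros, ultimos))
```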
1.12 Vector operations in R
"rnorm()" is a function that generates random values that follow a normal distribution.
N <- 10000
a <- rnorm(N)
b <- rnorm(N)
Vectorized approach in R
c <- a * b
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
61: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
81: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
By design, R is a high-level language that sits on top of compiled code: many of its built-in operations call routines written in languages such as C, C++ and Fortran to do the actual work, and RStudio simply displays the result. This is why vectorized operations in R are efficient, typically much faster than the equivalent explicit loop.
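To make the contrast concrete, the sketch below computes the same element-by-element product twice, once with the vectorized `*` and once with an explicit loop, and checks that both give identical results. The seed is an assumption for repeatability. The names `c_vec` and `d` are illustrative (`c_vec` is used instead of `c` to avoid shadowing the `c()` function).

```r
set.seed(1)   # assumption: fixed seed so the run is repeatable
N <- 10000
a <- rnorm(N)
b <- rnorm(N)

# Vectorized approach: one expression multiplies all pairs
c_vec <- a * b

# Loop approach: pre-allocate the result, then fill it position by position
d <- rep(NA_real_, N)
for (i in 1:N) {
  d[i] <- a[i] * b[i]
}

iguales <- identical(c_vec, d)
print(iguales)
```

Wrapping each approach in `system.time()` is a simple way to compare how long they take on your own machine.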
1.13 Exercise: sequence aligner in R
We define the variables holding the sequences that we are going to compare.
# Each 100-base sequence is joined with paste0() so that it fits on two lines
sec1 <- paste0("ATGAAGTATAGTTTGCTCCTCTTCCTTGCTCCGCTTGGAGTATGGAGCCG",
               "TGCCTGTACATGCGGGCAGGCAAATCAGAATGGCGCCTATTCGAGAAATG")
sec2 <- paste0("ATGAAGTATAGTTTCCTCCTCTTCCTTGCTCCGCTTGGAGTATGGAGCCG",
               "TGCCTATACATACGGGCAGGCAAATCCGAATGGCGCCGATTCGAGAAATG")
sec3 <- paste0("ATGAAGTTAAGTTTCCTCCTCTTCCTTGCTCCGCTTGCTGTATGGAGCCG",
               "TGCCTATACATACGGGCAGGCAAATCCGAATGGCGCCGATTCGAGAACTG")
rm(sec1)
rm(sec2)
rm(sec3)
rm(sec1split)
rm(sec2split)
rm(sec3split)
ID <- sec1splitvector == sec2splitvector
ID  # ID stores the result of the comparison, position by position: match (TRUE) or mismatch (FALSE). The result is a logical vector
ID2 <- sec2splitvector == sec3splitvector
ID3 <- sec1splitvector == sec3splitvector
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
The "which()" function returns the index or indices of the elements of a vector that satisfy a given condition.
In essence you are asking "which (of these elements satisfy this rule)".
Dist1F <- which(ID == FALSE)
Dist1F
15 56 62 77 88
1 2 3 4 5 6 7 8 9 10 11 12 13 14 16 17 18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 57 58 59 60 61 63 64 65 66 67 68 69 70 71 72
73 74 75 76 78 79 80 81 82 83 84 85 86 87 89 90 91 92 93 94 95 96
97 98 99 100
Dist2T <- which(ID2 == TRUE)
Dist3T <- which(ID3 == TRUE)
G
G
G
A
T
x1
Dist1F
y
GGGAT
15 56 62 77 88
15 56 62 77 88
A
T
G
A
A
x3 <- character(0)  # empty character vector that will collect the differing bases
for (i in Dist3F) {
  print(sec3splitvector[i])
  x3 <- append(x3, sec3splitvector[i])
}
T
A
C
C
T
A
A
C
G
C
Dist1F
x1
Dist2F
x2
Dist3F
x3
15 56 62 77 88
‘G’ ‘G’ ‘G’ ‘A’ ‘T’
8 9 38 39 98
‘A’ ‘T’ ‘G’ ‘A’ ‘A’
8 9 15 38 39 56 62 77 88 98
‘T’ ‘A’ ‘C’ ‘C’ ‘T’ ‘A’ ‘A’ ‘C’ ‘G’ ‘C’
100
Dividing "length(Dist1T)" by "length(sec1splitvector)" divides the number of TRUE records by the total number of positions, which gives the fraction of identity.
IdenT <- length(Dist1T) / length(sec1splitvector)
0.95
0.05
0.95
0.05
c <- 0
d <- 0
for (i in ID2) {
  if (i == TRUE) {
    c <- c + 1
  } else if (i == FALSE) {
    d <- d + 1
  }
}
Id2enT <- c / length(sec1splitvector)
Id2enF <- d / length(sec1splitvector)
print(Id2enT)
print(Id2enF)
0.95
0.05
e <- 0
f <- 0
for (i in ID3) {
  if (i == TRUE) {
    e <- e + 1
  } else if (i == FALSE) {
    f <- f + 1
  }
}
Id3enT <- e / length(sec1splitvector)
Id3enF <- f / length(sec1splitvector)
print(Id3enT)
print(Id3enF)
0.9
0.1
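The whole comparison of the first two sequences can be sketched end to end as follows. The step that splits each string into single bases is not shown in these notes, so the use of `strsplit()` here is an assumption about how `sec1splitvector` and `sec2splitvector` can be built. `mean(ID)` is used as a shorter equivalent of counting the TRUE values and dividing by the length.

```r
sec1 <- paste0("ATGAAGTATAGTTTGCTCCTCTTCCTTGCTCCGCTTGGAGTATGGAGCCG",
               "TGCCTGTACATGCGGGCAGGCAAATCAGAATGGCGCCTATTCGAGAAATG")
sec2 <- paste0("ATGAAGTATAGTTTCCTCCTCTTCCTTGCTCCGCTTGGAGTATGGAGCCG",
               "TGCCTATACATACGGGCAGGCAAATCCGAATGGCGCCGATTCGAGAAATG")

# Assumed splitting step: one vector element per base
sec1splitvector <- strsplit(sec1, "")[[1]]
sec2splitvector <- strsplit(sec2, "")[[1]]

# Position-by-position comparison: TRUE where the bases match
ID <- sec1splitvector == sec2splitvector

Dist1F <- which(!ID)              # positions where the sequences differ
x1 <- sec1splitvector[Dist1F]     # bases of sec1 at those positions
IdenT <- mean(ID)                 # fraction of identical positions

print(Dist1F)
print(x1)
print(IdenT)
```

Indexing with `sec1splitvector[Dist1F]` collects the differing bases in one step, without the explicit `for` loop and `append()`.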
1.14 Functions in R
"rnorm()" is a function. What it does and how it does it depend on the parameters we pass to it.
?rnorm
rnorm(5, 10, 8)  # 5 random values with mean 10 and standard deviation 8; about 99.7% of such values fall between -14 and 34
?c
c()
NULL
“a” “b” “c”
?seq
seq(from = 10, to = 20, by = 3)
10 13 16 19
seq(from = 10, to = 20, length.out = 100)  # "length.out": total number of values generated from 10 to 20
seq(from = 10, to = 20, length.out = 3)
10 15 20
?rep
rep(5, 10)
rep(5:6, times = 10)
rep(x, times = 5)
5 5 5 5 5 5 5 5 5 5
5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6
"a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c" "a" "b" "c"
5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6
"a" "a" "a" "a" "a" "b" "b" "b" "b" "b" "c" "c" "c" "c" "c"
print(x)
is.numeric(x)
FALSE
is.integer(x)
FALSE
is.double(x)
FALSE
is.character(x)
TRUE
typeof(x)
character
10 15 20
3.162278 3.872983 4.472136
B <- sqrt(A)
paste(B, A, sep = "+")
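How the named arguments change the behaviour of `seq()` and `rep()` can be sketched as follows. The value of `x` is taken from the outputs in this section. The calls using `each` are an assumption about how the grouped outputs (all 5s followed by all 6s, and "a" "a" "a" ...) can be obtained, and the names on the left are hypothetical.

```r
x <- c("a", "b", "c")

# length.out fixes how many evenly spaced values are returned
tres <- seq(from = 10, to = 20, length.out = 3)
cien <- seq(from = 10, to = 20, length.out = 100)

# times repeats the whole vector; each repeats every element in place
alternado    <- rep(5:6, times = 10)
agrupado     <- rep(5:6, each = 10)
por_elemento <- rep(x, each = 5)

print(tres)
print(alternado)
print(agrupado)
print(por_elemento)
```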
1.14.1 Packages in R
"install.packages()" downloads and installs a package, which bundles a large number of related functions; "library()" then loads it into the current session.
install.packages("ggplot2")
library(ggplot2)

?qplot
?ggplot
?diamonds
View(diamonds)
qplot(data = diamonds, carat, price,
      colour = clarity, facets = . ~ clarity)
Another example:
install.packages("bio3d")
library(bio3d)  # bio3d is a set of functions for manipulating biological sequences

getwd()
C:/Users/Username/Documents * The "C:" path varies from computer to computer
Ej1 Ej2
Ej1 1.00 0.91
Ej2 0.91 1.00
1.15 Exercise: accounting in R
Data
ingresos <- c(14574.49, 7606.46, 8611.41, 9175.41, 8058.65, 8105.44,
              11496.28, 9766.09, 10305.32, 14379.96, 10713.97, 15433.50)
gastos <- c(12051.82, 5695.07, 12319.20, 12089.72, 8658.57, 840.20,
            3285.73, 5821.12, 6976.93, 16618.61, 10054.37, 3803.96)
All we have to do is subtract the expenses from the income.
ganancia <- ingresos - gastos
ganancia
ganancia.despues.iva <- ganancia - iva
ganancia.despues.iva
1862.305
We want to find the months in which the profit after IVA (VAT) was above the median, the value stored in median_pat. How could we do it?
Meses.buenos <- ganancia.despues.iva > median_pat
Meses.buenos
How would we find the bad months?
Meses.Malos <- !Meses.buenos
Meses.Malos
mejor.mes <- ganancia.despues.iva == max(ganancia.despues.iva)
mejor.mes
peor.mes <- ganancia.despues.iva == min(ganancia.despues.iva)
peor.mes
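The steps of this exercise can be sketched end to end as below. The IVA rate is not shown in these notes; 16% is an assumption, chosen because it reproduces the value 1862.305 printed earlier when the median of the profit after IVA is taken. The names `indice.mejor` and `indice.peor` are hypothetical additions that return the month number directly.

```r
ingresos <- c(14574.49, 7606.46, 8611.41, 9175.41, 8058.65, 8105.44,
              11496.28, 9766.09, 10305.32, 14379.96, 10713.97, 15433.50)
gastos <- c(12051.82, 5695.07, 12319.20, 12089.72, 8658.57, 840.20,
            3285.73, 5821.12, 6976.93, 16618.61, 10054.37, 3803.96)

ganancia <- ingresos - gastos
iva <- ganancia * 0.16                       # assumed rate, see the note above
ganancia.despues.iva <- ganancia - iva

median_pat <- median(ganancia.despues.iva)   # reference value for good and bad months
Meses.buenos <- ganancia.despues.iva > median_pat
Meses.Malos <- !Meses.buenos

indice.mejor <- which.max(ganancia.despues.iva)  # month number of the best month
indice.peor  <- which.min(ganancia.despues.iva)  # month number of the worst month

print(round(median_pat, 3))
print(which(Meses.buenos))
print(which(Meses.Malos))
```

`which()` turns the logical vectors into month numbers, which is easier to read than twelve TRUE/FALSE values.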
Print the results
ingresos.1000
15 8 9 9 8 8 11 10 10 14 11 15
gastos.1000
12 6 12 12 9 1 3 6 7 17 10 4
ganancia.1000
3 2 -4 -3 -1 7 8 4 3 -2 1 12
2 2 -3 -2 -1 6 7 3 3 -2 1 10
margen.ganancia
Meses.buenos
Meses.Malos
mejor.mes
peor.mes
1 6 7 8 9 12
2 3 4 5 10 11
12
M <- rbind(
  ingresos.1000,
  gastos.1000,
  ganancia.1000,
  ganancia.despues.iva.1000,
  margen.ganancia,
  Meses.buenos,
  Meses.Malos,
  mejor.mes,
  peor.mes
)

M
                          [,1] [,2] [,3] [,4] [,5] [,6] [,7]
ingresos.1000               15    8    9    9    8    8   11
gastos.1000                 12    6   12   12    9    1    3
ganancia.1000                3    2   -4   -3   -1    7    8
ganancia.despues.iva.1000    2    2   -3   -2   -1    6    7
margen.ganancia             15   21  -36  -27   -6   75   60
Meses.buenos                 1    0    0    0    0    1    1
Meses.Malos                  0    1    1    1    1    0    0
mejor.mes                    0    0    0    0    0    0    0
peor.mes                     0    0    1    0    0    0    0
                          [,8] [,9] [,10] [,11] [,12]
ingresos.1000               10   10    14    11    15
gastos.1000                  6    7    17    10     4
ganancia.1000                4    3    -2     1    12
ganancia.despues.iva.1000    3    3    -2     1    10
margen.ganancia             34   27   -13     5    63
Meses.buenos                 1    1     0     0     1
Meses.Malos                  0    0     1     1     0
mejor.mes                    0    0     0     0     1
peor.mes                     0    0     0     0     0
2 Stage 2
2.1 Matrix manipulation in R
Matrices in R are essentially tables that hold vector data arranged in rows and columns: a vector with 2 dimensions.
?matrix
Matrix
my.data <- 1:20
my.data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
A[2, 3]
10
9 10 11 12
A[2, ]
2 6 10 14 18
B
B[2, 5]
10
c1 c2
[1,] 1 -1
[2,] 2 -2
[3,] 3 -3
[4,] 4 -4
[5,] 5 -5
Naming vectors
Javier <- 1:5
Javier
1 2 3 4 5
Give names to the positions of the vector.
names(Javier) <- c("a", "b", "c", "d", "e")
Javier
a b c d e
1 2 3 4 5
d
4
Javier[4]
d
4
names(Javier)
abcde
12345
a a a b b b zZ zZ zZ
Recall that the matrix() function expects (the vector, the number of rows, the number of columns).
Bravo <- matrix(vec.temp, 3, 3)
Bravo
NULL
rownames(Bravo) <- c("How", "are", "you")
Bravo
By column:
colnames(Bravo)
NULL
Bravo[2, 2]
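The definitions of the matrices A and B are not shown in these notes. The sketch below assumes both are built from `my.data` with 4 rows and 5 columns, A filled by column (the default) and B filled by row, because that choice reproduces the outputs above (`A[2, 3]` is 10, `A[2, ]` is 2 6 10 14 18 and `B[2, 5]` is 10). The row names are hypothetical.

```r
my.data <- 1:20

# Assumed definitions: same data, different filling order
A <- matrix(my.data, nrow = 4, ncol = 5)                 # filled column by column
B <- matrix(my.data, nrow = 4, ncol = 5, byrow = TRUE)   # filled row by row

# Rows can be named and then used for indexing (hypothetical names)
rownames(A) <- c("r1", "r2", "r3", "r4")

print(A)
print(B)
print(A["r2", 3])   # same element as A[2, 3]
```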
2.2 Exercises 1 and 2: matrix manipulation in R
Exercise 1
getwd()

Examenes <- matrix(1:5, 9, 5, byrow = TRUE)
View(Examenes)
V1 V2 V3 V4 V5
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
6 1 2 3 4 5
7 1 2 3 4 5
8 1 2 3 4 5
9 1 2 3 4 5
colnames(Examenes) <- c("Lunes 24", "Martes 25", "Miercoles 26", "Jueves 27", "Viernes 28")
rownames(Examenes) <- c("7:00am", "8:00am", "9:00am", "10:00am", "11:00am", "12:00pm", "1:00pm", "2:00pm", "3:00pm")
Examenes[, ] <- ""
Examenes[c("11:00am", "12:00pm"), "Martes 25"] <- "GeF_461_Bs101"
Examenes[c("11:00am", "12:00pm"), "Jueves 27"] <- "GeC_461_Bs101"
Examenes[c("7:00am", "8:00am"), "Miercoles 26"] <- "GeF_462_Bs102"
Examenes[c("11:00am", "12:00pm"), "Miercoles 26"] <- "GeC_462_Bs102"
        Lunes 24  Martes 25      Miercoles 26   Jueves 27      Viernes 28
7:00am                           GeF_462_Bs102
8:00am                           GeF_462_Bs102
9:00am
10:00am
11:00am           GeF_461_Bs101  GeC_462_Bs102  GeC_461_Bs101
12:00pm           GeF_461_Bs101  GeC_462_Bs102  GeC_461_Bs101
1:00pm
2:00pm
3:00pm
Exercise 2
getwd()

Asist <- matrix(1:5, 8, 5, byrow = TRUE)
View(Asist)
V1 V2 V3 V4 V5
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
6 1 2 3 4 5
7 1 2 3 4 5
8 1 2 3 4 5
colnames(Asist) <- c("Sabado 1", "Sabado 8", "Sabado 15", "Sabado 22", "Sabado 29")
rownames(Asist) <- c("1", "2", "3", "4", "5", "6", "7", "8")
Asist[, ] <- ""

Equipo <- c("Hipolito", "Aza", "Cristian", "Abelardo", "Dante", "Paloma", "Cristina", "Sam")
RowNo <- c("1", "2", "3", "4", "5", "6", "7", "8")
Asist[RowNo[1:6], "Sabado 29"] <- Equipo[c(-4, -7)]

install.packages("gridExtra")
library(gridExtra)

pdf("Asist.pdf", height = 11, width = 8.5)
grid.table(Asist)
dev.off()

View(Asist)
2.3 Exercise: matrix manipulation in R
Copyright: www.superdatascience.com.
Comments:
Seasons are labeled based on the first year in the season, e.g. the 2012-2013 season is presented as simply 2012.
Notes and corrections to the data:
• Kevin Durant: 2006 - College Data Used
•
Seasons
Seasons <- c("2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014")
Players
Players <- c("KobeBryant", "JoeJohnson", "LeBronJames", "CarmeloAnthony", "DwightHoward",
             "ChrisBosh", "ChrisPaul", "KevinDurant", "DerrickRose", "DwayneWade")
Salaries
KobeBryant_Salary <- c(15946875, 17718750, 19490625, 21262500, 23034375, 24806250,
                       25244493, 27849149, 30453805, 23500000)
JoeJohnson_Salary <- c(12000000, 12744189, 13488377, 14232567, 14976754, 16324500,
                       18038573, 19752645, 21466718, 23180790)
LeBronJames_Salary <- c(4621800, 5828090, 13041250, 14410581, 15779912, 14500000,
                        16022500, 17545000, 19067500, 20644400)
CarmeloAnthony_Salary <- c(3713640, 4694041, 13041250, 14410581, 15779912, 17149243,
                           18518574, 19450000, 22407474, 22458000)
DwightHoward_Salary <- c(4493160, 4806720, 6061274, 13758000, 15202590, 16647180,
                         18091770, 19536360, 20513178, 21436271)
ChrisBosh_Salary <- c(3348000, 4235220, 12455000, 14410581, 15779912, 14500000,
                      16022500, 17545000, 19067500, 20644400)
ChrisPaul_Salary <- c(3144240, 3380160, 3615960, 4574189, 13520500, 14940153,
                      16359805, 17779458, 18668431, 20068563)
KevinDurant_Salary <- c(0, 0, 4171200, 4484040, 4796880, 6053663, 15506632, 16669630,
                        17832627, 18995624)
DerrickRose_Salary <- c(0, 0, 0, 4822800, 5184480, 5546160, 6993708, 16402500,
                        17632688, 18862875)
DwayneWade_Salary <- c(3031920, 3841443, 13041250, 14410581, 15779912, 14200000,
                       15691000, 17182000, 18673000, 15000000)
Matrix
Games
KobeBryant_G <- c(80, 77, 82, 82, 73, 82, 58, 78, 6, 35)
JoeJohnson_G <- c(82, 57, 82, 79, 76, 72, 60, 72, 79, 80)
LeBronJames_G <- c(79, 78, 75, 81, 76, 79, 62, 76, 77, 69)
CarmeloAnthony_G <- c(80, 65, 77, 66, 69, 77, 55, 67, 77, 40)
DwightHoward_G <- c(82, 82, 82, 79, 82, 78, 54, 76, 71, 41)
ChrisBosh_G <- c(70, 69, 67, 77, 70, 77, 57, 74, 79, 44)
ChrisPaul_G <- c(78, 64, 80, 78, 45, 80, 60, 70, 62, 82)
KevinDurant_G <- c(35, 35, 80, 74, 82, 78, 66, 81, 81, 27)
DerrickRose_G <- c(40, 40, 40, 81, 78, 81, 39, 0, 10, 51)
DwayneWade_G <- c(75, 51, 51, 79, 77, 76, 49, 69, 54, 62)
Matrix
Games <- rbind(KobeBryant_G, JoeJohnson_G, LeBronJames_G, CarmeloAnthony_G, DwightHoward_G,
               ChrisBosh_G, ChrisPaul_G, KevinDurant_G, DerrickRose_G, DwayneWade_G)
rm(KobeBryant_G, JoeJohnson_G, CarmeloAnthony_G, DwightHoward_G, ChrisBosh_G,
   LeBronJames_G, ChrisPaul_G, DerrickRose_G, DwayneWade_G, KevinDurant_G)
colnames(Games) <- Seasons
rownames(Games) <- Players
Minutes Played
KobeBryant_MP <- c(3277, 3140, 3192, 2960, 2835, 2779, 2232, 3013, 177, 1207)
JoeJohnson_MP <- c(3340, 2359, 3343, 3124, 2886, 2554, 2127, 2642, 2575, 2791)
LeBronJames_MP <- c(3361, 3190, 3027, 3054, 2966, 3063, 2326, 2877, 2902, 2493)
CarmeloAnthony_MP <- c(2941, 2486, 2806, 2277, 2634, 2751, 1876, 2482, 2982, 1428)
DwightHoward_MP <- c(3021, 3023, 3088, 2821, 2843, 2935, 2070, 2722, 2396, 1223)
ChrisBosh_MP <- c(2751, 2658, 2425, 2928, 2526, 2795, 2007, 2454, 2531, 1556)
ChrisPaul_MP <- c(2808, 2353, 3006, 3002, 1712, 2880, 2181, 2335, 2171, 2857)
KevinDurant_MP <- c(1255, 1255, 2768, 2885, 3239, 3038, 2546, 3119, 3122, 913)
DerrickRose_MP <- c(1168, 1168, 1168, 3000, 2871, 3026, 1375, 0, 311, 1530)
DwayneWade_MP <- c(2892, 1931, 1954, 3048, 2792, 2823, 1625, 2391, 1775, 1971)
Matrix
MinutesPlayed <- rbind(KobeBryant_MP, JoeJohnson_MP, LeBronJames_MP, CarmeloAnthony_MP,
                       DwightHoward_MP, ChrisBosh_MP, ChrisPaul_MP, KevinDurant_MP,
                       DerrickRose_MP, DwayneWade_MP)
rm(KobeBryant_MP, JoeJohnson_MP, CarmeloAnthony_MP, DwightHoward_MP, ChrisBosh_MP,
   LeBronJames_MP, ChrisPaul_MP, DerrickRose_MP, DwayneWade_MP, KevinDurant_MP)
colnames(MinutesPlayed) <- Seasons
rownames(MinutesPlayed) <- Players
Field Goals
KobeBryant_FG <- c(978, 813, 775, 800, 716, 740, 574, 738, 31, 266)
JoeJohnson_FG <- c(632, 536, 647, 620, 635, 514, 423, 445, 462, 446)
LeBronJames_FG <- c(875, 772, 794, 789, 768, 758, 621, 765, 767, 624)
CarmeloAnthony_FG <- c(756, 691, 728, 535, 688, 684, 441, 669, 743, 358)
DwightHoward_FG <- c(468, 526, 583, 560, 510, 619, 416, 470, 473, 251)
ChrisBosh_FG <- c(549, 543, 507, 615, 600, 524, 393, 485, 492, 343)
ChrisPaul_FG <- c(407, 381, 630, 631, 314, 430, 425, 412, 406, 568)
KevinDurant_FG <- c(306, 306, 587, 661, 794, 711, 643, 731, 849, 238)
DerrickRose_FG <- c(208, 208, 208, 574, 672, 711, 302, 0, 58, 338)
DwayneWade_FG <- c(699, 472, 439, 854, 719, 692, 416, 569, 415, 509)
Matrix
FieldGoals <- rbind(KobeBryant_FG, JoeJohnson_FG, LeBronJames_FG, CarmeloAnthony_FG,
                    DwightHoward_FG, ChrisBosh_FG, ChrisPaul_FG, KevinDurant_FG,
                    DerrickRose_FG, DwayneWade_FG)
rm(KobeBryant_FG, JoeJohnson_FG, LeBronJames_FG, CarmeloAnthony_FG, DwightHoward_FG,
   ChrisBosh_FG, ChrisPaul_FG, KevinDurant_FG, DerrickRose_FG, DwayneWade_FG)
colnames(FieldGoals) <- Seasons
rownames(FieldGoals) <- Players
DwightHoward_FGA <- c(881, 873, 974, 979, 834, 1044, 726, 813, 800, 423)
ChrisBosh_FGA <- c(1087, 1094, 1027, 1263, 1158, 1056, 807, 907, 953, 745)
ChrisPaul_FGA <- c(947, 871, 1291, 1255, 637, 928, 890, 856, 870, 1170)
KevinDurant_FGA <- c(647, 647, 1366, 1390, 1668, 1538, 1297, 1433, 1688, 467)
DerrickRose_FGA <- c(436, 436, 436, 1208, 1373, 1597, 695, 0, 164, 835)
DwayneWade_FGA <- c(1413, 962, 937, 1739, 1511, 1384, 837, 1093, 761, 1084)
Matrix
FieldGoalAttempts <- rbind(KobeBryant_FGA, JoeJohnson_FGA, LeBronJames_FGA, CarmeloAnthony_FGA,
                           DwightHoward_FGA, ChrisBosh_FGA, ChrisPaul_FGA, KevinDurant_FGA,
                           DerrickRose_FGA, DwayneWade_FGA)
rm(KobeBryant_FGA, JoeJohnson_FGA, LeBronJames_FGA, CarmeloAnthony_FGA, DwightHoward_FGA,
   ChrisBosh_FGA, ChrisPaul_FGA, KevinDurant_FGA, DerrickRose_FGA, DwayneWade_FGA)
colnames(FieldGoalAttempts) <- Seasons
rownames(FieldGoalAttempts) <- Players
Points
KobeBryant_PTS <- c(2832, 2430, 2323, 2201, 1970, 2078, 1616, 2133, 83, 782)
JoeJohnson_PTS <- c(1653, 1426, 1779, 1688, 1619, 1312, 1129, 1170, 1245, 1154)
LeBronJames_PTS <- c(2478, 2132, 2250, 2304, 2258, 2111, 1683, 2036, 2089, 1743)
CarmeloAnthony_PTS <- c(2122, 1881, 1978, 1504, 1943, 1970, 1245, 1920, 2112, 966)
DwightHoward_PTS <- c(1292, 1443, 1695, 1624, 1503, 1784, 1113, 1296, 1297, 646)
ChrisBosh_PTS <- c(1572, 1561, 1496, 1746, 1678, 1438, 1025, 1232, 1281, 928)
ChrisPaul_PTS <- c(1258, 1104, 1684, 1781, 841, 1268, 1189, 1186, 1185, 1564)
KevinDurant_PTS <- c(903, 903, 1624, 1871, 2472, 2161, 1850, 2280, 2593, 686)
DerrickRose_PTS <- c(597, 597, 597, 1361, 1619, 2026, 852, 0, 159, 904)
DwayneWade_PTS <- c(2040, 1397, 1254, 2386, 2045, 1941, 1082, 1463, 1028, 1331)
Matrix
Points <- rbind(KobeBryant_PTS, JoeJohnson_PTS, LeBronJames_PTS, CarmeloAnthony_PTS,
                DwightHoward_PTS, ChrisBosh_PTS, ChrisPaul_PTS, KevinDurant_PTS,
                DerrickRose_PTS, DwayneWade_PTS)
rm(KobeBryant_PTS, JoeJohnson_PTS, LeBronJames_PTS, CarmeloAnthony_PTS, DwightHoward_PTS,
   ChrisBosh_PTS, ChrisPaul_PTS, KevinDurant_PTS, DerrickRose_PTS, DwayneWade_PTS)
colnames(Points) <- Seasons
rownames(Points) <- Players
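Once two matrices share the same shape and names, arithmetic between them works element by element. The sketch below uses a small subset of the data in this section (two players, three seasons) to keep it short. The name `por_juego` is hypothetical, and `dimnames()` is used here as a one-step alternative to calling `rownames()` and `colnames()` separately.

```r
Seasons <- c("2005", "2006", "2007")
Players <- c("KobeBryant", "JoeJohnson")

# Subset of the Games and FieldGoals data from this section
Games      <- rbind(c(80, 77, 82), c(82, 57, 82))
FieldGoals <- rbind(c(978, 813, 775), c(632, 536, 647))
dimnames(Games)      <- list(Players, Seasons)
dimnames(FieldGoals) <- list(Players, Seasons)

# Element-wise division: field goals per game, rounded to one decimal
por_juego <- round(FieldGoals / Games, 1)
print(por_juego)

# Rows and columns can be selected by name
print(por_juego["KobeBryant", "2005"])
```

A season with zero games and zero field goals yields 0/0, which R reports as NaN; this is where a NaN entry in a per-game table comes from.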
Games
2005 2006 2007 2008 2009 2010 2011 2012
KobeBryant 80 77 82 82 73 82 58 78
JoeJohnson 82 57 82 79 76 72 60 72
LeBronJames 79 78 75 81 76 79 62 76
CarmeloAnthony 80 65 77 66 69 77 55 67
DwightHoward 82 82 82 79 82 78 54 76
ChrisBosh 70 69 67 77 70 77 57 74
ChrisPaul 78 64 80 78 45 80 60 70
KevinDurant 35 35 80 74 82 78 66 81
DerrickRose 40 40 40 81 78 81 39 0
DwayneWade 75 51 51 79 77 76 49 69
2013 2014
KobeBryant 6 35
JoeJohnson 79 80
LeBronJames 77 69
CarmeloAnthony 77 40
DwightHoward 71 41
ChrisBosh 79 44
ChrisPaul 62 82
KevinDurant 81 27
DerrickRose 10 51
DwayneWade 54 62
rownames(Games)
colnames(Games)
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
76
FieldGoals
FieldGoals / Games
2005 2006 2007 2008
KobeBryant 12.225000 10.558442 9.451220 9.756098
JoeJohnson 7.707317 9.403509 7.890244 7.848101
LeBronJames 11.075949 9.897436 10.586667 9.740741
CarmeloAnthony 9.450000 10.630769 9.454545 8.106061
DwightHoward 5.707317 6.414634 7.109756 7.088608
ChrisBosh 7.842857 7.869565 7.567164 7.987013
ChrisPaul 5.217949 5.953125 7.875000 8.089744
KevinDurant 8.742857 8.742857 7.337500 8.932432
DerrickRose 5.200000 5.200000 5.200000 7.086420
DwayneWade 9.320000 9.254902 8.607843 10.810127
2009 2010 2011 2012
KobeBryant 9.808219 9.024390 9.896552 9.461538
JoeJohnson 8.355263 7.138889 7.050000 6.180556
LeBronJames 10.105263 9.594937 10.016129 10.065789
CarmeloAnthony 9.971014 8.883117 8.018182 9.985075
DwightHoward 6.219512 7.935897 7.703704 6.184211
ChrisBosh 8.571429 6.805195 6.894737 6.554054
ChrisPaul 6.977778 5.375000 7.083333 5.885714
KevinDurant 9.682927 9.115385 9.742424 9.024691
DerrickRose 8.615385 8.777778 7.743590 NaN
DwayneWade 9.337662 9.105263 8.489796 8.246377
2013 2014
KobeBryant 5.166667 7.600000
JoeJohnson 5.848101 5.575000
LeBronJames 9.961039 9.043478
CarmeloAnthony 9.649351 8.950000
DwightHoward 6.661972 6.121951
ChrisBosh 6.227848 7.795455
ChrisPaul 6.548387 6.926829
KevinDurant 10.481481 8.814815
DerrickRose 5.800000 6.627451
DwayneWade 7.685185 8.209677
2005 2006 2007 2008 2009 2010 2011 2012
KobeBryant 12 11 9 10 10 9 10 9
JoeJohnson 8 9 8 8 8 7 7 6
LeBronJames 11 10 11 10 10 10 10 10
CarmeloAnthony 9 11 9 8 10 9 8 10
DwightHoward 6 6 7 7 6 8 8 6
ChrisBosh 8 8 8 8 9 7 7 7
ChrisPaul 5 6 8 8 7 5 7 6
KevinDurant 9 9 7 9 10 9 10 9
DerrickRose 5 5 5 7 9 9 8 NaN
DwayneWade 9 9 9 11 9 9 8 8
2013 2014
KobeBryant 5 8
JoeJohnson 6 6
LeBronJames 10 9
CarmeloAnthony 10 9
DwightHoward 7 6
ChrisBosh 6 8
ChrisPaul 7 7
KevinDurant 10 9
DerrickRose 6 7
DwayneWade 8 8
2005 2006 2007 2008 2009 2010 2011 2012
KobeBryant 12.2 10.6 9.5 9.8 9.8 9.0 9.9 9.5
JoeJohnson 7.7 9.4 7.9 7.8 8.4 7.1 7.0 6.2
LeBronJames 11.1 9.9 10.6 9.7 10.1 9.6 10.0 10.1
CarmeloAnthony 9.4 10.6 9.5 8.1 10.0 8.9 8.0 10.0
DwightHoward 5.7 6.4 7.1 7.1 6.2 7.9 7.7 6.2
ChrisBosh 7.8 7.9 7.6 8.0 8.6 6.8 6.9 6.6
ChrisPaul 5.2 6.0 7.9 8.1 7.0 5.4 7.1 5.9
KevinDurant 8.7 8.7 7.3 8.9 9.7 9.1 9.7 9.0
DerrickRose 5.2 5.2 5.2 7.1 8.6 8.8 7.7 NaN
DwayneWade 9.3 9.3 8.6 10.8 9.3 9.1 8.5 8.2
2013 2014
KobeBryant 5.2 7.6
JoeJohnson 5.8 5.6
LeBronJames 10.0 9.0
CarmeloAnthony 9.6 8.9
DwightHoward 6.7 6.1
ChrisBosh 6.2 7.8
ChrisPaul 6.5 6.9
KevinDurant 10.5 8.8
DerrickRose 5.8 6.6
DwayneWade 7.7 8.2
MinutesPlayed / Games
2005 2006 2007 2008 2009
KobeBryant 40.96250 40.77922 38.92683 36.09756 38.83562
JoeJohnson 40.73171 41.38596 40.76829 39.54430 37.97368
LeBronJames 42.54430 40.89744 40.36000 37.70370 39.02632
CarmeloAnthony 36.76250 38.24615 36.44156 34.50000 38.17391
DwightHoward 36.84146 36.86585 37.65854 35.70886 34.67073
ChrisBosh 39.30000 38.52174 36.19403 38.02597 36.08571
ChrisPaul 36.00000 36.76562 37.57500 38.48718 38.04444
KevinDurant 35.85714 35.85714 34.60000 38.98649 39.50000
DerrickRose 29.20000 29.20000 29.20000 37.03704 36.80769
DwayneWade 38.56000 37.86275 38.31373 38.58228 36.25974
2010 2011 2012 2013 2014
KobeBryant 33.89024 38.48276 38.62821 29.50000 34.48571
JoeJohnson 35.47222 35.45000 36.69444 32.59494 34.88750
LeBronJames 38.77215 37.51613 37.85526 37.68831 36.13043
CarmeloAnthony 35.72727 34.10909 37.04478 38.72727 35.70000
DwightHoward 37.62821 38.33333 35.81579 33.74648 29.82927
ChrisBosh 36.29870 35.21053 33.16216 32.03797 35.36364
ChrisPaul 36.00000 36.35000 33.35714 35.01613 34.84146
KevinDurant 38.94872 38.57576 38.50617 38.54321 33.81481
DerrickRose 37.35802 35.25641 NaN 31.10000 30.00000
DwayneWade 37.14474 33.16327 34.65217 32.87037 31.79032
2005 2006 2007 2008 2009 2010 2011 2012
KobeBryant 41 41 39 36 39 34 38 39
JoeJohnson 41 41 41 40 38 35 35 37
LeBronJames 43 41 40 38 39 39 38 38
CarmeloAnthony 37 38 36 34 38 36 34 37
DwightHoward 37 37 38 36 35 38 38 36
ChrisBosh 39 39 36 38 36 36 35 33
ChrisPaul 36 37 38 38 38 36 36 33
KevinDurant 36 36 35 39 40 39 39 39
DerrickRose 29 29 29 37 37 37 35 NaN
DwayneWade 39 38 38 39 36 37 33 35
2013 2014
KobeBryant 30 34
JoeJohnson 33 35
LeBronJames 38 36
CarmeloAnthony 39 36
DwightHoward 34 30
ChrisBosh 32 35
ChrisPaul 35 35
KevinDurant 39 34
DerrickRose 31 30
DwayneWade 35 32
2.3.1 Subsets
x <- c("a", "b", "c", "d", "e")
x
"a" "b" "c" "d" "e"
"a" "e"
x[1]
"a"
Games
Games[c(1, 10), ]
2005 2006 2007 2008 2009 2010 2011 2012
KobeBryant 80 77 82 82 73 82 58 78
DwayneWade 75 51 51 79 77 76 49 69
2013 2014
KobeBryant 6 35
DwayneWade 54 62
2008 2009
KobeBryant 82 73
JoeJohnson 79 76
LeBronJames 81 76
CarmeloAnthony 66 69
DwightHoward 79 82
ChrisBosh 77 70
ChrisPaul 78 45
KevinDurant 74 82
DerrickRose 81 78
DwayneWade 79 77
Games[1, ]
Games[1, 5]
73
FALSE
TRUE
Games[1, 5, drop = FALSE]
2009
KobeBryant 73
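The subsetting behaviour above can be tried on a small stand-in matrix. The sketch below uses a hypothetical 3-by-4 matrix `m` (not the course's Games matrix, and the player names are placeholders) to show that selecting a single row or cell simplifies the result to a vector unless `drop = FALSE` is passed.

```r
# Hypothetical 3 x 4 matrix standing in for Games
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("PlayerA", "PlayerB", "PlayerC"),
                            c("2005", "2006", "2007", "2008")))

rows_1_3 <- m[c(1, 3), ]            # two rows: still a matrix
one_row  <- m[1, ]                  # one row: simplified to a named vector
one_cell <- m[1, 4]                 # one cell: a single value
kept     <- m[1, 4, drop = FALSE]   # 1 x 1 matrix, row and column names kept

is.matrix(one_row)   # FALSE
is.matrix(kept)      # TRUE
```

Keeping the matrix shape matters whenever the result is passed on to functions such as `t()` or `matplot()`, which use the row and column structure.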
2.4 Matplot in R
The “matplot()” function plots the columns of a matrix against its row index, one series per column.
?matplot
The “legend()” function adds a legend to the plot, taking its labels from a vector.
legend("bottomleft", inset = 0.01, legend = Players, col = c(1:4, 6), pch = 15:18, horiz = FALSE)
?legend
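As a minimal sketch of how the two calls fit together, the example below plots a small hypothetical matrix `Stats` (made-up values, placeholder player names). Because “matplot()” draws one series per column, the matrix is transposed first so that each player becomes one line; the legend reuses the same `pch` and `col` values so that it matches the plot.

```r
# Hypothetical data: 3 players (rows) x 4 seasons (columns)
Stats <- matrix(c(10, 12, 11,  9,
                   7,  8,  9, 10,
                   5,  6,  8,  7),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("PlayerA", "PlayerB", "PlayerC"),
                                c("2005", "2006", "2007", "2008")))

# Transpose: seasons become rows, players become columns (one line each)
by_season <- t(Stats)
matplot(by_season, type = "b", pch = 15:17, col = 1:3,
        xlab = "Season index", ylab = "Value")

# Same pch and col as in matplot() so symbols and colours line up
legend("bottomleft", inset = 0.01, legend = rownames(Stats),
       col = 1:3, pch = 15:17, horiz = FALSE)
```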
FieldGoals
2005 2006 2007 2008 2009 2010 2011 2012
KobeBryant 978 813 775 800 716 740 574 738
JoeJohnson 632 536 647 620 635 514 423 445
LeBronJames 875 772 794 789 768 758 621 765
CarmeloAnthony 756 691 728 535 688 684 441 669
DwightHoward 468 526 583 560 510 619 416 470
ChrisBosh 549 543 507 615 600 524 393 485
ChrisPaul 407 381 630 631 314 430 425 412
KevinDurant 306 306 587 661 794 711 643 731
DerrickRose 208 208 208 574 672 711 302 0
DwayneWade 699 472 439 854 719 692 416 569
2013 2014
KobeBryant 31 266
JoeJohnson 462 446
LeBronJames 767 624
CarmeloAnthony 743 358
DwightHoward 473 251
ChrisBosh 492 343
ChrisPaul 406 568
KevinDurant 849 238
DerrickRose 58 338
DwayneWade 415 509
KobeBryant JoeJohnson LeBronJames
2005 12.225000 7.707317 11.075949
2006 10.558442 9.403509 9.897436
2007 9.451220 7.890244 10.586667
2008 9.756098 7.848101 9.740741
2009 9.808219 8.355263 10.105263
2010 9.024390 7.138889 9.594937
2011 9.896552 7.050000 10.016129
2012 9.461538 6.180556 10.065789
2013 5.166667 5.848101 9.961039
2014 7.600000 5.575000 9.043478
CarmeloAnthony DwightHoward ChrisBosh
2005 9.450000 5.707317 7.842857
2006 10.630769 6.414634 7.869565
2007 9.454545 7.109756 7.567164
2008 8.106061 7.088608 7.987013
2009 9.971014 6.219512 8.571429
2010 8.883117 7.935897 6.805195
2011 8.018182 7.703704 6.894737
2012 9.985075 6.184211 6.554054
2013 9.649351 6.661972 6.227848
2014 8.950000 6.121951 7.795455
ChrisPaul KevinDurant DerrickRose DwayneWade
2005 5.217949 8.742857 5.200000 9.320000
2006 5.953125 8.742857 5.200000 9.254902
2007 7.875000 7.337500 5.200000 8.607843
2008 8.089744 8.932432 7.086420 10.810127
2009 6.977778 9.682927 8.615385 9.337662
2010 5.375000 9.115385 8.777778 9.105263
2011 7.083333 9.742424 7.743590 8.489796
2012 5.885714 9.024691 NaN 8.246377
2013 6.548387 10.481481 5.800000 7.685185
2014 6.926829 8.814815 6.627451 8.209677
FieldGoals / Games
2005 2006 2007
KobeBryant 12.225000 10.558442 9.451220
JoeJohnson 7.707317 9.403509 7.890244
LeBronJames 11.075949 9.897436 10.586667
CarmeloAnthony 9.450000 10.630769 9.454545
DwightHoward 5.707317 6.414634 7.109756
ChrisBosh 7.842857 7.869565 7.567164
ChrisPaul 5.217949 5.953125 7.875000
KevinDurant 8.742857 8.742857 7.337500
DerrickRose 5.200000 5.200000 5.200000
DwayneWade 9.320000 9.254902 8.607843
2008 2009 2010
KobeBryant 9.756098 9.808219 9.024390
JoeJohnson 7.848101 8.355263 7.138889
LeBronJames 9.740741 10.105263 9.594937
CarmeloAnthony 8.106061 9.971014 8.883117
DwightHoward 7.088608 6.219512 7.935897
ChrisBosh 7.987013 8.571429 6.805195
ChrisPaul 8.089744 6.977778 5.375000
KevinDurant 8.932432 9.682927 9.115385
DerrickRose 7.086420 8.615385 8.777778
DwayneWade 10.810127 9.337662 9.105263
2011 2012 2013
KobeBryant 9.896552 9.461538 5.166667
JoeJohnson 7.050000 6.180556 5.848101
LeBronJames 10.016129 10.065789 9.961039
CarmeloAnthony 8.018182 9.985075 9.649351
DwightHoward 7.703704 6.184211 6.661972
ChrisBosh 6.894737 6.554054 6.227848
ChrisPaul 7.083333 5.885714 6.548387
KevinDurant 9.742424 9.024691 10.481481
DerrickRose 7.743590 NaN 5.800000
DwayneWade 8.489796 8.246377 7.685185
2014
KobeBryant 7.600000
JoeJohnson 5.575000
LeBronJames 9.043478
CarmeloAnthony 8.950000
DwightHoward 6.121951
ChrisBosh 7.795455
ChrisPaul 6.926829
KevinDurant 8.814815
DerrickRose 6.627451
DwayneWade 8.209677
matplot(t(FieldGoals / FieldGoalAttempts), type = "b", pch = 15:18, col = c(1:4, 6))
legend("bottomleft", inset = 0.01, legend = Players, col = c(1:4, 6), pch = 15:18, horiz = FALSE)
2005 2006 2007 2008 2009
KobeBryant 40.96250 40.77922 38.92683 36.09756 38.83562
JoeJohnson 40.73171 41.38596 40.76829 39.54430 37.97368
LeBronJames 42.54430 40.89744 40.36000 37.70370 39.02632
CarmeloAnthony 36.76250 38.24615 36.44156 34.50000 38.17391
DwightHoward 36.84146 36.86585 37.65854 35.70886 34.67073
ChrisBosh 39.30000 38.52174 36.19403 38.02597 36.08571
ChrisPaul 36.00000 36.76562 37.57500 38.48718 38.04444
KevinDurant 35.85714 35.85714 34.60000 38.98649 39.50000
DerrickRose 29.20000 29.20000 29.20000 37.03704 36.80769
DwayneWade 38.56000 37.86275 38.31373 38.58228 36.25974
2010 2011 2012 2013 2014
KobeBryant 33.89024 38.48276 38.62821 29.50000 34.48571
JoeJohnson 35.47222 35.45000 36.69444 32.59494 34.88750
LeBronJames 38.77215 37.51613 37.85526 37.68831 36.13043
CarmeloAnthony 35.72727 34.10909 37.04478 38.72727 35.70000
DwightHoward 37.62821 38.33333 35.81579 33.74648 29.82927
ChrisBosh 36.29870 35.21053 33.16216 32.03797 35.36364
ChrisPaul 36.00000 36.35000 33.35714 35.01613 34.84146
KevinDurant 38.94872 38.57576 38.50617 38.54321 33.81481
DerrickRose 37.35802 35.25641 NaN 31.10000 30.00000
DwayneWade 37.14474 33.16327 34.65217 32.87037 31.79032
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
3277 3140 3192 2960 2835 2779 2232 3013 177 1207
Recall that extracting a single row from a matrix essentially converts it into a vector.
matplot(t(Data), type = "b", pch = 15:18, col = c(1:4, 6))
legend("bottomleft", inset = 0.01, legend = Players[1], col = c(1:4, 6), pch = 15:18, horiz = FALSE)
Data2
2005 2006 2007 2008 2009 2010 2011 2012
KobeBryant 3277 3140 3192 2960 2835 2779 2232 3013
2013 2014
KobeBryant 177 1207
2.5 Functions in R
With the "function(){}" construct you can create your own functions.
Writing one is recommended whenever a task will be run several times,
so that you do not have to copy, paste, and modify the code each time.
myplot2 <- function(rows) {
  # drop = FALSE keeps a single selected row as a matrix
  Data2 <- MinutesPlayed[rows, , drop = FALSE]
  matplot(t(Data2), type = "b", pch = 15:18, col = c(1:4, 6))
  legend("bottomleft", inset = 0.01, legend = Players[rows],
         col = c(1:4, 6), pch = 15:18, horiz = FALSE)
}

myplot2(1:10)
myplot2(c(1, 3))
The more of its inputs a function exposes as parameters, the more useful it becomes.
myplot3 <- function(data, rows) {
  Data2 <- data[rows, , drop = FALSE]
  matplot(t(Data2), type = "b", pch = 15:18, col = c(1:4, 6))
  legend("bottomleft", inset = 0.01, legend = Players[rows],
         col = c(1:4, 6), pch = 15:18, horiz = FALSE)
}

myplot3(Salary, 1:2)
Parameters can be given default values in the function definition; the caller can override them.
myplot4 <- function(data, rows = 1:10) {
  Data2 <- data[rows, , drop = FALSE]
  matplot(t(Data2), type = "b", pch = 15:18, col = c(1:4, 6))
  legend("bottomleft", inset = 0.01, legend = Players[rows],
         col = c(1:4, 6), pch = 15:18, horiz = FALSE)
}

myplot4(Salary)
myplot4(MinutesPlayed)
myplot4(MinutesPlayed / Games, 3)
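The effect of a default argument is easier to check when the function also returns what it plotted. The sketch below is a hypothetical variant, `plot_rows()`, which is not part of the course code: its default is derived from the data (`seq_len(nrow(data))`) instead of the hard-coded `1:10`, it takes the legend labels from the row names, and it returns the selected subset invisibly.

```r
# Hypothetical helper: same idea as myplot4, but it also returns the subset
plot_rows <- function(data, rows = seq_len(nrow(data))) {
  subset_data <- data[rows, , drop = FALSE]   # keep matrix shape for one row
  n <- nrow(subset_data)
  matplot(t(subset_data), type = "b",
          pch = rep_len(15:18, n), col = rep_len(c(1:4, 6), n))
  legend("bottomleft", inset = 0.01, legend = rownames(subset_data),
         col = rep_len(c(1:4, 6), n), pch = rep_len(15:18, n), horiz = FALSE)
  invisible(subset_data)
}

# Made-up data with placeholder player names
demo <- matrix(1:12, nrow = 3,
               dimnames = list(c("PlayerA", "PlayerB", "PlayerC"),
                               c("2005", "2006", "2007", "2008")))

all_rows <- plot_rows(demo)                  # default: every row
two_rows <- plot_rows(demo, rows = c(1, 3))  # default overridden by the caller
```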
2.5.1 Using functions
To get the most out of a function, define as parameters the variables it depends on. In the case shown next, we define the data set, the rows, and the columns to be plotted. In essence, this function lets you plot, in a single line, any of the matrices in your global environment.
myplot5 <- function(data, rows = 1:10, cols = 1:10) {
  Data2 <- data[rows, cols, drop = FALSE]
  matplot(t(Data2), type = "b", pch = 15:18, col = c(1:4, 6))
  legend("bottomleft", inset = 0.01, legend = Players[rows],
         col = c(1:4, 6), pch = 15:18, horiz = FALSE)
}
Salary
myplot5(Salary)
myplot5(Salary / Games)
myplot5(Salary, cols = 1:7)
myplot5(Salary / FieldGoals, cols = 1:7)
In-Game Metrics
myplot5(MinutesPlayed)
myplot5(Points)
myplot5(FieldGoals / FieldGoalAttempts)
myplot5(Points / Games)
Interesting Observation
myplot5(MinutesPlayed / Games)
myplot5(Games)
Time is value
myplot5(FieldGoals / MinutesPlayed)
Player Style
myplot5(Points / FieldGoals)
2.6 Working with Data Frames in R, vol. 1
install.packages("dplyr")
library(dplyr)
Importing a file
?read.csv
Method 2: set your working directory (WD) for reading and saving files
getwd()
• Windows
setwd("E:/User/PathToFolder/Día 5")
• Mac
setwd("/User/PathToFolder/Día 5")
getwd()
rm(estadisticas)
estadisticas <- read.csv("DemographicData.csv")
195
195 5
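The import-and-inspect step can be reproduced without the course file. The sketch below writes a tiny made-up CSV (fictional country names and values, not DemographicData.csv) to a temporary file, reads it with “read.csv()”, and then checks its size with `nrow()`, `ncol()`, and `dim()`; this is presumably how row and column counts like the ones above are obtained, though the original commands are not shown.

```r
# Write a tiny hypothetical CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Country.Name,Birth.rate,Internet.users",
             "Freedonia,10.5,78.9",
             "Borduria,35.2,5.9",
             "Syldavia,20.1,41.0"),
           csv_path)

demo_df <- read.csv(csv_path)

nrow(demo_df)   # number of rows
ncol(demo_df)   # number of columns
dim(demo_df)    # both at once: rows first, then columns
```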
The “head()” function shows the first six rows of the data frame.
head(estadisticas)
The “tail()” function shows the last six rows of the data frame.
tail(estadisticas)
Income.Group : Factor w/ 4 levels "High income",..: 1 2 4 4 1 1 3 1 1 1 ...
Internet.users Income.Group
Min. : 0.90 High income :67
1st Qu.:14.52 Low income :30
Median :41.00 Lower middle income:50
Mean :42.08 Upper middle income:48
3rd Qu.:66.22
Max. :96.55
The power of the "$" sign in R
head(estadisticas)
45.985
45.985
78.90000 5.90000 19.10000 57.20000 88.00000 59.90000 41.90000
63.40000 83.00000 80.61880 58.70000 1.30000 82.17020 4.90000
9.10000 6.63000 53.06150 90.00004 (......) 72.00000 57.79000
54.17000 33.60000 95.30000 36.94000 43.90000 11.30000 46.60000
15.30000 20.00000 46.50000 2.20000 15.40000 18.50000
1 78.90000
2 5.90000
3 19.10000
4 57.20000
5 88.00000
6 59.90000
... ...
193 2.20000
194 15.40000
195 18.50000
1 78.90000
2 5.90000
3 19.10000
4 57.20000
5 88.00000
6 59.90000
... ...
193 2.20000
194 15.40000
195 18.50000
A subsequent [] isolates a single value from the column: the "$"
expression returns the column as a vector, and [] then indexes
into that vector.
5.9
78.90000 5.90000 19.10000 57.20000 88.00000 59.90000 41.90000
63.40000 83.00000 80.61880 58.70000 1.30000 82.17020 4.90000
9.10000 6.63000 53.06150 90.00004 (......) 72.00000 57.79000
54.17000 33.60000 95.30000 36.94000 43.90000 11.30000 46.60000
15.30000 20.00000 46.50000 2.20000 15.40000 18.50000
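A minimal sketch of the two steps, using a hypothetical three-row data frame (fictional names and values) rather than the course data: "$" extracts the whole column as a vector, and a following [] picks one element of it.

```r
# Hypothetical three-row data frame
demo_df <- data.frame(Country.Name   = c("Freedonia", "Borduria", "Syldavia"),
                      Internet.users = c(78.9, 5.9, 41.0))

users  <- demo_df$Internet.users        # whole column as a numeric vector
second <- demo_df$Internet.users[2]     # one element of that vector
same   <- demo_df[2, "Internet.users"]  # equivalent [row, column] form
```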
Country.Name Country.Code Birth.rate Internet.users Income.Group
3 Angola AGO 45.985 19.1 Upper middle income
4 Albania ALB 12.877 57.2 Upper middle income
5 United Arab Emirates ARE 11.044 88.0 High income
6 Argentina ARG 17.716 59.9 High income
7 Armenia ARM 13.308 41.9 Lower middle income
8 Antigua and Barbuda ATG 16.447 63.4 High income
9 Australia AUS 13.200 83.0 High income
We are isolating a single row, and that row keeps its “colnames”, which is why R still treats it as a data frame.
is.data.frame(estadisticas[1, ])
estadisticas[, 1]
is.data.frame(estadisticas[, 1])  # What do you think this call returns?
TRUE
FALSE
“is.data.frame(estadisticas[,1])” returns FALSE because this time we are extracting a single column, with a single name and many rows, and R automatically simplifies it to a vector. As before, adding the “drop = FALSE” option keeps the result as a data frame.
estadisticas[, 1, drop = FALSE]
is.data.frame(estadisticas[, 1, drop = FALSE])
Country.Name
1 Aruba
2 Afghanistan
3 Angola
...
193 Congo, Dem. Rep.
194 Zambia
195 Zimbabwe
TRUE
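The three cases can be checked side by side on a small hypothetical data frame (fictional values, not the course data). With `drop = FALSE` the single-column selection stays a data frame, so `is.data.frame()` returns TRUE for it.

```r
# Hypothetical two-row data frame
demo_df <- data.frame(Country.Name = c("Freedonia", "Borduria"),
                      Birth.rate   = c(10.5, 35.2))

one_row  <- demo_df[1, ]                 # a row keeps its column names: data frame
one_col  <- demo_df[, 1]                 # a single column is simplified to a vector
kept_col <- demo_df[, 1, drop = FALSE]   # drop = FALSE keeps the data frame

is.data.frame(one_row)    # TRUE
is.data.frame(one_col)    # FALSE
is.data.frame(kept_col)   # TRUE
```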
Adding a column is as simple as assigning a new variable into the data frame we are working with. The column name is whatever follows the $ sign.
estadisticas$MyCalcPeso <- estadisticas$Birth.rate * estadisticas$Internet.users
# Same calculation with dplyr: columns are referenced by bare name inside mutate()
estadisticas <- mutate(estadisticas,
                       MyCalcdplyr = Birth.rate * Internet.users)
Test
Removing columns
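No code for this step survives in these notes. A common base-R idiom, shown here as an assumption and not necessarily what the course used, is to assign NULL to the column. The sketch uses a hypothetical two-row data frame and first adds a column so that there is something to remove.

```r
# Hypothetical data frame with made-up values
demo_df <- data.frame(Birth.rate     = c(10.5, 35.2),
                      Internet.users = c(78.9, 5.9))

# Add a column by assigning to a new name after "$"
demo_df$MyCalc <- demo_df$Birth.rate * demo_df$Internet.users
cols_after_add <- colnames(demo_df)

# Remove it again by assigning NULL
demo_df$MyCalc <- NULL
cols_after_drop <- colnames(demo_df)
```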
2.7 Working with Data Frames in R, vol. 2
head(estadisticas)
estadisticas$Internet.users < 2
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
This single expression does the work of a “for” loop: it checks every element of the column and returns T or F for each one.
filter <- estadisticas$Internet.users < 2
estadisticas[filter, ]
53 Eritrea ERI 34.800 0.9 Low income
56 Ethiopia ETH 32.925 1.9 Low income
65 Guinea GIN 37.337 1.6 Low income
118 Myanmar MMR 18.119 1.6 Lower middle income
128 Niger NER 49.661 1.7 Low income
155 Sierra Leone SLE 36.729 1.7 Low income
157 Somalia SOM 43.891 1.5 Low income
173 Timor-Leste TLS 35.755 1.1 Lower middle income
With this we can isolate the information that matters for a targeted analysis.
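The filtering pattern can be sketched on a small hypothetical data frame (fictional countries and values). The comparison produces one TRUE/FALSE per row, and that logical vector selects the rows. Combining two conditions with `&` is an extension added here for illustration.

```r
# Hypothetical data frame with made-up values
demo_df <- data.frame(Country.Name   = c("Freedonia", "Borduria", "Syldavia", "Ruritania"),
                      Birth.rate     = c(10.5, 35.2, 41.0, 12.3),
                      Internet.users = c(78.9, 1.5, 0.9, 60.0))

low_users <- demo_df$Internet.users < 2   # one TRUE/FALSE per row
demo_df[low_users, ]                      # rows where the test is TRUE

# Conditions combine element-wise with & (and) and | (or)
both <- demo_df[demo_df$Internet.users < 2 & demo_df$Birth.rate > 40, ]
```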
9 5
12 5
129 Nigeria NGA 40.045 38.0 Lower middle income
157 Somalia SOM 43.891 1.5 Low income
168 Chad TCD 45.745 2.3 Low income
179 Uganda UGA 43.474 16.2 Low income
193 Congo, Dem. Rep. COD 42.394 2.2 Low income
194 Zambia ZMB 40.471 15.4 Lower middle income
68 Equatorial Guinea GNQ 35.362 16.40000 High income
69 Greece GRC 8.500 59.86630 High income
71 Greenland GRL 14.500 65.80000 High income
73 Guam GUM 17.389 65.40000 High income
75 Hong Kong SAR, China HKG 7.900 74.20000 High income
77 Croatia HRV 9.400 66.74760 High income
79 Hungary HUN 9.200 72.64390 High income
82 Ireland IRL 15.000 78.24770 High income
85 Iceland ISL 13.400 96.54680 High income
86 Israel ISR 21.300 70.80000 High income
87 Italy ITA 8.500 58.45930 High income
90 Japan JPN 8.200 89.71000 High income
96 Korea, Rep. KOR 8.600 84.77000 High income
97 Kuwait KWT 20.575 75.46000 High income
103 Liechtenstein LIE 9.200 93.80000 High income
106 Lithuania LTU 10.100 68.45290 High income
107 Luxembourg LUX 11.300 93.77650 High income
108 Latvia LVA 10.200 75.23440 High income
109 Macao SAR, China MAC 11.256 65.80000 High income
117 Malta MLT 9.500 68.91380 High income
127 New Caledonia NCL 17.000 66.00000 High income
131 Netherlands NLD 10.200 93.95640 High income
132 Norway NOR 11.600 95.05340 High income
134 New Zealand NZL 13.120 82.78000 High income
135 Oman OMN 20.419 66.45000 High income
141 Poland POL 9.600 62.84920 High income
142 Puerto Rico PRI 10.800 73.90000 High income
143 Portugal PRT 7.900 62.09560 High income
145 French Polynesia PYF 16.393 56.80000 High income
146 Qatar QAT 11.940 85.30000 High income
148 Russian Federation RUS 13.200 67.97000 High income
150 Saudi Arabia SAU 20.576 60.50000 High income
153 Singapore SGP 9.300 81.00000 High income
162 Slovak Republic SVK 10.100 77.88260 High income
163 Slovenia SVN 10.200 72.67560 High income
164 Sweden SWE 11.800 94.78360 High income
166 Seychelles SYC 18.600 50.40000 High income
175 Trinidad and Tobago TTO 14.590 63.80000 High income
181 Uruguay URY 14.374 57.69000 High income
182 United States USA 12.500 84.20000 High income
185 Venezuela, RB VEN 19.842 54.90000 High income
186 Virgin Islands (U.S.) VIR 10.700 45.30000 High income
2.8 Visualizing Data Frames in R
install.packages("ggplot2")
library(ggplot2)
?qplot
The interesting thing about the “colour” argument is that we can map it to a grouping variable to control the colouring. In our case, that can be the “levels” of one of the factor columns in our data frame.
qplot(data = estadisticas, x = Internet.users, y = Birth.rate,
      colour = Income.Group, size = I(5))
2.8.2 Enriching Data Frames in R
setwd("E:/Ususario/PathToFolder/Día 5")
# estadisticas <- read.csv(file.choose())
Countries_2012_Dataset <- c("Aruba", "Afghanistan", "Angola",
  "Albania", "United Arab Emirates", "Argentina", "Armenia",
  "Antigua and Barbuda", "Australia", "Austria", "Azerbaijan",
  "Burundi", "Belgium", "Benin", "Burkina Faso", "Bangladesh",
  "Bulgaria", "Bahrain", "Bahamas, The", "Bosnia and Herzegovina",
  "Belarus", "Belize", "Bermuda", "Bolivia", "Brazil",
  "Barbados", "Brunei Darussalam", "Bhutan", "Botswana",
  "Central African Republic", "Canada", "Switzerland", "Chile", "China",
  "Cote d'Ivoire", "Cameroon", "Congo, Rep.", "Colombia",
  "Comoros", "Cabo Verde", "Costa Rica", "Cuba", "Cayman Islands",
  "Cyprus", "Czech Republic", "Germany", "Djibouti", "Denmark",
  "Dominican Republic", "Algeria", "Ecuador", "Egypt, Arab Rep.",
  "Eritrea", "Spain", "Estonia", "Ethiopia", "Finland", "Fiji",
  "France", "Micronesia, Fed. Sts.", "Gabon", "United Kingdom",
  "Georgia", "Ghana", "Guinea", "Gambia, The", "Guinea-Bissau",
  "Equatorial Guinea", "Greece", "Grenada", "Greenland",
  "Guatemala", "Guam", "Guyana", "Hong Kong SAR, China",
  "Honduras", "Croatia", "Haiti", "Hungary", "Indonesia", "India",
  "Ireland", "Iran, Islamic Rep.", "Iraq", "Iceland", "Israel",
  "Italy", "Jamaica", "Jordan", "Japan", "Kazakhstan", "Kenya",
  "Kyrgyz Republic", "Cambodia", "Kiribati", "Korea, Rep.",
  "Kuwait", "Lao PDR", "Lebanon", "Liberia", "Libya", "St. Lucia",
  "Liechtenstein", "Sri Lanka", "Lesotho", "Lithuania",
  "Luxembourg", "Latvia", "Macao SAR, China", "Morocco", "Moldova",
  "Madagascar", "Maldives", "Mexico", "Macedonia, FYR", "Mali",
  "Malta", "Myanmar", "Montenegro", "Mongolia", "Mozambique",
  "Mauritania", "Mauritius", "Malawi", "Malaysia", "Namibia",
  "New Caledonia", "Niger", "Nigeria", "Nicaragua", "Netherlands",
  "Norway", "Nepal", "New Zealand", "Oman", "Pakistan", "Panama",
  "Peru", "Philippines", "Papua New Guinea", "Poland",
  "Puerto Rico", "Portugal", "Paraguay", "French Polynesia", "Qatar",
  "Romania", "Russian Federation", "Rwanda", "Saudi Arabia",
  "Sudan", "Senegal", "Singapore", "Solomon Islands",
  "Sierra Leone", "El Salvador", "Somalia", "Serbia", "South Sudan",
  "Sao Tome and Principe", "Suriname", "Slovak Republic", "Slovenia",
  "Sweden", "Swaziland", "Seychelles", "Syrian Arab Republic",
  "Chad", "Togo", "Thailand", "Tajikistan", "Turkmenistan",
  "Timor-Leste", "Tonga", "Trinidad and Tobago", "Tunisia",
  "Turkey", "Tanzania", "Uganda", "Ukraine", "Uruguay",
  "United States", "Uzbekistan", "St. Vincent and the Grenadines",
  "Venezuela, RB", "Virgin Islands (U.S.)", "Vietnam", "Vanuatu",
  "West Bank and Gaza", "Samoa", "Yemen, Rep.", "South Africa",
  "Congo, Dem. Rep.", "Zambia", "Zimbabwe")
Codes_2012_Dataset <- c("ABW", "AFG", "AGO", "ALB", "ARE", "ARG",
  "ARM", "ATG", "AUS", "AUT", "AZE", "BDI", "BEL", "BEN", "BFA", "BGD",
  "BGR", "BHR", "BHS", "BIH", "BLR", "BLZ", "BMU", "BOL", "BRA",
  "BRB", "BRN", "BTN", "BWA", "CAF", "CAN", "CHE", "CHL", "CHN", "CIV",
  "CMR", "COG", "COL", "COM", "CPV", "CRI", "CUB", "CYM", "CYP",
  "CZE", "DEU", "DJI", "DNK", "DOM", "DZA", "ECU", "EGY", "ERI", "ESP",
  "EST", "ETH", "FIN", "FJI", "FRA", "FSM", "GAB", "GBR", "GEO",
  "GHA", "GIN", "GMB", "GNB", "GNQ", "GRC", "GRD", "GRL", "GTM", "GUM",
  "GUY", "HKG", "HND", "HRV", "HTI", "HUN", "IDN", "IND", "IRL",
  "IRN", "IRQ", "ISL", "ISR", "ITA", "JAM", "JOR", "JPN", "KAZ", "KEN",
  "KGZ", "KHM", "KIR", "KOR", "KWT", "LAO", "LBN", "LBR", "LBY",
  "LCA", "LIE", "LKA", "LSO", "LTU", "LUX", "LVA", "MAC", "MAR", "MDA",
  "MDG", "MDV", "MEX", "MKD", "MLI", "MLT", "MMR", "MNE", "MNG",
  "MOZ", "MRT", "MUS", "MWI", "MYS", "NAM", "NCL", "NER", "NGA", "NIC",
  "NLD", "NOR", "NPL", "NZL", "OMN", "PAK", "PAN", "PER", "PHL",
  "PNG", "POL", "PRI", "PRT", "PRY", "PYF", "QAT", "ROU", "RUS", "RWA",
  "SAU", "SDN", "SEN", "SGP", "SLB", "SLE", "SLV", "SOM", "SRB",
  "SSD", "STP", "SUR", "SVK", "SVN", "SWE", "SWZ", "SYC", "SYR", "TCD",
  "TGO", "THA", "TJK", "TKM", "TLS", "TON", "TTO", "TUN", "TUR",
  "TZA", "UGA", "UKR", "URY", "USA", "UZB", "VCT", "VEN", "VIR", "VNM",
  "VUT", "PSE", "WSM", "YEM", "ZAF", "COD", "ZMB", "ZWE")
Regions_2012_Dataset <- c("The Americas", "Asia", "Africa",
  "Europe", "Middle East", "The Americas", "Asia", "The Americas",
  "Oceania", "Europe", "Asia", "Africa", "Europe", "Africa",
  "Africa", "Asia", "Europe", "Middle East", "The Americas",
  "Europe", "Europe", "The Americas", "The Americas", "The Americas",
  "The Americas", "The Americas", "Asia", "Asia",
  "Africa", "Africa", "The Americas", "Europe", "The Americas",
  "Asia", "Africa", "Africa", "Africa", "The Americas", "Africa",
  "Africa", "The Americas", "The Americas", "The Americas",
  "Europe", "Europe", "Europe", "Africa", "Europe", "The Americas",
  "Africa", "The Americas", "Africa", "Africa", "Europe",
  "Europe", "Africa", "Europe", "Oceania", "Europe", "Oceania",
  "Africa", "Europe", "Asia", "Africa", "Africa", "Africa", "Africa",
  "Africa", "Europe", "The Americas", "The Americas", "The Americas",
  "Oceania", "The Americas", "Asia", "The Americas",
  "Europe", "The Americas", "Europe", "Asia", "Asia", "Europe",
  "Middle East", "Middle East", "Europe", "Middle East", "Europe",
  "The Americas", "Middle East", "Asia", "Asia", "Africa", "Asia",
  "Asia", "Oceania", "Asia", "Middle East", "Asia", "Middle East",
  "Africa", "Africa", "The Americas", "Europe", "Asia",
  "Africa", "Europe", "Europe", "Europe", "Asia", "Africa", "Europe",
  "Africa", "Asia", "The Americas", "Europe", "Africa", "Europe",
  "Asia", "Europe", "Asia", "Africa", "Africa", "Africa",
  "Africa", "Asia", "Africa", "Oceania", "Africa", "Africa", "The Americas",
  "Europe", "Europe", "Asia", "Oceania", "Middle East",
  "Asia", "The Americas", "The Americas", "Asia", "Oceania",
  "Europe", "The Americas", "Europe", "The Americas", "Oceania",
  "Middle East", "Europe", "Europe", "Africa", "Middle East",
  "Africa", "Africa", "Asia", "Oceania", "Africa", "The Americas",
  "Africa", "Europe", "Africa", "Africa", "The Americas", "Europe",
  "Europe", "Europe", "Africa", "Africa", "Middle East",
  "Africa", "Africa", "Asia", "Asia", "Asia", "Asia", "Oceania",
  "The Americas", "Africa", "Europe", "Africa", "Africa", "Europe",
  "The Americas", "The Americas", "Asia", "The Americas", "The Americas",
  "The Americas", "Asia", "Oceania", "Middle East",
  "Oceania", "Middle East", "Africa", "Africa", "Africa", "Africa")
(c) Kirill Eremenko, www.superdatascience.com
mydatfr <- data.frame(Countries_2012_Dataset, Codes_2012_Dataset, Regions_2012_Dataset)
head(mydatfr)
colnames(mydatfr) <- c("Country", "Code", "Region")
head(mydatfr)
rm(mydatfr)

head(estadisticas)
head(mydatfr)
by.y = the column of the second data frame used to match rows
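The merge call itself does not appear in these notes. As a minimal sketch, assuming base R's “merge()” is what `by.y` refers to, the example below joins two small hypothetical data frames (`left` and `right`, with made-up codes) whose key columns have different names: `by.x` names the key in the first data frame and `by.y` the key in the second.

```r
# Hypothetical stand-ins for the statistics table and the region lookup table
left  <- data.frame(Country.Code   = c("AAA", "BBB", "CCC"),
                    Internet.users = c(78.9, 5.9, 41.0))
right <- data.frame(Code   = c("CCC", "AAA", "BBB"),
                    Region = c("Asia", "Europe", "Africa"))

# Rows are matched on left$Country.Code == right$Code; row order does not matter
merged <- merge(left, right, by.x = "Country.Code", by.y = "Code")
```

The result keeps the key column once, under the name given in `by.x`, followed by the remaining columns of both inputs.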
1. Shapes
qplot(data = union, x = Internet.users, y = Birth.rate, colour = Region, size = I(5), shape = I(17))
qplot(data = union, x = Internet.users, y = Birth.rate, colour = Region, size = I(5), shape = I(2))
2. Transparency
qplot(data = union, x = Internet.users, y = Birth.rate, colour = Region, size = I(5), shape = I(19), alpha = I(0.6))
3. Title
qplot(data = union, x = Internet.users, y = Birth.rate, colour = Region, size = I(5), shape = I(19), alpha = I(0.6),
      main = "Birth rate vs Internet Users")
2.9 Exercise: Working with Data Frames in R
library(dplyr)

getwd()
setwd("E:/Usuario/PathToFolder/Día 5")
353
ncol(stats)
16
dim(stats)
353 16
View(stats)
str(stats)
summary(stats)
Tags Name logFC
[DOWN]:191 XXXX_00037: 1 Min. :-7.7076
[UP] :162 XXXX_00056: 1 1st Qu.:-1.5731
XXXX_00081: 1 Median :-1.0440
XXXX_00082: 1 Mean :-0.2834
XXXX_00132: 1 3rd Qu.: 1.2732
XXXX_00156: 1 Max. : 6.4324
(Other) :347
(............................................)
carbohydrate metabolic process; hydrolase activity,
hydrolyzing O-glycosyl compounds: 9
catalytic activity
oxidoreductase activity; oxidation-reduction process
(Other)
11 [DOWN] XXXX_00254 -2.141944 3.8140550 1.060000e-32
12 [DOWN] XXXX_00262 -1.151004 6.1700215 3.110000e-38
15 [DOWN] XXXX_00336 -1.354724 9.3118920 1.600000e-16
16 [DOWN] XXXX_00337 -1.788649 1.4505150 4.500000e-05
18 [DOWN] XXXX_00374 -1.595040 6.6360524 1.250000e-46
(......................................................)
123 [DOWN] XXXX_03115 -5.367095 3.8136448 5.370000e-70
124 [DOWN] XXXX_03116 -6.081473 3.4447581 3.550000e-54
125 [DOWN] XXXX_03117 -7.459759 1.6833833 9.170000e-21
127 [DOWN] XXXX_03148 -1.278143 0.9691055 1.608155e-02
PValue Description
2 9.180000e-39 putative neutral amino acid permease
5 1.840000e-16 L-aminoadipate-semialdehyde dehydrogenase
9 1.680000e-75 hypothetical protein
11 3.110000e-34 hypothetical protein
12 7.340000e-40 hypothetical protein
15 1.000000e-17 hypothetical protein
16 9.080000e-06 hypothetical protein
18 2.330000e-48 putative allantoate and ureidosuccinate
(.....................................................)
123 5.560000e-72 transmembrane transport
124 5.350000e-56 no GO terms
125 4.440000e-22 transferase activity
127 5.931740e-03 ATPase activity;
or with dplyr
library(dplyr)
filter(stats, logFC < 1)
15 [DOWN] UMAG_00336 -1.354724 9.3118920 1.600000e-16
16 [DOWN] UMAG_00337 -1.788649 1.4505150 4.500000e-05
18 [DOWN] UMAG_00374 -1.595040 6.6360524 1.250000e-46
(......................................................)
123 [DOWN] UMAG_03115 -5.367095 3.8136448 5.370000e-70
124 [DOWN] UMAG_03116 -6.081473 3.4447581 3.550000e-54
125 [DOWN] UMAG_03117 -7.459759 1.6833833 9.170000e-21
127 [DOWN] UMAG_03148 -1.278143 0.9691055 1.608155e-02
PValue Description
2 9.180000e-39 putative neutral amino acid permease
5 1.840000e-16 L-aminoadipate-semialdehyde dehydrogenase
9 1.680000e-75 hypothetical protein
11 3.110000e-34 hypothetical protein
12 7.340000e-40 hypothetical protein
15 1.000000e-17 hypothetical protein
16 9.080000e-06 hypothetical protein
18 2.330000e-48 putative allantoate and ureidosuccinate
(.....................................................)
123 5.560000e-72 transmembrane transport
124 5.350000e-56 no GO terms
125 4.440000e-22 transferase activity
127 5.931740e-03 ATPase activity;
C <- stats[grep("C", stats$GO_IDs_C..F..P.), ]
f <- stats[grep("F", stats$GO_IDs_C..F..P.), ]
P <- stats[grep("P", stats$GO_IDs_C..F..P.), ]

stats[stats$Tags == "[UP]", ]

UP <- stats[stats$Tags == "[UP]", ]
DOWN <- stats[stats$Tags == "[DOWN]", ]

View(DOWN)
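The row filters above all follow the same pattern: build a logical vector and use it as the row index. A minimal sketch on a toy table (the `stats_demo` data frame and its values are invented for illustration; only the `Tags` and `logFC` column names come from the notes):

```r
# Toy stand-in for the `stats` table (values are invented for illustration).
stats_demo <- data.frame(
  Tags  = c("[UP]", "[DOWN]", "[DOWN]", "[UP]"),
  logFC = c(2.1, -1.35, -5.37, 1.8),
  stringsAsFactors = FALSE
)

# Logical indexing: keep the rows where the condition is TRUE.
down_demo <- stats_demo[stats_demo$Tags == "[DOWN]", ]

# subset() expresses the same filter without repeating the data frame name.
down_demo2 <- subset(stats_demo, Tags == "[DOWN]")

nrow(down_demo)
```

Both forms keep the original row names, which is handy for tracing a row back to the full table.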
2.10 Plotting in R
getwd()
setwd("E:/User/PathToFolder/Dia 5")
colnames(movies) <- c("Film", "Genre", "CriticRating", "AudienceRating", "BudgetMillions", "Year")
str(movies)
levels(movies$Genre)
summary(movies)
Film Genre CriticRating
(500) Days of Summer :1 Action:154 Min.: 0.0
10,000 B.C. : 1 Adventure: 29 1st Qu.:25.0
12 Rounds : 1 Comedy :172 Median :46.0
127 Hours : 1 Drama :101 Mean :47.4
17 Again : 1 Horror : 49 3rd Qu.:70.0
2012 : 1 Romance : 21 Max. :97.0
(Other) :556 Thriller : 36
2009 2008 2009 2010 2009 2009 2008 2007 2011 2011 2007 2011
2010 2009 2011 2011 2007 2009 2011 2010 2007 2009 2009 2010
2009 2007 2009 2011 2011 2008 2009 2011 2008
(......................................................) [529] 2007 2010 2008 2011
2011 2009 2011 2011 2007 2008 2008 2011 2009 2010 2009 2009
2009 2010 2007 2010 2009 2011 2009 2008 2010 2010 2008 2010
2011 2009 2008 2007 2009 2011 Levels: 2007 2008 2009 2010 2011
Film Genre CriticRating
(500) Days of Summer :1 Action:154 Min.: 0.0
10,000 B.C. : 1 Adventure: 29 1st Qu.:25.0
12 Rounds : 1 Comedy :172 Median :46.0
127 Hours : 1 Drama :101 Mean :47.4
17 Again : 1 Horror : 49 3rd Qu.:70.0
2012 : 1 Romance : 21 Max. :97.0
(Other) :556 Thriller : 36
str(movies)
2.10.1 Aesthetics
library(ggplot2)
ggplot(data = movies, aes(x = CriticRating, y = AudienceRating))
Add the geometry:
ggplot(data = movies, aes(x = CriticRating, y = AudienceRating)) +
  geom_point()
Add statistical layers to the plot (a straight-line fit):
ggplot(data = movies, aes(x = CriticRating, y = AudienceRating)) +
  geom_point() + geom_smooth(method = "lm")
Add colour:
ggplot(data = movies, aes(x = CriticRating, y = AudienceRating, colour = Genre)) + geom_point()
Add size:
ggplot(data = movies, aes(x = CriticRating, y = AudienceRating, colour = Genre, size = Genre)) + geom_point()
Mapping colour and size to different variables can make the information easier to digest:
ggplot(data = movies, aes(x = CriticRating, y = AudienceRating, colour = Genre, size = Year)) + geom_point()
2.10.2 Plotting in Layers
Add a point layer to the plot stored in the variable p:
p + geom_point()
or a line layer:
p + geom_line()
Add multiple layers:
p + geom_point() + geom_line()
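The notes do not show how `p` is created, so the sketch below assumes it is a `ggplot()` object holding only the data and the aesthetic mappings. The `movies_demo` data frame and its values are invented for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the movies data (values invented for illustration).
movies_demo <- data.frame(
  CriticRating   = c(20, 45, 60, 85),
  AudienceRating = c(35, 50, 55, 90)
)

# Base object: data and aesthetic mappings only, no geometry yet.
p_demo <- ggplot(data = movies_demo,
                 aes(x = CriticRating, y = AudienceRating))

# Each `+` adds one layer; the base object itself is left unchanged.
p_layers <- p_demo + geom_point() + geom_line()

length(p_demo$layers)    # layers in the base object
length(p_layers$layers)  # layers after adding points and lines
```

Because the base object is never modified, the same `p_demo` can be reused with different geometries.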
2.10.3 Overriding the Plot Aesthetics
Overriding aes
Example 1:
q + geom_point(aes(size = Genre))
Example 2:
q + geom_point(aes(size = Year)) + labs(size = "Year")
q itself remains intact:
q + geom_point()
Example 3
The "xlab" function lets you change the text of the x axis:
q + geom_point(aes(x = BudgetMillions)) + xlab("Budget Millions $$$")
Example 4
q + geom_line() + geom_point()
2.10.4 Mapping vs. Setting
Adding colour
1. Mapping
The levels of the column are taken as the reference for the colours:
r + geom_point(aes(colour = Genre))
2. Setting
It is a mistake to write colour = "Green" inside "aes()": ggplot2 then treats "Green" as a data value to be mapped, not as the colour to draw with.
r + geom_point(colour = "Green")
# WRONG -> r + geom_point(aes(colour = "Green"))
Adding size
1. Mapping
r + geom_point(aes(size = BudgetMillions))
2. Setting
r + geom_point(size = 10)
# WRONG -> r + geom_point(aes(size = 10))
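A small sketch of where ggplot2 stores each kind of value may make the difference clearer. The `movies_demo` data and the `r_demo` object are invented stand-ins, since the notes do not show how `r` is built.

```r
library(ggplot2)

# Hypothetical stand-in for the movies data (values invented for illustration).
movies_demo <- data.frame(
  CriticRating   = c(20, 45, 60, 85),
  AudienceRating = c(35, 50, 55, 90),
  Genre          = c("Action", "Comedy", "Drama", "Comedy")
)
r_demo <- ggplot(movies_demo, aes(x = CriticRating, y = AudienceRating))

# Mapping: colour depends on a column, so it goes inside aes().
mapped <- r_demo + geom_point(aes(colour = Genre))

# Setting: one fixed colour for every point, so it goes outside aes().
fixed <- r_demo + geom_point(colour = "Green")

names(mapped$layers[[1]]$mapping)    # aesthetics mapped in this layer
fixed$layers[[1]]$aes_params$colour  # constant stored as a layer parameter
```

A mapped aesthetic produces a legend because it encodes data. A fixed one does not, because it carries no information.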
2.10.5 Histograms and Density Plots
A histogram needs only the x value to be defined; "geom_histogram()" computes the counts for the y axis automatically.
We should always set "binwidth" explicitly so that the bins suit the range of our data.
s <- ggplot(data = movies, aes(x = BudgetMillions))
s + geom_histogram(binwidth = 10)
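To see what the bin width does to the counts without drawing anything, here is a sketch using base R's `hist()` with `plot = FALSE`. The `budget` values and the `count_bins` helper are invented for illustration, and base R places bin edges differently from `geom_histogram()`, so the numbers illustrate the idea rather than reproduce a ggplot2 figure.

```r
# Invented budget values (in millions), used only for illustration.
budget <- c(5, 12, 18, 25, 33, 41, 58, 60, 75, 120)

# Count how many values fall into bins of a given width, starting at 0.
count_bins <- function(x, binwidth) {
  breaks <- seq(0, ceiling(max(x) / binwidth) * binwidth, by = binwidth)
  hist(x, breaks = breaks, plot = FALSE)$counts
}

narrow <- count_bins(budget, 10)  # many bins, few values in each
wide   <- count_bins(budget, 40)  # few bins, many values in each

narrow
wide
```

Narrow bins show detail but leave gaps; wide bins smooth the shape but hide structure.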
Adding colour: setting vs. mapping
s + geom_histogram(binwidth = 10, fill = "Green")
Adding a border
In "geom_histogram()", "colour" determines the bar borders:
s + geom_histogram(binwidth = 10, aes(fill = Genre), colour = "Black")
s + geom_histogram(binwidth = 20, aes(fill = Genre), colour = "Black")
position = "stack" stacks the density curves of the groups on top of one another:
s + geom_density(aes(fill = Genre), position = "stack")
Every plot type has countless options that can be set to customise the result.
?geom_density
2.11 Exercise: Plotting in R
setwd("E:/User/PathToFolder/Dia 6")
getwd()
summary(Data)
DayOfWeek Month Day Time start
Fri:40 Nov :24 Min. : 1.00 7:00 AM: 11
Mon:40 May :22 1st Qu.: 8.00 6:56 AM: 9
Sat: 2 Aug :21 Median :16.00 6:59 AM: 9
Sun: 1 Jan :19 Mean :15.84 7:01 AM: 9
Thu:41 Mar :19 3rd Qu.:23.00 6:51 AM: 8
Tue:39 Feb :18 Max. :31.00 6:54 AM: 8
Wed:39 (Other):79 (Other):148
How can I count the working weeks I have completed so far?
Option 1
a <- 0
for (i in Data$DayOfWeek) {
  if (i == "Fri") {
    a <- a + 1
  }
}
a
40
Option 2
b <- length(which(Data$DayOfWeek == "Fri"))
b
40
Option 3
summary(Data$DayOfWeek)
54.38
library(ggplot2)
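The three options above can be condensed further. A minimal sketch on an invented vector of weekday labels (not the course data):

```r
# Invented weekday labels, used only for illustration.
days <- factor(c("Mon", "Tue", "Fri", "Mon", "Fri", "Wed", "Fri"))

# A logical comparison yields TRUE/FALSE; sum() counts the TRUEs.
n_fri <- sum(days == "Fri")

# table() counts every level at once.
counts <- table(days)

n_fri
counts[["Fri"]]
```

`sum()` on a logical vector is usually the shortest way to count matches, while `table()` is convenient when every category is needed.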
How could I subset the hours I have worked for each day of the week over the whole period?
mon <- Data[Data$DayOfWeek == "Mon", ]
Tue <- Data[Data$DayOfWeek == "Tue", ]
Wed <- Data[Data$DayOfWeek == "Wed", ]
Thu <- Data[Data$DayOfWeek == "Thu", ]
Fri <- Data[Data$DayOfWeek == "Fri", ]
Sat <- Data[Data$DayOfWeek == "Sat", ]
Sun <- Data[Data$DayOfWeek == "Sun", ]

rm(mon, Tue, Wed, Thu, Fri, Sat)

day <- levels(Data$DayOfWeek)
Day_of_week <- list()
for (i in day) {
  Day_of_week[[i]] <- Data[Data$DayOfWeek == i, ]
}

Day_of_week[["Fri"]]
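The loop that fills `Day_of_week` has a base R shortcut, `split()`. The sketch below uses an invented `log_demo` data frame and is offered as an alternative, not as the method used in the course:

```r
# Invented work log, used only for illustration.
log_demo <- data.frame(
  DayOfWeek = factor(c("Mon", "Fri", "Mon", "Tue", "Fri")),
  Hours     = c(8.5, 7.0, 9.0, 8.0, 6.5)
)

# split() returns a named list with one data frame per factor level.
by_day <- split(log_demo, log_demo$DayOfWeek)

names(by_day)
by_day[["Fri"]]$Hours
```

The result has the same shape as the list built by the loop: one element per level, accessed by name.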
How could I subset the hours I have worked for each month over the whole period?
Nov <- Data[Data$Month == "Nov", ]
Dec <- Data[Data$Month == "Dec", ]
Jan <- Data[Data$Month == "Jan", ]
Feb <- Data[Data$Month == "Feb", ]
Mar <- Data[Data$Month == "Mar", ]
Apr <- Data[Data$Month == "Apr", ]
May <- Data[Data$Month == "May", ]
Jun <- Data[Data$Month == "Jun", ]
Jul <- Data[Data$Month == "Jul", ]
Aug <- Data[Data$Month == "Aug", ]
Sep <- Data[Data$Month == "Sep", ]
Oct <- Data[Data$Month == "Oct", ]

rm(Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Jan, Feb)

mnt <- levels(Data$Month)
Month_of_year <- list()
for (i in mnt) {
  Month_of_year[[i]] <- Data[Data$Month == i, ]
}

Month_of_year[["Nov"]]
How could I subset the hours I have worked for each month and each day of the week over the whole period?
Nov.mon <- Data[Data$DayOfWeek == "Mon" & Data$Month == "Nov", ]
Nov.tue <- Data[Data$DayOfWeek == "Tue" & Data$Month == "Nov", ]
Nov.wed <- Data[Data$DayOfWeek == "Wed" & Data$Month == "Nov", ]
Nov.thu <- Data[Data$DayOfWeek == "Thu" & Data$Month == "Nov", ]
Nov.fri <- Data[Data$DayOfWeek == "Fri" & Data$Month == "Nov", ]
Nov.sat <- Data[Data$DayOfWeek == "Sat" & Data$Month == "Nov", ]
Nov.sun <- Data[Data$DayOfWeek == "Sun" & Data$Month == "Nov", ]

Ago.mon <- Data[Data$DayOfWeek == "Mon" & Data$Month == "Aug", ]
Ago.tue <- Data[Data$DayOfWeek == "Tue" & Data$Month == "Aug", ]
Ago.wed <- Data[Data$DayOfWeek == "Wed" & Data$Month == "Aug", ]
Ago.thu <- Data[Data$DayOfWeek == "Thu" & Data$Month == "Aug", ]
Ago.fri <- Data[Data$DayOfWeek == "Fri" & Data$Month == "Aug", ]
Ago.sat <- Data[Data$DayOfWeek == "Sat" & Data$Month == "Aug", ]
Ago.sun <- Data[Data$DayOfWeek == "Sun" & Data$Month == "Aug", ]

DayOfWeek <- levels(Data$DayOfWeek)
Month <- levels(Data$Month)

Data$Month <- factor(Data$Month, levels = c("Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct"))
Data$DayOfWeek <- factor(Data$DayOfWeek, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
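The last two lines above re-create the factors with explicit levels. The sketch below shows why that matters, using an invented vector of month labels: by default R sorts levels alphabetically, and every table or plot axis inherits that order.

```r
# Invented month labels, used only for illustration.
months_chr <- c("Jan", "Nov", "Dec", "Nov", "Feb")

# Default: levels are sorted alphabetically.
alphabetical <- factor(months_chr)

# Explicit levels: keep the order in which the period actually ran.
chronological <- factor(months_chr, levels = c("Nov", "Dec", "Jan", "Feb"))

levels(alphabetical)
levels(chronological)
table(chronological)
```

Only the level order changes; the values stored in the vector are the same in both factors.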
v <- ggplot(data = Data, aes(x = Hours))
v + geom_histogram(binwidth = .5, colour = "Black")
2.12 Exercise: Structuring Data in R
install.packages("stringr")
library(stringr)  # load the stringr library
setwd("E:/PathToFolder/Dia 6")
"strsplit" splits each line into its elements, and "do.call" then calls "rbind" to stack those pieces as the rows of a matrix, so that the elements end up arranged in columns.
The "split" option divides the text wherever the pattern we give it matches; in this case, runs of 2 to 10 blank spaces. The option "stringsAsFactors = FALSE" prevents the text columns from being treated as factors.
dat <- as.data.frame(do.call(rbind, strsplit(dat, split = " {2,10}")), stringsAsFactors = FALSE)
head(dat)
V1 V2 V3 V4
Bania Thomas M. 725 Commonwealth Ave. Boston
Barnaby David 373 W. Geneva St. Wms. Bay
Bausch Judy 373 W. Geneva St. Wms. Bay
Bolatto Alberto 725 Commonwealth Ave. Boston
Carlstrom John 933 E. 56th St. Chicago
Chamberlin Richard A. 111 Nowelo St. Hilo
V5 V6
MA O2215
WI 53191
WI 53191
MA O2215
IL 60637
HI 96720
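The one-line call above chains three steps. A minimal sketch that takes them one at a time, on two invented fixed-width lines (the `lines_demo` strings are not from the course file):

```r
# Two invented lines; fields are separated by runs of 2 to 10 spaces.
lines_demo <- c(
  "Doe    Jane    12 Main St.    Springfield",
  "Roe    Rick    7 Oak Ave.    Shelbyville"
)

# strsplit() returns one character vector per line.
pieces <- strsplit(lines_demo, split = " {2,10}")

# do.call(rbind, ...) stacks those vectors as the rows of a matrix.
mat <- do.call(rbind, pieces)

# Convert to a data frame, keeping the text as character columns.
df_demo <- as.data.frame(mat, stringsAsFactors = FALSE)

dim(df_demo)
df_demo$V3
```

Single spaces inside a field, as in a street address, are left alone because the pattern requires at least two.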
LastName FirstName address city state
Bania Thomas M. 725 Commonwealth Ave. Boston MA
Barnaby David 373 W. Geneva St. Wms. Bay WI
Bausch Judy 373 W. Geneva St. Wms. Bay WI
Bolatto Alberto 725 Commonwealth Ave. Boston MA
Carlstrom John 933 E. 56th St. Chicago IL
Chamberlin Richard A. 111 Nowelo St. Hilo HI
zip
O2215
53191
53191
O2215
60637
96720
tail(dat)
View(dat)
dat$streetname <- gsub("[0-9]{1,4} (.*)", "\\1", dat$address)
dat$streetno2 <- paste(dat$streetname, dat$streetno, sep = " ")

View(dat)
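The `gsub()` call above relies on a capture group: whatever the parentheses match can be reused in the replacement as `"\\1"`. The notes do not show how `streetno` is obtained, so the second pattern below is only one assumed way to get it; the `^` anchor is also an addition of this sketch.

```r
# Two addresses in the same shape as the ones in the table above.
address <- c("725 Commonwealth Ave.", "111 Nowelo St.")

# "(.*)" captures everything after the leading number; "\\1" keeps only that.
street_name <- gsub("^[0-9]{1,4} (.*)", "\\1", address)

# Capturing the digits instead isolates the street number.
street_no <- gsub("^([0-9]{1,4}) .*", "\\1", address)

street_name
street_no
```

Moving the parentheses changes which part of the match survives, while the rest of the pattern stays the same.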
View(dat)
The "data.entry(dat)" function lets you view and edit the data frame directly:
data.entry(dat)
2.13 Exercise 2: Structuring Data in R
setwd("E:/User/PathToFolder/Dia 6")
library(stringr)
tail(dat)
dat <- read.csv("Horario.csv")
head(dat)
nrow(dat)
204
dat <- dat[-c(1, nrow(dat)), ]

colnames(dat) <- c("DayOfWeek_Month_Day", "Time_start", "Time_end_PM", "Hours")
View(dat)
We can split "Thu, Nov 1" with gsub:
dat$DayOfWeek <- gsub("(.*),.*", "\\1", dat$DayOfWeek_Month_Day)
View(dat)
The pattern ", [A-Z]+[a-z]+" matches the ", " separator followed by uppercase and then lowercase letters (for example ", Nov"):
dat$DayOF_Month <- str_extract(dat$DayOfWeek_Month_Day, ", [A-Z]+[a-z]+")
View(dat)
summary(dat)
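The summary that follows shows separate DayOfWeek, Month and Day columns. The notes do not show every step that produces them, so here is one possible way to split a label of this shape, written with base R `sub()` rather than stringr. The `labels_demo` values are invented.

```r
# Invented date labels in the same "Day, Mon D" shape as the text above.
labels_demo <- c("Thu, Nov 1", "Fri, Dec 12")

# Everything before the comma: the day of the week.
day_of_week <- sub("^(.*),.*$", "\\1", labels_demo)

# Letters after ", ": the month abbreviation.
month <- sub("^.*, ([A-Z][a-z]+) .*$", "\\1", labels_demo)

# Trailing digits: the day of the month, converted to a number.
day <- as.numeric(sub("^.* ([0-9]+)$", "\\1", labels_demo))

data.frame(day_of_week, month, day)
```

Putting the capture group around only the wanted part avoids the leading ", " that a plain match on ", [A-Z]+[a-z]+" would include.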
DayOfWeek Month Day
Length:202 Length:202 Min. : 1.00
Class :character Class :character 1st Qu.: 8.00
Mode :character Mode :character Median :16.00
Mean :15.84
3rd Qu.:23.00
Max. :31.00
Time start Time end PM Hours
7:00 AM: 11 3:40 PM: 4 8.65 : 6
6:56 AM: 9 4:13 PM: 4 8.45 : 4
6:59 AM: 9 5:19 PM: 4 8.55 : 4
7:01 AM: 9 3:16 PM: 3 8.7 : 4
6:51 AM: 8 3:33 PM: 3 10.42 : 3
6:54 AM: 8 3:41 PM: 3 8: 3
(Other):148 (Other):181 (Other):178
str(dat)
,..: 15 74 45 44 49 7 3 38 51 46 ...
$ Time_end_PM: Factor w/ 143 levels "1:18 PM","1:19 PM",..:
117 90 78 36 126 128 89 58 6 109 ...
$ Hours : Factor w/ 152 levels "0.6","1.12","1.22",..:
49 105 128 82 14 69 61 113 64 6 ...
Oneto4Hr <- dat[dat$Hours > 1 & dat$Hours < 4, ]
Over5Hr <- dat[dat$Hours > 5 & dat$Hours < 6, ]
Over6Hr <- dat[dat$Hours > 6 & dat$Hours < 7, ]
Over7Hr <- dat[dat$Hours > 7 & dat$Hours < 8, ]
Over8Hr <- dat[dat$Hours > 8 & dat$Hours < 9, ]
Over9Hr <- dat[dat$Hours > 9 & dat$Hours < 10, ]
Over10Hr <- dat[dat$Hours > 10, ]

Oneto4Hr
DayOfWeek Month Day Time start Time end PM Hours
Fri Dec 7 8:20 AM 10:52 AM 2.53
Tue Dec 18 10:45 AM 2:31 PM 3.77
Thu Dec 20 11:43 AM 12:50 PM 1.12
Mon Jan 28 6:00 AM 7:13 AM 1.22
Fri Mar 8 8:31 AM 12:08 PM 3.62
Fri Apr 12 12:29 PM 2:40 PM 2.18
Fri Jul 19 9:36 AM 12:03 PM 2.45
Fri Jul 26 9:49 AM 1:27 PM 3.63
0.6
14.28
x <- seq(min(dat$Hours), max(dat$Hours), length.out = 10)
x
0.60 2.12 3.64 5.16 6.68 8.20 9.72 11.24 12.76 14.28
1.52
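The value 1.52 equals the gap between consecutive points of the sequence. The notes later pass a variable `y` as the bin width without showing its definition; the sketch below assumes it is that gap. The `hours` values are invented, chosen to span the same range of 0.60 to 14.28.

```r
# Invented hours spanning the same range as the output above.
hours <- c(0.6, 8.65, 14.28, 8.45)

# Ten equally spaced points from the minimum to the maximum.
x_demo <- seq(min(hours), max(hours), length.out = 10)

# The gap between neighbouring points is one possible bin width.
binwidth_demo <- x_demo[2] - x_demo[1]
binwidth_demo
```

With `length.out = 10` there are nine gaps, so the step is the range divided by nine.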
p <- ggplot(data = dat, aes(x = Hours))
p + geom_histogram(binwidth = y, aes(fill = DayOfWeek), colour = "Black") + facet_grid(. ~ Month, scales = "free")
2.14 Introduction to Data Cleaning in R
if (!require(installr))
  install.packages("installr"); require(installr)
updateR()
getwd()
setwd("E:/User/PathToFolder/Dia 6")
getwd()
("E:/User/PathToFolder/Dia 6")
Profit Growth
8553827 19
13212508 20
8701897 16
10727561 19
4193069 19
8179177 22
tail(fin)
$ City : Factor w/ 297 levels "Addison","Alexandria",..:
94 181 105 195 151 154 53 295 232 26 ...
$ Revenue : Factor w/ 499 levels "","$1,614,585",..: 480 195
486 247 403 142 309 1 97 118 ...
$ Expenses : Factor w/ 498 levels "","1,026,548 Dollars",..:
7 486 4 249 228 248 58 1 403 496 ...
$ Profit : int 8553827 13212508 8701897 10727561 4193069
8179177 3259485 NA 5274553 11412916 ...
$ Growth : Factor w/ 33 levels "","-2%","-3%",..: 15 17 12
15 15 19 13 1 27 17 ...
summary(fin)
ID Name
Min. : 1.0 Abstractedchocolat: 1
1st Qu.:125.8 Abusivebong : 1
Median :250.5 Acclaimedcirl : 1
Mean :250.5 Admitruppell : 1
3rd Qu.:375.2 Admonishbadelynge : 1
Max. :500.0 Ahemparticular : 1
(Other) :494
Industry Inception Employees
Assigning factor properties to a variable
fin$ID
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52.................................... 464 465 466 467 468 469
470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485
486 487 488 489 490 491 492 493 494 495 496 497 498 499 500
factor(fin$ID)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 ....................................... 465 466 467 468 469 470 471 472
473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488
489 490 491 492 493 494 495 496 497 498 499 500 500 Levels: 1 2
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ... 500
$ Expenses : Factor w/ 498 levels "","1,026,548 Dollars"
,..: 7 486 4 249 228 248 58 1 403 496 ...
$ Profit : int 8553827 13212508 8701897 10727561
4193069 8179177 3259485 NA 5274553 11412916 ...
$ Growth : Factor w/ 33 levels "","-2%","-3%",..:
15 17 12 15 15 19 13 1 27 17 ...
str(fin)
7 486 4 249 228 248 58 1 403 496 ...
$ Profit : int 8553827 13212508 8701897 10727561 4193069
8179177 3259485 NA 5274553 11412916 ...
$ Growth : Factor w/ 33 levels "","-2%","-3%",..:
15 17 12 15 15 19 13 1 27 17 ...
12 13 14 12 12
typeof(a)
character
b <- as.numeric(a)
b
12 13 14 12 12
typeof(b)
double
c <- factor(a)
c
12 13 14 12 12 Levels: 12 13 14
typeof(c)
integer
y <- as.numeric(c)
Wrong: the internal codes of the categories (the position of each level within the factor) are assigned instead of the numbers themselves.
y
1 2 3 1 1
typeof(y)
Correct way
x <- as.numeric(as.character(c))
x
12 13 14 12 12
typeof(x)
double
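The same pitfall as a self-contained sketch that can be run on its own. The names `a_demo`, `f_demo`, `codes`, `values` and `values2` are introduced here for illustration.

```r
# A character vector of numbers stored as text.
a_demo <- c("12", "13", "14", "12", "12")
f_demo <- factor(a_demo)

# Wrong: as.numeric() on a factor returns the internal level codes.
codes <- as.numeric(f_demo)

# Right: go through the labels first.
values <- as.numeric(as.character(f_demo))

# Equivalent alternative: index the converted levels with the factor.
values2 <- as.numeric(levels(f_demo))[f_demo]

codes
values
```

The second form converts each distinct level only once, which can be faster on long factors with few levels.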
$ Industry : Factor w/ 8 levels "","Construction",..:
8 6 7 6 8 6 3 2 6 3 ...
.........................................................
$ Profit : int 8553827 13212508 8701897 10727561 4193069
8179177 3259485 NA 5274553 11412916 ...
$ Growth : Factor w/ 33 levels "","-2%","-3%",..:
15 17 12 15 15 19 13 1 27 17 ...
summary(fin)
ID Name ..... ..... Profit2
1: 1 Abstractedchocolat: 1 ..... ..... 12434 : 1
2: 1 Abusivebong : 1 ..... ..... 46851 : 1
3: 1 Acclaimedcirl : 1 ..... ..... 53681 : 1
4: 1 Admitruppell : 1 ..... ..... 68862 : 1
5: 1 Admonishbadelynge : 1 ..... ..... 73350 : 1
6: 1 Ahemparticular : 1 ..... ..... (Other):493
(Other):494 (Other) :494 ..... ..... NA’s : 2
fin$Profit2 <- as.numeric(fin$Profit2)
str(fin)
head(fin)
View(fin)
fin$Expenses <- gsub(" Dollars", "", fin$Expenses)
fin$Expenses <- gsub(",", "", fin$Expenses)
head(fin)
ID Name Industry ..... Profit Growth
1 Over-Hex Software ..... 8553827 19
2 Unimattax IT Services ..... 13212508 20
3 Greenfax Retail ..... 8701897 16
4 Blacklane IT Services ..... 10727561 19
5 Yearflex Software ..... 4193069 19
6 Indigoplanet IT Services ..... 8179177 22
$ is a special character in regular expressions, so it must be escaped as \\$
fin$Revenue <- gsub("\\$", "", fin$Revenue)
fin$Revenue <- gsub(",", "", fin$Revenue)
head(fin)
ID Name Industry ..... Profit Growth
1 Over-Hex Software ..... 8553827 19
2 Unimattax IT Services ..... 13212508 20
3 Greenfax Retail ..... 8701897 16
4 Blacklane IT Services ..... 10727561 19
5 Yearflex Software ..... 4193069 19
6 Indigoplanet IT Services ..... 8179177 22
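The clean-up steps for Revenue, Expenses and Growth can be tried on a few values before touching the real table. In this sketch the input strings are assumed raw forms of the first two rows, reconstructed from the formats visible in the `str(fin)` output.

```r
# Assumed raw forms of the first two rows (formats taken from str(fin)).
revenue  <- c("$9,684,527", "$14,016,543")
expenses <- c("1,130,700 Dollars", "804,035 Dollars")
growth   <- c("19%", "20%")

# Strip "$" (escaped, because "$" means end-of-string in a regex) and commas.
revenue_num <- as.numeric(gsub("\\$|,", "", revenue))

# Strip the " Dollars" suffix and the thousands separators.
expenses_num <- as.numeric(gsub(" Dollars|,", "", expenses))

# Strip the percent sign.
growth_num <- as.numeric(gsub("%", "", growth))

revenue_num - expenses_num
```

Once the columns are numeric, a check such as revenue minus expenses against the Profit column becomes possible.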
str(fin)
fin$Growth <- gsub("%", "", fin$Growth)
head(fin)
’data.frame’: 500 obs. of 11 variables:
$ ID : Factor w/ 500 levels "1","2","3","4",..:
1 2 3 4 5 6 7 8 9 10 ...
$ Name : Factor w/ 500 levels "Abstractedchocolat",..:
297 451 168 40 485 199 435 339 242 395 ...
$ Industry : Factor w/ 8 levels "","Construction",..:
8 6 7 6 8 6 3 2 6 3 ...
$ Inception: Factor w/ 16 levels "1999","2000",..:
8 11 14 13 15 15 11 15 11 12 ...
$ Employees: int 25 36 NA 66 45 60 116 73 55 25 ...
$ State : Factor w/ 43 levels "","AL","AZ","CA",..:
37 34 36 4 42 28 23 30 4 9 ...
$ City : Factor w/ 297 levels "Addison","Alexandria",..:
94 181 105 195 151 154 53 295 232 26 ...
$ Revenue :
chr "9684527" "14016543" "9746272" "15359369" ...
$ Expenses :
chr "1130700" "804035" "1044375" "4631808" ...
$ Profit :
int 8553827 13212508 8701897 10727561 4193069 8179177
3259485 NA 5274553 11412916 ...
$ Growth : chr "19" "20" "16" "19" ...
$ Inception: Factor w/ 16 levels "1999","2000",..:
8 11 14 13 15 15 11 15 11 12 ...
$ Employees: int 25 36 NA 66 45 60 116 73 55 25 ...
$ State : Factor w/ 43 levels "","AL","AZ","CA",..:
37 34 36 4 42 28 23 30 4 9 ...
$ City : Factor w/ 297 levels "Addison",
"Alexandria",..: 94 181 105 195 151 154 53 295 232 26 ...
$ Revenue :
num 9684527 14016543 9746272 15359369 8567910 ...
$ Expenses :
num 1130700 804035 1044375 4631808 4374841 ...
$ Profit : int 8553827 13212508 8701897 10727561
4193069 8179177 3259485 NA 5274553 11412916 ...
$ Growth : num 19 20 16 19 19 22 17 NA 30 20 ...
summary(fin)
2.15 Data Cleaning Follow-up in R
View(fin)
head(fin, 24)
str(fin)
.........................................................
$ Profit : int 8553827 13212508 8701897 10727561 4193069
8179177 3259485 NA 5274553 11412916 ...
$ Growth : Factor w/ 32 levels "-2%","-3%","0%",..:
14 16 11 14 14 18 12 NA 26 16 ...
head(fin)
ID Name Industry ... Profit Growth
5 Yearflex Software ... 4193069 19
137 Toughcare Retail ... 6633554 14
183 Ittech IT Services ... 4589251 20
200 Lalane Retail ... 7527175 14
208 Countslovenly Construction ... 166462 10
245 Peskyevaluate IT Services ... 8727201 23
360 Remembergabbro Construction ... 6363466 12
380 Pickyfive IT Services ... 10368276 26
435 Lucrepickled IT Services ... 9382538 17
487 Genusequ Construction ... 2756691 11
head(fin, 24)
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA (.....................................................) NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA
ID Name Industry Inception Employees ... Profit Growth
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA NA NA NA NA ... NA NA
NA <NA> <NA> NA NA ... NA NA
ID Name Industry Inception
8 Rednimdox Construction 2013
17 Ganzlax IT Services 2011
44 Ganzgreen Construction 2010
Employees State City Revenue
73 NY Woodside <NA>
75 NJ Iselin 14,001,180
224 TN Franklin <NA>
Expenses Profit Growth
<NA> NA <NA>
<NA> 11901180 18
<NA> NA 9
fin[is.na(fin$State), ]
Removing observations for which no information is available.
fin_backup <- fin
fin <- fin_backup
12
498
ID Name Industry Inception
1 Over-Hex Software 2006
2 Unimattax IT Services 2009
3 Greenfax Retail 2012
4 Blacklane IT Services 2011
... ... ... ...
10
ID Name Industry Inception
3 Greenfax Retail 2012
8 Rednimdox Construction 2013
11 Canecorporation Health 2012
17 Ganzlax IT Services 2011
22 Lathotline Health NA
44 Ganzgreen Construction 2010
84 Drilldrill Software 2010
267 Circlechop Software 2010
332 Westminster Financial Services 2010
379 Stovepuck Retail 2013
Resetting the index of the data frame.
fin
fin[!complete.cases(fin), ]
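A small sketch of the tools used to locate missing values, on an invented `fin_demo` table. It also shows a common pitfall: comparing a column that contains NA with `==` returns NA for those entries, and indexing with NA produces rows filled with NA.

```r
# Invented records with a few gaps, used only for illustration.
fin_demo <- data.frame(
  Name      = c("A", "B", "C", "D"),
  Employees = c(25, NA, 66, 45),
  State     = c("NY", "CA", NA, "TX"),
  stringsAsFactors = FALSE
)

# Rows with at least one missing value.
incomplete <- fin_demo[!complete.cases(fin_demo), ]

# Rows where one specific column is missing.
no_state <- fin_demo[is.na(fin_demo$State), ]

# Pitfall: `==` returns NA for missing entries, which yields all-NA rows.
with_na_rows <- fin_demo[fin_demo$Employees == 45, ]

# which() drops the NAs and keeps only the true matches.
clean_match <- fin_demo[which(fin_demo$Employees == 45), ]
```

Wrapping the condition in `which()` is a simple guard whenever the compared column may contain NA.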
From the "City" column I can easily tell which "State" these rows belong to.
fin[is.na(fin$State), ]
[...] that column, so that assigning a value to that position simply overwrites it.
fin[is.na(fin$State) & fin$City == "New York", "State"] <- "NY"
fin[is.na(fin$State) & fin$City == "San Francisco", "State"] <- "CA"
Profit Growth
5014821 17
3137242 20
nrow(fin[!complete.cases(fin), ])
Replacing missing information: the median method, applied to "Employees" and "Growth"
fin[!complete.cases(fin$Employees), ]
med_empl_retail <- median(fin[fin$Industry == "Retail", "Employees"], na.rm = TRUE)
med_empl_retail
28
mean(fin[fin$Industry == "Retail", "Employees"], na.rm = TRUE)
209.2766
fin[is.na(fin$Employees) & fin$Industry == "Retail", "Employees"] <- med_empl_retail
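The gap between the median (28) and the mean (209.2766) above suggests that a few very large companies pull the mean up. A minimal sketch of the same imputation on invented data, showing why the median is the more robust choice:

```r
# Invented records, used only for illustration.
fin_demo <- data.frame(
  Industry  = c("Retail", "Retail", "Retail", "Retail", "Health"),
  Employees = c(20, NA, 28, 1200, NA)
)

is_retail <- fin_demo$Industry == "Retail"

# Median of the known Retail values; one very large firm barely moves it.
med_retail  <- median(fin_demo$Employees[is_retail], na.rm = TRUE)
mean_retail <- mean(fin_demo$Employees[is_retail], na.rm = TRUE)

# Fill only the missing Retail entries; other industries are left alone.
fin_demo$Employees[is.na(fin_demo$Employees) & is_retail] <- med_retail

fin_demo
```

Combining `is.na()` with the industry condition ensures that known values and other industries are never overwritten.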
fin[3, ]
80
fin[is.na(fin$Employees) & fin$Industry == "Financial Services", ]
fin[330, ]
nrow(fin[!complete.cases(fin), ])
fin$Growth <- gsub("%", "", fin$Growth)
fin$Growth <- as.numeric(fin$Growth)
med_growth_constr <- median(fin[fin$Industry == "Construction", "Growth"], na.rm = TRUE)
med_growth_constr
10
fin[8, ]
ID Name Industry Inception
8 Rednimdox Construction 2013
17 Ganzlax IT Services 2011
22 Lathotline Health NA
44 Ganzgreen Construction 2010
9055059
ID Name Industry Inception
8 Rednimdox Construction 2013
44 Ganzgreen Construction 2010
fin$Expenses <- gsub(" Dollars", "", fin$Expenses)
fin$Expenses <- gsub(",", "", fin$Expenses)
fin$Expenses <- as.numeric(fin$Expenses)

med_exp_constr <- median(fin[fin$Industry == "Construction", "Expenses"], na.rm = TRUE)
med_exp_constr
4506976
fin[is.na(fin$Expenses) & fin$Industry == "Construction" & is.na(fin$Profit), ]
ID Name Industry Inception
8 Rednimdox Construction 2013
17 Ganzlax IT Services 2011
22 Lathotline Health NA
44 Ganzgreen Construction 2010
fin[c(8, 42), ]
fin[!complete.cases(fin), ]
3 Stage 3
3.1 Exercise: Data Handling in R
setwd("E:/User/PathToFolder/Dia 6")
2. Find the row of the most expensive car
Auto[which(Auto$price == max(Auto$price, na.rm = TRUE)), ]
price
36 45400
alfa-romero audi bmw chevrolet
3 4 6 3
dodge honda isuzu jaguar
2 3 3 3
mazda mercedes-benz mitsubishi nissan
5 4 4 5
porsche toyota volkswagen volvo
3 7 4 2
Option 1: subsets
alfa_romero <- Auto[Auto$company == "alfa-romero", ]
audi <- Auto[Auto$company == "audi", ]
bmw <- Auto[Auto$company == "bmw", ]
chevrolet <- Auto[Auto$company == "chevrolet", ]
dodge <- Auto[Auto$company == "dodge", ]
honda <- Auto[Auto$company == "honda", ]
isuzu <- Auto[Auto$company == "isuzu", ]
jaguar <- Auto[Auto$company == "jaguar", ]
mazda <- Auto[Auto$company == "mazda", ]
mercedes_benz <- Auto[Auto$company == "mercedes-benz", ]
mitsubishi <- Auto[Auto$company == "mitsubishi", ]
nissan <- Auto[Auto$company == "nissan", ]
porsche <- Auto[Auto$company == "porsche", ]
toyota <- Auto[Auto$company == "toyota", ]
volkswagen <- Auto[Auto$company == "volkswagen", ]
volvo <- Auto[Auto$company == "volvo", ]

MaxpriceAr <- alfa_romero[which(alfa_romero$price == max(alfa_romero$price, na.rm = TRUE)), ]
MaxpriceAu <- audi[which(audi$price == max(audi$price, na.rm = TRUE)), ]
MaxpriceBw <- bmw[which(bmw$price == max(bmw$price, na.rm = TRUE)), ]
MaxpriceCh <- chevrolet[which(chevrolet$price == max(chevrolet$price, na.rm = TRUE)), ]
MaxpriceDo <- dodge[which(dodge$price == max(dodge$price, na.rm = TRUE)), ]
MaxpriceHo <- honda[which(honda$price == max(honda$price, na.rm = TRUE)), ]
MaxpriceIs <- isuzu[which(isuzu$price == max(isuzu$price, na.rm = TRUE)), ]
MaxpriceJa <- jaguar[which(jaguar$price == max(jaguar$price, na.rm = TRUE)), ]
MaxpriceMa <- mazda[which(mazda$price == max(mazda$price, na.rm = TRUE)), ]
MaxpriceMe <- mercedes_benz[which(mercedes_benz$price == max(mercedes_benz$price, na.rm = TRUE)), ]
MaxpriceMi <- mitsubishi[which(mitsubishi$price == max(mitsubishi$price, na.rm = TRUE)), ]
MaxpriceNi <- nissan[which(nissan$price == max(nissan$price, na.rm = TRUE)), ]
MaxpricePo <- porsche[which(porsche$price == max(porsche$price, na.rm = TRUE)), ]
MaxpriceTo <- toyota[which(toyota$price == max(toyota$price, na.rm = TRUE)), ]
MaxpriceVk <- volkswagen[which(volkswagen$price == max(volkswagen$price, na.rm = TRUE)), ]
MaxpriceVl <- volvo[which(volvo$price == max(volvo$price, na.rm = TRUE)), ]

PriceDF <- rbind(MaxpriceAr, MaxpriceAu, MaxpriceBw, MaxpriceCh, MaxpriceDo, MaxpriceHo,
                 MaxpriceIs, MaxpriceJa, MaxpriceMa, MaxpriceMe, MaxpriceMi, MaxpriceNi,
                 MaxpricePo, MaxpriceTo, MaxpriceVk, MaxpriceVl)

PriceDF
index company body-style wheel-base length engine-type
2 1 alfa-romero convertible 88.6 168.8 dohc
3 2 alfa-romero hatchback 94.5 171.2 ohcv
7 6 audi wagon 105.8 192.7 ohc
12 14 bmw sedan 103.5 193.8 ohc
16 18 chevrolet sedan 94.5 158.8 ohc
17 19 dodge hatchback 93.7 157.3 ohc
20 28 honda sedan 96.5 175.4 ohc
22 30 isuzu sedan 94.3 170.7 ohc
27 35 jaguar sedan 102.0 191.7 ohcv
32 43 mazda sedan 104.9 175.0 ohc
36 47 mercedes-benz hardtop 112.0 199.2 ohcv
40 52 mitsubishi sedan 96.3 172.4 ohc
45 57 nissan sedan 100.4 184.6 ohcv
47 62 porsche convertible 89.5 168.9 ohcf
55 79 toyota wagon 104.5 187.8 dohc
59 86 volkswagen sedan 97.3 171.7 ohc
61 88 volvo wagon 104.3 188.8 ohc
num-of-cylinders horsepower average mileage price
2 four 111 21 16500.0
3 six 154 19 16500.0
7 five 110 19 18920.0
12 six 182 16 41315.0
16 four 70 38 6575.0
17 four 68 31 6377.0
20 four 101 24 12945.0
22 four 78 24 6785.0
27 twelve 262 13 36000.0
32 four 72 31 18344.0
36 eight 184 14 45400.0
40 four 88 25 8189.0
45 six 152 19 13499.0
47 six 207 17 37028.0
55 six 156 19 15750.0
59 four 100 26 9995.0
61 four 114 23 13415.0
Option 2: loop
comp_auto <- levels(Auto$company)
PriceDF2 <- data.frame()
for (i in comp_auto) {
  Temp <- Auto[Auto$company == i, ]
  Temp <- Temp[which(Temp$price == max(Temp$price, na.rm = TRUE)), ]
  PriceDF2 <- rbind(PriceDF2, Temp)
}
PriceDF2

View(PriceDF)
View(PriceDF2)
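A third possibility, offered here as an alternative and not as part of the original exercise, is the split, apply, combine pattern from base R. The `auto_demo` table is a small illustrative sample with an invented company assignment.

```r
# Small illustrative sample (rows invented for this sketch).
auto_demo <- data.frame(
  company = c("audi", "audi", "bmw", "bmw", "bmw"),
  price   = c(13950, 18920, 16430, NA, 41315),
  stringsAsFactors = FALSE
)

# One data frame per company.
groups <- split(auto_demo, auto_demo$company)

# From each group keep the row(s) with the highest known price.
top_rows <- lapply(groups, function(g) {
  g[which(g$price == max(g$price, na.rm = TRUE)), ]
})

# Stack the per-company results back into a single data frame.
price_demo <- do.call(rbind, top_rows)
price_demo$price
```

This avoids growing a data frame inside a loop and needs no per-company variable names.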
Var1 Freq
1 19 1
2 21 2
alfa_romeroKmMode <- names(alfa_romeroTable)[which(alfa_romeroTable == max(alfa_romeroTable))]
alfa_romeroKmMode
[1] "21"
Option 2: loop
AutoKmMode <- c()
comp_auto <- levels(Auto$company)
for (i in comp_auto) {
  Temp <- Auto[Auto$company == i, ]
  Temp <- table(Temp$average.mileage)
  Temp <- names(Temp)[which(Temp == max(Temp))]
  Temp <- paste(i, Temp, sep = " = ")
  AutoKmMode <- append(AutoKmMode, Temp)
}
AutoKmMode
[[’alfa-romero = [21]’],
[’audi = [19]’],
[’bmw = [16]’],
[’chevrolet = [38]’],
[’dodge = [31]’],
[’honda = [24]’],
[’isuzu = [38]’],
[’jaguar = [15]’],
[’mazda = [31]’],
[’mercedes-benz = [14]’],
[’mitsubishi = [25]’],
[’nissan = [31]’],
[’porsche = [17]’],
[’toyota = [31]’],
[’volkswagen = [37]’],
[’volvo = [23]’]]
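R has no built-in function for the statistical mode, which is why the loop above goes through `table()`. A compact sketch of the same idea with a small helper and `tapply()`; the `stat_mode` function and the `auto_demo` values are invented for illustration.

```r
# Invented mileage records, used only for illustration.
auto_demo <- data.frame(
  company = c("audi", "audi", "audi", "bmw", "bmw"),
  mileage = c(19, 21, 21, 16, 16)
)

# Most frequent value(s) of a vector; ties return every tied value.
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

# Apply the helper to the mileage of each company.
mode_by_company <- tapply(auto_demo$mileage, auto_demo$company, stat_mode)
mode_by_company[["audi"]]
```

As in the loop, the result is text, because it comes from the names of the frequency table.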
o <- order(Auto$price, na.last = TRUE, decreasing = TRUE)
AutoOrderPrice <- Auto[o, ]

AutoOrderPrice <- Auto[order(Auto$price, na.last = TRUE, decreasing = TRUE), ]

AutoOrderPrice
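`order()` does not sort the data itself; it returns the row positions in sorted order, which are then used as an index. A minimal sketch on an invented price vector:

```r
# Invented prices with one missing value, used only for illustration.
price <- c(6575, NA, 45400, 16500)

# order() returns the positions that would sort the vector.
idx <- order(price, decreasing = TRUE, na.last = TRUE)

idx
price[idx]
```

With `na.last = TRUE` the missing prices are kept and placed at the end, which matches the NaN rows at the bottom of the listing that follows.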
20 29 honda sedan 96.5 169.1 ohc
58 86 volkswagen sedan 97.3 171.7 ohc
53 71 toyota wagon 95.7 169.7 ohc
39 52 mitsubishi sedan 96.3 172.4 ohc
57 82 volkswagen sedan 97.3 171.7 ohc
56 81 volkswagen sedan 97.3 171.7 ohc
52 70 toyota wagon 95.7 169.7 ohc
55 80 volkswagen sedan 97.3 171.7 ohc
43 56 nissan wagon 94.5 170.2 ohc
18 27 honda wagon 96.5 157.1 ohc
40 53 nissan sedan 94.5 165.3 ohc
38 51 mitsubishi sedan 96.3 172.4 ohc
51 69 toyota wagon 95.7 169.7 ohc
42 55 nissan sedan 94.5 165.3 ohc
29 38 mazda hatchback 93.1 159.1 ohc
21 30 isuzu sedan 94.3 170.7 ohc
41 54 nissan sedan 94.5 165.3 ohc
15 18 chevrolet sedan 94.5 158.8 ohc
50 68 toyota hatchback 95.7 158.7 ohc
16 19 dodge hatchback 93.7 157.3 ohc
49 67 toyota hatchback 95.7 158.7 ohc
14 17 chevrolet hatchback 94.5 155.9 ohc
17 20 dodge hatchback 93.7 157.3 ohc
37 50 mitsubishi hatchback 93.7 157.3 ohc
28 37 mazda hatchback 93.1 159.1 ohc
36 49 mitsubishi hatchback 93.7 157.3 ohc
48 66 toyota hatchback 95.7 158.7 ohc
27 36 mazda hatchback 93.1 159.1 ohc
13 16 chevrolet hatchback 88.4 141.1 l
22 31 isuzu sedan 94.5 155.9 ohc
23 32 isuzu sedan 94.5 155.9 ohc
47 63 porsche hatchback 98.4 175.7 dohcv
num-of-cylinders horsepower average mileage price
35 eight 184 14 45400.0
11 six 182 16 41315.0
34 eight 184 14 40960.0
46 six 207 17 37028.0
12 six 182 15 36880.0
26 twelve 262 13 36000.0
25 six 176 15 35550.0
45 six 207 17 34028.0
24 six 176 15 32250.0
10 six 182 16 30760.0
33 five 123 22 28248.0
32 five 123 22 25552.0
9 six 121 21 20970.0
6 five 110 7 19 18920.0
31 four 72 31 18344.0
4 five 115 18 17450.0
8 four 101 23 16925.0
2 six 154 19 16500.0
1 four 111 21 16500.0
7 four 101 23 16430.0
54 six 156 19 15750.0
5 five 110 19 15250.0
3 four 102 24 13950.0
44 six 152 19 13499.0
0 four 111 21 13495.0
60 four 114 23 13415.0
19 four 101 24 12945.0
59 four 114 23 12940.0
30 two 101 17 11845.0
20 four 100 25 10345.0
58 four 100 26 9995.0
53 four 62 27 8778.0
39 four 88 25 8189.0
57 four 52 37 7995.0
56 four 85 27 7975.0
52 four 62 27 7898.0
55 four 52 37 7775.0
43 four 69 31 7349.0
18 four 76 30 7295.0
40 four 55 45 7099.0
38 four 88 25 6989.0
51 four 62 31 6918.0
42 four 69 31 6849.0
29 four 68 31 6795.0
21 four 78 24 6785.0
41 four 69 31 6649.0
15 four 70 38 6575.0
50 four 62 31 6488.0
16 four 68 31 6377.0
49 four 62 31 6338.0
14 four 70 38 6295.0
17 four 68 31 6229.0
37 four 68 31 6189.0
28 four 68 31 6095.0
36 four 68 37 5389.0
48 four 62 35 5348.0
27 four 68 30 5195.0
13 three 70 38 NaN
23 four 70 38 NaN
47 eight 288 17 NaN
3.2 Exercise: Data Transformation in R
setwd("E:/User/PathToFolder/Dia 7")
How can I set the first column as the column names?
Summing all the values in a row lets us check whether the operation worked as
expected.
sum(anormPorcentaje[1, ])
100
100
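The same check can be reproduced on a small made-up matrix. The sketch below assumes, as the name anormPorcentaje suggests, that each row was rescaled to percentages of its row total; the matrix `m` and its values are hypothetical and only illustrate the validation step.

```r
# Hypothetical 2 x 3 matrix of raw measurements (values are illustrative only)
m <- matrix(c(2, 3, 5,
              1, 1, 2), nrow = 2, byrow = TRUE)

# Rescale each row so that its values are percentages of the row total
m_pct <- m / rowSums(m) * 100

# Every row should now add up to 100 (all.equal tolerates rounding error)
row_totals <- rowSums(m_pct)
stopifnot(isTRUE(all.equal(row_totals, c(100, 100))))
```

Comparing with all.equal() rather than == avoids false alarms caused by floating-point rounding.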
str(anormlog)
$ C4DC.C5OH: num -12.9 -12.9 -13 -12.5 -12.5 ...
$ C5 : num -14.4 -14.5 -14.6 -13.8 -13.5 ...
$ C5.1 : num -17.9 -18 -18 -17.7 -17.6 ...
$ C5DC.C6OH: num -14.8 -14.7 -14.7 -13.8 -13.8 ...
$ C6 : num -16.9 -17 -17 -16.1 -15.6 ...
$ C6DC : num -14.2 -14.4 -14.3 -14.7 -14.4 ...
$ C8 : num -16.9 -17 -17 -16.7 -16 ...
$ C8.1 : num -Inf -18 -18 -17.7 -17.6 ...
$ C10 : num -16.9 -17 -17 -16.7 -16.6 ...
$ C10.1 : num -Inf -Inf -Inf -17.7 -16.6 ...
$ C10.2 : num -Inf -Inf -Inf -Inf -Inf ...
$ C12 : num -16.9 -17 -17 -16.7 -16.6 ...
$ C12.1 : num -16.9 -17 -17 -Inf -16.6 ...
$ C14 : num -15.4 -15.7 -15.5 -16.7 -16.6 ...
$ C14.1 : num -15.9 -16 -16 -16.1 -16 ...
$ C14.2 : num -16.9 -18 -17 -16.7 -16.6 ...
$ C14OH : num -Inf -18 -Inf -Inf -Inf ...
$ C16 : num -12.1 -12.1 -12 -14.1 -13.9 ...
$ C16.1 : num -14.6 -14.7 -14.6 -16.1 -16 ...
$ C16.1OH : num -15.9 -15.7 -16 -16.7 -16.6 ...
$ C16OH : num -16.9 -16.4 -16.5 -16.7 -16 ...
$ C18 : num -13.6 -13.6 -13.9 -14 -13.8 ...
$ C18.1 : num -11.9 -11.8 -11.9 -11.2 -11.1 ...
$ C18.1OH : num -15.9 -16 -16 -16.7 -16.6 ...
$ C18.2 : num -13 -13 -13.1 -12.1 -12 ...
$ C18OH : num -16.9 -17 -17 -Inf -17.6 ...
The “-Inf” values can significantly affect any analysis later performed on
this data.
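A value of -Inf appears wherever the logarithm is taken of an exact zero. The sketch below shows this on a made-up vector, together with two common ways of dealing with it. The use of log2 and of a pseudocount of 1 are assumptions for illustration, not necessarily what was used to build anormlog.

```r
x <- c(0, 1, 8)            # hypothetical measurements; the first one is zero

log2(x)                    # -Inf 0 3: the zero becomes -Inf
mean(log2(x))              # -Inf: a single -Inf contaminates summaries

# Option 1: mark the infinite values as missing and skip them explicitly
lx <- log2(x)
lx[is.infinite(lx)] <- NA
mean(lx, na.rm = TRUE)     # 1.5

# Option 2: add a small pseudocount before taking the logarithm
lx_shifted <- log2(x + 1)
```

Which option is appropriate depends on the analysis; the pseudocount changes every value slightly, while the NA approach discards the zero measurements.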
View(anormlog)
View(anormlog)
write.csv(anormlog, "FoldTriplicado.csv")
3.3 Graphical Visualization in R: VGChartz Results
install.packages("dplyr")
install.packages("ggplot2")
install.packages("gganimate")
install.packages("gifski")

library(dplyr)
library(ggplot2)
library(gganimate)
library(gifski)
    # ... sold per year
    if (as.numeric(df1[i, "Years"]) < as.numeric(df$Year)) {
      df1[i, "TotalShipped"] <- 0  # Years before the game is released get 0
    } else if (as.numeric(df1[i, "Years"]) == as.numeric(df$Year)) {
      # From the release year onward, a share of the game's sales is assigned
      a <- df$Total_Shipped / difference  # Divide the game's total units by the difference
      df1[i, "TotalShipped"] <- a  # The first year gets the result of the division
    } else if (as.numeric(df1[i, "Years"]) > as.numeric(df$Year)) {
      # Later years get the yearly share plus the previous year's total
      df1[i, "TotalShipped"] <- a + df1[i - 1, "TotalShipped"]
    }
  }
  return(df1)
}
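The branch logic above amounts to spreading a game's total sales evenly over the years since its release and accumulating them year by year. The sketch below restates that idea in vectorised form. `spread_sales` is a hypothetical helper, and because the definition of `difference` is not shown in this excerpt, the sketch assumes it is the number of years from the release year through the last year, inclusive.

```r
# Hypothetical helper: cumulative units per year for one game.
# Assumption: `difference` counts the years from release to the last year, inclusive.
spread_sales <- function(total, release_year, years) {
  difference <- max(years) - release_year + 1
  per_year <- total / difference
  # 0 before release; afterwards one more yearly share is added each year
  ifelse(years < release_year, 0, per_year * (years - release_year + 1))
}

# Roughly 0, 0, 33.3, 66.7, 100
spread_sales(total = 100, release_year = 2002, years = 2000:2004)
```

Under this assumption the last year reaches the game's full total. A different definition of `difference` would change the final value but not the shape of the curve.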

assign_rank <- function(df) {
  # Loops that assign the rank according to the maximum number of games
  # available in each of the years

  # Loop that assigns Rank == 1 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 1
  }

  # Loop that assigns Rank == 2 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 2
  }

  # Loop that assigns Rank == 3 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 3
  }

  # Loop that assigns Rank == 4 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2 & Rank != 3)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 4
  }

  # Loop that assigns Rank == 5 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2 & Rank != 3 & Rank != 4)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 5
  }

  # Loop that assigns Rank == 6 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2 & Rank != 3 & Rank != 4
                       & Rank != 5)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 6
  }

  # Loop that assigns Rank == 7 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2 & Rank != 3 & Rank != 4
                       & Rank != 5 & Rank != 6)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 7
  }

  # Loop that assigns Rank == 8 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2 & Rank != 3 & Rank != 4
                       & Rank != 5 & Rank != 6 & Rank != 7)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 8
  }

  # Loop that assigns Rank == 9 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2 & Rank != 3 & Rank != 4
                       & Rank != 5 & Rank != 6 & Rank != 7 & Rank != 8)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 9
  }

  # Loop that assigns Rank == 10 in each year
  for (i in levels(df$Years)) {
    b <- df %>% filter(Years == i & Rank != 1 & Rank != 2 & Rank != 3 & Rank != 4
                       & Rank != 5 & Rank != 6 & Rank != 7 & Rank != 8 & Rank != 9)
    df$Rank[which(df$Name == b$Name[which(b$TotalShipped == max(b$TotalShipped))]
                  & df$Years == i)] <- 10
  }

  return(df)
}
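The ten loops repeat one pattern: within each year, take the largest remaining TotalShipped and give it the next rank. As a sketch of a more compact route to the same numbers, base R's rank() can be applied per year with ave(). The data frame `toy` is made up, and this is an alternative illustration rather than the function used in the rest of the section.

```r
# Made-up data: three games over two years
toy <- data.frame(
  Name         = c("A", "B", "C", "A", "B", "C"),
  Years        = factor(c(2000, 2000, 2000, 2001, 2001, 2001)),
  TotalShipped = c(5, 3, 1, 6, 8, 2)
)

# Rank within each year: 1 = most units; ties go to the row that appears first
toy$Rank <- ave(-toy$TotalShipped, toy$Years,
                FUN = function(v) rank(v, ties.method = "first"))

# Mirror assign_rank(): positions beyond the top 10 keep the placeholder 0
toy$Rank[toy$Rank > 10] <- 0
toy
```

Because ties.method = "first" resolves ties by row order, this variant does not need a unique maximum in each year.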
sv_anim <- function(data, name) {
  final_animation <- animate(data, 100, fps = 20, duration = 30, width = 950,
                             height = 750, renderer = gifski_renderer())
  assign("final_animation", final_animation, envir = globalenv())
  filename <- getwd()
  filename <- paste(filename, "/", name, ".gif", sep = "")
  anim_save(filename, animation = final_animation)
}
Rank Name Total Shipped Year
1 1 Pokemon 362.06 1998
2 2 Super Mario 354.51 1983
3 3 Call of Duty 300.00 2003
4 4 Grand Theft Auto 300.00 1998
5 5 FIFA 282.40 1993
6 6 The Sims 200.00 2000
7 7 Minecraft 180.00 2011
8 8 Tetris 171.00 1984
9 9 Need for Speed 150.00 1994
10 10 Final Fantasy 149.00 1987
11 11 Mario Kart 142.34 1992
12 12 Assassin’s Creed 140.00 2007
13 13 Madden NFL 130.00 1988
14 14 Wii Sports 115.99 2006
15 15 Pro Evolution Soccer 106.80 1995
16 16 The Legend of Zelda 105.81 1987
17 17 Lego 104.30 1997
18 18 Resident Evil 95.00 1996
19 19 NBA 2K 90.00 1999
20 20 Wii Sports 82.88 2006
21 21 Gran Turismo 80.40 1998
22 22 Dragon Quest 80.00 1986
23 23 Battlefield 78.90 2002
24 24 Sonic the Hedgehog 76.64 1991
25 25 Tomb Raider 75.00 1996
26 26 Halo 71.00 2001
27 27 Just Dance 70.00 2009
28 28 WWE 2K 70.00 2000
29 29 Counter-Strike 65.00 2000
30 30 The Oregon Trail 65.00 1971
31 31 Donkey Kong 62.55 1981
32 32 Monster Hunter 60.00 2004
33 33 PUBG 60.00 2017
34 34 Super Smash Bros. 58.88 1999
35 35 The Elder Scrolls 58.62 1994
36 36 Borderlands 56.00 2009
37 37 Dragon Ball 56.00 1986
38 38 Metal Gear 55.00 1987
39 39 Far Cry 52.50 2004
40 40 Mario Party 50.94 1999
This is a small cheat: according to VGChartz, “Call of Duty” and “Grand Theft
Auto” have the same sales figure. We set “Call of Duty” to have slightly more
copies sold, which breaks the tie when the ranks are assigned.
games$Total_Shipped[which(games$Name == "Call of Duty")] <- as.numeric(300.10)
games$Rank <- rep(0, length(games$Rank))  # Reset every rank to 0; ranks are assigned by sales later
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[33] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
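The reason the tie matters can be seen in the expression assign_rank() uses to pick the top game, which(b$TotalShipped == max(b$TotalShipped)): on a tie it returns two positions instead of one. A minimal sketch with made-up vectors (the names `shipped`, `titles`, `tied` and `untied` are hypothetical):

```r
shipped <- c(362.06, 300, 300)                 # two titles tied at 300
titles  <- c("Pokemon", "Call of Duty", "Grand Theft Auto")

rest <- shipped[-1]                            # drop the rank-1 title
tied <- which(rest == max(rest))               # 1 2: two candidates for rank 2

shipped[titles == "Call of Duty"] <- 300.10    # nudge one of them upward
rest   <- shipped[-1]
untied <- which(rest == max(rest))             # 1: a single candidate

titles[-1][untied]                             # "Call of Duty"
```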
game.
gamesdb <- as.data.frame(NULL)  # Create the empty data frame
The class of “Years” is changed from numeric to factor so that its levels can
be retrieved.
gamesdb$Years <- as.factor(gamesdb$Years)
gamesdb <- assign_rank(gamesdb)  # Reassign the function's result
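The loops in assign_rank() iterate over levels(df$Years), and only a factor has levels. A small sketch with a made-up vector shows the difference:

```r
years <- c(2001, 2000, 2001, 2002)   # made-up numeric years

levels(years)                        # NULL: a numeric vector has no levels

years_f <- as.factor(years)
levels(years_f)                      # "2000" "2001" "2002": sorted unique values, as text
```

Note that the levels are character strings, so the loop variable `i` in assign_rank() is compared with “Years” as text.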
A final filter keeps only the games that were in the Top 10 at some point,
dropping the years in which they were not yet on sale.
finalgame <- gamesdb %>% filter(Rank >= 1 & TotalShipped != 0)
finalgame
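To see which rows survive this filter, the same condition can be tried on a small made-up data frame. The sketch uses base R's subset() so that it runs without extra packages; the column names follow gamesdb, while the rows are hypothetical.

```r
toy <- data.frame(
  Name         = c("A", "A", "B", "B"),
  Years        = c(2000, 2001, 2000, 2001),
  TotalShipped = c(0, 4, 7, 9),
  Rank         = c(0, 2, 1, 1)
)

# Keep rows with an assigned rank (>= 1) and non-zero sales
kept <- subset(toy, Rank >= 1 & TotalShipped != 0)
kept   # game A in 2000 is dropped: no sales yet and Rank still 0
```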
  scale_y_continuous(labels = scales::comma) +
  # scale_x_reverse flips the start of the X axis by 180 degrees
  scale_x_reverse() +
  guides(color = "none", fill = "none") +
  # Removes the X-axis and Y-axis formatting set in the static plot
  theme_minimal() +
  # Formatting of the plot titles
  theme(
    plot.title = element_text(size = 20, hjust = 0.5, face = "bold",
                              colour = "grey", vjust = -1),
    plot.subtitle = element_text(size = 18, hjust = 0.5, face = "italic", color = "grey"),
    plot.caption = element_text(size = 8, hjust = 0.5, face = "italic", color = "grey"),
    axis.ticks.y = element_blank(),
    axis.text.y = element_blank(),
    axis.title.y = element_blank(),
    plot.margin = margin(1, 1, 1, 4, "cm")
  )
p
It is perfectly normal for the names to overlap here, since they move in the
dynamic plot.

VGChartz Phase 4: Building the Dynamic Plot
plt <- p +
  # transition_states() is what produces the animation
  transition_states(states = Years, transition_length = 4, state_length = 1) +
  # Applies cubic easing to the aesthetics of the static plot
  ease_aes("cubic-in-out") +
  # Sets the titles of each element of the plot
  labs(title = "Top 10 Most Successful Videogames: {closest_state}",
       subtitle = "Millions Units Sold",
       caption = "Source: VGChartz",
       y = "Total Copies Sold")
plt
VGChartz Phase 5: Saving the Animation
The arguments are the dynamic plot and the name to give the animation.
sv_anim(plt, "vgchartzcomplete")
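Inside sv_anim() the output path is built by pasting together the working directory, a slash, the name and the .gif extension. The sketch below shows the string this produces and an equivalent built with file.path(). The directory is a made-up stand-in for getwd(), and `out_dir`, `p1` and `p2` are hypothetical names.

```r
out_dir <- "E:/User/PathToFolder"   # hypothetical stand-in for getwd()
name    <- "vgchartzcomplete"

# As written in sv_anim()
p1 <- paste(out_dir, "/", name, ".gif", sep = "")

# Equivalent, letting file.path() insert the separator
p2 <- file.path(out_dir, paste0(name, ".gif"))

p1                                  # "E:/User/PathToFolder/vgchartzcomplete.gif"
identical(p1, p2)                   # TRUE
```

The GIF therefore lands in the current working directory, so it is worth checking getwd() before calling sv_anim().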