
The classInt package for classifying continuous variables

classIntervals: Choose univariate class intervals

Description
The function provides a uniform interface to finding class intervals for continuous numerical
variables, for example for choosing colours or symbols for plotting. Class intervals are
non-overlapping, and the classes are left-closed; see findInterval. Argument values to the
style chosen are passed through the dot arguments. classIntervals2shingle converts a
classIntervals object into a shingle. Labels generated in methods are like those found in cut
unless cutlabels=FALSE.
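
To make the left-closed convention concrete, here is a small base R sketch. It does not call classInt itself; the vector x and the breaks brks are made-up illustration values, and the only functions used are findInterval and cut, both named in the description above.

```r
# Illustrative values and breaks (assumptions, not taken from the package)
x <- c(1, 2.5, 5, 7.5, 10)
brks <- c(1, 5, 10)

# findInterval() treats intervals as left-closed: [1, 5) and [5, 10].
# rightmost.closed = TRUE places the maximum value in the last class.
cls <- findInterval(x, brks, rightmost.closed = TRUE)
print(cls)

# cut() with right = FALSE builds the matching left-closed labels
labs <- cut(x, breaks = brks, right = FALSE, include.lowest = TRUE)
print(table(labs))
```

The value 5 sits exactly on a break and falls into the second class, which is what "left-closed" means in practice.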

Usage
classIntervals(var, n, style = "quantile", rtimes = 3, ...,
  intervalClosure = c("left", "right"), dataPrecision = NULL,
  warnSmallN = TRUE, warnLargeN = TRUE, largeN = 3000L, samp_prop = 0.1,
  gr = c("[", "]"))
## S3 method for class 'classIntervals'
plot(x, pal, ...)
## S3 method for class 'classIntervals'
print(x, digits = getOption("digits"), ...,
under="under", over="over", between="-", cutlabels=TRUE, unique=FALSE)
nPartitions(x)
classIntervals2shingle(x)

Arguments

var
a continuous numerical variable

n
number of classes required; if missing, nclass.Sturges is used;
see also the "dpih" and "headtails" styles for automatic choice
of the number of classes

style
chosen style: one of "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust",
"bclust", "fisher", "jenks", "dpih" or "headtails"

rtimes
number of replications of var to catenate and jitter; may be
used with styles "kmeans" or "bclust" in case they have
difficulties reaching a classification

intervalClosure
default "left", allows specification of whether partition
intervals are closed on the left or the right (added by
Richard Dunlap). Note that the sense of interval closure is
hard-coded as "right"-closed when style="jenks"

dataPrecision
default NULL, permits rounding of the interval endpoints
(added by Richard Dunlap)

warnSmallN
default TRUE; if FALSE, silences the warning for n >= nobs

warnLargeN
default TRUE; if FALSE, large data handling is not used

largeN
default 3000L, the QGIS sampling threshold; over 3000, the
observations presented to "fisher" and "jenks" are either a
samp_prop= sample or a sample of 3000, whichever is larger

samp_prop
default 0.1, the QGIS 10% sampling proportion

gr
default c("[", "]"); if the units package is available,
units::units_options("group") may be used directly to give
the enclosing bracket style

...
arguments to be passed to the functions called in each style

x
"classIntervals" object for printing, conversion to shingle, or plotting

under
character string value for "under" in printed table labels
if cutlabels=FALSE

over
character string value for "over" in printed table labels
if cutlabels=FALSE

between
character string value for "between" in printed table labels
if cutlabels=FALSE

digits
minimal number of significant digits in printed table labels

cutlabels
default TRUE, use cut-style labels in printed table labels

unique
default FALSE; if TRUE, collapse labels of single-value
classes

pal
a character vector of at least two colour names for colour
coding the class intervals in an ECDF plot; colorRampPalette
is used internally to create the correct number of colours
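
Two of the defaults above can be illustrated with base R alone. In the sketch below, nclass.Sturges is the grDevices function named under n. The helper sample_size is hypothetical: it only restates the largeN and samp_prop rule as worded above and is not the package's internal code.

```r
# Default number of classes when n is missing: Sturges' rule
set.seed(1)
v <- rnorm(200)
n_default <- grDevices::nclass.Sturges(v)   # ceiling(log2(200) + 1)
print(n_default)

# Hypothetical helper restating the sampling rule described for largeN
sample_size <- function(nobs, largeN = 3000L, samp_prop = 0.1) {
  if (nobs <= largeN) return(as.integer(nobs))   # no subsampling needed
  as.integer(max(largeN, ceiling(samp_prop * nobs)))
}
print(sample_size(2000))     # at or below the threshold: all observations
print(sample_size(10000))    # 10% would be 1000, so 3000 is used instead
print(sample_size(100000))   # 10% is 10000, which exceeds 3000
```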

The "fixed" style permits a "classIntervals" object to be specified with given breaks, set in
the fixedBreaks argument; the length of fixedBreaks should be n+1; this style can be used to insert
rounded break values.

The "sd" style chooses breaks based on pretty of the centred and scaled variables, and may have a number
of classes different from n; the returned par= includes the centre and scale values.
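
A minimal base R sketch of the "sd" idea follows. It assumes that "centred and scaled" means scale() with its defaults and that the breaks are mapped back to the original units; the package's own code may differ in detail, and v is made-up data.

```r
v <- c(3, 8, 12, 15, 21, 30, 44)        # illustrative data
n <- 5
sv <- scale(v)                          # centre on the mean, divide by the sd
centre <- attr(sv, "scaled:center")
sc <- attr(sv, "scaled:scale")
sbrks <- pretty(as.numeric(sv), n = n)  # pretty breaks on the standardised scale
brks_sd <- sbrks * sc + centre          # back-transform to the units of v
print(length(brks_sd) - 1)              # number of classes, need not equal n
print(brks_sd)
```

Because pretty decides how many breaks are legible, the class count printed here is an outcome rather than an input.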

The "equal" style divides the range of the variable into n parts.
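
In base R terms, dividing the range into n parts can be sketched as n + 1 evenly spaced breaks between the minimum and the maximum. The data v are made up, and this is an illustration of the idea rather than the package's code.

```r
v <- c(2, 4, 7, 11, 20)   # illustrative data
n <- 4
brks_equal <- seq(min(v), max(v), length.out = n + 1)
print(brks_equal)
print(diff(brks_equal))   # every class has the same width
```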

The "pretty" style chooses a number of breaks not necessarily equal to n using pretty, but likely to be
legible; arguments to pretty may be passed through ....

The "quantile" style provides quantile breaks; arguments to quantile may be passed through ....
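
The difference between the "pretty" and "quantile" styles can be seen with the two base R functions they are named after. The sketch assumes that the quantile breaks are the n + 1 evenly spaced quantiles from 0 to 1, which is an assumption about the package's behaviour; v is made-up data.

```r
v <- 1:12                 # illustrative data
n <- 4

# Quantile breaks: roughly equal numbers of observations per class
brks_quantile <- quantile(v, probs = seq(0, 1, length.out = n + 1), names = FALSE)
print(brks_quantile)

# Pretty breaks: round values covering the range; the count may differ from n
brks_pretty <- pretty(v, n = n)
print(brks_pretty)
print(length(brks_pretty) - 1)
```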

The "kmeans" style uses kmeans to generate the breaks; it may be anchored using set.seed;
the pars attribute returns the kmeans object generated; if kmeans fails, a jittered input vector
containing rtimes replications of var is tried — with few unique values in var, this can prove necessary;
arguments to kmeans may be passed through ....
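
One way to turn a kmeans result into class breaks is sketched below in base R. The rule used here (interior breaks at the midpoint between the largest value of one cluster and the smallest value of the next) is an assumption made for illustration, not a statement of the package's internals; v is made-up data.

```r
set.seed(1)                       # kmeans starts from random centres
v <- c(1, 2, 3, 10, 11, 12, 30, 31, 32)
n <- 3
km <- kmeans(v, centers = n)

# Smallest and largest value per cluster; in one dimension the clusters
# are contiguous, so sorting the minima and maxima lines them up
groups <- split(v, km$cluster)
lower <- sort(vapply(groups, min, numeric(1)))
upper <- sort(vapply(groups, max, numeric(1)))

# Assumed rule: midpoints between neighbouring clusters
inner <- (upper[-n] + lower[-1]) / 2
brks_km <- unname(c(min(v), inner, max(v)))
print(brks_km)
```

Changing the seed can change the clusters and therefore the breaks, which is why the text recommends anchoring with set.seed.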

The "hclust" style uses hclust to generate the breaks using hierarchical clustering; the pars attribute
returns the hclust object generated, and can be used to find other breaks using getHclustClassIntervals;
arguments to hclust may be passed through ....

The "bclust" style uses bclust to generate the breaks using bagged clustering; it may be anchored
using set.seed; the pars attribute returns the bclust object generated, and can be used to find other breaks
using getBclustClassIntervals; if bclust fails, a jittered input vector containing rtimes replications
of var is tried — with few unique values in var, this can prove necessary; arguments to bclust may be
passed through ....

The "fisher" style uses the algorithm proposed by W. D. Fisher (1958) and discussed by Slocum et al. (2005)
as the Fisher-Jenks algorithm; added here thanks to Hisaji Ono. This style will subsample by default for
more than 3000 observations. This style should always be preferred to "jenks" as it uses the original Fortran
code and runs nested for-loops much faster.
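
To show the optimisation criterion behind this style (minimum total within-class sum of squares over contiguous classes of the sorted data), here is a deliberately simple dynamic-programming sketch in base R. It is not the Fortran code the package uses and is far slower; the function name fisher_sketch and its return structure are hypothetical.

```r
fisher_sketch <- function(x, k) {
  stopifnot(k >= 2, length(x) >= k)
  x <- sort(x)
  m <- length(x)
  # within-class sum of squares of the sorted values x[i..j]
  ss <- function(i, j) {
    s <- x[i:j]
    sum((s - mean(s))^2)
  }
  D <- matrix(Inf, nrow = k, ncol = m)  # D[cl, j]: best cost for x[1..j] in cl classes
  B <- matrix(0L, nrow = k, ncol = m)   # B[cl, j]: start index of the last class
  for (j in seq_len(m)) {
    D[1, j] <- ss(1, j)
    B[1, j] <- 1L
  }
  for (cl in 2:k) {
    for (j in cl:m) {
      for (i in cl:j) {
        cost <- D[cl - 1, i - 1] + ss(i, j)
        if (cost < D[cl, j]) {
          D[cl, j] <- cost
          B[cl, j] <- i
        }
      }
    }
  }
  # walk back from the last class to recover where each class starts
  starts <- integer(k)
  j <- m
  for (cl in k:1) {
    starts[cl] <- B[cl, j]
    j <- starts[cl] - 1L
  }
  list(starts = starts, lower = x[starts], cost = D[k, m])
}

# The small vector used in the Examples section
res <- fisher_sketch(c(0, 0, 0, 1, 2, 50), k = 3)
print(res$lower)
print(res$cost)
```

For this vector the sketch groups the three zeros, then 1 and 2, then 50 on its own.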

The "jenks" style has been ported from Jenks' code, and has been checked for consistency with ArcView,
ArcGIS, and MapInfo (with some remaining differences); added here thanks to Hisaji Ono (originally
reported as Basic, now seen as Fortran (as described in a talk last seen at
http://www.irlogi.ie/wp-content/uploads/2016/11/NUIM_ChoroHarmful.pdf, slides 26-27)). Note that the
sense of interval closure is reversed from the other styles, and in this implementation has to be
right-closed - use cutlabels=TRUE in findColours on the object returned to show the closure clearly,
and use findCols to extract the classes for each value. This style will subsample by default for more
than 3000 observations.

The "dpih" style uses the dpih() function from KernSmooth (Wand, 1995) implementing direct plug-in
methodology to select the bin width of a histogram.

The "headtails" style uses the algorithm proposed by Bin Jiang (2013), in order to find groupings or
hierarchy for data with a heavy-tailed distribution. This classification scheme partitions all of the data
values around the mean into two parts and continues the process iteratively for the values (above the
mean) in the head until the head part values are no longer heavy-tailed distributed. Thus, the number of
classes and the class intervals are both naturally determined. By default the algorithm uses thr = 0.4,
meaning that when the head represents more than 40% of the observations the distribution is not
considered heavy-tailed. The threshold argument thr may be modified through ... (see Examples).
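
The iteration described above can be sketched in a few lines of base R. The function head_tails_sketch is hypothetical and follows only the verbal description (split at the mean, recurse into the head while its share stays at or below thr); the package's implementation may handle edge cases differently. The data v are made up.

```r
head_tails_sketch <- function(x, thr = 0.4) {
  brks <- min(x)
  repeat {
    mu <- mean(x)
    hd <- x[x > mu]                   # the "head": values above the mean
    if (length(hd) == 0) break        # all values equal, nothing to split
    brks <- c(brks, mu)               # the mean becomes a break
    prop <- length(hd) / length(x)    # share of observations in the head
    # stop when the head is too large to count as heavy-tailed,
    # or too small to be split again
    if (prop > thr || length(hd) < 2) break
    x <- hd
  }
  c(brks, max(x))
}

v <- c(rep(1, 10), 5, 6, 7, 40, 50)  # illustrative right-skewed data
brks_ht <- head_tails_sketch(v)
print(brks_ht)
print(head_tails_sketch(v, thr = 0.1))   # stricter threshold, fewer classes
```

Neither call specifies a number of classes: the break count follows from the data and from thr.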

Value
an object of class "classIntervals":

var
the input variable

brks
a vector of breaks

Examples
library(classInt)  # attach the package that provides classIntervals()
if (!require("spData", quietly=TRUE)) {
  message("spData package needed for examples")
  run <- FALSE
} else {
  run <- TRUE
}
if (run) {
  data(jenks71, package="spData")
  pal1 <- c("wheat1", "red3")
  opar <- par(mfrow=c(2,3))
  plot(classIntervals(jenks71$jenks71, n=5, style="fixed",
    fixedBreaks=c(15.57, 25, 50, 75, 100, 155.30)), pal=pal1, main="Fixed")
  plot(classIntervals(jenks71$jenks71, n=5, style="sd"), pal=pal1,
    main="Pretty standard deviations")
  plot(classIntervals(jenks71$jenks71, n=5, style="equal"), pal=pal1, main="Equal intervals")
  plot(classIntervals(jenks71$jenks71, n=5, style="quantile"), pal=pal1, main="Quantile")
  set.seed(1)
  plot(classIntervals(jenks71$jenks71, n=5, style="kmeans"), pal=pal1, main="K-means")
  plot(classIntervals(jenks71$jenks71, n=5, style="hclust", method="complete"),
    pal=pal1, main="Complete cluster")
}
if (run) {
  plot(classIntervals(jenks71$jenks71, n=5, style="hclust", method="single"),
    pal=pal1, main="Single cluster")
  set.seed(1)
  plot(classIntervals(jenks71$jenks71, n=5, style="bclust", verbose=FALSE),
    pal=pal1, main="Bagged cluster")
  plot(classIntervals(jenks71$jenks71, n=5, style="fisher"), pal=pal1,
    main="Fisher's method")
  plot(classIntervals(jenks71$jenks71, n=5, style="jenks"), pal=pal1,
    main="Jenks' method")
  plot(classIntervals(jenks71$jenks71, style="dpih"), pal=pal1,
    main="dpih method")
  plot(classIntervals(jenks71$jenks71, style="headtails", thr = 1), pal=pal1,
    main="Head Tails method")
  par(opar)
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="fixed",
    fixedBreaks=c(15.57, 25, 50, 75, 100, 155.30)))
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="sd"))
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="equal"))
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="quantile"))
}
if (run) {
  set.seed(1)
  print(classIntervals(jenks71$jenks71, n=5, style="kmeans"))
}
if (run) {
  set.seed(1)
  print(classIntervals(jenks71$jenks71, n=5, style="kmeans", intervalClosure="right"))
}
if (run) {
  set.seed(1)
  print(classIntervals(jenks71$jenks71, n=5, style="kmeans", dataPrecision=0))
}
if (run) {
  set.seed(1)
  print(classIntervals(jenks71$jenks71, n=5, style="kmeans"), cutlabels=FALSE)
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="hclust", method="complete"))
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="hclust", method="single"))
}
if (run) {
  set.seed(1)
  print(classIntervals(jenks71$jenks71, n=5, style="bclust", verbose=FALSE))
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="bclust",
    hclust.method="complete", verbose=FALSE))
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="fisher"))
}
if (run) {
  print(classIntervals(jenks71$jenks71, n=5, style="jenks"))
}
if (run) {
  print(classIntervals(jenks71$jenks71, style="dpih"))
}
if (run) {
  print(classIntervals(jenks71$jenks71, style="dpih", range.x=c(0, 160)))
}
if (run) {
  print(classIntervals(jenks71$jenks71, style="headtails"))
}
if (run) {
  print(classIntervals(jenks71$jenks71, style="headtails", thr = .45))
}
x <- c(0, 0, 0, 1, 2, 50)
print(classIntervals(x, n=3, style="fisher"))
print(classIntervals(x, n=3, style="jenks"))

# Argument 'unique' will collapse the label of classes containing a
# single value. This is particularly useful for 'censored' variables
# that contain for example many zeros.

data_censored <- c(rep(0,10), rnorm(100, mean=20, sd=1), rep(26,10))
plot(density(data_censored))
cl2 <- classIntervals(data_censored, n=5, style="jenks", dataPrecision=2)
print(cl2, unique=FALSE)
print(cl2, unique=TRUE)

## Not run:
set.seed(1)
n <- 1e+05
x <- runif(n)
classIntervals(x, n=5, style="sd")
classIntervals(x, n=5, style="pretty")
classIntervals(x, n=5, style="equal")
classIntervals(x, n=5, style="quantile")
# the class intervals found vary a little because of sampling
classIntervals(x, n=5, style="kmeans")
classIntervals(x, n=5, style="fisher")
classIntervals(x, n=5, style="fisher")
classIntervals(x, n=5, style="fisher")

## End(Not run)
have_units <- FALSE
if (require(units, quietly=TRUE)) have_units <- TRUE
if (have_units) {
  set.seed(1)
  x_units <- set_units(sample(seq(1, 100, 0.25), 100), km/h)
  classIntervals(x_units, n=5, style="sd")
}
if (have_units) {
  classIntervals(x_units, n=5, style="pretty")
}
if (have_units) {
  classIntervals(x_units, n=5, style="equal")
}
if (have_units) {
  classIntervals(x_units, n=5, style="quantile")
}
if (have_units) {
  classIntervals(x_units, n=5, style="kmeans")
}
if (have_units) {
  classIntervals(x_units, n=5, style="fisher")
}
if (have_units) {
  classIntervals(x_units, style="headtails")
}
st <- Sys.time()
x_POSIXt <- sample(st+((0:500)*3600), 100)
fx <- st+((0:5)*3600)*100
classIntervals(x_POSIXt, style="fixed", fixedBreaks=fx)
classIntervals(x_POSIXt, n=5, style="sd")
classIntervals(x_POSIXt, n=5, style="pretty")
classIntervals(x_POSIXt, n=5, style="equal")
classIntervals(x_POSIXt, n=5, style="quantile")
classIntervals(x_POSIXt, n=5, style="kmeans")
classIntervals(x_POSIXt, n=5, style="fisher")
classIntervals(x_POSIXt, style="headtails")

# Head Tails method is suitable for right-sided heavy-tailed distributions
set.seed(1234)
# Heavy tails-----
# Pareto distributions a=7 b=14
paretodist <- 7 / (1 - runif(1000)) ^ (1 / 14)
# Lognorm
lognormdist <- rlnorm(1000)
# Weibull
weibulldist <- rweibull(1000, 1, scale = 5)

pal1 <- c("wheat1", "red3")
opar <- par(mfrow = c(2, 3))
plot(classIntervals(paretodist, style = "headtails"),
     pal = pal1,
     main = "HeadTails: Pareto Dist.")
plot(classIntervals(lognormdist, style = "headtails"),
     pal = pal1,
     main = "HeadTails: LogNormal Dist.")
plot(classIntervals(weibulldist, style = "headtails"),
     pal = pal1,
     main = "HeadTails: Weibull Dist.")
plot(classIntervals(paretodist, n = 5, style = "fisher"),
     pal = pal1,
     main = "Fisher: Pareto Dist.")
plot(classIntervals(lognormdist, n = 7, style = "fisher"),
     pal = pal1,
     main = "Fisher: LogNormal Dist.")
plot(classIntervals(weibulldist, n = 4, style = "fisher"),
     pal = pal1,
     main = "Fisher: Weibull Dist.")
par(opar)

# Non heavy tails, thr should be increased-----

# Normal dist
normdist <- rnorm(1000)
# Left-tailed truncated Normal distr
leftnorm <- rep(normdist[normdist < mean(normdist)], 2)
# Uniform distribution
unifdist <- runif(1000)
opar <- par(mfrow = c(2, 3))
plot(classIntervals(normdist, style = "headtails"),
     pal = pal1,
     main = "Normal Dist.")
plot(classIntervals(leftnorm, style = "headtails"),
     pal = pal1,
     main = "Truncated Normal Dist.")
plot(classIntervals(unifdist, style = "headtails"),
     pal = pal1,
     main = "Uniform Dist.")
# thr should be increased for non heavy-tailed distributions
plot(
  classIntervals(normdist, style = "headtails", thr = .6),
  pal = pal1,
  main = "Normal Dist. thr = .6"
)
plot(
  classIntervals(leftnorm, style = "headtails", thr = .6),
  pal = pal1,
  main = "Truncated Normal Distribution thr = .6"
)
plot(
  classIntervals(unifdist, style = "headtails", thr = .6),
  pal = pal1,
  main = "Uniform Distribution thr = .6"
)
par(opar)
