Dan Tavrov
I. INTRODUCTION
Man is a social animal [1]. Most of our everyday actions
depend on, or are based on, how other people treat us, especially
those closest to us (or those whose opinion matters most to us).
Such people constitute what may be called a close circle.
At the same time, a person may not always be eager to
disclose information about members of such a close circle.
This reluctance may stem either from personal views or from
the nature of the circle (religious identity, professional
community, income level, LGBT community, radical group, etc.).
What we face here is the problem of concealing a given
respondent's membership in a certain group. This problem can be
formulated as the task of masking certain characteristics of a
given respondent [2, 3, 4]. It is usually called the task of
providing individual anonymity, where anonymity means [5] the
property of a subject of being unidentifiable within a set of
other subjects. A complementary task is that of providing group
anonymity, where we need to conceal information not about a
single respondent but about a group of respondents (e.g., we
need to mask the regional, age, or other distribution of a
certain group).
The procedure for providing data anonymity should meet
the following conditions [6, p. 399]:
1) Disclosure risk is low, or at least commensurate with the
importance of the protected information.
[...]
[Equation (1): metric $\mathrm{InfM}(r, r^*)$, with one sum over $p = 1, \dots, n_{ord}$ involving $r(I_p)$ and $r^*(I_p)$, and one sum over $k = 1, \dots, n_{cat}$ involving $r(J_k)$ and $r^*(J_k)$.]
[Equation (3): expression involving $C_{max}$.]
[Equation (4): sum over $j$ of terms in $U$.]
[...]
individuals with too many rows. Since all three terms are of
equal importance, their values lie in the interval [0,1].
We propose to use the following expression for the first
term of (2):
[...]
index $i$ in the first and the third column of $U$. Then, for any [...] $q_i$. [...] $(u_{i3}, u_{i4})$, $i = 1, \dots, Q$, can [...]
$f(U) = [\ldots]$, (2)
[...] $(u_{i3}, u_{i4})$ needs to be preserved for all $i = 1, \dots, Q$.
3) Operator $M_3$ is applied with small probability $p_{m3}$ to
the second column of $U$, treated as a vector of categorical values.
4) Operator $M_4$ is applied with small probability $p_{m4}$ to
the fourth column of $U$, treated as a vector of categorical values.
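As an illustration of a mutation applied with a small probability to one categorical column, here is a minimal Python sketch. It is not the paper's operator. It assumes that the individual is a list of rows, that the probability is applied independently to each row, and that a mutated entry is redrawn uniformly from a given list of categories. The name `mutate_categorical_column` and all of its parameters are hypothetical.

```python
import random


def mutate_categorical_column(individual, column, categories, p_mut, rng=random):
    """Return a copy of `individual` in which each entry of `column`
    is redrawn from `categories` with probability `p_mut`.

    Assumption (not from the paper): the probability is applied per row,
    and the replacement is drawn uniformly at random.
    """
    mutated = [list(row) for row in individual]  # copy; the input is left untouched
    for row in mutated:
        if rng.random() < p_mut:
            row[column] = rng.choice(categories)
    return mutated


# Hypothetical individual with Q = 3 rows and four columns.
U = [[0, "a", 5, "x"], [1, "b", 6, "y"], [2, "c", 7, "z"]]
U_mutated = mutate_categorical_column(U, 1, ["a", "b", "c"], 0.05, random.Random(0))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when comparing mutation probabilities.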
Operator $S(U)$ in this work is defined as an operator that is
applied to the individual $U$ and yields a modified individual
according to the following procedure [19]:
1) Carry out steps 2–4 for all $i = 1, \dots, Q$.
2) Generate a uniformly distributed random number
$r \in [0, 1]$.
3) If $r < p_{mem}$, assign to $u_{i4}$ the index of a record from
value of the second, the seventh, the ninth, and the twelfth
signal elements. This leads us to the following decreasing
restriction functions (Fig. 1):
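$$\mu_2(x) = \begin{cases} 1, & x \le 20, \\ 1 - 2\left(\dfrac{x - 20}{47}\right)^2, & 20 < x \le 43.5, \\ 2\left(\dfrac{x - 67}{47}\right)^2, & 43.5 < x \le 67, \\ 0, & x > 67, \end{cases} \quad \text{(a)}$$

$$\mu_7(x) = \begin{cases} 1, & x \le 25, \\ 1 - 2\left(\dfrac{x - 25}{5}\right)^2, & 25 < x \le 27.5, \\ 2\left(\dfrac{x - 30}{5}\right)^2, & 27.5 < x \le 30, \\ 0, & x > 30, \end{cases} \quad \text{(b)}$$

$$\mu_9(x) = \begin{cases} 1, & x \le 25, \\ 1 - 2\left(\dfrac{x - 25}{3}\right)^2, & 25 < x \le 26.5, \\ 2\left(\dfrac{x - 28}{3}\right)^2, & 26.5 < x \le 28, \\ 0, & x > 28, \end{cases} \quad \text{(c)}$$

$$\mu_{12}(x) = \begin{cases} 1, & x \le 25, \\ 1 - 2\left(\dfrac{x - 25}{13}\right)^2, & 25 < x \le 31.5, \\ 2\left(\dfrac{x - 38}{13}\right)^2, & 31.5 < x \le 38, \\ 0, & x > 38. \end{cases} \quad \text{(d)}$$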
Fig. 1 Decreasing restriction functions for the example:
(a) for the second element, (b) for the seventh element,
(c) for the ninth element, (d) for the twelfth element
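The decreasing restriction functions of Fig. 1 share one shape: they equal 1 up to a lower breakpoint, 0 beyond an upper breakpoint, and are quadratic on each half of the interval in between. The Python sketch below evaluates this shape, assuming the breakpoints (20, 67), (25, 30), (25, 28), and (25, 38) for elements 2, 7, 9, and 12. The names `z_shaped`, `DECREASING_BOUNDS`, and `mean_restriction` are hypothetical. Averaging over the four elements mirrors the factor 1/4 in the example fitness function, but it is shown here only as an illustration.

```python
def z_shaped(x, a, b):
    """Decreasing restriction function: 1 for x <= a, 0 for x >= b,
    and two quadratic arcs joined at the midpoint (a + b) / 2."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    if x <= (a + b) / 2.0:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2


# Breakpoints (a, b) per signal element index, as assumed from Fig. 1.
DECREASING_BOUNDS = {2: (20.0, 67.0), 7: (25.0, 30.0), 9: (25.0, 28.0), 12: (25.0, 38.0)}


def mean_restriction(signal):
    """Average restriction value over the constrained elements.

    `signal` maps an element index to its value in the modified signal.
    """
    values = [z_shaped(signal[j], a, b) for j, (a, b) in DECREASING_BOUNDS.items()]
    return sum(values) / len(values)


# Hypothetical element values, all at or below the lower breakpoints.
print(mean_restriction({2: 20.0, 7: 25.0, 9: 25.0, 12: 25.0}))
```

Each function passes through 0.5 at the midpoint of its interval, so both arcs meet there without a jump.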
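[Equation: example-specific term, a function of $U$ with subscript $ex$, given as a fraction with numerator 1 and a denominator of the form $1 + e^{\ldots}$, with $0.5Q$ and $90$ in the exponent.]
[Equation: $f_{ex1}(U)$, combining $\frac{1}{4} \sum_{j \in \{2, 7, 9, 12\}} \mu_j(U)$ with a term involving the constant 1099 and the example-specific term above.]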
Fig. 3 Increasing restriction function for the eighth and tenth signal elements
from the example
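$$\mu_8(x) = \mu_{10}(x) = \begin{cases} 0, & x \le 15, \\ 2\left(\dfrac{x - 15}{12}\right)^2, & 15 < x \le 21, \\ 1 - 2\left(\dfrac{x - 27}{12}\right)^2, & 21 < x \le 27, \\ 1, & x > 27. \end{cases}$$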
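The increasing restriction function for the eighth and tenth signal elements mirrors the decreasing shape: it is 0 up to the lower breakpoint, 1 beyond the upper one, and quadratic on each half of the interval in between. The Python sketch below assumes the breakpoints 15 and 27. The name `s_shaped` is hypothetical.

```python
def s_shaped(x, a, b):
    """Increasing restriction function: 0 for x <= a, 1 for x >= b,
    and two quadratic arcs joined at the midpoint (a + b) / 2."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= (a + b) / 2.0:
        return 2.0 * ((x - a) / (b - a)) ** 2
    return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2


# Values at a few points between the assumed breakpoints 15 and 27.
for x in (15.0, 18.0, 21.0, 24.0, 27.0):
    print(x, s_shaped(x, 15.0, 27.0))
```

The subtraction from 1 on the upper arc is what keeps the function continuous at the midpoint 21 and makes it reach 1 at 27.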
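[Equation: $f_{ex2}(U)$, combining $\frac{1}{6} \sum_{j \in \{2, 7, 8, 9, 10, 12\}} \mu_j(U)$ with a term involving the constant 1099 and the example-specific term with subscript $ex$.]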
TABLE I
CLUSTERS OBTAINED AFTER THE FIRST PHASE OF MMA

Quantity Signal Elements to Increase    Cluster Size    Mean Metric
1 and 6                                 78              45.436
3 and 6                                 84              46.048
3 and 10                                26              46.269
4 and 6                                 43              48.488
6 and 8                                 183             46.519
8 and 10                                101             44.238
Fig. 4 Initial (solid line) and modified quantity signals:
(a) feasible one with the metric 40 (dashed-dotted line), feasible one with the
metric 43 (dotted line), subfeasible one (dashed line);
(b) the one with the metric 37 (dashed-dotted line), the one with the metric 38
(dotted line)

REFERENCES
[1] D. Brooks, The Social Animal: The Hidden Sources of Love, Character, and Achievement. New York: Random House Trade Paperbacks, 2011.
[2] C. C. Aggarwal and P. S. Yu, "A general survey of privacy-preserving data mining: models and algorithms," in Privacy-Preserving Data Mining: Models and Algorithms, Advances in Database Systems, vol. 34, C. C. Aggarwal and P. S. Yu, Eds. New York: Springer Science+Business Media, LLC, 2008, pp. 11–52.
[3] B. Fung, K. Wang, R. Chen, and P. Yu, "Privacy-preserving data publishing: a survey of recent developments," ACM Computing Surveys, 42(4), pp. 1–53, 2010.
[4] C. N. Sowmyarani and G. N. Srinivasan, "Survey on recent developments in privacy preserving models," International Journal of Computer Applications, 38(9), pp. 18–22, 2012.
[5] A. Pfitzmann and M. Hansen. (2010). A terminology for talking about privacy by data minimization: anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management. Version v0.34 [Online]. Available: http://dud.inf.tu-dresden.de/Anon_Terminology.shtml
[6] O. Chertov and A. Pilipyuk, "Statistical disclosure control methods for microdata," in 2009 International Symposium on Computing, Communication, and Control. Proc. of CSIT, vol. 1. Singapore: IACSIT Press, 2011, pp. 339–343.
[7] O. Chertov, Ed. Group Methods of Data Processing. Raleigh: Lulu.com, 2010.
[8] A. Meyerson and R. Williams, "General k-anonymization is hard," Carnegie Mellon School of Computer Science, Tech. Rep. CMU-CS-03-113, 2003.
[9] P. Moscato, "On evolution, search, optimization, genetic algorithms and martial arts: toward memetic algorithms," Caltech Concurrent Computation Program, Caltech, CA, C3P Rep. 826, 1989.
[10] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, 2nd ed. Berlin, Heidelberg: Springer-Verlag, 2007.
[11] R. Dawkins, The Selfish Gene: 30th Anniversary Edition. Oxford, New York: Oxford University Press, 2006.
[12] T. Ray and R. Sarker, "Memetic algorithms in constrained optimization," in Handbook of Memetic Algorithms, F. Neri, C. Cotta, and P. Moscato, Eds. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 135–151.
[13] A. E. Smith and D. W. Coit, "Penalty functions," in Evolutionary Computation 2. Advanced Algorithms and Operators, T. Bäck, D. B. Fogel, and Z. Michalewicz, Eds. Bristol, Philadelphia: Institute of Physics Publishing, 2000, pp. 41–48.
[14] Z. Michalewicz, "Repair algorithms," in Evolutionary Computation 2. Advanced Algorithms and Operators, T. Bäck, D. B. Fogel, and Z. Michalewicz, Eds. Bristol, Philadelphia: Institute of Physics Publishing, 2000, pp. 56–61.
[15] Z. Michalewicz, "Decoders," in Evolutionary Computation 2. Advanced Algorithms and Operators, T. Bäck, D. B. Fogel, and Z. Michalewicz, Eds. Bristol, Philadelphia: Institute of Physics Publishing, 2000, pp. 49–55.
[16] O. Chertov and D. Tavrov, "Providing group anonymity using wavelet transform," in Data Security and Security Data, LNCS, vol. 6121, L. M. MacKinnon, Ed. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 25–36.
[17] D. Tavrov and O. Chertov, "SSA-caterpillar in group anonymity," presented at the World Conference in Soft Computing, San Francisco, CA, 2011.
[18] O. R. Chertov and D. Y. Tavrov, "Memetic algorithm for microfile modification with distortion minimization while providing group anonymity," (in Ukrainian), Bulletin of Volodymyr Dahl East Ukrainian National University, vol. 8(179), pp. 256–262, 2012.
[19] O. Chertov and D. Tavrov, "Memetic algorithm for solving the task of providing group anonymity," in Advanced Trends in Soft Computing, Studies in Fuzziness and Soft Computing, vol. 312, M. Jamshidi, V. Kreinovich, and J. Kacprzyk, Eds. Springer International Publishing Switzerland, 2014, pp. 281–292.
[20] D. E. Goldberg, B. Korb, and K. Deb, "Messy genetic algorithms: motivation, analysis, and first results," Complex Systems, 3, pp. 493–530, 1989.
[21] U. S. Census 2000. (2000). 5-Percent Public Use Microdata Sample Files [Online]. Available: http://www.census.gov/main/www/cen2000.html
[22] G. Syswerda, "Schedule optimization using genetic algorithms," in Handbook of Genetic Algorithms, L. Davis, Ed. New York: Van Nostrand Reinhold, 1991, pp. 332–349.
[23] A. Brindle, "Genetic algorithms for function optimization," Doctoral Dissertation, Department of Computer Science, Tech. Rep. TR81-2, University of Alberta, 1981.
[24] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.