Two-Phase MMA For Solving The TPGA

Two-Phase Memetic Modifying Algorithm for
Solving the Task of Providing Group Anonymity

Oleg Chertov
Applied Mathematics Department
National Technical University of Ukraine
Kyiv Polytechnic Institute
Kyiv, Ukraine
chertov@i.ua

Dan Tavrov
Applied Mathematics Department
National Technical University of Ukraine
Kyiv Polytechnic Institute
Kyiv, Ukraine
dan.tavrov@i.ua

Abstract: Nowadays, it has become common practice to
provide public access to various kinds of primary non-aggregated
statistical data. Necessary precautions ought to be taken to
guarantee that sensitive data features are masked and that data
privacy cannot be violated.
When protecting information about a group of people, it is
important to protect intrinsic data features and distributions.
To do so, a certain level of distortion has to be introduced into
the dataset. Minimizing this distortion is a complex optimization
task, which can be successfully solved by applying appropriate
heuristic procedures, e.g., memetic algorithms. Determining whether
a particular solution masks sensitive data features is an
ill-defined task, which can often be solved only by expert
evaluation.
In this paper, we propose applying a two-phase memetic algorithm
to those tasks of providing group anonymity for which it is not
always possible to define appropriate constraints.
I. INTRODUCTION
Man is a social animal [1]. Most of our everyday actions
depend on, or are shaped by, how other people treat them,
especially the people closest to us (or those whose opinion is
most relevant to us). Such people constitute what may be called a
close circle.
At the same time, a person may not always be eager to
disclose information about members of such a close circle.
This reluctance may originate either from subjective views or
from the nature of the circle (religious identity, professional
community, income level, LGBT community, radical group, etc.).
What we face here is a problem of concealing membership
of a given respondent in a certain group. This problem can be
formulated as the task of masking certain characteristics of a
given respondent [2, 3, 4]. This task is usually called the task
of providing individual anonymity, where anonymity means
[5] the property of a subject to be unidentifiable within a set of
other subjects. It is possible to set a complementary task of
providing group anonymity, where we need to conceal
information not about a single respondent, but about a group
of respondents (e.g., we need to mask regional, age, or other
kinds of distributions of a certain group).
The procedure for providing data anonymity should meet
the following conditions [6, p.399]:
1) Disclosure risk is low or at least adequate to protected
information importance.
2) Both original and protected data, when analyzed, yield
close, or even equal results.
3) The cost of transforming the data is acceptable.
When providing group anonymity, preserving data utility
in the sense of the second requirement is an optimization task
of at least the same complexity [7, p. 12] as the well-known
k-anonymization problem in the field of statistical disclosure
control, which is NP-hard [8]. Moreover, providing group
anonymity can be viewed as a constrained optimization
problem, in which the search space consists of both feasible
and infeasible solutions. The feasibility of a solution is
interpreted as its ability to mask sensitive data features in the
modified dataset, and its optimality is determined by the level
of modified data utility.
In this paper, we propose to use memetic algorithms
(MAs) [9], which are usually implemented as evolutionary
algorithms with incorporated local search procedures [10,
p. 173], sometimes called memes after the term introduced in
[11, p. 192]. A review of memetic algorithms for dealing with
constrained optimization problems presented in [12] shows the
benefits of using MAs instead of conventional heuristics.
Four commonly used ways to handle constraints in an
evolutionary algorithm can be distinguished [10]:
1) Using penalty functions that reduce fitness of infeasible
solutions [13].
2) Using repair functions that convert infeasible solutions
into feasible ones [14].
3) Restricting search to a feasible subspace of the search
space by using a specific alphabet for problem representation
[10, pp. 215–216].
4) Using decoder functions that map infeasible solutions
into feasible ones, thus transforming the initial search space into
another one [15].
In some cases, constraints on the solutions can be deduced
from additional considerations. For instance, the solutions
might be restricted to the ones preserving specific data features
like high frequency [16] or periodic [17] components. It is
therefore possible to restrict search space to a well-defined
subspace by using a special form of solution representation in
the memetic algorithm [18]. In general, however, there are no
specific requirements for the solution to meet, and using
penalty functions seems to be the most appropriate alternative.
In practice, more often than not, the quality of a solution
heavily depends on the quality of the constraints. If they are too
severe (the desired data distribution that masks sensitive data
features is known), the level of distortion introduced into the
dataset might become too high. On the other hand, if the
constraints are too mild, the solution may not mask sensitive
data features, even though the penalties are relatively small.
In general, when it is not clear what constraints should be
introduced (different constraints guide evolution in different
directions, not necessarily leading to optimal solutions), we
propose to use a two-phase memetic algorithm for solving the
TPGA. In the first phase, it helps define appropriate constraints
for the solution, and in the second phase, it minimizes the data
utility loss caused by the solution.
II. THEORETIC BACKGROUND
A. The Task of Providing Group Anonymity
Let the data to be anonymized be gathered in a depersonalized
microfile $M$. Each record $r_i$, $i = \overline{1, \rho}$, of this
microfile contains values of attributes $w_j$, $j = \overline{1, \eta}$.
The set of all the values of $w_j$ is denoted as $\nabla w_j$.
Let $w_{v_j}$, $j = \overline{1, t}$, denote vital microfile attributes.
Then, a vital value combination $V$ can be defined as an element of
the Cartesian product $\nabla w_{v_1} \times \nabla w_{v_2} \times \dots \times \nabla w_{v_t}$.
We will denote a set of vital value combinations by
$\mathbf{V} = \{V_1, \dots, V_{l_v}\}$. We call microfile records whose
attribute values belong to $\mathbf{V}$ vital records.
Let $w_p$ denote a parameter microfile attribute. Then, a
parameter value $P$ can be defined as a value of this attribute,
$P \in \nabla w_p$. We will denote a set of parameter values by
$\mathbf{P} = \{P_1, \dots, P_{l_p}\}$. Parameter values can be used to
divide the microfile $M$ into submicrofiles $M_1, \dots, M_{l_p}$.
Let us denote by $G(\mathbf{V}, \mathbf{P})$ a group of respondents whose
distribution needs to be protected when providing group
anonymity. The group is thereby determined by appropriately
defined values of parameter and vital microfile attributes.
The task of providing group anonymity (TPGA) [19] lies
in modifying $M$ in order to mask sensitive data features. The
generic scheme of providing group anonymity according to the
single-stage approach to solving the TPGA goes as follows:
1) Construct a (depersonalized) microfile $M$ representing
statistical data to be processed.
2) Define groups of respondents to be protected,
$G_i(\mathbf{V}_i, \mathbf{P}_i)$, $i = \overline{1, k}$.
3) For each $i$ from 1 to $k$:
a) choose a dataset of arbitrary structure $\Omega_i(M, G_i)$,
called goal representation, that represents features of $G_i$ in a
way appropriate for their masking;
b) define an algorithm
$A: \Omega_i(M, G_i) \to \Omega_i^*(M^*, G_i)$, called modifying
algorithm, and obtain both a modified goal representation and
a modified microfile.
4) Prepare the modified microfile $M^*$ for publishing.
In this work, we will illustrate the two-phase modifying
algorithm based on the most widely used goal representation,
the quantity signal, in the form $q = (q_1, q_2, \dots, q_{l_p})$,
where $q_i$ is the total count of respondents in submicrofile $M_i$,
$i = \overline{1, l_p}$, whose vital attribute values belong to $\mathbf{V}$.
Any modifying algorithm has to provide two kinds of
data modification. On the one hand, the quantity signal has to
be altered in order to mask its sensitive features according to
restrictions imposed on all or some of its values. On the level
of microfile modification, this can be achieved by swapping
vital and non-vital records between different submicrofiles.
Records should be swapped in a pairwise fashion to preserve
the number of records in each submicrofile.
This brings us to the second kind of data modification.
Swapping two records between submicrofiles obviously
introduces a certain amount of distortion into the microfile,
which can be measured by the influential metric [7]

$$\mathrm{InfM}(r, r^*) = \sum_{p=1}^{n_{ord}} \omega_p \left( \frac{r(I_p) - r^*(I_p)}{r(I_p) + r^*(I_p)} \right)^2 + \sum_{k=1}^{n_{cat}} \gamma_k \, \chi^2\left( r(J_k), r^*(J_k) \right), \tag{1}$$

where $I_p$ ($J_k$) stands for the $p$th ordinal ($k$th categorical)
influential attribute (an attribute whose distribution over
parameter values is of interest for researchers), $r(\cdot)$ stands for
the operator returning the attribute value of the record $r$,
$\chi(v_1, v_2)$ stands for the operator that is equal to a certain
number $\chi_1$ if its arguments belong to the same category, and
$\chi_2$ otherwise, and $\omega_p$ and $\gamma_k$ are nonnegative weights (the
more important the attribute, the greater the weight).
In other words, a properly defined modifying algorithm has
to modify the quantity signal to mask sensitive data features,
and at the same time minimize the distortion introduced into
the microfile in terms of (1).
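The influential metric can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are represented as dictionaries, two categorical values are treated as belonging to the same category exactly when they are equal, a zero denominator in the ordinal term contributes nothing, and the function name, argument names, and sample values are hypothetical rather than taken from the paper.

```python
def influential_metric(r, r_star, ordinal, categorical,
                       omega, gamma, chi_same, chi_diff):
    """Distortion of swapping record r with r_star, in the spirit of metric (1).

    r, r_star   -- dicts mapping attribute name -> value (assumed layout)
    ordinal     -- names of ordinal influential attributes
    categorical -- names of categorical influential attributes
    omega       -- weights of the ordinal attributes
    gamma       -- weights of the categorical attributes
    chi_same    -- value of chi when both values fall into the same category
    chi_diff    -- value of chi otherwise
    """
    total = 0.0
    # Ordinal part: weighted squared relative difference.
    for name, weight in zip(ordinal, omega):
        a, b = r[name], r_star[name]
        if a + b != 0:  # assumption: skip the term when the denominator is zero
            total += weight * ((a - b) / (a + b)) ** 2
    # Categorical part: weighted squared chi value.
    for name, weight in zip(categorical, gamma):
        # assumption: "same category" is modelled as plain equality
        chi = chi_same if r[name] == r_star[name] else chi_diff
        total += weight * chi ** 2
    return total


# Illustrative values only (not from the paper).
record_a = {"age": 30, "sex": "F", "marital": "single"}
record_b = {"age": 50, "sex": "F", "marital": "married"}
distortion = influential_metric(
    record_a, record_b,
    ordinal=["age"], categorical=["sex", "marital"],
    omega=[1.0], gamma=[1.0, 1.0],
    chi_same=0.0, chi_diff=1.0,
)
```

With these illustrative settings the ordinal term equals ((30 - 50) / 80)^2 and only the differing categorical attribute adds a unit, so the swap costs slightly more than one altered value.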
B. Memetic Modifying Algorithm and Individual Representation
Since the task that needs to be solved by the modifying
algorithm seems to be one that can be solved only by
exhaustive search, we propose to use the modifying algorithm
in the form of the memetic algorithm introduced in [19] (memetic
modifying algorithm, MMA):
1) Create initial population $P = \{U_i\}$ of $\mu$ randomly
generated individuals, $i = \overline{1, \mu}$. Apply local search operator
$S(U_i)$ $\forall i = \overline{1, \mu}$.
2) Calculate fitness function $f(U_i)$ $\forall i = \overline{1, \mu}$.
3) If termination condition holds, stop. Continue otherwise.
4) Select pairs of parents. Put them into set $P'$.
5) Apply recombination operator $R(U_{i_1}, U_{i_2})$ to each pair
$(U_{i_1}, U_{i_2})$ from $P'$, $i_1 = \overline{1, \mu}$, $i_2 = \overline{1, \mu}$,
$i_1 \ne i_2$. Put the resulting offspring into population $P''$.
6) Apply mutation operator $M(U_j)$ $\forall U_j \in P''$, $j = \overline{1, \lambda}$.
7) Apply local search operator $S(U_j)$ $\forall j = \overline{1, \lambda}$.
8) Calculate fitness function $f(U_j)$ $\forall j = \overline{1, \lambda}$.
9) Select the $\mu$ fittest individuals from $P \cup P''$.
Put them into $P$ in place of the current ones.
10) Go to step 3.
Each individual is a matrix $U$ with $Q$ rows and four
columns with the following elements:
1) Element of the first column $u_{i1}$ $\forall i = \overline{1, Q}$ is an index of
a submicrofile to remove vital records from. The set of such
submicrofiles needs to be defined by the user.
2) Element of the third column $u_{i3}$ $\forall i = \overline{1, Q}$ is an index
of a submicrofile to add vital records to. The set of such
submicrofiles also needs to be defined by the user.
3) Element of the second column $u_{i2}$ $\forall i = \overline{1, Q}$ is an
index of the record from $M_{u_{i1}}$ to be removed.
4) Element of the fourth column $u_{i4}$ $\forall i = \overline{1, Q}$ is an index
of the record from $M_{u_{i3}}$ to be swapped with the one defined
by $u_{i2}$.
The number of rows can vary from individual to individual.
Let us denote by $\langle U \rangle_i$ the total count of occurrences of
index $i$ in the first and the third column of $U$. Then, for any
index $i$ in the first column, $\langle U \rangle_i \le q_i$.
Each particular pair $(u_{i1}, u_{i2})$ ($(u_{i3}, u_{i4})$) $\forall i = \overline{1, Q}$ can
occur in $U$ only once.
The requirements mentioned above cannot be violated during
the whole run of the algorithm.
Each individual $U$ uniquely defines both a modified
quantity signal $q^*$ and a precise sequence of pairwise swaps to
be performed in order to modify the microfile, thereby
defining a solution to the TPGA.
C. Fitness Function for the MMA
In this work, we propose to use the fitness function as a
sum of three independent terms:

$$f(U) = \Phi(U) + \Psi(U) + \Pi(U), \tag{2}$$

where $\Phi(U)$ represents estimation of a TPGA solution quality
from the minimizing microfile distortion point of view, $\Psi(U)$
is a penalty function representing estimation of TPGA solution
quality from the masking sensitive quantity signal features
point of view, and $\Pi(U)$ is a penalty term against obtaining
individuals with too many rows. Since all three terms are of
equal importance, their values lie in the interval [0,1].
We propose to use the following expression for the first
term of (2):

$$\Phi(U) = \frac{C_{max} - \sum_{i=1}^{Q} \mathrm{InfM}\left( M_{u_{i1}}(u_{i2}), M_{u_{i3}}(u_{i4}) \right)}{C_{max}}, \tag{3}$$

where $C_{max}$ is the greatest possible value of the cumulative
influential metric, and $M_i(j)$ is the operator yielding the $j$th
vector of the submicrofile $M_i$.
We also propose to use the following expression for the
penalty function:

$$\Psi(U) = \sum_{j=1}^{l_p} \nu_j \, \psi_j(U), \tag{4}$$

where $\psi_j(x)$, $j = \overline{1, l_p}$, is a restriction function that takes
values in the interval [0,1] and expresses the degree of
compatibility of the current value $q_j$ with the corresponding
restriction, and $\nu_j$, $j = \overline{1, l_p}$, is a weight determining the relative
importance of the corresponding restriction, $\sum_{j=1}^{l_p} \nu_j = 1$.
D. Other MMA Components
Operator $R(U_{i_1}, U_{i_2})$ should be defined as a proper
recombination operator applied to two parent individuals $U_{i_1}$
and $U_{i_2}$ that yields two offspring individuals $U_{j_1}$ and $U_{j_2}$. It
should be applied with a high probability $p_c$. In this work, we
propose to use the recombination operator introduced in [19],
based on the cut and splice operator [20]. It randomly
generates two crossover points $k_1 \in [0, Q_{i_1}]$ and $k_2 \in [0, Q_{i_2}]$,
then splits each parent at the appropriate points, and exchanges the
tails between them, thus creating the offspring.
Operator $M(U)$ should be defined as a proper mutation
operator applied to the individual $U$ that yields the mutated
one $U'$. In this work, we propose to use the operator introduced in
[19], which is a superposition $M = M_4 \circ M_3 \circ M_2 \circ M_1$ of the
following operators:
1) Operator $M_1$ is applied with small probability $p_{m_1}$ to
the first column of $U$ as to the permutation. Each pair
$(u_{i1}, u_{i2})$ needs to be preserved $\forall i = \overline{1, Q}$.
2) Operator $M_2$ is applied with small probability $p_{m_2}$ to
the third column of $U$ as to the permutation. Each pair
$(u_{i3}, u_{i4})$ needs to be preserved $\forall i = \overline{1, Q}$.
3) Operator $M_3$ is applied with small probability $p_{m_3}$ to
the second column of $U$ as to the vector of categorical values.
4) Operator $M_4$ is applied with small probability $p_{m_4}$ to
the fourth column of $U$ as to the vector of categorical values.
Operator $S(U)$ in this work is defined as an operator
applied to the individual $U$ that yields the modified one $U'$
according to the following procedure [19]:
1) Carry out steps 2–4 $\forall i = \overline{1, Q}$.
2) Generate a uniformly distributed random number
$r \in [0, 1]$.
3) If $r < p_{mem}$, assign to $u_{i4}$ the index of a record from
$M_{u_{i3}}$ closest to the record $u_{i2}$ from $M_{u_{i1}}$ in terms of (1).
If $r \ge p_{mem}$, assign to $u_{i2}$ the index of a record from $M_{u_{i1}}$
closest to the record $u_{i4}$ from $M_{u_{i3}}$ in terms of (1).
4) Go to step 2.
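How an individual determines a modified quantity signal and a fitness value can be sketched as follows. This is a minimal illustration, not the authors' implementation: submicrofile indices are 0-based here, each row is assumed to move exactly one vital record from its first-column submicrofile to its third-column one, the restriction functions are applied to the modified signal values, and the names `modified_signal`, `fitness`, `swap_cost`, and `size_penalty` are hypothetical.

```python
def modified_signal(q, individual):
    """Quantity signal after applying every swap encoded in the individual.

    q          -- original quantity signal (list of vital-record counts)
    individual -- list of rows (u1, u2, u3, u4); u1 and u3 are 0-based
                  submicrofile indices, u2 and u4 are record indices
    """
    q_star = list(q)
    for u1, _u2, u3, _u4 in individual:
        q_star[u1] -= 1  # one vital record leaves submicrofile u1
        q_star[u3] += 1  # and arrives in submicrofile u3
    return q_star


def fitness(individual, q, swap_cost, c_max, restrictions, weights, size_penalty):
    """Sum of a distortion term, a restriction term and a size term.

    swap_cost    -- callable returning the metric value of one row (one swap)
    c_max        -- greatest possible cumulative metric
    restrictions -- dict: signal index -> restriction function on a signal value
    weights      -- dict: signal index -> weight; weights should sum to 1
    size_penalty -- callable mapping the number of rows to a value in [0, 1]
    """
    distortion = sum(swap_cost(row) for row in individual)
    distortion_term = (c_max - distortion) / c_max
    q_star = modified_signal(q, individual)
    restriction_term = sum(weights[j] * restrictions[j](q_star[j])
                           for j in restrictions)
    return distortion_term + restriction_term + size_penalty(len(individual))


# Illustrative values only.
q = [5, 1, 2]
U = [(0, 3, 1, 7), (0, 4, 2, 9)]
value = fitness(
    U, q,
    swap_cost=lambda row: 2.0,
    c_max=10.0,
    restrictions={0: lambda x: 1.0 if x <= 3 else 0.0},
    weights={0: 1.0},
    size_penalty=lambda rows: 1.0,
)
```

Keeping the swap cost, the restriction functions, and the size penalty as parameters mirrors the way the text leaves these components to be chosen for each task at hand.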
Other MMA components, such as selection, initialization,
termination, population size etc. can be chosen individually for
each TPGA at hand.
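The cut and splice idea behind the recombination operator can be illustrated with a short sketch. It assumes an individual is stored as a list of four-element rows and it does not re-check the uniqueness and count requirements on individuals, which an actual operator would have to maintain; the function name is hypothetical.

```python
import random


def cut_and_splice(parent1, parent2, rng):
    """Exchange the tails of two variable-length parents.

    Each parent is a list of rows; the crossover points are drawn
    independently, so the offspring may differ in length from the parents.
    """
    k1 = rng.randint(0, len(parent1))  # crossover point in [0, Q1], inclusive
    k2 = rng.randint(0, len(parent2))  # crossover point in [0, Q2], inclusive
    child1 = parent1[:k1] + parent2[k2:]
    child2 = parent2[:k2] + parent1[k1:]
    return child1, child2


# Illustrative parents with three and two rows.
p1 = [(0, 1, 4, 2), (0, 5, 3, 8), (1, 2, 4, 6)]
p2 = [(6, 7, 2, 1), (6, 3, 5, 9)]
c1, c2 = cut_and_splice(p1, p2, random.Random(42))
```

Because the two cut points are independent, the number of rows can drift from generation to generation, which is why a penalty against overly long individuals is useful in the fitness function.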
III. TWO-PHASE MEMETIC ALGORITHM
In this section, we will discuss the two-phase memetic
algorithm for the TPGA, in which sensitive data features to be
masked are maximum values of the quantity signal.
For a certain element of the quantity signal, there can be
defined two types of restriction functions:
1) Decreasing restriction functions, which are
monotonically non-increasing functions that tend to unity as
the corresponding quantity signal value decreases to a
particular value.
2) Increasing restriction functions, which are
monotonically non-decreasing functions that tend to unity as
the corresponding quantity signal value increases to a
particular value.
In most cases, we can determine only decreasing
restriction functions for submicrofiles to remove vital records
from. The choice of submicrofiles to add records to (without
explicitly defining corresponding increasing restriction
functions) can be left to the evolutionary process itself [19].
However, the quality of the solution heavily depends on the
quality of the decreasing restriction functions. If the
restrictions are too severe (too many swaps need to be
performed), the cumulative metric (1) might become too great.
On the other hand, if they are too mild, the solution may not be
feasible, since maximums may remain greater than other signal
values, even though their absolute values have decreased.
In some cases, with the help of information from external
sources, it may be possible to choose submicrofiles to add vital
records to, and to define appropriate increasing restriction
functions. However, in general, when there is no additional
information other than that present in the data at hand, it is not
clear what submicrofiles to choose and what restrictions to
impose, because different restrictions guide evolution in
different directions, not necessarily leading to an
optimal solution. In this work, we propose to apply the MMA
according to the following procedure:
1) Based on analyzing the quantity signal q representing
microfile M , define suitable decreasing restriction functions
for those signal elements that violate the requirement of
masking maximum signal values.
2) Apply the memetic algorithm as described in Sect. II-B.
3) Classify the individuals obtained into feasible solutions
(compatible with decreasing restrictions to a high degree and
masking maximum signal values), subfeasible solutions
(compatible with decreasing restrictions to a high degree but
not masking maximum signal values), and infeasible solutions
(not compatible with decreasing restrictions to a high degree).
4) Group all subfeasible solutions for which it is possible
to define the same increasing restriction functions into clusters.
One solution may belong to several clusters.
5) Choose the cluster with the smallest mean value of the
cumulative metric (1). If the cluster contains fewer than $\mu$
solutions ($\mu$ is the population size in the algorithm), then
increase its size to $\mu$ by duplicating solutions at random. If
the cluster contains more than $\mu$ solutions, decrease its size to
$\mu$ by removing solutions at random.
6) Apply the memetic algorithm of step 2 to the set of
solutions obtained in step 5 as to the initial population.
The first two steps of this procedure constitute the first
phase of the MMA; the remaining four constitute the second
phase of the MMA.
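Step 5 of the procedure, choosing a cluster and bringing it to the population size, can be sketched as below. The sketch assumes the clusters have already been formed (for example, by an expert) and are given as lists of (solution, cumulative metric) pairs; the function names and the sample data are hypothetical.

```python
import random


def choose_cluster(clusters):
    """Return the key of the cluster with the smallest mean cumulative metric.

    clusters -- dict: cluster label -> list of (solution, metric) pairs
    """
    def mean_metric(label):
        metrics = [metric for _solution, metric in clusters[label]]
        return sum(metrics) / len(metrics)

    return min(clusters, key=mean_metric)


def resize_to_population(solutions, mu, rng):
    """Duplicate or remove solutions at random until exactly mu remain."""
    pool = list(solutions)
    while len(pool) < mu:
        pool.append(rng.choice(solutions))   # duplicate a random solution
    while len(pool) > mu:
        pool.pop(rng.randrange(len(pool)))   # remove a random solution
    return pool


# Illustrative clusters; labels and metric values are made up.
clusters = {
    "8 and 10": [("U1", 40.0), ("U2", 48.0)],
    "6 and 8": [("U3", 46.0), ("U4", 47.0), ("U5", 47.0)],
}
best = choose_cluster(clusters)
seed_population = resize_to_population(
    [solution for solution, _metric in clusters[best]], mu=4, rng=random.Random(0)
)
```

The resulting list can then serve as the initial population of the second run of the memetic algorithm.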
IV. PRACTICAL RESULTS
A. General Description of the Task
To illustrate the application of the MMA to a task of providing
group anonymity based on real data, we consider the following
problem. Let us mask the regional distribution of active military
personnel in the state of Massachusetts (the U.S.) according to
the 5-Percent Public Use Microdata Sample File of the 2000 U.S.
census [21]. A total of 141 838 records was taken for analysis.
To define the group of active military personnel
distributed by place of work, we took "Military Service" as the
vital attribute, its value "Active Duty" as the only vital value,
"Place of Work PUMA" (Public Use Microdata Area) as the
parameter attribute, and its values 25010–25120 with the step 10
(codes of Massachusetts statistical areas) as parameter values.
Quantity signal $q$ corresponding to the group is presented
in Fig. 4 (solid line). Signal elements 1, 2, …, 12 correspond
to statistical areas 25010, 25020, …, 25120, respectively.
B. The First Phase of the MMA
As we can see from the graph of the quantity signal
(Fig. 4, solid line), anonymity can be provided by reducing the
value of the second, the seventh, the ninth, and the twelfth
signal elements. This leads us to the following decreasing
restriction functions (Fig. 1):

(a) $$\psi_2(x) = \begin{cases} 1, & x \le 20 \\ 1 - 2\left(\frac{x - 20}{47}\right)^2, & 20 < x \le 43.5 \\ 2\left(\frac{x - 67}{47}\right)^2, & 43.5 < x \le 67 \\ 0, & x > 67 \end{cases}$$

(b) $$\psi_7(x) = \begin{cases} 1, & x \le 25 \\ 1 - 2\left(\frac{x - 25}{5}\right)^2, & 25 < x \le 27.5 \\ 2\left(\frac{x - 30}{5}\right)^2, & 27.5 < x \le 30 \\ 0, & x > 30 \end{cases}$$

(c) $$\psi_9(x) = \begin{cases} 1, & x \le 25 \\ 1 - 2\left(\frac{x - 25}{3}\right)^2, & 25 < x \le 26.5 \\ 2\left(\frac{x - 28}{3}\right)^2, & 26.5 < x \le 28 \\ 0, & x > 28 \end{cases}$$

(d) $$\psi_{12}(x) = \begin{cases} 1, & x \le 25 \\ 1 - 2\left(\frac{x - 25}{13}\right)^2, & 25 < x \le 31.5 \\ 2\left(\frac{x - 38}{13}\right)^2, & 31.5 < x \le 38 \\ 0, & x > 38 \end{cases}$$

Fig. 1. Decreasing restriction functions for the example:
(a) for the second element, (b) for the seventh element,
(c) for the ninth element, (d) for the twelfth element

Indices of all the signal elements other than those
restricted by functions from Fig. 1 were chosen to appear in
the third column of individuals in the MMA population.
To minimize the distortion introduced into the microfile,
we took "Sex", "Age", "Hispanic or Latino Origin", "Marital
Status", "Educational Attainment", "Citizenship Status", and
"Person's Total Income in 1999" as the influential attributes.
We considered all these attributes to be categorical ones. To
simplify the matter, we chose the following parameters of (1):
$\gamma_k = 1$ $\forall k = \overline{1, 7}$, $\chi_1 = 1$, $\chi_2 = 0$. In this case, the metric (1)
shows the number of attribute values to be altered during one
swap of the records between the submicrofiles.
To prevent individuals in the MMA from growing
indefinitely, we used the following penalty term (Fig. 2):

$$\Pi_{ex}(U) = \frac{1}{1 + e^{0.5 (Q - 90)}}.$$

Fig. 2. Penalty term heavily penalizing individuals with
more than 100 rows

The fitness function (2) for the example is as follows:

$$f_{ex1}(U) = \frac{1099 - \sum_{i=1}^{Q} \sum_{k=1}^{7} \left| \operatorname{sign}\left( M_{u_{i1}}(u_{i2}, A_k) - M_{u_{i3}}(u_{i4}, A_k) \right) \right|}{1099} + \frac{1}{4} \sum_{j \in \{2, 7, 9, 12\}} \psi_j(U) + \Pi_{ex}(U),$$

where $\operatorname{sign}(\cdot)$ is a function yielding $-1$ if its argument is
negative, 0 if it equals 0, and 1 if it is positive, $A_k$, $k = \overline{1, 7}$, is
the $k$th influential attribute, and $M_j(i, A_k)$ returns the value of the
attribute $A_k$ of the $i$th record in submicrofile $M_j$.
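The decreasing restriction functions of Fig. 1 and the penalty term of Fig. 2 are straightforward to evaluate numerically. The sketch below assumes each decreasing function is fully described by the two break points at which it leaves 1 and reaches 0 (with the switch between the two parabolic branches at their midpoint); the names `z_shaped`, `size_penalty`, and `DECREASING` are illustrative.

```python
import math


def z_shaped(x, a, b):
    """Decreasing restriction: 1 up to a, 0 beyond b, smooth in between."""
    mid = (a + b) / 2.0
    if x <= a:
        return 1.0
    if x <= mid:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    if x <= b:
        return 2.0 * ((x - b) / (b - a)) ** 2
    return 0.0


def size_penalty(rows):
    """Logistic term close to 1 for short individuals and close to 0 for long ones."""
    return 1.0 / (1.0 + math.exp(0.5 * (rows - 90)))


# Break points (a, b) of the four decreasing functions of the example,
# keyed by the (1-based) index of the restricted signal element.
DECREASING = {2: (20, 67), 7: (25, 30), 9: (25, 28), 12: (25, 38)}

compatibility = {j: z_shaped(26, a, b) for j, (a, b) in DECREASING.items()}
```

Evaluating `size_penalty` at 90 and at 100 rows shows how sharply the term drops over that range, which is what discourages individuals with more than about 100 rows.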
We chose the swap mutation [22] as mutation operators
$M_1$ and $M_2$, and the random resetting mutation [10, p. 43] as
mutation operators $M_3$ and $M_4$. We decided to apply
tournament selection [23] as an efficient and easy-to-implement
selection operator, with the tournament size 5.
The population was initialized by randomly generating
matrices with different numbers of rows. Elements of the first
column were generated with probabilities proportional to the
values of the corresponding elements of q . Elements of the
third column were generated with probabilities proportional to
the total numbers of records in corresponding submicrofiles.
During the MMA run, we applied linear fitness scaling
[24, p. 79] to prevent premature convergence. We also
multiplied the mutation probabilities by the factor of 10
whenever the standard deviation of the population fitness
values dropped below 0.03.
Other MMA parameters were chosen to be $\mu = 100$,
$\lambda = 40$, $p_c = 1$, $p_{m_1} = p_{m_2} = p_{m_3} = p_{m_4} = 0.001$, $p_{mem} = 0.75$.
We performed 30 independent runs of the MMA,
terminating each run after 1000 generations.
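Two of the run-time choices described above, tournament selection of size 5 and the tenfold increase of mutation probabilities when the population fitness becomes too uniform, can be sketched as follows. The details are assumptions made for illustration: contenders are drawn without replacement, the population standard deviation is used, boosted probabilities are capped at 1, and the function names are hypothetical.

```python
import random
import statistics


def tournament_select(population, fitnesses, rng, size=5):
    """Pick the fittest of `size` distinct, randomly drawn individuals.

    Requires size <= len(population).
    """
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda index: fitnesses[index])
    return population[best]


def mutation_rates(base_rates, fitnesses, threshold=0.03, factor=10.0):
    """Boost mutation probabilities when the fitness spread falls below threshold."""
    if statistics.pstdev(fitnesses) < threshold:
        return [min(1.0, rate * factor) for rate in base_rates]
    return list(base_rates)


# Illustrative usage with made-up fitness values.
population = ["a", "b", "c", "d", "e"]
fitnesses = [1.0, 5.0, 2.0, 3.0, 4.0]
parent = tournament_select(population, fitnesses, random.Random(0), size=5)
rates = mutation_rates([0.001] * 4, [0.5, 0.5, 0.5])
```

In the usage above the tournament spans the whole population, so the fittest individual is always returned, and the three identical fitness values trigger the boost.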
C. The Second Phase of the MMA
Among 3000 solutions obtained as the result of the first
phase of applying MMA, only 754 (25.133%) are feasible
ones. Two solutions with the lowest cumulative metrics (1) are
given in Fig. 4a (dashed-dotted and dotted lines). The mean
cumulative metric (1) is 57.901.
The majority of solutions are subfeasible ones (1837, or
61.233%). We divided them into several clusters; the most
prominent ones are presented in Table I.
As can be deduced from Table I, it is reasonable to choose
solutions, for which values of elements 8 and 10 should be
increased, for the second phase. This leads us to the following
increasing restriction functions (Fig. 3):
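
$$\psi_8(x) = \psi_{10}(x) = \begin{cases} 0, & x \le 15 \\ 2\left(\frac{x - 15}{12}\right)^2, & 15 < x \le 21 \\ 1 - 2\left(\frac{x - 27}{12}\right)^2, & 21 < x \le 27 \\ 1, & x > 27 \end{cases}$$

Fig. 3. Increasing restriction function for the eighth and tenth signal elements
from the example
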
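The increasing restriction function of Fig. 3 mirrors the decreasing ones and can be evaluated with an equally small sketch. It assumes the function is described by the two break points at which it leaves 0 and reaches 1, with the switch between the parabolic branches at their midpoint; the name `s_shaped` is illustrative.

```python
def s_shaped(x, a, b):
    """Increasing restriction: 0 up to a, 1 beyond b, smooth in between."""
    mid = (a + b) / 2.0
    if x <= a:
        return 0.0
    if x <= mid:
        return 2.0 * ((x - a) / (b - a)) ** 2
    if x <= b:
        return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2
    return 1.0


# Break points used for the eighth and tenth signal elements in the example.
INCREASING = {8: (15, 27), 10: (15, 27)}

halfway = s_shaped(21, *INCREASING[8])
```

The two parabolic branches meet at the midpoint, so the function is continuous and non-decreasing over the whole range.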
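The fitness function (2) for the second phase is as follows:

$$f_{ex2}(U) = \frac{1099 - \sum_{i=1}^{Q} \sum_{k=1}^{7} \left| \operatorname{sign}\left( M_{u_{i1}}(u_{i2}, A_k) - M_{u_{i3}}(u_{i4}, A_k) \right) \right|}{1099} + \frac{1}{6} \sum_{j \in \{2, 7, 8, 9, 10, 12\}} \psi_j(U) + \Pi_{ex}(U).$$
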
Among 3000 solutions obtained as the result of the second
phase of applying the MMA, 2693 (or 89.767%) are feasible
ones. Two solutions with the lowest cumulative metrics (1) are
given in Fig. 4b (dashed-dotted and dotted lines). The mean
cumulative metric (1) is 47.873. In other words, it is sufficient
to alter as few as 0.005% of the microfile attribute values in
order to provide group anonymity.
This result is better than the one obtained in [19], even
though restrictions imposed on the solution are stricter here.
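The percentages quoted in this section can be re-derived with a few lines of arithmetic. The shares of feasible and subfeasible solutions follow directly from the counts; for the 0.005% figure, the sketch assumes the denominator is the number of records multiplied by the seven influential attributes, which is an interpretation rather than something stated explicitly in the text.

```python
TOTAL_SOLUTIONS = 3000          # 30 runs, 100 individuals each
RECORDS = 141_838               # records taken for analysis
INFLUENTIAL_ATTRIBUTES = 7      # assumption: only these values are counted


def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


feasible_phase1 = percent(754, TOTAL_SOLUTIONS)
subfeasible_phase1 = percent(1837, TOTAL_SOLUTIONS)
feasible_phase2 = percent(2693, TOTAL_SOLUTIONS)

# Mean cumulative metric after the second phase, read as a count of altered values.
altered_share = percent(47.873, RECORDS * INFLUENTIAL_ATTRIBUTES)
```

Under that assumption the altered share comes out just below 0.005%, in line with the figure reported above.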
V. CONCLUSIONS
Combining local search techniques with evolutionary
algorithms to increase efficiency of the latter ones has become
a widely accepted practice. This idea can be applied to solving
real-life problems in quite diverse ways yielding results of
varying quality.
In this paper, we proposed a two-phase approach to
applying memetic algorithms that can provide results of
significantly better quality. During the first phase, feasible
solutions to the optimization task are obtained, and potential
ways of improving them are discovered. During the second
phase, more concise constraints can be formulated, leading to
solutions of better quality.
Experimental results presented in the paper show the two-phase
memetic algorithm to be worthy of practical interest.
However, several issues need further investigation, for
example, automatic clustering of the results obtained after the
first phase, enhancing algorithm efficiency by choosing
appropriate components, and analyzing the algorithm's efficiency
as a function of its parameters.

TABLE I
CLUSTERS OBTAINED AFTER THE FIRST PHASE OF MMA

Quantity Signal Elements to Increase    Cluster Size    Mean Metric
1 and 6                                 78              45.436
3 and 6                                 84              46.048
3 and 10                                26              46.269
4 and 6                                 43              48.488
6 and 8                                 183             46.519
8 and 10                                101             44.238

Fig. 4. Initial (solid line) and modified quantity signals:
(a) feasible one with the metric 40 (dashed-dotted line), feasible one with the
metric 43 (dotted line), subfeasible one (dashed line);
(b) the one with the metric 37 (dashed-dotted line), the one with the metric 38
(dotted line)

REFERENCES
[1] D. Brooks, The Social Animal: The Hidden Sources of Love, Character,
and Achievement. New York: Random House Trade Paperbacks, 2011.
[2] C. C. Aggarwal and P. S. Yu, "A general survey of privacy-preserving
data mining: models and algorithms," in Privacy-Preserving Data
Mining: Models and Algorithms, Advances in Database Systems, vol. 34,
C. C. Aggarwal and P. S. Yu, Eds. New York: Springer Science+Business
Media, LLC, 2008, pp. 11–52.
[3] B. Fung, K. Wang, R. Chen, and P. Yu, "Privacy-preserving data publishing:
a survey of recent developments," ACM Computing Surveys, 42(4),
pp. 1–53, 2010.
[4] C. N. Sowmyarani and G. N. Srinivasan, "Survey on recent developments
in privacy preserving models," International Journal of Computer
Applications, 38(9), pp. 18–22, 2012.
[5] A. Pfitzmann and M. Hansen. (2010). A terminology for talking about
privacy by data minimization: anonymity, unlinkability, undetectability,
unobservability, pseudonymity, and identity management. Version v0.34
[Online]. Available:
http://dud.inf.tu-dresden.de/Anon_Terminology.shtml
[6] O. Chertov and A. Pilipyuk, "Statistical disclosure control methods for
microdata," in 2009 International Symposium on Computing,
Communication, and Control, Proc. of CSIT, vol. 1. Singapore: IACSIT
Press, 2011, pp. 339–343.
[7] O. Chertov, Ed. Group Methods of Data Processing. Raleigh: Lulu.com,
2010.
[8] A. Meyerson and R. Williams, "General k-anonymization is hard,"
Carnegie Mellon School of Computer Science, Tech. Rep. CMU-CS-03-113, 2003.
[9] P. Moscato, "On evolution, search, optimization, genetic algorithms and
martial arts: toward memetic algorithms," Caltech Concurrent
Computation Program, Caltech, CA, C3P Rep. 826, 1989.
[10] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing,
2nd ed. Berlin, Heidelberg: Springer-Verlag, 2007.
[11] R. Dawkins, The Selfish Gene: 30th Anniversary Edition. Oxford, New
York: Oxford University Press, 2006.
[12] T. Ray and R. Sarker, "Memetic algorithms in constrained
optimization," in Handbook of Memetic Algorithms, F. Neri, C. Cotta,
and P. Moscato, Eds. Berlin, Heidelberg: Springer-Verlag, 2012,
pp. 135–151.
[13] A. E. Smith and D. W. Coit, "Penalty functions," in Evolutionary
Computation 2. Advanced Algorithms and Operators, T. Bäck,
D. B. Fogel, and Z. Michalewicz, Eds. Bristol, Philadelphia: Institute of
Physics Publishing, 2000, pp. 41–48.
[14] Z. Michalewicz, "Repair algorithms," in Evolutionary Computation 2.
Advanced Algorithms and Operators, T. Bäck, D. B. Fogel, and
Z. Michalewicz, Eds. Bristol, Philadelphia: Institute of Physics
Publishing, 2000, pp. 56–61.
[15] Z. Michalewicz, "Decoders," in Evolutionary Computation 2. Advanced
Algorithms and Operators, T. Bäck, D. B. Fogel, and Z. Michalewicz,
Eds. Bristol, Philadelphia: Institute of Physics Publishing, 2000,
pp. 49–55.
[16] O. Chertov and D. Tavrov, "Providing group anonymity using wavelet
transform," in Data Security and Security Data, LNCS, vol. 6121,
L. M. MacKinnon, Ed. Berlin, Heidelberg: Springer-Verlag, 2012,
pp. 25–36.
[17] D. Tavrov and O. Chertov, "SSA-caterpillar in group anonymity,"
presented at the World Conference in Soft Computing, San Francisco,
CA, 2011.
[18] O. R. Chertov and D. Y. Tavrov, "Memetic algorithm for microfile
modification with distortion minimization while providing group
anonymity," (in Ukrainian), Bulletin of Volodymyr Dahl East Ukrainian
National University, vol. 8(179), pp. 256–262, 2012.
[19] O. Chertov and D. Tavrov, "Memetic algorithm for solving the task of
providing group anonymity," in Advanced Trends in Soft Computing,
Studies in Fuzziness and Soft Computing, vol. 312, M. Jamshidi,
V. Kreinovich, and J. Kacprzyk, Eds. Springer International Publishing
Switzerland, 2014, pp. 281–292.
[20] D. E. Goldberg, B. Korb, and K. Deb, "Messy genetic algorithms:
motivation, analysis, and first results," Complex Systems, 3, pp. 493–530,
1989.
[21] U. S. Census 2000. (2000). 5-Percent Public Use Microdata Sample
Files [Online]. Available:
http://www.census.gov/main/www/cen2000.html
[22] G. Syswerda, "Schedule optimization using genetic algorithms," in
Handbook of Genetic Algorithms, L. Davis, Ed. New York: Van
Nostrand Reinhold, 1991, pp. 332–349.
[23] A. Brindle, "Genetic algorithms for function optimization," Doctoral
Dissertation, Department of Computer Science, Tech. Rep. TR81-2,
University of Alberta, 1981.
[24] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and
Machine Learning. Addison-Wesley, 1989.
