Massimiliano Tomassoli
June 3, 2008
Abstract
This document was written in a hurry, therefore the bibliography is missing and,
above all, the algorithms and methods described in this document were never
implemented nor tested in any way. The proofs of the theorems presented are
only sketched and no one has ever read them besides me.
I do not take any responsibility for any harm which may result from using
the information contained in this document.
Copyright 2008 by Massimiliano Tomassoli. This document may be freely
distributed and duplicated as long as this copyright notice remains intact. For
any comments: mtomassoli@alice.it.
Contents

1 Introduction
  1.1 The one-dimensional case
  1.2 The multidimensional case
      1.2.1 Some common approaches
      1.2.2 UB Trees and space-filling curves

2 Static Indexing
  2.1 Logarithmic Decomposition
  2.2 First method: O(log² n)
  2.3 Second method: O(log n)
  2.4 The Multidimensional Case

3 Dynamic Indexing
  3.1 Splitting and Fusion
  3.2 Point Insertion and Deletion
Chapter 1
Introduction
1 → 5 → 8 → 9 → 20 → 23 → 28 → 30 (1.1)
We notice that each single number partitions the list into two parts in a
natural way. For instance, the number 20 partitions list (1.1) into the two lists
1→5→8→9 (1.2)
and
20 → 23 → 28 → 30 (1.3)
[Figure: the sorted list 1 5 8 9 20 23 28 30 arranged as a balanced binary tree: 20 at the root, 8 and 28 below it, and 1, 9, 23, 30 below them.]
Problem 1.1.1 asks us to find all the integers x in S such that a ≤ x ≤ b.
If we insert the elements of S into a balanced search tree or, even better, sort
them in time O(n log n) and then either build a balanced tree in time O(n) or
put them in an array and perform binary searches on it, we can find the smallest
integer in S greater than or equal to a in time O(log n). Each of the other
integers can then be found in time O(1).
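As an illustration only (the function name and the toy list below are ours, not part of the original text), the whole one-dimensional method fits in a few lines of Python:

```python
from bisect import bisect_left

def range_query(sorted_s, a, b):
    """Report every x in sorted_s with a <= x <= b.

    One O(log n) binary search finds the smallest element >= a;
    each further element is reported in O(1) by scanning right.
    """
    out = []
    i = bisect_left(sorted_s, a)          # O(log n)
    while i < len(sorted_s) and sorted_s[i] <= b:
        out.append(sorted_s[i])           # O(1) per reported element
        i += 1
    return out

s = sorted([1, 5, 8, 9, 20, 23, 28, 30])
print(range_query(s, 8, 23))  # → [8, 9, 20, 23]
```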
[Figure: the letters A B C D E F G arranged as a balanced binary tree: D at the root, B and F below it, and A, C, E, G below them.]
Figure 1.5: This figure shows how, in UB-Trees, we can tell which one of two
points is the smaller. In this case, we clearly see that the point in h1,4,1 is
smaller than the point in h1,4,4 .
Figure 1.6: This figure shows some regions induced by the total ordering used
in the UB-Trees (based on the Z-curve).
The regions induced by this kind of subdivision scheme (see figure 1.6) cannot
be as thin as the ones induced by the naive subdivision scheme described above
(see figure 1.4).
An ordering can also be induced by a space-filling curve. In mathematics,
a space-filling curve is a surjective continuous curve f : [0, 1] → S, where S
is a topological space. Usually, S is [0, 1]ᵈ (S cannot be the whole of Rᵈ,
since the continuous image of [0, 1] is compact). These curves are called
space-filling because they actually fill the entire space. Given p, q in S, we
could say that p < q ⇐⇒ min(f⁻¹(p)) < min(f⁻¹(q)), i.e. p < q if and only if
the curve f reaches p “before” it reaches q. Some of these curves are recursive
in nature. If f is a recursive space-filling curve, there is an infinite sequence
of functions f0 , f1 , f2 , . . . where f0 is given and, for each i ≥ 0, fi+1 is
obtained by applying a construction step to fi , and f is the limit of fn as
n → ∞. Usually we can tell whether p < q just by looking at a single function
of the infinite sequence above. Let us look at an example. The order imposed on
the space by the UB-Trees is induced by the so-called Z-curve. Figure 1.7 shows
the first “approximations”
Figure 1.7: This figure shows the first four “approximations” of the Z-curve.
The recursive nature of the curve is quite evident.
of the Z-curve. It should be evident that, for instance, points in h1 intersect the
curve “before” points in h2 do.
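For integer coordinates, the position at which a point first meets the Z-curve is given by its Morton code, obtained by interleaving the bits of its coordinates. The following sketch assumes non-negative integer coordinates and fixes one of the two possible conventions (the first axis supplies the low-order bit); the names are ours, not the document's:

```python
def z_value(x, y, bits=16):
    """Morton code: interleave the low `bits` bits of x and y.

    Convention (an assumption of this sketch): x supplies the even
    bit positions, y the odd ones.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x fills the even bits
        z |= ((y >> i) & 1) << (2 * i + 1)  # y fills the odd bits
    return z

def z_less(p, q):
    """p < q in the total order induced by the Z-curve."""
    return z_value(*p) < z_value(*q)

print(z_less((1, 0), (0, 1)))  # → True under this bit convention
```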
Chapter 2
Static Indexing
This chapter deals with the static version of the problem, i.e. where all points
in our space are known in advance and no points can be inserted or deleted.
Example 2.1.5. If S = (5, 1, 7, 2, 8, 2, 9, 6, 3, 4), then δS = {δS1 , δS2 , δS4 , δS8 , δS16 }.
Definition 2.1.6. Let S = (s1 , s2 , . . . , sm ) and P = (p1 , p2 , . . . , pn ) be two
sequences. The concatenation S · P , or simply SP if no confusion arises, is the
sequence (s1 , s2 , . . . , sm , p1 , p2 , . . . , pn ).
[Figure 2.1: a sequence P spanning the elements s6 , . . . , s15 of S, together with the (shaded) sequences of the logarithmic decomposition that cover it.]
if P = ()
    return ()
γ(T ) := ()  // initially empty
d := concatenation of δSc
while true
    if P = d
        return d
    if P ⊑ L(d)
        d := L(d)
    else if P ⊑ R(d)
        d := R(d)
    else
        d1 := L(d)
        P1 := P \ R(d)
        while P1 ≠ ()
            if P1 ⊏ R(d1)
                d1 := R(d1)
            else
                γ(T ) := (R(d1)) · γ(T )
                P1 := P1 \ R(d1)
                d1 := L(d1)
        d2 := R(d)
        P2 := P \ L(d)
        while P2 ≠ ()
            if P2 ⊏ L(d2)
                d2 := L(d2)
            else
                γ(T ) := γ(T ) · (L(d2))
                P2 := P2 \ L(d2)
                d2 := R(d2)
        return γ(T )
Algorithm 2.1.12.1: This algorithm is used in the proof of theorem 2.1.12.
Figure 2.2: This figure shows the same situation as figure 2.1, but here a binary
tree connecting the sequences of the logarithmic decomposition is shown. Note
that the arrows of the tree let us reach all the shaded sequences in a total time
of O(log n).
2. P ⊑ L(d)
3. P ⊑ R(d)
Proof. (sketch) Let S = (s1 , s2 , . . . , sn ) and c = 2^⌈log₂ |S|⌉. It is clear that every
sequence δSi can be built from S in time O(n). Because |δS | = O(log n), δS
can be built in time O(n log n). We also build a binary search tree B such that,
for each valid i, every sequence d ∈ δS_{2^i} (or its first element) points to the
sequences (or their first elements) in L(d) and R(d). Have a look at figure 2.2.
We can now solve the problem by using the algorithm described in the proof
of theorem 2.1.12. B makes the implementation of the algorithm completely
straightforward.
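Under the reading of δS suggested by example 2.1.5 (level 2^i stores the consecutive blocks of S of length 2^i, each block sorted), a naive construction can be sketched as follows. Note that sorting every level independently costs O(n log² n); the proof above obtains O(n log n) by merging each level from the one below instead. The function name and this reading of the definition are our own assumptions:

```python
def log_decomposition(s):
    """Level 2**i holds the consecutive blocks of s of length 2**i
    (the last block may be shorter), each block sorted.

    A guess at the chapter's delta-S, chosen to match example 2.1.5;
    not taken verbatim from the text.
    """
    levels = {}
    size = 1
    while size < 2 * len(s):              # up to c = 2^ceil(log2 n)
        levels[size] = [sorted(s[j:j + size])
                        for j in range(0, len(s), size)]
        size *= 2
    return levels

d = log_decomposition([5, 1, 7, 2, 8, 2, 9, 6, 3, 4])
print(sorted(d))  # → [1, 2, 4, 8, 16], the five levels of example 2.1.5
```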
2.2 First method: O(log² n)
Let us generalize problem 1.2.1 a little.
Definition 2.2.1. If p ∈ A1 × A2 × · · · × An and p = (p1 , p2 , . . . , pn ), then
pai = τai p = pi .
Example 2.2.2. If p ∈ U × V and p = (x, y), then pu = τu p = x and pv =
τv p = y.
Definition 2.2.3. If S = {s1 , s2 , . . .} ⊂ A1 × A2 × · · · × An , then τai S =
{τai s1 , τai s2 , . . .}. For sequences replace ‘{’ and ‘}’ with ‘(’ and ‘)’, respectively.
Problem 2.2.4. Let U and V be two totally ordered sets. Given a set S =
{s1 , s2 , . . . , sn } ⊂ U × V and a, b ∈ U × V, find {(u, v) ∈ S | au ≤ u ≤ bu ∧ av ≤
v ≤ bv }.
This generalization makes sure that our method does not take advantage of
anything but the fact that the sets are totally ordered.
Definition 2.2.5. Given two sequences a = (a1 , a2 , . . .) and b = (b1 , b2 , . . .), a
is lexicographically smaller than b, written a <l b or simply a < b, if and only
if there exists an integer i ∈ [1, min{|a|, |b|} + 1[ such that, for each 0 < k < i,
ak = bk and ai < bi .
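Python compares tuples in essentially this way; the sketch below (the function name is ours) restricts definition 2.2.5 to the positions the two sequences share. Note that Python's built-in tuple comparison additionally treats a proper prefix as smaller, a case the definition above leaves out:

```python
def lex_less(a, b):
    """a < b in the sense of definition 2.2.5, restricted to the
    positions both sequences share: they must actually differ at
    some index i with a[i] < b[i]."""
    for ak, bk in zip(a, b):
        if ak != bk:
            return ak < bk
    return False  # no differing position within the common length

print(lex_less((1, 5, 8), (1, 7, 2)))  # → True (first difference: 5 < 7)
```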
Let us solve problem 2.2.4. Let P = τv S and let Q = (q1 , q2 , . . . , qn ) be the
sequence containing all the elements of P such that qi < qj ⇐⇒ i < j for each
valid i, j (see figure 2.3).
The idea is to group the points in S by looking at the logarithmic decompo-
sition of Q.
Definition 2.2.6. Let φ : Q → 2S be an injective function that associates each
connected subsequence of Q to a different subset of S. For each subsequence
q of Q, φ(q) = φq = s, where s is the biggest subset of S such that τv s and
q contain the same elements. For every sequence X = (x1 , x2 , . . .), let φX =
(φx1 , φx2 , . . .). The sequence G = (G_{2^0} , G_{2^1} , G_{2^2} , . . . , G_{2^{|δQ|−1}} ), where, for each
valid i, Gi = φδQi , is called the group (logarithmic) decomposition associated to
Q.
Example 2.2.7. Figure 2.4 shows how δQ and G, i.e. the group decomposi-
tion associated to Q, are interrelated. For the sake of clarity, we will prefer
representations such as that in figure 2.5.
Figure 2.3: This figure shows a set S of points in the space U×V and the ordered
projection Q of S onto V. Q is ordered because it is a sequence (q1 , q2 , . . . , qn )
such that qi < qj ⇐⇒ i < j for each valid i, j.
Figure 2.4: This figure shows the same points as figure 2.3, but here a represen-
tation of the logarithmic decomposition δQ and a representation of the group
logarithmic decomposition G are also shown.
Figure 2.5: This figure shows the same points as figure 2.4, but represents the
group logarithmic decomposition in a more practical way.
Definition 2.2.11. If a ∈ A1 × A2 × · · · × An and a = (a1 , a2 , . . . , an ), let
ā = (an , an−1 , . . . , a1 ) ∈ An × An−1 × · · · × A1 .
Proof. (sketch) According to theorem 2.2.15 we can find T in O(log n) after a
precomputation step (independent of T ) that takes O(n log n). Let a and b be
the two points in problem 2.2.4. We know that |T | = O(log n) and that, for
each t ∈ T and p ∈ t, av ≤ pv ≤ bv . Because each sequence in T is sorted, for
each sequence t ∈ T we can find the smallest point p such that pu ≥ au in time
O(log n). Therefore the algorithm returns the first point in time O(log² n) and
each one of the others in O(1). Note that since S is static we can actually let
the client find the other points by itself and say that the problem can indeed be
solved in time O(log² n).
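The query step of this proof can be sketched as follows: given the 2-selection T, one binary search per sequence locates the leftmost point whose u-coordinate is at least au. The representation (each t a list of (u, v) tuples sorted by u) and the function name are our own assumptions:

```python
def first_hits(T, a_u):
    """For each sorted sequence t in the 2-selection T, find the
    leftmost point p with p[0] >= a_u: one O(log n) binary search
    per sequence, O(log^2 n) in total when |T| = O(log n)."""
    hits = []
    for t in T:                     # each t sorted by u-coordinate
        lo, hi = 0, len(t)
        while lo < hi:              # binary search on the u-coordinate
            mid = (lo + hi) // 2
            if t[mid][0] < a_u:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(t):
            hits.append(t[lo])
    return hits

T = [[(2, 1), (5, 1)], [(1, 3), (4, 4), (9, 3)]]
print(first_hits(T, 3))  # → [(5, 1), (4, 4)]
```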
Figure 2.6: Let us assume we are to search for all the points contained in the
smaller rectangle above and that we are only provided with an ordered group
logarithmic decomposition. Let us assume we have just found a. We can find
each one of the other points in the smaller rectangle in time O(1).
Figure 2.7: Let us assume we are to search for all the points contained in the
smaller rectangle above and that we are only provided with an ordered group
logarithmic decomposition. Let us assume we have just found a1 . We can find
each one of the other points in the smaller rectangle in time O(1).
Figure 2.8: Let us assume we are to search for all the points contained in the
smaller rectangle above and that we are only provided with an ordered group
logarithmic decomposition. Let us assume we have just found a. We cannot
find each one of the other points in the smaller rectangle in time O(1); indeed,
the length of the shortest path between a and b, c or d is O(n).
Figure 2.9: This figure shows a proximity graph. Note that arrows of level (or
length) l connect brothers of level l. Note also that b < a.
i := 1
j := 1
while i < s ∧ xi+1 < yj
    i := i + 1
while j < t ∧ yj+1 < xi
    j := j + 1
add arrow max{xi , yj } → min{xi , yj }
while i < s ∧ j < t
    // invariant: max{xi , yj } points to min{xi , yj }.
    if xi+1 < yj+1
        i := i + 1
        add arrow xi → yj
    else
        j := j + 1
        add arrow yj → xi
while i < s
    // invariant: max{xi , yj } points to min{xi , yj }.
    i := i + 1
    add arrow xi → yj
while j < t
    // invariant: max{xi , yj } points to min{xi , yj }.
    j := j + 1
    add arrow yj → xi
Algorithm 2.3.5.1: This algorithm is used in the proof of theorem 2.3.5.
but a > b because we are under the lexicographic order. That is why c and d
point to a rather than to b.
Theorem 2.3.5. Let U and V be two totally ordered sets, and let S be a subset
of U × V. The proximity graph AS of S takes space O(n log n) and can be built
in time O(n log n).
Proof. (sketch) Look again at figure 2.9. AS contains |S| points. Each point in
AS can point to at most one brother per level; therefore, because there are
O(log n) levels, it can point to O(log n) brothers at most. This means that, for
each point p in AS , the set of arrows that start from p has cardinality O(log n),
so we need at most O(log n) pointers per point.
HS takes space O(n log n) and, by theorem 2.2.13, can be built in time
O(n log n). We first build HS and then transform it into AS in the following
way. Let hi be the sequence of level i in HS , and hi,j the j-th sequence in
hi . For each valid level l, all the pointers representing the arrows of level l
in AS can be set by considering every pair (hl,i , hl,i+1 ) such that there exists
h2l,k that has the same elements as hl,i hl,i+1 . Let hl,i = (x1 , x2 , . . . , xs ) and
hl,i+1 = (y1 , y2 , . . . , yt ). We use algorithm 2.3.5.1. Algorithm 2.3.5.1 never
spends more than O(1) time on each pair (xi , yj ) and, because the indices i
and j are only incremented, never considers more than s + t pairs.
Since AS has O(n log n) arrows, AS can be built in time O(n log n). Note
that, in a real implementation, we can, and should, build AS directly, without
first building HS .
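Algorithm 2.3.5.1 is essentially a coordinated merge of the two sorted brother sequences. A direct Python transcription (0-based indices; add_arrow is supplied by the caller; the names are ours) might look like this:

```python
def add_level_arrows(x, y, add_arrow):
    """0-based transcription of algorithm 2.3.5.1: given two sorted
    brother sequences x and y, emit an 'arrow' from the larger to the
    smaller element of each advancing pair, in O(len(x) + len(y))."""
    i, j = 0, 0
    while i < len(x) - 1 and x[i + 1] < y[j]:
        i += 1
    while j < len(y) - 1 and y[j + 1] < x[i]:
        j += 1
    add_arrow(max(x[i], y[j]), min(x[i], y[j]))
    while i < len(x) - 1 and j < len(y) - 1:
        # invariant: max(x[i], y[j]) points to min(x[i], y[j])
        if x[i + 1] < y[j + 1]:
            i += 1
            add_arrow(x[i], y[j])
        else:
            j += 1
            add_arrow(y[j], x[i])
    while i < len(x) - 1:
        i += 1
        add_arrow(x[i], y[j])
    while j < len(y) - 1:
        j += 1
        add_arrow(y[j], x[i])

arrows = []
add_level_arrows([1, 4, 9], [2, 6, 7], lambda a, b: arrows.append((a, b)))
print(arrows)  # → [(2, 1), (4, 2), (6, 4), (7, 4), (9, 7)]
```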
Problem 2.3.6. Let U and V be two totally ordered sets, and S a subset of
U × V. Let a, b ∈ U × V, and P = {(u, v) ∈ S | av ≤ v ≤ bv }. Given a and b,
find a 2-selection T = (t1 , t2 , . . . , tm ) of HS of length at most O(log2 n), such
that the concatenation of T contains the same points as P . For each ti , find
also, if there is such a point, the biggest point p in ti such that pu ≤ bu .
Theorem 2.3.7. There exists an algorithm that solves problem 2.3.6 in time
O(log n) and space O(n log n) with a precomputation step, independent of a and
b, taking time O(n log n).
Proof. (sketch) According to theorem 2.3.5 we can build the proximity graph
AS in time O(n log n). To simplify this discussion, let us assume that AS and
HS are joined to form a single data structure. We do not really need HS ,
but it simplifies the explanation somewhat. We do, however, need the single
sequence in the last level of HS (i.e. the lexicographically ordered sequence
which contains all the points in S). Let us call it Z.
Let hi,j be the j-th sequence in the i-th sequence in HS . For each valid i and
j, let Mi,j be the biggest point in hi,j such that (Mi,j )u ≤ bu . If i > 1 then there
is a k such that Mi,j = Mi−1,k . Let q be the integer such that hi,j contains the
same points as hi−1,k hi−1,q . Since Mi−1,k and Mi−1,q are brothers of level i − 1
and Mi−1,k = Mi,j > Mi−1,q , it is clear that, by definition of AS , Mi−1,k points
to Mi−1,q .
If s is a sequence, let Ms be the biggest point in s such that (Ms )u ≤ bu .
The proof of theorem 2.2.15 (which refers to the proof of theorem 2.1.13) shows
how T can be found in time O(log n) by starting from Z and, for each sequence
s, considering the subsequences L(s) and R(s). Since we can find MZ with a
single O(log n)-time search in Z, and since, for each sequence s, given Ms we
can find ML(s) and MR(s) in time O(1), we can indeed solve problem 2.3.6 in
time O(log n).
Theorem 2.3.8. Problem 2.2.4 can be solved in space O(n log n) and time
O(log n) with a precomputation step (independent of a and b) which takes time
O(n log n).
Proof. (sketch) Let a and b be the two points in problem 2.2.4. According to
theorem 2.3.7, we can solve problem 2.3.6 for the same two points a and b in
time O(log n) and space O(n log n) with a precomputation step, independent
of a and b, taking time O(n log n). Let P = {p1 , p2 , . . . , pk } be the set of the
O(log n) points we are asked to find in problem 2.3.6. All the points in the
rectangle identified by the points a and b can be found by searching for the
biggest points on the left of each one of the points in P . We can do this by
simply following the arrows in AS as we did to find the points pi in the first
instance.
There is only one small obstacle: points with the same v-coordinate do not
point to each other in AS , but we can promptly solve the problem by adding
the required O(n) arrows to AS during its construction.
Note that since S is static we can actually let the client find the other points
by itself and say that the problem can indeed be solved in time O(log n).
Remark 2.3.9. In a real implementation, we should not solve problem 2.2.4
as described in the proof of theorem 2.3.8. Let a and b be the two points in
problem 2.2.4 and let R be the set {(u, v) ∈ U × V | au ≤ u ≤ bu }. We should
Figure 2.10: This figure shows a partial proximity graph. Let us assume that
we want to find all the points in the search-rectangle above. The biggest point
in the search-rectangle is p, therefore our search starts exactly from it. The
important thing to note is that as soon as we reach p2 , p6 and p8 we know that
p and p5 are the only two points in the search-rectangle. There is no need to
proceed any further.
check whether the points Mi,j , defined as in the proof of theorem 2.3.7, are
included in R as we find them. If Mi,j is not in R, there is no need to follow the
arrows that start from it: the pointed points will be clearly out of R as well.
Let us consider the case depicted in figure 2.10. After an O(log n)-time
search we find the point p and then start taking advantage of AS to determine
the other points. In a real implementation, as soon as we see that p2 is not in
R, we stop following that branch and go back to p immediately and follow some
lower-level arrow. Similarly, as soon as we reach p6 and see that it is not in R,
we go back to p5 .
Figure 2.11: This figure shows how a logarithmic decomposition along the third
axis can be used to partition the space in such a way that problem 2.4.2 may
be solved by solving O(log n) instances of problem 2.4.1.
Problem 2.4.2. Let U, V and W be three totally ordered sets. Given a set
S = {s1 , s2 , . . . , sn } ⊂ U × V × W and a, b ∈ U × V × W, find {(u, v, w) ∈ S |
au ≤ u ≤ bu ∧ av ≤ v ≤ bv ∧ aw ≤ w ≤ bw }.
Theorem 2.4.3. Problem 2.4.2 can be solved in space O(n log² n) and time
O(log² n) with a precomputation step (independent of a and b) taking time
O(n log² n).
Proof. (sketch) First of all, note that problem 2.4.1 can be solved by slight
variations of the two methods described in the previous sections. The impor-
tant thing is that the lexicographic ordering be extended to include the third
coordinate: that way, given two points, one is always smaller than the other.
This suggests that if we partition the three-dimensional space along the
third axis by using the logarithmic decomposition, we can solve problem 2.4.2,
by solving at most O(log n) instances of problem 2.4.1.
Now note also that the addition of a third dimension does not substantially
alter the structures and the algorithms used to handle the group decompositions
and the proximity graphs. Once again, the additional dimension is taken care
of by the lexicographic ordering extended to include it. A group decomposition
partitions the space in O(log n) different ways (in groups of length 1, 2, 4, etc.).
Each one of these O(log n) partitioned spaces has to be further partitioned
along the second axis as we do in the two-dimensional case, therefore we need
space and time O(n log² n).
An example should help. Look at figure 2.11. First we partition the box (con-
taining all the points in S) into 8 boxes of height 1: a1 , a2 , . . . , a8 . Now, we de-
compose each one of them along the second axis as we did in the two-dimensional
case. This operation takes time O(|a1 | log n)+O(|a2 | log n)+. . .+O(|a8 | log n) =
O(n log n), where |ai | is the number of points within ai . We then partition the
box into 4 boxes of height 2: b1 , b2 , b3 , b4 . We decompose each one of them in
time O(|b1 | log n) + O(|b2 | log n) + O(|b3 | log n) + O(|b4 | log n) = O(n log n). We
repeat the same procedure for c1 , c2 and d1 . The total time required to build a
three-dimensional version of HS or AS is therefore O(n log² n). Note that if we
merge all these proximity graphs into a single graph, we end up with a graph
whose nodes each have O(log² n) arrows.
Let us say we want to find all the points p such that au ≤ pu ≤ bu ∧ av ≤
pv ≤ bv ∧ aw ≤ pw ≤ bw . Let Z be the single sequence in the last level
of HS . We know that Z contains the same points as S. If s is a sequence,
let Ms be the biggest point in s such that (Ms )u ≤ bu . Let Hw be the group
decomposition of S along the third axis. We know that there is a 2-selection
Tw = (t1 , t2 , . . . , tk ) of Hw such that the concatenation of Tw contains the same
points as the set {(u, v, w) ∈ S | aw ≤ w ≤ bw }. We first find the point MZ ,
then use the “third-axis proximity graph” to find the points Mt1 , Mt2 , . . . , Mtk .
For each Mti we can now use the “second-axis proximity graphs” to solve the k
(i.e. O(log n)) 2.5D sub-problems. We call them “2.5D” because they are 2D
problems whose points are immersed in a three-dimensional space.
Note that since S is static, we can actually find the biggest points, let the
client find the other points by itself, and say that the problem can indeed be
solved in time O(log² n).
Remark 2.4.4. What we said in remark 2.3.9 applies to the multidimensional
case as well. For instance, if we are walking through the “third-axis proximity
graph” and see that the current biggest point p has a u-coordinate smaller
than that of a, we can abandon p immediately without even examining the
“second-axis proximity graphs” it leads to.
Problem 2.4.5. Let U1 , U2 ,. . . , Ud be totally ordered sets and let U = U1 ×
U2 × · · · × Ud , where d ≥ 2. Given a set S = {s1 , s2 , . . . , sn } ⊂ U and a, b ∈ U,
find {(x1 , x2 , . . . , xd ) ∈ S | ∀i ∈ {1, 2, . . . , d} aui ≤ xi ≤ bui }.
Theorem 2.4.6. Problem 2.4.5 can be solved in space O(n logᵈ⁻¹ n) and time
O(logᵈ⁻¹ n) with a precomputation step (independent of a and b) taking time
O(n logᵈ⁻¹ n). Moreover, if the restrictions are imposed only on k ≥ 2 coordi-
nates, the search takes time O(logᵏ⁻¹ n + d − k).
Proof. (sketch) Problem 2.4.5 is an almost straightforward generalization of
problem 2.4.2. We can prove it by induction on d by generalizing the reasoning
of the proof of theorem 2.4.3. To be precise, we should also account for the fact
that a lexicographic comparison takes time O(d) in dimension d. We can rule
out the d-factor by noting that the asymptotic estimates are dominated by the
cost of the O(logᵈ⁻² n) instances of the two-dimensional problem: instead of
keeping the points distinct by extending the lexicographic ordering as proposed
in the proof of theorem 2.4.3, we consider only the first two coordinates (exactly
as if we were to solve problem 2.2.4) and, whenever two points coincide, we put
them in the same multinode. That is no different from what one has to do to
handle repeated keys with structures that do not support them natively.
If the restrictions are imposed only on k ≥ 2 coordinates, we will need
to find only k 2-selections because, in the other d − k cases, we will immediately
choose the group that contains all the points (in the subspace we are in at that
moment).
Chapter 3
Dynamic Indexing
(because we change the problem): if our space is the Cartesian product of metric
spaces (not necessarily bounded), then we can build the proximity graph by
connecting the points with arrows whose length is directly proportional to the
distance (along a single axis) of the points to be connected. Because the spaces
may be unbounded, we should determine a unit length l based on the first
two points we receive and then let the other arrows be of length 2ⁱ l, for some
i ∈ Z.
Another possible solution is to relax the proximity graph as much as
possible and perform a rebalancing step each time we add or delete a point.
There is a little problem here: a group of level 1 can have many points, so
many steps may be necessary to handle it. We can overcome this obstacle
by (conceptually) perturbing the points so that no two points lie on the same
axis-aligned line. Note that this does not alter our asymptotic estimates. That
way, the work needed to rebalance the proximity graph should be proportional
to the number of points added and deleted.
You might have noticed that something is missing here: how do we insert
and delete points? If we cannot perform these operations efficiently, then there
is no point in talking of splitting and joining groups.
Definition 3.1.1. A relaxed proximity graph is balanced if no splittings or
fusions can be performed on it.
Figure 3.1: This figure shows a small portion of a proximity graph into which
a point p has just been inserted. Let us assume that q1 < p < q2 and that all
the arrows in the figure are of the same level. It is clear that, in this case, r4 ,
r5 and r6 must stop pointing to q1 and start pointing to p.
you can see by looking at figure 3.2, the points r1 , r2 , . . . , rn point both to q1
and to the nodes in Tq1 ,l that represent them (or directly to other elements ri
contained in the tree: it depends on the type of tree used). Figure 3.3 shows
what happens when Tq1 ,l is split. Note that, after the splitting, the points in
Tp,l still point to q1 . Because no direct pointers to q1 need to be modified, the
splitting and the rebalancing can be performed in time O(log n).
We have two ways of determining where a point points to:
(i) we can follow its direct link in time O(1) or
(ii) we can reach the root of the tree it belongs to in time O(log n) and then
see to which point the tree is associated in time O(1).
The latter method always gives the correct answer, but is slower, so the former
method is always tried first. Let us say we want to determine which point r3
points to. We first follow the direct link of r3 which, in our example, leads us to
q1 . We then check whether r3 is in Tq1 ,l in time O(1) (note that we just have to
look at the last element of the tree). Because r3 is not in Tq1 ,l , we start moving
toward the root of Tp,l and, for each node s we visit along the path, we check
whether s points directly to the correct node p. If this is the case, we are done,
otherwise we move on to the next parent. After we have determined the node
p one way or the other, we make all the nodes we have visited point directly to
p. This is called path compression and it greatly speeds up successive searches.
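Stripped of the Tp,l machinery, the redirection just described is the classic path-compression idea. A minimal standalone sketch, with a plain parent map standing in for the chains of links (an illustration, not the document's exact structure):

```python
def find(parent, x):
    """Follow parent links to the representative of x, then flatten
    the visited chain so later lookups are nearly O(1)
    (path compression)."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:          # second pass: compress the path
        parent[x], x = root, parent[x]
    return root

parent = {1: 1, 2: 1, 3: 2, 4: 3}
print(find(parent, 4))  # → 1
print(parent[4])        # → 1 (the link is now direct)
```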
Because we must redirect arrows of O(log n) different lengths, we need time
O(log² n).
We note that P is compact if and only if all the direct links are correct. Since
P is balanced, the arrows that start from p are never more than O(log n) and we
can decide to which elements they must point by first searching for the biggest
point less than p in P and then determining the other points by consulting P as
usual. If the proximity graph is compact we can find all those elements in time
O(log n), otherwise we need time O(log² n). Let b1 , b2 , . . . , bk be the O(log n)
elements to which p must point. For each i ∈ {1, 2, . . . , k}, we must insert p in
one of the trees associated to bi , therefore the total time needed is O(log² n).
Remark 3.2.3. Of course, in a real implementation we will associate a tree Tp,l
of level l to a point p only if p is pointed to by a sufficiently large number of
brothers of level l.
Figure 3.2: This figure shows the tree Tq1 ,l of level l associated to the point
q1 . This tree is used to handle all the arrows of level l that point to q1 . In the
figure, the points r1 , r2 , . . . , r7 are the points from which the arrows of level l
start.
Figure 3.3: This figure shows the trees of level l associated to the points q1 and
p. Before the point p was inserted in the proximity graph, the situation was
that shown in figure 3.2. Here the original tree Tq1 ,l has been split into the new
tree Tq1 ,l and the tree Tp,l , so that now the points r3 , r6 and r4 point to p (see
the proof of theorem 3.2.2).