1. (1 point)
Consider the following XML document:
<Vehicles>
  <Car manf="Hyundai">
    <Model>Azera</Model>
    <HorsePower>240</HorsePower>
  </Car>
  <Car manf="Toyota">
    <Model>Camry</Model>
    <HorsePower>240</HorsePower>
  </Car>
  <Truck manf="Toyota">
    <Model>Tundra</Model>
    <HorsePower>240</HorsePower>
  </Truck>
  <Car manf="Hyundai">
    <Model>Elantra</Model>
    <HorsePower>120</HorsePower>
  </Car>
  <Car manf="Toyota">
    <Model>Prius</Model>
    <HorsePower>120</HorsePower>
  </Car>
</Vehicles>
Solution:
(a) <HorsePower>240</HorsePower>,
<HorsePower>120</HorsePower>
(b) <Model>Azera</Model>,
<Model>Camry</Model>,
<Model>Tundra</Model>
(c) manf="Toyota",
manf="Toyota",
manf="Toyota"
(d) The complete document is returned once.
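The XPath expressions of the question are not reproduced in this solution. Purely as an illustration, the following Python sketch uses the standard xml.etree.ElementTree module with three hypothetical selections (not the exam's queries), chosen so that they yield the items listed under (a) to (c). The variable names are introduced here for illustration.

```python
import xml.etree.ElementTree as ET

VEHICLES = """
<Vehicles>
  <Car manf="Hyundai"><Model>Azera</Model><HorsePower>240</HorsePower></Car>
  <Car manf="Toyota"><Model>Camry</Model><HorsePower>240</HorsePower></Car>
  <Truck manf="Toyota"><Model>Tundra</Model><HorsePower>240</HorsePower></Truck>
  <Car manf="Hyundai"><Model>Elantra</Model><HorsePower>120</HorsePower></Car>
  <Car manf="Toyota"><Model>Prius</Model><HorsePower>120</HorsePower></Car>
</Vehicles>
"""

root = ET.fromstring(VEHICLES)

# Hypothetical selection 1: distinct horsepower values, in document order.
distinct_hp = list(dict.fromkeys(v.findtext("HorsePower") for v in root))

# Hypothetical selection 2: models of all vehicles with 240 horsepower.
models_240 = [v.findtext("Model") for v in root
              if v.findtext("HorsePower") == "240"]

# Hypothetical selection 3: the manf attribute of every Toyota vehicle.
toyota_manf = [v.get("manf") for v in root.findall("./*[@manf='Toyota']")]

print(distinct_hp, models_240, toyota_manf)
```

Note that the third selection returns one attribute per matching vehicle, which is why "Toyota" appears three times.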
2. (1 point)
Consider an XML document containing course information for students and
that conforms to the following DTD:
<!DOCTYPE Classes [
<!ELEMENT Classes (Class*)>
<!ELEMENT Class (Topic, Students)>
<!ELEMENT Topic (#PCDATA)>
<!ELEMENT Students (Student+)>
<!ELEMENT Student EMPTY>
<!ATTLIST Student Name CDATA #REQUIRED> ]>
- Query Q1 in XPath:
/Classes/Class
[Students/Student/@Name != Students/Student/@Name]/Topic
- and query Q2 in XQuery:
for $c in /Classes/Class
for $s1 in $c/Students/Student
for $s2 in $c/Students/Student
where $s1/@Name != $s2/@Name
return $c/Topic
Solution: Q1 returns the topic of every class in which there are at least two
students with different names. Q2 selects the same classes, but returns the topic
once for every ordered pair of students with different names. The result of Q2
hence potentially contains many duplicate topics.
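The difference can be made concrete with a small simulation. The sketch below is not an XPath or XQuery engine; it mimics the two queries by hand in Python on a made-up document (the class topics and student names are assumptions for illustration), with q1 and q2 as hypothetical function names.

```python
import xml.etree.ElementTree as ET

# Made-up instance conforming to the DTD above.
DOC = """
<Classes>
  <Class>
    <Topic>Databases</Topic>
    <Students>
      <Student Name="Ann"/><Student Name="Bob"/><Student Name="Cy"/>
    </Students>
  </Class>
  <Class>
    <Topic>Logic</Topic>
    <Students><Student Name="Dee"/></Students>
  </Class>
</Classes>
"""

def q1(root):
    """Mimic Q1: the predicate is existential, so each class counts once."""
    topics = []
    for c in root.findall("Class"):
        names = [s.get("Name") for s in c.findall("Students/Student")]
        if any(a != b for a in names for b in names):
            topics.append(c.findtext("Topic"))
    return topics

def q2(root):
    """Mimic Q2: one result per pair (s1, s2) that passes the where clause."""
    topics = []
    for c in root.findall("Class"):
        students = c.findall("Students/Student")
        for s1 in students:
            for s2 in students:
                if s1.get("Name") != s2.get("Name"):
                    topics.append(c.findtext("Topic"))
    return topics

root = ET.fromstring(DOC)
q1_topics = q1(root)
q2_topics = q2(root)
print(q1_topics)       # one entry for the class with three students
print(len(q2_topics))  # 3 * 2 ordered pairs of distinct names
```

The class with a single student is returned by neither query, since no pair of students with different names exists.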
3. (1 point)
Explain the following rule in the formal semantic of LiXQuery:
Solution: The rule above describes what happens when a text node is added to
the document. Given a store St and an environment En, a text node is added
whose text value is the result of evaluating expression e. Evaluated against XML
store St and environment (context) En, expression e changes the store into St1
and yields the result ⟨s⟩, s being a non-empty string, and text node r in store St2
has the value s. St1 and St2 together form the store St3, in which the document
order of St1 is preserved. Hence, if text{e} is evaluated on St, En, it results in
store St3 together with the text node r, which has the non-empty value s.
4. (2.5 points)
Consider a datacube with dimensions Student, Course, and Semester, and
measure Grade. There is no hierarchy defined over the dimensions. The
cube contains information for 200 students, 6 courses and 6 semesters. The
number of students that followed a particular course in a particular semester
is given in the following table (an empty cell indicates that a particular course
was not taught during that semester):
Semesters
Courses S1 S2 S3 S4 S5 S6 Total
C1 20 35 25 15 95
C2 40 35 45 120
C3 29 30 28 17 104
C4 50 69 46 165
C5 40 35 42 117
C6 12 15 27
Total 110 69 169 47 144 89 628
The following tables summarize how many students followed the different
courses (some students followed a course more than once), and how many
students were subscribed in the different semesters (most students are
enrolled in more than one course per semester):
   C1  C2  C3  C4  C5  C6  Total
#  80 105  90 150 102  27  554

   S1  S2  S3  S4  S5  S6  Total
#  50  30  85  20  72  40  297
Solution:
(a) These queries are just plain SQL queries; so, in their answer no null values
are introduced. There might have been some confusion with the way the null-
value is used in ROLAP. The goal of this question was to determine the sizes
of all views needed in part (b). In case of a wrong answer in part (a), part
(b) was corrected as if the numbers in the answer to (a) were the right ones;
i.e., it was perfectly possible to miss part (a) and get full grades on part (b).
The sizes of the views:
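The sizes can be read off from the tables in the question. The following Python sketch shows one way to derive them, assuming the values used in the benefit computations of part (b): every non-empty cell of the enrolment table is one tuple of (Se,C), and all 200 students occur in the cube. The dictionary names are introduced here for illustration.

```python
# Row values of the enrolment table (empty cells omitted).
enrolments = {
    "C1": [20, 35, 25, 15],
    "C2": [40, 35, 45],
    "C3": [29, 30, 28, 17],
    "C4": [50, 69, 46],
    "C5": [40, 35, 42],
    "C6": [12, 15],
}
# Distinct students per course and per semester (the two summary tables).
students_per_course = {"C1": 80, "C2": 105, "C3": 90, "C4": 150, "C5": 102, "C6": 27}
students_per_semester = {"S1": 50, "S2": 30, "S3": 85, "S4": 20, "S5": 72, "S6": 40}

view_sizes = {
    # One tuple per (student, semester, course) enrolment.
    "(St,Se,C)": sum(sum(row) for row in enrolments.values()),
    # One tuple per distinct (student, semester) combination.
    "(St,Se)": sum(students_per_semester.values()),
    # One tuple per distinct (student, course) combination.
    "(St,C)": sum(students_per_course.values()),
    # One tuple per course actually taught in a semester.
    "(Se,C)": sum(len(row) for row in enrolments.values()),
    # Assumption: all students, semesters and courses occur in the cube.
    "(St)": 200,
    "(Se)": 6,
    "(C)": 6,
    "()": 1,
}

for view, size in view_sizes.items():
    print(view, size)
```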
(b) First we start by visualizing the partial order between the different views:
The annotation : x : y after each view gives the size x of the view and
the current cost y of computing the view, taking into account that only
(St,Se,C) is materialized. An arrow from view V to view W denotes that
view W can be used for the computation of V.
We compute the benefits of materializing the views. Initially we start with
only view (St,Se,C) materialized; this is the base table, which always needs
to be present as no other views can be used to compute it.
The benefits are:
() 1 × (628 − 1)
(St) 2 × (628 − 200)
(Se) 2 × (628 − 6)
(C) 2 × (628 − 6)
(St,Se) 4 × (628 − 297)
(St,C) 4 × (628 − 554)
(Se,C) 4 × (628 − 19)
Of all views, (Se,C) clearly gives the highest benefit. Hence, in
the first step (Se,C) is selected. With the views in S = {(Se, C), (St, Se, C)}
materialized, the benefits become:
() 1 × (19 − 1)
(St) 1 × (628 − 200)
(Se) 2 × (19 − 6)
(C) 2 × (19 − 6)
(St,Se) 2 × (628 − 297)
(St,C) 2 × (628 − 554)
Clearly, (St,Se) gives the largest benefit. The views selected by the
greedy algorithm are hence: (Se,C) and (St,Se), giving a total benefit
of 4 × (628 − 19) + 2 × (628 − 297). It was not necessary to completely
work out the numbers as the relative magnitude of the benefits was obvious.
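The two greedy steps can be checked mechanically. The following Python sketch assumes the view sizes used in the benefit formulas above and that two views are selected besides the base table; the function names cost, benefit and greedy are introduced here for illustration.

```python
# Views are identified by their set of dimensions; sizes as used above.
SIZE = {
    frozenset(): 1,
    frozenset({"St"}): 200,
    frozenset({"Se"}): 6,
    frozenset({"C"}): 6,
    frozenset({"St", "Se"}): 297,
    frozenset({"St", "C"}): 554,
    frozenset({"Se", "C"}): 19,
    frozenset({"St", "Se", "C"}): 628,
}
TOP = frozenset({"St", "Se", "C"})

def cost(view, materialized):
    """Size of the smallest materialized view that `view` can be computed from."""
    return min(SIZE[m] for m in materialized if view <= m)

def benefit(candidate, materialized):
    """Total cost reduction over all views computable from `candidate`."""
    return sum(
        max(0, cost(w, materialized) - SIZE[candidate])
        for w in SIZE
        if w <= candidate
    )

def greedy(k):
    """Select k views in addition to the base table, one per step."""
    materialized = {TOP}
    picks = []
    for _ in range(k):
        candidates = [v for v in SIZE if v not in materialized]
        best = max(candidates, key=lambda v: benefit(v, materialized))
        picks.append((best, benefit(best, materialized)))
        materialized.add(best)
    return picks

picks = greedy(2)
for view, gain in picks:
    print(sorted(view), gain)
print("total benefit:", sum(gain for _, gain in picks))
```

In the second step the benefit of (St,Se) counts only (St,Se) and (St), because (Se) and () are already cheaper to compute from (Se,C).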
5. (2 points)
Consider the following Datalog-program.
Solution:
(a) The program is safe: for every rule, every variable in the head occurs
positively in the body, and every variable in the body occurs positively in a
non-arithmetic literal.
The program is also stratified, as there is no pair of intensional relations
such that the first one negatively depends on the second one and vice versa
(the program does not even contain any recursion at all).
The correctness can be seen as follows: for X we can pick any element in v,
and for Y we can always pick d.
Stratum 3 contains the intensional relation s. s is not in stratum 2 as it
depends negatively on a relation in this stratum, namely r. Using the rule
for s, we get the following instantiation:
s = {}
(c) Another minimal model of this Datalog-program is the following:
6. (2.5 points)
Consider the following database consisting of only one relation:
R1
a b
b c
c d
d e
e f
f g
g h
(c) Show that there cannot exist a relational algebra query Q that returns
the middle element $a_{(l+1)/2}$ of a chain
$R_l = \{(a_1, a_2), (a_2, a_3), \ldots, (a_{l-1}, a_l)\}$
for all $l \geq 3$, $l$ odd. That is, $Q(R_3) = \{(a_2)\}$, $Q(R_5) = \{(a_3)\}$,
$Q(R_7) = \{(a_4)\}$, ...
Solution:
(a) a b c d e f g h
Visually, we can represent the different neighborhoods as follows (notice that
the special dedicated element is filled):
$N_2^D((a))$ ≡ • → ◦ → ◦
$N_2^D((b))$ ≡ ◦ → • → ◦ → ◦
$N_2^D((c))$ ≡ ◦ → ◦ → • → ◦ → ◦
$N_2^D((d))$ ≡ ◦ → ◦ → • → ◦ → ◦
$N_2^D((e))$ ≡ ◦ → ◦ → • → ◦ → ◦
$N_2^D((f))$ ≡ ◦ → ◦ → • → ◦ → ◦
$N_2^D((g))$ ≡ ◦ → ◦ → • → ◦
$N_2^D((h))$ ≡ ◦ → ◦ → •
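This listing can be reproduced mechanically. The following Python sketch assumes that the 2-neighborhood of an element x consists of all elements at distance at most 2 from x in the chain; the function name neighborhood is introduced here for illustration.

```python
def neighborhood(chain, x, d=2):
    """Draw the d-neighborhood of x in a chain, with x as the filled node."""
    i = chain.index(x)
    lo = max(0, i - d)
    hi = min(len(chain) - 1, i + d)
    return " → ".join("•" if j == i else "◦" for j in range(lo, hi + 1))

chain = list("abcdefgh")
shapes = {x: neighborhood(chain, x) for x in chain}

for x, shape in shapes.items():
    print(x, shape)

# Elements whose neighborhood has the same shape as that of c.
same_as_c = [x for x in chain if shapes[x] == shapes["c"]]
print(same_as_c)
```

Elements far enough from both ends of the chain, here c, d, e and f, have isomorphic neighborhoods.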
(c) We give a proof by contradiction. Suppose that there is a relational
algebra query Q that always returns the middle element of a chain with an
odd number of elements.
$N_2^D((a_{r+1}))$ ≡ ◦ → … → ◦ → • → ◦ → … → ◦   (r nodes on either side of •)
$N_2^D((a_{r+2}))$ ≡ ◦ → … → ◦ → • → ◦ → … → ◦   (r nodes on either side of •)
BONUS (1 point) A safe Datalog-program is always domain independent. Is the
opposite direction also true? That is, is a domain independent query always
safe? Prove or give a counter-example.
The crux of this example is that the variables in the head do not occur positively
in a non-arithmetic literal in the body, but the body is constructed in such a way
that this does not matter: it can never be true anyway. As such, the result of the
program does not depend on the actual domains of X and Y. Constructions with
arithmetic operators were also possible. One such solution is:
r(X) :- plus(X,0,0).
The arithmetic operator plus is introduced as if it were an infinite extensional
relation, and the safety condition rather stringently requires that every variable
in the head appears in a non-arithmetic positive literal in the body. The program
is therefore not safe. However, we can reasonably assume that the infinite
extensional relation contains only one X such that (X,0,0) is in it, so the
program is obviously domain-independent.
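The domain independence of this rule can be illustrated by evaluating it over several finite domains. The following Python sketch assumes that plus(A,B,C) holds exactly when A + B = C, and that every domain considered contains the constant 0 that occurs in the program; the function name evaluate_r is introduced here for illustration.

```python
def evaluate_r(domain):
    """Evaluate r(X) :- plus(X,0,0) with X ranging over `domain`."""
    # Assumption: plus(A, B, C) holds exactly when A + B == C.
    return {x for x in domain if x + 0 == 0}

# Domains of increasing size, all containing the constant 0.
results = [evaluate_r(range(-n, n + 1)) for n in (0, 1, 10, 1000)]

for result in results:
    print(result)
```

Whatever the domain, the only X that satisfies the body is 0, so the answer does not change.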