
Anwendersoftware (AS)

as
Anwendungssoftware

Introduction to
Database and Information Systems

Lectures: Daniela Nicklas
nicklas@informatik.uni-stuttgart.de
Room: 2.365
Phone: +49 711 7816 217

Exercises: Nazario Cipriani
cipriani@informatik.uni-stuttgart.de
Room: 2.356
Phone: +49 711 7816 347

IPVS, University of Stuttgart
nicklas@ipvs.uni-stuttgart.de

Chapter 0: Why Database Management?

Motivation

Goals

[Figure: technology landscape around information systems, including modelling of data and processes (ERM, UML, Rational Rose), document technologies (HTML, XML, Document Object Model), workflow management (BPEL), middleware and component ware (DCOM, .Net, OMG standards), transaction processing (TP monitors, CICS, Encina), database access (ODBC, JDBC, SQLJ), and object-relational database technology (SQL3, Oracle, Informix Dynamic Server, Sybase, DB2 V8, IBM DataJoiner, DataLinks, Information Integrator).]

Goals

• Knowledge and skills in:
  ▪ Use of information models and data models, esp.
    - Entity/Relationship model and extensions
    - Relational model and SQL
    - Network model and hierarchical model
  ▪ Modeling of real-world scenarios (universes of discourse)
  ▪ Development of database applications
  ▪ Design, management, and administration of databases

• Become an expert in data management!

Goals

• Requirements for tasks:
  ▪ Development of database-based applications
  ▪ Use of interactive database languages
  ▪ Responsibility for various tasks in database administration, esp.
    - database archiving
    - database (re-)organization
    - database application management

What is a Database Management System?

• Manages very large amounts of data
• Supports efficient access to very large amounts of data
• Supports concurrent access to very large amounts of data
  ▪ Example: a bank and its ATMs
• Supports secure, atomic access to very large amounts of data
  ▪ Contrast two people editing the same UNIX file (the last to write "wins") with two people deducting money from the same account via ATMs at the same time: the new balance is wrong whichever writes last.

[Figure: two ATMs update the same account concurrently, one computing "new amount := amount - $100", the other "new amount := amount - $250".]
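
The ATM problem can be replayed in a few lines. The following is only a sketch in Python with the standard sqlite3 module: the table `account`, the starting balance of 1000, and the function names are assumptions made for illustration; only the two withdrawals of $100 and $250 come from the slide.

```python
import sqlite3

def make_bank(balance):
    """Create an in-memory database holding one account (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (acct_no INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, ?)", (balance,))
    conn.commit()
    return conn

def read_balance(conn):
    return conn.execute("SELECT balance FROM account WHERE acct_no = 1").fetchone()[0]

def lost_update(conn):
    """Both ATMs read the old balance first, then each writes its own result."""
    seen_by_atm1 = read_balance(conn)
    seen_by_atm2 = read_balance(conn)  # becomes stale as soon as ATM 1 writes
    conn.execute("UPDATE account SET balance = ? WHERE acct_no = 1", (seen_by_atm1 - 100,))
    conn.execute("UPDATE account SET balance = ? WHERE acct_no = 1", (seen_by_atm2 - 250,))
    conn.commit()
    return read_balance(conn)

def atomic_updates(conn):
    """Each withdrawal is a single read-modify-write statement run by the DBMS."""
    for amount in (100, 250):
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct_no = 1", (amount,)
            )
    return read_balance(conn)

wrong = lost_update(make_bank(1000))     # the $100 withdrawal is lost
right = atomic_updates(make_bank(1000))  # both withdrawals are applied
print(wrong, right)
```

In the first variant the later write overwrites the earlier one, so one withdrawal disappears; in the second the DBMS performs each deduction as one atomic step.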

Three Aspects of Studying DBMSs

• Modeling and design of databases
  ▪ Allows exploration of issues before committing to an implementation
• Programming: queries and DB operations such as update
  ▪ SQL = "intergalactic dataspeak"
• DBMS implementation


Data Models

• Several different data models have been proposed since the first commercial DBMSs appeared in the late 1960s. The first systems were directly based on the file system.
  ▪ hierarchical model
  ▪ network model (CODASYL)
  ▪ relational model (RDBMS)
  ▪ object-oriented models (OODBMS)
  ▪ object-relational model (ORDBMS)
  ▪ flat files
  ▪ since ~2000: semistructured data and XML

[Figure: percentage shares (0% to 100%) of flat files, hierarchical DBMS, CODASYL, OODBMS, and RDBMS/ORDBMS over the years 1990, 1995, and 2000.]

Data Structures? (1)

• Well-known data structures:
  ▪ Array, sequence
  ▪ Tuple, record
  ▪ List
  ▪ Graph
  ▪ Tree
• Mostly transient data:
  ▪ maintained in main memory
  ▪ survives only a single program execution

Data Structures? (2)

• New aspect: use of external memory (secondary memory, non-volatile memory)
  ▪ Persistence: data survives the end of the program, the end of the session, operating system uptime, ...
  ▪ New kind of access: read and write operations based on units of data blocks or units of tuples
    → data structures and algorithms (search and sort) that are efficient for main memory are not necessarily efficient for secondary memory
  ▪ Value-based access (associative access) to large data sets
  ▪ Voluminous attribute values (e.g. pictures, documents)

Data Models? (1)

• Constructors to generate data structures and associated operations.
  For example, tables (sets of equally structured records or tuples):

CREATE TABLE STUDENT
  (MATRIKELNUMBER INTEGER,
   LASTNAME       VARCHAR(40),
   FIRSTNAME      VARCHAR(40),
   BIRTHDATE      CHAR(8),
   .... )


Data Models? (2)

• Important role of relationships
  Some data models have special relationships between tuples (records): hierarchies, aggregations, ...
• E.g.: students enrol in a course that is given by a lecturer

  student - enrol - course - give - lecturer

• Notion of model
  ▪ Given set of language features to describe a universe of discourse
    → programming languages and operating systems have a data model as well

Databases? (1)

• Efficient management of huge amounts of data
• Data independence (of the applications)
  ▪ Data usage without any knowledge of the technical aspects and implementation (abstract data model: table)
  ▪ Ease of use and powerful operations

Databases? (2)

• Openness to new applications
  ▪ Symmetric structuring
  ▪ Explicit constraints
  ▪ Single and neutral representation
  ▪ System-enforced integrity
• Transactions (ACID property)
  ▪ Atomicity
  ▪ Consistency
  ▪ Isolated execution
  ▪ Durability

[Figure: the applications customer management, transfers, and ATM, plus a not yet known application ("?"), all access the same bank data.]

Databases? (3)

• Fault tolerance
  ▪ Logging of redundant data during normal operations
  ▪ Automatic repair (recovery) of data structures after program failure, system failure, or media failure
    - Undo of unfinished transactions
    - Redo of the effects of completed (committed) transactions
• Multi-user operation
  ▪ Simultaneous (concurrent) access of different users to the same data
  ▪ Synchronization
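
How logged redundant data enables undo and redo can be sketched as follows. This is a strongly simplified illustration, not the recovery algorithm of any particular DBMS: the log record layout, the function `recover`, and the sample values are assumptions, and the sketch presumes that transactions never overwrite each other's uncommitted data.

```python
def recover(disk_state, log):
    """Return the database state after crash recovery.

    Log entries are ("write", txn, key, old_value, new_value) or ("commit", txn).
    """
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    state = dict(disk_state)

    # Redo phase: replay the writes of committed transactions in log order.
    for entry in log:
        if entry[0] == "write" and entry[1] in committed:
            _, _, key, _, new_value = entry
            state[key] = new_value

    # Undo phase: roll back the writes of unfinished transactions, newest first.
    for entry in reversed(log):
        if entry[0] == "write" and entry[1] not in committed:
            _, _, key, old_value, _ = entry
            state[key] = old_value

    return state

log = [
    ("write", "T1", "A", 1000, 900),
    ("write", "T1", "B", 0, 100),
    ("commit", "T1"),
    ("write", "T2", "A", 900, 400),  # T2 never committed
]
# At the crash, T2's write to A had reached the disk, T1's write to B had not.
recovered = recover({"A": 400, "B": 0}, log)
print(recovered)
```

The old values in the log make the undo possible, the new values make the redo possible; that is the redundancy the slide refers to.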


How much information is there?¹ (1)

• Several thousand PBytes² suffice to store all relevant information in the world
• There will be enough disk and tape capacity to store everything written, said, done, and photographed by humankind
  ▪ This is already true today for written information
  ▪ In some years this will be true for the remaining amount of information
• Computers store and manage information more effectively than humans

¹ http://www.lesk.com/mlesk/ksg97/ksg.html
² 1 Gigabyte (GByte) = 1,000 Megabytes = 10^9 Bytes
  1 Terabyte (TByte) = 1,000 Gigabytes
  1 Petabyte (PByte) = 1,000 Terabytes
  1 Exabyte (EByte) = 1,000 Petabytes

How much information is there? (2)

• Consequences for the future
  ▪ In a few years one will be able to retain all information indefinitely, i.e. no information needs to be discarded
  ▪ Computers will do searching, storage, and processing automatically, without human intervention
  ▪ Today, digital libraries focus on input: scanning, compression, and OCR technology
  ▪ Tomorrow, the focus will shift to retrieval: selection, search, and quality assessment

Lecture Overview

• URI: http://www.ipvs.uni-stuttgart.de/abteilungen/as/start/en
  → Courses → Database and Information Systems
• Contents:
  ▪ Chap. 1: Introduction
  ▪ Chap. 2: Realization of Information Systems
  ▪ Chap. 3: Information Models and Data Models (Exercise E.1)
  ▪ Chap. 4: Relational Model
  ▪ Chap. 5: Relational Algebra (Exercise E.2)
  ▪ Chap. 6: SQL (Exercise E.3, E.4)
  ▪ Chap. 7: SQL Programming
  ▪ Chap. 8: Logical Database Design (Exercise E.5)

Course Chapters Mapped to Book Chapters

Chapter in Script                            [1] Date (7th Ed., 2000)   [2] Korth & Silberschatz
Chap 1: Introduction                         Chap 1                     Chap 1
Chap 2: Realization of Information Systems   Chap 2                     Chap 1
Chap 3: Information and Data Models          Chap 13                    Chap 2
Chap 4: Relational Model                     Part II                    Chap 3
Chap 5: Relational Algebra                                              Chap 3
Chap 6: SQL                                                             Chap 4, 6, and 15
Chap 7: SQL Programming                      Part III                   Chap 7
Chap 8: Logical Database Design              Part IV                    Chap 4 to 6

Recent chapters are on the web page.

[1] C. J. Date: An Introduction to Database Systems, 2000
[2] Abraham Silberschatz, Henry F. Korth: Database System Concepts, 2002

Exercises
(Nazario Cipriani / Daniela Nicklas)

• Work can be done in groups of 1-3 students
• Grading of exercises:
  ▪ You explain your solution and get bonuses for the final exam
• Tutoring:
  ▪ We are present in the seminar room or computer lab
  ▪ You can work there and ask questions
• Tutoring/grading dates:
  ▪ 16.5.2007
  ▪ 13.6.2007
  ▪ 20.6.2007
  ▪ 27.6.2007
  ▪ 4.7.2007
  ▪ 18.7.2007


Chapter 1: Introduction

Introduction to Database and Information Systems

• The Concept of Information Systems
  ▪ Computer-based Information System (CIS)
  ▪ Universe of Discourse
  ▪ Requirements for Business Information Systems
• Database System (DBS)
  ▪ Properties
  ▪ Example: Relational Model
• Classes of Database Applications
  ▪ Information Pyramid
  ▪ Transaction Processing
  ▪ Web-based Application Architecture
  ▪ Data Warehousing
• Classes of Databases
  ▪ Non-Standard DBS
  ▪ Object-oriented DBS and Object-relational DBS
  ▪ Information Retrieval Systems

Computer-based information systems

Database systems (DBS): key component of a CIS

[Figure: layers of a CIS, from top to bottom: application, database system, operating system, hardware.]

DBS = DB + DBMS

A database is a collection of stored data that is used by applications.

Database Management Systems as Part of a CIS

• DBMS: A tool for creating and managing large amounts of data efficiently and allowing it to persist over long periods of time, safely. (Garcia-Molina et al., 2002)
• Databases today are essential to every business, in order to:
  ▪ present data to customers
  ▪ present data on the World Wide Web
  ▪ support commercial processes
• Some important capabilities:
  ▪ Persistent storage for large amounts of data
  ▪ Programming interface for users and applications
  ▪ Transaction management

[Figure: a DBS consists of the DBMS and the database.]

Modeling and Design of Databases (1 of 2)

[Figure: an activity changes the miniworld R into R'. Modeling (the mapping A) leads from R to the information model I, realization leads from I to the database model M. A transaction changes M into M'; queries refer to M.]

• R: Universe of discourse (miniworld)
• I: Information model of the miniworld
• M: Database model of the miniworld (schema)
• A: Mapping of all relevant objects and relationships
  → abstraction step
• Transaction:
  ▪ Models the activity (business process) of R within M.
  ▪ Guarantees an atomic transition from M to M'
    → implemented via a sequence of DB operations
  ▪ Database queries refer to M or M'.
• Integrity constraints:
  ▪ Assertion of the best possible match between R and M.
  ▪ Ideally the database is a perfect representation of the miniworld.

Modeling and Design of Databases (2 of 2)

[Figure: a change of information need turns the miniworld R into R'. The information model I changes incrementally into I', and schema evolution turns the database model M into M'; queries refer to M.]

• R: Universe of discourse (miniworld)
• I: Information model of the miniworld
• M: Database model of the miniworld (schema)
• A: Mapping of all relevant objects and relationships
  → abstraction step
• Schema evolution:
  ▪ Change or new definition of types and rules.
  ▪ Not every change from M to M' can be automated by the DBMS itself
    → stored data (objects and relationships) have to obey valid types and rules

Issues in Business Information Systems (1)

• Requirements on a business information system can be distinguished according to the following three levels:
  ▪ Operational level (clerks)
  ▪ Middle management level (middle management)
  ▪ Strategic level (board of directors)

Operational level:
• Enhancement of processes through the use of query systems, report systems, reservation systems, production systems, and all their applications (enterprise resource planning, ERP)
• Characteristics:
  ▪ huge amounts of data
  ▪ high update probability

Issues in Business Information Systems (2)

Middle management level:
• Support and partial automation of business processes:
  ▪ Interactive data analysis
  ▪ Automation of routine decisions
  ▪ Use of mathematical and statistical methods
  ▪ Characteristics:
    - partially unpredictable information need
    - aggregated data
    - no updates

Strategic level:
• Data provision for mostly unpredictable information needs


Examples for Information Systems (1)


Airline Reservations Systems
Data items:
1. Reservations by a single customer on a single flight, including such
information as assigned seat or meal preference.
2. Information about flights - the airports they fly from and to, their
departure and arrival times, or the aircraft flown, for example.
3. Information about ticket prices, requirements, and availability.

Typical queries ask for flights leaving about a certain time from one given city to
another, what seats are available, and at what prices.
Typical data modifications include the booking of a flight for a customer, assigning
a seat, or indicating a meal preference.
Many agents will be accessing parts of the data at any given time.
The DBMS must allow such concurrent accesses, prevent problems such as two
agents assigning the same seat simultaneously, and protect against loss of records
if the system suddenly fails.

Examples for Information Systems (2)

Corporate Records

Many early applications concerned corporate records, such as the record of each sale, information about accounts payable and receivable, or information about employees: their names, addresses, salary, benefit options, tax status, and so on.

Queries include the printing of reports such as accounts receivable or employees' weekly paychecks. Each sale, purchase, bill, receipt, employee hired, fired, or promoted, and so on, results in a modification to the database.


Examples for Information Systems (3)


Banking Systems
• Data items:

• Typical queries:

• Typical data modifications:

• Access requirements:


Examples for Information Systems (3)


Banking Systems

Data items include names and addresses of customers, accounts, loans, and their
balances, and the connection between customers and their accounts and loans
e.g., who has signature authority over which accounts. Queries for account
balances are common, but far more common are modifications representing a
single payment from or deposit to an account.

As with the airline reservation system, we expect that many tellers and customers
(through ATM machines) will be querying and modifying the bank’s data at once.
It is vital that simultaneous accesses to an account not cause the effect of an
ATM transaction to be lost. Failures cannot be tolerated. For example, once the
money has been ejected from an ATM machine, the bank must record the debit,
even if the power immediately fails. On the other hand it is not permissible for the
bank to record the debit and then not deliver the money because the power fails.
The proper way to handle this operation is far from obvious and can be regarded
as one of the significant achievements of DBMS technology.


Examples for Information Systems (4)


What's the fundamental difference between the CIS
of a car manufacturer and a bank?

The role of CIS in banking systems:


“In banking, by contrast, the data actually is the inventory - the two are
synonymous. In increasingly many cases, the DB transaction is the
financial transaction. There are no real, tangible tokens (greenbacks)
moved as a result of the monetary transfer transaction. If the data is bad,
money is lost or created. There is no possibility of counting the money
(bits) in order to verify the status. Fiscal responsibility dictates that
creating or destroying money - even temporarily - is unacceptable.”

(Mike Burman, Bank of America)


Database Systems – First Characterization

• General tasks of a DBS
  ▪ Management of persistent data
  ▪ Efficient access to large amounts of data (GBytes to TBytes)
  ▪ Flexible multi-user mode
  ▪ Join of objects of different types
    (→ type-spanning operations)
• Classical data models
  ▪ relational model
  ▪ network model
  ▪ hierarchical model

Database Systems – First Characterization (cont.)

• Data structures
  ▪ Formatted data structures, fixed record structure
  ▪ Record type, attributes, and attribute values (Si/Aj/AVk) describe objects
  ▪ The description information (metadata) Aj and Si determines the meaning of each attribute value AVk.

Schema (record type; table, relation):
EMPLOYEE(ENR, NAME, FUNCTION, SALARY, AGE)

Instance (state):
ENR   NAME        FUNCTION        SALARY   AGE
496   PEINL       Manager         2100     63
497   KINZINGER   Administrator   2800     25
498   MEYWEG      Researcher      4500     56
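
The split between schema (metadata) and instance (state) can be made visible with a small sketch in Python and the standard sqlite3 module. The column types are assumptions, since the slide gives only the attribute names; `PRAGMA table_info` is SQLite's way of reading the stored schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        enr      INTEGER PRIMARY KEY,
        name     TEXT,
        function TEXT,
        salary   INTEGER,
        age      INTEGER
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (496, "PEINL", "Manager", 2100, 63),
        (497, "KINZINGER", "Administrator", 2800, 25),
        (498, "MEYWEG", "Researcher", 4500, 56),
    ],
)

# Schema (metadata): one row per attribute; column 1 holds the attribute name.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# Instance (state): the stored tuples.
rows = conn.execute("SELECT * FROM employee ORDER BY enr").fetchall()

# The metadata gives each bare attribute value its meaning.
described = [dict(zip(attributes, row)) for row in rows]
print(described[0])
```

Without the attribute names, the value 2100 in the first tuple could be a salary, an age in days, or anything else; the schema decides.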


Database Systems (2)

• Data model / DBS interface
  ▪ Operations to define object types (description of objects)
    → DB schema: which objects should be stored in the DB?
  ▪ Operations to find and update data
    → Application interface: how to create, update, and select DB objects
  ▪ Definition of integrity constraints
    → Quality assurance: which DB states are acceptable?
  ▪ Definition of data control (e.g., access rights)
    → Which user is allowed to invoke which operation on which object with which parameters?
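
The first three parts of this interface (schema definition, data manipulation, integrity constraints) can be tried out with a small sketch in Python and sqlite3. The tables are a cut-down, assumed version of a faculty/student schema, and the faculty number 'F7' is hypothetical. Access rights are left out because SQLite has no user management.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Schema definition: which objects may be stored, and which states are acceptable.
conn.execute("CREATE TABLE faculty (fnbr TEXT PRIMARY KEY, fname TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE student (
        matnr INTEGER PRIMARY KEY,
        sname TEXT NOT NULL,
        fnbr  TEXT NOT NULL REFERENCES faculty(fnbr)
    )
""")

# Data manipulation through the same interface.
conn.execute("INSERT INTO faculty VALUES ('F5', 'COMPUTER SCIENCE')")
conn.execute("INSERT INTO student VALUES (225332, 'MILLER', 'F5')")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert((654711, "TAYLOR", "F5"))
duplicate_key = try_insert((225332, "SMITH", "F5"))    # violates the primary key
unknown_faculty = try_insert((130680, "SMITH", "F7"))  # violates the foreign key
print(accepted, duplicate_key, unknown_faculty)
```

The application never checks the rules itself; the DBMS rejects every update that would lead to an unacceptable state.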


Database Systems (cont.)

• The nature of DB languages
  ▪ Depends on the data model
  ▪ Formal language
  ▪ Navigation-based or descriptive
  ▪ Tuple- or set-oriented
  ▪ Selection power: at least first-order predicate logic
• Search methods
  ▪ Character or value comparison:
    (FUNCTION = 'ADMINISTRATOR') AND (AGE > 60)
  ▪ Exact match query:
    find/select exactly all records with the specified property
  ▪ Search for synonyms, fuzzy search, ...
    → no support
  ▪ Natural language support, recognition of ambiguities
    → no support in formatted DBS (see Information Retrieval Systems)


Example

Schema:
FACULTY(FNBR, FNAME, DEAN)
STUDENT(MATNR, SNAME, FNBR, BEGIN)
EXAMINATION(PNR, MATNR, SUBJECT, DATE, SCORE)
PROFESSOR(PNR, NAME, FNBR)

Instance (state):

FACULTY
FNBR   FNAME              DEAN
F9     BUSINESS ADMIN.    4711
F5     COMPUTER SCIENCE   2223

STUDENT
MATNR    SNAME       FNBR   BEGIN
123766   COY         F9     1.10.95
225332   MILLER      F5     15.4.87
654711   TAYLOR      F5     15.10.94
226302   CANETTI     F9     1.10.95
196481   BERNSTEIN   F5     23.10.95
130680   SMITH       F9     1.4.97

EXAMINATION
PNR    MATNR    SUBJECT   DATE       SCORE
5678   123766   DS        22.10.98   4
4711   123766   OS        16.1.98    3
1234   654711   DB        17.4.97    2
1234   123766   DB        17.4.97    4
6780   654711   DS        19.9.97    1
6780   196481   OS        23.12.97   3

Example (cont.)

(Schema and instance as in the preceding example.)

Q1: Find all students who belong to F5 and started studying before 1995.

SELECT *
FROM STUDENT
WHERE FNBR = 'F5' AND BEGIN < DATE '1995-01-01'

Q2: Find all students who belong to F5 and passed the examination in database and information systems with a score of 2 or better.

SELECT *
FROM STUDENT
WHERE FNBR = 'F5' AND MATNR IN
      (SELECT MATNR
       FROM EXAMINATION
       WHERE SUBJECT = 'DB'
         AND SCORE <= 2)
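
Both queries can be run against the example instance. The sketch below uses Python with the standard sqlite3 module and makes a few assumptions that are not on the slides: the column BEGIN is renamed to `begin_date` (BEGIN is a reserved word in many SQL dialects), DATE becomes `exam_date`, dates are stored as ISO strings so that string comparison matches chronological order, and two-digit years are read as 19xx.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        matnr INTEGER PRIMARY KEY, sname TEXT, fnbr TEXT, begin_date TEXT);
    CREATE TABLE examination (
        pnr INTEGER, matnr INTEGER, subject TEXT, exam_date TEXT, score INTEGER);

    INSERT INTO student VALUES
        (123766, 'COY',       'F9', '1995-10-01'),
        (225332, 'MILLER',    'F5', '1987-04-15'),
        (654711, 'TAYLOR',    'F5', '1994-10-15'),
        (226302, 'CANETTI',   'F9', '1995-10-01'),
        (196481, 'BERNSTEIN', 'F5', '1995-10-23'),
        (130680, 'SMITH',     'F9', '1997-04-01');

    INSERT INTO examination VALUES
        (5678, 123766, 'DS', '1998-10-22', 4),
        (4711, 123766, 'OS', '1998-01-16', 3),
        (1234, 654711, 'DB', '1997-04-17', 2),
        (1234, 123766, 'DB', '1997-04-17', 4),
        (6780, 654711, 'DS', '1997-09-19', 1),
        (6780, 196481, 'OS', '1997-12-23', 3);
""")

# Q1: students of F5 who started studying before 1995.
q1 = [row[0] for row in conn.execute("""
    SELECT sname FROM student
    WHERE fnbr = 'F5' AND begin_date < '1995-01-01'
    ORDER BY matnr
""")]

# Q2: students of F5 who passed the DB examination with a score of 2 or better.
q2 = [row[0] for row in conn.execute("""
    SELECT sname FROM student
    WHERE fnbr = 'F5' AND matnr IN
          (SELECT matnr FROM examination
           WHERE subject = 'DB' AND score <= 2)
    ORDER BY matnr
""")]

print(q1)  # ['MILLER', 'TAYLOR']
print(q2)  # ['TAYLOR']
```

Both queries are descriptive and set-oriented: they state which tuples qualify, not how to find them.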


Classes of Database Applications: Terminology

• OLTP (On-Line Transaction Processing)
• DW (Data Warehouse)
• OLAP (On-Line Analytical Processing): analysis of business data
  ▪ ROLAP (Relational OLAP)
  ▪ MOLAP (Multi-dimensional OLAP)
• DSS (Decision Support System)
• Data mining: search for data patterns inherent in voluminous databases
  "In Data Mining applications, not only does the system define the semantics, it actually defines the queries. The user simply says 'Go', and the system produces what it believes to be useful answers."
• KDD (Knowledge Discovery in Databases), mostly used as a synonym for data mining

Information Pyramid

• A data warehouse comprises a thematic, integrated, time-varying, non-volatile data set
• Separation of operational data from warehouse data

Transactions

• Business transaction
  ▪ An interaction in the real world, usually between an enterprise and a person, where something is exchanged.
• (On-line) transaction
  ▪ The execution of a program that performs some functions of a business transaction by accessing a shared database, usually on behalf of an online user. (P. Bernstein, E. Newcomer: Transaction Processing, 1997)
  ▪ A transaction is a collection of operations on the physical and abstract application state. (J. Gray, A. Reuter: Transaction Processing, 1993)

T1: BOT O11 O12 O13 EOT
T2: BOT O21 O22 O23 EOT

ACID Properties

• Atomicity
  ▪ A transaction's changes to the state are atomic; either all happen or none happen. These changes include database changes, messages, and actions on transducers.
• Consistency
  ▪ A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program.
• Isolation
  ▪ Even though transactions execute concurrently, it appears to each transaction, T, that others either execute before T or after T, but not both.
• Durability
  ▪ Once a transaction completes successfully (commits), its changes to the state survive failures.

(J. Gray, A. Reuter: Transaction Processing, 1993)
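
Atomicity and consistency can be observed directly in a small experiment. The sketch below (Python, standard sqlite3 module) assumes a hypothetical `account` table whose integrity constraint forbids negative balances; the function names and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        acct_no INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- integrity constraint
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 300), (2, 0)])
conn.commit()

def transfer(amount, source, target):
    """Move `amount` from source to target as one transaction."""
    try:
        with conn:  # COMMIT on normal exit, ROLLBACK if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acct_no = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acct_no = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT acct_no, balance FROM account"))

ok_small = transfer(100, source=1, target=2)  # succeeds
too_much = transfer(500, source=1, target=2)  # debit would violate the CHECK
print(ok_small, too_much, balances())
```

In the failing transfer the credit to account 2 has already been executed when the debit is rejected; the rollback removes it again, so either all changes happen or none.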



Transaction Processing (1)

• Three aspects:
  ▪ A transaction executes an activity of our reality or miniworld within a computing system. Such an activity typically represents a non-trivial step (unit of work) in a business activity.
  ▪ An (on-line) transaction is the execution of a program that uses database accesses to fulfill the application functionality.
  ▪ A transaction is a non-interruptible sequence of DB operations that transforms a given logically consistent database state into a new logically consistent database state.

Transaction Processing (2)

• Examples:
  ▪ Money transfer
  ▪ Seat reservation
  ▪ Order processing
  ▪ Processing telephone calls
  ▪ etc.

Transaction Processing (3)

• Transaction program "Debit-Credit":

Read message(acctno, tellerno, branchno, delta) from terminal;
BEGIN TRANSACTION;
  UPDATE ACCOUNT
    SET balance = balance + delta
    WHERE acct_no = acctno AND balance >= delta;
  UPDATE TELLER
    SET balance = balance + delta
    WHERE teller_no = tellerno;
  UPDATE BRANCH
    SET balance = balance + delta
    WHERE branch_no = branchno;
  INSERT INTO HISTORY (timestamp, values);
COMMIT TRANSACTION;
Write message(acctno, balance, ...) to terminal
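
A runnable variant of the debit-credit program might look as follows. It is a sketch only (Python, standard sqlite3 module): the column layout of HISTORY, the sample rows, and the function `debit_credit` are assumptions. The guard is written here as `balance + delta >= 0`, on the further assumption that a withdrawal is passed as a negative delta and must not overdraw the account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (acct_no   INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE teller  (teller_no INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE branch  (branch_no INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE history (ts TEXT, acct_no INTEGER, teller_no INTEGER,
                          branch_no INTEGER, delta INTEGER);
    INSERT INTO account VALUES (1, 500);
    INSERT INTO teller  VALUES (10, 0);
    INSERT INTO branch  VALUES (100, 0);
""")

def debit_credit(acctno, tellerno, branchno, delta):
    """Run one debit-credit transaction; return the new balance, or None if rejected."""
    try:
        with conn:  # BEGIN ... COMMIT, or ROLLBACK if an exception escapes
            cur = conn.execute(
                "UPDATE account SET balance = balance + ? "
                "WHERE acct_no = ? AND balance + ? >= 0",
                (delta, acctno, delta),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown account or insufficient funds")
            conn.execute("UPDATE teller SET balance = balance + ? WHERE teller_no = ?",
                         (delta, tellerno))
            conn.execute("UPDATE branch SET balance = balance + ? WHERE branch_no = ?",
                         (delta, branchno))
            conn.execute("INSERT INTO history VALUES (datetime('now'), ?, ?, ?, ?)",
                         (acctno, tellerno, branchno, delta))
    except ValueError:
        return None
    return conn.execute("SELECT balance FROM account WHERE acct_no = ?",
                        (acctno,)).fetchone()[0]

after_withdrawal = debit_credit(1, 10, 100, -200)
rejected = debit_credit(1, 10, 100, -1000)
history_rows = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(after_withdrawal, rejected, history_rows)
```

A rejected request leaves no trace in any of the four tables, because the whole transaction is rolled back.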


Transaction Systems (1)

• Overview


Transaction Systems: Characteristics

• Dialog-oriented: "parameterized" user
• Few, short transaction types that are repeated very frequently
• Many concurrent users
• Shared, highly up-to-date data
• Short response time, weighted higher than utilization
• Stochastic request arrival
• High system availability


Examples for High Performance Requirements (1)

1. Banking and reservation systems
   • Throughput of some 1000 TPS with a mean response time of less than 1 second.

2. Telephone payment system
   • For each telephone call, a user profile has to be read from the DB and a payment record has to be written to the DB. Sometimes more than 15,000 telephone transactions have to be processed per second; the response time should be less than 0.2 seconds.

3. Management information system
   • Complex queries run on a 500 GB database and at worst need a full scan of the database. The DBMS should support a throughput of 5 TPS with a response time of less than 30 seconds.
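
A rough feeling for these numbers can be obtained with Little's law (mean number of requests in the system = throughput x mean response time). The following Python sketch treats the stated response-time limits as if they were mean response times, which is an assumption, so the results are upper bounds rather than measurements.

```python
def in_flight(throughput_tps, response_time_s):
    """Little's law: mean number of transactions in the system at any moment."""
    return throughput_tps * response_time_s

# (throughput in TPS, response-time limit in seconds) as given on the slide
workloads = {
    "banking/reservation": (1000, 1.0),
    "telephone payment": (15000, 0.2),
    "management information": (5, 30.0),
}
concurrency = {name: in_flight(tps, rt) for name, (tps, rt) in workloads.items()}

# Bandwidth needed if one query scans the full 500 GB within its 30 seconds.
scan_gb_per_s = 500 / 30

for name, n in concurrency.items():
    print(f"{name}: up to about {n:.0f} transactions in flight")
print(f"full scan: about {scan_gb_per_s:.1f} GB/s")
```

The telephone system therefore has to cope with thousands of simultaneously active short transactions, whereas the management information system is dominated by raw scan bandwidth.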


Examples for High Performance Requirements (2)

4. On-line stock trading systems
   • Stock broadcast
     ▪ Broadcast to brokers
     ▪ Use a selection profile
     ▪ Brokers might buffer data locally at their PC or workstation
   • Bidding service for stock trading
   • Automatic processing of 'deals'
     ▪ Searching the DB for acceptable offers
     ▪ Refresh of stock values
     ▪ Broadcast to brokers associated with that deal


Web-based Application Architecture

• Special case of multi-tier architecture
• Based on Internet technology, i.e. TCP/IP, HTTP
• Standard components (web browser/server, DBMS)
• Customization: applets, servlets, database tables
• Suitable for Internet and intranets

[Figure: three tiers: browser, web server, database.]


Data Warehouse System

• Business people (marketing, inventory management, …) need to know:
  ▪ What are the gross sales for an article in each country?
  ▪ What are the gross sales for an article per month?
  ▪ What are the gross sales for an article per quarter?
  ▪ ...
• DW application
  ▪ Derivation of business reference numbers from voluminous DW databases (gigabyte range to terabyte range)
  ▪ Multi-dimensional organization and visualization
  ▪ Business reference numbers support management decisions
  ▪ Pre-aggregation is needed for performance reasons
  ▪ Incremental update of the pre-aggregated data from the operational databases (overnight)
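
The idea of pre-aggregation can be sketched with a tiny example (Python, standard sqlite3 module). The table `sales`, its rows, and the summary table `sales_by_month` are invented for illustration; in a real warehouse the summary would be refreshed incrementally from the operational databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (article TEXT, country TEXT, sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('beer',  'DE', '2007-01-15', 100),
        ('beer',  'DE', '2007-02-03', 150),
        ('beer',  'FR', '2007-02-20',  80),
        ('beer',  'FR', '2007-04-11',  70),
        ('chips', 'DE', '2007-01-30',  40);

    -- Pre-aggregation: one row per article, country, and month.
    CREATE TABLE sales_by_month AS
        SELECT article, country, substr(sale_date, 1, 7) AS month,
               SUM(amount) AS gross
        FROM sales
        GROUP BY article, country, substr(sale_date, 1, 7);
""")

# Gross sales for an article in each country, answered from the summary table.
per_country = dict(conn.execute("""
    SELECT country, SUM(gross) FROM sales_by_month
    WHERE article = 'beer' GROUP BY country
"""))

# Gross sales for an article per quarter, again from the monthly summary.
per_quarter = dict(conn.execute("""
    SELECT (CAST(substr(month, 6, 2) AS INTEGER) + 2) / 3, SUM(gross)
    FROM sales_by_month
    WHERE article = 'beer' GROUP BY 1 ORDER BY 1
"""))

print(per_country, per_quarter)
```

The coarser questions (per country, per quarter) never touch the detailed sales rows; they are derived from the much smaller monthly aggregate.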


Data warehouse: basic idea

[Figure: an enterprise data warehouse serves queries (e.g. SELECT ... FROM supplier WHERE supp_date > 040126 AND supp_name LIKE 'A%'), reports, OLAP, and data mining.]



Data Warehouse System (2): Architecture

[Figure: several sources feed a mapping system, which loads the data warehouse.]

• The retail company runs several database systems that capture all the operational data:
  ▪ one point-of-sale (POS) database per department store
  ▪ one supplier database per country
  ▪ ...
• Broad range of data sources:
  ▪ Database systems (relational, object-relational, hierarchical, …)
  ▪ External information sources (other companies, surveys, …)
  ▪ Files of standard applications (Excel, …)
  ▪ Other documents (Word, WWW, …)


Data Warehouse System (3)

Data mining is the process of discovering hidden, previously unknown and usable information from a large amount of data. The data is analyzed without any expectation on the result. Data mining delivers knowledge that can be used for a better understanding of the data.
[ISO/IEC JTC1/SC32 WG4 SQL/MM Part 6 WD, 2000]

• Data mining techniques:
  ▪ Association rule discovery
    {beer, nappies} → {potato chips}, support = 0.04, confidence = 0.81
  ▪ Classification
  ▪ Regression
  ▪ Clustering

[Figure: example plots for the techniques, with the axes revenue, age, and #children.]
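
What support and confidence of an association rule mean can be shown on a handful of shopping baskets. The sketch below is plain Python; the baskets are made up, and it uses the common definitions (support = fraction of baskets containing all items of the rule, confidence = that support divided by the support of the rule body), which the slide itself does not spell out.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, body, head):
    """Support of body and head together, divided by the support of the body."""
    both = set(body) | set(head)
    return support(transactions, both) / support(transactions, body)

baskets = [
    {"beer", "nappies", "potato chips"},
    {"beer", "nappies", "potato chips"},
    {"beer", "nappies"},
    {"beer", "bread"},
    {"milk", "bread"},
]

# Rule: {beer, nappies} -> {potato chips}
rule_support = support(baskets, {"beer", "nappies", "potato chips"})
rule_confidence = confidence(baskets, {"beer", "nappies"}, {"potato chips"})
print(rule_support, rule_confidence)
```

Here two of five baskets contain all three items (support 0.4), and two of the three baskets with beer and nappies also contain potato chips (confidence about 0.67).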


Variations in Object Representation

Business Applications
• Each object is represented by exactly one tuple instance that
contains all descriptive attributes


Variations in Object Representation (cont.)

CAD Application
• A complex object "product (under design)" consists of (or is assembled from) simpler, possibly again complex, objects, each of which may have a different type.


Example: Urban Information Systems


Classification of DBMS Market (1 of 2)

[Figure: 2x2 matrix of DATA (simple, complex) against QUERIES (simple, complex). Simple data, simple queries: file systems (sequential processing, office applications). Simple data, complex queries: relational DBS (OLAP, data warehouse). Complex data, simple queries: object-oriented DBS (CAD). Complex data, complex queries: object-relational DBS (CAD, SE, multimedia).]

• Simple data, simple queries
  ▪ In the future there might be a query capability, e.g. SQL, available.
• Simple data, complex queries
  ▪ RDBS: scalable, robust, access based on content and structure.
  ▪ Limited support for complex objects and BLOB data.
  ▪ No indexing, manipulation, and value-based searching of BLOBs in RDBS.


Classification of DBMS Market (2 of 2)

[Figure: the same 2x2 matrix of DATA (simple, complex) against QUERIES (simple, complex) with file systems, relational DBS, object-oriented DBS, and object-relational DBS.]

• Complex data, simple queries
  ▪ OODBS provide persistent complex objects, which are manipulated by C++, Smalltalk, ...
  ▪ Limited scalability w.r.t.:
    - huge amounts of data
    - huge numbers of users
• Complex data, complex queries
  ▪ ORDBS are capable of managing complex data as objects.
  ▪ User-defined functions are available to manipulate the data within the server.
  ▪ Data types and functions are extensible.


Example of an Object-Relational Data Model (1)

• Integrated content search and management
  ▪ SQL supports the management of both conventional and non-conventional data
  ▪ A query might refer to all kinds of data at once
  ▪ User-defined data types and functions can be employed
• Query example


Information Retrieval Systems (IRS) (1)

• Tasks / properties
  ▪ Management of documents, books, abstracts, etc.
  ▪ Efficient search in voluminous datasets
  ▪ Typically only retrieval in multi-user environments
• Data structures
  ▪ Unformatted (perhaps semi-formatted) text data
  ▪ Ambiguity: synonym and homonym problem
  ▪ Usage of a so-called thesaurus (complex organized dictionary): consists of relationships among (technical) terms in the dictionary used for document indexing


Information Retrieval Systems (IRS) (2)

• Interface to IRS
ƒ No formal data model
ƒ Retrieval language only
Æ close to natural language
• Search
ƒ Queries are mostly fuzzy
ƒ Nearest neighbor, best match, pattern matching, ...
ƒ Result assessment via precision and recall
• Usages
ƒ Libraries
ƒ Literature inquiries (perhaps via Internet)
ƒ Information services (chemical sciences, etc.)
ƒ Patent information systems
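
Result assessment via precision and recall can be illustrated with a minimal Python sketch. The function name `precision_recall` and the document identifiers are assumptions made for illustration only: precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for the result of one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids: 4 documents retrieved, 3 documents relevant
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d2", "d4", "d7"])
print(p, r)
```

Two of the four retrieved documents are relevant, and two of the three relevant documents were found, which is what the two ratios express.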

The Big Picture


Summary

• Basic Components of Computer-based Information Systems


ƒ Database systems
ƒ Transaction Processing system (TP-Monitor, DC-System)
• DBS Technology
ƒ Management of persistent data
ƒ Efficient access
ƒ Flexible multi-user mode
ƒ Concepts:
- Data Model and DB Language
- Transaction Processing
• All critical execution steps in a CIS are processed by
transactions that exhibit the ACID properties


Chapter 2
Realization of Information Systems

Chapter 2: Realization of Database Systems Anwendersoftware (AS)

2. Realization of Information Systems


• Shortcomings (Disadvantages, Limitations) of File Systems

• DBS Requirements
ƒ Operational Data Control
ƒ System Enforced Integrity
ƒ Ease of Use
ƒ High Degree of Data Independence
ƒ Efficiency

• DBS Architecture
ƒ Historical Evolution
ƒ Five-layer Model
ƒ Three-Schema Architecture (ANSI/SPARC Architecture)
ƒ Dynamic System Behavior
ƒ Application Programming Interfaces


The Use of Files to Realize Applications

• Simple Operations on File Systems
ƒ Management: create/drop, open/close
ƒ Access: read, write
• Different kinds of File Organizations
ƒ Structured files
- Directly accessed files (entry-sequenced, relative)
- Associatively accessed files (key-sequenced, hash)
ƒ Unstructured files (byte stream)
• Concurrency Control?
ƒ Need for communication on updates

[Figure: P1, P2 and T1 access file 1, file 2 and file 3, which contain redundant data]


Short exercise: ACID in file systems

• How would you implement the ACID properties using a file system?
ƒ A

ƒ C

ƒ I

ƒ D


Shortcomings of File Systems (1)

• Redundant data and inconsistency
ƒ data integrity problems
ƒ different files with different formats
ƒ duplicated data in different files → inconsistency
• Difficulty in accessing data
ƒ hard to meet requirements that are not anticipated
ƒ file systems do not offer convenient data retrieval
• Integrity problems
ƒ no support for checking and ensuring consistency constraints
• Atomicity problems
ƒ no support for ensuring atomicity of changes
• Concurrent-access problems
ƒ read-write, write-write conflicts
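
To illustrate the atomicity problem, the following Python sketch shows what every application has to hand-roll when it works directly on files. The helper name `atomic_write` and the file name are assumptions; the technique (write a temporary file, then rename it over the original) only covers the all-or-nothing replacement of a single file, not isolation between concurrent writers or changes that span several files.

```python
import os
import tempfile


def atomic_write(path, text):
    """Replace the file at `path` completely or not at all."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())    # force the data to stable storage
        os.replace(tmp_path, path)    # rename within one file system
    except BaseException:
        os.unlink(tmp_path)           # leave no half-written file behind
        raise


# Hypothetical usage: the old content stays intact until the rename succeeds
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "account.txt")
atomic_write(demo_path, "balance=100\n")
atomic_write(demo_path, "balance=80\n")
```

A DBS offers this guarantee (and more) through transactions, so the code above does not have to be repeated in every application program.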

Shortcomings of File Systems (2)

• Security problems
ƒ not everybody should be able to access everything
• Lack of Isolation between Data Structure and Program
Structure
• Need to Solve Same Tasks in All Application Programs
ƒ Storage management
ƒ Data management
ƒ Update service (change service)
ƒ Retrieval
ƒ Security and access control
• Assumption:
ƒ Everything remains stable!
ƒ Nothing can go wrong!

Data Independence

[Figure: stages of growing data independence between application program and data]
1. Beginning of data processing: application program and file structure
2. Data management system as part of operating systems: application program and file structure
3. Network DBS, which provides a procedural DML: application program and database
4. The "ideal" DBS, which provides application-oriented interfaces: application program and database


Historical Evolution of DBMS (1)


Historical Evolution of DBMS (2)


Historical Evolution of DBMS (3)


Database System Requirements (1)


1. Operational Data Control
ƒ Sharing of data
ƒ Eliminating redundancies
ƒ Enforcing of standards for data storage and management
⇒ responsibility of database administrator (DBA)

2. System Enforced Integrity


ƒ Restricting unauthorized access (data protection, data control)
ƒ Enforcing logical integrity constraints
(= ensure that the data reflects the true state of affairs)
ƒ Enforcing physical integrity constraints
(= protect against the loss of data from media failure)
ƒ Need for running a controlled multi-user mode
3. Ease of Use
ƒ Simple data models
ƒ Provision of languages, which are easy to learn
ƒ Logical view to the data
ƒ Various classes of users each showing different capabilities


Database System Requirements (2)


4. High Degree of Data Independence
ƒ Data dependent applications are highly undesirable
- Conventional application programs (AP) accessing files rely on knowledge of the
data organization as well as its access properties
→ maximize isolation between APs and data
ƒ Multiple Kinds of Data Independence
- Device independence
- Independence of page mappings
- Independence of storage structures
- Access path independence
- Data structure independence


Data Abstraction

• View level
ƒ parts of database
ƒ simpler structures
ƒ application dependent
• Logical level
ƒ what data are stored in the database
ƒ relationships between those data
ƒ used by database administrator
• Physical level
ƒ how the data is actually stored

[Figure: view level (view 1, view 2, view 3) on top of the logical level, on top of the physical level]


A sample Application and its logical data structure

EMPLOYEE ( ENUM  ENAME    ADDRESS       DNUM )
           406   COY      DARMSTADT     K55
           123   TAYLOR   DARMSTADT     K51
           829   SCHMITH  FRANKFURT     K55
           574   CANETTI  NEU-ISENBURG  K51

DEPARTMENT ( DNUM  DNAME     DLOCATION )
             K51   PLANNING  DARMSTADT
             K55   SALES     FRANKFURT


...its logical access paths


EMPLOYEE ( ENUM  ENAME    ADDRESS       DNUM )
           406   COY      DARMSTADT     K55
           123   TAYLOR   DARMSTADT     K51
           829   SCHMITH  FRANKFURT     K55
           574   CANETTI  NEU-ISENBURG  K51

Index entries on ADDRESS: DARMSTADT, FRANKFURT, NEU-ISENBURG

DEPARTMENT ( DNUM  DNAME     DLOCATION )
             K51   PLANNING  DARMSTADT
             K55   SALES     FRANKFURT

Logical access paths:
1. OWNER – MEMBER
2. Sort order ENUM ASC
3. Search (inverting ADDRESS)
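
The three logical access paths can be sketched in Python on the sample data. The variable names `members_of`, `by_enum` and `address_index` are assumptions chosen for this illustration; a real DBS maintains such structures internally.

```python
from collections import defaultdict

EMPLOYEE = [  # (ENUM, ENAME, ADDRESS, DNUM)
    (406, "COY", "DARMSTADT", "K55"),
    (123, "TAYLOR", "DARMSTADT", "K51"),
    (829, "SCHMITH", "FRANKFURT", "K55"),
    (574, "CANETTI", "NEU-ISENBURG", "K51"),
]
DEPARTMENT = [("K51", "PLANNING", "DARMSTADT"), ("K55", "SALES", "FRANKFURT")]

# 1. OWNER - MEMBER: each department (owner) leads to its employees (members).
members_of = defaultdict(list)
for enum, _, _, dnum in EMPLOYEE:
    members_of[dnum].append(enum)

# 2. Sort order ENUM ASC.
by_enum = sorted(EMPLOYEE, key=lambda row: row[0])

# 3. Search by inverting ADDRESS: address value -> employee numbers.
address_index = defaultdict(list)
for enum, _, address, _ in EMPLOYEE:
    address_index[address].append(enum)

for dnum, dname, _ in DEPARTMENT:
    print(dname, members_of[dnum])
print([row[0] for row in by_enum])
print(address_index["DARMSTADT"])
```

None of these structures changes the logical content of the two tables; they only add ways to reach the records.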


...its storage structures


EMPLOYEE ( ENUM    ENAME      ADDRESS          DNUM )
Format:    f3 num  v. char    v. char          f3 char

ISN 5561:  406     3 COY      9 DARMSTADT      K55
           123     6 TAYLOR   9 DARMSTADT      K51
           829     6 SCHMITH  9 FRANKFURT      K55
           574     4 CANETTI  12 NEU-ISENBURG  K51

CHAIN, POINTER-ARRAY

DEPARTMENT ( DNUM     DNAME       DLOCATION )
Format:      f3 char  v. char     v. char
             K51      8 PLANNING  9 DARMSTADT
             K55      8 SALES     9 FRANKFURT

Storage structures:
1. Formats
2. Datatypes
3. Implementation methods

...its Addressing Scheme and Properties of Devices


EMPLOYEE ( ENUM    ENAME      ADDRESS          DNUM )
MD1, C6, S3
Format:    f3 num  v. char    v. char          f3 char

ISN 5561:  406     3 COY      9 DARMSTADT      K55
           123     6 TAYLOR   9 DARMSTADT      K51
           829     6 SCHMITH  9 FRANKFURT      K55
           574     4 CANETTI  12 NEU-ISENBURG  K51

MD2, C127, S1
CHAIN, POINTER-ARRAY

DEPARTMENT ( DNUM     DNAME       DLOCATION )
Format:      f3 char  v. char     v. char
             K51      7 PLANNING  9 DARMSTADT
             K55      8 SALES     9 FRANKFURT

MD2, C17, S13

Memory mapping structures:
1. Physical size/capacity of blocks
2. Spanned record facility
Properties of devices:
1. Properties of storage media (disk, tape, …)
2. Mapping to magnetic storage

Database System Requirements (3)


5. Efficiency of Applications
• Solutions for Conflicting Requirements
ƒ DBA is responsible for global optimization
ƒ Disadvantages are possible for single applications

• Efficient Access
ƒ A DBS problem, not an application problem
ƒ Query optimization
ƒ DBA determines access paths (ideally done automatically by the DBMS)

• Performance Trade-off
ƒ Tight coupling ⇒ fast access
ƒ But low isolation and stability of programs
⇒ high maintenance costs on update
ƒ Therefore: less tight coupling of programs to data
⇒ slower access, if not optimized the right way!


DBMS Structure

• Historical Evolution of DBS

ORDBS


A Layered System Model (1)

• Goal: a data independent DBS architecture

• How many layers are needed and useful?
ƒ There is no common theory that defines how to build large software systems.

• Recommended concepts:
ƒ Information hiding
ƒ Hierarchical structuring (abstraction hierarchy)


A Layered System Model (2)

• Principle of a Layered Abstraction

• “Uses”-Relation:
A uses B, if A calls B and the correct and successful execution of B is necessary
to execute A completely.


A Layered System Model (3)

• Advantages of a hierarchically structured system that corresponds to the “Uses”-Relation
ƒ The usage of lower-level system components simplifies the
implementation of higher-level system components.
ƒ Lower-level system components are independent of changes of
higher-level system components.
ƒ The functionality of a system component is independent of
higher-level system components
ƒ Testing of lower-level system components separately from the
higher ones is possible


A Layered System Model (4)

• Every hierarchical level can be seen as an abstract or virtual machine
ƒ Programs of a level are implemented by using operations of
lower levels (i.e., the abstract machine of layer i+1 is
implemented by using the abstract machine of layer i)

• Abstraction Hierarchy
ƒ Hide some properties of an abstract machine from higher-layer
machines
ƒ The implementation of higher-level operations extends the
functionality of an abstract machine

• Lower-layer system components are as usable as hardware
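
The idea that the abstract machine of layer i+1 is implemented only with the operations of layer i can be sketched in Python. The class and method names below (`BlockStore`, `BufferManager`, `RecordManager`, `flush_page`, the modulo placement rule) are assumptions for illustration and not the interfaces of an actual DBMS.

```python
class BlockStore:
    """Lowest layer of this sketch: numbered blocks."""

    def __init__(self):
        self._blocks = {}

    def read_block(self, j):
        return dict(self._blocks.get(j, {}))

    def write_block(self, j, content):
        self._blocks[j] = dict(content)


class BufferManager:
    """Next layer: pages, implemented only with BlockStore operations."""

    def __init__(self, store):
        self._store = store
        self._pages = {}

    def get_page(self, i):
        if i not in self._pages:          # not buffered: ask the layer below
            self._pages[i] = self._store.read_block(i)
        return self._pages[i]

    def flush_page(self, i):
        self._store.write_block(i, self._pages[i])


class RecordManager:
    """Top layer: records addressed by key; pages and blocks stay hidden."""

    PAGES = 4

    def __init__(self, buffer):
        self._buffer = buffer

    def _page_no(self, key):
        return key % self.PAGES           # hypothetical placement rule

    def store_record(self, key, record):
        page_no = self._page_no(key)
        self._buffer.get_page(page_no)[key] = record
        self._buffer.flush_page(page_no)

    def fetch_record(self, key):
        return self._buffer.get_page(self._page_no(key)).get(key)


store = BlockStore()
records = RecordManager(BufferManager(store))
records.store_record(406, ("COY", "DARMSTADT", "K55"))
print(records.fetch_record(406))
```

The caller of `RecordManager` never sees page or block numbers, which is the information hiding the layered model aims at; each lower class can also be tested without the ones above it.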



Static Model of a Database System


The Mapping Hierarchy of a DBMS


Component Views of a Database


Each interface is characterized by its objects and operators, by the components of a typical database system behind it (mapping hierarchy and transactional issues), and by the design level it corresponds to.

• Set-oriented interface
ƒ Objects: tables, views, tuples
ƒ Operators: languages such as SQL or XQuery
ƒ Mapping hierarchy: compilation, access path optimization
ƒ Transactional issues: integrity control, access control
ƒ Design level: logical data structure

• Tuple-oriented interface
ƒ Objects: external records, index and set structures
ƒ Operators: FIND NEXT <name record>, STORE <name record>
ƒ Mapping hierarchy: data dictionary, currency concept, sorting component
ƒ Transactional issues: transaction management
ƒ Design level: logical access paths

• Internal record interface
ƒ Objects: int. records, trees, hash structures, address chains
ƒ Operators: store record, insert value/item in B*Tree
ƒ Mapping hierarchy: record manager, access path manager
ƒ Transactional issues: lock manager, log/recovery component
ƒ Design level: storage (and data) structure

Component Views of a Database (cont)


Each interface is characterized by its objects and operators, by the component of a typical database system behind it, and by the design level it corresponds to.

• Database buffer interface
ƒ Objects: segments, pages
ƒ Operators: get page i, free page i
ƒ System component: buffer manager (+ mapping concepts for updates)
ƒ Design level: page mapping structures

• File system interface
ƒ Objects: files, blocks
ƒ Operators: read block j, write block j
ƒ System component: external storage manager
ƒ Design level: memory mapping structures

• Device interface
ƒ Objects: cylinders, sectors, tracks
ƒ Operators: channel programs
ƒ System component: external storage media
ƒ Design level: hardware


Three-Schema Architecture

• ANSI/SPARC Architecture


Three-Schema Architecture (2)

• Conceptual Schema (logical system view):
ƒ Describes the structure of the whole database for a community of users
ƒ Data Definition Language (DDL)

• External Schema (user’s view):
ƒ Describes the part of the database that a particular user group is interested in

• Internal Schema (physical system view):
ƒ Determines the physical storage structure of the database (e.g., formats of physical records, access paths)
ƒ Storage Structure Language (SSL)
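
As a rough illustration of the three schema levels, the following Python sketch uses the `sqlite3` module from the standard library. It is an analogy under assumptions, not the ANSI/SPARC languages themselves: a table stands for the conceptual schema, a view for an external schema, and an index for one decision of the internal schema. The names `EMP_VIEW` and `EMP_DNR_IDX` and the sample row are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Conceptual schema: structure of the whole database (DDL).
con.execute("""CREATE TABLE EMP (
    ENO INTEGER PRIMARY KEY, NAME TEXT, "FUNCTION" TEXT,
    ADR TEXT, DNR TEXT, SALARY INTEGER)""")

# Internal schema: a storage-level decision, here an access path on DNR.
con.execute("CREATE INDEX EMP_DNR_IDX ON EMP (DNR)")

# External schema: the part of the database one user group is interested in.
con.execute("""CREATE VIEW EMP_VIEW AS
    SELECT ENO, "FUNCTION", SALARY, DNR FROM EMP""")

con.execute("INSERT INTO EMP VALUES (12345, 'Smith', 'PhD', 'KL', 'K55', 50000)")
row = con.execute("SELECT * FROM EMP_VIEW WHERE ENO = 12345").fetchone()
print(row)

# Changing the internal schema does not affect programs that use the view.
con.execute("DROP INDEX EMP_DNR_IDX")
row_after = con.execute("SELECT * FROM EMP_VIEW WHERE ENO = 12345").fetchone()
print(row_after == row)
```

The query against the view returns the same result before and after the index is dropped, which is the kind of independence the three-schema architecture is meant to provide.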

A Simplified Example of a Data Definition in the Three-Schema Architecture

• Creating Views Via the External Schema
ƒ Adaptation of data types to the host language (DBS is “multi-lingual”)
ƒ Access protection: isolation of attributes, tables, ...
ƒ Reduction of complexity: only the data of interest is visible to the application program


ANSI/SPARC-Model
• Description Levels for Database Applications


Layered Model – Runtime Aspects


Execution of a Query – Dynamic Behaviour (1)


Execution of a Query – Dynamic Behaviour (2)

Sample Schema:
Conceptual Schema: EMP (ENO, NAME, FUNCTION, ADR, DNR, SALARY)
                   I5... CHAR (50)...
External Schema:   EMP' (ENO, FUNCTION, SALARY, DNR)
                   PIC 9(5), PIC A(25),...

Internal Execution Steps:

1.) SELECT * FROM EMP'
    WHERE ENO = '12345'

2.) Complete the information using the conceptual and internal schema;
    locate page# (e.g., with hashing): P789

3.) Database Buffer Access: successful (goto 7) or


Execution of a Query – Dynamic Behaviour (3)

4.) Database access using buffer manager / operating system
5.) Execute I/O request
6.) Store page in database buffer
    P789: 12345 Smith PhD KL K55 50000 ...
7.) Transfer to UWA: 12345 PhD 50000 K55
8.) Status information: return code, cursor information
9.) Manipulation with programming language statements
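
The execution steps can be traced with a small Python simulation. Everything here is a simplified assumption for illustration: `DISK`, `buffer_pool`, `locate_page` and `select_emp_view` are hypothetical names, and the page lookup is hard-wired instead of hashed.

```python
DISK = {"P789": {12345: ("Smith", "PhD", "KL", "K55", 50000)}}  # page -> records
buffer_pool = {}    # database buffer: page number -> page
io_requests = []    # log of simulated I/O requests


def locate_page(eno):
    """Step 2: hypothetical mapping; the slides only say 'e.g., with hashing'."""
    return "P789"


def fetch_page(page_no):
    """Steps 3 to 6: try the buffer first, read from disk on a miss."""
    if page_no not in buffer_pool:
        io_requests.append(page_no)             # step 5: execute I/O request
        buffer_pool[page_no] = DISK[page_no]    # step 6: store page in buffer
    return buffer_pool[page_no]


def select_emp_view(eno):
    """Steps 1 and 7: evaluate the query, project onto the external schema."""
    record = fetch_page(locate_page(eno)).get(eno)
    if record is None:
        return None
    name, function, adr, dnr, salary = record
    return (eno, function, salary, dnr)


first = select_emp_view(12345)    # buffer miss: one I/O request
second = select_emp_view(12345)   # buffer hit: no further I/O
print(first, io_requests)
```

The second call finds page P789 in the buffer, which corresponds to the "successful (goto 7)" branch of step 3.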


Comparison of the Layered Models


Interfaces, from top to bottom: set-oriented database interface, tuple-oriented database interface, internal record interface, database buffer interface, file system interface, device interface.

• Relational database system architecture (top interface: set-oriented database interface)
ƒ Access path independent data model
ƒ Access path-oriented data model
ƒ Record manager, access path manager
ƒ Database buffer manager
ƒ External storage manager

• Architecture of hierarchical and network database systems (top interface: tuple-oriented database interface)
ƒ Access-path-based data model
ƒ Record manager, access path manager
ƒ Database buffer manager
ƒ External storage manager

• Architecture of database systems that provide an access method-based programming interface (top interface: internal record interface)
ƒ Record manager, access path manager
ƒ Database buffer manager
ƒ External storage manager

Comparison of Database Languages


Requirements           set oriented          navigation-based                    access methods-based
Language level         high                  medium                              low
Number of DB requests  1                     1 (or multiple per visited record)  1 (or multiple per visited record)
Records found          set of records        1 record                            1 record
Joining data           in database language  partly in DB language,              in host language
                                             partly in host language
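
The difference in the number of DB requests and in where the join happens can be sketched with Python's `sqlite3` module. This is only an emulation under assumptions: the record-at-a-time loop imitates a navigation-based interface using single-record SQL lookups, since SQLite itself is set oriented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DEPARTMENT (DNUM TEXT PRIMARY KEY, DNAME TEXT, DLOCATION TEXT);
CREATE TABLE EMPLOYEE (ENUM INTEGER PRIMARY KEY, ENAME TEXT, ADDRESS TEXT, DNUM TEXT);
INSERT INTO DEPARTMENT VALUES ('K51', 'PLANNING', 'DARMSTADT'),
                              ('K55', 'SALES', 'FRANKFURT');
INSERT INTO EMPLOYEE VALUES (406, 'COY', 'DARMSTADT', 'K55'),
                            (123, 'TAYLOR', 'DARMSTADT', 'K51'),
                            (829, 'SCHMITH', 'FRANKFURT', 'K55'),
                            (574, 'CANETTI', 'NEU-ISENBURG', 'K51');
""")

# Set oriented: one request, the join is done in the database language.
set_result = con.execute("""
    SELECT E.ENAME, D.DNAME
    FROM EMPLOYEE E JOIN DEPARTMENT D ON E.DNUM = D.DNUM
    ORDER BY E.ENUM""").fetchall()

# Navigation style: one request per visited record, join in the host language.
nav_result = []
requests = 1  # the initial scan of EMPLOYEE
for ename, dnum in con.execute(
        "SELECT ENAME, DNUM FROM EMPLOYEE ORDER BY ENUM").fetchall():
    dname = con.execute(
        "SELECT DNAME FROM DEPARTMENT WHERE DNUM = ?", (dnum,)).fetchone()[0]
    requests += 1
    nav_result.append((ename, dname))

print(set_result == nav_result, requests)
```

Both variants produce the same pairs, but the second one issues one additional request per employee record and leaves the matching logic to the application program.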

Summary

• DBS Characteristics
ƒ Centralized control of operational data (role of DBA)
ƒ Centralized control of data integrity and multi-user capabilities
ƒ Support of adequate interfaces (data model and database
language)
ƒ High degree of data independence
ƒ Efficiency!!!

• Three-schema Architecture
ƒ External schema
ƒ Conceptual schema
ƒ Internal schema


Summary

• Description Model of a DBMS


ƒ Static layered model
ƒ Dynamic behavior during the processing of a DB query

• Programming Interfaces (database languages)
ƒ Set-oriented database interface → relational DBMS
ƒ Tuple-oriented database interface → hierarchical/network DBMS
ƒ Access-path-oriented programming interface → “DMS”


Chapter 3: Information Models and Data Models

Information Modeling Anwendersoftware (AS)

3. Information Models and Data Models

• Introduction
• Entity-Relationship Model (ERM)
ƒ Concepts, Definitions, Notations
ƒ Relationship Set, Entity Set
ƒ Diagrammatic Representation, Graphical Notation
ƒ Examples
• Extended ERM Features, Enhanced-ER Model Concepts
ƒ Mapping Constraints
ƒ Abstraction Concepts
- Generalization/Specialization
- Aggregation
- Association
- Integrated View


Introduction to Information Models and Data Models

• GOAL: Design of Data Models / Database Design
ƒ model of an application-oriented part of the real world
(mini world, universe of discourse)
ƒ the information is stored as data
ƒ specification of functional and processing requirements
ƒ design and implement the application programs as database
transactions to realize the application processes

[Figure: real world R, information I and data M; an activity leads from R to R', a transaction from M to M'; further labels: modeling, realization, query, A]


Introduction to Information Models and Data Models (cont.)
• Additional Requirements:
ƒ Accurate mapping/model, formal non-ambiguous specification of the
information
ƒ Up-to-date information
ƒ Simple, understandable, natural representation, easy-to-understand
structuring of the information, etc.
• Intermediate Goal:
ƒ Collect information during system analysis (information needs!)
ƒ Information model (generally: system model)
• Components / Elements / Constituents:
ƒ Objects: entities
ƒ Relations: relationships


Introduction to Information Models and Data Models (cont.)
• Stepwise Derivation: (different views)
1. Information in mind
2. Information structure: form of information organization
3. Access path independent data structure (what)
4. Access path dependent data structure (what and how)

• A database stores information about some part of the real world ("mini
world" / Universe of Discourse (UoD)):


Information Models

• Information Model
⇒ some kind of formal language to describe information (representation of elements + rules)
• Provide information about objects and relationships only if they are
ƒ Distinguishable and identifiable
ƒ Relevant
ƒ Selectively describable


Database Design Process

[Figure: phases of the database design process along a time axis]
1. Information collection (analysis of meaning): interview, noun analysis, brainstorming, document analysis, ...
2. Semantical data modeling (rough modeling): ERM, NIAM, EXPRESS-G, IDEF1X, STEP, UML, XML, ...
3. Logical data modeling (precise modeling): hierarchical, network-like, relational, object-oriented, ...
4. Database installation: DB2, Informix, ORACLE, Postgres, mySQL, ...

Further labels: conceptual schema
DBMS independent: conceptual DB schema design
DBMS dependent: logical schema design, physical schema design


Database Design Languages

• ERM (Entity-Relationship Model):
ƒ Generally applicable modeling approach
• STEP (STandard for the Exchange of Product model data):
ƒ Modeling, accessing and exchanging data that defines products during
their whole product life cycle
• UML (Unified Modeling Language):
ƒ Diagrammatic notation and language to support object-oriented
software development
• XML (Extensible Markup Language):
ƒ Data (and document) modeling language, e.g. XML Schema


ERM Overview (1)

• Concepts
ƒ Entity sets
ƒ Relationship sets
ƒ Attributes
ƒ Value sets (domains)
ƒ Primary keys
Reference: Chen, P. P.-S.: The Entity-Relationship Model: Toward a Unified View of Data,
in: ACM TODS 1:1, March 1976, pp. 9-36.
• Classification of Relationship Types
ƒ User defined relationships
ƒ Mapping type / mapping cardinalities
- 1 : 1 (one-to-one)
- n : 1 (many-to-one)
- n : m (many-to-many)


ERM Overview (2)

• GOAL:
ƒ Determination of semantic aspects
ƒ Explicit definition of structural integrity constraints

• Information (e.g., in a database) can be modeled as:


ƒ Collection of entities
ƒ Relationships among entities


Concept 1: Entities and Entity Sets (set of objects)

• Entity: "A thing that has real or individual existence in reality or in mind" (Webster)

• An entity is an object that exists and is distinguishable from other objects.
Example: specific person, company, event, plant

• An entity set is a set of entities of the same type that share the same properties.
Example: set of all persons, companies, trees, holidays

• A predicate decides on the membership: ei ∈ Ej ⇔ is-Ej (ei)

• DB contains a finite number of entity sets
ƒ E1, E2, ..., En ; not necessarily disjoint
ƒ Example: E1 ... persons, E2 ... customers: E2 ⊆ E1


Concept 2: Relationships and Relationship Sets (1)

• A relationship is an association among several entities
ƒ Example: Hayes (customer entity) is associated with A-102 (account entity) via depositor (relationship set)

• A relationship set R is a mathematical relation among n >= 2 entities (typically n=2 or n=3), each taken from an entity set:

R ⊆ {(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En} = E1 × E2 × ... × En

where each (e1, e2, ..., en) ∈ R is a relationship

ƒ Example: (Hayes, A-102) ∈ depositor
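
The set-theoretic definition maps directly onto Python sets. In this sketch the entities Jones and A-215 are hypothetical additions; only Hayes, A-102 and depositor come from the example.

```python
from itertools import product

customer = {"Hayes", "Jones"}    # entity set E1 (Jones is hypothetical)
account = {"A-102", "A-215"}     # entity set E2 (A-215 is hypothetical)

cartesian = set(product(customer, account))   # E1 × E2: all possible pairs
depositor = {("Hayes", "A-102")}              # relationship set R

print(depositor <= cartesian)             # R is a subset of E1 × E2
print(("Hayes", "A-102") in depositor)    # one relationship is an element of R
```

The relationship set is a subset of the Cartesian product, whereas a single relationship such as (Hayes, A-102) is an element of it.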


Concept 2: Relationship Sets (2)

• The entity sets that participate in a relationship set Ri do not have to be disjoint

MARRIAGE

• The role name (rn) signifies the role that a participating entity from the entity set
(entity type) plays in each relationship instance (order!)

rn1/e1, rn2/e2

• Degree of a Relationship Set


ƒ Refers to number of entity sets that participate in a relationship set.
ƒ Relationship sets that involve two entity sets are binary (or degree two).
Generally, most relationship sets in a database system are binary.
ƒ Relationship sets may involve more than two entity sets. The entity sets
customer, loan, and branch may be linked by the ternary (degree three)
relationship set CLB.


Concept 3, 4: Attributes, Value Sets (1)


• An entity is represented by a set of attributes, that is, descriptive properties
possessed by all members of an entity set
Example:
ƒ customer = (customer-name, social-security, customer-street, customer-city)
ƒ account = (account-number, balance)

• Mathematically: an attribute A of entity type E whose value set is W can be
defined as a function from E to the (power) set of W or the Cartesian product of
the (power) sets Wi:
ƒ A : E → W respectively W1 × W2 × ... × Wk
or (... of relationship type R ...)
ƒ A : R → W respectively W1 × W2 × ... × Wm

• Domain (value set):
ƒ the set of permitted values
ƒ Value sets Wi need not be different
ƒ Relationship sets can have attributes, too
- WORKING_TIME_PORTION
- LAST_DEPOSIT


Concepts 3, 4: Attributes, Value Sets (2)

• Attribute Types
ƒ Simple and composite attributes
Example: CNBR, ADDRESS
ƒ Single-valued and multi-valued attributes
Example: COLOR_OF_CAR
ƒ Null attributes: the value of an attribute is not applicable or it is unknown
- i.e., it exists but is missing, or it is not known whether the attribute value exists
- e.g., private phone number: A(e) = {}
ƒ Derived attributes: in some cases two or more attribute values are related
Example: AGE and BIRTHDATE


Information about Entities in an Entity Set


Information about Relationships in a Relationship Set


Concept 5: Primary Key (entity key) (1)

• Information about the entities is expressed only by means of attribute values

• The values of the primary key attribute(s) can be used to identify each
entity uniquely
ƒ (1:1) - one-to-one relationship
ƒ Sometimes it has to be created artificially (serial number)

• A super key of an entity set is a set of one or more attributes whose
values uniquely determine each entity

• A candidate key of an entity set is a minimal super key
ƒ Social-security is candidate key of customer
ƒ Account number is candidate key of account


Concept 5: Primary Key (entity key) (2)

• Let A = {A1, A2, ..., Am} be a set of attributes for the entity set E

K ⊆ A is called a candidate key of E
⇔ K is minimal and, for all ei, ej ∈ E: ei ≠ ej → K(ei) ≠ K(ej)

• Although several candidate keys may exist, one of the candidate keys is
selected to be the primary key

• The combination of primary keys of the participating entity sets forms a
candidate key of a relationship set
ƒ Must consider the mapping cardinality and the semantics of the
relationship set when selecting the primary key
ƒ (social-security, account-number) is the primary key of depositor
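
The definition of super key and candidate key can be checked mechanically on a given set of entities. The Python sketch below uses hypothetical customer data and function names (`is_super_key`, `candidate_keys`). Note the limitation: a key is a property of the schema and must hold for all possible instances, so a check against one sample can only refute a key, never prove it.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, attributes):
    """All minimal super keys, found by trying attribute sets of growing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(k) <= set(attrs) for k in keys):
                continue    # contains a smaller key, hence not minimal
            if is_super_key(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical customer entities
customers = [
    {"social_security": "111", "name": "Hayes", "city": "Harrison"},
    {"social_security": "222", "name": "Jones", "city": "Harrison"},
    {"social_security": "333", "name": "Hayes", "city": "Rye"},
]
found = candidate_keys(customers, ["social_security", "name", "city"])
print(found)
```

On this sample, (name, city) also happens to be unique, which shows why the designer, and not the data alone, has to decide which candidate key becomes the primary key.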


Mappings using Primary Keys (1)

One-to-one mapping between an entity set Ei and the value set of a primary key attribute

E1 (employee) W1 (employee number)

F1 (ENR)
e1 0007
e2 4711
e3 0042


Mappings using Primary Keys (2)

Substitute the value set of the primary key for the entity set Ei


Representing the Miniworld by Values in (Regular) Relations / Tables


Representing the Miniworld by Values in (Weak) Entity Relations

• An entity set that does not have a primary key is referred to as a weak
entity set.
• The existence of a weak entity set depends on the existence of a strong
entity set; it must relate to the strong set via a one-to-many relationship
set.
• The discriminator (or partial key) of a weak entity set is the set of
attributes that distinguishes among all the entities of a weak entity set.
• The primary key of a weak entity set is formed by the primary key of the
strong entity set on which the weak entity set is existence dependent,
plus the weak entity set’s discriminator.
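
How the primary key of a weak entity set is composed can be sketched with Python's `sqlite3` module. This is one possible relational rendering under assumptions: the table and column names follow the EMPLOYEE/CHILDREN example of this chapter, the inserted rows are hypothetical, and `ON DELETE CASCADE` is used here to express the existence dependency.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only if enabled
con.executescript("""
CREATE TABLE EMPLOYEE (ENR INTEGER PRIMARY KEY, LAST_NAME TEXT);
CREATE TABLE CHILDREN (
    ENR  INTEGER NOT NULL REFERENCES EMPLOYEE (ENR) ON DELETE CASCADE,
    NAME TEXT NOT NULL,          -- discriminator (partial key)
    AGE  INTEGER,
    PRIMARY KEY (ENR, NAME)      -- key of the strong entity set plus discriminator
);
INSERT INTO EMPLOYEE VALUES (7, 'Coy'), (42, 'Taylor');
INSERT INTO CHILDREN VALUES (7, 'Anna', 5), (42, 'Anna', 9);
""")

# The discriminator alone is not unique: two children are called Anna.
names = con.execute("SELECT NAME FROM CHILDREN ORDER BY ENR").fetchall()

# Existence dependency: deleting the employee removes the dependent child.
con.execute("DELETE FROM EMPLOYEE WHERE ENR = 7")
remaining = con.execute("SELECT ENR, NAME FROM CHILDREN").fetchall()
print(names, remaining)
```

Only the combination of the employee number and the child's name identifies a child, which is exactly the composed primary key described above.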


Design Issues

• Use of entity sets vs. attributes
ƒ Choice mainly depends on the structure of the enterprise being
modeled, and on the semantics associated with the attribute in
question
• Use of entity sets vs. relationship sets
ƒ Each relationship could also be modeled as an entity; there is no general criterion!
ƒ Possible guideline is to designate a relationship set to describe an
action that occurs between entities
• Binary versus n-ary relationship sets
ƒ Although it is possible to replace a nonbinary (n-ary, for n>2)
relationship set by a number of distinct binary relationship sets, an
n-ary relationship set shows more clearly that several entities participate
in a single relationship.


Mapping Cardinalities

• Express the number of entities to which another entity can be associated via a
relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following
types:
ƒ One-to-one
ƒ One-to-many
ƒ Many-to-one
ƒ Many-to-many
• We distinguish among these types by drawing either a directed line (―>),
signifying “one”, or an undirected line (―), signifying “many”, between the
relationship set and the entity set (optional!).

[Figure: example relationship sets involving Child and Mother, and Student and Course]


ERM Schema – Example


DECLARE VALUE-SETS   REPRESENTATION   ALLOWABLE-VALUES
  EMPLOYEE-NBR       INTEGER(5)       (1,10000)
  FIRST NAME         CHARACTER(15)    ALL
  LAST NAME          CHARACTER(25)    ALL
  OCCUPATION         CHARACTER(25)    ALL
  PROJECT-NBR        INTEGER(3)       (1,5000)
  NO_OF_YEARS        INTEGER(3)       (0,100)
  LOCATION           CHARACTER(15)    ALL
  PERCENTAGE         FIXED(5.2)       (0,100.00)
  NO_OF_MONTHS       INTEGER(3)       (0,100)

DECLARE REGULAR ENTITY RELATION EMPLOYEE
  ATTRIBUTES/VALUE SET:
    ENR/EMPLOYEE-NBR
    NAME/(FIRST NAME, LAST NAME)
    STAGE-NAME/(FIRST NAME, LAST NAME)
    PROFESSION/OCCUPATION
    AGE/NO_OF_YEARS
  PRIMARY KEY:
    ENR


ERM Schema – Example (cont.)


DECLARE REGULAR ENTITY RELATION PROJECT
  ATTRIBUTES/VALUE SET:
    PRO-NBR/PROJECT-NBR
    PRO-LOCATION/LOCATION
  PRIMARY KEY:
    PRO-NBR

DECLARE RELATIONSHIP RELATION WORKING_ON_PROJECT
  ROLE/ENTITY RELATION.PK/MAX-NO-OF-ENTITIES
    STAFF/EMPLOYEE.PK/n
    PROJECT/PROJECT.PK/m
  ATTRIBUTE/VALUE SET:
    WORKING_TIME_PORTION/PERCENTAGE
    PERIOD/NO_OF_MONTHS


ERM Schema – Example (cont.)


DECLARE RELATIONSHIP RELATION EMPLOYEE-RELATIVES
  ROLE/ENTITY RELATION.PK/MAX-NO-OF-ENTITIES
    LIABLE_TO_MAINTAIN/EMPLOYEE.PK/1
    CHILD/CHILDREN.PK/n
  EXISTENCE OF CHILD DEPENDS ON
  EXISTENCE OF LIABLE_TO_MAINTAIN

DECLARE WEAK ENTITY RELATION CHILDREN
  ATTRIBUTES/VALUE SET:
    NAME/FIRST NAME
    AGE/NO_OF_YEARS
  PRIMARY KEY:
    NAME
    EMPLOYEE.PK THROUGH EMPLOYEE-RELATIVES


ERM Diagrammatic Notation


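[Figure: graphical symbols of the ERM notation]
entity set: Name
weak entity set: Name
relationship set: Name
attribute: Name (single-valued), Name (multi-valued)
mapping cardinality, one-to-many: E1 is connected to R with label 1, R is connected to E2 with label N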


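Examples of ER Diagrams

[Figure: four Entity-Relationship diagrams, each showing entity sets connected by relationship sets]
ƒ emp and project, connected by works_on (N : M; roles: staff, project)
ƒ person, connected to itself by marriage (1 : 1; roles: wife, husband)
ƒ emp and project, connected by two relationship sets: works_on (M : N) and project manager (1 : N)
ƒ emp and children, connected by relatives (M : N)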

Nested Attributes

[Figure: entity set Book (Buch) with the attributes ISBN, Author, Title, Publ., Year and price, plus the nested attributes Name and City]


Ternary Relationship Sets

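[Figure: the ternary relationship set “supply” connects supplier, project and part (labels N, M)]

The ternary relationship “supply”

Attention: this is not the same as three binary relationship sets!
(connection trap)

[Figure: supplier, project and part connected pairwise by three many-to-many binary relationship sets (labels: supply, supply, get)]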
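
The connection trap can be demonstrated with a few hypothetical tuples in Python: the ternary relationship set is split into three binary relationship sets by projection, and recombining them yields a tuple that was never part of the original.

```python
# Ternary relationship set: supply ⊆ supplier × part × project (hypothetical data)
supply = {
    ("s1", "bolt", "p1"),
    ("s1", "nut", "p2"),
    ("s2", "bolt", "p2"),
}

# Three binary relationship sets obtained by projection
supplier_part = {(s, pt) for s, pt, pj in supply}
part_project = {(pt, pj) for s, pt, pj in supply}
supplier_project = {(s, pj) for s, pt, pj in supply}

# Recombining the binary relationship sets
rejoined = {
    (s, pt, pj)
    for s, pt in supplier_part
    for pt2, pj in part_project
    if pt2 == pt and (s, pj) in supplier_project
}

spurious = rejoined - supply
print(sorted(spurious))
```

Supplier s1 supplies bolts, bolts are used in project p2, and s1 supplies something to p2, yet s1 never supplied bolts to p2. The three binary sets cannot express this, which is why the ternary relationship set is needed.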


ERM Limitations

• Up to now:
ƒ Mapping cardinalities are specified only coarsely
- one-to-one, one-to-many, many-to-many
ƒ Only the type-level is represented
- no instances/data
ƒ Inadequate modeling of overlapping entity sets
- e.g. Car and Taxi
ƒ Only relationships between entity sets
- not between entities
ƒ Only user-defined relationships
- no standardization of often used relationships

ERM Requirements

[Figure: TAXI is_a CAR; individual entities are connected to the entity sets CAR and TAXI by instance-of (io) relationships]

• Cardinality constraints refine relationship mappings


• Entity instances (objects) of an entity set should be represented explicitly in the
model
• The diagrammatic notations for entity sets and entity instances are of the same
kind
• Establishment of system-controlled relationships (abstraction concepts)

ERM Extended Features

• Up to now: only coarse-structured relationship sets
(e.g., one-to-one means “at most one to at most one”)

• Cardinality restrictions refine the relationship type semantics:

Let R ⊆ E1 × E2 × ... × En.

The cardinality restriction card(R, Ei) = [min, max] defines that each entity from the
entity set Ei participates in at least min and at most max relationship instances of
relationship set R

(0 ≤ min ≤ max, max ≥ 1)

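As a rough illustration of the definition, the following sketch counts how often each entity takes part in a relationship set and reports entities outside [min, max]. The entity names, the relationship instances and the chosen restrictions are assumptions made for this example only.

```python
from collections import Counter

def violations(relationship, position, entities, min_card, max_card):
    """Entities of Ei whose number of relationship instances lies outside [min, max]."""
    counts = Counter(instance[position] for instance in relationship)
    return {e for e in entities if not (min_card <= counts[e] <= max_card)}

# Hypothetical data for works_on ⊆ PERS × PROJECT
pers = {"anna", "ben", "carl"}
projects = {"db", "web"}
works_on = {("anna", "db"), ("anna", "web"), ("ben", "db")}

# Assumed restrictions: card(works_on, PERS) = [1, 2], card(works_on, PROJECT) = [1, 3]
pers_violations = violations(works_on, 0, pers, 1, 2)
project_violations = violations(works_on, 1, projects, 1, 3)
print(pers_violations, project_violations)
```

Here carl takes part in no works_on instance and therefore violates min = 1, while both projects stay within their bounds.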
ERM Extended Features (cont.)

• Graphical representation

E1 ---[min1, max1]--- R ---[min2, max2]--- E2

cardinality restrictions:
e1 participates in [min1, max1] relationship instances of relationship type R
e2 participates in [min2, max2] relationship instances of relationship type R

• Examples
R                    E1        E2         card(R, E1)   card(R, E2)
head_of_department   DEPARTM   EMPLOYEE
marriage             WOMAN     MAN
parents              COUPLE    CHILD
membership           PARTY     PERS
lecture_attending    LECTURE   STUDENT
works_on             PERS      PROJECT


Abstraction Concepts (1)

ƒ Newer applications of database technology have more complex
requirements than the more traditional ones (e.g.,
engineering/CAD/CAM, image/graphics, cartographic/geological and
multimedia databases)

• GOAL:
ƒ Modeling of additional semantics of the miniworld with the (enhanced)
ERM

ƒ Need for development of additional semantic modeling concepts that
are able to represent our natural perception, understanding, and view
of the miniworld (universe of discourse) concerned as accurately and
explicitly as possible.


Abstraction Concepts (2)

• TASK:
Identification of essential constructs that are applied by a human to
describe his/her universe of discourse.
⇒ use of abstraction to organize the information
⇒ abstraction permits someone to suppress specific details of particular objects,
emphasizing those pertinent to the actual view

• Abstractions are in general expressed as relationships between objects,
having as their purpose the organization of these objects in some desired
form.
(e.g., instance-of, subclass-of, element-of, subset-of, part-element-of,
subcomponent-of)

Abstraction Concepts (3)

• Consequences
ƒ Adequate and accurate modeling: one-to-one mapping between miniworld
and object model
ƒ All relevant things in our miniworld are objects of the ERM (including objects
that describe other objects)
⇒ entity sets (classes) and entities (instances) are independent objects in
the model
⇒ operations cannot distinguish between classes and instances

• Object Properties
Objects are either
ƒ simple (i.e., defined completely by themselves) or
ƒ composite (i.e., described as an “abstraction” of other objects)

Abstraction Concepts (4)

• Two Abstraction Types
ƒ From simple to composite objects (one-level relationship)
ƒ From composite to (more complex) composite objects (n-level
relationship)
• Abstraction Concepts Are Mainly Used
ƒ to organize the information and thereby also
ƒ to limit the search space during the retrieval as well as
ƒ for system controlled reasoning
• Overview
ƒ Classification – Instantiation
ƒ Generalization – Specialization
ƒ Element Association – Set Association
ƒ Element Aggregation – Component Aggregation

Classification

• Classification corresponds to the creation of entity sets:
grouping objects (entities, instances) with common properties into new
composite objects (entity type, class, class object)
⇒ grouping of instances

• The composite object is defined as a set of simpler objects with
common properties in each case

• Constructing an 'instance-of' relationship (‘io’) as a one-level relationship


Instantiation
• Instantiation is the inverse operation of classification
• Used to obtain instances/objects that conform to the constraints associated with
the properties specified by the class
ƒ Same structure (attributes)
ƒ Same operations
ƒ Same integrity constraints

• Classification and instantiation are the primary concepts for creating and
structuring objects
• Graphical representation

The representation of the other abstraction concepts is analogous to this.


Generalization (1)

• Task:
The generalization concept supplements the classification concept.
By generalization a more general class that absorbs the common aspects of the
base classes and that suppresses their differences is defined.

• Usage
ƒ A bottom-up design process – combine a number of entity sets that
share the same features into a higher-level entity set
ƒ It builds the 'subclass-of'-relationship (‘sc’- or 'is-a'-relationship)
ƒ It is recursively applicable (n-level relationship) and organizes the
classes in a generalization hierarchy
ƒ A superclass is a complex composite object that is built by a collection
of less complex composite objects (subclasses)

Generalization (2)
• Structural Properties of the Generalization
ƒ Each instance of a subclass is an instance of the superclass, too
ƒ At the same time, an object can be an instance of different classes as well as a
subclass of multiple superclasses
(→ networks, many-to-many!)
ƒ The affiliation/membership of an object to a class/superclass is determined
mainly by the structure, the operations, and the integrity constraints of the
class/superclass


Modeling with Generalization (Example)


Specialization (1)

• Task:
ƒ Specialization and generalization are simple inversions of each other;
they are represented in an E-R diagram in the same way.
ƒ It supports the 'top-down' design method:
initially, the more general objects are described (superclasses), then
the more specific ones (subclasses)
- Designate subgroupings within an entity set that are distinctive from
other entities in the set. These subgroupings become lower-level entity
sets that have attributes or participate in relationships that do not apply
to the higher-level entity set.

Specialization (2)
• System Controlled Reasoning
makes use of the inheritance concept:
ƒ Superclass properties are inherited by all subclasses, because they are
valid for them, too
ƒ ADVANTAGES:
- No repetition of descriptive information
- Shortened/condensed description
- Avoiding failures/mistakes
• Kinds of Inheritance
ƒ Inheritance of attributes, constants and default values.
ƒ Inheritance of methods, predicates and operations
ƒ Problems: multiple inheritance

• Reasoning Via Inheritance Rules:
HasAttribute (C1, A) ← Isa (C1, C2), HasAttribute (C2, A)
HasValue (C1, A, V) ← Isa (C1, C2), HasValue (C2, A, V)
P (..., C1, ...) ← Isa (C1, C2), P (..., C2, ...)

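The first inheritance rule can be sketched as a recursive lookup. The class hierarchy and the attribute assignments below are assumptions loosely modeled on a university example; the second function hints at why multiple inheritance is listed as a problem.

```python
# Hypothetical Isa facts: class -> direct superclasses
ISA = {
    "research_assistant": ["staff", "student"],
    "staff": ["member_of_university"],
    "student": ["member_of_university"],
}

# Attributes declared directly on a class (assumed for this sketch)
DECLARED = {
    "member_of_university": {"name", "birthday"},
    "staff": {"faculty"},
    "student": {"MatNR", "faculty"},
    "research_assistant": {"hours_per_week"},
}

def has_attribute(cls, attr):
    """HasAttribute(C1, A) <- Isa(C1, C2), HasAttribute(C2, A), applied recursively."""
    if attr in DECLARED.get(cls, set()):
        return True
    return any(has_attribute(sup, attr) for sup in ISA.get(cls, []))

def attribute_sources(cls, attr):
    """Classes that declare attr and are reachable from cls; more than one means a conflict."""
    sources = {cls} if attr in DECLARED.get(cls, set()) else set()
    for sup in ISA.get(cls, []):
        sources |= attribute_sources(sup, attr)
    return sources

print(has_attribute("research_assistant", "birthday"))
print(attribute_sources("research_assistant", "faculty"))
```

In this sketch "faculty" reaches research_assistant from two superclasses, so a user would have to resolve the conflict, for example by renaming.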
Inheritance

• Each subclass inherits all attributes from the superclass
• Multiple inheritance can cause conflicts
• They have to be resolved explicitly by the user
ƒ e.g., renaming:
research_assistant_in_faculty → faculty of employee
join_in_faculty → faculty of student

[Figure: generalization hierarchy with the classes member of university (attributes name, birthday), staff (faculty), student (MatNR, faculty), official employee, and research assistant (hours-per-week, department), connected by is-a relationships; Daisy, Garfield and Ernie are attached as instances via io]

Specialization: Definitions
• Subclass:
class S, whose entities are a subset of a superclass G: S ⊆ G
(i.e., each element (instance) of S is an element of G, too)

• Specialization:
Z = {S1, S2, ... Sn}, a set of subclasses Si with the same superclass G

• Completeness constraint: specifies whether or not an entity in the
higher-level entity set must belong to at least one of the lower-level
entity sets within a generalization.
Z is total if G = ∪ Si (i = 1..n), else partial.

• Disjointness constraint: specifies whether or not entities may belong to more than one
lower-level entity set within a single generalization.
Z is disjoint if Si ∩ Sj = { } for i ≠ j, else overlapping.

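The two constraints translate directly into set operations. The following is a minimal sketch; the employee identifiers and the subclass memberships are made up for the example.

```python
def is_total(superclass, subclasses):
    """Z is total if G equals the union of all Si, otherwise partial."""
    return set().union(*subclasses) == superclass

def is_disjoint(subclasses):
    """Z is disjoint if Si ∩ Sj is empty for all i ≠ j, otherwise overlapping."""
    seen = set()
    for s in subclasses:
        if seen & s:
            return False
        seen |= s
    return True

# Hypothetical instances of the superclass EMPL and three subclasses
empl = {"e1", "e2", "e3", "e4"}
man = {"e1", "e2"}
woman = {"e3", "e4"}
manager = {"e2", "e3"}

print(is_total(empl, [man, woman]), is_disjoint([man, woman]))
print(is_total(empl, [man, woman, manager]), is_disjoint([man, woman, manager]))
print(is_total(empl, [manager]), is_disjoint([manager]))
```

Under these assumptions {MAN, WOMAN} is total and disjoint, adding MANAGER makes the specialization total and overlapping, and {MANAGER} alone is partial.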
Types of Specialization

[Figure: notation with superclass X above the subclasses Y and Z, annotated with the type of specialization]

1. partial, disjoint (PD): e.g., EMPL with subclasses PROF, STUD
2. partial, overlapping (PO): e.g., EMPL with subclasses ASSI, STUD
3. total, disjoint (TD): e.g., EMPL with subclasses MAN, WOMAN
4. total, overlapping (TO): e.g., EMPL with subclasses MANAGER, MAN, WOMAN

Generalization Example

Specialization Example

Element Association (1)

• Task:
ƒ The element association groups objects (elements) to describe them
by an object group (set object) as a whole
- i.e., to describe the properties of the group of objects as a whole
ƒ On the one hand, details of individual elements are suppressed,
and on the other hand, properties that characterize the whole group of
objects are emphasized
⇒ grouping of elements


Element Association (2)

• Usage
ƒ An element association (also called grouping, partitioning, or cover
aggregation) constructs composite (set) objects based on simple
(element) objects.
ƒ It embodies an 'element-of' relationship (‘eo’) as a one-level abstraction
relationship.
ƒ It is possible to combine heterogeneous objects to form a set object.
In the case of automatic reasoning, all objects have to fulfill the set
predicate. In the case of manual construction, a user selects the objects
and connects them with the set object.


Element Association (3)

• Graphical Representation


Set Association (1)

• Task
The set association concept supplements the element association concept. It
expresses a relationship between composite set objects.
⇒ grouping of sets

• Usage
ƒ It embodies a 'subset-of' relationship (‘ss’)
ƒ It is recursively applicable and organizes the set objects in an association
hierarchy (n-level relationship)

Set Association (2)

• Structural Properties of Association
ƒ Each element of a set object is an element of the superset, too
ƒ Objects can be elements of different set objects as well as subsets of
multiple supersets at the same time
⇒ networks (many-to-many)!

• System Controlled Reasoning for Association
ƒ It does not support inheritance, because set properties are not
element properties
ƒ Membership stipulation can be used to determine properties that
must be satisfied by each valid element of the set
ƒ Set properties are properties of the set that are derived/deduced
from the element properties


Association Example


Element Aggregation (1)

• Task:
The element aggregation allows objects to be composed from simple objects. It
defines a part-whole relation with objects that are not further decomposable
⇒ grouping of components
• Usage
ƒ A collection of simple objects (element, part) is treated as a
composite object (component object/aggregate object)
ƒ Establishing a 'part-element-of' relationship (‘po’) (one-level
abstraction relationship). Typically, a user creates an aggregation of
parts using connect statements; structural properties have to be
considered (e.g., a soccer team comprises 11 players)
ƒ The possibility to aggregate heterogeneous objects adds flexibility to
the application


Element Aggregation (2)


• Graphical Representation


Component Aggregation (1)

• Task:
Component aggregation supplements element aggregation by
applying the part-whole relation to components

• Usage
ƒ Establishing a 'component-of' relationship (‘co’) between the
component elements (e.g., with a connect statement)
ƒ It is recursively applicable and organizes an aggregation hierarchy
(n-level relationship)


Component Aggregation (2)

• Structural Properties of Aggregation
(aggregation also means 'consists-of')

ƒ Describes necessary properties that an object must have to be
consistent
⇒ in contrast to classes and set objects, which may exist
without instances and elements, respectively.

ƒ Elements of a subcomponent are elements of all supercomponents of
this subcomponent, too

ƒ Objects can be elements of different components or subcomponents of
multiple supercomponents at the same time
⇒ networks (many-to-many)!

ƒ No inheritance, because aggregate properties are not component
properties!


Aggregation (Example)


Aggregation (Example) (cont.)

• System Controlled Reasoning: implied predicates
ƒ Predicates that are specified over the aggregation hierarchy and are
based on common properties of elements/aggregates
ƒ ‘Upward implied predicate’
P(x) is true ⇒ P(aggregate objects(x)) is true
ƒ ‘Downward implied predicate’
P(x) is true ⇒ P(component objects(x)) is true

• Example (continued):
ƒ ‘Upward implied predicate’: weight > x
ƒ ‘Downward implied predicate’: price < y

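A small sketch may help to see why "weight > x" propagates upward and "price < y" propagates downward. It assumes a hypothetical bicycle hierarchy and assumes that the weight and price of an aggregate are the sums of the (non-negative) values of its parts; neither assumption comes from the slides.

```python
# Hypothetical aggregation hierarchy: aggregate -> components
COMPONENTS = {"bicycle": ["frame", "wheel"], "wheel": ["rim", "tire"]}

# Values of the atomic parts; aggregates are assumed to be the sum of their parts
WEIGHT = {"frame": 2.0, "rim": 0.5, "tire": 0.7}
PRICE = {"frame": 300, "rim": 40, "tire": 25}

def total(obj, base):
    """Value of an object: its own value if atomic, else the sum over its components."""
    if obj in base:
        return base[obj]
    return sum(total(c, base) for c in COMPONENTS[obj])

def aggregates_of(obj):
    """All direct and indirect aggregates that contain obj."""
    direct = [a for a, parts in COMPONENTS.items() if obj in parts]
    result = set(direct)
    for a in direct:
        result |= aggregates_of(a)
    return result

def components_of(obj):
    """All direct and indirect components of obj."""
    result = set()
    for c in COMPONENTS.get(obj, []):
        result |= {c} | components_of(c)
    return result

# Upward: the rim weighs more than 0.4, so every aggregate containing it does too.
upward_ok = all(total(a, WEIGHT) > 0.4 for a in aggregates_of("rim"))
# Downward: the bicycle costs less than 400, so every component does too.
downward_ok = all(total(c, PRICE) < 400 for c in components_of("bicycle"))
print(upward_ok, downward_ok)
```

The implications hold here only because of the summation assumption; a predicate such as "weight < x" would not propagate upward.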

Aggregation: Example

• Which solution contains more semantics?


Aggregation (Example) (cont.)


An Example Object
Feijoada
structural attributes
  instance-of: main dishes
  element-of: Brazilian specialities
  has-components: black beans, meat, spices
declarative attributes
  price: 36
    possible-values: integer > 0
    cardinality: [1,1]
    unit: Euro
  preparation-time:
    possible-values: integer > 0
    cardinality: [1,1]
    demon: compute-preparation time
  suitable-drinks: dry red wine, beer
    possible-values: instance-of drinks
    cardinality
procedural attributes
  order (no-of-persons)
    procedure BEGIN ... END
  ...

Object-Centered Representation

• Integration of abstraction concepts:
ƒ Each object can build up to 6 types of relationships
ƒ These correspond to the kinds of roles that appear in abstraction concepts
ƒ Context/roles of objects determine the semantics of objects

Built-In Reasoning Facilities

• Three abstraction concepts allow different forms of organization of
modeled objects and their relationships

• Can be used for reasoning:
ƒ To make deductions about objects and their properties
ƒ In addition to manipulation and to operations for retrieval


ERM with UML

• UML:
ƒ standardized language to support software development
• ERM:
ƒ general modeling tool for information models (logical DB design)

ERM in UML: Entity Type

• Employee with a staff number, last name, first name

ERM: entity set Employee with the attributes StaffNr (key), Last name, Firstname

UML (with UML stereotypes):
<<entity>>
Employee
<<key attribute>> + StaffNr: Integer
+ Last_name: string
+ First_name: string

ERM in UML: Multi-valued attributes

• Car with color(s) and plate number

ERM: entity set Car with the multi-valued attribute Color and the key attribute PlateNr

UML:
<<entity>>
Car
+ Color: string [1..*]
<<key attribute>> + PlateNr: String

ERM in UML: Relations

• A car is owned by a person

ERM: entity set Car (n), relationship set owned, entity set Person (1)
UML: <<entity>> Car (*), association owned, <<entity>> Person (1)

ERM in UML: Relations with attributes

• Car is owned by person, since buying date

ERM: entity set Car (n), relationship set owned with attribute Date, entity set Person (1)
UML: <<entity>> Car (*), <<relationship>> owned with + Date: date, <<entity>> Person (1)

ERM in UML: Multiple Relations

• Professors grade students on courses at dates

ERM: relationship set grades with attribute Date between Professor (1), Course (n) and Student (m)
UML: <<relationship>> grades with + Date: date, associated with <<entity>> Professor (0..1), <<entity>> Course (*) and <<entity>> Student (*)

ERM in UML: Aggregation

• Parts are parts of parts

ERM: entity set part related to itself through is-part-of, with the roles component (N) and subpart (M)
UML: <<entity>> part with the self-association is-part-of, roles component (*) and subpart (1..*)

ERM in UML: weak entities

• A house has an address and an owner. It consists of rooms (relation to
the weak entity set room) that have room numbers.

ERM: entity set House (attributes Address, Owner) 1 : N weak entity set Room (attribute RoomNr) through consists-of

UML:
<<entity>>
House
<<key attribute>> + Address: string
+ Owner: String

consists-of (multiplicity 1..* on the Room side)

<<weak entity>>
Room
<<key attribute>> + RoomNr: smallint

ERM in UML: Generalization

• Employees are persons

ERM: entity set Employee is-a entity set Person
UML: <<entity>> Employee is a specialization of <<entity>> Person


Short Exercise

• Model the following facts in ERM:
ƒ There are documents that have a title, a year and a signature.
ƒ Find a proper primary key for that entity.
ƒ A thesis is a special document.
ƒ Professors grade theses.
ƒ A thesis consists of the abstract and the full text. Without a full text, a
thesis would not exist.
ƒ Professors are staff of a university.
ƒ University staff have a staff number, a name and an address that
consists of street, zip code and city.
ƒ There is a rule that each professor has at least one and at most five
proxies/substitutes who are also professors. Each professor should be
a substitute for at most four other professors.

A Solution

[Figure: ER diagram of the solution]
• Document (Title, Year, Signature); Dissertation is-a Document
• Uni staff (StaffNr, Name, Address (Street, Zip, City)); Professor is-a Uni staff
• Professor grades Dissertation
• Relationship set Substitute on Professor: "is substituted by" [1..5], "substitutes" [0..4]
• Dissertation has parts Abstract and Full text


Summary

• ERM Characteristics
ƒ Entity sets and relationship sets
(attribute, value set, primary key)
ƒ Classification of relationship types
ƒ ER diagrams

• Abstraction Concepts and Their Implications
ƒ Generalization and inheritance
ƒ Association with set properties and membership stipulation
ƒ Aggregation and implied predicates
ƒ Integration of abstraction concepts using object-centered representation

Summary (cont.)
• ER Design Decisions
ƒ The use of an attribute or entity set to represent an object.
ƒ Whether a real-world concept is best expressed by an entity set or a
relationship set.
ƒ The use of a ternary relationship versus a pair of binary relationships.
ƒ The use of a strong or weak entity set.
ƒ The use of generalization
- Contributes to modularity in the design
ƒ The use of association
- Close to aggregation, but allows for empty sets
ƒ The use of aggregation
- Can treat the aggregate entity set as a single unit
without concern for the details of its internal structure.
ƒ The use of
- methods
- rules
- triggers, ...


Chapter 4: Relational Model

Relational Model Anwendersoftware (AS)

4. Relational Model: Overview

• Basic Concepts
• Mapping the Entity-Relationship Model
to the Relational Data Model
ƒ Fundamental Concepts
ƒ ER-to-Relational Mapping
ƒ Mapping of Generalization and Aggregation
ƒ Mapping of Relationships
ƒ Relational Invariants
• Referential Integrity
ƒ Basic Concepts
ƒ Referential Actions


Database Design Process

[Figure: phases of the database design process along a time axis]
• information collection (analysis of meaning): interview, noun analysis, brainstorming, document analysis, ...
• semantical data modeling (rough modeling, yields the conceptual schema): ERM, NIAM, EXPRESS-G, IDEF1X, STEP, UML, ...
• logical data modeling (precise modeling): hierarchical, network-like, relational, object-oriented, XML, ...
• database installation: DB2, Informix, ORACLE, Postgres, mySQL, ...

Design steps: conceptual DB schema design, logical schema design, physical schema design
(the figure marks the earlier steps as DBMS independent and the later ones as DBMS dependent)


Relational Model: Overview

• Data Structure: Table (relation)
→ the only data structure (besides atomic values)
→ information is represented only through data values
→ integrity constraints on/between tables: relational invariants

Name of Table   Attribute   Attribute   Attribute
                Value       Value       Value
                Value       Value       Value
                Value       Value       Value

Relational Model: Overview

• Relationships
ƒ are always explicit, binary, and symmetric
ƒ are represented by values: role of primary and foreign key
(ensuring referential integrity)
ƒ are automatically maintainable in SQL (referential actions)

• Design theory
ƒ Normal form approach (desirable and expedient tables)
ƒ Synthesis approach


Relational Model: Operators on Tables (I)

• Restrict/Select (σ)
ƒ Returns a relation containing all tuples from a specified
relation that satisfy a specific condition
• Project (π)
ƒ Returns a relation containing all (sub)tuples that remain in a
specified relation after specified attributes have been
removed
• Product
ƒ Returns a relation containing all possible tuples that are a
combination of two tuples, one from each of two relations
• Union
ƒ Returns a relation containing all tuples that appear in either
or both of two specified relations


Relational Model: Operators on Tables (II)


• Intersect
ƒ Returns a relation containing all tuples that appear in both of two
specified relations
• Difference
ƒ Returns a relation containing all tuples that appear in the first and not
the second of two specified relations
• Join
ƒ Returns a relation containing all possible tuples that are a combination
of two tuples, one from each of two specified relations,
ƒ such that the two tuples contributing to any given combination have a
common value for the common attributes of the two relations,
ƒ and that common value appears just once, not twice in the result
tuple
• Divide
ƒ Takes two unary relations and one binary relation and returns a
relation containing all tuples from one unary relation that appear in the
binary relation matched with all tuples in the other unary relation

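The operators can be tried out with a few lines of Python. This is only a sketch: a relation is modeled as a set of rows, each row as a frozenset of (attribute, value) pairs, and the function names are chosen for this example. Union, intersect and difference are then simply the set operators |, & and -.

```python
def relation(rows):
    """Model a relation as a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def restrict(r, predicate):
    """Rows of r that satisfy the predicate (the predicate receives the row as a dict)."""
    return {t for t in r if predicate(dict(t))}

def project(r, attributes):
    """Keep only the given attributes; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attributes) for t in r}

def natural_join(r, s):
    """Combine rows that agree on all common attributes; common values appear once."""
    result = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                result.add(frozenset({**dt, **du}.items()))
    return result

def divide(a, b, c):
    """Rows of unary a that are matched in binary b with every row of unary c."""
    return {t for t in a if all((t | u) in b for u in c)}

# Sample tables in the style of textbook join and divide examples
join_a = relation([{"a": "a1", "b": "b1"}, {"a": "a2", "b": "b1"}, {"a": "a3", "b": "b2"}])
join_b = relation([{"b": "b1", "c": "c1"}, {"b": "b2", "c": "c2"}, {"b": "b3", "c": "c3"}])
joined = natural_join(join_a, join_b)

div_a = relation([{"c1": "a"}, {"c1": "b"}, {"c1": "c"}])
div_b = relation([{"c1": "a", "c2": "x"}, {"c1": "a", "c2": "y"}, {"c1": "a", "c2": "z"},
                  {"c1": "b", "c2": "x"}, {"c1": "c", "c2": "y"}])
div_c = relation([{"c2": "x"}, {"c2": "z"}])
quotient = divide(div_a, div_b, div_c)
print(len(joined), quotient)
```

Because rows are sets of pairs, neither the order of rows nor the order of columns carries information, which matches the relational view of a table.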
Relational Model: Operators on Tables

[Figure: schematic illustrations of Restrict, Project, Difference, Intersection and Union]

Join of A and B

Table A   a    b       Table B   b    c       Result   a    b    c
          a1   b1                b1   c1               a1   b1   c1
          a2   b1                b2   c2               a2   b1   c1
          a3   b2                b3   c3               a3   b2   c2

Divide

Table A   c1      Table B   c1   c2      Table C   c2      Result   c1
          a                 a    x                 x                a
          b                 a    y                 z
          c                 a    z
                            b    x
                            c    y

Relational Model: Basic concepts

• ER-to-Relational Mapping

ERM Concept                      RM Concept
Domain                           Domain
Attribute                        Attribute
Primary Key                      Primary Key
Entity Set, Relationship Set     Relation (Table)

Relational Model: Basic Concepts (cont.)

• Def.: normalized relation / table

• Representation of R: table with n columns
→ each relation can be represented as a table
→ henceforth we mostly use the technical term TABLE
• A table is a set
→ the uniqueness of rows and tuples is guaranteed
→ primary key (multiple candidate keys are possible)


Determining Primary Keys From E-R Sets

• Strong entity set
ƒ The primary key of the entity set becomes the primary key of the
table.
• Weak entity set
ƒ The primary key of the table consists of the union of the primary key
of the strong entity set and the discriminator of the weak entity set.
• Relationship set
ƒ The union of the primary keys of the related entity sets becomes a
superkey of the table.
ƒ For binary many-to-many relationship sets, the above superkey is also the
primary key.
ƒ For binary many-to-one relationship sets, the primary key of the
“many” entity set becomes the table’s primary key.
ƒ For one-to-one relationship sets, the table’s primary key can be that
of either entity set.

Table Instance

• The current values of a table are specified by the table instance r
• An element t of r is a tuple
• A tuple is represented by a row in a table

[Figure: Customer table]

Keys

• Let K ⊆ R; K is a superkey of R if values for K are sufficient
to identify a unique tuple of each table instance r for
relation/table R, written as r(R).
ƒ Example:
{customer-name, customer-street} and {customer-name}
are both superkeys of Customer, if no two customers can
possibly have the same name.

• K is a candidate key if K is minimal
ƒ Example: {customer-name} is a candidate key for Customer,
since it is a superkey (assuming no two customers can possibly
have the same name), and no proper subset of it is a superkey.

• Although several candidate keys may exist, one of the
candidate keys is selected to be the primary key.

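The definitions can be checked against a concrete table instance. The sketch below uses invented customer rows and an additional customer-city attribute; note that a key is a property of all possible instances, so a check on one instance can only refute a key, never prove it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, all_attrs):
    """Minimal attribute combinations that are superkeys of this instance."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            minimal = not any(set(k) <= set(attrs) for k in found)
            if minimal and is_superkey(rows, attrs):
                found.append(attrs)
    return found

# Hypothetical instance of the Customer table
customers = [
    {"customer-name": "Jones", "customer-street": "Main", "customer-city": "Harrison"},
    {"customer-name": "Smith", "customer-street": "Main", "customer-city": "Rye"},
    {"customer-name": "Hayes", "customer-street": "North", "customer-city": "Harrison"},
]
attributes = ["customer-name", "customer-street", "customer-city"]

keys = candidate_keys(customers, attributes)
print(keys)
```

In this tiny instance the pair (customer-street, customer-city) also looks like a candidate key, but only by coincidence of the sample rows; whether it really is one has to be decided from the meaning of the data.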

Normalized Tables (1)

Fundamental Rules:
1. Each row (tuple) is unique and it describes one object (entity) of the
miniworld
2. The order of rows is not relevant;
the order of rows does not contain any relevant information for the user
3. The order of columns is not relevant, because they have a unique name
(attribute name)
4. Each data value is an atomic data element in a table
5. The whole meaningful information for the user is represented solely
through data values
6. A primary key exists and sometimes multiple additional candidate keys


Normalized Tables (2)


FC     FCNO   FCNAME                    DEAN
       FC 9   BUSINESS ADMINISTRATION   4711
       FC 5   COMPUTER SCIENCE          2223

PROF   PNO    PNAME      FCNO   SUBJECT
       1234   HÄRDER     FC 5   DATABASE SYSTEMS
       5678   WEDEKIND   FC 9   INFORMATION SYSTEMS
       4711   MÜLLER     FC 9   OPERATIONS RESEARCH
       6780   NEHMER     FC 5   OPERATING SYSTEMS


Relational Model: Basic Concepts (cont.)

• Representation of Information in RM
ƒ Solely by values V(Ai) = Di
ƒ Order of rows and columns contains no information

• How to Represent Information that Spans Tables?
ƒ Foreign keys
- define a reference to the primary key or a candidate key of another (or the same)
table (same domain)
- carry inter-relational or intra-relational information
ƒ Relationships are expressed by foreign keys and their
associated primary keys or candidate keys!
ƒ Primary keys and foreign keys establish meaningful relationships
between tables based on attribute values

Foreign Key (1)

Definition:

A foreign key relative to a table R1 is a (possibly composite)
attribute FK of a table R2 for which the following always holds:
for every (non-NULL) value of FK there must exist
the same value of the primary key PK
or of a candidate key CK in some tuple of table R1.

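The definition can be observed in a running system. The following sketch uses Python's sqlite3 module with an FC/PROF schema in the style of the normalized tables example; the column types and the row with PNO 9999 are assumptions for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE FC (FCNO TEXT PRIMARY KEY, FCNAME TEXT, DEAN INTEGER)")
conn.execute("""
    CREATE TABLE PROF (
        PNO INTEGER PRIMARY KEY,
        PNAME TEXT,
        FCNO TEXT REFERENCES FC(FCNO),  -- foreign key relative to table FC
        SUBJECT TEXT
    )
""")

conn.execute("INSERT INTO FC VALUES ('FC 5', 'COMPUTER SCIENCE', 2223)")
conn.execute("INSERT INTO PROF VALUES (1234, 'HÄRDER', 'FC 5', 'DATABASE SYSTEMS')")

# A NULL foreign key value is accepted (hypothetical row without a faculty).
conn.execute("INSERT INTO PROF VALUES (9999, 'NEW', NULL, NULL)")

# 'FC 9' has not been inserted into FC, so this reference has no matching primary key.
try:
    conn.execute("INSERT INTO PROF VALUES (5678, 'WEDEKIND', 'FC 9', 'INFORMATION SYSTEMS')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

prof_count = conn.execute("SELECT COUNT(*) FROM PROF").fetchone()[0]
print(rejected, prof_count)
```

The rejected insert shows the rule "for every non-NULL value of FK there must exist the same value of the primary key" being enforced by the system rather than by the application.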

Foreign Key (2)

Remarks:
1. A foreign key and its associated primary key (candidate key) bear important
interrelational (sometimes also intrarelational) information. They are defined on
the same domain (comparable and union-compatible). They allow tables to be combined by
means of relational operations.
2. Foreign keys may have NULL values, if they are not part of a primary key.
3. Candidate keys may have NULL values, if NOT NULL was not specified explicitly.
4. A foreign key is composite if the associated primary key is composite.
(FK value = NULL means that all components are NULL (MATCH FULL) or that
some components are NULL (no MATCH type specified)).
5. A table may have several foreign keys, which reference the same or different
tables.
6. Referenced and referencing table are not necessarily different ("self-
referencing table").
7. Cycles are possible (closed referential path).


Relational Model: Basic Concepts (cont.)

• Built-in Integrity Constraints:
Which assertions/integrity constraints can be guaranteed by the data
model?
ƒ Set properties of entity and relationship sets
ƒ Types of relationships (one-to-one, ..., many-to-many)
⇒ under some restrictions (one-to-many)
ƒ Referential integrity
ƒ Cardinality restrictions?
⇒ desirable (constraints)
ƒ Semantics of user-defined relationships?
⇒ no system support is provided

Mapping of Entity and Relationship Sets (ERM) into Tables (RM)

• Criteria
ƒ Preservation of information
ƒ Minimization of redundancy
ƒ Minimization of work for combination of tables
ƒ Additional:
- A natural mapping
- No mixture of object types
- Understandable
• Detailed Examples: see exercises

Two Entity Sets and a One-To-Many Relationship

Possible Representations in the RM:

1. Use of three tables
DEPT (DNO, DNAME, ...)
EMP (ENO, ENAME, ...)
WORKS-IN (DNO, ENO)
⇒ Usually the one-to-many relationship type is only mapped into its own
table if it has describing attributes
2. Use of two tables
DEPT (DNO, DNAME, ...)
EMP (ENO, ENAME, ..., DNO)
⇒ Default mapping of a one-to-many relationship type using primary and
foreign key

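The two-table variant can be sketched with Python's sqlite3 module. The department and employee rows and the column types are invented for this example. Because DNO is a single column of EMP, each employee refers to at most one department, so the "one" side of the relationship is part of the table structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPT (DNO INTEGER PRIMARY KEY, DNAME TEXT);
    CREATE TABLE EMP (
        ENO INTEGER PRIMARY KEY,
        ENAME TEXT,
        DNO INTEGER REFERENCES DEPT(DNO)  -- foreign key carries the relationship
    );
    INSERT INTO DEPT VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO EMP VALUES (10, 'Miller', 1), (11, 'Taylor', 1), (12, 'Smith', 2);
""")

# The relationship is reconstructed by matching foreign key and primary key values.
rows = conn.execute("""
    SELECT D.DNAME, E.ENAME
    FROM DEPT D JOIN EMP E ON E.DNO = D.DNO
    ORDER BY D.DNO, E.ENO
""").fetchall()
print(rows)
```

A separate WORKS-IN table would need the same join twice, which is one reason the two-table form is the usual default when the relationship has no attributes of its own.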

A Single Entity Set and a One-to-One Relationship

Representation Possibilities in RM:

1. Use of two tables
PERS (PNO, PNAME, ...)
MARRIAGE (MPNO, FPNO) or MARRIAGE (FPNO, MPNO)

2. Use of a single table
PERS (PNO, PNAME, ..., SPOUSE)


An Entity Set and a Many-to-Many Relationship Set (1)

Representation Possibilities in RM:

PART (PNO, NAME, MATERIAL, STOCK)
STRUCTURE (TPNO, BPNO, NO_OF_PARTS)

Example:
PART   PNO   NAME           MATERIAL    STOCK
       A     gearing        aluminium   10
       B     casing         steel       0
       C     axle           steel       100
       1     screw          steel       200
       D     ball bearing   steel       50
       3     disc           lead        0
       2     screw          chromium    100

An Entity Set and a Many-to-Many Relationship Set (2)

STRUCTURE   TPNO   BPNO   NO_OF_PARTS
            A      B      1
            A      C      5
            A      1      8
            B      1      4
            B      2      2
            C      1      4
            C      D      2

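The STRUCTURE table describes a bill of materials, and a typical question is how many pieces of a base part go into a top-level part in total. The recursive function below is a sketch written for this example, with the rows of the table above read as (TPNO, BPNO, NO_OF_PARTS); it assumes the structure contains no cycles.

```python
# Rows of STRUCTURE: (TPNO, BPNO, NO_OF_PARTS)
STRUCTURE = [
    ("A", "B", 1),
    ("A", "C", 5),
    ("A", "1", 8),
    ("B", "1", 4),
    ("B", "2", 2),
    ("C", "1", 4),
    ("C", "D", 2),
]

def total_quantity(top, part):
    """Number of pieces of `part` contained directly or indirectly in one `top`."""
    if top == part:
        return 1
    return sum(
        count * total_quantity(sub, part)
        for parent, sub, count in STRUCTURE
        if parent == top
    )

screws_in_a = total_quantity("A", "1")
bearings_in_a = total_quantity("A", "D")
print(screws_in_a, bearings_in_a)
```

Part A needs 8 screws directly, 4 through B and 5 × 4 through C, which gives 32 in total; the 10 ball bearings come from 5 axles with 2 bearings each.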

Three Entity Sets and Ternary Relationship


SNO

SNAME
supplier
SLOCATION
PNO p PRONO

PNAME m n PRONAME
part supply project
WEIGHT PLOCATION

NUMBER DATE

Representation Possibilities in RM:


SUPPLIER (SNO, SNAME, SLOCATION)
PROJECT (PRONO, PRONAME, PLOCATION)
PART (PNO, PNAME, WEIGHT)
SUPPLY (SNO, PRONO, PNO, NUMBER, DATE)

Mapping Types within an Entity Set (1)


EMP (ENO, NAME, ADDRESS, ..., SALARY, BA)

• Horizontal Partitioning
ƒ Building of tables (classes) using selection constraints
EMP-VIP (ENO, ..., BA) SALARY > 100K
EMP (ENO, ..., BA) SALARY <= 100K

⇒ is actually a task for the view concept

• Vertical Partitioning
ƒ To satisfy performance and security constraints more easily:

EMP-PUB (ENO, ENAME, ADDRESS, ...)


EMP-PRIV (ENO, BASE-SALARY, BONUS, ...)

⇒ task for internal schema and the view concept



Mapping Types within an Entity Set (2)

• Mapping of Multi-valued Attributes


ƒ Entity set:

EMP (ENO, ENAME, {favorite dishes}, {children (first name, age)})


P1, Miller, {wiener schnitzel, roast, rollmops}, -
P2, Taylor, {pizza}, {(Natalie, 5), (Philip, 2)}

ƒ Possible representation in RM:

EMP (ENO, ENAME, ...)


FAV-DISHES(ENO, DISH, ...)
CHILDREN (ENO, FIRST-NAME, AGE)


4. Relational Model: Overview

• Basic Concepts
• Mapping the Entity-Relationship Model
to the Relational Data Model
ƒ Fundamental Concepts
ƒ ER-to-Relational Mapping
ƒ Mapping of Generalization and Aggregation
ƒ Mapping of Relationships
ƒ Relational Invariants
• Referential Integrity
ƒ Basic Concepts
ƒ Referential Actions

Mapping Generalization and Aggregation into the Relational Model
• RM does not provide support for abstraction concepts
ƒ No inheritance (of structure, integrity constraints, operations)
• Limited forms of generalization and aggregation can be
simulated


Generalization – Relational View (1)

Solution 1: House Class Model

• Each instance is stored in its house class completely and exactly once
• Horizontally partitioned instances of the DB
• Usage: in ORION and in O2

MEMBER-OF-UNI  ID   NAME
               111  Ernie

EMPLOYEE       ID   NAME      BAT
               007  GARFIELD  IA

TECHNICIAN     ID   EXPERIENCE  NAME    BAT
               123  SUN         DONALD  IVA

RES-EMPL       ID   MASTER        EXPERTISE  NAME    BAT
               333  Computer Sc.  RECOVERY   DAISY   IIA
               765  Mathematics   ERM        GROUCH  IIA

Generalization – Relational View (2)

• Properties of House Class Model:


ƒ Low storage costs and no update anomalies
ƒ Retrieval might require recursive search in subclasses
ƒ Explicit reconstruction using relational operations
⇒ example: find all employees:

π ID, NAME, BAT (TECHNICIAN)
∪ π ID, NAME, BAT (RES-EMPL)
∪ EMPLOYEE
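
The explicit reconstruction can be tried out on the sample rows of the House Class Model. The following is only a minimal sketch: tables are modeled as Python sets of tuples, and the helper `project` is a hypothetical name introduced here, not part of the lecture.

```python
# Sample rows of the House Class Model, each table as a set of tuples.
EMPLOYEE = {("007", "GARFIELD", "IA")}                  # (ID, NAME, BAT)
TECHNICIAN = {("123", "SUN", "DONALD", "IVA")}          # (ID, EXPERIENCE, NAME, BAT)
RES_EMPL = {                                            # (ID, MASTER, EXPERTISE, NAME, BAT)
    ("333", "Computer Sc.", "RECOVERY", "DAISY", "IIA"),
    ("765", "Mathematics", "ERM", "GROUCH", "IIA"),
}

def project(table, positions):
    """Projection onto 0-based column positions; the set removes duplicates."""
    return {tuple(row[p] for p in positions) for row in table}

# Union of the projected subclasses with the EMPLOYEE table itself.
all_employees = (
    project(TECHNICIAN, (0, 2, 3))
    | project(RES_EMPL, (0, 3, 4))
    | EMPLOYEE
)
```

Every subclass table has to be visited, which is what makes retrieval in this model potentially recursive.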


Generalization – Relational View (3)


Solution 2: Partitioning Model

• Each instance is decomposed along the is-a hierarchy according to the class attributes, and its parts are stored in the associated tables
• Only the ID attribute is duplicated
• Vertical partitioning of the DB
• Usage: IRIS

MEMBER-OF-UNI  ID   NAME
               007  Garfield
               111  Ernie
               123  Donald
               333  Daisy
               765  Grouch

EMPLOYEE       ID   BAT
               007  Ia
               123  IVa
               333  IIa
               765  IIa

TECHNICIAN     ID   EXPERIENCE
               123  SUN

RES-EMPL       ID   MASTER        EXPERTISE
               333  Computer Sc.  RECOVERY
               765  Mathematics   ERM

Generalization – Relational View (4)

Properties of Partitioning Model approach:


ƒ Slightly higher storage costs, but high costs for retrieval and
updates
ƒ Integrity constraints: TECHNICIAN.ID ⊆ EMPLOYEE.ID, etc.
ƒ Access to instances requires implicit or explicit join operations
⇒ example: find all TECHNICIAN-specific data

TECHNICIAN ⋈_{ID=ID} EMPLOYEE ⋈_{ID=ID} MEMBER-OF-UNI
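
As a rough illustration of the joins needed in the Partitioning Model, the sketch below rebuilds the complete technician record from the sample tables. The helper `equijoin_on_id` is a hypothetical name and assumes that ID is always the first column.

```python
MEMBER_OF_UNI = {("007", "Garfield"), ("111", "Ernie"), ("123", "Donald"),
                 ("333", "Daisy"), ("765", "Grouch")}            # (ID, NAME)
EMPLOYEE = {("007", "Ia"), ("123", "IVa"), ("333", "IIa"), ("765", "IIa")}  # (ID, BAT)
TECHNICIAN = {("123", "SUN")}                                     # (ID, EXPERIENCE)

def equijoin_on_id(left, right):
    """Equijoin on the first column (ID); the repeated ID of `right` is dropped."""
    return {l + r[1:] for l in left for r in right if l[0] == r[0]}

# TECHNICIAN joined with EMPLOYEE, then with MEMBER-OF-UNI.
technician_data = equijoin_on_id(equijoin_on_id(TECHNICIAN, EMPLOYEE), MEMBER_OF_UNI)
```

The result has the columns (ID, EXPERIENCE, BAT, NAME); two joins are needed for a class two levels below the root.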


Generalization – Relational View (5)


Solution 3: Complete Redundancy

• An instance is stored repeatedly, once in each class to which it belongs
• It contains the inherited attribute values and the attribute values of the subclass at the same time

MEMBER-OF-UNI  ID   NAME
               007  Garfield
               111  Ernie
               123  Donald
               333  Daisy
               765  Grouch

EMPLOYEE       ID   NAME      BAT
               007  Garfield  Ia
               123  Donald    IVa
               333  Daisy     IIa
               765  Grouch    IIa

TECHNICIAN     ID   EXPERIENCE  NAME    BAT
               123  SUN         DONALD  IVA

RES-EMPL       ID   MASTER        EXPERTISE  NAME    BAT
               333  Computer Sc.  RECOVERY   DAISY   IIA
               765  Mathematics   ERM        GROUCH  IIA

Generalization – Relational View (6)

Properties of Complete Redundancy approach:


ƒ Very high need for storage
ƒ Update anomalies are possible
ƒ Easy retrieval, because only the target class (e.g., EMPLOYEE) has to be visited

Aggregation – Relational View

⇒ relational operations do not support properties of aggregation



Mapping of Relationships (1)

• ER Diagram

• Mapping into the Relational Model


DEP (DNO ...,
     ...
     PRIMARY KEY (DNO))

EMPL (ENO ...,
      DNO ...,
      PRIMARY KEY (ENO),
      FOREIGN KEY (DNO) REFERENCES DEP)

• Reference Graph

Mapping of Relationships (2)

• Additional Rules:
Each employee (EMPL) has to be employed in one department ([1,1]).
⇒ EMPL.DNO ... NOT NULL

• Remark:
In SQL2 it is not possible to specify that a parent has to have a child (e.g., [1,n]).
In addition the number of children cannot be restricted.

ƒ SQL3 provides a PENDANT-clause, so that [1,n] can be managed


ƒ On creation, the check of these relationships has to be deferred

Mapping of Relationships (3)

• ER Diagram

• Mapping into the Relational Model


DEP (DNO ...,
     ...
     PRIMARY KEY (DNO))

EMPL (ENO ...,
      DNOB ... NOT NULL,
      DNOA ...,
      PRIMARY KEY (ENO),
      FOREIGN KEY (DNOB) REFERENCES DEP,
      FOREIGN KEY (DNOA) REFERENCES DEP)

• Reference Graph
[Figure: two reference edges between DEP and EMPL, labeled DNOB and DNOA]

• Remark:
- For each FK relationship a separate FK attribute is necessary.
- Multiple FK attributes can reference the same PK/CK attribute.

Mapping of Relationships (4)

• Goal: representation of a one-to-one relationship

• First Approach as ER Diagram

• Mapping into the Relational Model

DEP (DNO ...,
     MNO ... UNIQUE,
     ...
     PRIMARY KEY (DNO),
     FOREIGN KEY (MNO) REFERENCES MGR)

MGR (MNO ...,
     DNO ... UNIQUE,
     ...
     PRIMARY KEY (MNO),
     FOREIGN KEY (DNO) REFERENCES DEP)

⇒ symmetrical solutions are possible

Mapping of Relationships (5)

• Additional Rules
ƒ Each department has one manager → DEP.MNO ... UNIQUE NOT NULL
ƒ Each manager manages one department → MGR.DNO ... UNIQUE NOT NULL

• Reference Graph

• Can a one-to-one relationship be expressed by these two many-to-one relationships?

Mapping of Relationships (6)

• ER Diagram

• Mapping into the Relational Model

DEP (DNO ...,
     MNO ... UNIQUE NOT NULL,
     ...
     PRIMARY KEY (DNO),
     FOREIGN KEY (MNO) REFERENCES MGR)

MGR (MNO ...,
     ...
     PRIMARY KEY (MNO),
     FOREIGN KEY (MNO) REFERENCES DEP (MNO))

⇒ symmetrical solutions are possible

Mapping of Relationships (7)


• Reference Graph

ƒ The one-to-one relationship can be ensured by using the MNO attributes for both FK relationships
ƒ Expressing ([0,1], [0,1]) is not possible this way

• Variations in Candidate Keys

DEP (DNO ...,
     MNO ... UNIQUE,
     ...
     PRIMARY KEY (DNO),
     FOREIGN KEY (MNO) REFERENCES MGR (MNO))

MGR (SNO ...,
     MNO ... UNIQUE,
     ...
     PRIMARY KEY (SNO),
     FOREIGN KEY (MNO) REFERENCES DEP (MNO))
⇒ symmetrical solutions are possible

• The use of candidate keys with the option NOT NULL allows ([1,1], [1,1]) to be defined

• All combinations of [0,1] and [1,1] are possible

Mapping of Relationships (8)

• Goal: representation of many-to-many relationships


• ER Diagram

• Mapping into the Relational Model


EMP (ENO ...,
     ...
     PRIMARY KEY (ENO))

PROJ (PNO ...,
      ...
      PRIMARY KEY (PNO))

WORKS_ON (ENO ..., PNO ...,
          PRIMARY KEY (ENO, PNO),
          FOREIGN KEY (ENO) REFERENCES EMP,
          FOREIGN KEY (PNO) REFERENCES PROJ)

⇒ This default solution forces an "existence dependency" of WORKS_ON. To avoid it, the foreign keys of WORKS_ON must not be specified as part of the primary key.

Mapping of Relationships (9)

• Is it possible to realize [1,n] or [1,m] within the mapping of a many-to-many relationship?

• Reference Graph

Mapping of Relationships (10)

• Goal: representation of a one-to-many relationship as self reference


• ER Diagram

• Mapping into the Relational Model


EMP (ENO ..., MNO ...,
...
PRIMARY KEY (ENO),
FOREIGN KEY (MNO) REFERENCES EMP (ENO))
⇒ The employee hierarchy of an enterprise can be expressed using this solution. In this
case the referential relation is a partial function, because the managers on the very
top in the hierarchy don’t have a manager.
⇒ MNO ... NOT NULL is only realizable, if the managers on the very top are seen as if
they were their own manager, too. This implies reference cycles so that the query
processing and consistency checking get more difficult.
• Which relationship structure is created by MNO ... UNIQUE NOT NULL?
• Reference Graph

A Sample Miniworld
ER Diagram
[ER diagram: Faculty (1) "belongs to" (N) Professor; Faculty (1) "is member of" (N) Student; Professor (N) "exam" (M) Student]

Graphical Representation of the Relational Schema

Specification of the Relational DB Schema (SQL2 Standard) (1)
• Domain Definitions (Value Sets):
CREATE DOMAIN NO_OF_FACULTY AS CHAR (4)
CREATE DOMAIN FACULTY_NAME AS VARCHAR (20)
CREATE DOMAIN SUBJECT_NAME AS VARCHAR (20)
CREATE DOMAIN NAMES AS VARCHAR (30)
CREATE DOMAIN EMPLOYEE_NO AS CHAR (4)
CREATE DOMAIN MATRIKELNUMBER AS INT
CREATE DOMAIN MARKS AS SMALLINT
CREATE DOMAIN DATE AS DATE

• Tables:
CREATE TABLE FC (
FCNO NO_OF_FACULTY PRIMARY KEY,
FCNAME FACULTY_NAME UNIQUE NOT NULL,
DEAN EMPLOYEE_NO UNIQUE NOT NULL)

Specification of the Relational DB Schema (SQL2 Standard) (2)
• Tables (cont):
CREATE TABLE PROF (
PNO EMPLOYEE_NO PRIMARY KEY,
PNAME NAMES NOT NULL,
FCNO NO_OF_FACULTY,
SUBJECT SUBJECT_NAME,
CONSTRAINT PFK FOREIGN KEY (FCNO)
REFERENCES FC (FCNO)
ON UPDATE CASCADE
ON DELETE SET NULL)

CREATE TABLE STUDENT (
MATNR MATRIKELNUMBER PRIMARY KEY,
SNAME NAMES NOT NULL,
FCNO NO_OF_FACULTY NOT NULL,
BEGIN DATE,
CONSTRAINT SFK FOREIGN KEY (FCNO)
REFERENCES FC (FCNO)
ON UPDATE CASCADE
ON DELETE RESTRICT )

Specification of the Relational DB Schema (SQL2 Standard) (3)
• Tables:
CREATE TABLE EXAM (
PNO EMPLOYEE_NO,
MATNR MATRIKELNUMBER,
SUBJECT SUBJECT_NAME,
EDATE DATE NOT NULL,
MARK MARKS NOT NULL,
PRIMARY KEY (PNO, MATNR),
CONSTRAINT PR1FK FOREIGN KEY (PNO)
REFERENCES PROF (PNO)
ON UPDATE CASCADE
ON DELETE CASCADE,
CONSTRAINT PR2FK FOREIGN KEY (MATNR)
REFERENCES STUDENT (MATNR)
ON UPDATE CASCADE
ON DELETE CASCADE )

Representing the Content of the Miniworld with Tables (1)

FC FCNO FCNAME DEAN


FC 9 BUSINESS ADMINISTRATION 4711
FC 5 COMPUTER SCIENCE 2223

PROF PNO PNAME FCNO SUBJECT


1234 Härder FC 5 DATABASE SYSTEMS
5678 Wedekind FC 9 INFORMATION SYSTEMS
4711 Miller FC 9 OPERATIONS RESEARCH
6780 Nehmer FC 5 OPERATING SYSTEMS

Representing the Content of the Miniworld with Tables (2)
STUDENT MATNR SNAME FCNO BEGIN
123766 COY FC 9 1.10.86
225332 MILLER FC 5 15.4.87
654711 BERNSTEIN FC 5 15.10.84
226302 CANETTI FC 9 1.10.86
196481 TAYLOR FC 5 23.10.87
130680 SCHMITH FC 9 1.4.88

EXAM PNO MATNR SUBJECT EDATE MARK


5678 123766 IS 22.10.89 4
4711 123766 OR 16. 4.90 3
1234 654711 DB 17. 4.90 2
1234 123766 DB 17. 4.90 4
6780 654711 OS 19. 9.91 2
1234 196481 DB 15.10.89 1
6780 196481 OS 25. 3.91 3

Relational Invariants (1)

1. Primary Key Condition: uniqueness, NULL values are not allowed!


2. Foreign Key Condition: associated PK (CK) must exist

PROBLEMS:
• Operations on the Child Table
a) Insert a child tuple
b) Update the FK value in a child tuple
c) Delete a child tuple
⇒ which actions are necessary?

ƒ On insert it must be checked whether a parent tuple exists with the same PK/CK value as the inserted FK value
ƒ On update of an FK value an analogous check is done

Relational Invariants (2)

• Operations on the Parent Table


d) Delete a parent tuple
e) Update the PK/CK in a parent tuple
f) Insert a parent tuple
⇒ which reactions are possible and useful and at which time?

ƒ Prohibit operation
ƒ Delete/update tuples with referencing FK values recursively
ƒ If the child tuple should be kept (this is not always possible (e.g.,
existence dependency)), set the value of FK to NULL or to the default
value


Relational Invariants (3)

• How to Handle NULL Values?

⇒ special and particular semantics of NULL values

ƒ Three-valued logic:
- TRUE
- FALSE
- UNKNOWN (?)

ƒ Setting: NULL ≠ NULL (e.g., in case of join)


ƒ In case of operations:
a.) ignore, if NULL or
b.) compare only non-null values
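
The three-valued logic can be observed directly in an SQL engine. The sketch below uses SQLite through Python's `sqlite3` module purely as an illustration; the tables `r` and `s` are hypothetical, and other systems may display UNKNOWN differently.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Comparisons with NULL evaluate to UNKNOWN; sqlite3 reports UNKNOWN as None.
null_eq_null = con.execute("SELECT NULL = NULL").fetchone()[0]
unknown_or_true = con.execute("SELECT (NULL = NULL) OR 1").fetchone()[0]
unknown_and_true = con.execute("SELECT (NULL = NULL) AND 1").fetchone()[0]

# Consequence for joins: rows whose join column is NULL find no partner.
con.execute("CREATE TABLE r (a)")
con.execute("CREATE TABLE s (a)")
con.execute("INSERT INTO r VALUES (NULL)")
con.execute("INSERT INTO s VALUES (NULL)")
join_rows = con.execute("SELECT * FROM r JOIN s ON r.a = s.a").fetchall()
```

UNKNOWN OR TRUE yields TRUE, while UNKNOWN AND TRUE stays UNKNOWN, and the join over two NULL values returns no row.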


4. Relational Model: Overview

• Basic Concepts
• Mapping the Entity-Relationship Model
to the Relational Data Model
ƒ Fundamental Concepts
ƒ ER-to-Relational Mapping
ƒ Mapping of Generalization and Aggregation
ƒ Mapping of Relationships
ƒ Relational Invariants
• Referential Integrity
ƒ Basic Concepts
ƒ Referential Actions


Referential Integrity (1)

• Referential Integrity:
the DB must not contain any unmatched foreign key values
• SQL2 Standard Introduces “Referential Actions”
• More exact specification of referential actions for each
foreign key (FK)
1. Are “NULLs” forbidden?
NOT NULL
2. Deletion rule on destination table (referenced table)
ON DELETE
{CASCADE | RESTRICT | SET NULL | SET DEFAULT | NO ACTION}
3. Update rule on destination primary key or candidate key
ON UPDATE
{CASCADE | RESTRICT | SET NULL | SET DEFAULT | NO ACTION}


Referential Integrity (2)

• RESTRICT:
operation is restricted to the case where no associated records (FK
values) exist
• CASCADE:
operation "cascades" to all associated records
• SET NULL:
FK is set to NULL in all associated records
• SET DEFAULT:
FK is set to a user-defined default value in every associated record
• NO ACTION:
no referential action is executed on the specified reference; the
referential integrity will be checked when the execution of all referential
actions is completed (temporary violation of referential integrity is
possible)
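
The effect of some of these actions can be reproduced with a reduced version of the FC, PROF, and STUDENT tables. This is only a sketch under assumptions: it uses SQLite via Python's `sqlite3` (which enforces foreign keys only after `PRAGMA foreign_keys = ON`), plain column types instead of the domains, and lower-case table names chosen here for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
con.executescript("""
CREATE TABLE fc (fcno TEXT PRIMARY KEY, fcname TEXT);
CREATE TABLE prof (pno TEXT PRIMARY KEY, fcno TEXT
    REFERENCES fc (fcno) ON UPDATE CASCADE ON DELETE SET NULL);
CREATE TABLE student (matnr INTEGER PRIMARY KEY, fcno TEXT NOT NULL
    REFERENCES fc (fcno) ON UPDATE CASCADE ON DELETE RESTRICT);
INSERT INTO fc VALUES ('FC5', 'COMPUTER SCIENCE'), ('FC9', 'BUSINESS ADMINISTRATION');
INSERT INTO prof VALUES ('1234', 'FC5'), ('4711', 'FC9');
INSERT INTO student VALUES (225332, 'FC5');
""")

# ON UPDATE CASCADE: renaming FC9 propagates to professor 4711.
con.execute("UPDATE fc SET fcno = 'FC10' WHERE fcno = 'FC9'")
# ON DELETE SET NULL: deleting FC10 keeps professor 4711 but clears the FK.
con.execute("DELETE FROM fc WHERE fcno = 'FC10'")
prof_after = con.execute("SELECT pno, fcno FROM prof ORDER BY pno").fetchall()

# ON DELETE RESTRICT: FC5 is still referenced by a student, so the delete fails.
try:
    con.execute("DELETE FROM fc WHERE fcno = 'FC5'")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True
```

After the run, professor 4711 has a NULL faculty, and the attempt to delete FC5 is rejected as a whole.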


Effects of Referential Actions (1)

Referential Actions:
ON DELETE {CASCADE | RESTRICT | SET NULL | SET DEFAULT | NO ACTION}
ON UPDATE {CASCADE | RESTRICT | SET NULL | SET DEFAULT | NO ACTION}

1. Isolated View of STUDENT – FC


Effects of Referential Actions (2)


FC FCNO FCNAME
FC9 BUSINESS ADMINISTRATION
FC5 COMPUTER SCIENCE

STUDENT MATNR SNAME FCNO


123766 COY FC 9
225332 MILLER FC 5
654711 BERNSTEIN FC 5
226302 CANETTI FC 9

• Operations
ƒ Delete FC (FCNO=FC5)
ƒ Update FC ((FCNO=FC9) → (FCNO=FC10))
• Referential actions
ƒ DC, DSN, DSD, DR, DNA (ON DELETE CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION)
ƒ UC, USN, USD, UR, UNA (the corresponding ON UPDATE actions)
• Unambiguous operations?

Effects of Referential Actions (3)


2. Isolated View of STUDENT - EXAM – PROF
[Reference graph: EXAM references PROF via PNO and STUDENT via MATNR]
• Example DB
STUDENT MATNR SNAME
123766 COY
654711 BERNSTEIN

PROF PNO PNAME


1234 Härder
4711 Miller

EXAM PNO MATNR SUBJECT


4711 123766 OR
1234 654711 DB
1234 123766 DB

Effects of Referential Actions (4)

• Use of
ƒ USN, DSN ⇒ violation of keys
ƒ USD, DSD ⇒ sometimes ambiguous
ƒ UNA, DNA ⇒ identical effects with UR, DR

• Effects of update operations
ƒ Compatibility of referential actions
⇒ independent referential relationships can be defined independently

[Reference graph: EXAM references PROF via PNO and STUDENT via MATNR]

Effects of Referential Actions (5)


3. Complete Example

DB operation: Delete FC (FCNO=FC9)

left first:
- deletion in FC
- deletion in PROF
- deletion in EXAM
- deletion in STUDENT
- deletion in EXAM

right first:
- deletion in FC
- deletion in STUDENT
- deletion in EXAM
- deletion in PROF
- deletion in EXAM

⇒ the result is independent of the processing order of the referential actions
⇒ a definite DB state is reached

• Multiple combinations of referential actions are possible (e.g., DSD, UC or DC, DSN)
• If all update operations are unambiguous → secure schema

Processing of Update Operations

• Times to check referential integrity


ƒ IMMEDIATE -- at statement execution
ƒ DEFERRED -- at EOT (end of transaction)

• Execution of Referential Actions


ƒ In SQL user operations are always atomic
- Tuple-oriented processing model
- Set-oriented processing model

• In Case of Cyclic Referential Paths


ƒ at least one foreign key in a cycle must allow "NULL" values or
ƒ the check of referential integrity has to be deferred (e.g., until COMMIT)

• IMMEDIATE conditions must be valid at statement boundaries (→ set-oriented update)

Summary


Relational Model – Summary (1)

• Data Structure
Table (Relation)
⇒ the only data structure (besides atomic values)
⇒ representation of information solely by values
⇒ integrity constraints on and between tables: relational invariants

• Mapping of Relationships Using PK/CK – FK


ƒ In principle, every relationship type has to be represented using many-to-one relationships
ƒ Only a limited mapping of cardinality restrictions is possible

Relational Model – Summary (2)

• Maintaining Referential Integrity


ƒ SQL2/3 provides referential actions with extensive options
ƒ If a static analysis of the schema is too restrictive the control of
referential actions at runtime is necessary

• Assessment of Abstraction Concepts


ƒ Abstraction concepts are not provided
(only classification of tuples in tables is partially supported)
ƒ Limited mapping of abstraction concepts

5. Relational Algebra
• Classic Set of Operations
• Table Operations
• Rewrite Rules
• Algebraic Optimization

Languages of Relational Model (1)


Data Model = data objects + operators
Support Different Classes of Users
Provide Uniform Language for:

• Tasks of data management


ƒ data definition
ƒ Queries
ƒ data manipulation
ƒ access, integrity, and transaction control

• Application
ƒ ‘stand-alone’ mode (ad-hoc queries)
ƒ in a host language (embedded DB statements)

Languages of Relational Model (2)

Four Different Basic Types:


• Relational Algebra (e.g., ISBL)
• Relational Calculus (e.g., Alpha)
• Mapping-oriented languages (e.g., SQL)
• Graphic-oriented languages (e.g., Query-by-Example)

Relational Algebra
A system that consists of a non-empty set and a family of operations on it
is called an algebra.

Object:
The elements of the underlying set are the tables.

Operations:
Operations on tables have one or more tables as input and produce a
table as output (closure property)
Relational Algebra (cont.)

• Classic Set Operations


ƒ Union
ƒ Set difference
ƒ Cartesian product
ƒ Set intersection (derivable)
• Table Operations:
ƒ Projection
ƒ Restriction (selection)
ƒ Join (derivable)
ƒ Division (derivable)
⇒ Expressive power with regard to retrieval corresponds to first-order
predicate calculus ("relationally complete")

Relational Algebra (cont.)


Auxiliary Concepts:

R, S are tables over D1 × D2 × … × Dn

Union-compatibility:
R and S have the same degree, and corresponding attributes have the same domain
Attribute sequences (A1, ..., An) of R and (B1, ..., Bn) of S:
⇒ D(Ai) = D(Bi), i = 1, ..., n

Concatenation of two tuples:
d = <d1, d2, ..., dn>,
e = <e1, e2, ..., em>
d|e = <d1, d2, ..., dn, e1, e2, ..., em>
Examples

Classic Set of Operations (1)


1. Union of R and S R∪S

R ∪ S = { t | t ∈ R ∨ t ∈ S}

2. Set Difference R–S

R – S = { t | t ∈ R ∧ t ∉S }

In addition (redundant set operations):

3. Set Intersection R∩S

R ∩ S = R – (R – S)
={t|t∈R∧t∈S}

4. Symmetric Difference (XOR) R Δ S

R Δ S = (R ∪ S) − (R ∩ S)
= ((R ∪ S) − (R – (R – S)))
= { t | t ∈ R ⊕ t ∈ S }
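
The derivations of the redundant set operations can be checked on small union-compatible tables. A minimal sketch with Python sets of tuples; the sample rows are made up for illustration.

```python
# Two union-compatible tables (same degree, same column domains).
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

union = R | S                       # R ∪ S
difference = R - S                  # R – S
intersection = R - (R - S)          # R ∩ S derived from set difference only
sym_diff = (R | S) - intersection   # R Δ S = (R ∪ S) − (R ∩ S)
```

Both derived results agree with Python's built-in `&` and `^` operators on sets.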
Classic Set of Operations (2)
5. (Extended) Cartesian Product
R (degree r) and S (degree s) arbitrary

C =R × S
= {k | ∃ x ∈ R, y ∈ S: (k = x | y)}

k = x|y = <x1, . . . , xr, y1, . . . , ys>


not <<x1, . . . , xr>, <y1, . . . , ys>> like the usual Cartesian product
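
The difference between the extended and the usual Cartesian product is easy to see in code. A small sketch with made-up rows, again modeling tables as sets of tuples:

```python
from itertools import product

R = {(1, "a"), (2, "b")}   # degree r = 2
S = {("x",), ("y",)}       # degree s = 1

# Extended product: concatenate the tuples (x | y), giving rows of degree r + s.
extended = {x + y for x, y in product(R, S)}

# Usual set-theoretic product: pairs of tuples, i.e. nested structure.
nested = set(product(R, S))
```

`extended` contains flat rows such as `(1, "a", "x")`, whereas `nested` contains pairs such as `((1, "a"), ("x",))`.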

Special Table Operations (1)


6. Projection (PROJECT)
• Selection of the columns with numbers j1, j2, . . . , jk ∈ {1, 2, . . . , n} from
a table R with degree n

L = ( ji | i=1,..,k )
P = πL( R )
= { p |∃ t ∈ R: p = 〈t [ j1] , t [ j2 ], …, t [ jk ] 〉}

Duplicates are removed!


• Alternative: use of column names
P = π Aj1, Aj2, … Ajk (R )

• Example:
P = πFORENAME, SURNAME, … , SALARY(EMP)

Special Table Operations (2)
7. Restriction (Selection)
• Selection of rows of a table by means of predicates, σP for short;
P = logical formula (without quantifiers!) composed of:
ƒ operands: constant values, column numbers or names
ƒ comparison operators Θ ∈ {<, =, >, ≤, ≠, ≥}
ƒ logical operators ∨, ∧, ¬

T = σP ( R ) = { t | t ∈ R ∧ P(t )}

• Examples:
σ DNO = ’K55’ ∧ SALARY > 50000 (EMP)
σ SALARY > COMMISSION (EMP)
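
Projection and restriction can be sketched in a few lines. The EMP rows and the helper names `select` and `project` below are assumptions made for this illustration; columns are addressed by 0-based position.

```python
EMP = {  # (ENO, DNO, SALARY, COMMISSION), hypothetical rows
    (1, "K55", 60000, 1000),
    (2, "K55", 40000, 50000),
    (3, "K51", 70000, 0),
}

def select(table, predicate):
    """Restriction: keep the rows for which the predicate holds."""
    return {t for t in table if predicate(t)}

def project(table, cols):
    """Projection onto 0-based columns; duplicates disappear in the set."""
    return {tuple(t[j] for j in cols) for t in table}

high_paid_k55 = select(EMP, lambda t: t[1] == "K55" and t[2] > 50000)
salary_above_commission = select(EMP, lambda t: t[2] > t[3])
departments = project(EMP, (1,))
```

The projection onto DNO returns only two rows although EMP has three, which illustrates the duplicate removal.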

Special Table Operations (3)

8. JOIN and Θ -JOIN


• Informal description:
Cartesian product of two tables R (degree r) and S (degree s),
restricted by Θ-conditions between column no. i of R and column no. j
of S.
Let Θ ∈ {<, =, >, ≤, ≠, ≥} (arithmetic comparison)
Θ-Join between R and S:

V = R ⋈_{iΘj} S
= σ_{iΘ(r+j)} (R × S)

• Remarks:
(1) Special case Θ = '=': equijoin
(2) Instead of i and j: column names A and B,
e.g.: R ⋈_{iΘj} S ≡ R ⋈_{AΘB} S
(3) An equijoin between R and S is called lossless if all rows of R
and S participate in the join. The inverse operation, projection,
then recreates R and S.
Equijoin (Example)

• Application of: DEPT ⋈_{DNO = DNO} EMP

DEPT  DNO  DNAME     LOC
      K51  planning  KL
      K53  purchase  F
      K55  sales     F

EMP   ENO  AGE  DNO
      406  47   K55
      123  32   K51
      829  36   K53
      574  28   K55

R = DEPT ⋈_{DNO = DNO} EMP  DNO  DNAME  LOC  ENO  AGE  DNO'

Equijoin (Example) (cont.)


• Lossless equijoin:
π DNO, DNAME, LOC (R) = DEPT
π ENO, AGE, DNO’ (R) = EMP

• Lossy Equijoin
if rows in DEPT or EMP do not have join partners, e.g. (K56, finance, M) in
DEPT or (471, 63, -) in EMP, then π as inverse operation does not yield DEPT or
EMP
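
The lossless and the lossy case can be compared on the DEPT and EMP rows of the example. This is a sketch only: `equijoin` and `project` are hypothetical helpers using 1-based column numbers, and the extra row (K56, finance, M) is the one named in the text.

```python
DEPT = {("K51", "planning", "KL"), ("K53", "purchase", "F"), ("K55", "sales", "F")}
EMP = {(406, 47, "K55"), (123, 32, "K51"), (829, 36, "K53"), (574, 28, "K55")}

def equijoin(R, S, i, j):
    """Selection over the extended product: column i of R equals column j of S."""
    return {x + y for x in R for y in S if x[i - 1] == y[j - 1]}

def project(T, cols):
    """Projection onto 1-based column numbers."""
    return {tuple(t[c - 1] for c in cols) for t in T}

R = equijoin(DEPT, EMP, 1, 3)
lossless = project(R, (1, 2, 3)) == DEPT and project(R, (4, 5, 6)) == EMP

# A department without employees has no join partner and is lost.
DEPT2 = DEPT | {("K56", "finance", "M")}
R2 = equijoin(DEPT2, EMP, 1, 3)
lossy = project(R2, (1, 2, 3)) != DEPT2
```

In the first case both projections give back the input tables; in the second, K56 cannot be recovered from the join result.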

Relational Algebra – Example DB
DEPT DNO DNAME LOC
K51 PLANNING KAISERSLAUTERN
K53 PURCHASE FRANKFURT
K55 SALES FRANKFURT

EMP ENO ENAME AGE SALARY DNO MNO


406 COY 47 50 700 K55 123
123 MILLER 32 43 500 K51 -
829 SMITH 36 45 200 K53 777
574 TAYLOR 28 36 000 K55 123

• Find all employees (ENO, ENAME) in department K55 earning more than $40,000.

• Find all employees (ENO, AGE, DNAME) who work in a department in Frankfurt and who are older than 30.

Rename Operation
• Allows us to name, and therefore to refer to, the results of relational-
algebra expressions.
• Allows us to refer to a table by more than one name.
• If a relational-algebra expression E has arity n, then

ρ x(A1,A2,...,An) ( E)

returns the result of expression E under the name x, and with the
attributes renamed to A1, A2, ..., An.

Formal Definition

• A basic expression in the relational algebra consists of one of the following:
ƒ A table in the database
ƒ A constant table

• Let E1 and E2 be relational-algebra expressions; the following are relational-algebra expressions as well:
ƒ E1 ∪ E2
ƒ E1 − E2
ƒ E1 × E2
ƒ σP (E1), P is a predicate on attributes in E1
ƒ πS (E1), S is a list consisting of some of the attributes in E1
ƒ ρx (E1), x is the new name for the result of E1

Assignment Operation
• The assignment operation (←) provides a convenient way to express
complex queries; write query as a sequential program consisting of a
series of assignments followed by an expression whose value is
displayed as the result of the query.

• Assignment must always be made to a temporary table variable.
ƒ The result to the right of the ← is assigned to the table variable on the left of the ←.
ƒ Assigned variable may be used in subsequent expressions.

• Example: Write r ÷ s as
ƒ temp1 ← π_{R−S} (r)
ƒ temp2 ← π_{R−S} ((temp1 × s) − π_{R−S, S} (r))
ƒ result = temp1 − temp2
Special Table Operations

9. Natural Join
• Informal description: Equijoin over all corresponding columns and
projection over the different columns
• Given: R(A1, A2, ..., A r-j+1, ..., Ar)
S(B1, B2, ..., Bj, ..., Bs)
and without loss of generality (otherwise reorder):
B1 = A r-j+1
B2 = A r-j+2
...
Bj = Ar
Natural join between R and S:

N = R ⋈ S
= π_{A1, ..., Ar, Bj+1, ..., Bs} (σ_{(R.A r-j+1 = S.B1) ∧ ... ∧ (R.Ar = S.Bj)} (R × S))

⋈ = symbol for natural join ⇒ Θ = '='

• Remark:
The join columns are given by the correspondence condition

Natural Join (Example)


• Application of: DEPT ⋈ EMP

DEPT  DNO  DNAME   LOC
      K51  PLAN.   KL
      K53  PURCH.  F
      K55  SALES   F

EMP   ENO  AGE  DNO
      406  47   K55
      123  32   K51
      829  36   K53
      574  28   K55

DE = DEPT ⋈ EMP  DNO  DNAME   LOC  ENO  AGE
                 K51  PLAN.   KL   123  32
                 K53  PURCH.  F    829  36
                 K55  SALES   F    406  47
                 K55  SALES   F    574  28

⇒ lossless natural join: π DNO, DNAME, LOC (DE) = DEPT
π ENO, AGE, DNO (DE) = EMP
• Lossy natural join is analogous to equijoin
Natural Join (Example) (cont.)
• Is the join always the inverse operation to the projection (π)?
• Example 1 (1:n):
DE1 = π DNO, DNAME, LOC (DE)
DE2 = π ENO, AGE, DNO (DE)
DE3 = DE1 ⋈ DE2 = DE

• Example 2 (n:m):
ACTOR (ENO, ROLE, LOC)
P1 Faust MA
P1 Mephisto KL
P2 Wallenstein MA

Natural Join (Example) (cont.)

ACT1 = π ENO, LOC (ACTOR)
ACT2 = π ROLE, LOC (ACTOR)
ACT3 = ACT1 ⋈ ACT2

"Connection trap" in case of a projection onto parts of a key and a subsequent join
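
The connection trap can be made visible by computing the projections and the join for the three ACTOR rows. A minimal sketch with tables as Python sets; the variable `spurious` is a name chosen here for the rows that the join invents.

```python
ACTOR = {("P1", "Faust", "MA"), ("P1", "Mephisto", "KL"), ("P2", "Wallenstein", "MA")}

ACT1 = {(eno, loc) for (eno, _, loc) in ACTOR}     # projection onto ENO, LOC
ACT2 = {(role, loc) for (_, role, loc) in ACTOR}   # projection onto ROLE, LOC

# Natural join of ACT1 and ACT2 over the common column LOC.
ACT3 = {(eno, role, loc)
        for (eno, loc) in ACT1
        for (role, loc2) in ACT2
        if loc == loc2}

# Rows of the join result that were never in ACTOR.
spurious = ACT3 - ACTOR
```

The join contains all original rows plus two rows that pair P1 with Wallenstein and P2 with Faust, so the join is not the inverse of the projection here.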

Outer join (1)

• Goal: Enforce lossless join

Ex.: The result of R ⋈ S ⋈ T should also yield partial objects (e.g., complex objects)

• So far: R ⋈ S ⋈ T yields only "complete objects"

• Trick: Insert a special empty row to generate artificial join partners

Outer join (2)

• Def.: Let A be the join columns and {≡} the undefined value, and
R' := R ∪ ((πA(S) − πA(R)) × {≡} × ... × {≡})
S' := S ∪ ((πA(R) − πA(S)) × {≡} × ... × {≡})

outer equijoin:
R ⟗_{R.A = S.A} S := R' ⋈_{R'.A = S'.A} S'

outer natural join:
R ⟗ S := R' ⋈ S'

⇒ twofold application of the outer equijoin
R ⟗ S ⟗ T
yields the desired result for the above example
Variants of the Outer Equijoin (1)
• Left Outer Equijoin
With this operation the left argument table remains lossless, i.e., if
necessary a row is padded with NULL values "to the right".

left outer equijoin:
R ⟕_{R.A = S.A} S := R ⋈_{R.A = S'.A} S'

• The application of
R ⟕ S ⟕ T
yields the following graphically depicted result:

Variants of the Outer Equijoin (2)


• Right Outer Equijoin
Analogously, the right argument table remains lossless; missing partner rows
are replaced by NULL values "to the left".

right outer equijoin:
R ⟖_{R.A = S.A} S := R' ⋈_{R'.A = S.A} S

Variants of the Outer Equijoin (3)
• Correspondingly,
R ⟖ S ⟖ T
yields the following result:

Variants of the Outer Equijoin (4)

• Summary
ƒ The application of the outer equijoin
R ⟗ S ⟗ T
yields the maximum of information relative to the sequence of
operations. Even isolated rows are expanded to a path.
- The left outer equijoin returns only paths that are defined
at the "left border".
- The right outer equijoin returns only paths that are defined
at the "right border".

ƒ The use of the equijoin with
R ⋈ S ⋈ T
yields the minimum of information;
only completely defined paths are added to the result.
Examples of the Outer Equijoin (1)

• Equijoin

R  A   B   C       S  C   D   E       RES  A   B   C   D   E
   a1  b1  c1         c1  d1  e1   =       a1  b1  c1  d1  e1
   a2  b2  c2         c3  d2  e2

• Left outer equijoin

R  A   B   C       S  C   D   E       RES  A   B   C   D   E
   a1  b1  c1         c1  d1  e1   =       a1  b1  c1  d1  e1
   a2  b2  c2         c3  d2  e2           a2  b2  c2  --  --

Examples of the Outer Equijoin (2)

• Right outer equijoin

R  A   B   C       S  C   D   E       RES  A   B   C   D   E
   a1  b1  c1         c1  d1  e1   =       a1  b1  c1  d1  e1
   a2  b2  c2         c3  d2  e2           --  --  c3  d2  e2

• Outer equijoin

R  A   B   C       S  C   D   E       RES  A   B   C   D   E
   a1  b1  c1         c1  d1  e1   =       a1  b1  c1  d1  e1
   a2  b2  c2         c3  d2  e2           a2  b2  c2  --  --
                                           --  --  c3  d2  e2
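
The four result tables of the example can be recomputed with a short sketch. It assumes that the join column C is the third column of R and the first column of S, shows C only once in the result (as in the tables above), and uses Python's `None` for the undefined value.

```python
R = {("a1", "b1", "c1"), ("a2", "b2", "c2")}   # (A, B, C)
S = {("c1", "d1", "e1"), ("c3", "d2", "e2")}   # (C, D, E)
NULL = None  # stands for the undefined value

# Rows with a join partner; the join column C appears once.
inner = {r + s[1:] for r in R for s in S if r[2] == s[0]}
# Rows of R without a partner, padded "to the right".
left_only = {r + (NULL, NULL) for r in R if all(r[2] != s[0] for s in S)}
# Rows of S without a partner, padded "to the left".
right_only = {(NULL, NULL) + s for s in S if all(r[2] != s[0] for r in R)}

left_outer = inner | left_only
right_outer = inner | right_only
full_outer = inner | left_only | right_only
```

The full outer equijoin is the union of the left and the right variant, which matches the three-row result shown last.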

Further Outer Operations (1)
• Outer Union
This operation allows the union of two tables that are not union-compatible.
If two tables are partially compatible, i.e. some of their columns are union-
compatible, then the outer union operation can be applied.
Example:

STUDENT   MATNR  FACULTY  TERM
          123    FB5      5
          789    FB9      9

RES.ASSI  MATNR  FACULTY  JOB
          456    FB5      Tutor
          987    FB9      Prog.

Further Outer Operations (2)

OUTER UNION

STUDASSI  MATNR  FACULTY  TERM  JOB
          123    FB5      5     -
          789    FB9      9     -
          456    FB5      -     Tutor
          987    FB9      -     Prog.

⇒ The result might be very difficult to interpret

• Similarly, further operations may be introduced:
ƒ outer intersection
ƒ outer difference
⇒ These operations do not seem to be very useful
Division (1)
• Goal:
ƒ Answering queries in which a whole table is used to qualify rows.
ƒ Simulation of universal quantification ⇒ a row of R stands in a specific relationship to all rows of S.

• Def.:
Let R be of degree r and S of degree s, with r > s and s ≠ 0.
Let t be an (r−s)-row and u an s-row.
Furthermore, let S-columns ⊂ R-columns.
Then the following holds:

R ÷ S = { t | ∀ u ∈ S: (t|u ∈ R) }

Division (2)
• Description of the Division With the Basic Operations

T = π_{1, 2, ..., r−s} (R)
W = (T × S) − R
V = π_{1, 2, ..., r−s} (W)

R ÷ S = T − V
= π_{1, 2, ..., r−s} (R) − π_{1, 2, ..., r−s} ((π_{1, 2, ..., r−s} (R) × S) − R)

• The following holds: (R × S) ÷ S = R
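
The construction of the division from the basic operations translates almost literally into code. In this sketch the function names and the sample rows are assumptions; the S-columns are taken to be the last s columns of R, as in the definition.

```python
def project(table, n):
    """Projection onto the first n columns."""
    return {row[:n] for row in table}

def divide(R, S, r, s):
    """R ÷ S for R of degree r and S of degree s, via T, W and V."""
    T = project(R, r - s)                       # candidate rows
    W = {t + u for t in T for u in S} - R       # required combinations missing in R
    V = project(W, r - s)                       # candidates that miss some row of S
    return T - V

R = {("Ann", "Downtown"), ("Ann", "Uptown"), ("Bob", "Downtown")}
S = {("Downtown",), ("Uptown",)}
quotient = divide(R, S, 2, 1)

# Check of (P × S) ÷ S = P for a small table P.
P = {("x",), ("y",)}
product = {p + u for p in P for u in S}
```

Only Ann is combined with every row of S, so she is the only row of the quotient.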

Generalized Projection

• Extends the projection operation by allowing arithmetic functions to be used in the projection list.

Π F1, F2, ..., Fn (E)

• E is any relational-algebra expression

• Each of F1, F2, ..., Fn is an arithmetic expression involving constants and attributes in the schema of E.

• Given table credit-info (customer-name, limit, credit-balance), find how much more each person can spend:
Π customer-name, limit − credit-balance (credit-info)

Banking Example
branch ( branch-name, branch-city, assets )
customer ( customer-name, customer-street, customer-city )
account ( branch-name, account-number, balance )
loan ( branch-name, loan-number, amount )
depositor ( customer-name, account-number )
borrower ( customer-name, loan-number )

Example Queries

• Find all loans of over $1200
σ amount > 1200 (loan)

• Find the loan number for each loan of an amount greater than $1200
π loan-number (σ amount > 1200 (loan))

• Find the names of all customers who have a loan, an account, or both, from the bank.
π customer-name (borrower) ∪ π customer-name (depositor)

Example Queries (cont.)

• Find the names of all customers who have a loan at the Perryridge branch.
π customer-name (σ branch-name = "Perryridge"
(σ borrower.loan-number = loan.loan-number (borrower × loan)))

• Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank.
π customer-name (σ branch-name = "Perryridge"
(σ borrower.loan-number = loan.loan-number (borrower × loan)))
− π customer-name (depositor)
Example Queries (cont.)
• Find the names of all customers who have a loan at the Perryridge
branch.
ƒ Query 1
π customer-name (σ branch-name = "Perryridge"
(σ borrower.loan-number = loan.loan-number (borrower × loan)))
ƒ Query 2
π customer-name (σ borrower.loan-number = loan.loan-number
(borrower × (σ branch-name = "Perryridge" (loan))))

• Find the largest account balance
ƒ Rename account table as d
π balance (account) − π account.balance
(σ account.balance < d.balance (account × ρd (account)))
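
The query for the largest balance works without any aggregate function: it removes every balance that is smaller than some other balance. A small sketch with made-up account rows shows the idea.

```python
account = {  # (branch-name, account-number, balance), hypothetical rows
    ("Downtown", "A-101", 500),
    ("Perryridge", "A-102", 400),
    ("Brighton", "A-201", 900),
}

# Projection of account onto balance.
balances = {(bal,) for (_, _, bal) in account}

# Balances that are smaller than some balance in the renamed copy d of account.
not_largest = {(a[2],) for a in account for d in account if a[2] < d[2]}

# What remains is not smaller than any other balance.
largest = balances - not_largest
```

Pairing account with its renamed copy is a self-join; the rename is what makes the two occurrences distinguishable in the algebra expression.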


Example Queries (cont.)


• Find all customers who have an account from at least the “Downtown”
and “Uptown” branches.
π CN (σ BN = “Downtown” ( depositor ⋈ account)) ∩
π CN (σ BN = “Uptown” ( depositor ⋈ account))

where CN denotes customer-name and BN denotes branch-name

Alternatively, using division:
π customer-name, branch-name ( depositor ⋈ account)
÷ ρ temp(branch-name) ({(“Downtown”), (“Uptown”)})

• Find all customers who have an account at all branches located in
Brooklyn
π customer-name, branch-name ( depositor ⋈ account)
÷ π branch-name ( σ branch-city = “Brooklyn” ( branch))
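
Division is the least intuitive operator in these queries, so a small sketch may help. The function divide below is a hypothetical helper for the special case of a two-column table r(x, y) divided by a one-column table s(y); the customer and branch values are made up.

```python
def divide(r, s):
    """r ÷ s: all x such that (x, y) is in r for every y in s."""
    xs = {x for (x, _) in r}
    return {x for x in xs if all((x, y) in r for y in s)}

# pi customer-name, branch-name (depositor JOIN account), made-up content.
customer_branch = {
    ("Johnson", "Downtown"), ("Johnson", "Uptown"),
    ("Smith", "Downtown"),
    ("Hayes", "Downtown"), ("Hayes", "Uptown"), ("Hayes", "Brighton"),
}

# The divisor: the constant table {("Downtown"), ("Uptown")}.
wanted_branches = {"Downtown", "Uptown"}

result = divide(customer_branch, wanted_branches)
```

Smith is dropped because there is no row pairing Smith with Uptown; extra branches (Hayes at Brighton) do not matter.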

Aggregate Functions

• Aggregation operator γ takes a collection of values and


returns a single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values

G1, G2, ..., Gn γ F1 A1, F2 A2, ..., Fm Am ( E)

ƒ E is any relational-algebra expression


ƒ G1, G2, ..., Gn is list of attributes on which to group
ƒ Fi is an aggregate function
ƒ Ai is an attribute name


Aggregate Functions (Example)

Example:
• Table r

• Sum c(r)

Aggregate Functions (Example) (cont.)
• Table account grouped by
branch-name:

• branch-name γ sum balance(account)


Modification of the Database - Deletion

• A delete request is expressed similarly to a query, except


instead of displaying tuples to the user, the selected
tuples are removed from the database.

• Can delete only whole tuples; cannot delete values on only


particular attributes.

• A deletion is expressed in relational algebra by:


r ← r – E

where r is a table and E is a relational algebra query.

Modification of the Database - Deletion (cont.)

Examples:
• Delete all account records in the Perryridge branch.
account ← account – σ branch-name = “Perryridge” (account)

• Delete all loan records with amount in the range 0 to 50.


loan ← loan- σ amount ≥ 0 and amount ≤ 50 (loan)

• Delete all accounts at branches located in Needham.


r1 ← σ branch-city = “Needham” ( account ⋈ branch)
r2 ← ∏ branch-name, account-number, balance ( r1)
r3 ← ∏ customer-name, account-number ( r2 ⋈ depositor)

account← account - r2
depositor← depositor - r3


Modification of the Database – Insertion

• To insert data into a table, we either:


ƒ specify a tuple to be inserted
ƒ write a query whose result is a set of tuples to be inserted

• In relational algebra, an insertion is expressed by:

r ← r ∪ E

where r is a table and E is a relational algebra expression.

• The insertion of a single tuple is expressed by letting E be


a constant table containing one tuple.

Modification of the Database – Insertion (cont.)

Examples:

• Insert information in the database specifying that Smith


has $1200 in account A-973 at the Perryridge branch.
account ← account ∪ {(“Perryridge”, A-973, 1200)}
depositor ← depositor ∪ {(“Smith”, A-973)}

• Provide as a gift for all loan customers in the Perryridge


branch, a $200 savings account. Let the loan number serve
as the account number for the new savings account.
r1 ← ( σ branch-name = “Perryridge” ( borrower ⋈ loan))
account ← account ∪ ∏ branch-name, loan-number, 200 (r1)
depositor← depositor ∪ ∏ customer-name, loan-number (r1)


Modification of the Database – Updating

• A mechanism to change a value in a tuple without


changing all values in the tuple

• Use the generalized projection operator to do this task

r ← ∏ F1, F2, ..., Fn ( r )

ƒ Each Fi is either the ith attribute of r, if the ith attribute is not updated,
or, if the attribute is to be updated Fi is an expression, involving only
constants and the attributes of r, which gives the new value for the
attribute

Modification of the Database – Updating (cont.)

Examples:
• Make interest payments by increasing all balances by 5
percent.
account ←∏ BN,AN,BAL← BAL *1.05 ( account)

where BAL, BN and AN stand for balance, branch-name and account-


number, respectively.
• Pay all accounts with balances over $10,000 6 percent
interest and pay all others 5 percent.

account ← ∏ BN,AN,BAL← BAL *1.06 (σ BAL > 10000 ( account))


∪ ∏ BN,AN,BAL← BAL *1.05 ( σ BAL ≤ 10000 ( account))
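
The second update can be sketched in code as two generalized projections over disjoint selections, combined by union. The account rows and the helper name generalized_projection are assumptions for illustration; Decimal is used only to keep the arithmetic exact.

```python
from decimal import Decimal

# Made-up rows of account(BN, AN, BAL).
account = {
    ("Perryridge", "A-102", Decimal("12000")),
    ("Downtown", "A-101", Decimal("500")),
    ("Redwood", "A-222", Decimal("10000")),
}

def generalized_projection(rows, rate):
    """Keep BN and AN unchanged and replace BAL by BAL * rate."""
    return {(bn, an, bal * rate) for (bn, an, bal) in rows}

high = {r for r in account if r[2] > 10000}   # sigma BAL > 10000 (account)
low = account - high                          # sigma BAL <= 10000 (account)

# account <- projection with 6 percent on high, united with 5 percent on low
updated = (generalized_projection(high, Decimal("1.06"))
           | generalized_projection(low, Decimal("1.05")))
```

Both selections must be evaluated on the old content of account before the assignment takes place; otherwise an account just raised above $10,000 could receive interest twice.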


Views (1)
• In some cases, it is not desirable for all users to see the entire logical
model
(i.e., all the actual tables stored in the database.)

• Consider a person who needs to know a customer’s loan number but


has no need to see the loan amount. This person should see a table
described, in the relational algebra, by
∏ customer-name, loan-number ( borrower ⋈ loan )

• Any table that is not part of the conceptual model but is made visible
to a user as a “virtual table” is called a view.

Views (2)

View Definition:
• A view is defined using the create view statement which has the form

create view v as < query expression >

where <query expression> is any legal relational algebra query


expression.
The view name is represented by v.

• Once a view is defined, the view name can be used to refer to the
virtual table that the view generates.

• View definition is not the same as creating a new table by evaluating


the query expression. Rather, a view definition causes the saving of an
expression to be substituted into queries using the view.


Views (3)
Example:
• Consider the view (named all-customer) consisting of branches and
their customers.

create view all-customer as
∏ branch-name, customer-name ( depositor ⋈ account)
∪ ∏ branch-name, customer-name ( borrower ⋈ loan)

• We can find all customers of the Perryridge branch by writing:

∏ customer-name (σ branch-name=”Perryridge” (all-customer))

Updates through Views (1)
• Database modifications expressed as views must be translated to
modifications of the actual tables in the database.

• Consider the person who needs to see all loan data in the loan table
except the loan amount. The view given to the person, branch-loan, is
defined as:
create view branch-loan as
∏ branch-name, loan-number ( loan)
Since we allow a view name to appear wherever a table name is
allowed, the person may write:
branch-loan ← branch-loan ∪ {( “Perryridge”, L-37)}


Updates through Views (2)

• The previous insertion must be represented by an insertion


into the actual table loan from which the view branch-loan
is constructed.

• An insertion into loan requires a value for amount. The


insertion can be dealt with by either
ƒ rejecting the insertion and returning an error message to the user
ƒ inserting a tuple (“Perryridge”, L-37, null) into the loan table

Views defined using other Views (1)

• One view may be used in the expression defining another


view
• A view table v1 is said to depend directly on a view table
v2 if v2 is used in the expression defining v1
• A view table v1 is said to depend on view table v2 if and
only if there is a path in the dependency graph from v2 to
v1.
• A view table v is said to be recursive if it depends on
itself.


Views defined using other Views (2)

View Expansion
• A way to define the meaning of views defined in terms of other views.
• Let view v1 be defined by an expression e1 that may itself contain
uses of view tables.
• View expansion of an expression repeats the following replacement
step:
repeat
Find any view table vi in e1
Replace the view table vi by the expression defining vi
until no more view tables are present in e1

• As long as the view definitions are not recursive, this loop will
terminate.
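
The replacement loop can be sketched as follows. Expressions are represented as plain strings of blank-separated tokens, which is an assumption made only to keep the example short; the view name perryridge-customer and the abbreviated attribute lists are hypothetical as well.

```python
def expand(expr, views):
    """Replace view names by their defining expressions until none is left.

    Terminates only if the view definitions are not recursive.
    """
    tokens = expr.split()
    while any(t in views for t in tokens):
        # One replacement step: every view token becomes "( definition )".
        tokens = [tok for t in tokens
                  for tok in (["("] + views[t].split() + [")"] if t in views else [t])]
    return " ".join(tokens)

# Hypothetical view definitions; perryridge-customer is defined on top of all-customer.
views = {
    "perryridge-customer": "σ branch-name=Perryridge ( all-customer )",
    "all-customer": "π bn,cn ( depositor ⋈ account ) ∪ π bn,cn ( borrower ⋈ loan )",
}

expanded = expand("π cn ( perryridge-customer )", views)
```

After two passes of the loop the expression mentions only the base tables depositor, account, borrower and loan.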

Summary – Relational Model (1)
• Algebra has the same expressive power as the safe relational
calculus, which is based on first order predicate calculus
• Closed with regard to algebraic operations
• Classic set operations
• Table operations


Summary – Relational Model (2)

Relational Algebra – Optimization (1)

• Expressions in the relational algebra specify the order of


execution (procedural elements)
⇒ however, equivalent transformations between expressions are
possible
• Problem
- Given: expression in relational algebra (RA)
- Wanted: equivalent, most efficiently executable expression in
relational algebra
• Determination of an Execution Order that is as Efficient
as Possible (via the use of heuristics) for
ƒ Unary operations: π, σ
ƒ Binary operations: ∩, ∪, -, ×, ⋈, ÷


Relational Algebra – Optimization (2)


• Characteristic Statistical Values are retrieved from the system tables
- Ni = Card(Ri)
- ji = number of different values Ai
• Algebraic Optimization (example):
Database:
DEP ( DNO, BUDGET, DLOC )
EMPL ( ENO, NAME, OCCUPATION, SALARY, AGE, DNO)
PM ( ENO, PNO, PERIOD, WORKINGTIMEPORTION )
PROJ ( PNO, TITLE, SUM, PLOC)

Query:
Find the names and jobs of employees whose department is located in
KL and who are assigned to a project in KL

Rewrite-Rules of the Relational Algebra (1)

Equivalence of Expressions
ƒ Transformation of expressions for more efficient evaluation
ƒ List of important rules
ƒ Ri: tables (or expressions in relational algebra)

1. Commutativity Law for Joins and Products


R1 ⋈F R2 ≡ R2 ⋈F R1
R1 ⋈ R2 ≡ R2 ⋈ R1
R1 × R2 ≡ R2 × R1
2. Associativity Law for Joins and Products
( R1 ⋈F1 R2 ) ⋈F2 R3 ≡ R1 ⋈F1 ( R2 ⋈F2 R3 )
( R1 × R2 ) × R3 ≡ R1 × ( R2 × R3 )


Rewrite-Rules of the Relational Algebra (2)


3. Sequences of Projections
π A, B, C ( π A, B, C, … Z ( R )) = π A, B, C( R )

4. Sequences of Selections
σ F1 (σ F2 ( R )) = σ F1 ∧ F2( R )
( F1 ∧ F2 = F2 ∧ F1):
σ F1 ( σF2 ( R )) = σ F2 ( σF1 ( R ))

5. Exchange of Selections and Projections


F contains only attributes from A . . . Z:
σF ( π A , … , Z( R )) ≡ π A , … ,Z ( σF ( R ))

if F contains also attributes from B1 . . . Bm:


π A , … , Z ( σF ( R )) ≡ π A , …, Z ( σF ( πA , … ,Z ,B1 , … ,Bm ( R )))

Rewrite-Rules of the Relational Algebra (3)
6. Exchange of Selection and Cartesian Product
F contains only attributes from R1
σF ( R1 × R2 ) = σF ( R1 ) × R2

More general:
F = F1 ∧ F2 ∧ F3
F1: only attributes from R1
F2: only attributes from R2
F3: both

σF ( R1 × R2 ) = σF1 ( R1 ) ⋈F3 σF2 ( R2 )


Rewrite-Rules of the Relational Algebra (4)


7. Combination of Join Operations
ƒ Associativity and commutativity of union, set intersection, and join
T = R3 ⋈ (R1 ⋈ R2)
T = R2 ⋈ (R1 ⋈ R3)
T = R1 ⋈ (R2 ⋈ R3)

• The Problem of Join Ordering


ƒ Assumptions:
- lossless join
- j = number of distinct values of the join column
- values are distributed equally
ƒ Expected values:
Each row of R2 (son) is joined with N(R1)/j rows of R1 (father):
in case of (n:m)-join: N(T1) = N(R2) ⋅ N(R1)/j ,
N(R1) > j
in case of (1:n)-join: N(R1) = j
⇒ N(T1) = N(R2)
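
The expected values above are easy to recompute. The function name estimated_join_size and the cardinalities used below are assumptions chosen for illustration.

```python
def estimated_join_size(n_r1, n_r2, j):
    """Expected N(T1): each row of R2 meets N(R1)/j rows of R1.

    Assumes a lossless join and equally distributed join column values.
    """
    return n_r2 * n_r1 / j

# (n:m)-join: N(R1) = 1000 > j = 100, N(R2) = 500
n_to_m = estimated_join_size(1000, 500, 100)

# (1:n)-join: N(R1) = j = 100, so the result has as many rows as R2
one_to_n = estimated_join_size(100, 500, 100)
```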

Rewrite-Rules of the Relational Algebra (5)

• Determination of the Join Order:

⇒ Determine the join order in such a way that number and size of
intermediate objects are minimized


Rewrite-Rules of the Relational Algebra (6)


8. Order of Set Operations
ƒ Cardinality of union

MAX(N(R1) , N(R2)) ≤ N(R1 ∪ R2) ≤ N(R1) + N(R2)


ƒ Cardinality of intersection

0 ≤ N(R1 ∩ R2) ≤ MIN(N(R1) , N(R2))

ƒ Expectation:

ƒ Heuristic Rule:
⇒ In case of set operations, always combine the smallest tables first

Summary: Algebraic Optimization
Heuristic Rules:

1. Execute selections as early as possible


2. Execute projections (without duplicate elimination) early
3. Combine sequences of unary operations like selection and projection
4. Combine simple selections
5. Combine certain selections with a preceding Cartesian product to a
join
6. Evaluate common subtrees only once
7. Determine the join order in such a way that number and size of
intermediate objects are minimized
8. In case of set operations combine the smallest tables always first
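
Heuristic rule 1 can be illustrated with rewrite rule 6 (exchange of selection and Cartesian product). The sketch below uses two made-up tables and compares the size of the intermediate result when the selection is applied after or before the product; all names are illustrative.

```python
from itertools import product

# Made-up tables: r1(a, b) with 6 rows, r2(c) with 4 rows.
r1 = [(i, i % 3) for i in range(6)]
r2 = [(j,) for j in range(4)]

def f(t):
    """Selection predicate F: it only uses attribute b of r1."""
    return t[1] == 0

# sigma_F (r1 x r2): the full product is built first.
product_first = [t1 + t2 for t1, t2 in product(r1, r2)]
late = [t for t in product_first if f(t)]

# sigma_F (r1) x r2: the selection is pushed below the product.
r1_selected = [t for t in r1 if f(t)]
early = [t1 + t2 for t1, t2 in product(r1_selected, r2)]
```

Both plans return the same 8 rows, but the first one materializes 24 intermediate rows and the second only 8.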

SQL Anwendersoftware (AS)

Chapter 6. SQL

• Introduction
• Basic SQL Queries
• SQL Query Evaluation
• Advanced Retrieval
• Data Manipulation and Data Definition
• New Features in SQL:1999

Database Language SQL

• Most commonly used language to query and modify


databases based on the relational model.
• SQL provides means to:
ƒ define data (DDL = data definition language)
ƒ access data (query language)
ƒ insert, update, and delete data
(DML = data manipulation language)
ƒ control access to data
ƒ specify assertions for semantic integrity control
ƒ embed SQL statements into host languages
• SQL is a descriptive language, i.e.,
ƒ it allows users to specify WHAT data they need and
NOT HOW it should be derived from the database.

Overview of Commands
• Data Manipulation (DML):
ƒ SELECT
ƒ INSERT
ƒ UPDATE
ƒ DELETE
ƒ built-in functions: COUNT, SUM, AVG, MAX, MIN
• Data Definition (DDL):
ƒ CREATE SCHEMA
ƒ CREATE DOMAIN
ƒ CREATE TABLE
ƒ CREATE VIEW
ƒ ALTER TABLE
ƒ DROP SCHEMA
ƒ DROP DOMAIN
ƒ DROP TABLE
ƒ DROP VIEW
• Data Control:
ƒ constraints definitions with
- CREATE TABLE
- CREATE ASSERTION
- DROP ASSERTION
ƒ GRANT
ƒ REVOKE
ƒ COMMIT
ƒ ROLLBACK
• Embedded SQL:
ƒ DECLARE CURSOR
ƒ FETCH
ƒ CLOSE CURSOR
ƒ SET CONSTRAINTS
ƒ SET TRANSACTION
ƒ CREATE TEMPORARY TABLE

SQL History

• Since 1974 many language designs:


ƒ SQUARE: Specifying Queries As Relational Expressions
ƒ SEQUEL: Structured English Query Language (System R, IBM)
ƒ QUEL: Query Language (Ingres, Relational Technology)
ƒ RDML (Rdb/VMS, Digital Equipment)
ƒ OLQ, PRTV, . . .
• SEQUEL/2, a revised version of SEQUEL, was later called SQL
• First products based on SQL:
ƒ Oracle V.2 (1979) by Relational Software Inc. (later Oracle)
ƒ SQL/DS (1981) and DB2 (1983) by IBM
• SQL became the "de facto" standard in the relational world

SQL Standards (1 of 2)

• ANSI X3.135-1986 Database Language SQL (1986)
• ISO 9075-1987 Database Language SQL (1987)
• SQL-89
ƒ ANSI X3.135-1989 and ISO 9075:1989 (1989)
ƒ ANSI X3.168-1989 Database Language Embedded SQL (1989)
• SQL 2, SQL-92
ƒ ANSI X3.135-1992 and ISO/IEC 9075:1992 Information
Systems - Database Language - SQL (1992)
ƒ ISO 9075-3:1995 Call-Level Interface (SQL/CLI) (1995)
ƒ ISO 9075-4:1996 Persistent Stored Modules (SQL/PSM) (1996)
ƒ ISO 9075-10:1998 Object Language Bindings (SQL/OLB) (1998)

SQL Standards (2 of 2)

• SQL3, SQL:1999
ƒ ISO/IEC 9075-x:1999 (1999)
- Part 1: Framework (SQL/Framework)
- Part 2: Foundation (SQL/Foundation)
- Part 3: Call-Level Interface (SQL/CLI)
- Part 4: Persistent Stored Modules (SQL/PSM)
- Part 5: Host Language Bindings (SQL/Bindings)
ƒ ISO/IEC 9075-x:2000 (2000)
- Part 9: Management of External Data (SQL/MED)
- Part 10: Object Language Bindings (SQL/OLB)
- Part 13: SQL Routines and Types Using the Java(TM)
Programming Language (SQL/JRT)

Examples of Tables
parts
partno  version  projectno  part_description
P050    1.0      PJ23       bodywork
P050    2.0      PJ23       bodywork
P101    1.0      PJ23       front body section
P101    1.1      PJ23       front body section
P101    2.0      PJ23       front body section
P102    1.2      PJ23       a column
P103    1.2      PJ23       b column
P104    1.2      PJ23       c column
P111    1.0      PJ15       rear wing
P111    1.2      PJ15       rear wing
P112    1.0      PJ15       front wing

usage
partno  version  uses_partno  uses_version  quantity
P050    1.0      P101         1.0           1
P050    1.0      P102         1.2           2
P050    1.0      P103         1.2           2
P050    1.0      P104         1.2           2
P050    1.0      P111         1.0           2
P050    2.0      P101         1.1           1
P050    2.0      P102         1.2           2
P050    2.0      P103         1.2           2
P050    2.0      P104         1.2           2
P050    2.0      P111         1.2           2
P050    2.0      P112         1.0           2

projects
projectno manager description budget
PJ23 Miller main bodywork team 1 000 000
PJ15 Maynard specialized wings 100 000
PJ47 Morris electronics 500 000

Basic Structure of a SQL Query

• SQL is based on set operations and relational operations with
some modifications and enhancements.
• A typical SQL query has the form:

SELECT A1, A2, ..., An
FROM r1, r2, ..., rm
WHERE P

Where:
ƒ Ai represent attributes
ƒ ri represent tables
ƒ P is a predicate.
• The result of a SQL query is a table.

SELECT Clause (1 of 3)

• The SELECT clause is used to list the attributes desired in the
result of a query.
• SQL allows duplicates in tables as well as in query results.
• Example:
Find the names of all managers.
SELECT manager
FROM projects

Result:
manager
Miller
Maynard
Morris

• An asterisk in the select clause denotes "all attributes":
SELECT *
FROM projects

SELECT Clause (2 of 3)

• To force the elimination of duplicates, insert
the keyword DISTINCT after SELECT.
• Example:
Find the numbers of all projects in table
parts and remove duplicates.
SELECT DISTINCT projectno
FROM parts

Result:
projectno
PJ23
PJ15

• The keyword ALL specifies that duplicates
should not be removed.
• Example:
SELECT ALL projectno
FROM parts

Result:
projectno
PJ23
PJ23
PJ23
PJ23
PJ23
PJ23
PJ23
PJ23
PJ15
PJ15
PJ15

SELECT Clause (3 of 3)

• The SELECT clause can contain arithmetic expressions
involving the operators +, -, *, and /, and operating on
constants or attributes of tuples.
• Example:
SELECT projectno, budget * 1.1 AS new_budget
FROM projects

• The query returns the budget of
each project increased by 10%.

Result:
projectno  new_budget
PJ23       1 100 000
PJ15       110 000
PJ47       550 000

WHERE Clause (1 of 2)

• The WHERE clause selects the rows that meet a
set of predicates. Each predicate is defined on attributes of
the tables that appear in the FROM clause.
• SQL uses the logical connectives AND, OR, and NOT. It
allows the use of arithmetic expressions as operands to the
comparison operators.
• Example:
Find all projects that have a manager named 'Miller' and a
budget of more than $ 100 000.
SELECT projectno
FROM projects
WHERE manager = 'Miller'
AND budget > 100000

WHERE Clause (2 of 2)

• SQL includes a BETWEEN comparison operator in order to
simplify WHERE clauses that specify that a value be less than
or equal to some value and greater than or equal to some
other value.
• Example:
Find all projects that have a budget between $ 200,000 and
$ 1,000,000.

SELECT projectno
FROM projects
WHERE budget BETWEEN 200000
AND 1000000

Result:
projectno
PJ23
PJ47

FROM Clause

• The FROM clause corresponds to the Cartesian product. It
lists the tables to be scanned in the evaluation of the query.
• Example:
SELECT *
FROM parts, projects

partno version projectno description projectno manager description budget


P050 1.0 PJ23 bodywork PJ23 Miller Main bodywork team 1000000
P050 1.0 PJ23 bodywork PJ15 Maynard Specialized wings 100000
P050 1.0 PJ23 bodywork PJ47 Morris Electronics 500000
P050 2.0 PJ23 bodywork PJ23 Miller Main bodywork team 1000000
P050 2.0 PJ23 bodywork PJ15 Maynard Specialized wings 100000
P050 2.0 PJ23 bodywork PJ47 Morris Electronics 500000
P101 1.0 PJ23 front body section PJ23 Miller Main bodywork team 1000000
P101 1.0 PJ23 front body section PJ15 Maynard Specialized wings 100000
P101 1.0 PJ23 front body section PJ47 Morris Electronics 500000
P101 1.1 PJ23 front body section PJ23 Miller Main bodywork team 1000000
… … … … … … … …

Join Operation (1 of 2)

• Example:
List all parts of version 1.0 and the manager that is
responsible for the corresponding project.
SELECT partno, version, manager
FROM parts, projects
WHERE parts.projectno = projects.projectno    (join condition)
AND version = '1.0'                           (predicate on table parts)

Result:
partno  version  manager
P111    1.0      Maynard
P112    1.0      Maynard
P050    1.0      Miller
P101    1.0      Miller

Join Operation (2 of 2)

• There are several ways to provide the join condition.
• Example:
SELECT partno, version, manager
FROM parts, projects
WHERE parts.projectno = projects.projectno    (join condition)
or
FROM parts JOIN projects
ON parts.projectno = projects.projectno
or
FROM parts NATURAL JOIN projects    (join on all columns in both tables that share the same name)
or
FROM parts JOIN projects
USING(projectno)    (join on given columns)

WHERE version = '1.0'
(in the first variant, this predicate is appended with AND)
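
The four ways of writing the join can be compared directly. The sketch below uses Python's sqlite3 module with an in-memory database; this choice of system is an assumption made for illustration, and only a few rows of the sample tables are loaded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parts (partno TEXT, version TEXT, projectno TEXT, part_description TEXT);
CREATE TABLE projects (projectno TEXT, manager TEXT, description TEXT, budget INTEGER);
INSERT INTO parts VALUES
  ('P050', '1.0', 'PJ23', 'bodywork'),
  ('P050', '2.0', 'PJ23', 'bodywork'),
  ('P101', '1.0', 'PJ23', 'front body section'),
  ('P111', '1.0', 'PJ15', 'rear wing'),
  ('P112', '1.0', 'PJ15', 'front wing');
INSERT INTO projects VALUES
  ('PJ23', 'Miller', 'main bodywork team', 1000000),
  ('PJ15', 'Maynard', 'specialized wings', 100000),
  ('PJ47', 'Morris', 'electronics', 500000);
""")

# The same SELECT list combined with the four FROM/WHERE variants.
variants = {
    "where": "FROM parts, projects "
             "WHERE parts.projectno = projects.projectno AND version = '1.0'",
    "on": "FROM parts JOIN projects ON parts.projectno = projects.projectno "
          "WHERE version = '1.0'",
    "natural": "FROM parts NATURAL JOIN projects WHERE version = '1.0'",
    "using": "FROM parts JOIN projects USING (projectno) WHERE version = '1.0'",
}
results = {name: sorted(con.execute("SELECT partno, version, manager " + tail).fetchall())
           for name, tail in variants.items()}
```

NATURAL JOIN gives the same rows here only because projectno is the single column name shared by both tables (the parts column is called part_description). If both tables had a column named description, the natural join would also compare the descriptions.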


Tuple Variables

• Tuple variables are defined in the FROM clause using an AS
clause.
• Example:
Find all projects that have a budget higher than the project
of manager Maynard.
SELECT pj1.projectno, pj1.budget
FROM projects AS pj1, projects AS pj2
WHERE pj1.budget > pj2.budget    (join condition)
AND pj2.manager = 'Maynard'

Result:
projectno  budget
PJ23       1000000
PJ47       500000

String Operations

• SQL includes a string-matching operator for comparisons on
character strings. Patterns are described using two special
characters:
ƒ percent (%). The % character matches any substring.
ƒ underscore (_). The _ character matches any single character.
• Example:
Which part descriptions include 'wing'?
SELECT DISTINCT partno, description
FROM parts
WHERE description LIKE '%wing%'

Result:
partno description
P112 front wing
P111 rear wing

Ordering of Result Sets

• The ORDER BY clause defines an order for the rows
of a result set.
• Specify DESC for descending order or ASC for ascending
order, for each attribute; ascending order is the default.
• Without an ORDER BY clause, the
order is defined by the system.
• Example:
List the description of all parts in
descending order.
SELECT DISTINCT description
FROM parts
ORDER BY description DESC

Result:
description
rear wing
front wing
front body section
c column
bodywork
b column
a column

Aggregate Functions (1 of 2)

• SQL provides functions that operate on the multiset of values


of a table column, and return a value:
ƒ AVG: average value
ƒ MIN: minimum value
ƒ MAX: maximum value
ƒ SUM: sum of values
ƒ COUNT: number of rows or number of values
• Example:
Find the average budget of all projects.
SELECT AVG(budget)
FROM projects

Aggregate Functions (2 of 2)

• Example:
How many rows are in table parts?
SELECT COUNT(*) AS number_of_rows
FROM parts

Result:
number_of_rows
11

• Example:
How many different parts are stored in table parts?
SELECT COUNT(DISTINCT partno) AS number_of_parts
FROM parts

Result:
number_of_parts
7

GROUP BY Clause (1 of 2)

• The GROUP BY clause applies aggregate functions to
groups of rows.
• Note: Columns in the SELECT clause outside of aggregate
functions must appear in the GROUP BY list.
• Example:
How many versions exist for each part?

SELECT partno, COUNT(*) AS number_of_versions
FROM parts
GROUP BY partno

GROUP BY clause (2 of 2)

Table parts (11 rows, see Examples of Tables)

Grouping on partno (rows with the same partno form one group):
P050 (2 rows), P101 (3 rows), P102 (1 row), P103 (1 row),
P104 (1 row), P111 (2 rows), P112 (1 row)

Result:
partno  number_of_versions
P050    2
P101    3
P102    1
P103    1
P104    1
P111    2
P112    1

HAVING Clause

• The HAVING clause selects the groups that go into
the result.
• Note: Predicates in the HAVING clause are applied after the
formation of groups.
• Example:
Which project is responsible for more than two different
parts?
SELECT projectno, COUNT(DISTINCT partno)
FROM parts
GROUP BY projectno
HAVING COUNT(DISTINCT partno) > 2
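
Both grouping examples can be run against the sample table parts. The sketch below assumes SQLite via Python's sqlite3 module and loads only the three columns that the queries use.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parts (partno TEXT, version TEXT, projectno TEXT)")
con.executemany("INSERT INTO parts VALUES (?, ?, ?)", [
    ("P050", "1.0", "PJ23"), ("P050", "2.0", "PJ23"),
    ("P101", "1.0", "PJ23"), ("P101", "1.1", "PJ23"), ("P101", "2.0", "PJ23"),
    ("P102", "1.2", "PJ23"), ("P103", "1.2", "PJ23"), ("P104", "1.2", "PJ23"),
    ("P111", "1.0", "PJ15"), ("P111", "1.2", "PJ15"), ("P112", "1.0", "PJ15"),
])

# GROUP BY: one result row per part with its number of versions.
versions = con.execute("""
    SELECT partno, COUNT(*) AS number_of_versions
    FROM parts
    GROUP BY partno
    ORDER BY partno
""").fetchall()

# HAVING: keep only the groups (projects) with more than two different parts.
busy_projects = con.execute("""
    SELECT projectno, COUNT(DISTINCT partno)
    FROM parts
    GROUP BY projectno
    HAVING COUNT(DISTINCT partno) > 2
""").fetchall()
```

Project PJ15 is filtered out by HAVING because it covers only two different parts (P111 and P112).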

Model of SQL Query Evaluation (1 of 7)

• Step 1:
ƒ The table(s) that should be evaluated are determined by the
FROM clause. The multiple occurrence of a single table is
possible by using aliases, i.e., tuple variables.

SELECT X.A, COUNT(*)
FROM X, Y
WHERE X.B > 5
AND X.A = Y.C
GROUP BY X.A
HAVING SUM(Y.D) > 10
ORDER BY X.A DESC

X           Y
A  B        C  D
1  9        1  7
1  6        2  5
2  7        3  5
2  6        4  11
3  5
4  6

Model of SQL Query Evaluation (2 of 7)


• Step 2:
ƒ The Cartesian product of all tables
listed in the FROM clause is built.

SELECT X.A, COUNT(*)
FROM X, Y
WHERE X.B > 5
AND X.A = Y.C
GROUP BY X.A
HAVING SUM(Y.D) > 10
ORDER BY X.A DESC

Cartesian product of X and Y (24 rows):
A  B  C  D         A  B  C  D
1  9  1  7         2  6  1  7
1  9  2  5         2  6  2  5
1  9  3  5         2  6  3  5
1  9  4  11        2  6  4  11
1  6  1  7         3  5  1  7
1  6  2  5         3  5  2  5
1  6  3  5         3  5  3  5
1  6  4  11        3  5  4  11
2  7  1  7         4  6  1  7
2  7  2  5         4  6  2  5
2  7  3  5         4  6  3  5
2  7  4  11        4  6  4  11

Model of SQL Query Evaluation (3 of 7)


• Step 3:
ƒ All rows that satisfy the condition
of the WHERE clause are selected.
ƒ The predicate has to be "true"

SELECT X.A, COUNT(*)
FROM X, Y
WHERE X.B > 5
AND X.A = Y.C
GROUP BY X.A
HAVING SUM(Y.D) > 10
ORDER BY X.A DESC

Rows of the Cartesian product that satisfy the WHERE clause:
A  B  C  D
1  9  1  7
1  6  1  7
2  7  2  5
2  6  2  5
4  6  4  11

Model of SQL Query Evaluation (4 of 7)

• Step 4:
ƒ The selected rows are divided into groups as specified in the
GROUP BY clause.
ƒ Each group contains all rows with the same values of the
columns specified in the GROUP BY clause.

SELECT X.A, COUNT(*)
FROM X, Y
WHERE X.B > 5
AND X.A = Y.C
GROUP BY X.A
HAVING SUM(Y.D) > 10
ORDER BY X.A DESC

Groups:
A  B  C  D
1  9  1  7
1  6  1  7

A  B  C  D
2  7  2  5
2  6  2  5

A  B  C  D
4  6  4  11

Model of SQL Query Evaluation (5 of 7)

• Step 5:
ƒ Groups that satisfy the condition specified in the HAVING
clause are selected.
ƒ The predicate of the HAVING clause must evaluate to "true".
ƒ The HAVING condition must only be related to group
properties, i.e. columns specified in the GROUP BY clause and
the application of aggregate functions.

SELECT X.A, COUNT(*)
FROM X, Y
WHERE X.B > 5
AND X.A = Y.C
GROUP BY X.A
HAVING SUM(Y.D) > 10
ORDER BY X.A DESC

Groups:
A  B  C  D
1  9  1  7
1  6  1  7         SUM(Y.D) = 14, group is kept

A  B  C  D
2  7  2  5
2  6  2  5         SUM(Y.D) = 10, group is removed

A  B  C  D
4  6  4  11        SUM(Y.D) = 11, group is kept

Model of SQL Query Evaluation (6 of 7)

• Step 6:
ƒ The output is derived by evaluating the SELECT clause.
ƒ If a GROUP BY clause is specified, only expressions that
calculate a single value for each group are allowed as select
items, i.e., columns specified in the GROUP BY clause or the
application of aggregate functions.

SELECT X.A, COUNT(*)
FROM X, Y
WHERE X.B > 5
AND X.A = Y.C
GROUP BY X.A
HAVING SUM(Y.D) > 10
ORDER BY X.A DESC

Remaining groups:
A  B  C  D
1  9  1  7
1  6  1  7

A  B  C  D
4  6  4  11

Result:
A  COUNT(*)
1  2
4  1

Model of SQL Query Evaluation (7 of 7)

• Step 7:
ƒ Order the tuples in the result by the values of one or more
attributes according to the ORDER BY clause.
• Remark: Steps 1-7 show how the result of a given query
could be derived. The database system may choose an
alternative, more efficient way of evaluating an SQL query.

SELECT X.A, COUNT(*)
FROM X, Y
WHERE X.B > 5
AND X.A = Y.C
GROUP BY X.A
HAVING SUM(Y.D) > 10
ORDER BY X.A DESC

Result:
A  COUNT(*)
4  1
1  2
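
The outcome of the seven steps can be checked by running the query on the tables X and Y. The sketch assumes SQLite via Python's sqlite3 module; the system is free to evaluate the query differently internally, but the result must be the same.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE X (A INTEGER, B INTEGER);
CREATE TABLE Y (C INTEGER, D INTEGER);
INSERT INTO X VALUES (1, 9), (1, 6), (2, 7), (2, 6), (3, 5), (4, 6);
INSERT INTO Y VALUES (1, 7), (2, 5), (3, 5), (4, 11);
""")

# Step 2: the Cartesian product has 6 * 4 rows.
product_size = con.execute("SELECT COUNT(*) FROM X, Y").fetchone()[0]

# Step 3: the rows that survive the WHERE clause.
selected = con.execute(
    "SELECT COUNT(*) FROM X, Y WHERE X.B > 5 AND X.A = Y.C").fetchone()[0]

# Steps 1-7: the complete query.
result = con.execute("""
    SELECT X.A, COUNT(*)
    FROM X, Y
    WHERE X.B > 5
      AND X.A = Y.C
    GROUP BY X.A
    HAVING SUM(Y.D) > 10
    ORDER BY X.A DESC
""").fetchall()
```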

Null Values

• It is possible for rows to have a null value, denoted by NULL,
for some of their attributes. NULL signifies "unknown value"
or "value does not exist".
• Any arithmetic expression involving NULL results in NULL.
• All aggregate operations except COUNT(*) ignore tuples
with NULL values on the aggregated columns.
• Roughly speaking, all comparisons involving NULL evaluate to
unknown, which the WHERE clause treats like false.
• Use IS NULL or IS NOT NULL to check for NULL values.
• Example: SELECT partno
FROM parts
WHERE projectno IS NULL
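
The rules for NULL can be observed on a two-row table. The sketch assumes SQLite via Python's sqlite3 module; part P200 is a made-up part without a project.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parts (partno TEXT, projectno TEXT)")
con.executemany("INSERT INTO parts VALUES (?, ?)",
                [("P050", "PJ23"), ("P200", None)])   # None is stored as NULL

# A comparison with NULL never evaluates to true, so no row qualifies.
equals_null = con.execute(
    "SELECT partno FROM parts WHERE projectno = NULL").fetchall()

# IS NULL is the correct test.
is_null = con.execute(
    "SELECT partno FROM parts WHERE projectno IS NULL").fetchall()

# COUNT(*) counts rows, COUNT(projectno) ignores NULL values.
counts = con.execute("SELECT COUNT(*), COUNT(projectno) FROM parts").fetchone()

# Arithmetic with NULL yields NULL (returned as None in Python).
arithmetic = con.execute("SELECT 1 + NULL").fetchone()[0]
```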

Nested Subqueries

• SQL provides a mechanism for the nesting of subqueries.


• A subquery is a SELECT-FROM-WHERE expression that is
nested within another query.
• A common use of subqueries is to perform tests for set
membership, set comparisons, and set cardinality.
• The inner and the outer table could be identical.
• In general, we can have several levels of nested queries.
• Correlation names (tuple variables) are necessary when the inner
query refers to rows of the outer query.

Set Membership

• A predicate specified in the WHERE clause can check the set
membership of an attribute as follows:
ƒ explicit set definition:
Ai IN (a1, aj, ak)
ƒ implicit set definition:
Ai IN (SELECT …)
ƒ Illustration:
(5 IN {0, 4, 5}) = true
(5 IN {0, 4, 6}) = false
(5 NOT IN {0, 4, 6}) = true
• Example:
Which projects do not work on any part?

SELECT projectno
FROM projects
WHERE projectno NOT IN (
SELECT projectno
FROM parts)

Result:
projectno
PJ47

Other predicates for subqueries

• The EXISTS construct returns the value true if the argument
subquery is nonempty.
• Example:
Find all projects that currently work on at least one part.
SELECT projectno
FROM projects pj
WHERE EXISTS (
SELECT projectno
FROM parts p
WHERE pj.projectno = p.projectno)

• The UNIQUE construct tests whether a subquery has any
duplicate rows in its result.
• The SOME (ALL) construct tests whether a comparison holds for
some (all) rows in the subquery.
• Example:
Which parts are used in at least one of the other parts?
SELECT partno
FROM parts
WHERE partno = SOME (
SELECT uses_partno
FROM usage)

Derived Tables

• Several types of tables may be provided in a FROM clause:
ƒ base tables
ƒ views
ƒ tables that are calculated by nested select statements
• Example:
Which parts of version 1.0 are designed by project PJ23?
SELECT p1.partno
FROM (SELECT partno, projectno
FROM parts
WHERE version = '1.0') AS p1
WHERE p1.projectno = 'PJ23'

Derived table p1:
partno  projectno
P050    PJ23
P101    PJ23
P111    PJ15
P112    PJ15

Views (1 of 2)

• Provide a mechanism to hide certain data from the view of
certain users.
• To create a view we use the command:
CREATE VIEW view [ (column-commalist ) ] AS table-exp
[WITH [ CASCADED | LOCAL] CHECK OPTION]

ƒ Where table-exp is any legal table expression
ƒ The view name is represented by “view”

• A view is a named virtual table (query) that is computed from
one or more underlying tables (base tables or even views).
• Corresponds to the external schema of ANSI/SPARC
(typically a user sees more than one view and base table)

Views (2 of 2)

• Example:
Create a view that holds all parts of version 1.0
CREATE VIEW V1_parts (PNO, PROJNO)
AS
SELECT partno, projectno
FROM parts
WHERE version ='1.0'
• Advantages
ƒ More user friendly
ƒ Higher degree of data independence
• Properties of views
ƒ A view can be handled like a table
ƒ Views on views are possible
ƒ Semantics of views: dynamic window on the base tables
ƒ Limited updates: updatable and non-updatable views

Updatable Views

• Informal rule: For a view to be updatable, the DBMS must be


able to trace any row or column back to its row or column in
the source table
• Definition by ISO standard: A view is updatable if and only if:
ƒ DISTINCT is not specified
ƒ Every element in the SELECT list of the defining query is a
column name (rather than a constant, expression, or aggregate
function) and no column name appears more than once
ƒ The FROM clause specifies only one table: i.e., single source for
the view, no JOIN, UNION, INTERSECT, or EXCEPT
ƒ The WHERE clause does not include any nested SELECTs that
reference the table in the FROM clause
ƒ There is no GROUP BY or HAVING clause in the defining query

Set Operations

• The operations UNION, INTERSECT, and EXCEPT operate on


tables and correspond to the common set operations.
• To retain duplicates use the corresponding multiset versions
UNION ALL, INTERSECT ALL and EXCEPT ALL.
• If a row occurs m times in R and n times in S, then, it occurs
ƒ m+n times in R UNION ALL S
ƒ min(m, n) times in R INTERSECT ALL S
ƒ max(0, m-n) times in R EXCEPT ALL S
• Example:
SELECT projectno
FROM projects
INTERSECT
SELECT projectno
FROM parts

Result:
projectno
PJ23
PJ15
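
The multiplicity rules for UNION ALL, INTERSECT ALL and EXCEPT ALL match the arithmetic of Python's collections.Counter, which is used here only as a stand-in for multisets of rows. The multiplicities below are made up.

```python
from collections import Counter

# A row value mapped to the number of times it occurs in R and in S.
R = Counter({"PJ23": 3, "PJ15": 1})
S = Counter({"PJ23": 2, "PJ47": 1})

union_all = R + S        # m + n occurrences
intersect_all = R & S    # min(m, n) occurrences
except_all = R - S       # max(0, m - n) occurrences

# The set versions (UNION, INTERSECT, EXCEPT) keep each row at most once.
union = set(R) | set(S)
intersect = set(R) & set(S)
except_ = set(R) - set(S)
```

Note the difference for PJ23: it is still contained once in R EXCEPT ALL S (3 - 2 = 1), but not in R EXCEPT S.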
44

22
SQL Anwendersoftware (AS)

Outline

• Introduction
• Basic SQL Queries
• SQL Query Evaluation
• Advanced Retrieval
• Data Manipulation and Data Definition
• New Features in SQL:1999


Deleting Rows

• The DELETE statement removes rows from a table.
• The WHERE clause may be used to specify the rows to be
deleted.
• Example:
Delete all rows in table projects.
DELETE
FROM projects

• Example:
Delete part P101 from table parts.
DELETE
FROM parts
WHERE partno = 'P101'


Inserting Rows

• The INSERT statement offers two ways to provide
the rows that should be inserted into a table:
ƒ list of values (VALUES clause)
ƒ result of a query
• Examples:
Insert part P120 into table parts.
INSERT INTO parts (partno, version, projectno, description)
VALUES ('P120', '1.0', 'PJ47', 'fuel tank')
For each part insert the latest version as well as the number
of versions into table num_versions.
INSERT INTO num_versions
SELECT partno, MAX(version), MAX(description), COUNT(*)
FROM parts
GROUP BY partno;

Updating Rows

• The UPDATE statement changes existing rows:
ƒ Use the SET clause to assign the new values
ƒ Use the WHERE clause to specify the rows that should be
updated
• Example:
Increase the budget of each project managed by 'Morris' by
10%.
UPDATE projects
SET budget = budget * 1.1
WHERE manager = 'Morris'


Creating Base Tables

• The CREATE TABLE statement defines:
ƒ the columns of a table and their data type/domain (1)
ƒ constraints that define conditions for valid rows in the table (2)
• Example:
CREATE TABLE num_versions (
  -- (1) columns with data types
  partno CHAR(4) NOT NULL,
  version VARCHAR(10) NOT NULL,
  description VARCHAR(50),
  version_count INT,
  -- (2) constraints
  CONSTRAINT num_versions_pk
    PRIMARY KEY (partno, version),
  CONSTRAINT count
    CHECK( version_count>0))
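
How the constraints reject invalid rows can be sketched as follows. The sketch assumes Python's sqlite3 module rather than an SQL:1999 system; the helper try_insert, the sample rows, and the constraint name positive_count (used here instead of count) are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE num_versions (
        partno        CHAR(4)     NOT NULL,
        version       VARCHAR(10) NOT NULL,
        description   VARCHAR(50),
        version_count INT,
        CONSTRAINT num_versions_pk PRIMARY KEY (partno, version),
        CONSTRAINT positive_count CHECK (version_count > 0))
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO num_versions VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert(("P120", "1.0", "fuel tank", 1))
duplicate_key = try_insert(("P120", "1.0", "fuel tank", 2))   # violates PRIMARY KEY
bad_count = try_insert(("P121", "1.0", "pump", 0))            # violates CHECK
missing_version = try_insert(("P122", None, "valve", 1))      # violates NOT NULL
print(ok, duplicate_key, bad_count, missing_version)
```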

Data Types

• Some important data types of SQL:1999:
ƒ char(n): Fixed length character string, with user-specified length n.
ƒ varchar(n): Variable length character string, with user-specified
maximum length n.
ƒ int: Integer, a finite subset of the integers that is machine-dependent.
ƒ smallint: Small integer, a machine-dependent subset of integers.
ƒ numeric(p,d): Fixed point number, with user-specified precision of p
digits, with d digits to the right of the decimal point.
ƒ real, double precision: Floating point and double-precision floating
point numbers, with machine-dependent precision.
ƒ float(n): Floating point number, with user-specified precision of at
least n digits.
ƒ date: Dates, containing a (4 digit) year, month and day.
ƒ time: Time of day, in hours, minutes and seconds.

Defining Domains

• A DOMAIN bundles a specific data type with some additional
characteristics.
• The CREATE DOMAIN statement includes a data type as well
as optional default values and constraints.
• Example:
CREATE DOMAIN counter AS INT
  DEFAULT 1
  CONSTRAINT not_null
    CHECK( VALUE IS NOT NULL)
  CONSTRAINT count
    CHECK( VALUE > 0)


Removing and Altering Objects

• Base tables, domains and other objects can be removed by
the DROP statement.
• Example:
DROP TABLE num_versions

• The definition of a base table may be changed by the
ALTER TABLE statement.
• It can be used to:
ƒ add or remove columns
ƒ add or remove constraints
• Example:
ALTER TABLE parts
  DROP COLUMN description
  ADD COLUMN new_description
    VARCHAR(100)
    DEFAULT ' '
  ADD CONSTRAINT desc_not_null
    CHECK(
      new_description IS NOT NULL)

Integrity Constraints in SQL (1 of 2)

• Semantic Integrity Constraints
ƒ Only 'useful' and 'permissible' updates of the DB
ƒ Closest possible correspondence of DB and miniworld
(quality of data)
⇒ Integrity constraints of the miniworld must be specified explicitly to
enable automatic control
• Logical consistency requires physical consistency of the DB
ƒ Consistency of devices
ƒ Consistency of storage structures/access paths
• Logical Consistency
ƒ Model-inherent conditions (e.g., relational invariants)
ƒ User-defined conditions of the miniworld


Integrity Constraints in SQL (2 of 2)

• CHECK constraints with domain, table and attribute definition
• NOT NULL constraint, UNIQUE, PRIMARY KEY
• Foreign key constraints (FOREIGN KEY clause)
• Example: DEP.NO_OF_EMPL equals the number of employees that belong to
this department
CREATE ASSERTION A1
  CHECK (NOT EXISTS
    (SELECT *
     FROM DEP
     WHERE DEP.NO_OF_EMPL <>
       (SELECT COUNT (*)
        FROM EMPL
        WHERE EMPL.DNO = DEP.DNO)))
  INITIALLY DEFERRED DEFERRABLE

• Specifying the time of evaluation:
ƒ IMMEDIATE: at end of update operation (default)
ƒ DEFERRED: at end of transaction (COMMIT)


Other Important Concepts

• Some other important concepts in SQL:1999:


ƒ Routine: A procedure, function or method that is known (in
some cases also stored) by the system. It can be written in SQL
or an external host language.
ƒ Schema: A named collection of objects in the database.
ƒ Catalog: A named collection of schemas in a database.
ƒ User: Authorization identifier to control access to the database.
ƒ Privilege: Defines the allowed operations for each user.


Trigger Concept (1 of 2)

• A trigger specifies a condition and an action to be taken in
case that condition is satisfied
• Triggers can start further updates automatically to ensure DB
integrity
• Example: DEP.SUM_OF_SALARIES is the sum of the salaries of all employees
that belong to this department
CREATE TRIGGER T1
  AFTER UPDATE OF SALARY ON EMPL       -- event
  REFERENCING OLD AS OP NEW AS NP
  FOR EACH ROW
  WHEN (NP.DNO = OP.DNO)               -- condition
  UPDATE DEP                           -- action
    SET SUM_OF_SALARIES =
      SUM_OF_SALARIES + (NP.SALARY - OP.SALARY)
    WHERE DEP.DNO = NP.DNO
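
The effect of such a trigger can be observed in a small sketch. It assumes Python's sqlite3 module; SQLite has no REFERENCING clause and uses the fixed names OLD and NEW instead, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dep  (dno INTEGER PRIMARY KEY, sum_of_salaries INTEGER);
    CREATE TABLE empl (eno INTEGER PRIMARY KEY, dno INTEGER, salary INTEGER);
    INSERT INTO dep  VALUES (1, 300);
    INSERT INTO empl VALUES (10, 1, 100), (11, 1, 200);

    CREATE TRIGGER t1
    AFTER UPDATE OF salary ON empl          -- event
    FOR EACH ROW
    WHEN NEW.dno = OLD.dno                  -- condition
    BEGIN
        UPDATE dep                          -- action
        SET sum_of_salaries = sum_of_salaries + (NEW.salary - OLD.salary)
        WHERE dno = NEW.dno;
    END;
""")

# Raising one salary by 50 fires the trigger once for the changed row.
conn.execute("UPDATE empl SET salary = 150 WHERE eno = 10")
total = conn.execute("SELECT sum_of_salaries FROM dep WHERE dno = 1").fetchone()[0]
print(total)
```

The application only updates EMPL; the derived value in DEP is kept consistent by the trigger.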


Trigger basic format

CREATE TRIGGER <TriggerName>
{BEFORE | AFTER} <triggerEvent> ON <TableName>
[REFERENCING <oldOrNewValuesAliasList>]
[FOR EACH {ROW | STATEMENT}]
[WHEN (<triggerCondition>)]
<triggerBody>
• Execution:
ƒ FOR EACH ROW: row-level trigger
ƒ FOR EACH STATEMENT (default): only once for the entire event
• <oldOrNewValuesAliasList>:
ƒ OLD/NEW or OLD ROW/NEW ROW: row-level trigger
ƒ OLD TABLE/NEW TABLE: AFTER trigger
ƒ no old values for INSERT events, no new values for DELETE events


Trigger Concept (2 of 2)

• Time: {BEFORE / AFTER} {INSERT / DELETE / UPDATE}


• WHEN condition is optional
• Multiple condition/action pairs and multiple actions per
condition are possible
• Nested activations of triggers are possible
• Expensive, recursion is difficult to detect


Object-Oriented Extensions in SQL:1999

• User-defined structured types


ƒ Type hierarchy supported
ƒ Type-specific behaviour may be specified as methods
ƒ Structured types may be used in
- Column definitions
- Table definitions
- View definitions
• Typed tables and typed views
ƒ Tables and views each of whose rows is an instance of a
structured type
ƒ Table hierarchy or view hierarchy supported
ƒ Reference types allow navigational access


User-Defined Structured Types

• Example: Type hierarchy
CREATE TYPE part_t AS (
  partno CHAR(4),
  version VARCHAR(10),
  projectno CHAR(4),
  description VARCHAR(50))
  REF USING INT        -- objects are identified by integer values

CREATE TYPE colored_part_t UNDER part_t AS (
  color_id INT)
  NOT FINAL            -- inherits all components of type part_t

Typed Tables

• Example: Table hierarchy based on a type hierarchy
CREATE TABLE parts OF part_t (
  partno WITH OPTIONS NOT NULL,        -- column constraints
  version WITH OPTIONS NOT NULL,
  projectno WITH OPTIONS NOT NULL,
  REF IS oid USER GENERATED)

REF IS defines the self-referencing column oid, which contains in each row a value
that uniquely identifies the row. In this case the identifier is provided by the user.

CREATE TABLE colored_parts OF colored_part_t UNDER parts
  INHERIT SELECT PRIVILEGES (
  color_id WITH OPTIONS NOT NULL)

parts
oid  partno  version  projectno  description
1    P050    1.0      PJ23       bodywork
…

colored_parts
oid  partno  version  projectno  description  color_id
10   P102    1.2      PJ23       a column     117
…


OLAP Support in SQL:1999

• Defined in Amendment 1: Online Analytical Processing (SQL/OLAP) to
Parts 1, 2 and 5 of SQL:1999
• WINDOWs allow aggregate functions to be applied to the
current row and its neighboring rows.

partno  date        sales  Sum(sales)
P050    2003-09-15  500    800
P050    2003-09-16  200    850
P050    2003-09-17  100    1050
P050    2003-09-18  550    950
P050    2003-09-19  400
P101    2003-09-15  600
P101    2003-09-16  300

• GROUP BY clause is extended by the CUBE and ROLLUP
keywords, which allow multidimensional summaries.
[Figure: data cube with the dimensions partno (P050, P101), date (2003-09-15 to 2003-09-19) and customer]
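
A windowed aggregate over the sales rows above might look like the following sketch. It assumes Python's sqlite3 module with SQLite 3.25 or later (for window functions) and a table named part_sales. The frame "current row and the two following rows per part" is an assumption chosen because it reproduces the four Sum(sales) values shown; the slide does not state the window definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part_sales (partno TEXT, date TEXT, sales INTEGER);
    INSERT INTO part_sales VALUES
        ('P050', '2003-09-15', 500), ('P050', '2003-09-16', 200),
        ('P050', '2003-09-17', 100), ('P050', '2003-09-18', 550),
        ('P050', '2003-09-19', 400), ('P101', '2003-09-15', 600),
        ('P101', '2003-09-16', 300);
""")

rows = conn.execute("""
    SELECT partno, date, sales,
           SUM(sales) OVER (PARTITION BY partno ORDER BY date
                            ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS sum_sales
    FROM part_sales
    ORDER BY partno, date
""").fetchall()

# One aggregate value per input row; no rows are collapsed as with GROUP BY.
window_sums = [row[3] for row in rows]
print(window_sums)
```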


Foreign Data Management in SQL:1999

• SQL/MED (SQL:1999 Part 9) provides an interface by which
SQL servers can access data managed by other servers, i.e.,
ƒ SQL servers of different vendors
ƒ servers managing non-SQL data
• Architecture:
ƒ Foreign data wrapper mediates the access to the foreign server
and provides external data in tabular form.
[Figure: the SQL server reaches each foreign data wrapper through the SQL/MED API; each wrapper accesses its foreign server and foreign tables through implementation-defined interfaces]


Upcoming Features

• The following features are considered for SQL:200x:


ƒ XML Support (new Part SQL/XML), e.g., to provide mappings
between SQL data values and XML data
ƒ New data types, e.g., MULTISETs
ƒ Triggers on views
ƒ SQL-invoked routines will be able to return MULTISETs
ƒ Improved security, e.g., SQL-invoked routines using invoker's
rights
ƒ …


Literature

• H. Garcia-Molina, J. D. Ullman, J. Widom: Database Systems:
The Complete Book, Prentice Hall, 2002.
• J. Melton, A. R. Simon: SQL:1999: Understanding Relational
Language Components, Morgan Kaufmann, 2002.
• J. Melton: SQL:1999: Understanding Object-Relational and
Other Advanced Features, Morgan Kaufmann, 2003.
• A. Silberschatz, H.F. Korth, S. Sudarshan: Database System
Concepts, Fourth Edition, McGraw-Hill, 2002.

Database Programming Anwendersoftware (AS)

7. Database Programming

• Introduction
• Direct Invocation
• Embedded SQL and SQLJ
• Module Language
• Dynamic SQL
• Call-Level APIs
ƒ SQL/CLI
ƒ ODBC
ƒ JDBC


Application Interface Mechanisms for SQL

• Direct invocation:
Invoke SQL statements directly
• Embedded SQL:
Embed SQL statements directly in
a host language program
• Module language:
Write SQL statements separately
in a module and call them from a
host language program
• Call-level APIs:
Invoke SQL statements through a
functional interface
• Static SQL vs. dynamic SQL
[Figure: an application program written in a host language accesses the DB; the form of the DB access depends on the application interface mechanism]


Direct invocation

• The following types of statements may be invoked directly:


ƒ SQL schema statements
ƒ SQL transaction statements
ƒ SQL connection statements
ƒ SQL session statements
ƒ multiple row SELECT statements
ƒ INSERT statements
ƒ searched UPDATE statements
ƒ searched DELETE statements
ƒ temporary table declaration

(ISO/IEC 9075:1999, Information Technology - Database languages - SQL -
Part 5: Host Language Bindings (SQL/Bindings), July, 1999)

Embedded SQL

• Program in some conventional host language.
• Some parts of the program are actually static SQL statements.
• A preprocessor is necessary to translate embedded SQL requests
into legal code of the host language.
• Preprocessed host-language program is then compiled in the usual manner.
• DBMS vendor provides a library that supplies the necessary function
definitions.
• SQL standard provides a binding with eight languages:
Ada, C, COBOL, Fortran, Mumps, Pascal, PL/I, Java
[Figure: host language + embedded SQL → preprocessor → host language + function calls → host-language compiler (with SQL library) → object-code program]

(H. Garcia-Molina, J. Ullman, J. Widom: Database Systems: The Complete Book, 2002)


Impedance Mismatch

• Data access methods of SQL and other programming languages differ:
ƒ SQL: set-at-a-time
ƒ Other: can only handle a finite, known number of items
• Data model of SQL differs from the model of other languages
ƒ Programming language: integer, real, character, pointer,
record structures, arrays, no sets
ƒ SQL: relational data model, no pointers, no loops and branches

Shared Variables

• Shared variables (host variables) are used to transfer
information between database and host-language program
• Shared variables are declared in a DECLARE section:
EXEC SQL BEGIN DECLARE SECTION;

EXEC SQL END DECLARE SECTION;
ƒ Main part of a declare section depends on the host language
• A shared variable can be used in SQL statements in places
where a constant is allowed (variable name preceded by a
colon)
• Special variable SQLSTATE indicates any problems found
during call to SQL library:
ƒ '00000': no error condition, '02000': no tuple found

Examples of Tables

parts
partno  version  projectno  part_description
P050    1.0      PJ23       bodywork
P050    2.0      PJ23       bodywork
P101    1.0      PJ23       front body section
P101    1.1      PJ23       front body section
P101    2.0      PJ23       front body section
P102    1.2      PJ23       a column
P103    1.2      PJ23       b column
P104    1.2      PJ23       c column
P111    1.0      PJ15       rear wing
P111    1.2      PJ15       rear wing
P112    1.0      PJ15       front wing

usage
partno  version  uses_partno  uses_version  quantity
P050    1.0      P101         1.0           1
P050    1.0      P102         1.2           2
P050    1.0      P103         1.2           2
P050    1.0      P104         1.2           2
P050    1.0      P111         1.0           2
P050    2.0      P101         1.1           1
P050    2.0      P102         1.2           2
P050    2.0      P103         1.2           2
P050    2.0      P104         1.2           2
P050    2.0      P111         1.2           2
P050    2.0      P112         1.0           2

projects
projectno  manager  description         budget
PJ23       Miller   main bodywork team  1 000 000
PJ15       Maynard  specialized wings   100 000
PJ47       Morris   electronics         500 000


Example: Shared Variables and INSERT


void getParts() {
    EXEC SQL BEGIN DECLARE SECTION;
        /* one extra byte per string for the terminating '\0' */
        char part[5], project[5], version[11], description[51];
        char SQLSTATE[6];
    EXEC SQL END DECLARE SECTION;

    /* request part, project, version, description */

    EXEC SQL INSERT INTO parts(partno, version,
                               projectno, part_description)
             VALUES (:part, :version, :project, :description);
}


Example: Single-Row SELECT Statements


void getNumProjects() {
EXEC SQL BEGIN DECLARE SECTION;
int num, budget;
char SQLSTATE[6];
EXEC SQL END DECLARE SECTION;

/* request budget */

EXEC SQL SELECT COUNT(*)


INTO :num
FROM projects
WHERE budget >= :budget;

/* check that SQLSTATE has all 0's */


/* and if so print the value of num */
}


Cursors (1 of 2)

• A cursor allows a program to run through the tuples of a relation one at a time.


• To create and use a cursor, the following statements are
needed:
ƒ A cursor declaration:
EXEC SQL DECLARE <cursor> CURSOR FOR <query>
- replace <cursor> by the name of the cursor
- replace <query> by an expression that results in a relation. The
declared cursor ranges over the tuples of this relation.
ƒ Cursor initialization:
EXEC SQL OPEN <cursor>
- Initialize the cursor to a position where it is ready to retrieve the
first tuple of the relation over which the cursor ranges.
- Query is evaluated.


Cursors (2 of 2)

ƒ Fetch tuples:
EXEC SQL FETCH FROM <cursor> INTO <variables>
- Get the next tuple of the relation over which the cursor ranges.
- Repeated calls to FETCH return successive tuples of the query result.
- If the tuples have been exhausted, then the value of SQLSTATE is
set to '02000'.
ƒ Close cursor:
EXEC SQL CLOSE <cursor>
- Close cursor, i.e., the cursor no longer ranges over tuples of the
relation.
• Cursors may also be used to update the current tuple.


Example: Cursor
#include <stdio.h>
#include <string.h>

void getAllProjects() {
    EXEC SQL BEGIN DECLARE SECTION;
        char project[5], description[51];  /* one extra byte for '\0' */
        char SQLSTATE[6];
    EXEC SQL END DECLARE SECTION;
    EXEC SQL DECLARE execCursor CURSOR FOR
        SELECT projectno, description
        FROM projects;

    EXEC SQL OPEN execCursor;

    while (1) {
        EXEC SQL FETCH FROM execCursor
                 INTO :project, :description;
        /* '02000': no (more) tuples found */
        if (strcmp(SQLSTATE, "02000") == 0) break;
        printf("projectno: %s, description: %s\n",
               project, description);
    }
    EXEC SQL CLOSE execCursor;
}
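
The same open/fetch/close pattern shows up in call-level interfaces. As a rough analogue, and not embedded SQL, the following sketch uses the cursor object of Python's sqlite3 module on the sample projects table; the mapping of calls to DECLARE, OPEN, FETCH and CLOSE in the comments is only an approximation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (projectno TEXT, manager TEXT, description TEXT, budget INTEGER);
    INSERT INTO projects VALUES
        ('PJ23', 'Miller',  'main bodywork team', 1000000),
        ('PJ15', 'Maynard', 'specialized wings',   100000),
        ('PJ47', 'Morris',  'electronics',         500000);
""")

cursor = conn.cursor()                                   # roughly DECLARE
cursor.execute("SELECT projectno, description "
               "FROM projects ORDER BY projectno")       # roughly OPEN: query is evaluated
fetched = []
while True:
    row = cursor.fetchone()                              # FETCH the next tuple
    if row is None:                                      # plays the role of SQLSTATE '02000'
        break
    fetched.append(row)
cursor.close()                                           # CLOSE
print(fetched)
```

The loop handles one tuple per iteration, which is how a host language bridges the set-at-a-time result of the query.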

SQLJ

• Allows static SQL statements to be embedded in Java
programs.
• SQLJ was defined by an informal open group of companies
(IBM, Informix, Microsoft, Oracle, Sun, Sybase, ...).
• SQLJ consists of three parts:
ƒ Part 0: Embedded SQL in Java
ƒ Part 1: SQL Routines using Java
ƒ Part 2: SQL Types using Java
• Parts of SQLJ are adopted by the SQL standard.
ƒ Part 0: SQL - Part 10: Object Language Bindings (SQL/OLB)

(J. Melton, A. Eisenberg: Understanding SQL and Java Together, 2000)

(ISO/IEC 9075:1999, Information Technology - Database languages - SQL - Part 10: Object Language Bindings (SQL/OLB))

Module Language

• Separates SQL statements from the application program.
ƒ A module includes procedures and declarations of cursors and
temporary tables. It is stored in the database.
ƒ The application may call the procedures of a module.
ƒ SQL statements and the application program are combined by the
linker.
• Sample Module:
MODULE projects_module
  NAMES ARE ascii LANGUAGE C
  SCHEMA user_schema AUTHORIZATION user

PROCEDURE num_projects
  ( :budget INTEGER,
    :num INTEGER, SQLSTATE )
  SELECT COUNT(*)
  INTO :num
  FROM projects
  WHERE budget >= :budget;

(J. Melton, A. R. Simon: SQL:1999 Understanding Relational Language Components, 2002)


Dynamic SQL

• Allows applications to execute SQL statements that are constructed and
submitted at runtime, i.e., it is not known in advance:
ƒ whether a statement will retrieve data from or store data into the
database, and
ƒ how many parameters and host variables are used and what types
they have.
• Two ways of statement execution:
ƒ Prepare and Execute: The same statement can be run as many
times as required. Results of the preparation, e.g., the
execution plan, are preserved.
ƒ Execute Immediate: If the same statement is executed again, the
preparation overhead is incurred again.
• Dynamic SQL can ask the DBMS to describe SQL
statements before execution, i.e., it provides information on input
and output parameters, etc. in a descriptor area.
(J. Melton, A. R. Simon: SQL:1999 Understanding Relational Language Components, 2002)


Example: Dynamic SQL


/* execute statement only once */
EXEC SQL EXECUTE IMMEDIATE
    "UPDATE projects SET budget = 10000000 WHERE projectno = 'PJ47'";

/* prepare and execute statement */
dynstmt = "DYN1";
temp = "UPDATE projects SET budget = 1000000 WHERE projectno = ?";
EXEC SQL PREPARE :dynstmt FROM :temp;

prjno = "PJ47";
EXEC SQL EXECUTE :dynstmt USING :prjno;
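
The idea of building the statement text at runtime and supplying values through parameter markers can be sketched with Python's sqlite3 module. This is an analogue, not dynamic SQL in the embedded-SQL sense; the variable column and the second update are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (projectno TEXT PRIMARY KEY, budget INTEGER);
    INSERT INTO projects VALUES ('PJ23', 1000000), ('PJ15', 100000), ('PJ47', 500000);
""")

# The statement text is only assembled at runtime; '?' are parameter markers.
column = "budget"   # hypothetical runtime choice; must come from trusted code, not user input
statement = f"UPDATE projects SET {column} = ? WHERE projectno = ?"

# Same statement text, different parameter values on each execution.
for new_budget, prjno in [(1000000, "PJ47"), (250000, "PJ15")]:
    conn.execute(statement, (new_budget, prjno))

budgets = dict(conn.execute("SELECT projectno, budget FROM projects"))
print(budgets)
```

Values passed through parameter markers are never spliced into the statement text, which is the main practical reason to prefer them over string concatenation.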


Call-Level APIs

• ODBC (Open Database Connectivity)


ƒ In 1992 a number of database vendors came together to define a
new API for SQL database access.
ƒ Microsoft adopted the interface and trademarked the initials ODBC.
• SQL/CLI
ƒ A formal consortium of database vendors (SQL Access Group) took
over the development of the interface, now called CLI (Call-Level
Interface).
ƒ The result was published as a new part of the SQL-92 standard in
1995. This is now part 3 of SQL:1999.
ƒ ODBC is very similar to SQL/CLI.
• JDBC (Java Database Connectivity)
ƒ Call-level interface for Java applications.
ƒ JDBC was strongly influenced by APIs like ODBC and SQL/CLI.
(J. Melton, A. R. Simon: SQL:1999 Understanding Relational Language Components, 2002)


Call-Level Interface (CLI)

• CLI is a set of routines, i.e. a database library.


• CLI routines can be called by ordinary applications using normal language
mechanisms for invoking routines.
• CLI provides a dynamic interface to the database.
• Characteristics:
ƒ Application program does not need to contain any hard-coded SQL
statements.
ƒ No preprocessor is required.
ƒ Application program may be executed with any of several SQL
database systems (without being recompiled).
ƒ SQL statements are not precompiled and preoptimized as they are in
embedded SQL.

(ISO/IEC 9075-3:1999, Information Technology - Database languages - SQL - Part 3: Call-Level Interface (SQL/CLI), 1999.)


JDBC

• JDBC: Java Database Connectivity


• JDBC was strongly influenced by call-level APIs like
Microsoft's ODBC and SQL/CLI.
• JDBC provides a method of connecting to a remote database,
executing ad hoc SQL statements, and examining the results
of those statements.
• JDBC may be used in all kinds of Java applications:
ƒ Java Applications
ƒ Java Applets
ƒ Servlets, JSPs
ƒ ...

(J. Melton, A. Eisenberg: Understanding SQL and Java Together, 2000)



Summary

• Several ways to access SQL databases from applications.
• Embedded SQL, SQLJ, and Module Language are appropriate
if the SQL statement is known before runtime. A precompilation
step is needed.
• Dynamic SQL and Call-level APIs allow statements to be
constructed at runtime.
• Not all approaches are available for all programming
languages.
• Not all approaches are supported by all database vendors.

8. Logical Database Design
• Logical Database Design
• Normalization of Tables
• Design Theory - Synthesis Approach

Logical Database Design

• Goal
Theoretical foundation for the design of a
‘good’ (relational database) schema

⇒ design theory

• Goodness/Quality:
ƒ Easy to handle, clarity …
ƒ Design theory specifies ‘quality’ to measure formally why a
particular set of groupings of attributes in tables is better than
another one

Logical Design

• What are the properties of a bad DB schema design?


ƒ Implicit representation of information
ƒ Redundancies
ƒ Potential inconsistency (update anomalies)
ƒ Insertion anomalies
ƒ Deletion anomalies
ƒ Modification anomalies
ƒ ...
→ often caused by mixing up entities, partitioning of entities and storing
entities repeatedly, ...

• Normalization of Tables
to optimize a given design
• Creating Tables by the Synthesis of Attributes
to construct an “optimal” DB schema

Normalization and Synthesis Approach

Normalization of Tables
ƒ Input: functional dependencies (ENO → DNO, DNO → DNAME) and an
initial relation schema R(ENO, DNO, DNAME)
ƒ Method: process of normalization
ƒ Result: relation schema in 3NF: R1 (ENO, DNO), R2 (DNO, DNAME)

Creating Tables by the Synthesis of Attributes
ƒ Input: functional dependencies (ENO → DNO, DNO → DNAME) and the
attributes ENO, DNO, DNAME
ƒ Method: synthesis algorithm
ƒ Result: relation schema in 3NF: R1 (ENO, DNO), R2 (DNO, DNAME)

Functional Dependencies (1)
• Constraints on the set of legal tables.
• Require that the value for a certain set of attributes determines
uniquely the value for another set of attributes.
• A functional dependency is a generalization of the notion of a key.
• Let R be a table schema
α ⊆ R, β ⊆ R
• The functional dependency
α→β
holds on R if and only if for any legal tables r(R), whenever any two
tuples t1 and t2 of r agree on the attributes α, they also agree on the
attributes β.
That is, t1[α] = t2 [α] ⇒ t1 [β] = t2 [β]

• K is a superkey for table schema R if and only if K → R

Functional Dependencies (2)


• K is a candidate key for R if and only if
ƒ K→ R, and
ƒ for no α ⊂ K, α → R

• Functional dependencies allow us to express constraints that cannot be
expressed using superkeys. Consider the schema:

loan-info-schema=(branch-name,loan-number,customer-name, amount).

We expect this set of functional dependencies to hold:


loan-number → amount
loan-number→ branch-name
we would not expect the following to hold:
loan-number→ customer-name

• The table T satisfies FD X → Y
if for each value x of X, πY(σX=x(T)) has at most one tuple.
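
The test "for each value x of X there is at most one Y value" can be written down directly. The following is a minimal sketch in plain Python; the function satisfies and the three loan rows are made up for illustration.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the table instance `rows` (a list of dicts) satisfies lhs -> rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x; a different stored value is a violation
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical instance of loan-info-schema: loan L-17 has two customers.
loans = [
    {"branch-name": "Downtown", "loan-number": "L-17", "customer-name": "Jones", "amount": 1000},
    {"branch-name": "Downtown", "loan-number": "L-17", "customer-name": "Smith", "amount": 1000},
    {"branch-name": "Perryridge", "loan-number": "L-15", "customer-name": "Hayes", "amount": 1500},
]

amount_fd = satisfies(loans, ["loan-number"], ["amount"])
customer_fd = satisfies(loans, ["loan-number"], ["customer-name"])
print(amount_fd, customer_fd)
```

Such a check only tells whether one instance satisfies the dependency; whether the dependency holds on the schema is a statement about all legal instances.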

Use of Functional Dependencies

• We use functional dependencies to:


ƒ Test tables to see if they are legal under a given set of functional
dependencies. If a table r is legal under a set F of functional
dependencies, we say that r satisfies F.
ƒ Specify constraints on the set of legal tables; we say that F holds
on R if all legal tables on R satisfy the set of functional
dependencies F.

• Note:
A specific instance of a table schema may satisfy a functional
dependency even if the functional dependency does not hold on all
legal instances. For example, a specific instance of loan-info-schema may,
by chance, satisfy loan-number → customer-name.

Closure (1)

• Given a set F of functional dependencies, there are
certain other functional dependencies that are logically
implied by F.
• The set of all functional dependencies logically implied by
F is the closure of F.
• We denote the closure of F by F+.
• We can find all of F+ by applying Armstrong’s Axioms:

ƒ if β ⊆ α, then α → β (reflexivity)
ƒ if α → β, then γα → γβ (augmentation)
ƒ if α → β and β → γ, then α → γ (transitivity)
These rules are sound and complete:
- complete: the axioms generate all of F+
- sound: the axioms generate no FD that is not in F+

Closure (2)

• We can further simplify computation of F+ by using the
following additional rules.
ƒ If α → β holds and α → γ holds, then α → βγ holds (union)
ƒ If α → βγ holds, then α → β holds and α → γ holds (decomposition)
ƒ If α → β holds and γβ → δ holds, then αγ → δ holds
(pseudotransitivity)
The above rules can be inferred from Armstrong’s axioms.

Example
• R = ( A, B, C, G, H, I )
• F = { A → B, A → C, CG → H, CG → I, B → H }
• Some members of F+ : A → H, AG → I, CG → HI
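
For reference, the three members of F+ listed in the example follow from the given dependencies in a few rule applications:

```latex
\begin{align*}
A \to H:   &\quad A \to B,\ B \to H \ \Rightarrow\ A \to H         && \text{(transitivity)}\\
AG \to I:  &\quad A \to C \ \Rightarrow\ AG \to CG                 && \text{(augmentation with } G\text{)}\\
           &\quad AG \to CG,\ CG \to I \ \Rightarrow\ AG \to I     && \text{(transitivity)}\\
CG \to HI: &\quad CG \to H,\ CG \to I \ \Rightarrow\ CG \to HI     && \text{(union)}
\end{align*}
```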

Closure of Attribute Set


• Define the closure of α under F (denoted by α+) as the set of attributes
that are functionally determined by α under F:
α → β is in F+ ⇔ β ⊆ α+

• Algorithm to compute α+, the closure of α under F


result := α;
while (changes to result) do
for each β → γ in F do
begin
if β ⊆ result then result:= result ∪ γ ;
end
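
The algorithm translates almost line by line into code. The following is a minimal sketch in Python; it assumes that every attribute is a single character, so that a string such as "CG" can stand for the attribute set {C, G}, and it uses the dependency set F of the running example.

```python
def closure(attrs, fds):
    """Compute the closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:                       # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # apply lhs -> rhs if lhs is covered and rhs contributes something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
ag_closure = closure("AG", F)
print(sorted(ag_closure))
```

Since an attribute set α is a superkey exactly when its closure contains all attributes of R, the same function can be used to test superkeys and candidate keys.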

Closure of Attribute Set: Example
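• R = ( A, B, C, G, H, I )
• F = { A → B, A → C, CG → H, CG → I, B → H }

• (AG)+
0. result = AG
1. result = ABG (A → B and A ⊆ AG)
2. result = ABCG (A → C and A ⊆ ABG)
3. result = ABCGH (CG → H and CG ⊆ ABCG)
4. result = ABCGHI (CG → I and CG ⊆ ABCGH)

• Is AG a candidate key?
1. AG → R holds, since (AG)+ = R, so AG is a superkey
2. does A → R? No: A+ = ABCH
3. does G → R? No: G+ = G
⇒ No proper subset of AG is a superkey, so AG is a candidate key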

Normalization of Tables

Non Normalized & Normalized Tables


1NF Tables
2NF Tables
3NF Tables
BCNF Tables
4NF Tables
5NF Tables

Normalization of Tables (1)

Full functional dependency:

A1, A2, ..., An → B1, B2, ..., Bm

B = {B1, B2, ..., Bm} is fully functionally dependent on
A = {A1, A2, ..., An} if removal of any attribute Ai from A
means that the dependency does not hold anymore, that
is, B is functionally dependent on A but not on any
proper subset of A.

Partial dependency:
A → B is a partial dependency if there exists an
attribute Ai in A such that ( A - {Ai} ) → B holds.


Normalization of Tables (2)

Non-normalized table: non-first normal form (NF2)

EXAM ( ENO, EXAMINER, SUBJECT, STUDENT (MATNR, NAME, ...))
       1    HÄRDER    DBS      1234  May
                               5678  Clinton
                               9000  Smith
       2    EULER     Math     5678  Clinton
                               007   Coy

→ contains “attributes” that are tables as well
⇒ representation of complex objects (hierarchical views)

• ADVANTAGES: clustering, efficient processing within one nested object
• DISADVANTAGES: asymmetrical (one direction of relationship only),
implicit representation of information, redundancies of many-to-many
relationships, update anomalies, the parent must be defined

Normalization of Tables (3)

• Normalization:
ƒ Unnesting the table: “copying down” the values leads to a high
degree of redundancy
→ decomposition of tables
ƒ But: keep informational content!

• Normalized Table:
ƒ Each tuple contains exactly one value for each attribute
ƒ A table that satisfies this property is said to be normalized (i.e., be
in the 1NF)


Normalization of Tables (4)

Non normalized table:


EXAM (ENO, EXAMINER, SUBJECT, STUDENT)

(MATNR, NAME, DATE_OF_BIRTH, ADDRESS, FCNO, FCNAME, EDATE, MARK, DEAN)

STUDENT contains 9 simple attributes that build a nested table

Normalization (converting into 1NF):


1. Start with the parent table
2. Use its primary key to extend each direct child table with this
primary key so that the child table becomes an independent table
3. Delete all composite attributes (child tables) from the parent table
4. Repeat this algorithm recursively

Normalization of Tables (5)

RULES:
• Composite attributes, multivalued attributes and combinations of
them (e.g., composite attributes that are themselves multivalued)
build new tables
• Copy down the key

Relation schema in 1NF

EXAM (ENO, EXAMINER, SUBJECT)

CANDIDATE (ENO, MATNR, NAME, DATE_OF_BIRTH, ADDRESS, FCNO, FCNAME,
EDATE, MARK, DEAN)
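
The "copy down the key" step can be sketched in a few lines of Python. The nested rows are taken from the non-normalized EXAM example; only MATNR and NAME are kept for the nested STUDENT table, and the function name to_1nf is an illustrative choice.

```python
# Nested (non-1NF) EXAM table: STUDENT is itself a table of (MATNR, NAME) rows.
nested_exam = [
    {"ENO": 1, "EXAMINER": "HÄRDER", "SUBJECT": "DBS",
     "STUDENT": [("1234", "May"), ("5678", "Clinton"), ("9000", "Smith")]},
    {"ENO": 2, "EXAMINER": "EULER", "SUBJECT": "Math",
     "STUDENT": [("5678", "Clinton"), ("007", "Coy")]},
]

def to_1nf(nested):
    """Split the nested table into a parent table and a child table keyed by ENO."""
    exam, candidate = [], []
    for row in nested:
        # the parent keeps only its simple attributes
        exam.append((row["ENO"], row["EXAMINER"], row["SUBJECT"]))
        for matnr, name in row["STUDENT"]:
            # copy the parent's primary key down into every child row
            candidate.append((row["ENO"], matnr, name))
    return exam, candidate

exam, candidate = to_1nf(nested_exam)
print(exam)
print(candidate)
```

Joining the two flat tables on ENO gives back the information of the nested table, so nothing is lost by the decomposition.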


Converting into 2NF (1)

The second normal form is based on the concept of full
functional dependency.
• 1NF still allows many anomalies, because different entity sets can
be stored in a table and because of redundancy within a table (e.g.,
CANDIDATE)
• 2NF avoids some of the anomalies by eliminating attributes that are
only partially (not fully functionally) dependent on a candidate key

⇒ Separate different entity sets into different tables

Converting into 2NF (2)

Definition:
An attribute of a relation schema is called a prime attribute (key
attribute) of the relation schema if it is a member of at least one
candidate key of the schema.
A relation schema R is in 2NF if it is in 1NF and if every nonprime
attribute A in R is fully functionally dependent on every candidate key
of R (i.e., every nonprime attribute of R is not partially dependent on any
candidate key of R)
Transformation into 2NF:
1. Determine functional dependencies between nonprime attributes and
candidate keys
2. Cut out partially dependent attributes and combine them into an own
table (and add the corresponding prime attributes)


Full Functional Dependencies in CANDIDATE

• Relation Schema in 2NF

Transformation into 3NF (1)
• Because of transitive dependencies update anomalies are still possible
in 2NF.
ƒ Example: mixture of faculty data and student data within student

Definition:
A set of attributes Z of relation schema R is transitively dependent on
a set of attributes X in R if:
ƒ X and Z are disjoint
ƒ there exists a set of attributes Y in R with:
X → Y, Y → Z, Y ↛ X
(Z → Y is permissible; if additionally Z ↛ Y, the dependency is called strictly transitive)


Transformation into 3NF (2)


• Definition 1: A relation schema R is in 3NF if every nonprime attribute
of R is
ƒ fully functionally dependent on every candidate key of R, and
ƒ nontransitively dependent on every candidate key of R.

• (Alt.) Definition 2: A relation schema R is in 3NF if, whenever a
nontrivial functional dependency X → A holds in R, either
ƒ X is a superkey of R, or
ƒ A is a prime attribute of R.
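
Definition 2 lends itself to a mechanical check. The sketch below is a minimal illustration: the FDs given for STUDENT are assumed (faculty data hanging off FCNO), the candidate keys are passed in by hand, and the function name `third_nf_violations` is made up. It only inspects the FDs that are listed explicitly.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attributes, candidate_keys, fds):
    """Return (X, A) pairs where X is no superkey and A is not prime."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= attributes
        for a in sorted(rhs - lhs):  # rhs - lhs skips the trivial part
            if not lhs_is_superkey and a not in prime:
                violations.append((tuple(sorted(lhs)), a))
    return violations

# Assumed schema and FDs for STUDENT (hypothetical, for illustration only).
STUDENT = {"MATNR", "NAME", "DATE_OF_BIRTH", "ADDRESS",
           "FCNO", "FCNAME", "DEAN"}
FDS = [
    ({"MATNR"}, {"NAME", "DATE_OF_BIRTH", "ADDRESS", "FCNO"}),
    ({"FCNO"}, {"FCNAME", "DEAN"}),
]

violations = third_nf_violations(STUDENT, [{"MATNR"}], FDS)
print(violations)
```

With these assumed FDs the check reports FCNO → DEAN and FCNO → FCNAME, which is the transitive dependency MATNR → FCNO → (FCNAME, DEAN) seen from the other side.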

Functional Dependencies in STUDENT

• Relation Schema in 3NF


Boyce-Codd Normal Form (BCNF) (1)


• The definition of 3NF has some shortcomings if tables contain multiple
candidate keys which are
ƒ composite and
ƒ overlapping

• Example:
EXAM (ENO, MATNR, SUBJECT, MARK)
PRIMARY KEY (ENO, MATNR), UNIQUE (SUBJECT, MATNR)
ƒ There is a one-to-one relationship between ENO and SUBJECT
ƒ The only nonprime attribute is MARK ⇒ EXAM is in 3NF

EXAM ( ENO   MATNR   SUBJECT   MARK )
       4     4711    OS        1
       4     1007    OS        2
       4     1234    OS        2
       5     4711    CA        3

⇒ update anomalies (e.g., SUBJECT)

• GOAL: avoiding anomalies in the prime attributes


Boyce-Codd Normal Form (BCNF) (2)
Definition: An attribute (or a set of attributes) on which other
attributes are fully functionally dependent is called a determinant.

What are the determinants in EXAM?

Definition: A relation schema R is in BCNF if every determinant is a
candidate key of R.

Formal Definition:
A relation schema is in BCNF if the following is true:
ƒ If a set of attributes Y is (fully functionally) dependent on another,
disjoint set of attributes X, then every other set of attributes Z is also
(fully functionally) dependent on X.
ƒ I.e., for all X, Y, Z with X and Y disjoint, the following holds:
X → Y implies X → Z
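
The question about the determinants in EXAM can be explored with a short Python sketch. The FDs below are read off the EXAM example (two composite keys plus the one-to-one relationship between ENO and SUBJECT); the function name `bcnf_violations` and the hand-projected FDs for the decomposed tables are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Listed nontrivial FDs whose left-hand side does not determine all attributes."""
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]

EXAM = {"ENO", "MATNR", "SUBJECT", "MARK"}
EXAM_FDS = [
    ({"ENO", "MATNR"}, {"MARK"}),
    ({"SUBJECT", "MATNR"}, {"MARK"}),
    ({"ENO"}, {"SUBJECT"}),   # one-to-one relationship ...
    ({"SUBJECT"}, {"ENO"}),   # ... between ENO and SUBJECT
]

exam_violations = bcnf_violations(EXAM, EXAM_FDS)
print(exam_violations)

# Decomposed tables with FDs projected by hand (an assumption of this sketch).
exam1_violations = bcnf_violations({"ENO", "MATNR", "MARK"},
                                   [({"ENO", "MATNR"}, {"MARK"})])
subj_violations = bcnf_violations({"ENO", "SUBJECT"},
                                  [({"ENO"}, {"SUBJECT"}), ({"SUBJECT"}, {"ENO"})])
print(exam1_violations, subj_violations)
```

ENO and SUBJECT are determinants without being candidate keys of EXAM, so EXAM is in 3NF but not in BCNF; the two smaller tables pass the same check.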


Boyce-Codd Normal Form (BCNF) (3)


• Decomposition of EXAM
EXAM1 (ENO, MATNR, MARK)      or    EXAM2 (MATNR, SUBJECT, MARK)
SUBJ_NAME (ENO, SUBJECT)            SUBJ_NAME (ENO, SUBJECT)

• Both decompositions lead to BCNF tables


ƒ The update anomalies disappear
ƒ All functional dependencies are preserved

Boyce-Codd Normal Form (BCNF) (4)
• Are BCNF decompositions always a good idea?

Example: STUDENT, SUBJECT → EXAMINER
         EXAMINER → SUBJECT

SSE (STUDENT   SUBJECT   EXAMINER)
     Sloppy    DBS       Härder
     Hazy      DBS       Mitschang
     Sloppy    DS        Rothermel

ƒ Each examiner examines only one subject (a subject can be examined by
multiple examiners)
ƒ Each student passes at most one exam per subject


Boyce-Codd Normal Form (BCNF) (5)


• What does the BCNF decomposition look like?
SE (STUDENT   EXAMINER)        ES (EXAMINER    SUBJECT)
    Sloppy    Härder               Härder      DBS
    Hazy      Mitschang            Mitschang   DBS
    Sloppy    Rothermel            Rothermel   DS

• New problems
ƒ Now, STUDENT, SUBJECT → EXAMINER is "external": it spans both tables
⇒ consistency check?

• In this case a BCNF decomposition is too strict to preserve all
functional dependencies (key-breaking transitive dependency)
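
The lost dependency can be made tangible with a small Python sketch over the SE and ES rows shown above. The helper names `join` and `satisfies_fd` and the extra row ("Sloppy", "Mitschang") are hypothetical and serve only as an illustration.

```python
def join(se, es):
    """Natural join of SE(STUDENT, EXAMINER) and ES(EXAMINER, SUBJECT) on EXAMINER."""
    return {
        (student, subject, examiner)
        for student, examiner in se
        for es_examiner, subject in es
        if examiner == es_examiner
    }

def satisfies_fd(rows, lhs, rhs):
    """Check an FD on a set of tuples; lhs and rhs are lists of column positions."""
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs)
        right = tuple(row[i] for i in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

SE = {("Sloppy", "Härder"), ("Hazy", "Mitschang"), ("Sloppy", "Rothermel")}
ES = {("Härder", "DBS"), ("Mitschang", "DBS"), ("Rothermel", "DS")}

# STUDENT, SUBJECT -> EXAMINER holds on the join of the original rows.
before = satisfies_fd(join(SE, ES), [0, 1], [2])

# Hypothetical insert: SE has no local constraint that rejects this row ...
SE_bad = SE | {("Sloppy", "Mitschang")}
es_ok = satisfies_fd(ES, [0], [1])           # EXAMINER -> SUBJECT still holds in ES
# ... but the joined data now lists two DBS examiners for Sloppy.
after = satisfies_fd(join(SE_bad, ES), [0, 1], [2])

print(before, es_ok, after)
```

Neither table can detect the violation on its own; checking STUDENT, SUBJECT → EXAMINER requires the join, which is the consistency-check problem named in the slide.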

Normal Forms: Overview
Normal Form   Focus / Eliminated
1NF           repetition of attribute groups
2NF           partial functional dependencies (functional dependencies
              corresponding to subsets of candidate keys)
3NF           transitive dependencies (functional dependencies
              between sets of attributes that are not candidate keys)
BCNF          overlapping candidate keys
4NF           multivalued dependencies
5NF           join dependencies in case of three or more mutually
              dependent key attributes


Design Theory of Relational Databases

• Information requirements analysis gives:


ƒ set of all attributes (universal relation schema)
ƒ set F of functional dependencies between attributes and attribute
sets

• Based on this, the synthesis algorithm creates a relational
DB schema in 3NF

• The synthesis algorithm requires (besides other things)


ƒ Inference rules to derive additional functional dependencies from F
ƒ Inference rules to determine the keys and to understand the logical
implications.

Design Theory of Relational Databases – An Overview


Test for Set Membership

• Algorithm MEMBER:

Input: X → Y, F
Output: TRUE, if F ⊢ X → Y, else FALSE

MEMBER (F, X → Y)
begin
  if Y ⊆ CLOSURE(X, F) then
    return (TRUE)
  else return (FALSE)
end
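
A runnable counterpart to MEMBER, as a minimal Python sketch. The slide does not show CLOSURE itself, so the fixed-point loop in `closure` below is one common way to compute it and is an assumption of this sketch; the sample FDs are taken from the EXAM example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire every FD whose left-hand side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def member(fds, lhs, rhs):
    """TRUE if lhs -> rhs can be derived from fds, else FALSE."""
    return set(rhs) <= closure(lhs, fds)

F = [
    ({"ENO"}, {"SUBJECT"}),
    ({"SUBJECT"}, {"ENO"}),
    ({"ENO", "MATNR"}, {"MARK"}),
]

derived = member(F, {"SUBJECT", "MATNR"}, {"MARK"})
not_derived = member(F, {"MATNR"}, {"MARK"})
print(derived, not_derived)
```

SUBJECT, MATNR → MARK is derivable because SUBJECT determines ENO, whereas MATNR alone does not determine MARK.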

Cover over Functional Dependencies
If F ≡ G, then F is a cover of G.

Lemma: Given two sets of functional dependencies F and G over R:
F ≡ G iff F ⊢ G and G ⊢ F

Def.: Nonredundant Cover
A set F of functional dependencies is nonredundant if there is no F' ⊂ F with
F' ≡ F.

Alternative:
X → Y in F is redundant if F − {X → Y} ⊢ X → Y
Remark:
In a nonredundant cover, the right-hand side Y of an FD X → Y may still
be composite and the left-hand side X need not be minimal
(i.e., X' → Y may hold for some X' ⊂ X).
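
The alternative definition translates directly into a membership test on the remaining FDs. The following Python sketch is illustrative only; `is_redundant` is a made-up name and the three sample FDs over ENO, MNO and DNO are chosen for the example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_redundant(fd, fds):
    """X -> Y is redundant if it follows from the other FDs in fds."""
    lhs, rhs = fd
    rest = [g for g in fds if g is not fd]
    return rhs <= closure(lhs, rest)

eno_mno = ({"ENO"}, {"MNO"})
mno_dno = ({"MNO"}, {"DNO"})
eno_dno = ({"ENO"}, {"DNO"})
F = [eno_mno, mno_dno, eno_dno]

print(is_redundant(eno_dno, F), is_redundant(eno_mno, F))
```

ENO → DNO follows from the other two FDs by transitivity and is therefore redundant; ENO → MNO is not.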


Canonical Cover (1)

For any given set of functional dependencies, an equivalent set can
always be found in which every functional dependency has exactly one
attribute on its right-hand side:
ƒ decomposition through the decomposition rule
ƒ recovery of the original set through the union rule
(⇒ equivalence)

Definition: A set of functional dependencies F is minimal if the
following holds:
1. The right-hand side of every functional dependency in F consists of
a single attribute.
2. There is no X → A in F such that F − {X → A} is equivalent to F.
3. There is no X → A in F and no proper subset Z of X such that
(F − {X → A}) ∪ {Z → A} is equivalent to F.

Canonical Cover (2)
Algorithm MINCOVER:
Input: set G of functional dependencies
Output: canonical cover of G

MINCOVER (G)
begin
  F := G;
  F := REDUCE_RHS (F);
  F := REDUCE_LHS (F);
  F := REMOVE_REDUNDANT_FDs (F);
  return (F)
end

• MINCOVER(G) = REMOVE_REDUNDANT_FDs(REDUCE_LHS(REDUCE_RHS(G)))
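
The three phases of MINCOVER can be sketched in Python as follows. The slide names the sub-procedures but does not define them, so the bodies of `reduce_rhs`, `reduce_lhs` and `remove_redundant_fds` are one plausible reading based on the three conditions for a minimal set, not the lecture's definitive version; the sample set G is made up.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def reduce_rhs(fds):
    """Condition 1: split every FD so that each right-hand side is one attribute."""
    return [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]

def reduce_lhs(fds):
    """Condition 3: drop left-hand-side attributes that are not needed."""
    fds = list(fds)
    for i in range(len(fds)):
        lhs, rhs = fds[i]
        for a in sorted(lhs):
            smaller = lhs - {a}
            if smaller and rhs <= closure(smaller, fds):
                lhs = smaller
                fds[i] = (lhs, rhs)
    return fds

def remove_redundant_fds(fds):
    """Condition 2: drop FDs that follow from the remaining ones."""
    fds = list(dict.fromkeys(fds))  # remove exact duplicates, keep order
    for fd in list(fds):
        rest = [g for g in fds if g != fd]
        if fd[1] <= closure(fd[0], rest):
            fds = rest
    return fds

def mincover(g):
    return remove_redundant_fds(reduce_lhs(reduce_rhs(g)))

# Made-up input: ENO -> DNO is redundant, MNO is extraneous in ENO, MNO -> DNAME.
G = [
    ({"ENO"}, {"MNO"}),
    ({"MNO"}, {"DNO"}),
    ({"ENO"}, {"DNO"}),
    ({"ENO", "MNO"}, {"DNAME"}),
]

cover = mincover(G)
for lhs, rhs in cover:
    print(sorted(lhs), "->", sorted(rhs))
```

The result depends on the order in which attributes and FDs are visited, which is one reason why a set of FDs can have several canonical covers.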



Design Theory – Synthesis Approach


Given:
• Attribute set A and set F of functional dependencies (elicited from the miniworld)
• Assumption: the universal relation schema U comprises all attributes
Wanted:
Relational DB schema R with the following properties:
1) lossless-join decomposition
2) dependency preservation
3) minimization of redundancies

ad 1: Each attribute of U is included in at least one table of R; the
decomposition into multiple tables is a lossless-join decomposition
ad 2: The candidate keys of the tables of R embody all functional
dependencies that are members of the canonical cover of F
ad 3: All tables are in 3NF; the number of tables is minimal.
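
Property 1 can be tested mechanically for a split into two tables: the decomposition is lossless-join if the common attributes functionally determine all attributes of at least one of the two tables. The Python sketch below applies this standard binary test; the function name `lossless_binary` is made up, and the attribute names are borrowed from the address example used later in this chapter.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (r1 ∩ r2) -> r1 or (r1 ∩ r2) -> r2."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

FDS = [
    ({"TOWN", "STR"}, {"ZIP_CODE"}),
    ({"ZIP_CODE"}, {"COUNTRY"}),
]

# Split along ZIP_CODE -> COUNTRY: the common attribute is a key of the second table.
good = lossless_binary({"TOWN", "STR", "ZIP_CODE"}, {"ZIP_CODE", "COUNTRY"}, FDS)

# A made-up alternative split on STR: STR alone determines nothing.
bad = lossless_binary({"TOWN", "STR", "ZIP_CODE"}, {"STR", "COUNTRY"}, FDS)

print(good, bad)
```

The test covers only decompositions into two tables; larger decompositions need a more general procedure.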

Synthesis Approach – Prerequisites
1. Assumption of Unambiguous Functional Dependencies
f : X → Y and g : X → Y imply f ≡ g

Example:
f1 : ENO → PHONE_NBR (employee uses phone)
f2 : PHONE_NBR → DNO (phone costs are handled per department)

⇒ derived functional dependency
f12 : ENO → DNO ⇒ "uses phone of"

⇒ generally different from the explicitly elicited functional dependency
f3 : ENO → DNO (employee belongs to department)

⇒ problem of semantic transitivity!

2. Representation of Non-Functional Relationships
X ⎯ Y n : m, i.e., X ↛ Y and Y ↛ X
⇒ XY → Θ with Θ as "empty attribute"


Synthesis Algorithm
Input: attribute set A; set F of functional dependencies
Output: RS in 3NF with a minimal number of tables

Step 1: Calculate a canonical cover H of F (→ MINCOVER(F))
Step 2: Partition H into disjoint subsets, each containing all functional
dependencies with the same left-hand side
Step 3: Merge subsets with equivalent keys (→ equivalent candidate keys
should be assigned to the same table)
Step 4: Eliminate transitive dependencies (→ H') that were introduced by
step 3 (→ among the attributes of the keys)
Step 5: Construct one table for each subset of H'
(→ each set of attributes on the left-hand side of a functional
dependency is a candidate key)
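
Steps 2, 3 and 5 can be sketched in Python as shown below. This is a simplified illustration under several assumptions: the input H is a hand-picked set of FDs that is assumed to be a canonical cover already (step 1 is skipped), step 4 is omitted entirely, and the function name `synthesize` and the sample data (loosely modelled on an employee/department miniworld) are made up.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize(h):
    """Return a list of (candidate_keys, attributes) pairs, one per table."""
    # Step 2: group the FDs by their left-hand side.
    groups = {}
    for lhs, rhs in h:
        groups.setdefault(frozenset(lhs), set()).update(rhs)

    # Step 3: merge groups whose left-hand sides determine each other.
    tables = []
    for lhs, rhs in groups.items():
        for keys, attrs in tables:
            first = keys[0]
            if lhs <= closure(first, h) and first <= closure(lhs, h):
                keys.append(lhs)
                attrs |= lhs | rhs
                break
        else:
            tables.append(([lhs], set(lhs) | rhs))

    # Step 5: each merged group becomes one table (step 4 is not implemented).
    return tables

# Hand-picked FDs, assumed to form a canonical cover (hypothetical data).
H = [
    ({"ENO"}, {"NAME", "AGE", "SALARY", "MNO"}),
    ({"MNO"}, {"DNO", "DNAME", "TOWN", "STR", "HNO"}),
    ({"DNO"}, {"MNO"}),
    ({"TOWN", "STR"}, {"ZIP_CODE"}),
    ({"ZIP_CODE"}, {"COUNTRY"}),
]

tables = synthesize(H)
for keys, attrs in tables:
    print([sorted(k) for k in keys], sorted(attrs))
```

MNO and DNO determine each other, so their groups end up in one table with two candidate keys; the other left-hand sides each yield a table of their own.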

Application of the Synthesis Algorithm (1)
Example 1:

What does F+ look like?

Step 1: H=

Step 2: g1
g2
g3

Step 5: R1
R2
R3


Application of the Synthesis Algorithm (2)


Example 2:

f1: ENO → NAME                  f8:  MNO → DNO
f2: ENO → AGE                   f9:  MNO → TOWN, STR, HNO
f3: ENO → SALARY                f10: MNO → DNAME
f4: ENO → DNO, STR, HNO         f11: DNO → MNO
f5: ENO → MNO                   f12: DNO → DNAME
f6: ENO → TOWN, STR, HNO        f13: DNO → TOWN,
f7: ENO → DNAME                 f14: TOWN, STR → ZIP_CODE
                                f15: ZIP_CODE → COUNTRY

• How many canonical covers exist?


• Selection of an appropriate cover: semantic criteria!

One Possible Solution
R1 (ENO, NAME, AGE, SALARY, MNO)
R2 (MNO, DNO, DNAME, TOWN, STR, HNO)
R3 (TOWN, STR, ZIP_CODE)
R4 (ZIP_CODE, COUNTRY)

1. Is the decomposition of
TOWN, STR → ZIP_CODE → COUNTRY
into R3 and R4 a good idea?
ƒ Update frequency?
ƒ Selecting addresses requires a join operation!
ƒ Do TOWN, STR or ZIP_CODE represent an entity in this context
(one that justifies a separate table in 3NF)?
⇒ better solution: R2 in 2NF!

2. Stability of MNO? Update frequency of DNO and MNO!


⇒ R1 (ENO, NAME, AGE, SALARY, DNO)
R2 (DNO, MNO, DNAME, TOWN, STR, HNO, ZIP_CODE, COUNTRY)

Design Theory – Summary (1)

• The Determination of All Functional Dependencies


ƒ supports a precise way of thinking for designing schemata
ƒ enables integrity control by the DBMS

• GOAL: clear and natural assignment of objects and data structures
ƒ with higher degrees of normalization the information content
increases
ƒ each tuple type (table) describes only one object type

• Normalization of Tables
ƒ Local approach on existing data structures
ƒ Stepwise elimination of update anomalies
ƒ Comprehensive approach for DB schema integration

Design Theory – Summary (2)

• Creating Tables by Synthesis of Attributes


ƒ Global approach to construct 3NF tables
ƒ Possibly further examination of tables for overlapping candidate
keys, multi-valued dependencies and join dependencies
=> BCNF, 4NF or 5NF decomposition
• Additional Problems
ƒ In case of a huge set of attributes it is difficult to determine all
relevant FDs
ƒ Generally the design algorithms produce multiple canonical covers
ƒ During conversion from 3NF into BCNF FDs can get lost
• Reworking of DB Schema
ƒ Stability aspects and update frequencies can force the usage of
weaker normal forms
ƒ Considering abstraction concepts
=> the design is determined by the designer, not by the approach
