Database Assignment 2 Solution NYU

Name Singh, Siddharth Kumar
Section: CSCI-GA.2433-001 - Spring 2016
Date: 02/25/2016
Assignment 2
Assignment Layout (25%)
Assignment is neatly assembled on 8 1/2 by 11 paper.
Cover page with your name (last name first followed by a comma then first name),
username and section number with a signed statement of independent effort is included.
Answers to Questions 1 to 6 are correct.
File name is correct.
Answers to Individual Questions:

(100 points total, all questions weighted equally)
Assumptions provided when required.

Total in points (100 points total):
Professor's Comments:
Affirmation of my Independent Effort:
Siddharth Kumar Singh

(Sign here)
1.9 What is the difference between controlled and uncontrolled redundancy? Illustrate with
examples.
Solution
Redundancy occurs when the same fact is stored multiple times in several places in a database.
For example, the fact that the name of the student with StudentNumber=8 is Brown may be stored in
several records.
Redundancy is controlled when the DBMS ensures that multiple copies of the same data are
consistent. Controlled redundancy should be an objective of database design, but perfect control is
very hard to achieve.
For example, if a new record with StudentNumber=8 is stored in the database, the DBMS will ensure
that this record is for student Brown.
If the DBMS has no control over this, we have uncontrolled redundancy. For redundancy to be
controlled, the system must be aware of any data duplication and must take responsibility for
ensuring that updates are carried out correctly.
A database with uncontrolled redundancy can be in an inconsistent state: it can supply incorrect or
conflicting information.
A fact represented by a single entry cannot result in inconsistency. Few systems are capable of
propagating updates to every copy; that is, most systems do not support controlled redundancy.
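A minimal sketch of the distinction, using Python's built-in sqlite3 module. The redundant StudentName column and the trigger name are assumptions made for illustration and are not part of Figure 1.2: GRADE_REPORT stores a copy of the student's name, and a trigger lets the DBMS reject any copy that disagrees with STUDENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (StudentNumber INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- StudentName is a redundant copy of STUDENT.Name (hypothetical column).
CREATE TABLE GRADE_REPORT (StudentNumber INTEGER, StudentName TEXT, Grade TEXT);
-- Controlled redundancy: the DBMS itself checks every new copy.
CREATE TRIGGER check_student_name BEFORE INSERT ON GRADE_REPORT
WHEN NEW.StudentName <> (SELECT Name FROM STUDENT
                         WHERE StudentNumber = NEW.StudentNumber)
BEGIN
    SELECT RAISE(ABORT, 'StudentName does not match STUDENT');
END;
INSERT INTO STUDENT VALUES (8, 'Brown');
""")

# A consistent copy is accepted.
conn.execute("INSERT INTO GRADE_REPORT VALUES (8, 'Brown', 'A')")

# A conflicting copy is refused by the DBMS, not by the application.
try:
    conn.execute("INSERT INTO GRADE_REPORT VALUES (8, 'Smith', 'B')")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM GRADE_REPORT").fetchone()[0]
```

Without the trigger, the second insert would succeed and the database would hold two different names for student 8, which is the uncontrolled case.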
1.12 Cite some examples of integrity constraints that you think can apply to the database
shown in Figure 1.2.
Solution
(a) The StudentNumber should be unique for each STUDENT record (key constraint).
(b) The CourseNumber should be unique for each COURSE record (key constraint).
(c) A value of CourseNumber in a SECTION record must also exist in some COURSE record
(referential integrity constraint).
(d) A value of StudentNumber in a GRADE_REPORT record must also exist in some STUDENT record
(referential integrity constraint).
(e) The value of Grade in a GRADE_REPORT record must be one of the values in the set {A, B,
C, D, F, I, U, S} (domain constraint).
(f) Every record in COURSE must have a value for CourseNumber (entity integrity constraint).
(g) A STUDENT record cannot have a value of Class=2 (sophomore) unless the student has
completed a number of sections whose total course CreditHours is greater than 24 credits (general
semantic integrity constraint).
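As a hedged illustration, constraints (a), (d), and (e) can be declared so that the DBMS enforces them. The sketch below uses Python's sqlite3 module with a reduced version of the schema; the column types, the SectionIdentifier column name, and the helper is_rejected are assumptions, not part of Figure 1.2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE STUDENT (
    StudentNumber INTEGER PRIMARY KEY,          -- (a) key constraint
    Name TEXT NOT NULL
);
CREATE TABLE GRADE_REPORT (
    StudentNumber INTEGER NOT NULL
        REFERENCES STUDENT(StudentNumber),      -- (d) referential integrity
    SectionIdentifier INTEGER NOT NULL,
    Grade TEXT
        CHECK (Grade IN ('A','B','C','D','F','I','U','S'))  -- (e) domain constraint
);
INSERT INTO STUDENT VALUES (17, 'Smith');
""")

def is_rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "duplicate key": is_rejected("INSERT INTO STUDENT VALUES (17, 'Jones')"),
    "unknown student": is_rejected("INSERT INTO GRADE_REPORT VALUES (99, 112, 'A')"),
    "invalid grade": is_rejected("INSERT INTO GRADE_REPORT VALUES (17, 112, 'Z')"),
    "valid row": is_rejected("INSERT INTO GRADE_REPORT VALUES (17, 112, 'B')"),
}
```

Constraint (g) is a general semantic constraint; it cannot be expressed with a simple column declaration and would typically need a trigger or application logic.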
2.14 If you were designing a Web-based system to make airline reservations and sell airline
tickets, which DBMS architecture would you choose from Section 2.5? Why? Why would the
other architectures not be a good choice?
Solution
The Three-Tier Client/Server Architecture for Web Applications is the appropriate choice for an
airline reservation system. The client consists of the Web user interface. The Web server contains
the application logic, which includes all the rules and regulations related to the reservation
process and the issuing of tickets; the database server contains the DBMS. The Web server accepts
requests from the client, processes them, sends commands to the database server, and then acts as a
conduit for passing processed data from the database server back to the clients.
A Web-based system has the user interface and the database server on different machines; hence the
Centralized DBMS Architecture would not work.
A Web-based application generates a lot of network activity. The Basic Client/Server Architecture
and the Two-Tier Client/Server Architecture would work only if the business logic could reside on a
server other than the DBMS server. In general, if the business logic were on the DBMS server, it
would put an excessive burden on that server. If the business logic were to reside on the Web
client, it would burden the communication network as well as a possibly thin client.
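The separation of the three tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, and the SEAT table, the flight number, and the functions reserve_seat and handle_request are hypothetical names, not part of the textbook.

```python
import sqlite3

# Data tier: the DBMS (an in-memory SQLite database stands in for the database server).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE SEAT (flight TEXT, seat TEXT, passenger TEXT, "
    "PRIMARY KEY (flight, seat))"
)
db.execute("INSERT INTO SEAT VALUES ('NY100', '12A', NULL)")
db.commit()

# Application tier: the business rules live here, not in the client or the DBMS.
def reserve_seat(flight, seat, passenger):
    """Reserve a seat only if it exists and is still free."""
    cur = db.execute(
        "UPDATE SEAT SET passenger = ? "
        "WHERE flight = ? AND seat = ? AND passenger IS NULL",
        (passenger, flight, seat),
    )
    db.commit()
    return cur.rowcount == 1  # exactly one row changed means the seat was free

# Presentation tier: the Web client only formats requests and responses.
def handle_request(form):
    ok = reserve_seat(form["flight"], form["seat"], form["name"])
    return "Reservation confirmed" if ok else "Seat not available"

first = handle_request({"flight": "NY100", "seat": "12A", "name": "Brown"})
second = handle_request({"flight": "NY100", "seat": "12A", "name": "Smith"})
```

The client never issues SQL and the DBMS never decides what counts as a valid reservation, so each tier can run on its own machine.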
2.15 Consider Figure 2.1. In addition to constraints relating the values of columns in one table
to columns in another table, there are also constraints that impose restrictions on values in a
column or a combination of columns within a table. One such constraint dictates that a column
or a group of columns must be unique across all rows in the table. For example, in the
STUDENT table, the Student_number column must be unique (to prevent two different students
from having the same Student_number). Identify the column or the group of columns in the
other tables that must be unique across all rows in the table.
Solution
Sl. no. / Table / Column(s) that must be unique / Constraints

1. STUDENT: Student_number
- Student_number must be unique across all rows of the table so that records are not confused when
two students in a section have the same name.

2. COURSE: Course_number
- No two courses can have the same course number; the course number determines the department and
the course name.
- Any new course added to the catalog must be assigned a unique number to differentiate it from
the existing ones.

3. PREREQUISITE: Course_number & Prerequisite_number
- Each prerequisite refers to a course in the COURSE table.
- Some courses have prerequisites and some do not. A course may have several prerequisites, and the
same course may be a prerequisite of several courses, so neither column is unique on its own; only
the combination of the two must be unique.

4. SECTION: Section_Identifier
- Sections offered in a particular semester must be distinct to avoid overlapping classes.
- A duplicate identifier would also affect the registration process.
- The identifier also depends on the year if the course is newly added.
- It also depends on whether the professor adds an extra section based on the number of students
enrolled.

5. GRADE_REPORT: Student_number & Section_Identifier
- Student_number is unique per student, as mentioned above, even when students have the same name.
- Section_Identifier is a unique number, as stated above, since it depends on the semester and year
in which the course is offered.
- Together the two columns identify a single grade record, so the combination must be unique.
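Whether a column or group of columns is unique in a given table instance can be checked mechanically. The helper below is a minimal sketch; the function name is_unique and the sample rows are assumptions for illustration. Passing such a check on sample data does not prove a constraint, which is a statement about every possible state of the table, but failing it rules a candidate out.

```python
def is_unique(rows, columns):
    """Return True if no two rows share the same values for `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

# A few GRADE_REPORT-style rows (sample values for illustration).
grade_report = [
    {"Student_number": 17, "Section_Identifier": 112, "Grade": "B"},
    {"Student_number": 17, "Section_Identifier": 119, "Grade": "C"},
    {"Student_number": 8, "Section_Identifier": 85, "Grade": "A"},
    {"Student_number": 8, "Section_Identifier": 92, "Grade": "A"},
]

student_alone = is_unique(grade_report, ["Student_number"])
combination = is_unique(grade_report, ["Student_number", "Section_Identifier"])
```

Here Student_number alone repeats, because a student takes several sections, while the pair of columns does not.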
5.16 Consider the following relations for a database that keeps track of student enrollment in
courses and the books adopted for each course:
STUDENT(Ssn, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(Ssn, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_isbn)
TEXT(Book_isbn, Book_title, Publisher, Author)
Solution
The schema of this question has the following four foreign keys:
1. the attribute Ssn of relation ENROLL that references relation STUDENT,
2. the attribute Course# in relation ENROLL that references relation COURSE,
3. the attribute Course# in relation BOOK_ADOPTION that references relation COURSE, and
4. the attribute Book_isbn of relation BOOK_ADOPTION that references relation TEXT.
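The four foreign keys listed above could be declared as follows. This is a sketch using Python's sqlite3 module; the column types and the primary keys chosen for STUDENT, COURSE, and TEXT are assumptions, and the helper foreign_keys is a hypothetical name. Course# and TEXT are quoted because of the special character and the clash with a type name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Ssn TEXT PRIMARY KEY, Name TEXT, Major TEXT, Bdate TEXT);
CREATE TABLE COURSE ("Course#" TEXT PRIMARY KEY, Cname TEXT, Dept TEXT);
CREATE TABLE "TEXT" (Book_isbn TEXT PRIMARY KEY, Book_title TEXT,
                     Publisher TEXT, Author TEXT);
CREATE TABLE ENROLL (
    Ssn TEXT REFERENCES STUDENT(Ssn),
    "Course#" TEXT REFERENCES COURSE("Course#"),
    Quarter TEXT,
    Grade TEXT
);
CREATE TABLE BOOK_ADOPTION (
    "Course#" TEXT REFERENCES COURSE("Course#"),
    Quarter TEXT,
    Book_isbn TEXT REFERENCES "TEXT"(Book_isbn)
);
""")

def foreign_keys(table):
    """Return {(child_column, parent_table, parent_column)} declared on `table`."""
    rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    # Row layout: id, seq, parent table, child column, parent column, ...
    return {(r[3], r[2], r[4]) for r in rows}

enroll_fks = foreign_keys("ENROLL")
adoption_fks = foreign_keys("BOOK_ADOPTION")
```

Asking the DBMS for its own list of foreign keys is a quick way to confirm that the declared schema matches the four references identified in the solution.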
We now give the queries in relational algebra:
5.18 Database design often involves decisions about the storage of attributes. For example, a
Social Security number can be stored as one attribute or split into three attributes (one for each
of the three hyphen-delineated groups of numbers in a Social Security number, XXX-XX-XXXX). However, Social
Security numbers are usually represented as just one attribute. The decision is based on how
the database will be used. This exercise asks you to think about specific situations where
dividing the SSN is useful.
Solution:
The Social Security number is a nine-digit number in the format "AAA-GG-SSSS". The number is
divided into three parts.
The Area Number, the first three digits, is assigned by geographical region. Prior to 1973, cards
were issued in local Social Security offices around the country and the Area Number represented the
office code where the card was issued.
The middle two digits are the Group Number. The Group Numbers range from 01 to 99.
The last four digits are Serial Numbers. They represent a straight numerical sequence of digits from
0001 to 9999 within the group.
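The three parts described above can be separated with a short helper. This is a minimal sketch; the names SsnParts and split_ssn and the sample number are assumptions for illustration. The parts are kept as strings so that leading zeros are preserved.

```python
import re
from typing import NamedTuple

class SsnParts(NamedTuple):
    area: str    # first three digits (Area Number)
    group: str   # middle two digits (Group Number)
    serial: str  # last four digits (Serial Number)

SSN_PATTERN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")

def split_ssn(ssn: str) -> SsnParts:
    """Split an SSN written as AAA-GG-SSSS into its three components."""
    match = SSN_PATTERN.fullmatch(ssn)
    if match is None:
        raise ValueError(f"not in AAA-GG-SSSS format: {ssn!r}")
    return SsnParts(*match.groups())

parts = split_ssn("123-45-6789")  # made-up number used only as sample input
```

Storing the three parts in separate columns only pays off if the application queries them separately, for example to group records by Area Number.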
In general, if each sub-attribute has an independent logical existence in the application, it
makes sense to store it in a separate column; otherwise there is no advantage in storing the
sub-attributes separately. For example, an SSN need not be split into its components unless the
subsequences are used to make deductions about validity, geography, etc. In the two cases below,
phone numbers and names, splitting the attribute makes logical and business sense.
a. For a phone number, we need the area code (also known as the city code in some countries) and
perhaps the country code (for dialing international phone numbers).
b. I would recommend storing these parts in separate attributes, as they have their own
independent existence. For example, if an area code region were split into two regions, the
area code associated with certain numbers would change, and having the area code in a separate
attribute makes it easier to update the area code by itself.
c. I would recommend splitting first name, middle name, and last name into different attributes,
as it is likely that names will be sorted and/or retrieved by last name, etc.
5.20 Recent changes in privacy laws have disallowed organizations from using Social Security
numbers to identify individuals unless certain restrictions are satisfied. As a result, most U.S.
universities cannot use SSNs as primary keys (except for financial data). In practice,
Student_id, a unique identifier assigned to every student, is likely to be used as the primary
key rather than SSN since Student_id can be used throughout the system.
a. Some database designers are reluctant to use generated keys (also known as surrogate
keys) for primary keys (such as Student_id) because they are artificial. Can you propose
any natural choices of keys that can be used to identify the student record in a
UNIVERSITY database?
b. Suppose that you are able to guarantee uniqueness of a natural key that includes last
name. Are you guaranteed that the last name will not change during the lifetime of the
database? If last name can change, what solutions can you propose for creating a
primary key that still includes last name but remains unique?
c. What are the advantages and disadvantages of using generated (surrogate) keys?
Solution
a. We can consider a superkey consisting of last_name and prefered_phone_number to identify a
particular student record in the database. To find a particular student, we use last_name and
prefered_phone_number together.
b. A student is unlikely to change last_name, but this is not guaranteed. Even if last_name is
updated, we can still look up the student using the superkey formed from the updated last_name and
prefered_phone_number.
c. Advantages of surrogate keys

i. It gives us an invariant key without any worries about choosing a unique primary key.
ii. It has no meaning and thus causes no privacy violations.
iii. It can use autonumbers, which are usually fast.
iv. It avoids ambiguous data caused by the potential reuse of production keys.
v. A surrogate key is an artificial key that is generated internally by the system to act as the
primary key of a relation and to identify a particular tuple in that relation.
vi. Surrogate keys do not change while the row exists. Applications cannot lose their reference to
a row in the database since the identifier never changes. The primary or natural key data can
always be modified.
vii. Surrogate keys are useful when requirements change, i.e. when the attributes that uniquely
identify an entity might change, which might invalidate the suitability of natural keys.
viii. Some problem domains do not clearly identify a suitable natural key. A surrogate key avoids
choosing a natural key that might be incorrect.
ix. Surrogate keys are useful for indexing in order to increase database performance. A
non-redundant distribution of keys causes the resulting B-tree index to be completely balanced.
Surrogate keys are also less expensive to join (fewer columns to compare) than compound keys.
x. With several database application development systems, drivers, and object-relational
mapping systems, such as Ruby on Rails or Hibernate, it is much easier to use integer or
GUID surrogate keys for every table instead of natural keys in order to support
database-system-agnostic operations and object-to-row mapping.
xi. When every table has a uniform surrogate key, some tasks can be easily automated by writing
the code in a table-independent way.
xii. It is possible to design key values that follow a well-known pattern or structure which can be
automatically verified.
Disadvantages of Surrogate Key

i. They do not have a business meaning (making some aspects of database management
challenging)
ii. They are slightly less efficient (because they require another pass when inserting a row to
return the generated key to the application)
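The trade-off can be seen in a small sketch using Python's sqlite3 module. The ENROLLMENT table, the column types, and the sample values are assumptions for illustration. The surrogate key Student_id is generated by the DBMS, the natural candidate key (last_name, prefered_phone_number) is still declared unique, and a change of last name leaves every reference intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE STUDENT (
    Student_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated surrogate key
    last_name TEXT NOT NULL,
    prefered_phone_number TEXT NOT NULL,
    UNIQUE (last_name, prefered_phone_number)      -- natural candidate key
);
CREATE TABLE ENROLLMENT (
    Student_id INTEGER NOT NULL REFERENCES STUDENT(Student_id),
    course TEXT NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO STUDENT (last_name, prefered_phone_number) VALUES (?, ?)",
    ("Brown", "555-0100"),
)
student_id = cur.lastrowid  # the extra step: fetch the key the DBMS generated
conn.execute("INSERT INTO ENROLLMENT VALUES (?, ?)", (student_id, "CS1310"))

# The natural key changes; the surrogate key and every reference to it do not.
conn.execute(
    "UPDATE STUDENT SET last_name = 'Smith' WHERE Student_id = ?", (student_id,)
)
row = conn.execute(
    "SELECT s.last_name, e.course FROM ENROLLMENT e "
    "JOIN STUDENT s ON s.Student_id = e.Student_id"
).fetchone()
```

Had ENROLLMENT referenced last_name and prefered_phone_number instead, the update would also have had to be propagated to every enrollment row.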
