MC0067 Database Management System

MC0067- DATABASE MANAGEMENT SYSTEM
Assignment Set 1
1. Describe the concept of Clustering. What is intra-file Clustering?
Ans:
Clustering: Clustering means that records related to each other are stored physically close to each other on disk. A cluster stores tuples from one or more relations physically close to other tuples in the database. The purpose of clustering is to speed up certain types of queries: tuples that are physically close to each other are retrieved more quickly than tuples that are not. Because clustering affects how the data is actually stored on the disk, the decision to use clustering is part of the physical database design process. Clustering does not affect the applications that access the clustered relations; clustered and unclustered relations appear the same to users of the system.
Intra-file clustering: Related records within a single file are stored close to one another. For example, the shipments for the same supplier could be stored beside each other within the shipments file.
Inter-file clustering: Data items in two or more files are stored together, so records from one file are stored close to related records from another file. For example, a shipment from a shipments file would be stored close to the supplier of the shipment in a suppliers file.
Q 2. Write about:
• Integrity Rules
• Relational Operators with examples for each
• Linear Search
• Collision Chain
Ans:
Integrity Rules: These are the rules that a relational database follows in order to stay accurate and accessible. They govern which operations can be performed on the data and on the structure of the database. Three integrity rules are defined for a relational database:
• Distinct Rows in a Table: all the rows of a table must be distinct, to avoid ambiguity when accessing the rows of that table. Most modern database management systems can be configured to reject duplicate rows.
• Entity Integrity (a primary key, or any part of it, cannot be null): 'null' is a special value in a relational database; it does not mean blank or zero. It means that the data is unavailable, so a row with a null primary key would have no complete identifier.
• Referential Integrity: if a foreign key is defined on a table, then every non-null value of that foreign key must exist as the primary key of a row in the referenced table.
In summary, any relation must satisfy the following: no component of the primary key can be null, and the database must not contain any unmatched foreign key values (the referential integrity rule). Unlike the case of primary keys, there is no integrity rule saying that no component of a foreign key can be null. This can be explained with the following example. Consider the relations Employee and Account given below.

Employee
Emp#   EmpName   EmpCity   EmpAcc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

Account
Acc#     OpenDate      BalAmt
120001   30-Aug-1998   5000
120002   29-Oct-1998   1200
120003   01-Jan-1999   3000
120004   04-Mar-1999   500
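The entity and referential integrity rules can be checked mechanically on the Employee and Account relations. The following is a minimal sketch, not how any particular DBMS enforces them: the in-memory row format and the function names `entity_integrity_violations` and `unmatched_foreign_keys` are assumptions made for illustration.

```python
# Hypothetical in-memory copies of the Employee and Account relations.
employees = [
    {"emp_no": "X101", "name": "Shekhar", "city": "Bombay", "acc_no": "120001"},
    {"emp_no": "X102", "name": "Raj", "city": "Pune", "acc_no": "120002"},
    {"emp_no": "X103", "name": "Sharma", "city": "Nagpur", "acc_no": None},
    {"emp_no": "X104", "name": "Vani", "city": "Bhopal", "acc_no": "120003"},
]
accounts = [
    {"acc_no": "120001", "balance": 5000},
    {"acc_no": "120002", "balance": 1200},
    {"acc_no": "120003", "balance": 3000},
    {"acc_no": "120004", "balance": 500},
]


def entity_integrity_violations(rows, key):
    """Return rows whose primary key is null or repeats an earlier row's key."""
    seen, bad = set(), []
    for row in rows:
        if row[key] is None or row[key] in seen:
            bad.append(row)
        else:
            seen.add(row[key])
    return bad


def unmatched_foreign_keys(child_rows, fk, parent_rows, pk):
    """Return child rows whose non-null foreign key matches no parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [
        row for row in child_rows
        if row[fk] is not None and row[fk] not in parent_keys
    ]


# A null foreign key (employee X103) is allowed; an unmatched one is not.
dangling = employees + [
    {"emp_no": "X105", "name": "Test", "city": "Pune", "acc_no": "999999"}
]
```

With this data, both checks return an empty list for `employees`, while `unmatched_foreign_keys(dangling, "acc_no", accounts, "acc_no")` reports only the hypothetical X105 row.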
EmpAcc# in the Employee relation is a foreign key creating a reference from Employee to Account. Here, a null value in the EmpAcc# attribute is logically possible if an employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, a null value can be allowed for EmpAcc# in the Employee relation. By contrast, in the Ord_Aug relation used in the operator examples, Cust# cannot accept null if the business rule insists that the customer number be stored for every order placed.
Relational Operators: In the relational model, the database objects seen so far have specific names:

Name                Meaning
Relation            Table
Tuple               Record (Row)
Attribute           Field (Column)
Cardinality         Number of Records (Rows)
Degree (or Arity)   Number of Fields (Columns)
View                Query/Answer table
On these objects, a set of relational operators is provided to manipulate them:
1. Restrict
2. Project
3. Union
4. Difference
5. Product
6. Intersection
7. Join
8. Divide
Restrict: Restrict simply extracts records from a table. It is also known as Select, but it is not the same as the SELECT statement defined in SQL.
Project: Project selects zero or more fields from a table and generates a new table that contains all of the records but only the selected fields (with duplicates removed).
Union: Union creates a new table by adding the records of one table to those of another. The two tables must be compatible: they must have the same number of fields, and each pair of corresponding fields must take values from the same domain.
Difference: The difference of two tables is a third table that contains the records which appear in the first BUT NOT in the second.
Product: The product of two tables is a third table that contains every record of the first combined with every record of the second.
Intersection: The intersection of two tables is a third table that contains the records common to both.
Join: The join of two tables is a third table that contains the combinations of records from the first and the second which are related.
Divide: Dividing a table by another table gives all the records in the first which have values in their fields matching ALL the records in the second.
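The eight operators defined above can be sketched over small in-memory tables. This is an illustrative sketch only: relations are assumed to be lists of dictionaries, all function names are hypothetical, and the customer numbers are the July and August values that appear in the examples of this answer.

```python
from itertools import product as cartesian


def _dedupe(rows):
    """Drop duplicate rows, keeping first-seen order (a relation is a set)."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def restrict(rel, predicate):
    return [row for row in rel if predicate(row)]


def project(rel, attrs):
    return _dedupe([{a: row[a] for a in attrs} for row in rel])


def union(r, s):
    return _dedupe(r + s)


def difference(r, s):
    return [row for row in _dedupe(r) if row not in s]


def intersection(r, s):
    return [row for row in _dedupe(r) if row in s]


def product(r, s):
    # Assumes the two relations share no attribute names.
    return [{**a, **b} for a, b in cartesian(r, s)]


def join(r, s, attr):
    # Join on one common attribute; only related rows are combined.
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


def divide(r, s, attrs):
    """Rows of r (minus attrs) that are paired with every row of s."""
    rest = [a for a in r[0] if a not in attrs] if r else []
    needed = {tuple(row[a] for a in attrs) for row in s}
    groups = {}
    for row in r:
        x = tuple(row[a] for a in rest)
        groups.setdefault(x, set()).add(tuple(row[a] for a in attrs))
    return [dict(zip(rest, x)) for x, have in groups.items() if needed <= have]


ord_aug = [
    {"Ord#": "101", "Cust#": "002"}, {"Ord#": "102", "Cust#": "003"},
    {"Ord#": "103", "Cust#": "003"}, {"Ord#": "104", "Cust#": "002"},
    {"Ord#": "105", "Cust#": "005"},
]
jul_customers = [{"Cust#": "001"}, {"Cust#": "003"}]
aug_customers = project(ord_aug, ["Cust#"])
both_months = intersection(jul_customers, aug_customers)
july_only = difference(jul_customers, aug_customers)
```

Here `both_months` holds customer 003 and `july_only` holds customer 001, matching the INTERSECT and DIFFERENCE examples in this answer.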
The eight relational algebra operators are:
1. SELECT
To retrieve specific tuples/rows from a relation.

Ord#   OrdDate    Cust#
101    02-08-94   002
104    18-09-94   002
2. PROJECT
To retrieve specific attributes/columns from a relation.

Description    Price
Power Supply   4000
101-Keyboard   2000
Mouse          800
MS-DOS 6.0     5000
MS-Word 6.0    8000
3. PRODUCT
To obtain all possible combinations of tuples from two relations.

Ord#   OrdDate    O.Cust#   C.Cust#   CustName     City
101    02-08-94   002       001       Shah         Bombay
101    02-08-94   002       002       Srinivasan   Madras
101    02-08-94   002       003       Gupta        Delhi
101    02-08-94   002       004       Banerjee     Calcutta
101    02-08-94   002       005       Apte         Bombay
102    11-08-94   003       001       Shah         Bombay
102    11-08-94   003       002       Srinivasan   Madras
4. UNION
To retrieve tuples appearing in either or both of the relations participating in the UNION.
Eg: Consider the relation Ord_Jul (Table: Ord_Jul). Its union with the August orders gives:

Ord#   OrdDate    Cust#
101    03-07-94   001
102    27-07-94   003
101    02-08-94   002
102    11-08-94   003
103    21-08-94   003
104    28-08-94   002
105    30-08-94   005

Note: The union operation shown above logically implies retrieval of records of orders placed in July or in August.
5. INTERSECT
To retrieve tuples appearing in both the relations participating in the INTERSECT.
Eg: To retrieve Cust# of customers who've placed orders in July and in August

Cust#
003

6. DIFFERENCE
To retrieve tuples appearing in the first relation participating in the DIFFERENCE but not in the second.
Eg: To retrieve Cust# of customers who've placed orders in July but not in August

Cust#
001
7. JOIN
To retrieve combinations of tuples in two relations based on a common field in both the relations.
Eg: ORD_AUG join CUSTOMERS (here, the common column is Cust#)

Ord#   OrdDate    Cust#   CustNames    City
101    02-08-94   002     Srinivasan   Madras
102    11-08-94   003     Gupta        Delhi
103    21-08-94   003     Gupta        Delhi
104    28-08-94   002     Srinivasan   Madras
105    30-08-94   005     Apte         Bombay
Note: The above join operation logically implies retrieval of details of all orders and the details of the corresponding customers who placed the orders. Such a join operation, where only those rows having corresponding rows in both the relations are retrieved, is called the natural join or inner join. This is the most common join operation. Consider the example of the EMPLOYEE and ACCOUNT relations.

EMPLOYEE
Emp#   EmpName   EmpCity   Acc#
X101   Shekhar   Bombay    120001
X102   Raj       Pune      120002
X103   Sharma    Nagpur    Null
X104   Vani      Bhopal    120003

ACCOUNT
Acc#     OpenDate        BalAmt
120001   30. Aug. 1998   5000
120002   29. Oct. 1998   1200
120003   1. Jan. 1999    3000
120004   4. Mar. 1999    500

A join can be formed between the two relations based on the common column Acc#. The result of the (inner) join is:

Emp#   EmpName   EmpCity   Acc#     OpenDate        BalAmt
X101   Shekhar   Bombay    120001   30. Aug. 1998   5000
X102   Raj       Pune      120002   29. Oct. 1998   1200
X104   Vani      Bhopal    120003   1. Jan 1999     3000
Note that, from each table, only those records which have corresponding records in the other table appear in the result set. This means that the result of the inner join shows the details of those employees who hold an account, along with the account details. The other type of join is the outer join, which has three variations: the left outer join, the right outer join and the full outer join. These three joins are explained as follows.
The left outer join retrieves all rows from the left-side (of the join operator) table. If there are corresponding or related rows in the right-side table, the correspondence will be shown. Otherwise, columns of the right-side table will take null values.
EMPLOYEE left outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate        BalAmt
X101   Shekhar   Bombay    120001   30. Aug. 1998   5000
X102   Raj       Pune      120002   29. Oct. 1998   1200
X103   Sharma    Nagpur    NULL     NULL            NULL
X104   Vani      Bhopal    120003   1. Jan 1999     3000
The right outer join retrieves all rows from the right-side (of the join operator) table. If there are corresponding or related rows in the left-side table, the correspondence will be shown. Otherwise, columns of the left-side table will take null values.
EMPLOYEE right outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate        BalAmt
X101   Shekhar   Bombay    120001   30. Aug. 1998   5000
X102   Raj       Pune      120002   29. Oct. 1998   1200
X104   Vani      Bhopal    120003   1. Jan 1999     3000
NULL   NULL      NULL      120004   4. Mar. 1999    500
(Assume that Acc# 120004 belongs to someone who is not an employee and hence the details of the Account holder are not available here) The full outer join retrieves all rows from both the tables. If there is a correspondence or relation between rows from the tables of either side, the correspondence will be shown. Otherwise, related columns will take null values.
EMPLOYEE full outer join ACCOUNT gives:

Emp#   EmpName   EmpCity   Acc#     OpenDate        BalAmt
X101   Shekhar   Bombay    120001   30. Aug. 1998   5000
X102   Raj       Pune      120002   29. Oct. 1998   1200
X103   Sharma    Nagpur    NULL     NULL            NULL
X104   Vani      Bhopal    120003   1. Jan 1999     3000
NULL   NULL      NULL      120004   4. Mar. 1999    500
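The four join variants on EMPLOYEE and ACCOUNT can be reproduced with one small function. This is a sketch under stated assumptions: rows are dictionaries, `None` stands for NULL, and `outer_join` with its `keep_left` and `keep_right` flags is a hypothetical helper, not a DBMS API. The OpenDate column is left out for brevity.

```python
employee = [
    {"Emp#": "X101", "EmpName": "Shekhar", "EmpCity": "Bombay", "Acc#": "120001"},
    {"Emp#": "X102", "EmpName": "Raj", "EmpCity": "Pune", "Acc#": "120002"},
    {"Emp#": "X103", "EmpName": "Sharma", "EmpCity": "Nagpur", "Acc#": None},
    {"Emp#": "X104", "EmpName": "Vani", "EmpCity": "Bhopal", "Acc#": "120003"},
]
account = [
    {"Acc#": "120001", "BalAmt": 5000},
    {"Acc#": "120002", "BalAmt": 1200},
    {"Acc#": "120003", "BalAmt": 3000},
    {"Acc#": "120004", "BalAmt": 500},
]


def outer_join(left, right, key, keep_left=False, keep_right=False):
    """Join two tables on `key`; optionally keep unmatched rows from a side."""
    left_cols, right_cols = list(left[0]), list(right[0])
    result, matched_right = [], set()
    for l_row in left:
        # A NULL key never matches anything.
        hits = [
            i for i, r_row in enumerate(right)
            if l_row[key] is not None and l_row[key] == r_row[key]
        ]
        for i in hits:
            matched_right.add(i)
            result.append({**l_row, **right[i]})
        if not hits and keep_left:
            # Pad the right-side columns with NULLs.
            result.append({**{c: None for c in right_cols}, **l_row})
    if keep_right:
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # Pad the left-side columns with NULLs.
                result.append({**{c: None for c in left_cols}, **r_row})
    return result


inner = outer_join(employee, account, "Acc#")
left_outer = outer_join(employee, account, "Acc#", keep_left=True)
right_outer = outer_join(employee, account, "Acc#", keep_right=True)
full_outer = outer_join(employee, account, "Acc#", keep_left=True, keep_right=True)
```

The row counts (3, 4, 4 and 5) match the four result tables in this answer; the full outer join ends with the account 120004 row whose employee columns are all NULL.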
8. DIVIDE
Consider the following three relations:
R1 divide by R2 per R3 gives: a
Thus the result contains those values from R1 whose corresponding R2 values in R3 include all R2 values.

Linear Search
Linear search, also known as sequential search, means starting at the beginning of the data and checking each item in turn until either the desired item is found or the end of the data is reached. It is suitable for searching a list of data for a particular value.
Linear search is not very efficient. If the item to be found is at the end of the list, then all previous items must be read and checked before the item that matches the search criteria is found. An implementation is a very straightforward loop comparing every element in the array with the key. As soon as an equal value is found, it returns. If the loop finishes without finding a match, the search has failed and -1 is returned. For small arrays, linear search is a good solution because it is so straightforward. In an array of a million elements, a linear search will on average take 500,000 comparisons to find the key. For a much faster search on sorted data, take a look at binary search.
Algorithm
For each item in the database
    if the item matches the wanted info
        exit with this item
    Continue loop
wanted item is not in database

Collision Chain: In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought. Ideally, the hash function should map each possible key to a unique slot index, but this ideal is rarely achievable in practice (unless the hash keys are fixed; i.e. new entries are never added to the table after it is created). Instead, most hash table designs assume that hash collisions (different keys that map to the same hash value) will occur and must be accommodated in some way. One common way is chaining: all the entries whose keys hash to the same slot are linked together in a list, called the collision chain, which is searched sequentially.
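Linear search and collision handling fit together in one short sketch. It assumes collisions are resolved by chaining, where each slot holds a list of (key, value) pairs that is searched linearly; the class `ChainedHashTable` and the function `linear_search` are hypothetical names used only for illustration.

```python
def linear_search(items, wanted):
    """Return the index of the first item equal to `wanted`, or -1."""
    for index, item in enumerate(items):
        if item == wanted:
            return index
    return -1


class ChainedHashTable:
    """Hash table that keeps colliding entries in a per-slot chain (a list)."""

    def __init__(self, slots=8):
        self.buckets = [[] for _ in range(slots)]

    def _index(self, key):
        # The hash function maps a key to a slot number.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))  # new entry joins the collision chain

    def get(self, key, default=None):
        # Linear search along the chain of the key's slot.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(slots=4)
for acc_no in (1, 5, 9, 2):  # 1, 5 and 9 all map to slot 1 when slots=4
    table.put(acc_no, f"account-{acc_no}")
```

With four slots, the integer keys 1, 5 and 9 collide in slot 1 and form a chain of three entries, while key 2 sits alone in slot 2. A lookup of key 9 walks that chain until it finds the match.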
Q3) Discuss the different types of user friendly interfaces & the types of users who typically use it? With what other computer system software does a DBMS interact?
Ans:
1. Interfaces for the DBA
Most database systems contain privileged commands that can be used only by the DBA's staff. These include commands for creating accounts, setting system parameters, granting account authorization, changing a schema and reorganizing the storage structure of a database.
2. Menu-Based Interfaces for Web Clients or Browsing
These interfaces present the user with lists of options, called menus, that lead the user through the formulation of a request. With menus, users need not memorize the specific commands and syntax of a query language. Pull-down menus are a very popular technique in web-based user interfaces. Menus are also used in browsing interfaces, which allow a user to look through the contents of a database in an exploratory and unstructured manner.
3. Natural Language Interfaces (NLI)
These interfaces accept requests written in English-like languages and try to understand them. A natural language interface has its own schema, which is similar to the database conceptual schema, as well as a dictionary. The NLI refers to the words in its schema as well as to the set of words in its dictionary to interpret the request. The interface then generates a high-level query corresponding to the natural language request.
4. Forms-Based Interfaces
A forms-based interface displays a form to the user. Users can fill in either all the data fields or only some of them, as required. Forms are usually designed and programmed for naive users as interfaces to canned transactions. DBMSs have forms specification languages, which are special languages that help programmers specify such forms.
5. Graphical User Interfaces (GUI)
A graphical user interface, or GUI, displays a schema to the user in diagrammatic form. The user can specify a query by manipulating the diagram. GUIs utilize both menus and forms. Most GUIs use a pointing device to pick parts of the diagram and place them.
6. Interface for Parametric Users
Parametric users usually have a small set of operations that they perform repeatedly. System analysts and programmers therefore design and implement a special interface for each known class of naive users. The commands are limited so that the number of keystrokes required is kept small. Bank tellers are a typical example of such users.
With what other computer system software does a DBMS interact?
1. Data dictionary systems
2. CASE tools
3. Information repository systems
4. Application development environments
5. Communication software

4. Define the following terms: disk, disk pack, track, block, cylinder, sector, interblock gap, read/write head.
Ans:
Disk: Disks are used for storing large amounts of data. The most basic unit of data on the disk is a single bit of information. By magnetizing an area on the disk in certain ways, one can make it represent a bit value of either 0 or 1. To code information, bits are grouped into bytes.
Byte sizes are typically 4 to 8 bits, depending on the computer and the device. We assume that one character is stored in a single byte, and we use the terms byte and character interchangeably. The capacity of a disk is the number of bytes it can store, which is usually very large. Small floppy disks used with microcomputers typically hold from 400 kbytes to 1.5 Mbytes; hard disks for micros typically hold from several hundred Mbytes up to a few Gbytes. Whatever their capacity, disks are all made of magnetic material shaped as a thin circular disk and protected by a plastic or acrylic cover. A disk is single-sided if it stores information on only one of its surfaces and double-sided if both surfaces are used.
Disk pack: To increase storage capacity, disks are assembled into a disk pack, which may include many disks and hence many surfaces. A disk pack is a layered grouping of hard disk platters (circular, rigid discs coated with a magnetic data storage surface). The disk pack is the core component of a hard disk drive. In modern hard disks, the disk pack is permanently sealed inside the drive. In many early hard disks, the disk pack was a removable unit, supplied with a protective canister featuring a lifting handle.
Track and cylinder: Information is stored on a disk surface in concentric circles of small width, each having a distinct diameter. Each circle is called a track; it is the (circular) area on a disk platter that can be accessed without moving the access arm of the drive. For disk packs, the tracks with the same diameter on the various surfaces are called a cylinder, because of the shape they would form if connected in space; equivalently, a cylinder is the set of tracks of a disk drive that can be accessed without changing the position of the access arm. The number of tracks on a disk ranges from a few hundred to a few thousand, and the capacity of each track typically ranges from tens of Kbytes to 150 Kbytes.
Sector: A sector is a fixed-size physical data block on a disk drive. Because a track usually contains a large amount of information, it is divided into smaller blocks or sectors. The division of a track into sectors is hard-coded on the disk surface and cannot be changed. One type of sector organization calls a portion of a track that subtends a fixed angle at the center a sector. Several other sector organizations are possible, one of which is to have the sectors subtend smaller angles at the center as one moves away, thus maintaining a uniform density of recording.
Block and interblock gap: A block is a physical data record, separated on the medium from other blocks by interblock gaps. The division of a track into equal-sized disk blocks is set by the operating system during disk formatting. Block size is fixed during initialization and cannot be changed dynamically. Typical disk block sizes range from 512 to 4096 bytes. A disk with hard-coded sectors often has the sectors subdivided into blocks during initialization. An interblock gap is an area between data blocks which contains no data and which separates the blocks. Blocks are separated by fixed-size interblock gaps, which include specially coded control information written during disk initialization. This information is used to determine which block on the track follows each interblock gap.
Read/write head: The read/write head is the device that reads data from, or writes data to, the storage medium. On a disk drive, a read/write head mounted on the access arm reads or writes the bits of the track positioned under it. On tape, a tape drive is required to read the data from or to write the data to a tape reel. Usually, each group of bits that forms a byte is stored across the tape, and the bytes themselves are stored consecutively on the tape. A read/write head is used to read or write data on tape. Data records on tape are also stored in blocks, although the blocks may be substantially larger than those for disks.
ASSIGNMENT SET 2
1. Explain the purpose of Data Modeling. What are the basic constructs of E-R Diagrams?
Ans: Data modeling is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is the act of exploring data-oriented structures. Like other modeling artifacts, data models can be used for a variety of purposes, from high-level conceptual models to physical data models. Data modeling is the formalization and documentation of existing processes and events that occur during application software design and development. Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data flows and processes, creating a blueprint for construction and/or re-engineering.
Basic Constructs of E-R Modeling: The ER model views the real world as a construct of entities and associations between entities. The basic constructs of ER modeling are entities, attributes, and relationships.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it. Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.
Attributes: Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute.

2 ) . What is functional dependence? What are the objectives of Normalization?
Ans: A functional dependency occurs when one attribute in a relation uniquely determines another attribute. This can be written A -> B, which is the same as stating "B is functionally dependent upon A."
Examples: In a table listing employee characteristics including Social Security Number (SSN) and name, it can be said that name is functionally dependent upon SSN (or SSN -> name), because an employee's name can be uniquely determined from their SSN. However, the reverse statement (name -> SSN) is not true, because more than one employee can have the same name but different SSNs.
Objectives of Normalization
The objectives of the normalization process are:
• To make it feasible to represent any relation in the database.
• To free relations from undesirable insertion, update, and deletion anomalies.
• To reduce the need for restructuring the relations as new data types are introduced.
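The SSN -> name example of a functional dependency can be tested against sample data. The sketch below assumes rows are dictionaries; the function name `fd_holds` and the SSN values are made up for illustration. Note that a dependency is a rule about all valid data, so a check on one sample can refute it but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        determinant, dependent = row[lhs], row[rhs]
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical employee rows: two different people share the name "Raj".
staff = [
    {"ssn": "111-11-1111", "name": "Raj"},
    {"ssn": "222-22-2222", "name": "Vani"},
    {"ssn": "333-33-3333", "name": "Raj"},
]

ssn_determines_name = fd_holds(staff, "ssn", "name")
name_determines_ssn = fd_holds(staff, "name", "ssn")
```

On this sample, `ssn_determines_name` is true and `name_determines_ssn` is false, because the name "Raj" appears with two different SSNs.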
3 ) What is a relationship type? Explain the differences among a relationship instance, a relationship type, and a relationship set.
Ans: There are three types of relationships:
1) One to one
2) One to many
3) Many to many
Say we have table1 and table2. For a one-to-one relationship, a record (row) in table1 will have at most one matching record in table2, and vice versa; i.e. it must not have two or more matching records in table2. For one-to-many, a record in table1 can have more than one matching record in table2, but not vice versa. Let's take an example. Say we have a database which saves information about guys and whom they are dating. We have two tables in our database, Guys and Girls.
Guys
Guy id   Guy name
1        Andrew
2        Bob
3        Craig

Girls
Girl id   Girl name
1         Girl1
2         Girl2
3         Girl3

In the above example, Guy id and Girl id are the primary keys of their respective tables. Say Andrew is dating Girl1, Bob is dating Girl2 and Craig is dating Girl3. So we have a one-to-one relationship here. In this case we need to modify the Girls table to have a Guy id foreign key in it.

Girl id   Girl name   Guy id
1         Girl1       1
2         Girl2       2
3         Girl3       3

Now let us say one guy has started dating more than one girl, i.e. Andrew has started dating Girl1 and a new Girl4. That takes us to a one-to-many relationship from the Guys table to the Girls table. To accommodate this change we can modify our Girls table like this:

Girl id   Girl name   Guy id
1         Girl1       1
2         Girl2       2
3         Girl3       3
4         Girl4       1

Now say that, after a few days, girls have also started dating more than one guy, i.e. we have a many-to-many relationship.
The thing to do here is to add another table, called a junction table, associative table or linking table, which contains the primary key columns of both the Guys and the Girls tables. Let us see it with an example.

Guys
Guy id   Guy name
1        Andrew
2        Bob
3        Craig

Girls
Girl id   Girl name
1         Girl1
2         Girl2
3         Girl3

Andrew is now dating Girl1 and Girl2, Bob is dating Girl2 and Girl3, and Craig is dating Girl3, so our junction table will look like this:

Guy id   Girl id
1        1
1        2
2        2
2        3
3        3

It contains the primary keys of both the Guys and the Girls tables.
Relationship type and relationship set: A relationship type R among n entity types E1, E2, ..., En defines a set of associations, called a relationship set, among entities from these types. Mathematically, the relationship set R is a set of relationship instances ri, where each ri is an n-tuple of entities (e1, e2, ..., en), and each entity ej in ri is a member of entity type Ej, 1 ≤ j ≤ n. Hence, a relationship type is a mathematical relation on E1, E2, ..., En; alternatively, it can be defined as a subset of the Cartesian product E1 x E2 x ... x En. As with entity types and entity sets, a relationship type and its corresponding relationship set are usually referred to by the same name, R.
Relationship instance: Each relationship instance ri in R is an association of entities, where the association includes exactly one entity from each participating entity type. Each such relationship instance ri represents the fact that the entities participating in ri are related in some way in the corresponding miniworld situation. For example, consider a relationship type WORKS_FOR between the two entity types EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works. Each relationship instance in the relationship set WORKS_FOR associates one EMPLOYEE entity and one DEPARTMENT entity.
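The definition of a relationship set as a subset of the Cartesian product can be illustrated with the dating example. This is a small sketch with assumed names: the dictionaries stand for the two entity sets, `dating` holds the junction-table pairs as relationship instances, and `is_relationship_set` and `cardinality` are hypothetical helpers for binary relationships.

```python
from itertools import product

# Entity sets keyed by primary key, as in the Guys and Girls tables.
guys = {1: "Andrew", 2: "Bob", 3: "Craig"}
girls = {1: "Girl1", 2: "Girl2", 3: "Girl3"}

# Relationship set: each (guy id, girl id) pair is one relationship instance.
dating = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3)}


def is_relationship_set(instances, *entity_sets):
    """True if every instance is a tuple drawn from E1 x E2 x ... x En."""
    return set(instances) <= set(product(*entity_sets))


def cardinality(instances):
    """Classify a binary relationship set by how often each side repeats."""
    left_counts, right_counts = {}, {}
    for left, right in instances:
        left_counts[left] = left_counts.get(left, 0) + 1
        right_counts[right] = right_counts.get(right, 0) + 1
    left_has_many = any(count > 1 for count in left_counts.values())
    right_has_many = any(count > 1 for count in right_counts.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"
```

Applied to `dating`, the classification is many-to-many, which is why a separate junction table is needed. The earlier stages of the example, with pairs (1,1), (2,2), (3,3) and then an added (1,4), classify as one-to-one and one-to-many.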
5. Discuss the advantages and disadvantages of using (a) an unordered file, (b) an ordered file. Which operations can be performed efficiently on each of these organizations, and which operations are expensive?
Ans:
Ordered file: An ordered file is a file whose records are physically ordered on disk based on the values of one of their fields, called the ordering field (often the primary key). This leads to an ordered or sequential file. Ordered records have some advantages over unordered files. Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required. Finding the next record from the current one in order of the ordering key usually requires no additional block accesses, because the next record is in the same block as the current one (unless the current record is the last one in the block). A search on the ordering field can use binary search. The disadvantage is that insertion and deletion are expensive, because the records must remain physically ordered.
Unordered file: In files of unordered records, records are placed in the file in the order in which they are inserted, so new records are inserted at the end of the file. Inserting a new record is very efficient: the last disk block of the file is copied into a buffer, the new record is added, and the block is rewritten back to disk. To delete a record, we must find its block, copy the block into a buffer, delete the record from the buffer, and finally rewrite the block back to the disk. The disadvantage is that searching for a record requires a linear search through the file, block by block.
Operations on Files
Operations on files are usually grouped into retrieval operations and update operations. The former do not change any data in the file, but only locate certain records so that their field values can be examined and processed. The latter change the file by insertion or deletion of records or by modification of field values. In either case, we may have to select one or more records for retrieval, deletion, or modification based on a selection condition (or filtering condition), which specifies criteria that the desired record or records must satisfy.
Consider an EMPLOYEE file with fields NAME, SSN, SALARY, JOBCODE, and DEPARTMENT. A simple selection condition may involve an equality comparison on some field value, for example (SSN = 123456789) or (DEPARTMENT = Research). More complex conditions can involve other types of comparison operators, such as > or ≥; an example is (SALARY ≥ 30000). The general case is to have an arbitrary Boolean expression on the fields of the file as the selection condition.
Search operations on files are generally based on simple selection conditions. A complex condition must be decomposed by the DBMS (or the programmer) to extract a simple condition that can be used to locate the records on disk. Each located record is then checked to determine whether it satisfies the full selection condition. For example, we may extract the simple condition (DEPARTMENT = Research) from the complex condition ((SALARY ≥ 30000) AND (DEPARTMENT = Research)); each record satisfying (DEPARTMENT = Research) is located and then tested to see if it also satisfies (SALARY ≥ 30000).
When several file records satisfy a search condition, the first record (with respect to the physical sequence of file records) is initially located and designated the current record. Subsequent search operations commence from this record and locate the next record in the file that satisfies the condition.
Actual operations for locating and accessing file records vary from system to system. Below, we present a set of representative operations. Typically, high-level programs, such as DBMS software programs, access the records by using these commands, so we sometimes refer to program variables in the following descriptions:
MC0067- DATABASE MANAGEMENT SYSTEM Open: Prepares the file for reading or writing. Allocates appropriate buffers (typically at least two) to hold file blocks from disk, and retrieves the file header. Sets the file pointer to the beginning of the file. Reset: Sets the file pointer of an open file to the beginning of the file. Find (or Locate): Searches for the first record that satisfies a search condition. Transfers the block containing that record into a main memory buffer (if it is not already there). The file pointer points to the record in the buffer and it becomes the current record. Sometimes, different verbs are used to indicate whether the located record is to be retrieved or updated. Read (or Get): Copies the current record from the buffer to a program variable in the user program. This command may also advance the current record pointer to the next record in the file, which may necessitate reading the next file block from disk. FindNext: Searches for the next record in the file that satisfies the search condition. Transfers the block containing that record into a main memory buffer (if it is not already there). The record is located in the buffer and becomes the current record. Delete: Deletes the current record and (eventually) updates the file on disk to reflect the deletion. Modify: Modifies some field values for the current record and (eventually) updates the file on disk to reflect the modification. Insert: Inserts a new record in the file by locating the block where the record is to be inserted, transferring that block into a main memory buffer (if it is not already there), writing the record into the buffer, and (eventually) writing the buffer to disk to reflect the insertion. Close: Completes the file access by releasing the buffers and performing any other needed cleanup operations. The preceding (except for Open and Close) are called record-at-a-time operations, because each operation applies to a single record. 
It is possible to streamline the operations Find, FindNext, and Read into a single operation, Scan, whose description is as follows:

Scan: If the file has just been opened or reset, Scan returns the first record; otherwise it returns the next record. If a condition is specified with the operation, the returned record is the first or next record satisfying the condition.

In database systems, additional set-at-a-time higher-level operations may be applied to a file. Examples of these are as follows:

FindAll: Locates all the records in the file that satisfy a search condition.
FindOrdered: Retrieves all the records in the file in some specified order.
Reorganize: Starts the reorganization process. As we shall see, some file organizations require periodic reorganization. An example is to reorder the file records by sorting them on a specified field.

At this point, it is worthwhile to note the difference between the terms file organization and access method. A file organization refers to the organization of the data of a file into records, blocks, and access structures; this includes the way records and blocks are placed on the storage medium and interlinked. An access method, on the other hand, provides a group of operations, such as those listed earlier, that can be applied to a file. In general, it is possible to apply several access methods to a file organization. Some access methods, though, can be applied only to files organized in certain ways. For example, we cannot apply an indexed access method to a file without an index (see Chapter 6).
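The Scan operation and the set-at-a-time operations FindAll and FindOrdered can be sketched in Python as follows. This is only an illustrative sketch under simplifying assumptions: the class ScanFile, its method names, and the sample records are hypothetical, and the file is again an in-memory list rather than blocks on disk.

```python
class ScanFile:
    """Hypothetical in-memory file supporting Scan, FindAll, and FindOrdered."""

    def __init__(self, records):
        self._records = list(records)
        self._pos = 0  # index of the next record to examine

    def reset(self):
        """Reset: move the file pointer back to the beginning."""
        self._pos = 0

    def scan(self, condition=None):
        """Scan: return the first or next record (satisfying the condition,
        if one is given), or None when the end of the file is reached."""
        while self._pos < len(self._records):
            record = self._records[self._pos]
            self._pos += 1
            if condition is None or condition(record):
                return record
        return None

    def find_all(self, condition):
        """FindAll: collect every record that satisfies the condition."""
        self.reset()
        result = []
        record = self.scan(condition)
        while record is not None:
            result.append(record)
            record = self.scan(condition)
        return result

    def find_ordered(self, key):
        """FindOrdered: retrieve all records in the order given by key."""
        return sorted(self.find_all(lambda r: True), key=key)


# Invented sample data, for illustration only.
scan_file = ScanFile([
    {"NAME": "Alice", "SALARY": 28000, "DEPARTMENT": "Research"},
    {"NAME": "Bob", "SALARY": 35000, "DEPARTMENT": "Research"},
    {"NAME": "Carol", "SALARY": 50000, "DEPARTMENT": "Administration"},
])

first = scan_file.scan()    # file was just created: first record
second = scan_file.scan()   # next record
research = scan_file.find_all(lambda r: r["DEPARTMENT"] == "Research")
by_salary_desc = scan_file.find_ordered(key=lambda r: -r["SALARY"])
```

Here find_all and find_ordered are built on top of scan, which mirrors the idea that set-at-a-time operations can be layered over record-at-a-time ones; a real system would normally use access structures rather than a full pass over the file.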
Usually, we expect to use some search conditions more than others. Some files may be static, meaning that update operations are rarely performed; other, more dynamic files may change frequently, so update operations are constantly applied to them.

A successful file organization should perform as efficiently as possible the operations we expect to apply frequently to the file. For example, consider the EMPLOYEE file (Figure 05.07a), which stores the records for current employees in a company. We expect to insert records (when employees are hired), delete records (when employees leave the company), and modify records (say, when an employee's salary or job is changed). Deleting or modifying a record requires a selection condition to identify a particular record or set of records. Retrieving one or more records also requires a selection condition.

If users expect mainly to apply a search condition based on SSN, the designer must choose a file organization that facilitates locating a record given its SSN value. This may involve physically ordering the records by SSN value or defining an index on SSN (see Chapter 6). Suppose that a second application uses the file to generate employees' paychecks and requires that paychecks be grouped by department. For this application, it is best to store all employee records having the same department value contiguously, clustering them into blocks and perhaps ordering them by name within each department. However, this arrangement conflicts with ordering the records by SSN values. If both applications are important, the designer should choose an organization that allows both operations to be done efficiently. Unfortunately, in many cases there may not be an organization that allows all needed operations on a file to be implemented efficiently. In such cases a compromise must be chosen that takes into account the expected importance and mix of retrieval and update operations.
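The conflict between the two organizations discussed above can be made concrete with a small Python sketch. It assumes invented sample records and hypothetical helper names (find_by_ssn, paycheck_groups), and it models "physical order" simply as the order of a list.

```python
from bisect import bisect_left
from itertools import groupby

# Invented sample records: (NAME, SSN, SALARY, DEPARTMENT).
records = [
    ("Alice", "444556666", 28000, "Research"),
    ("Bob", "222334444", 35000, "Research"),
    ("Carol", "333445555", 50000, "Administration"),
    ("Dana", "111223333", 30000, "Research"),
]

# Organization 1: records physically ordered by SSN, so a record can be
# located by binary search on its SSN value.
by_ssn = sorted(records, key=lambda r: r[1])
ssn_keys = [r[1] for r in by_ssn]


def find_by_ssn(ssn):
    """Binary search for the record with the given SSN; None if absent."""
    i = bisect_left(ssn_keys, ssn)
    if i < len(ssn_keys) and ssn_keys[i] == ssn:
        return by_ssn[i]
    return None


# Organization 2: records clustered by department and ordered by name
# within each department, which suits the paycheck application.
by_dept = sorted(records, key=lambda r: (r[3], r[0]))


def paycheck_groups():
    """Names grouped by department, read off in one sequential pass."""
    return {
        dept: [r[0] for r in group]
        for dept, group in groupby(by_dept, key=lambda r: r[3])
    }


# In the SSN-ordered file, records of one department are not contiguous.
departments_in_ssn_order = [r[3] for r in by_ssn]
```

With this sample data, the SSN-ordered list interleaves the Research and Administration records, so the paycheck application could not read each department as one contiguous run; the department-clustered list, in turn, no longer supports binary search on SSN.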
In the following sections and in Chapter 6, we discuss methods for organizing records of a file on disk. Several general techniques, such as ordering, hashing, and indexing, are used to create access methods. In addition, various general techniques for handling insertions and deletions work with many file organizations.
