MCA 202, Data Warehousing & Data Mining

UNIT-1

Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi-63, by Deepali Kamthania, Associate Professor.

Learning Objectives
Escalating need for strategic information
Building blocks of the data warehouse
Defining the business requirements


Why do enterprises really need data warehouses?

Operational computer systems
Provide information to run the day-to-day business
Event driven
Not directly suitable for review from different points of view

Executives
Need a different kind of information for strategic decisions
e.g., which product line to expand, which market should be strengthened
Trends over time
Review of sales quantities by product, salesperson, region, etc.


Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi-63, by Shivendra Goel


Organizations' use of data warehousing

Retail
Customer loyalty
Market planning

Manufacturing
Cost reduction
Logistics management

Financial
Risk management
Fraud detection

Utilities
Asset management
Resource management

Airlines
Route profitability
Yield management

Government
Manpower planning
Cost control


Escalating Need for Strategic Information

Failures of past decision-support systems
Operational versus decision-support systems
Data warehousing: the only viable solution


Need for strategic information

After the 1990s, business grew more complex.
Corporations spread globally.
Competition increased.
Operational systems provided information to run day-to-day operations, but managers and executives needed different kinds of information that could be used to make strategic decisions.
The data warehouse (DW) is a new paradigm specifically intended to provide vital strategic information.

Why do enterprises really need a data warehouse? Because of the escalating need for strategic information.
The executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions. They need information to formulate business strategies, establish goals, set objectives, and monitor results.


Escalating need for Strategic Information

Who needs strategic information in an enterprise?
Executives and managers, in order to:
Make proper decisions
Keep the enterprise competitive
Formulate and execute business strategies
Establish goals
Set objectives
Monitor results

What exactly do we mean by strategic information?


Some business objectives

Retain the present customer base
Increase the customer base by 15% over the next 5 years
Bring a new product to market in 2 years
Improve product quality levels in the top 5 product groups
Gain market share by 10% in the next 3 years
Increase sales by 10% in the East division


Cont.
To set and pursue business objectives, managers need information for the following purposes:
Gain in-depth knowledge of the company's operations.
Monitor how business factors change over time.
Compare the company's performance relative to the competition and to industry benchmarks.


Strategic information

Executives and managers need to focus their attention on:
Customers' needs and preferences
Emerging technologies
Sales and marketing results
Quality levels of products and services

This type of information is needed to make decisions in the formulation and execution of business strategies and objectives.
Taken together, all this essential information is called strategic information.

Strategic information is not for running the day-to-day operations of the business. It is important for the continued growth and survival of the corporation.

Characteristics of Strategic Information

Integrated
Must have a single, enterprise-wide view.

Data Integrity
Information must be accurate and must conform to business rules.

Accessible
Easily accessible, with intuitive access paths, and responsive for analysis.

Credible
Every business factor must have one and only one value.

Timely
Information must be available within the stipulated time frame.


Escalating need for strategic information
Information crisis
Technology trends
Opportunities and risks
Failure of past decision support systems


Information Crisis
Consider the IT department of any organization, big or small:
Think of the various computer applications in the company.
Think of the databases and the quantities of data that support the operations of the company.
How many years' worth of customer data is saved and available?
How many years' worth of financial data is kept in storage? 10 years or 15 years?
Where is all this data? On one platform? In legacy systems? In client/server applications?


Information Crisis cont.

Facts faced by organizations
Organizations have lots of data.
IT systems are not effective at turning all that data into useful strategic information.

If an organization has so much data, why can't executives and managers use it for making strategic decisions?
Information crisis: the data is available but not accessible.
The data sits on old technology and on different platforms.

For proper decision making on overall corporate strategies and objectives:
Information must be integrated from all systems.
Data needed for strategic decision making must be in a format suitable for analyzing trends.


Technology Trends
[Figure: Growth of Information Technology, 1950 to 2000. Computing technology: mainframe, mini, PC/networking, client/server. Human/machine interface: punch card, video display, GUI, voice. Processing options: batch, online, networked.]


Opportunities and Risks

Example of the opportunities made available to companies through the use of strategic information:
A community-based pharmacy that competes on a national scale, with more than 800 franchised pharmacies coast to coast, gains:
In-depth understanding of what customers buy
Reduced inventory levels
Improved effectiveness of promotions and marketing campaigns
Improved profitability for the company


Opportunities and Risks cont.

Consider a case where risks and threats of failure existed before strategic information was made available for analysis and decision making.
Example: a world-leading supplier of systems and components to automobile and light truck equipment manufacturers, with nearly 100 plants, faced:
Inability to benchmark quality metrics
Time-consuming manual collection of data
Reports needed to support decision making took weeks
Difficulty getting company-wide integrated information

Failures of Past Decision Support Systems

The marketing department is concerned about the performance of the West Coast region.
The marketing vice president wants reports from the IT department to analyze performance over the past two years, product by product, compared to monthly targets.
The CEO wants the reports delivered as soon as possible, so the request goes to a manager, who immediately passes it to a subordinate to produce the marketing report.
No such report is available.
The data must be gathered from multiple applications (on different platforms), starting from scratch.
Such reports are put together without a clear agenda, which leads to inconsistencies among the data obtained from different applications.

It is also possible that the person from the IT department creates the report from a single application for his or her convenience, so the information may not be helpful in strategic decision making.

So, from this scenario we learn that when information is scattered in different places and in different forms, it is difficult to use the available information for strategic decisions.

Operational Vs Decision Support Systems

The fundamental reason for the inability to provide strategic information is that we have been trying all along to provide it from the operational systems.
Operational systems such as order processing, inventory control, claims processing, outpatient billing, and so on are not designed or intended to provide strategic information.


Cont.
Operational systems: making the wheels of business turn (get data in)
Take an order
Process a claim
Make a shipment
Generate an invoice
Receive cash
Reserve an airline seat

Decision support systems: watching the wheels of business turn (get information out)
Show the top-selling products
Show the problem regions
Show the highest margins
Alert whenever a district sells below target

Operational systems support the basic business processes of the company and run the core day-to-day business.

Decision support systems (DSS) are used to watch how the business runs; they do not run the core business processes and give no immediate payout.

DSS are developed to get strategic information out of the database, whereas OLTP systems are designed to put data into the database.

History of decision support systems
# Ad Hoc Reports
This was the earliest stage.
Users would send requests to the IT department for special reports.
IT would write special programs, typically one for each request, and produce the ad hoc reports.
# Special Extract Programs
This stage was an attempt by IT to anticipate the reports that would be requested from time to time.
IT would write a suite of programs and run them periodically to extract data from the various applications.
IT would create and keep the extract files to fulfill any requests for special reports.


Cont.
# Small Applications
In this stage, IT formalized the extract process.
IT created simple applications based on the extracted files.
Users could specify the parameters for each special report.
The report-printing programs would print the reports based on the user-specified parameters.
# Information Centers
In the early 1970s, major corporations created information centers.
At an information center, users could request ad hoc reports or view special reports on screen.
These were predetermined reports or screens.
IT personnel were there to help users obtain the desired information.


Cont.
# Decision Support Systems
In this stage, companies began to build more sophisticated systems to provide strategic information.
Systems were menu driven and provided online information.
Systems were supported by extracted files.
Users could specify the parameters for each special report.
Reports could be printed.

# Executive Information Systems
This was the first attempt to bring strategic information to the executive desktop.
Systems were designed to display key information every day.
Reports were straightforward.
Only preprogrammed screens and reports were available.
It was not possible to see an analysis by region, by product, or by any other dimension unless such breakdowns were already programmed.
This limitation caused frustration, and executive information systems did not last long in many companies.


What is the basic reason for the failure of all previous attempts by IT to provide strategic information?

The fundamental reason is that we have been trying all along to provide strategic information from operational systems.
Operational systems such as order processing, inventory control, and claims processing are not designed to provide strategic information.
We must get the information from a different type of system; only specially designed decision support systems can provide strategic information.


Typical OLAP Operations

Roll up (drill up): summarize data
By climbing up a hierarchy or by dimension reduction

Drill down (roll down): the reverse of roll up
From a higher-level summary to a lower-level summary or detailed data, or by introducing new dimensions

Slice and dice
Project and select

Pivot (rotate)
Reorient the cube for visualization, e.g., from 3D to a series of 2D planes

Other operations
Drill across: involving (across) more than one fact table
Drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
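
The OLAP operations above can be illustrated on a tiny in-memory fact table. This is only a sketch: the sales rows, the dimension names, and the helper functions (aggregate, slice_, dice, pivot) are hypothetical and are not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, product, units)
SALES = [
    (2023, "Q1", "East", "TV", 10),
    (2023, "Q1", "West", "TV", 7),
    (2023, "Q2", "East", "PC", 5),
    (2023, "Q2", "West", "TV", 3),
    (2024, "Q1", "East", "PC", 8),
    (2024, "Q1", "West", "PC", 4),
]
DIMS = ("year", "quarter", "region", "product")


def aggregate(facts, dims):
    """Sum units grouped by the named dimensions (fewer dimensions = roll up)."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dim, value):
    """Select the facts where one dimension has a fixed value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Select facts whose dimensions fall within the allowed value sets."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def pivot(cell_totals):
    """Swap the two axes of a 2-D aggregate keyed by (row, column)."""
    return {(c, r): v for (r, c), v in cell_totals.items()}


by_year = aggregate(SALES, ["year"])                      # roll up
by_year_quarter = aggregate(SALES, ["year", "quarter"])   # drill down
east_only = slice_(SALES, "region", "East")               # slice
tv_2023 = dice(SALES, year={2023}, product={"TV"})        # dice
region_by_product = aggregate(SALES, ["region", "product"])
product_by_region = pivot(region_by_product)              # pivot
```

Roll up and drill down are the same grouping step run with fewer or more dimensions, which is why a single aggregate helper covers both here.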


Data Warehousing: The Only Viable Solution
A different type of DSS is needed to provide strategic information for analysis, discerning trends, and monitoring performance.
Given the escalating need for strategic information, data warehousing is the only viable solution for providing it.


New System Environment

Desirable features and processing requirements of the new type of system environment:
Database designed for analytical tasks
Data from multiple applications
Easy to use and conducive to long interactive sessions by users
Content updated periodically and stable
Content includes current and historical data
Ability for users to run queries and get results online
Ability for users to initiate reports


Processing Requirements in the New Environment

The new environment for strategic information is analytical. The four levels of analytical processing requirements are:
Running simple queries and reports against current and historical data
Ability to perform "what if" analysis in many different ways
Ability to query, step back, analyze, and then continue the process to any desired length
Spotting historical trends and applying them to future results
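
Two of these requirements, "what if" analysis and applying historical trends to future results, can be sketched with a few lines of arithmetic. The quarterly figures, the function names, and the choice of a least-squares straight line are assumptions made for illustration; real warehouses typically rely on dedicated analysis tools.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced periods 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def what_if(values, growth_pct):
    """Scale every period by a hypothetical growth percentage."""
    return [v * (1 + growth_pct / 100) for v in values]


quarterly_sales = [100.0, 110.0, 120.0, 130.0]   # illustrative history
slope, intercept = linear_trend(quarterly_sales)
# Extend the fitted line one period past the history.
next_quarter = intercept + slope * len(quarterly_sales)
# Scenario: what if every quarter had been 10% higher?
scenario = what_if(quarterly_sales, 10)
```

Both steps need several periods of history, which is one reason the warehouse keeps historical data rather than only current values.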


Business Intelligence at the Data Warehouse

[Figure: Operational systems (basic business processes) feed the data warehouse (key measurements, business dimensions) through data transformation: extraction, cleansing, aggregation.]


Definition
A data warehouse is an information environment that:
Provides an integrated and total view of the enterprise
Makes the enterprise's current and historical information easily available for decision making
Makes decision-support transactions possible without hindering operational systems
Renders the organization's information consistent
Presents a flexible and interactive source of strategic information

Conclusion
Operational systems are not designed for strategic information.
A data warehouse is a computing environment, not a product, for providing strategic information. It is:
Meant for data analysis and decision support
Flexible and interactive
User driven


Let's Discuss
1. How can readily available strategic information help each of the following increase quality and realize opportunities?
Insurance company
Airline company

2. You are a senior analyst in the IT department of a company manufacturing automobile parts. The marketing VP complains about poor IT response in providing strategic information. Write a proposal that explains the problems and gives the reasons why a data warehouse is a viable solution.


Data Warehouse: Building Blocks
Defining features
Data warehouses and data marts
Overview of the components
Metadata in the data warehouse


Defining Features
Key defining features of the data warehouse, based on these definitions:
What is the nature of the data in the data warehouse?
How is this data different from the data in any operational system?
Why does it have to be different?
How is the data content in the data warehouse used?


What is a Data Warehouse?

Defined in many different ways, but not rigorously.
A decision support database that is maintained separately from the organization's operational database.
Supports information processing by providing a solid platform of consolidated, historical data for analysis.

"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)

Data warehousing:
The process of constructing and using data warehouses.


Data Warehouse: Subject-Oriented
Organized around major subjects, such as customer, product, and sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view around particular subject issues by excluding data that is not useful in the decision support process.

Operational systems: data is stored by individual application. For example, the data sets for an order processing application provide the data for all of its functions: entering orders, checking stock, verifying customers' credit, and assigning the order for shipment.

Subject-oriented data: in the data warehouse, data is stored by subject. Business subjects differ from organization to organization.


Data Warehouse: Integrated
Constructed by integrating multiple, heterogeneous data sources:
Relational databases, flat files, online transaction records

Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources.
E.g., hotel price: currency, tax, whether breakfast is covered, etc.
When data is moved to the warehouse, it is converted.
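
The hotel price example can be made concrete with a small conversion sketch. Everything here is assumed for illustration: the two source layouts, the field names, and the exchange rate are hypothetical.

```python
# Assumed exchange rates to USD; the values are illustrative only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}


def from_source_a(rec):
    """Source A quotes a pre-tax price in EUR, breakfast flagged as 'Y'/'N'."""
    total = rec["price"] * (1 + rec["tax_rate"])
    return {
        "hotel": rec["hotel_name"].strip().title(),
        "price_usd": round(total * RATES_TO_USD["EUR"], 2),
        "breakfast_included": rec["bfast"] == "Y",
    }


def from_source_b(rec):
    """Source B quotes a tax-inclusive price in USD cents, breakfast as 0/1."""
    return {
        "hotel": rec["name"].strip().title(),
        "price_usd": round(rec["price_cents"] / 100, 2),
        "breakfast_included": bool(rec["breakfast"]),
    }


# After conversion, rows from both sources share one naming convention,
# one currency, one tax treatment, and one encoding for breakfast.
warehouse_rows = [
    from_source_a({"hotel_name": " grand plaza ", "price": 100.0,
                   "tax_rate": 0.10, "bfast": "Y"}),
    from_source_b({"name": "SEA VIEW", "price_cents": 9950, "breakfast": 0}),
]
```

Each source gets its own converter, so the rules for reconciling names, units, and codes live in one place per source.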


Data Warehouse: Time-Variant

The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current-value data.
Data warehouse data: provides information from a historical perspective (e.g., the past 5-10 years).

Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
The key of operational data may or may not contain a time element.

The time-variant nature of the data in a data warehouse:
Allows for analysis of the past.
Relates information to the present.
Enables forecasts for the future.
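
A minimal sketch of what "every key contains an element of time" means in practice. The customer record, the dates, and the function names are hypothetical, and a dictionary stands in for a warehouse table.

```python
from datetime import date

# Operational store: one current value per customer, overwritten on change.
operational = {"C001": {"city": "Delhi"}}
operational["C001"]["city"] = "Mumbai"   # the old value is lost

# Warehouse store: the key carries a time element, so history is kept.
warehouse = {}


def record_snapshot(customer_id, as_of, city):
    """Store one row per (customer, effective date) instead of overwriting."""
    warehouse[(customer_id, as_of)] = {"city": city}


def city_as_of(customer_id, when):
    """Return the city in effect on a given date, or None if unknown."""
    dates = [d for (cid, d) in warehouse if cid == customer_id and d <= when]
    return warehouse[(customer_id, max(dates))]["city"] if dates else None


record_snapshot("C001", date(2020, 1, 1), "Delhi")
record_snapshot("C001", date(2023, 6, 1), "Mumbai")
```

With the date in the key, a question such as "where was this customer in 2021?" can still be answered after the operational value has changed.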


Data Warehouse: Non-Volatile
A physically separate store of data transformed from the operational environment.
Operational updates of data do not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and access of data.

Operational databases: data is added, changed, and deleted as each transaction happens. Updates are commonplace, so the data in operational databases is volatile.

Data warehouse: once the data is captured in the data warehouse, individual transactions are not run to change it. The data in the data warehouse is non-volatile.
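
The two permitted operations, loading and access, can be sketched as a class that simply offers no way to update or delete. The class name WarehouseTable and its methods are hypothetical, chosen only to illustrate the idea.

```python
class WarehouseTable:
    """Append-only table: rows can be loaded and read, never changed in place."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Bulk-load a batch of rows (initial load or periodic refresh)."""
        self._rows.extend(tuple(r) for r in rows)

    def scan(self, predicate=lambda row: True):
        """Read access: return the matching rows as a new list."""
        return [row for row in self._rows if predicate(row)]


sales = WarehouseTable()
sales.load([("2024-01-05", "East", 120), ("2024-01-05", "West", 90)])
sales.load([("2024-01-06", "East", 75)])
east_rows = sales.scan(lambda row: row[1] == "East")
```

Because there is no update path, the sketch needs no locking or rollback logic, which mirrors the point that the warehouse does not require transaction processing, recovery, or concurrency control mechanisms.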


Data Granularity
Operational system
Lowest level of detail
Lots of data
Daily details

Data warehouse
Data granularity in a data warehouse refers to the level of detail.
Data is summarized at different levels.
Monthly/quarterly summaries
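
Summarizing daily detail to monthly and quarterly levels can be sketched as follows. The daily rows and the summarize helper are hypothetical; dates are assumed to be ISO strings of the form YYYY-MM-DD.

```python
from collections import defaultdict

# Hypothetical daily detail rows: (ISO date, units sold)
daily = [
    ("2024-01-03", 4), ("2024-01-17", 6),
    ("2024-02-02", 5), ("2024-04-09", 7),
]


def summarize(rows, level):
    """Roll daily rows up to 'month' or 'quarter' granularity."""
    totals = defaultdict(int)
    for day, units in rows:
        year, month = day[:4], int(day[5:7])
        if level == "month":
            key = f"{year}-{month:02d}"
        elif level == "quarter":
            key = f"{year}-Q{(month - 1) // 3 + 1}"
        else:
            raise ValueError(f"unsupported level: {level}")
        totals[key] += units
    return dict(totals)


monthly = summarize(daily, "month")
quarterly = summarize(daily, "quarter")
```

The coarser the granularity, the fewer rows have to be stored and scanned, at the cost of losing the daily detail.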


Data Warehouse vs. Heterogeneous DBMS

Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases
Query-driven approach
When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set
Complex information filtering; queries compete for resources

Data warehouse: update-driven, high performance
Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis
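
The contrast between the two approaches can be sketched with two toy sources. The site contents, field names, and function names are hypothetical; the hard-coded field mapping plays the role of the meta-dictionary.

```python
# Two hypothetical source systems with different field names.
SITE_A = [{"cust": "Asha", "amt": 100}, {"cust": "Ravi", "amt": 40}]
SITE_B = [{"customer_name": "Asha", "total": 25}]


def query_driven_total(customer):
    """Mediator style: translate and run the query at every site on demand."""
    a = sum(r["amt"] for r in SITE_A if r["cust"] == customer)
    b = sum(r["total"] for r in SITE_B if r["customer_name"] == customer)
    return a + b


def build_warehouse():
    """Update-driven style: integrate all sources once, ahead of any query."""
    totals = {}
    for r in SITE_A:
        totals[r["cust"]] = totals.get(r["cust"], 0) + r["amt"]
    for r in SITE_B:
        name = r["customer_name"]
        totals[name] = totals.get(name, 0) + r["total"]
    return totals


WAREHOUSE = build_warehouse()


def warehouse_total(customer):
    """Answer from the pre-integrated store without touching the sources."""
    return WAREHOUSE.get(customer, 0)
```

Both functions give the same answer; the difference is when the integration work happens. The query-driven version touches every source on every query, while the warehouse version pays that cost once at load time.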


Data Warehouse vs. Operational DBMS

OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making

Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries


OLTP vs. OLAP

                      OLTP                                                       OLAP
users                 clerk, IT professional                                     knowledge worker
function              day-to-day operations                                      decision support
DB design             application-oriented                                       subject-oriented
data                  current, up-to-date, detailed, flat relational, isolated   historical, summarized, multidimensional, integrated, consolidated
usage                 repetitive                                                 ad hoc
access                read/write, index/hash on primary key                      lots of scans
unit of work          short, simple transaction                                  complex query
# records accessed    tens                                                       millions
# users               thousands                                                  hundreds
DB size               100MB-GB                                                   100GB-TB
metric                transaction throughput                                     query throughput, response
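
The "unit of work" and "access" rows can be illustrated with Python's built-in sqlite3 module. This is only a sketch: the orders table and its rows are hypothetical, and both statements run against one in-memory database purely for contrast, whereas in practice the OLAP query would run on a separate warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, "
    "product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, product, amount) VALUES (?, ?, ?)",
    [("East", "TV", 300.0), ("East", "PC", 900.0), ("West", "TV", 250.0)],
)

# OLTP-style unit of work: a short read/write transaction on one key.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (320.0, 1))

# OLAP-style unit of work: a read-only query that scans and summarizes.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The update touches a single row through the primary key, while the summary has to read every row in the table.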


Why Separate Data Warehouse?

High performance for both systems
DBMS: tuned for OLTP (access methods, indexing, concurrency control, recovery)
Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view, consolidation)

Different functions and different data:
Missing data: decision support requires historical data, which operational DBs do not typically maintain
Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled

Data Warehouses and Data Marts Cont.

Data warehouse
Enterprise-wide
Union of all data marts
Data received from the staging area
Structured for a corporate view of data
Organized on an E-R model

Data mart
Departmental
A single business process
Facts and dimensions
Technology optimal for data access and analysis
Structured to suit the departmental view of data


Data Warehousing and OLAP Technology for Data Mining

What is a data warehouse?
A multidimensional data model
Data warehouse building blocks


Overview of Components


Data Warehouse Components

Source data component
Data staging component
Data storage component
Information delivery component
Metadata component
Management and control component


Data Warehouse Components cont.

1. Source Data Component: source data is grouped into four broad categories.
Production data: this category of data comes from the various operational systems of the enterprise.
Internal data: in every organization, users keep their private spreadsheets, documents, customer profiles, and sometimes even departmental databases.


Data Warehouse Components cont.

Archived data:
Operational systems periodically take old data and store it in archived files. The data in these files is referred to as archived data.

External data:
This category includes data from external sources.
For example: market share data of competitors.


Data Warehouse Components cont.

2. Data Staging Component:
Data is extracted from the various operational systems and external sources.
The staging component prepares the data for storing in the data warehouse.
The data extracted from several disparate sources needs to be changed, converted, and made ready for storage in a format suitable for querying and analysis.


Cont.
Three major functions need to be performed to get the data ready.
Data extraction: extract the data for the data warehouse, using appropriate techniques, from the large amount of data received from the operational systems.
Data transformation: involves many forms of combining pieces of data from the different sources.
Merging and sorting take place on a large scale in the staging area.
When the data transformation function ends, there is a collection of integrated data that is cleaned, standardized, and summarized. The data is then ready to be loaded into the data warehouse.
Data loading: the initial load moves large volumes of data and takes up a substantial amount of time.
Once the data warehouse starts functioning, the staging component continues to extract the changes to the source data, transform the data revisions, and feed the incremental data revisions on an ongoing basis.
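
The three functions can be sketched as three small steps chained together. The two source systems, their field names, and the extract, transform, and load helpers are hypothetical; real staging areas use dedicated tools and handle far larger volumes.

```python
# Hypothetical raw extracts from two operational systems.
ORDERS_SYSTEM = [
    {"cust_id": "c-01", "amount": "120.50", "date": "2024-01-05"},
    {"cust_id": "C-02", "amount": "80", "date": "2024-01-05"},
    {"cust_id": "c-01", "amount": "19.50", "date": "2024-01-06"},
]
BILLING_SYSTEM = [{"customer": "C-02", "amt": 40.0, "day": "2024-01-06"}]


def extract():
    """Pull raw records from each source, tagged with where they came from."""
    return ([("orders", r) for r in ORDERS_SYSTEM]
            + [("billing", r) for r in BILLING_SYSTEM])


def transform(raw):
    """Standardize field names and formats, merge sources, then sort."""
    rows = []
    for source, r in raw:
        if source == "orders":
            rows.append((r["date"], r["cust_id"].upper(), float(r["amount"])))
        else:
            rows.append((r["day"], r["customer"].upper(), float(r["amt"])))
    return sorted(rows)


def load(warehouse, rows):
    """Append the prepared rows to the warehouse store."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
loaded = load(warehouse, transform(extract()))
```

Keeping the three steps separate means a new source only needs a new branch in the transform step; the load step never sees source-specific formats.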

Data Movement in the Data Warehouse

[Figure: Data moves from the data sources to the data warehouse through a base data load followed by daily, monthly, quarterly, and yearly refreshes.]

The initial load moves a large volume of data and is time consuming.
Business conditions determine the refresh cycle.
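
The difference between the base load and a later refresh can be sketched with a "last loaded date" marker. The marker technique, the row layout, and the function names are assumptions for illustration; the slides do not prescribe how changes are detected.

```python
def initial_load(source_rows):
    """Base load: copy every source row into a new warehouse store."""
    return list(source_rows)


def incremental_refresh(warehouse, source_rows, last_loaded_day):
    """Append only rows newer than the last refresh; return the new marker."""
    new_rows = [r for r in source_rows if r[0] > last_loaded_day]
    warehouse.extend(new_rows)
    return max((r[0] for r in new_rows), default=last_loaded_day)


# Rows are (ISO date, amount); ISO dates compare correctly as strings.
source = [("2024-01-01", 10), ("2024-01-02", 20)]
warehouse = initial_load(source)
mark = "2024-01-02"

source.append(("2024-01-03", 30))          # new activity in the source
mark = incremental_refresh(warehouse, source, mark)
mark_again = incremental_refresh(warehouse, source, mark)  # nothing new
```

Running the refresh daily, monthly, or quarterly only changes how often incremental_refresh is called, which is why the refresh cycle can follow business conditions.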

Cont.
3) Data Storage Component: The data storage for the data warehouse is a separate repository. The operational systems of the enterprise support day-to-day operations, and their data repositories typically contain only current data. The data warehouse repository, by contrast, must keep large volumes of historical data for analysis. The data therefore needs to be kept in structures suitable for analysis, not for quick retrieval of individual pieces of information.


Cont...
4) Information Delivery Component: Who are the users who need information from the data warehouse? This component provides information to the wide community of data warehouse users.
Novice user
  No training
  Needs prefabricated reports and preset queries
Casual user
  Needs information once in a while
  Needs prepackaged information
  Navigates through the data warehouse, creates custom reports, runs ad hoc queries

The information delivery component includes several delivery mechanisms, for example online queries and reports.


Information delivery Component

[Figure: the Data Warehouse and Data Marts feed the Information Delivery Component, which delivers information online and over the intranet, Internet, and e-mail.]
Delivery outputs: ad hoc reports, complex queries, multidimensional (MD) analysis, statistical analysis, Executive Information System (EIS) feed, data mining
Users shown: novice and casual users, business analysts, senior and high-level managers


Data Warehouse Components (cont.)

5) Metadata Component: Metadata in a data warehouse is similar to the data dictionary or data catalog in a database management system. The data dictionary holds information about the logical data structures, about the files and addresses, and about the indexes.


Cont..
6) Management and Control Component: This component sits on top of all the other components of the data warehouse architecture.
It coordinates the services and activities within the data warehouse.
It moderates information delivery to the users.
It works with the database management system and enables data to be properly stored in the repositories.
It monitors the movement of data into the staging area and from there into the data warehouse storage.
It interacts with the metadata component to perform the management and control functions; metadata is the source of information for the management module.

Meta Data in the Data Warehouse


The metadata component serves as a directory of the contents of the data warehouse. Metadata in a data warehouse falls into three major categories.
1) Operational Metadata: Operational metadata holds information about the operational data sources. These sources use different data structures for storing the data of the various operational systems.


Meta Data in the Data Warehouse cont..


2) Extraction and Transformation Metadata: Contains data about the extraction of data from the source systems, such as extraction frequencies and extraction methods. It also contains information about all the data transformations that take place in the data staging area.
3) End-User Metadata: The navigational map of the data warehouse. It enables end users to find information in the data warehouse.
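As a rough illustration of the three metadata categories, the sketch below records what each category might hold about a single data element. The element name `sales_amount`, the source system `order_entry`, and every attribute value are hypothetical; a real metadata repository would be far richer.

```python
# One hypothetical data element described in each metadata category.
metadata = {
    "operational": {
        "sales_amount": {
            "source_system": "order_entry",
            "source_field": "ORD_AMT",
            "data_type": "DECIMAL(9,2)",
        },
    },
    "extraction_transformation": {
        "sales_amount": {
            "extraction_frequency": "daily",
            "extraction_method": "incremental",
            "transformation": "convert to a single currency",
        },
    },
    "end_user": {
        "sales_amount": {
            "business_name": "Sales Amount",
            "description": "Invoiced value of one order line",
        },
    },
}


def describe(element):
    """Collect what each metadata category records about one element."""
    return {
        category: entries[element]
        for category, entries in metadata.items()
        if element in entries
    }


sales_info = describe("sales_amount")
```

The end-user entry plays the role of the navigational map: it is the part a business user would read, while the other two categories serve the staging and source-system side.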

Conclusion
The data warehouse is an informational environment that:
Provides an integrated and total view of the enterprise.
Makes the enterprise's current and historical information easily available for decision making.
Makes decision-support transactions possible without hindering operational systems.
Renders the organization's information consistent.
Presents a flexible and interactive source of strategic information.

Let's Discuss
1. You are the data analyst on a project building a data warehouse for an insurance company. List all possible data sources from which data will be brought into the data warehouse (state your assumptions).
2. For an airline company, identify three operational applications that would feed into the data warehouse. What would be the data load and refresh cycles?
3. Identify potential users and information delivery methods for a data warehouse supporting a large national grocery chain.

Defining the Business Requirements
Dimensional analysis
Information packages
Requirements gathering methods
Requirements definition


Dimensional Analysis
A data warehouse is an information delivery system. It is not about technology, but about solving users' problems and providing strategic information to the user.
The requirements definition phase concentrates on what information the users need, not on how the information will be provided.
Building a data warehouse is different from building an operational system.
Users cannot fully describe what they want in a data warehouse, but they provide important insights into how they think about the business.
The analysis must identify:
  Business dimensions
  Units of measurement


Managers think in business dimensions

Marketing VP: How much did the new product generate, month by month, in the southern division, by user demographic, by sales office, relative to the previous version, compared to plan?
Marketing Manager: Give me sales statistics by product, summarized by product categories, daily, weekly, monthly, by sales districts, by distribution channel.
Financial Controller: Show me expenses, listing actual vs. budget, by months, quarters, annual, by budget line items, by district, by division, summarized for the whole company.


From Tables and Spreadsheets to Data Cubes


A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
  Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
  A fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
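To make the fact table and dimension table roles concrete, here is a small sketch using plain Python dictionaries. The column names follow the slide (item_name, brand, type, dollars_sold); the keys, sample rows, and the helper `total_by` are hypothetical.

```python
from collections import defaultdict

# Dimension tables: surrogate key -> descriptive attributes (sample rows).
item_dim = {
    1: {"item_name": "Cola", "brand": "FizzCo", "type": "drink"},
    2: {"item_name": "Milk", "brand": "DairyBest", "type": "dairy"},
}
time_dim = {
    10: {"day": 1, "month": 3, "quarter": 1, "year": 2024},
    11: {"day": 2, "month": 3, "quarter": 1, "year": 2024},
}

# Fact table: one measure (dollars_sold) plus a key into each dimension table.
sales_fact = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 47.0},
    {"item_key": 2, "time_key": 10, "dollars_sold": 30.0},
    {"item_key": 1, "time_key": 11, "dollars_sold": 12.0},
]


def total_by(dim_table, key_name, attribute):
    """Sum dollars_sold grouped by one attribute of one dimension table."""
    totals = defaultdict(float)
    for fact in sales_fact:
        label = dim_table[fact[key_name]][attribute]
        totals[label] += fact["dollars_sold"]
    return dict(totals)


by_item = total_by(item_dim, "item_key", "item_name")
by_month = total_by(time_dim, "time_key", "month")
```

The fact rows hold only numbers and keys; every descriptive label used for grouping comes from a dimension table, which is what lets the same measure be viewed along several dimensions.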


Multidimensional Data

[Figure: sales volume as a function of time, city and product. Product axis: Juice, Cola, Milk, Cream (sample values 10, 47, 30, 12); city axis: NY, LA, SF; date axis: 3/1, 3/2, 3/3, 3/4.]

Cube: A Lattice of Cuboids

[Figure: lattice of cuboids for the dimensions time, item, location, supplier.]
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
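The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions, so n dimensions give 2^n cuboids. The sketch below uses the four dimensions from the figure; the function name `cuboids` is just an illustrative choice.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")


def cuboids(dims):
    """Return every cuboid as a tuple of dimensions, grouped by dimensionality."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboids(dimensions)
total = sum(len(group) for group in lattice.values())

for k, group in lattice.items():
    print(f"{k}-D cuboids: {len(group)}")
```

The empty tuple at level 0 stands for the apex cuboid ("all"), and the single tuple at level 4 is the base cuboid.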


Dimensional nature of business data

[Figure: a three-dimensional cube with the axes Product (e.g. TV sets), Time (e.g. Jan) and Geography (e.g. Delhi); one slice shows product sales information (units sold).]
The model can be extended to multiple dimensions. Multidimensional cubes are called hypercubes.
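Taking a slice of such a cube means fixing one dimension at a single value and keeping the remaining coordinates. A minimal sketch, assuming a cube stored as a dictionary keyed by (product, time, geography); all cell values and the extra members (Feb, Mumbai, Radios) are hypothetical.

```python
# Cube cells keyed by (product, time, geography) -> units sold.
cube = {
    ("TV sets", "Jan", "Delhi"): 50,
    ("TV sets", "Feb", "Delhi"): 40,
    ("TV sets", "Jan", "Mumbai"): 35,
    ("Radios", "Jan", "Delhi"): 20,
}
AXES = ("product", "time", "geography")


def slice_cube(cells, axis, value):
    """Fix one dimension at a single value and drop it from the cell keys."""
    i = AXES.index(axis)
    return {
        key[:i] + key[i + 1:]: units
        for key, units in cells.items()
        if key[i] == value
    }


delhi_slice = slice_cube(cube, "geography", "Delhi")
jan_slice = slice_cube(cube, "time", "Jan")
```

Because the keys are plain tuples, the same function works unchanged for a hypercube with more than three dimensions once `AXES` lists them.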


Examples of business dimensions

[Figure: three diagrams, each with a subject at the center surrounded by its business dimensions.]
Airlines company (frequent flights): Time, Customer, Flight, Status, Airport, Fare class
Insurance business (claims): Time, Agent, Type, Insured Party, Status, Policy
Supermarket chain (sales units): Time, Promotion, Product, Status, Store

Information Packages-A New Concept


Information Packages: A methodology for determining the requirements for a data warehouse, based on the business dimensions used for analysis. It incorporates the basic measurements and the business dimensions.
An information package enables us to:
  Define the common subject areas
  Design key business metrics
  Decide how data must be presented
  Determine how users will aggregate or roll up
  Decide the data quantity for user analysis or query
  Decide how data will be accessed
  Establish data granularity
  Estimate data warehouse size
  Determine the frequency for data refreshing

Information Subject: Sales Analysis

Dimensions:                     Time Period | Locations | Products | Age Groups
Hierarchies (top level shown):  Year        | Country   | Class    | Group 1

Measured Facts: Forecast Sales, Budget Sales, Actual Sales

[Figure: An information package]

Cont..
Business dimensions form the basis of the information package (IP).
Hierarchical levels within each dimension support further processing:
  Drilling down and rolling up for analysis
Categories:
  Data elements within business dimensions, e.g. sales on holidays
Key business metrics or facts (numbers)
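Rolling up and drilling down simply mean aggregating the same facts at a higher or lower level of a dimension hierarchy. The sketch below assumes a time hierarchy of year, quarter, and month; the sample rows and sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: (year, quarter, month, country, actual_sales).
facts = [
    (2024, "Q1", "Jan", "India", 100.0),
    (2024, "Q1", "Feb", "India", 150.0),
    (2024, "Q2", "Apr", "India", 120.0),
    (2024, "Q1", "Jan", "Nepal", 40.0),
]

# Number of leading columns that identify each level of the time hierarchy.
TIME_LEVELS = {"year": 1, "quarter": 2, "month": 3}


def aggregate(level):
    """Sum actual_sales at the given level of the time hierarchy."""
    depth = TIME_LEVELS[level]
    totals = defaultdict(float)
    for row in facts:
        totals[row[:depth]] += row[4]
    return dict(totals)


by_year = aggregate("year")        # rolled up to the top level
by_quarter = aggregate("quarter")  # drilled down one level
by_month = aggregate("month")      # drilled down to the lowest level
```

The lowest level kept in the facts fixes the data granularity: you can always roll up from months to quarters, but you cannot drill down below what was stored.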


Business dimension for auto sales analysis


Hierarchies and categories for each dimension:
Product: model name, model year, package styling, product line, product category, exterior color, interior color, first model year
Dealer: dealer name, city, state, single brand flag, date of first operation
Customer demographics: age, gender, income, marital status, household size, vehicles owned, home value, own or rent
Payment method: finance type, term in months, interest rate, agent
Time: date, month, quarter, year, day of week, day of month, season, holiday flag

Cont..
Metrics for analyzing automobile sales:
  Actual sale price
  Option price
  Full price
  Dealer add-ons
  Dealer credits
  Dealer invoice
  Amount of down payment
  Amount financed


Information Subject: Automaker Sales

Dimensions and hierarchies (as far as recoverable from the slide):
Time: Year, Quarter, Month, Date, Day of Week, Day of Month, Season, Holiday Flag
Product: Model Name, Model Year, Package Styling
Payment Method: Finance Type
Customer Demographics: Age, Gender
Dealer: Dealer Name, City, State, Single Brand Flag

Measured Facts: Actual sale price, Option price, Full price, Dealer add-ons, etc.

[Figure: An information package]

Classification of users of the data warehouse

Senior executives (including sponsors)
  Have a sense of direction; involved in the focused area
Key departmental managers
  Report to the executives in the area of focus
Business analysts
  Prepare reports and analyses for executives and managers
Operational system DBAs
  Only provide information
Others nominated by the above


What requirements to gather?
Broad list:
  Data elements: fact classes, dimensions
  Recording of data in terms of time
  Data extracts from source systems
  Business rules: attributes, ranges, domains, operational records


Requirements Gathering Methods


Interviews
  One-to-one sessions
Group sessions
  Not good at the initial stage
  Useful for confirming requirements
JAD (Joint Application Development) sessions
  Joint approach: the concerned group meets for a well-defined purpose
Review of existing documents
  Documentation from user departments
  Documentation from IT


Interview process: tasks before the project launches

Select and train the team members conducting the interviews
Assign roles for the team members
Prepare the list of users to be interviewed
List expectations
Pre-interview research
  History and current structure of the business unit
  Number of employees and their roles and responsibilities
  Locations of the users
  Primary purpose of the business unit
  Company's market
  Competitors in the market
Prepare questionnaire
  Current information sources
  Subject areas
  Key performance metrics
  Information frequency



Initial documents for requirements definition

Interview write-ups
  User profile
  Background and objectives
  Information requirements
  Analysis requirements
  Current tools used
  Success criteria
  Useful business metrics
  Relevant business dimensions


Expectations from interviews


Senior executives
  Organization objectives
  Criteria for measuring success
  Key business issues, current and future
  Problem identification
  Vision and direction of the organization
  Anticipated usage of the DW
Departmental managers / analysts
  Departmental objectives
  Success metrics
  Factors limiting success
  Key business issues
  Products and services
  Useful business dimensions for analysis
  Anticipated usage of the DW
IT department professionals
  Key operational source systems
  Current information delivery processes
  Types of routine analysis
  Known quality issues
  Current IT support for information requests
  Concerns about the proposed DW

JAD five-phased approach

Project definition
  Complete high-level interviews
  Conduct management interviews
  Prepare management definition guide
Research
  Become familiar with the business area and systems
  Document user information requirements
  Document business processes
  Gather preliminary information
  Prepare agenda for the sessions
Preparation
  Create working documents from the previous phase
  Train the scribes
  Prepare visual aids
  Conduct pre-session meetings
  Set up a venue for the sessions
  Prepare checklist for objectives

Cont..
JAD sessions
  Open with review of agenda and purpose
  Review assumptions
  Review data requirements
  Review business metrics and dimensions
  Discuss dimension hierarchies and roll-ups
  Resolve open issues
  Close sessions with the list of action items
Final document
  Convert the working documents
  Map the gathered information
  List all data sources
  Identify all business dimensions and hierarchies
  Assemble and edit the document
  Conduct review sessions
  Get final approvals
  Establish procedure to change requirements

The success of a project using JAD depends on the JAD team.



JAD team
Executive sponsor
  Person controlling the funding, providing direction, and empowering team members
Facilitator
  Person guiding the team through the JAD process
Scribe
  Person designated to record all decisions
Full-time participants
  Everyone involved in decision making for the data warehouse
On-call participants
  Persons affected by the project, but only in specific areas
Observers
  Persons attending specific sessions without participating in decisions

Requirements Definition: Scope and Content
Formal documentation is often neglected in the requirements definition phase.
The inputs are the interviews and group discussions conducted and the existing documentation reviewed.
The requirements definition document is the basis for the next phases in the system development life cycle.
Nevertheless, project teams often skip the detailed documentation of the requirements definition.


Data Sources
The requirements definition document should include the following information:
  Available data sources
  Data structures within the data sources
  Location of the data sources
  Data extraction procedures
  Availability of historical data


Requirements Definition (cont.)

Data Transformation:
  Data transformation necessarily involves mapping the source data to the data in the data warehouse.
Data Storage:
  The requirements definition document must include sufficient details about the storage requirements.
Information Delivery:
  Drill-down analysis
  Roll-up analysis
  Slicing
  Ad hoc reports
Information Package Diagram



Information Package Diagrams


The information package diagrams crystallize the information requirements for the data warehouse. They contain the critical metrics measuring the performance of the business units, the business dimensions along which the metrics are analyzed, and the details of how drill-down and roll-up analyses are done.


Requirements Definition Document Outline


1. Introduction (purpose and scope of the project)
2. General Requirements Description (source system review, e.g. interview summaries). State what types of information are required in the data warehouse.
3. Specific Requirements (data transformation and storage requirements)
4. Information Packages (in the form of information package diagrams)
5. Other Requirements (data extract frequency, data loading methods, locations for information delivery, etc.)
6. User Expectations (how the users expect to use the data warehouse)
7. User Participation (list of tasks in which users are expected to participate throughout the development life cycle)
8. General Implementation Plan (a high-level plan for implementation)

Let's Discuss
1. You are the VP of marketing for a nationwide appliance manufacturer with three production plants. Describe three ways to analyze sales. What are the business dimensions for the analysis?
2. BigBook Inc. is a large book distributor with domestic and international distribution to all leading booksellers. Initially, build a data warehouse to analyze shipments that are made from the company's many warehouses. Determine the metrics and business dimensions. Prepare an information package diagram.
3. For a data warehouse on AuctionsPlus.com, an Internet auction company selling upscale works of art, gather requirements for sales analysis. Find out the key metrics, business dimensions, hierarchies, and categories. Draw the information package diagram.
4. Create a detailed outline of a formal requirements definition document for a data warehouse to analyze the profitability of a large department store chain.


Business Requirements as the driving force

[Figure: business requirements drive every phase of the data warehouse project.]
Business Requirements drive:
  Planning & Management
  Design: architecture, infrastructure, data acquisition, data storage, information delivery
  Construction: architecture, infrastructure, data acquisition, data storage, information delivery
  Deployment
  Maintenance


Data Design
In the design phase, data models are required for:
  Staging area: to transform, cleanse, and integrate data from the source systems
  Data warehouse repository


Requirements driving the data model

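[Figure: two paths from requirements to the data model.]
Information package diagrams lead to the dimensional model, used for data marts (conformed/dependent).
The enterprise data model leads to the relational model, used for the enterprise data warehouse.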


Composition of the components


Source data
  Operational source systems
  Computing platforms, operating systems, database files
  Departmental data such as files, documents, and spreadsheets
  External data sources
Data staging
  Data mapping between data sources and staging area data structures
  Data transformation
  Data cleansing
  Data integration
Data storage
  Size of extracted and integrated data
  DBMS features
  Growth potential
  Centralized or distributed


Composition of the components (cont.)
Information delivery
  Types and number of users
  Types of queries and reports
  Classes of analysis
  Front-end DSS applications
Metadata
  Operational metadata
  ETL (data extraction/transformation/loading) metadata
  End-user metadata
  Metadata storage
Management & control
  Data loading
  External sources
  Alert systems
  End-user information delivery


Impact of requirement on architecture


[Figure: business requirements influence every architectural component: source data, data staging, data storage, information delivery, metadata, and management & control.]

Data Quality
Bad data leads to bad decisions.

Data pollution sources
  System conversions and migrations
  Heterogeneous system integration
  Inadequate database design of source systems
  Data aging
  Incomplete information from customers
  Input errors
  Internationalization/localization of systems
  Lack of data management policies and procedures
Types of data quality problems
  Dummy values in source system fields
  Absence of data in source system fields
  Multipurpose fields
  Cryptic data
  Contradicting data
  Improper use of names
  Violation of rules
  Reused primary keys
  Non-unique identifiers
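A few of these problem types can be detected with simple checks in the staging area. The sketch below covers three of them: dummy values, absent data, and non-unique identifiers. The sample records, the field names, and the set of values treated as dummies are all assumptions made for illustration.

```python
from collections import Counter

# Assumed placeholder values that source systems sometimes store as dummies.
DUMMY_VALUES = {"N/A", "UNKNOWN", "999-99-9999"}

# Hypothetical customer records extracted from a source system.
customers = [
    {"id": 1, "name": "Asha Rao", "ssn": "123-45-6789"},
    {"id": 2, "name": "", "ssn": "999-99-9999"},
    {"id": 2, "name": "Ravi Kumar", "ssn": None},
]


def quality_report(rows, key="id"):
    """Flag absent data, dummy values, and non-unique identifiers."""
    missing = [
        (r[key], field)
        for r in rows
        for field, value in r.items()
        if value is None or (isinstance(value, str) and value.strip() == "")
    ]
    dummy = [
        (r[key], field)
        for r in rows
        for field, value in r.items()
        if isinstance(value, str) and value.strip().upper() in DUMMY_VALUES
    ]
    counts = Counter(r[key] for r in rows)
    non_unique = sorted(k for k, n in counts.items() if n > 1)
    return {"missing": missing, "dummy": dummy, "non_unique_ids": non_unique}


report = quality_report(customers)
```

Problems such as cryptic data, contradicting data, or multipurpose fields usually need business rules rather than generic checks, so they are left out of this sketch.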


Impact of requirement on metadata


[Figure: business requirements shape the three categories of data warehouse metadata.]
Operational: source system data structures, external data formats
Extraction/Transformation: data cleansing, conversion, integration
End-user: querying, reporting, analysis, OLAP, special applications


Data Storage specifications


The DBMS should be compatible with the back end and the front end.
Business elements that affect the choice of DBMS
  Level of user experience
  Types of queries
  Need for openness
  Data loads
  Metadata management
  Data repository locations
  Data warehouse growth
Size estimation
  Data staging area
  Overall corporate data warehouse
  Data marts, dependent or conformed
  Multidimensional databases
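A first-cut size estimate for one fact table is simple arithmetic: rows per day, times days of history, times bytes per row, plus an allowance for indexes. The sketch below is only a back-of-the-envelope aid; the function name, the 30 percent index allowance, and all the input figures are hypothetical and should be replaced with numbers from the requirements.

```python
def fact_table_size_bytes(rows_per_day, days_of_history, bytes_per_row,
                          index_overhead_pct=30):
    """Estimate storage for one fact table, including an index allowance."""
    base = rows_per_day * days_of_history * bytes_per_row
    # Integer arithmetic keeps the estimate exact for large values.
    return base * (100 + index_overhead_pct) // 100


# Hypothetical inputs: 100,000 sales rows a day, 3 years of history,
# 60 bytes per row.
estimate = fact_table_size_bytes(
    rows_per_day=100_000, days_of_history=3 * 365, bytes_per_row=60
)
estimate_gb = estimate / 1_000_000_000
print(f"Estimated fact table size: {estimate_gb:.2f} GB")
```

The same calculation would be repeated for the staging area, each data mart, and any multidimensional database, then summed, since each of these stores its own copy of the data.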


Impact of business requirement on Information delivery

[Figure: the business requirements definition (users, locations, queries, reports, analysis) drives the Information Delivery Component, which delivers information online and over the intranet, Internet, and e-mail.]
Delivery outputs: ad hoc reports, complex queries, multidimensional (MD) analysis, statistical analysis, Executive Information System (EIS) feed, data mining
Users shown: novice and casual users, business analysts, senior and high-level managers


Conclusion
Gathering requirements for a data warehouse is not the same as for an operational system.
The requirements definition guides the whole process of system design and development.
A data warehouse environment is an information delivery system where users themselves access the data repository and create their own outputs, whereas in an operational system the user is provided with predefined outputs.
It is essential to have the right elements of information in the most suitable format.


Review Questions
Objective Questions:
1) A data warehouse is which of the following?
   a) Can be updated by end users.
   b) Contains numerous naming conventions and formats.
   c) Organized around important subject areas.
   d) Contains only current data.
2) An operational system is which of the following?
   a) A system that is used to run the business in real time and is based on historical data.
   b) A system that is used to run the business in real time and is based on current data.
   c) A system that is used to support decision making and is based on current data.
   d) A system that is used to support decision making and is based on historical data.

Review Questions cont..


3) The generic two-level data warehouse architecture includes which of the following?
   a) At least one data mart
   b) Data that can be extracted from numerous internal and external sources
   c) Near real-time updates
   d) All of the above.
4) The active data warehouse architecture includes which of the following?
   a) At least one data mart
   b) Data that can be extracted from numerous internal and external sources
   c) Near real-time updates
   d) All of the above.

Review Questions cont..


5) Reconciled data is which of the following?
   a) Data stored in the various operational systems throughout the organization.
   b) Current data intended to be the single source for all decision support systems.
   c) Data stored in one operational system in the organization.
   d) Data that has been selected and formatted for end-user support applications.
6) Transient data is which of the following?
   a) Data in which changes to existing records cause the previous version of the records to be eliminated
   b) Data in which changes to existing records do not cause the previous version of the records to be eliminated
   c) Data that are never altered or deleted once they have been added
   d) Data that are never deleted once they have been added

Review Questions cont..


7) The extract process is which of the following?
   a) Capturing all of the data contained in various operational systems
   b) Capturing a subset of the data contained in various operational systems
   c) Capturing all of the data contained in various decision support systems
   d) Capturing a subset of the data contained in various decision support systems
8) Data scrubbing is which of the following?
   a) A process to reject data from the data warehouse and to create the necessary indexes
   b) A process to load the data in the data warehouse and to create the necessary indexes
   c) A process to upgrade the quality of data after it is moved into a data warehouse
   d) A process to upgrade the quality of data before it is moved into a data warehouse

9) The load and index process is which of the following?
a) A process to reject data from the data warehouse and to create the necessary indexes
b) A process to load the data in the data warehouse and to create the necessary indexes
c) A process to upgrade the quality of data after it is moved into a data warehouse
d) A process to upgrade the quality of data before it is moved into a data warehouse

10) Data transformation includes which of the following?
a) A process to change data from a detailed level to a summary level
b) A process to change data from a summary level to a detailed level
c) Joining data from one source into various sources of data
d) Separating data from one source into various sources of data
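
The terms used in questions 7 to 10 (extract, data scrubbing, data transformation, load and index) can be seen working together in a small sketch. This is only an illustration under stated assumptions, not a prescribed ETL design: the sample records, the table name `sales_by_region`, the index name `idx_region`, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational records; every name and value here is illustrative.
operational_rows = [
    {"order_id": 1, "region": " north ", "amount": "120.50", "status": "shipped"},
    {"order_id": 2, "region": "North", "amount": "80.00", "status": "shipped"},
    {"order_id": 3, "region": "south", "amount": None, "status": "shipped"},
    {"order_id": 4, "region": "South", "amount": "40.25", "status": "cancelled"},
    {"order_id": 5, "region": "SOUTH", "amount": "59.75", "status": "shipped"},
]


def extract(rows):
    """Capture a subset of the operational data (here: shipped orders only)."""
    return [r for r in rows if r["status"] == "shipped"]


def scrub(rows):
    """Improve data quality before loading: tidy region names, drop rows with no amount."""
    clean = []
    for r in rows:
        if r["amount"] is None:
            continue  # unusable record, rejected before it reaches the warehouse
        clean.append({
            "order_id": r["order_id"],
            "region": r["region"].strip().title(),
            "amount": float(r["amount"]),
        })
    return clean


def transform(rows):
    """Change data from a detailed level (orders) to a summary level (total per region)."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return sorted(totals.items())


def load_and_index(summary):
    """Load the summary rows into the (in-memory) warehouse and create an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_by_region (region TEXT, total_amount REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", summary)
    conn.execute("CREATE INDEX idx_region ON sales_by_region (region)")
    conn.commit()
    return conn


conn = load_and_index(transform(scrub(extract(operational_rows))))
result = conn.execute(
    "SELECT region, total_amount FROM sales_by_region ORDER BY region"
).fetchall()
print(result)
```

The steps are kept as separate functions so that each one maps to a single term from the questions; a real ETL routine would normally add logging, error handling, and incremental loads.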

Short answer type Questions
Q1. Explain the need for metadata in a data warehouse.
Q2. What do you mean by strategic information?
Q3. Differentiate between a data warehouse and a data mart.
Q4. What do you mean by a Web-enabled data warehouse?
Q5. Define OLTP.
Q6. What type of processing takes place in a data warehouse?
Q7. Define an ETL routine.
Q8. What data does an information package contain?
Q9. In which situations can the JAD methodology be successful for collecting requirements?
Q10. List the various data sources that feed the data warehouse.

Long answer type Questions
Q1. Explain data warehouse architecture in detail.
Q2. Explain business dimensions. Why and how can business dimensions be useful for defining requirements for the data warehouse?
Q3. State any three factors that indicate the continued growth in data warehousing. Can you think of some examples?
Q4. Discuss the top-down and bottom-up approaches to creating a data warehouse.
Q5. For a commercial bank, name five types of strategic objectives and explain each objective in detail.
Q6. What do you mean by information packages? Explain the need for information packages.
Q7. "A data warehouse is an environment, not a product." Discuss.
Q8. Explain the various types of data warehouse metadata in detail.
Q9. For an airline company, how can strategic information increase the number of frequent flyers? Discuss, giving specific details.
Q10. Examine the opportunities that strategic information can provide for a medical center. Can you explain five such opportunities?

Suggested Reading/References
1. Paulraj Ponniah, Fundamentals of Data Warehousing, John Wiley & Sons, 2003.
2. Sam Anahory, Data Warehousing in the Real World: A Practical Guide for Building Decision Support Systems, John Wiley, 2004.
3. W. H. Inmon, Building the Operational Data Store, 2nd Ed., John Wiley, 1999.
4. Kamber and Han, Data Mining: Concepts and Techniques, Harcourt India P. Ltd., 2001.
