
Business Intelligence and Analytics:

Systems for Decision Support


(10th Edition)

Chapter 3:
Data Warehousing
Learning Objectives
 Understand the basic definitions and
concepts of data warehouses
 Learn different types of data warehousing
architectures; their comparative
advantages and disadvantages
 Describe the processes used in developing
and managing data warehouses
 Explain data warehousing operations
 … (Continued…)
3-2 Copyright © 2014 Pearson Education, Inc.
Learning Objectives
 Explain the role of data warehouses in
decision support
 Explain data integration and the
extraction, transformation, and load (ETL)
processes
 Describe real-time (a.k.a. right-time
and/or active) data warehousing
 Understand data warehouse
administration and security issues
Opening Vignette…
“Isle of Capri Casinos Is Winning with
Enterprise Data Warehouse”

 Company background
 Problem description
 Proposed solution
 Results
 Answer & discuss the case questions.
Questions for the
Opening Vignette
1. Why is it important for Isle to have an EDW?
2. What were the business challenges or opportunities
that Isle was facing?
3. What was the process Isle followed to realize EDW?
Comment on the potential challenges Isle might have
had going through the process of EDW development.
4. What were the benefits of implementing an EDW at
Isle? Can you think of other potential benefits that were
not listed in the case?
5. Why do you think large enterprises like Isle in the
gaming industry can succeed without having a capable
data warehouse/business intelligence infrastructure?
Main Data Warehousing Topics
 DW definition
 Characteristics of DW
 Data Marts
 ODS, EDW, Metadata
 DW Framework
 DW Architecture & ETL Process
 DW Development
 DW Issues



What is a Data Warehouse?
 A physical repository where relational data
are specially organized to provide
enterprise-wide, cleansed data in a
standardized format
 “The data warehouse is a collection of
integrated, subject-oriented databases
designed to support DSS functions, where
each unit of data is non-volatile and
relevant to some moment in time”
A Historical Perspective to
Data Warehousing
1970s
 Mainframe computers
 Simple data entry
 Routine reporting
 Primitive database structures
 Teradata incorporated
1980s
 Mini/personal computers (PCs)
 Business applications for PCs
 Distributed DBMS
 Relational DBMS
 Teradata ships commercial DBs
 Business Data Warehouse coined
1990s
 Centralized data storage
 Data warehousing was born
 Inmon, Building the Data Warehouse
 Kimball, The Data Warehouse Toolkit
 EDW architecture design
2000s
 Exponentially growing Web data
 Consolidation of DW/BI industry
 Data warehouse appliances emerged
 Business intelligence popularized
 Data mining and predictive modeling
 Open source software
 SaaS, PaaS, Cloud Computing
2010s
 Big Data analytics
 Social media analytics
 Text and Web Analytics
 Hadoop, MapReduce, NoSQL
 In-memory, in-database



Characteristics of DWs
 Subject oriented
 Integrated
 Time-variant (time series)
 Nonvolatile
 Summarized
 Not normalized
 Metadata
 Web based, relational/multi-dimensional
 Client/server, real-time/right-time/active...
Data Mart
A departmental small-scale “DW” that
stores only limited/relevant data

 Dependent data mart


A subset that is created directly from a data
warehouse

 Independent data mart


A small data warehouse designed for a
strategic business unit or a department
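
The idea of a dependent data mart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names (`warehouse_sales`, `marketing_mart`), the columns, and the rows are all hypothetical and do not come from the slides. It shows a departmental subset being created directly from a warehouse table.

```python
import sqlite3

# Hypothetical enterprise warehouse table; all names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("Marketing", "East", 120.0),
        ("Marketing", "West", 80.0),
        ("Finance", "East", 200.0),
    ],
)

# Dependent data mart: a departmental subset derived directly from the warehouse.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE department = 'Marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

An independent data mart would instead be loaded from the source systems without going through a central warehouse table.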



Other DW Components
 Operational data stores (ODS)
A type of database often used as an interim
area for a data warehouse
 Oper marts - an operational data mart.
 Enterprise data warehouse (EDW)
A data warehouse for the enterprise.
 Metadata: Data about data.
In a data warehouse, metadata describe the
contents of a data warehouse and the manner
of its acquisition and use
Application Case 3.1
A Better Data Plan: Well-Established
TELCOs Leverage Data Warehousing and
Analytics to Stay on Top in a Competitive
Industry
Questions for Discussion
1. What are the main challenges for TELCOs?
2. How can data warehousing and data analytics
help TELCOs in overcoming their challenges?
3. Why do you think TELCOs are well suited to
take full advantage of data analytics?
A Generic DW Framework
Data sources: ERP, legacy, POS, other OLTP/Web, external data
ETL process: select, extract, transform, integrate, load
Enterprise data warehouse, with metadata
Replication to data marts (marketing, engineering, finance, ...), or the no data marts option
API / middleware
Access applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, custom built applications



Application Case 3.2
Data Warehousing Helps MultiCare
Save More Lives

Questions for Discussion


1. What do you think is the role of data
warehousing in healthcare systems?
2. How did MultiCare use data warehousing
to improve health outcomes?



DW Architecture
 Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data &
software
3. Client (front-end) software that allows users to
access and analyze data from the warehouse
 Two-tier architecture
The first two tiers of the three-tier architecture are
combined into one
… sometimes there is only one tier?



DW Architectures

Three-tier architecture
Tier 1: Client workstation
Tier 2: Application server
Tier 3: Database server

Two-tier architecture
Tier 1: Client workstation
Tier 2: Application & database server



Data Warehousing Architectures
 Issues to consider when deciding which
architecture to use:
 Which database management system (DBMS)
should be used?
 Will parallel processing and/or partitioning be
used?
 Will data migration tools be used to load the data
warehouse?
 What tools will be used to support data retrieval
and analysis?



A Web-Based DW Architecture

Client (Web browser)
Internet/Intranet/Extranet
Web server (Web pages)
Application server
Data warehouse



Alternative DW Architectures
(a) Independent Data Marts Architecture
Source systems → Staging area (ETL) → Independent data marts (atomic/summarized data) → End user access and applications

(b) Data Mart Bus Architecture with Linked Dimensional Datamarts
Source systems → Staging area (ETL) → Dimensionalized data marts linked by conformed dimensions (atomic/summarized data) → End user access and applications

(c) Hub and Spoke Architecture (Corporate Information Factory)
Source systems → Staging area (ETL) → Normalized relational warehouse (atomic data) → End user access and applications
Dependent data marts (summarized/some atomic data)

Alternative DW Architectures
(d) Centralized Data Warehouse Architecture
Source systems → Staging area (ETL) → Normalized relational warehouse (atomic/some summarized data) → End user access and applications

(e) Federated Architecture
Existing data warehouses, data marts, and legacy systems → Logical/physical integration of common data elements (data mapping / metadata) → End user access and applications

 Each architecture has advantages and disadvantages!
 Which architecture is the best?
Ten factors that potentially affect the
architecture selection decision

1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors



Teradata Corp. DW Architecture



Data Integration and the Extraction,
Transformation, and Load Process
 ETL = Extract Transform Load
 Data integration
Integration that comprises three major processes: data
access, data federation, and change capture.
 Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data
from source systems into a data warehouse
 Enterprise information integration (EII)
An evolving tool space that promises real-time data
integration from a variety of sources, such as relational
or multidimensional databases, Web services, etc.



Data Integration and the Extraction,
Transformation, and Load Process

Sources: packaged application, legacy system, other internal applications
Transient data source
Extract → Transform → Cleanse → Load
Targets: data warehouse, data mart
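
The extract, transform, cleanse, and load stages can be sketched as a small Python pipeline. This is a minimal illustration, not a real ETL tool: the source records, the `sales` table, and the function names are all hypothetical, and the cleansing rule (drop rows whose amount cannot be parsed) is just one assumed policy.

```python
import sqlite3

# Hypothetical raw records from a source system; values are made up.
source_rows = [
    {"customer": " Alice ", "amount": "100.50", "date": "2014-01-05"},
    {"customer": "BOB", "amount": "n/a", "date": "2014-01-06"},
    {"customer": "carol", "amount": "75", "date": "2014-01-07"},
]


def extract(rows):
    """Extract: read records from the source (here, an in-memory list)."""
    return list(rows)


def transform(rows):
    """Transform: standardize names and convert amounts to numbers."""
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            amount = None  # flag unparseable values for the cleansing step
        out.append((row["customer"].strip().title(), amount, row["date"]))
    return out


def cleanse(rows):
    """Cleanse: drop records whose amount could not be parsed."""
    return [r for r in rows if r[1] is not None]


def load(conn, rows):
    """Load: write the cleansed records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, cleanse(transform(extract(source_rows))))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each stage as its own function makes it easy to swap in a different source or a stricter cleansing rule without touching the load step.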



ETL (Extract, Transform, Load)
 Issues affecting the purchase of an ETL tool
 Data transformation tools are expensive
 Data transformation tools may have a long learning
curve
 Important criteria in selecting an ETL tool
 Ability to read from and write to an unlimited number
of data sources/architectures
 Automatic capturing and delivery of metadata
 A history of conforming to open standards
 An easy-to-use interface for the developer and the
functional user
Data Warehouse Development
Data warehouse development approaches
 Inmon Model: EDW approach (top-down)
 Kimball Model: Data mart approach
(bottom-up)
 Which model is best?
 Table 3.3 provides a comparative analysis
between EDW and Data Mart approach
 One alternative is the hosted warehouse



Application Case 3.5
Starwood Hotels & Resorts Manages
Hotel Profitability with Data
Warehousing
Questions for Discussion
1. How big and complex are the business
operations of Starwood Hotels & Resorts?
2. How did Starwood Hotels & Resorts use data
warehousing for better profitability?
3. What were the challenges, the proposed
solution, and the obtained results?
Additional DW Considerations
Hosted Data Warehouses
 Benefits:
 Requires minimal investment in infrastructure
 Frees up capacity on in-house systems
 Frees up cash flow
 Makes powerful solutions affordable
 Enables solutions that provide for growth
 Offers better quality equipment and software
 Provides faster connections
 … more in the book
Representation of Data in DW
 Dimensional Modeling
 A retrieval-based system that supports high-volume
query access
 Star schema
 The most commonly used and the simplest style of
dimensional modeling
 Contains a fact table surrounded by and connected to
several dimension tables
 Snowflake schema
 An extension of star schema where the diagram
resembles a snowflake in shape
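
A star schema can be sketched with Python's built-in `sqlite3` module. The table and column names below (`fact_sales`, `dim_time`, `dim_product`, `units_sold`) and the sample rows are assumptions chosen for illustration. The sketch shows one fact table joined to two dimension tables to answer a typical query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table surrounded by dimension tables (hypothetical names and data).
conn.executescript(
    """
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER
    );
    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_product VALUES (1, 'BrandA'), (2, 'BrandB');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (2, 1, 3);
    """
)

# Typical star-schema query: join the fact table to its dimensions, then aggregate.
units_by_quarter_brand = conn.execute(
    """
    SELECT t.quarter, p.brand, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_id = t.time_id
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY t.quarter, p.brand
    ORDER BY t.quarter, p.brand
    """
).fetchall()
print(units_by_quarter_brand)
```

In a snowflake variant of the same sketch, a dimension such as `dim_product` would itself reference further tables (for example a separate brand table) instead of storing the brand name directly.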
Multidimensionality
The ability to organize, present, and analyze data
by several dimensions, such as sales by region, by
product, by salesperson, and by time (four
dimensions)
 Multidimensional presentation

 Dimensions: products, salespeople, market segments,
business units, geographical locations, distribution
channels, country, or industry
 Measures: money, sales volume, head count,
inventory profit, actual versus forecast
 Time: daily, weekly, monthly, quarterly, or yearly
Star versus Snowflake Schema
Star Schema
Fact table SALES (UnitsSold, ...)
Dimension TIME (Quarter, ...)
Dimension PRODUCT (Brand, ...)
Dimension PEOPLE (Division, ...)
Dimension GEOGRAPHY (Country, ...)

Snowflake Schema
Fact table SALES (UnitsSold, ...)
Dimension DATE (Date, ...)
Dimension MONTH (M_Name, ...)
Dimension QUARTER (Q_Name, ...)
Dimension PRODUCT (LineItem, ...)
Dimension BRAND (Brand, ...)
Dimension CATEGORY (Category, ...)
Dimension PEOPLE (Division, ...)
Dimension STORE (LocID, ...)
Dimension LOCATION (State, ...)



Analysis of Data in DW
 OLTP vs. OLAP…
 OLTP (online transaction processing)
 Capturing and storing data from ERP, CRM, POS, …
 The main focus is on efficiency of routine tasks

 OLAP (online analytical processing)
 Converting data into information for decision support
 Data cubes, drill-down / rollup, slice & dice, …
 Requesting ad hoc reports
 Conducting statistical and other analyses
 Developing multimedia-based applications
 …more in the book
OLAP vs. OLTP



OLAP Operations
 Slice - a subset of a multidimensional array
 Dice - a slice on more than two dimensions
 Drill Down/Up - navigating among levels of data
ranging from the most summarized (up) to the
most detailed (down)
 Roll Up - computing all of the data relationships
for one or more dimensions
 Pivot - used to change the dimensional
orientation of a report or an ad hoc query-page
display
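
The operations above can be sketched on a tiny in-memory cube using plain Python. The cube contents and the helper names (`slice_cube`, `dice_cube`, `roll_up`, `pivot`) are hypothetical; real OLAP servers expose these operations through their own query languages and tools.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, region, quarter) -> units sold.
cube = {
    ("Laptop", "East", "Q1"): 10,
    ("Laptop", "West", "Q1"): 4,
    ("Laptop", "East", "Q2"): 6,
    ("Phone", "East", "Q1"): 8,
    ("Phone", "West", "Q2"): 5,
}


def slice_cube(cube, product):
    """Slice: fix one dimension (product) and keep the remaining two."""
    return {(r, t): v for (p, r, t), v in cube.items() if p == product}


def dice_cube(cube, products, regions, quarters):
    """Dice: restrict several dimensions at once to chosen member sets."""
    return {
        k: v
        for k, v in cube.items()
        if k[0] in products and k[1] in regions and k[2] in quarters
    }


def roll_up(cube, keep):
    """Roll up: sum away every dimension whose index is not in `keep`."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


def pivot(table):
    """Pivot: swap the row and column dimensions of a two-dimensional view."""
    return {(b, a): v for (a, b), v in table.items()}


laptop_slice = slice_cube(cube, "Laptop")
east_q1 = dice_cube(cube, {"Laptop", "Phone"}, {"East"}, {"Q1"})
by_product = roll_up(cube, keep=(0,))
laptop_by_quarter = pivot(laptop_slice)
print(laptop_slice, east_q1, by_product, laptop_by_quarter, sep="\n")
```

Drilling down is the reverse of rolling up: starting from `by_product`, keeping more dimensions (for example `keep=(0, 1)`) reveals the more detailed product-by-region totals.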
Slicing Operations on a Simple Three-Dimensional Data Cube

A 3-dimensional OLAP cube with dimensions Time, Product, and Geography
Cells are filled with numbers representing sales volumes

OLAP slicing operations:
 Sales volumes of a specific Product on variable Time and Region
 Sales volumes of a specific Region on variable Time and Products
 Sales volumes of a specific Time on variable Region and Products



Variations of OLAP
 Multidimensional OLAP (MOLAP)
OLAP implemented via a specialized
multidimensional database (or data store) that
summarizes transactions into multidimensional
views ahead of time
 Relational OLAP (ROLAP)
The implementation of an OLAP database on
top of an existing relational database
 Database OLAP and Web OLAP (DOLAP and
WOLAP); Desktop OLAP,…
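
The contrast between MOLAP and ROLAP can be sketched in Python as precomputed versus on-demand aggregation. This is a simplified illustration only: the transaction rows, the `DIMENSIONS` tuple, and both function names are assumptions, and a real MOLAP engine uses specialized multidimensional storage rather than a dictionary.

```python
from itertools import combinations

# Hypothetical transaction rows: (product, region, units).
transactions = [
    ("Laptop", "East", 10),
    ("Laptop", "West", 4),
    ("Phone", "East", 8),
]
DIMENSIONS = ("product", "region")


def rolap_query(rows, group_by):
    """ROLAP-style: aggregate the relational rows at query time."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = {}
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] = totals.get(key, 0) + row[-1]
    return totals


def build_molap_store(rows):
    """MOLAP-style: precompute totals for every combination of dimensions."""
    store = {}
    for n in range(len(DIMENSIONS) + 1):
        for group_by in combinations(DIMENSIONS, n):
            store[group_by] = rolap_query(rows, group_by)
    return store


molap_store = build_molap_store(transactions)           # work done ahead of time
by_region_on_demand = rolap_query(transactions, ("region",))
by_region_precomputed = molap_store[("region",)]        # simple lookup at query time
print(by_region_on_demand, by_region_precomputed)
```

Both paths return the same totals; the trade-off in this sketch is that the precomputed store answers by lookup but must be rebuilt when the transactions change.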
Technology Insights 3.2
Hands-On DW with MicroStrategy
 A wealth of teaching and learning
resources can be found at the TUN portal
www.teradatauniversitynetwork.com
 The available resources include scripted
demonstrations, assignments, white
papers, etc.



DW Implementation Issues
 Identification of data sources and governance
 Data quality planning, data model design
 ETL tool selection
 Establishment of service-level agreements
 Data transport, data conversion
 Reconciliation process
 End-user support
 Political issues
 … more in the book
Successful DW Implementation
Things to Avoid
 Starting with the wrong sponsorship chain
 Setting expectations that you cannot meet
 Engaging in politically naive behavior
 Loading the data warehouse with information
just because it is available
 Believing that data warehousing database design
is the same as transactional database design
 Choosing a data warehouse manager who is
technology oriented rather than user oriented
 … more in the book
Failure Factors in DW Projects
 Lack of executive sponsorship
 Unclear business objectives
 Cultural issues being ignored
 Change management
 Unrealistic expectations
 Inappropriate architecture
 Low data quality / missing information
 Loading data just because it is available
Massive DW and Scalability
 Scalability
 The main issues pertaining to scalability:
 The amount of data in the warehouse
 How quickly the warehouse is expected to grow
 The number of concurrent users
 The complexity of user queries
 Good scalability means that the cost of queries and
other data-access functions grows (ideally) linearly
with the size of the warehouse



Real-Time/Active DW/BI
 Enabling real-time data updates for real-
time analysis and real-time decision
making is growing rapidly
 Push vs. Pull (of data)
 Concerns about real-time BI
 Not all data should be updated continuously
 Mismatch of reports generated minutes apart
 May be cost prohibitive
 May also be infeasible



Enterprise Decision Evolution
and Data Warehousing



Real-Time/Active DW at Teradata



Traditional versus Active DW



DW Administration and Security
 Data warehouse administrator (DWA)
 DWA should…
 have the knowledge of high-performance software,
hardware and networking technologies
 possess solid business knowledge and insight
 be familiar with the decision-making processes so as to
suitably design/maintain the data warehouse structure
 possess excellent communications skills
 Security and privacy are pressing issues in DW
 Safeguarding the most valuable assets
 Government regulations (HIPAA, etc.)
 Must be explicitly planned and executed
The Future of DW
 Sourcing…
 Web, social media, and Big Data
 Open source software
 SaaS (software as a service)
 Cloud computing
 Infrastructure…
 Columnar
 Real-time DW
 Data warehouse appliances
 Data management practices/technologies
 In-database & in-memory processing
 New DBMS
 Advanced analytics
 …
Free of Charge DW Portal
for Teaching & Learning
 www.TeradataStudentNetwork.com
 Password to signup: <check with your instructor>



End of the Chapter

 Questions, comments



All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher. Printed in the
United States of America.

