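Figure: lattice of cuboids (only the levels listed here survive in this extract)
0-D (apex) cuboid: all
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)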
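The lattice for the four dimensions time, item, location and supplier can be enumerated mechanically: every subset of the dimensions is one cuboid, so n dimensions (ignoring concept hierarchies) give 2^n cuboids. The following is a minimal sketch of that idea; the function name `cuboid_lattice` is an illustrative assumption, not something defined in the slides.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group every subset of the dimensions by its size (the cuboid's dimensionality)."""
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions) + 1)
    }

dims = ["time", "item", "location", "supplier"]
lattice = cuboid_lattice(dims)

# Print one line per level: 0-D is the apex cuboid, 4-D is the base cuboid.
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)

# Total number of cuboids in the lattice.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

With four dimensions the levels hold 1, 4, 6, 4 and 1 cuboids respectively.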
June 11, 2012 Data Mining: Concepts and Techniques 14
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: A fact table in the middle connected to a set of dimension tables
Snowflake schema: A refinement of the star schema in which some dimensional hierarchies are normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations: Multiple fact tables share dimension tables; viewed as a collection of stars, this is therefore called a galaxy schema or fact constellation
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time dimension: time_key, day, day_of_the_week, month, quarter, year
item dimension: item_key, item_name, brand, type, supplier_type
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city, province_or_state, country
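A star schema like the one in this example can be tried out with Python's built-in `sqlite3` module. The sketch below keeps only two of the four dimensions and a subset of their columns; the table names `time_dim`, `item_dim` and `sales_fact` and all row values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimension tables and one fact table that references them by key.
# Table names and the reduced column set are assumptions for this sketch.
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       brand TEXT, type TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, 15, 1, "Q1", 2012), (2, 20, 4, "Q2", 2012)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)",
                 [(10, "TV-A", "BrandX", "TV"), (11, "PC-B", "BrandY", "PC")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 2, 800.0), (1, 11, 1, 600.0), (2, 10, 3, 1200.0)])

# Join the fact table to its dimensions and aggregate the measures
# by quarter and item type.
rows = conn.execute("""
SELECT t.quarter, i.type, SUM(s.units_sold), SUM(s.dollars_sold)
FROM sales_fact AS s
JOIN time_dim AS t ON s.time_key = t.time_key
JOIN item_dim AS i ON s.item_key = i.item_key
GROUP BY t.quarter, i.type
ORDER BY t.quarter, i.type
""").fetchall()

for row in rows:
    print(row)
```

The fact table stores only keys and measures; descriptive attributes such as quarter or type are reached through the joins, which is the defining property of the star layout.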
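Figure: snowflake schema fragment (the heading and the time and item tables did not survive in this extract)
Sales Fact Table (partial): branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
branch dimension: branch_key, branch_name, branch_type
location dimension: location_key, street, city_key
city dimension: city_key, city, province_or_state, country
<dimension_name_first_time> in cube <cube_name_first_time>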
Specification of hierarchies
Schema hierarchy: day < {month < quarter; week} < year
Set_grouping hierarchy: {1..10} < inexpensive
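The two kinds of hierarchy can be represented as small lookup structures. This is a minimal sketch, not a definition from the slides: the names `PARENTS`, `rolls_up_to`, `PRICE_GROUPS` and `price_group` are illustrative assumptions, and only the group {1..10} < inexpensive is taken from the text.

```python
# Schema hierarchy: day < {month < quarter; week} < year.
# Each level maps to the levels directly above it; the order is partial,
# because week and month/quarter sit on separate branches.
PARENTS = {
    "day": ["month", "week"],
    "month": ["quarter"],
    "quarter": ["year"],
    "week": ["year"],
    "year": [],
}

def rolls_up_to(level, target):
    """Return True if `target` is reachable from `level` by climbing the hierarchy."""
    if level == target:
        return True
    return any(rolls_up_to(parent, target) for parent in PARENTS[level])

# Set-grouping hierarchy: a range of values is mapped to a named group.
# Only {1..10} < inexpensive comes from the slide.
PRICE_GROUPS = [(range(1, 11), "inexpensive")]

def price_group(price):
    """Return the group label for an integer price, or None if no group covers it."""
    for values, label in PRICE_GROUPS:
        if price in values:
            return label
    return None

print(rolls_up_to("day", "year"))      # True
print(rolls_up_to("week", "quarter"))  # False
print(price_group(7))                  # inexpensive
```

The second call returns False because weeks do not nest inside quarters, which is why the hierarchy is written with braces rather than as a single chain.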
A Sample Data Cube
Figure: 3-D sales cube with a sum margin on every axis
Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
Product: TV, PC, VCR, sum
Country: U.S.A, Canada, Mexico, sum
Annotated cell: total annual sales of TV in U.S.A.
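Figure: lattice of cuboids for product, date, country (the 2-D level did not survive in this extract)
0-D (apex) cuboid: all
1-D cuboids: product; date; country
3-D (base) cuboid: (product, date, country)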
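Each cuboid over product, date and country is the base cuboid summed over the dimensions that are dropped. The sketch below shows this for a tiny cube; the helper name `cuboid` and all sales figures are hypothetical and not taken from the slides.

```python
from collections import defaultdict

DIMS = ("product", "date", "country")

def cuboid(cells, keep):
    """Sum the base cells over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

# Hypothetical base cuboid: (product, date, country) -> sales.
cells = {
    ("TV", "1Qtr", "U.S.A"): 10,
    ("TV", "2Qtr", "U.S.A"): 20,
    ("TV", "3Qtr", "U.S.A"): 30,
    ("TV", "4Qtr", "U.S.A"): 40,
    ("PC", "1Qtr", "U.S.A"): 5,
    ("TV", "1Qtr", "Canada"): 7,
}

# 2-D cuboid: the date dimension is summed out.
by_product_country = cuboid(cells, ("product", "country"))
# 1-D cuboid: only product is kept.
by_product = cuboid(cells, ("product",))
# 0-D (apex) cuboid: everything is summed into a single cell.
apex = cuboid(cells, ())

print(by_product_country[("TV", "U.S.A")])  # total annual sales of TV in U.S.A.
print(apex[()])
```

The cell `("TV", "U.S.A")` of the 2-D cuboid corresponds to the annotated "sum" cell along the date axis of the sample cube.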
Visualization
OLAP capabilities
Interactive manipulation
Typical OLAP Operations
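Figure: radial lines, one per dimension, with abstraction levels marked along each line. Each circle is called a footprint.
Time: ANNUALLY, QTRLY, DAILY
Product: PRODUCT ITEM, PRODUCT GROUP, PRODUCT LINE
Location: CITY, COUNTRY, REGION
Organization: SALES PERSON, DISTRICT, DIVISION
Promotion
Labels whose axis name did not survive in this extract: ORDER, TRUCK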
Chapter 2: Data Warehousing and OLAP Technology for Data Mining
Choose the dimensions that will apply to each fact table record
Choose the measure that will populate each fact table record
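Figure: data warehouse architecture diagram (only labels survive in this extract)
Data sources: Operational DBs, other sources
Extract, Transform, Load, Refresh; Monitor & Integrator; Metadata
Data storage: Data Warehouse, Data Marts
OLAP Server: Serve
Analysis, Query, Reports, Data mining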
Data Warehouse Development: A Recommended Approach
Figure (labels only): Multi-Tier Data Warehouse; Distributed Data Marts; Data Mart, Data Mart; Enterprise Data Warehouse
techniques)
fast indexing to pre-computed summarized data
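Figure: 3-D array over dimensions A (a0..a3), B (b0..b3) and C (c0..c3), partitioned into 64 chunks numbered 1 to 64
Front face (c0): b0 = 1 2 3 4; b1 starts at 5; b2 starts at 9; b3 = 13 14 15 16
Top face (b3): c1 = 29 30 31 32; c2 = 45 46 47 48; c3 = 61 62 63 64
Side face (a3): c1 = 20 24 28; c2 = 36 40 44; c3 = 52 56 60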
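The chunk numbers visible in the figure are consistent with numbering the chunks along A first, then B, then C. A minimal sketch of that numbering follows; the function names `chunk_id` and `chunk_coords` are illustrative assumptions, and the formula is inferred from the labels in the figure rather than stated in the text.

```python
def chunk_id(a, b, c, parts=4):
    """Map partition indices (a, b, c), each in 0..parts-1, to a chunk number 1..parts**3."""
    return 1 + a + parts * b + parts * parts * c

def chunk_coords(chunk, parts=4):
    """Inverse of chunk_id: recover (a, b, c) from a chunk number."""
    offset = chunk - 1
    a = offset % parts
    b = (offset // parts) % parts
    c = offset // (parts * parts)
    return a, b, c

# Front face of the cube (c0): one row per b partition, left to right along A.
front_face = [[chunk_id(a, b, 0) for a in range(4)] for b in range(4)]
for row in front_face:
    print(row)

print(chunk_id(0, 3, 1))  # first chunk of the c1 row on the top face
print(chunk_coords(64))   # last chunk: (a3, b3, c3)
```

Scanning chunks in this order visits all of A before advancing B, and all of the A-B plane before advancing C.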
Figure (labels only): Layer2: MDDB; Meta Data