Está en la página 1de 42

Analysis Services ROLAP for SQL Server Data

Warehouses
SQL Server Technical Case Study

Writers: Denny Lee, Kay Unkroth

Investigators: Denny Lee, Thomas Kejser

Contributors: James Podgorski, Richard Tkachuk, Mosha Pasumansky, Akshai Mirchandani,


Edward Melomed, Jonathan Ehren, Mark Tameling, Mukesh Kumar

Technical Reviewers: Akshai Mirchandani, Kevin Cox, Nicolae Mogoreanu

Published: July 2010

Applies to: Microsoft SQL Server 2008 and SQL Server 2008 R2

Summary: This technical case study describes how the SQL Server Customer Advisory Team
(SQLCAT), in collaboration with SQL Server developers, tested and optimized a large Online
Analytical Processing (OLAP) solution based on SQL Server 2008 Analysis Services by using
the Relational OLAP (ROLAP) storage mode. The study examines ROLAP system requirements
and usage scenarios, highlights advantages and disadvantages of ROLAP in comparison with
Multidimensional OLAP (MOLAP), and evaluates various ROLAP-related data warehouse (DW)
optimization techniques regarding their effectiveness and limitations.

This case study is for data warehouse architects, database administrators, and storage
engineers, and assumes the audience is already familiar with the concepts of large-scale data
warehouse designs for servers, storage subsystems, and databases. A high-level understanding
of Analysis Services optimization techniques for cube processing and query performance is also
helpful. Detailed information is available in the SQL Server 2008 Analysis Services Performance
Guide at http://go.microsoft.com/fwlink/?LinkId=165486.

Note: For security reasons, the actual names of servers, cubes, tables, views, columns, and
other resources have been replaced with fictitious names. The sample names mentioned in this
paper do not represent real resource names and are for illustration purposes only.
Copyright
The information contained in this document represents the current view of Microsoft Corporation
on the issues discussed as of the date of publication. Because Microsoft must respond to
changing market conditions, it should not be interpreted to be a commitment on the part of
Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the
date of publication.

This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the
rights under copyright, no part of this document may be reproduced, stored in, or introduced into
a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written
permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual
property rights covering subject matter in this document. Except as expressly provided in any
written license agreement from Microsoft, the furnishing of this document does not give you any
license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-
mail addresses, logos, people, places, and events depicted herein are fictitious, and no
association with any real company, organization, product, domain name, e-mail address, logo,
person, place, or event is intended or should be inferred.

2010 Microsoft Corporation. All rights reserved.

Microsoft and SQL Server are trademarks of the Microsoft group of companies.

All other trademarks are property of their respective owners.

2
Contents
Introduction.................................................................................................................................................4
Scoping the Project and Building the TEST Lab............................................................................................5
Performing Initial Data Warehouse Optimization........................................................................................7
Deploying the Lab Cube and Defining the Test Cases..................................................................................9
Lab Cube Deployment.............................................................................................................................9
Cube Test Scenarios...............................................................................................................................10
Selecting Test Tools and Methods..............................................................................................................11
Establishing a Performance Baseline.........................................................................................................12
Implementing Data Warehouse Aggregations...........................................................................................13
Cube-Based ROLAP Aggregations..........................................................................................................14
Transparent ROLAP Aggregations..........................................................................................................15
Choosing an Aggregation Strategy.........................................................................................................17
Creating Transparent Aggregations........................................................................................................20
Table Binding versus Query Binding.......................................................................................................22
Reorganizing the Lab Data Warehouse..................................................................................................23
Replacing Table Partitions with Individual Base Tables......................................................................24
Loading the Facts Data.......................................................................................................................25
Realigning Cube Partitions.................................................................................................................27
Reviewing Measure Groups and Measures............................................................................................29
Zooming in on Measures...................................................................................................................29
Row Counting and Query Binding......................................................................................................31
Benchmarking the ROLAP Cube.................................................................................................................32
TopCount Customer Queries..................................................................................................................32
Additional SQLCAT MDX Queries...........................................................................................................34
Understanding Query Concurrency and Processor Utilization...................................................................34
Usage-Based Optimization of Long-Running Queries................................................................................36
Dealing with Skewed Distribution of Data.................................................................................................37
Lessons Learned and Best Practices...........................................................................................................38
Conclusion.................................................................................................................................................40

3
Introduction
Data warehouse architects and storage engineers face exciting and challenging times as
emerging hardware and software make it possible to build large-scale data warehouses with
hundreds of terabytes of data at affordable prices. AMD Opteron and Intel Xeon Nehalem-EX
servers with up to 256 logical processors and hundreds of gigabytes of memory offer enormous
potential to achieve high performance and scalability. Such features let organizations keep
increasing amounts of analytical data online, enabling users to readily analyze massive
datasets, drill down into intricate details, and uncover actionable insights with a microscopic
level of granularity. The challenge is on to stay on top of this data mountain that keeps
accumulating at a rapid pace in the enterprise.

SQL Server Analysis Services is a crucial technology for most Microsoft BI solutions, yet
building Analysis Services cubes that keep up with growing data sizes is difficult. Gigantic
MOLAP cubes with hundreds of terabytes of data are out of the question. Even the increased
cube processing performance available with the latest versions of Analysis Services cant help
manage this enormous volume. Full cube processing would take weeks, and there are query-
related performance issues to consider as well. A MOLAP cube of 5 terabytes is stretching the
limit. A more reasonable maximum ranges between 1.5 and 3 terabytes. The adCenter cubes,
documented in the white paper Accelerating Microsoft adCenter with Microsoft SQL Server
2008 Analysis Services (http://go.microsoft.com/fwlink/?LinkId=183798), can serve as a
reference.

With MOLAP (at this time) impractical for data volumes larger than 5 terabytes, ROLAP
becomes an interesting option. ROLAP lets data and aggregates remain in the relational data
warehouse. This is attractive because the SQL Server 2008 relational engine can be amazingly
fast on top-end server models with storage subsystems designed for fast serial input/output
(I/O), which means that DW architects can take full advantage of the latest hardware and
software trends. Microsoft SQL Server Fast Track Data Warehouse
(http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx) even supports the deployment
of such systems with reference architectures for HP, Dell, Bull, EMC, and IBM. In contrast to the
relational engine, Analysis Services is currently limited to 64 logical processors. Plus, a ROLAP
cube can, in some cases, fan out SQL queries across multiple database servers and combine
the results for analysis. ROLAP can also provide low-latency data delivery for real-time
solutions. In short, there are many good reasons to evaluate ROLAP.

Of course, there are also complicating factors, which are good reasons to evaluate this storage
mode just the same. In comparison with MOLAP, ROLAP handles attribute relationships less
efficiently, requires more physical scans to get the data, and does not benefit from Analysis
Services data compression. Hardware requirements are therefore substantially higher. ROLAP
is also more susceptible to data quality issues, which MOLAP can automatically detect during
cube processing. And there is a significant disadvantage relating to aggregation design and
management. DW architects have to build ROLAP aggregates directly into the relational data
warehouse. The relational engine of SQL Server includes all necessary capabilities to support

4
aggregations seamlessly, yet optimizing a ROLAP cube for query performance and scalability
remains complex. Repeated testing, tweaking, and aggregation building are essential.

For all of these reasons, SQLCAT decided to investigate typical ROLAP design and optimization
tasks in a realistic lab setting representative of an enterprise environment. A large Microsoft
customer agreed to help by providing SQLCAT with a copy of a 1.35 terabyte production data
warehouse. During the project, SQLCAT converted an existing Analysis Services cube from
MOLAP to ROLAP, compared the ROLAP performance with the MOLAP baseline, analyzed
performance issues, optimized the ROLAP design to increase performance, and engaged SQL
Server developers to determine best solutions and workarounds for difficult issues. The result of
this work is the collection of ROLAP optimization techniques, lessons learned, and best
practices summarized in this paper.

Scoping the Project and Building the TEST Lab


One of the first decisions the SQLCAT team had to make in the ROLAP investigation concerned
the overall scope of the project. Important questions revolved around the preferred sizes and
complexities of the cubes as well as the desired scalability scenarios.

On the one hand, SQLCAT wanted to test with the largest possible cubes, multiple servers, and
high query concurrency, but it also wanted to provide actionable information and guidelines
without getting lost in implementation complexity or troubleshooting minutiae. For this reason,
SQLCAT decided to rule out scalability testing and constrained the project scope by starting with
a single cube of low complexity. This approach enabled SQLCAT to tackle typical ROLAP issues
efficiently and to quickly develop effective optimization techniques. The ultimate goal was to
benchmark the original query performance of a given MOLAP cube and then try to achieve
similar or better results with ROLAP.

Figure 1 illustrates the lab environment that SQLCAT deployed according to the overall project
scope. This lab was based on a single quad-socket server with four dual-core processors and
64 GB of memory. With a total of eight processor cores and 64 GB of memory, the lab system
was able to sustain the ROLAP processing demand, which is generally higher in comparison to
MOLAP, as discussed in the section Understanding Query Concurrency and Processor
Utilization later in this case study.

5
Figure 1: Server Configuration for ROLAP Performance Tests and Benchmarking

Particularly noteworthy is the lab servers storage design. Because I/O performance greatly
influences query performance, SQLCAT optimized the storage subsystem by using three
separate SANs. In particular, SQLCAT connected the logical unit number (LUN) drives of each
SAN to different ports so that two SAN ports handled the workload for every pair of LUNs (that
is, O/P, S/T, and U/V). This configuration doubled the maximum I/O throughput and ensured that
the storage subsystem did not create a bottleneck during performance testing. With 64 KB block
sizes (except for the OLAP LUNs, which were set to 32 KB block sizes) and every port
delivering 600-900 MB per second, SQLCAT was able to achieve in this configuration an I/O
throughput of between 1.2 and 1.9 GB per second with minimal latencies.

SQLCAT implemented the following storage design for the lab server:

Fact tables for the ROLAP cube Stored in a file group that was evenly split across the
LUNs S, T, U, and V, formatted with a 64 KB cluster size according to SQL Server best
practices.

TempDB Placed on its own two separate LUNs, O and P, so that TempDB usage did
not generate write requests to the data LUNs during querying.

SQL Server log folder Placed on LUN H to ensure that the sequential writing of the
transaction log would occur on a separate, dedicated drive.

Windows page file Placed on LUN M to isolate most of the paging activity on a
separate, dedicated drive. Only a small page file of 800 MB remained on the C drive.
Using multiple page files placed on separate physical disks is a best practice to increase
6
system performance, and by leaving a small page file on the system drive, SQLCAT
ensured that critical operating system processes, such as the Local Security Authority
Service (Lsass.exe), are able to access a page file without requiring a SAN environment.

OLAP folder Placed on LUN N, which was optimized for random I/O requests
according to the typical I/O pattern of MOLAP measures and was formatted with a 32 KB
cluster size. SQLCAT performance tests show that a 32 KB cluster size can benefit
Analysis Services performance more than a 64 KB cluster size.

OLAP log folder Placed on LUN H, which also hosted the SQL log folder. Storing the
OLTP and OLAP logs on the same LUN did not impact test results because SQLCAT
performed the MOLAP and ROLAP tests in separate cycles.

OLAP temp folder Placed on LUN O, which also hosted the TempDB database. Again,
it was acceptable to share a LUN between OLAP temp folder and TempDB because
SQLCAT performed MOLAP and ROLAP tests separately and there was very little disk
activity.

Note: SQLCAT hosted all resources on a single server to run the MOLAP and ROLAP
performance tests with exactly the same system configuration. However, the single-server
design depicted in Figure 1 is not a recommended configuration for production environments.
For large-scale production designs, refer to the SQL Server Best Practices article Scale-Out
Querying with Analysis Services, available on Microsoft TechNet at
http://go.microsoft.com/fwlink/?LinkId=193053, as well as the white paper Accelerating
Microsoft adCenter with Microsoft SQL Server 2008 Analysis Services at
http://go.microsoft.com/fwlink/?LinkId=183798.

Performing Initial Data Warehouse Optimization


With the lab server design and deployment completed, SQLCAT was ready to build and load the
relational data warehouse, which marked the next important optimization milestone. Figure 2
illustrates the steps performed at this stage.

7
Figure 2: Cube Design in Relationship to the Source Facts Table

Most importantly, SQLCAT streamlined the source facts table in the lab data warehouse to
reduce the size of the data rows. Smaller data rows lead to smaller indexes and lower I/O
overhead to retrieve the data, which ultimately benefits query performance. After all, ROLAP
performance depends on the capabilities of the relational data warehouse to return results
quickly.

SQLCAT carried out the following steps to optimize the source facts table:

Removal of unnecessary columns The original source facts table contained more
columns than the lab cube needed. The columns in red in Figure 2 were not necessary.
By removing these columns, SQLCAT was able to reduce the size of the facts table by
more than 50 percent.
Conversion of data types to save space By converting the numeric columns in the
original facts table to the bigint data type, SQLCAT was able to save 11 bytes per value.
This optimization also potentially improved the performance of Multidimensional
Expressions (MDX). For example, the DistinctCount function is optimized for integer
values.

8
Table 1 compares the original and optimized lab data warehouse versions.

Table 1: Original and Optimized Lab Data Warehouse Sizes


Dataset Original Optimized
MOLAP Cube
30 days worth of data 230 GB 230 GB
SQL Server DW
Dimension Data 0.1 GB 0.1 GB
Facts Data 1.35 TB 521 GB
Indexed Views* 0 3.2 GB
* After ROLAP Performance Optimization

Note: As a best practice to achieve high ROLAP performance, SQLCAT recommends avoiding
complex data models and simplifying the relational data warehouse as much as possible, such
as by removing unnecessary columns and, if possible, by replacing wide data types with
narrower alternatives to increase storage efficiency. SQLCAT also recommends using a
straightforward star schema, avoiding snowflake designs and many-to-many attribute
relationships in the dimensions.

Deploying the Lab Cube and Defining the Test Cases


Having optimized the storage subsystem for fast serial I/O and reduced the size of the data
warehouse by more than 50 percent, SQLCAT felt confident that the lab environment would
reach satisfactory performance levels. It was time to deploy the original MOLAP cube and take it
for a test run.

Lab Cube Deployment


To deploy the lab cube, SQLCAT reused the customers Business Intelligence Development
Studio (BIDS) solution. In this context, it is important to note that SQLCAT did not modify any
aspect of the MOLAP cube design and even refrained from altering aggregations to keep the lab
cube closely aligned with the customers original. It was not necessary to apply design changes
or cube optimizations because the cube already featured a clean star schema without any
parent/child or many-to-many relationships in the dimension tables, as illustrated in Figure 2.
This straightforward cube was an ideal candidate for a later conversion to ROLAP.

Note: Parent-child and many-to-many attribute relationships can have a substantial


performance impact in both MOLAP and ROLAP modes. And unlike MOLAP, ROLAP cant
determine the individual cube partitions that can satisfy the query if the query involves complex
dimensions, which implies that Analysis Services must query all cube partitions. The section
Understanding Query Concurrency and Processor Utilization later in this paper discusses the
resulting performance impact and design implications.

The only extra optimization that SQLCAT applied to the MOLAP cube concerned the OLAP
drive N, which SQLCAT formatted with a 32 KB cluster size for best performance, as mentioned
earlier.
9
Cube Test Scenarios
For the test scenarios, SQLCAT planned to use a set of original customer queries. These
queries promised realistic test runs, but a quick inspection revealed that the queries all used
similar MDX expressions. Essentially, all of the queries the customer provided relied on the
TopCount() function and analyzed three days or seven days of data.

The TopCount() function can help to highlight storage mode differences because it requires a
full scan of the data across all relevant cube partitions, which also implies a full scan of the
underlying facts tables in ROLAP mode. Still, SQLCAT deemed it necessary to create additional
sets of three-day and seven-day queries, as well as advanced queries that had more variety
and did not rely on the TopCount() function. This expanded portfolio of test cases enabled
SQLCAT to simulate standard queries as well as complex ad-hoc queries that individual
analysts might generate in a production environment.

Table 2 provides an overview of the test scenarios and a breakdown of the corresponding MDX
queries that SQLCAT decided to use for the ROLAP investigation.

Table 2: Query Breakdown for SQLCAT Test Cases


Test Case Query IDs Comments
MDX subqueries restricting the Three-day queries: All queries use similar constructs
main querys multidimensional 3\01, 3\02, 3\03, with a cross product of multiple
set based on various date/time 3\04, 3\05, 3\06, sets in the MDX subquery.
ranges, then using the 3\07, and 3\08
TopCount() function to sort and Seven-day
then return the top-ten queries: 7\01, 7\02,
members for a variety of 7\03, 7\04, 7\05,
dimensions and measure 7\06, 7\07, 7\08,
groups. 7\09, 7\10, and
7\11.
SELECT statements with Three-day queries: Queries use different members or
members from the date 3\09, 3\10, 3\11, cross joins of multiple member sets
dimension on columns and all and 3\12. on rows.
members from another
dimension on rows.
SELECT statements with One-day queries: Queries include all measure groups
results filtered on various 1\01 and shaped slices or highly
criteria in WHERE clause. Two-day queries: selective criteria in the WHERE
2\01, 2\02 clause. Query O/01 slices by
Other queries: attributes related to cube
O\01 partitioning.
Complex SELECT statements Additional Queries calculate standard
with calculated members and queries: A\01, deviations, average values, and
WHERE clauses. A\02, A\03, and year-to-date (YTD) values for all or
A\04. selected date ranges, such as for
all, three, or seven selected
calendar days.

10
Selecting Test Tools and Methods
The standard tools to measure and benchmark Analysis Services performance are SQL Server
Profiler and Performance Monitor. Another useful tool, and a SQLCAT favorite, is the Analysis
Services command-line tool (ascmd.exe). ASCMD can run XML for Analysis (XMLA) scripts,
MDX queries, and Data Mining Extensions (DMX) statements while tracking results and trace
information in a log file.

Note: The ASCMD source code is available as part of the SQL Server 2008 Samples at
http://msftasprodsamples.codeplex.com. With SQL Server 2008 Samples installed, the path to
the source code is %ProgramFiles%\Microsoft SQL Server\100\Samples\Analysis
Services\Administrator\ASCMD. The source code can be compiled using Microsoft Visual Studio
2008, or Microsoft Visual C# 2008 Express Edition available as a free download at
http://go.microsoft.com/fwlink/?LinkId=184935.

Figure 3 shows how SQLCAT automated the test runs by using a batch file to start the ASCMD
command-line tool for each individual MDX query in a FORDO loop.

Figure 3: MOLAP and ROLAP Performance Test Method

For the MOLAP and ROLAP performance tests, SQLCAT used the following tools and methods:

ASCMD command-line tool SQLCAT fully automated the test runs for all MDX queries
by using a batch file (see Figure 3). In each loop, SQLCAT first cleared the Analysis
Services data cache by running the ClearCache.xmla script as documented in the
11
ASCMD readme file, then executed the currently selected MDX query, recording the
start time, end time, and query duration in milliseconds.
SQL Server Profiler During the early stages of the ROLAP tests, SQLCAT double-
checked the accuracy of the ASCMD results with Profiler traces. A Profiler trace can
provide detailed information about where Analysis Services spends its time processing
an MDX query, starting with the command parser and ending with the formula engine
finishing query execution. SQLCAT also used Profiler traces to examine the SQL queries
that the lab ROLAP cube generated.
Performance Monitor SQLCAT used Performance Monitor to keep an eye on CPU
load, memory utilization, and I/O throughput to ensure that no unexpected factor or
bottleneck distorted the query performance measurements.

Note: All performance results mentioned in this case study are based on a cold Analysis
Services cache. By clearing the data cache prior to running each MDX query, SQLCAT ensured
accurate tests. A warm cache would boost query performance but would also distort test results,
as Analysis Services could then answer queries directly from memory without going to disk or
querying the relational data warehouse.

Establishing a Performance Baseline


Figure 4 shows the performance baseline for the original MOLAP cube, which SQLCAT
established directly on the lab server using the tools and methods described in the previous
section. This cold-cache baseline served as the yardstick to evaluate ROLAP performance
during the subsequent investigation.

Note: The figures in this paper use a logarithmic scale to help emphasize the difference
between MOLAP and ROLAP on a per-query basis. A linear scale would emphasize the
difference between short- and long-running queries in each storage mode, but would make it
very difficult to compare MOLAP and ROLAP queries of less than 1 second, as well as those
that take more than 20 minutes.

12
Figure 4: MOLAP Performance Baseline on a Logarithmic Scale

As illustrated, the majority of the MDX queries finished in less than one minute, the additional
SQLCAT queries O\01, A\02, A\03, and A\04 required more than five minutes, and query A\03
took the longest with 20 minutes and 11 seconds. A\03 required this time to calculate the
average of the cross product of two large datasets over a particular measure for 30 days of data
(25 billion rows of facts).

Implementing Data Warehouse Aggregations


Although SQLCAT also ran preliminary tests against the ROLAP cube, it did not benchmark the
initial performance because the relational data warehouse did not include aggregations yet.
Putting a MOLAP cube that includes aggregations side-by-side with a ROLAP cube that does
not include aggregations is an apples-to-oranges comparison. For a useful investigation, it is
necessary to apply at least the same aggregation design in both MOLAP and ROLAP modes. In
fact, ROLAP requires a higher degree of optimization to compensate for the performance
advantages MOLAP offers out of the box, such as automatic two-level bitmap indexing on every
attribute and lower I/O overhead due to data compression.

13
Cube-Based ROLAP Aggregations
In MOLAP and HOLAP modes, aggregations reside in the multidimensional database
maintained by Analysis Services. In ROLAP mode, on the other hand, aggregations are based
on indexed views in the relational data warehouse. Analysis Services supports ROLAP
aggregations insofar as it can create the indexed views to contain aggregations, and then uses
these indexed views instead of the source tables where possible to answer queries (see Figure
5). However, Analysis Services is not involved in materializing these views or storing their result
sets. Analysis Services delegates these tasks to the SQL Server relational engine by submitting
CREATE VIEW and CREATE UNIQUE CLUSTERED INDEX commands to the data warehouse
during cube partition processing. The relational engine then creates and maintains the indexed
views in the database in much the same way as tables with a clustered index.

To highlight how Analysis Services benefits from indexed views created by means of a cube-
based aggregation design, Figure 5 describes the following scenarios:

SQL query without aggregations in place (red arrow, left side of figure) The SQL
query goes with the lowest possible grain directly against the source facts table.
Although SQL Server can rapidly scan large data sets on a fast storage subsystem, by
aggressively performing synchronous and asynchronous I/O operations and using 512
KB block sizes, for example, performance suffers due to the time spent scanning across
unnecessarily detailed data in the database (25 billion rows of facts in the SQLCAT lab
environment).
SQL query with aggregations in place (purple arrow, right side of figure) Analysis
Services can generate a SQL query against the indexed view and the relational engine
can process the query much faster because the indexed view already summarized the
data and persisted the result set.

14
Figure 5: Analysis Services Querying with and without Cube-Based ROLAP Aggregations

Transparent ROLAP Aggregations


For a ROLAP cube to query an indexed view instead of a source table, Analysis Services must
be able to recognize that the indexed view exists and contains aggregations that can satisfy the
MDX query. This is only the case if Analysis Services created the indexed view based on a
ROLAP aggregation design, such as a design created using the aggregation wizards in BIDS. In
15
contrast, indexed views created manually using SQL Server Management Studio (SSMS) or
other database tools are transparent to Analysis Services. These database objects are not
directly associated with a ROLAP cube. Nonetheless, Analysis Services can still benefit from
transparent aggregations, which offer DW architects additional flexibility for ROLAP
performance tuning.

The key to leveraging transparent aggregations is the relational engines ability to choose an
optimal execution plan for parsed SQL queries. If the relational engines query optimizer finds a
match between the query elements (such as columns, search conditions, joins, and aggregate
functions) and the elements of an indexed view, it considers the view index as a possible option.
Query optimizer then estimates the cost of each possible option, chooses the query plan with
the lowest cost, and keeps track of the choice for subsequent queries. As a result, if an index of
a view has the lowest cost of any considered access option, query optimizer chooses this view
index even though the original SQL query references a table in the FROM clause.

The execution plan in Figure 6 shows query optimizer at work. The SQL query references the
FactLabDataOptimized table, but the relational engine transparently chooses the clustered
index on the Agg_Fact Lab Data_SW_ID view to answer the query. Microsoft introduced this
query optimization feature based on indexed views 10 years ago with SQL Server 2000 to
increase the performance of database applications without requiring T-SQL code modifications,
and has continued to improve the view-matching capabilities with every subsequent SQL Server
release.

Note: Query optimizer does not differentiate between client applications and is not aware of
Analysis Services, and Analysis Services is not aware of query optimizer.

16
Figure 6: Query Execution Plan Selecting a Transparent Aggregation

Choosing an Aggregation Strategy


For Analysis Services to leverage its sophisticated algorithms for calculating summary data
based on aggregations and selecting aggregated values for the calculation of other aggregated
values, it is necessary to create the aggregations through Analysis Services. However, these
cube-based ROLAP aggregations also have disadvantages (see Table 3). Transparent
aggregations can be a better choice to build a foundation for all cubes in the data warehouse,
while cube-based aggregations can then help to increase the performance of each cube
17
individually. By using transparent aggregations in combination with cube-based aggregations,
DW architects can implement an efficient aggregation design.

Table 3 compares general advantages and disadvantages of cube-based and transparent


aggregations. These considerations are particularly important for environments with multiple
ROLAP cubes accessing the same relational data warehouse.

Table 3: Advantages and Disadvantages of ROLAP Aggregation Types


Aggregation Advantages Disadvantages
Type
Cube-Based Efficient query Storage inefficiency Multiple
processing Analysis Services ROLAP cubes using the same
can use cube-based data warehouse do not share
aggregations even if the query cube-based aggregations. Analysis
and aggregation granularities do Services creates separate indexed
not exactly match. For example, views per cube even if the cubes
a query on [Month] can use an include the same or similar
aggregation on [Day], which aggregation designs.
requires only the summarization Processing overhead Analysis
of up to 31 numbers. Services drops and recreates
Aggregation design indexed views associated with
efficiency Analysis Services cube-based aggregations during
includes Aggregation Design cube partition processing.
Wizard and Usage-Based Dropping and recreating the
Optimization Wizard to create indexes can take an excessive
aggregation designs based on amount of time in a large-scale
storage and percentage data warehouse.
constraints or queries submitted
by client applications.
Transparent Reuse of existing indexes No sophisticated aggregation
across cubes Multiple cubes algorithms Indexed views must
using the same data warehouse match query granularity. Query
can benefit from the same set of optimizer doesnt consider
indexed views. Sharing indexed dimension hierarchies or
views among cubes mitigates aggregation granularities in the
aggregation-related storage query execution plan. For
requirements. example, a SQL query with
Less overhead during cube GROUP BY on [Month] cant use
processing Analysis Services an index on [Day].
is unaware of the aggregations Maintenance
and does not drop the indexed overhead Database
views during partition administrators must maintain
processing. There is no need to aggregations by using SSMS or
drop indexed views because the other tools. It is difficult to keep
relational engine maintains the track of the relationships between
18
indexes continuously, such as indexed views and ROLAP cubes.
during INSERT, UPDATE, and Design complexity Database
DELETE operations against the Engine Tuning Advisor can help to
base tables. facilitate aggregation design tasks
by analyzing Profiler traces, but it
cant identify all possible
candidates. Moreover, DW
architects must manually study
Profiler traces to determine
effective aggregations.

Figure 7 highlights the relationship between cube-based and transparent aggregations in a


centralized data warehouse shared among multiple ROLAP cubes. As illustrated, each cube
maintains its own aggregations even if the data warehouse already contains suitable indexed
views. For this reason, SQLCAT recommends reviewing the indexed views in the data
warehouse before and after creating or updating cube-based aggregation designs through
Aggregation Design Wizard or Usage-Based Optimization Wizard to avoid inefficient use of
storage space.

19
Figure 7: Cube-Based and Transparent ROLAP Aggregations

Creating Transparent Aggregations


Taking into consideration that transparent aggregations can be an important ROLAP design
element, SQLCAT decided to create a basic set of indexed views manually in order to add
transparent aggregations to the lab data warehouse. It was not hard to find good aggregation
candidates, given that a considerable number of test cases relied on similar MDX queries to
analyze three or seven days of data along a given dimension, such as software or segment. By
studying Profiler traces, SQLCAT determined that the lab ROLAP cube translated most of these
queries into SQL statements according to the following pattern:

SELECT SUM ( [TransmittedMB] )


, SUM ( [GatheredMB] )
, SUM ( [Duration] )
, SUM ( [IncidentCount] )
,[<Dimension Key>]
FROM [dbo].[FactLabDataOptimized],[dbo].[DimDate]
WHERE
(
( [FactLabDataOptimized].[Date_ID]=[DimDate].[Date_ID] )
20
AND ([Year]=<>)AND([Quarter]=<>)AND([Month]=<>)AND([Day]= <>)
)
GROUP BY [<Dimension Key>]

The dimension key was the only relevant variable, and with seven different dimension keys in
the source facts table SQLCAT was able to cover the most basic test cases with seven indexed
views that each joined the facts data to the date dimension on the date key and then
aggregated the values by an additional dimension key and by date dimension attributes. The
clustered indexes then physically ordered the results by dimension key, year, quarter, month,
and day so that the aggregated values for consecutive days were located in close physical
proximity in the database.

The following T-SQL listing shows the corresponding CREATE VIEW and CREATE UNIQUE
CLUSTERED INDEX statements with placeholders for dimension-specific information (for more
information about creating indexed views, refer to the topic Creating Indexed Views in SQL
Server 2008 R2 Books Online at http://msdn.microsoft.com/en-us/library/ms191432.aspx):

CREATE VIEW [dbo].[Agg_Facts By <Dimension> By Date]


( [TransmittedMB_0],[GatheredMB_1],[Duration_2],
[IncidentCount_3],[<Dimension Key>_4],
[Year_5],[Quarter_6],[Month_7],[Day_8],
[COUNT_BIG] )
WITH SCHEMABINDING AS
(
SELECT SUM ( [TransmittedMB] ),
SUM ( [GatheredMB] ),
SUM ( [Duration] ),
SUM ([IncidentCount] ),
[<Dimension Key>],
[Year],[Quarter],[Month],[Day],
COUNT_BIG(*)
FROM [dbo].[FactLabDataOptimized],[dbo].[DimDate]
WHERE
( [FactLabDataOptimized].[Date_ID]=[DimDate].[Date_ID] )
GROUP BY [<Dimension Key>],[Year],[Quarter],[Month],[Day]
)

GO

CREATE UNIQUE CLUSTERED INDEX [Agg_Facts By <Dimension> By Date]


ON [dbo].[Agg_Facts By <Dimension> By Date]
(
[<Dimension Key>_4] ASC,
[Year_5] ASC,
[Quarter_6] ASC,
[Month_7] ASC,
[Day_8] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON,
IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF, FILLFACTOR = 100)
ON [PRIMARY]

21
Table Binding versus Query Binding
Much to SQLCATs dismay, creating indexed views on top of a facts table with 25 billion rows
took far longer than expected. After more than 24 hours, SQLCAT cancelled this process as
soon as the relational engine completed the first view and index. This first indexed view now
had to suffice for a small ROLAP test with a subset of MDX queries. However, not a single query
showed performance improvements. It turned out that the facts table design and the
corresponding cube partition design did not permit the relational engine to utilize indexed views
for SQL queries originating from Analysis Services. Figure 8 shows the relevant design aspects.

Figure 8: Relational Facts Table Partitions and Corresponding Cube Partitions

According to a best practice in large-scale data warehousing, the design of the facts table relied
on partitions organized by calendar day to facilitate data pruning along a sliding time window. In
this particular case, the partition function used the Date_ID in the format yyyyMMdd. Every day,
the data warehouse would drop one expired table partition and add a new partition for the
current day. An XMLA script would then drop the expired cube partition as well and create a new
cube partition with a source query that referenced the new table partition by specifying the
current days Date_ID in the WHERE clause. This design works well for MOLAP cubes, but it
creates issues for ROLAP.

22
The primary problem is that Analysis Services must use query binding to reference individual
facts table partitions, so the SQL queries do not reference the underlying source facts table
directly. Instead, Analysis Services uses the source query of the cube partition in the FROM
clause to access the relevant facts data, and because the FROM clause now references a
subquery instead of the base table, query optimizer cant choose a view index to optimize the
execution plan. Query optimizer only considers view indexes if the SQL query references a base
table or if the FROM clause specifies an indexed view directly. Therefore, ROLAP cube
partitions must use table binding to benefit from transparent aggregations in the data
warehouse.

Figure 9 confirms this requirement. As the query execution plans indicate, the first query uses
table binding and leverages the indexed view, while the second query uses query binding and
performs all joining, sorting, and aggregating dynamically. Both queries return the same results,
but the second is much less efficient because it does not use the aggregations that already exist
in the data warehouse.

Figure 9: Query Plans for Table Binding and Query Binding

Reorganizing the Lab Data Warehouse


SQLCAT learned two lessons from this first attempt to implement ROLAP aggregations. Most
importantly, Analysis Services must not use subqueries, which means that Data Source Views
(DSVs) must reference source tables (not views or named queries), and cube partitions must
use table binding as the binding type to support ROLAP aggregations. Second, the relational
data warehouse should not maintain facts data in one large table. While it is possible to create
views and indexes that span an entire partitioned table before loading the data, changing any

23
aspect of the table or aggregation design throughout the cubes lifecycle leads to lengthy
processing time. For example, switch operations on partitioned tables require dropping and
recreating any associated indexed views.

Taking these dependencies into account, SQLCAT concluded that the lab data warehouse
required a substantial overhaul. It was necessary to replace the facts table partitions for
individual days of data with separate daily tables, load the facts data from the original source
table into the daily tables, create separate indexed views for each daily table, and then
reconfigure the cube partitions to access the daily tables instead of the original source table.

Note: Partitioning a large database by using separate daily tables used to be a best practice
for Microsoft SQL Server 2000 and earlier versions, but it is not a best practice for SQL Server
2005 Enterprise Edition and higher because these editions support table partitions based on a
partition function. SQLCAT reverted to the earlier method to achieve optimal ROLAP query
performance, but does not recommend this approach for other partitioning scenarios because of
the incurred complexity and administrative overhead.

Replacing Table Partitions with Individual Base Tables


The following T-SQL listing conveys how SQLCAT created the daily facts tables. Each table
mirrored the design of the original facts table, with the exception of the CHECK constraint on the
Date_ID column. This CHECK constraint limited the valid data range per table to the correct
calendar day. As the placeholders indicate, SQLCAT also appended the calendar-day
information to the table name. This common best practice helps to avoid confusion, particularly
during the configuration of the cube partitions in a subsequent step.

SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON

CREATE TABLE [dbo].[FactLabDataOptimized_<yyyyMMdd>](


[Base_ID] [bigint] NOT NULL,
[Segment_ID] [int] NOT NULL,
[Software_ID] [int] NOT NULL,
[Date_ID] [int] NOT NULL
CHECK ([Date_ID]=<yyyyMMdd>),
[Time_ID] [smallint] NOT NULL,
[ClientSocket_ID] [int] NOT NULL,
[TargetSocket_ID] [int] NOT NULL,
[LayerThree_ID] [smallint] NOT NULL,
[TransmittedMB] [bigint] NOT NULL,
[GatheredMB] [bigint] NOT NULL,
[Duration] [bigint] NOT NULL,
[IncidentCount] [bigint] NOT NULL,
CONSTRAINT [FK_FactLabDataOptimized_<yyyyMMdd>_DimSegments]
FOREIGN KEY([Segment_ID])
REFERENCES [dbo].[DimSegments] ([Segment_ID]),
CONSTRAINT [FK_FactLabDataOptimized_<yyyyMMdd>_DimSoftware]
FOREIGN KEY([Software_ID])
REFERENCES [dbo].[DimSoftware] ([Software_ID]),
CONSTRAINT [FK_FactLabDataOptimized_<yyyyMMdd>_DimDate]
FOREIGN KEY([Date_ID])
24
REFERENCES [dbo].[DimDate] ([Date_ID]),
CONSTRAINT [FK_FactLabDataOptimized_<yyyyMMdd>_DimSockets]
FOREIGN KEY([ClientSocket_ID])
REFERENCES [dbo].[DimSockets] ([Socket_ID]),
CONSTRAINT [FK_FactLabDataOptimized_<yyyyMMdd>_DimSockets1]
FOREIGN KEY([TargetSocket_ID])
REFERENCES [dbo].[DimSockets] ([Socket_ID]),
CONSTRAINT [FK_FactLabDataOptimized_<yyyyMMdd>_DimLayerThree]
FOREIGN KEY([LayerThree_ID])
REFERENCES [dbo].[DimLayerThree] ([LayerThree_ID]),
CONSTRAINT [FK_FactLabDataOptimized_<yyyyMMdd>_DimTime]
FOREIGN KEY([Time_ID])
REFERENCES [dbo].[DimTime] ([Time_ID]),
CONSTRAINT [PK_FactLabDataOptimized_<yyyyMMdd>] PRIMARY KEY CLUSTERED
( [Base_ID] ASC )
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Note: An alternative approach is to switch out partitions to new tables. However, switch
operations on partitioned tables require dropping and recreating any associated indexed views,
which renders this approach impractical for large-scale data warehouses with ROLAP
aggregations based on indexed views.

Loading the Facts Data


Considering the large data volume, SQLCAT opted to use SQL Server Integration Services
(SSIS) for bulk-loading the facts data into the daily tables. The SSIS packages that SQLCAT
created for this purposes included a Data Flow task with an OLE DB Source and a SQL Server
Destination. The OLE DB Source used a SQL command with the Date_ID in the WHERE clause
to restrict the extracted source data to the relevant day. The SQL Server Destination then
mapped the sources output columns in a straightforward way to the destinations input columns.

Figure 10 displays the resulting distribution of facts data across the new collection of daily tables
in the lab data warehouse.

25
Figure 10: Bulk Loading Daily Facts using SSIS

Although the SQL Server Destination object is less flexible than other Extract, Transform, and
Load (ETL) objects, it lets you customize bulk-load operations for high performance.
Specifically, SQLCAT disabled the option to check constraints and ran five packages in parallel,
achieving a throughput of 32 MB per second. By increasing the network packet size to 32 KB,
SQLCAT increased the bulk-load performance even further to 42 MB per second. For detailed
information about SSIS-based high-performance bulk loading, refer to We Loaded 1TB in 30
Minutes with SSIS, and So Can You, available on Microsoft MSDN at
http://go.microsoft.com/fwlink/?LinkId=193054. See also Top 10 SQL Server Integration
Services Best Practices on the SQLCAT Web site at http://go.microsoft.com/fwlink/?
LinkId=147462.

Note: After loading the facts data, SQLCAT created the full set of indexed views for each daily
table with the options SORT_IN_TEMPDB = ON, ALLOW_PAGE_LOCKS = OFF,
ALLOW_ROW_LOCKS = OFF, and FILLFACTOR = 100. For more information about how

26
SQLCAT created indexed views for ROLAP aggregations, see the section Creating Transparent
Aggregations earlier in this case study.

Realigning Cube Partitions


For maintenance and other reasons, it is a best practice to partition cubes in exactly the same
way as the underlying facts data. Applied to the lab environment, this meant that the ROLAP
cube required 31 cube partitions, each mapping directly to a corresponding daily table in the lab
data warehouse, as illustrated in Figure 11. Given this large number of cube partitions, SQLCAT
opted to automate the task using XMLA scripts.

Figure 11: Aligning Cube Partitions with Daily Facts Tables

To bring the ROLAP cube back into alignment with the lab data warehouse, SQLCAT performed
the following steps:

1. Adding all daily facts tables to the DSV By using the Add/Remove Tables wizard in
BIDS, SQLCAT added all 31 daily facts tables to the Analysis Services database in a
single step. The Add/Remove Tables wizard established the relationships between facts
and dimension tables automatically because the daily tables included all necessary
foreign key definitions. It is worth pointing out that SQLCAT did not remove the original

27
facts table from the DSV because it served as the source table for the measures in the
Fact Lab Data Optimized measure group.
2. Removing all existing cube partitions By deleting the existing cube partitions
associated with the original table partitions, SQLCAT excluded the original facts table
from further querying and processing in the Fact Lab Data Optimized measure group.
3. Adding daily cube partitions By using the following XMLA script, SQLCAT added all
31 daily partitions to the ROLAP cube with consistent configuration settings. Particularly
important for query performance is the Slice property, which Analysis Services uses
during query execution to determine the partitions that contain relevant data. SQLCAT
configured each cube partitions Slice property according to each base tables calendar
day (see also Figure 11).

<Create
xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<ParentObject>
<DatabaseID>SQLCAT</DatabaseID>
<CubeID>SQLCAT</CubeID>
<MeasureGroupID>Fact Lab Data Optimized</MeasureGroupID>
</ParentObject>
<ObjectDefinition>
<Partition xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Name>Fact Lab Data Optimized_<yyyyMMdd></Name>
<ID>Fact Lab Data Optimized_<yyyyMMdd></ID>
<Source xsi:type="DsvTableBinding">
<DataSourceViewID>SQLCAT</DataSourceViewID>
<TableID>dbo_FactLabDataOptimized_<yyyyMMdd></TableID>
</Source>
<StorageMode>Rolap</StorageMode>
<ProcessingMode>Regular</ProcessingMode>
<Slice>
[Dim Date].&amp;[<yyyy>]&amp;[<qq>]&amp;[<MM>]&amp;[<d>]
</Slice>
<ProactiveCaching>
<SilenceInterval>-PT1S</SilenceInterval>
<Latency>P0D</Latency>
<SilenceOverrideInterval>-PT1S</SilenceOverrideInterval>
<ForceRebuildInterval>-PT1S</ForceRebuildInterval>
<Enabled>false</Enabled>
<AggregationStorage>MolapOnly</AggregationStorage>
<OnlineMode>Immediate</OnlineMode>
<Source xsi:type="ProactiveCachingInheritedBinding">
<NotificationTechnique>Server</NotificationTechnique>
</Source>
</ProactiveCaching>

28
</Partition>
</ObjectDefinition>
</Create>

4. Processing the measure group To process all daily cube partitions and any
unprocessed dimensions for the Fact Lab Data Optimized measure group in a single
step, SQLCAT reprocessed the measure group programmatically using the following
XMLA script, This script enabled SQLCAT to process the data quickly using the ASCMD
command-line tool.

<Process
xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Object>
<DatabaseID>SQLCAT</DatabaseID>
<CubeID>SQLCAT</CubeID>
<MeasureGroupID>Fact Lab Data Optimized</MeasureGroupID>
<PartitionID>$(Partition)</PartitionID>
</Object>
<Type>ProcessFull</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>

Note: The ASCMD command-line tool supports a -v command parameter for different
variables. In this case, the different variables corresponded to the different partitions (i.e.,
different dates) within the cube.

Reviewing Measure Groups and Measures


With the successful completion of the data warehouse overhaul and subsequent cube
alignment, SQLCAT felt confident that the ROLAP cube now met all aggregation requirements
but Analysis Services did not concur. The ROLAP cube continued to generate SQL queries with
subqueries in the FROM clause, despite the fact that all cube partitions now used table binding
and mapped directly to daily facts tables. SQLCAT needed to investigate and solve this issue
because as long as a ROLAP cube uses subqueries to retrieve data from the underlying data
warehouse, it cant benefit from relational aggregations in the data warehouse.

Zooming in on Measures
Figure 12 shows the hierarchy of the most important Analysis Services objects that influence
how a ROLAP cube communicates with an underlying data warehouse.

29
Figure 12: Analysis Services Object Hierarchy

Based on this hierarchy and the following considerations, SQLCAT was able to narrow down the
root cause and focus attention on the measure group:

Data source The Data Source object of an Analysis Services database specifies the
connection string, database isolation level, and maximum number of concurrent
connections, which have no bearing on the binding type.
Data source view The DSV on top of the Data Source included no views or named
queries and only referenced base tables, so the DSV was not the reason.
Cube partitions The cube partitions also showed the correct configuration. The XMLA
script to create the partitions set the Type parameter of the Source property correctly to
DsvTableBinding (see previous T-SQL listing under Realigning Cube Partitions).
Cube-based aggregations The cube partitions were not associated with an
aggregation design, so cube-based aggregations did not yet exist at this stage.
Measure groups The next logical place to look was the Fact Lab Data Optimized
measure group with its measures Transmitted MB, Gathered MB, Duration, Incident
Count, and Fact Lab Data Optimized Count. The first four measures relied on the SUM
aggregation function to calculate summary data over their corresponding source
columns (that is, Transmitted MB, Gathered MB, Duration, and Incident Count in the
daily facts tables), but the fifth measure was different. It used the Count of Rows
30
function, which calculates the row count on the entire source table, not just a single
column. This Count of Rows measure was the likely cause of the issue.

The Fact Lab Data Optimized Count measure was also different in the sense that it did not exist
in the original MOLAP cube. When SQLCAT created the ROLAP cube, the Cube Wizard in
BIDS automatically added this measure to the measure group. As soon as SQLCAT deleted this
measure and processed the cube, Analysis Services switched to table binding in the SQL
queries, as SQLCAT verified through Profiler traces.

Row Counting and Query Binding


To understand the query-binding puzzle created by Count of Rows measures, it is important to
note that Analysis Services attempts to calculate the row count in a way that is independent of
the underlying Database Management System (DBMS). Basically, Analysis Services constructs
a SQL query that dynamically adds a column with a constant value of 1 in each row to the result
set. As the Profiler trace in Figure 13 reveals, this query then serves as the subquery of an outer
SQL query that uses the SQL aggregation function COUNT_BIG on this dynamic column to
return the number of rows as a bigint. Because all rows have a value of 1 in this column, there is
no need to pay special attention to the handling of NULL values by the DBMS, but the drawback
is query binding. If a ROLAP cube is supposed to use aggregations based on indexed views, it
cant include Count of Rows measures in its measure groups. Hence, for optimized ROLAP,
SQLCAT recommends disabling this Count of Rows measure.

Figure 13: Calculating the Row Count with the Help of a Subquery

31
Benchmarking the ROLAP Cube
At last, SQLCATs redesign and optimization efforts paid off. The ROLAP cube was finally ready
for performance testing, and thanks to the amazingly fast performance of the relational SQL
Server engine on top of a super-fast storage subsystem, the results looked better than
expected. To everybodys surprise, the ROLAP cube outpaced the MOLAP cube in 45 percent
of all queries right from the start (see Figure 14). Only 39 percent of the queries showed
substantially slower response times in ROLAP mode (more than twice the amount of MOLAP
time) and 16 percent showed moderate performance degradation (less than twice the amount of
MOLAP time). Admittedly, the lab cube featured a low number of attributes, low cardinality, and
the date dimension was relatively simple, but the performance results proved that ROLAP can
be a viable option for simple cubes with large data volumes but low dimensional complexity.

Figure 14: ROLAP Performance in Comparison to a MOLAP Baseline on a Logarithmic Scale

TopCount Customer Queries


In particular, the three-day and seven-day queries provided by the customer exhibited
remarkable performance behavior. For these TopCount() queries, ROLAP was at par with
MOLAP and in many cases faster. For example, query 7\11s average response times dropped
from 9 minutes and 50 seconds to 2 minutes and 39 seconds (see Figure 14). Across all
32
TopCount() customer queries, average response times dropped from 1 minute and 28 seconds
with MOLAP to 58 seconds with ROLAP. The average performance was better with ROLAP
because the three-day and seven-day TopCount() queries were able to leverage the transparent
aggregations in ROLAP mode, while MOLAP is naturally unable to exploit these types of
aggregations in the relational data warehouse to increase query performance. The original
MOLAP cube did not include corresponding aggregation designs.

The ROLAP performance also benefited from the daily cube partition design. As explained
earlier, the ROLAP cube included 31 partitions that mapped to 31 daily facts tables, each with
its own set of indexed views that satisfied the TopCount() queries. In this configuration, when
executing an MDX query for a given range of days, Analysis Services first checks the Slice
property of the cube partitions to determine the partitions that contain relevant data, then
submits SQL queries to the data warehouse for only those facts tables that correspond to the
relevant cube partitions. If the relational engine can process these SQL queries in parallel and
fully leverage multiple indexed views, and if the storage subsystem is optimized for fast serial
I/O, the data warehouse can return the results to Analysis Services in the shortest possible time.

Figure 15 shows a seven-day MDX query example. The seven selected days corresponded to
seven cube partitions, each owning a Slice within the specified day range. Accordingly, Analysis
Services submitted seven SQL queries to the SQLCAT lab data warehouse.

Figure 15: ROLAP Query Processing and TopCount Performance Results

33
Additional SQLCAT MDX Queries
For the most part, the additional queries that SQLCAT defined for a greater variety of test cases
also performed well in ROLAP mode, although query performance suffered noticeably over
larger data sets. As Figure 16 reveals, query A\03 was the worst performer. It calculated
average values for 30 days of dataand ROLAP required no less than 2 hours, 34 minutes,
and 57 seconds to deliver the results. The next section, Understanding Query Concurrency and
Processor Utilization, deals with query A\03 in more detail.

Figure 16: Additional MDX Query Performance Results

Understanding Query Concurrency and Processor Utilization


An interesting aspect of query A\03 is that its MDX expression was very similar to that of query
A\02, yet query A\02 performed much better. The difference was that query A\02 analyzed data
over seven days whereas query A\03 covered 30 days of data.

By studying Performance Monitor traces, SQLCAT noticed the following query behavior:

34
Seven-day MDX query A\02 Generated seven SQL queries, which the relational
engine was able to execute quickly because the number of queries did not exceed the
number of processor cores on the server. The server had eight cores, and thanks to the
high I/O performance in the storage subsystem, the relational engine was able to reach
a very high level of CPU utilization while executing all seven queries in parallel.
30-day query A\03 Generated 30 SQL queries, which far exceeded the number of
available cores, thus flooding the relational engine and causing a substantial backlog of
queries attempting to execute. The relational engine parallelized each of these 30 SQL
queries and utilized all available cores to distribute the query threads, which caused
excessive thread thrashing and I/O congestion.

To increase the performance of query A\03 and to ensure that the cores would not become a
bottleneck for a wider range of ROLAP queries, SQLCAT concluded that it would be necessary
to limit parallel query plan generation and scale out the lab environment, as illustrated in Figure
17. By restricting the maximum degree of parallelism to 1 (MAXDOP = 1), SQL queries run
single-threaded, so 30 cores would be necessary to run all SQL queries of MDX query A\03 in
parallel. For more information about the MAXDOP setting, refer to the section Max Degree of
Parallelism Option in SQL Server 2008 Books Online at http://go.microsoft.com/fwlink/?
LinkId=193055. Note that the MAXDOP setting can also be enforced with Resource Governor.

Figure 17: ROLAP SQL Query Fan-Out

Note: The SQL query fan-out design depicted in Figure 17 primarily benefits ROLAP cubes
with a high percentage of MDX queries spanning all cube partitions, such as MDX queries on
attributes that are not associated with partition slices. This design does not apply to MOLAP.

35
MOLAP has substantially lower hardware requirements. For example, MOLAP was able to
execute query A\03 much faster than ROLAP on a single server with eight cores. It didnt require
as much CPU power to parse through 30 days of data (see the results in Figure 16).

Usage-Based Optimization of Long-Running Queries


Up to this point, SQLCAT had implemented ROLAP aggregations based on the assumption that
all MDX queries had equal weight. The next logical step was to increase cube performance
specifically for long-running queries, such as queries 3\10 and 3\11, by using the Usage-Based
Optimization Wizard.

The Usage-Based Optimization Wizard can create smart cube-based aggregations based on
statistical information about MDX queries tracked in the Analysis Services query log. To
generate suitable query log entries, SQLCAT enabled the query log as discussed in the TechNet
article Configuring the Analysis Services Query Log at http://go.microsoft.com/fwlink/?
LinkId=193056, then ran the selected queries via the ASCMD command-line tool.

Figure 18 shows a breakdown of the response times for query 3\10 and 3\11 before and after
UBO. As the results reveal, the long-running queries benefited substantially from UBO.

Figure 18: Query Times Before and After UBO on a Logarithmic Scale

Particularly in ROLAP mode, response times dropped from almost 3 minutes to roughly 300
milliseconds. Prior to UBO, the ROLAP queries lagged far behind their MOLAP counterparts.
After UBO, MOLAP and ROLAP response times were at par, and only fractions of a second
apart. This proves that the application of a smart aggregation design through UBO can benefit
both MOLAP and ROLAP cubes. However, UBO should not be applied indiscriminately to all
queries to avoid inadvertent effects on storage and processing overhead, as discussed in the

36
Technical Note Reintroducing Usage-Based Optimization in SQL Server 2008 Analysis
Services available on the SQLCAT web site at http://go.microsoft.com/fwlink/?LinkId=191147.

Dealing with Skewed Distribution of Data


In addition to aggregations, SQLCAT was also interested in studying the effect of data skew on
ROLAP performance. The lab data warehouse facilitated this investigation because the
Software dimension data was very unequally distributed across the facts data. As illustrated in
Figure 19, the vast majority of the Software_IDs in the source facts tables were unknown values
(-1). Only a small portion of the IDs referred to known entities.

Figure 19: Skewed Distribution of Software_IDs in a Lab Facts Table

In particular, SQLCAT wanted to examine the impact of data skew on query optimizer. Query
optimizer uses a complex optimization algorithm based on table and index statistics to formulate
the query execution plan. This algorithm works well in most cases, but it is not perfect. Skewed
data can cause query optimizer to generate an inefficient query execution plan, which would
affect ROLAP performance while the effect remains virtually non-existent for MOLAP.

To perform the tests, SQLCAT ran arbitrary ad-hoc queries against the MOLAP and ROLAP
cubes. These ad-hoc queries did not benefit from aggregations or indexed views, and the
performance results validated the assumption that data skew must be better compensated for
with ROLAP than with MOLAP. The MOLAP queries performed well with an average response
time of 49 seconds while the ROLAP queries, on average, took approximately two minutes
longer (see Figure 19).

One way to compensate for data skew in a relational data warehouse is to cover the most
relevant queries with indexed views. Another is to ensure that appropriate, up-to-date column

37
statistics exist. For information about concepts and guidelines for using query optimization
statistics, refer to the topic Using Statistics to Improve Query Performance in SQL Server 2008
Books Online at http://go.microsoft.com/fwlink/?LinkId=164535.

Lessons Learned and Best Practices


In reviewing the investigation project, SQLCAT concluded that the efforts to achieve similar or
better performance with ROLAP over MOLAP had been successful. The test results exceeded
expectations in many cases and gave SQLCAT a good grasp on hardware requirements,
implementation details, and design implications. Table 4 summarizes corresponding lessons
learned during the ROLAP investigation and best practices for large-scale ROLAP
implementations, which SQLCAT determined in discussions with SQL Server and Analysis
Services developers.

Table 4: ROLAPLessons Learned and Best Practices


ROLAP Lessons Learned Best Practices
Is a viable For large cubes with a low level of Avoid complex data models and simplify
choice for complexity and low attribute cardinality, the relational data warehouse as much
cubes with ROLAP can be a good choice, but as possible, such as by removing
low ROLAP is less attractive than MOLAP unnecessary columns and, if possible,
complexity. for cubes with many dimensions (wide by replacing wide data types with
facts tables), a large number of narrower alternatives to increase storage
attributes (wide dimension tables), and efficiency.
complex attribute relationships
(snowflake schema or many-to-many
relationships).
Must use Analysis Services must reference Use table binding (not query binding) for
table binding. database tables in the SQL queries to ROLAP cube partitions, include only
support relational aggregations based tables (not views or named queries) in
on indexed views because SQL Server the DSVs, and remove any Count of
does not support building indexed Rows measures from ROLAP measure
views against views or subqueries and groups.
the query optimizer does not leverage
indexed views if the SQL queries do
not reference tables directly.
Requires In comparison to MOLAP, ROLAP has Provide sufficient CPU power to sustain
more substantially higher hardware the high number of concurrent SQL
hardware requirements, such as higher demand queries and optimize the storage
resources for CPU power and I/O performance. subsystem to deliver sufficient I/O
than MOLAP. throughput to reach a very high level of
CPU utilization. Correctly implemented,
ROLAP should be consistently
generating sequential scans, so the
storage subsystem should be optimized
for a high volume of serial I/O requests.
Impacts the Although ROLAP avoids copying Use transparent aggregations in
size of source facts data into multidimensional combination with cube-based

38
relational databases, it does place a high aggregations to avoid inefficient use of
databases. demand on storage capacities in the storage resources, particularly in
relational data warehouse for indexed environments with multiple cubes
views and summary tables. For sharing the same data warehouse. Avoid
example, indexing on every attribute overlapping cube-based aggregations
can lead to excessive storage and cubes with different levels of
consumption in ROLAP mode, while granularity because they require different
MOLAP is much more efficient with its sets of tables for each level of
automatic two-level bitmap indexes. granularity.
Increases MOLAP automatically optimizes many Create scripts or other custom solutions
design design aspects that require manual to automate the provisioning of facts
complexity attention in ROLAP mode. Queries that tables, indexed views, and
and dont hit aggregations require a very corresponding cube partitions. Directly
management clean, normalized schema and other map the facts tables in the relational
overhead. data warehouse optimization data warehouse to cube partitions in
techniques also come into play, such Analysis Services. Also, use the ASCMD
as restricting the maximum degree of command-line tool and Profiler traces to
parallelism to 1 and compensating for test applicable queries with original
data skew. Also, because the cube aggregations, new UBO aggregations,
cant rely on partitioned facts tables, handmade aggregations, and no
the data warehouse requires individual aggregations to better understand the
daily tables and separate sets of query characteristics of the cube.
indexed views, which can substantially
increase management overhead.
Can leverage MOLAP and ROLAP can use Create cube-based aggregations at the
cube-based aggregations even when the query leaf level to help known cross-join
aggregations granularity doesnt exactly match the patterns. Note that Analysis Services can
even if the aggregation granularity. However, only leverage aggregations for different
granularities ROLAP can only use cube-based granularities if the aggregations are part
do not aggregations on the leaf level of a of the ROLAP cube design, such as
exactly dimension when querying higher aggregations created by using UBO.
match. levels. Transparent aggregations do not fall into
this category.
Might be MOLAP can catch referential integrity Focus on data cleanup and data quality
affected by errors and other data problems during during the load process. Confine data
data cube processing, but these issues can changes to the most current facts table,
problems affect ROLAP during query execution. such as to the most recent daily table, in
and ROLAP cubes do not support order to avoid any impact of referential
referential Snapshot isolation with multiple active integrity issues on past data.
integrity result sets on a SQL Server
issues. connection, so DW architects must use
the ReadCommitted isolation level,
which allows nonrepeatable reads
(reading the same row twice delivers
different results) and phantoms (re-
executing the same query delivers a
different set of rows).
Relies on ROLAP performance depends on the Add foreign key and other constraints to
query capabilities of the query optimizer to the facts tables. Keep the statistics up to
optimizers formulate an efficient query execution date by running regular UPDATE
39
ability to plan. An inefficient query execution STATISTICS jobs after updates (or turn
make the plan affects ROLAP performance, on auto update stats).
best while the effect remains virtually non-
decisions for existent for MOLAP.
the query
execution
plan.

Conclusion
ROLAP can be a viable option for simple cubes with large data volumes but low dimensional
complexity if DW architects and database administrators have the knowledge to fine-tune the
data warehouse manually. It is important to realize that MOLAP optimizes many design aspects
automatically, while this must be done by hand in ROLAP mode. However, given sufficient CPU
power, memory capacity, a storage subsystem optimized for high I/O throughput, and useful
relational aggregations, ROLAP can reach very high performance levels thanks to the amazing
performance of the SQL Server 2008 relational engine, exceeding MOLAP performance in some
cases.

In comparison to the gold standard of MOLAP, ROLAP clearly places a higher demand on
hardware resources, impacts the size of relational databases, increases design complexity and
management overhead, comes with higher processing overhead, and might be affected by data
problems and referential integrity issues. For instance, to benefit from aggregations, a ROLAP
cube cant use query binding, which limits cube design flexibility and prevents the data
warehouse from using table partitioning. For partitioning purposes, the data warehouse must
rely on the old-style method of individual facts tables organized by attributes related to cube
partitioning, which noticeably increases management overhead. Furthermore, if ROLAP cant
determine which partitions contain relevant data, perhaps because the query did not specify any
of the slice attributes or because of complex attribute relationships that ROLAP cant analyze,
ROLAP includes all available cube partitions in the query execution, potentially flooding the
relational engine and causing a substantial backlog of queries as well as excessive thread
thrashing and I/O congestion. Limiting parallel query plan generation and implementing a SQL
query fan-out design can help to mitigate this issue, but the required resources to implement a
fan-out are substantial. In comparison, MOLAP can determine the relevant cube partitions even
with parent-child and many-to-many relationships in the dimensions and can execute queries
much more efficiently and with much less CPU power. MOLAP offers many performance
advantages out of the box, such as automatic two-level bitmap indexing on every attribute and
lower I/O overhead due to data compression.

On the other hand, ROLAP undeniably does have the advantage that it doesnt need to copy the
source data into a multidimensional database. With data volumes beginning to exceed the
maximum MOLAP capabilities, this advantage will trump all of the ROLAP disadvantages,
especially considering the powerful features of the SQL Server relational engine to manage the
data volumes. Smart aggregation designs that combine the advantages of transparent and

40
cube-based aggregations can mitigate some of the performance disadvantages of ROLAP. It is
also possible to fan out SQL queries across multiple data warehouse servers. The data
warehouse scale-out onto multiple servers would also benefit cube partition processing. And
finally, XMLA scripts and custom solutions can help to lower the administrative burden of
maintaining a large number of facts tables and cube partitions.

The keys to a successful ROLAP implementation are a fast underlying storage subsystem, a
streamlined relational data warehouse, and thorough testing, tweaking, and usage-based
optimization. It is imperative to study Profiler traces and analyze the SQL queries generated by
Analysis Services. If these SQL queries reference views or subqueries in the FROM clause, the
ROLAP cube will not be able to leverage relational aggregations. Query optimizer will not use
transparent aggregations created manually in the data warehouse and the creation of cube-
based aggregations will fail during partition processing if the cube uses query binding for any
reason. It is important to review the design of DSVs, cube partitions, and measure groups to
ensure that Analysis Services uses table binding.

Although SQLCAT deemed its first lab-based customer ROLAP investigation a great success, a
few questions still remain to be answered. For example, SQLCAT did not examine the ROLAP
behavior for multiple concurrent users and did not study how ROLAP would benefit from a warm
Analysis Services cache. In theory, the ROLAP storage mode exhibits the same caching as
MOLAP does and should therefore show similar performance improvements, but this still needs
to be verified in a realistic test lab. Furthermore, SQLCAT still needs to clarify the impact of
attribute cardinality, distinct count measures, and complex attribute relationships on ROLAP
performance. It still needs to be seen how ROLAP performs over very wide dimensions and
attributes with one million or more members. For all of these reasons, customers are
encouraged to use ROLAP with caution, apply the optimization techniques discussed in this
case study, and perform additional acceptance tests with the full number of current users in their
environments.

41
For more information:

http://www.sqlcat.com/: SQLCAT Web site

http://www.microsoft.com/sqlserver/: SQL Server Web site

http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5
(excellent) how you would rate this paper and why youve given it this rating. For example:

Are you rating it high because it has good examples, excellent screen shots, clear
writing, or another reason?
Are you rating it low because the examples are poor, the screen shots fuzzy, or the
writing unclear?

This feedback will help us improve the quality of the case studies we release.

Send feedback.

42