
Using the ETL Infrastructure Inside Oracle Database 10g

Purpose
This tutorial covers the Extraction, Transformation, and Loading (ETL) infrastructure of
Oracle Database 10g.
Time to Complete
Approximately 1 hour
Topics
This tutorial covers the following topics:
Overview

Prerequisites

Implement Schema Changes for the Sales History Schema

Review the Multitable Insert

Use the Upsert Functionality (SQL MERGE Keyword)

Learn DML Error Logging Capabilities

Experience the Basics of the Table Function

Use Synchronous Change Data Capture (CDC) to Track and Consume Incremental
Data Changes
Propagate Information for a Data Mart

Clean Up

Summary

Overview
What Happens During the ETL Process?
ETL stands for extraction, transformation, and loading. During extraction, the desired
data has to be identified and extracted from many different sources, including database
systems and applications. Often it is not possible to identify the specific subset of
interest up front, so more data than necessary is extracted and the relevant data is
identified at a later point in time. Depending on the source system's capabilities (for
example, available OS resources), some transformations may already take place during
this extraction process. The size of the extracted data varies from hundreds of kilobytes
to hundreds of gigabytes, depending on the source system and the business situation.
The same is true for the time delta between two (logically) identical extractions: it may
range from days or hours down to minutes or near real time. For example, Web server
log files can easily grow to hundreds of megabytes in a very short period of time.
After extraction, the data must be physically transported to the target system, or to an
intermediate system for further processing. Depending on the chosen method of
transportation, some transformations can be applied during this step as well. For example,
a SQL statement that directly accesses a remote target through a gateway can
concatenate two columns as part of its SELECT list.
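As an illustration, a hedged sketch only: assuming a database link named source_db and a
staging table stage_customers (both hypothetical names, not used elsewhere in this tutorial),
such a transformation can be folded into the transport statement itself:

INSERT INTO stage_customers (cust_id, cust_full_name)
SELECT cust_id,
       cust_first_name || ' ' || cust_last_name  -- concatenation applied during transport
FROM customers@source_db;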
After extracting and transporting the data, the most challenging (and time-consuming)
part of ETL follows: transformation and loading into the target system. This may
include:
Applying complex filters

Validating data against information already existing in target database tables
Checking data extracted without knowledge of new versus changed records against the
target objects to determine whether it must be updated or inserted
Inserting the same data several times, as detail-level and as aggregated information
All of this should be done as quickly as possible, in a scalable manner, and must not affect
concurrent access to the existing target for information retrieval.
Oracle offers a wide variety of capabilities to address all the issues and tasks relevant in
an ETL scenario. Oracle Database 10g is the ETL transformation engine.
Back to Topic List
Prerequisites
Before starting this tutorial, you should:
1. Perform the Installing Oracle Database 10g on Windows tutorial.
2. Download and unzip etl2.zip into your working directory (i.e. c:\wkdir).

Back to Topic List


Implement Schema Changes for the Sales History Schema
Before starting the tasks for this tutorial, you need to implement some changes on the
existing Sales History schema. Additional objects are necessary, and additional
system privileges must be granted to the user SH. The SQL file for applying those
changes is modifySH_10gR2.sql .
1. Start a SQL *Plus session. Select Start > Programs > Oracle-OraDB10g_home > Application
Development > SQL Plus.
(Note: This tutorial assumes you have a c:\wkdir folder. If you do not, create
one and unzip the contents of etl2.zip into this folder; the scripts reference files by this path.)
2. Log in as the SH user. Enter SH as the User Name and SH as the Password. Then click OK.
3. Run the modifySH_10gR2.sql script from your SQL*Plus session.
@c:\wkdir\modifySH_10gR2.sql
Review the bottom of your output to verify that the script completed successfully.

Back to Topic List


Reviewing the Multitable Insert
MyCompany receives some nonrelational data structures from one of its partner
companies, which sells the products as part of a special advertising campaign. The
data is a denormalized, nonrelational record structure from proprietary
mainframe systems, consisting of one record per customer and product per week. Those
data structures have to be inserted into the data warehouse. Because sales data is
stored per customer and product per day, you need to transform the incoming data.
As part of the transformation, the denormalized record structure must be
transformed from one record per week into seven records, each containing the
information of one business day (the first business transformation). In addition, the
data warehouse keeps track of all new customers with a credit limit above a certain
threshold; those customers must be tracked separately (the second business transformation).
In this section, you will implement those business transformations by leveraging
Oracle’s new multitable insert capabilities. To do this, perform the following steps:
Use the Multitable Insert for Pivoting
Use the Multitable Insert for Conditional Insertion

Back to Topic List


Using the Multitable Insert for Pivoting
1. Show the execution plan for the new multitable insert. From a SQL*Plus session logged on to
the SH schema, execute the following SQL script:
@c:\wkdir\explain_mti_new.sql
DELETE
FROM PLAN_TABLE;
EXPLAIN PLAN FOR
INSERT ALL
INTO sales
VALUES(product_id, customer_id,weekly_start_date,2,9999, q_sun,sales_sun)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+1,2,9999, q_mon,sales_mon)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+2,2,9999, q_tue,sales_tue)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+3,2,9999, q_wed,sales_wed)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+4,2,9999, q_thu,sales_thu)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+5,2,9999, q_fri,sales_fri)
INTO sales
VALUES(product_id, customer_id,weekly_start_date+6,2,9999, q_sat,sales_sat)
SELECT * FROM sales_input_table;
SET linesize 140
SELECT * from table(dbms_xplan.display);
Note: The input source table is scanned only once! The complexity of the denormalization is
handled within the several INSERT INTO branches, thus avoiding multiple scans.

2. Now show the execution plan for the multi-table insert based on a UNION ALL set operation.
From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\explain_mti_old.sql
DELETE FROM PLAN_TABLE;
COMMIT;
EXPLAIN PLAN FOR
INSERT INTO sales
(prod_id, cust_id, time_id, channel_id,promo_id,amount_sold,quantity_sold)
SELECT product_id, customer_id,weekly_start_date,2,9999,sales_sun,q_sun
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+1,2,9999,sales_mon,q_mon
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+2,2,9999,sales_tue,q_tue
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+3,2,9999,sales_wed,q_wed
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+4,2,9999,sales_thu,q_thu
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+5,2,9999,sales_fri,q_fri
FROM sales_input_table
UNION ALL
SELECT product_id, customer_id,weekly_start_date+6,2,9999,sales_sat,q_sat
FROM sales_input_table;
SET linesize 140
SELECT * from table(dbms_xplan.display);
COMMIT;

Note: The input source table is scanned seven times! The complexity of the denormalization is
handled within the several SELECT operations.
As the number of input records increases, the performance advantage of the new multitable
insert statement, which requires only one scan of the source, becomes more and more obvious.

Back to Topic
Using the Multitable Insert for Conditional Insertion
1. Create an intermediate table consisting of new information. From a SQL*Plus session logged on
to the SH schema, execute the following SQL script, which contains the SQL statements from
steps 1-4:
@c:\wkdir\mti2_prepare.sql
Rem create intermediate table with some records

CREATE TABLE customers_new AS


SELECT *
FROM customers
WHERE cust_id BETWEEN 2000 AND 5000;

2. Disable constraints on the SALES table; this is necessary for step 3.


ALTER TABLE sales DISABLE CONSTRAINT sales_customer_fk;

3. Delete some data from the CUSTOMERS table.


DELETE FROM customers WHERE cust_id BETWEEN 2000 AND 5000;

4. Create an empty table for our special promotion information.


CREATE TABLE customers_special AS
SELECT cust_id, cust_credit_limit
FROM customers
WHERE rownum > 1;
5. Issue the multitable insert into several tables with different table structures. From a SQL*Plus
session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\do_mti2.sql
INSERT /*+ APPEND NOLOGGING */ FIRST
WHEN cust_credit_limit >= 4500 THEN
INTO customers
INTO customers_special VALUES (cust_id, cust_credit_limit)
ELSE
INTO customers
SELECT * FROM customers_new;

6. You can see what was inserted. Execute the following SQL script:
@c:\wkdir\control_mti2.sql
SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM customers_special;
SELECT MIN(cust_credit_limit) FROM customers_special;
7. Before continuing, reset your environment by executing the following SQL script:
@c:\wkdir\reset_mti2.sql

set echo on
REM cleanup and reset

ALTER TABLE sales


MODIFY CONSTRAINT sales_customer_fk RELY
ENABLE NOVALIDATE;
DROP TABLE customers_special;
DROP TABLE customers_new;
COMMIT;
Back to Topic
Back to Topic List
Using the Upsert Functionality (SQL MERGE Keyword)
MyCompany has to update its product information in the data warehouse periodically
with data from its products database. Unfortunately, you receive the product delta
information without any way to distinguish between new and updated records.
Therefore, you have to make this distinction on the data warehouse side.
To use the SQL MERGE command to either update or insert the data conditionally, you
perform the following steps:
1. Create an external table (and directories) for the external products information.
2. Perform an Upsert using the SQL MERGE command.
3. Show the execution plan of the MERGE command.
4. Perform an Upsert using two separate SQL commands (without MERGE
functionality).

Back to Topic List


1. Creating an External Table (and Directories) for the External Products Information
Before you can create the external table, you need to create a directory object in the
database that points to the directory on the file system where the data files will reside.
Optionally, you can separate the location of the log, bad, and discard files from
the location of the data files, as shown in the following example.
1. To create the directories, perform the following. If you have already completed the High-Speed Data
Load and Rolling Window Operations tutorial, the directories already exist and you can skip this
step.
The scripts are set up for a Windows system and assume that the hands-on material was
extracted to the c:\wkdir folder.
From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\create_directory.sql
DROP DIRECTORY data_dir;
DROP DIRECTORY log_dir;
Rem *****
Rem CREATE DIRECTORIES
Rem note that security is controlled
Rem (a) with privileges on directory (read/write)
Rem (b) with privileges on external table
Rem *****
CREATE DIRECTORY data_dir AS 'c:\wkdir';
CREATE DIRECTORY log_dir AS 'c:\wkdir';

2. When creating an external table, you are defining two parts of information:
1. The metadata information for the table representation inside the database
2. The access parameter definition of how to extract the data from the external file
After this metadata is created, the external data can be accessed from within the
database without the need for an initial load. To create the external table, from a SQL*Plus
session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\create_external_table2.sql
Rem *****
Rem preparation for upsert
Rem file prodDelta.dat (1.2 MB) must reside in the data directory
Rem reuse of directory DATA_DIR
Rem access via external table
Rem *****
DROP TABLE products_delta;
CREATE TABLE products_delta
(
PROD_ID NUMBER(6),
PROD_NAME VARCHAR2(50),
PROD_DESC VARCHAR2(4000),
PROD_SUBCATEGORY VARCHAR2(50),
PROD_SUBCATEGORY_ID NUMBER,
PROD_SUBCATEGORY_DESC VARCHAR2(2000),
PROD_CATEGORY VARCHAR2(50),
PROD_CATEGORY_ID NUMBER,
PROD_CATEGORY_DESC VARCHAR2(2000),
PROD_WEIGHT_CLASS NUMBER(2),
PROD_UNIT_OF_MEASURE VARCHAR2(20),
PROD_PACK_SIZE VARCHAR2(30),
SUPPLIER_ID NUMBER(6),
PROD_STATUS VARCHAR2(20),
PROD_LIST_PRICE NUMBER(8,2),
PROD_MIN_PRICE NUMBER(8,2),
PROD_TOTAL VARCHAR2(13),
PROD_TOTAL_ID NUMBER,
PROD_SRC_ID NUMBER,
PROD_EFF_FROM DATE,
PROD_EFF_TO DATE,
PROD_VALID CHAR(1)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
NOBADFILE
NOLOGFILE
FIELDS TERMINATED BY "|" LDRTRIM
)
location
('prodDelta.dat')
)
REJECT LIMIT UNLIMITED NOPARALLEL;
Back to Topic
2. Performing an Upsert Using the SQL MERGE Command
To perform an upsert using the SQL MERGE command, execute the following step:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\do_merge_new.sql
Rem *****
Rem MERGE
Rem new and modified product information has arrived
Rem use of external table again
Rem - seamlessly within the MERGE command !!!
Rem *****
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT (prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
s.prod_subcategory_id,
s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
s.prod_total_id,
s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
ROLLBACK;

Note: You will ROLLBACK so that you can reissue the same upsert with two SQL statements in
a subsequent operation.

Back to Topic
3. Showing the Execution plan of the MERGE Command
To examine the execution plan of the SQL MERGE statement, perform the following
steps:
1. From a SQL*Plus session logged on to the SH schema, run the explain_merge_new.sql
script, or copy the following SQL statement into your SQL*Plus session:
@c:\wkdir\explain_merge_new.sql
DELETE FROM plan_table;
COMMIT;
EXPLAIN PLAN FOR
MERGE INTO products t
USING products_delta s
ON ( t.prod_id=s.prod_id )
WHEN MATCHED THEN
UPDATE SET t.prod_name=s.prod_name, t.prod_list_price=s.prod_list_price,
t.prod_min_price=s.prod_min_price
WHEN NOT MATCHED THEN
INSERT (prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
VALUES (s.prod_id, s.prod_name, s.prod_desc, s.prod_subcategory,
s.prod_subcategory_id,
s.prod_subcategory_desc, s.prod_category, s.prod_category_id,
s.prod_category_desc,
s.prod_status, s.prod_list_price, s.prod_min_price, s.prod_total,
s.prod_total_id,
s.prod_weight_class, s.prod_pack_size, s.supplier_id)
;
set linesize 140
select * from table(dbms_xplan.display);

Back to Topic
4. Performing an Upsert Using Two Separate SQL Commands
Without the MERGE statement, such upsert functionality could be implemented as a
combination of two SQL statements (insert plus update) or in a procedural
manner. A possible upsert implementation of this business problem as two SQL
commands is shown below.
To demonstrate how this was done, perform the following steps:
1. To leverage an updatable join view, a primary or unique key must exist on the join column.
Therefore, you cannot use the external table directly. As a result, you need to create an
intermediate table in the database with the appropriate unique key constraint. Run the following
script to create the necessary structures and perform the upsert:
@c:\wkdir\do_merge_816.sql
CREATE TABLE prod_delta NOLOGGING AS SELECT * FROM products_delta;
CREATE UNIQUE INDEX p_uk ON prod_delta(prod_id);
UPDATE
(SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc,
        s.prod_subcategory sprod_subcategory,
        s.prod_subcategory_desc sprod_subcategory_desc,
        s.prod_category sprod_category,
        s.prod_category_desc sprod_category_desc,
        s.prod_status sprod_status,
        s.prod_list_price sprod_list_price,
        s.prod_min_price sprod_min_price,
        t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc,
        t.prod_subcategory tprod_subcategory,
        t.prod_subcategory_desc tprod_subcategory_desc,
        t.prod_category tprod_category,
        t.prod_category_desc tprod_category_desc,
        t.prod_status tprod_status,
        t.prod_list_price tprod_list_price,
        t.prod_min_price tprod_min_price
 FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
SET
tprod_list_price = sprod_list_price,
tprod_min_price  = sprod_min_price;
INSERT INTO products
(prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total,
prod_total_id,
prod_weight_class, prod_pack_size, supplier_id
FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE
t.prod_id =
s.prod_id);
ROLLBACK;
Note that you have to issue two separate SQL statements to accomplish the same functionality
that a single MERGE statement provides.
2. From a SQL*Plus session logged on to the SH schema, view the execution plans by running the
view_explain_merge.sql script, or copy the following SQL statement into your
SQL*Plus session:
@c:\wkdir\view_explain_merge.sql
PROMPT show the plans
DELETE FROM plan_table;
COMMIT;

EXPLAIN PLAN FOR


UPDATE
(SELECT s.prod_id, s.prod_name sprod_name, s.prod_desc sprod_desc,
        s.prod_subcategory sprod_subcategory,
        s.prod_subcategory_desc sprod_subcategory_desc,
        s.prod_category sprod_category,
        s.prod_category_desc sprod_category_desc,
        s.prod_status sprod_status,
        s.prod_list_price sprod_list_price,
        s.prod_min_price sprod_min_price,
        t.prod_id, t.prod_name tprod_name, t.prod_desc tprod_desc,
        t.prod_subcategory tprod_subcategory,
        t.prod_subcategory_desc tprod_subcategory_desc,
        t.prod_category tprod_category,
        t.prod_category_desc tprod_category_desc,
        t.prod_status tprod_status,
        t.prod_list_price tprod_list_price,
        t.prod_min_price tprod_min_price
 FROM products t, prod_delta s WHERE s.prod_id=t.prod_id) JV
SET
tprod_list_price = sprod_list_price,
tprod_min_price  = sprod_min_price;
set linesize 180
select * from table(dbms_xplan.display);
DELETE FROM plan_table;

COMMIT;

EXPLAIN PLAN FOR


INSERT INTO products
(prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id)
SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_id,
prod_subcategory_desc, prod_category, prod_category_id,
prod_category_desc,
prod_status, prod_list_price, prod_min_price, prod_total, prod_total_id,
prod_weight_class, prod_pack_size, supplier_id
FROM prod_delta s WHERE NOT EXISTS (SELECT 1 FROM products t WHERE
t.prod_id = s.prod_id);
set linesize 180
select * from table(dbms_xplan.display);
ROLLBACK;

...
To leverage the updatable join view functionality, the external table needs to be copied into a real
database table and a unique index on the join column needs to be created. In short, this requires
more operations, more space, and more processing time.
Back to Topic
Back to Topic List

Learn DML Error Logging Capabilities


Oracle Database 10g release 2 introduces a new and exciting functionality to enable
record-level error handling for bulk SQL operations. The DML error logging
functionality extends existing DML operations by enabling you to specify the name of
an error logging table into which Oracle should record errors encountered during DML
operations. This enables you to complete the DML operation in spite of any errors, and
to take corrective action on the erroneous rows at a later time.
A DML error logging table consists of several mandatory control columns and a set
of user-defined columns that represent either all or a subset of the columns of the target
table of the DML operation, using data types that are capable of storing the values that
caused errors for the target columns. For example, you need a VARCHAR2 column in the
error logging table to store values that fail numeric conversion for a NUMBER column in
the target table. You should use the DBMS_ERRLOG package to create DML
error logging tables.
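For orientation, a minimal sketch of the package call, equivalent to the positional call used in a
later step of this tutorial; the named parameters follow the documented DBMS_ERRLOG interface,
and the control column names listed in the comments are the standard ones (the tutorial queries
ORA_ERR_NUMBER$ and ORA_ERR_MESG$ later on):

BEGIN
  -- creates ERR$_SALES_OVERALL with the mandatory control columns
  -- ORA_ERR_NUMBER$, ORA_ERR_MESG$, ORA_ERR_ROWID$, ORA_ERR_OPTYP$, ORA_ERR_TAG$
  -- plus one VARCHAR2 column for each column of the target table
  DBMS_ERRLOG.CREATE_ERROR_LOG(dml_table_name     => 'SALES_OVERALL',
                               err_log_table_name => 'ERR$_SALES_OVERALL');
END;
/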
1. Create an external table representing data from a source system with obviously poor data
quality. Run the script cr_ext_tab_for_elt.sql.
@c:\wkdir\cr_ext_tab_for_elt.sql
PROMPT Create an external table pointing to a data set with poor quality
Rem
DROP TABLE sales_activity_direct;

CREATE TABLE sales_activity_direct


(
PROD_ID NUMBER,
CUST_ID NUMBER,
TIME_ID CHAR(20),
CHANNEL_ID CHAR(2),
PROMO_ID NUMBER,
QUANTITY_SOLD NUMBER(3),
AMOUNT_SOLD CHAR(50)
)
ORGANIZATION external
(
TYPE oracle_loader
DEFAULT DIRECTORY data_dir
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
BADFILE log_dir:'sh_sales2.bad'
LOGFILE log_dir:'sh_sales2.log_xt'
FIELDS TERMINATED BY "|" LDRTRIM
)
location
(
'sales_activity_direct.dat'
)
)REJECT LIMIT UNLIMITED NOPARALLEL;
2. Create a dummy table to avoid interfering with existing tables. Run the script
cr_tab_for_elt.sql.
@c:\wkdir\cr_tab_for_elt.sql
PROMPT create a second sales table to be used by the insert
DROP TABLE sales_overall;
CREATE TABLE sales_overall as select * from sales where 1=0;
3. To track errors with the new DML error logging functionality, you need an error logging table. It
is highly
recommended that you use the provided package DBMS_ERRLOG. Run the script
cr_elt.sql.
@c:\wkdir\cr_elt.sql
PROMPT Create the DML error logging table with DEFAULT name
PROMPT It is highly advised to use the ORCL-provided pkg to create the
PROMPT DML Error Logging table
Rem
DROP TABLE err$_sales_overall;
exec dbms_errlog.create_error_log('sales_overall');

Note the mandatory control columns and the different data types for the columns being
tracked. The data types have to be a superset of the target data types to enable proper
tracking of errors, for example, a non-numeric value destined for a NUMBER target column.

SET LINESIZE 60
DESCRIBE err$_sales_overall
4. Try to load the data residing in the external file into the target table. The default behavior of the
error logging functionality is a REJECT LIMIT of zero: in case of an error, the DML operation
fails and the record that raised the first error is stored in the error logging table. Run the script
ins_elt_1.sql to see this behavior.
@c:\wkdir\ins_elt_1.sql
PROMPT First insert attempt , DEFAULT reject limit 0
PROMPT also, the error message that comes back is the one of error #1
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_test1' );
commit;

PROMPT As you can see, nothing is inserted in the target table, but ONE row in the error logging table
select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;
From a DML perspective, you want a transaction either to succeed or to fail as a whole.
Hence, from a generic perspective, you either want the DML operation to succeed no
matter what (which translates into REJECT LIMIT UNLIMITED) or you want it to fail as soon
as an error occurs (REJECT LIMIT 0). The default reject limit was set to zero because any
other arbitrary number is somewhat meaningless; if you decide to tolerate a specific number
of errors, it is purely a business decision how many errors are tolerable in a specific situation.

5. Try the insert again with a REJECT LIMIT of 10 records. If more than 10 errors occur, the
DML operation will fail, and you will find 11 records in the error logging table. Run the script
ins_elt_2.sql to see this behavior.
@c:\wkdir\ins_elt_2.sql
SET ECHO OFF

PROMPT Second insert attempt, REJECT LIMIT 10


PROMPT Note that the error message that comes back is the one of error #11

INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall


SELECT * FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_test2' ) REJECT LIMIT 10;
commit;

PROMPT no rows in target; error count+1 in DML error logging table


select count(*) from sales_overall;
select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;
There are more than 10 errors for this insert, meaning the quality of this data is poor.

6. Put the data into the table and figure out what errors you are encountering. Run the script
ins_elt_3.sql.
@c:\wkdir\ins_elt_3.sql
PROMPT Reject limit unlimited will succeed
Rem ... as long as you do not run into one of the current limitations ...
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT
UNLIMITED;
commit;

PROMPT finally ...


select count(*) from sales_overall;
select count(*) from err$_sales_overall;
7. There are quite a large number of errors! Look into the error logging table to get a better
understanding of the errors that have occurred. Run the script sel_elt_1.sql.
@c:\wkdir\sel_elt_1.sql
PROMPT Please recognize the subtle difference between ERROR MESSAGE ONLY and -
ERROR MESSAGE TEXT

Rem Therefore we enforce to store both


set linesize 80

select distinct ora_err_number$ from err$_sales_overall;


select distinct ora_err_number$, ora_err_mesg$ from err$_sales_overall;
Please note the subtle difference between the Oracle error number and the Oracle error message
text. In many cases, the message text provides additional information that helps in analyzing the
problem, which is why both are included as mandatory control columns of an error logging table.

8. As of today, there are some limitations around the error logging capability. All of them relate
to situations where index maintenance is done as a deferred optimization. These
limitations will be addressed in a future release of Oracle. Run the script elt_limit.sql.
@c:\wkdir\elt_limit.sql
PROMPT discuss a case with limitation
truncate table sales_overall;
alter table sales_overall add constraint pk_1 primary key (prod_id,
cust_id,time_id,
channel_id, promo_id);

PROMPT works fine as before, errors get re-routed


INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT
UNLIMITED;
commit;

select count(*) from sales_overall;


select count(*) from err$_sales_overall;
delete from err$_sales_overall;
commit;
PROMPT case when deferred constraint check (UNIQUE INDEX) leads to error
Rem unique index maintenance is a delayed operation that cannot be caught
Rem on a per record base as of today. Planned for a future release.
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' ) REJECT LIMIT
UNLIMITED;
commit;

select count(*) from sales_overall;


select count(*) from err$_sales_overall;
commit;
9. DML error logging takes care of any kind of error that happens during the DML operation itself.
However, any error that happens as part of the SQL statement in the execution plan
BEFORE the DML operation cannot be caught. Consider the following DML operation in the
script ins_elt_4.sql.
@c:\wkdir\ins_elt_4.sql
PROMPT INTERESTING CASE.
PROMPT The DML Error Logging can only catch errors that happen at DML TIME, but not errors
PROMPT that happen as part of the SQL statement
Rem the TO_DATE conversion is in the insert portion, so that we catch the error
Rem this is default view merging of ORCL
alter table sales_overall drop constraint pk_1;
truncate table sales_overall;
INSERT /*+ APPEND NOLOGGING PARALLEL */
INTO sales_overall
SELECT prod_id, cust_id,
TO_DATE(time_id,'DD-mon-yyyy'),
channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
REJECT LIMIT UNLIMITED;
commit;

Note that the SELECT statement applies a TO_DATE() function to the column
sales_activity_direct.time_id. Oracle internally optimizes the error handling by
pushing this conversion function up to the insert operation to ensure that any potential error will
be caught. As you experienced earlier, the DML operation succeeds.
When you look into the plan, you will realize what kind of optimization for the error handling is
taking place. Run the script xins_elt_4.sql.
@c:\wkdir\xins_elt_4.sql
SET LINESIZE 140

explain plan for


INSERT /*+ APPEND NOLOGGING PARALLEL */
INTO sales_overall
SELECT * FROM
( SELECT prod_id, cust_id,
TO_DATE(time_id,'DD-mon-yyyy'),
channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct )
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
REJECT LIMIT UNLIMITED;

select * from table(dbms_xplan.display(null,null,'ALL'));


The plan shows that the TO_DATE() function is pushed up to the insert operation, represented
as a 'LOAD AS SELECT' row source (it is a direct-path insertion).

10. Run the same DML operation again, but ensure that the TO_DATE() conversion cannot be
pushed up to the DML operation. You can accomplish this by using a view construct with a
NO_MERGE hint. The equivalent SQL SELECT statement itself fails, as you can see by running
the script sel_ins_elt_5.sql.
@c:\wkdir\sel_ins_elt_5.sql
PROMPT The equivalent SQL statement will fail if the predicate is
artificially kept -
within a lower query block

SELECT prod_id, cust_id,


TO_DATE(time_id,'DD-mon-yyyy'),
channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct;
The DML operation will fail with the same error. Run the script ins_elt_5.sql.
@c:\wkdir\ins_elt_5.sql
PROMPT The same is true when errors are happening on a row source level
inside the -
SELECT block
PROMPT We cannot catch such an error

truncate table sales_overall;


INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM
( SELECT /*+ NO_MERGE */ prod_id, cust_id,
TO_DATE(time_id,'DD-mon-yyyy'),
channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct )
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
REJECT LIMIT UNLIMITED;
commit;

The plan shows that the conversion takes place inside the view, which is why the error cannot be
caught. Run the script xins_elt_5.sql to view this behavior:
@c:\wkdir\xins_elt_5.sql
explain plan for
INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales_overall
SELECT * FROM
( SELECT /*+ NO_MERGE */ prod_id, cust_id,
TO_DATE(time_id,'DD-mon-yyyy'),
channel_id, promo_id, quantity_sold, amount_sold
FROM sales_activity_direct )
LOG ERRORS INTO err$_sales_overall ( 'load_20040802' )
REJECT LIMIT UNLIMITED;

select * from table(dbms_xplan.display(null,null,'ALL'));

Back to Topic
Back to Topic List
Experience the Basics of the Table Function
In the ETL process, the data extracted from a source system passes through a sequence
of transformations before it is loaded into a data warehouse. Complex transformations
are implemented in a procedural manner, either outside the database or inside the
database in PL/SQL. When the results of a transformation become too large to fit into
memory, they must be staged by materializing them either into database tables or flat
files. This data is then read and processed as input to the next transformation in the
sequence. Oracle table functions provide support for pipelined and parallel execution of
such transformations implemented in PL/SQL, C, or Java.
A table function is defined as a function that can produce a set of rows as output;
additionally, Oracle table functions can take a set of rows as input. These sets of rows
are processed iteratively in subsets, thus enabling a pipelined mechanism to stream
these subset results from one transformation to the next before the first operation has
finished. Furthermore, table functions can be processed transparently in parallel, just as
SQL operations such as a table scan or a sort can be. Staging tables are no
longer necessary.
Because table functions can be used in powerful and multifaceted ways as part of complex
transformations, this tutorial examines only basic implementation issues for table functions. To
do this, perform the following activities:
1. Set up the basic objects for a table function.
2. Perform a nonpipelined table function, returning an array of
records.
3. Perform a pipelined table function.
4. Perform transparent parallel execution of table functions.
5. Perform a table function with autonomous DML.
6. Perform seamless streaming through several table functions.

1. Setting up the Basic Objects for a Table Function


A table function can produce a set of rows as output and can take a set of rows as input.
Therefore, you must define a record type and a collection of records to describe the
input and output; those types can be either strongly typed, that is, well defined prior to
execution, or weakly typed, that is, without a fixed record format before run time. You
will examine both variants.
To set up the table function objects, perform the following steps:
1. Define the object (record) type for the examples. From a SQL*Plus session logged on to the SH
schema, execute the following SQL script:
@c:\wkdir\setup_tf_record.sql
SET ECHO ON
PROMPT object_types
CREATE TYPE product_t AS OBJECT (
prod_id NUMBER(6)
, prod_name VARCHAR2(50)
, prod_desc VARCHAR2(4000)
, prod_subcategory VARCHAR2(50)
, prod_subcategory_desc VARCHAR2(2000)
, prod_category VARCHAR2(50)
, prod_category_desc VARCHAR2(2000)
, prod_weight_class NUMBER(2)
, prod_unit_of_measure VARCHAR2(20)
, prod_pack_size VARCHAR2(30)
, supplier_id NUMBER(6)
, prod_status VARCHAR2(20)
, prod_list_price NUMBER(8,2)
, prod_min_price NUMBER(8,2)
);
This is the basic object record type you will use in the examples.

2. Define the object (collection) type for the examples. From a SQL*Plus session logged on to the
SH schema, execute the following SQL script:
@c:\wkdir\setup_tf_collection.sql
CREATE TYPE product_t_table AS TABLE OF product_t;
/

This type represents the structure of the set of records that will be returned by your
table functions.
3. Define a package for the REF CURSOR types. From a SQL*Plus session logged on to the SH
schema, execute the following SQL script:
@c:\wkdir\setup_tf_package.sql
Rem package of all cursor types
Rem handle the input cursor type and the output cursor collection type

CREATE OR REPLACE PACKAGE cursor_PKG as


TYPE product_t_rec IS RECORD (prod_id NUMBER(6)
, prod_name VARCHAR2(50)
, prod_desc VARCHAR2(4000)
, prod_subcategory VARCHAR2(50)
, prod_subcategory_desc VARCHAR2(2000)
, prod_category VARCHAR2(50)
, prod_category_desc VARCHAR2(2000)
, prod_weight_class NUMBER(2)
, prod_unit_of_measure VARCHAR2(20)
, prod_pack_size VARCHAR2(30)
, supplier_id NUMBER(6)
, prod_status VARCHAR2(20)
, prod_list_price NUMBER(8,2)
, prod_min_price NUMBER(8,2));

TYPE product_t_rectab IS TABLE OF product_t_rec;


TYPE strong_refcur_t IS REF CURSOR RETURN product_t_rec;
TYPE refcur_t IS REF CURSOR;
END;

4. Create a log table for the table function example with autonomous DML. From a SQL*Plus
session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\cre_ope_table.sql
CREATE TABLE obsolete_products_errors
(prod_id NUMBER, msg VARCHAR2(2000));
This table will be used for the table function example that fans out some data as part of its
execution.
Only the PL/SQL implementation of table functions is covered here. For details of the
interface implementations (C or Java), see the Oracle Data Cartridge Developer's Guide.
Table functions return a set of records to the subsequent operation. The result set can be
either delivered en bloc (the default PL/SQL behavior) or pipelined, that is, increments of
the result set are streamed to the subsequent operation as soon as they are produced.
For table function implementations in PL/SQL, the PL/SQL engine controls the size of
the incrementally returned array. For interface implementations of table functions, in C
or Java, it is up to the programmer to define the incremental return set.
Back to Topic
2. Performing Nonpipelined Table Function, Returning an Array of Records
The following example represents a simple table function, returning its result set
nonpipelined as a complete array. The table function filters out all records of the product
category "Electronics" and returns all other records. Note that the input REF CURSOR
is weakly typed, meaning it could be any valid SQL statement. It is up to the table
function code to handle its input, probably based on other input variables.
To perform the nonpipelined table function, you perform the following steps:
1. From a SQL*Plus session logged on to the SH schema, run create_tf_1.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\create_tf_1.sql
PROMPT SIMPLE PASS-THROUGH, FILTERING OUT obsolete products without
product_category 'Electronics'
Rem uses weakly typed cursor as input
CREATE OR REPLACE FUNCTION obsolete_products(cur cursor_pkg.refcur_t) RETURN
product_t_table
IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
objset product_t_table := product_t_table();
i NUMBER := 0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
-- Category Electronics is not meant to be obsolete and will be suppressed
IF prod_status='obsolete' AND prod_category != 'Electronics' THEN
-- append to collection
i:=i+1;
objset.extend;
objset(i):=product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price);
END IF;
END LOOP;
CLOSE cur;
RETURN objset;
END;
/
2. You can now transparently SELECT from the defined table function obsolete_products.
From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_1.sql
SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));

Note that the input SELECT list for the cursor variable must match what the table function
fetches, in this case the record structure behind cursor_pkg.refcur_t. If you can ensure that the
table definition (in this case PRODUCTS) matches this input cursor, you can simplify the statement
by passing CURSOR(SELECT * FROM products) instead of listing every column.
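As a hedged sketch of that simplification (the view name is hypothetical and not part of the
tutorial scripts): if you create a view whose column list exactly matches the fetch list of the
function, the CURSOR expression can be shortened to SELECT *:

CREATE OR REPLACE VIEW products_tf_input AS
SELECT prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
       prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
       prod_pack_size, supplier_id, prod_status, prod_list_price, prod_min_price
FROM products;

SELECT DISTINCT UPPER(prod_category), prod_status
FROM TABLE(obsolete_products(CURSOR(SELECT * FROM products_tf_input)));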
Back to Topic
3. Performing Pipelined Table Functions
The following example performs the same filtering as the previous example.
Unlike the first one, this table function returns its result set pipelined, which means
incrementally. Note that the input REF CURSOR is strongly typed here, meaning it
accepts only a SQL statement that returns records matching the record type behind
cursor_pkg.strong_refcur_t.
To perform the pipelined table function, perform the following steps:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\create_tf_2.sql
Rem same example, pipelined implementation
Rem strong ref cursor (input type is defined)
Rem a table without a strong typed input ref cursor
Rem cannot be parallelized
CREATE OR REPLACE FUNCTION obsolete_products_pipe(cur
cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' AND prod_category !='Electronics' THEN
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/
2. You can now transparently SELECT from the defined table function obsolete_products_pipe.
From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_2.sql
SELECT DISTINCT prod_category, DECODE(prod_status,'obsolete','NO LONGER
AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));
Back to Topic
4. Performing Transparent Parallel Execution of Table Functions
Table functions can be parallelized transparently without any special intervention or
setup. The only necessary declaration, which has to be done as part of the table function
creation, is to tell the SQL engine if there are any restrictions or rules for parallelizing
its execution.
Think of a table function that processes a complex aggregation operation over some
product category attributes. If you want to process this operation in parallel, you have to
guarantee that all records with the same product category attributes are processed from
the same parallel execution slave, so that your aggregation operation covers all records
that belong to the same grouping. This implies that a distribution rule must be used
during parallel processing, one that guarantees the correct distribution of rows as
described above.
This parallel distribution rule is defined as part of the table function header, for
example:
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
All you need to do to enable parallel execution of a table function is to use the syntax
shown.
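For contrast, a hedged sketch (the function name is hypothetical, and this is not one of the
tutorial scripts) of a pipelined pass-through whose header requests a hash distribution on
prod_category, so that all rows of one category are processed by the same parallel execution
slave:

CREATE OR REPLACE FUNCTION category_partitioned_pipe(cur cursor_pkg.strong_refcur_t)
RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY HASH (prod_category)) IS
rec cursor_pkg.product_t_rec;
BEGIN
LOOP
-- fetch one input record from the strongly typed REF CURSOR
FETCH cur INTO rec;
EXIT WHEN cur%NOTFOUND;
-- all rows with the same prod_category arrive at the same slave,
-- so per-category processing could safely be done here before piping the row on
PIPE ROW (product_t(rec.prod_id, rec.prod_name, rec.prod_desc, rec.prod_subcategory,
rec.prod_subcategory_desc, rec.prod_category, rec.prod_category_desc,
rec.prod_weight_class, rec.prod_unit_of_measure, rec.prod_pack_size,
rec.supplier_id, rec.prod_status, rec.prod_list_price, rec.prod_min_price));
END LOOP;
CLOSE cur;
RETURN;
END;
/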
To perform one of the table functions you previously defined in parallel, perform the
following steps:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\pq_session.sql
SELECT *
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');

2. Now you can use one of the table functions in parallel. You enforce the parallel execution with a
PARALLEL hint on the REF CURSOR. From a SQL*Plus session logged on to the SH schema,
execute the following SQL script:
@c:\wkdir\parallel_tf.sql
SELECT DISTINCT prod_category,
DECODE(prod_status,'obsolete','NO LONGER AVAILABLE','N/A')
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT /*+ PARALLEL(a,4)*/
prod_id, prod_name, prod_desc, prod_subcategory, prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class, prod_unit_of_measure,
prod_pack_size, supplier_id,prod_status, prod_list_price,prod_min_price
FROM products a)));
3. Rerun the SQL script to check the status of the parallelized queries:
@c:\wkdir\pq_session.sql
SELECT *
FROM v$pq_sesstat
WHERE statistic in ('Queries Parallelized','Allocation Height');

Back to Topic
5. Performing Table Functions with Autonomous DML
Although a table function is part of a single atomic transaction, it provides the
additional capability to fan out data to other tables within the scope of an autonomous
transaction. This can be used in a variety of ways, such as for exception or progress
logging or to fan out subsets of data to be used by other independent transformations.
The following example logs some "exception" information into the
OBSOLETE_PRODUCTS_ERRORS table:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\create_tf_3.sql
CREATE OR REPLACE FUNCTION obsolete_products_dml(cur
cursor_pkg.strong_refcur_t,
prod_cat varchar2 DEFAULT 'Electronics') RETURN product_t_table
PIPELINED
PARALLEL_ENABLE (PARTITION cur BY ANY) IS
PRAGMA AUTONOMOUS_TRANSACTION;
prod_id NUMBER(6);
prod_name VARCHAR2(50);
prod_desc VARCHAR2(4000);
prod_subcategory VARCHAR2(50);
prod_subcategory_desc VARCHAR2(2000);
prod_category VARCHAR2(50);
prod_category_desc VARCHAR2(2000);
prod_weight_class NUMBER(2);
prod_unit_of_measure VARCHAR2(20);
prod_pack_size VARCHAR2(30);
supplier_id NUMBER(6);
prod_status VARCHAR2(20);
prod_list_price NUMBER(8,2);
prod_min_price NUMBER(8,2);
sales NUMBER:=0;
BEGIN
LOOP
-- Fetch from cursor variable
FETCH cur INTO prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price;
EXIT WHEN cur%NOTFOUND; -- exit when last row is fetched
IF prod_status='obsolete' THEN
IF prod_category=prod_cat THEN
INSERT INTO obsolete_products_errors VALUES
(prod_id, 'correction: category '||UPPER(prod_cat)||' still available');
COMMIT;
ELSE
PIPE ROW (product_t( prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price));
END IF;
END IF;
END LOOP;
CLOSE cur;
RETURN;
END;
/
2. Now you can select from this table function and check the log table. From a SQL*Plus session
logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_3a.sql
TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products)));
SELECT DISTINCT msg FROM obsolete_products_errors;
3. Select from this table function again with a different input argument and check the log table.
@c:\wkdir\use_tf_3b.sql
TRUNCATE TABLE obsolete_products_errors;
SELECT DISTINCT prod_category, prod_status FROM TABLE(obsolete_products_dml(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products),'Photo'));
SELECT DISTINCT msg FROM obsolete_products_errors;
You see that, depending on the input variable, the table function processes the input rows
differently, thus delivering a different result set and inserting different information into
obsolete_products_errors.
Back to Topic
6. Performing Seamless Streaming Through Several Table Functions
Besides the transparent usage of table functions within SQL statements and their
capability of being processed in parallel, one of their biggest advantages is that table
functions can call each other. Furthermore, table functions can be used
in any SQL statement and can become the input for all kinds of DML statements.
To see how this is done, perform the following steps:
1. This SQL statement uses two table functions, nested in each other. From a SQL*Plus session
logged on to the SH schema, execute the following SQL script:
@c:\wkdir\use_tf_stream.sql
SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));
SELECT COUNT(*) FROM obsolete_products_errors;
SELECT DISTINCT msg FROM obsolete_products_errors;
set echo off

2. Now you can use this table function as INPUT for a CREATE TABLE AS SELECT
command. From a SQL*Plus session logged on to the SH schema, execute the following SQL
script:
@c:\wkdir\CTAS_tf.sql
CREATE TABLE PIPE_THROUGH AS
SELECT DISTINCT prod_category, prod_status
FROM TABLE(obsolete_products_dml(CURSOR(SELECT *
FROM TABLE(obsolete_products_pipe(
CURSOR(SELECT prod_id, prod_name, prod_desc, prod_subcategory,
prod_subcategory_desc,
prod_category, prod_category_desc, prod_weight_class,
prod_unit_of_measure,
prod_pack_size, supplier_id, prod_status, prod_list_price,
prod_min_price
FROM products))))));

3. Before you move to the next section of this tutorial, you need to clean up your environment by
executing the following script:
@c:\wkdir\cleanup_tf.sql
DROP TABLE obsolete_products_errors;
DROP TABLE pipe_through;
DROP FUNCTION obsolete_products;
DROP FUNCTION obsolete_products_pipe;
DROP FUNCTION obsolete_products_dml;
Back to Topic
Back to Topic List
Using Synchronous Change Data Capture (CDC) to Capture and
Consume Incremental Source Changes
Data warehousing involves the extraction and transportation of relational data from one
or more source databases into the data warehouse for analysis. Change Data Capture
quickly identifies and processes only the data that has changed, not entire tables, and
makes the change data available for further use.
Without Change Data Capture, database extraction is a cumbersome process in which
you move the entire contents of tables into flat files, and then load the files into the data
warehouse. This ad hoc approach is expensive in a number of ways.
Change Data Capture does not depend on intermediate flat files to stage the data outside
of the relational database. It captures the change data resulting from INSERT,
UPDATE, and DELETE operations made to user tables. The change data is then stored in
a database object called a change table, and the change data is made available to
applications in a controlled way.
Publish and Subscribe Model
Most Change Data Capture systems have one publisher that captures and publishes
change data for any number of Oracle source tables. There can be multiple subscribers
accessing the change data. Change Data Capture provides PL/SQL packages to
accomplish the publish and subscribe tasks.
Publisher
The publisher is usually a database administrator (DBA) who is in charge of creating
and maintaining schema objects that make up the Change Data Capture system. The
publisher performs the following tasks:

Determines the relational tables (called source tables) from which the data
warehouse application is interested in capturing change data.
Uses the Oracle supplied package, DBMS_LOGMNR_CDC_PUBLISH, to set up the
system to capture data from one or more source tables.
Publishes the change data in the form of change tables.

Allows controlled access to subscribers by using the SQL GRANT and REVOKE
statements to grant and revoke the SELECT privilege on change tables for users and
roles.
Subscriber
The subscribers, usually applications, are consumers of the published change data.
Subscribers subscribe to one or more sets of columns in source tables. Subscribers
perform the following tasks:

Use the Oracle supplied package, DBMS_LOGMNR_CDC_SUBSCRIBE, to subscribe
to source tables for controlled access to the published change data for analysis.
Extend the subscription window and create a new subscriber view when the
subscriber is ready to receive a set of change data.
Use SELECT statements to retrieve change data from the subscriber views.

Drop the subscriber view and purge the subscription window when finished
processing a block of changes.
Drop the subscription when the subscriber no longer needs its change data.

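A hedged sketch of the subscriber calls behind these tasks (the procedure names are from the
documented DBMS_CDC_SUBSCRIBE package; the subscription name and subscriber view name are
the ones this tutorial creates in later steps):

BEGIN
DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name => 'my_subscription_no_1');
DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name => 'my_subscription_no_1');
END;
/

Rem consume the change data made visible in the current window
SELECT prod_id, prod_list_price, prod_min_price
FROM my_prod_price_change_view;

BEGIN
DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name => 'my_subscription_no_1');
END;
/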
Please consult the ‘Oracle Data Warehousing Guide’ for a more detailed discussion
about the Change Data Capture Architecture.
To learn more about this topic, perform the following steps:
1. Use synchronous CDC to track all the incremental changes.
2. Create a change table.
3. Subscribe to a change set and to all source table columns of interest.
4. Activate a subscription and extend the subscription window.
5. Investigate how to handle the new environment over time.
6. Run the publisher.
7. Drop the used change view and purge the subscription window.
8. Clean up the CDC environment.
Back to Topic
Back to Topic List
1. Using Synchronous CDC to Track all the Incremental Changes
You use synchronous CDC to track all the incremental changes for the table PRODUCTS.
(For demonstration purposes, the publisher and the subscriber will be the same database
user.) To see how this is done, perform the following steps:
1. First create a new intermediate table where you apply all changes. From a SQL*Plus session
logged on to the SH schema, run cr_cdc_target.sql, or copy the following SQL
statements into your SQL*Plus session:
@c:\wkdir\cr_cdc_target.sql
CREATE TABLE my_price_change_Electronics
(prod_id number, prod_min_price number, prod_list_price number, when date);

2. If you have never used Change Data Capture before, you won't see any objects related to CDC other
than the synchronous change set SYNC_SET and the synchronous change source
SYNC_SOURCE. Both of these objects are created as part of the database creation.
@c:\wkdir\show_cdc_env1.sql
SELECT * FROM change_sources;
PROMPT see the change tables
Rem shouldn't show anything
SELECT * FROM change_tables;
PROMPT see the change sets SYNC_SET
SELECT decode(to_char(end_date,'dd-mon-yyyy hh24:mi:ss'),null,'No end date
set.')
end_date,
decode(to_char(freshness_date,'dd-mon-yyyy hh24:mi:ss'),null,'No freshness
date set.') freshness_date
FROM change_sets WHERE set_name='SYNC_SET';
Back to Topic
2. Creating a Change Table
All changes are stored in change tables. A change table has a one-to-one relationship
with its source table; it consists of several fixed metadata columns and a dynamic set of
columns equivalent to the identified source columns of interest.
1. You create a change table by using the provided package DBMS_CDC_PUBLISH.
Note that the old package name DBMS_LOGMNR_CDC_PUBLISH is a synonym pointing to
DBMS_CDC_PUBLISH and is maintained as-is for backward compatibility.
@c:\wkdir\cr_cdc_ct.sql
rem create a change table within the change set SYNC_SET.
Rem details on parameters see doc
PROMPT *** 10g ***
PROMPT NOTE THAT DBMS_LOGMNR_CDC_* are synonyms for the DBMS_CDC_* packages
and are only around
PROMPT for backwards compatibility
begin
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE (OWNER => 'SH',
CHANGE_TABLE_NAME => 'PROD_price_CT',
CHANGE_SET_NAME => 'SYNC_SET',
SOURCE_SCHEMA => 'SH',
SOURCE_TABLE => 'PRODUCTS',
COLUMN_TYPE_LIST => 'prod_id number(6), prod_min_price number(8,2), prod_list_price number(8,2)',
CAPTURE_VALUES => 'both',
RS_ID => 'y',
ROW_ID => 'n',
USER_ID => 'n',
TIMESTAMP => 'n',
OBJECT_ID => 'n',
SOURCE_COLMAP => 'y',
TARGET_COLMAP => 'y',
OPTIONS_STRING => null);
end;
/

This script creates a change table called PROD_PRICE_CT and the necessary trigger
framework to track all subsequent changes on PRODUCTS. Note that the tracking is done as part
of the atomic DML operation against PRODUCTS.

2. Describe the change table PROD_PRICE_CT:


set linesize 120
desc prod_price_ct

See the static metadata columns as well as the columns PROD_ID, PROD_MIN_PRICE, and
PROD_LIST_PRICE (equivalent to source table PRODUCTS).

3. To see the metadata of the published source table and the change tables in the system, issue the
following SQL commands:
@c:\wkdir\show_cdc_env2.sql
prompt see the published source tables
SELECT * FROM dba_source_tables;
prompt see published source columns
SELECT source_schema_name,source_table_name, column_name
FROM dba_source_tab_columns;
prompt see the change tables
SELECT * FROM change_tables;

Back to Topic
3. Subscribing to a Change Set and to All Source Table Columns of Interest
You can subscribe to a change set for controlled access to the published change data for
analysis. Note that the logical entity for a subscription handle is a change set and not a
change table. A change set can consist of several change tables and guarantees logical
consistency for all of its change tables. After subscribing to a change set, you have to
subscribe to all source table columns of interest.
1. You must first create a named subscription that will be used throughout the session. You are
subscribing to the existing change set SYNC_SET. This functionality is provided by the package
DBMS_CDC_SUBSCRIBE.
@c:\wkdir\subs_cdc_ct.sql
variable subname varchar2(30)
begin
:subname := 'my_subscription_no_1';
DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION (
CHANGE_SET_NAME => 'SYNC_SET',
DESCRIPTION => 'Change data PRODUCTS for ELECTRONICS',
SUBSCRIPTION_name => :subname);
END;
/
PRINT subname

2. You can query the meta information about your subscription.


@c:\wkdir\show_cdc_env3.sql
SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status,
earliest_scn, latest_scn, decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
FROM
user_subscriptions;
SELECT * FROM user_subscribed_tables;
SELECT source_table_name, column_name
FROM dba_subscribed_columns;
3. After creating a subscription, you set up a subscriber view for this subscription. The handling of
subscriber views was enhanced with Oracle Database 10g: you can name the subscriber view
explicitly, and you no longer have to drop and re-create it to extend and/or purge the subscription
window.
@c:\wkdir\cr_cdc_rv.sql
rem now subscribe to a single source table and columns of interest.
rem A subscription can contain one or more tables from the same change set.
PROMPT ***10g***
Rem use new interface to name subscriber view explicitly !!
variable view_name varchar2(30);
BEGIN
:view_name := 'my_prod_price_change_view';
DBMS_CDC_SUBSCRIBE.SUBSCRIBE (
SUBSCRIPTION_NAME => :subname,
SOURCE_SCHEMA => 'sh',
SOURCE_TABLE => 'products',
COLUMN_LIST => 'prod_id, prod_min_price, prod_list_price',
subscriber_view => :view_name );
END;
/
PROMPT ***10g***
Rem The view is created automatically
desc MY_PROD_PRICE_CHANGE_VIEW
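A subscription is bound to a change set, not to a single change table, so additional source tables
published in the same change set could be added to this subscription with further SUBSCRIBE calls
before it is activated. The following sketch is purely illustrative and is not part of this tutorial's
scripts; it assumes that a change table for the source table SH.COSTS (tracking PROD_ID and
UNIT_COST) had also been published in SYNC_SET:
Rem hypothetical second SUBSCRIBE call against the same subscription
Rem (assumes a change table for SH.COSTS exists in SYNC_SET)
variable view_name2 varchar2(30);
BEGIN
:view_name2 := 'my_cost_change_view';
DBMS_CDC_SUBSCRIBE.SUBSCRIBE (
SUBSCRIPTION_NAME => :subname,
SOURCE_SCHEMA => 'sh',
SOURCE_TABLE => 'costs',
COLUMN_LIST => 'prod_id, unit_cost',
subscriber_view => :view_name2 );
END;
/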
So far you have accomplished the necessary steps to set up a CDC environment:
On the publisher side, you prepared a source system for CDC.
On the subscriber side, you identified and subscribed to all source tables (and
columns) of interest and defined/created your subscriber view.
Now you are ready to use CDC.
Back to Topic
4. Activating a Subscription and Extending the Subscription Window
The first step after setting up the publish and subscribe framework is to activate a
subscription and to extend the subscription window to see all changes that have taken
place since the last subscription window extension or since the activation of a
subscription.
1. Activate and extend a subscription window. You will also have a look into the metadata.
@c:\wkdir\sub_cdc1.sql
rem now activate the subscription since we are ready to receive change data
rem the ACTIVATE_SUBSCRIPTION procedure sets the subscription window to empty initially
rem At this point, no additional source tables can be added to the
rem subscription

EXEC DBMS_LOGMNR_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION -
(SUBSCRIPTION_name => 'my_subscription_no_1')

rem now recheck the subscriptions and see that it is active
rem the subscription window is still empty ...

SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status,
earliest_scn, latest_scn,
decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
from user_subscriptions;
2. Any changes you apply now to the source table PRODUCTS are reflected in the change table. The
changes are transparently maintained with triggers on the source table.
@c:\wkdir\dml_cdc1.sql
Rem now do some changes

UPDATE products
SET prod_list_price=prod_list_price*1.1
WHERE prod_min_price > 100;
COMMIT;
Rem you will see entries in the change table
Rem note that we have entries for the old and the new values

SELECT Count(*) FROM prod_price_ct;


Note that you have two records for each source row, one representing the old, and the other the
new value set for the subscribed columns.
Again, please never use the change table itself for identifying the changes on a source table. Use
the subscriber view that will be created for you. This is the only supported way to guarantee that
every change is delivered only once to a subscribing application.
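If you want to verify how the update is recorded, you can group the change table rows by the
OPERATION$ column, purely for inspection (consumption should always go through the subscriber
view). A minimal sketch; the values 'UO' and 'UN' mark the old and new images of an updated row:
Rem inspect how the update is recorded (for verification only)
SELECT operation$, COUNT(*)
FROM prod_price_ct
GROUP BY operation$;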
3. Unlike the change table, the subscriber view does not show any records yet. This is because you
applied the DML operation AFTER the (initially empty) subscription window was established at
activation time, and you have not yet extended the window to include those changes. Consequently,
you do not see anything yet.
Rem the view does not show anything
select count(*) from MY_PROD_PRICE_CHANGE_VIEW;
You are now preparing for consuming the changes by extending the subscription window.
@c:\wkdir\ext_cdc_sub1.sql
rem now set upper boundary (high-water mark) for the subscription window
rem At this point, the subscriber has created a new window that begins
rem where the previous window ends.
rem The new window contains any data that was added to the change table.
EXEC DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW -
(SUBSCRIPTION_NAME => 'my_subscription_no_1');

rem now recheck the subscriptions and see that it is active


SELECT handle, description,
decode(status,'N','Not activated.','Activated.') status,
earliest_scn, latest_scn,
decode(last_purged,null,'Never purged.',
to_char(last_purged,'dd-mon-yyyy hh24:mi:ss')) last_purged,
decode(last_extended, null,'Never extended.',
to_char(last_extended,'dd-mon-yyyy hh24:mi:ss')) last_extended
from user_subscriptions;
Rem ... and you will see data
SELECT count(*) FROM my_prod_price_change_view;

4. You can now select from this system-generated change view, and you will see only the changes
that have happened for your time window.
@c:\wkdir\sel_cdc_cv1.sql
Rem changes classified for specific product groups
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN';
PROMPT and especially the Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN'
AND p2.prod_category='Electronics';

In the example shown above, you are joining back to the source table PRODUCTS. When you set
up a CDC environment, try to ensure that such an operation is not necessary; if possible,
a change table should be usable as a stand-alone source for synchronizing your target environment.
Joining back to the source table may reduce the data stored on disk, but it creates additional
workload on the source system, and the join-back cannot be performed on any system other than
the source system.
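One way to avoid the join-back, sketched here only as an illustration and not part of this tutorial's
scripts, would be to include PROD_CATEGORY in the published column list so that the subscriber
view already carries the attribute you filter on (the VARCHAR2(50) length is assumed to match the
PRODUCTS definition):
Rem hypothetical alternative publication that would avoid the join-back
begin
DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE (OWNER => 'SH',
CHANGE_TABLE_NAME => 'PROD_price_cat_CT',
CHANGE_SET_NAME => 'SYNC_SET',
SOURCE_SCHEMA => 'SH',
SOURCE_TABLE => 'PRODUCTS',
COLUMN_TYPE_LIST => 'prod_id number(6), prod_category varchar2(50), prod_min_price number(8,2), prod_list_price number(8,2)',
CAPTURE_VALUES => 'both',
RS_ID => 'y',
ROW_ID => 'n',
USER_ID => 'n',
TIMESTAMP => 'n',
OBJECT_ID => 'n',
SOURCE_COLMAP => 'y',
TARGET_COLMAP => 'y',
OPTIONS_STRING => null);
end;
/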

5. You can now consume the changes on your target system.


@c:\wkdir\consume_cdc1.sql
Prompt ... you can now consume the changes
rem arbitrary example, where we only track ELECTRONICS changes. This shall demonstrate
rem the flexibility as well as the responsibility of the client site (consumer) to deal
rem appropriately with the changes
INSERT into my_price_change_electronics
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND p2.prod_category='Electronics' AND operation$='UN';

COMMIT;

SELECT prod_id, prod_min_price, prod_list_price,
to_char(when,'dd-mon-yyyy hh24:mi:ss')
FROM my_price_change_electronics;

Back to Topic
5. Investigating How to Handle the Environment over Time
So far you have performed your first consumption with the synchronous CDC framework.
Now investigate how to handle such an environment over time, as additional changes
happen and old changes are consumed by all subscribers.
1. Other DML operations take place on the source table PRODUCTS.
@c:\wkdir\dml_cdc2.sql
PROMPT other changes will happen
UPDATE products
SET prod_min_price=prod_min_price*1.1
WHERE prod_min_price < 10;
COMMIT;

2. The synchronous CDC framework tracks these DML operations transparently in the change
table.
@c:\wkdir\show_cdc_ct2.sql
SELECT count(*) FROM prod_price_ct ;
PROMPT and especially the ELECTRONICS' ones
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
3. Run the following script to view the product data.
@c:\wkdir\sel_cdc_cv1.sql
Rem changes classified for specific product groups
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN';

PROMPT and especially the Electronics' ones - 3 records only
SELECT p1.prod_id, p2.prod_category, p1.prod_min_price,
p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id
AND operation$='UN'
AND p2.prod_category='Electronics';
4. To tell the CDC framework that you are done with a set of changes, you have to purge and
extend the subscription.
@c:\wkdir\purge_cdc_sub_window1.sql
PROMPT purge old data from the subscription window
Rem this will NOT delete any records from the change table. It just tells the CDC system
Rem that the change data used so far is no longer needed

EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');
PROMPT still the same number of records in the change table
PROMPT REMEMBER THE NUMBER OF ROWS !!!
SELECT COUNT(*) FROM prod_price_ct;
PROMPT ... change view is empty
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

5. To get the new changes reflected in the change view, you have to extend the time window for the
change view.
@c:\wkdir\ext_cdc_sub_window1.sql
PROMPT let's get the new change
Rem 'do it again Sam'
Rem first extend the window you want to see
EXEC DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW -
(SUBSCRIPTION_name => 'my_subscription_no_1');

Rem ... now you will see exactly the new changed data since the last consumption
SELECT COUNT(*) FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';
The new change view shows exactly the changes that have not yet been consumed. This is different
from the content of the change table.

6. Now consume the new changes. Because the incremental changes you are interested in are stored
in a change table (a "normal" Oracle table), you can use any language or SQL construct the
database supports.
@c:\wkdir\consume_cdc2.sql
PROMPT the new changes will be used
MERGE INTO my_price_change_Electronics t
USING (
SELECT p1.prod_id, p1.prod_min_price, p1.prod_list_price, commit_timestamp$
FROM my_prod_price_change_view p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics' AND
operation$='UN') cv
ON (cv.prod_id = t.prod_id)
WHEN MATCHED THEN
UPDATE SET t.prod_min_price=cv.prod_min_price,
t.prod_list_price=cv.prod_list_price
WHEN NOT MATCHED THEN
INSERT VALUES (cv.prod_id, cv.prod_min_price, cv.prod_list_price,
commit_timestamp$);
COMMIT;
rem look at them
SELECT prod_id, prod_min_price, prod_list_price,
to_char(when,'dd-mon-yyyy hh24:mi:ss')
FROM my_price_change_electronics;
Back to Topic
6. Running the Publisher
The Publisher is responsible for maintaining the CDC framework and guaranteeing that
the change tables are purged regularly as soon as all subscribers have consumed a
specific set of change information.
1. From a SQL*Plus session logged on to the SH schema, run purge_cdc_ct.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\purge_cdc_ct.sql
PROMPT what to do to avoid overflow of change table ? PURGE IT !
Rem only the rows where we have no potential subscribers will be purged !!!
Rem we guarantee to keep all rows as long as we have subscribers which haven't
rem purged their window ...
exec DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT REMEMBER THE NUMBER OF ROWS !!!
Rem you will have 18 entries for the last updated 9 records
SELECT COUNT(*) FROM prod_price_ct;
PROMPT this is exactly twice the number of changes we made with the second DML operation
SELECT COUNT(*) FROM prod_price_ct p1, products p2
WHERE p1.prod_id=p2.prod_id AND p2.prod_category='Electronics';

Although you have not specified any particular time window for the purge operation on the change
table, the CDC framework ensures that only those changes that are no longer needed by any
subscriber are purged.
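In a production environment, you would typically schedule this purge to run regularly instead of
invoking it manually. A minimal sketch using DBMS_SCHEDULER; the job name and the nightly
interval are illustrative choices, not part of this tutorial's scripts:
Rem hypothetical nightly purge job for the change table
BEGIN
DBMS_SCHEDULER.CREATE_JOB (
job_name => 'purge_prod_price_ct_job',
job_type => 'PLSQL_BLOCK',
job_action => 'BEGIN DBMS_CDC_PUBLISH.PURGE_CHANGE_TABLE(''SH'',''PROD_PRICE_CT''); END;',
start_date => SYSTIMESTAMP,
repeat_interval => 'FREQ=DAILY;BYHOUR=2',
enabled => TRUE,
comments => 'Purge consumed change data from PROD_PRICE_CT');
END;
/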

Back to Topic
7. Dropping the Used Change View and Purging the Subscription Window
1. Because you have already consumed the second set of changes, you can purge your subscription
window. You will also purge the change table.
@c:\wkdir\purge_cdc_sub_window2.sql
PROMPT ... purge the newly consumed data from the subscription window (2nd DML operation)
EXEC DBMS_CDC_SUBSCRIBE.PURGE_WINDOW -
(SUBSCRIPTION_name =>'my_subscription_no_1');
PROMPT purge all change tables again
EXEC DBMS_CDC_PUBLISH.purge_change_table('sh','prod_price_ct')
PROMPT ... and you will see an empty change table
SELECT * FROM prod_price_ct;

The change table is now empty: all changes have been consumed and could therefore be purged.

Back to Topic
8. Cleaning Up the CDC Environment
Clean up the CDC environment. To do this, perform the following step:
1. From a SQL*Plus session logged on to the SH schema, run cleanup_cdc.sql, or copy the
following SQL statements into your SQL*Plus session:
@c:\wkdir\cleanup_cdc.sql
PROMPT CLEAN UP
exec DBMS_CDC_SUBSCRIBE.drop_subscription -
(subscription_name=> 'my_subscription_no_1');
exec DBMS_CDC_PUBLISH.DROP_CHANGE_TABLE (OWNER => 'sh', -
CHANGE_TABLE_NAME => 'prod_price_CT', -
FORCE_FLAG => 'Y');
DROP TABLE my_price_change_electronics;
UPDATE products SET prod_list_price=prod_list_price/1.1
WHERE prod_min_price > 100;
UPDATE products SET prod_list_price=prod_list_price/1.1
WHERE prod_min_price < 10;
COMMIT;
Back to Topic
Back to Topic List
Propagate Information for a Data Mart
Besides its central data warehouse, MyCompany is running several small divisional data
marts. For example, the products department wants to receive all transactional SALES
data partitioned by its main product categories for marketing campaign analysis; only
the SALES data of 2000 is relevant. You will address the business problem by using
Oracle’s Transportable Tablespace capabilities and Oracle’s LIST partitioning.
Furthermore, to guarantee a successful completion of the generation process of this new
table, you run the statement in RESUMABLE mode, thus ensuring that any space
problem will not cause the creation process to fail.
Note: To demonstrate the benefit of resumable statements, you need TWO sessions and
therefore TWO windows. Please read the following section CAREFULLY before
starting the exercise.
To propagate from a data warehouse to a data mart, you perform the following steps:
1. Enable a RESUMABLE session.
2. Create a new tablespace as a potential transportable tablespace.
3. Create a LIST partitioned table in the new tablespace.
4. Leverage the new RESUMABLE statement capabilities for efficient error detection
and handling.
5. Create a new RANGE_LIST partitioned table (this is Oracle9i Release 2
functionality).
6. Prepare the metadata export for a transportable tablespace.

Back to Topic List


1. Enabling a RESUMABLE Session
With resumable statements, Oracle provides the ability to suspend and resume the
execution of large database operations in the event of a repairable failure. Currently, the
types of errors from which a statement can resume are space-limit and out-of-space
errors. When an operation suspends, you have the opportunity to take corrective
steps to resolve the error condition. Alternatively, a procedure can be registered to
automate the error correction (a sketch of this appears at the end of this step). After the
error condition is resolved, the suspended statement automatically resumes and continues
operation. If the error is not corrected within an optionally specified suspension time
limit, the statement finally fails. You will use manual error correction for this exercise.
To see how this is done, perform the following step:
1. Enable resumable mode for your session. From a SQL*Plus session
logged on to the SH schema, execute the following SQL script:
@c:\wkdir\set_resumable.sql
PROMPT enable the resumable session
ALTER SESSION ENABLE RESUMABLE TIMEOUT 1200 NAME
'create list partitioning';

This puts your session into the so-called resumable mode, names the resumable operation, and sets
a maximum suspension time of 1200 seconds.
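As mentioned above, the error correction could also be automated instead of handled manually. A
minimal sketch of such a handler, not part of this tutorial's scripts (the trigger name and the decision
to simply extend the timeout are illustrative assumptions; a real handler might add space instead):
Rem hypothetical automated handler for suspended statements
CREATE OR REPLACE TRIGGER resumable_timeout_handler
AFTER SUSPEND ON DATABASE
DECLARE
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
-- give suspended statements up to one hour before they finally fail
DBMS_RESUMABLE.SET_TIMEOUT(3600);
COMMIT;
END;
/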

Back to Topic
2. Creating a New Tablespace as Potential Transportable Tablespace
Now create an additional tablespace for storing your new LIST partitioned fact table. To
do this, perform the following steps:
1. First, make sure that the tablespace MY_OBE_TRANSFER does not exist. From a SQL*Plus
session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\drop_TS.sql
DROP TABLESPACE my_obe_transfer INCLUDING CONTENTS AND DATAFILES;

The tablespace is dropped. If you get an ORA-00959 message, you can ignore it; it simply means
that the tablespace did not exist.
2. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
NOTE: If you are using a different drive or path for your data files, the script will need to be
edited accordingly.
@c:\wkdir\create_ts.sql
CREATE TABLESPACE my_obe_transfer DATAFILE 'c:\obetemp' SIZE 2M REUSE
autoextend off;

The tablespace is intentionally defined too small for the target table, to force a space error
during table creation.

Back to Topic

3. Creating a LIST Partitioned Table in the New Tablespace


In Window 1, you create a LIST partitioned table with a CREATE TABLE AS
SELECT command in the newly created tablespace MY_OBE_TRANSFER.
1. From a SQL*Plus session logged on to the SH schema, run
create_list_part_table.sql.
NOTE: This statement will hang due to space problems in the MY_OBE_TRANSFER tablespace, and
the error will show up in the alert log. When it hangs, move to the next step.
@c:\wkdir\create_list_part_table.sql
DROP TABLE sales_prod_dept;
PROMPT create table in new TS that is too small
CREATE TABLE sales_prod_dept
(prod_category, prod_subcategory,cust_id,
time_id,channel_id,promo_id, quantity_sold, amount_sold
) NOLOGGING TABLESPACE my_obe_transfer
PARTITION BY LIST (prod_category)
(PARTITION electronic_sales values ('Electronics'),
PARTITION hardware_sales values ('Hardware'),
PARTITION sw_other_sales values ('Software/Other'),
PARTITION p_and_a values ('Peripherals and Accessories'),
PARTITION photo_sales values ('Photo')
)
AS
SELECT p.prod_category, p.prod_subcategory, s.cust_id,
s.time_id,s.channel_id,
s.promo_id, SUM(s.quantity_sold) quantity_sold, SUM(s.amount_sold) amount_sold
FROM sales s, products p, times t
WHERE p.prod_id=s.prod_id
AND s.time_id = t.time_id
AND t.fiscal_year=2000
GROUP BY prod_category, prod_subcategory,cust_id, s.time_id,channel_id,
promo_id
;

Back to Topic
4. Leveraging the New RESUMABLE Statement Capabilities for Efficient Error
Detection and Handling
While the CREATE TABLE AS SELECT is running in Window 1, you can monitor
the status of all sessions running in RESUMABLE mode from Window 2.
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL statement:
@c:\wkdir\run_sel_w2.sql
SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;
As long as no error occurs, the session stays in a normal status. As soon as an error occurs, your
session status changes, and the ORA-error causing this problem is displayed. The current
statement in the RESUMABLE session will be suspended until either (a) the problem is fixed, or
(b) the timeout limit is reached. The error will also show up in the alert.log file of the running
instance.

2. Now fix the problem manually by changing the datafile definition. From a SQL*Plus session
logged on to the SYSTEM schema, execute the following SQL script:
@c:\wkdir\fix_ts.sql
ALTER DATABASE DATAFILE 'c:\obetemp' AUTOEXTEND ON NEXT 5M;

The tablespace can now be autoextended by the database itself without any further interaction.

3. As soon as the error is fixed, the suspended session automatically resumes and continues
processing. Check the data dictionary view for change of status. From a SQL*Plus session
logged on to the SH schema, check the status again:
@c:\wkdir\run_sel_w2.sql
SELECT NAME, STATUS, ERROR_MSG FROM dba_resumable;
The suspended statement resumes in the previously hung window (Window 1).

Back to Topic
5. Creating a Range-List Partitioned Table in the Transfer Tablespace
With the additional RANGE-LIST partitioning strategy, you can use a composite
partitioning method to subdivide a partitioned table based on two logical attributes. The
main partitioning strategy is the well-known and most widely used Range partitioning,
found in nearly every data warehouse implementation that deals with rolling window
operations. In addition, every Range partition is subpartitioned based on a List
partitioning strategy, enabling a finer partition granule for addressing your business
needs.
A typical example leveraging Range-List partitioning is a global retail environment
with time-based range partitions (rolling window) and region-oriented list subpartitions
underneath, so that each time window can be maintained for a specific region
independently of the others.
List partitioned tables enable you to group sets of distinct, unrelated values together in
one partition. Consequently, any value not covered by an existing partition
definition raises an Oracle error message. Although this enforces a business
constraint (you should not see a nonexistent region in your data warehouse), you
cannot always be sure that such violations will not occur. Oracle therefore introduces the
capability to create a DEFAULT list partition, a kind of catch-all partition for all
undefined partition key values.
Oracle has also introduced the so-called subpartition template technology, a common
optional element of both range-hash and range-list composite partitioning. The template
provides an easy and convenient way to define default subpartitions for each table
partition. Oracle will create these default subpartitions in any partition for which you do
not explicitly define subpartitions. This clause is useful for creating symmetric
partitions.
In this section, you create a Range-List partitioned table, leveraging all of the
functionality mentioned above.
1. From a SQL*Plus session logged on to the SH schema, run create_new_range_list.sql, or
copy the following SQL statements into your SQL*Plus session:
@c:\wkdir\create_new_range_list.sql
CREATE TABLE sales_rlp
COMPRESS
TABLESPACE MY_OBE_TRANSFER
PARTITION BY RANGE (time_id)
SUBPARTITION BY LIST (channel_id)
SUBPARTITION TEMPLATE
( SUBPARTITION direct values (3),
SUBPARTITION internet values (4),
SUBPARTITION partner values (2),
SUBPARTITION other values (DEFAULT)
)
(PARTITION SALES_before_1999 VALUES LESS THAN (TO_DATE('01-JAN-1999','DD-MON-
YYYY')),
PARTITION SALES_Q1_1999 VALUES LESS THAN (TO_DATE('01-APR-1999','DD-MON-
YYYY')),
PARTITION SALES_Q2_1999 VALUES LESS THAN (TO_DATE('01-JUL-1999','DD-MON-
YYYY')),
PARTITION SALES_Q3_1999 VALUES LESS THAN (TO_DATE('01-OCT-1999','DD-MON-
YYYY')),
PARTITION SALES_Q4_1999 VALUES LESS THAN (TO_DATE('01-JAN-2000','DD-MON-
YYYY')),
PARTITION SALES_Q1_2000 VALUES LESS THAN (TO_DATE('01-APR-2000','DD-MON-
YYYY')),
PARTITION SALES_Q2_2000 VALUES LESS THAN (TO_DATE('01-JUL-2000','DD-MON-
YYYY')),
PARTITION SALES_Q3_2000 VALUES LESS THAN (TO_DATE('01-OCT-2000','DD-MON-
YYYY')),
PARTITION SALES_Q4_2000 VALUES LESS THAN (MAXVALUE) NOCOMPRESS)
AS
SELECT * FROM sales sample(10);
This example also shows how you can easily create compressed and uncompressed partitions as
part of an initial table creation. The partition and template names are inherited by all the
subpartitions, thus making the naming and identification of subpartitions much easier and more
convenient.

2. From a SQL*Plus session logged on to the SH schema, run show_range_list_names.sql, or
copy the following SQL statements into your SQL*Plus session:
@c:\wkdir\show_range_list_names.sql
PROMPT as you can see, the partition and template name is inherited by all the subpartitions,
PROMPT thus making the naming - and identification - of subpartitions much easier ...
select partition_name, subpartition_name, high_value from
user_tab_subpartitions
where table_name='SALES_RLP';
List and Range-List partitioned tables group sets of distinct, unrelated values together in one
partition. Consequently, any value not covered by an existing partition definition
raises an Oracle error message. You will now create a DEFAULT list partition for the previously
created List partitioned table sales_prod_dept, a kind of catch-all partition for all
undefined partition key values.

3. From a SQL*Plus session logged on to the SH schema, run cr_default_list_part.sql, or
copy the following SQL statements into your SQL*Plus session:
@c:\wkdir\cr_default_list_part.sql
PROMPT add a new partition to sales_prod_dept that does NOT have a DEFAULT partition
Rem it is added just like for a range partitioned table
ALTER TABLE sales_prod_dept ADD PARTITION gameboy_sales VALUES ('Gameboy');

PROMPT now add another one, covering the DEFAULT value


ALTER TABLE sales_prod_dept ADD PARTITION all_other_sales VALUES (DEFAULT);

PROMPT view the data dictionary


select partition_name, high_value
from user_tab_partitions
where table_name='SALES_PROD_DEPT';

All records having an undefined partitioning key will be stored in this new partition
all_other_sales. However, having a DEFAULT partition changes the way you add new partitions
to this table. Conceptually, as soon as you have created a default partition, you cover all possible
values for the partitioning key: the defined ones and "all others".
To ADD a new partition to this table, you logically have to SPLIT the default partition into a new
partition with a set of defined keys and a new default partition, still covering "all other values"
(now reduced by the keys specified for the new partition). Any attempt to add a new
partition the way you did before will raise an ORA error message.

4. From a SQL*Plus session logged on to the SH schema, run split_default_list_part.sql, or
copy the following SQL statements into your SQL*Plus session:
@c:\wkdir\split_default_list_part.sql
PROMPT Unlike the first time, we cannot simply add a new partition
PROMPT raises ORA-14323: cannot add partition when DEFAULT partition exists
Rem we cannot be sure whether the new value already exists in the DEFAULT partition
ALTER TABLE sales_prod_dept ADD PARTITION undefined_sales VALUES
('Undefined');
This will raise an Oracle error. Run the following commands or the script
split_default_list_part_b.sql
@c:\wkdir\split_default_list_part_b.sql
PROMPT so we have to SPLIT the default partition to ensure that
PROMPT all potential values of 'Undefined' are in the new partition
ALTER TABLE sales_prod_dept SPLIT
PARTITION all_other_sales VALUES ('Undefined') INTO
(PARTITION undefined_sales, PARTITION all_other_sales);
PROMPT control the data dictionary
PROMPT Note that without specifying any tablespace, the default
PROMPT tablespace of the partitioned table is used
select partition_name, tablespace_name, high_value
from user_tab_partitions
where table_name='SALES_PROD_DEPT';
Back to Topic
6. Preparing the Metadata Export for a Transportable Tablespace
To prepare a tablespace for transportation, you simply have to export the metadata
information of the tablespace. This export, together with a physical copy of the
tablespace, can be used for importing the tablespace into another database instance.
Prior to Oracle9i, the target database had to be running on the same operating system
and use the same database block size. Oracle9i lifted the same-block-size restriction, and
Oracle Database 10g goes further by introducing heterogeneous (cross-platform)
transportable tablespaces.
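To get an idea of which platforms can exchange tablespaces, and with which endian format, you can
query V$TRANSPORTABLE_PLATFORM. This is only a side note to illustrate the heterogeneous
support mentioned above and is not required for the exercise:
Rem list the platforms supported for cross-platform transport
SELECT platform_name, endian_format
FROM v$transportable_platform
ORDER BY platform_name;
Rem endian format of the local database
SELECT d.platform_name, tp.endian_format
FROM v$database d, v$transportable_platform tp
WHERE d.platform_name = tp.platform_name;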
To show this capability, you perform the following steps:
1. From a SQL*Plus session logged on to the SH schema, execute the following SQL script:
@c:\wkdir\make_ts_ro.sql
ALTER TABLESPACE my_obe_transfer READ ONLY;
This guarantees that no further changes can happen on the data in the tablespace.

2. You can now export the data dictionary information of tablespace my_obe_transfer as
follows:
@c:\wkdir\export_metadata.sql
Optionally, you can create a specific directory where you want to store the export dump file:
CREATE DIRECTORY my_obe_dump_dir as 'c:\wkdir';
Now you can export the metadata only. Please compare the file size with the tablespace size to
get a feeling for what it would have meant to extract all the data and not only the metadata.
host expdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp TRANSPORT_TABLESPACES=my_obe_transfer

Note: This script may take several minutes to complete running.


Note: The same export syntax can be used at the DOS prompt by dropping the SQL*Plus "host"
command.
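To compare the dump file with the size of the tablespace, and to get an idea of what the
corresponding import on a target database could look like, you could use something along the
following lines. This is a sketch only: no target database is set up in this tutorial, and the directory
object and datafile path on the target are assumptions.
Rem size of the tablespace that was just exported (source database)
SELECT tablespace_name, ROUND(SUM(bytes)/1024/1024,1) size_mb
FROM dba_data_files
WHERE tablespace_name = 'MY_OBE_TRANSFER'
GROUP BY tablespace_name;
Rem on a target database, after copying the datafile and the dump file,
Rem the metadata import would look roughly like this (assumed paths):
host impdp \'/ as sysdba\' DIRECTORY=my_obe_dump_dir DUMPFILE=meta_MY_OBE_TRANSFER.dmp TRANSPORT_DATAFILES='c:\obetemp'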
3. Before you import the tablespace information into another database, you will want to clean up
and reset the session status by executing the SQL script:
@c:\wkdir\cleanup_tts.sql
DROP TABLESPACE my_obe_transfer INCLUDING CONTENTS AND DATAFILES;
ALTER SESSION DISABLE RESUMABLE;
4. Trying to access the table sales_prod_dept now raises an error because the table no longer exists.
@c:\wkdir\sel_spd.sql
SELECT count(*)
from sales_prod_dept;

Back to Topic
Back to Topic List
Cleanup
To clean up your environment, you will need to perform the following step:
1. From a SQL*Plus session logged on to the SH schema, execute the following commands:
SET SERVEROUTPUT ON
EXEC dw_handsOn.cleanup_modules
Back to Topic List
Summary
In this lesson, you learned how to:
Perform a multitable insert
Perform an upsert
Use a table function
Use synchronous CDC to capture and consume incremental source changes
Propagate from a data warehouse to a data mart