
Technical Bulletin

Part No. 74-0111

Ascential DataStage Sybase IQ Load Stage


This technical bulletin describes Release 1.3 of the Sybase IQ Load stage. This stage loads data from Ascential DataStage jobs into the Sybase IQ indexing engine.

Copyright 2004, 1997-2003 Ascential Software Corporation. All rights reserved.

© 1997-2004 Ascential Software Corporation. All rights reserved. Ascential, Ascential Software, DataStage, MetaStage, and MetaBroker are trademarks of Ascential Software Corporation or its affiliates and may be registered in the United States or other jurisdictions. Adobe Acrobat is a trademark of Adobe Systems, Inc. Microsoft, Windows, Windows NT, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. Adaptive Server, Open Client, and Sybase are either registered trademarks or trademarks of Sybase, Inc. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd. Other marks mentioned are the property of the owners of those marks. This product may contain or utilize third-party components subject to the user documentation previously provided by Ascential Software Corporation or contained herein.

Printing History
First Edition (74-0111) for Release 1.0, August 1997
Second Edition (74-0111) for Release 1.1, June 1998
Third Edition (74-0111) for Release 1.2, March 1999
Fourth Edition (74-0111) for Release 1.3, September 2001
Updated for Release 1.3, August 2002
Updated for Release 1.3, August 2003
Updated for Release 1.3, May 2004

How to Order Technical Documents


To order copies of documents, contact your local Ascential subsidiary or distributor, or call our office at (508) 366-3888.

Documentation Team: Marie E. Hedin

May 2004

74-0111

Introduction
This technical bulletin describes the following for Release 1.3 of Sybase IQ Load, updated for Ascential DataStage Release 7.5:

- Functionality
- Terminology
- Installation
- Index sets
- Disk overflow handling
- Stage and link properties

The Sybase Adaptive Server IQ (Sybase IQ) is an advanced indexing engine, not a database management system. The goal of the Sybase IQ Load stage is to enable Ascential DataStage to load data into existing Sybase IQ index sets rapidly and efficiently. Each input link to the stage represents a stream of rows to load into a Sybase IQ index set or joined index set. Reference links and output links have no meaning in the context of Sybase IQ Load and are not allowed. Use Sybase IQ Load with versions of Sybase IQ before Version 12. Use Sybase IQ12 Load with Version 12 of Sybase IQ. You can create jobs with both the Sybase IQ12 Load and the Sybase IQ Load stages if you need to load different versions of Sybase IQ within the same job. See the online readme file for your platform for the latest information about the Ascential DataStage release.

Functionality
Sybase IQ Load supports the following functionality:

- Support for NLS (National Language Support). For more information, see Ascential DataStage NLS Guide.
- Support for Ascential MetaStage. For more information, see Ascential MetaStage User's Guide.
- Generation and optional automatic execution of the Sybase IQ commands to load index sets with data from input links.
- Two methods for loading an index set automatically: in-line with the job or using a post-job batch file.
- Automatic loading of joined index sets in the correct order.
- Automatic generation of overflow data files if the first data file exhausts physical disk space.
- Support for data files larger than 2 GB on 64-bit file systems.
- Sybase IQ commands, specified by the job designer, that run before and after the insert, primarily to send diagnostic or verification output to the DataStage log.
- Generation of data files in fixed-width or delimiter-separated ASCII format.

The following functionality is not supported:

- Deletion and re-creation of the index set itself.
- Other modes of operation supported in database load utilities, for example, updating existing rows only.
- Automatic execution of insert commands works only when the Sybase IQ Server resides on the same machine as the DataStage job.
- Loading of joined index sets can only be done using an after-job subroutine, because the stage instance has no way to guarantee that the columns of different tables in the index set will be loaded in the correct order. (The job compiler may draw a process boundary through a given stage instance so that no single process address space knows the status of all the links connected to the stage.)
- Use of named pipes to avoid generating large data files.

Terminology
The next two sections explain the Sybase IQ and the Sybase IQ Load terms used in this document.

Sybase IQ Terminology
The following table lists the Sybase IQ terms used in this document:

index: A single column in an index set.

index set: The Sybase IQ equivalent of a table. It is a collection of named, typed indexes on columns of data which may have come from a Sybase SQL Server database, a foreign database, or a flat file. Sybase IQ Load loads data into index sets. Every index set definition has associated with it a Sybase SQL Server table definition, because Sybase IQ uses the SQL Server to catalog information about its index sets.

index space: The Sybase IQ entity that contains index sets. The index space owns disk and other resources, and provides a handle for administration.

IQ DELETE: The Sybase IQ command used to delete data from an index set. This command does not delete any data from the underlying table.

IQ INSERT: The Sybase IQ command used to insert data into an index set. This command does not insert any data into the underlying table.

IQ Server: The server engine to which you connect in order to use Sybase IQ. An IQ Server instance provides access to one or more index spaces.

joined index set: A set of indexes over the columns of a relational join between two or more tables. A given table in the associated Sybase SQL Server catalog can have an index set over itself, and can at the same time participate in any number of joined index sets. The joined index sets and the simple index set are distinct and must be loaded separately.


Sybase IQ Load Terminology


The following table lists the Sybase IQ Load terms used in this document:

Load stage: A passive stage whose role in a DataStage job is to take streams of tabular data and load them into tables of a target database.

control file: A file of Sybase IQ commands that loads or reloads an index set. A DataStage job generates one control file for each input link to each instance of the Sybase IQ Load stage. Control files can be executed by redirecting them to the Sybase isql utility.

data file: An ASCII file of row/column data from an input link that is to be loaded.

load sequence number: The property of a link that governs the order in which its data is loaded relative to other links attached to this stage. See IQLSN in Link Property Help Text on page 11.

post-job loading script: A batch file, provided with the Sybase IQ Load stage, that executes control files in a specified list of directories in load sequence number order.

stage instance: An individual stage of a given type, appearing as an icon in a job design.

Installing the Plug-In


For instructions and information supporting the installation, see Ascential DataStage Plug-In Installation and Configuration Guide.

Index Sets
The next sections describe loading index sets and joined index sets.

Loading Index Sets


Sybase IQ Load supports the following methods for loading the data from its input links into Sybase IQ index sets:

- Manual loading
- Automatic loading
- Post-job loading


Manual Loading
The Sybase IQ Load stage instance generates an ASCII data file and a control file for each input link, but does not load the data into Sybase IQ. You can load the data later by redirecting the control file to the Sybase isql utility. Manual loading is the default.

Automatic Loading
Rows arriving at an input link are written to a data file as in manual loading. When the link reaches end-of-data, the appropriate IQ DELETE and IQ INSERT commands are generated and executed through a Sybase Client-Library connection to the IQ Server. The commands are also written to a control file to log the activity.

Automatic loading works if the DataStage Server and the IQ Server reside on the same machine. It also works if the two servers reside on different machines connected by a local area network (LAN), provided both of the following conditions are met:

- Sybase Open Client is installed on the machine hosting the DataStage Server.
- The control files, data files, and overflow directories are written to a directory visible to both the DataStage Server and the IQ Server. This directory must use the same absolute pathname on both machines.

For more information about overflow directories, see Disk Overflow Handling on page 7.
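The end-of-data sequence described above can be sketched in Python as follows. This is a hypothetical illustration, not the stage's implementation: auto_load and execute are invented names, execute stands in for the Sybase Client-Library connection (which is not modelled here), and the command texts are passed in rather than generated, so no particular Sybase IQ syntax is implied.

```python
from typing import Callable, Iterable, List, TextIO


def auto_load(rows: Iterable[str], data_file: TextIO, ctrl_file: TextIO,
              commands: List[str],
              execute: Callable[[str], str]) -> List[str]:
    """Hypothetical sketch of automatic loading for one input link.

    rows     -- formatted data-file lines arriving on the link
    commands -- IQ command texts to run at end-of-data (supplied by the
                caller in this sketch; their syntax is not modelled)
    execute  -- stand-in for a Client-Library connection to the IQ Server;
                takes one command and returns its output text
    """
    # Rows arriving on the link are written to the data file, as in manual loading.
    for row in rows:
        data_file.write(row + "\n")
    # Make the rows visible to the reader before any load command runs.
    data_file.flush()
    # End-of-data: record the commands in the control file to log the activity.
    for cmd in commands:
        ctrl_file.write(cmd + "\n")
    # Execute each command and collect its output for the job log.
    return [execute(cmd) for cmd in commands]
```

In a real job the IQ Server reads the data file itself, which is why the shared-directory and same-pathname conditions above matter when the two servers run on different machines.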

Post-Job Loading
The data file and control file are written the same way as in manual loading, but the control file commands are executed by a batch file invoked as an ExecDOS after-job subroutine. When used with load sequence numbers, this method guarantees that multiple index sets are loaded in the correct order. The requirements for automatic loading when the DataStage Server and the IQ Server reside on different machines on a LAN also apply to post-job loading.

Note: This is the only method supported for loading joined index sets.


Loading Joined Index Sets


Joined index sets are a special case in Sybase IQ: the data for each table in the join must be loaded in a specific sequence for the joined index set to work. Sybase IQ computes and prints this sequence when the joined index set is created. You must record this sequence, because Sybase IQ provides no command or programmatic interface for determining it later.

Support for joined index sets in Sybase IQ Load is complicated by the fact that the input links to a stage instance are not guaranteed to run in the same process. Depending on the overall design of the job, the DataStage job compiler may draw process boundaries through a Sybase IQ Load instance, so the running job cannot know when the last link has closed. Consequently, the actual loading of data into joined index sets has to be done outside the job itself.

The Sybase IQ Load package includes a batch file named IQLOAD.BAT that you can run using the ExecDOS after-job subroutine to load joined index sets. The job designer specifies the load order with the Load Sequence Number link prompt. For more information about this prompt, see Link Property Prompts on page 9. For more information about using and configuring the IQLOAD.BAT file, see the following sections.

Using the Post-Job IQ Loading Batch File


The following sections explain how to load data from DataStage jobs into Sybase IQ using the post-job loading technique. This is the only automatic way to load joined index sets from Ascential DataStage. You may also want to load ordinary index sets this way. The IQLOAD.BAT batch file executes control files in a specified list of directories, using a specified user name, password, and IQ server name. You can modify this file or create a new one. We recommend, however, that you keep the original as a backup.

Locating the IQLOAD.BAT Batch File


The IQLOAD.BAT batch file is located in the root directory of the Sybase IQ Load installation package. Copy it to a directory in your PATH.

How IQLOAD.BAT Works with Ascential DataStage


The IQLOAD.BAT file executes all the IQ control files, which are generated by Ascential DataStage, in a specified list of directories. The control files must begin with load sequence number prefixes (that is, have names in the format n.basename.ctl, where n is an integer starting with 1). Within each directory, the batch file executes the control files in the order indicated by these prefixes. You can run IQLOAD.BAT manually from the DOS prompt by entering the following:
C:\> IQLOAD username password iqserver dir1 [dir2...]

Note: The IQLOAD.BAT file distributed with Sybase IQ Load executes only control files from links that have load sequence numbers, and it does not delete any data files or control files left in the directories by previous runs.
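The ordering rule can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not the contents of IQLOAD.BAT: plan_iqload and run_plan are hypothetical names, and run_plan assumes that isql is on the PATH and accepts the standard -U, -P, and -S login options, with each control file redirected to its standard input as in manual loading.

```python
import os
import re
import subprocess
from typing import List, Tuple

# Control files for links with a nonzero load sequence number are named
# n.basename.ctl, where n is an integer starting with 1.
LSN_NAME = re.compile(r"^(\d+)\.(.+)\.ctl$", re.IGNORECASE)


def plan_iqload(dirs: List[str]) -> List[str]:
    """Return control-file paths in the order the batch file would run them.

    Directories are handled in the order given; within each directory, files
    run in ascending load sequence number order. Files without a numeric
    prefix (load sequence number 0) are skipped, because the distributed
    IQLOAD.BAT only executes prefixed control files.
    """
    plan: List[str] = []
    for d in dirs:
        found: List[Tuple[int, str]] = []
        for name in os.listdir(d):
            m = LSN_NAME.match(name)
            if m and int(m.group(1)) >= 1 and os.path.isfile(os.path.join(d, name)):
                found.append((int(m.group(1)), name))
        # Sort on the integer prefix so that 10.x.ctl runs after 2.x.ctl.
        plan.extend(os.path.join(d, name) for _, name in sorted(found))
    return plan


def run_plan(plan: List[str], user: str, password: str, server: str) -> None:
    """Redirect each control file to isql (assumed to be on the PATH)."""
    for path in plan:
        with open(path, "rb") as ctl:
            subprocess.run(["isql", "-U", user, "-P", password, "-S", server],
                           stdin=ctl, check=True)
```

Sorting on the integer value rather than on the file name matters once a job has ten or more sequenced links, because a plain text sort would place 10.x.ctl before 2.x.ctl.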

Configuring IQLOAD.BAT from an Ascential DataStage Job


To use IQLOAD.BAT from a DataStage job, configure it as an after-job ExecDOS subroutine according to the following steps:

1. Open your job in the Ascential DataStage Designer.
2. Choose Job Properties from the Edit menu. A dialog box appears that lets you configure various settings for your job.
3. Choose ExecDOS from the After-job subroutine list.
4. Type the command line to run IQLOAD.BAT in the Input Value field. Use the format described in the previous section.
5. Remember to list all the directories in the job to which the Sybase IQ Load stages write the control files.

We recommend that you set up your jobs so that each job, or batch of jobs, has its own directory for control and data files. This minimizes the possibility of one job executing load operations that were prepared by another job, or rerunning loads that have already been performed once.

Disk Overflow Handling


Sybase IQ Load must be able to handle load operations in the multiple-gigabyte range. With data sets this large, the free space on the disk drive or partition that receives the data file can be exhausted. You can configure a Sybase IQ Load stage instance to handle these situations by providing a semicolon-separated list of directory paths as the value of the DIRPATH stage property (the Output Path prompt in the stage editor). If the stage runs out of disk space in the middle of a job run and is unable to write a row to the data file, it opens an overflow data file in the second directory in the list and continues. In this way, the data can be spread among multiple disk drives or partitions. The pathnames of overflow files are added to the USING clause of the IQ INSERT command that loads the data.

Note: If you experience any timeout issues, increase the default values for CS_RETRY_COUNT and CS_TIMEOUT_VALUE to 10 or higher.
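The overflow rule can be illustrated with a small Python simulation. It is a sketch under assumptions, not the stage's code: spread_rows and DiskExhausted are hypothetical names, free space is supplied as a per-directory byte budget instead of being detected from a failed write, and each row is counted as its text plus one newline.

```python
import os
from typing import Dict, List, Sequence


class DiskExhausted(Exception):
    """Raised when no remaining directory has room for the next row."""


def spread_rows(rows: Sequence[str], dirpath: str, file_name: str,
                free_bytes: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign rows to the data file and its overflow files.

    dirpath    -- semicolon-separated directory list, as in DIRPATH
    file_name  -- data file name; overflow files reuse the same name
    free_bytes -- simulated free space per directory
    Returns {data file path: rows written there}, in directory order.
    """
    dirs = [d for d in dirpath.split(";") if d]
    remaining = dict(free_bytes)
    files: Dict[str, List[str]] = {}
    current = 0
    for row in rows:
        size = len(row) + 1  # row text plus newline
        # Once a directory cannot take the next row, move on and never go back.
        while current < len(dirs) and remaining.get(dirs[current], 0) < size:
            current += 1
        if current == len(dirs):
            raise DiskExhausted(f"no directory has room for {size} more bytes")
        path = os.path.join(dirs[current], file_name)
        files.setdefault(path, []).append(row)
        remaining[dirs[current]] -= size
    return files
```

The keys of the returned dictionary, in order, correspond to the data file pathnames that would be listed in the USING clause of the IQ INSERT command.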

Properties
The tables in the next two sections use the following column heads:

- Prompt is the text that you see in the stage editor user interface.
- Default is the value used if you do not supply a value.
- Description describes the property.

Stage Property Prompts


The following table lists the stage properties for Sybase IQ Load (prompt, default, and description):

Load Automatically (default: No). Load index set automatically (Yes, No).
IQ Indexspace Name (default: None). (Required) Name of Sybase IQ index space.
Output Path (default: C:\temp). Full directory path for data and control files.
Password (Sybase) (default: *****). (Required) Sybase password.
IQ Server Name (default: None). (Required) Name of IQ Server.
User ID (Sybase) (default: None). (Required) Sybase user name.


Stage Property Help Text


The following table lists the help text for each stage property:

Load Automatically: The option that tells Ascential DataStage to connect to the IQ Server and execute the commands in the control file. This is done after the last row of data is written to the corresponding data file. This option is ignored for joined index sets. By default (No), the control and data files are not loaded automatically. Set this property to No if post-job batch loading is to be used.

IQ Indexspace Name: The name of the target IQ index space. This name appears as the argument to a USE command, which is the first command executed during a load.

Output Path: A semicolon-separated list of absolute pathnames in which Sybase IQ Load creates control and data files. Control files are written to the first directory path in the list. Data files are created initially in the first directory in the list. If the data file overflows the disk space available in its current directory, an overflow data file is created in the next directory in the list, and so on, until the end of the data is reached or the disk space is exhausted.

Password: The IQ password used when connecting to the IQ Server to perform the load.

IQ Server Name: The IQ Server name as defined for any IQ client program.

User ID: The IQ user name used when connecting to the IQ Server to perform the load.
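The defaults and required properties above can be summarized as a small Python resolver. This is a hypothetical sketch: resolve_stage and StageSettings are invented names, properties are keyed by their prompt text, and the masked Password default (*****) is treated as no value.

```python
from dataclasses import dataclass
from typing import Dict, List

# Prompts marked (Required) in the Stage Property Prompts table.
REQUIRED = ["IQ Indexspace Name", "Password", "IQ Server Name", "User ID"]


@dataclass
class StageSettings:
    load_automatically: bool
    indexspace: str
    output_dirs: List[str]
    server: str
    user: str
    password: str

    @property
    def control_dir(self) -> str:
        # Control files are always written to the first directory in the list.
        return self.output_dirs[0]


def resolve_stage(props: Dict[str, str]) -> StageSettings:
    # Apply the documented defaults, then let supplied values override them.
    merged = {"Load Automatically": "No", "Output Path": r"C:\temp", **props}
    missing = [p for p in REQUIRED if not merged.get(p)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    auto = merged["Load Automatically"].strip().lower()
    if auto not in ("yes", "no"):
        raise ValueError("Load Automatically must be Yes or No")
    # Output Path is a semicolon-separated list of directories.
    dirs = [d.strip() for d in merged["Output Path"].split(";") if d.strip()]
    if not dirs:
        raise ValueError("Output Path must name at least one directory")
    return StageSettings(auto == "yes", merged["IQ Indexspace Name"], dirs,
                         merged["IQ Server Name"], merged["User ID"],
                         merged["Password"])
```

A resolved Load Automatically value of Yes still has no effect for joined index sets, because that option is ignored for them.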

Link Property Prompts


The following table lists the properties for the input links to the Sybase IQ Load stage (prompt, default, and description):

CHAR Delimiter (default: vertical bar, |). (Optional) Delimiter for CHAR and VARCHAR columns.
Control File Name (default: tablename.ctl). File of IQ insertion commands (generated by Ascential DataStage).
Data File Name (default: tablename.dat). Data file (generated by Ascential DataStage).
Clear Before Load (default: Yes). (Yes, No) Runs IQ DELETE command before load.
Post-insert Command (default: None). Optional IQ command to run after delete/insert operation.
Pre-insert Command (default: None). Optional IQ command to run before delete/insert operation.
IQ DELETE FROM (default: None). Optional FROM clause for IQ DELETE.
IQ DELETE WHERE (default: None). Optional WHERE clause for IQ DELETE command.
IQ DELETE WITH (default: None). Optional WITH clause for IQ DELETE command.
Joined Indexset name (default: None). Required if loading a joined index set.
IQ INSERT WITH (default: None). Optional WITH clause for IQ INSERT command.
Load Sequence Number (default: 0). Index set load order (required for joined index sets).
Indexset Name (default: None). (Required) Target Sybase IQ index set.
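The CHAR Delimiter property, together with the CHARDELIM help text, determines the layout of each data file row. The Python sketch below illustrates one reading of those rules. It is hypothetical: format_row is an invented name, and it assumes that each delimiter follows its column (acting as a terminator), that fixed-width columns are space-padded to their declared width, and that a NULL value is written as an empty string. The stage's actual byte layout may differ.

```python
from typing import List, Optional, Sequence, Tuple

# (column name, SQL type, declared width); width is used only for
# fixed-width CHAR/VARCHAR output.
Column = Tuple[str, str, int]
CHAR_TYPES = {"CHAR", "VARCHAR"}


def format_row(values: Sequence[object], columns: Sequence[Column],
               chardelim: Optional[str] = "|") -> str:
    """Hypothetical data-file row formatter for one input-link row."""
    if len(values) != len(columns):
        raise ValueError("row has the wrong number of values")
    parts: List[str] = []
    for value, (name, sql_type, width) in zip(values, columns):
        text = "" if value is None else str(value)  # assumed NULL handling
        if sql_type.upper() in CHAR_TYPES:
            if chardelim:
                parts.append(text + chardelim)
            elif len(text) > width:
                raise ValueError(f"{name}: value exceeds {width} characters")
            else:
                # No CHAR delimiter: fixed-width, space-padded column.
                parts.append(text.ljust(width))
        else:
            # Noncharacter columns are always followed by a vertical bar.
            parts.append(text + "|")
    return "".join(parts)
```

For a row (7, "Ann", "Oslo") with columns INTEGER, CHAR(5), and VARCHAR(4), the default delimiter yields 7|Ann|Oslo|, while an empty CHAR delimiter yields the fixed-width form 7|Ann  Oslo.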


Link Property Help Text


The following table lists the help text for each link property:

CHARDELIM: Character string that delimits CHAR and VARCHAR columns in the data file. If not specified, these columns are written in fixed-width ASCII format and declared appropriately in the control file. Noncharacter columns are always delimited by vertical bars (|), regardless of the CHAR delimiter.

CTRLFILE: Control file generated by Ascential DataStage containing IQ commands to run at database load time. This file is created for documentation purposes, even if the IQ commands are loaded automatically. Its name defaults to the value of the TABLE property with the extension .ctl appended. If the load sequence number is nonzero, the number is prefixed to the control file name, resulting in a file name with the format n.name.ctl. This prefix helps the post-job batch loader run the control files in the correct sequence for joined index set loads. The control file is always created in the first directory in DIRPATH (see Stage Property Prompts on page 8).

DATAFILE: File name of the flat ASCII output file containing the rows and columns of data to load into the index set for this link. The file name defaults to the value of the TABLE property with the extension .dat appended. The data file is created in the first DIRPATH directory. If disk space under the first DIRPATH directory is exhausted, data overflows into a new data file of the same name in the second DIRPATH directory, and so on until the end of data is reached or the last DIRPATH directory is exhausted.

DODELETE: Controls whether an IQ DELETE command is generated before the IQ INSERT command that loads the new data into the index set. For single index sets, an IQ DELETE FROM INDEXSET command is generated. For joined index sets, an IQ DELETE FROM JOINED INDEXSET command is generated. In both cases, the WITH, FROM, and WHERE clauses in the corresponding properties are added to the command.

IQAFTERCMD: Full literal text of an optional IQ command to run after the DELETE and INSERT commands that perform the load. This can be used to run IQ SHOW or other commands that generate diagnostic output. Output from these commands appears in the DataStage job log if you request Automatic Loading.

IQBEFORECMD: Full literal text of an optional IQ command to run before the DELETE and INSERT commands that perform the load. This can be used to run IQ SHOW or other commands that generate diagnostic output. Output from these commands appears in the DataStage job log if you request Automatic Loading.

IQDELFROM: For joined index sets only, this permits full specification of the FROM table[,table] clause of the IQ DELETE FROM JOINED INDEXSET FOR table FROM command. Defaults to the name of the index set for this link. The keyword FROM is optional.

IQDELWHERE: Specifies optional search_condition(s) for the IQ DELETE command. If defined, the search_condition(s) form the WHERE clause of the IQ DELETE or IQ DELETE FROM JOINED INDEXSET command. The keyword WHERE is optional.

IQDELWITH: Specifies optional delete_load_option(s) for the WITH clause of the IQ DELETE command. This can be used to control various parameters of the deletion process (see Sybase IQ Language Reference). The keyword WITH is optional.

IQJOINIDXSET: Name of the joined index set to load. Required for joined index set loads, otherwise ignored. The presence of a joined index set name and a load sequence number greater than 1 causes an IQ INSERT INTO JOINED INDEXSET command to be generated as the load command for the link instead of an IQ INSERT INTO command. For more information, see the help on load sequence numbers (IQLSN).

IQLOADWITH: Specifies optional insert_load_option(s) for the WITH clause of the IQ INSERT or IQ INSERT INTO JOINED INDEXSET command that loads the data. This can be used to control various parameters of the load (see Sybase IQ Language Reference). The keyword WITH is optional.

IQLSN: The load sequence number of a link governs the order in which its data is loaded relative to other links attached to this stage. A nonzero load sequence number disables Automatic Loading. The load sequence number becomes a prefix to the control file name. If a joined index set name is defined, then IQ INSERT INTO JOINED INDEXSET syntax is generated for links whose load sequence number is greater than 1. (The top table in the join hierarchy of the joined index set should be given a load sequence number of 1 because it needs an IQ INSERT INTO command. See Sybase IQ Language Reference and other Sybase IQ documentation.)

TABLE: (Required) Name of the Sybase IQ index set to load with data from this link. This name is the argument of the FOR clause in the IQ INSERT INTO command or the IQ INSERT INTO JOINED INDEXSET command.
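The naming and command-selection rules in this table can be condensed into a Python sketch. It is hypothetical: plan_link and LinkPlan are invented names, the command strings name only the kind of statement generated (not its full syntax), and treating any link with a joined index set name as a joined load for DODELETE is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkPlan:
    control_file: str
    data_file: str
    delete_command: Optional[str]  # kind of IQ DELETE generated, if any
    insert_command: str            # kind of IQ INSERT generated
    auto_load_allowed: bool


def plan_link(table: str, lsn: int = 0, joined_indexset: Optional[str] = None,
              do_delete: bool = True, load_automatically: bool = False,
              ctrlfile: Optional[str] = None,
              datafile: Optional[str] = None) -> LinkPlan:
    # CTRLFILE and DATAFILE default to the TABLE value plus .ctl / .dat.
    base_ctl = ctrlfile or f"{table}.ctl"
    # A nonzero load sequence number is prefixed: n.name.ctl.
    control_file = f"{lsn}.{base_ctl}" if lsn else base_ctl
    data_file = datafile or f"{table}.dat"
    joined = bool(joined_indexset)
    delete_cmd = None
    if do_delete:  # Clear Before Load defaults to Yes
        delete_cmd = ("IQ DELETE FROM JOINED INDEXSET" if joined
                      else "IQ DELETE FROM INDEXSET")
    # Joined syntax only for LSN > 1; the top table (LSN 1) gets IQ INSERT INTO.
    insert_cmd = ("IQ INSERT INTO JOINED INDEXSET" if joined and lsn > 1
                  else "IQ INSERT INTO")
    # A nonzero LSN disables Automatic Loading, and Load Automatically is
    # ignored for joined index sets.
    auto = load_automatically and lsn == 0 and not joined
    return LinkPlan(control_file, data_file, delete_cmd, insert_cmd, auto)
```

For example, a joined index set whose top table is customer and whose child table is orders would use load sequence numbers 1 and 2, producing 1.customer.ctl with an IQ INSERT INTO command and 2.orders.ctl with an IQ INSERT INTO JOINED INDEXSET command.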
