
SAS – Statistical Analysis System

Domain: Retail
Name of the author: Shalini Balasubramani (shalini.balasubramani@tcs.com)
Date created: 04/28/2009
AGENDA

• Statistical Analysis System


• Components of SAS
• DATA and PROC step flow
• DATA step
• Input statement
• Output statement
• PROC step

18 September 2009 2
SAS – Statistical Analysis System
• SAS is a powerful system for statistical analysis and data manipulation
• It offers extensive support for spreadsheet-style and graphical analysis
• It includes a complete programming language as well as modules for
- Econometric and time series analysis
- Project management, engineering and statistical research
- Linear programming
- Operations research
• It provides multidimensional data analysis (OLAP, Online Analytical
Processing), query and reporting, EIS (Executive Information System), data
mining and data visualization

Components of SAS

• DATA and PROC steps are the building blocks of a SAS program
• A typical program starts with a DATA step, either alone or combined with a
PROC step
• The DATA step creates data sets and passes them to the PROC step for
further manipulation
• The DATA step holds information about the variables declared within the
data set

DATA and PROC step flow

RAW DATA --(input data)--> DATA step (variable declaration)
DATA step --> SAS DATASET
SAS DATASET --> PROC step (data manipulation as per the function)
PROC step --(output data)--> REPORTS

SAS – DATA step
• The step begins with a DATA statement
• SAS data sets are produced by the DATA step
• The DATA step holds information about each declared variable, including its
name, type (character or numeric), length (storage size), and position (starting
position) within the data set
• It passes the data to the PROC step for further manipulation
• It reads and modifies the data
Syntax: DATA data-set-name;
E.g.: DATA data1;

Instream data
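• In-stream data is read directly within the DATA step
• The data lines are supplied in free format inside the DATA step, after the
CARDS statement, and are closed by a line containing a single semicolon
Syntax:
DATA data-set-name;
INPUT variable [informat];
CARDS;
value[1-n]
;
RUN;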

INSTREAM DATA – INPUT CARD (E.g.)
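As a minimal sketch of in-stream input, the step below reads three records typed directly after CARDS. The data set name sales, the variable names, and the values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical retail data: store id, product name, units sold */
DATA sales;
    /* The $ marks product as a character variable; the others are numeric */
    INPUT store_id product $ units;
    CARDS;
101 Shirt 12
102 Shoes 7
103 Jeans 15
;
RUN;
```

The values are separated by blanks (list input), so no column positions are needed, and the lone semicolon ends the data lines.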

INSTREAM DATA – SAS LOG

Reading data from external file

• The INPUT statement declares the variables to be read from the file, with
their formats and lengths
• The INFILE statement names the external file from which the data is read
Syntax:
DATA data-set-name;
INFILE 'external-file';
INPUT variable [informat];
RUN;

SAS – DATA step (E.g.)
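A minimal sketch of reading an external file, assuming a fixed-column text file. The path, the data set name, the variable names, and the column layout are all hypothetical and must be adjusted to the real file.

```sas
/* Assumed layout of each line: columns 1-3 store id,
   columns 5-12 product name, columns 14-16 units sold */
DATA sales;
    INFILE '/path/to/sales.txt';   /* hypothetical path */
    INPUT store_id 1-3 product $ 5-12 units 14-16;
RUN;
```

Column input is used here because it makes the position and length of each variable explicit, which matches the variable attributes the DATA step records.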

Output statement

• The PUT statement writes data either to an external file or to the SAS log
• When no external file has been named with a FILE statement, PUT writes to
the SAS log by default

Syntax:
PUT variable-name format.;

OUTPUT DATA – EXTERNAL FILE (E.g.)
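A minimal sketch of writing to an external file with FILE and PUT. The data set, its values, and the output path are hypothetical.

```sas
/* Small hypothetical data set to write out */
DATA sales;
    INPUT store_id product $ units;
    CARDS;
101 Shirt 12
102 Shoes 7
;
RUN;

/* _NULL_ runs the step without creating a new data set */
DATA _NULL_;
    SET sales;
    FILE '/path/to/sales_report.txt';   /* hypothetical path */
    /* +1 moves the pointer one column to separate the fields */
    PUT store_id 3. +1 product $8. +1 units 4.;
RUN;
```

Removing the FILE statement, or writing FILE LOG; instead, would send the same lines to the SAS log.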

SAS PROC step
• The PROC step receives the data set created by the DATA step
• It processes the received data according to the procedure invoked
Syntax:
PROC PRINT DATA=data-set;
[TITLE 'title-text';]
RUN;

PROC PRINT (E.g.)
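A minimal sketch of PROC PRINT on a small data set. The data set name, variables, values, and title text are hypothetical.

```sas
/* Small hypothetical data set to print */
DATA sales;
    INPUT store_id product $ units;
    CARDS;
101 Shirt 12
102 Shoes 7
;
RUN;

/* List every observation, with a title above the listing */
PROC PRINT DATA=sales;
    TITLE 'Units sold by store';
RUN;
```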

PROC – SORT
• The SORT procedure sorts the data in either ascending or descending order
• It sorts the data set using the variables named in the BY statement as keys
• It sorts in ascending order by default
Syntax:
PROC SORT DATA=input-data-set OUT=output-data-set [options];
BY [DESCENDING] key-variable;
RUN;

PROC – SORT (E.g.)
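A minimal sketch of PROC SORT in descending order. The data set names, variables, and values are hypothetical.

```sas
/* Small hypothetical data set to sort */
DATA sales;
    INPUT store_id product $ units;
    CARDS;
101 Shirt 12
102 Shoes 7
103 Jeans 15
;
RUN;

/* OUT= keeps the original data set unchanged and
   writes the sorted copy to sales_sorted */
PROC SORT DATA=sales OUT=sales_sorted;
    BY DESCENDING units;
RUN;
```

Without OUT=, the sorted observations replace the input data set; without DESCENDING, the order is ascending.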

PROC MEANS

• The MEANS procedure produces simple descriptive statistics for numeric
variables
Syntax:
PROC MEANS DATA=data-set;
VAR variable(1-n);
RUN;

PROC MEANS (E.g.)
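A minimal sketch of PROC MEANS on one numeric variable. The data set name, variables, and values are hypothetical.

```sas
/* Small hypothetical data set to summarize */
DATA sales;
    INPUT store_id product $ units;
    CARDS;
101 Shirt 12
102 Shoes 7
103 Jeans 15
;
RUN;

/* VAR limits the analysis to the listed numeric variables */
PROC MEANS DATA=sales;
    VAR units;
RUN;
```

With no statistics requested, the procedure reports its defaults: N, mean, standard deviation, minimum, and maximum.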

MEANS – SAS LOG

PROC FREQ

• The FREQ procedure counts how often each value of the specified variable
occurs in the SAS data set
Syntax:
PROC FREQ DATA=data-set-name;
TABLES variable;
RUN;

PROC – FREQ (E.g.)
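A minimal sketch of PROC FREQ counting the occurrences of each product. The data set name, variables, and values are hypothetical.

```sas
/* Small hypothetical data set with a repeated product value */
DATA sales;
    INPUT store_id product $ units;
    CARDS;
101 Shirt 12
102 Shoes 7
103 Shirt 15
;
RUN;

/* One-way frequency table for the product variable */
PROC FREQ DATA=sales;
    TABLES product;
RUN;
```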

FREQ – SAS LOG

MERGE statement
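• It combines SAS data sets and matches their observations based on an
identifier (the BY variable)
• For a match-merge, each input data set must be sorted by the BY variable
Syntax:
DATA data-set;
MERGE data-set1 data-set2;
BY key-variable;
RUN;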

MERGE (E.g.)

MERGE - SAS LOG

MERGE statement (contd.)
Input data sets:

DATASET A                   DATASET B
Record 1, key value "A"     Record 1, key value "A"
Record 2, key value "B"     Record 2, key value "B"
Record 3, key value "C"     Record 3, key value "B"
Record 4, key value "C"     Record 4, key value "C"

Merged result:

DATASET A                   DATASET B                   MERGE TYPE
Record 1, key value "A"     Record 1, key value "A"     1 – 1 Merge
Record 2, key value "B"     Record 2, key value "B"     1 – Many Merge
Record 2, key value "B"     Record 3, key value "B"     1 – Many Merge
Record 3, key value "C"     Record 4, key value "C"     Many – 1 Merge
Record 4, key value "C"     Record 4, key value "C"     Many – 1 Merge
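The pairing of records described above can be sketched as a match-merge. The data set names a and b and the variables key, a_rec, and b_rec are hypothetical; a_rec and b_rec simply carry the record numbers so the pairing is visible in the output.

```sas
/* Dataset A: keys A, B, C, C (already in key order) */
DATA a;
    INPUT key $ a_rec;
    CARDS;
A 1
B 2
C 3
C 4
;
RUN;

/* Dataset B: keys A, B, B, C (already in key order) */
DATA b;
    INPUT key $ b_rec;
    CARDS;
A 1
B 2
B 3
C 4
;
RUN;

/* Match-merge on key: values from the data set with fewer
   records in a BY group are carried forward within that group */
DATA merged;
    MERGE a b;
    BY key;
RUN;

PROC PRINT DATA=merged;
RUN;
```

The merged data set should contain five observations, with (a_rec, b_rec) pairs of (1,1), (2,2), (2,3), (3,4), and (4,4), in line with the 1 – 1, 1 – Many, and Many – 1 rows of the table.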

UPDATE statement
• The UPDATE statement performs a modified version of a horizontal merge, in
which values on the original (master) records are overlaid with new information
from a transaction data set.
• A value in the master data set is left unchanged when the corresponding
value in the transaction data set is missing, so setting a transaction value to
missing prevents it from overlaying the master value.
• A BY statement is required, and both data sets must be sorted by the key.
Syntax:
DATA data-set;
UPDATE master-data-set transaction-data-set;
BY key-variable;
RUN;

UPDATE (E.g.)
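A minimal sketch of UPDATE in which missing transaction values leave the master values untouched. The data set names master, trans, and master_new, the variables, and the values are hypothetical.

```sas
/* Master data set: one record per store (already in key order) */
DATA master;
    INPUT store_id units price;
    CARDS;
101 12 20
102 7 35
;
RUN;

/* Transaction data set: a period (.) is a missing numeric value */
DATA trans;
    INPUT store_id units price;
    CARDS;
101 . 22
102 9 .
;
RUN;

/* Non-missing transaction values overlay the master values */
DATA master_new;
    UPDATE master trans;
    BY store_id;
RUN;
```

Under the default behaviour, store 101 should keep units 12 and take the new price 22, while store 102 should take units 9 and keep price 35.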

UPDATE – SAS LOG

MODIFY statement
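• It extends the capabilities of the DATA step, enabling you to manipulate a
SAS data set in place without creating an additional copy
• The DATA statement must name the same data set as the MODIFY statement
• The BY statement applies only when a transaction data set is also named
Syntax:
DATA data-set;
MODIFY data-set [transaction-data-set];
[BY key-variable;]
RUN;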

MODIFY (E.g.)
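A minimal sketch of MODIFY changing one value in place. The data set name, variables, values, and the adjustment of 5 units are hypothetical.

```sas
/* Small hypothetical data set to change in place */
DATA sales;
    INPUT store_id product $ units;
    CARDS;
101 Shirt 12
102 Shoes 7
;
RUN;

/* The same name appears on DATA and MODIFY, so the
   observations are rewritten in the existing data set */
DATA sales;
    MODIFY sales;
    IF store_id = 102 THEN units = units + 5;
RUN;
```

Because no transaction data set is named, no BY statement is needed; each observation is read, changed if the condition holds, and written back to the same data set.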

MODIFY – SAS LOG

THANK YOU

