Date: 2022-10-29
CHAPTER 2:
DATA STEP PROGRAMMING
Learning Outcomes
Upon completing this chapter, you should be able to:
• create new variables that are not originally contained in raw data
• perform calculations across records
• keep only selected variables in output SAS data set
• execute SAS DATA step statements conditionally
• execute SAS DATA step statements repeatedly
• put selected records into multiple data sets within one DATA step
• read hierarchical raw data records
• use SAS functions in DATA step (self-study)
Creating New Variables
• Assignment statements in DATA step
• New information can be added to a SAS data set by creating new
variables with assignment statements in a DATA step.
• Syntax
DATA datasetname;
[Other DATA step statements]
variablename = expression; /*an assignment statement */
[Other DATA step statements]
RUN;
• Descriptions:
• Valid within DATA step. It is an executable statement in SAS DATA
step.
• The left-hand side of the statement must be a variable name.
• expression on the right-hand side may contain combinations of
numeric or non-numeric constants, variables, SAS functions, and
mathematical operators.
• When the expression contains character constants, each value must be
enclosed in a pair of single (or double) quotation marks.
• SAS determines the type and length of variablename at its first
occurrence in the DATA step.
• If that first occurrence is an assignment statement, variablename takes
the same type and length as the expression on the right-hand side of
the assignment operator.
• When instream data input method is used, all assignment
statements must appear before the DATALINES statement.
• The value of a created variable may depend on where its assignment
statement is placed in the DATA step.
• Some commonly used operators:
Operation        Operator
Addition         +
Subtraction      -
Multiplication   *
Division         /
Exponentiation   **
Parentheses      ( )
• SAS performs exponentiation first, then multiplication and
division, followed by addition and subtraction.
• One can use parentheses to override the order of the
operation.
• Examples:
Statement: var1 = 10 + 4 * 3 ** 2 ; Outcome: var1 = 46
Statement: var1 = ((10 + 4) * 3) ** 2 ; Outcome: var1 = 1764
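The two outcomes above can be checked with a short DATA step. This is a minimal sketch: the variable name var2 is introduced here for illustration only, and DATA _NULL_ with a PUT statement is used simply to write the results to the SAS log without creating a data set.

```sas
data _null_;
   /* exponentiation first, then multiplication, then addition */
   var1 = 10 + 4 * 3 ** 2;
   /* parentheses force the addition and the multiplication to run first */
   var2 = ((10 + 4) * 3) ** 2;
   put var1= var2=;   /* writes var1=46 var2=1764 to the log */
run;
```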
• Example 2.1:
data case1;
   infile datalines dlm=',';
   input name $ tomato cucumber pea grape;
   zone=14;
   type='Home';
   cucumber = cucumber * 10;
   total = tomato + cucumber + pea + grape;
   tomato_percent = tomato / total *100;
   datalines;
David, 10,2, 40,0
Mary, 15,5,10,1000
Francis, 50, 10, 15, 50
Tom, 20,0,.,20
;
run;
• If a variable (such as Cucumber) already has a value in the PDV,
SAS replaces the original value with the new one.
• The value of Pea is missing in the last record. Consequently, the
values of Total and Tomato_percent are also set to missing for
that observation.
• SAS executes each assignment statement once during each
iteration of the DATA step.
• Example 2.1: Continued
• Values in PDV during the 1st iteration of the DATA step (record: David, 10,2, 40,0):
Statement                                  Value in PDV after execution
input name $ tomato cucumber pea grape;    name=David tomato=10 cucumber=2 pea=40 grape=0
zone=14;                                   zone=14
type='Home';                               type=Home
cucumber = cucumber * 10;                  cucumber=20
total = tomato + cucumber + pea + grape;   total=70
tomato_percent = tomato / total *100;      tomato_percent=14.285714286
• The order of the assignment statements and the INPUT statement
affects the values assigned to these variables.
• Example 2.1: Continued
• Suppose we swap the following two statements in the same DATA step:
Original order:
cucumber = cucumber * 10;
total = tomato + cucumber + pea + grape;
Swapped order:
total = tomato + cucumber + pea + grape;
cucumber = cucumber * 10;
• The values of Total and Tomato_percent in the PDV are affected. In the 1st
iteration with the swapped order, Cucumber is still 20, but Total is computed
from the original Cucumber value (10 + 2 + 40 + 0 = 52), so Tomato_percent
becomes 19.230769231.
Calculations Across Records
•RETAIN statement in DATA step
• It may be desirable in some situations to perform calculations
across records of a data set.
• Example: Compute the running total of sales.
• SAS automatically resets the values of all user created
variables in PDV to missing at the beginning of each DATA
step iteration. Therefore, by default, the values of these
variables from the last record cannot be reused in next DATA
step iteration.
• The RETAIN statement tells SAS not to reset the specified variables
(numeric or character) to missing in the PDV at the beginning of
each DATA step iteration, so that their values from the previous
iteration can be reused.
• Syntax
DATA datasetname;
[Other DATA step statements]
RETAIN variable1 <initial_value1> <variable2 <initial_value2> …> ;
[Other DATA step statements]
RUN;
• Descriptions
• Valid in DATA step. It is a declarative statement.
• It must appear before the DATALINES statement if instream data input is used.
Placement of the statement does not affect the values of the specified variables, but
it may affect the position of the variables in the created data set.
• It can be used for both numeric and character variables.
• Multiple RETAIN statements may appear in the same DATA step.
• An initial value (initial_valueX) may optionally be specified for each variable in the PDV.
If no initial value is specified, the default value is missing.
• Example 2.2: Calculate the running total of variable Sales.
data case2;
input month $ sales @@;
acc_sales = acc_sales + sales;
retain acc_sales 0;
datalines;
Jan 3500 Feb 2640 Mar 3350
Apr 1250 May 4350 Jun 5530
Jul 5320 Aug 3890 Sep 4220
Oct 2980 Nov 5680 Dec 3360
;
run;
[Figure: values in PDV after the compilation phase, at the 1st iteration (after the INPUT statement and after acc_sales = acc_sales + sales;), and at the 2nd iteration.]
What if:
• the RETAIN statement is not used?
• Acc_Sales is not initialized to 0 in the RETAIN statement?
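The two "What if" cases can be tried with the sketch below. The data set names case2_noretain and case2_noinit are made up for illustration, and only the first three months of the Example 2.2 data are used. In both cases Acc_Sales is expected to be missing in every observation: without RETAIN it is reset to missing at the start of each iteration, and without an initial value it starts as missing, so missing + sales stays missing.

```sas
/* Case A: no RETAIN statement */
data case2_noretain;
   input month $ sales @@;
   acc_sales = acc_sales + sales;   /* acc_sales is missing on entry */
   datalines;
Jan 3500 Feb 2640 Mar 3350
;
run;

/* Case B: RETAIN without an initial value */
data case2_noinit;
   input month $ sales @@;
   acc_sales = acc_sales + sales;   /* missing + sales is missing */
   retain acc_sales;                /* initial value defaults to missing */
   datalines;
Jan 3500 Feb 2640 Mar 3350
;
run;
```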
• By default, any operation performed on a missing value generates a
missing value. If a variable has a missing value for an observation,
its running total is missing for that observation and for all
observations after it.
• Example 2.3:
data case3;
input month $ sales @@;
acc_sales=acc_sales+sales;
retain acc_sales 0;
datalines;
Jan 3500 Feb 2640 Mar 3350
Apr 1250 May . Jun 5530
Jul 5320 Aug 3890 Sep 4220
Oct 2980 Nov 5680 Dec 3360
;
Run;
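One possible way to keep the running total going past the missing value in Example 2.3 is the SUM function, which ignores missing arguments. This is a sketch rather than part of the original example: the data set name case3a is hypothetical, and only the first six months of the data are used.

```sas
data case3a;
   input month $ sales @@;
   retain acc_sales 0;
   /* SUM() skips missing arguments, so a missing sales value adds nothing */
   acc_sales = sum(acc_sales, sales);
   datalines;
Jan 3500 Feb 2640 Mar 3350
Apr 1250 May . Jun 5530
;
run;
```

With this variant, Acc_Sales for May should stay at the April total instead of becoming missing.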
•SUM statement in DATA step
• It is used to compute the running total of a numeric variable.
• It automatically retains the value of the running total variable
from the previous DATA step iteration in the PDV, so that the
current value of the specified expression can be added to it
cumulatively across DATA step iterations.
• It automatically sets the starting value of the running total
variable to 0 in PDV at the compilation phase.
• RETAIN statement for the running total variable is not
required unless the initial value of the running total variable is
not 0.
• It treats the missing value of the variable to be added as 0.
• Syntax
DATA datasetname;
[Other DATA step statements]
acc_variable + expression ; /*SUM statement*/
[Other DATA step statements]
RUN;
• Descriptions
• Valid in DATA step. It is an executable statement.
• It must appear before the DATALINES statement if instream data
input is used.
• Variable acc_variable contains the accumulated numeric value.
• acc_variable has an initial value of zero by default.
• expression contains the value to be added to acc_variable.
• expression can be a variable, a constant, or any other expression that
returns a numeric value.
• Example 2.4: Calculate the running total of variable Sales.
data case4;
input month $ sales @@;
acc_sales + sales;
datalines;
Jan 3500 Feb 2640 Mar 3350
Apr 1250 May . Jun 5530
Jul 5320 Aug 3890 Sep 4220
Oct 2980 Nov 5680 Dec 3360
;
run;
Keeping/Dropping Selected Variables
• DROP/KEEP statement in DATA step
• DROP statement tells SAS not to send the specified variables from the PDV
to the created SAS data set.
• KEEP statement tells SAS to send only the specified variables from the PDV
to the created SAS data set.
• Syntax:
DATA datasetname;
[Other DATA step statements]
DROP | KEEP variable_list ; /* DROP or KEEP statement */
[Other DATA step statements]
RUN;
• Descriptions
• Valid in DATA step. They are declarative statements.
• They must appear before the DATALINES statement if instream data input method is used.
• Use either the DROP or the KEEP statement in a DATA step, not both.
• Variables in the variable_list of the DROP statement, or variables not in the
variable_list of the KEEP statement, are not sent to the SAS data set, but they still
exist in the PDV.
• Example 2.5: Keep only variables Tomato and Tomato_percent
in the created SAS data set.
data case1a;
infile datalines dlm=',';
input name $ tomato cucumber pea grape;
zone=14;
type='Home';
cucumber = cucumber * 10;
total = tomato + cucumber + pea + grape;
tomato_percent = tomato / total *100;
keep tomato tomato_percent ;
*DROP name cucumber pea grape zone type total;
datalines;
David,10,2,40,0
Mary,15,5,10,1000
Francis,50,10,15,50
Tom,20,0,.,20
;
Run;
Executing SAS DATA Step Statements Conditionally
• IF-THEN/ELSE statements
• It executes a statement if a specified condition is satisfied and,
optionally, executes another statement when the condition is
not met.
• Syntax:
DATA datasetname;
[Other DATA step statements]
IF expression THEN statement1;
< ELSE statement2; >
[Other DATA step statements]
RUN;
• Descriptions:
• Valid in DATA step. They are executable statements in DATA step.
• They must appear before DATALINES statement if instream data input method is
used.
• expression is a SAS expression that involves one or more operands and an
operator.
• statement1 and statement2 can be any executable SAS statements.
• The optional ELSE statement gives an alternative action (statement2) if the specified
expression in the IF clause is not true.
• If the expression in the IF clause is true, the statement1 shall be executed for the
current DATA step iteration and the ELSE statement, if present, shall be skipped.
• If the expression in the IF clause is false, the statement1 shall be skipped for the
current DATA step iteration and the ELSE statement, if present, shall be executed.
• The ELSE statement, if used, must immediately follow the IF-THEN statement.
Only one ELSE statement is allowed for each IF-THEN statement.
• expression is often in the form of:
operand operator operand
• where each operand is either a variable name, a constant, or any
valid expression.
• Some commonly used operators in expression:
Definition                 Symbol   Mnemonic
Equal to                   =        EQ
Not equal to               ^=       NE
Greater than               >        GT
Less than                  <        LT
Greater than or equal to   >=       GE
Less than or equal to      <=       LE
Equal to one of a list              IN
• Numeric comparisons
• SAS makes comparisons for numeric variables based on their current
value in the PDV.
• For example, in expression A >= B, if A has the value 4 and B has the
value 3, then the expression returns a TRUE value. It then executes the
respective statement. If A is 5 and B is 9, then the expression returns a
FALSE value. It does not execute the respective statement.
• A blank (or missing) numeric value (default is period, '.') is smaller
than any other numeric value.
• One can use the IN operator to search for multiple ranges of numeric
values. For example, the expression
VarA IN (1:3, 6:9) /* only integer values are considered*/
is equivalent to the expression
VarA IN (1,2,3,6,7,8,9) /*commas can be replaced by blanks*/
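The behaviour of IN with ranges can be tried with a small DATA step. This is a sketch under assumptions: the data set name in_demo, the variable in_list, and the five test values are made up for illustration.

```sas
data in_demo;
   input VarA @@;
   length in_list $ 3;
   /* 1:3 and 6:9 expand to the integers 1,2,3 and 6,7,8,9 */
   if VarA in (1:3, 6:9) then in_list = 'Yes';
   else in_list = 'No';
   datalines;
2 2.5 4 7 .
;
run;
```

Only 2 and 7 should be flagged 'Yes'. The value 2.5 lies between 1 and 3 but is not an integer, and the missing value matches nothing in the list.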
• Character comparisons
• By default, character comparison is based on the ASCII collating sequence,
shown here from the smallest to the largest displayable character:
blank!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
[\]^_abcdefghijklmnopqrstuvwxyz{}~
• A blank or missing character value is smaller than any other displayable
character. An uppercase letter is smaller than a lowercase letter.
• Trailing blanks are ignored in comparison. For example, 'AAA' = 'AAA '.
• Blanks at the beginning and in the middle of the character value are
significant to SAS. For example, ' fox' is not equivalent to 'fox'.
• One may compare only a specified prefix of a character expression by
using a colon (:) after a comparison operator. For example, the result of
IF 'SAMMY' =: 'SA' is TRUE.
• One can use the IN operator with character strings to determine whether
a character value is among a list of character values. For example, IF
VarX in ('David', 'Mary', 'John') .
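The character comparison rules above can be checked in the SAS log with the sketch below. It assumes an ASCII-based system (the collating order differs under EBCDIC); DATA _NULL_ and the message texts are illustrative choices.

```sas
data _null_;
   /* trailing blanks are ignored */
   if 'AAA' = 'AAA  ' then put 'Trailing blanks are ignored';
   /* leading blanks are significant */
   if ' fox' ne 'fox' then put 'Leading blanks are significant';
   /* =: compares only the first two characters here */
   if 'SAMMY' =: 'SA' then put 'Prefix comparison is true';
   /* uppercase letters come before lowercase letters in ASCII */
   if 'A' < 'a' then put 'Uppercase is smaller than lowercase';
run;
```

On an ASCII system all four messages should appear in the log.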
• Example 2.6: Group month values into the quarter of a
year and rate the sales values as Good or Fair.
data case5;
input month : $10. sales @@;
if month in: ('Jan' 'Feb' 'Mar') then qtr=1;
if month in: ('Apr' 'May' 'Jun') then qtr=2;
if month in: ('Jul' 'Aug' 'Sep') then qtr=3;
if month in: ('Oct' 'Nov' 'Dec') then qtr=4;
if sales > 4000 then rate='Good';
else rate='Fair';
datalines;
January 3500 February 2640 March 3350
April 1250 May 4350 June 5530
July 5320 August 3890 September 4220
October 2980 November 5680 December 3360
;
run;
• SAS sets the length of a character variable the first time the
variable is encountered at the compilation phase.
• Example 2.7: The length of Var2 is 3 bytes in Case6a, so the value
'Small' is truncated to 'Sma'. It is 5 bytes in Case6b.
data case6a;
input var1 @@;
if var1 > 20 then var2='Big';
else var2='Small';
datalines;
25 20 15
;
run;
data case6b;
input var1 @@;
if var1 <= 20 then var2='Small';
else var2='Big';
datalines;
25 20 15
;
run;
• One may use LENGTH statement to preset the length of the new variable.
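A minimal sketch of that fix, applied to Case6a. The data set name case6c is hypothetical; the LENGTH statement is placed before the first use of Var2 so that the length is set to 5 bytes regardless of which value is assigned first.

```sas
data case6c;
   length var2 $ 5;   /* preset the length before var2 is first used */
   input var1 @@;
   if var1 > 20 then var2='Big';
   else var2='Small';   /* no longer truncated to 'Sma' */
   datalines;
25 20 15
;
run;
```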
• In Example 2.6, a set of IF statements is used to compare
the Month variable with different sets of month names.
• The group of IF statements is programmatically correct,
but it is not very efficient. Even if the expression in
the first IF statement is TRUE, the expressions in the
following three IF statements are still checked
redundantly.
...
if month in: ('Jan' 'Feb' 'Mar') then qtr=1;
if month in: ('Apr' 'May' 'Jun') then qtr=2;
if month in: ('Jul' 'Aug' 'Sep') then qtr=3;
if month in: ('Oct' 'Nov' 'Dec') then qtr=4;
...
• If only one of the clauses in a set of IF statements may be true,
you can add multiple ELSE IF clauses to the group of IF
statements.
• A group of IF-THEN/ELSE statements with multiple ELSE IF
clauses has the following form:
IF condition THEN statement ;
ELSE IF condition THEN statement ;
< more ELSE IF statements >
< ELSE statement ; >
• The last optional ELSE clause, if specified, becomes a default that is
automatically executed when all previous IF/ELSE IF clauses in the same
group are not true.
• There must be no other statements between the first IF statement and the
last ELSE IF statement (or the ELSE statement, if present).
• Example 2.6a: Group month values into the quarter of a
year and rate the sales values as Good or Fair.
data case5;
input month : $10. sales @@;
if month in: ('Jan' 'Feb' 'Mar') then qtr=1;
else if month in: ('Apr' 'May' 'Jun') then qtr=2;
else if month in: ('Jul' 'Aug' 'Sep') then qtr=3;
else if month in: ('Oct' 'Nov' 'Dec') then qtr=4;
if sales > 4000 then rate='Good';
else rate='Fair';
datalines;
January 3500 February 2640 March 3350
April 1250 May 4350 June 5530
July 5320 August 3890 September 4220
October 2980 November 5680 December 3360
;
run;
• Example 2.8:
data case7;
input var1 @@;
length var2 $ 7;
if var1 > 20 then var2 = 'Big';
else if var1 > 10 then var2 = 'Medium';
else if var1 =. then var2 = 'Unknown';
else var2 = 'Small';
datalines;
25 20 15
11 . 8
14 7 .
;
run;
• If more than one statement needs to be executed when the
IF expression is TRUE or FALSE, the statements to be
executed must be contained within a DO group in this way:
IF expression1 THEN DO;
[Group 1 statements]
END;
< ELSE IF expression2 THEN DO;
[Group 2 statements]
END; >
< ELSE DO;
[Group 3 statements]
END; >
• The statements between the DO and END statements are called a DO
group.
• The DO and END statements are executable statements.
• If expression1 is TRUE, Group 1 statements will be executed.
• If expression1 is FALSE and expression2 is TRUE, Group 2
statements will be executed.
• If both expression1 and expression2 are FALSE, Group 3 statements
will be executed.
• Example 2.10:
data case9;
input id $ course $ marks @@;
length lecturer $ 8;
if course ='MS1234' then do;
lecturer='AB Chan';
classsize=45;
end;
else if course ='MS3456' then do;
lecturer='EF Grand';
classsize=30;
end;
else do;
lecturer='Others';
classsize = .;
end;
datalines;
1234 MS3456 65 1234 MS1234 59
2345 MS1234 75 3456 MS3456 81
4567 MS1111 68 5678 MS2222 100
;
run;
• If multiple conditions need to be checked before statements are
executed, you can nest another IF-THEN/ELSE statement
within an IF-THEN/ELSE statement in this way:
IF condition1 THEN
   IF condition2 THEN
      IF condition3 THEN statement;
      < ELSE statement; >
   < ELSE statement; >
< ELSE statement; >
• ELSE IF and ELSE statements belong to their immediately
preceding IF-THEN structure.
• Use this kind of nested structure with caution.
• Example 2.11: This program is correct in SAS syntax but
has low readability.
data case10a;
input age gender $ @@;
if age > 25 then
if gender = 'F' then group='1F';
else if gender='M' then group='1M';
else group='1U';
else if age > 20 then
if gender = 'F' then group='2F';
else if gender='M' then group='2M';
else group='2U';
else
if gender = 'F' then group='3F';
else if gender='M' then group='3M';
else group='3U';
datalines;
20 F 18 M 21 M 26 M
28 . 35 F 16 . 17 F
22 M
;
run;
• Example 2.12: Same program but with improved readability.
data case10a;
input age gender $ @@;
if age > 25 then do;
if gender = 'F' then group='1F';
else if gender='M' then group='1M';
else group='1U';
end;
else if age > 20 then do;
if gender = 'F' then group='2F';
else if gender='M' then group='2M';
else group='2U';
end;
else do;
if gender = 'F' then group='3F';
else if gender='M' then group='3M';
else group='3U';
end;
datalines;
20 F 18 M
21 M 26 M
28 . 35 F
16 . 17 F
22 M
;
run;
• Commonly used logical operators
• Use the AND (&) logical operator to execute the THEN statement only
when the two expressions linked by AND in an IF or ELSE IF clause are
both true.
• A nested IF-THEN statement such as
IF Age >25 THEN IF Gender = 'F' THEN Group = '1F';
states that if both expressions are true, Group equals '1F'.
• The above nested IF-THEN statements can be reduced to a single IF-THEN
statement by joining the conditions with the logical operator AND:
IF age >25 AND gender = 'F' THEN Group = '1F';
• Using the AND operator greatly improves the readability of the
statement.
• In SAS, two comparisons with a common variable linked by AND
can be condensed with an implied AND.
• For example, the following two IF statements produce the same
result:
IF 16 <= age AND age < 65 THEN …
IF 16 <= age < 65 THEN …
• Use the OR (|) logical operator to execute the THEN statement when
at least one of the two expressions linked by OR is true.
• For example, the following two statements
IF course1 > 80 THEN grade = 'A' ;
IF course2 > 80 THEN grade = 'A' ;
can be combined as
IF course1 > 80 OR course2 >80 THEN grade = 'A' ;
• AND and OR operators can be used in the same statement, but
the AND operator is evaluated before the OR operator.
• Parentheses ( ) may be used to group conditions. For example:
IF course1 > 80 AND (course2 >80 OR course3 > 80) THEN grade = 'A' ;
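The effect of the evaluation order can be seen by running the same condition with and without parentheses. This is a sketch with made-up marks (50, 90, 90); DATA _NULL_ and the PUT messages are used only to show the outcome in the log.

```sas
data _null_;
   course1 = 50; course2 = 90; course3 = 90;
   /* AND is evaluated first: (course1 > 80 AND course2 > 80) OR course3 > 80 */
   if course1 > 80 and course2 > 80 or course3 > 80
      then put 'Without parentheses: TRUE';
      else put 'Without parentheses: FALSE';
   /* parentheses force the OR to be evaluated first */
   if course1 > 80 and (course2 > 80 or course3 > 80)
      then put 'With parentheses: TRUE';
      else put 'With parentheses: FALSE';
run;
```

With these values the first condition should be TRUE and the second FALSE, because course1 fails the AND in both cases but course3 alone satisfies the unparenthesized OR.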
• Use the NOT (^ or ~) logical operator with other operators to
reverse the logic of a comparison, such as NOT >, NOT =, NOT
IN, etc.
• Example statements:
IF course1 NOT <= 80 OR course2 NOT <= 80 THEN grade = 'A' ;
IF course1 NOT IN (1:60, 80:100) THEN grade = 'B' ;
• Avoid putting too many logical operators into one IF-THEN/ELSE
statement, as it will be hard to trace the source of an error when
something goes wrong.
Executing SAS DATA Step Statements Repeatedly
• It is often necessary to apply the same operation to multiple
variables, such as:
• To define a set of variables with the same name, except that the
last character or characters differ. For example:
INPUT Day1 Day2 … Day30 ;
• To apply the same calculation to a group of variables with
similar names. For example:
Income1 = Revenue1 - Expense1;
Income2 = Revenue2 - Expense2;
…
Income12 = Revenue12 - Expense12;
• To replace blank (or missing) values of a group of variables
with substitute values. For example:
IF Income1 = . THEN Income1 = 0;
IF Income2 = . THEN Income2 = 0;
…
IF Income12 = . THEN Income12 = 0;
•SAS Variable Lists
• A SAS variable list is an abbreviated method of referring to a
list of variable names.
• Numbered range lists:
• Applies to a series of variables with the same name, except for the last
character or characters which are consecutive numbers.
• The abbreviated list may begin with any number and end with any
number which is larger than the beginning number.
• Example 2.13:
data case11;
input branch $ month1 - month5;
datalines;
A 23 42 23 26 19
B 32 23 67 68 44
C 56 45 83 34 67
;
run;
•SAS Variable Lists (Self-study)
• Name Range Lists
• A name range list relies on the order of variable definition, i.e., the order
in which the variables appear in the DATA step (or equivalently, in the PDV).
• The variables must already be defined before they are referred to in
the name range list.
• A name range list can take one of the following forms, assuming
XXX and ZZZ are variables defined in the current DATA step and XXX
appears before ZZZ in the DATA step statements (or PDV):
• XXX--ZZZ refers to all variables from XXX to ZZZ, inclusive, in the PDV.
• XXX-NUMERIC-ZZZ refers to all numeric variables from XXX to ZZZ,
inclusive, in the PDV.
• XXX-CHARACTER-ZZZ refers to all character variables from XXX to ZZZ,
inclusive, in the PDV.
• For example, consider the statement
INPUT IDNUM NAME $ YEARS SALESAMT UNITSOLD ;
• The name range list NAME--SALESAMT refers to variables
NAME, YEARS, and SALESAMT.
• IDNUM-NUMERIC-SALESAMT refers to IDNUM, YEARS,
and SALESAMT.
• Name Prefix Lists
• Refers to all variables that begin with a specified character string.
The variables must already be defined before they are referred to.
• Example: The list SALES: refers to all variables that begin with
"SALES", such as SALES_JAN, SALES_FEB, SALES3, and
SALESXXX.
• Special SAS Name Lists
• _NUMERIC_
• Specifies all numeric variables that are already defined in the
current DATA step.
• _CHARACTER_
• Specifies all character variables that are already defined in the
current DATA step.
• _ALL_
• Specifies all variables that are already defined in the current
DATA step.
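The variable lists can be tried outside the DATA step as well. The sketch below reuses the INPUT statement from the name range example; the data set name staff and the two data records are invented for illustration, and PROC PRINT with a VAR statement is used here only as a convenient way to display the selected variables.

```sas
data staff;
   input idnum name $ years salesamt unitsold;
   datalines;
101 Chan 5 23000 120
102 Lee 3 18500 95
;
run;

/* name range list: NAME, YEARS, SALESAMT */
proc print data=staff;
   var name -- salesamt;
run;

/* numeric variables from IDNUM to SALESAMT: IDNUM, YEARS, SALESAMT */
proc print data=staff;
   var idnum -numeric- salesamt;
run;

/* special name list: all character variables (NAME only) */
proc print data=staff;
   var _character_;
run;
```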
•ARRAY statement
• It provides an alternative method for referring to a variable
rather than using the name of the variable.
• Syntax
ARRAY arrayname[n] <$> <w> <variable1 variable2 …> ;
• Descriptions
• Valid in DATA step. It is a declarative statement.
• arrayname is the name of the array. It must not be the name of a
variable in the same DATA step.
• arrayname is not a variable and it will not appear in PDV or the
created SAS data set.
• n is the number of variables grouped in the array. If n is replaced
by *, SAS determines the size of the array by counting the
variables assigned to the array.
• n must be surrounded by either ( ), { }, or [ ].
• $ is needed if the variables to be grouped are character type and
the assigned variables have not been defined before the ARRAY
statement.
• w defines the length of the variable. Default length is 8 bytes.
• variable1, variable2, etc. are the names of the variables to be
grouped. Any applicable SAS variable list can be used.
All listed variables must be of the same type. If no
variable is listed, SAS creates the variables by adding a number
suffix from 1 to n to the name of the array.
• Example 2.14:
data case12a;
input branch $ rev1-rev6 exp1-exp6;
array revenue[6] rev1-rev6;
array expense[6] exp1-exp6;
array income[6];
income[1]=revenue[1] - expense[1];
income[2]=revenue[2] - expense[2];
income[3]=revenue[3] - expense[3];
income[4]=revenue[4] - expense[4];
income[5]=revenue[5] - expense[5];
income[6]=revenue[6] - expense[6];
datalines;
A 21 33 12 21 43 23 11 21 15 18 20 24
B 34 25 26 67 43 23 23 21 25 26 29 30
C 21 30 26 43 30 32 18 20 15 24 13 26
;
run;
• These six assignment statements are almost identical. They differ only
in the index number used in each array.
• DO loop statements
• It tells SAS to run a group of executable statements repeatedly within
one DATA step iteration for a specified number of times.
• Syntax
DO index_variable = k TO m < BY increment_amount > ;
… SAS statements …
END ;
• Descriptions
• Valid in DATA step. It is an executable statement.
• index_variable must be a numeric variable, whereas k, m, and
increment_amount can be either numeric variables or numeric constants.
• index_variable is a variable that changes its value at each iteration of
the loop.
• It takes value k at the beginning of the DO loop, and it ends with a
value > m when the DO loop stops.
• increment_amount controls how the value of index_variable changes.
Its default value is 1.
• SAS executes each statement between DO and END statements
sequentially and repeatedly.
• index_variable equals k to start with. If index_variable
<= m, SAS executes the statements between the DO and END
statements.
• At the END statement, index_variable changes by the amount
of increment_amount.
• SAS goes back to the DO statement and checks the value of
index_variable. If index_variable <= m, SAS executes each
statement between the DO and END statements again. If
index_variable > m, SAS proceeds to execute the statements
after the END statement.
• The comparisons above assume a positive increment_amount. They are
reversed when increment_amount is negative.
• All variables involved in the DO statement will be sent to the
created data set by default.
• Example 2.15
data case13a;
   do i = 1 to 3;
      Var1 = i;
   end;
run;
• How many observations are there in Case13a?
• What are the values of each variable?
data case13b;
   do i = 3 to 1 by -1;
      Var1 = i;
   end;
run;
• Case13b uses a backward DO loop.
• In practice, index_variable of the DO loop statement is often
dropped from the created data set using the DROP statement.
• It is good programming practice to indent the statements
between the DO and END statements.
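Because SAS writes the PDV to the data set only once, at the end of the single DATA step iteration, Case13a is expected to contain one observation with i = 4 and Var1 = 3 (the loop stops when i exceeds 3). The sketch below is a variant, under the hypothetical name case13c, that writes one observation per loop pass by using an explicit OUTPUT statement.

```sas
data case13c;
   do i = 1 to 3;
      Var1 = i;
      output;   /* write the PDV on every pass of the loop */
   end;
run;
```

Case13c should contain three observations, with i and Var1 both running from 1 to 3. An explicit OUTPUT statement also turns off the automatic output at the end of the DATA step iteration.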
• Example 2.16: Repeat Example 2.14, but compute the values
of variables Income1 to Income6 by using a DO loop.
data case12b;
input branch $ rev1-rev6 exp1-exp6;
array revenue[6] rev1-rev6;
array expense[6] exp1-exp6;
array income[6];
do index = 1 to 6;
income[index]=revenue[index] - expense[index];
end;
drop index;
datalines;
A 21 33 12 21 43 23 11 21 15 18 20 24
B 34 25 26 67 43 23 23 21 25 26 29 30
C 21 30 26 43 30 32 18 20 15 24 13 26
;
run;
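The same array and DO loop pattern also covers the missing-value replacement mentioned at the start of this section. The sketch below is illustrative only: the data set name case12c and the two data records are made up, and six income variables are used instead of twelve to keep it short.

```sas
data case12c;
   input branch $ income1-income6;
   array income[6];   /* refers to income1-income6 read above */
   do index = 1 to 6;
      /* replace a missing value with 0 */
      if income[index] = . then income[index] = 0;
   end;
   drop index;
   datalines;
A 10 . 12 . 3 5
B . 8 9 4 . 7
;
run;
```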
Selecting Observations
• OUTPUT statement
• By default, SAS writes the contents in the PDV to the created data set at
the end of each DATA step iteration. The OUTPUT statement can be
used to override this setting.
• Syntax
OUTPUT <datasetname(s)> ;
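As a sketch of how the OUTPUT statement can put selected records into more than one data set within one DATA step, the example below splits the first six months of the Example 2.2 sales data. The data set names good and fair and the 4000 cut-off (borrowed from Example 2.6) are illustrative choices, not part of the original slide.

```sas
data good fair;   /* both data sets are created in one DATA step */
   input month $ sales @@;
   if sales > 4000 then output good;   /* write the PDV to Good only */
   else output fair;                   /* write the PDV to Fair only */
   datalines;
Jan 3500 Feb 2640 Mar 3350
Apr 1250 May 4350 Jun 5530
;
run;
```

Good should receive the May and June records and Fair the other four.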
...
input status : $10. @; /*or @@ */
if status = 'Employee' then do;
input lastname : $10. firstname : $10.;
end;
else if status = 'Dependent' then do;
input dependname : $10. relationship $ age;
output;
end;
retain lastname firstname;
drop status;
run;
• Employee Daniel Wong does
not appear in the created
SAS data set because he
does not have a dependent
record in the raw data.
Reading Hierarchical Records
•Example 2.24. Refer to the data file MultiRecords2.txt.
Suppose we want to create a SAS data set that contains
each employee's monthly payroll deduction for insurance
premium. The deduction is computed as follows:
• The insurance is free for the employee;
• $100 per month for a spouse's insurance if applicable;
• $60 per month for each child's insurance if applicable;
Show only 1 observation for each employee in the SAS data set
with the following variables: the employee's first name, the
employee's last name, insurance deduction for the spouse (show
value 0 if not applicable), insurance deduction for all children
(show value 0 if not applicable), and the total insurance
deduction (show value 0 if not applicable).
• Example 2.24, continued. The first few observations of the
created data set look like this:
• The logic of the SAS DATA step program is as follows:
• Input the first field of a record and hold the record in the input buffer.
• If the value of the first field equals "Employee":
• If the DATA step is not in its first iteration, compute the total amount of
insurance deduction for the previous employee, then output to the SAS data set.
• Then proceed with an INPUT for the new employee's last name and
first name, and set the amount of each insurance deduction to 0. The new
employee is not output yet.
• If the value of the first field equals "Dependent", proceed with an INPUT
for the dependent's name and relationship. Accumulate the amount of each insurance
deduction according to the relationship. No output to the data set.
• Retain the employee's first name and last name in the PDV for the next
DATA step iteration.
•Example 2.24, continued.
data case20;
infile 'MS3251_2021/Raw Data/MultiRecords2.txt' dlm=',';
length lastname firstname $ 10 ;
input status : $10. @;
if status = 'Employee' then do;
if _n_ ^= 1 then do; /*Checking the number of DATA step iteration*/
insure_total = insure_spouse + insure_child;
output;
end;
input lastname firstname;
insure_spouse=0;
insure_child=0;
end;
else if status ='Dependent' then do;
input dependname : $10. relation $;
if relation = 'S' then insure_spouse + 100;
else if relation = 'C' then insure_child + 60;
end;
retain lastname firstname;
keep lastname firstname insure_spouse insure_child insure_total;
run;
•Example 2.24, continued. The created data set from
the DATA step is:
• The above DATA step fails to output the last observation
(Mary Fong) from the PDV to the SAS data set as the
output action only happens when status = 'Employee' and
_n_ ^= 1 at the beginning of a DATA step iteration.
Reading Hierarchical Records
• INFILE statement option END=variable_name
• It tells SAS to create a temporary variable named variable_name.
• variable_name is any valid SAS variable name that is not used
in the DATA step for other purposes.
• Its value remains 0 until SAS processes the last data record.
• Its value equals 1 while the last record in the raw data file is being
processed.
• It appears in the PDV but is not written to the SAS data set.
• The option cannot be used with the instream data input method.
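A minimal sketch of the END= option, separate from Example 2.24. Because END= cannot be combined with instream data, the sketch first writes a small scratch file. The fileref scores, the variable lastrec, and the three data values are hypothetical and used only for illustration.

```sas
/* Assumption: "scores" is a made-up temporary fileref created only for this sketch. */
filename scores temp;

data _null_;
   file scores;          /* write three one-field records to the scratch file */
   put '10';
   put '20';
   put '30';
run;

data total_only;
   infile scores end=lastrec;   /* lastrec is 0 until the last record is read, then 1 */
   input score;
   total + score;               /* sum statement: total is retained across iterations */
   if lastrec then output;      /* write one observation, at the end of the file only */
   keep total;                  /* lastrec is never written to the data set */
run;
```

The data set total_only should hold a single observation, because OUTPUT is executed only in the iteration that reads the last record.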
Reading Hierarchical Records
•Example 2.24, continued. Modify the DATA step to
output the PDV when the last record is being dealt with.
data case20a;
   infile 'MS3251_2021/Raw Data/MultiRecords2.txt' dlm=',' end=eofile;
   length lastname firstname $ 10;
   input status : $10. @;       /* read the record type and hold the record */
   if status = 'Employee' then do;
      if _n_ ^= 1 then do;      /* finish the previous employee */
         insure_total = insure_spouse + insure_child;
         output;
      end;
      input lastname firstname;
      insure_spouse = 0;
      insure_child = 0;
   end;
   else if status = 'Dependent' then do;
      input dependname : $10. relation $;
      if relation = 'S' then insure_spouse + 100;
      else if relation = 'C' then insure_child + 60;
   end;
   if eofile = 1 then do;       /* last record: output the employee still held in the PDV */
      insure_total = insure_spouse + insure_child;
      output;
   end;
   retain lastname firstname;
   keep lastname firstname insure_spouse insure_child insure_total;
run;
SAS Functions in DATA step
•A SAS function performs computation with one or more
variables and/or constants over the same observation and
returns a value.
•SAS functions include mathematical functions,
statistical functions, date functions, character functions,
and others.
•Read "SAS Products | Base SAS | SAS 9.4 Functions
and Call Routines: Reference" under "Help | SAS Help
and Documentation" for the full list of available SAS
functions.
SAS Functions in DATA step
• SAS function syntax:
Function_name(argument1 <, …, argumentn>)
Function_name(OF abbreviated_variable_list)
Function_name(OF array_name[*])
• Descriptions
• Valid in the DATA step. A function call is part of an expression inside an
executable statement.
• Function_name must be immediately followed by a pair of parentheses.
• It can be used in an assignment statement, in an IF-THEN expression, or in a
DO WHILE expression.
• If used in an assignment statement, the function must be placed on the
right-hand side of the statement.
• The parentheses may contain one argument, more than one argument, or no
argument (i.e. empty parentheses).
• Multiple arguments are separated by commas.
• An argument can be a variable name, a constant, a variable list, an array,
another SAS function, or a valid SAS expression. If a variable list or an
array is specified, precede the list or the array with the OF operator. For an
array, also include [*] after the array name.
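The three syntax forms above can be sketched as follows. The data set name func_forms, the variables q1-q3, and the data values are hypothetical and chosen only for illustration.

```sas
data func_forms;
   input q1 q2 q3;
   array q[3] q1-q3;
   total_args  = sum(q1, q2, q3);     /* arguments separated by commas */
   total_list  = sum(of q1-q3);       /* OF with an abbreviated variable list */
   total_array = sum(of q[*]);        /* OF with an array reference */
   run_date    = today();             /* function with empty parentheses */
   if max(of q1-q3) > 50 then flag = 'High';   /* function inside an IF-THEN expression */
   else flag = 'Low';
   format run_date date9.;
datalines;
10 20 30
40 . 60
;
run;
```

All three SUM calls return the same value for a given observation; they differ only in how the arguments are listed.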
SAS Functions in DATA step (Self-Study)
•Mathematical functions
• Most take a single argument
• Example 2.25:
Function name Description
ABS Returns the absolute value
EXP Returns the value of the exponential function
LOG Returns the natural (base e) logarithm
LOG10 Returns the logarithm to base 10
SQRT Returns the square root of a value
data case21;
input id income @@;
log_income = log(income);
datalines;
1 4540 2 4670 3 5100 4 2600
5 3750 6 . 7 8213 8 4879
;
run;
SAS Functions in DATA step (Self-Study)
•Truncation functions
• Argument can be a numeric constant, a variable, or an
expression.
• Statement examples:
Function                         Description
INT                              Returns the integer portion of the argument
ROUND(argument)                  Rounds the argument to the nearest integer
ROUND(argument, rounding unit)   Rounds the first argument to the nearest
                                 multiple of the second argument
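A short sketch of the truncation functions in the table above. The data set name trunc_demo and the constants are hypothetical; the values in the comments follow from the definitions of INT and ROUND.

```sas
data trunc_demo;
   x = 123.4567;
   int_x     = int(x);           /* 123: integer portion */
   round_x   = round(x);         /* 123: nearest integer */
   round_hun = round(x, 0.01);   /* 123.46: nearest multiple of 0.01 */
   round_ten = round(x, 10);     /* 120: nearest multiple of 10 */
   int_neg   = int(-2.7);        /* -2: INT drops the fraction, it does not round down */
run;
```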
SAS Functions in DATA step (Self-Study)
•Statistical functions
• Often involve more than one argument.
• All computations are across listed variables in the
argument for each observation.
• Missing values are ignored and are not included in
computation.
Function name Description
MAX Returns the largest value
MEAN Returns the arithmetic mean
MEDIAN Returns the median value
MIN Returns the minimum value
STD Returns the standard deviation
SUM Returns the sum of the nonmissing arguments
VAR Returns the variance
N Returns the number of non-missing values
NMISS Returns the number of missing values
SAS Functions in DATA step (Self-Study)
•Statistical functions
• Example 2.26
data case22;
infile datalines dsd ;
input sales1-sales6;
sales_min=min(sales1, sales2, sales3, sales4, sales5, sales6);
sales_manual_sum=sales1+sales2+sales3+sales4+sales5+sales6;
sales_sum=sum(of sales1-sales6);
array sales[6] sales1-sales6;
sales_n=n(of sales[*]);
sales_ave=mean(of sales[*]);
keep sales_:;
datalines;
100,145,,195,132,196
56,75,155,245,288,258
150,77,315,220,316,158
113,95,256,,290,310
;run;
SAS Functions in DATA step (Self-Study)
•Character functions
• Only used for character variables
Function name                            Description
CAT(string1 <, … stringn>)               Concatenates character strings without removing
                                         leading or trailing blanks
CATS(string1 <, … stringn>)              Concatenates character strings and removes leading
                                         and trailing blanks
CATT(string1 <, … stringn>)              Concatenates character strings and removes trailing
                                         blanks
CATX(separator, string1 <, … stringn>)   Concatenates character strings, removes leading and
                                         trailing blanks, and inserts separators
COMPBL                                   Replaces each run of multiple blanks in a character
                                         argument with a single blank
FIND(argument, 'substring')              Searches for a specific substring of characters within
                                         an argument and returns its starting position
LEFT                                     Left aligns argument
SAS Functions in DATA step (Self-Study)
•Character functions
Function name                            Description
LENGTH                                   Returns the length of a non-blank character string,
                                         excluding trailing blanks, and returns 1 for a blank
                                         character string
LENGTHC                                  Returns the length of a character string, including
                                         trailing blanks
LENGTHN                                  Returns the length of a non-blank character string,
                                         excluding trailing blanks, and returns 0 for a blank
                                         character string
LOWCASE                                  Converts all letters in an argument to lowercase
RIGHT                                    Right aligns argument
SUBSTR(argument, position <, length>)    Extracts a substring of argument beginning with the
                                         character at the specified position and of the
                                         specified length. If length is omitted, SAS extracts
                                         the remainder of the expression
TRIM                                     Removes trailing blanks from character expressions and
                                         returns one blank if the expression is missing
TRIMN                                    Removes trailing blanks from character expressions and
                                         returns a null string if the expression is missing
UPCASE                                   Converts all letters in an argument to uppercase
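A small sketch contrasting LENGTH, LENGTHN, and LENGTHC on a blank and a non-blank value. The data set name length_demo and the variables blank and word are hypothetical; both variables are declared with a length of 8 bytes.

```sas
data length_demo;
   length blank word $ 8;
   blank = ' ';
   word  = 'SAS';
   len_blank  = length(blank);    /* 1: LENGTH returns 1 for a blank string */
   lenn_blank = lengthn(blank);   /* 0: LENGTHN returns 0 for a blank string */
   lenc_blank = lengthc(blank);   /* 8: LENGTHC counts the trailing blanks */
   len_word   = length(word);     /* 3: trailing blanks are excluded */
   lenc_word  = lengthc(word);    /* 8: the declared length of the variable */
run;
```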
SAS Functions in DATA step (Self-Study)
•Character functions
• Example 2.27
data case23;
input var1 $10. var2 $char10.;
lenc1=lengthc(var1);
lenc2=lengthc(var2);
len1=length(var1);
len2=length(var2);
trimlenc1=lengthc(trim(var2));
trimlenc2=lengthc(cats(var2));
cat1=cat(var1,var2);
cats2=cats(var1,var2);
catx3=catx(',',var1,var2);
datalines;
Part1 Part2
12345678901234567890
;
run;
SAS Functions in DATA step (Self-Study)
•Character functions
• Example 2.28
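data case24;
   infile datalines dlm=',';
   input name : $20. Sex $;
   new_name = compbl(name);                           /* collapse repeated blanks */
   blank_pos = find(new_name, ' ');                   /* position of the first blank */
   name_len = length(new_name);
   last_name = substr(new_name, blank_pos + 1);       /* text after the first blank */
   first_name = substr(new_name, 1, blank_pos - 1);   /* text before the first blank */
   sex = upcase(sex);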
datalines;
Mary Chan, f
Tom Ng, M
David Wong, m
Betty Chung, F
;
run;
DATA Step Functions (Self-Study)
•Character functions
SCAN(source, n, 'delimiter_list')
   Returns the nth substring of source, where substrings are separated by the
   characters in 'delimiter_list'.
   • source is the string from which the specified substring is to be extracted.
   • n is the position of the substring to be selected from source.
   • 'delimiter_list' can list one or multiple separators.
COMPRESS(source, 'char_list', 'modifier_list')
   Returns a string with the characters specified by 'char_list' and
   'modifier_list' removed (or kept if requested) from source.
   • source is the string from which specified characters will be removed.
   • 'char_list' contains a list of characters to be removed by default. It can be empty.
   • 'modifier_list' tells COMPRESS what to do with the 'char_list' and adds more
     characters to 'char_list'. It can be empty. The following are commonly used
     modifiers:
     a   adds alphabetic characters to 'char_list'.
     d   adds digits to 'char_list'.
     i   ignores the case of the characters to be kept or removed.
     k   keeps only the characters in 'char_list' instead of removing them.
     l   adds lowercase letters to 'char_list'.
     p   adds punctuation marks to 'char_list'.
     s   adds space characters (blank, tab, line feed etc.) to 'char_list'.
     u   adds uppercase letters to 'char_list'.
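A brief sketch of SCAN with several delimiters and COMPRESS with a 'char_list' and with modifiers. The data set name scan_compress_demo and the phone value are hypothetical.

```sas
data scan_compress_demo;
   phone = '(852) 2345-6789';
   part1       = scan(phone, 1, '() -');     /* 852: first substring between delimiters */
   part3       = scan(phone, 3, '() -');     /* 6789: third substring */
   no_brackets = compress(phone, '()-');     /* 852 23456789: listed characters removed */
   digits_only = compress(phone, , 'kd');    /* 85223456789: keep digits only */
   no_punct    = compress(phone, , 'ps');    /* 85223456789: punctuation and spaces removed */
run;
```

In the last two calls the 'char_list' argument is left empty, so the modifiers alone decide which characters are kept or removed.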
SAS Functions in DATA step (Self-Study)
•Character Functions
• Example 2.29
data case25;
input name $ 1-20;
surname=scan(name, 1, ' ');
givenname1=scan(name, 2, ' ');
givenname2=scan(name, 3, ' ');
givennames=catx(' ', givenname1, givenname2);
datalines;
Chen Zhi Qiang
Wang Da Ming
Li Xiao Yan
Cheung Juan
; run;
SAS Functions in DATA step (Self-Study)
•Character Functions
• Example 2.30
data case26;
input productcode: $10. @@;
product=compress(productcode,, 'ka'); /*keep only letters*/
code=compress(productcode,, 'a'); /*remove all letters*/
datalines;
Aa235 BXT3218 6789ZYV 316X
;
run;
SAS Functions in DATA step (Self-Study)
•Date functions
• Most arguments are SAS date values.
Function                Description
DAY                     Returns the day of the month from a SAS date value
MDY(month, day, year)   Returns a SAS date value from numeric expressions
                        of month, day, and year values
MONTH                   Returns the month from a SAS date value
TODAY( )                Returns the current date as a SAS date value;
                        takes no argument
WEEK                    Returns the week number of the year from a SAS date value
WEEKDAY                 Returns the day of the week from a SAS date value,
                        where 1=Sunday, 2=Monday, …, 7=Saturday
YEAR                    Returns the year from a SAS date value
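A minimal sketch of the date functions in the table, applied to one date. The data set name date_parts and the chosen date (29 October 2022, a Saturday) are for illustration only.

```sas
data date_parts;
   d = mdy(10, 29, 2022);     /* build a SAS date value from month, day, year */
   day_part   = day(d);       /* 29 */
   month_part = month(d);     /* 10 */
   year_part  = year(d);      /* 2022 */
   dow        = weekday(d);   /* 7: Saturday, since 1=Sunday */
   run_date   = today();      /* depends on the date the step is run */
   format d run_date date9.;  /* display the date values as calendar dates */
run;
```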
SAS Functions in DATA step (Self-Study)
•Date functions
• Example 2.31
data case27;
input id birthday birthmonth birthyear @@;
birthdate=mdy(birthmonth,birthday,birthyear);
birthweek=week(birthdate);
birthweekday=weekday(birthdate);
cutoffdate='1jan2004'd; /*a date constant, stored as SAS date
value*/
day_diff=datdif(cutoffdate,birthdate, 'actual');
year_diff=yrdif(cutoffdate,birthdate, 'actual');
datalines;
1 31 12 2005 2 1 1 2006 3 28 2 2006 4 31 3 2006
;
run;
SAS Functions in DATA step (Self-Study)
•Date functions
• Example 2.32
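data case28;
   /* Results are given in the comments as calendar dates. Without a FORMAT
      statement the values print as SAS date numbers. */
   start_date='12Jul2008'd;
   new_date=intnx('day',start_date,60); output;                /* 10SEP2008 */
   new_date=intnx('day',start_date,-60); output;               /* 13MAY2008 */
   new_date=intnx('month',start_date,3); output;               /* 01OCT2008 */
   new_date=intnx('month',start_date,-3); output;              /* 01APR2008 */
   new_date=intnx('month',start_date,3,'end'); output;         /* 31OCT2008 */
   new_date=intnx('month',start_date,-3,'end'); output;        /* 30APR2008 */
   new_date=intnx('year',start_date,3); output;                /* 01JAN2011 */
   new_date=intnx('year',start_date,3,'sameday'); output;      /* 12JUL2011 */
   new_date=intnx('year',start_date,-3,'sameday'); output;     /* 12JUL2005 */
run;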
SAS Functions in DATA step (Self-Study)
•Date functions
• Example 2.33
data case29;
   /* INTCK counts the interval boundaries crossed between the two dates. */
   timegap=intck('day','1jan2008'd,'31dec2008'd); output;     /* 365 */
   timegap=intck('day','1jan2008'd,'31dec2007'd); output;     /* -1 */
   timegap=intck('month','1jan2008'd,'31dec2008'd); output;   /* 11 */
   timegap=intck('month','1jan2008'd,'31dec2007'd); output;   /* -1 */
   timegap=intck('year','1jan2008'd,'31dec2008'd); output;    /* 0 */
   timegap=intck('year','1jan2008'd,'31dec2007'd); output;    /* -1 */
run;
SAS Functions in DATA step (Self-Study)
•Special functions
Function                   Description
INPUT(source, informat)    Reads source with the specified informat and returns the
                           converted value. The returned value can be numeric or a
                           character string. Source identifies the variable or
                           constant whose value you want to convert. It must be a
                           character type variable or constant. Informat determines
                           how the value is read.
PUT(source, newformat)     Returns a value with a format different from that of source.
                           The returned value is always a character string. Source
                           identifies the variable or constant whose value you want
                           to reformat. Newformat must be the same type as the source.
                           If source is numeric, the returned value is right aligned
                           (filled with leading blanks). If source is character, the
                           returned value is left aligned.
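A short sketch of INPUT for character-to-numeric conversion and PUT for numeric-to-character conversion. The data set name convert_demo and all values are hypothetical; COMMA8., DATE9., YYMMDD10., and Z3. are standard SAS informats and formats.

```sas
data convert_demo;
   amount_txt = '1,234';
   amount_num = input(amount_txt, comma8.);   /* numeric 1234 */
   date_num   = input('29OCT2022', date9.);   /* numeric SAS date value */
   date_txt   = put(date_num, yymmdd10.);     /* character '2022-10-29' */
   padded     = put(7, z3.);                  /* character '007' */
   right_al   = put(7, 3.);                   /* character '  7', right aligned */
run;
```

INPUT always takes a character source, while PUT always returns a character value, so the two are typically used in opposite directions.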
SAS Functions in DATA step (Self-Study)
•Special functions
• Example 2.34
data case30;
   input id $ class $ year;
   newid1 = put(year, 2.);   /* convert to a character value of length 2 bytes */
   newid = cats(id, '/', class, '/', put(year, 2.));
   subid1 = substr(id, 2);              /* character value */
   subid2 = input(substr(id, 2), 8.);   /* numeric value */
datalines;
A1335 4D 2
B2468 5A 3
;run;

