In this section, we will apply some of the general techniques outlined above and related them to how you can implement these with SAS. While we recognize that many of the ideas presented here are in no way novel, we do cite original work when appropriate.
Generalized SAS Code
In later sections, we highlight robust techniques that can be used when code is running within the macro compiler in SAS. There are times, when you want to prevent or highlight issues in open code – that is code, which is running by itself and not part of a macro or larger application.
Tip # 1. In the final code, there should be no dead code that does not work or that is not used. This must be removed from the program.
For example, the /*and visitnum>1*/ is no longer in use and it needs to be removed for better readability:
proc sort data=adam.advs out=vs;
where saffl=’Y’ /*and visitnum >1*/ and trt1a in (1,2);
run;
Tip # 2. Error Checking Code should be included in the program where possible
Code to allow checking of the program or of the data (on all data or on a subset of patients such as clean patients, discontinued patients, patients with SAE or patients with odd data) is encouraged and should be built throughout the program. This code can be easily activated during the development phase or commented out during a production run using the piece of code
For example, the following code will report the partial date data into the LOG window:
data ae;
set raw.ae;
if length(compress(aestdt))=9 then aestdtn=input(aestdt,date9.);
else if aestdt>’ ‘ then put ‘ERROR: DATA PROBLEM: Partial Date issue for IF-THEN in AE step. ‘ usubjid= aestdt=;
run;
Tip # 3. Try to produce code that will operate correctly with unusual data in unexpected situations (e.g. missing data).
For example, it is important to exclude the missing data for lab normal range indicator flags in order to the common mistake of showing ‘LOW’ or ‘HIGH’ in the LBNRIND column for missing values of LBSTRESN:
data lb;
set raw.lb;
if .<lbstresn < lbstnrlo then lbnrind=’LOW’;
else if lbstresn>. and lbstnrhi>. and lbstnrlo>.
and lbstnrlo<=lbstresn<=lbstnrhi
then lbnrind=’NORMAL’;
else if lbstresn>lbstnrhi>. then lbnrind=’HIGH’;
run;
ERRORS, WARNINGS AND NOTES
As a number of contributors have noted (no pun intended!), paying attention to the log is critical. As SAS moves from a batch-execution environment, to “service” based, understanding where the various logs are critical.
Techniques described by Susan J. Slaughter & Lora D. Delwichexv include understanding errors in the made because of:
- Syntax
- missing semicolon
uninitialized variable and variable not found
- Data
- missing values were generated
- numeric to character conversion
- invalid data
- character field is truncated
- Logic
- DATA step produces wrong results but no error message.
Tip # 4. Whenever you look at the log, you should generally be on the lookout for these terms:
- error
- not referenced
- never been refere
- not resolved
- is not in the report def
- has more than one data set with repeats of by values
- current word or quoted string has become more than 200
- observation(s) outside the axis range
- is unknown
- warning
- uninitialized
- cannot be determined
- extraneous information
- missing values were generated
- observation(s) contained a MISSING value
- Invalid arguments
- truncated to 32 characters
- overwritten by
- can't modify it at this time
Tip # 5. Use the MSGLEVEL=I option in order to have all informational, note, warning, and error messages sent to the LOG.
For examplexvi:
OPTIONS MSGLEVEL=I;
PROC SQL;
SELECT *
FROM AE,
LAB
WHERE AE.SUBJID = LAB.SUBJID AND
TRT= 'A';
QUIT;
The log displays the following notation:
INFO: Index Rating selected for WHERE clause optimization.
Tip # 6. It is not acceptable to have avoidable notes or warnings in the log (mandatory).
Reason: They can often lead to ambiguities, confusion, or actual error (e.g. erroneous merging, uninitialized variables, automatic numeric/character conversions, automatic formatting, operation on missing data...). Note: If such a warning message is unavoidable, an explanation has to be given in the program (mandatory).
For example, merging data with the length (20) of USUBJID from dataset DOSE1 and the length (30) of USUBJID from dataset DOSE2:
data dose3;
merge dose1 dose2;
by usubjid;
run;
WARNING: Multiple lengths were specified for the BY variable pain by input data sets. This may cause unexpected results.
INFO: The variable trt1A on data set WORK.DOSE1 will be overwritten by data set WORK.DOSE2.
NOTE: There were 200 observations read from the data set WORK.DOSE1.
NOTE: There were 200 observations read from the data set WORK.DOSE2.
NOTE: The data set WORK.DOSE3 has 200 observations and 20 variables.
DATA STEP
Tip # 7. Be careful when merging datasets. Erroneous merging may occur when:
- No BY statement is specified (set system option MERGENOBY=WARN or ERROR).
- Some variables, other than BY variables, exist in the two datasets (set system option MSGLEVEL=I), S writes a warning to the SAS log whenever a MERGE statement would cause variables to be overwritten at which the values of the last dataset on the MERGE statement are kept).
- More than one dataset contain repeats of BY values. A WARNING though not an ERROR is produced in the LOG. If you really need, PROC SQL is the only way to perform such many-to-many merges.
Reason: One has to routinely carefully check the SASLOG as the above leads to WARNING messages rather ERROR messages yet the resulting dataset is rarely correct.
For example, merging two lab datasets with USUBJID, VISITNUM, and LBTESTCD, a note ‘statement has more than one data set with repeats of BY values’ is presented in the LOG. This will result in mis-merge. You may want to check the data to see if it is caused by duplicate records or more variables need to be added to the BY statement.
data lb3
merge lb1 lb2;
by usubjid visitnum lbtestcd;
run;
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 1000 observations read from the data set WORK.LB1.
NOTE: There were 1000 observations read from the data set WORK.LB2.
NOTE: The data set WORK.LB3 has 1000 observations and 28 variables.
Tip # 8. Route any error messages to the log window using the ERROR statement with a data step to detect unexpected data points or records.
For example, you want to ensure there is no record with missing AGE value. To give you alert, you can simply employ the ERROR statement:
data dm;
set raw.dm;
if age=. then error 'ERROR: Missing AGE for ' usubjid=;
run;
ERROR: Missing AGE for USUBJID=999-099-001001
STUDYID=DRUG A USUBJID=999-099-001001 SITEID=001 COUNTRY=USA AGE=. SEX=F RACE=5 ETHNIC=NOT HISPANIC OR LATINO TRT1PN=0 TRT1P=A RANDDTN=. HEIGHT1=. HEIGHT2=. WEIGHT1=. ERROR_=1 _N_=1
NOTE: There were 20 observations read from the data set WORK.DM.
NOTE: The data set WORK.DM has 20 observations and 14 variables.
Tip # 9. When coding IF-THEN-ELSE constructs use a final ELSE statement to trap any observations that do not meet the conditions in the IF-THEN clauses.
e.g. ELSE PUT variable= to the log window
Reason: You can only be sure that all possible combinations of data are covered if there is a final ELSE statement.
For example, to ensure all records in the VS dataset are within the study window, you can use the PUT statement to trap those records which do not meet the IF-THEN clause and display them in the LOG window for data issue reporting.:
data vs1;
set vs(keep=usubjid adt trt1day1);
ady=adt-trt1day1;
if .<ady<=7 then avisitn=1;
else if 8<=ady<=14 then avisitn=2;
else if 15<=ady<=21 then avisitn=3;
else if 22<=ady<=28 then avisitn=4;
else put 'WARNING: Out of Window Records: ' usubjid= ady= adt= trt1day1=;
run;
WARNING: Out of Window Records: USUBJID=999-099-001001 ady=36 ADT=2010-12-09 TRT1DAY1=2010-11-03
WARNING: Out of Window Records: USUBJID=999-099-001002 ady=56 ADT=2010-02-02 TRT1DAY1=2009-12-08
NOTE: There were 172 observations read from the data set WORK.VS.
NOTE: The data set WORK.VS1 has 172 observations and 5 variables.
PROCEDURES
Tip # 10. Always use DATA= in a PROC statement (mandatory).
Reason: It ensures correct dataset referencing, makes program easy to follow, and provides internal documentation.
For example:
proc sort data=vs1;
by usubjid;
run;
NOTE: There were 172 observations read from the data set WORK.VS1.
NOTE: SAS sort was used.
NOTE: The data set WORK.VS1 has 172 observations and 5 variables.
PROC DATASETS
Tip # 11. At the end of the program or at strategic points, it is a good practice to use PROC DATASETS to delete unneeded data sets from the work library. This not only will improve performance, but more importantly will show the intention to the reader as well.
For example, the following code will only delete the datasets in the WORK library and format and macro catalogs will remain in the library:
proc datasets lib=work kill memtype=data;
run;
PROC SQL
<Looking for contributions from the community!>
DATA CHECKS
<Looking for contributions from the community!>
FORMATS AND LOOKUPS
Tip # 12. When coding a user-defined FORMAT, include the keyword ‘other’ on the left side of the equals sign so that all possible values have an entry in the format.
Reason: A missing entry in a user-defined FORMAT can be difficult to detect. The simplest way to identify this potential problem is to ensure that all values are assigned a format. Note: This does not apply to INFORMATs. It could be more helpful to get a WARNING message when trying to INPUT data of unexpected format.
For example:
proc format;
value avisit
1 = 'Week 1'
2 = 'Week 2'
3 = 'Week 3'
4 = 'Week 4'
other='Out of Window'
;
run;
SAS Macros
<Looking for contributions from the community!>
MACRO PROGRAMS
<Looking for contributions from the community!>
MACRO VARIABLES
<Looking for contributions from the community!>
SAS Applications
<Looking for contributions from the community!>