SAS® Performance Tuning Techniques

SAS® software provides users with many choices for accessing, manipulating, analyzing, and processing data and results. Because of the software’s power, its assortment of features, and the size of the data sources involved, application developers, programmers, and end users actively look for ways to apply efficiency techniques wherever possible. This tip highlights performance tuning techniques that SAS users can apply in their applications and program code to help conserve CPU, I/O, data storage, and memory resources while reading, sorting, grouping, joining (or merging), summarizing, and processing data and results.

Efficiency is frequently not given the attention it deserves while applications and program code are being developed, particularly during the early phases of development. System performance often suffers as a result, which in turn can affect the behavior of an application or program. To combat this, active user participation is crucial to understanding application and performance requirements.

 

Efficiency Objectives

Efficiency objectives are best achieved when implemented as early as possible, preferably during the design phase. When this is not possible, for example when customizing or inheriting an application, efficiency and performance techniques can still be applied afterward to obtain some degree of improvement. Efficiency and performance strategies can be classified into five areas: CPU Time, Data Storage, Elapsed Time, I/O, and Memory.

 

Guidelines to Hold Dear

The difference between an optimized software application (or process) and one that is not can be dramatic. By adhering to practical guidelines, an application can achieve efficiency in direct relationship to economies of scale. Generally, as much as 90% of efficiency improvements can be gained quickly and with relative ease by applying simple strategies. The final 10%, however, can often be a challenge. Consequently, you will need to be the judge of whether your application or program has reached "relative" optimal efficiency while maintaining a reasonable balance between time and cost.

 

 

The following suggestions are not meant as an exhaustive list of all known efficiency techniques, but as a sampling of techniques that can provide some measure of efficiency. Performance tuning techniques are presented for the following resource areas: CPU time, data storage, I/O, memory, and programming time.

 

CPU Time

  • Use KEEP= or DROP= data set options to retain desired variables.
  • Use WHERE statements, WHERE= data set option, or WHERE clauses to subset SAS datasets.
  • Create and access SAS datasets rather than ASCII or EBCDIC raw data files.
  • Use IF-THEN / ELSE or SELECT-WHEN / OTHERWISE in the DATA step, or a CASE expression in PROC SQL, to conditionally process data.
  • Use the DATASETS procedure COPY statement to copy datasets, as opposed to a DATA step with a SET statement.
  • Use DATA step hash techniques to perform lookups and merges (or joins).
  • Turn off the Macro facility when not needed by specifying the NOMACRO system option.
  • Avoid unnecessary sorting - understand when a sort is needed.
  • Use procedures that support the CLASS statement to take advantage of group processing without sorting.
  • Use the Stored Program Facility for complex DATA steps.
  • CPU time and elapsed time can be reduced with the SASFILE statement.
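A few of the CPU-time suggestions above can be sketched as follows. This is a minimal illustration rather than a benchmark: it assumes the SASHELP sample library (SASHELP.CARS and SASHELP.CLASS) is available, and the output data set name WORK.SUV_CARS is hypothetical.

```sas
/* Subset rows (WHERE=) and columns (KEEP=) as the data set is read, */
/* so unwanted observations and variables never enter the step.      */
data work.suv_cars;
   set sashelp.cars(keep=make model type msrp mpg_city
                    where=(type = 'SUV'));
run;

/* CLASS gives group-level statistics without a preliminary PROC SORT. */
proc means data=sashelp.cars(keep=type msrp) mean max maxdec=0;
   class type;
   var msrp;
run;

/* Copy a data set with PROC DATASETS instead of a DATA step and SET. */
proc datasets library=work nolist;
   copy in=sashelp out=work;
   select class;
quit;
```

Whether each change produces a measurable saving depends on data volume and the host environment, so it is worth comparing the real time and CPU time reported in the SAS log before and after.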

 

Data Storage

  • Use KEEP= or DROP= data set options to retain desired variables.
  • Process only the variables you need, which removes unwanted variables from the program data vector (PDV).
  • Use LENGTH statements to reduce the size of a variable.
  • Use data compression strategies to reduce the amount of storage used to store datasets.
  • Create character variables for data that won’t be used for analytical purposes.
  • Shorten data by using informats and formats.
  • To allow a DATA step to be used without creating a data set, use a DATA _NULL_ statement.
  • More DASD space may be needed to hold a specified amount of data when the default physical BLKSIZE of 6KB is used.
  • When sufficient disk space is unavailable to perform a sort, the SORT procedure’s TAGSORT option should be considered.
  • Remove unwanted SAS datasets with PROC DATASETS.
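Several of the storage suggestions above might look like the sketch below. It assumes SASHELP.CLASS is available, the data set name WORK.CLASS_SMALL is hypothetical, and the numeric length of 3 bytes is an illustrative choice that is only safe for small integer values (the minimum numeric length also varies by host).

```sas
/* Keep only the needed variables, shorten a small-integer numeric, */
/* and request compression. On a data set this small, compression   */
/* may not save space; SAS reports the outcome in the log.          */
data work.class_small(compress=yes keep=name sex age);
   length age 3;
   set sashelp.class;
run;

/* DATA _NULL_ runs DATA step logic without writing a data set. */
data _null_;
   set sashelp.class end=last;
   total_height + height;            /* sum statement accumulates across rows */
   if last then do;
      mean_height = total_height / _n_;
      put 'Mean height: ' mean_height 6.2;
   end;
run;

/* Remove a data set that is no longer needed. */
proc datasets library=work nolist;
   delete class_small;
quit;
```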

 

Input/Output (I/O)

  • Read only data that is needed from external data files.
  • Minimize the number of times a large data set is read by subsetting in a single DATA step.
  • Use KEEP= or DROP= data set options to retain only desired variables.
  • Use a WHERE statement, WHERE= data set option, or PROC SQL WHERE clause to subset data.
  • Use data compression for large data sets.
  • Use the DATASETS procedure COPY statement to copy datasets with indexes.
  • Use the SQL procedure to consolidate steps.
  • Store data in SAS data sets rather than external files to avoid excessive read processing.
  • Perform data subsets as early as possible to reduce the number of reads.
  • Use indexed data sets to improve access to data subsets.
  • Use the OUT= option with PROC SORT to reduce I/O operations.
  • The BUFNO= option can be specified to adjust the number of open page buffers when processing SAS data sets.
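As a rough sketch of a few of the I/O suggestions above, the steps below subset and sort in one pass, index the result, and then read a subset through a WHERE statement. The data set names WORK.CARS_SORTED and WORK.AUDI, the MSRP threshold, and the BUFNO= value of 10 are assumptions chosen for illustration, not tuned recommendations.

```sas
/* Subset and sort in one step; OUT= writes a new data set and */
/* leaves the source untouched.                                */
proc sort data=sashelp.cars(keep=make model type msrp
                            where=(msrp > 40000))
          out=work.cars_sorted;
   by make descending msrp;
run;

/* Create a simple index on MAKE so WHERE processing can use it. */
proc datasets library=work nolist;
   modify cars_sorted;
   index create make;
quit;

/* Read the subset; BUFNO= adjusts the number of page buffers. */
data work.audi;
   set work.cars_sorted(bufno=10);
   where make = 'Audi';
run;
```

SAS decides at run time whether an index is worth using, so on small data sets such as this one it may still read sequentially.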

 

Memory

  • Read only data that is needed.
  • Process only the variables you need, which removes unwanted variables from the program data vector (PDV).
  • Use WHERE statements, WHERE= data set options, or WHERE clauses to subset data sets when possible.
  • Avoid storing SAS catalogs in memory because they consume large quantities of memory.
  • If using arrays, create them as _TEMPORARY_ to reduce memory requirements.
  • Increase the REGION size when the amount of available memory is insufficient.
  • Use the SORTSIZE= system option to limit the amount of memory that is available to sorting.
  • Use the SUMSIZE= system option to limit the amount of memory that is available to summarization procedures.
  • Use the MEMSIZE= system option to control memory usage with the SUMMARY procedure.
  • Use the MVARSIZE= system option to specify the maximum size of in-memory macro variable values.
  • Use memory-resident DATA step constructs like Hash objects to take advantage of available memory and memory speeds.
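The _TEMPORARY_ array and hash object suggestions above can be illustrated with a small sketch. It assumes SASHELP.CLASS is available; the lookup table WORK.SEX_LOOKUP, the output data set names, and the conversion factors are hypothetical values introduced only for this example.

```sas
/* Temporary array elements are not written to the output data set. */
data work.class_metric(drop=i);
   set sashelp.class(keep=name height weight);
   array factor[2] _temporary_ (2.54 0.4536);   /* inches to cm, pounds to kg */
   array measure[2] height weight;
   do i = 1 to 2;
      measure[i] = measure[i] * factor[i];
   end;
run;

/* Hypothetical lookup table keyed on SEX. */
data work.sex_lookup;
   length sex $1 sex_label $7;
   input sex $ sex_label $;
   datalines;
F Female
M Male
;
run;

/* Load the lookup into a hash object once, then search it in memory */
/* for every observation; no sort or merge is required.              */
data work.class_labeled;
   length sex_label $7;
   if _n_ = 1 then do;
      declare hash lk(dataset: 'work.sex_lookup');
      lk.defineKey('sex');
      lk.defineData('sex_label');
      lk.defineDone();
   end;
   set sashelp.class(keep=name sex age);
   if lk.find() ne 0 then sex_label = 'Unknown';   /* key not found */
run;
```

A hash object trades memory for speed, so this approach suits lookup tables that fit comfortably in the memory available to the session.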

 

Programming Time

  • Use the SQL procedure for code simplification.
  • Use procedures whenever possible.
  • Document programs and routines with comments.
  • Use macros for repetitive code.
  • Code for unknown data values.
  • Assign descriptive and meaningful variable names.
  • Store formats and labels with the SAS data sets that use them.
  • Use procedures such as PROC SQL when appropriate to consolidate the number of process steps.
  • Use the DATASETS procedure COPY statement to copy data sets with indexes.
  • Test program code using "complete" test data.
  • Assign frequently repeated steps to function keys, particularly during debugging and tuning operations.
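Three of the programming-time suggestions above (consolidating steps with PROC SQL, using macros for repetitive code, and coding for unknown data values) could be sketched as shown below. The macro name FREQ_BY, its parameters, the output data set names, and the age cutoff are all hypothetical.

```sas
/* One PROC SQL step subsets, summarizes, and orders the result. */
proc sql;
   create table work.msrp_by_type as
      select type,
             count(*)  as n_models,
             avg(msrp) as avg_msrp format=dollar10.
         from sashelp.cars
         where origin ne 'USA'
         group by type
         order by avg_msrp desc;
quit;

/* A small macro replaces repeated copies of the same step. */
%macro freq_by(dsn=, var=);
   title "Frequency of &var in &dsn";
   proc freq data=&dsn;
      tables &var / missing;
   run;
   title;
%mend freq_by;

%freq_by(dsn=sashelp.cars, var=type)
%freq_by(dsn=sashelp.cars, var=origin)

/* Code for unknown values: handle missing data explicitly and */
/* let OTHERWISE catch everything that remains.                */
data work.class_groups;
   set sashelp.class(keep=name age);
   length age_group $8;
   select;
      when (missing(age)) age_group = 'Unknown';
      when (age < 13)     age_group = 'Child';
      otherwise           age_group = 'Teen';
   end;
run;
```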

 

Conclusion

The value of implementing efficiency and performance strategies in an application cannot be overemphasized. Careful attention should be given to individual program functions, since one or more efficiency techniques can affect the scalability and behavior of your application or program. Efficiency techniques are learned in a variety of ways, from formal classroom instruction to published guidelines in books, manuals, articles, and videotapes. But the greatest value comes from the experience of others, shared by word of mouth, and from experience gained on the job.

 

Trademark Citations

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

 

Author Information

Kirk Paul Lafler is an entrepreneur, consultant and founder at Software Intelligence Corporation, and has been using SAS since 1979. Kirk has worked as a SAS application developer, programmer, certified professional, provider of SAS consulting services, mentor, adjunct professor at San Diego State University, advisor and adjunct professor at University of California San Diego Extension, emeritus sasCommunity.org Advisory Board member, and educator to SAS users around the world. As the author of seven books, including PROC SQL: Beyond the Basics Using SAS, Third Edition (SAS Press, 2019) and Google® Search Complete (Odyssey Press, 2014), and of hundreds of papers and articles, Kirk has been selected as an invited speaker, trainer, keynote and section leader at SAS International, regional, special-interest, local, and in-house user group conferences and meetings, and is the recipient of 25 “Best” contributed paper, hands-on workshop (HOW), and poster awards.

 

Comments and suggestions can be sent to:

Kirk Paul Lafler

SAS® Consultant, Application Developer, Programmer, Data Analyst, Educator and Author

Software Intelligence Corporation

E-mail: KirkLafler@cs.com

LinkedIn: http://www.linkedin.com/in/KirkPaulLafler

LinkedIn: https://www.linkedin.com/in/Order-of-Magnitude-Analytics/

Twitter: @sasNerd

 

 


Tags: Kirk Lafler

Categories: Kirk's Korner
