Fun with Functions and Macro Variables in PROC SQL

The SQL procedure is a powerful and versatile tool which, when combined with one or more SAS® functions, is able to summarize (or aggregate) data. Besides its ability to aggregate data, PROC SQL can also be used with the SELECT statement’s INTO clause to create single-value macro variables and when combined with the SEPARATED BY keyword is able to create an array of values, commonly referred to as a value-list macro variable. These capabilities afford users the ability to wield a tool that reduces, if not eliminates, hardcoding altogether. As a result, users are able to create and use macro variables along with the macro language to perform iterative processing, macro substitution, lookups and other operations. This tip illustrates several examples of using functions and the macro interface in PROC SQL to perform some exciting things.

Using Summary Functions with the SQL Procedure

As its name implies, summary functions can be used to perform calculations, descriptive statistics, and other aggregating operations in a SELECT statement or HAVING clause. These functions are designed to summarize information and can be combined with other columns and SELECT clauses to produce detailed results about your data.

Without the availability of summary functions, one would need to construct the necessary logic using alternate, or somewhat more, complicated SQL programming constructs. When using a summary function without a GROUP BY clause, all the rows in a table are treated as a single group. Consequently, the result is frequently a single value.

A number of summary functions are available to SAS and SQL users. The following table displays a partial alphabetically list of summary functions. Note: When multiple names for the same function are displayed, the ANSI-approved name appears first.

In the next example, the COUNT function is specified using the (*) argument to produce a total number of rows. The asterisk (*) is specified as the argument to the COUNT function to tabulate all rows found in the SASHELP.CARS table.

Unlike the COUNT(*) function syntax that tabulates all rows, the next example uses the COUNT function with a (column-name) argument to produce a total number of non-missing rows based on the specified column, ORIGIN.

In the next example the COUNT function with a (column-name) argument is specified along with a GROUP BY clause to produce a total number of non-missing rows based on the specified column, ORIGIN.

To enhance the results produced in the previous example by displaying the distribution of values found in the ORIGIN column, the ORIGIN column is specified along with the COUNT function and (column-name) argument in the SELECT statement. As before, a GROUP BY clause is specified to produce a total number of non-missing rows based on the specified column, ORIGIN.

In the next example the MIN summary function is specified to determine the lowest priced car in the SASHELP.CARS table.

The results displayed in the previous example can be enhanced using a FORMAT= parameter to display the results using familiar formatted values.

Additional details about car(s) produced in the previous example, the MAKE and MODEL of car associated with the lowest MSRP can be displayed in the SELECT statement. To select a single row that matches the derived value produced by the MIN function, a HAVING clause expression is specified with the CALCULATED keyword.

The RANGE function is available to determine the difference between the largest and smallest values for a selected numeric column. The NEXT query determines the range of values (difference) for all cars by their ORIGIN (e.g., Asia, Europe, and USA).

Producing Value-list Macro Variables with PROC SQL

Value-list macro variables can be defined in a SELECT statement using the INTO clause and SEPARATED BY keyword. In the first example, a list of unique ORIGIN values are captured using the DISTINCT keyword and then dynamically written to a macro variable called, MORIGIN_LST. Using the SEPARATED BY keyword, each unique ORIGIN value is separated by a comma (,). The result of specifying the SELECT statement syntax in this way lets you dynamically create and write a list of values in ascending order to a global macro variable.

Sometimes it may be necessary to enclose each value in a value-list macro variable in double-quotes. The next example specifies the QUOTE function where the first argument identifies the column to use and the second argument identifies the symbol to use to enclose the unique values that are written to the value-list macro variable.

You may have noticed that the result produced in the previous example has trailing blanks at the end of shorter length values. The next example specifies the QUOTE function along with an embedded TRIM function to remove extraneous blanks at the end of shorter length values. Not only is the result more appealing, it produces results as they should appear in the dynamically created value-list macro variable.

Trademark Citations

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

About the Author

Kirk Paul Lafler is entrepreneur and founder at Software Intelligence Corporation, and has worked with SAS software since 1979. As a SAS consultant, application developer, programmer, data analyst, educator and author at Software Intelligence Corporation; and an advisor and SAS programming adjunct professor at the University of California San Diego Extension, Kirk has taught SAS courses, seminars, hands-on workshops, and webinars to thousands of users around the world.

Kirk has also authored or co-authored several books including PROC SQL: Beyond the Basics Using SAS, Third Edition (SAS Press. 2019) and Google® Search Complete (Odyssey Press. 2014); hundreds of papers and articles on a variety of SAS topics; served as an Invited speaker, educator, keynote and section leader at SAS user group conferences and meetings worldwide; and is the recipient of 25 "Best" contributed paper, hands-on workshop (HOW), and poster awards.

Comments and suggestions can be sent to:

Kirk Paul Lafler

SAS® Consultant, Application Developer, Programmer, Data Analyst, Educator and Author

Software Intelligence Corporation

E-mail: KirkLafler@cs.com

LinkedIn: https://www.linkedin.com/in/KirkPaulLafler/

LinkedIn: https://www.linkedin.com/in/Order-of-Magnitude-Analytics/

Twitter: @sasNerd

 

Posted by on

Tags: Kirk Lafler

Categories: Kirk's Korner

Add Your Comments

Thinking of joining PHUSE?

Already a member but not sure how you can benefit?

PHUSE is an expanding, global society with a global membership of clinical data scientists. It requires a large pool of resources to help with its running, and so there are many opportunities for members to become involved. Whether it's chairing a conference, presenting at an event, leading a working group or contributing to the quarterly online newsletter, we are always keen to hear from volunteers.

Find Out More