SAS Index: How To Create a SAS Index
You can create an index in several different ways, including in a data step and in a proc step.
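As a rough sketch of the data step route, the INDEX= data-set option asks SAS to build the index while the data-set is being written. The names work.employees, work.employees_idx, empid and dept are made up for illustration.

```sas
/* Hypothetical sample data: 1000 employees spread over 20 departments */
data work.employees;
   do empid = 1 to 1000;
      dept = mod(empid, 20);   /* dept takes the values 0 through 19 */
      output;
   end;
run;

/* Write a copy of the data-set and build a simple index on dept as it is created */
data work.employees_idx(index=(dept));
   set work.employees;
run;
```

The index is built as part of the same step that writes the data-set, so no separate proc step is needed afterwards.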
When deciding if you should create an index, size matters. The size of your data-set is certainly one major component dictating the necessity of a SAS index. One simple question to ask yourself: are you regularly using a sub-set (a small piece of a data-set, let’s say 15%) of a very large data-set (a data-set of hundreds of thousands of observations or more)? If the answer to that question is yes, read further. The other question to ask yourself: how often are you using a sub-set of a very large data-set? If the frequency is high, you should seriously consider creating an index.
Now you probably want to understand how to create a SAS index. As is the case with many things in life, it depends. If you have an existing SAS data-set, you can create the index with proc datasets and then use it through a where expression. We will assume that you already know how to read data into a SAS data-set using the data step (data, infile, input, run) and all that fun stuff.
You will want to use proc freq to select good variables for your index. You need to do this whether or not you have an existing data-set. Good variables are discriminant variables, meaning variables that can discriminate between groups. For example, a variable whose values each pick out 15% or less of your employees. The question to ask yourself here is: if I subset on this variable through my index, how many observations will it return? If 15% or less is the answer, you have a discriminant variable.
Using proc freq is quite easy for this purpose.
proc freq data=nameofdataset(keep=thevariableyouareinterestedin);
run;
Simply look at the Percent column and make sure each value is 15% or less, going all the way down.
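To make the check concrete, here is a small sketch on made-up data. The names work.employees, empid and dept are hypothetical. The tables statement names the candidate variable explicitly, and order=freq lists the most common values first, so any value above the 15% mark would show up at the top of the table.

```sas
/* Hypothetical sample data: 1000 employees spread evenly over 20 departments */
data work.employees;
   do empid = 1 to 1000;
      dept = mod(empid, 20);   /* each dept value covers 50 of the 1000 rows */
      output;
   end;
run;

/* One-way frequency table for the candidate index variable */
proc freq data=work.employees order=freq;
   tables dept / nocum;        /* nocum drops the cumulative columns */
run;
```

If even the first row of the table is at or below 15%, every other value is too.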
Creating the index with proc datasets is super simple after doing this work. Inside a proc datasets step, after a modify statement that names your data-set, add:
index create nameofvariable;
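For context, a complete step might look like the sketch below. It assumes a hypothetical data-set work.employees with a variable dept, and it finishes with proc contents, which lists the indexes defined on a data-set, as one way to confirm that the index exists.

```sas
/* Hypothetical sample data: 1000 employees spread over 20 departments */
data work.employees;
   do empid = 1 to 1000;
      dept = mod(empid, 20);
      output;
   end;
run;

/* Add a simple index on dept to the existing data-set */
proc datasets library=work nolist;
   modify employees;
   index create dept;
quit;

/* The output includes a list of the data-set's indexes */
proc contents data=work.employees;
run;
```

Because proc datasets modifies the data-set in place, the data itself is not rewritten; only the index file is added.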
Utilizing our index:
where nameofvariable is missing; /* nameofvariable is the discriminant variable, the one you decided to use for the index */
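A where statement only makes sense inside a step, so here is a hedged end-to-end sketch using hypothetical names (work.employees, dept). Setting the msglevel=i system option asks SAS to write INFO notes to the log, including one naming the index when it is used for a where expression. SAS decides for itself whether the index is cheaper than reading the whole data-set, so that note is not guaranteed to appear.

```sas
options msglevel=i;   /* log INFO notes, including index usage */

/* Hypothetical sample data: 100000 employees spread over 20 departments */
data work.employees;
   do empid = 1 to 100000;
      dept = mod(empid, 20);
      output;
   end;
run;

/* Simple index on the discriminant variable */
proc datasets library=work nolist;
   modify employees;
   index create dept;
quit;

/* Subset on the indexed variable; obs=10 keeps the printout short */
proc print data=work.employees(obs=10);
   where dept = 7;
run;
```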
Now what we’ve created and utilized here is referred to as a simple index. This is because we are using the values of one variable. If you want to use the values of more than one variable, this is referred to as a composite index. Creating this type of index with proc datasets is not much different.
index create compind = (discvar1 discvar2); /* compind is simply the name I’ve given to my composite index. You can use any name. */
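Put together, a composite index could be created and used roughly as follows. All of the names here (work.employees, dept, site, deptsite) are hypothetical. A where expression that tests both key variables for equality is the kind of query a composite index is meant to help with.

```sas
/* Hypothetical sample data with two candidate key variables */
data work.employees;
   do empid = 1 to 1000;
      dept = mod(empid, 20);
      site = mod(empid, 7);
      output;
   end;
run;

/* Composite index named deptsite built from dept and site */
proc datasets library=work nolist;
   modify employees;
   index create deptsite = (dept site);
quit;

/* Subset on both key variables */
proc print data=work.employees;
   where dept = 3 and site = 2;
run;
```

The index name is chosen so that it differs from the names of the variables in the data-set.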
So that is it for this lesson. I hope you enjoyed it!