CoolData blog

13 December 2009

Adding new variables to existing datafile in Data Desk, Part 2

Filed under: Data Desk, Derived variables — Tags: , , , , , — kevinmacdonell @ 8:04 pm

As explained in Part 1, you can simply paste a new variable into an existing relation IF there are the same number of cases in both the existing relation and the new variable, AND they are in exactly the same order. Otherwise you will have to perform a join, as follows.

As an example, let’s say my existing data set has variables for a unique identifier (ID), First Name, Last Name, and Class Year. And let’s say that for some reason, I need to add Degree as a variable. (For something simple like this, I would of course scrap this data file and re-query the database to get all the data I need; this is just an example.)

I’ll need to query the database for two things, ID and Degree. (Whenever you query your database, you must never fail to include a unique record identifier. In Banner this may be ID or PIDM.)

Copy the columns you want from Excel, Access or whatever your source is.

In Data Desk, go to the Data menu:  New > Relation. You’ll be pasting your new variables into a relation that is separate from the existing relation that contains the names and class years.

Name the folder (let’s call it “Deg“), and paste your variables. For neatness, drag the variables out of the Clipboard folder and into Deg, and drag the empty Clipboard to the trash.

To avoid confusing the new ID variable and the ID variable as already exists, rename the new variable. (eg. ‘ID-degree’)

Create a new derived variable in the Deg relation folder. Name it ‘ID-degree-lookup’, and give it this expression:

lookup('ID-degree','ID')

Remember to drag each of the two variables into the expression window rather than type them. The Lookup command will literally look up the value of the first argument in the list of values in the second argument.

Now create another derived variable, again in the Deg relation folder. Call it ‘Degree-joined’. Give it this expression:

getcase(textof('Degree'),'ID-degree-lookup')

The GetCase command will retrieve the value of ‘Degree’ that matches each looked-up ID value. The nested TextOf command is needed because the value of ‘Degree’ is non-numerical. If the variable we were adding was a numerical value, such as Lifetime Giving, the TextOf command would not be needed.

You can check if the variable works by choosing Show numbers from the variable’s Hyperview menu, or by dragging the variable into a frequency table.

‘Degree-joined’ is now a full member of both relations – you can drag it into the folder that contains your original data without getting an error message. It’s now available to be used like any other variable in the original data set.

If this seems a little confusing at first, don’t worry. Print these directions out and refer to them as needed. If you are building a predictive model and discovering new variables as you go along, you’ll be calling on this technique so often that eventually you’ll know it by heart.

Blog at WordPress.com.