CoolData blog

13 December 2009

Adding new variables to existing datafile in Data Desk, Part 2

Filed under: Data Desk, Derived variables — Tags: , , , , , — kevinmacdonell @ 8:04 pm

As explained in Part 1, you can simply paste a new variable into an existing relation IF there are the same number of cases in both the existing relation and the new variable, AND they are in exactly the same order. Otherwise you will have to perform a join, as follows.

As an example, let’s say my existing data set has variables for a unique identifier (ID), First Name, Last Name, and Class Year. And let’s say that for some reason, I need to add Degree as a variable. (For something simple like this, I would of course scrap this data file and re-query the database to get all the data I need; this is just an example.)

I’ll need to query the database for two things, ID and Degree. (Whenever you query your database, you must never fail to include a unique record identifier. In Banner this may be ID or PIDM.)

Copy the columns you want from Excel, Access or whatever your source is.

In Data Desk, go to the Data menu:  New > Relation. You’ll be pasting your new variables into a relation that is separate from the existing relation that contains the names and class years.

Name the folder (let’s call it “Deg“), and paste your variables. For neatness, drag the variables out of the Clipboard folder and into Deg, and drag the empty Clipboard to the trash.

To avoid confusing the new ID variable and the ID variable as already exists, rename the new variable. (eg. ‘ID-degree’)

Create a new derived variable in the Deg relation folder. Name it ‘ID-degree-lookup’, and give it this expression:

lookup('ID-degree','ID')

Remember to drag each of the two variables into the expression window rather than type them. The Lookup command will literally look up the value of the first argument in the list of values in the second argument.

Now create another derived variable, again in the Deg relation folder. Call it ‘Degree-joined’. Give it this expression:

getcase(textof('Degree'),'ID-degree-lookup')

The GetCase command will retrieve the value of ‘Degree’ that matches each looked-up ID value. The nested TextOf command is needed because the value of ‘Degree’ is non-numerical. If the variable we were adding was a numerical value, such as Lifetime Giving, the TextOf command would not be needed.

You can check if the variable works by choosing Show numbers from the variable’s Hyperview menu, or by dragging the variable into a frequency table.

‘Degree-joined’ is now a full member of both relations – you can drag it into the folder that contains your original data without getting an error message. It’s now available to be used like any other variable in the original data set.

If this seems a little confusing at first, don’t worry. Print these directions out and refer to them as needed. If you are building a predictive model and discovering new variables as you go along, you’ll be calling on this technique so often that eventually you’ll know it by heart.

Advertisements

1 Comment »

  1. […] show you how in: “Adding new variables to existing datafile in Data Desk, Part 2.“ Comments […]

    Pingback by Adding new variables to existing datafile in Data Desk, Part 1 « CoolData blog — 13 December 2009 @ 8:07 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: