CoolData blog

11 December 2009

Adding new variables to existing datafile in Data Desk, Part 1

Filed under: Data Desk — Tags: , , , — kevinmacdonell @ 11:47 am

When I form a query on our database using MS Access, I try to make it as complete as possible before starting work in Data Desk. I ask myself, do I have all the columns (variables) I need for this task? If not, I modify and run the query again.

Invariably, though, I realize I’ve forgotten something only after I am well underway in Data Desk and starting from scratch is not an option. Or sometimes the data set has to contain so many variables (such as in any typical predictive modeling project) that there’s no way I can pull them out of the database with just one query (MS Access can handle only so much at once).

There is a lazy way to do this, which I do not recommend. You can simply paste a new variable into an existing relation, under these conditions:

  • The number of cases in your new variable is exactly the same as in your existing data file.
  • All the cases (‘rows’ in Excel) in the new variable are in exactly the same order as your existing data file.

These conditions can be easily met if your existing data and your new data is coming from the same static spreadsheet stored on your own hard drive. But a university database is a living thing. The number of ‘living, addressable alumni’ changes by the hour, so I can never be sure that any new variable I pull out of it is going to match up with the data I pulled yesterday.

Even if you ARE grabbing data from your own spreadsheet, what happens if you’ve reordered the rows in the meantime? Data Desk will allow you to paste the variable (because the number of cases is the same), but the data will be mismatched.

Fortunately it takes just a couple of steps in Data Desk to perform what I call a ‘merge’ or ‘join’ using the Lookup and GetCase functions. Sometimes I call it stitching data together – that’s what it feels like I’m doing.

I’ll show you how in: “Adding new variables to existing datafile in Data Desk, Part 2.

Advertisements

2 Comments »

  1. […] Derived variables, joins, lookup, merge, Predictor variables — kevinmacdonell @ 8:04 pm As explained in Part 1, you can simply paste a new variable into an existing relation IF there are the same number of […]

    Pingback by Adding new variables to existing datafile in Data Desk, Part 2 « CoolData blog — 13 December 2009 @ 8:05 pm

  2. […] simply bring in the file as a new variable in your model. (If you’re using Data Desk, follow these directions for adding a new variable to an existing data set.) You may need to create a variable name, as the […]

    Pingback by External data project: “Do Not Call” as a predictor « CoolData blog — 20 May 2010 @ 7:40 am


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: