When I form a query on our database using MS Access, I try to make it as complete as possible before starting work in Data Desk. I ask myself, do I have all the columns (variables) I need for this task? If not, I modify and run the query again.
Invariably, though, I realize I’ve forgotten something only after I am well underway in Data Desk and starting from scratch is not an option. Or sometimes the data set has to contain so many variables (such as in any typical predictive modeling project) that there’s no way I can pull them out of the database with just one query (MS Access can handle only so much at once).
There is a lazy way to do this, which I do not recommend. You can simply paste a new variable into an existing relation, under these conditions:
- The number of cases in your new variable is exactly the same as in your existing data file.
- All the cases (‘rows’ in Excel) in the new variable are in exactly the same order as your existing data file.
These conditions can be easily met if your existing data and your new data is coming from the same static spreadsheet stored on your own hard drive. But a university database is a living thing. The number of ‘living, addressable alumni’ changes by the hour, so I can never be sure that any new variable I pull out of it is going to match up with the data I pulled yesterday.
Even if you ARE grabbing data from your own spreadsheet, what happens if you’ve reordered the rows in the meantime? Data Desk will allow you to paste the variable (because the number of cases is the same), but the data will be mismatched.
Fortunately it takes just a couple of steps in Data Desk to perform what I call a ‘merge’ or ‘join’ using the Lookup and GetCase functions. Sometimes I call it stitching data together – that’s what it feels like I’m doing.
I’ll show you how in: “Adding new variables to existing datafile in Data Desk, Part 2.“