CoolData blog

3 January 2011

Seven building blocks for your data work

Filed under: Best practices, Training / Professional Development — kevinmacdonell @ 6:36 am


Whether you’re interested in predictive modeling or you just want to become more data-oriented in your work, each of these seven skills will prove useful again and again. None of them involves using stats software. Some of them involve Excel, the software you’re most likely to already have on your desktop.

1. Get access

Arrange to get direct access to some of the key tables in your organization’s database. If data entry is part of your job, you may already have “write” access, but “read” access will do fine. (“Query” access is even better.) In any event, get as much access as you are allowed. Knowing which tables contain which types of data will prove valuable later on. If you can’t get your hands on either the database or a reporting tool that queries it, skip to Number 3.

2. Get data

Learn how to build a simple query to extract data on your own. Most of us who work at institutions with complex databases rely on our technical people to pull the data we need. Often, though, the tools for running our own reports are available for the asking. When I was working as a prospect researcher, it took forever to get a report built, my requirements kept changing, and I needed data faster than anyone could provide it. So, against my will, I learned how to use Microsoft Access to query our Oracle database. It was a pain in the ass, but the flexibility to grab the data I needed at any time turned out to be extremely valuable when I later got into more data-focused work. Of all the building blocks, this one will take the longest to learn, but you don’t need to become proficient; you just need to get started.
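For readers curious what “a simple query” looks like underneath a point-and-click tool, here is a minimal sketch in Python using the built-in sqlite3 module. It is only an illustration: the donors table, its columns, and its rows are hypothetical stand-ins, not taken from any real database, and your own system (Oracle or otherwise) will have its own table names and connection method. The pattern is the same one a query tool builds for you: choose columns (SELECT), choose a table (FROM), filter rows (WHERE), and sort (ORDER BY).

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'donors' table.

    The table name, columns, and rows are invented for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE donors (id INTEGER PRIMARY KEY, name TEXT, "
        "grad_year INTEGER, total_giving REAL)"
    )
    conn.executemany(
        "INSERT INTO donors (id, name, grad_year, total_giving) VALUES (?, ?, ?, ?)",
        [
            (1, "A. Smith", 1985, 250.0),
            (2, "B. Jones", 1992, 0.0),
            (3, "C. Lee", 2001, 75.0),
            (4, "D. Brown", 1978, 1200.0),
        ],
    )
    return conn


def donors_since(conn: sqlite3.Connection, min_year: int) -> list:
    """Select a few columns, keep rows from min_year onward, sort by giving."""
    sql = (
        "SELECT name, grad_year, total_giving FROM donors "
        "WHERE grad_year >= ? "          # the ? placeholder is filled safely
        "ORDER BY total_giving DESC"
    )
    return conn.execute(sql, (min_year,)).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for row in donors_since(conn, 1985):
        print(row)
```

The ? placeholder passes the year as a parameter rather than pasting it into the SQL string, which is a good habit whatever database you end up querying.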

3. Get functional

Once you’ve pulled some data, or had it pulled for you, nine times out of ten you’re going to be working with it in Excel. You should learn how to use some very basic spreadsheet formulas. Insert a new blank column somewhere in your data file. Type “=” and start exploring some of the available functions for handling text strings and calculations. If you already know a few functions, go a little further by exploring conditional (“IF”) statements and nested formulas (formulas within formulas). You will encounter the same formula structure and logic again and again, not just in Excel but in query-building and in handling data with other software packages. People are intimidated by formulas because they can appear complex, but just take it slow, learning new functions as you need them. Searching the Net for tutorials is frequently better than relying on Excel’s help file.
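As a rough illustration of the nested logic described above, consider an Excel formula such as =IF(E2>=1000,"Major",IF(E2>0,"Donor","Non-donor")), where column E is assumed to hold a hypothetical lifetime-giving total and the 1,000 cutoff is arbitrary. The sketch below expresses the same IF-within-an-IF in Python, along with a rough analog of Excel’s TRIM and PROPER text functions, to show how the logic carries over to other tools. The function names are made up for this example.

```python
def giving_segment(total_giving: float) -> str:
    """Python analog of a nested Excel IF (hypothetical 1,000 threshold).

    Mirrors: =IF(E2>=1000,"Major",IF(E2>0,"Donor","Non-donor"))
    """
    if total_giving >= 1000:      # outer IF
        return "Major"
    elif total_giving > 0:        # inner IF, evaluated only if the outer test fails
        return "Donor"
    else:
        return "Non-donor"


def clean_name(raw: str) -> str:
    """Rough analog of =PROPER(TRIM(A2)).

    split() with no argument drops leading/trailing spaces and collapses
    internal runs of spaces, much as TRIM does; title() then capitalizes
    each word, approximately like PROPER (edge cases such as apostrophes differ).
    """
    return " ".join(raw.split()).title()


if __name__ == "__main__":
    for amount in (1200.0, 75.0, 0.0):
        print(amount, giving_segment(amount))
    print(clean_name("  jane   DOE "))
```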

4. Get savvy

Acquire or brush up on some (very) basic knowledge of stats terms. Start with learning the meaning, proper usage, and calculation of percentages, means (averages), and medians. A lot of the work of looking for patterns in data involves nothing more than comparing percentages, averages and medians between two or more groups. If these concepts are old hat, find definitions for other terms you will encounter, including quartile, decile, percentile, correlation, distribution and any other unfamiliar statistical term you come across. When you Google these terms, look for articles that explain them in a way that makes sense to you. If the discussion starts getting too technical, back out of it and follow another search result. There are dozens of ways to explain these concepts, and someone out there has expressed them in language you will understand.
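Here is a minimal sketch of the kind of group comparison described above, using Python’s standard statistics module. The two lists of giving amounts are invented for illustration (think of them as a hypothetical group that attended an event and one that did not); the point is the calculations, not the numbers. Notice how a single large gift pulls the first group’s mean well above its median, which is why it pays to look at both.

```python
import statistics


def summarize(values: list) -> dict:
    """Mean, median, and percent of non-zero values for one group."""
    nonzero = sum(1 for v in values if v > 0)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "pct_nonzero": 100.0 * nonzero / len(values),
    }


if __name__ == "__main__":
    # Invented figures for two hypothetical groups
    attended = [0.0, 50.0, 100.0, 100.0, 750.0]
    did_not = [0.0, 0.0, 0.0, 25.0, 60.0]
    print("attended:", summarize(attended))
    print("did not: ", summarize(did_not))
    # Quartile cut points: three values that split the combined data into four groups
    print("quartiles:", statistics.quantiles(attended + did_not, n=4))
```

In the same way, statistics.quantiles(data, n=10) returns the nine decile cut points, and statistics.correlation (Python 3.10 and later) computes Pearson’s correlation between two equal-length lists.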

5. Get charts

Go back to Excel and learn how to make bar charts. (Excel calls vertical bar charts “column” charts and reserves the name “bar” for the horizontal kind.) The best way to explore data is to display it visually. It’s also the best way to communicate your findings to others. Hans Rosling’s motion bubble charts may be the sexy thing in graphical display of quantitative narratives, but most of my stories (and yours) will be told by way of bar charts. The improved interface of the Office 2007 suite makes it easier than ever, so upgrade if possible. Learn the various types of bar charts (simple, stacked, 100% stacked). Once you are comfortable making bar charts, try converting them to line plots. Ask yourself: which types of data lend themselves to bars, and which to lines?

6. Get visual

Learn how to lay out a slide in PowerPoint. Yes, I know, you hate sitting through PowerPoint presentations. So don’t make the same mistakes other people do. Fill your screen with a visual display of data instead of text — copy and paste your bar chart onto a blank slide. Resize the chart and adjust labels as needed to ensure maximum visibility. Now instead of reading text from a slide, you can talk about the picture. Far more effective.

7. Get it down

Acquire the habit of documenting your work. This serves a number of purposes. First, your notes are a placeholder for your explorations. Seeking insights with data takes time, so keeping notes will let you know from day to day where you left off. Recording what you’ve already learned about your data means you don’t have to keep re-learning the same things every time you begin a related project. Second, if you want to share your discoveries with others, it’s a lot easier to pause every once in a while during your work to take a few notes and capture a chart or two than it is to write a discussion paper from scratch after the fact. Third, your documentation is an important record that lets others build on your work in future years. (See The Hows and Whats of documentation.)

There you have it: Seven skills that will prove invaluable for any future work involving data. I haven’t given enough guidance in this post to take you through any one analysis project from start to finish, but the general outline is here: Formulate a question you would like an answer to, then find out where the answer is likely to reside (database tables), isolate the data you need to study (query the database), manipulate and clean the data to make it suitable for analysis (functions in Excel or other software), apply the analytical tools that might help answer the question (compare two or more groups statistically), visualize the data (charts), and document and share your findings (via a presentation or discussion paper).

The fact is, listing a set of required skills to learn is putting the cart before the horse. The most powerful cart-puller is having one interesting question, fed by your own curiosity. When you have an engaging question or problem to solve, you’ll pick up the skills you need as you go along.

So what’s your question?


5 Comments »

  1. […] This post was mentioned on Twitter by Arent van t Spijker. Arent van t Spijker said: RT @DataInfoCom: Seven building blocks for your data work http://bit.ly/etutuE […]

    Pingback by Tweets that mention Seven building blocks for your data work « CoolData blog -- Topsy.com — 4 January 2011 @ 3:35 am

  2. Nice list! I would also add: back up your data at least weekly. In some big companies, they do it 2 or 3 times a day on the network folders. But it’s never safe to count on someone else. You can create your own .bat file (on Windows) to back up your data, your projects and your documentation regularly (and easily).

    Comment by Sandro — 4 January 2011 @ 10:58 am

  3. Thanks Sandro – unfortunately I am currently falling down on the job when it comes to backups. I am forced to save data sets and model files on my hard drive because the network folders don’t have the capacity to handle such large files. There’s a lot of work at risk there. I try to make local backups (to a thumb drive, which is itself risky). Uploading the data to a server off-campus is probably not allowed, nor would I want to do that. Suggestions??

    Comment by kevinmacdonell — 4 January 2011 @ 11:44 am

  4. A local backup on an external hard drive is already a good start. I guess some websites let you store a few gigabytes of data (for free?).

    I think you should be allowed to take some data home (if the data is not sensitive) to back up your work.

    Comment by Sandro — 4 January 2011 @ 12:21 pm

  5. @kevinmacdonell – you might want to try a cloud-based storage solution. There are a lot of options out there (my favorite is http://www.databackupexpress.com). It’s worth a look!

    Comment by Frank — 5 January 2011 @ 1:23 pm
