CoolData blog

13 April 2010

Exploring your database relationships with NodeXL

Filed under: Coolness, Relationship mapping – kevinmacdonell @ 7:16 am

Guest Post by Jason Boley, Associate Director of Prospect Management and Tracking, Purdue University

NodeXL is a powerful social network analysis tool that is designed as a plug-in to Excel 2007. NodeXL was designed by a team of programmers at Microsoft Research and is now maintained by the Connected Action Consulting Group.  In a nutshell, the tool was designed to produce network graphs from a list of network usage data, and includes built-in tools for getting information from Twitter, Flickr, YouTube, etc. (Link to NodeXL site.)

But can it be useful for fundraising too?

If you look at NodeXL from a fundraising perspective, you immediately see the relevance. NodeXL is a tool for discovering deep connections in your social networks. This in itself is pretty useful, especially if your fundraising department is heavily involved in social media (and it probably will be at some point). This tool helps you discover who is having an impact on your social reputation. But let’s put that aside. Even the most basic of fundraising offices are constantly looking to find new connections to and between donors. Who knows someone who can provide me with an introduction to Mr. X? Depending on the size and complexity of your database (and your staff), these connections are often easy to miss. Sure, we may note that Bob and Jim sit on the same board or work at the same company, but going a layer deeper than that is often problematic.

With that in mind, I began to explore NodeXL.

There are a couple of things that are appealing about NodeXL. First, it’s free. It downloads as a zip file and operates essentially as a macro plug-in for Excel 2007. But this is unlike any macro you’ve ever seen. The visual capabilities are stunning. Second, it is relatively straightforward. While it does have some extremely powerful advanced features, it’s easy to get started.

The only information that NodeXL accepts is called an edge list. An edge list is essentially an Excel list with two columns. In column A you have the first value and in column B you have a second value which is related to the value in Column A. So a sample edge list might look as follows:

Bob        Jerry
Jerry      Tom
Tom       Bob
Alice      Jerry

These two columns of data are all that NodeXL accepts and needs to do network analysis. The rest of the work, calculating connections between these relationships, is done for you.
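
For readers who want to see what "calculating connections" means outside of Excel, here is a minimal Python sketch. It is not NodeXL code. It simply reads the sample edge list above and works out who is linked to whom. Treating every relationship as two-way is an assumption made for this illustration.

```python
from collections import defaultdict

# The sample edge list from above: each pair is one row (column A, column B).
edges = [
    ("Bob", "Jerry"),
    ("Jerry", "Tom"),
    ("Tom", "Bob"),
    ("Alice", "Jerry"),
]

def build_adjacency(edge_list):
    """Return a dict mapping each name to the set of names it is linked to."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)  # assumption: each relationship is two-way
    return dict(neighbours)

adjacency = build_adjacency(edges)
for name in sorted(adjacency):
    print(name, "->", sorted(adjacency[name]))
```

Jerry shows up with three links even though he appears in only one row of column A, which is the kind of thing that is hard to spot in a plain two-column list.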

I rarely think small, so my first inclination was to see if we could visualize all of the recorded board memberships in our Advance database. The first step was to create my edge list. This was very simple and took the following format:

I_Jason_Boley   O_Jason_Boley_Foundation
I_Bob_Smith       O_Bob_Smith_Foundation
I_Joe_Klein        O_Gates_Foundation

I think you can get the picture. The I and O appended to the front are for facilitating sorting (if you should wish to color code them later you can easily sort the organizations from the individuals). As you might expect, this query returned a couple thousand connections. But I was pleased NodeXL handled them so efficiently. The resulting graph (Harel-Koren Fast Multiscale format) was visually stunning.
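
If your query returns plain names, the prefixed labels can be produced in a few lines. The following is only a sketch in Python, assuming the query output is a list of (person, organization) pairs. The helper names `make_label` and `to_edge_list` are made up for this example and have nothing to do with NodeXL or Advance.

```python
import csv
import io

# Hypothetical query output: (person, organization) board memberships.
memberships = [
    ("Jason Boley", "Jason Boley Foundation"),
    ("Bob Smith", "Bob Smith Foundation"),
    ("Joe Klein", "Gates Foundation"),
]

def make_label(prefix, name):
    """Prefix a name and join its words with underscores, e.g. I_Joe_Klein."""
    return prefix + "_" + "_".join(name.split())

def to_edge_list(rows):
    """Turn (person, organization) rows into two-column edge list rows."""
    return [(make_label("I", person), make_label("O", org)) for person, org in rows]

edge_rows = to_edge_list(memberships)

# Write the two columns as CSV text that can be opened or pasted into Excel.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(edge_rows)
print(buffer.getvalue())
```

Because every individual starts with I_ and every organization with O_, a simple sort on either column separates the two kinds of vertex.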

(Image: network graph of the board memberships, Harel-Koren Fast Multiscale layout.)

Stunning? Yes! Useful? I wasn’t sure yet. I knew that I could isolate an individual within this huge graph. The highlighted green line above, for example, is one of our major donors.

I decided to leave this dataset behind for a bit and move to more of a discovery project. We are very fortunate to have completed a full database screening a few months ago. I turned my eyes towards the data we received from our vendor that includes a table of people in our database with connections the vendor discovered to foundations (based on GuideStar data). I created an edge list from this data that resulted in 4,872 relationships. The resulting graph (Fruchterman-Reingold format) returned a huge blob.

(Image: network graph of the foundation connections, Fruchterman-Reingold layout.)

You can click around on a graph like this to isolate individual connections (or edges) and try to do some discovery this way, but in a graph as dense as this it is basically impossible. The other way to do discovery work is to pose the question from a certain perspective. For instance, I highlighted two particular, closely related foundations in the edge list. I then used the subgroup option in NodeXL to select all of the vertices four layers deep. In other words, let’s look at relationships between individuals not just directly to the organization, but a few layers deeper. You can see in the graph below the highlighted connections set apart from the larger mass.
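
The idea of selecting vertices "four layers deep" can be sketched as a breadth-first search that stops after a fixed number of hops. This is an illustration in Python, not a description of how NodeXL's subgroup option is implemented. The function name `within_layers` and the sample names are invented for the example.

```python
from collections import defaultdict, deque

def within_layers(edge_list, seeds, max_layers):
    """Return {vertex: distance} for vertices within max_layers hops of any seed."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        neighbours[a].add(b)
        neighbours[b].add(a)

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if distance[current] == max_layers:
            continue  # do not expand past the requested depth
        for nxt in neighbours[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance

# Made-up edge list in the I_/O_ style used earlier.
edges = [
    ("I_Ann", "O_Alpha_Foundation"),
    ("I_Ann", "O_Beta_Foundation"),
    ("I_Ben", "O_Beta_Foundation"),
    ("I_Ben", "O_Gamma_Foundation"),
    ("I_Cal", "O_Gamma_Foundation"),
    ("I_Dee", "O_Delta_Foundation"),
]

reach = within_layers(edges, ["O_Alpha_Foundation"], 4)
for vertex, hops in sorted(reach.items(), key=lambda item: item[1]):
    print(hops, vertex)
```

Starting from one foundation, the search reaches a second and third foundation through shared individuals, while anything more than four hops away, or not connected at all, is left out.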

(Image: the same graph with the selected subgroup highlighted.)

I then extracted these selected vertices into a brand new NodeXL sheet. After adding some labels, you can reduce the useful information to just a few connections.

(Image: the extracted subgroup, with labels added.)

What can we tell from this graph? It is likely that the connection between Al Winner and Steven Cook was already known and well documented. But the real advantage here is seeing the tangential connections we may have missed. Steven Cook has a potential relationship with Charles Talor. Indeed, in our database Steven Cook is a well-known alumnus, while Charles Talor is someone we would like to know. So a potential for establishing a relationship is provided. The same holds true for potential connections to James Jones and Jean Smith. While some of these connections may be known, the advantages to visualizing them are great. Potential connections jump from the page.

What I have discussed only scratches the surface of this advanced tool. I have very little knowledge of network graphing beyond what I’ve read online. Yet with NodeXL I have been able to produce some simple but powerful graphs. Relationships as they are defined in fundraising databases are very different from those that can be extracted from complicated social networking data. And yet there are similarities. As I begin to delve deeper into this tool, it is forcing me to rethink how we have relationships defined in our database, think about existing data that we have that might be useful in a visual format, and think about the future where we will certainly want to supplement our traditional alumni data with information from social networks.

(Jason Boley is the Associate Director of Prospect Management and Tracking at Purdue where he manages prospect information and focuses on data analytics. Jason has over ten years of experience working with fundraising databases and has presented nationally on the topics of database reporting and prospect management.)

12 April 2010

New way to look at words

Filed under: Coolness, Data visualization, Free stuff, Text – kevinmacdonell @ 8:13 am

Word clouds aren’t new, but there’s a new online app for creating them that is worth checking out. Tagxedo allows you to create your clouds using some versatile tools for shaping the appearance of the cloud, which you can then easily save as a .jpg or .png.

This comes to me via a post on the LoveStats blog, where Annie Pettit has posted a couple of her own creations – one based on the text of her resume, and one on all the words in her blog.

I wrote about word clouds back in December (Quick and easy visuals of large text files), including the well-known and very cool tool Wordle, the creation of Jonathan Feinberg. Tagxedo does the same thing but works a little differently. Powered by Microsoft’s Silverlight browser plug-in, Tagxedo offers a nifty interface for importing your text (or URL), finely controlling your word choice, and playing with the font, colour, theme and layout of your cloud, including being able to choose a shape. The choice of shapes is rather limited – hearts, stars, rectangles and ovals, mostly. Here’s a star-shaped word cloud based on the 150 most common words on this blog:

(Image: star-shaped word cloud of the 150 most common words on this blog.)

My interest in word clouds is related to visualization of data – in this context, conveying the gist of a mass of text by giving prominence to the most common significant words. For example, last year I used Wordles to visualize tens of thousands of words entered as free-text comments in a survey of alumni. It’s no substitute for real analysis, but it does make a cool presentation slide!
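
Underneath any word cloud is a simple count of the most common significant words. Here is a minimal Python sketch of that counting step. It is not how Wordle or Tagxedo work internally. The stop-word list and the sample comment text are invented and far too short for real use.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "was", "were",
              "is", "in", "my", "i", "for"}

def top_words(text, limit):
    """Return the `limit` most common words that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(limit)

# Made-up free-text survey comments.
comments = ("The campus was beautiful. My professors were great "
            "and the campus library was great.")

print(top_words(comments, 2))
```

The counts are what a cloud turns into font sizes, so the same list can also be dropped into a table when you need something more precise than a picture.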

NOTE: Check in tomorrow for Jason Boley’s amazing work with NodeXL for visualizing prospect connections in your data.

8 April 2010

Coming soon: Fun with NodeXL

Filed under: Coolness, Relationship mapping – kevinmacdonell @ 7:53 am

What is this? Find out next week.

Relationship mapping is a side-interest of mine, one which I’ve only rarely been able to apply in my work. In major-gifts research, I’ve used software to map connections between prospects via the corporate and non-profit boards they sit on together, to create visualizations that are similar in appearance to Muckety, TheyRule and other online apps. These visualizations were half exploratory (searching for the best point of entry to a prospect) and half illustrative (as in showing likely points of connection in a briefing memo).

(WellStar Foundation’s David Broussard reviews a number of relationship mapping applications here. He represents one of the products he reviews, Market Visual, but seems to have a genuine interest in keeping up with everything that’s out there.)

Naturally I was intrigued by NodeXL, an Excel add-on which makes it a snap to visually portray the relationships between and among hundreds or thousands of entities at once. Unlike the products David reviews, NodeXL works exclusively with the data you feed into it.

It’s cool, it’s free, it’s easy to use, and it’s been on my list of blog post ideas for months. I have played with it a bit, creating a visualization of the cross-references in our database. The result looked like a giant knot of black spaghetti. I thought that was cool, but doubted anyone else could be convinced of its coolness. I didn’t find an application for it at work.

I think Jason Boley at Purdue has, though. (Twitter: soam) He’s worked with it a lot more and taken it far beyond what I was able to do. His explorations will be of interest to fundraising shops, so a little while ago I asked if he would contribute a guest post on the topic, and he has obliged. Read it here.


20 January 2010

Another take on Google’s Motion Charts

Filed under: Coolness, Data visualization, Free stuff – kevinmacdonell @ 9:09 am

Late last year I posted a tutorial on creating Google motion charts with your data. These very cool charts work with your time-series data, stored in Google Docs, to create an animation with the power to convey a lot of information in an easily understandable form.

But what about private data? You may not want to rely on Google’s ability to password-protect your data, or the privacy provisions you work with may prohibit posting data to an outside server.

Here’s another way to take advantage of motion charts. I was put onto this by Trevor Skillen, President and CEO of Metasoft, in Vancouver BC, whose company is working on incorporating motion charts into their well-known FoundationSearch product.

This version uses stored code to manipulate your data locally, rather than pulling it from Google Docs.

The advantages are clear:

  • Your data is stored locally and the code is executed locally, in the browser – nothing is sent to Google.
  • You gain precise control over the appearance – you can hide options that the user doesn’t need to see.
  • The example code provided by Google is fairly easy to modify without requiring programming or scripting skills.

Trevor directed me to Google’s ‘playground’ where one can get a quick feel for the technology without much tech effort.

There is a downside … there is a good deal of manual coding you’ll have to do if you want to put a chart together using your own data. This limits you to fairly simple charts – unless you’re capable of writing the additional code that will allow the chart to get data from a file or table.
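
One way to take some of the pain out of the manual coding is to generate the data rows from a local file and paste the result into the chart code. The sketch below is Python, assumes a small CSV with a label in the first column and numbers in the rest, and prints the rows as a JSON array. The column names are illustrative, and the exact row format the chart code expects is an assumption, so compare the output with Google's example before pasting.

```python
import csv
import io
import json

# Stand-in for a local file; the column names are illustrative only.
LOCAL_CSV = """Class Decade,Year,Median Gift,Number of Donors
1980s,2008,50,120
1980s,2009,55,131
1990s,2008,35,210
"""

def csv_to_rows(csv_text):
    """Return (header, rows) with the numeric cells converted to numbers."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = []
    for record in reader:
        # Assumption: the first column is a label, the rest are numbers.
        numbers = [float(cell) if "." in cell else int(cell) for cell in record[1:]]
        rows.append([record[0]] + numbers)
    return header, rows

header, rows = csv_to_rows(LOCAL_CSV)
rows_literal = json.dumps(rows, indent=2)
print(rows_literal)
```

Nothing leaves your machine in this step, which keeps the privacy advantage described above.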

19 January 2010

Have a dues-based alumni association? Read this paper.

Filed under: Alumni, Model building, Predictor variables – kevinmacdonell @ 4:42 pm

Advancement consultant Peter B. Wylie and predictive modeling expert John Sammis have recently published a new paper, Data Mining and Alumni Association Membership. Like all of their work, it’s written in a way anyone can understand. And as some of my recent posts have pointed out, it shows how data mining can be a powerful tool when used to predict all sorts of behaviours besides giving.

This time they’re showing you how you can use certain key pieces of information in your database to predict who will be most likely to want to join your dues-based alumni association. Their paper identifies the key variables that tend to be strongly related to active alumni association membership, and demonstrates how to create a predictive score. Their data came from four public universities with graduate and undergraduate enrollments that ranged from 4,500 to 27,000.

They believe schools should be using this information to save money on membership appeals, and boost membership.

And I do, too.

Addendum (20 Jan 2010): FYI, Peter Wylie is interviewed in the current issue of CASE Currents magazine. Will post link if it becomes available.

10 December 2009

Cool motion charts – Part 4

Filed under: Data visualization, Free stuff – kevinmacdonell @ 1:30 pm

In my previous post in this tutorial, I described how to assemble the data to create your bubble chart. Now comes the relatively painless part: Pasting it into Google Docs and inserting a Google Gadget – the motion chart itself.

To review, the required columns in your spreadsheet should be in this order:

  • A column to define the bubbles (in our case, this is Class Decade)
  • A column to define the time series (Year, i.e. fiscal year of giving)
  • At least two columns of numerical data for the x-axis and y-axis. (You can have more than two columns, to give you more options for charting, but you need at least two. I used Median Gift for the y-axis, and a choice of either Number of Donors or % Participation for the x-axis.)
  • You may also have a column for Category, which just labels the circles in the legend (in our example, this is just a duplication of the data in the Class Decade column)
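
If your data starts out as individual gift records rather than a finished table, the column layout above can be assembled with a short script. This is only a sketch in Python. The gift records and alumni counts are made up, it assumes one gift row per donor per year, and the function name `build_chart_rows` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical gift records: (class_decade, fiscal_year, gift_amount).
# Assumption: one row per donor per fiscal year.
gifts = [
    ("1980s", 2008, 50.0),
    ("1980s", 2008, 100.0),
    ("1980s", 2009, 75.0),
    ("1990s", 2008, 25.0),
    ("1990s", 2008, 35.0),
    ("1990s", 2008, 45.0),
]

# Hypothetical number of alumni in each class decade.
alumni_counts = {"1980s": 40, "1990s": 60}

def build_chart_rows(gift_rows, alumni):
    """Return a header row plus one row per (class decade, year)."""
    grouped = defaultdict(list)
    for decade, year, amount in gift_rows:
        grouped[(decade, year)].append(amount)

    table = [["Class Decade", "Year", "Median Gift",
              "Number of Donors", "% Participation", "Category"]]
    for (decade, year), amounts in sorted(grouped.items()):
        donors = len(amounts)
        participation = round(100.0 * donors / alumni[decade], 1)
        # Category simply repeats the Class Decade value.
        table.append([decade, year, median(amounts), donors, participation, decade])
    return table

table = build_chart_rows(gifts, alumni_counts)
for row in table:
    print(row)
```

The printed rows are in the same left-to-right order as the list above, so they can be pasted into Excel and then into the Google spreadsheet without rearranging columns.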

Assuming you already have a Google or Gmail account, navigate to Google Docs and click on ‘Create New’. Choose ‘Spreadsheet’ from the drop-down menu. Copy all the cells of your Excel spreadsheet, and paste them directly into the Google spreadsheet. Give the file a name, and Save.

(I’m going to assume that you have permission to post your institution’s data online. Keep in mind that you can block public access to the data, or limit it to select invitees who have to log in, or make it wide open and available to all. In any case, it would be best to seek approval.)

Select all of the cells in your sheet that contain your chart data, including the column headers. (Don’t select whole columns – click on cell A1, then hold shift down while clicking on the rightmost cell in the very last row of the sheet.)

In the spreadsheet menu, choose Insert. Click on Gadget.

A window of options will open. You might have to scroll down to find Motion Chart. Click the ‘Add to spreadsheet’ button.

The chart settings window will appear on top of your spreadsheet. (If you don’t see it, scroll up!)

The Range field will already be populated, because you had those cells selected before inserting the gadget. You can modify the range if need be.

Enter a title in the Title field. Ignore the other fields for now.

Click Apply and close.

The chart will take a second or two to appear. It won’t look right – we need to tweak it a bit.

It will also be rather small and hard to work with. To move it to its own sheet, click on the little down-arrow at the top left of the chart title bar, and select “Move to own sheet …” from the drop-down menu.

(For additional help at this stage, select Help from the More drop-down menu at top right.)

Now let’s choose the correct values for our x-axis and y-axis.

Click on the x-axis name, and choose the desired value from the options that pop up. (We’re using % Participation.)

(Ignore the Lin and Log menus for now. We’ll leave the scale as Linear, rather than Logarithmic.)

Now click on the y-axis name, and choose Median $.

Notice that the bubbles adjust their orientation accordingly.

Other items that you’ll want to tweak are below. All of these can be saved as the default state of your chart:

  • Colour: This should be set to ‘Category’.
  • Size: Set this to ‘Number alumni’. For fun, you can also set this to ‘Number of donors’ – then the bubbles will change size over time!
  • Playback speed: The little triangle to the right of the Play button. I usually set this on the slowest speed.
  • Starting year: Push the slider all the way to the left.
  • Labels and trails: You can also click on individual bubbles to label them, or display their trails as they move.

If you play around a bit, which I know you will, you’ll notice that it’s very easy to lose all your settings. And if you try to share your chart with someone else, it won’t display in their browser the way you want it to.

The method for saving your default chart state will be covered in Part 5.
