CoolData blog

13 April 2010

Exploring your database relationships with NodeXL

Filed under: Coolness, Relationship mapping — Tags: , , , , — kevinmacdonell @ 7:16 am

Guest Post by Jason Boley, Associate Director of Prospect Management and Tracking, Purdue University

NodeXL is a powerful social network analysis tool that is designed as a plug-in to Excel 2007. NodeXL was designed by a team of programmers at Microsoft Research and is now maintained by the Connected Action Consulting Group.  In a nutshell, the tool was designed to produce network graphs from a list of network usage data, and includes built-in tools for getting information from Twitter, Flickr, YouTube, etc. (Link to NodeXL site.)

But can it be useful for fundraising too?

If you look at NodeXL from a fundraising perspective you immediately see the correlation. NodeXL is designed to be a discovery tool for discovering deep connections in your social networks. This in itself is pretty useful, especially if your fundraising department is heavily involved in social media (and it probably will be at some point). This tool helps you discover who is having an impact on your social reputation. But let’s put that aside. Even the most basic of fundraising offices are constantly looking to find new connections to and between donors. Who knows who that can provide me with an introduction to Mr. X? Depending on the size and complexity of your database (and your staff) these connections are often easy to miss. Sure, we may note that Bob and Jim sit on the same board or work at the same company, but going a layer deeper than that is often problematic.

With that in mind, I began to explore NodeXL.

There are a couple of things that are appealing about NodeXL. First, it’s free. It downloads as a zip file and operates essentially as a macro plug-in for Excel 2007. But this is unlike any macro you’ve ever seen. The visual capabilities are stunning. Secondly, it is relatively straightforward. While it does have some extremely powerful advanced features, it’s relatively easy to get started.

The only information that NodeXL accepts is called an edge list. An edge list is essentially an Excel list with two columns. In column A you have the first value and in column B you have a second value which is related to the value in Column A. So a sample edge list might look as follows:

Bob        Jerry
Jerry      Tom
Tom       Bob
Alice      Jerry

These two columns of data are all that NodeXL accepts and needs to do network analysis. The rest of the work, calculating connections between these relationships, is done for you.

I rarely think small, so my first inclination was to see if we could visualize all of the recorded board memberships in our Advance database. The first step was to create my edge list. This was very simple and took the following format:

I_Jason_Boley   O_Jason_Boley_Foundation
I_Bob Smith       O_Bob_Smith_Foundation
I_Joe_Klein        O_Gates_Foundation

I think you can get the picture. The I and O appended to the front are for facilitating sorting (if you should wish to color code them later you can easily sort the organizations from the individuals). As you might expect, this query returned a couple thousand connections. But I was pleased NodeXL handled them so efficiently. The resulting graph (Harel-Koren Fast Multiscale format) was visually stunning.

(Click for full size.)

Stunning? Yes! Useful? I wasn’t sure yet. I knew that I could isolate an individual within this huge graph. The highlighted green line above, for example, is one of our major donors.

I decided to leave this dataset behind for a bit and move to more of a discovery project. We are very fortunate to have completed a full database screening a few months ago. I turned my eyes towards the data we received from our vendor that includes a table of people in our database with connections the vendor discovered to foundations (based on GuideStar data). I created an edge list from this data that resulted in 4,872 relationships. The resulting graph (Fruchterman-Reingold format) returned a huge blob.

(Click for full size.)

You can click around on a graph like this to isolate individual connections (or edges) and try to do some discovery this way, but in a graph as dense of this it is basically impossible. But the other way to do discovery work is to pose the question from a certain perspective. For instance, I highlighted in the edge list two particular closely related foundations. I then used the subgroup option in NodeXL to select all of the vertices four layers deep, meaning let’s look at relationships between individuals not just directly to the organization, but let’s dig a few layers deeper. You can see from the graph below the highlighted connections from the larger mass.

(Click for full size.)

I then extracted these selected vertices into a brand new NodeXL sheet. After adding some labels, you can reduce the useful information to just a few connections.

(Click for full size.)

What can we tell from this graph? It is likely that the connection between Al Winner and Steven Cook was already known and well documented. But the real advantage here is seeing the tangential connections we may have missed. Steven Cook has a potential relationship with Charles Talor. Indeed, in our database Steven Cook is a well known alumnus, while Charles Talor is someone we would like to know. So a potential for establishing a relationship is provided. The same holds true for potential connections to James Jones and Jean Smith. While some of these connections may be known, the advantages to visualizing them are great. Potential connections jump from the page.

What I have discussed is scratching the surface of this advanced tool in only a cursory manner. I have very little knowledge of network graphing beyond what I’ve read online. Yet with NodeXL I have been able to produce some simple but powerful graphs. Relationships as they are defined in fundraising databases are very different from those that can be extracted from complicated social networking data. And yet there are similarities. As I begin to delve deeper into this tool, it is forcing me to rethink how we have relationships defined in our database, think about existing data that we have that might be useful in a visual format, and think about the future where we will certainly want to supplement our traditional alumni data with information from social networks.

(Jason Boley is the Associate Director of Prospect Management and Tracking at Purdue where he manages prospect information and focuses on data analytics. Jason has over ten years of experience working with fundraising databases and has presented nationally on the topics of database reporting and prospect management.)

Stunning? Yes! Useful? I wasn’t sure yet. I knew that I could isolate an individual within this huge graph. The highlighted green line above, for example, is one of our major donors.

I decided to leave this dataset behind for a bit and move to more of a discovery project. We are very fortunate to have completed a full database screening a few months ago. I turned my eyes towards the data we received from our vendor that includes a table of people in our database with connections the vendor discovered to foundations (based on GuideStar data). I created an edge list from this data that resulted in 4,872 relationships. The resulting graph (Fruchterman-Reingold format) returned a huge blob.

Advertisements

5 Comments »

  1. […] Exploring your database relationships with NodeXL […]

    Pingback by Coming soon: Fun with NodeXL « CoolData blog — 16 April 2010 @ 8:30 am

  2. […] which was put together by David Broussard and further explored in a guest post for the CoolData blog by Jason Boley of Purdue […]

    Pingback by The Future of Prospect Reseach | ProspectResearch.com — 4 August 2010 @ 5:57 pm

  3. […] is Jason Boley’s second guest post for CoolData. His first, Exploring your database relationships with NodeXL, remains one of the most popular posts on the site. Reviewing software is a departure for CoolData, […]

    Pingback by RapidMiner a powerful, low-budget option for the data savvy « CoolData blog — 28 September 2010 @ 7:43 am

  4. Hello Jason, this is a great and informative post. I liked the two stages-data mining by which weak or unknown relationship were uncovered. Would it be possible to show how you did your subgroups in more detail? I refer to your statement ” I then used the subgroup option in NodeXL to select all of the vertices four layers deep”. Thank you

    Comment by ali anani — 4 December 2011 @ 6:37 am

    • Hello. I took a new position after making this post and honestly haven’t gotten around to using NodeXL much lately. But as I remember it’s fairly easy to find the groups with the most connections by just sorting the results table by the ‘degrees’ category. There is also a clustering option that creates subgroups. You can read about all of this in much more detail in the free NodeXL tutorial located here: http://casci.umd.edu/images/4/46/NodeXL_tutorial_draft.pdf

      Comment by Jason Boley — 7 December 2011 @ 3:52 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: