CoolData blog

30 August 2010

New tricks for old data

Filed under: Alumni, Event attendance, Predictor variables — Tags: , , — kevinmacdonell @ 7:38 am

Your data might be old, but that doesn't mean it isn't predictive. (Used via Creative Commons licence. Click image for source.)

Do you sometimes exclude variables from your model because you feel the data is just too old to be useful? I wouldn’t be too quick. For some data at least, there’s no expiry date when it comes to predictive modeling.

I’ve heard of some modelers using old wealth-screening data and getting good results. It may be too out of date for the Major Gift people to use, but if there’s still a correlation with giving, it does your model no harm to make use of it. Just be sure the data you’re using is capacity-related and  not itself a predictive model, or existing donors will score high and high-likelihood non-donors will be submerged.

Old and out-of-date contact information works just fine. I always remind people that a phone number doesn’t have to be valid to be predictive. At some point, that alum provided that number (or email, or cell phone number), and the fact you have it at all is more important than whether you can still reach someone with it. For email in particular, probably a significant portion of your information is useless from a communication point of view, but will still be helpful in prediction. I am not talking about lost alumni — I never include those in my models. I mean alumni that we assume to be contactable.

I test whether the presence of, say, a business phone number is correlated with giving, but I also test the COUNT of business phone numbers. In order to do this, your database must retain the history of previous numbers. When an update is made, ideally a new record is created instead of overwriting the old one. This allows one to query for the NUMBER of update records — which I’ve often found correlates with giving. We know all those previous numbers were disconnected years ago, but their presence indicates a history of ongoing engagement.

What about event attendance? We might reasonably assume that an alum who attended a campus event a decade ago and never returned is far less likely to give than someone who visited just last year. Some schools have attendance data going back many years — is any of that still relevant? My answer is “probably.” I once got to study Homecoming attendance data for a university that had done a good job of recording it in the database going back 10 years. I already knew that Homecoming attendance was predictive of giving for this university, but I was surprised to discover that one-time, long-ago attendance was equally as predictive as recent attendance.

This may not hold true across institutions. You may want to break historical event attendance data into separate year categories to see whether they vary in their correlation with your predicted value. If they don’t differ significantly, then your most powerfully predictive variable will probably be a simple count of number of events attended: Repeat attendance, in my experience, is predictive in the extreme.

The enemy of relevance when it comes to data isn’t how old it is, but how incomplete or biased it is. For example, if you have good data on involvement on athletic teams up until 1985, and then nothing after that — that’s a problem. In that case, your variables for athletic involvement will be more informative about how old your alumni are than how engaged they might be. If you build a model that is restricted to older alumni, you’ll be fine, but if you include the entire database, ‘athletics’ will be highly correlated with ‘age’ and may add little or no predictive value.

What can you do? I see three strategies for addressing issues with older data, each one being appropriate in different situations.

  1. Leave it alone.
  2. Input the data.
  3. Impute the data.

We should leave the data alone when we know that the alum is the one who is primarily responsible for the presence or absence of data. All alumni who are not lost have more or less the same opportunity to provide us their contact and employment information. When all alumni have equal ability to influence some specific data point, the absence of data at that point is not a problem, but rather indicative of an attitude. No intervention is required. (A complicating factor is contact information that is purchased and appended — you should consult the ‘source’ code, if you have it, to distinguish between alum-provided data and data from other sources. The same might go for contact information that has been researched — but probably the number of records that have been researched is small in comparison with the entire database, and not a significant confounding factor.)

We should input the data when we know it to exist outside the database, when it is based on simple historical fact, and when it is practical to do so. An example would be student involvement in athletics. Unless capturing this information is someone’s explicit responsibility, the data will often be spotty; some class years will be covered and others won’t. Someone has this information — it’s probably in a file cabinet in the Athletics Office or, as a last resort, there’s always the yearbook — it just hasn’t been entered into the database. It’s a project, maybe a big project, but it might be quite doable with the help of a student or two. Would you go to this trouble just for the sake of predictive modeling? No. The risk is that the variable would still not be predictive. However, it isn’t hard to see that having the data will prove useful someday, perhaps for a special appeal directed at former student athletes. (An Alumni Records office with this sort of forward-looking, project-oriented mindset is a joy to work with!) If no data has ever been entered at all, and entering it retroactively isn’t a realistic goal, then why not just start tracking it from this point forward? It may be a long time before it becomes useful for data mining — you’ll be long-gone — but remember that our work rests on the shoulders of employees who have gone before, people who never heard of data mining but intuited that this or that category of data would someday prove useful.

And finally, we should impute the data when a variable is useful for prediction but excludes some sector of the alumni population through no fault of their own. Old wealth-screening data is a good example. If the data is ten years old, none of your recent graduates will have a wealth score. This might not be a problem if you’re building a Major Gift model or a Planned Giving model and excluding your younger alumni anyway, but for Annual Giving likelihood you should employ some of the techniques I discussed in previous posts on dealing with missing data. In those posts I was talking about survey data, but the idea is exactly the same. (See Surveys and missing data and More on surveys and missing data.) Essentially, the simpler techniques for imputing missing data involve substituting average values when we don’t know how an alum would have scored (or answered) had he or she had the opportunity to be included.

Search far and wide across the database for your predictors, but go deep as well — backwards in time!


18 February 2010

Tracking event attendance in Banner, Part 2

Filed under: Banner, Event attendance — Tags: , , — kevinmacdonell @ 8:49 am

In Part 1, I showed you the data-entry side of the system – setting up new events and adding database constituents as RSVPs and/or attendees. Today I’ll show you how to access counts of RSVPs and attendees, right in Banner.

Let’s say you’re working at the registration desk for your annual on-campus alumni reunion event, and your Alumni Director or someone comes up and asks for an update on how many attendees have registered so far. You can provide him or her with up-to-the-minute numbers in a cinch. Here’s what you do.

Go directly to GEIATTD. This screen will allow you to view and count all IDs who attended an Event or any specific Function within an Event. This is a query form – you can’t change attendance information here.

If you’ve been signing people in, the first two fields in this screen should already be populated with the unique number (CRN) and description of the event. If not, just click on Search (down arrow beside the Event field) to call up SLQEVNT, and press F8 to populate the query table; double-click on the desired event and return to GEIATTD.

You’ve got several options for counting up your attendees. But first, a couple of notes:

  • If a person is registered for more than one function, their ID will be listed twice on this screen, but the ID will be counted only once, so you don’t have to worry about double-counting.
  • Non-constituent guests (noted in comment fields) cannot be counted.

The following options assume that the buttons “No Guest Criteria” (Guest Indicator area) and “Both” (Invitee/Guest Indicator area) are selected. This follows from the fact that I don’t use the guest registration features in Banner. If you decide to use them, make adjustments accordingly.

1. To count both RSVPs and attendees together:

  • Leave Function blank
  • Leave RSVP blank
  • Select “No Attendance Criteria”
  • Click in any field of the box below.
  • View number in the box “Count”, top right.

2. To count only attendees:

  • Same as above, but instead select “Attended” instead of “No Attendance Criteria”

3. To count RSVPs:

  • Same as (1), but double-click in RSVP field and choose the appropriate code (eg. “Attend”).

4. To count RSVPs or attendees for a specific function:

  • Fill in the desired Function in the Function field, using the search icon.
  • Fill in other fields as desired.

This screen is perfect for obtaining quick RSVP/attendee counts on the fly. Naturally, though, you’ll want to have some reporting options in place in order to make full use of the Event Module for attendance tracking.

26 January 2010

Tracking event attendance in Banner, Part 1

Filed under: Banner, Event attendance — kevinmacdonell @ 12:52 pm

A while ago I urged you to track alumni event attendance in your database. For schools with Advancement departments using SCT Banner, here’s how to use Banner’s Event Management Module to do just that. I’ve shared this many times with other Banner users who want to get serious about tracking attendance but find the Event Module confusing.

The good news: It’s not so bad! Making it work for you requires knowledge of very few screens. Your events planning people don’t need to be using the module already; you can use the RSVP and attendance screens and ignore all the other features.

There are four main steps to event attendance tracking in Banner:

  • Creation of the Events and associated Function(s)
  • Entering RSVPs
  • Confirming attendance of invitees/guests
  • Viewing lists of invitees/guests/attendees

The process outlined below (and the illustrations) are from Banner 7, but from what I’ve seen of Banner versions 6 through 8, there are only minor differences between versions.

1. Events and Functions

Banner allows you to create two related entities, Events and Functions. (I will capitalize these words wherever I refer to their roles as Banner entities.) An Event can be any event offered, sponsored or connected with your institution for which you want to track attendance. That includes everything from Homecoming, to a donor recognition gala, to small receptions of a dozen people. A Function is any specific activity which is part of an Event.

As usual with Banner, you can make it as simple or as elaborate as you need to. You can track attendance for multiple Functions under the umbrella of a single Event. For a Homecoming, you might track the various dinners and other functions separately (and you may need to, if your event planners are relying on the Module for their guest lists, ticketing and so on).

It’s up to you. If you do track Functions separately, Banner makes it easy to count attendance for both the individual Functions AND the whole Event, without double-counting constituents who attend more than once function. But to keep things as simple as possible, you may choose to track only one Function.

Whatever you do, the key thing to remember is that every Event must have at least one Function associated with it. Due to lack of time and staff for data entry, my preference is for keeping it simple: Even for a huge event such as Homecoming, I have only one Function, which I’ve chosen to call “General”.

The Event is created in SLAEVNT. Each Event will automatically be given a unique ID. However (and this may be confusing), you do not need to create a unique Function code every time you create an Event – because the actual Function does not exist until you create an association in SLAEVNT. This will become clearer later on.

The various function types are stored in the validation table GTVFUNC, shown above. Before creating your Event, you might check to ensure that all the Functions you require are available in GTVFUNC, and create them if not. Again, with the occasional exception, I use only one Function code: “General.”

To create Functions:

  • Go to the validation table GTVFUNC
  • Enter a unique code (10 character limit), and its description (Dinner, Auction, Dance, Golf Tournament, etc.)
  • In the next field, select a function Type. A Type code may prove useful in later analysis, to quickly distinguish different categories of events (on campus vs. off campus, university vs. alumni chapter, etc. etc.). These codes are contained in another validation table: STVETYP (not shown).

Once you are satisfied that you have all your function choices in place, you may now create the Event itself:

  • Go to SLAEVNT. If the fields you see are not already clear, then insert a new record.
  • Type ADD in the ‘Event’ field. (Banner 8 has a new button for adding an event, I believe.)
  • Banner will give the event a unique ID when you click in the next section.
  • Click in the Description box and enter the name of your event.
  • Most of the fields that follow are optional. One that isn’t is ‘System’ – I enter A for Alumni.
  • Choose Event Type (again, from STVETYP)
  • Agency/Comm can be used to note event sponsor information, if relevant (optional)
  • Site: I use this field to distinguish on-campus events from all others. Use it as you see fit.
  • I ignore the other fields.
  • Click Save, then press Ctrl-PageDown to advance to the next (required) step.
  • In the next screen, enter Start and End dates (a calendar popup is available)
  • Enter Start and End times (required; if no specific times, just enter ‘0000’ and ‘2359’ for the start and end times).
  • I ignore the Building and Room fields. They’re not required.
  • Now for the final required task: Associate one or more Functions with this Event. Double-click on the Function field, then select List of Functions. You will see the list from the validation table. Select your desired function.
  • Repeat for additional Functions, if desired.
  • Ctrl-PageDown again to enter as many free-form comments for this Event as you wish (optional). In this field, I will sometimes add a note explaining if actual attendance data is available for this event, or just RSVPs.

2. Entering RSVPs

Now you can begin entering names of attendees. You can take one of four approaches:

  • enter an invitee list before the event, and check off attendance later;
  • enter RSVPs only, and check off attendance later;
  • just enter attendees at the event itself (i.e. at a registration desk), or
  • enter a final list of attendees anytime after the event.

Generally what we do is enter RSVPs as they come in, and confirm attendance on-site (i.e., registration) or after the fact (by reviewing guest lists after the event) – the former for large on-campus events such as Homecoming, the latter for smaller off-campus events.

For all approaches, the data entry method is essentially the same. You will use one of two tables: GEAATID or GEAATTD.  They are similar, but with one important difference. If the Event has more than one Function, you have to use GEAATID. If the Event has only one Function, you can use GEAATTD instead. To word it another way:

  • To record attendance or RSVP information for an ID attending multiple functions within an event, use GEAATID.
  • To record attendance for a single function, you can use GEAATTD.
  • Another way of thinking about the difference between them: Use GEAATID to record information by ID; use GEAATTD to record information by function.

(Although GEAATTD seems to be the simplest choice, I’ve gotten into the habit of always using GEAATID for some reason. Because that’s the screen I’m most familiar with, I’ll use that for illustration. Feel free to use the other screen.)

The primary screen for entering RSVP and attendance data. (Click for full-size view, and discover what my middle name is.)

Select the correct Event and Function. (You have to do this only once during a single session.) For the Event, click the down-arrow (Search) to enter query mode on the table SLQEVNT (not shown). This table will open as a blank; press F8 to populate the table with all the events that have been created to date. Double-click on the desired Event.

You’ll be returned to the main screen. Enter the ID or locate the invitee/attendee by name, and click in the next area. If the person has previously been entered (eg. as an RSVP), the Functions that person plans to attend will appear in the first column.

On the other hand, an error message, “Query caused no records to be retrieved,” means that the person has not already been added to any Function attendance list. That’s perfectly OK: You’re going to do that now. Just click OK

Get used to seeing this 'Error message'. (Click for larger view.)

on the error message, and type the name of the Function. (Or, double-click on the Function field, and select the appropriate Function from the pop-up table.) If there are no Functions available, that means no Functions have been associated with this Event in SLAEVNT; you will have to do that first – refer to that section above.

You’ll notice that this screen is the place to record both RSVPs and actual attendance. If entering RSVPs, double-click in the RSVP field and select the appropriate response. (Note: The responses may or may not be pre-loaded. They’re contained in yet another validation table: GTVRSVP. The table I work with has two options, “ATTEND” or “NO_ATT”. If your table is blank, then set up your validation table with the appropriate entries and start over.) When you get rolling, you’ll probably find it quicker to just type “ATTEND” instead of calling up the RSVP options.

Set up GTVRSVP like so:

The RSVP field is optional, by the way. If you’re marking actual attendance, just check off the Attended box in GEAATID. It’s that easy. (Remember to hit ‘Save’.)

I ignore all the other fields, including the “Guests” checkbox. Banner allows you to track who was a guest of whom, but we don’t bother with that. Our rule is, if someone is a constituent in Banner, they’re entered for the event in exactly the same way as anyone else, whether they’re a guest or not. (And a constituent doesn’t have to be a person – businesses can be registered for an event as well.) If your organization’s needs are more complex, explore your options – there’s probably something that will work for you.

Non-constituent guest entered as a free-text comment. (Click for large view.)

What about non-constituent attendees? We capture that information as well, in free-text comment fields. In order to do so, the non-constituent has to be the guest of a constituent. This happens all the time at Homecoming: An alumna brings along her husband, who’s not a grad. In that situation, enter a record for the alum (as above), then press Ctrl-PageDown for the next block. Enter the guest’s name and other desired information in the Comment field farther down. You’ll see other potentially useful fields here, but you can ignore those.

3. Confirming attendance of invitees/guests

When our alumni approach the registration desk, they face our friendly staff sitting behind four or five Banner-enabled laptops. These staff members, all experienced Banner users, ask alumni for address updates and any other updates or preferences that are required. These are entered directly into the database, along with the fact that they attended Homecoming.

As outlined above, confirming attendance is as simple as checking off the ‘Attended‘ box, and entering any needed non-constituent comments, if desired. The attendance indicator can be checked off at any time afterward, while you are entering, updating or correcting an attendance list.

That’s the data-entry side of the system. In Part 2, I discuss accessing counts of RSVPs and attendees.

Later on, I will show you how to access and report attendance data as a query, and how to mass upload attendance data as contacts for your fundraising prospects (in AMACONT).

Event and Function creation

An Event is any event offered, sponsored or connected with StFX University for which we want to track attendance (eg. Homecoming, alumni reunions, gala dinners, etc.) A Function is any specific activity which is part of an Event.

In Banner, every Event must have at least one Function associated with it. For most events, we will choose to track only one Function. For more elaborate events such as Homecoming, we may want to track attendance separately for various functions.

Banner allows us to track attendance for multiple Functions under the umbrella of a single Event, but it also allows us to track general attendance for the event as a whole, regardless of the number of separate Functions. If we are tracking whole-event attendance only, the Function is always coded General. (More on this later.)

17 January 2010

Proving ‘event attendance likelihood’ actually works

Filed under: Event attendance, Model building, Predictive scores, skeptics — Tags: , , , , — kevinmacdonell @ 6:56 pm

In an earlier post I talked about what you need to get started to build an ‘event attendance likelihood’ model. Today I want to provide some evidence to back up my claim that yes, you can identify which segment of your alumni population is most likely to attend your future event.

To recap: Every living, addressable alumnus/na in our database is scored according to how likely he or she is to attend an event, whether it be a President’s Reception or Homecoming, whether they’ve ever attended an event or not.

The scores can be used to answer these types of questions:

  • What’s the top 30% of alumni living in Toronto who should be mailed a paper invite to the President’s Reception?
  • Who are the 50 members of the Class of 2005 who are most likely to come to Homecoming for their 5th-year reunion?

I built our first event-attendance model last summer. As I always do, I divided all our alumni into deciles by the predicted values that are produced by the regression analysis (the ‘raw score’). The result is that all alumni were ranked from a high score of 10 (most likely to attend an event) to 1 (least likely).

At that time, alumni were sending in their RSVPs for that fall’s Homecoming event. Because I use only actual-attendance data in my models, these RSVPs were not used as a source of data. … That made Homecoming 2009 an excellent test of the predictive strength of the new model.

Have a look at this chart, which shows how much each decile score contributed to total attendance for Homecoming 2009. The horizontal axis is Decile Score, and the vertical axis is Percentage of Attendees. Almost 45% of all alumni attendees had a score of 10 (the bar highlighted in red).

(A little over 4% of alumni attendees had no score. Most of these would have been classified as ‘lost’ when the model was created, and therefore were excluded at that time. In the chart, they are given a score of zero.)

To put it another way, almost three-quarters of all alumni attendees have a score of 8 or higher. But those 10 scores are the ones who really stand out.

Let me anticipate an objection you might have: Those high-scoring alumni are just the folks who have shown up for events in the past. You might say that the model is just predicting that past attendees are going to attend again.

Not quite. In fact, a sizable percentage of the 10-scores who attended Homecoming had never attended an event before: 23.1%.

The chart below shows the number of events previously attended by the 10-scored alumni who were at Homecoming in 2009. The newbies are highlighted in red.

The majority of high-scoring attendees had indeed attended previous events (a handful had attended 10 or more!). But that one-quarter hadn’t – and were still identified as extremely likely to attend in future.

That’s what predictive modeling excels at: Zeroing in on the characteristics of individuals who have exhibited a desired behaviour, and flagging other individuals from the otherwise undifferentiated masses who share those characteristics.

Think of any ‘desired behaviour’ (giving to the annual fund, giving at a higher level than before, attending events, getting involved as an alumni volunteer), then ensure you’ve got the historical behavioural data to build your model on. Then start building.

14 January 2010

Building your ‘event attendance likelihood’ model

Filed under: Event attendance, Model building — Tags: , , , , — kevinmacdonell @ 12:20 pm

Photo courtesy of Alumnae Association of Mount Holyoke College (Creative Commons licence)

Your model’s predicted value doesn’t always have to be ‘giving’. Once you’ve discovered the power of predictive modeling for your fundraising efforts, you can direct that power into other Advancement functions.

How about alumni event attendance?

I’ve had great success with this new model, which scores all of our alumni according to how likely they are to attend an event.  I’ll show you what we use it for, and then I’ll bounce a cool idea off you for your thoughts.

If you’ve read some earlier posts, you will already know that event attendance is highly correlated with giving (for our institution – but probably yours as well). Event attendance is an excellent predictor of giving, but it works the other way too: giving is a predictor of propensity to attend events.

We can say this because when we build our models we’re concerned only with correlation, not causation. It would be incorrect for me to say that attending events causes an alum to give, or vice-versa. I don’t know enough to make a statement either way. It could be that both behaviours spring from other influences. It’s enough for our purposes to say that they’re linked in a meaningful way.

To create an event attendance likelihood model you need at least a few years of actual attendance data. I was lucky – I had Homecoming data going back to 1999, as well as a few years of data for alumni receptions across the country. (Gathering this data pays off in many ways besides predictive modeling. See my earlier post, Why you should capture alumni event attendance in your database.)

I gave a lot of thought as to whether I should consider Homecoming and off-campus receptions separately. Clearly they are not the same thing, and perhaps should not have been weighted equally. However, for the sake of simplicity, I regarded all events as the same when I calculated my predicted value (‘number of events attended’). As long as an alumnus/na had to RSVP for the event AND showed up, they got a point for that event.

Another consideration is opportunity. To validly count off-campus events, ALL alumni should have at least had the option to attend an event. It is true that there are many cities where we have yet to host an event. However, I reasoned, we’ve hosted events in many of the towns and cities where the majority of our alumni live (or can reasonably travel to). Therefore I chose to include receptions along with Homecoming. Was I wrong? Not sure!

(Events I chose to leave out were of the exclusive, invite-only type. Because not all alumni were given the opportunity to attend, those events are not suitable to use in this model.)

You create a new model whenever you change the predicted value. Whether you use Peter Wylie’s simple-score method or multiple regression to create your model, when you make “number of events attended” your predicted value, your resulting score set will help to rank all alumni by how likely they are to show up to your event.

Here’s how we use those scores.

Photo courtesy of Alumnae Association of Mount Holyoke College (Creative Commons licence)

Let’s say the Alumni Office wants to send out invitations for Homecoming or for a reception in a city somewhere. Email is a no-brainer. It’s cheap and fast, and alumni of all ages seem very receptive to receiving communications that way.

Naturally we still mail out paper invitations, but for various reasons (cost being supreme), we have to be more selective. Some criteria we use for selecting who will get a mailing are included in the list below. The criteria are adjusted to be more or less restrictive, depending on what our target for mail pieces is.

  • Lifetime household and business giving $x and up
  • Member of donor recognition group in a recent year
  • Has a Planned Giving commitment
  • Identified as an ‘involved’ young alumnus/na
  • Attended Homecoming once in past ‘x’ years
  • Attended a previous event in region

The problem with these criteria is that so many alumni (particularly young alumni) might attend our event but aren’t donors and have never attended an event before. If the goal is attracting new faces to your event, you need some way to segment the ‘willing’ from the disinterested masses, and give them the extra attention they deserve.

This is where predictive modeling shines. I’ll have more to say about building this model later.

Now I want to bounce a cool idea off you. Let’s say you’ve created your model, scored all your alumni, and have since then put on several large events. Those events have generated actual attendance data. Let’s say you use this attendance data to work out the ‘percentage attended’ for each score level. Would that not provide you with a rough estimate of projected attendance for any given invitation list in the future? With incremental adjustments over time, and perhaps for different event types, would this be a valid tool your event planners could use?

I want to know!

An example. Let’s say you have an event coming up in Los Angeles, and your invitation list for that city includes 200 alumni who have a score of 10 in the Event Likelihood Model. You know from past events that 20% of alumni with that score will show up. Therefore you expect to see about 40 of them in Los Angeles. You add in 12% for the next level, 8% for the next level, and so on, and sum it all up to get your total projected attendance.

Valid? Not valid?

14 December 2009

Why you should capture alumni event attendance in your database

Filed under: Event attendance, Predictor variables — Tags: , , — kevinmacdonell @ 11:46 am

There are many reasons why it makes sense to have a system in place to capture alumni event attendance in your database. Here are a few:

  • Pre-entering RSVPs, if your database allows it, enables an efficient way of producing event bios (what in some shops are called ‘blurbs’), by pulling all the relevant data from your database for the IDs that have an RSVP in their record for the event.
  • Statistics for attendance at key annual events (Homecoming in particular) are much more easily generated when the data is stored in the database. Not just attendee counts, but breakdowns by class year, milestone reunion year, giving, and so on.
  • It’s easy to pull a mailing list of event attendees, for post-event surveying.
  • Event attendance history is a useful piece of information to incorporate in major-gift prospect profiles. Having it in one place (ideally as part of a report) will make it easier to retrieve quickly.
  • Event attendance is very highly correlated with giving. It is a valuable predictor in any propensity-to-give model.
  • You can turn that around, and say that giving is highly correlated with event attendance. Make “events attended” the predicted value in an event attendance likelihood model, to segment which group of alumni should receive a mailed invitation.

Naturally, I am most excited by the possibilities for building ever more robust predictive models, but any combination of these reasons is enough to proceed with some system for tracking attendance in your database.

Create a free website or blog at