CoolData blog

14 October 2010

The Hows and Whats of documentation

Filed under: Best practices, Training / Professional Development; posted by kevinmacdonell @ 7:36 am

In an earlier post, I sounded off on the topic of documentation. Documenting your work is essential to making smooth progress in data mining, and in most other kinds of exploratory knowledge work. At its simplest, documentation is the aide-mémoire that saves you time when repeating a complex task, or just some notes to yourself about where you left off on a project so you can get right back at it the next day. At its best, it’s a transfer of knowledge that transcends time and staff changes, with great benefits for the very health of the organization you work for. (And you do care about the organization you work for, don’t you?)

Today I’ll talk about two things: the tools for recording and storing documentation, and the content, i.e., what effective documentation should include.

You don’t need much in the way of tools. I use only two, one for text and one for images: Microsoft Word and IrfanView. I use Word only because it is ubiquitous and therefore easy to share; any word processor or page-layout program that can incorporate images is fine. I use IrfanView, a free image viewer, because it has enough features for my purposes and no more.

(You can use whatever you wish, but keep it simple. I won’t tell you how to use Word, but here’s a tip for IrfanView. When you need to capture a visual from your screen, to show the layout of certain key items in a software interface for example, use your computer’s screen capture function to save a copy of the screen image to the clipboard. In IrfanView, press Ctrl-V (on a PC) to paste the captured image. Click and drag with the mouse to select the area you want, and press Ctrl-Y to crop the image. No need to save the image as a file; just press Ctrl-C to copy, move over to your open Word doc, and press Ctrl-V to paste the image where your cursor is. Done.)

I’m making some assumptions about how you’re going to store and possibly disseminate your documentation. I’ve opted for electronic capture, not only because most of the knowledge worker’s craft is conducted via the computer screen, but because it facilitates central storage and encourages regular, seamless updating, which is a must for useful documentation.

Another assumption is that you’re creating documents that either make up your own personal store, or are shared with a limited number of people — no more than a dozen. For large audiences, you might consider other capture and distribution options such as wikis, intranet pages, video captures, or podcasts, but I’m deliberately steering clear of these options because the goal is to make documentation second nature: You need to use tools that can readily be called up and pushed aside over and over. And really, there are few workaday processes that cannot be adequately described with words and images alone. (Certain complicated surgeries excepted.)

You should immediately recognize when something you’re doing is a PROCESS (i.e., involves more than one or two steps), and then you should get it down, right then and there, WHILE YOU ARE DOING IT. For that, the activity has to be as painless and unobtrusive as possible. That’s all I have to say about tools — no fancy software, just the basics.

As for content, obviously that’s up to you and it depends on the size and complexity of the task you’re trying to document. But good documentation has the following attributes in common:

Separate files: I think each task should be documented in its own file. Gigantic manuals of hundreds of pages that describe your entire job are very difficult to update selectively, and they’re cumbersome to share. (Think of each documented process as a recipe on a file card.) Create separate documents that address specific tasks, and give them descriptive file names to make them easier to find. Include the revision date in the file name.
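If you want the naming rule applied consistently, a tiny helper can build the file name for you. This is a minimal Python sketch under an assumed convention: the `<Task-Name>_rev-YYYY-MM-DD.<ext>` pattern and the `doc_filename` function are hypothetical, chosen only for illustration. Writing the date in ISO form (year, month, day) has a practical bonus: file names then sort chronologically in any folder listing.

```python
import re
from datetime import date


def doc_filename(task: str, revised: date, ext: str = "docx") -> str:
    """Build a descriptive file name with the revision date appended.

    Hypothetical convention: "<Task-Name>_rev-YYYY-MM-DD.<ext>".
    ISO dates sort correctly as plain text, so newer revisions list last.
    """
    # Collapse spaces, colons, slashes, etc. into single hyphens.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", task).strip("-")
    return f"{slug}_rev-{revised.isoformat()}.{ext}"


print(doc_filename("Annual fund: phonathon segmentation", date(2010, 10, 14)))
# Annual-fund-phonathon-segmentation_rev-2010-10-14.docx
```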

Central repository: That’s fancy talk for putting your (electronic) files on a shared drive where others can find them, and where there is a chance that they will be backed up regularly (inquire about that). Don’t save important documents only to your local hard drive, which will eventually fail.

Electronic version is THE version: Make it a rule that the version saved to the shared drive is the master copy, and that the master copy is always the most up-to-date version. When I need to follow a documented process, I print the file and refer to it as I work. I mark it up with any improvements or revisions that I see are needed, and when I have time, I revise the master copy and throw the paper copy in the shredder bin. (If you have two monitors, revise the electronic copy as you go and don’t bother with paper.)
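If your file names carry the revision date, a short script can pick out the newest revision of each document in a folder, which is a quick way to confirm which file is the master copy and to spot stale duplicates. This is a minimal Python sketch; the `<Task-Name>_rev-YYYY-MM-DD.<ext>` naming pattern and the `latest_revisions` function are hypothetical, and the sample file names are invented.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical naming convention: "<Task-Name>_rev-YYYY-MM-DD.<ext>".
NAME_PATTERN = re.compile(r"^(?P<task>.+)_rev-(?P<rev>\d{4}-\d{2}-\d{2})\.[^.]+$")


def latest_revisions(filenames):
    """Return {task: (revision_date, filename)}, keeping only the newest revision per task."""
    latest = {}
    for name in filenames:
        match = NAME_PATTERN.match(Path(name).name)
        if match is None:
            continue  # skip files that don't follow the naming convention
        task = match["task"]
        rev = date.fromisoformat(match["rev"])
        if task not in latest or rev > latest[task][0]:
            latest[task] = (rev, name)
    return latest


files = [
    "Phonathon-segmentation_rev-2010-09-02.docx",
    "Phonathon-segmentation_rev-2010-10-14.docx",
    "Load-gift-data_rev-2010-06-30.docx",
    "notes.txt",
]
for task, (rev, name) in sorted(latest_revisions(files).items()):
    print(task, "->", name)
```

In practice the list of names would come from something like `os.listdir()` on the shared folder; any copy with an older date than the one reported is not the master.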

Iterative updates: Processes change all the time, and documentation must keep up. Rather than trying to adhere to some sort of artificial schedule for document review, make changes at the time you’re actually using the process, as suggested above. Small changes over time are easy to manage and really add up.

Creation and revision dates: The revision date is in the file name, but put both the creation and revision dates at the head of the document as well. That way, you can compare a printed copy of the file with what is stored on the shared drive. Oh, and do NOT ever use a field that autopopulates with today’s date; every time someone opens the document, it will display the current date, making it hard to figure out how old the document is.

Context: Are there previously-documented processes that precede what you’re documenting now? Are there documented steps that follow? Make reference to them, with instructions for where to find them. If someone’s described it better than you can, don’t reinvent the wheel. Sometimes the software manual is the best guide. Not often, mind you, but sometimes!

Software versions: If your process relies on software, make note of the version number near the head of your document. Processes frequently change as a result of major upgrades — this is particularly true of databases — and any upgrade might trigger the need for a revision. Again, documentation never stands still.

Prerequisites: Does the process presume the user has access to a database, database table, service, server, or other resource that normally requires a password or account? List those requirements near the beginning, if the process is likely to be needed by someone new. In large organizations, getting an account set up can itself be a time-consuming process. If you’ve figured it out, don’t be content just to be “in” at last. Help the next person get on board.
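The header items described in the last few paragraphs (creation and revision dates, software versions, prerequisites, and pointers to related documents) fit naturally into a simple text template. The sketch below is a hypothetical Python helper, `document_header`, and every example value in it is made up; the point it illustrates is that the dates are written as fixed text rather than inserted as an auto-updating date field.

```python
from datetime import date


def document_header(title, created, revised, software, prerequisites, related=()):
    """Return a plain-text header block for a process document.

    Dates are written as fixed text (never an auto-updating field), so a
    printed copy can be compared with the master copy on the shared drive.
    """
    lines = [
        title,
        f"Created: {created.isoformat()}",
        f"Revised: {revised.isoformat()}",
        "Software: " + "; ".join(f"{name} {version}" for name, version in software),
        "Prerequisites:",
        *(f"  - {item}" for item in prerequisites),
    ]
    if related:
        lines.append("Related documents:")
        lines.extend(f"  - {doc}" for doc in related)
    return "\n".join(lines)


# All values below are hypothetical examples.
print(document_header(
    "Weekly gift data extract",
    created=date(2010, 3, 1),
    revised=date(2010, 10, 14),
    software=[("Statistics package", "18.0"), ("Reporting database client", "10.2")],
    prerequisites=["Read access to the gifts table (request an account from IT)"],
    related=["Load-gift-data_rev-2010-06-30.docx"],
))
```

Paste the output at the top of the document and update the Revised line by hand whenever you change the master copy.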

Bulleted or numbered lists: Incredibly helpful for someone needing to carry out a series of steps in a certain order. Learn how to quickly format a list — numbers if order is important, bullets if not. If the process is long and complex, start by providing a summary of the basic stages of the process, and then go into detail; the summary will help keep the user oriented during the process.
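To make the summary-then-detail structure concrete, here is a minimal Python sketch (the `render_recipe` function and the sample stages are hypothetical) that prints a numbered summary of the stages, then the detailed steps numbered within each stage, so the reader can always tell which stage they are in.

```python
def render_recipe(stages):
    """Render a numbered summary of stages, then numbered detail for each stage.

    `stages` is a list of (stage_name, [step, ...]) pairs. Numbers are used
    throughout because the order matters.
    """
    lines = ["Summary of stages:"]
    for i, (name, _) in enumerate(stages, start=1):
        lines.append(f"  {i}. {name}")
    for i, (name, steps) in enumerate(stages, start=1):
        lines.append("")
        lines.append(f"{i}. {name}")
        for j, step in enumerate(steps, start=1):
            lines.append(f"  {i}.{j} {step}")  # e.g. "1.2" = stage 1, step 2
    return "\n".join(lines)


# Hypothetical example stages.
recipe = [
    ("Pull the data", ["Log in to the reporting database.", "Run the saved query."]),
    ("Clean the data", ["Remove test records.", "Check for duplicate IDs."]),
]
print(render_recipe(recipe))
```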

Recipe, not manual: If your document needs a table of contents or an index, you’re probably cramming too much into it. The core of your document should be: “do this, then do this, then do this.” Like a recipe. Consider adding appendices to hold discussions of side-issues, large tables of information, alternative processes, older versions of the process that might need to be called up again, and so on. Keep major diversions out of the main body of the document.

Numbered pages: Duh.

Background knowledge brought to foreground: Is your document useful to someone who isn’t you? If you are documenting a process you are already familiar with, the odds are stacked against you. The reason is that knowledge you have mastered tends to be internalized and pushed to the background. You just assume (wrongly) that the next schlub to come along already knows what you know. Keep the beginner in mind, and spell out what you think is obvious. This is not easy, which is why it is so much better to document the process AS YOU ARE LEARNING IT YOURSELF. Learning and documenting are NOT separate stages.

Visuals, when necessary: Text will often suffice, but sometimes it takes a screen shot to make the point clear.

Expectations and exceptions: Let the user in on what results to expect. If it’s likely going to take half an hour to run a job, provide a warning. Also, anticipate some of the possible errors someone other than yourself might encounter, and suggest fixes. Many processes have critical points where a tiny mistake that is easy to miss will throw things off the rails — step into your ‘beginner mind’ and flag those points.

Explain “why”: Sometimes the worst thing for a user following a set of steps is having to wonder, “WHY am I doing this?” People will be tempted to skip steps that don’t seem to have any reason behind them, and they learn the reason only when things go wrong. Here’s a simple example of explaining the Why, which can save a person some head-scratching: “In the next block, enter ‘02:00’ as the Submit Time. This will ensure that the job runs immediately. Setting another time or leaving the field blank may cause the job to be queued to run overnight.”
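One way to make sure no step goes out without its reason is to treat the reason, and any warning about an easy-to-miss mistake, as part of the step itself. The Python sketch below is hypothetical (`Step` and `render_steps` are illustrative names, and the first step is invented); the second step reuses the Submit Time example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    action: str                    # what to do
    why: Optional[str] = None      # the reason, so nobody is tempted to skip the step
    caution: Optional[str] = None  # an easy-to-miss point where things go off the rails


def render_steps(steps):
    """Render steps as a numbered list, with Why and Caution notes indented under each."""
    lines = []
    for n, step in enumerate(steps, start=1):
        lines.append(f"{n}. {step.action}")
        if step.why:
            lines.append(f"   Why: {step.why}")
        if step.caution:
            lines.append(f"   Caution: {step.caution}")
    return "\n".join(lines)


steps = [
    Step("Open the job submission screen."),  # hypothetical step
    Step("Enter '02:00' as the Submit Time.",
         why="This ensures that the job runs immediately.",
         caution="Another time, or a blank field, may queue the job to run overnight."),
]
print(render_steps(steps))
```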

Good spelling and Shakespearean powers of expression are optional, as are fancy software tools. If you care about what you’re doing, you care enough to document, and that is all you really need.



  1. Great post, and an excellent reminder of where I need to improve my habits!

    As far as image capturing, I like Microsoft’s OneNote. You hit the windows icon plus the S key, and your screen is greyed out. From there you drag a box over what you want to capture, and the program automatically opens up that shot in OneNote, where you can copy to your clipboard. Eliminates the step of having to crop the image out of a bigger screen capture.

    Also, I’m keen on finding a good wiki to use for maintaining documentation for multiple users. We end up with several iterations of documentation (in different places) on our shared drives, so finding the “official” version can be tricky for people. A centralized, online wiki would help address this problem.

    Comment by Mark Egge — 14 October 2010 @ 10:26 am

    • I haven’t used OneNote, but it sounds like a valuable tool for documenting — thanks for sharing. Your point about using a wiki when documentation has multiple authors is also well taken. I think Google Docs has also made great improvements in the ability to create and maintain documents with multiple authors.

      Comment by kevinmacdonell — 14 October 2010 @ 1:17 pm

  2. Good post… I have always believed that good documentation is a process anyone can master, and not just a pro tech writer.

    Comment by moi1981 — 14 October 2010 @ 2:02 pm

  3. I’m no R expert yet, but that’s one of the things I’m finding I like about it so far. The documentation and the process itself often are (or at least can potentially be) the same thing.

    Graphical interfaces are great from a human interaction standpoint, but not so hot from a documentation/reproducibility standpoint.

    Comment by Jeff — 15 October 2010 @ 10:37 am

  4. I’m a fan of KeyNote NF for documentation and note-taking. It’s a free, open-source program and functions like a glorified version of Notepad.

    The key benefit is that it allows you to create and move notes in a “tree format” within the same file. Tree format is like the directory hierarchy you see when browsing for files in Windows.

    This makes it very easy to organize notes by stages or categories.

    It also supports images and includes all the typical word-processing functions.

    Comment by John — 19 October 2010 @ 12:03 pm

  5. […] Acquire the habit of documenting your work. This serves a number of purposes. First, your notes are a placeholder for your explorations. Seeking insights with data takes time, so keeping notes will let you know from day to day where you left off. Recording what you’ve learned already about your data means you don’t have to keep re-learning the same things every time you begin a related project. Second, if you want to share your discoveries with others, it’s a lot easier to pause every once in a while during your work to take a few notes and capture a chart or two than it is to write a discussion paper from scratch after the fact. Third, your documentation is an important record for others to build on your work in future years. (See The Hows and Whats of documentation.) […]

    Pingback by Seven building blocks for your data work « CoolData blog — 3 January 2011 @ 10:13 am
