Code4lib Day 2: Discovering Digital Library User Behavior with Google Analytics

by Kirk Hess, University of Illinois Urbana-Champaign

Why Google Analytics?

  • free
  • JavaScript based
  • small tracking image (visible via Firebug), so the data is mostly users, not bots
  • works across domains
  • easy to integrate with existing system
  • API

Some useful things in the interface:

  • heat map
  • content drill down – click on page and see where users went from there
  • visitor flow
  • events

Export Data Using API

  • Analytics API
  • Java or JavaScript (presumably any language, actually)
  • export any field into a database for further analysis (in this case MySQL db)
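
The export step might look roughly like the sketch below. It assumes the rows have already been fetched from the Analytics API (the fetch itself is left out), and it uses Python's built-in sqlite3 as a stand-in for the MySQL database mentioned in the talk. The table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical rows as they might come back from an Analytics API query:
# (page path, event action, count). These values are invented for illustration.
SAMPLE_ROWS = [
    ("/items/101", "view", 42),
    ("/items/101", "download", 7),
    ("/items/102", "view", 3),
]


def store_rows(conn, rows):
    """Create the (hypothetical) ga_events table if needed and insert rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ga_events ("
        "page_path TEXT, action TEXT, total INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ga_events (page_path, action, total) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # sqlite3 stands in for MySQL here
store_rows(conn, SAMPLE_ROWS)

# Once the data is in a database, further analysis is plain SQL.
views = conn.execute(
    "SELECT SUM(total) FROM ga_events WHERE action = 'view'"
).fetchone()[0]
print(views)
```

Swapping in a real MySQL connection would mostly mean changing the connect call and the parameter placeholder style; the insert-then-query pattern stays the same.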

Analyze Data

  • Which items are popular?
  • How many times was an item viewed?
  • Downloaded?
  • Effective collection size – see whether people are seeing/using items
  • finding: typically, many items are not popular
  • discover a lot of other things about users
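
As a rough sketch of the questions above: the code below counts views and downloads per item, then computes one possible reading of "effective collection size", namely the share of the collection viewed at least once. That definition is an assumption (the talk notes don't spell it out), and the item IDs and events are invented.

```python
from collections import Counter

# Hypothetical exported events: (item id, action). Invented for illustration.
EVENTS = [
    ("item-1", "view"), ("item-1", "view"), ("item-1", "download"),
    ("item-2", "view"),
]
# Hypothetical full list of items in the collection.
COLLECTION = ["item-1", "item-2", "item-3", "item-4"]


def count_actions(events, action):
    """Count how many times each item had the given action."""
    return Counter(item for item, act in events if act == action)


def effective_collection_share(collection, views):
    """Share of the collection viewed at least once (an assumed definition)."""
    if not collection:
        return 0.0
    # Counter returns 0 for items that never appear in the events.
    used = sum(1 for item in collection if views[item] > 0)
    return used / len(collection)


views = count_actions(EVENTS, "view")
downloads = count_actions(EVENTS, "download")
share = effective_collection_share(COLLECTION, views)
print(views.most_common(1))  # most-viewed item first
print(share)
```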

Next Steps

  • finding: the site design needs to change
  • change search weighting
    • allow users to sort by popularity (based on previous data)
    • recommender system – think Amazon
  • add new tracking/new repositories
  • analyze webstats – hard to look at direct access
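
Sorting by popularity could be as simple as the sketch below, which reorders a list of search results by previously recorded view counts. The item IDs and counts are hypothetical, and a real search engine would more likely fold popularity into its relevance scoring than re-sort afterwards.

```python
# Hypothetical view counts from previously exported analytics data.
VIEW_COUNTS = {"item-1": 120, "item-2": 4, "item-3": 57}


def sort_by_popularity(result_ids, view_counts):
    """Order search results by past views, most viewed first.

    Items with no recorded views count as zero. sorted() is stable, so ties
    keep their original (relevance) order.
    """
    return sorted(
        result_ids,
        key=lambda item: view_counts.get(item, 0),
        reverse=True,
    )


results = ["item-2", "item-4", "item-3", "item-1"]
ranked = sort_by_popularity(results, VIEW_COUNTS)
print(ranked)
```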

Moving away from JavaScript-based tracking, since a lot of mobile devices don’t support it.

The event analysis code has been posted on GitHub; the code for adding events to links will be added to his GitHub account later.

Interesting Stats

So, I’ve been doing an inventory of all the instructional “how-to” type pages (and slightly broader) on the UBC Library’s website and I came up with some rather interesting (in some cases, what I thought were staggering) statistics.

Of the 794 internal and external links:

  • somewhat surprisingly, only 3% were 404/dead links
  • 16% were duplicate links (meaning I had already inventoried the link at least once)
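
For anyone wanting to tally a similar inventory, here is a minimal sketch. It assumes the inventory is a list of (URL, HTTP status) pairs, counts a link as dead when the status is 404, and counts it as a duplicate when the same URL already appeared earlier in the list. The sample links are invented, and this is not necessarily how the numbers above were produced.

```python
def inventory_summary(links):
    """Return (share dead, share duplicate) for a list of (url, status) pairs."""
    seen = set()
    dead = duplicates = 0
    for url, status in links:
        if url in seen:
            duplicates += 1  # already inventoried at least once
        else:
            seen.add(url)
        if status == 404:
            dead += 1
    total = len(links)
    if total == 0:
        return 0.0, 0.0
    return dead / total, duplicates / total


# Hypothetical inventory: four links, one dead, one repeated.
SAMPLE_LINKS = [
    ("https://example.org/cite", 200),
    ("https://example.org/old-guide", 404),
    ("https://example.org/cite", 200),
    ("https://example.org/search-tips", 200),
]
dead_share, duplicate_share = inventory_summary(SAMPLE_LINKS)
print(f"{dead_share:.0%} dead, {duplicate_share:.0%} duplicate")
```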

Of the 590 internal pages:

  • 20% are in PDF format
  • 4.6% are videos (mostly outdated)
  • 3% are PDF versions of a webpage
  • 20% (a whopping 106 pages) duplicate the content of another page. For example, I found 12 different pages that talk about How to Cite something (in general, not different styles).

What I also found interesting was how out of date some of the pages were. The best example was a page that refers to “Information Navigator 2001”! (Disclaimer: I did an inventory based on following links from the Instructional pages, branch pages, and FAQ, so it does not include any delinked pages.)

It’s no secret that I’m part of a larger project to revamp the library website, and I think I just provided some pretty good hard data to justify it.