Year 2012 in review

The stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

4,329 films were submitted to the 2012 Cannes Film Festival. This blog had 32,000 views in 2012. If each view were a film, this blog would power 7 Film Festivals.

Click here to see the complete report.

No surprise that the Twitter and WordPress plugin posts got the most views. However, the blog hit a high of 590 views in a day due to the Code4Libcon 2012 Opening Keynote post.

I also hadn’t quite realized that I made 89 posts in 2012, given that my goal is one per week!

Code4lib Day 2 Afternoon: Notes and Takeaways

An afternoon of more presentations, which were a bit more technical in terms of data indexing, storage, and use. As a result, there are no detailed posts, but here are a few notes and takeaways.

  • Be careful when you try to parse a bunch of files you download from the web. Parse and store, distribute up front, and build a test index first.
  • Making Software Work – read it
  • The results of one study are not the truth.
  • It’s hard to make a study repeatable.
  • Does agile work? That’s the wrong question. Really, when does bug fixing have the highest cost?
  • High-risk bugs should be fixed as early as possible, ahead of the easy bugs.
  • What language? Depends on the problem.
  • Make developer happiness hours (block off time with no meetings).
  • Give people open sight lines instead of high cubicle walls.
  • Be as transparent as possible (e.g. JIRA) including progress.
  • Put projects into short iteration cycles.
  • No code without passing tests!
  • Slides (PDF) for the last talk: Practical Agile: What’s Working for Stanford, Blacklight, and Hydra by Naomi Dushay

In-browser Data Storage and Me

by Jason Casden, North Carolina State University

  • Suma: data collection application using in-browser storage.
  • Indexed Database API (aka IndexedDB, WebSimpleDB) is where things seem to be going, but browser support is limited.
  • Web (DOM) Storage is basically universally supported.
  • Web SQL DB is still a viable option.
  • lawnchair: an object store with adapters for a long list of DBs/APIs.
  • persistence.js: asynchronous JavaScript object-relational mapper and adapters are being built out. Can be used with node.js and MySQL.


Code4lib Day 2: How People Search the Library from a Single Search Box

by Cory Lown, North Carolina State University

While there is only one search box, typically there are multiple tabs, which is especially true of academic libraries.

  • 73% of searches from the home page start from the default tab
  • which was actually the opposite of the usability test results

Home grown federated search includes:

  • catalog
  • articles
  • journals
  • databases
  • best bets (60 hand crafted links based on most frequent queries e.g. Web of Science)
  • spelling suggestions
  • loaded links
  • FAQs
  • smart subjects

Show top 3-4 results with link to full interface.

Search Stats

From Fall 2010 and Spring 2011: ~739k searches and 655k click-throughs

By section:

  • 7.8% best bets (sounds very little, but actually a lot for 60 links)
  • 41.5% articles, 35.2% books and media, 5.5% journals, ~10% everything else
  • 23% looking for other things, e.g. library website
  • for articles: 70% first 3 results, other 30% see all results
  • catalogue use is fairly stable over time, but article searching peaks at the end of term

How do you make use of these results?

Top search terms are fairly stable over time. You can make the top queries work well for people (~37k) by using the best bets.
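To make the best bets idea concrete, here is a minimal sketch of a lookup table keyed on normalized queries. The table entries, the placeholder URLs, and the function names `normalize` and `best_bet` are assumptions for illustration only, not the NCSU implementation.

```python
from typing import Optional

# Hypothetical best-bets table: normalized query -> hand-picked link.
# The entries and URLs are placeholders, not the library's actual data.
BEST_BETS = {
    "web of science": "https://example.org/databases/web-of-science",
    "jstor": "https://example.org/databases/jstor",
}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so minor variants match."""
    return " ".join(query.lower().split())


def best_bet(query: str) -> Optional[str]:
    """Return the hand-crafted link for a frequent query, if there is one."""
    return BEST_BETS.get(normalize(query))
```

Because the top queries change little over time, a small table like this can stay hand-maintained; anything not in the table simply falls through to the regular search results.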

Single/default search signals that our search tools will just work.

It’s important to consider what the default search box doesn’t do, and doubly important to rescue people when they hit that point.

Dynamic results drive traffic. When a few actual results were shown, use of the catalogue for books went up a lot compared to simply suggesting that people use the catalogue.

Collecting Data

A custom log is being used right now, tracking searches (timestamp, action, query, referrer URL) and click-throughs. An alternative might be to use Google Analytics.
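A custom log of this kind could look roughly like the sketch below. Only the four field names come from the talk; the CSV format, the action values `"search"` and `"click"`, and the helper names are assumptions.

```python
import csv
import io
from datetime import datetime, timezone

# Field names follow the talk; everything else here is an assumption.
FIELDS = ["timestamp", "action", "query", "referrer"]


def log_event(out, action, query, referrer):
    """Append one search or click-through event as a CSV row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writerow({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "query": query,
        "referrer": referrer,
    })


def click_through_rate(rows):
    """Click-throughs divided by searches, or 0.0 if there were no searches."""
    searches = sum(1 for r in rows if r["action"] == "search")
    clicks = sum(1 for r in rows if r["action"] == "click")
    return clicks / searches if searches else 0.0


# An in-memory buffer stands in for the real log file.
log = io.StringIO()
log_event(log, "search", "web of science", "https://example.org/")
log_event(log, "click", "web of science", "https://example.org/search")
log_event(log, "search", "climate change", "https://example.org/")

log.seek(0)
rows = list(csv.DictReader(log, fieldnames=FIELDS))
rate = click_through_rate(rows)
```

Applied to the totals reported in the talk, the same ratio is 655k / 739k, or roughly 0.89 click-throughs per search.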

For more, see the slides below or read the C&RL Article Preprint.

Code4lib Day 2: Discovering Digital Library User Behavior with Google Analytics

by Kirk Hess, University of Illinois Urbana-Champaign

Why Google Analytics?

  • free
  • JavaScript based
  • small tracking image (visible via Firebug) = mostly users not bots
  • works across domains
  • easy to integrate with existing system
  • API

Some useful things in the interface:

  • heat map
  • content drill down – click on page and see where users went from there
  • visitor flow
  • events

Export Data Using API

  • Analytics API
  • Java or JavaScript (presumably anything, actually)
  • export any field into a database for further analysis (in this case MySQL db)

Analyze Data

  • Which items are popular?
  • How many times was an item viewed?
  • Downloaded?
  • Effective collection size – see whether people are seeing/using items
  • found that, typically, many things are not popular
  • discover a lot of other things about users
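Once the data has been exported, the questions above can be answered with simple counting. The sketch below assumes hypothetical exported rows of `(item_id, event)`, and takes "effective collection size" to mean the share of items with at least one view or download; that definition is an assumption, not something stated in the talk.

```python
from collections import Counter

# Hypothetical exported rows: (item_id, event), where event is "view" or "download".
events = [
    ("item-1", "view"), ("item-1", "view"), ("item-1", "download"),
    ("item-2", "view"),
]
# Hypothetical list of every item in the collection.
collection = ["item-1", "item-2", "item-3", "item-4"]

# Count views and downloads per item.
views = Counter(item for item, event in events if event == "view")
downloads = Counter(item for item, event in events if event == "download")

# Most viewed item with its view count.
most_viewed = views.most_common(1)

# Share of the collection that was viewed or downloaded at least once.
used = set(views) | set(downloads)
effective_share = len(used & set(collection)) / len(collection)
```

In practice the rows would come from the database filled by the export step, and the same counters could feed a sort-by-popularity option.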

Next Steps

  • found that the site design needs to change
  • change search weighting
    • allow users to sort by popularity (based on previous data)
    • recommender system – think Amazon
  • add new tracking/new repositories
  • analyze webstats – hard to look at direct access

Moving away from JavaScript-based tracking since a lot of mobile devices don’t support it.

The event analysis code has been posted on GitHub, and the code for adding events to links will be added later to his GitHub account.

Interesting Stats

So, I’ve been doing an inventory of all the instructional “how-to” type pages (and slightly broader) on the UBC library’s website and I came up with some rather interesting (in some cases, what I thought were staggering) statistics.

Of the 794 internal and external links:

  • somewhat surprisingly, only 3% were 404/dead links
  • 16% were duplicate links (meaning I had already inventoried the link at least once)
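For anyone repeating this kind of inventory, the two link figures can be tallied with a few lines of code. This is a minimal sketch over made-up rows of `(url, http_status)`; the data and variable names are hypothetical, and it counts a link as a duplicate each time it appears after its first occurrence.

```python
from collections import Counter

# Hypothetical inventory rows: (url, http_status); not the actual UBC data.
inventory = [
    ("https://example.org/cite", 200),
    ("https://example.org/cite", 200),  # already inventoried once
    ("https://example.org/old-guide", 404),
    ("https://example.org/search-tips", 200),
]

# Every occurrence after the first counts as a duplicate.
seen = Counter(url for url, _ in inventory)
duplicates = sum(count - 1 for count in seen.values())

# Dead links are the ones that returned a 404.
dead = sum(1 for _, status in inventory if status == 404)

total = len(inventory)
duplicate_share = duplicates / total
dead_share = dead / total
```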

Of the 590 internal pages:

  • 20% are in PDF format
  • 4.6% are Videos (mostly outdated)
  • 3% are PDF versions of a webpage
  • 20% (a whopping 106 pages) duplicate the content of another page. For example, I found 12 different pages that talk about How to Cite something (in general, not different styles).

What I also found interesting was how out of date some of the pages were. The best example was a page that refers to “Information Navigator 2001”! (Disclaimer: I did an inventory based on following links from the Instructional pages, branch pages, and FAQ, so it does not include any delinked pages.)

It’s no secret that I’m part of a larger project to revamp the library website, and I think I just provided some pretty good hard data to justify it.