CascadiaFest: Server JS Morning Part 1 Notes
A little sleepy this morning, but I’m certain the talks will help wake us up with the first part of CascadiaJS Server Day.
LibTechConf 2014: Knowing Users by their Digital Footprints
Presented by Bill Dueber
The talk looked at specific users at a specific institution, but advocated that everyone should be doing this.
Code4Lib Day 2: Afternoon Notes
De-sucking the Library User Experience
- Jeremy Prevost, Northwestern University
Libraries hate library users. If we didn’t, our websites wouldn’t suck.
Discovery
- if a user can’t find it, why do you own it?
- spend a lot of money on acquiring resources or access to them
- want to allow them to find them
- Good: works like Google from the user’s perspective
- Bad: needs to know how it works to make it work e.g. need to know MARC; can only find known items
- live examples: Ex Libris Voyager vs. Primo
- Voyager: no relevant results even using boolean ‘AND’
- Primo: can use boolean or not, relevant results – de-sucked!
Requesting Item
- Request information/user experience also sucks
- Prepopulated info, request item if not available – de-sucked!
Renew Item
- consistency
- made interfaces consistent – de-sucked!
Mobile
- not going away
- no mobile until mid-2007 for iPhone
- jQuery mobile – Apr 2010 – but updating two sites sucks, no support for tablets
- Mar 2013: responsive design, bootstrap
Libraries don’t hate library users!
- start with something that you would enjoy using
Google Analytics, Event Tracking and Discovery Tools
- Emily Lynema, North Carolina State University Libraries
- Adam Constabaris, North Carolina State University Libraries
How to track in-page events. Decide which events to track, push to Google.
Event Tracking Use Cases
- hidden or externally loaded (AJAX) events e.g. facets, tabs
- internal links that occur in multiple places e.g. request item
- external links
Examples
- Catalog: tabs are clicked twice as much as everything else; full text is used a lot; graphical browse is used less than text browse because of placement; about half use request item even though it appears in 2 different places
- Summon: trying to track what they could track. Paging more popular than facets
Implementation
- GA API script
- jQuery API
- HTML5 Data Attributes: data-* for use by scripts
- decide what to track
- basic technique
- Summon is harder: have to get the tracking into the code, and it needs more selectors
Debugging & Testing
- set up safety net first
- know the debugger
- use the GA debug
- test a lot
Actions speak louder than words: Analyzing large-scale query logs to improve the research experience
- Raman Chandrasekar, Serials Solutions
- Susan Price, Serials Solutions
Single unified index for all the items from all libraries’ collections.
RMF Goals
- observe and log user actions e.g. queries, filters, click patterns
- compute quality of search results e.g. user behaviour
- analyze data to improve search results and enhance research experience
Data-Driven Documents: Visualizing library data with D3.js
- Bret Davidson, North Carolina State University Libraries
Why D3?
- uses technologies that you already know
- capable library – pre-built path generations, well maintained etc.
- community – documentation, training available
- reasons you might not: the learning curve, or you don’t need something this complex
Examples
- suma – space assessment toolkit
- show visualization real time, tables, and CSV file
HTML5 Video Now!
- Jason Ronallo, North Carolina State University Libraries
Yes! Also, slides/presentation.
Here’s Why
- Flash video cannot be run on most mobile devices/tablets
How it Works
- uses video HTML tag
- use simple fallback – download if can’t view
- problem: browsers cannot decide on single codec to use; codec war
- solution: multiple sources: mp4, webm
- use poster attribute as “screenshot” and don’t have to download video right away
- add type attribute to say which format to use; can be very explicit
- only one video per page please!
- properties exposed in JavaScript
- can add custom controls, more info for users
- events that you can listen for e.g. timeupdate to update time in a video; update wording e.g. which floor
- analytics: play, pause, seek, ended
- can do visualization of engagement
- can style with CSS
- track for subtitles
Polyfills and Advantages
- provide video controls
- flash fallback
- progressive download and range requests
Future of Media on the Web
- DRM looks to be coming
- Popcornjs – can do annotation
- Web Audio API – mix audio, filters, etc.

Code4lib Day 2: How People Search the Library from a Single Search Box
by Cory Lown, North Carolina State University
While there is only one search box, typically there are multiple tabs, which is especially true of academic libraries.
- 73% of searches from the home page start from the default tab
- which was actually opposite of usability tests
Home grown federated search includes:
- catalog
- articles
- journals
- databases
- best bets (60 hand crafted links based on most frequent queries e.g. Web of Science)
- spelling suggestions
- loaded links
- FAQs
- smart subjects
Show top 3-4 results with link to full interface.
Search Stats
From Fall 2010 and Spring 2011: ~739k searches and ~655k click-throughs
By section:
- 7.8% best bets (sounds very little, but actually a lot for 60 links)
- 41.5% articles, 35.2% books and media, 5.5% journals, ~10% everything else
- 23% looking for other things, e.g. library website
- for articles: 70% first 3 results, other 30% see all results
- catalogue use is fairly stable, but article use peaks at the end of term
How do you make use of these results?
Top search terms are fairly stable over time. You can make the top queries work well for people (~37k) by using the best bets.
Single/default search signals that our search tools will just work.
It’s important to consider what the default search box doesn’t do, and doubly important to rescue people when they hit that point.
Dynamic results drive traffic. When a few actual results were shown, use of the catalogue for books went up a lot compared to merely suggesting that people use the catalogue.
Collecting Data
Custom log is being used right now by tracking searches (timestamp, action, query, referrer URL) and tracking click-throughs. An alternative might be to use Google Analytics.
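As a rough illustration of what such a custom log could support, here is a minimal sketch in Python. It assumes each log line has already been parsed into the four fields mentioned above; the `LogEntry` class, the `"search"`/`"click"` action labels, and the `summarize` helper are hypothetical names for this example, not the format actually used in the talk.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """One logged action; the field names mirror the list in the notes."""
    timestamp: datetime
    action: str    # assumed labels: "search" or "click"
    query: str
    referrer: str


def summarize(entries):
    """Count searches and click-throughs and rank the most frequent queries."""
    searches = [e for e in entries if e.action == "search"]
    clicks = [e for e in entries if e.action == "click"]
    # Normalize case and whitespace so near-identical queries count together.
    query_counts = Counter(e.query.strip().lower() for e in searches)
    rate = len(clicks) / len(searches) if searches else 0.0
    return {
        "searches": len(searches),
        "clicks": len(clicks),
        "click_through_rate": rate,
        "top_queries": query_counts.most_common(3),
    }
```

A ranking like `top_queries` is the kind of input that could feed a hand-crafted best bets list.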
For more, see the slides below or read the C&RL Article Preprint.
Code4lib Day 2: Discovering Digital Library User Behavior with Google Analytics
by Kirk Hess, University of Illinois Urbana-Champaign
Why Google Analytics?
- free
- JavaScript based
- small tracking image (visible via Firebug) = mostly users not bots
- works across domains
- easy to integrate with existing system
- API
Some useful things in the interface:
- heat map
- content drill down – click on page and see where users went from there
- visitor flow
- events
Export Data Using API
- Analytics API
- Java or JavaScript (presumably any language, actually)
- export any field into a database for further analysis (in this case MySQL db)
Analyze Data
- Which items are popular?
- How many times was an item viewed?
- Downloaded?
- Effective collection size – see whether people are seeing/using items
- found typically, many things are not popular
- discover a lot of other things about users
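The export-then-analyze workflow might look something like the sketch below. It is only an illustration: the talk used MySQL, while this example swaps in Python's built-in `sqlite3` module so it is self-contained, and the `item_events` table, its columns, and both function names are assumptions. Fetching the rows from the Analytics API is left out entirely.

```python
import sqlite3


def load_rows(conn, rows):
    """Store exported (item_id, action, total) rows; the schema is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_events "
        "(item_id TEXT, action TEXT, total INTEGER)"
    )
    conn.executemany("INSERT INTO item_events VALUES (?, ?, ?)", rows)
    conn.commit()


def popularity(conn, action):
    """Return (item_id, count) pairs for one action, most popular first."""
    cursor = conn.execute(
        "SELECT item_id, SUM(total) AS n FROM item_events "
        "WHERE action = ? GROUP BY item_id ORDER BY n DESC, item_id",
        (action,),
    )
    return cursor.fetchall()
```

Calling `popularity(conn, "view")` and `popularity(conn, "download")` would answer the first three questions above. Items that never appear in the result are candidates for the "not popular" part of the collection.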
Next Steps
- found that the site design needs to change
- change search weighting
- allow users to sort by popularity (based on previous data)
- recommender system – think Amazon
- add new tracking/new repositories
- analyze webstats – hard to look at direct access
Moving away from JavaScript-based tracking, since a lot of mobile devices don’t have it.
The event analysis code has been posted on GitHub, and the code for adding events to links will be added later to his GitHub account.
Code4lib Day 1: Lightning Talks Notes
Al Cornish – XTF in 300 seconds (Slides in PDF)
- technology developed and maintained by California Digital Library
- supports the search/display of digital collections (images, PDFs, etc)
- fully open source platform, based on Apache Lucene search toolkit
- Java framework, runs in Tomcat or Jetty servlet engine
- extensive customization possible through XSLT programming
- user and developer group communication through Google Groups
- search interface running on Solr with facets
- can output in RSS
- has a debug mode
Makoto Okamoto – saveMLAK (English)
- Aid activities for the Great East Japan Earthquake through collaboration via wiki
- input from museum, library, archive, kominkan = MLAK
- 20,000 data entries on the damaged area
- Information about places, damages, and relief support
- Key Lessons
- build synergy with twitter
- have offline meet ups & training
Andrew Nagy – Vendors Suck
- vendors aren’t really that bad
- used to think vendors suck, and that they don’t know how to solve libraries’ problems
- but working for a vendor allows him to make a greater impact on higher education, more so than from one university (he started to work for Serials Solutions)
- libraries’ problems aren’t really that unique
- together with the vendor, a difference can be made
- call your vendors and talk to the product managers
- if they blow you off, you’ve selected the wrong vendor
- sometimes vendor solutions can provide a better fit
Andreas Orphanides – Heat maps
The library needed grad students to teach instructional sessions, but how do you set a schedule when classes have a very inflexible schedule? So, he took the data from 2 semesters of instructional sessions, using date and start time, although the start times and durations were inconsistent. The question was how best to visualize the data.
- heatmap package from clickheat
- time of day – x-dimension
- day of the week – y-dimension
- could see patterns in way that you can’t in histogram or bar graph
- heat map needn’t be spatial
- heat maps can compare histogram-like data along a single dimension or scatter-like plot data to look for high density areas
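To make the layout concrete, here is a minimal sketch of how sessions could be binned into a day-of-week by time-of-day grid before handing the counts to a heat map tool. It is not the presenter's code: the `session_grid` name, the one-hour bins, and the `(start, minutes)` input shape are assumptions. Flooring each start to the hour is one simple way to cope with inconsistent start times and durations.

```python
from datetime import datetime, timedelta


def session_grid(sessions):
    """Count sessions per (weekday, hour) cell.

    sessions: iterable of (start_datetime, duration_in_minutes) pairs.
    Returns 7 rows (Monday=0 .. Sunday=6) of 24 hourly counts, so the
    day of the week is the y-dimension and the time of day the x-dimension.
    """
    grid = [[0] * 24 for _ in range(7)]
    for start, minutes in sessions:
        end = start + timedelta(minutes=minutes)
        # Floor to the hour so 9:05 and 9:30 starts land in the same bin.
        slot = start.replace(minute=0, second=0, microsecond=0)
        while slot < end:
            grid[slot.weekday()][slot.hour] += 1
            slot += timedelta(hours=1)
    return grid
```

Each cell's count can then be mapped to a colour. A session that spans several hours contributes to every hour it touches, which is a design choice rather than something stated in the talk.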
Gabriel Farrell – ElasticSearch
- similar to Solr
- goes across servers
- e.g. Free103Point9
Nettie Lagace from NISO
- National Information Standards Organization (NISO)
- work internationally
- want to know: What environment or conditions are needed to identify and solve interoperability problems?
Eric Larson – Finding images in book page images
A lot of free books exist out there, but you don’t have the time to read them all. What if you just wanted to look at the images? Because a lot of books have great images.
He used curl to pull all the page images down, then used ImageMagick to process them. The processing steps:
- Convert to greyscale
- Contrast boost x8
- Convert image to 1px by height
- Sharpen image
- Heavy-handed grayscaling
- Convert to text
- Look for long continuous line of black to pull pages with images
Code is on github
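The last step, looking for a long continuous line of black, can be sketched roughly as follows. This is not the code from the talk: it assumes the 1px-wide column has already been read into a list of grey values (0 = black, 255 = white), and the function names and both thresholds are made-up values for illustration.

```python
def longest_dark_run(column, threshold=64):
    """Length of the longest consecutive run of pixels at or below threshold."""
    best = current = 0
    for value in column:
        if value <= threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a light pixel breaks the run
    return best


def looks_like_image_page(column, min_run=200, threshold=64):
    """Guess that a page has an image if its dark run is long enough.

    Lines of text produce short dark runs separated by white line spacing,
    while an illustration tends to produce one long dark stretch.
    """
    return longest_dark_run(column, threshold) >= min_run
```

The right `min_run` would depend on scan resolution and would need tuning against pages known to contain images.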
Adam Wead – Blacklight at the Rock Hall
- went live, soft launch about a month ago
- broken down to the item level
- find bugs he doesn’t know about for a beer!
Kelley McGrath – Finding Movies with FRBR & Facets
- users are looking for movies, either particular movie or genre/topic
- libraries describe publications e.g. date by DVD, not by movie
- users care about versions e.g. Blu-Ray, language
- Try the prototyped catalog
- Hit list provides one result per movie, can filter by different facets
Bohyun Kim – Web Usability in terms of words
- don’t over rely on the context
- but context is still necessary for understanding e.g. “mobile” – means on the go, what they want on the go
- sometimes there is no better term e.g. “Interlibrary Loan”
- brevity will cost you e.g. “tour” vs. “online tour”
- Time ran out, but check out the rest of the slides
Simon Spero – Restriction Classes, Bitches
OWL:
- lets you define properties
- control what the property can apply to
- control the values the property can take
- provides an easy way to do this
- provides a really confusing way to do this
The easy way is usually wrong!
When you define what a property can apply to (its domain) and its range, the definition applies to every use of the property. An alternative is Attempto.
Cynthia Ng – Processing & ProcessingJS
- Processing: open source visual programming language
- Processing.js: related project to make Processing available through web browsers without plugins
- While both tend to focus on data visualizations, digital art, and (in the case of PJS) games, there are educational oriented applications.
- Examples:
- Kanji Compositing – allows visual breakdown of Japanese kanji characters, interact with parts, and see children.
- Primer on Bezier Curves – scroll down to see interactive (i.e. if you move points, replots on the fly) and animated graphs.
- Obvious use might be instructional materials, but how might we apply it in this context? What other applications might we think of in the information organization world?
Since doing the presentation, I have already gotten one response from Dan Chudnov, who did a quick re-rendering of newspaper data from OCR data. Still thinking on (best) use in libraries and other information organizations.
It’s over for today, but if you’d like more, do remember that there is a livestream and you can follow on twitter, #c4l12 or IRC.