Code4lib Day 2: How People Search the Library from a Single Search Box

by Cory Lown, North Carolina State University

While there is only one search box, it typically has multiple tabs, especially at academic libraries.

  • 73% of searches from the home page start from the default tab
  • this was actually the opposite of what usability tests showed

Home grown federated search includes:

  • catalog
  • articles
  • journals
  • databases
  • best bets (60 hand crafted links based on most frequent queries e.g. Web of Science)
  • spelling suggestions
  • loaded links
  • FAQs
  • smart subjects

Show top 3-4 results with link to full interface.

Search Stats

For Fall 2010 and Spring 2011: ~739k searches and 655k click-throughs

By section:

  • 7.8% best bets (sounds very little, but actually a lot for 60 links)
  • 41.5% articles, 35.2% books and media, 5.5% journals, ~10% everything else
  • 23% looking for other things, e.g. library website
  • for articles: 70% click on one of the first 3 results, the other 30% click to see all results
  • catalogue use is fairly stable over time, but article use peaks at the end of term

How do you make use of these results?

Top search terms are fairly stable over time. You can make the top queries work well for people (~37k) by using the best bets.
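
As a rough illustration of how best bets might work, here is a minimal sketch in Python. The table entries, the URLs, and the function names are all made up for the example; the talk did not show the actual implementation. The idea is simply to normalize the query and look it up in a small hand-crafted table.

```python
# Hypothetical best-bets table: normalized query -> (label, URL).
# The entries and URLs are placeholders, not the library's actual data.
BEST_BETS = {
    "web of science": ("Web of Science", "https://example.org/databases/web-of-science"),
    "jstor": ("JSTOR", "https://example.org/databases/jstor"),
}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(query.lower().split())


def best_bet(query: str):
    """Return the (label, URL) pair for a query, or None if there is no best bet."""
    return BEST_BETS.get(normalize(query))


if __name__ == "__main__":
    print(best_bet("  Web of   Science "))
    print(best_bet("quantum chromodynamics"))
```

Because the top queries are stable over time, a small exact-match table like this could cover a lot of searches without any fuzzy matching.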

Single/default search signals that our search tools will just work.

It’s important to consider what the default search box doesn’t do, and doubly important to rescue people when they hit that point.

Dynamic results drive traffic. Showing a few actual results increased use of the catalogue for books considerably, compared to merely suggesting that people use the catalogue.

Collecting Data

A custom log is currently used to track searches (timestamp, action, query, referrer URL) and click-throughs. An alternative might be to use Google Analytics.
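
To make the logging idea concrete, here is a small Python sketch of what such a custom log might look like. The four fields come from the talk; the CSV format, the `click:<section>` naming convention, and the function names are my own assumptions.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

FIELDS = ["timestamp", "action", "query", "referrer"]


def log_event(writer, action, query, referrer):
    """Write one row; action is 'search' or 'click:<section>' (an assumed convention)."""
    writer.writerow({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "query": query,
        "referrer": referrer,
    })


def clicks_by_section(rows):
    """Return each section's share of click-throughs as a percentage."""
    counts = Counter(
        row["action"].split(":", 1)[1]
        for row in rows
        if row["action"].startswith("click:")
    )
    total = sum(counts.values())
    return {section: 100 * n / total for section, n in counts.items()} if total else {}


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a real log file
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    log_event(writer, "search", "web of science", "https://example.org/")
    log_event(writer, "click:best_bets", "web of science", "https://example.org/search")
    log_event(writer, "click:articles", "climate change", "https://example.org/search")
    buffer.seek(0)
    print(clicks_by_section(csv.DictReader(buffer)))
```

A breakdown like the "By section" numbers above could then be computed from the log with something like `clicks_by_section`.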

For more, see the slides below or read the C&RL Article Preprint.

Code4lib Day 2: Discovering Digital Library User Behavior with Google Analytics

by Kirk Hess, University of Illinois Urbana-Champaign

Why Google Analytics?

  • free
  • JavaScript based
  • small tracking image (visible via Firebug) = mostly users not bots
  • works across domains
  • easy to integrate with existing system
  • API

Some useful things in the interface:

  • heat map
  • content drill down – click on page and see where users went from there
  • visitor flow
  • events

Export Data Using API

  • Analytics API
  • Java or JavaScript (presumably any language, actually)
  • export any field into a database for further analysis (in this case MySQL db)

Analyze Data

  • Which items are popular?
  • How many times was an item viewed?
  • Downloaded?
  • Effective collection size – see whether people are seeing/using items
  • found that, typically, many things are not popular
  • discover a lot of other things about users
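
Here is a minimal sketch of this kind of analysis once the data is in a database. The talk used MySQL; I am swapping in Python's built-in SQLite so the example runs on its own. The table layout, the sample rows, and the collection size are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they might look after an export: (item_id, event, count).
EXPORTED_ROWS = [
    ("item-1", "view", 120),
    ("item-1", "download", 15),
    ("item-2", "view", 3),
    ("item-3", "view", 40),
]
COLLECTION_SIZE = 10  # hypothetical total number of items in the repository


def load(rows):
    """Load exported rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (item_id TEXT, event TEXT, count INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def most_viewed(conn, limit=10):
    """Return (item_id, total views) pairs, most viewed first."""
    return conn.execute(
        "SELECT item_id, SUM(count) AS views FROM events "
        "WHERE event = 'view' GROUP BY item_id ORDER BY views DESC LIMIT ?",
        (limit,),
    ).fetchall()


def effective_collection_share(conn, collection_size):
    """Fraction of the collection with at least one recorded view."""
    (used,) = conn.execute(
        "SELECT COUNT(DISTINCT item_id) FROM events WHERE event = 'view' AND count > 0"
    ).fetchone()
    return used / collection_size


if __name__ == "__main__":
    conn = load(EXPORTED_ROWS)
    print(most_viewed(conn, limit=2))
    print(effective_collection_share(conn, COLLECTION_SIZE))
```

The same popularity query could feed a "sort by popularity" option, assuming the view counts are kept up to date.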

Next Steps

  • found that the site design needs to change
  • change search weighting
    • allow users to sort by popularity (based on previous data)
    • recommender system – think Amazon
  • add new tracking/new repositories
  • analyze webstats – hard to look at direct access

The project is moving away from JavaScript-based tracking since a lot of mobile devices don’t support it.

The event analysis code has been posted on GitHub, and code for adding events to links will be added to his GitHub account later.

Code4lib Day 1: Lightning Talks Notes

Al Cornish – XTF in 300 seconds (Slides in PDF)

  • technology developed and maintained by California Digital Library
  • supports the search/display of digital collections (images, PDFs, etc)
  • fully open source platform, based on Apache Lucene search toolkit
  • Java framework, runs in Tomcat or Jetty servlet engine
  • extensive customization possible through XSLT programming
  • user and developer group communication through Google Groups
  • search interface running on Solr with facets
  • can output in RSS
  • has a debug mode

Makoto Okamoto – saveMLAK (English)

  • Aid activities for the Great East Japan Earthquake through collaboration via wiki
  • input from museum, library, archive, kominkan = MLAK
  • 20,000 data of damaged area
  • Information about places, damages, and relief support
  • Key Lessons
    • build synergy with twitter
    • have offline meet ups & training

Andrew Nagy – Vendors Suck

  • vendors aren’t really that bad
  • used to think vendors suck, and that they don’t know how to solve libraries’ problems
  • but working for a vendor allows him to make a greater impact on higher education, more so than from one university (he started to work for Serials Solutions)
  • libraries’ problems aren’t really that unique
  • together with the vendor, a difference can be made
  • call your vendors and talk to the product managers
  • if they blow you off, you’ve selected the wrong vendor
  • sometimes vendor solutions can provide a better fit

Andreas Orphanides – Heat maps

The library needed grad students to teach instructional sessions, but how do you set a schedule when classes have very inflexible schedules? He used data from 2 semesters of instructional sessions (date and start time), but start times and durations were inconsistent. The question was how best to visualize the data.

  • heatmap package from clickheat
  • time of day – x-dimension
  • day of the week – y-dimension
  • could see patterns in a way that you can’t in a histogram or bar graph
  • heat map needn’t be spatial
  • heat maps can compare histogram-like data along a single dimension or scatter-like plot data to look for high density areas
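
The binning behind such a heat map can be sketched in a few lines of Python. This is not the code from the talk (which used a heat map package); it is a plain-text illustration. The function names, the default hour range, and the choice to count a session toward every hour it overlaps are my assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def session_grid(sessions, first_hour=8, last_hour=20):
    """Count sessions per cell: rows are days of the week (y), columns are hours (x).

    `sessions` is an iterable of (start, duration_minutes) pairs. A session
    counts toward every hour it overlaps, which smooths over inconsistent
    start times and durations.
    """
    counts = Counter()
    for start, minutes in sessions:
        end = start + timedelta(minutes=minutes)
        slot = start.replace(minute=0, second=0, microsecond=0)
        while slot < end:
            counts[(slot.weekday(), slot.hour)] += 1
            slot += timedelta(hours=1)
    return [
        [counts[(day, hour)] for hour in range(first_hour, last_hour)]
        for day in range(7)
    ]


def render(grid):
    """Render the grid as text, with denser characters for busier cells."""
    shades = " .:*#"
    peak = max(max(row) for row in grid) or 1
    lines = []
    for day, row in zip(DAYS, grid):
        cells = "".join(shades[(n * (len(shades) - 1)) // peak] for n in row)
        lines.append(f"{day} {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        (datetime(2012, 2, 6, 9, 30), 60),   # a Monday
        (datetime(2012, 2, 6, 10, 0), 90),
        (datetime(2012, 2, 8, 14, 15), 45),  # a Wednesday
    ]
    print(render(session_grid(demo)))
```

Busy time slots show up as dense clusters in the grid, which is the kind of pattern that is hard to spot in a single histogram.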

Gabriel Farrell – ElasticSearch

Nettie Lagace from NISO

  • National Information Standards Organization (NISO)
  • work internationally
  • want to know: What environment or conditions are needed to identify and solve interoperability problems?

Eric Larson – Finding images in book page images

A lot of free books exist out there, but you don’t have the time to read them all. What if you just wanted to look at the images? A lot of books have great images.

He used curl to pull all the page images, then used ImageMagick to process them. The processing steps:

  1. Convert to greyscale
  2. Contrast boost x8
  3. Convert image to 1px by height
  4. Sharpen image
  5. Heavy-handed grayscaling
  6. Convert to text
  7. Look for long continuous line of black to pull pages with images
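
The last step can be sketched in Python as follows. This is only my guess at the idea, not Eric’s code: I assume each page has already been reduced to a string of "0" (white) and "1" (black), one character per pixel row, and the 25% threshold is purely illustrative.

```python
def longest_black_run(column, black="1"):
    """Length of the longest unbroken run of `black` in a 1px-wide column."""
    best = current = 0
    for pixel in column:
        current = current + 1 if pixel == black else 0
        best = max(best, current)
    return best


def has_image(column, min_fraction=0.25):
    """Guess that a page holds an image if one black run covers enough of its height.

    The 25% threshold is an illustrative assumption, not a value from the talk.
    """
    return bool(column) and longest_black_run(column) >= min_fraction * len(column)


if __name__ == "__main__":
    text_page = "0101010101" * 4             # short alternating runs, like lines of text
    image_page = "0" * 10 + "1" * 25 + "0" * 5
    print(has_image(text_page), has_image(image_page))
```

Lines of text produce many short runs, while an illustration tends to produce one long one, which is why the run length is a useful signal.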

Code is on github

Adam Wead – Blacklight at the Rock Hall

  • went live, soft launch about a month ago
  • broken down to the item level
  • find bugs he doesn’t know about for a beer!

Kelley McGrath – Finding Movies with FRBR & Facets

  • users are looking for movies, either particular movie or genre/topic
  • libraries describe publications, not movies, e.g. the date is for the DVD, not the movie
  • users care about versions e.g. Blu-Ray, language
  • Try the prototyped catalog
  • Hit list provides one result per movie, can filter by different facets

Bohyun Kim – Web Usability in terms of words

  • don’t over-rely on the context
  • but context is still necessary for understanding e.g. “mobile” – means on the go, what they want on the go
  • sometimes there is no better term e.g. “Interlibrary Loan”
  • brevity will cost you: “tour” vs. “online tour”
  • Time ran out, but check out the rest of the slides

Simon Spero – Restriction Classes, Bitches

OWL:

  • lets you define properties
  • control what the property can apply to
  • control the values the property can take
  • provides an easy way to do this
  • provides a really confusing way to do this

The easy way is usually wrong!

When you define the domain (what the property can apply to) and the range, the definition applies to every use of the property. An alternative is Attempto.

Cynthia Ng – Processing & ProcessingJS

  • Processing: open source visual programming language
  • Processing.js: related project to make processing available through web browsers without plugins
  • While both tend to focus on data visualizations, digital art, and (in the case of PJS) games, there are educational oriented applications.
  • Examples:
    • Kanji Compositing – allows visual breakdown of Japanese kanji characters, interact with parts, and see children.
    • Primer on Bezier Curves – scroll down to see interactive (i.e. if you move points, replots on the fly) and animated graphs.
  • Obvious use might be instructional materials, but how might we apply it in this context? What other applications might we think of in the information organization world?

Since doing the presentation, I have already gotten one response from Dan Chudnov, who did a quick re-rendering of newspaper data from OCR data. Still thinking on (best) use in libraries and other information organizations.

It’s over for today, but if you’d like more, do remember that there is a livestream and you can follow on twitter, #c4l12 or IRC.

Code4lib Day 1 Afternoon: Takeaways on Usability & Search

Once again, I didn’t take full notes on all the sessions, but some takeaways below.

  • Non-English searches should not suck.
  • Favour precision over recall on large-scale searching.
  • Develop measures of assessment in order to measure success.
  • Leverage the correlation between academic degree and type of materials used, and focus on discipline-related materials and authors in case of ambiguity.
  • If a built-in user interface doesn’t work, you can always put something on top.

Many of these sound like common sense, but not enough people do it.

See my other posts for notes on the presentations I wrote more on:

Code4lib Day 1: Design for Developers – Some Notes

by Lisa Kurt, University of Nevada

If you can get three things down, you can get a good design:

  • Typography – simple
  • Composition – a lot of white space, conventions
  • Colour – minimal

Study the designs that you love and those that you hate. What works and what doesn’t?

On Photos: If you use clip art, don’t use clip art that looks like clip art.

Look at designs with fresh eyes. Make sure it’s balanced.

Have fun too!

Really know your audience. Beware of decorative typefaces: they can become hokey very quickly because they look more like illustrations.

Designing for Mobile: Sans serif and white background with dark text is easier to read on mobile.

While you need to be careful of branding, you can use it to link different elements together.

Design by committee does not work! Provide three designs and be firm that you will not combine them, etc. Usability can help support your design.

For more, check out Lisa’s website and the presentation slides below.

Code4lib Day 1: Kill the Search Button II – The Handheld Devices are Coming

by Michael Poltorak Nielsen, Statsbiblioteket/State and University Library, Aarhus, Denmark

Current Mobile Interaction Paradigm

You do a lot with your hands every day. Our hands are a really good tool, but currently, handheld interaction is based on glass. That is, you perform functions by sliding your fingers, which means there is no feedback on what it does, i.e. it’s not intuitive.

Take a look at Pictures Under Glass: Transitional Paradigm dictated by technology, not human capabilities by Bret Victor.

An Alternative

  • direct manipulation
  • gesture driven
  • palpable
  • tactile

Smartphone Gestures

The near future may mean combining something like the Wiimote and the iPhone.

Mobile Projects

The idea was to build an HTML5 app that lets users search library data, save favourites, view their own items, renew, and request. It is currently in beta, but is to be published soon.

The search app can be augmented with gestures, gestures combined with multi-touch interactions.

Possible interactions with focus on

  • keyboard – typing
  • microphone – speech
  • screen – touch, visuals
  • camera – pattern, movement
  • accelerometer – acceleration
  • gyroscope – rotation
  • compass – direction
  • GPS – movement, position

Gestures

Might include simple ones using accelerometer data, including

  • tilt
  • flip
  • turn
  • rotate
  • shake
  • throw

The problem is that gestures are only really supported by Firefox, and partially supported by Chrome. Thus, it was decided that development would move to the native iPhone app environment with gestures, and HTML5 web app without gestures (but possibly later when supported). Features that are implemented include:

  • Restart search – face down
  • Scroll – tilt up and down
  • Switch views – tilt
  • Request items – touch and tilt left
  • Favourites – touch and tilt right

Challenges

  • no standard mobile gestures
  • gestures may be individual
  • gesture may not be appropriate at all
  • sophisticated gestures are hard to code
  • Objective-C

Code4lib Day 1 Morning: HTML5, Microdata and Schema.org (and other takeaways)

I did not take notes on everything, in part because some of it was very technical and hard to take notes on, but here are some takeaways from the morning:

  • Version Control: Use it, Git or Mercurial. Doesn’t need to be code, can be data too. – Description and Slides
  • Take library data and make it available to users, can’t expect them to search for it.
  • Linked Data doesn’t need to be a huge project. Start small.
  • Why RDF? It’s flexible with easy addition of new attributes or classes, and works cleanly with an iterative approach.

HTML5 Microdata and Schema.org

by Jason Ronallo

Other than getting good ranking, we need to provide rich results, i.e. rich snippets. Some digital collections have been providing rich snippets already, such as NCSU Libraries.

How do we get this?

  • embedded semantic markup
  • HTML5 Semantics include nav, header, article, section, footer
  • HTML5 Microdata is a syntax for annotating content to communicate meaning of data to machines
  • similar to RDFa, other microdata
  • Microdata comes back as tree based JSON and allows for DOM API

For example:

<div itemscope itemtype="http://schema.org/Organization" itemref="logo">
<a itemprop="url" href="http://code4lib.org/">
<span itemprop="name">Code4Lib</span>
</a>
</div>
where: scope = about something
type = type of item
prop = properties
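
To show how a machine might read this markup, here is a simplified Python sketch that pulls microdata items out of HTML using only the standard library. It is not a full implementation of the microdata algorithm: it ignores nested items and itemref, only uses href or text content as values, and the JSON shape is my own simplification.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Collect top-level items as {"type": ..., "properties": {...}} dicts."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._stack = []  # one marker per open element (None if uninteresting)

    def _in_item(self):
        return "item" in self._stack

    def _add(self, name, value):
        self.items[-1]["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        marker = None
        if "itemscope" in attrs and not self._in_item():
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
            marker = "item"
        elif "itemprop" in attrs and self._in_item():
            if "href" in attrs:
                self._add(attrs["itemprop"], attrs["href"])
            else:
                marker = [attrs["itemprop"]]  # property name, then text chunks
        if tag not in VOID:
            self._stack.append(marker)

    def handle_data(self, data):
        # Text belongs to the nearest enclosing text-valued property, if any.
        for marker in reversed(self._stack):
            if isinstance(marker, list):
                marker.append(data)
                break

    def handle_endtag(self, tag):
        if tag in VOID or not self._stack:
            return
        marker = self._stack.pop()
        if isinstance(marker, list):
            self._add(marker[0], "".join(marker[1:]).strip())


def extract(html):
    """Parse an HTML string and return the microdata items it contains."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return {"items": parser.items}


if __name__ == "__main__":
    sample = (
        '<div itemscope itemtype="http://schema.org/Organization" itemref="logo">'
        '<a itemprop="url" href="http://code4lib.org/">'
        '<span itemprop="name">Code4Lib</span></a></div>'
    )
    print(json.dumps(extract(sample), indent=2))
```

The sketch assumes well-formed markup where every non-void element is explicitly closed.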

For the user, there is no difference as display is the same. This provides a complete data model.

Schema.org is a one-stop shop for vocabulary in describing items on the web.

Apologies, I did not take extensive notes on it, but to read more, check out the slides below or the Code4lib article he wrote.

Code4lib Day 1: Keynote on Code4libcon

Daniel Chudnov from George Washington University was the first Keynote of the conference.

Dan began with a bit of an introduction and then went into a very touching overview of the story of his family and his life. His life lesson was that

things fall apart.

We Blew It

We have turned away too many people: way more than 100 people. That was a terrible mistake. If we don’t address this mistake, this [conference] is not going to last.

Code4lib was inspired by Access, with some key aspects:

  • single track
  • participatory
  • social
  • beer
  • fun

The difference the organizers wanted was a (possibly) geekier version in the USA in Spring (so as not to compete with Access). What might have really pushed this discussion is that

we turned away more people in 2012 than attended in 2007.

Why? The most common answers revolved around the capacity of the venue. There were, of course, some other concerns expressed about keeping it a small, informal, participatory conference, especially in the backchannels (IRC and Twitter).

Nevertheless, Dan asked the key question “Why do you come?” He expressed how he comes to connect with people, and hang out with the attendees, and there are many others that wanted to join, but were turned away.

He went on to talk about how while there is a chasm of techies vs. non-techies, there shouldn’t be. Plenty of people want to learn what coders do, and as a group, we should want to help respond to change constructively. They want to code, and we should connect and work more closely with them. We have one choice to make:

HACK OR DIE

We Must Expand

Dan used PyCon as a possible good model to follow. They have:

  • 2 days of pre-conference tutorial days
  • up front training for all levels
  • 4 days post-conference sprint days
  • back-end collaboration for all levels
  • plenary talks, plenary lightning
  • multiple tracks

Dan was against multiple tracks for many years, but not anymore, because

we need to connect or this thing we have will fall out from under us.

His point is that next year people won’t even bother if there is no clear statement to make things work.

Challenges

  • break complacency
  • lack of proposals to host
  • too heavy a burden on local organizers

Possible Solutions

Committees need to be formalized, especially an advisory committee of former hosts to help future hosts. The work needs to be done through the year, and more open like it used to be. Dan also suggested a formal program committee to replace the “diebold-o-tron”, but there was some disagreement because it’s less participatory.

Some other ideas included a multi-core code4lib where each regional group would be 1 hour live streaming on the same day, and the BarCamp approach where there are no pre-planned presentations, which might work for regional code4lib conferences. However, concern was expressed with having too many small conferences organized, burning out possible hosts for the annual code4lib.

The next code4lib conference should aim for 500 people.

Chicago is ready. Are you?

Final Notes & Thoughts @ Access 2011

So I didn’t do a full post for all the sessions, but the live notes that were taken and, presumably, video recordings will later be posted on the Access 2011 website.

Data Visualization

Jer Thorp gave a great talk on the data visualization work he’s done and has been working on at the New York Times. I couldn’t really take notes since so much of it was visual, but he blew a lot of minds with his work, so check out his blog.

My Lightning Talk

What really excited me beyond the work itself was the fact that he mentioned he was doing it all through Processing, so I decided to do a lightning talk to introduce everyone to Processing and more importantly Processing.js.

For those who aren’t familiar with it, Processing is an open source programming language primarily used for dynamic and interactive graphing and data visualization. Processing.js is the sister project which brings Processing to the web. The greatest part of Processing.js is that a developer can start doing the same sort of thing but from the JavaScript side.

Check out the demos to see what kind of things you can possibly do. I am particularly interested in the educational applications, such as giving students interactive graphs to see how mathematical functions work (see the Bezier Curves tutorial).

Added value: web accessible, Drupal plug-in, WordPress plug-in, fun games like a remake of Asteroids on the exhibition page.

See Access Live Notes for Lightning Talks and talks about other tools.

Digital Preservation

  • what does digital preservation mean? preserving more than objects and items
  • think on scalability
  • preserve what matters
  • start with policy and practice, not a platform
  • library can’t do it alone, partner with IT, Archives, etc.
  • need to think strategically
  • no one answer
  • some good tools
  • get started
  • think about what we can do with partnership

Fail Panel

The fail panel was great, because there were a lot of great stories by the panelists and others. Here are some of the lessons learned from the fail stories.

  • bleeding edge is not always great
  • good escape clauses to get out of bad situations
  • make sure company is stable
  • don’t make thematic websites – not scalable
  • don’t be working on original records or have a backup
  • never trust a tech
  • if you think it’s a bad idea, speak up
  • don’t have a project driven by one person
  • sometimes there isn’t a tech solution
  • make sure you press the right button
  • need to make sure

Share your own stories at failbrary.org

Thoughts

This was actually my first conference, but I think (and I’m clearly not the only one) it’s been really well put together, and the food has been especially awesome, much of it at great socials. There’s been some tech fail, but that’s expected everywhere, I think.

I have particularly liked this conference because rather than simply having speakers talk, everyone has been highly encouraged to participate in some way (i.e. hackfest + presentations, lightning talks). I never thought I’d be a speaker at a conference, especially my first, but the nature of the talks and the encouragement of people got me to do a lightning talk. I think that alone speaks loads about the community.

It’s been an awesome experience, I’ve learnt a lot, and met a lot of great people. I really hope to be able to attend the next one.

Access 2012

Sad to see Access 2011 end, but for next year, a site will be set up to see who will host it, and the planning of the conference will continue code4lib style.