discovery layer – Learning (Lib)Tech

Code4Lib 2014: Day 1 Afternoon

Afternoon of Day 1 of Code4lib 2014.


OCUL URM Summit – Notes

Today was the OCUL URM Summit at UofT. These notes are sparser than usual. The survey results in particular were done quickly, so I only included the higher numbers, not all of them.

code4libTO December Meetup Talks

BagIt Profiles – @ruebot

directory of data
bag has what you’re bagging, data, contact email/name, organization information, profile identifier (JSON via a URI)
pull in all the field values
validate
wrote a spec and sent it to the digital curation community
can look up profiles in the registry

Okay, I got a little lost, but you can see more on GitHub.
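To make the profile idea a bit more concrete, here is a minimal sketch of checking a bag's bag-info.txt against a profile. The profile layout (a "Bag-Info" object mapping field names to "required" and optional allowed "values") is only loosely modeled on the BagIt Profiles spec as I understand it, and the field values and example.org URIs are hypothetical; check the actual spec on GitHub before relying on any of it.

```python
def parse_bag_info(text: str) -> dict[str, list[str]]:
    """Parse bag-info.txt style 'Label: value' lines into a dict of value lists."""
    fields: dict[str, list[str]] = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last_label is not None:
            # Continuation line: fold it into the previous value.
            fields[last_label][-1] += " " + line.strip()
            continue
        label, _, value = line.partition(":")
        label = label.strip()
        fields.setdefault(label, []).append(value.strip())
        last_label = label
    return fields


def validate(bag_info: dict[str, list[str]], profile: dict) -> list[str]:
    """Return a list of problems; an empty list means the bag matches the profile."""
    errors = []
    for field, rules in profile.get("Bag-Info", {}).items():
        values = bag_info.get(field, [])
        if rules.get("required") and not values:
            errors.append(f"missing required field: {field}")
        allowed = rules.get("values")
        for v in values:
            if allowed is not None and v not in allowed:
                errors.append(f"{field}: {v!r} not in {allowed}")
    return errors


# Hypothetical profile; normally it would be fetched from its identifier URI.
PROFILE = {
    "BagIt-Profile-Info": {"BagIt-Profile-Identifier": "https://example.org/profiles/demo.json"},
    "Bag-Info": {
        "Source-Organization": {"required": True},
        "Contact-Email": {"required": True},
        "Bag-Type": {"required": False, "values": ["audio", "video"]},
    },
}

BAG_INFO = """Source-Organization: Example University Library
Contact-Email: curator@example.org
Bag-Type: video
BagIt-Profile-Identifier: https://example.org/profiles/demo.json
"""

print(validate(parse_bag_info(BAG_INFO), PROFILE))  # []
```

The bag points at its profile through the identifier field, so a validator can pull in the profile, pull in the field values and compare them, which is the flow described in the talk.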

Internet Archive Torrent Collections (iaTorrent) – @ruebot

see demo

Bookfinder – @TheRealArty & Steven

I will probably write this up later as a separate blog post, or maybe a journal article.

TPL’s Web Services Architecture: Understanding the Big Picture – @waharnum

many different systems that don’t easily communicate, so even basic tasks need specialized knowledge
address the challenges through translation, simplification, standardization
Three tiers: Front End Systems (requests to back end) / TPL Web Services (REST) / Back End Systems (responds to front end)
Example: TPL Website -> Account Web Services -> Symphony Web Services (Symphony) – and back
can add new features and functions
helps to solve the challenges mentioned
also helps with reusability, e.g. in addition to the website, build a mobile-friendly website or an iPhone app
Might end up with:
- Front End (Website, mobile, App)
- Middle Tier (Account Web Services, ebook Web Services, online payment web services)
- Back End (symphony, overdrive, payment gateway, accounting systems)
other benefits:
- increase ease of knowledge transfer about how our systems work
- follow modern best practice approach to building interoperating systems
- reduce cost and integration time
- reduce learning time for new staff or consultants
metrics: wish we had the resources to collect them
bolting together a lot of things, not using a lot of custom code
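A minimal sketch of the middle-tier idea, assuming a made-up back-end record format: front ends only ever see the small, stable shape returned by the account service, and only the middle tier knows the back end's vocabulary. The class names, the patron barcode and the `NAME`/`CHARGES` fields are hypothetical, not TPL's or Symphony's actual API, and a real service would expose this over REST rather than as a Python call.

```python
from dataclasses import asdict, dataclass


class FakeIlsBackend:
    """Hypothetical stand-in for a back-end system such as an ILS."""

    def __init__(self) -> None:
        # Back-end vocabulary: upper-case keys, "LAST, FIRST" names, raw charge list.
        self._patrons = {
            "21234000000001": {"NAME": "DOE, JANE", "CHARGES": ["item-1", "item-2"]},
        }

    def lookup_patron(self, barcode: str) -> dict:
        return self._patrons[barcode]


@dataclass
class Account:
    """The simplified, standardized shape every front end receives."""

    name: str
    items_checked_out: int


class AccountWebService:
    """Middle tier: translates front-end requests into back-end calls and back."""

    def __init__(self, backend: FakeIlsBackend) -> None:
        self.backend = backend

    def get_account(self, barcode: str) -> dict:
        raw = self.backend.lookup_patron(barcode)
        last, _, first = raw["NAME"].partition(", ")
        account = Account(
            name=f"{first.title()} {last.title()}",
            items_checked_out=len(raw["CHARGES"]),
        )
        return asdict(account)  # a REST layer would serialize this to JSON


# Website, mobile site and app would all call the same middle-tier method.
service = AccountWebService(FakeIlsBackend())
print(service.get_account("21234000000001"))
```

Swapping the back end then only means changing the middle tier, which is where the reusability and knowledge-transfer benefits come from.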

Ladder (aka MyTPL 2) – @mjsuhonos

wanted to solve a problem: discovery layers suck
problems:
- not scalable
- inflexible
- read-only
- expensive
goals:
- better than open source options (VuFind, Blacklight)
- cheaper (than proprietary)
- as scalable as WorldCat
design:
- schema-free/multi-schema (e.g. Dublin Core)
- horizontally scalable (multi-node)
- modern OSS components
- simple data model (RDF)
Features:
- hierarchical relations
- clustering/de-duplication
- versioning
- real-time import & indexing
- multi-thread/process
- responsive UI
- fully multilingual (18/10)
- dynamic faceting
- dynamic mapping modification
- digital content storage (coming soon)
built on linked data
not a discovery layer; it’s an integration platform
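The talk didn't say how Ladder does clustering/de-duplication, so this is only a generic sketch of the technique: build a normalized match key from a record's title and author and group records that share it. The match rule and sample records are my own assumptions; a production system would use richer keys (identifiers, dates, editions).

```python
import re
import unicodedata
from collections import defaultdict


def _norm(text: str) -> str:
    """Strip accents, lower-case, and reduce punctuation/whitespace to single spaces."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).strip()


def match_key(record: dict) -> str:
    """Hypothetical match rule: normalized title + normalized first author."""
    return f"{_norm(record.get('title', ''))}|{_norm(record.get('author', ''))}"


def cluster(records: list[dict]) -> list[list[dict]]:
    """Group records sharing a match key; each group is one displayed work."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


records = [
    {"id": 1, "title": "Les Misérables", "author": "Hugo, Victor", "source": "catalogue"},
    {"id": 2, "title": "Les miserables.", "author": "Hugo, Victor", "source": "ebooks"},
    {"id": 3, "title": "Notre-Dame de Paris", "author": "Hugo, Victor", "source": "catalogue"},
]
print([[r["id"] for r in group] for group in cluster(records)])  # [[1, 2], [3]]
```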

Heritage U of T – @ajmcalorum

News Announcement and Promotional Video
previously not centralized: hard drives, Flickr, etc.
need a central repository for the tri-campus initiative with search & discovery, preservation, long-term access to content and metadata, and support for multiple formats (e.g. images, books, documents, video, exhibits)
Drupal + Solr (search) + Fedora Commons (collection management, batch ingesting, metadata crosswalk, digital preservation) == Islandora (digital asset management system)
pilot: 8 parent collections (by format, by campus)
exhibits in Drupal, not through Islandora/Fedora Commons
modules: Internet Archive BookReader (OCR on the fly), Galleria, Colorbox
official launch: 2 weeks ago
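As an illustration of what a "metadata crosswalk" does in this stack, here is a toy sketch that maps a few simplified, MODS-like field paths onto Dublin Core elements. The field paths, the mapping and the sample record are hypothetical and not Islandora's actual configuration; in practice crosswalks like this are often written as XSLT over the full XML.

```python
# Hypothetical crosswalk: simplified MODS-like paths -> Dublin Core elements.
CROSSWALK = {
    "titleInfo/title": "dc:title",
    "name/namePart": "dc:creator",
    "originInfo/dateIssued": "dc:date",
    "typeOfResource": "dc:type",
    "abstract": "dc:description",
}


def crosswalk(record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each source field to its target element; unmapped fields are dropped."""
    out: dict[str, list[str]] = {}
    for source_field, values in record.items():
        target = CROSSWALK.get(source_field)
        if target is None:
            continue
        out.setdefault(target, []).extend(values)
    return out


# Made-up sample record.
photo = {
    "titleInfo/title": ["Campus view"],
    "originInfo/dateIssued": ["1910"],
    "typeOfResource": ["still image"],
    "note/local": ["scanned from glass plate"],
}
print(crosswalk(photo))
```

Dropping unmapped fields silently is a design choice for the sketch; a real crosswalk would usually log or preserve them.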

That’s it! Food and drinks time!

Code4lib Day 3: Lightning Talks

David Uspal – Project Grab Bag

Interactive Map

JavaScript based (for accessibility)
Data stored in JSON file
SVG graphic
Uses the Raphael.js library (could just use HTML5 instead)
Search by: location, person, call number
To do:
- decouple from CMS (Concrete 5)
- SVG path generation as a web application
- add more configurable options (colors, etc.)
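A rough sketch of the lookup side of such a map, assuming a JSON layout I made up: each location has an SVG path the front end would highlight, an optional call-number class range, and a list of people. Comparing only the leading class letters is a deliberate simplification; real call-number ranges need proper LC or Dewey sorting. All ids, paths and the name "Jane Doe" are hypothetical.

```python
import json
import re
from typing import Optional

# Hypothetical map data file; paths, ranges and names are made up.
MAP_JSON = """
{
  "locations": [
    {"id": "stacks-2", "label": "2nd floor stacks", "svg_path": "M10 10 H90 V40 H10 Z",
     "call_range": ["A", "HZ"], "people": []},
    {"id": "stacks-3", "label": "3rd floor stacks", "svg_path": "M10 50 H90 V80 H10 Z",
     "call_range": ["J", "ZZ"], "people": []},
    {"id": "room-210", "label": "Room 210, reference office", "svg_path": "M95 10 H120 V30 H95 Z",
     "call_range": null, "people": ["Jane Doe"]}
  ]
}
"""

LOCATIONS = json.loads(MAP_JSON)["locations"]


def by_call_number(call_number: str) -> Optional[str]:
    """Return the id of the location whose class-letter range covers the call number."""
    match = re.match(r"[A-Z]+", call_number.strip().upper())
    if match is None:
        return None
    cls = match.group(0)
    for loc in LOCATIONS:
        rng = loc["call_range"]
        if rng and rng[0] <= cls <= rng[1]:
            return loc["id"]
    return None


def by_person(name: str) -> Optional[str]:
    """Case-insensitive exact match on a person's name."""
    wanted = name.strip().lower()
    for loc in LOCATIONS:
        if any(p.lower() == wanted for p in loc["people"]):
            return loc["id"]
    return None


def by_location(text: str) -> list[str]:
    """Substring match on location labels."""
    needle = text.strip().lower()
    return [loc["id"] for loc in LOCATIONS if needle in loc["label"].lower()]


print(by_call_number("QA76.73 .P98"), by_person("jane doe"), by_location("stacks"))
```

Keeping the data in a separate JSON file like this is also what makes decoupling from the CMS plausible: the front end only needs the file and the SVG.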

Tap Tour

started at the Indianapolis Museum of Art
easy to create a mobile tour application
currently iPhone/iPod, plans to expand
Drupal CMS back-end (new version released 1/25/2012)

Robert Haschart – Adding Publicly-Accessible Hathi Trust Items to Your Solr-based Discovery System

Assumptions:
- Solr-based index
- SolrMarc used for indexing
- only want publicly-accessible items
- MARC record based with one Solr record per title
download the list of Hathi items
tweak SolrMarc index specification
add all Hathi records to your index, and adjust interface code to display records correctly
download daily updates, merge updates
Code not yet available
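Since the code wasn't available yet, here is my own minimal sketch of the list-processing steps: read the tab-delimited Hathi list, merge a daily update by item id, keep only publicly accessible items, and group them by bibliographic record so there is one entry per title. The five-column layout and the `allow` access value are simplifying assumptions about the HathiFiles format (the real files have many more columns), the item ids are made up, and the SolrMarc side is not shown.

```python
import csv
import io
from collections import defaultdict

# Assumed, simplified column layout; the real HathiFiles have more columns.
COLUMNS = ["htid", "access", "rights", "ht_bib_key", "title"]


def read_rows(tsv_text: str) -> list[dict]:
    """Parse tab-delimited rows without treating quotes specially."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def merge_updates(full: list[dict], update: list[dict]) -> list[dict]:
    """A daily update replaces any existing row with the same item id (htid)."""
    by_id = {row["htid"]: row for row in full}
    for row in update:
        by_id[row["htid"]] = row
    return list(by_id.values())


def public_titles(rows: list[dict]) -> dict[str, list[str]]:
    """Keep publicly accessible items, grouped per bib record (one Solr doc per title)."""
    titles: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        if row["access"] == "allow":
            titles[row["ht_bib_key"]].append(row["htid"])
    return dict(titles)


full = (
    "mdp.001\tallow\tpd\t100\tTitle A, v.1\n"
    "mdp.002\tdeny\tic\t200\tTitle B\n"
    "uc1.003\tallow\tpd\t100\tTitle A, v.2\n"
)
update = "mdp.002\tallow\tpd\t200\tTitle B\n"
print(public_titles(merge_updates(read_rows(full), read_rows(update))))
# {'100': ['mdp.001', 'uc1.003'], '200': ['mdp.002']}
```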

Jeremy Nelson – Aristotle a Django based Discovery Layer

See it in Action
originally forked from Kochief
refactored to use Sunburnt for Solr interactions
developed custom authentication middleware with Millennium
did a web redesign
Code on GitHub

Dennis Schafroth – Turbo MARC in YAZ Library

Problem: XSL transformation on MARC XML is slow
Rule: combine the element name with the tag/code value when that value is allowed in an XML element name
Pazpar2 became twice as fast
a lot faster, but not official standard

Yuka Egusa, Masao Takaku – Recovery of Minamisanriku Town Library from Tsunami Disaster

implemented technical support for a library system, thanks to OSS and cloud services
used an Amazon wish list so supporters could donate the books the library needed
library can announce library service and daily activities on Facebook
Next-L Enju OSS search system

Ed Summers – jobs.code4lib.org

Jobs are posted
Tags let you see all the jobs with a given tag
OpenID login
pushes to Twitter (@code4lib)
pushes to mailing list
Code on GitHub

Christopher Spalding – Search in a Blender

works for ExLibris
collect results and sort
works in VuFind and Solr
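The talk only described the blender as "collect results and sort". One common way to do that, shown here as a hedged sketch and not necessarily what this product does, is to min-max normalize each source's relevance scores so they are comparable, optionally weight sources, and sort the combined list. Source names, scores and weights are made up.

```python
from typing import Optional


def min_max(scores: list[float]) -> list[float]:
    """Rescale one source's scores to 0..1 so different engines become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def blend(result_sets: dict[str, list[tuple[str, float]]],
          weights: Optional[dict[str, float]] = None) -> list[tuple[str, str]]:
    """Pool results from several sources into one list ordered by weighted normalized score."""
    weights = weights or {}
    pooled: list[tuple[float, str, str]] = []
    for source, results in result_sets.items():
        if not results:
            continue
        normalized = min_max([score for _, score in results])
        weight = weights.get(source, 1.0)
        for (doc_id, _), score in zip(results, normalized):
            pooled.append((weight * score, source, doc_id))
    # Python's sort is stable (also with reverse=True), so ties keep source order.
    pooled.sort(key=lambda item: item[0], reverse=True)
    return [(source, doc_id) for _, source, doc_id in pooled]


results = {
    "catalogue": [("c1", 12.0), ("c2", 6.0), ("c3", 3.0)],
    "articles": [("a1", 0.9), ("a2", 0.5)],
}
print(blend(results))
print(blend(results, {"articles": 0.3}))
```

Normalizing per source matters because raw scores from Solr and from an article index are on unrelated scales; without it one source would dominate.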

Erik Hetzner – Strategy for c4l voting

majoritarian: top-rated talks are chosen
no representation for small parties
each voter gets unlimited votes, 0-3 points
Plurality-at-large
- 1 vote per talk
Cumulative voting
- a fixed number of votes spread across talks; a voter can give one talk multiple votes
Hacking
- the way it’s done now reduces to plurality-at-large
Fix
- limit the points users can assign
- and/or only allow users to give one vote to each talk
- or adopt a proportional representation system
Inspired by Numbers Rule: The Vexing Mathematics of Democracy
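A small sketch of the argument, with made-up ballots: under the current rules (any number of talks, 0-3 points each) a strategic voter gives 3 to favourites and 0 to everything else, so a larger group can take every slot. Adding a per-voter point budget turns this into cumulative voting, where a minority can concentrate its points and win a slot. The function names and the 2-slot, 5-voter scenario are hypothetical.

```python
from collections import Counter
from typing import Optional

Ballot = dict[str, int]


def tally(ballots: list[Ballot], max_points: int = 3, budget: Optional[int] = None) -> Counter:
    """Sum points per talk; optionally cap each voter's total points (cumulative voting)."""
    totals: Counter = Counter()
    for ballot in ballots:
        if any(not 0 <= p <= max_points for p in ballot.values()):
            raise ValueError("points per talk must be between 0 and max_points")
        if budget is not None and sum(ballot.values()) > budget:
            raise ValueError("ballot exceeds the per-voter point budget")
        totals.update(ballot)
    return totals


def winners(totals: Counter, slots: int) -> set[str]:
    return {talk for talk, _ in totals.most_common(slots)}


# 3 voters prefer talks A and B; 2 voters prefer talk C; 2 slots available.
current = [{"A": 3, "B": 3}] * 3 + [{"C": 3}] * 2
print(winners(tally(current), 2))  # totals A: 9, B: 9, C: 6

cumulative = [{"A": 2, "B": 1}] * 3 + [{"C": 3}] * 2
print(winners(tally(cumulative, budget=3), 2))  # totals A: 6, B: 3, C: 6
```

With a budget of 3, the majority's 9 points can't lift both A and B above C's 6 (that would take at least 14), so the minority is guaranteed a slot.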

Lightning Talks That Didn’t Happen

Hillel Arnold – Occupy Wall Street Documentation
Jason Clark – BookMeUp (Book Suggestions App)
Jason Ronallo – Digital Collections, Crawling, and Aggregating Content

Code4lib Day 2: How People Search the Library from a Single Search Box

by Cory Lown, North Carolina State University

While there is only one search box, typically there are multiple tabs, which is especially true of academic libraries.

73% of searches from the home page start from the default tab
which was actually the opposite of what usability tests showed

Homegrown federated search includes:

catalog
articles
journals
databases
best bets (60 hand crafted links based on most frequent queries e.g. Web of Science)
spelling suggestions
loaded links
FAQs
smart subjects

Shows the top 3-4 results with a link to the full interface.

Search Stats

Over Fall 2010 and Spring 2011: ~739k searches, 655k click-throughs

By section:

7.8% best bets (sounds very little, but actually a lot for 60 links)
41.5% articles, 35.2% books and media, 5.5% journals, ~10% everything else
23% looking for other things, e.g. library website
for articles: 70% first 3 results, other 30% see all results
catalogue use is fairly stable over time, but article use peaks at the end of term

How do you make use of these results?

Top search terms are fairly stable over time, so you can make the top queries (~37k) work well for people by using best bets.
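Best bets are essentially a lookup from frequent, normalized queries to hand-picked links. A minimal sketch, where the normalization rule, the alias table, the entries and the example.org URLs are all hypothetical; this is not NCSU's implementation.

```python
import re
from typing import Optional

# Hypothetical entries; the real system had ~60 hand-crafted links.
BEST_BETS = {
    "web of science": ("Web of Science", "https://example.org/db/web-of-science"),
    "jstor": ("JSTOR", "https://example.org/db/jstor"),
}
# Aliases let several common queries share one best bet.
ALIASES = {"wos": "web of science", "isi web of science": "web of science"}


def normalize(query: str) -> str:
    """Lower-case, turn punctuation into spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", query.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def best_bet(query: str) -> Optional[tuple[str, str]]:
    """Return (label, url) for a known frequent query, else None."""
    key = normalize(query)
    key = ALIASES.get(key, key)
    return BEST_BETS.get(key)


print(best_bet("  Web of Science! "))
print(best_bet("cats"))
```

Because the top queries are stable, a small table like this can cover a large share of searches with very little maintenance.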

Single/default search signals that our search tools will just work.

It’s important to consider what the default search box doesn’t do, and doubly important to rescue people when they hit that point.

Dynamic results drive traffic. When a few actual results were shown, use of the catalogue for books went up a lot compared to just suggesting that people use the catalogue.

Collecting Data

A custom log is being used right now, tracking searches (timestamp, action, query, referrer URL) and click-throughs. An alternative might be to use Google Analytics.
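A minimal sketch of a custom search/click log and the kind of summary behind the stats above. It records the fields mentioned (timestamp, action, query, referrer URL) plus a `section` column, which is my assumption, needed to break click-throughs down by section. Storage here is an in-memory CSV purely for illustration, and the queries and URLs are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Fields from the talk, plus an assumed "section" column for click-throughs.
FIELDS = ["timestamp", "action", "query", "section", "referrer"]


class SearchLog:
    """Append-only CSV log kept in memory for illustration."""

    def __init__(self) -> None:
        self._buffer = io.StringIO()
        self._writer = csv.DictWriter(self._buffer, fieldnames=FIELDS)

    def record(self, action: str, query: str, section: str = "", referrer: str = "") -> None:
        self._writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "query": query,
            "section": section,
            "referrer": referrer,
        })

    def rows(self) -> list[dict]:
        return list(csv.DictReader(io.StringIO(self._buffer.getvalue()), fieldnames=FIELDS))


def summarize(rows: list[dict]) -> tuple[int, int, dict[str, float]]:
    """Return (searches, click-throughs, share of click-throughs per section)."""
    searches = sum(1 for r in rows if r["action"] == "search")
    clicks = [r for r in rows if r["action"] == "click"]
    counts = Counter(r["section"] for r in clicks)
    shares = {s: n / len(clicks) for s, n in counts.items()} if clicks else {}
    return searches, len(clicks), shares


log = SearchLog()
log.record("search", "web of science", referrer="https://example.org/")
log.record("click", "web of science", section="best bets")
log.record("search", "climate change, policy", referrer="https://example.org/")
log.record("click", "climate change, policy", section="articles")
log.record("click", "climate change, policy", section="articles")
print(summarize(log.rows()))
```

Logging the section at click time is what makes per-section breakdowns (articles vs. books and media vs. best bets) a simple count.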

For more, see the slides below or read the C&RL Article Preprint.