Code4Lib 2014: Day 1 Afternoon

Notes from the afternoon of Day 1 of Code4Lib 2014.

Structured Data NOW: seeding schema.org in library systems – Dan Scott

Bags of words are hard.

Consistent flaw:

Wanted to make my library system part of the Semantic web, but
* XML
etc.

Schema.org was introduced

offer simple vocabulary for short tail of results (events, products, people)
enable normals to add markup without experts, with lots of examples
enable search engines to aggregate data and apply better disambiguation and relevance strategies

Baby Steps

Evergreen was publishing simplistic title/author/keyword via microdata
OCLC WorldCat also started publishing rich, heavily extended schema.org via JSON
If your holdings are not in OCLC, you’re not linked to Google Books

Iterate Towards Linked Data

Begin enriching data using web standards:
* persistent URIs
* HTML5
* RDFa (or microdata) expressing schema.org
* sitemaps listing all the URIs of interest
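As a rough illustration of the last item, a sitemap listing record URIs can be produced with Python's standard library. This is only a sketch: the `build_sitemap` helper and the `catalogue.example.org` URIs are hypothetical and not taken from the talk.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(uris):
    """Return a sitemap XML string with one <url><loc> entry per URI."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri in uris:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = uri
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical persistent record URIs, for illustration only.
record_uris = [
    "https://catalogue.example.org/record/1001",
    "https://catalogue.example.org/record/1002",
]
sitemap_xml = build_sitemap(record_uris)
print(sitemap_xml)
```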

W3C Library Linked Data Incubator Group Report says many of the same things, so go read it.

Reality Check

Ronallo found that American academic libraries published under 10k schema.org instances in total.

RDFa

RDFa Lite is pared down to just 5 attributes (vocab, typeof, property, resource, prefix). Microdata is a roughly equivalent form of inline markup. Both provide information on type and property.
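To make the type/property idea concrete, here is a small sketch that renders a book as an HTML fragment carrying RDFa Lite attributes for schema.org. The `book_to_rdfa` helper, the record URI, and the title and author values are all made up for illustration; a real catalogue template would look different.

```python
from html import escape


def book_to_rdfa(uri, title, author):
    """Render a book as an HTML fragment with RDFa Lite attributes."""
    # vocab sets the vocabulary, typeof the type, property the properties,
    # and resource the URI that identifies the thing being described.
    return (
        f'<div vocab="http://schema.org/" typeof="Book" resource="{escape(uri)}">\n'
        f'  <h1 property="name">{escape(title)}</h1>\n'
        f'  <span property="author" typeof="Person">\n'
        f'    <span property="name">{escape(author)}</span>\n'
        f'  </span>\n'
        f'</div>'
    )


# Illustrative values only.
fragment = book_to_rdfa(
    "https://catalogue.example.org/record/1001",
    "Example Title",
    "Example Author",
)
print(fragment)
```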

Test with structured data extracted from a page.
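One low-tech way to run such a test is to pull the `typeof` and `property` values back out of a page and check they match what was intended. The sketch below uses only Python's standard-library HTML parser; it is not a full RDFa processor, the `PropertyExtractor` class is hypothetical, and it assumes well-formed markup with no void elements such as `<br>` or `<meta>`.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect typeof values and (property, text) pairs from a page."""

    def __init__(self):
        super().__init__()
        self.types = []
        self.properties = []
        self._stack = []  # property name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.types.append(attrs["typeof"])
        self._stack.append(attrs.get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only record text that sits directly inside an element with a property.
        if text and self._stack and self._stack[-1]:
            self.properties.append((self._stack[-1], text))


# Illustrative page fragment, not from a real catalogue.
page = """
<div vocab="http://schema.org/" typeof="Book">
  <h1 property="name">Example Title</h1>
  <span property="author" typeof="Person"><span property="name">Example Author</span></span>
</div>
"""

extractor = PropertyExtractor()
extractor.feed(page)
print(extractor.types)
print(extractor.properties)
```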

SchemaBibEx

The group looked at what needed to be extended and brought into schema.org proper. The idea was to make it work for all articles and library items.

Mapping

Mapping holdings to schema.org offers
* seller = library
* sku = call number
* serialNumber = barcode
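The mapping above could be sketched as a function that turns one holding into a schema.org Offer. The JSON-LD serialization, the `holding_to_offer` helper, and the sample call number and barcode are assumptions made for illustration; the same properties could equally be expressed in RDFa or microdata.

```python
import json


def holding_to_offer(library_name, call_number, barcode):
    """Map one holding to a schema.org Offer as a JSON-LD-style dict."""
    return {
        "@context": "http://schema.org",
        "@type": "Offer",
        "seller": {"@type": "Library", "name": library_name},  # seller = library
        "sku": call_number,                                     # sku = call number
        "serialNumber": barcode,                                # serialNumber = barcode
    }


# Hypothetical holding, for illustration only.
offer = holding_to_offer("Example Public Library", "PS3515 .E37", "31234000123456")
print(json.dumps(offer, indent=2))
```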

Periodicals
* article type
* periodical extension with PublicationIssue, PublicationVolume, Periodical, Book
* currently under consideration by schema.org

Status

Stopped making new extensions; now looking at best practices, documentation, etc.

Now being published by Koha, VuFind, and about to be published in Evergreen.

Slides

Next Generation Catalogue – RDF as a Basis for New Services – Anne-Lena Westrum, Benjamin Rokseth, Asgeir Rekkavik, and Petter Goksøyr Åsen

4 years ago, we were living inside the black box of the ILS.

One example: a search for an author returned 851 results when there should have been 40.

70% of material is in the stacks, so users rely on the OPAC to find what the library has.

In 2017, will be in a new building. Open digital mediation centre.

Have chosen to move away from MARC. User centred services

Active shelves = physical touchscreen device. The shelf reads the book's RFID tag and presents information relevant to the book, e.g. reviews, other works by the same author (only one edition instead of all), similar books.

Collected book recommendations around the country into one RDF store, connected to books via ISBN. Can query database for recommended books.

Move From Black Box to Open System Architecture

Started preparation. System makes user choose specific edition of a book.

> This kind of user experience is like going to a library and being helped by a librarian who is a complete idiot
Need to add common sense to the system.

MARC2RDF
* open source toolkit
* conversion from MARC bibliographic data to RDF statements
* enrich data with external content from various APIs and linked open data e.g. cover images, book reviews
* can control multiple groups of data and multiple mapping files
* can add conditional choices
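A mapping-driven conversion with a conditional check might look roughly like the sketch below. This is not the MARC2RDF toolkit's actual code or mapping format: the dict-based record, the `MAPPING` table, the `marc_to_triples` function, and the base URI are all hypothetical, and literal values stand in for what would normally be richer RDF nodes.

```python
# Hypothetical mapping "file": (MARC tag, subfield code) -> RDF predicate.
MAPPING = {
    ("245", "a"): "http://schema.org/name",
    ("100", "a"): "http://schema.org/author",
    ("020", "a"): "http://schema.org/isbn",
}


def marc_to_triples(record, base_uri, mapping=MAPPING, condition=None):
    """Convert a simplified MARC record (a dict) into (s, p, o) triples."""
    subject = f"{base_uri}{record['001']}"
    triples = []
    for (tag, code), predicate in mapping.items():
        for field in record.get(tag, []):
            value = field.get(code)
            if value is None:
                continue
            # Conditional choice: skip values the caller's rule rejects.
            if condition is not None and not condition(tag, code, value):
                continue
            triples.append((subject, predicate, value.strip()))
    return triples


# Simplified, made-up record: control number plus repeatable fields.
record = {
    "001": "12345",
    "245": [{"a": "Example Title"}],
    "100": [{"a": "Author, Example"}],
    "020": [{"a": "9780000000002"}, {"a": ""}],
}

triples = marc_to_triples(
    record,
    "https://data.example.org/resource/",
    condition=lambda tag, code, value: bool(value.strip()),  # drop empty values
)
for triple in triples:
    print(triple)
```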

RDF2MARC
* Still going to need MARC records for several purposes e.g. circulation, ILL

More Like This: Approaches to Recommending Related Items using Subject Headings – Kevin Beswick

Recommendations are meant to support more serendipitous discovery, in part because the collection is held in an ASRS, an automated storage and retrieval system (bookBot).

Three approaches, all based on subject headings: most subject headings in common, most subject terms in common, and weighted subject terms.
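The heading-based and term-based approaches could be sketched roughly as follows. This is not the speaker's implementation (which was built on Solr): the in-memory `catalogue`, the function names, and the assumption that subdivisions are separated by `--` are all illustrative. The weighted variant is left out because it needs weights that are not given in these notes.

```python
def split_terms(heading):
    """Split a heading like 'A -- B -- C' into its lowercase terms."""
    return {part.strip().lower() for part in heading.split("--") if part.strip()}


def score_by_headings(source, candidate):
    """Count whole subject headings shared by two items."""
    return len(set(source) & set(candidate))


def score_by_terms(source, candidate):
    """Count individual subject terms shared by two items."""
    source_terms = set().union(*[split_terms(h) for h in source])
    candidate_terms = set().union(*[split_terms(h) for h in candidate])
    return len(source_terms & candidate_terms)


def recommend(source_id, catalogue, scorer, limit=5):
    """Return up to `limit` item ids ranked by score, dropping zero scores."""
    source = catalogue[source_id]
    scored = [
        (scorer(source, headings), item_id)
        for item_id, headings in catalogue.items()
        if item_id != source_id
    ]
    ranked = sorted(
        [pair for pair in scored if pair[0] > 0],
        key=lambda pair: (-pair[0], pair[1]),  # best score first, then id
    )
    return [item_id for _, item_id in ranked[:limit]]


# Made-up catalogue: item id -> list of subject headings.
catalogue = {
    "a": ["Cooking -- France -- History", "Food habits -- France"],
    "b": ["Cooking -- France -- History"],
    "c": ["Cooking -- Italy", "Food habits -- Italy -- History"],
    "d": ["Gardening"],
}

print(recommend("a", catalogue, score_by_headings))  # ['b']
print(recommend("a", catalogue, score_by_terms))     # ['b', 'c']
```

In this toy data, matching on whole headings finds only the item with an identical heading, while matching on terms also surfaces an item that shares several terms but no complete heading.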

Built with Python/Flask App, Solr/SolrMARC.

The most-headings and most-terms algorithms looked to be producing decent recommendations (though matching on headings returned too few results). Weighting would need to differ based on subject or user interests, which is impossible without user input.

Tested the algorithms using blind ranking and qualitative comments on result sets of 10. Most subject terms (especially with longer or more headings) did better than most headings (which did better with shorter or fewer headings), but wanted fewer scores in the 0-5 range. Found that gov docs and fiction get thematic recommendations that shelf browse can't achieve.

Found a lot of duplicate titles (different editions, print & electronic). Poorly assigned subject headings can cause issues. Interface considerations include integrating recommendations on an item's full record, 5 at a time.

Takeaways

overall algorithms perform decently but could improve
but depends on how your items are catalogued
still under active development

Breakouts Time

Don’t be scared to participate!