Access 2012 Day 1: Afternoon Notes

Adventures in Linked Data: Building a Connected Research Environment

by Lisa Goddard

Linked data doesn’t just accommodate collaboration, it enforces collaboration. Need a framework that can handle a lot of data and scale.

Text data is really messy, because it doesn’t fit into a single category. Linked data should allow all of this.

Identify Top Level Entities

Mint URIs for the main types of entities, including:

  • people
  • places
  • events
  • documents
  • annotations
  • books
  • organizations

Abstract away from implementation details to make it manageable in the long term.

Canonical URIs mean that one ‘link’ is actually three, depending on format, through content negotiation.
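As a rough illustration of one canonical URI resolving to format-specific URIs via content negotiation, here is a minimal Python sketch. The example.org URI, the ".html" and ".rdf" suffixes, and the function names are all assumptions for illustration, not details from the talk.

```python
# Hypothetical mapping from media type to a format-specific URI suffix.
# Together with the canonical URI itself, this gives three URIs per entity.
REPRESENTATIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((q, media_type))
    # The sort is stable, so equal q values keep their header order.
    prefs.sort(key=lambda pair: pair[0], reverse=True)
    return [media_type for q, media_type in prefs if q > 0]


def resolve(canonical_uri, accept_header):
    """Pick the format-specific URI a client would be redirected to."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return canonical_uri + REPRESENTATIONS[media_type]
    # Fall back to the human-readable page for anything unrecognized.
    return canonical_uri + REPRESENTATIONS["text/html"]
```

A browser asking for text/html would land on the ".html" page, while a crawler asking for application/rdf+xml would get the ".rdf" document, both starting from the same canonical URI.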

Define Relationships

Through RDF, make machine readable definitions.

Linked data is basically an accessibility initiative for machines.

Use ontologies to provide definitions for entities, relationships, and impose rules.

An ontology is for life.

Ontology searches are available, such as Linked Open Vocabularies (LOV), e.g. foaf:Person (Class) – friend of a friend

Tie the entity and class using rdf:type, and relate entities to each other using properties such as creator. This results in a data model.
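To make the idea concrete, here is a small sketch of such a data model as plain Python tuples, without any RDF library. The rdf:type, foaf:Person, and dcterms:creator URIs are the standard ones; the example.org entity URIs and the match helper are hypothetical.

```python
# Standard vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical minted URIs for two entities.
AUTHOR = "http://example.org/id/person/1"
BOOK = "http://example.org/id/book/1"

# Each triple is (subject, predicate, object).
triples = {
    (AUTHOR, RDF_TYPE, FOAF_PERSON),   # the entity is a foaf:Person
    (BOOK, DCTERMS_CREATOR, AUTHOR),   # the book was created by that person
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything typed as a person, then everything that person created.
people = [s for s, _, _ in match(triples, p=RDF_TYPE, o=FOAF_PERSON)]
works = [s for s, _, _ in match(triples, p=DCTERMS_CREATOR, o=people[0])]
```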

CWRC Writer

Provides a way to create a document, with an interface for tagging in XML, where you can select entities from an existing authority file, the web (using APIs), or a custom entry. You can then add relations.

Slides

Quick Comment

This looks like a really neat tool to easily add XML tags in a document. Would want to see it integrated into a standard document writer, much like RefWorks does through Write’n’Cite. I’m definitely looking forward to seeing this move forward.

Big Data, Answers, and Civil Rights

Alistair Croll

If you want volume, velocity, and variety, it’s actually very expensive.

Efficiency means lower costs, new uses, but more demand and consumption.

Big data is about abundance. The number of ways we can do things with this data has exploded.

We live in a world of abundant, instant, ubiquitous information. We evolved to seek peer approval. It all comes down to who is less dumb.

We look for confirmation rather than the truth.

The more we get confirmation, the greater the polarization.

Abundant data has changed the way we live and think.

The Problem with Big Data

Polarization can lead to an increase in prejudice. You don’t know when you’re not contacted. Increasingly moving from a culture of convictions to a culture of evidence.

Genius says possibly. Finds pattern, inspires hypotheses, reason demands testing, but open to changes.

Correlations are so good at predicting that they look like convincing facts, but they’re just guesses.

See also: Big data, big apple, big ethics by Alistair Croll

Break Time

BiblioBox: A Library in a Box

by David Fiander

Inspired by PirateBox, which allows people to share media anonymously within a community using a standalone wifi router (not connected to the Internet). People in the same place like to share stuff.

LibraryBox then simplified this by taking out the chat and upload functions.

Dedicated ebook device that allows browsing and searching of the collection.

Components:

  • Unix-based file server using a wifi access point and a small flash drive.
  • Ebooks using OPDS metadata format.
  • SQLite database
  • API module usually available in the language of choice, e.g. Python
  • Bottle – framework for web development in Python
  • Mako Templating – templating in Python

Adding books is much more complex than serving books. For example, an author authority file is needed. Want to automate extracting metadata from ePub files, but there is no good module for reading ePub files in Python.
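For a sense of what that extraction involves, here is a minimal sketch using only the Python standard library. It relies on the ePub convention that the file is a zip archive whose META-INF/container.xml points to a package (OPF) file holding Dublin Core metadata. The function name and returned dictionary shape are assumptions, and the sketch is not the speaker's code; it skips error handling, DRM, and less common metadata layouts.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the ePub container file and Dublin Core elements.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(source):
    """Return title and authors from an ePub given a path or file-like object."""
    with zipfile.ZipFile(source) as epub:
        # container.xml names the package (OPF) file inside the archive.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf = ET.fromstring(epub.read(rootfile.get("full-path")))

    title = opf.find(".//dc:title", NS)
    creators = opf.findall(".//dc:creator", NS)
    return {
        "title": title.text if title is not None else None,
        "authors": [creator.text for creator in creators],
    }
```

The author strings returned here would still need to be reconciled against an authority file, which is the harder part noted above.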

User View

Add catalogue to ebook app. It then looks like a store, where you can browse by title or author.
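Browsing by title or author could be backed by a very small SQLite schema. The sketch below is an assumption about how such a catalogue might be laid out, not BiblioBox's actual schema; the table, column, and function names are all hypothetical.

```python
import sqlite3


def build_catalogue(conn):
    """Create a hypothetical two-table catalogue: authors and books."""
    conn.executescript("""
        CREATE TABLE authors (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE books (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            path TEXT NOT NULL
        );
    """)


def add_book(conn, title, author, path):
    """Insert a book, reusing the author row if the name already exists."""
    conn.execute("INSERT OR IGNORE INTO authors (name) VALUES (?)", (author,))
    author_id = conn.execute(
        "SELECT id FROM authors WHERE name = ?", (author,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO books (title, author_id, path) VALUES (?, ?, ?)",
        (title, author_id, path),
    )


def browse(conn, order_by="title"):
    """List (title, author) pairs ordered by title or by author."""
    order = "authors.name, books.title" if order_by == "author" else "books.title"
    query = (
        "SELECT books.title, authors.name FROM books "
        "JOIN authors ON authors.id = books.author_id ORDER BY " + order
    )
    return conn.execute(query).fetchall()
```

Keeping authors in their own table acts as a crude authority file: two books by the same author share one row, as long as the name is spelled identically.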

Available on GitHub.

Question Answering, Serendipity, and the Research Process of Scholars in the Humanities

by Kim Martin

Serendipity occurs when there is a prepared mind that notices a piece that helps them solve a problem. It allows discovery and thinking outside of the box.

Chance is recognized as an important part of the historical research process.

A shelf browser of some sort in the catalogue can be useful, but what we really need in a system is something that allows personalization and in-depth searching. Researchers typically do not leave their offices; they use search engines.

Visualizations, such as tag clouds, could allow more serendipitous browsing.

More notes on the Access 2012 live blog.

Code4lib Day 2 Morning: Notes & TakeAways

I didn’t take full notes on all the presentations. I like to just sit back and listen to some of the presentations, especially if there are a lot of visuals, but I do have a few notes.

Full Notes for the following sessions:

Building Research Applications with Mendeley

by William Gunn, Mendeley

  • Number of tweets a PLoS article gets is a better predictor of number of citations than impact factor.
  • Mendeley makes science more collaborative and transparent. Great to organize papers and then extract and aggregate research data in the cloud.
  • Can use impact factor as a relevance ranking tool.
  • Linked data is currently by citation, but they now also have tag co-occurrences, etc.
  • Link to slides.

NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis

No notes. Instead, have the link to the presentation complete with what looks like speaker notes.

Ask Anything!

  • Things not taught in library school: all the important things, social skills, go talk to the professor directly if you want to get into CS classes.
  • Memento project and UK Archives inserting content for their 404s.
  • In response to librarians lamenting loss of physical books, talk to faculty in digital humanities to present data mining etc., look at ‘train based’ circulations, look at ebook stats.
  • Take a look at libcatcode.org for library cataloguers learning to code, as well as Code Year hosted by Codecademy.

Code4lib Day 1 Morning: HTML5, Microdata and Schema.org (and other takeaways)

I did not take notes on everything, in part because some of it was very technical and it can be hard to take notes, but here are some takeaways from the morning:

  • Version Control: Use it, whether Git or Mercurial. Doesn’t need to be code, can be data too. – Description and Slides
  • Take library data and make it available to users, can’t expect them to search for it.
  • Linked Data doesn’t need to be a huge project. Start small.
  • Why RDF? It’s flexible with easy addition of new attributes or classes, and works cleanly with an iterative approach.

HTML5 Microdata and Schema.org

by Jason Ronallo

Other than getting good ranking, we need to provide rich results, i.e. rich snippets. Some digital collections have been providing rich snippets already, such as NCSU Libraries.

How do we get this?

  • embedded semantic markup
  • HTML5 Semantics include nav, header, article, section, footer
  • HTML5 Microdata is a syntax for annotating content to communicate meaning of data to machines
  • similar to RDFa, other microdata
  • Microdata comes back as tree based JSON and allows for DOM API

For example:

<div itemscope itemtype="http://schema.org/Organization" itemref="logo">
<a itemprop="url" href="http://code4lib.org/">
<span itemprop="name">Code4Lib</span>
</a>
</div>
where: itemscope = about something
itemtype = type of item
itemprop = properties
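To show what a machine might take away from markup like this, here is a minimal Python sketch that pulls the item type and properties out of the Code4Lib example using the standard library's html.parser. It is a simplification for a single flat item, not the full microdata extraction algorithm: it ignores itemref and nested items, and the class name is hypothetical.

```python
from html.parser import HTMLParser

EXAMPLE = """
<div itemscope itemtype="http://schema.org/Organization" itemref="logo">
<a itemprop="url" href="http://code4lib.org/">
<span itemprop="name">Code4Lib</span>
</a>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collect the type and properties of one flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item = {}
        self._pending = None  # itemprop name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item["type"] = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "a":
            # For links, the property value is the href, not the link text.
            self.item[prop] = attrs.get("href")
        else:
            self._pending = prop

    def handle_data(self, data):
        # For other elements, the value is the element's text content.
        if self._pending and data.strip():
            self.item[self._pending] = data.strip()
            self._pending = None


parser = MicrodataSketch()
parser.feed(EXAMPLE)
item = parser.item
```

The resulting dictionary holds the item type plus its url and name properties, which is the kind of structured data a search engine could use for a rich snippet.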

For the user, there is no difference as display is the same. This provides a complete data model.

Schema.org is a one-stop shop for vocabulary in describing items on the web.

Apologies, I did not take extensive notes on it, but to read more, check out the slides below or the Code4lib article he wrote.