Code4Lib Day 3: Closing Keynote – Gordon Dunsire
Granularity in Library Linked Open Data
Fractals
- self-similar at all levels of granularity
- each circle represents a set of things that look very similar (a snowflake-like pattern, but at different sizes)
- characteristic of fractals
- cannot determine level: all levels are equal, some more equal than others
Multi-Faceted Granularity
- What is described by a bibliographic record? or a single statement?
- What is the level of description? How complete is it? e.g. AACR2
- How detailed is the schema used? How dumb? – especially relevant right now. The more detailed, the higher level of granularity possible.
- Semantic constraints? Unconstrained?
Resource Description Framework – Linked Data
- Triple: This resource | has intended audience | Juvenile
- Subject / Predicate / Object
- do each of these parts have granularity?
- not higher/lower levels – better to talk of coarse-grained or fine-grained granularity (a minimal triple sketch follows)
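For concreteness, here is that triple expressed with rdflib in Python; the URIs are made-up placeholders, not the talk's actual vocabulary.

# One RDF triple: subject, predicate, object (placeholder URIs).
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
#      subject          predicate               object
g.add((EX.thisResource, EX.hasIntendedAudience, Literal("Juvenile")))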
Subject: What is the Statement About?
- we can focus on describing an article / resource / work, then think about coarser or finer granularity:
- coarser: consortium collection / RDF map
- library collection / digital collection
- super-aggregate journal title / journal index
- aggregate: issue / festschrift
- focus: describing an article / resource / work
- component: section / graphics / page
- sub-component: paragraph / markup
- finer: word / rdf/xml
- uri / node
Predicate: What is the Aspect Described?
- similar coarse/fine breakdown:
- membership category
- access to resource
- access to content
- suitability rating
- audience and usage
- audience
- audience of audio-visual material
- diagram: possible audience map (partial) – unconstrained version to avoid collisions of isbd/dct/schema/rda/m21/frbrer
- different links can be made while still retaining proper semantic links
- currently constructing just one giant graph
What is the Aspect Described?
- coarse to fine:
- resource record
- manifestation record
- title and statement of responsibility (s.o.r.)
- title statement
- title of manifestation
- title word
- first word of title
- why do librarians need so many titles? Why not just use the Dublin Core title and be done with it? Because we need them to do our work, e.g. a spine title to browse
- title = string identifier
- RDA: what to do with this? how do we apply these needs?
- possible semantic map (partial) – I won’t even try to reproduce this
- need to take into account names and ranges
- makes it more difficult, but more powerful
Semantic Reasoning: The Sub-Property Ladder
- this is where the graph becomes useful and powerful
- machines can’t reason, so we define the semantics such that we can give rules to machines to process our data
- semantic rule:
- if property1 sub-property of property2;
- then data triple: resource property1 “string”
- implies data triple: resource property2 “string”
- otherwise, data triple remains the same
- simple enough for a computer to carry out (see the sketch below)
- doesn’t matter how complex the map actually is, because it can still do it in a matter of seconds
- machine entailment: isbd: “has title proper” (finer) -> dct: “has title” (coarser)
- might sound simple, but making a computer do inference is not trivial
- ‘dumbing-up’: data has been lost, but the result is still meaningful – moved from one schema to another
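A minimal sketch of that sub-property rule with rdflib. The ISBD property URI here is a placeholder I made up; dct:title is the real Dublin Core term.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDFS

ISBD = Namespace("http://example.org/isbd/")  # placeholder, not the real ISBD namespace
EX = Namespace("http://example.org/")
g = Graph()

# schema triple: property1 is a sub-property of property2
g.add((ISBD.hasTitleProper, RDFS.subPropertyOf, DCTERMS.title))

# data triple using the finer-grained property
g.add((EX.doc1, ISBD.hasTitleProper, Literal("Granularity in Library Linked Data")))

# the rule: resource property1 "string" implies resource property2 "string"
for p1, _, p2 in list(g.triples((None, RDFS.subPropertyOf, None))):
    for s, _, o in list(g.triples((None, p1, None))):
        g.add((s, p2, o))

# the coarser dct triple has been entailed ("dumbing-up")
assert (EX.doc1, DCTERMS.title, Literal("Granularity in Library Linked Data")) in g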
Data Triples from Multiple Schemas / Entailed from Sub-Property Map / from Property Domains
- frbrer: “has intended audience” – “primary school”
- isbd: “has note on use or audience” – “for ages 5-9”
- rda: “intended audience (work)” – “for children aged 7-“
- m21: “target audience” -> m21terms: “Juvenile”
- definition attached to the vocabulary
- also talking about granularity
- can map each sub-property to the top-level unconstrained property unc: “has note on use or audience” (see the sketch below)
- “is a” frbrer: “work”, isbd: “resource”, rda: “work” – the RDA and FRBR schemas are actually separate, not semantically linked, even though the vocabulary is similar and RDA is based on FRBR
- once stabilized, they can be drawn from each other
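Continuing the same rdflib approach, a sketch (all URIs are placeholders) of mapping audience properties from two schemas up to one unconstrained property, so that a single coarse-grained query finds both statements.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

FRBRER = Namespace("http://example.org/frbrer/")  # placeholder
ISBD = Namespace("http://example.org/isbd/")      # placeholder
UNC = Namespace("http://example.org/unc/")        # placeholder
EX = Namespace("http://example.org/")

g = Graph()
# both fine-grained properties map up to the unconstrained top-level one
g.add((FRBRER.hasIntendedAudience, RDFS.subPropertyOf, UNC.hasNoteOnUseOrAudience))
g.add((ISBD.hasNoteOnUseOrAudience, RDFS.subPropertyOf, UNC.hasNoteOnUseOrAudience))

g.add((EX.book, FRBRER.hasIntendedAudience, Literal("primary school")))
g.add((EX.book, ISBD.hasNoteOnUseOrAudience, Literal("for ages 5-9")))

# same entailment loop as in the previous sketch
for p1, _, p2 in list(g.triples((None, RDFS.subPropertyOf, None))):
    for s, _, o in list(g.triples((None, p1, None))):
        g.add((s, p2, o))

for _, _, note in g.triples((EX.book, UNC.hasNoteOnUseOrAudience, None)):
    print(note)  # "primary school" and "for ages 5-9"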
What is the Aspect Described?
- coarser to finer:
- creator
- author
- screenwriting
- animation screenwriting
- children’s cartoon screenwriting
- different controlled vocabulary
- graph of RDA for author/creator/screenwriting in relation to work and agent
- graph of same thing, but for dc for creator and agent
- what is the semantic relationship between the dct creator and the rda creator?
- marcrel author maps to dc contributor, not creator – what is the relationship between rda author and marcrel author?
- decision from 2005, needs to be reappraised and reviewed
- relationship between dc creator and dc contributor?
- how does lcsh “screenwriters” fit?
Machine-Generated Granularity
- also has issues
- e.g. full-text indexing: down to the word level (see the index sketch below)
- BabelNet: A very large multilingual ontology
- can get quite complex and granular
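As a toy illustration of word-level granularity, a minimal inverted index in Python – not any particular indexing software, just the idea of mapping each word back to where it occurs.

from collections import defaultdict

def build_index(docs):
    """Map each word to (document id, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return dict(index)

index = build_index({"rec1": "granularity in library linked data"})
print(index["granularity"])  # [('rec1', 0)]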
User-Generated Granularity
- users can actually generate useful metadata
- can use statistical methods to remove extremes and come back with a consensus (a trimmed-mean sketch follows this list)
- going to cause granularity problems e.g. “OK for my kids (7 and 9)”, “Too childish for me (age 14)”
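A sketch of one such statistical method: a simple trimmed mean over user-suggested ages. The numbers are illustrative, not from the talk.

def trimmed_mean(ratings, trim=1):
    """Drop the `trim` lowest and highest values, then average the rest."""
    kept = sorted(ratings)[trim:-trim] if len(ratings) > 2 * trim else ratings
    return sum(kept) / len(kept)

# e.g. suggested ages for a title, with outliers at both ends
print(trimmed_mean([5, 7, 7, 8, 9, 14]))  # 7.75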
KISS
- keep it simple, stupid
- keep it simple and stupid?
- data model is very simple: triples!
- in terms of complexity, actually very simple
- but metadata content is complex
- and therefore, resource discovery is complex
- complex structures arise from the application of simple rules, as in the hard sciences and math
- simplicity is elegance
AAA
- Anyone can say anything about any thing
- someone will say something about every thing
- in every conceivable way
- and then constrained linguistically
OWA
- open world assumption: the absence of a statement is not a statement of non-existence
Will it get so granular that it becomes too complex?
And the rest is science
Break Time

Access 2012 Day 1: Afternoon Notes
Adventures in Linked Data: Building a Connected Research Environment
by Lisa Goddard
Linked data doesn’t just accommodate collaboration, it enforces collaboration. Need a framework that can handle a lot of data and scale.
Text data is really messy, because it doesn’t fit into a single category. Linked data should allow all of this.
Identify Top Level Entities
Main types of entities to mint URIs for include:
- people
- places
- events
- documents
- annotations
- books
- organizations
Abstract away from implementation details to make it manageable in the long term.
Canonical URIs mean that one ‘link’ is actually three, depending on format, via content negotiation.
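A sketch of how that can work, using Python's Bottle framework (my choice for illustration; the route, names, and representations are invented): one canonical URI serving three formats based on the Accept header.

from bottle import request, response, route, run

@route("/person/1")
def person():
    # One canonical URI; the representation depends on the Accept header.
    accept = request.headers.get("Accept", "")
    if "application/rdf+xml" in accept:
        response.content_type = "application/rdf+xml"
        return "<rdf:RDF>...</rdf:RDF>"  # RDF for machines
    if "application/json" in accept:
        return {"name": "Example Person"}  # Bottle serializes dicts to JSON
    return "<html><body>Example Person</body></html>"  # HTML for humans

# run(host="localhost", port=8080)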
Define Relationships
Through RDF, make machine readable definitions.
Linked data is basically an accessibility initiative for machines.
Use ontologies to provide definitions for entities, relationships, and impose rules.
An ontology is for life.
Ontology searches are available, such as Linked Open Vocabularies (LOV), e.g. foaf:Person (Class) – friend of a friend
Tie the entity to its class using rdf:type, and add relationships such as creator, which then results in a data model.
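A minimal rdflib sketch of the above: typing a minted URI as foaf:Person and linking it to a document via dcterms:creator. The FOAF and Dublin Core terms are real; the entity URIs are placeholders.

from rdflib import Graph, Namespace
from rdflib.namespace import DCTERMS, FOAF, RDF

EX = Namespace("http://example.org/")
g = Graph()

g.add((EX.person1, RDF.type, FOAF.Person))        # the entity is a foaf:Person
g.add((EX.letter1, DCTERMS.creator, EX.person1))  # relationship: creator

print(g.serialize(format="turtle"))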
CWRC Writer
Provides a way to create a document with an interface for tagging in XML, where you can select entities from an existing authority file, from the web (using APIs), or as custom entries. You can then add relations.
Quick Comment
This looks like a really neat tool to easily add XML tags in a document. Would want to see it integrated into a standard document writer, much like RefWorks does through Write’n’Cite. I’m definitely looking forward to seeing this move forward.
Big Data, Answers, and Civil Rights
If you want volume, velocity, and variety, it’s actually very expensive.
Efficiency means lower costs, new uses, but more demand and consumption.
Big data is about abundance. The number of ways we can do things with this data has exploded.
We live in a world of abundant, instant, ubiquitous information. We evolved to seek peer approval. It all comes down to who is less dumb.
We look for confirmation rather than the truth.
The more we get confirmation, the greater the polarization.
Abundant data has changed the way we live and think.
The Problem with Big Data
Polarization can lead to an increase in prejudice. You don’t know when you’re not contacted. We are increasingly moving from a culture of convictions to a culture of evidence.
Genius says “possibly”: it finds patterns and inspires hypotheses; reason demands testing, but stays open to change.
Correlation is so good at predicting that it looks like convincing facts, but they’re just guesses.
See also: Big data, big apple, big ethics by Alistair Croll
Break Time
BiblioBox: A Library in Box
Inspired by PirateBox, which allows people to share media anonymously within a community using a standalone wifi router (not connected to the Internet). People in the same place like to share stuff.
LibraryBox then simplified it by taking out the chat and upload functions.
Dedicated ebook device that allows browsing and searching of the collection.
Components (a minimal serving sketch follows the list):
- Unix based file server using a wifi access point and small flash drive.
- Ebooks using OPDS metadata format.
- SQLite database
- API module usually available in language of choice e.g. Python
- Bottle – web development framework in Python
- Mako Templating – templating in Python
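A minimal sketch of how those pieces could fit together on the serving side. The database filename, table, and columns are my assumptions, not BiblioBox's actual code.

import sqlite3
from bottle import route, run

DB = "bibliobox.db"  # assumed filename and schema

@route("/books")
def list_books():
    con = sqlite3.connect(DB)
    rows = con.execute("SELECT title, author FROM books ORDER BY title").fetchall()
    con.close()
    return {"books": [{"title": t, "author": a} for t, a in rows]}

# run(host="0.0.0.0", port=80)  # served over the box's wifi access point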
Adding books is much more complex than serving them – consider, for example, the author authority file. We want to automate extracting metadata from ePub files, but there is no good module for reading ePub files in Python.
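One workaround: an ePub is just a zip containing an OPF metadata file, so the standard library can pull out the Dublin Core fields. A rough sketch; it hard-codes a common OPF path rather than reading container.xml to find it.

import xml.etree.ElementTree as ET
import zipfile

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace used in OPF

def epub_metadata(path, opf_path="OEBPS/content.opf"):
    """Pull basic Dublin Core fields out of an ePub's OPF file."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read(opf_path))
    meta = {}
    for field in ("title", "creator", "language"):
        el = root.find(f".//{DC}{field}")
        if el is not None:
            meta[field] = el.text
    return meta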
User View
Add catalogue to ebook app. It then looks like a store, where you can browse by title or author.
Available on GitHub.
Question Answering, Serendipity, and the Research Process of Scholars in the Humanities
by Kim Martin
Serendipity occurs when there is a prepared mind that notices a piece that helps them solve a problem. It allows discovery and thinking outside of the box.
Chance is recognized as an important part of the historical research process.
A shelf browser of some sort in the catalogue can be useful, but what we really need is a system that allows personalization and in-depth searching. Researchers typically do not leave their offices; they just use search engines.
Visualizations, such as tag clouds, could allow more serendipitous browsing.
More notes on the Access 2012 live blog.
Code4lib Day 2: Lightning Talks
Scott Hanrath – Zotero and SHERPA/RoMEO API mashup
- quick and dirty way to filter a collection of articles by publisher policies
- use the Zotero and SHERPA/RoMEO APIs to tag articles with publisher policies (see the sketch below)
- work flow?
- zotero plugin?
- Code on github
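A guess at the RoMEO side of the mashup. The legacy v2.9 endpoint, parameter, and element names are from memory of the old API and should be treated as assumptions; the ISSN is a placeholder.

import xml.etree.ElementTree as ET
import requests

def romeo_colour(issn):
    """Return the RoMEO 'colour' (archiving policy class) for a journal."""
    resp = requests.get("http://www.sherpa.ac.uk/romeo/api29.php",
                        params={"issn": issn})
    root = ET.fromstring(resp.content)
    colour = root.find(".//romeocolour")  # element name per the old API docs
    return colour.text if colour is not None else None

# zotero_tag = f"romeo:{romeo_colour('1234-5678')}"  # e.g. "romeo:green"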
David Walker – Basic Learning Tool Interoperability (LTI) Protocol
- Need LMS to pull all the relevant library information, items, etc.
- In LMS, register library tool as if it were a native building block
- When inserted into a course, it makes a little iframe of the tool
- Hidden form elements post to the tool with course data and security (OAuth) – see the signing sketch below
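A sketch of the signing step using Python's oauthlib. The key, secret, and URL are placeholders; the field names are the standard basic-launch LTI parameters.

from urllib.parse import urlencode
from oauthlib.oauth1 import Client, SIGNATURE_TYPE_BODY

params = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "course-123-library-tool",
    "context_id": "course-123",  # course data passed along to the tool
}
client = Client("tool_key", client_secret="tool_secret",
                signature_type=SIGNATURE_TYPE_BODY)
uri, headers, body = client.sign(
    "https://library.example.edu/lti/launch",
    http_method="POST",
    body=urlencode(params),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
# `body` now carries the oauth_* fields and signature; these become the
# hidden form elements posted from the LMS iframe.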
Peter Murray – Introducing FOSS4LIB.org
- Lyrasis’ response to survey on what librarians wanted
- open source adopters are still in the early adopters stage
- thus, website was created
- determine whether OSS is right for the library including cost
- help to select software
- Call to action: register packages, releases, events, providers
Mark Matienzo – I’ve Got Good News
- C4L11: fiwalk with me – using open source digital forensics software to support pre-ingest work
- update of work since then
- pluggable
- could integrate anything
- two working plugins: virus scanner, file format identification against PRONOM
- Code on github
- BitCurator
Mike Durbin – Edge Cases – Digitizing and delivering undescribed items in EAD
- should automate as much of the workflow as possible
- items selected for digitization, scanned, a spreadsheet created/updated with ID and sequence, image files named according to ID/sequence (see the naming sketch below)
- put it in for automated processing including quality control, files pushed into master file archive, ingested into Fedora, and e-mail is sent to collection manager
- Finally, publication
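The naming step might look something like this; the filename pattern is my assumption, not the actual convention used.

def image_filename(item_id, sequence):
    """Name a scan from the spreadsheet's ID and sequence columns."""
    return f"{item_id}_{sequence:04d}.tif"

print(image_filename("mss1234-b2-f7", 3))  # mss1234-b2-f7_0003.tif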
Ryuuji Yoshimoto – Introducing CALIL.JP, scraping/mashup all of OPACs in JAPAN! PDF Slides
- OPACs have no API
- so start scraping OPACs, fighting with dirty HTML
- 2 months to scrape 200+ OPACs
- CALIL.JP
- realtime holdings through the CALIL API by ISBN, returning XML or JSON (see the lookup sketch below)
- item information from amazon and Google
- now have many third-party apps e.g. browser extension
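A sketch of a holdings lookup against the CALIL check API. The endpoint and parameters are as I recall them from the public docs and should be verified; an application key is required, and the real API may need polling for slow OPACs.

import requests

def check_holdings(isbn, system_id, appkey):
    """Ask CALIL which libraries in a system hold a given ISBN."""
    resp = requests.get("https://api.calil.jp/check", params={
        "appkey": appkey,       # CALIL application key
        "isbn": isbn,
        "systemid": system_id,  # e.g. a municipal library system id
        "format": "json",
        "callback": "no",       # plain JSON rather than JSONP
    })
    return resp.json()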
Kåre Fiedler Christiansen – Chucking all the software components in a library together to present recorded radio and tv
- built MPEG -> streaming server
- website -> cool design
- cool design, website, streaming server, access control -> cool website
- except lawyers, oh noes!
- PDF Slides
Joel Richard – Introducing Macaw: Metadata Collection Tool for Book-like things
- digitizing lots of book-like things including pamphlets
- most libraries sent to Internet Archive then to the Biodiversity Heritage Library
- but some items too large to fit on usual scanning hardware
- had to use camera, but had to add metadata
- Macaw collects metadata but doesn’t really do workflow
- two views: thumbnails and list
- take data from wherever, using Z39.50 or CSV, into Macaw
- custom export from Macaw, including Internet Archive, the library
- each piece is modular
- Code on Google Code
Rachel Frick – LOD-LAM Incubator Project
- Linked Open Data for Library, Archives, and Museums
- lightweight approach in terms of funding and consultation
- timeline: March – May = recruit panel, fundraising, open comment
Mao Tsunekawa – Project Shizuku : Making Friends in Libraries
- Shizuku 2.0
- a software development project supporting encounters among library users
- not recommending books, recommending users instead
- visualize circulation data for finding other users reading the same books
- can share history of reading books
- developing Baron which allows searching OPAC and then making friends
Keith Folsom – Archivists’ Toolkit Database Server on an Amazon EC2 Instance
- multi-institutional
- hosting on small instance of amazon
- Ubuntu/MySQL
- single open port
- download kit with PuTTY
- going out of pilot
Rebecca Jones – Call for Services
- Innovative Interfaces
- provide SQL access
- working on RESTful services
- What services would people like to have?
- Live Beta in March
Code4lib Day 2 Morning: Notes & TakeAways
I didn’t take full notes on all the presentations. I like to just sit back and listen to some of the presentations, especially if there are a lot of visuals, but I do have a few notes.
Full Notes for the following sessions:
- Discovering Digital Library User Behavior with Google Analytics
- How People Search the Library from a Single Search Box
- Stack View: A Library Browsing Tool
Building Research Applications with Mendeley
by William Gunn, Mendeley
- Number of tweets a PLoS article gets is a better predictor of number of citations than impact factor.
- Mendeley makes science more collaborative and transparent. Great to organize papers and then extract and aggregate research data in the cloud.
- Can use impact factor as a relevance ranking tool.
- Linked data is by citation right now, but they now have tag co-occurrences, etc.
- Link to slides.
NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis
No notes. Instead, have the link to the presentation complete with what looks like speaker notes.
Ask Anything!
- Things not taught in library school: all the important things, social skills, go talk to the professor directly if you want to get into CS classes.
- Memento project and UK Archives inserting content for their 404s.
- In response to librarians lamenting loss of physical books, talk to faculty in digital humanities to present data mining etc., look at ‘train based’ circulations, look at ebook stats.
- Take a look at libcatcode.org for library cataloguers learning to code, as well as Code Year hosted by Codecademy.
Code4lib Day 1 Morning: HTML5, Microdata and Schema.org (and other takeaways)
I did not take notes on everything in part because some of it was very technical and it can be hard to do notes, but here are some takeaways from the morning:
- Version Control: use it – Git or Mercurial. It doesn’t need to be code; it can be data too. – Description and Slides
- Take library data and make it available to users, can’t expect them to search for it.
- Linked Data doesn’t need to be a huge project. Start small.
- Why RDF? It’s flexible with easy addition of new attributes or classes, and works cleanly with an iterative approach.
HTML5 Microdata and Schema.org
Other than getting good ranking, we need to provide rich results, i.e. rich snippets. Some digital collections have been providing rich snippets already, such as NCSU Libraries.
How do we get this?
- embedded semantic markup
- HTML5 Semantics include nav, header, article, section, footer
- HTML5 Microdata is a syntax for annotating content to communicate meaning of data to machines
- similar to RDFa and other embedded metadata syntaxes
- Microdata comes back as tree based JSON and allows for DOM API
For example:
<div itemscope itemtype="http://schema.org/Organization" itemref="logo">
  <a itemprop="url" href="http://code4lib.org/">
    <span itemprop="name">Code4Lib</span>
  </a>
</div>
where: scope = about something
type = type of item
prop = properties
For the user, there is no difference as display is the same. This provides a complete data model.
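Roughly, the markup above extracts to an item like this, shown here as a Python literal following the tree-based JSON shape; properties contributed by the itemref'd "logo" element are omitted.

# Extracted microdata item: one type, a dict of property-value lists.
item = {
    "type": ["http://schema.org/Organization"],
    "properties": {
        "url": ["http://code4lib.org/"],
        "name": ["Code4Lib"],
    },
}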
Schema.org is a one-stop shop for vocabulary in describing items on the web.
Apologies, I did not take extensive notes on it, but to read more, check out the slides below or the Code4lib article he wrote.