Access 2012 Day 1: Afternoon Notes

Adventures in Linked Data: Building a Connected Research Environment

by Lisa Goddard

Linked data doesn’t just accommodate collaboration, it enforces collaboration. Need a framework that can handle a lot of data and scale.

Text data is really messy, because it doesn’t fit into a single category. Linked data should allow all of this.

Identify Top Level Entities

Mint URIs for the main types of entities, including:

  • people
  • places
  • events
  • documents
  • annotations
  • books
  • organizations

Abstract away from implementation details to make it manageable in the long term.

Canonical URIs mean that one ‘link’ is actually three, depending on the format requested through content negotiation.
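A minimal sketch of how one canonical URI can resolve to several format-specific documents, based on the HTTP Accept header. The example.org URLs, the three formats chosen, and the `negotiate` helper are assumptions for illustration, not the project's actual setup.

```python
# Hypothetical mapping from media type to a format-specific document URL
# for a single canonical entity URI.
REPRESENTATIONS = {
    "text/html": "http://example.org/page/person/42",
    "application/rdf+xml": "http://example.org/data/person/42.rdf",
    "text/turtle": "http://example.org/data/person/42.ttl",
}


def negotiate(accept_header, default="text/html"):
    """Return (media_type, url) for the best match in an Accept header.

    Simplified on purpose: wildcards such as */* are not expanded and
    simply fall through to the default representation.
    """
    choices = []
    for position, part in enumerate(accept_header.split(",")):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        choices.append((-quality, position, media_type))
    # Highest q value first; ties keep the order the client listed them in.
    for neg_quality, _, media_type in sorted(choices):
        if neg_quality < 0 and media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
    return default, REPRESENTATIONS[default]
```

A browser asking for text/html would be sent to the page, while a harvester asking for text/turtle would get the data document for the same entity.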

Define Relationships

Through RDF, make machine readable definitions.

Linked data is basically an accessibility initiative for machines.

Use ontologies to provide definitions for entities, relationships, and impose rules.

An ontology is for life.

Ontology searches are available, such as Linked Open Vocabularies (LOV), e.g. foaf:Person (Class) – friend of a friend

Tie the entity to its class using rdf:type, and relate entities to each other with properties such as creator. This results in a data model.
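A rough sketch of what such a data model looks like as triples, using plain Python tuples rather than an RDF library. The rdf:type, foaf:Person, and Dublin Core creator URIs are the published ones; the example.org entity URIs and both helper functions are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical minted URIs for two entities.
AUTHOR = "http://example.org/person/1"
BOOK = "http://example.org/book/1"

# Each triple is (subject, predicate, object).
triples = [
    (AUTHOR, RDF_TYPE, FOAF_PERSON),   # the entity is a foaf:Person
    (BOOK, DCTERMS_CREATOR, AUTHOR),   # the book was created by that person
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join("<%s> <%s> <%s> ." % triple for triple in triples)


def instances_of(triples, class_uri):
    """List every subject that is typed as the given class."""
    return [s for s, p, o in triples if p == RDF_TYPE and o == class_uri]
```

Calling `instances_of(triples, FOAF_PERSON)` returns the one person URI, which is the kind of question a machine can answer once the class is declared.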

CWRC Writer

Provides a way to create a document through an interface for tagging in XML, where you can select entities from an existing authority file, the web (using APIs), or a custom entry. You can then add relations.
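To give a feel for what tagging a mention and linking it to an entity might produce, here is a small sketch using the standard library. The `persName` tag, the `ref` attribute, the example.org URI, and the `tag_entity` helper are all assumptions; this is not CWRC Writer's actual schema or code.

```python
import xml.etree.ElementTree as ET


def tag_entity(text, mention, tag, uri):
    """Wrap the first occurrence of `mention` in <tag ref="uri"> inside a <p>."""
    before, found, after = text.partition(mention)
    paragraph = ET.Element("p")
    paragraph.text = before
    if found:
        entity = ET.SubElement(paragraph, tag, {"ref": uri})
        entity.text = mention
        entity.tail = after  # text that follows the tagged mention
    return ET.tostring(paragraph, encoding="unicode")


tagged = tag_entity(
    "A poem by Jane Doe, 1900.",
    "Jane Doe",
    "persName",                      # hypothetical tag name
    "http://example.org/person/1",   # hypothetical authority URI
)
```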

Slides

Quick Comment

This looks like a really neat tool to easily add XML tags in a document. Would want to see it integrated into a standard document writer, much like RefWorks does through Write’n’Cite. I’m definitely looking forward to seeing this move forward.

Big Data, Answers, and Civil Rights

Alistair Croll

If you want volume, velocity, and variety, it’s actually very expensive.

Efficiency means lower costs, new uses, but more demand and consumption.

Big data is about abundance. The number of ways we can do things with this data has exploded.

We live in a world of abundant, instant, ubiquitous information. We evolved to seek peer approval. It all comes down to who is less dumb.

We look for confirmation rather than the truth.

The more we get confirmation, the greater the polarization.

Abundant data has changed the way we live and think.

The Problem with Big Data

Polarization can lead to an increase in prejudice. You don’t know when you’re not contacted. Increasingly moving from a culture of convictions to a culture of evidence.

Genius says possibly. Finds patterns, inspires hypotheses, reason demands testing, but open to changes.

Correlation is so good at predicting that it looks like convincing facts, but they’re just guesses.

See also: Big data, big apple, big ethics by Alistair Croll

Break Time

BiblioBox: A Library in Box

by David Fiander

Inspired by PirateBox, which allows people to share media anonymously within a community using a standalone wifi router (not connected to the Internet). People in the same place like to share stuff.

LibraryBox then simplified this by taking out the chat and upload functions.

Dedicated ebook device that allows browsing and searching of the collection.

Components:

  • Unix based file server using a wifi access point and small flash drive.
  • Ebooks using OPDS metadata format.
  • SQLite database
  • API module usually available in language of choice e.g. Python
  • Bottle – framework for web developing in Python
  • Mako Templating – templating in Python

Adding books is much more complex than serving books. For example, maintaining an author authority file. Want to automate extracting metadata from ePub files, but there is no good module for reading ePub files in Python.
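For what it's worth, a basic title and author lookup can be sketched with only the standard library, since an ePub is a zip file whose META-INF/container.xml points to a package file holding Dublin Core metadata. This assumes a standard, unencrypted ePub; the `read_epub_metadata` function is my own illustration, not BiblioBox code.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC = "{http://purl.org/dc/elements/1.1/}"


def read_epub_metadata(source):
    """Return the title and creators of an ePub (a path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        # container.xml names the package (OPF) file inside the archive.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(archive.read(rootfile.get("full-path")))
    title = package.find(".//" + DC + "title")
    creators = [element.text for element in package.iter(DC + "creator")]
    return {
        "title": title.text if title is not None else None,
        "creators": creators,
    }
```

Real files are messier (missing titles, multiple creators with roles, odd encodings), which is presumably where the authority file work comes in.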

User View

Add catalogue to ebook app. It then looks like a store, where you can browse by title or author.
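A minimal sketch of the browse-by-title-or-author idea on top of SQLite. The `books` table, its columns, and the helper functions are assumptions for illustration; the real BiblioBox schema may look quite different.

```python
import sqlite3


def open_catalogue(path=":memory:"):
    """Open (or create) a catalogue database with a hypothetical books table."""
    connection = sqlite3.connect(path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " author TEXT NOT NULL,"
        " filename TEXT NOT NULL)"
    )
    return connection


def add_book(connection, title, author, filename):
    """Insert one book; the with-block commits the transaction."""
    with connection:
        connection.execute(
            "INSERT INTO books (title, author, filename) VALUES (?, ?, ?)",
            (title, author, filename),
        )


def browse(connection, by="title"):
    """Return (title, author, filename) rows sorted by title or author."""
    if by not in ("title", "author"):
        raise ValueError("can only browse by title or author")
    # `by` is checked against a fixed list above, so it is safe to interpolate.
    query = (
        "SELECT title, author, filename FROM books "
        "ORDER BY %s COLLATE NOCASE" % by
    )
    return connection.execute(query).fetchall()
```

Each row could then be rendered as an entry in the catalogue feed that the ebook app reads.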

Available on GitHub.

Question Answering, Serendipity, and the Research Process of Scholars in the Humanities

by Kim Martin

Serendipity occurs when there is a prepared mind that notices a piece that helps them solve a problem. It allows discovery and thinking outside of the box.

Chance is recognized as an important part of the historical research process.

A shelf browser of some sort in the catalogue can be useful, but what we really need in a system is something that allows personalization and in-depth searching. Researchers typically do not leave their offices; they use search engines instead.

Visualizations, such as tag clouds, could allow more serendipitous browsing.

More notes on the Access 2012 live blog.

Access 2012 Day 1: Notes on Locked in the Cloud

Locked in the Cloud: What lies beyond the peak of inflated expectations

by John Durno & Corey Davis

Right now, the ‘cloud’ is quite the hype.

Getting Locked into the ‘Cloud’

A cloud-based system might still be closed and locked down: vendor-managed and based on a subscription model. Supposedly a ‘one stop’ solution. While many of the features sound positive, it can have many drawbacks.

Numerous ways to be locked in

  • data
  • software
  • API
  • institutional inertia/incumbent bias

Innovation can be stifled, because you are stuck with what the vendor provides. Switching is considered too costly, and the existing system is frequently entrenched in the work culture.

One of the selling points is that you will save a lot of money with cloud computing. Many administrators seem convinced that it’s about managing information, not technology, but you cannot manage information without managing technology.

Why is our backroom workflow so tightly tied to a public service point?

The problem is that even if something better comes along, you might not go with it, because it would be too cumbersome to migrate.

Have an Exit Strategy

While we need a standard to switch, this is still being worked on. Need to know the cost of moving away from the current/new system.

APIs

  • limited functionality
  • limited access to data
  • can be changed or deprecated

Still not the solution. Need unmediated access to data.

Caveat Emptor

  • high switching costs
  • escalating subscription costs
  • interoperability issues
  • dwindling innovation
  • limited choice

There are in fact alternatives and something to look forward to. The ‘fabled’ innovative system.

See also: Hacking 360 Link: A Hybrid Approach by John Durno on substituting vendor link resolver.

Access 2012 Day 1: Ignite Talk – Social Feed Manager

Researchers are collecting social media data (especially from Twitter) manually (possibly by proxy).

 

Some paid options to collect the data:

  • DataSift
  • Gnip
  • Topsy

Friendly, but not cheap, and more than what we need. Still need tools to collect, process, etc.

What researchers ask for:

  • specific users, keywords
  • historic time periods
  • basic values: user, date, text, counts
  • delimited files to import

We can do this free with APIs.
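As a rough sketch of the "basic values in a delimited file" request, here is how already-collected tweets could be written out as CSV. The records below are made up rather than fetched from the Twitter API, and the field names and `tweets_to_csv` helper are assumptions, not Social Feed Manager's export code.

```python
import csv
import io

# Hypothetical column names for the basic values researchers ask for.
FIELDS = ["user", "date", "text", "retweet_count"]


def tweets_to_csv(tweets, out):
    """Write an iterable of tweet dicts to a file-like object as CSV."""
    # extrasaction="ignore" drops any keys that are not in FIELDS.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for tweet in tweets:
        writer.writerow(tweet)


# Made-up sample record standing in for data collected through the API.
sample = [
    {
        "user": "example_user",
        "date": "2012-10-18",
        "text": "Hello, world",
        "retweet_count": 2,
        "lang": "en",
    },
]
buffer = io.StringIO()
tweets_to_csv(sample, buffer)
```

The csv module quotes the text field automatically when it contains a comma, so the file imports cleanly into a spreadsheet or statistics package.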

Built Social Feed Manager with features

  • Users by Item Count with temporal graphs
  • Details on user
  • can export to CSV files
  • hashtag queries by 10 minutes
  • search function with 1000

Free on github

  • python/django
  • user timelines, filter, sample, search
  • simple display with export for user timelines

Leaves out:

  • historical tweets
  • tweets beyond last 3200

By @dchud

Access 2012: Opening Keynote – We Were Otaku: before it was cool

Aaron Cope

Archives (and libraries), where things are frequently collected obsessively, are just like otaku.

Curating: the act of choosing, e.g. flickr galleries

The Economics

Time is money. The stand-in assumption is that something that takes time has the greatest value, but the converse no longer holds: we can no longer say that something made quickly and cheaply has little value, e.g. maps.

Collapsing Distinctions

Distinctions between museums and archives (and libraries) are collapsing. There is an assumption that archives are the basement of museums. What’s happening is a kind of mushing. There is a blur in whether you are looking at an archive or a showcase, especially in the digital realm.

Expectations

Efficiency of storage and retrieval at Amazon (robotic system). Allows you to get something delivered the next day. Makes possible a kind of expectation that the web has. If we can make it happen for trivial things, we’re going to want to make it happen for important things.

It’s About the Users

If people can’t get to it or see it, why are we keeping it? Why is it important? It is no doubt difficult to provide access to physical objects, but that doesn’t mean we cannot. We can simply talk about our collections and why we have them. It’s about keeping open a narrative space. We are the timekeepers.

Trust. Users. Delivery. -gov.uk

There is no (final) design, there is only reckoning.

It is everyone else that is letting us do this. We are held to a higher standard. We have to trust our users even if it’s not on our terms. There is no uniform motive, e.g. add a random button. Nor can we assume the same level of expertise, e.g. making objects first-class objects with URLs.

The proxies are important to get people in the door, to see the physical objects. The proxies also provide a broader surface for discussion and conceptualization. Also, not everyone has the luxury of travel.

It’s about being present on the network, and allowing things to happen.

The unit of measure of what is important has changed. e.g. foursquare as building registry.

It’s Messy

Ultimately, we need to think about how we share things with people, and allowing people to interact with them. Keeping something safe vs. canonizing.

Access 2012 Pre-Conference: Learning Python

Today’s preconference session was a great way to force me to learn a bit of Python. The very basics were somewhat of a review since I read the first couple of chapters of the recommended book and I actually already knew much of it, but for those interested in knowing, here’s what we learned.

The Book

Much of the material can be found in Think Python: How to Think Like a Computer Scientist by Allen B. Downey.

Another resource: Cheatsheet of common syntax and data structures

The Basics

We covered the basics including:

  • types (string, int, float)
  • arithmetic
  • concatenation
  • values, variables, expressions
  • arguments and basic functions
  • for loop

Read chapters 1-3 (and do the exercises) and you’ll cover it all.
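For a taste of what that looks like, here is a small sketch of my own (not an exercise from the book) that touches types, arithmetic, concatenation, functions with arguments, and a for loop. It is written for Python 3.

```python
def describe(name, count, price):
    """Combine a string, an int, and a float into one sentence."""
    total = count * price  # int * float gives a float
    # Numbers must be converted with str() before concatenation.
    return name + ": " + str(count) + " at " + str(price) + " = " + str(total)


def repeat(word, times):
    """Build a string by concatenating `word` in a for loop."""
    result = ""
    for _ in range(times):
        result = result + word
    return result


summary = describe("book", 3, 2.5)
echo = repeat("ab", 3)
```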

Turtle World

Had some fun drawing with ‘Bob’ the turtle.

This is covered in chapter 4 of the book.

Conditionals and Recursion

We then covered the slightly less basic topics of:

  • modulus
  • Boolean expressions
  • conditionals
  • recursion

See chapter 5 of the book.
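Again as my own sketch rather than the book's code, these four topics fit into a few short functions: modulus and a Boolean expression in `is_even`, chained conditionals in `describe_number`, and recursion in `countdown`.

```python
def is_even(n):
    """Modulus gives the remainder; the comparison yields a Boolean."""
    return n % 2 == 0


def describe_number(n):
    """Chained conditionals: only the first true branch runs."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    elif is_even(n):
        return "even"
    else:
        return "odd"


def countdown(n):
    """Recursion: the function calls itself with a smaller value."""
    if n <= 0:  # base case stops the recursion
        return ["Blastoff!"]
    return [str(n)] + countdown(n - 1)
```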

At the End of the Day

Honestly, the session wasn’t exactly bad, but I think I would’ve learned more by being sat down and simply being told to follow the book. We didn’t have a bad instructor, but I would want to get more than just what the book tells you.

A simple example would be how to get the full list of functions in TurtleWorld for us to play around rather than just telling us the couple functions that are expected in the one or two exercises.
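One way to do that in general is to ask Python itself what a module or object offers. The sketch below uses the standard `math` module as a stand-in, since TurtleWorld may not be installed; `public_functions` is a helper name I made up, and the same call should work on any imported module or object.

```python
import inspect
import math


def public_functions(obj):
    """Names of the public callables on a module, class, or instance."""
    return sorted(
        name
        for name, value in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    )


math_functions = public_functions(math)
```

From there, the built-in `help()` on any of the listed names shows its documentation in the interpreter.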

Overall, a good session if you’re a real beginner with absolutely no programming background, but I think that 90+% of the group would have benefited from a much faster-paced session. Other than during recursion, I noticed that almost all the other times, people around me were doing other things. So, good instructor and session, just too easy for many.