- 90% of the world’s data was created in the last 2 years
- can tell us much that other information cannot
- emphasize the need for analysis and interpretation
- your data is mined and used to make decisions for you, even more so in the future
- to prepare, know that big data will affect data management, discovery tools, new jobs, revised skills requirements, and revised infrastructures
- businesses will be made up of who has the most data and knows how to best use it
Adventures in Linked Data: Building a Connected Research Environment
by Lisa Goddard
Linked data doesn’t just accommodate collaboration, it enforces collaboration. Need a framework that can handle a lot of data and scale.
Text data is really messy, because it doesn’t fit into a single category. Linked data should allow all of this.
Identify Top Level Entities
Main types of entities to mint URIs for include:
Abstract away from implementation details to make it manageable in the long term.
Canonical URIs mean that one ‘link’ is actually 3, depending on format, through content negotiation.
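To make the content negotiation idea concrete for myself, here is a minimal sketch in Python. It is not from the talk: the example.org URIs, the three formats, and the function names are all assumptions for illustration. It only shows how one canonical URI could hand back a different document depending on the Accept header the client sends.

```python
# Hypothetical format-specific documents behind one canonical URI
# (http://example.org/person/1). All URIs here are made up for illustration.
REPRESENTATIONS = {
    "text/html": "http://example.org/page/person/1",
    "application/rdf+xml": "http://example.org/data/person/1.rdf",
    "text/turtle": "http://example.org/data/person/1.ttl",
}


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(accept_header, default="text/html"):
    """Pick the document to serve for the canonical URI."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return REPRESENTATIONS[media_type]
    # Wildcards and unknown types fall back to the human-readable page.
    return REPRESENTATIONS[default]


if __name__ == "__main__":
    print(negotiate("text/html;q=0.5, application/rdf+xml"))
```

A browser asking for HTML gets the page, while a crawler asking for RDF gets the data document, even though both followed the same link.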
Through RDF, make machine readable definitions.
Linked data is basically an accessibility initiative for machines.
Use ontologies to provide definitions for entities, relationships, and impose rules.
An ontology is for life.
Ontology searches are available, such as Linked Open Vocabularies (LOV), e.g. foaf:Person (Class) – friend of a friend
Tie the entity and class using rdf:type, such as creator, which then results in a data model.
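As a rough illustration of what that data model might look like, here is a small Python sketch using plain tuples rather than an RDF library. The rdf:type, foaf:Person, and Dublin Core creator URIs are the real vocabulary terms. The example.org entity URIs and the helper names are hypothetical, and this is not the model shown in the talk.

```python
# Real vocabulary terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical minted URIs for two top-level entities.
PERSON = "http://example.org/person/1"
TEXT = "http://example.org/text/1"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (PERSON, RDF_TYPE, FOAF_PERSON),   # the entity is tied to its class
    (TEXT, DCTERMS_CREATOR, PERSON),   # a relationship between two entities
]


def to_ntriples(statements):
    """Serialize URI-only triples in N-Triples syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


def instances_of(statements, class_uri):
    """Return every subject declared to be of the given class."""
    return [s for s, p, o in statements if p == RDF_TYPE and o == class_uri]


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(instances_of(triples, FOAF_PERSON))
```

In practice a dedicated RDF library would handle literals, blank nodes, and other serializations. The tuples are only meant to show the shape of the statements.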
Provides a way to create a document, with an interface to tag it in XML, where you can select an existing authority file, the web (using APIs), or custom. You can then add relations.
This looks like a really neat tool to easily add XML tags in a document. Would want to see it integrated into a standard document writer, much like RefWorks does through Write’n’Cite. I’m definitely looking forward to seeing this move forward.
Big Data, Answers, and Civil Rights
If you want volume, velocity, and variety, it’s actually very expensive.
Efficiency means lower costs, new uses, but more demand and consumption.
Big data is about abundance. The number of ways we can do things with this data has exploded.
We live in a world of abundant, instant, ubiquitous information. We evolved to seek peer approval. It all comes down to who is less dumb.
We look for confirmation rather than the truth.
The more we get confirmation, the greater the polarization.
Abundant data has changed the way we live and think.
The Problem with Big Data
Polarization can lead to an increase in prejudice. You don’t know when you’re not contacted. We are increasingly moving from a culture of convictions to a culture of evidence.
Genius says possibly. Finds pattern, inspires hypotheses, reason demands testing, but open to changes.
Correlations are so good at predicting that they look like convincing facts, but they’re just guesses.
See also: Big data, big apple, big ethics by Alistair Croll
BiblioBox: A Library in Box
Inspired by PirateBox, which allows people to share media anonymously within a community using a standalone wifi router (not connected to the Internet). People in the same place like to share stuff.
LibraryBox then simplified it by taking out the chat and upload functions.
Dedicated ebook device that allows browsing and searching of the collection.
- Unix based file server using a wifi access point and small flash drive.
- Ebooks using OPDS metadata format.
- SQLite database
- API module usually available in language of choice e.g. Python
- Bottle – framework for web developing in Python
- Mako Templating – templating in Python
Adding books is much more complex than serving books. For example, it needs an author authority file. Want to automate taking metadata out of ePub files, but there is no good module for reading ePub files in Python.
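Since an ePub file is a ZIP archive whose META-INF/container.xml points to an OPF package document with Dublin Core metadata, a basic read is possible with only the Python standard library. The sketch below is my own, not the speaker's code: the function names and the sample book are hypothetical, and it only pulls out the title and creators, ignoring everything else in the package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the ePub container file and by Dublin Core metadata.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(source):
    """Return the title and creators from an ePub path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        # container.xml tells us where the OPF package document lives.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        if rootfile is None:
            raise ValueError("no rootfile entry in META-INF/container.xml")
        package = ET.fromstring(archive.read(rootfile.attrib["full-path"]))
    title = package.findtext(".//dc:title", default="", namespaces=NS)
    creators = [
        element.text.strip()
        for element in package.iterfind(".//dc:creator", NS)
        if element.text
    ]
    return {"title": title.strip(), "creators": creators}


def make_sample_epub():
    """Build a tiny in-memory ePub skeleton (hypothetical data) for trying this out."""
    container_xml = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    package_xml = (
        '<?xml version="1.0"?>'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Sample Book</dc:title>"
        "<dc:creator>Doe, Jane</dc:creator>"
        "</metadata></package>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/container.xml", container_xml)
        archive.writestr("OEBPS/content.opf", package_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_epub_metadata(make_sample_epub()))
```

Real-world files are messier (missing fields, multiple titles, inconsistent author name forms), which is presumably where the author authority file comes in.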
Add the catalogue to an ebook app. It then looks like a store, where you can browse by title or author.
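Browsing by title or author maps naturally onto a small SQLite table. Here is a minimal sketch using Python's built-in sqlite3 module. The table layout, function names, and sample records are my assumptions, not the actual BiblioBox schema.

```python
import sqlite3

# Hypothetical sample records: (title, author, path to the ebook file).
SAMPLE_BOOKS = [
    ("Pride and Prejudice", "Austen, Jane", "books/pride.epub"),
    ("Frankenstein", "Shelley, Mary", "books/frankenstein.epub"),
    ("Emma", "Austen, Jane", "books/emma.epub"),
]


def create_catalogue(books):
    """Create an in-memory catalogue and load the given book records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "author TEXT NOT NULL, "
        "path TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO books (title, author, path) VALUES (?, ?, ?)", books
    )
    conn.commit()
    return conn


def browse(conn, field):
    """List (title, author) pairs sorted by 'title' or 'author'."""
    # Only whitelisted column names are ever placed into the SQL text.
    if field not in ("title", "author"):
        raise ValueError("can only browse by title or author")
    query = (
        "SELECT title, author FROM books "
        f"ORDER BY {field} COLLATE NOCASE, title COLLATE NOCASE"
    )
    return conn.execute(query).fetchall()


def search(conn, term):
    """Find books whose title or author contains the search term."""
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT title, author FROM books "
        "WHERE title LIKE ? OR author LIKE ? "
        "ORDER BY title COLLATE NOCASE",
        (pattern, pattern),
    ).fetchall()


if __name__ == "__main__":
    catalogue = create_catalogue(SAMPLE_BOOKS)
    for title, author in browse(catalogue, "author"):
        print(f"{author}: {title}")
```

A web framework such as Bottle would then only need to call browse or search and pass the rows to a template.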
Available on GitHub.
Question Answering, Serendipity, and the Research Process of Scholars in the Humanities
by Kim Martin
Serendipity occurs when there is a prepared mind that notices a piece that helps them solve a problem. It allows discovery and thinking outside of the box.
Chance is recognized as an important part of the historical research process.
A shelf browser of some sort in the catalogue can be useful, but what we really need in a system is something that allows personalization and in-depth searching. Researchers typically do not leave their offices; they use search engines.
Visualizations, such as tag clouds, could allow more serendipitous browsing.
More notes on the Access 2012 live blog.
Locked in the Cloud: What lies beyond the peak of inflated expectations
by John Durno & Corey Davis
Right now, the ‘cloud’ is quite the hype.
Getting Locked into the ‘Cloud’
A cloud-based system might still be closed and locked down: vendor-managed and based on a subscription model. It is supposedly a ‘one stop’ solution. While many of the features sound positive, it can have many drawbacks.
Numerous ways to be locked in
- institutional inertia/incumbent bias
Innovation can be stifled, because you are stuck with what the vendor provides. Switching is considered too costly, and the current system is frequently entrenched in work culture.
One of the selling points is that you will save a lot of money with cloud computing. Many administrators seem convinced that it’s about managing information, not technology, but you cannot manage information without managing technology.
Why is our backroom workflow so tightly tied to a public service point?
The problem is that even if something better comes along, you might not go with it, because it would be too cumbersome to migrate.
Have an Exit Strategy
While we need a standard to switch, this is still being worked on. Need to know the cost of moving away from the current/new system.
- limited functionality
- limited access to data
- can be changed or deprecated
Still not the solution. Need unmediated access to data
- high switching costs
- escalating subscription costs
- interoperability issues
- dwindling innovation
- limited choice
There are in fact alternatives and something to look forward to. The ‘fabled’ innovative system.
See also: Hacking 360 Link: A Hybrid Approach by John Durno on substituting vendor link resolver.
MJ Suhonos and Peter Van Garderen from Artefactual Systems did a talk on big data in libraries. In particular, I was interested in some of the points MJ talked about on big data. Here are my notes:
- relative: 1980: 2.5GB = big data
- definition: datasets that grow so large that they become difficult to work with
- big data is… big, and complicated
- maybe we’ve simply been putting a square peg in a round hole
- don’t believe the cloud hype
- big data is less about size, and more about freedom
- open source tools + distributed design = new opportunities