Talking about the NCSU hackerspace focusing on gaming and virtual space.
Created their own cloud for students and faculty.
Providing technology as a core library service.
This presentation had a lot to do with showing off some new spaces, so it’s hard to put into words, but here’s a visual tour of the Hunt Library.
Sharing the Unshareable – Dental Clinic Images in a University Image Repository
by Janet Rothney
Drawers of slides that can only be used within the university (not public).
Had to include experts, because library staff don't know what is shown in the images. Also needed something university-wide so that the repository would last longer.
Built on Fedora, Drupal, and Islandora through Discovery Garden, hosted on Amazon and jura(sp?) space.
MeSH wasn’t specific enough, so they chose crossopedia (sp?), a specialized controlled vocabulary for dentistry. They had a path chart for tagging that included all the options and the order in which to apply them.
Currently using a shared drive in order to restrict use.
Can track patients by number without identifying them.
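The notes don’t say how patient numbers are assigned. A minimal sketch, assuming a keyed hash (HMAC-SHA256) of the clinic’s own chart ID: the same chart always maps to the same study number, but the number can’t be traced back to the chart without the secret key. The function name patient_number, the “P-” prefix and the 12-character length are all hypothetical.

```python
import hashlib
import hmac
import secrets


def patient_number(chart_id: str, key: bytes, length: int = 12) -> str:
    """Return a stable, non-identifying number for a clinic chart ID.

    Hypothetical scheme: HMAC-SHA256 keyed with a secret that is stored
    outside the image repository, truncated to `length` hex characters.
    """
    digest = hmac.new(key, chart_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:length]


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # in practice, load this from a protected store
    first = patient_number("chart-0042", key)
    again = patient_number("chart-0042", key)
    other = patient_number("chart-0043", key)
    # Same chart gives the same number; a different chart gives a different one.
    print(first == again, first == other)
```

Because the key never goes into the repository, anyone browsing images sees only the P- numbers, while the clinic can still regenerate them from its own records.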
ID is required to access the system.
Hope to later share structure and process with other dental organizations and groups.
The distinction between “the internet” & “books” is totally arbitrary, and will disappear in 5 years.
The book is defined as a discrete coherent collection. The boundaries of a book are critically important. The creator’s intention is important. The book is intended to be coherent.
Why are books important?
A book is a best effort at providing all you need to know or feel on a particular topic.
Books govern our knowledge. A book is a node of knowledge.
That knowledge (and nodes) shapes our world. It’s the fabric of our world.
Books are a Network
The web was built to transfer information. Value is expressed through links (what something is linked from and what it links to). Each book should have a URL on the internet, and live natively on the web.
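The idea that value is expressed through links is the principle behind link-analysis algorithms such as PageRank. Below is a simplified PageRank sketch over a tiny, made-up link graph; it illustrates the principle rather than anything shown in the talk. The damping factor of 0.85 is the commonly cited default.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Iteratively compute PageRank scores for a small link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score...
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ...and passes the rest of its score to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical graph: each key links to the pages in its list.
    web = {
        "book": ["wikipedia"],
        "blog": ["book"],
        "news": ["book", "blog"],
        "wikipedia": ["book"],
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:10s} {score:.3f}")
```

In this toy graph “book” ends up with the highest score because three other pages link to it, which is the kind of networked value the talk argues books lose by staying off the web.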
Wikipedia might be the best example. An article is edited by multiple authors, links to other articles to give context, and citations begin to organize external sources. It brings knowledge together in a container that makes these pieces useful.
If the web is the most efficient way to disseminate information, and books are the nodes of knowledge, why are books not published online?
What is the Business Model?
There is no pressing business case to publish directly online. The disruption this will cause when it happens will be huge.
What does information want?
Information wants to be free. – Stewart Brand
Information wants to be used, and it doesn’t trust you to know how to use it correctly.
Why are books kept off to the side, apart from the network? The Business Model.
Example (of Business Model)
Build the book as a web object, then make it downloadable, printable, etc.
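Making a web book “downloadable” usually means packaging the same chapters as an ebook file. A minimal sketch, using only Python’s standard library, that wraps XHTML chapters in a bare-bones EPUB 3 container; the function name, file layout (OEBPS/, ch1.xhtml, …) and fixed “en” language are assumptions, and a real pipeline (or tools like PressBooks) would add styling and a cover, and would validate the result with EPUBCheck.

```python
from __future__ import annotations

import io
import zipfile
from datetime import datetime, timezone
from html import escape

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Shared XHTML wrapper for chapters and the navigation document.
XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>{body}</body>
</html>"""


def build_epub(book_id: str, title: str, chapters: list[tuple[str, str]]) -> bytes:
    """Package (chapter title, XHTML body) pairs as a minimal EPUB 3 file."""
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    names = [f"ch{i}.xhtml" for i in range(1, len(chapters) + 1)]
    items = "".join(f'<item id="c{i}" href="{n}" media-type="application/xhtml+xml"/>'
                    for i, n in enumerate(names, 1))
    refs = "".join(f'<itemref idref="c{i}"/>' for i in range(1, len(names) + 1))
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    {items}
  </manifest>
  <spine>{refs}</spine>
</package>"""
    toc = "".join(f'<li><a href="{n}">{escape(t)}</a></li>'
                  for n, (t, _) in zip(names, chapters))
    nav = XHTML.format(title=escape(title),
                       body=f'<nav epub:type="toc"><ol>{toc}</ol></nav>')

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        # EPUB requires "mimetype" to be the first entry and stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        packed = zipfile.ZIP_DEFLATED
        z.writestr("META-INF/container.xml", CONTAINER, compress_type=packed)
        z.writestr("OEBPS/content.opf", opf, compress_type=packed)
        z.writestr("OEBPS/nav.xhtml", nav, compress_type=packed)
        for n, (t, body) in zip(names, chapters):
            z.writestr(f"OEBPS/{n}", XHTML.format(title=escape(t), body=body),
                       compress_type=packed)
    return buffer.getvalue()
```

The web version stays the primary object; the returned bytes can simply be written to a .epub file and offered as a download link next to each chapter.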
Engagement around the idea happened through Twitter, an article in The Guardian that discussed the idea from the chapter, and a lot of referrals.
On the side: provided analytics for the web version of the book.
A webbook can generate interest in ways that an ebook cannot. The ideas in a webbook spread far more quickly, and far more easily, than those in an ebook. It is easier to find, has built-in analytics, and can support different business models.
You have a direct connection with the reader, whereas with an ebook you’re one step removed.
Why Books Will Live on the Web
not defined by format
most important nodes of knowledge
web is most efficient technology to share
disseminate, find, use, and build on in ways that cannot be imagined by the original author
The Process/Model
Creating an ebook is still hard, but online tools like PressBooks/Vook/Atavist/Booktype make it (almost) trivial and free.
The avalanche of books will be overwhelming once the tools become widely known. Other kinds of book writing activity will gain relevance, meaning more and more writing will be “out there”.
More ebooks mean each book is harder to find, which affects promotion, discovery, etc. This ultimately means that those who connect better with their readers will win (which is what the web is great at).
People will ask why there even needs to be a business model, and that will bring many books online.
Zero to 50K in Three Weeks: Building a Digital Repository from Scratch, Fast
by Brianne Selman
Decided within a day to build a digital repository. While they had previously thought about digitizing materials, there had been no repository to put them in. Note: no web programming support in house.
Process:
looked into archival standards
brainstormed
call out and identification of potential content
invited public in to scan personal artifacts (e.g. postcards) of local history
quick preparation of a budget and project plan (to keep money in the library)
met with collector and historian to talk about content and how it would be displayed
first priority: collaboration on images (to show off knowledge)
met with scanning consultant to provide and discuss preliminary metadata
met with director and head of IT
At the three-week mark, had not spent anything, but had created a plan, which convinced
Hurdles:
ITS Expenditure Request
Software RFP (set evaluation matrix with extra weighting on OAI, etc.)
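A sketch of how an evaluation matrix with extra weighting on one criterion might be computed. The criteria, weights, vendor names and scores are hypothetical, and reading “OAI” as OAI-PMH support is an assumption.

```python
from __future__ import annotations

# Hypothetical criteria and weights; OAI-PMH support gets extra weight.
WEIGHTS = {
    "oai_pmh_support": 3.0,
    "metadata_standards": 2.0,
    "ease_of_use": 1.0,
    "cost": 1.0,
}


def weighted_score(scores: dict[str, float],
                   weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-criterion scores (e.g. on a 0-5 scale)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores.get(c, 0.0) for c in weights) / total_weight


if __name__ == "__main__":
    # Made-up scores for two hypothetical vendors.
    vendors = {
        "Vendor A": {"oai_pmh_support": 5, "metadata_standards": 4, "ease_of_use": 3, "cost": 2},
        "Vendor B": {"oai_pmh_support": 2, "metadata_standards": 4, "ease_of_use": 5, "cost": 5},
    }
    for name in sorted(vendors, key=lambda v: weighted_score(vendors[v]), reverse=True):
        print(f"{name}: {weighted_score(vendors[name]):.2f}")
```

With these made-up numbers Vendor A scores 4.00 against Vendor B’s 3.43; unweighted, B would win 4.0 to 3.5, which is the point of weighting the criteria that matter most to the repository.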
Purchased software and paid for scanning ahead of time. Ended up with CONTENTdm and, at this point, have done some scanning, added controlled vocabularies, tested PDFs, and contacted Canadiana.
Zero to 50k in Three Weeks on Prezi
Open Source OCR for Large Collections of Scanned Documents
Python has good image support; then use MapReduce and Hadoop Streaming to coordinate tasks and machines (but Hadoop uses very odd ports).
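The talk didn’t show code, so here is a hedged sketch of what a Python mapper for Hadoop Streaming might look like. Assumptions not from the talk: each input line is the path to a page image readable on every worker, and the open source Tesseract CLI is the OCR engine. The --map flag and function names are hypothetical.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: OCR one page image per input line.

Assumptions (not from the talk): each input line is the path to a page image
readable on every worker node, and the Tesseract CLI is installed.
"""
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Iterable, Iterator


def tesseract_ocr(image_path: str) -> str:
    # Passing "stdout" as the output base makes Tesseract print the text.
    result = subprocess.run(["tesseract", image_path, "stdout"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def map_pages(lines: Iterable[str],
              ocr: Callable[[str], str] = tesseract_ocr) -> Iterator[str]:
    """Yield 'path<TAB>text' records, the key/value format Streaming reads."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        # Collapse tabs and newlines so each page stays a single record.
        text = " ".join(ocr(path).split())
        yield f"{path}\t{text}"


if __name__ == "__main__" and "--map" in sys.argv[1:]:
    for record in map_pages(sys.stdin):
        print(record)
```

Hadoop handles splitting the list of paths across machines; the script would be passed as the mapper command (for example -mapper "ocr_mapper.py --map") in a map-only job (-numReduceTasks 0). Exact jar paths and options depend on the Hadoop version.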
ABBYY works well if the images vary and there is no consistent approach to cleaning them, if you have an inflexible Windows environment, if the processing can be done on one station, and for a one-off project that needs to get done in a hurry.
Linked data doesn’t just accommodate collaboration, it enforces collaboration. Need a framework that can handle a lot of data and scale.
Text data is really messy, because it doesn’t fit into a single category. Linked data should accommodate all of this.
Identify Top Level Entities
Main types of entities to mint URIs for include:
people
places
events
documents
annotations
books
organizations
Abstract away from implementation details to make it manageable in the long term.
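One way to read “abstract away from implementation details” is to mint opaque, extension-free URIs that don’t expose the software or database behind them. A minimal sketch; the base URI and the UUID-based paths are assumptions for illustration, not the project’s actual scheme.

```python
from __future__ import annotations

import uuid

# Hypothetical base URI; it should be one the project can keep serving for decades.
BASE = "https://example.org/id"

# The top-level entity types listed above.
ENTITY_TYPES = {"person", "place", "event", "document",
                "annotation", "book", "organization"}


def mint_uri(entity_type: str) -> str:
    """Mint a new opaque URI for an entity of a known top-level type.

    The URI carries no file extension, database key or software name, so it
    does not have to change when the underlying system does.
    """
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return f"{BASE}/{entity_type}/{uuid.uuid4()}"
```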
Canonical URIs mean that one ‘link’ is actually three, depending on the format requested through content negotiation.
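A sketch of how one canonical URI can resolve to three formats: the server reads the request’s Accept header and answers with a 303 See Other redirect to the matching variant, the pattern described in the W3C note “Cool URIs for the Semantic Web”. The talk only says there are three formats; the choice of HTML, RDF/XML and Turtle and the .html/.rdf/.ttl suffixes are assumptions.

```python
from __future__ import annotations

# Assumed variants of one canonical URI, keyed by media type.
VARIANTS = {
    "text/html": "{uri}.html",
    "application/rdf+xml": "{uri}.rdf",
    "text/turtle": "{uri}.ttl",
}


def parse_accept(header: str) -> list[str]:
    """Return media types from an Accept header, highest q-value first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type and q > 0:
            prefs.append((-q, position, media_type))
    # Sort by descending q; ties keep the order the client sent.
    return [media_type for _, _, media_type in sorted(prefs)]


def negotiate(canonical_uri: str, accept: str) -> tuple[int, str]:
    """Return (HTTP status, Location) for a request to the canonical URI."""
    for media_type in parse_accept(accept):
        if media_type in VARIANTS:
            return 303, VARIANTS[media_type].format(uri=canonical_uri)
        if media_type == "*/*":
            # Browsers often end with */*; fall back to the human-readable page.
            return 303, VARIANTS["text/html"].format(uri=canonical_uri)
    return 406, ""  # Not Acceptable: none of the offered formats fits
```

A browser asking for text/html lands on the web page, while a crawler asking for text/turtle gets the data, yet both cite the same canonical URI.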
Define Relationships
Use RDF to make machine-readable definitions.
Linked data is basically an accessibility initiative for machines.
Use ontologies to provide definitions for entities, relationships, and impose rules.
An ontology is for life.
Ontology search services are available, such as Linked Open Vocabularies (LOV), e.g. foaf:Person (a class from FOAF, Friend of a Friend).
Tie each entity to its class using rdf:type, and connect entities with relationships such as creator. The result is a data model.
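A minimal sketch of that pattern, writing triples out as N-Triples with plain Python so no RDF library is needed. The RDF, FOAF and Dublin Core terms IRIs are the real vocabulary terms; the entity URIs and the person’s name are hypothetical. A real project would more likely use an RDF library such as rdflib.

```python
from __future__ import annotations

# Real vocabulary IRIs (RDF, FOAF, Dublin Core terms).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape the characters N-Triples does not allow inside a plain string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def example_graph() -> list[tuple[str, str, str]]:
    # Hypothetical entity URIs and name, for illustration only.
    person = "https://example.org/id/person/1"
    book = "https://example.org/id/book/7"
    return [
        (iri(person), iri(RDF_TYPE), iri(FOAF_PERSON)),  # the entity is a foaf:Person
        (iri(person), iri(FOAF_NAME), literal("Jane Author")),
        (iri(book), iri(DCTERMS_CREATOR), iri(person)),  # relationship: creator
    ]


if __name__ == "__main__":
    print(to_ntriples(example_graph()), end="")
```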
CWRC Writer
Provides an interface for creating a document and tagging it in XML; for each entity you can select an existing authority file, the web (using APIs), or a custom entry. You can then add relations.
This looks like a really neat tool to easily add XML tags in a document. I would want to see it integrated into a standard document writer, much like RefWorks does through Write’n’Cite. I’m definitely looking forward to seeing where this goes.
If you want volume, velocity, and variety, it’s actually very expensive.
Efficiency means lower costs and new uses, but also more demand and consumption.
Big data is about abundance. The number of ways we can do things with this data has exploded.
We live in a world of abundant, instant, ubiquitous information. We evolved to seek peer approval. It all comes down to who is less dumb.
We look for confirmation rather than the truth.
The more we get confirmation, the greater the polarization.
Abundant data has changed the way we live and think.
The Problem with Big Data
Polarization can lead to an increase in prejudices. You don’t know when you’re not contacted. We are increasingly moving from a culture of convictions to a culture of evidence.
Genius says “possibly”: it finds patterns and inspires hypotheses, while reason demands testing but stays open to change.
Correlations are so good at predicting that they look like convincing facts, but they’re just guesses.
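To make the point concrete: two invented series that have nothing to do with each other, but both trend upward over ten years, produce a Pearson correlation above 0.99. The series are made up purely for illustration.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Two invented, unrelated series that both happen to grow over ten years.
years = range(10)
series_a = [100 + 12 * t for t in years]
series_b = [50 + 3 * t + (1 if t % 2 else -1) for t in years]

if __name__ == "__main__":
    print(f"r = {pearson(series_a, series_b):.3f}")
```

The high r comes entirely from the shared upward trend, not from any connection between the two things measured.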
Inspired by PirateBox, which allows people to share media anonymously within a community using a standalone wifi router (not connected to the Internet). People in the same place like to share stuff.
LibraryBox then simplified this by taking out the chat and upload functions.
Dedicated ebook device that allows browsing and searching of the collection.
Components:
Unix-based file server using a wifi access point and a small flash drive.
Adding books is much more complex than serving them, for example maintaining an author authority file. They want to automate extracting metadata from ePub files, but there is no good module for reading ePub files in Python.
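For the basic case the standard library can get surprisingly far: an ePub is a ZIP archive whose META-INF/container.xml points to an OPF package file holding Dublin Core metadata. A minimal sketch using only zipfile and xml.etree.ElementTree; the function name and the set of fields returned are assumptions, and it doesn’t handle EPUB 3 “refines” metadata or encrypted files.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
import zipfile

NS = {
    "container": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
FIELDS = ("title", "creator", "language", "identifier", "publisher", "date")


def epub_metadata(epub) -> dict[str, list[str]]:
    """Return Dublin Core fields from an ePub given a path or binary file object."""
    with zipfile.ZipFile(epub) as z:
        # container.xml says where the OPF package document lives.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        rootfile = container.find("container:rootfiles/container:rootfile", NS)
        if rootfile is None:
            raise ValueError("no rootfile in META-INF/container.xml")
        package = ET.fromstring(z.read(rootfile.get("full-path")))
    metadata = package.find("opf:metadata", NS)
    if metadata is None:
        raise ValueError("OPF package has no <metadata> element")
    result = {}
    for field in FIELDS:
        values = [(el.text or "").strip() for el in metadata.findall(f"dc:{field}", NS)]
        result[field] = [v for v in values if v]
    return result
```

Creator strings come back exactly as written in each file, so they would still need to be matched against the author authority file before the books are added.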
User View
Add the catalogue to an ebook app. It then looks like a store, where you can browse by title or author.
Serendipity occurs when there is a prepared mind that notices a piece that helps them solve a problem. It allows discovery and thinking outside of the box.
Chance is recognized as an important part of the historical research process.
A shelf browser of some sort in the catalogue can be useful, but what we really need is a system that allows personalization and in-depth searching. Researchers typically don’t leave their offices; they use search engines.
Visualizations, such as tag clouds, could allow more serendipitous browsing.
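One common way to size tags in a cloud is to scale font size with the logarithm of each tag’s frequency, so a few very popular tags don’t dwarf everything else. The log scaling and the 12 to 36 pixel range are assumptions, not from the talk.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Iterable


def tag_cloud_sizes(tags: Iterable[str], min_px: int = 12,
                    max_px: int = 36) -> dict[str, int]:
    """Map each tag to a font size (px) that grows with log(frequency)."""
    counts = Counter(tags)
    if not counts:
        return {}
    low = math.log(min(counts.values()))
    high = math.log(max(counts.values()))
    span = (high - low) or 1.0  # all tags equally common: avoid dividing by zero
    return {tag: round(min_px + (math.log(n) - low) / span * (max_px - min_px))
            for tag, n in counts.items()}
```

For example, tags used 8, 2 and 1 times get 36, 20 and 12 pixels, so rarely used tags stay visible enough to be stumbled upon.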
Locked in the Cloud: What lies beyond the peak of inflated expectations
by John Durno & Corey Davis
Right now, the ‘cloud’ is quite the hype.
Getting Locked into the ‘Cloud’
A cloud-based system might still be closed and locked down: vendor-managed and based on a subscription model. It is supposedly a ‘one stop’ solution. While many of the features sound positive, it can have many drawbacks.
Numerous ways to be locked in
data
software
API
institutional inertia/incumbent bias
Innovation can be stifled, because you’re stuck with what the vendor provides. Switching is considered too costly, and the system frequently becomes entrenched in the work culture.
One of the selling points is that you will save a lot of money with cloud computing. Many administrators seem convinced that it’s about managing information, not technology, but you cannot manage information without managing technology.
Why is our backroom workflow so tightly tied to a public service point?
The problem is that even if something better comes along, you might not go with it, because it would be too cumbersome to migrate.
Have an Exit Strategy
We need a standard that makes switching possible, but this is still being worked on. We also need to know the cost of moving away from the current/new system.
APIs
limited functionality
limited access to data
can be changed or deprecated
APIs are still not the solution. We need unmediated access to the data.
Caveat Emptor
high switching costs
escalating subscription costs
interoperability issues
dwindling innovation
limited choice
There are in fact alternatives, and something to look forward to: the ‘fabled’ innovative system.