Code4LibBC Day 2: Lightning Talks

Second and final day of Code4Lib BC’s lightning talks. Here are my notes.

Mark Jordan – DOCR/SMD

Components

page server
- PHP web app
- PHP queue manager script
- SQLite database
OCR clients
- clients use Tesseract OCR engine (available for Android and iOS)
- first client is a simple Python script
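
To make the page server / OCR client split concrete, here is a minimal sketch of a page queue in SQLite, written in Python for brevity (the actual server pieces are PHP). The table layout and the function names (`init_queue`, `enqueue`, `claim_next`, `submit_text`) are my own assumptions for illustration, not DOCR's actual schema or API.

```python
import sqlite3


def init_queue(conn):
    # Hypothetical schema: one row per page image waiting for OCR.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " id INTEGER PRIMARY KEY,"
        " image_path TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " ocr_text TEXT)"
    )
    conn.commit()


def enqueue(conn, image_path):
    # Add a page image to the queue and return its row id.
    cur = conn.execute("INSERT INTO pages (image_path) VALUES (?)", (image_path,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn):
    # Hand the oldest queued page to a client and mark it as in progress.
    row = conn.execute(
        "SELECT id, image_path FROM pages"
        " WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE pages SET status = 'in_progress' WHERE id = ?", (row[0],))
    conn.commit()
    return row


def submit_text(conn, page_id, text):
    # Store the OCR result a client sends back.
    conn.execute(
        "UPDATE pages SET status = 'done', ocr_text = ? WHERE id = ?",
        (text, page_id),
    )
    conn.commit()
```

In this sketch a client would loop: call `claim_next`, run Tesseract on the image it gets, then call `submit_text` with the recognized text. A real server would do the same steps over HTTP rather than through a shared database handle.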

how it works

What I Learned so far

What I want to Learn

most ambitious digitization project: first 50 years of the British Colonist for the 150th anniversary
~100k images in PDF
Acrobat would highlight for you if configured properly
revised site as part of the next stage to digitize next 100 years
Internet Archive: started digitizing newspapers from microfilm
collection setup within the IA
content can be dumped into IA through its API
can upload metadata with it
based on Amazon S3 API, a lot of tools already work with it
boto library with ia-wrapper
python script to upload
new site is a wrapper around the IA for search purposes
reading happens on IA’s site
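
A rough sketch of what an upload with metadata looks like at the HTTP level, as far as I understand IA's S3-like API: the file is PUT to a URL under the item identifier, and metadata travels as `x-archive-meta-*` headers. The function name `build_ia_upload`, the identifier, and the metadata values are made up for illustration, and this is not the script shown in the talk. It only builds the request and sends nothing. Check the header names against IA's current documentation before relying on them.

```python
from urllib.parse import quote


def build_ia_upload(identifier, filename, metadata, access_key, secret_key):
    # Items act like S3 buckets; the file name becomes the object key.
    url = "https://s3.us.archive.org/{}/{}".format(identifier, quote(filename))
    headers = {
        # IA's documented "LOW" authorization scheme for its S3-like keys.
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        # Ask IA to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
    }
    for field, value in metadata.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        if len(values) == 1:
            headers["x-archive-meta-" + field] = str(values[0])
        else:
            # Repeated fields are numbered: meta01-subject, meta02-subject, ...
            for i, v in enumerate(values, start=1):
                headers["x-archive-meta{:02d}-{}".format(i, field)] = str(v)
    return url, headers
```

The resulting URL and headers could be handed to any HTTP client along with the file body in a PUT request. Libraries like boto hide this step, which is why existing S3 tools tend to work.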

PKP: scholarly publishing initiative at SFU
Open Journal Systems: WordPress for journals
Open Monograph Press: OJS for presses
all software is open source
new: general overhaul of the statistics framework, compatible with the addition of PLOS metrics
had a stats framework overhaul: gathered centrally, added features
PKP ALM: Ruby on Rails web app that aggregates article performance data, plus a plugin
shows HTML views, PDF downloads, Facebook/Mendeley shares, PubMed, and more
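
The aggregation idea, reduced to a toy: counts arrive per article and per source, and get rolled up into one summary per article. This is only a sketch in Python (the real PKP ALM app is Ruby on Rails). The function name `aggregate_metrics`, the article ids, and the source names are invented for the example.

```python
from collections import defaultdict


def aggregate_metrics(events):
    # events: iterable of (article_id, source, count) tuples.
    totals = defaultdict(lambda: defaultdict(int))
    for article_id, source, count in events:
        totals[article_id][source] += count
    # Convert nested defaultdicts to plain dicts for easy display.
    return {article: dict(sources) for article, sources in totals.items()}


# Hypothetical sample data, not real journal statistics.
sample = [
    ("article-1", "html_views", 10),
    ("article-1", "pdf_downloads", 3),
    ("article-1", "html_views", 5),
    ("article-2", "mendeley", 2),
]
summary = aggregate_metrics(sample)
```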

did field work going around to assess IT environment in BC Public Libraries within the Sitka group
had about 1 day per library
covered 3 federated libraries
connectivity and network main priorities
phones, internet, network, workstations/servers, printing, technical support, training
had to map wifi with laptop and umbrella
they do a lot with very few resources
sometimes get creative e.g. creating internet routers

Status Quo

3 repositories: ContentDM, DSpace (IR), AtoM (for Rare books & special collections)
in addition: separate collections, e.g. Drupal, ElasticSearch
access and presentation in silos, defined by the applications
Tried to improve with in-page view, facets, in-context results, tiling, rapid zoom

Introduce framework

scalable, flexible
plug in metadata and full text (when possible)
use ElasticSearch
a series of interactions to provide access to metadata and objects
also service provider: OAI-PMH, Open Data API, Open Apps
rely on external services to leverage functionality e.g. RefWorks, Disqus (social commenting)
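
To picture how one ElasticSearch index could serve search and facets across the former silos, here is a small sketch that builds a query body: a full-text match plus one terms aggregation per facet. The function name `build_search_body` and the field names (`title`, `description`, `full_text`, `repository`, `collection`) are assumptions of mine, not the framework's actual mapping.

```python
def build_search_body(text, facet_fields, size=10):
    # Full-text search over metadata and (where available) full text,
    # with a terms aggregation per facet field for the sidebar counts.
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Hypothetical field names for the shared index.
                "fields": ["title", "description", "full_text"],
            }
        },
        "aggs": {
            field: {"terms": {"field": field}} for field in facet_fields
        },
    }


body = build_search_body("salmon cannery", ["repository", "collection"])
```

The dictionary can be serialized to JSON and posted to an index's `_search` endpoint. Faceting on a `repository` field is one way a single result list could still show which source system each hit came from.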

That’s it for today. Breakout time! When the time comes, have a safe trip home.