Code4LibBC Day 2: Lightning Talks

Second and final day of Code4Lib BC’s lightning talks. Here are my notes.

Mark Jordan – DOCR/SMD

Slides Source code

  • OCR clients: phones, tablets, or just a plain script that do the OCR work
  • page server: controls the work

Components

  • page server
    • PHP web app
    • PHP queue manager script
    • SQLite database
  • OCR clients
    • clients use Tesseract OCR engine (available for Android and iOS)
    • first client is a simple Python script

How it works

  • client: “I’m ready, give me an image”
  • server: “Here you go”
  • client: “OK, here’s your text”
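The request/response loop above boils down to a small work queue on the page server. Below is a minimal sketch of that queue in Python using SQLite, assuming a hypothetical `pages` table with a `status` column; the talk's actual server is a PHP/Slim app, and the HTTP layer is left out here.

```python
import sqlite3

# Hypothetical schema: one row per page image; status moves
# 'pending' -> 'claimed' -> 'done'.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY,
    image_path TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    ocr_text TEXT
)
"""


def open_queue(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def add_page(conn, image_path):
    """Queue a page image for OCR and return its row id."""
    with conn:
        cur = conn.execute("INSERT INTO pages (image_path) VALUES (?)", (image_path,))
    return cur.lastrowid


def claim_next_page(conn):
    """Client: "I'm ready, give me an image." Returns (id, path) or None."""
    while True:
        row = conn.execute(
            "SELECT id, image_path FROM pages WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to do
        with conn:
            # Only claim the row if nobody else has claimed it in the meantime.
            cur = conn.execute(
                "UPDATE pages SET status = 'claimed' WHERE id = ? AND status = 'pending'",
                (row[0],),
            )
        if cur.rowcount == 1:
            return row
        # Another client won the race for this row; try the next one.


def submit_text(conn, page_id, text):
    """Client: "OK, here's your text." Store the OCR output."""
    with conn:
        conn.execute(
            "UPDATE pages SET status = 'done', ocr_text = ? WHERE id = ?",
            (text, page_id),
        )
```

Guarding the `UPDATE` on `status = 'pending'` is one simple way to keep two clients from being handed the same page.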

What I learned so far

  • REST
  • Slim microframework
  • SQLite

What I want to learn

  • Android app development
  • potential for generalizing OCR to other tasks

Peter Tyrrell – Parsing PDFs

  • wanted to do search term highlight in PDFs
  • using the NYT Document Viewer (part of DocumentCloud)
  • pdf2djvu (PDF to DjVu)
  • DjVuLibre (DjVu to TXT, XML, and TIF)
  • ImageMagick (TIF to JPG)
  • and then a whole bunch of other things to store, process, pass into viewer
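The conversion chain above can be sketched as a list of shell commands driven from Python. The exact invocations weren't shown in the talk; these follow the tools' own command-line usage, convert only a single page, and the file names are hypothetical. The storage and viewer steps are left out.

```python
import subprocess
from pathlib import Path
from typing import List


def pipeline_commands(pdf: Path, workdir: Path, page: int = 1) -> List[List[str]]:
    """Build (but do not run) the conversion commands for one PDF."""
    stem = pdf.stem
    djvu = workdir / f"{stem}.djvu"
    tif = workdir / f"{stem}-{page}.tif"
    return [
        # pdf2djvu: PDF -> DjVu
        ["pdf2djvu", "-o", str(djvu), str(pdf)],
        # DjVuLibre: DjVu -> plain text, XML, and a TIFF of one page
        ["djvutxt", str(djvu), str(workdir / f"{stem}.txt")],
        ["djvutoxml", str(djvu), str(workdir / f"{stem}.xml")],
        ["ddjvu", "-format=tiff", f"-page={page}", str(djvu), str(tif)],
        # ImageMagick: TIFF -> JPEG (ImageMagick 7 prefers `magick`)
        ["convert", str(tif), str(workdir / f"{stem}-{page}.jpg")],
    ]


def run_pipeline(pdf: Path, workdir: Path, page: int = 1) -> None:
    """Run each step in order, stopping on the first failure."""
    workdir.mkdir(parents=True, exist_ok=True)
    for cmd in pipeline_commands(pdf, workdir, page):
        subprocess.run(cmd, check=True)
```

Building the command lists separately from running them makes the chain easy to inspect or log before anything touches the disk.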

John Durno – Uploading to Internet Archive via API


  • most ambitious digitization project: first 50 years of the British Colonist for the 150th anniversary
  • ~100k images in PDF
  • Acrobat would highlight search terms if configured properly
  • revised the site as part of the next stage: digitizing the next 100 years
  • Internet Archive: started digitizing newspapers from microfilm
  • collection set up within the IA
  • can dump content into the IA through its API
  • can upload metadata with it
  • the API is based on the Amazon S3 API, so a lot of tools already work with it
  • boto library with ia-wrapper
  • Python script to upload
  • new site is a wrapper around the IA for search purposes
  • reading happens on IA’s site
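Because the IA's upload API is S3-style, an upload is just an HTTP PUT with a few extra headers. The talk used boto with ia-wrapper. The sketch below uses only the Python standard library to show roughly what those libraries send: the `s3.us.archive.org` endpoint, `LOW` authorization, and `x-archive-meta-*` headers follow the IA's S3 documentation. The item identifier, collection name, and keys are hypothetical, and metadata values are assumed to be plain ASCII.

```python
import os
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple, Union

IA_S3_ENDPOINT = "https://s3.us.archive.org"

MetaValue = Union[str, List[str]]


def build_ia_upload(identifier: str, filename: str, metadata: Dict[str, MetaValue],
                    access_key: str, secret_key: str) -> Tuple[str, Dict[str, str]]:
    """Return the PUT URL and headers for uploading one file to an IA item."""
    url = f"{IA_S3_ENDPOINT}/{identifier}/{urllib.parse.quote(filename)}"
    headers = {
        "authorization": f"LOW {access_key}:{secret_key}",
        # Create the item ("bucket") if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
    }
    for key, value in metadata.items():
        if isinstance(value, list):
            # Repeated fields use numbered headers: x-archive-meta01-subject, ...
            for i, item in enumerate(value, start=1):
                headers[f"x-archive-meta{i:02d}-{key}"] = item
        else:
            headers[f"x-archive-meta-{key}"] = value
    return url, headers


def upload_file(path: str, identifier: str, metadata: Dict[str, MetaValue],
                access_key: str, secret_key: str) -> int:
    """PUT a local file (with its metadata) and return the HTTP status code."""
    url, headers = build_ia_upload(identifier, os.path.basename(path), metadata,
                                   access_key, secret_key)
    with open(path, "rb") as fh:
        body = fh.read()
    req = urllib.request.Request(url, data=body, headers=headers, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```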

Colleen Bell – ERM & LibGuides

  • use the identifier for a subject’s database list to export it as JSON
  • use a PHP script to process it
  • add it as a remote script in LibGuides
  • can also do it for individual resources by ID
  • the script pulls resources based on comma-separated IDs
  • you don’t need LibGuides; you can embed the output in any page
  • works as long as you can get your data out
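The processing script's job, turning an exported JSON list plus a comma-separated ID parameter into an HTML snippet, can be sketched like this. The talk's script is PHP; this is a Python sketch, and the JSON field names (`id`, `name`, `url`) are assumptions about the export format.

```python
import html
import json
from typing import List


def parse_ids(param: str) -> List[str]:
    """Split a comma-separated ID list such as "12, 40,7" into ["12", "40", "7"]."""
    return [part.strip() for part in param.split(",") if part.strip()]


def render_resources(export_json: str, ids_param: str) -> str:
    """Render the requested resources as an HTML list, in the order requested."""
    # Hypothetical export format: [{"id": ..., "name": ..., "url": ...}, ...]
    records = {str(r["id"]): r for r in json.loads(export_json)}
    items = []
    for rid in parse_ids(ids_param):
        rec = records.get(rid)
        if rec is None:
            continue  # skip IDs that are not in the export
        items.append(
            f'<li><a href="{html.escape(rec["url"])}">{html.escape(rec["name"])}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping names and URLs matters here because the snippet is dropped into someone else's page.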

James MacGregor – Article Metrics with OJS/OMP

  • PKP (Public Knowledge Project): scholarly publishing initiative at SFU
  • Open Journal Systems: WordPress for journals
  • Open Monograph Press: OJS for presses
  • all software is open source
  • new: general overhaul of the statistics framework (stats gathered centrally, features added), now compatible with PLOS article-level metrics
  • PKP ALM: Ruby on Rails web app that aggregates article performance data, plus a plugin
  • shows HTML views, PDF downloads, Facebook/Mendeley shares, PubMed, and more
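At its core, "aggregating article performance data" means summing counts per article and per source. Here is a minimal Python sketch of that step, assuming a hypothetical `(article_id, source, count)` event format. It is not PKP ALM's actual data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (article_id, source, count),
# e.g. ("a1", "pdf_downloads", 3)
Event = Tuple[str, str, int]


def aggregate(events: Iterable[Event]) -> Dict[str, Dict[str, int]]:
    """Sum event counts per article and per source (HTML views, PDF downloads, ...)."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for article_id, source, count in events:
        totals[article_id][source] += count
    # Convert the nested defaultdicts to plain dicts so the result serializes cleanly.
    return {article: dict(sources) for article, sources in totals.items()}
```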

Jonathan Schatz – The Story of BC Libraries’ IT Environments

  • did field work, travelling around to assess the IT environment in BC public libraries within the Sitka group
  • had about 1 day per library
  • covered 3 federated libraries
  • connectivity and network were the main priorities
  • phones, internet, network, workstations/servers, printing, technical support, training
  • had to map Wi-Fi coverage with a laptop and an umbrella
  • they do a lot with very few resources
  • sometimes get creative, e.g. building their own internet routers

Paul Joseph – UBC Digital Library Framework

Status Quo

  • 3 repositories: CONTENTdm, DSpace (IR), AtoM (for rare books & special collections)
  • in addition: separate collections, e.g. in Drupal, Elasticsearch
  • access and presentation in silos, defined by the applications
  • tried to improve with in-page view, facets, in-context results, tiling, rapid zoom

Introducing the framework

  • scalable, flexible
  • plug in metadata and full text (when possible)
  • uses Elasticsearch
  • a series of interactions to provide access to metadata and objects
  • also acts as a service provider: OAI-PMH, Open Data API, Open Apps
  • rely on external services to leverage functionality e.g. RefWorks, Disqus (social commenting)
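Facets and in-context results over combined metadata and full text map naturally onto a single Elasticsearch query. Below is a sketch of such a query body built in Python. The field names (`title`, `description`, `full_text`, `repository`, `collection`) are hypothetical, not UBC's actual index mapping, and the body would be sent to an index's `_search` endpoint.

```python
from typing import Any, Dict, List, Optional


def build_search(query: str, filters: Optional[Dict[str, str]] = None,
                 size: int = 20) -> Dict[str, Any]:
    """Build an Elasticsearch query body with facets and highlighted snippets."""
    must: List[Dict[str, Any]] = [{
        "multi_match": {
            "query": query,
            # Search metadata plus full text (where available); boost titles.
            "fields": ["title^3", "description", "full_text"],
        }
    }]
    # Selected facet values become exact-match filters.
    filter_clauses = [{"term": {field: value}} for field, value in (filters or {}).items()]
    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filter_clauses}},
        # Facets: hit counts per source repository and per collection.
        "aggs": {
            "repository": {"terms": {"field": "repository"}},
            "collection": {"terms": {"field": "collection"}},
        },
        # In-context results: highlighted snippets from the matching full text.
        "highlight": {"fields": {"full_text": {}}},
    }
```

For the `terms` facets to work as intended, the facet fields would need a keyword (not analyzed) mapping.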

Calvin Mah / Todd Holbrook – SFU Library Hours Database

  • as part of the API
  • might be something they could host for others, but there’s a lack of enthusiasm
  • maybe could work with the co-op to host the tool
  • or host it yourself at your institution
  • originally taken from UBC, but re-coded
  • kept the data model and the end-user look
  • handles regular hours, with exceptions added by date range
  • feeds into the API system, available as JSON
  • Drupal widgets to drop almost anywhere
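The "regular hours plus date-range exceptions" model can be sketched in a few lines of Python. The data model here (weekday-keyed regular hours, inclusive exception ranges, later exceptions overriding earlier ones) and the JSON shape are assumptions for illustration, not the actual SFU/UBC schema.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

# (open, close) as "HH:MM" strings, or None when closed.
Hours = Optional[Tuple[str, str]]


@dataclass
class HoursException:
    start: date   # first day the exception applies (inclusive)
    end: date     # last day the exception applies (inclusive)
    hours: Hours  # replacement hours, or None for a closure


def hours_for(day: date, regular: Dict[int, Hours],
              exceptions: List[HoursException]) -> Hours:
    """Return the hours for `day`: a matching exception wins over regular hours."""
    # Check later exceptions first, so a one-day closure can override
    # a longer range such as extended exam hours.
    for exc in reversed(exceptions):
        if exc.start <= day <= exc.end:
            return exc.hours
    # Regular hours are keyed by weekday (Monday=0 ... Sunday=6).
    return regular.get(day.weekday())


def hours_json(day: date, regular: Dict[int, Hours],
               exceptions: List[HoursException]) -> str:
    """Serialize one day's hours as JSON, the form a widget could consume."""
    h = hours_for(day, regular, exceptions)
    return json.dumps({
        "date": day.isoformat(),
        "open": h[0] if h else None,
        "close": h[1] if h else None,
        "closed": h is None,
    })
```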

That’s it for today. Breakout time! When the time comes, have a safe trip home.
