Second and final day of Code4Lib BC’s lightning talks. Here are my notes.
Mark Jordan – DOCR/SMD
- OCR clients – phones, tablets that do the work or just a plain script
 - page server – controls the work
 
Components
- page server
- PHP web app
 - PHP queue manager script
 - SQLite database
 
 - OCR clients
- clients use Tesseract OCR engine (available for Android and iOS)
 - first client is a simple Python script
 
 
how it works
- client: “I’m ready, give me an image”
 - server: here you go
 - client: OK here’s your text
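
The exchange above could be sketched roughly like this. It is only an illustration: the in-memory `PageServer` stands in for the PHP/SQLite page server, and the class, method names and `fake_ocr` are made up for this example rather than taken from DOCR.

```python
from collections import deque


class PageServer:
    """In-memory stand-in for the page server (hypothetical, not DOCR's API)."""

    def __init__(self, images):
        self.queue = deque(images)   # images still waiting for OCR
        self.results = {}            # image name -> recognized text

    def next_image(self):
        """Client says "I'm ready, give me an image"; None means no work left."""
        return self.queue.popleft() if self.queue else None

    def submit_text(self, image, text):
        """Client says "OK, here's your text"."""
        self.results[image] = text


def run_client(server, ocr):
    """Keep asking for images until the server has none left."""
    done = 0
    while True:
        image = server.next_image()
        if image is None:
            break
        server.submit_text(image, ocr(image))
        done += 1
    return done


def fake_ocr(image):
    # Placeholder for a real OCR call (e.g. Tesseract); an assumption for the demo.
    return "text of " + image


server = PageServer(["page1.jpg", "page2.jpg"])
processed = run_client(server, fake_ocr)
```

In a real setup the two `PageServer` methods would presumably be HTTP calls to the REST app, and several clients could run the same loop at once.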
 
What I Learned so far
- REST
 - Slim microframework
 - SQLite
 
What I want to Learn
- Android app development
 - potential for generalizing OCR to other tasks
 
Peter Tyrrell – Parsing PDFs
- wanted to do search term highlight in PDFs
 - using NYT Document Viewer (part of DocumentCloud)
 - pdf2djvu (PDF to DjVu)
 - DjVuLibre (DjVu to TXT, to XML, to TIF)
 - ImageMagick (TIF to JPG)
 - and then a whole bunch of other things to store, process, pass into viewer
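
As a rough sketch of the conversion chain, the commands might look like the ones built below. The file names and the choice of DjVuLibre programs (`djvutxt`, `djvutoxml`, `ddjvu`) are my assumptions; the talk did not go into the exact commands used. The function only builds the command lines; it does not run them.

```python
from pathlib import Path


def conversion_commands(pdf, page=1):
    """Return the command lines for one PDF, without running them."""
    stem = Path(pdf).with_suffix("")
    djvu, txt, xml = f"{stem}.djvu", f"{stem}.txt", f"{stem}.xml"
    tif, jpg = f"{stem}-{page}.tif", f"{stem}-{page}.jpg"
    return [
        ["pdf2djvu", "-o", djvu, pdf],                          # PDF -> DjVu
        ["djvutxt", djvu, txt],                                 # DjVu -> plain text
        ["djvutoxml", djvu, xml],                               # DjVu -> XML
        ["ddjvu", "-format=tiff", f"-page={page}", djvu, tif],  # DjVu page -> TIFF
        ["convert", tif, jpg],                                  # ImageMagick: TIFF -> JPEG
    ]


commands = conversion_commands("report.pdf")
```

With the tools installed, each entry could be passed to `subprocess.run(cmd, check=True)` in order.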
 
John Durno – Uploading to Internet Archive via API
- most ambitious digitization project: first 50 years of the British Colonist for the 150th anniversary
 - ~100k images in PDF
 - Acrobat would highlight for you if configured properly
 - revised site as part of the next stage to digitize next 100 years
 - Internet Archive: started digitizing newspapers from microfilm
 - collection setup within the IA
 - content can be dumped into IA through its API
 - can upload metadata with it
 - based on Amazon S3 API, a lot of tools already work with it
 - boto library with ia-wrapper
 - python script to upload
 - new site is a wrapper around the IA for search purposes
 - reading happens on IA’s site
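
To give an idea of what an upload through the S3-like API involves, here is a minimal sketch that only builds the request (URL and headers) using the standard library instead of boto. The item identifier, file name, metadata values and credentials are placeholders I made up; the actual upload script was not shown in detail.

```python
from urllib.parse import quote


def ia_upload_request(identifier, filename, metadata, access_key, secret_key):
    """Build the URL and headers for a PUT to the IA's S3-like endpoint."""
    url = "https://s3.us.archive.org/{}/{}".format(identifier, quote(filename))
    headers = {
        "authorization": "LOW {}:{}".format(access_key, secret_key),
        "x-amz-auto-make-bucket": "1",  # create the item if it does not exist yet
    }
    # Item metadata travels as x-archive-meta-* headers alongside the file.
    for key, value in metadata.items():
        headers["x-archive-meta-" + key] = value
    return url, headers


url, headers = ia_upload_request(
    "example-colonist-1858-12-11",   # hypothetical item identifier
    "1858-12-11.pdf",
    {"mediatype": "texts", "title": "Example newspaper issue"},
    "ACCESS", "SECRET",              # placeholder credentials
)
```

Sending the file is then an HTTP PUT of the file contents to `url` with these headers, which is the part that boto or another S3 tool would handle.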
 
Colleen Bell – ERM & LibGuides
- use identifier for subject database lists to export JSON
 - use PHP script to process
 - add as remote script in libguides
 - can also do it for individual resources by ID
 - script pulls based on IDs separated by comma
 - don’t need libguides, can import into any page
 - can do this as long as you can get your data
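
The actual script is PHP, but the idea of pulling resources by a comma-separated list of IDs can be sketched in Python. The JSON field names (`id`, `name`, `url`) and the sample records are assumptions for illustration; a real ERM export will have its own structure.

```python
import json
from html import escape


def render_resources(export_json, id_param):
    """Return an HTML list of the resources whose IDs appear in id_param."""
    wanted = [part.strip() for part in id_param.split(",") if part.strip()]
    by_id = {str(item["id"]): item for item in json.loads(export_json)}
    rows = []
    for rid in wanted:
        item = by_id.get(rid)
        if item is None:
            continue  # skip IDs that are not in the export
        rows.append('<li><a href="{}">{}</a></li>'.format(
            escape(item["url"], quote=True), escape(item["name"])))
    return "<ul>" + "".join(rows) + "</ul>"


sample = json.dumps([
    {"id": 12, "name": "Databases A & B", "url": "https://example.org/a"},
    {"id": 15, "name": "Journal Index", "url": "https://example.org/j"},
])
html_list = render_resources(sample, "15, 12, 99")
```

The resulting HTML fragment is what would get dropped into a LibGuides box or any other page.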
 
James MacGregor – Article Metrics with OJS/OMP
- PKP: scholarly publishing initiative at SFU
 - Open Journal System: WordPress for journals
 - Open Monograph Press: OJS for presses
 - all software is open source
 - new: general overhaul of the statistics framework, compatible with the addition of PLOS metrics
 - had a stats framework overhaul: gathered centrally, added features
 - PKP ALM: Ruby on Rails web app that aggregates article performance data, plus a plugin
 - shows HTML views, PDF downloads, Facebook/Mendeley shares, PubMed, and more
 
Jonathan Schatz – The Story of BC Libraries’ IT Environments
- did field work going around to assess IT environment in BC Public Libraries within the Sitka group
 - had about 1 day per library
 - covered 3 federated libraries
 - connectivity and network main priorities
 - phones, internet, network, workstations/servers, printing, technical support, training
 - had to map wifi with laptop and umbrella
 - they do a lot with very little resources
 - sometimes get creative e.g. creating internet routers
 
Paul Joseph – UBC Digital Library Framework
Status Quo
- 3 repositories: CONTENTdm, DSpace (IR), AtoM (for Rare Books & Special Collections)
 - in addition: separate collections, e.g. Drupal, ElasticSearch
 - access and presentation in silos, defined by the applications
 - Tried to improve with in-page view, facets, in-context results, tiling, rapid zoom
 
Introduce framework
- scalable, flexible
 - plug in metadata and full text (when possible)
 - use ElasticSearch
 - series of interactions to provide access to metadata and objects
 - also service provider: OAI-PMH, Open Data API, Open Apps
 - rely on external services to leverage functionality e.g. RefWorks, Disqus (social commenting)
 
Calvin Mah / Todd Holbrook – SFU Library Hours Database
- as part of the API
 - might be something they can host, but lack of enthusiasm
 - maybe could work with coop to host the tool
 - or host yourself at your institution
 - taken straight from UBC, but re-coded
 - kept data model and end user look
 - can do usual hours, but can add exceptions using date range
 - feeds to API system, available as JSON data
 - Drupal widgets to drop almost anywhere
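
The "usual hours plus exceptions by date range" model could look something like this. It is a sketch under my own assumptions: the data layout, the hours and the JSON shape are invented for the example and are not SFU's or UBC's actual schema.

```python
import json
from datetime import date

# Usual weekly hours, keyed by weekday (Monday = 0). Values are made up.
USUAL = {0: "8:00-22:00", 1: "8:00-22:00", 2: "8:00-22:00", 3: "8:00-22:00",
         4: "8:00-18:00", 5: "10:00-18:00", 6: "closed"}

# Exceptions override the usual hours for an inclusive date range.
EXCEPTIONS = [
    {"start": date(2013, 12, 24), "end": date(2014, 1, 1), "hours": "closed"},
]


def hours_for(day):
    """Return the hours for a date, letting exceptions win over usual hours."""
    for exc in EXCEPTIONS:
        if exc["start"] <= day <= exc["end"]:
            return exc["hours"]
    return USUAL[day.weekday()]


def hours_json(day):
    """Serialize one day's hours the way an API endpoint might."""
    return json.dumps({"date": day.isoformat(), "hours": hours_for(day)})


regular = hours_for(date(2013, 12, 2))    # a Monday outside any exception
holiday = hours_for(date(2013, 12, 25))   # inside the exception range
payload = hours_json(date(2013, 12, 25))
```

A widget would then only need to fetch the JSON for today's date and display it.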
 
That’s it for today. Breakout time! When the time comes, have a safe trip home.
