Second and final day of Code4Lib BC’s lightning talks. Here are my notes.
Mark Jordan – DOCR/SMD
- OCR clients – phones, tablets that do the work or just a plain script
- page server – controls the work
Components
- page server
- PHP web app
- PHP queue manager script
- SQLite database
- OCR clients
- clients use Tesseract OCR engine (available for Android and iOS)
- first client is a simple Python script
How it works
- client: “I’m ready, give me an image”
- server: “here you go”
- client: “OK, here’s your text”
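A minimal sketch of that exchange, kept in memory so it runs anywhere. This is not Mark's code: the real page server is a PHP web app with an SQLite queue, and the class name, method names, and `fake_ocr` placeholder below are all made up for illustration (a real client would call Tesseract instead).

```python
from collections import deque


class PageServer:
    """Hypothetical in-memory stand-in for the page server's queue."""

    def __init__(self, image_names):
        self.queue = deque(image_names)  # pages still waiting for OCR
        self.results = {}                # image name -> recognized text

    def next_image(self):
        """Client says "I'm ready": hand out a page, or None if none are left."""
        return self.queue.popleft() if self.queue else None

    def submit_text(self, image_name, text):
        """Client says "OK, here's your text": store the result."""
        self.results[image_name] = text


def run_client(server, ocr):
    """Pull pages until the server has none left; `ocr` maps an image to text."""
    processed = 0
    while True:
        image = server.next_image()
        if image is None:
            break
        server.submit_text(image, ocr(image))
        processed += 1
    return processed


def fake_ocr(image_name):
    # Placeholder for a real Tesseract call, so the sketch has no dependencies.
    return f"text of {image_name}"


server = PageServer(["page1.tif", "page2.tif"])
count = run_client(server, fake_ocr)
print(count, server.results)
```

Because the client only ever asks for "the next image", any number of phones, tablets, or scripts can run the same loop against one server.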
What I learned so far
- REST
- Slim microframework
- SQLite
What I want to learn
- Android app development
- potential for generalizing OCR to other tasks
Peter Tyrrell – Parsing PDFs
- wanted to do search term highlight in PDFs
- using the NYT Document Viewer (part of DocumentCloud)
- pdf2djvu (PDF to DjVu)
- DjVuLibre (DjVu to TXT, to XML, to TIF)
- ImageMagick (TIF to JPG)
- and then a whole bunch of other things to store, process, pass into viewer
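The conversion chain above can be sketched as a list of command lines. This is a guess at how the tools might be chained, not Peter's actual scripts: the file naming and the `build_pipeline` helper are hypothetical, and the commands are only built here, not run.

```python
from pathlib import Path


def build_pipeline(pdf_path, page=1):
    """Return the command lines for one page, in order, without running them."""
    pdf = Path(pdf_path)
    djvu = pdf.with_suffix(".djvu")
    txt = pdf.with_suffix(".txt")
    tif = pdf.with_name(f"{pdf.stem}-p{page}.tif")
    jpg = tif.with_suffix(".jpg")
    return [
        # PDF -> DjVu
        ["pdf2djvu", "-o", str(djvu), str(pdf)],
        # DjVu -> plain text (DjVuLibre)
        ["djvutxt", str(djvu), str(txt)],
        # DjVu -> TIFF for a single page (DjVuLibre)
        ["ddjvu", "-format=tiff", f"-page={page}", str(djvu), str(tif)],
        # TIFF -> JPEG (ImageMagick)
        ["convert", str(tif), str(jpg)],
    ]


commands = build_pipeline("issue.pdf", page=1)
for cmd in commands:
    print(" ".join(cmd))
```

With the tools installed, each entry could be passed to `subprocess.run(cmd, check=True)`.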
John Durno – Uploading to Internet Archive via API
- most ambitious digitization project: first 50 years of the British Colonist for the 150th anniversary
- ~100k images in PDF
- Acrobat would highlight for you if configured properly
- revised site as part of the next stage to digitize next 100 years
- Internet Archive: started digitizing newspapers from microfilm
- collection setup within the IA
- content can be loaded into the IA through its API
- can upload metadata with it
- based on Amazon S3 API, a lot of tools already work with it
- boto library with ia-wrapper
- Python script to upload
- new site is a wrapper around the IA for search purposes
- reading happens on IA’s site
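A rough sketch of what an upload request to the IA's S3-like API looks like, with metadata sent as headers alongside the file. This is not John's script (he used boto with ia-wrapper); the item identifier, file name, metadata values, and keys below are made up, and the function only builds the request rather than sending it.

```python
from urllib.parse import quote

IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier, filename, metadata, access_key, secret_key):
    """Return (url, headers) for an HTTP PUT of one file into an IA item."""
    headers = {
        # IA's S3-like API uses "LOW key:secret" authorization.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask IA to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
    }
    # Each metadata field travels as an x-archive-meta-* header (ASCII values assumed).
    for key, value in metadata.items():
        headers[f"x-archive-meta-{key}"] = str(value)
    url = f"{IA_S3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    return url, headers


url, headers = build_upload_request(
    identifier="example-colonist-1858-12-11",  # hypothetical item identifier
    filename="18581211.pdf",                   # hypothetical file name
    metadata={"mediatype": "texts", "title": "The British Colonist, 1858-12-11"},
    access_key="ACCESS",                       # placeholder credentials
    secret_key="SECRET",
)
print(url)
```

The file itself would be the body of the PUT; since the API follows S3 conventions, existing S3 tools can often do the same thing.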
Colleen Bell – ERM & LibGuides
- use identifier for subject database lists to export JSON
- use PHP script to process
- add as remote script in libguides
- can also do it for individual resources by ID
- script pulls based on IDs separated by comma
- don’t need libguides, can import into any page
- can do this as long as you can get your data
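The ID-based lookup might look something like the following. Colleen's script is PHP; this is a Python sketch of the same idea, and the record fields (`id`, `name`, `url`) and sample data are assumptions, since the notes don't say what the JSON export contains.

```python
import json
from html import escape


def parse_ids(id_param):
    """Split a comma-separated ID string, ignoring blanks and whitespace."""
    return [part.strip() for part in id_param.split(",") if part.strip()]


def select_resources(export_json, id_param):
    """Return the exported records whose id was requested, in the requested order."""
    by_id = {str(rec["id"]): rec for rec in json.loads(export_json)}
    return [by_id[i] for i in parse_ids(id_param) if i in by_id]


def render_list(resources):
    """Render an HTML list that could be embedded in a guide or any other page."""
    items = "".join(
        f'<li><a href="{escape(r["url"])}">{escape(r["name"])}</a></li>'
        for r in resources
    )
    return f"<ul>{items}</ul>"


# Hypothetical export from the ERM.
export = json.dumps([
    {"id": 12, "name": "JSTOR", "url": "https://example.org/jstor"},
    {"id": 34, "name": "ERIC", "url": "https://example.org/eric"},
])
selected = select_resources(export, "34, 12,99")
html_list = render_list(selected)
print(html_list)
```

Unknown IDs (like `99` here) are skipped silently, which keeps a guide from breaking when a resource is dropped.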
James MacGregor – Article Metrics with OJS/OMP
- PKP: scholarly publishing initiative at SFU
- Open Journal Systems (OJS): WordPress for journals
- Open Monograph Press (OMP): OJS for presses
- all software is open source
- new: general overhaul of the statistics framework, made compatible with the addition of PLOS metrics
- as part of the overhaul: stats are now gathered centrally, with added features
- PKP ALM: Ruby on Rails web application that aggregates article performance data, plus a plugin
- shows HTML views, PDF downloads, Facebook/Mendeley shares, PubMed, and more
Jonathan Schatz – The Story of BC Libraries’ IT Environments
- did field work going around to assess IT environment in BC Public Libraries within the Sitka group
- had about 1 day per library
- covered 3 federated libraries
- connectivity and network were the main priorities
- phones, internet, network, workstations/servers, printing, technical support, training
- had to map wifi with laptop and umbrella
- they do a lot with very little resources
- sometimes get creative e.g. creating internet routers
Paul Joseph – UBC Digital Library Framework
Status Quo
- 3 repositories: CONTENTdm, DSpace (IR), AtoM (for Rare Books & Special Collections)
- in addition: separate collections e.g. Drupal, ElasticSearch
- access and presentation in silos, defined by the applications
- tried to improve with in-page view, facets, in-context results, tiling, rapid zoom
Introduce framework
- scalable, flexible
- plug in metadata and full text (when possible)
- use ElasticSearch
- series of interactions to provide access to metadata and objects
- also service provider: OAI-PMH, Open Data API, Open Apps
- rely on external services to leverage functionality e.g. RefWorks, Disqus (social commenting)
Calvin Mah / Todd Holbrook – SFU Library Hours Database
- as part of the API
- might be something they can host, but lack of enthusiasm
- maybe could work with coop to host the tool
- or host yourself at your institution
- taken straight from UBC, but re-coded
- kept data model and end user look
- can do usual hours, but can add exceptions using date range
- feeds to API system, available as JSON data
- Drupal widgets to drop almost anywhere
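A small sketch of the "usual hours plus date-range exceptions" idea, returning JSON the way the API feed might. The data layout, field names, and sample hours are invented for illustration and are not SFU's or UBC's actual data model.

```python
import json
from datetime import date

# Usual weekly hours keyed by weekday (Monday=0); None means closed.
USUAL_HOURS = {
    0: ("08:00", "22:00"),
    1: ("08:00", "22:00"),
    2: ("08:00", "22:00"),
    3: ("08:00", "22:00"),
    4: ("08:00", "18:00"),
    5: ("10:00", "18:00"),
    6: None,
}

# Exceptions override the usual hours for every day in an inclusive date range.
EXCEPTIONS = [
    {"start": date(2013, 12, 25), "end": date(2014, 1, 1),
     "hours": None, "note": "Closed for the holidays"},
]


def hours_for(day):
    """Return the hours for one day as a JSON string, exceptions first."""
    for exc in EXCEPTIONS:
        if exc["start"] <= day <= exc["end"]:
            hours, note = exc["hours"], exc["note"]
            break
    else:
        # No exception matched, so fall back to the usual weekly schedule.
        hours, note = USUAL_HOURS[day.weekday()], None
    return json.dumps({
        "date": day.isoformat(),
        "open": hours[0] if hours else None,
        "close": hours[1] if hours else None,
        "note": note,
    })


print(hours_for(date(2013, 11, 25)))
print(hours_for(date(2013, 12, 26)))
```

A widget would only need to fetch this JSON and display it, which is what makes it easy to drop into different pages.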
That’s it for today. Breakout time! When the time comes, have a safe trip home