Code4libBC Day 2: Lightning Talks Part 2

The last part of lightning talks for Code4libBC.

Speeding up Digital Preservation with a Graphics Card, Alex Garnett, SFU

  • GPU-accelerated computing: graphics cards are very powerful nowadays, and many organizations have figured out how to use them for general computing work
  • graphics cards are much more powerful than CPUs, but very specialized for video or similar workloads
  • applied to NVIDIA for a grant and got a Titan X video card
  • looked at different projects such as FFmpeg, for encoding into archive-friendly formats
  • the only workflow change needed to move work from the CPU to the GPU is to use a different encoder
  • in benchmarks, the GPU was 10x faster than the CPU, and it also frees up CPU time
  • only problem is that most software, such as Archivematica, usually runs in a VM and doesn’t have GPU acceleration/access
  • VirtualBox does not support GPU access, but Amazon does
  • could have video files processed somewhere else by using a different command
  • there have been a lot of aborted efforts to bring GPU acceleration to other software
  • now looking into the Tesseract OCR library
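
As a rough illustration of the "different encoder" point above, here is a minimal Python sketch that builds two FFmpeg command lines differing only in the video encoder. The notes don't say which codec was used in the talk; H.264 is assumed here because FFmpeg has both a CPU encoder (libx264) and an NVIDIA GPU encoder (h264_nvenc) for it. The helper name and file names are hypothetical, and the GPU command only works with an FFmpeg build that includes NVENC support.

```python
# Hypothetical helper: build an FFmpeg command, switching only the encoder.
def build_ffmpeg_cmd(src, dst, use_gpu=False):
    # libx264 is FFmpeg's software (CPU) H.264 encoder; h264_nvenc is the
    # NVIDIA hardware encoder (assumes an FFmpeg build with NVENC enabled).
    encoder = "h264_nvenc" if use_gpu else "libx264"
    return ["ffmpeg", "-i", src, "-c:v", encoder, dst]


cpu_cmd = build_ffmpeg_cmd("input.mov", "output.mp4")
gpu_cmd = build_ffmpeg_cmd("input.mov", "output.mp4", use_gpu=True)

# Collect the positions where the two commands differ.
diff = [(a, b) for a, b in zip(cpu_cmd, gpu_cmd) if a != b]
print(diff)  # [('libx264', 'h264_nvenc')]
```

Either list could be passed to subprocess.run on a machine with FFmpeg installed; the rest of the workflow stays the same.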

Scripting Named Entity Recognition (NER) to pluck names, organizations and locations from text, Peter Tyrrell, Andornot

  • discovery interface backed by Solr index service
  • a lot of metadata massaging happens, all through a shell script workflow, including OCR and media formats
  • when you have unstructured text (e.g. PDF, docs, text), what do you do? There’s only very bare-bones metadata, e.g. title, author, keywords
  • wanted to pluck out names (people, orgs) to create new access points
  • used the Stanford Natural Language Processing group’s NER, which is Java-based
  • one script feeds input files to NER, which recognizes the entities; a second writes output files by entity category
  • demo ensued
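
To give a feel for the "output by entity category" step, here is a minimal Python sketch, not the script shown in the talk. It assumes the NER output is in word/TAG form (which, as far as I know, is how Stanford NER prints tagged text by default) and groups consecutive tokens that share a tag. The function name and the sample sentence are made up for illustration.

```python
from collections import defaultdict


def group_entities(tagged_text):
    """Group consecutive same-tag tokens into entities, keyed by category."""
    entities = defaultdict(list)
    current_tag, current_words = "O", []
    # A trailing sentinel token tagged O forces the last entity to be saved.
    for token in tagged_text.split() + ["/O"]:
        word, _, tag = token.rpartition("/")
        if tag != current_tag:
            # The tag changed: store the finished entity unless it was O (other).
            if current_tag != "O":
                entities[current_tag].append(" ".join(current_words))
            current_tag, current_words = tag, []
        current_words.append(word)
    return dict(entities)


sample = ("Jane/PERSON Doe/PERSON joined/O Example/ORGANIZATION "
          "Library/ORGANIZATION in/O Vancouver/LOCATION ./O")
result = group_entities(sample)
print(result)
```

One known limitation of this sketch: two different entities of the same category that sit right next to each other would be merged into one.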

PCDM: A Data Model and a Community Model, Justin Simpson

  • Portland Common Data Model; originally came from Hydra, but generalized for any Fedora use
  • compared to Dublin Core
  • DC started ~20 years ago
  • in Hydra community, found data models were not compatible
  • UCSD proposed a model to the Hydra community, which evolved, with collaboration from the community, into the Fedora Community Data Model, which then turned into PCDM, hosted by DuraSpace
  • UCSD focused on properties, but with linked data in mind and a model that would work with others in the Hydra community
  • Hydra technical metadata application profile modelled after Europeana and the DPLA MAP
  • Islandora is doing parallel work with PCDM
  • ontology now exists as RDF schema
  • had a hard problem but a well-defined scope. Everyone wanted to solve the problem and had some experience. They developed a shared understanding and worked out in the open.
  • good example of collaboration between what some might see as competing communities (i.e. Hydra vs. Islandora)
  • the model is kept to a small set of concepts, allowing different entities to fit it
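
For readers who haven't seen PCDM, here is a small Python sketch of what a tiny description might look like as RDF triples. The notes don't list the model's terms; the three classes (Collection, Object, File) and two relationships (hasMember, hasFile) used here are taken from the published PCDM ontology as I understand it, and the example.org URIs are purely hypothetical.

```python
# Namespace of the PCDM ontology, plus the standard rdf:type property.
PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical repository base URI


def triple(subject, predicate, obj):
    """Format one statement as an N-Triples line (URIs only)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triples = [
    # Three kinds of things: a collection, an object, and a file.
    triple(BASE + "collection1", RDF_TYPE, PCDM + "Collection"),
    triple(BASE + "book1", RDF_TYPE, PCDM + "Object"),
    triple(BASE + "page1.tiff", RDF_TYPE, PCDM + "File"),
    # Two relationships tie them together.
    triple(BASE + "collection1", PCDM + "hasMember", BASE + "book1"),
    triple(BASE + "book1", PCDM + "hasFile", BASE + "page1.tiff"),
]
print("\n".join(triples))
```

Because the vocabulary is this small, a book, a photograph, or a dataset can all be described with the same few terms.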

Built to grow: scalability factors to consider before commencing your next digital library software project, Marcus Barnes, SFU

  • hard to predict how much scale is needed
  • but what can you do and consider?
  • scalability is the ability to handle increased workload without adding resources to a system, or to handle increased workload by repeatedly applying a cost-effective strategy for extending a system’s capacity
  • starting points: modern programming techniques, best hardware infrastructure possible, modularize, ongoing monitoring, run scalability audit
  • optimize code/hardware, distribute key components to dedicated hardware as needed
  • share knowledge, useful resources, real-world experience
  • scalability audit specifically for library systems software?
  • technical scalability and organization scalability


Might even have time to catch a quick nap before breakouts this afternoon.
[Image: husky pup hugging a penguin plushie]