The last part of the lightning talks for Code4libBC.
Speeding up Digital Preservation with a Graphics Card, Alex Garnett, SFU
- GPU-accelerated computing: graphics cards are very powerful nowadays, and many organizations have figured out how to put them to work beyond graphics.
 - GPUs are much more powerful than CPUs, but very specialized for video and similar workloads
 - applied to NVIDIA for a grant and got a Titan X video card
 - looked at different projects such as FFmpeg, encoding into archive-friendly formats
 - the only difference in the workflow to move the work from CPU to GPU is to use a different encoder (see the sketch after this list)
 - in benchmarks, the GPU was ~10x faster than the CPU, while also freeing up CPU time
 - the only problem is that most software, such as Archivematica, usually runs in a VM and doesn't have GPU access/acceleration
 - VirtualBox does not support it, but Amazon does
 - could have the video file processed somewhere else by using a different command
 - there have been a lot of aborted efforts to bring GPU acceleration to this kind of software
 - looking into the Tesseract OCR library
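
To make the "just swap the encoder" point concrete, here is a minimal sketch (mine, not from the talk) that shells out to FFmpeg from Python. It assumes an FFmpeg build with NVENC support and an NVIDIA card; the file names are placeholders.

```python
import subprocess

def encode(src: str, dst: str, use_gpu: bool = False) -> None:
    """Transcode src to dst; libx264 runs on the CPU, h264_nvenc on the GPU."""
    encoder = "h264_nvenc" if use_gpu else "libx264"
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", encoder, "-c:a", "copy", dst]
    subprocess.run(cmd, check=True)

# e.g. encode("master.mov", "access_copy.mp4", use_gpu=True)
```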
 
Scripting Named Entity Recognition (NER) to pluck names, organizations and locations from text, Peter Tyrrell, Andornot
- discovery interface backed by Solr index service
 - a lot of metadata massaging happens, all via a shell-script workflow that includes OCR and handling of media formats
 - when you have unstructured text (e.g. PDFs, docs, plain text), what do you do? there is only very bare-bones metadata, e.g. title, author, keywords
 - wanted to pluck out names (people, orgs) to create new access points
 - Stanford Natural Language Processing NER, which is Java-based
 - NER takes a file as input and recognizes the entities; a second step outputs files by entity category (see the sketch after this list)
 - demo ensued
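
Not the code from the demo, but a rough sketch of this kind of scripting in Python, assuming the Stanford NER jar and its 3-class English classifier are available; the jar path, classifier path, and file names are placeholder assumptions.

```python
import re
import subprocess
from collections import defaultdict

# Placeholder paths: the Stanford NER jar and its 3-class English classifier.
NER_JAR = "stanford-ner.jar"
CLASSIFIER = "classifiers/english.all.3class.distsim.crf.ser.gz"

def tag_file(path: str) -> str:
    """Run the CRF classifier over a plain-text file; returns word/TAG output."""
    cmd = [
        "java", "-Xmx1g", "-cp", NER_JAR,
        "edu.stanford.nlp.ie.crf.CRFClassifier",
        "-loadClassifier", CLASSIFIER,
        "-textFile", path,
        "-outputFormat", "slashTags",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def entities_by_category(tagged: str) -> dict:
    """Group PERSON / ORGANIZATION / LOCATION tokens into buckets."""
    buckets = defaultdict(list)
    for token, label in re.findall(r"(\S+)/(PERSON|ORGANIZATION|LOCATION)", tagged):
        buckets[label].append(token)
    return buckets

if __name__ == "__main__":
    for label, tokens in entities_by_category(tag_file("input.txt")).items():
        # One output file per entity category, usable as new access points.
        with open(f"{label.lower()}s.txt", "w") as out:
            out.write("\n".join(tokens))
```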
 
PCDM: A Data Model and a Community Model, Justin Simpson
- Portland Common Data Model; originally came from Hydra, but generalized for any Fedora use
 - compared to Dublin Core
 - DC started ~20 years ago
 - in Hydra community, found data models were not compatible
 - UCSD proposed a model to the Hydra community, which evolved, with collaboration from the community, into the Fedora Community Data Model, which in turn became PCDM, hosted by DuraSpace
 - UCSD focused on properties, but with linked data in mind and a model that would work with others in the Hydra community
 - the Hydra technical metadata application profile is modelled after the Europeana and DPLA MAPs
 - Islandora did parallel work with PCDM
 - the ontology now exists as an RDF schema (see the sketch after this list)
 - it was a hard problem but with a well-defined scope; everyone wanted to solve it, had some experience with it, and developed a shared understanding out in the open
 - a good example of collaboration between what some might see as competing communities (i.e. Hydra vs. Islandora)
 - a small set of classes and properties allows different kinds of entities to fit the model
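
As a rough illustration of how small the core vocabulary is (my sketch, not from the talk), here is the basic Collection, Object, and File shape expressed with Python's rdflib. The example URIs are placeholders; the class and property names come from the published PCDM ontology.

```python
from rdflib import Graph, Namespace, RDF

PCDM = Namespace("http://pcdm.org/models#")
EX = Namespace("http://example.org/repo/")  # placeholder repository namespace

g = Graph()
g.bind("pcdm", PCDM)

collection = EX["collection/1"]
obj = EX["object/1"]
master = EX["object/1/files/master.tiff"]

g.add((collection, RDF.type, PCDM.Collection))
g.add((obj, RDF.type, PCDM.Object))
g.add((master, RDF.type, PCDM.File))

# The small set of structural properties does most of the work:
g.add((collection, PCDM.hasMember, obj))  # collections aggregate objects
g.add((obj, PCDM.hasFile, master))        # objects aggregate files

print(g.serialize(format="turtle"))
```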
 
Built to grow: scalability factors to consider before commencing your next digital library software project, Marcus Barnes, SFU
- hard to predict how much scale is needed
 - but what can you do and consider?
 - scalability is the ability to handle an increased workload without adding resources to a system, or to handle an increased workload by repeatedly applying a cost-effective strategy for extending a system's capacity
 - starting points: modern programming techniques, best hardware infrastructure possible, modularize, ongoing monitoring, run scalability audit
 - optimize code/hardware, distribute key components to dedicated hardware as needed
 - share knowledge, useful resources, real-world experience
 - scalability audit specifically for library systems software?
 - technical scalability and organizational scalability
 
Lunch
Might even have time to catch a quick nap before breakouts this afternoon.
