Code4LibBC Day 2: Lightning Talks Part 2

The last of the lightning talks at Code4LibBC 2014.

## Identifying performance bottlenecks and submitting improvements for Archivematica

Misty De Meo

Slides
FITS (File ID, Metadata extraction)
one of the big 3 bottlenecks in Archivematica
used VisualVM, a profiling tool for Java applications, to see which method calls take the most time
5 main bottlenecks: JVM lag, DROID, MD5 checksum, JHOVE, XSLT
JVM lag: time to start up a fresh Java VM, happens every time you run “fits.sh”, depends on computer, but between 0.5s-10s
9.4 hours wasted with one example data set
solution: Nailgun, a JVM server built to run Java applications, can repeatedly run commands without reloading a JVM
contributed a Nailgun server startup script to FITS 0.8.0, used in Archivematica starting in 4.3
DROID: main file ID tool, had to parse XML every time
switched to using FIDO, written in Python, called as a command-line tool on every file
MD5 checksum: FITS always calculated MD5 for every file, 10%+ for large files, but Archivematica never used it.
submitted change to make it configurable, which is included in FITS 0.8.0
XSLT: FITS compiles its XSLT stylesheets on startup, so starting a fresh JVM for every file meant recompiling them every time, even though this only needs to happen once; Nailgun fixes this too
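
To make the startup cost concrete, here is a rough back-of-the-envelope sketch in Python of why paying a fixed cost per file (fresh JVM, recompiled stylesheets) adds up, and what a persistent server such as Nailgun saves. The function name and all the numbers are hypothetical, chosen only for illustration; they are not the figures from the talk.

```python
def total_seconds(n_files, startup_s, work_s, persistent):
    """Estimate total wall-clock time for processing n_files.

    persistent=False: every file pays the startup cost (fresh JVM each run).
    persistent=True:  startup is paid once (long-running server).
    """
    if persistent:
        return startup_s + n_files * work_s
    return n_files * (startup_s + work_s)


# Hypothetical values, for illustration only.
n_files = 10_000
startup_s = 2.0  # somewhere in the 0.5s-10s range mentioned above
work_s = 0.5     # assumed time spent on the actual file analysis

fresh = total_seconds(n_files, startup_s, work_s, persistent=False)
server = total_seconds(n_files, startup_s, work_s, persistent=True)

print(f"fresh JVM per file: {fresh / 3600:.2f} h")
print(f"persistent server:  {server / 3600:.2f} h")
```

The gap between the two grows linearly with the number of files, which is why a large data set can lose hours to startup alone.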

## Leveraging linked data tools for traditional catalogues (and traditional cataloguers)

Galen Charlton

Slide
providing leverage for linked data techniques, integrating into traditional tools cataloguers are using, keywords to insert extra code
linked data: why do we have to care about traditional MARC records? no magic overnight conversion
standard processes are slow; system migrations are not trivial
borrow idea of continuous improvement
authority records and identifiers are a gateway to linked data
MARC21 will have subfield 0 to have
many authority files, can determine links (e.g. loc) based on ID in leader
VIAF project by OCLC lets us bring in international sources
bridging the gap: can do this with a bit of JavaScript. Figure out the VIAF identifier (via data dumps), bring the data in (e.g. RDF)
can load alternate forms of author name, update OPAC displays, etc.
demo in Koha: when adding new authority, pulling autosuggest via VIAF data including
MarcEdit Libhub for BIBFRAME plugin
don’t have to wait to make use of traditional catalogues
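
As a rough illustration of the "identifier as gateway" idea, here is a minimal sketch that turns a VIAF identifier stored in an authority record into a VIAF URI. The talk used JavaScript; this sketch is in Python. The function name is hypothetical, and the assumption that the stored value is either a bare number or prefixed with "(viaf)" is mine, not from the talk.

```python
import re

VIAF_BASE = "http://viaf.org/viaf/"


def viaf_uri(identifier):
    """Return a VIAF URI for a numeric VIAF identifier, or None.

    Assumption: the value is a bare number or carries a "(viaf)" prefix.
    Anything else (e.g. an LC control number) is rejected.
    """
    match = re.fullmatch(r"(?:\(viaf\))?\s*(\d+)", identifier.strip(),
                         flags=re.IGNORECASE)
    if match is None:
        return None
    return VIAF_BASE + match.group(1)


print(viaf_uri("(viaf)102333412"))
print(viaf_uri("n 79021164"))  # not a VIAF identifier
```

Fetching the linked data behind the URI (e.g. as RDF) and loading alternate name forms would build on this, but is left out here.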

Alex Garnett

Sarah Sutherland

Colleen Bell

no coding classes at library school, so proposed a PHP class
only 14 hrs, but gets them started
weekly reflections have helped in showing students how much they really have learned

Stefan Khan-Kernahan

Tech: DSpace, Elasticsearch, PostgreSQL
Output: website, linked data via API
Lessons: Elasticsearch has no authentication, so need to figure out security; PostgreSQL; CONTENTdm makes it very hard to get content out; DSpace is easy; use Grunt to run unit tests; Symfony as standalone components

That’s the end of all the lightning talks. Keep warm when going out for lunch.