The last of the lightning talks at Code4LibBC 2014.
## Identifying performance bottlenecks and submitting improvements for Archivematica
Misty De Meo
- Slides
- FITS (File Information Tool Set): file identification and metadata extraction
- one of the three biggest bottlenecks in Archivematica
- used VisualVM, a profiling tool for Java applications, to see which method calls take how much time
- 5 main bottlenecks: JVM lag, DROID, MD5 checksum, JHOVE, XSLT
- JVM lag: the time to start up a fresh Java VM, which happens every time you run “fits.sh”; it depends on the computer, but is between 0.5 s and 10 s
- 9.4 hours wasted with one example data set
- solution: Nailgun, a JVM server built to run Java applications; it can run commands repeatedly without reloading the JVM
- contributed a Nailgun server startup script to FITS 0.8.0, which is used in Archivematica starting with 4.3
- DROID: the main file identification tool, but it had to parse XML every time
- switched to FIDO, which is written in Python; the tool is called on the command line for every file
- MD5 checksum: FITS always calculated an MD5 for every file, costing 10%+ of processing time for large files, but Archivematica never used it
- submitted a change to make it configurable, which is included in FITS 0.8.0
- XSLT: FITS compiles its XSLT stylesheets on startup, so starting a fresh VM meant recompiling them on every run even though this only needs to happen once; Nailgun fixes this too
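The MD5 fix boils down to making an expensive per-file step opt-in. Below is a minimal Python sketch of that idea, not FITS's actual Java code or configuration: `identify_file` and its `compute_md5` flag are hypothetical names. The checksum is streamed through `hashlib` in chunks so large files never have to fit in memory, and it is skipped entirely unless the caller asks for it.

```python
import hashlib
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through MD5 in fixed-size chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_file(path: str, compute_md5: bool = False) -> dict:
    """Hypothetical per-file identification step.

    The checksum is only calculated when the caller opts in, mirroring the
    idea of making FITS's MD5 calculation configurable."""
    result = {"path": path, "size": os.path.getsize(path)}
    if compute_md5:
        result["md5"] = md5_of_file(path)
    return result
```

A pipeline that never uses the checksum (as Archivematica didn't) simply leaves `compute_md5` at its default and never pays for reading every byte of a large file a second time.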
## Leveraging linked data tools for traditional catalogues (and traditional cataloguers)
Galen Charlton
- Slides
- providing leverage for linked data techniques by integrating them into the traditional tools cataloguers already use, with keywords to insert extra code
- linked data: why do we still have to care about traditional MARC records? Because there is no magic overnight conversion
- standard processes are slow; system migrations are not trivial
- borrow the idea of continuous improvement
- authority records and identifiers are a gateway to linked data
- MARC21 will have subfield 0 to have
- there are many authority files; links (e.g. to LoC) can be determined from the ID in the leader
- VIAF project by OCLC lets us bring in international sources
- bridging the gap can be done with a bit of JavaScript: figure out the VIAF identifier (via data dumps), then bring the data in (e.g. as RDF)
- can load alternate forms of the author's name, update OPAC displays, etc.
- demo in Koha: when adding a new authority, autosuggest pulls from VIAF data, including
- MarcEdit: Libhub plugin for BIBFRAME
- we don’t have to wait to make use of linked data in traditional catalogues
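The talk did this with a bit of JavaScript; the Python sketch below only illustrates the two steps in the notes, under loudly stated assumptions. `AuthorityRecord` is a hypothetical, simplified record (not Koha's or MARC's actual structure), fetching data from VIAF is left out entirely, and the variant names passed in stand for whatever the VIAF data supplied. It builds a VIAF cluster URI of the form `http://viaf.org/viaf/<id>` and merges alternate name forms into "see from" references without duplicates.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

VIAF_BASE = "http://viaf.org/viaf/"


@dataclass
class AuthorityRecord:
    """Hypothetical, simplified stand-in for a local authority record."""
    heading: str
    viaf_id: str = ""
    see_from: List[str] = field(default_factory=list)


def viaf_uri(viaf_id: str) -> str:
    """Build a VIAF cluster URI from a bare numeric identifier."""
    if not viaf_id.isdigit():
        raise ValueError(f"not a VIAF cluster id: {viaf_id!r}")
    return VIAF_BASE + viaf_id


def merge_variant_names(record: AuthorityRecord,
                        variants: Iterable[str]) -> AuthorityRecord:
    """Add alternate name forms (e.g. taken from VIAF data) as 'see from'
    references, skipping blanks, the preferred heading, and duplicates,
    while keeping the order in which variants arrive."""
    seen = {record.heading, *record.see_from}
    for name in variants:
        name = name.strip()
        if name and name not in seen:
            record.see_from.append(name)
            seen.add(name)
    return record
```

Keeping the merge step separate from the (omitted) fetch step is a design choice: the same function could then feed an OPAC display or an autosuggest list regardless of whether the data came from a dump or a live lookup.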
## Little bins in big workflows: Using small programs to automate tasks & solve problems
Alex Garnett
- Unix philosophy: the inspiration for Linux
- a program should do one thing and do it well
- notes from the Access 2014 version of this talk
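To make "do one thing well" concrete, here is a minimal sketch (not from the talk) of a small filter in Python: it reads file paths on standard input and passes through only those with a given extension, so it can be chained with standard tools. The script name `keep_ext.py` and the function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, List, TextIO


def filter_by_extension(lines: Iterable[str], extension: str) -> Iterator[str]:
    """Yield only the paths (one per line) that end with the given extension.

    The comparison is case-insensitive and blank lines are dropped."""
    suffix = extension.lower()
    for line in lines:
        path = line.rstrip("\n")
        if path and path.lower().endswith(suffix):
            yield path


def main(argv: List[str], stdin: TextIO = sys.stdin,
         stdout: TextIO = sys.stdout) -> int:
    """Read paths from stdin, write matching ones to stdout, return an exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} EXTENSION", file=sys.stderr)
        return 2
    for path in filter_by_extension(stdin, argv[1]):
        print(path, file=stdout)
    return 0
```

Wired up as a script (for example by ending the file with `raise SystemExit(main(sys.argv))`), it slots into a pipeline such as `find . -type f | python3 keep_ext.py .tif | wc -l`: finding, filtering, and counting are each left to a separate small program.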
## CanLII Connects
Sarah Sutherland
- building CanLII Connects, a website to
- the biggest problem is building the community
## Coding in Libraries
Colleen Bell
- there were no coding classes at library school, so she proposed a PHP class
- only 14 hours, but enough to get students started
- weekly reflections have helped show students how much they have really learned
## UBC Library Open(ish) Collections
Stefan Khan-Kernahan
- Tech: DSpace, Elasticsearch, PostgreSQL
- Output: website, linked data via API
- Lessons: Elasticsearch has no authentication, so you need to figure out security yourself; PostgreSQL; CONTENTdm makes it very hard to get content out, while DSpace makes it easy; use Grunt to run unit tests; use Symfony as standalone components
## That’s It
That’s the end of all the lightning talks. Keep warm when going out for lunch.