We got an update on the work going on at SD, especially around automation. Continue reading “COSUGI 2017: Technical Update”
Tag: automation
Semi-Automating Batch Editing MARC Records : Using MarcEdit
This presentation was a lightning talk done at Code4LibBC Unconference 2015 on batch editing MARC records.
CascadiaFest: Server JS Morning Part 2 Notes
The second half of the morning should prove just as brain charging for CascadiaJS Server Day.
Batch Appending a Single PDF to multiple PDFs
I recently ran into the problem of having to add a page to the end of multiple PDFs.
Code4Lib 2014: Day 3 Lightning Talks
Lightning talks on Day 3 of Code4Lib.
code4lib Cool Tool Day
So inspired by the ASIS&T Cool Tool Day, I thought it’d be neat to do one of these since there weren’t many volunteers to do lightning talks/presentations at the code4lib Toronto meetup this time around. Our attendance was a little… paltry, but we had some great presentations! Here are my notes from the session.
Presented by @waharnum
- working with REST based web services
- testing automation tool for web services
- best for building with other API
- autogenerate stubs using WSDL
- interface between internal systems
- good for documenting web services, code style with examples
- normally, mostly used for unit testing
- virtual card based whiteboard
- flexible for planning
- collaborative
- great usability/UI
- even has mobile apps
- maintaining HTML email templates
- also works as a crazy text editor for nerds
XSL Transforms plugin in Firefox
- local reporting
- anything XSLT with just a few security restrictions
- e.g. SVN reporting
Presented by @adr
- cross platform presentation
- push from laptop to another computer
Sidenote: Other Presentation Tools
- deck.js
- Impress.js (like Prezi)
Presented by @ruebot
VIM Plugins
- pathogen – links VIM plugins so that they load automatically
- nerdtree – pull files quickly by displaying directory/tree
Presented by Pomax
Thimble HTML/CSS Live Web editor
- teach anyone (kids, adults) HTML and CSS
- use existing projects to make it fun!
- easy inline flickr search of CC images
- attribution in alt text
Presented by me
- monitor hue changer, supposedly to help people sleep better by telling your body what time of day it is
That’s it! Hope to do another one of these or lightning talks next time.
Code4lib Day 3: Lightning Talks
David Uspal – Project Grab Bag
Interactive Map
- JavaScript-based (for accessibility)
- Data stored in JSON file
- SVG graphic
- Uses the Raphael.js library – just use HTML5 instead
- Search by: location, person, call number
- To do:
- decouple from CMS (Concrete 5)
- SVG path generation as a web application
- add more configurable options (colors, etc.)
Tap Tour
- started at the Indianapolis Museum of Art
- easy to create a mobile tour application
- currently iPhone/iPod, plans to expand
- Drupal CMS back-end (new version released 1/25/2012)
Robert Haschart – Adding Publicly-Accessible Hathi Trust Items to Your Solr-based Discovery System
- Assumptions:
- Solr-based index
- SolrMarc used for indexing
- only want publicly-accessible items
- MARC record based with one Solr record per title
- list of Hathi-items and download
- tweak SolrMarc index specification
- add all Hathi records to your index, and adjust interface code to display records correctly
- download daily updates, merge updates
- Code not yet available
Jeremy Nelson – Aristotle a Django based Discovery Layer
- See it in Action
- originally forked from Kochief
- refactored to use Sunburnt for Solr interactions
- developed custom authentication middleware with Millennium
- did web redesign
- Code on Github
Dennis Schafroth – Turbo MARC in YAZ Library
- Problem: XSL transformation on MARC XML is slow
- Rule: combined the element with tag/code value when value is allowed
- Pazpar2 became twice as fast
- a lot faster, but not official standard
Yuka Egusa, Masao Takaku – Recovery of Minamisanriku Town Library from Tsunami Disaster
- implemented technical support for a library system – thanks to OSS and cloud service
- Amazon’s wish list for books needed from supporters
- library can announce library service and daily activities on Facebook
- Next-L Enju OSS search system
Ed Summers – jobs.code4lib.org
- Jobs are posted
- Tags let you see all the jobs with that tag
- OpenID log in
- pushes to twitter @code4lib
- pushes to mailing list
- Code on Github
Christopher Spalding – Search in a Blender
- works for ExLibris
- collect results and sort
- works in VuFind and Solr
Erik Hetzner – Strategy for c4l voting
- majoritarian: top-rated talks are chosen
- no representation for small parties
- each voter gets unlimited votes, 0-3 points
- Plurality-at-large
- 1 vote total
- Cumulative voting
- number of votes up to the number of talks, but a voter can give multiple votes to one talk
- Hacking
- the way it is done now reduces to plurality-at-large
- Fix
- limit points users can assign
- and/or only allow users to give one vote to each talk
- or adopt a proportional representation system
- Inspired by Numbers Rule: The Vexing Mathematics of Democracy
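The “Hacking” point can be sketched in a few lines. This is only an illustration: the ballot format (a dict of talk name to points), the talk names, and the numbers are all made up, and the only thing taken from the notes is the 0-3 point range. It shows how a majority that gives full points to everything it likes turns the points system into plurality-at-large and shuts out the minority’s talk.

```python
from collections import Counter

MAX_POINTS = 3  # per the notes: each vote is worth 0-3 points


def tally(ballots, seats):
    """Sum the points for each talk and return the top `seats` talks."""
    totals = Counter()
    for ballot in ballots:
        for talk, points in ballot.items():
            if not 0 <= points <= MAX_POINTS:
                raise ValueError(f"points out of range for {talk!r}")
            totals[talk] += points
    # Sort by points (descending), then by name so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [talk for talk, _ in ranked[:seats]]


def strategic(ballot):
    """A strategic voter gives full points to every talk they like at all."""
    return {talk: (MAX_POINTS if points > 0 else 0) for talk, points in ballot.items()}


# Hypothetical electorate: three voters prefer A (and mildly like B),
# two voters only want C.
honest = [{"A": 3, "B": 1, "C": 0}] * 3 + [{"A": 0, "B": 0, "C": 3}] * 2

print(tally(honest, seats=2))                          # ['A', 'C']
print(tally([strategic(b) for b in honest], seats=2))  # ['A', 'B']
```

With honest scores the minority’s talk C gets a slot. Once everyone votes strategically, the majority takes both.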
Lightning Talks That Didn’t Happen
- Hillel Arnold – Occupy Wall Street Documentation
- Jason Clark – BookMeUp (Book Suggestions App)
- Jason Ronallo – Digital Collections, Crawling, and Aggregating Content
Code4lib Day 1: Lightning Talks Notes
Al Cornish – XTF in 300 seconds (Slides in PDF)
- technology developed and maintained by California Digital Library
- supports the search/display of digital collections (images, PDFs, etc)
- fully open source platform, based on Apache Lucene search toolkit
- Java framework, runs in Tomcat or Jetty servlet engine
- extensive customization possible through XSLT programming
- user and developer group communication through Google Groups
- search interface running on Solr with facets
- can output in RSS
- has a debug mode
Makoto Okamoto – saveMLAK (English)
- Aid activities for the Great East Japan Earthquake through collaboration via wiki
- input from museum, library, archive, kominkan = MLAK
- 20,000 data records on the damaged area
- Information about places, damages, and relief support
- Key Lessons
- build synergy with twitter
- have offline meet ups & training
Andrew Nagy – Vendors Suck
- vendors aren’t really that bad
- used to think vendors suck, and that they don’t know how to solve libraries’ problems
- but working for a vendor lets you make a greater impact on higher education than working at a single university (he started working for Serials Solutions)
- libraries’ problems aren’t really that unique
- together with the vendor, a difference can be made
- call your vendors and talk to the product managers
- if they blow you off, you’ve selected the wrong vendor
- sometimes vendor solutions can provide a better fit
Andreas Orphanides – Heat maps
The library needed grad students to teach instructional sessions, but how do you set the schedule when classes have very inflexible schedules? He took two semesters of instructional session data (date and start time), although the start times and durations were inconsistent. The question was how best to visualize the data.
- heatmap package from clickheat
- time of day – x-dimension
- day of the week – y-dimension
- could see patterns in way that you can’t in histogram or bar graph
- heat map needn’t be spatial
- heat maps can compare histogram-like data along a single dimension or scatter-like plot data to look for high density areas
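As a rough sketch of the data preparation behind such a heat map (not the speaker’s code, and not the heatmap package he used), the sessions can be counted into a grid with days as rows and hours as columns. The function name, the hour range, and the sample dates are assumptions for illustration. Rounding each start time down to the hour is one simple way to deal with inconsistent start times.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def session_grid(start_times, first_hour=8, last_hour=20):
    """Count sessions per (day of week, hour) cell.

    Start times are rounded down to the hour. Sessions outside the
    first_hour..last_hour range are left out of the grid.
    """
    counts = Counter((t.weekday(), t.hour) for t in start_times)
    hours = range(first_hour, last_hour + 1)
    # Rows are days (y-dimension), columns are hours (x-dimension).
    return [[counts[(day, hour)] for hour in hours] for day in range(7)]


# Made-up sample data, not the library's real schedule.
sessions = [
    datetime(2012, 2, 6, 9, 0),    # a Monday
    datetime(2012, 2, 6, 9, 30),   # same Monday, same hour
    datetime(2012, 2, 8, 14, 15),  # a Wednesday
]
grid = session_grid(sessions)
for name, row in zip(DAYS, grid):
    print(name, " ".join(str(n) for n in row))
```

Each cell count can then be mapped to a colour by whatever plotting tool is at hand.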
Gabriel Farrell – ElasticSearch
- similar to Solr
- goes across servers
- e.g. Free103Point9
Nettie Lagace from NISO
- National Information Standards Organization (NISO)
- work internationally
- want to know: What environment or conditions are needed to identify and solve interoperability problems?
Eric Larson – Finding images in book page images
A lot of free books exist out there, but you don’t have the time to read them all. What if you just wanted to look at the images? A lot of books have great images.
He used curl to pull all the page images, then used ImageMagick to process them. The processing steps:
- Convert to greyscale
- Contrast boost x8
- Convert image to 1px wide by the original height
- Sharpen image
- Heavy-handed grayscaling
- Convert to text
- Look for long continuous line of black to pull pages with images
Code is on github
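The last step can be sketched without any image library. Assume each page has already been squeezed to a 1px-wide column of greyscale values (0 is black, 255 is white). The function names, the darkness threshold, and the minimum run length below are made-up values for illustration and are not taken from the speaker’s code.

```python
def longest_dark_run(column, threshold=64):
    """Length of the longest unbroken run of dark pixels in a 1px-wide column.

    `column` is a sequence of greyscale values, 0 (black) to 255 (white).
    """
    longest = current = 0
    for value in column:
        if value <= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def has_image(column, min_fraction=0.2, threshold=64):
    """Guess that a page holds an image if one dark run covers enough of its height."""
    if not column:
        return False
    return longest_dark_run(column, threshold) >= min_fraction * len(column)


# Made-up columns: lines of text give short dark runs, a picture gives a long one.
text_page = [255, 0, 0, 255, 255, 0, 0, 255, 255, 255]
image_page = [255, 0, 0, 0, 0, 0, 0, 255, 255, 255]
print(has_image(text_page, min_fraction=0.5))   # False
print(has_image(image_page, min_fraction=0.5))  # True
```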
Adam Wead – Blacklight at the Rock Hall
- went live, soft launch about a month ago
- broken down to the item level
- find bugs he doesn’t know about for a beer!
Kelley McGrath – Finding Movies with FRBR & Facets
- users are looking for movies, either particular movie or genre/topic
- libraries describe publications, e.g. the date is for the DVD, not for the movie
- users care about versions e.g. Blu-Ray, language
- Try the prototyped catalog
- Hit list provides one result per movie, can filter by different facets
Bohyun Kim – Web Usability in terms of words
- don’t over rely on the context
- but context is still necessary for understanding e.g. “mobile” – means on the go, what they want on the go
- sometimes there is no better term e.g. “Interlibrary Loan”
- brevity will cost you “tour” vs. “online tour”
- Time ran out, but check out the rest of the slides
Simon Spero – Restriction Classes, Bitches
OWL:
- lets you define properties
- control what the property can apply to
- control the values the property can take
- provides an easy way to do this
- provides a really confusing way to do this
The easy way is usually wrong!
When you define what a property can apply to and its range, the definition applies to every use of the property. An alternative is Attempto.
Cynthia Ng – Processing & ProcessingJS
- Processing: open source visual programming language
- Processing.js: related project to make processing available through web browsers without plugins
- While both tend to focus on data visualizations, digital art, and (in the case of PJS) games, there are educational oriented applications.
- Examples:
- Kanji Compositing – allows visual breakdown of Japanese kanji characters, interact with parts, and see children.
- Primer on Bezier Curves – scroll down to see interactive (i.e. if you move points, replots on the fly) and animated graphs.
- Obvious use might be instructional materials, but how might we apply it in this context? What other applications might we think of in the information organization world?
Since doing the presentation, I have already gotten one response from Dan Chudnov, who did a quick re-rendering of newspaper data from OCR data. I’m still thinking about the (best) use in libraries and other information organizations.
It’s over for today, but if you’d like more, do remember that there is a livestream and you can follow on twitter, #c4l12 or IRC.
PDF Batch Automation (PDF to Image and PDF Merge)
EDIT: I’ve been reminded/informed that this only works in Windows (or MS-DOS anyway) since it uses .bat files. If you’re using another operating system, the suggestion is to use PHP (but really you can use anything) to automate the command.
I’m sure everyone is familiar with Adobe Acrobat (even if they haven’t actually used it). It’s a nice GUI if you want to edit PDFs, but at least as far as I know, it does not do any batch or automation work. For a digital images project, there’s a lot of automation work that needs to be done and for image to image conversion, I was using Photoshop, but then I started dealing with PDFs. Thus, it was only natural to turn to GhostScript.
PDF to Image
So, I don’t really get any credit for this, because it’s already out there and the variables are well explained. So if you want to turn all the pages of your PDF into images, check out this Danzels Internets post. My case was a little different because I only wanted the first page turned into an image as a thumbnail for an entire file and then for an entire folder. I also prefer to do any image modification (even batch) in an image program.
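@echo off
REM Render only page 1 of every PDF in this folder as a JPEG thumbnail.
REM Quotes around %%Z keep file names with spaces intact.
FOR %%Z IN (*.pdf) DO gswin32 -sDEVICE=jpeg -dJPEGQ=95 -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -dDOINTERPOLATE -dFirstPage=1 -dLastPage=1 -sOUTPUTFILE="%%Z.jpg" -dSAFER -dBATCH -dNOPAUSE "%%Z"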
So, here the major changes are “gswin32” because I use the Windows version, and the “-dFirstPage=1 -dLastPage=1” so that the first and last page it processes is page 1. You can change the output file name too, so I changed it in such a way that it takes the original file name and adds the .jpg extension.
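For other operating systems, here is a minimal Python sketch of the same loop. It assumes Ghostscript is installed and on the PATH as `gs` (pass `gs="gswin32"` on Windows). The function names are made up for this example, and the output switch uses the `-sOutputFile` spelling from the Ghostscript documentation. Treat it as a starting point rather than a tested replacement for the .bat file.

```python
import subprocess
from pathlib import Path


def thumbnail_command(pdf, gs="gs"):
    """Build the Ghostscript command that renders page 1 of `pdf` as a JPEG."""
    pdf = Path(pdf)
    return [
        gs,
        "-sDEVICE=jpeg", "-dJPEGQ=95",
        "-dGraphicsAlphaBits=4", "-dTextAlphaBits=4", "-dDOINTERPOLATE",
        "-dFirstPage=1", "-dLastPage=1",
        f"-sOutputFile={pdf}.jpg",  # original file name plus .jpg, as in the .bat file
        "-dSAFER", "-dBATCH", "-dNOPAUSE",
        str(pdf),
    ]


def make_thumbnails(folder=".", gs="gs"):
    """Run Ghostscript once per PDF in `folder` (requires Ghostscript to be installed)."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Passing a list (no shell) means spaces in file names need no quoting.
        subprocess.run(thumbnail_command(pdf, gs), check=True)


# Show the command for one hypothetical file without running Ghostscript.
print(thumbnail_command("report.pdf"))
```

Calling `make_thumbnails("scans")` would then process every PDF in a folder named `scans`.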
PDF Merge
This is kind of a side note, because I didn’t need this for my project, but I recently downloaded some articles that for some reason had each section in a separate PDF. So, I get no credit for this one either as I got this one from Real’s How-to on Merging PDFs. I put this in here only for possible improvements of what’s presented on that site.
For the merging of PDFs in a directory, for the [merge.bat], you’re supposed to have this code:
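@echo off
REM Start merged.pdf from the first file; change 1.pdf to the name of your first PDF.
gswin32 -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=merged.pdf -dBATCH 1.pdf
REM Append every other PDF, skipping the first file and the two working files.
FOR %%Z IN (*.pdf) DO IF NOT %%Z==1.pdf IF NOT %%Z==merged.pdf IF NOT %%Z==merged2.pdf call merge2.bat %%Z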
Maybe it’s clear to other people, but “1.pdf” is the name of the first PDF. I found that the subsequent ones will be added in alphanumeric order. Also, if you don’t change “1.pdf” in the code to match your first file’s name, it will throw an error and insert a blank page at the beginning.
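One possible simplification, sketched in Python so it also runs outside Windows: Ghostscript accepts several input files in one command, so the whole folder can be merged in a single call with no need to name the first file. This is an assumption-laden sketch, not what Real’s How-to does: the function names are made up, `gs` is assumed to be on the PATH, and the files are sorted by plain string order (so “10.pdf” comes before “2.pdf”).

```python
import subprocess
from pathlib import Path


def merge_command(pdfs, output="merged.pdf", gs="gs"):
    """Build one Ghostscript command that concatenates `pdfs` in the given order."""
    # Skip a previous output file so it is not merged into itself.
    inputs = [str(p) for p in pdfs if Path(p).name != output]
    if not inputs:
        raise ValueError("no input PDFs to merge")
    return [gs, "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={output}", *inputs]


def merge_folder(folder=".", output="merged.pdf", gs="gs"):
    """Merge every PDF in `folder`, sorted by file name (requires Ghostscript)."""
    pdfs = sorted(Path(folder).glob("*.pdf"))
    subprocess.run(merge_command(pdfs, output, gs), check=True)


# Show the command for some hypothetical files without running Ghostscript.
print(merge_command(sorted(["2.pdf", "1.pdf", "merged.pdf"])))
```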