Code4Lib 2014: Day 3 Morning Presentations

Presentations for Day 3 of Code4Lib 2014.

Under the Hood of Hadoop Processing at OCLC Research – Roy Tennant

  • previously using MapReduce
  • hardware with lots of processing nodes with several copies of WorldCat
  • Java native, but you can use any language you want if you use the “streaming” option; best kept as a shell script
  • mappers and reducers don’t even need to be in the same language
  • HDFS (Hadoop distributed file system) takes care of data across clusters
  • HBase for random access to data elements
  • wrote web based application, “HBase Explorer”
  • JobTracker to keep track of running jobs and cluster summary for MapReduce
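
To make the “streaming” point above concrete, here is a minimal sketch of a mapper and reducer in Python. It is not OCLC's code: the word-count task, the function names, and the `run_local` helper are all assumptions for illustration. In a real streaming job the mapper and reducer would be separate scripts reading stdin and writing tab-separated lines to stdout, and Hadoop would do the sorting between them; here a plain `sorted()` stands in for that step.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'key<TAB>1' line per whitespace-separated token."""
    for line in lines:
        for token in line.split():
            yield f"{token.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(lines):
    """Hypothetical local stand-in for the cluster: map, sort, reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for out in run_local(["Moby Dick", "moby dick or the whale"]):
        print(out)
```

Because the only contract between the two stages is lines of text, either function could be swapped for a script in another language, which is the point made in the talk.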

Lucene’s Latest (for Libraries) – Erik Hatcher

  • evolves rapidly
  • Solr 4 changes Lucene library
  • suggester
  • configuration through REST API
  • SolrCloud
  • Querying
    Sorry for the sparse notes, but I don’t really know Lucene/Solr.

All Tiled Up – Mike Graves

  • skip over the digitizing maps part
  • can’t put sphere on plane without getting distortion
  • there are ways to fix this, but you can use projections, e.g. Spherical Mercator (Web Mercator)
  • map gets divided up in tiles
  • each tile in the grid can be identified by zoom level, column, row
  • Web GIS Stack: Geodatabase (e.g. Oracle Spatial), OGC Endpoint, Tile Cache (or GeoWebCache)
  • takes a lot of effort to set up and maintain
  • can be overkill if just trying to put maps up on the web
  • MBTiles – each map in its own SQLite database, with a table for metadata (name, value) and a table for tiles (zoom, column, row, data)
  • TileMill – node library that does image rendering. Super fast at rendering tiles. Load maps into TileMill and export to MBTiles
  • all the major web mapping libraries (e.g. Leaflet) can put the tiles online if you pass them a template URL
  • could serve tiles straight from the file system, but MBTiles lets you store metadata with the tiles and fits the workflow (tiling on desktop)
  • downsides: pre-tiling is very much a curator use case, with a very specific presentation format
  • load georectified maps into TileMill, export to MBTiles, make your tiles accessible, point your web mapping library to them
  • won’t work with vector data
  • lots of data might cause issues
  • caching and uploading make changing data difficult (but maps shouldn’t change much)
  • we want your data!
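
The tile addressing, the MBTiles layout, and the template URL idea from the notes above can be sketched together in a few lines of Python. This is my own illustration, not code from the talk: the function names and the example.org URL are hypothetical. The column names follow the MBTiles spec (zoom_level, tile_column, tile_row, tile_data), which also stores rows in TMS order (row 0 at the bottom), so the row is flipped on the way in and out.

```python
import math
import sqlite3


def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator (XYZ) tile column and row for a lon/lat in degrees."""
    n = 2 ** zoom
    col = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    row = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return col, row


def create_mbtiles(conn):
    """Create the two tables described in the notes."""
    conn.execute("CREATE TABLE metadata (name TEXT, value TEXT)")
    conn.execute(
        "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )


def put_tile(conn, zoom, col, row, data):
    """Store a tile given XYZ coordinates, flipping the row to TMS order."""
    tms_row = (2 ** zoom - 1) - row
    conn.execute("INSERT INTO tiles VALUES (?, ?, ?, ?)", (zoom, col, tms_row, data))


def get_tile(conn, zoom, col, row):
    """Fetch a tile by XYZ coordinates, or None if it is not stored."""
    tms_row = (2 ** zoom - 1) - row
    found = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, col, tms_row),
    ).fetchone()
    return found[0] if found else None


def tile_url(template, zoom, col, row):
    """Fill a {z}/{x}/{y} template of the kind Leaflet accepts."""
    return (
        template.replace("{z}", str(zoom))
        .replace("{x}", str(col))
        .replace("{y}", str(row))
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_mbtiles(conn)
    conn.execute("INSERT INTO metadata VALUES (?, ?)", ("name", "demo map"))
    col, row = lonlat_to_tile(0.0, 0.0, 1)
    put_tile(conn, 1, col, row, b"fake image bytes")
    print(tile_url("https://example.org/tiles/{z}/{x}/{y}.png", 1, col, row))
```

A small web service that answers those URLs by calling something like `get_tile` is all that is needed to make the tiles accessible, which is why this route is so much lighter than the full Web GIS stack.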

Safe Trip Everyone!

That’s it for this year. See you (hopefully) in Portland.