Presentations for Day 3 of Code4Lib 2014.
Under the Hood of Hadoop Processing at OCLC Research – Roy Tennant
- previously using MapReduce
- hardware with lots of processing nodes with several copies of WorldCat
- Java native, but can use any language you want via the “streaming” option; best kept as a shell script
- mappers and reducers don’t even need to be in the same language
- HDFS (Hadoop distributed file system) takes care of data across clusters
- HBase for random access to data elements
- wrote web based application, “HBase Explorer”
- JobTracker to keep track of running jobs and cluster summary for MapReduce
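To make the “streaming” point above concrete, here is a minimal Python sketch of a mapper and reducer in the style Hadoop streaming expects: each step reads lines and writes tab-separated key/value lines. The token-counting task and the function names are assumptions for illustration, not what OCLC Research actually runs. Sorting between the two steps stands in for the shuffle that Hadoop performs.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (token, 1) line per whitespace-delimited token."""
    for line in lines:
        for token in line.split():
            yield f"{token}\t1"


def reducer(lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def simulate(lines):
    """Local stand-in for a streaming job: map, sort (the shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical sample input, just to show the data flow.
    for result in simulate(["hadoop hbase hadoop", "hbase hdfs"]):
        print(result)
```

In a real job each function would sit in its own script reading `sys.stdin`, and the two scripts would be handed to the streaming jar through its `-mapper` and `-reducer` options. Because the contract is only lines in and lines out, the reducer could just as well be written in another language.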
Lucene’s Latest (for Libraries) – Erik Hatcher
- evolves rapidly
- Solr 4 changes Lucene library
- suggester
- configuration through REST API
- SolrCloud
- Querying
Sorry for the sparse notes, but I don’t really know Lucene/Solr.
All Tiled Up – Mike Graves
- skip over the digitizing maps part
- can’t put sphere on plane without getting distortion
- there are ways to reduce it, but in practice you pick a projection, e.g. Spherical Mercator (Web Mercator)
- map gets divided up in tiles
- each tile in the grid can be identified by zoom level, column, row
- Web GIS Stack: Geodatabase (e.g. Oracle Spatial), OGC Endpoint, Tile Cache (or GeoWebCache)
- takes a lot of effort to set up and maintain
- can be overkill if just trying to put maps up on the web
- MBTiles – each map in own SQLite database, table for metadata (name, value) and table for tiles (zoom, column, row, data)
- TileMill – node library that does image rendering. Super fast at rendering tiles. Export from TileMill into MBTiles
- all the major web mapping libraries (e.g. Leaflet) can put the tiles online when passed a template URL
- could serve tiles straight from the file system, but MBTiles stores metadata with the tiles and suits the workflow (tiling on the desktop)
- downsides: pre-tiling is very much a curator use case, and a very specific presentation format
- workflow: load georectified maps into TileMill, export to MBTiles, make your tiles accessible, point the web mapping library at them
- won’t work with vector data
- lots of data might cause issues
- cache and uploading, so changing data is difficult (but maps shouldn’t change much)
- we want your data!
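The tile addressing and MBTiles layout from the notes above can be sketched in Python as follows. This is a rough illustration, not the speaker's code. The helper names and the `tiles.example.org` URL are hypothetical. The table and column names follow the MBTiles spec, which counts rows from the bottom (TMS order), hence the row flip.

```python
import math
import sqlite3


def lonlat_to_tile(lon, lat, zoom):
    """Return (column, row) of the Web Mercator tile holding a point (row 0 at top)."""
    n = 2 ** zoom
    col = int((lon + 180.0) / 360.0 * n)
    row = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still fall inside the grid.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


def create_mbtiles(conn, name):
    """Create the two tables the notes describe: metadata and tiles."""
    conn.execute("CREATE TABLE metadata (name TEXT, value TEXT)")
    conn.execute(
        "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )
    conn.execute("INSERT INTO metadata VALUES (?, ?)", ("name", name))


def put_tile(conn, zoom, col, row, data):
    """Store one tile, flipping the row into TMS order."""
    tms_row = 2 ** zoom - 1 - row
    conn.execute("INSERT INTO tiles VALUES (?, ?, ?, ?)", (zoom, col, tms_row, data))


def get_tile(conn, zoom, col, row):
    """Fetch a tile's bytes by zoom, column, and row, or None if absent."""
    tms_row = 2 ** zoom - 1 - row
    found = conn.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, col, tms_row),
    ).fetchone()
    return found[0] if found else None


def tile_url(template, zoom, col, row):
    """Fill a {z}/{x}/{y} template of the kind Leaflet's tile layer accepts."""
    return (
        template.replace("{z}", str(zoom))
        .replace("{x}", str(col))
        .replace("{y}", str(row))
    )
```

A small server would answer each filled-in template URL by calling something like `get_tile` against the map's SQLite file, so the web mapping library never needs to know that the tiles live in a database rather than on the file system.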