Afternoon of Day 1 of Code4lib 2014.
Tag: discovery layer
OCUL URM Summit – Notes
Today was the OCUL URM Summit at UofT. These are a bit more sparse than usual. The survey results in particular were done quickly so I only included the higher numbers, not all of them.
code4libTO December Meetup Talks
BagIt Profiles – @ruebot
- directory of data
- bag has what you’re bagging, data, contact email/name, organization information, profile identifier (JSON via a URI)
- pull in all the field values
- validate
- wrote a spec and sent it to the digital curation community
- can look up profiles in the registry
Okay, I got a little lost, but you can see more on github.
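To make the "pull in all the field values, then validate" step a bit more concrete, here is a minimal Python sketch. It assumes a profile that says which bag fields are required and which values are allowed. The `PROFILE` dict, its field names, and the `validate_bag_info` helper are illustrative assumptions, not the published spec or the code on github.

```python
# Toy profile: in practice this JSON would be fetched from the profile's URI.
# The field names and layout here are assumptions for illustration only.
PROFILE = {
    "Bag-Info": {
        "Source-Organization": {"required": True, "values": ["Example University"]},
        "Contact-Email": {"required": True},
        "External-Description": {"required": False},
    }
}


def validate_bag_info(bag_info, profile):
    """Return a list of problems found when checking bag_info against profile."""
    errors = []
    for field, rules in profile["Bag-Info"].items():
        value = bag_info.get(field)
        if value is None:
            if rules.get("required", False):
                errors.append(f"missing required field: {field}")
            continue
        allowed = rules.get("values")
        if allowed and value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed value")
    return errors
```

An empty list means the bag's field values satisfy the profile; anything else is a human-readable list of what to fix.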
Internet Archive Torrent Collections (iaTorrent) – @ruebot
- see demo
Bookfinder – @TheRealArty & Steven
- I will write this up later probably as a separate blog post, or maybe journal article
TPL’s Web Services Architecture: Understanding the Big Picture – @waharnum
- many different systems that don’t easily communicate, so even basic tasks need specialized knowledge
- address the challenges by translation, simplification, standardization
- Three tiers: Front End Systems (requests to back end) / TPL Web Services (REST) / Back End Systems (responds to front end)
- Example: TPL Website -> Account Web Services -> Symphony Web Services (Symphony) – and back
- can add new features and functions
- helps to solve the challenges mentioned
- also helps with reusability e.g. in addition to website, build mobile-friendly website, iPhone App
- Might end up with:
- Front End (Website, mobile, App)
- Middle Tier (Account Web Services, ebook Web Services, online payment web services)
- Back End (symphony, overdrive, payment gateway, accounting systems)
- other benefits:
- increase ease of knowledge transfer about how our systems work
- follow modern best practice approach to building interoperating systems
- reduce cost and integration time
- reduce learning time for new staff or consultants
- metrics: wish had resources
- bolting together a lot of things, not using a lot of custom code
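As a rough illustration of the middle tier's "translation, simplification, standardization" role, here is a small Python sketch. The back end record layout (`USR_ID`, `CHRG_CNT`, `FINE_AMT`), the `fake_backend_lookup` stand-in, and the output field names are all made up for the example; they are not TPL's or Symphony's actual interfaces.

```python
import json


def fake_backend_lookup(card_number):
    """Stand-in for a back end call; returns a terse, system-specific record."""
    return {"USR_ID": card_number, "CHRG_CNT": "3", "FINE_AMT": "250"}


def account_summary(card_number, backend=fake_backend_lookup):
    """Middle tier: translate the back end record into a simple, standard shape."""
    raw = backend(card_number)
    return {
        "cardNumber": raw["USR_ID"],
        "itemsCheckedOut": int(raw["CHRG_CNT"]),
        "finesOwing": int(raw["FINE_AMT"]) / 100,  # cents to dollars
    }


def handle_request(card_number):
    """Every front end (website, mobile site, app) would receive this same JSON."""
    return json.dumps(account_summary(card_number), sort_keys=True)
```

Because the front ends only ever see the simplified JSON, the back end could in principle be swapped by passing a different `backend` function without touching the website or the app.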
Ladder (aka MyTPL 2) – @mjsuhonos
- wanted to solve problem: discovery layers suck
- problems:
- not scalable
- inflexible
- read-only
- expensive
- goals:
- better than open source options (VuFind, Blacklight)
- cheaper (than proprietary)
- as scalable as WorldCat
- design:
- schema-free/multi-schema (e.g. Dublin Core)
- horizontally scalable (multi-node)
- modern OSS components
- simple data model (RDF)
- Features:
- hierarchical relations
- clustering/de-duplication
- versioning
- real-time import & indexing
- multi-thread/process
- responsive UI
- fully multilingual (18/10)
- dynamic faceting
- dynamic mapping modification
- digital content storage (coming soon)
- built on linked data
- not a discovery layer; it’s an integration platform
Heritage U of T – @ajmcalorum
- News Announcement and Promotional Video
- previously not centralized: hard drives, flickr, etc.
- need central repository for tri-campus initiative with search & discovery, preservation, long-term access to content and metadata, support for multiple formats (e.g. images, books, documents, video, exhibits)
- Drupal + Solr (search) + Fedora Commons (collection management, batch ingesting, metadata crosswalk, digital preservation) == islandora (digital asset management system)
- pilot: 8 parent collections (by format, by campus)
- exhibits in Drupal, not through islandora/fedora commons
- modules: internet archive book reader (OCR on the fly), galleria, colorbox
- official launch: 2 weeks ago
That’s it! Food and drinks time!
Code4lib Day 3: Lightning Talks
David Uspal – Project Grab Bag
Interactive Map
- JavaScript based (for accessibility)
- Data stored in JSON file
- SVG graphic
- Uses the Raphael.js library – just use HTML5 instead
- Search by: location, person, call number
- To do:
- decouple from CMS (Concrete 5)
- SVG path generation as a web application
- add more configurable options (colors, etc.)
Tap Tour
- started at the Indianapolis Museum of Art
- easy to create a mobile tour application
- currently iPhone/iPod, plans to expand
- Drupal CMS back-end (new version released 1/25/2012)
Robert Haschart – Adding Publicly-Accessible Hathi Trust Items to Your Solr-based Discovery System
- Assumptions:
- Solr-based index
- SolrMarc used for indexing
- only want publicly-accessible items
- MARC record based with one Solr record per title
- list of Hathi-items and download
- tweak SolrMarc index specification
- add all Hathi records to your index, and adjust interface code to display records correctly
- download daily updates, merge updates
- Code not yet available
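Since the code was not available, here is only a hedged Python sketch of the "only want publicly-accessible items" step. It assumes the downloaded list is tab-delimited with the item id in the first column and an access flag (`allow` or `deny`) in the second; that layout, the sample ids, and the `public_item_ids` helper are assumptions, not the presenter's code.

```python
import csv
import io


def public_item_ids(lines, id_col=0, access_col=1, public_value="allow"):
    """Yield ids of rows whose access column marks them as publicly accessible."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip short or malformed rows instead of failing on them.
        if len(row) > max(id_col, access_col) and row[access_col] == public_value:
            yield row[id_col]


# Hypothetical sample of the downloaded list.
sample = io.StringIO(
    "mdp.001\tallow\tpd\n"
    "mdp.002\tdeny\tic\n"
    "uva.003\tallow\tpd\n"
)
public_ids = list(public_item_ids(sample))
```

The resulting ids would then be the ones to pass on to the indexing step; daily update files could be run through the same filter before merging.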
Jeremy Nelson – Aristotle a Django based Discovery Layer
- See it in Action
- originally forked from Kochief
- refactored to use Sunburnt for Solr interactions
- developed custom authentication middleware with Millennium
- did web redesign
- Code on Github
Dennis Schafroth – Turbo MARC in YAZ Library
- Problem: XSL transformation on MARC XML is slow
- Rule: combine the element name with the tag/code value when the value is allowed in a name
- Pazpar2 became twice as fast
- a lot faster, but not official standard
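A rough Python sketch of that rule, assuming namespace-free MARCXML input. The short element names used here (`r`, `l`, `c001`, `d245`, `sa`, `i1`, `i2`) follow my understanding of the Turbo MARC layout, and the `to_turbo` function is illustrative only; it is not the YAZ implementation.

```python
import re
import xml.etree.ElementTree as ET

# Subfield codes made only of letters and digits can safely be part of a name.
NAME_SAFE = re.compile(r"^[A-Za-z0-9]+$")


def to_turbo(record):
    """Convert one namespace-free MARCXML <record> to a Turbo-MARC-like tree."""
    out = ET.Element("r")
    for child in record:
        if child.tag == "leader":
            ET.SubElement(out, "l").text = child.text
        elif child.tag == "controlfield":
            ET.SubElement(out, "c" + child.get("tag")).text = child.text
        elif child.tag == "datafield":
            field = ET.SubElement(out, "d" + child.get("tag"))
            field.set("i1", child.get("ind1", " "))
            field.set("i2", child.get("ind2", " "))
            for sub in child:
                code = sub.get("code", "")
                if NAME_SAFE.match(code):
                    ET.SubElement(field, "s" + code).text = sub.text
                else:
                    # Keep the code as an attribute when it cannot be part of a name.
                    fallback = ET.SubElement(field, "s")
                    fallback.set("code", code)
                    fallback.text = sub.text
    return out
```

With the tag folded into the element name, a stylesheet can presumably match `d245/sa` directly instead of testing `@tag` and `@code` attributes on every field, which is where the speed-up would come from.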
Yuka Egusa, Masao Takaku – Recovery of Minamisanriku Town Library from Tsunami Disaster
- implemented technical support for a library system – thanks to OSS and cloud service
- Amazon’s wish list for books needed from supporters
- library can announce library service and daily activities on Facebook
- Next-L Enju OSS search system
Ed Summers – jobs.code4lib.org
- Jobs are posted
- Tags let you see all the jobs with that tag
- OpenID log in
- pushes to twitter @code4lib
- pushes to mailing list
- Code on Github
Christopher Spalding – Search in a Blender
- works for ExLibris
- collect results and sort
- works in VuFind and Solr
Erik Hetzner – Strategy for c4l voting
- majoritarian: top-rated talks are chosen
- no representation for small parties
- each voter gets unlimited votes, 0-3 points
- Plurality-at-large
- 1 vote total
- Cumulative voting
- number of votes up to the number of talks, but can give multiple votes to one talk
- Hacking
- the way done now, reduces to plurality at large
- Fix
- limit points users can assign
- and/or only allow users to give one vote to each talk
- or adopt a proportional representation system
- Inspired by Numbers Rule: The Vexing Mathematics of Democracy
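The "reduces to plurality at large" point can be sketched in a few lines of Python. The electorate below is hypothetical, and `tally` and `strategic` are illustrative helpers, not the actual c4l voting code: once every voter gives the maximum 3 points to each favourite and 0 to everything else, the largest bloc takes every slot.

```python
from collections import Counter


def tally(ballots, seats):
    """Sum points per talk and return the top `seats` talks (ties broken by name)."""
    totals = Counter()
    for ballot in ballots:
        for talk, points in ballot.items():
            totals[talk] += points
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [talk for talk, _ in ranked[:seats]]


def strategic(favourites, max_points=3):
    """A voter who gives max points to every favourite and nothing to the rest."""
    return {talk: max_points for talk in favourites}


# Hypothetical electorate: 6 voters like talks A and B, 4 voters like C and D.
ballots = [strategic(["A", "B"])] * 6 + [strategic(["C", "D"])] * 4
winners = tally(ballots, seats=2)
```

Here the 40% minority gets neither of the two slots, which is the "no representation for small parties" problem from the notes.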
Lightning Talks That Didn’t Happen
- Hillel Arnold – Occupy Wall Street Documentation
- Jason Clark – BookMeUp (Book Suggestions App)
- Jason Ronallo – Digital Collections, Crawling, and Aggregating Content
Code4lib Day 2: How People Search the Library from a Single Search Box
by Cory Lown, North Carolina State University
While there is only one search box, typically there are multiple tabs, which is especially true of academic libraries.
- 73% of searches from the home page start from the default tab
- which was actually opposite of usability tests
Home grown federated search includes:
- catalog
- articles
- journals
- databases
- best bets (60 hand crafted links based on most frequent queries e.g. Web of Science)
- spelling suggestions
- loaded links
- FAQs
- smart subjects
Show top 3-4 results with link to full interface.
Search Stats
From Fall 2010 and Spring 2011, ~739k searches and 655k click-throughs
By section:
- 7.8% best bets (sounds very little, but actually a lot for 60 links)
- 41.5% articles, 35.2% books and media, 5.5% journals, ~10% everything else
- 23% looking for other things, e.g. library website
- for articles: 70% click one of the first 3 results, the other 30% click to see all results
- catalogue use is fairly stable over time, but article use peaks at the end of term
How do you make use of these results?
Top search terms are fairly stable over time. You can make the top queries work well for people (~37k) by using the best bets.
Single/default search signals that our search tools will just work.
It’s important to consider what the default search box doesn’t do, and doubly important to rescue people when they hit that point.
Dynamic results drive traffic. When a few actual results were shown, use of the catalogue for books went up a lot compared to only suggesting that people use the catalogue.
Collecting Data
Custom log is being used right now by tracking searches (timestamp, action, query, referrer URL) and tracking click-throughs. An alternative might be to use Google Analytics.
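A minimal Python sketch of what working with such a log might look like. The row layout follows the fields mentioned above (timestamp, action, query, referrer URL), but using the last field to hold the clicked section for click-through rows is my assumption, and the sample rows are invented.

```python
from collections import Counter

# Hypothetical log rows: (timestamp, action, query, detail).
# For "search" rows detail is the referrer URL; for "click" rows it is
# the section that was clicked (an assumption for this sketch).
LOG = [
    ("2011-03-01T10:00", "search", "web of science", "/"),
    ("2011-03-01T10:00", "click", "web of science", "best bets"),
    ("2011-03-01T10:05", "search", "climate change", "/"),
    ("2011-03-01T10:06", "click", "climate change", "articles"),
    ("2011-03-01T10:09", "search", "Web of Science", "/"),
    ("2011-03-01T10:10", "click", "web of science", "articles"),
]


def top_queries(log, n=10):
    """Most frequent search queries, ignoring case."""
    counts = Counter(row[2].lower() for row in log if row[1] == "search")
    return counts.most_common(n)


def click_share_by_section(log):
    """Percentage of click-throughs that went to each section."""
    clicks = Counter(row[3] for row in log if row[1] == "click")
    total = sum(clicks.values())
    if total == 0:
        return {}
    return {section: round(100 * count / total, 1) for section, count in clicks.items()}
```

The output of `top_queries` is the kind of list you could use to pick best bets, and `click_share_by_section` gives the by-section percentages.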
For more, see the slides below or read the C&RL Article Preprint.
Code4lib Pre-Conference: Microsoft Research (MSR)
Future Technology
So the first half of the tour was the non-disclosure, confidential part, but the group that I was part of basically got information on how Microsoft researches trends and some of their results. We then got to play with some of the prototypes they have been working on, which is technology they see as coming into the market in 5-10 years. To get a general sense of what might have been included, take a look at the Future Productivity Vision video they released recently:
Microsoft Research (MSR) at Building 99
The research division focuses on core computer science research of fundamental aspects of computing. A lot of the products of their research include papers, patents, and prototypes. They supplement staff and resources with scholarly research by partnering with academia. The focus is mostly on applied projects.
ChronoZoom
- to be released in March
- working with Berkeley and a couple of other universities
- prototype to help in research and teaching cross-discipline
- no details beyond that as we were told to keep this one under wraps, but check out the link for more information
F#
- practical, functional-first programming language that allows you to write simple code to solve complex problems
- in the .NET family, fully supported by Microsoft Visual Studio
- multi-paradigm: can use different models, e.g. object-oriented
- interoperable: doesn’t work in isolation, can use all of .NET framework
Simplicity: Functional Data
- simple code, strongly typed
- Example 1: let swap (x, y) = (y, x) vs. (in C#) Tuple<U,T> Swap<T,U>(Tuple<T,U> t) { return new Tuple<U,T>(t.Item2, t.Item1); }
- Example 2: let reduce f (x, y, z) = f x + f y + f z vs. (in C#) int Reduce<T>(Func<T,int> f,Tuple<T,T,T> t) { return f(t.Item1) + f(t.Item2) + f(t.Item3); }
Simplicity: Functions as Values
- can define function inline
- can define own units of measure, and enforce conversions
Example:
- type Command = Command of (Rover -> unit)
- let BrakeCommand = Command(fun rover -> rover.Accelerate(-1.0))
- let TurnLeftCommand = Command(fun rover -> rover.Rotate(-90.0<degs>))
Some Other Features
- built-in support for running code in parallel and asynchronously
- can use traditionally, compile and run OR interactively, execute on the fly
- x |> f – apply f to x
There was more, but I honestly couldn’t copy that quickly and didn’t understand every detail. If you’re interested, you can try F# through a browser, which includes an interactive tutorial, or download it from tools and resources. To learn more about what people are doing with it, take a look at F# Snippets.
F# 3.0
While 2.0 excels at analytical programming, solving computationally complex problems, 3.0 is an accelerator for data-complex problems by bringing information to your fingertips.
Basically, you can load a database (through URI) and while you program, you can see a full list of all the data elements that are available.
For example, after defining a type by loading the Netflix database, typing “netflix.” would give you a list of the fields (e.g. Movies) from the database.
Layerscape
- geoscience tool
- can download and run for free
- have the ability to bring a lot of time-sensitive data and use GPU to create visualization
- talk to worldwidetelescope (WWT) through API
- also has a custom ribbon plugin for excel to view in WWT for non-programmers
- can also create custom tours including text and audio, which then exports into videos. Note: The data is included in the tour so that people can see the data – check out the Seismicity Samoa and Tohoku example video we saw (requires Silverlight)
Microsoft Audio Visual Indexing Service (MAVIS)
- keyword search in audio/video files with speech
- speech recognition technologies used to ‘crack’ audio files
- Microsoft Research technology: word-level lattice indexing
- 30-60% accuracy improvement over indexing automatic transcripts – right now, 80% of content, 85%+ accuracy
- can provide closed caption which can also be edited later
- index word alternatives – robust to recognizer errors
- index timing – navigate to exact point in video and provides timeline of where the phrase is spoken
- tune-able – queries from ‘give me something’ to ‘dig deeper to find it’
- compute-intensive speech recognition done in Azure
- no need to invest in H/W infrastructure
- front end user search integrated with SQL server
- search infrastructure is the same as full text indexing in SQL
- SOAP based API
- allows integration of media search results in other applications e.g. text search
- need at least 500 hours of transcribed data in order to train the program for other languages
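To illustrate the ideas of indexing word alternatives, indexing timing, and tunable queries, here is a toy Python sketch. It is not how MAVIS is implemented: the `LATTICE` data is invented, a real lattice is a graph rather than a flat list, and `build_index` and `search` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical recognizer output: for each time offset (seconds), a list of
# (word, confidence) alternatives. A real lattice is a graph; this is flattened.
LATTICE = [
    (12.0, [("library", 0.9), ("librarian", 0.1)]),
    (12.6, [("archive", 0.6), ("our hive", 0.4)]),
    (47.3, [("archives", 0.3), ("archive", 0.5)]),
]


def build_index(lattice):
    """Map each candidate word to the (time, confidence) pairs where it may occur."""
    index = defaultdict(list)
    for time, alternatives in lattice:
        for word, confidence in alternatives:
            index[word].append((time, confidence))
    return index


def search(index, word, min_confidence=0.5):
    """Return the times where `word` was heard with at least `min_confidence`."""
    return [time for time, conf in index.get(word, []) if conf >= min_confidence]
```

Lowering `min_confidence` corresponds to the "dig deeper to find it" end of the tuning range: a low-confidence alternative such as "librarian" only turns up when the threshold is relaxed, and each hit carries the time offset needed to jump to that point in the recording.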
MAVIS Architecture
Great for libraries and archives that want to pull content from digitized audio and video in formats that are becoming obsolete or degrading.
Microsoft Academic Search
- free academic search engine
- structure unstructured data
- 38+ million publications including non-public data
- can search or browse by domain to see top authors, publications, journals, keywords, organizations
- for recognized terms e.g. Bone Marrow can see term occurrence, definition context from full text indexes, top authors, conferences, journals, etc.
- can search for a person, go through disambiguation, and then see a profile with a list of publications, citations, and visualizations of coauthors and citers
- can see organization profiles and how they compare to others including Venn diagram of publication keywords
- can pull most of the visualizations and embed into a website
- RSS feed for each element
- full API also available and get results in JSON or XML via SOAP
- site interface allows crowd sourcing to edit information e.g. if disambiguation of publications is wrong (though right now, only with Live account, working on OpenID)
This strikes me as Google Scholar but with more functions, visualizations, and linked data. Right now, not a lot has been indexed, but I can see this as a much better version of Google Scholar.
Being Green > Swag You’ll Probably Throw Away
Finally, at the end of the night, one of the staff presented on why he’s anti-swag, so instead of giving MS swag away, we got the opportunity to take home an epiphyte complete with care package. Unfortunately, I can’t take it home across the border so I found someone to adopt it.

A Brief Look at Summon
Summon is Serials Solutions’ web scale discovery tool. I think so far, it looks pretty good. It has all the things you’d want these days in your searches including:
- sidebar with different options to refine search
- clean, easy to use interface
- save citations to folder and export
- advanced search, including ISBN for books
Currently, all records in the catalogue, institutional repository, and journal articles have been included. There’s also a locations refinement category to refine to a specific branch for catalogue materials.
It’ll be interesting to see what our users (including staff) think.
Quick Edit/Add-on: Seems like the major criticism I’ve heard is that it does not do known-item (that is, you know what you’re looking for) searches well, but as my supervisor has explained, that’s not the purpose of a discovery tool. If you want to look for something you know in a library, you use the source that will help you look for that. Some people might say “but look at Google, it can do both well”, but even Google Scholar is unlikely to give you a book if you only enter a couple of words when you’re looking for a book (obviously that’s not true in all cases).