
Learning (Lib)Tech

Stories from my Life as a Technologist

  • About Me
  • About this Blog
  • Contact Me

Tag: metadata

Code4Lib Day 3: Closing Keynote – Gordon Dunsire

Granularity in Library Linked Open Data

Slides

Fractals

  • self-similar at all levels of granularity
  • each circle represents things that look very similar (a snowflake-like pattern, but at different sizes)
  • characteristic of fractals
  • cannot determine level: all levels are equal, some more equal than others

Multi-Faceted Granularity

  • What is described by a bibliographic record? By a single statement?
  • What is the level of description? How complete is it? e.g. AACR2
  • How detailed is the schema used? How dumb? – especially relevant right now. The more detailed the schema, the finer the granularity possible.
  • Semantic constraints? Unconstrained?

Resource Description Framework – Linked Data

  • Triple: This resource | has intended audience | Juvenile
  • Subject / Predicate / Object
  • does each of these parts have granularity?
  • rather than higher/lower level, we should talk about coarse- or fine-grained granularity

Subject: What is the Statement About?

  • we can focus on describing an article / resource / work, then think about coarser or finer granularity:
    • coarser: consortium collection / RDF map
    • library collection / digital collection
    • super-aggregate: journal title / journal index
    • aggregate: issue / festschrift
    • focus: describing an article / resource / work
    • component: section / graphics / page
    • sub-component: paragraph / markup
    • finer: word / RDF/XML
    • URI / node

Predicate: What is the Aspect Described?

  • similar coarse/fine breakdown:
    • membership category
    • access to resource
    • access to content
    • suitability rating
    • audience and usage
    • audience
    • audience of audio-visual material
  • diagram: possible audience map (partial) – unconstrained version to avoid collisions of isbd/dct/schema/rda/m21/frbrer
  • different links can be made while still retaining proper semantic links
  • currently constructing just one giant graph

What is the Aspect Described?

  • coarse to fine:
    • resource record
    • manifestation record
    • title and statement of responsibility (s.o.r.)
    • title statement
    • title of manifestation
    • title word
    • first word of title
  • why do librarians need so many titles? Why not just use the Dublin Core title and be done with it? Because we need them to do our work, e.g. the spine title for browsing
  • title = string identifier
  • RDA: what to do with this? how do we apply these needs?
  • possible semantic map (partial) – I won’t even try to reproduce this
  • need to take into account domains and ranges
  • makes it more difficult, but more powerful

Semantic Reasoning: The Sub-Property Ladder

  • this is where the graph becomes useful and powerful
  • machines can’t reason, so we reduce the semantics to rules that we can give to machines to process our data
  • semantic rule:
    • if property1 sub-property of property2;
    • then data triple: resource property1 “string”
    • implies data triple: resource property2 “string”
  • otherwise, data triple remains the same
  • simple enough for computer to carry out
  • doesn’t matter how complex the map actually is, because the computer can still do it in a matter of seconds
  • machine entailment: isbd: “has title proper” (finer) -> dct: “has title” (coarser)
  • might sound simple, but this is making a computer do inference
  • ‘dumb(ing)-up’: data has been lost, but it is still meaningful; it has moved from one schema to another
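The sub-property rule above is mechanical enough to sketch in a few lines. This is a minimal illustration, not the speaker's tooling: properties are plain strings, and the map (including the `rda:` property name) is a hypothetical fragment rather than an official mapping. The rule is applied repeatedly, so a multi-step ladder (finer to coarser to coarsest) is walked automatically.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical sub-property map: finer property -> its direct super-properties.
SUB_PROPERTY_OF: Dict[str, Set[str]] = {
    "rda:titleProperOfManifestation": {"isbd:hasTitleProper"},
    "isbd:hasTitleProper": {"dct:title"},
}


def entail(triples: Set[Triple], sub_property_of: Dict[str, Set[str]]) -> Set[Triple]:
    """If p1 is a sub-property of p2, then (s, p1, o) implies (s, p2, o).

    Applied until nothing new appears; triples whose property has no
    super-property are simply kept as they are.
    """
    result = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        for super_p in sub_property_of.get(p, ()):
            new = (s, super_p, o)
            if new not in result:
                result.add(new)
                frontier.append(new)
    return result


if __name__ == "__main__":
    data = {("ex:book1", "rda:titleProperOfManifestation", "How Fine Can an Octopus Be?")}
    # prints three triples: the original, the isbd one, and the dct one
    for t in sorted(entail(data, SUB_PROPERTY_OF)):
        print(t)
```

The coarser triples carry less meaning than the original ("dumbing up"), but each one is still a true statement in its own schema.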

Data Triples from Multiple Schemas / Entailed from Sub-Property Map / from Property Domains

  • frbrer: “has intended audience” – “primary school”
  • isbd: “has note on use or audience” – “for ages 5-9”
  • rda: “intended audience (work)” – “for children aged 7-“
  • m21: “target audience” -> m21terms: “Juvenile”
  • definition attached to the vocabulary
  • also talking about granularity
  • can map the sub-property to top level of unc: “has note on use or audience”
  • “is a” frbrer: “work”, isbd: “resource”, rda: “work” – the RDA and FRBR schemas are actually separate and not semantically linked, even though the vocabulary is similar and RDA is based on FRBR
  • once stabilized, they can be drawn from each other
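The “is a” statements come from property domains rather than from the sub-property map: if a property is declared to apply to a class, any subject using it is entailed to be a member of that class. A minimal sketch of that domain rule, assuming made-up property identifiers that loosely follow the talk's example:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"

# Hypothetical domain declarations: property -> class it applies to.
DOMAIN: Dict[str, str] = {
    "frbrer:hasIntendedAudience": "frbrer:Work",
    "isbd:hasNoteOnUseOrAudience": "isbd:Resource",
    "rda:intendedAudienceWork": "rda:Work",
}


def entail_types(triples: Set[Triple], domain: Dict[str, str]) -> Set[Triple]:
    """Domain rule: (s, p, o) and p has domain C  =>  (s, rdf:type, C)."""
    inferred = {(s, RDF_TYPE, domain[p]) for s, p, _ in triples if p in domain}
    return set(triples) | inferred
```

Because the classes are unrelated in this sketch, one resource described with frbrer and isbd properties ends up typed as two separate things, which mirrors the point that the schemas are not yet semantically linked.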

What is the Aspect Described?

  • coarser to finer:
    • creator
    • author
    • screenwriting
    • animation screenwriting
    • children’s cartoon screenwriting
  • different controlled vocabulary
  • graph of RDA for author/creator/screenwriting in relation to work and agent
  • graph of same thing, but for dc for creator and agent
  • what is the semantic relationship between the dct creator and the rda creator?
  • marcrel author maps to dc contributor, not creator – what is the relationship between rda author and marcrel author?
  • decision from 2005, needs to be reappraised and reviewed
  • relationship between dc creator and dc contributor?
  • how does lcsh “screenwriters” fit?

Machine-Generated Granularity

  • also has issues
  • e.g. full-text indexing: down to the word level
  • BabelNet: A very large multilingual ontology
  • can get quite complex and granular

User-Generated Granularity

  • users can actually generate useful metadata
  • can use statistical methods to remove extremes and come back with consensus
  • going to cause granularity problems e.g. “OK for my kids (7 and 9)”, “Too childish for me (age 14)”
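One simple way to "remove extremes and come back with consensus" is a trimmed median. This sketch assumes the free-text comments have already been turned into numeric age suggestions, which is the hard part and is not shown; the trim fraction is an arbitrary illustrative choice.

```python
from statistics import median
from typing import List


def consensus_age(suggested: List[float], trim_fraction: float = 0.2) -> float:
    """Drop the lowest and highest trim_fraction of suggestions, then take the median."""
    if not suggested:
        raise ValueError("no suggestions to summarize")
    values = sorted(suggested)
    k = int(len(values) * trim_fraction)
    kept = values[k:len(values) - k] or values  # never trim everything away
    return median(kept)


if __name__ == "__main__":
    # e.g. "OK for my kids (7 and 9)" -> 7, 9; one outlier at 40
    print(consensus_age([7, 9, 8, 7, 14, 3, 40]))
```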

KISS

  • keep it simple, stupid
  • keep it simple and stupid?
  • data model is very simple: triples!
  • in terms of complexity, actually very simple
  • but metadata content is complex
  • and therefore, resource discovery is complex
  • complex structures arise from applying simple rules, similar to the hard sciences and math
  • simplicity is elegance

AAA

  • Anyone can say anything about any thing
  • someone will say something about every thing
  • in every conceivable way
  • and then constrained linguistically

OWA

  • open world assumption: the absence of a statement is not a statement of non-existence

Will it get so granular that it becomes too complex?

And the rest is science

Break Time

tiny octopus
How Fine Can an Octopus be?
Author: Cynthia Ng | Posted on February 14, 2013 (updated October 12, 2024) | Categories: Events | Tags: code4lib, keynote, linked data, metadata

Code4Lib Day 2: Morning Notes

REST IS Your Mobile Strategy

  • Richard Wolf, University of Illinois at Chicago
  • Slides
  • Raw Material

REST

  • Representational State Transfer – an architectural style developed alongside HTTP/1.1
  • clients request representations of resources from servers – typically a document
  • in practice, this gives you an API

Examples

  • Twitter
  • New York Times – Congress API
  • Chicago Transit

iOS Development

  • need to know: Xcode, Objective-C, Cocoa Touch, Provisioning
  • Xcode – Apple’s IDE, like Visual Studio or Eclipse
  • Objective-C – strict superset of C
  • Cocoa Touch – frameworks to talk to iOS, similar to Ruby on Rails
  • UIKit
  • Provisioning Portal – annoying paperwork

OCLC Classify API

  • give it an item and it tells you how it’s classified, including the call number

Process

  • use Rested (a Mac tool) – grabs API information and provides you the raw output
  • Xcode – create a new basic project
  • go from XML to Objective-C
  • use RestKit – maps XML to Objective-C
  • use PaintCode – create GUI
  • hire an artist
  • Apple App Review Process
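The "XML to objects" step (what RestKit does for Objective-C here) can be sketched in Python with the standard library. The response shape below is hypothetical and simplified; it is not the actual Classify schema, and the sample values are only illustrative.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Classification:
    title: str
    ddc: Optional[str]
    lcc: Optional[str]


def parse_classification(xml_text: str) -> Classification:
    """Map a (hypothetical) classification response onto a plain object."""
    root = ET.fromstring(xml_text)
    work = root.find("work")
    ddc = root.find("recommendations/ddc")
    lcc = root.find("recommendations/lcc")
    # Compare against None explicitly: childless Elements are falsy.
    return Classification(
        title=work.get("title", "") if work is not None else "",
        ddc=ddc.get("number") if ddc is not None else None,
        lcc=lcc.get("number") if lcc is not None else None,
    )


# Illustrative sample only; not a captured API response.
SAMPLE = """<classify>
  <work title="Moby Dick"/>
  <recommendations>
    <ddc number="813.3"/>
    <lcc number="PS2384 .M6"/>
  </recommendations>
</classify>"""
```

Keeping the parsing separate from the HTTP request makes the mapping easy to test against saved raw output, such as what a tool like Rested shows you.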

Librobot App

  • in the store by April 2nd

Why REST Matters – What are the Major Milestones

  • math formula – importance of technology can be determined by the amount of money involved in a court case
  • Personal Computers
  • The Internet
  • Mobility
  • Build an API – ask for ideas, and apps will come.

Take Away

  • you have interesting data
  • make an API
  • If we build it, they will come for it!

All Teh Metadatas Re-Revisited

  • Esme Cowles, UC San Diego Library
  • Matt Critchlow, UC San Diego Library
  • Bradley Westbrook, UC San Diego Library

Continues the story from last year.

Needs

  • more consistent data
  • maintain syntax of hierarchical subjects
  • improve support for complex objects
  • align more strongly with the digital libraries community – most important

User Stories

  • to understand requirements of administration and researchers

Sorry, I had to take a brain break and got a little lost. I’m also going to blame Twitter and IRC for distracting me. Take a look at the slides:

Implementation

  • DAMS Repository – new version of lightweight repository, with APIs
  • Manager – a separate application that uses the API
  • Public Access System – new frontend in Hydra, great community

Timeline

  • release in summer
  • code now available on GitHub

Browser/Javascript Integration Testing with Ruby

  • Jessie Keck, Stanford University
  • Slides

The Problem

  • needed to test JavaScript
  • especially since using progressive enhancement
  • site works without JavaScript, then more features with JavaScript
  • mistakes happen, e.g. killed navigation

Some Solution(s)

  • Watir == Web Application Testing in Ruby
  • built on watir-webdriver
  • Capybara – RSpec/Cucumber driver
  • ability to test responsive design
  • webkit integration available
  • personally like Capybara syntax (vs. Watir)
  • automated tests catch JavaScript bugs, e.g. automatically test that facets are working

Gotchas

  • might want to use Watir Rails
  • transactional fixtures

Linked Open Communism: Better discovery through data dis- and re- aggregation

  • Corey A Harper, New York University

How to shut up about linked data and actually build something.

Context

  • context, the narrative of the library/archive
  • user stories

Death of Browse

  • discovery systems don’t use authority control
  • browse broken as UI design
  • rich data in authorities disconnected

The Idea Implemented

  • take EAD records, blow them up, take headings to match MARC records
  • pull people, corporations, and topics, and pull info about them from DBpedia
  • index in Solr
  • slower than would like
  • on GitHub, but buggy

Solr Update

  • Erik Hatcher, LucidWorks

Sorry, but we don’t use Solr, and I think anyone really interested can look up information about the update, e.g. in the Apache Solr Release Notes.

Check out the slides:

Break Time

white otter
Time to Get Some Hugs

Ask Anything

Who’s faculty? Half Faculty – small handful who care about being faculty

Planning and pilot phase of bringing together resources of all types. How to decide what to use and where to start?

Normalizing records from MARC to Solr. Want help with format.

How many have library degrees? 2/3 do, 1/3 don’t

Code4Lib – archiving our stuff? Talk to Mark/anarchivist. Mailing list is archived on the university server. Mirrored on post. Regular basis, dumped to media forward.

Goals of BIBFRAME? Replacing/superseding MARC.

First-timers to c4lcon? majority of room. All? < 20

Anyone collecting social media on behalf of user community or collection building purposes? Going to be a lightning talk tomorrow.

Anyone from a theology library? ~5 ppl

@maccabeelevine wants to know successful examples of gamification to support information literacy, e.g. Lemontree.

Glossary of technology and stacks. On code4lib wiki? A guide for the perplexed. We can work on it.

Who is using graph databases? 2-3 ppl

Using DSpace? 25-30. Fedora Commons? 25-30. Hydra? 10-15.

This conference working for you? Almost everyone.

What do people think of the wiki? One idea is to move it over to github code4lib account.

From the federal government? 3

Anyone interested in integrating TSM into Solr? anarchivist says he knows people.

How many non-library degree people considering getting one? 2

How many have project manager as their title? ~12. Public? 5. Academic? The rest.

CodeRead – looking at pymarc. Anyone else looking into this?

Didn’t get all the questions, but that’s most of them.

Lunch Time

squirrel begging for food

Author: Cynthia Ng | Posted on February 13, 2013 (updated October 12, 2024) | Categories: Events, Technology | Tags: code4lib, metadata, mobile apps, mobile design, usability

Code4Lib Day 1: Afternoon Notes

Practical Relevance Ranking for 10 million books

  • Tom Burton-West, University of Michigan Library

Search Challenges

  • multilingual, 400+ languages
  • OCR quality varies
  • very long documents
  • books are different from the kinds of documents search engines normally handle

Relevance Ranking

  • how to score, weigh
  • default algorithm ranks very short documents very high
  • needed to tune/customize parameters
  • average document size is ~30 times larger
  • did prelim testing with Solr4 and didn’t see the same problem, but need more testing
  • dirty OCR complicates things, as well as language
  • occurrence of words in specific chapters vs. whole book – should we index parts of books?
  • similar issue with other objects e.g. bound journals, dictionaries & encyclopedias
  • another difficulty is inconsistent metadata: breakdowns of articles/chapters/etc. will be inconsistent
  • creating a testing plan and adding click logs
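As an illustration of what "tuning parameters" for document length can mean, here is the per-term BM25 score, a widely used ranking function whose `b` parameter sets how strongly length is normalized. This is not necessarily what the Michigan team did; the numbers in the usage lines are made up.

```python
import math


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    df: int, n_docs: int, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one query term to one document.

    tf: term frequency in the document; df: number of documents containing the term.
    b = 0 ignores document length entirely; b = 1 normalizes fully by length.
    """
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Same term frequency in a very short record and a very long book.
    for b in (0.75, 0.3, 0.0):
        short = bm25_term_score(3, 100, 50_000, 10, 1_000, b=b)
        long_ = bm25_term_score(3, 100_000, 50_000, 10, 1_000, b=b)
        print(f"b={b}: short={short:.3f} long={long_:.3f}")
```

Lowering `b` shrinks the advantage very short documents get; the right value for full-text books would have to come from testing, such as the click-log plan above.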

n Characters in Search of an Author

  • Jay Luker, IT Specialist, Smithsonian Astrophysics Data System
  • Slides

The goal of a search is to match user input to metadata, e.g. author names.

Building the next generation of the ADS (ADS 2.0). Trying to increase recall without sacrificing precision.

Requirements

  • match across UTF-8, e.g. match ASCII versions to versions with diacritics/markings
  • match more or less information e.g. first name initial but without triggering substring matching
  • need to work with hand curated synonyms e.g. pseudonyms, maiden/married name

Solving the Problem

  • normalization – strip out punctuation and rearrange name parts, based on whether a comma is entered
  • generate name part variations to whatever can be realistically expected
  • transliteration – use index introspection for a list of synonyms
  • expand user queries at each step:
    1. user searches
    2. normalize
    3. name part vars
    4. transliteration
    5. name parts vars of transliterated entries
    6. curated synonyms
    7. transliteration of anything added
    8. name part variations to catch everything
    9. assembled into large boolean query
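The expansion steps above can be sketched end to end in Python. This is not ADS code (their production version lives in Solr/Lucene): "transliteration" here is just diacritic stripping via Unicode decomposition, the name variants are a small assumed set, and the synonym table is hypothetical.

```python
import re
import unicodedata
from typing import Dict, Iterable, Set

# Hypothetical curated synonyms, e.g. pseudonyms or maiden/married names.
SYNONYMS: Dict[str, Set[str]] = {"smith, jane": {"doe, jane"}}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation (keeping commas), 'First Last' -> 'last, first'."""
    name = unicodedata.normalize("NFC", name).strip().lower()
    name = re.sub(r"[^\w\s,]", "", name)
    if "," not in name:  # no comma entered: assume the last token is the surname
        parts = name.split()
        if len(parts) > 1:
            name = parts[-1] + ", " + " ".join(parts[:-1])
    name = re.sub(r"\s*,\s*", ", ", name, count=1)
    return re.sub(r"\s+", " ", name).strip()


def transliterate(name: str) -> str:
    """Fold to ASCII-ish by decomposing characters and dropping combining marks."""
    return "".join(c for c in unicodedata.normalize("NFKD", name) if not unicodedata.combining(c))


def name_variants(name: str) -> Set[str]:
    """'last, first middle' -> full form, all initials, first initial, first name only.
    A bare surname is deliberately not produced, to avoid substring-style matches."""
    if ", " not in name:
        return {name}
    last, given = name.split(", ", 1)
    tokens = given.split()
    if not tokens:
        return {name}
    initials = " ".join(t[0] for t in tokens)
    return {name, f"{last}, {initials}", f"{last}, {tokens[0][0]}", f"{last}, {tokens[0]}"}


def expand(query: str, synonyms: Dict[str, Set[str]] = SYNONYMS) -> Set[str]:
    names = name_variants(normalize(query))                                   # steps 2-3
    names |= {v for n in names for v in name_variants(transliterate(n))}      # steps 4-5
    for n in list(names):                                                     # step 6
        names |= synonyms.get(n, set())
    names |= {transliterate(n) for n in names}                                # step 7
    names |= {v for n in list(names) for v in name_variants(n)}               # step 8
    return names


def to_boolean_query(names: Iterable[str], field: str = "author") -> str:   # step 9
    return " OR ".join(f'{field}:"{n}"' for n in sorted(names))


if __name__ == "__main__":
    print(to_boolean_query(expand("Hans Müller")))
```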

Implementation

  • Python/JavaScript prototype
  • actual – Solr/Lucene

Evolving Towards a Consortium MARCR BIBFRAME Redis Datastore

  • Jeremy Nelson, Colorado College, jeremy.nelson@coloradocollege.edu
  • Sheila Yeh, University of Denver

Presentation Slides

I think this presentation speaks for itself.

Journal Article: Building a Library App Portfolio with Redis and Django

Hybrid Archival Collections Using Blacklight and Hydra

  • Adam Wead, Rock and Roll Hall of Fame and Museum
  • Presentation

The centre of everything is the Solr index. Blacklight puts everything into Solr. Library materials are easy enough, but archival collections use EAD, which describes many items (not just one item, as is typical of MARC).

Extended Blacklight to search EAD

  • index collections and single items from a collection
  • search results include books, entire collections, and items from collections
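A minimal sketch of the flattening this requires: one index document for the collection and one per EAD component, each pointing at its parent so an item can be shown in the context of its series and collection. It assumes un-namespaced EAD 2002 markup and only handles a few component tags; the field names and sample finding aid are made up, and this is not the Rock Hall's code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

Doc = Dict[str, Optional[str]]
COMPONENT_TAGS = {"c", "c01", "c02", "c03", "c04"}  # subset of EAD component elements


def _title(el: ET.Element) -> str:
    t = el.find("did/unittitle")
    return "".join(t.itertext()).strip() if t is not None else ""


def ead_to_docs(xml_text: str, collection_id: str) -> List[Doc]:
    """Flatten a finding aid into Solr-style documents linked by parent_id."""
    archdesc = ET.fromstring(xml_text).find("archdesc")
    if archdesc is None:
        raise ValueError("no <archdesc> found (namespaced EAD is not handled here)")
    docs: List[Doc] = [{"id": collection_id, "title": _title(archdesc),
                        "level": archdesc.get("level", "collection"), "parent_id": None}]

    def walk(parent: ET.Element, parent_id: str) -> None:
        n = 0
        for child in parent:
            if child.tag not in COMPONENT_TAGS:
                continue  # skip <did>, notes, etc.
            n += 1
            doc_id = child.get("id") or f"{parent_id}-{n}"
            docs.append({"id": doc_id, "title": _title(child),
                         "level": child.get("level", "item"), "parent_id": parent_id})
            walk(child, doc_id)

    dsc = archdesc.find("dsc")
    if dsc is not None:
        walk(dsc, collection_id)
    return docs


SAMPLE_EAD = """<ead><archdesc level="collection">
  <did><unittitle>Band Records</unittitle></did>
  <dsc>
    <c01 id="s1" level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c02 level="item"><did><unittitle>Letter, 1975</unittitle></did></c02>
    </c01>
  </dsc>
</archdesc></ead>"""
```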

Digital Content

  • kept in Fedora – objects described using Ruby models
  • use Hydra to manage the content in Fedora – manages RDF relationships
  • indexes into Solr
  • need to relate Fedora content to its archival collection
  • content originates from sources in collection, and part of series
  • collection metadata already exists in Solr
  • create RDF representations of collections
  • Hydra queries Solr for collection metadata
  • creates objects for series, subseries, items

Issues

  • terrible Solr performance for series, 500+ items
  • no EAD “round tripping” – EAD can go into Solr, but not back out
  • currently 60% complete

Citation search in SOLR and second-order operators

  • Roman Chyla, Astrophysics Data System

Sorry, I don’t have notes for this. My brain is a bit fried by this point. Will post link when I get it.

Break Time

Breakout Sessions – reports will be available on the wiki

Next Up – lightning talks

pig taking bath
‘Cause Pig
Author: Cynthia Ng | Posted on February 12, 2013 (updated October 12, 2024) | Categories: Events, Technology | Tags: code4lib, digital collections, metadata, search

Code4Lib Day 1: Morning Notes

Was trying to do too many things this morning, so sorry if the notes are not complete.

ARCHITECTING ScholarSphere: How We Built a Repository App That Doesn’t Feel Like Yet Another Janky Old Repository App

  • Dan Coughlin, Penn State University
  • Mike Giarlo, Penn State University

Presentation Slides

Trying to make it less confusing without exposing what system it’s using.

Simple Metadata Management

  • building metadata widgets
  • required: title, creator, keyword, rights
  • hide most non-required, have ‘more’ link for rest
  • limited to a set number of fields, with tooltips
  • use jQuery autocomplete to suggest authority vocabulary
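The two widget behaviours above (enforcing the required fields and suggesting authority terms) can be sketched as plain functions, for example as the server side a jQuery autocomplete could call. ScholarSphere's actual implementation isn't shown in the talk; the vocabulary and function names here are hypothetical.

```python
from typing import Dict, Iterable, List

REQUIRED_FIELDS = ("title", "creator", "keyword", "rights")  # as listed in the talk


def missing_required(record: Dict[str, str]) -> List[str]:
    """Return required fields that are absent or blank, in a stable order."""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]


def suggest(prefix: str, vocabulary: Iterable[str], limit: int = 5) -> List[str]:
    """Case-insensitive prefix match against an authority vocabulary."""
    p = prefix.strip().lower()
    if not p:
        return []
    return sorted(term for term in vocabulary if term.lower().startswith(p))[:limit]
```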

Dashboard

  • list of uploaded files
  • list of files you have access to

Background Jobs

  • I got lost here during the part about Resque background jobs, sorry
  • has tracebacks for

Permissions Widget

  • set visibility
  • share with specific people

Version Control

  • can restore previous versions

Social Features

  • not in initial requirements
  • profile
  • contributions – “trophies”
  • activity – follow/following

8 months to develop, but spent 2 months just doing usability and responding to feedback.

Available on GitHub.

Pitfall! Working with Legacy Born Digital Materials in Special Collections

  • Donald Mennerich, The New York Public Library
  • Mark A. Matienzo, Yale University Library

Presentation Slides

Disk Images Process

  • process
  • stream – digitized analog magnetic signal
  • sector – stream decoded using algorithm(s)
  • object
  • physical – entirety of device
  • logical

Pitfalls

  • formats mean different things
  • communities of practice use different kinds of container formats
  • no single solution

Quest for Access

  • delivery format
  • what is allowed to be done with the material
  • need usability testing

Pitfalls

  • no ideal single model
  • decisions through the life cycle have an impact on access
  • capacities of institution

Collection

  • faculty papers – 162 floppies
  • goal: “recover” backup into something useful with minimal changes, repeatable process
  • Vito Russo Papers
  • goal: preserve original, describe and arrange, access

Conclusions

  • Time consuming
  • acknowledge researchers
  • need to work on communities of practice

Hacking the DPLA

  • Nate Hill, Chattanooga Public Library, nathanielhill AT gmail.com
  • Sam Klein, Wikipedia

A rally to get involved.

It’s an API, and a community.

Examples

  • Biodiversity Heritage Library
  • Minnesota Digital

Events

  • Digital Public Library of America Appfest
  • Launch at Boston Public Library April 18-19

Documentation and API Creator is on GitHub.

EAD without XSLT: A Practical New Approach to Web-Based Finding Aids

  • Trevor Thornton, New York Public Library

Side note: EAD = Encoded Archival Description, a standard for describing archival collections.

Project Goals

  • enable multiple presentations of the same data
  • support dynamic web apps
  • cross-collection search with component-level specificity in results, and faceting on common access points

Archives Data Management Application

  • system using Ruby on Rails + MySQL + Solr
  • based on existing infrastructure
  • stick with what they know
  • didn’t need to do anything more complex
  • key functionality: data import, search index, API

Core Models

  • collection: collection as we know it, may also be single volume
  • component: some collections at item level, some not
  • description: some data has descriptive attributes
  • access term
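The four core models can be pictured as a small object graph. The NYPL application is Ruby on Rails with MySQL, so this Python version is only an illustration; the field names (`term_type`, `level`, `children`) are assumptions, not their schema. The helper shows why access terms matter for faceting on common access points across a collection.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AccessTerm:
    term: str
    term_type: str  # e.g. "name", "subject", "place" (assumed categories)


@dataclass
class Description:
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. scope note, dates


@dataclass
class Component:
    title: str
    level: str = "file"
    description: Optional[Description] = None
    access_terms: List[AccessTerm] = field(default_factory=list)
    children: List["Component"] = field(default_factory=list)


@dataclass
class Collection:
    title: str
    components: List[Component] = field(default_factory=list)  # may be empty, e.g. a single volume
    access_terms: List[AccessTerm] = field(default_factory=list)

    def all_access_terms(self) -> List[AccessTerm]:
        """Gather access terms from the collection and every nested component."""
        terms = list(self.access_terms)
        stack = list(self.components)
        while stack:
            comp = stack.pop()
            terms.extend(comp.access_terms)
            stack.extend(comp.children)
        return terms
```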

I just felt like I was copying the slides at this point, so I’ll try to get a link to the presentation slides instead.

The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery

  • Michael Klein, Senior Software Developer, Northwestern University Library, michael.klein AT northwestern DOT edu
  • Nathan Rogers, Programmer/Analyst, Indiana University

Demo!

  • can upload from a computer, but also from a shared dropbox
  • forced to enter some metadata

Avalon

  • is a stack
  • media streaming server

Content Processing

  • with Matterhorn
  • workflow pipeline – batch/unattended ingest – uploading one delimited file with names of files that should be related
  • pingbacks for status updates
  • caching of key metadata/images
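Batch ingest from "one delimited file with names of files that should be related" could look roughly like this. The row layout (item title, then its files) is a hypothetical format for illustration, not Avalon's actual manifest.

```python
import csv
import io
from typing import Dict, List


def parse_manifest(text: str) -> Dict[str, List[str]]:
    """Read rows of: item title, file1, file2, ...

    Files on the same row (or on later rows with the same title) are grouped
    into one media object. Blank rows and rows without files are skipped.
    """
    items: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row if c.strip()]
        if len(cells) < 2:
            continue
        title, files = cells[0], cells[1:]
        items.setdefault(title, []).extend(files)
    return items
```

Using a standard CSV reader rather than splitting on commas keeps quoted titles that contain commas intact.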

Stream Security

  • support different types of streaming (for desktop & mobile) and authentication
  • use authentication tokens
  • half of the token is the media ID; another half is added, and the whole thing is the auth token
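A token built from "media ID + another half" might work like the sketch below: the random half comes from Python's `secrets` module, the server remembers issued tokens with an expiry, and a token only unlocks the media it was issued for. The expiry and the in-memory store are assumptions for illustration; the talk didn't describe Avalon's exact scheme.

```python
import secrets
import time
from typing import Dict, Optional


class StreamTokens:
    """Hypothetical scheme: token = '<media_id>:<random half>', stored with an expiry."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._issued: Dict[str, float] = {}  # token -> expiry timestamp

    def issue(self, media_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = f"{media_id}:{secrets.token_urlsafe(16)}"  # urlsafe output never contains ':'
        self._issued[token] = now + self.ttl
        return token

    def is_valid(self, token: str, media_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expiry = self._issued.get(token)
        if expiry is None or now > expiry:
            return False  # never issued, or expired
        token_media, _, _ = token.rpartition(":")
        return token_media == media_id  # the token only works for its own media
```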

Lunch Time

I'm Hungry
‘Nuff Said
Author: Cynthia Ng | Posted on February 12, 2013 (updated October 12, 2024) | Categories: Events, Technology | Tags: code4lib, digital collections, digital repository, metadata

code4libTO December Meetup Talks

BagIt Profiles – @ruebot

  • directory of data
  • bag has what you’re bagging, data, contact email/name, organization information, profile identifier (JSON via a URI)
  • pull in all the field values
  • validate
  • wrote a spec and sent it to the digital curation community
  • can look up profiles in the registry
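Validating a bag against a profile amounts to reading the bag's `bag-info.txt` fields and checking them against the profile's rules. This sketch is modelled loosely on the Bag-Info section of the BagIt Profiles spec and only handles its `required` and `values` keys; the profile contents and identifier URL are hypothetical.

```python
import json
from typing import Dict, List

Info = Dict[str, List[str]]

# Hypothetical profile, normally fetched from the URI named in the bag.
PROFILE = json.loads("""{
  "BagIt-Profile-Info": {"BagIt-Profile-Identifier": "https://example.org/profiles/example.json"},
  "Bag-Info": {
    "Source-Organization": {"required": true},
    "Contact-Email": {"required": true},
    "Bag-Type": {"required": false, "values": ["Archive", "Access"]}
  }
}""")


def parse_bag_info(text: str) -> Info:
    """Parse 'Label: value' lines; indented lines continue the previous value.
    Labels may repeat, so every label maps to a list of values."""
    info: Info = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            info[last][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            continue  # ignore malformed lines in this sketch
        last = label.strip()
        info.setdefault(last, []).append(value.strip())
    return info


def validate(info: Info, profile: dict) -> List[str]:
    """Return human-readable problems; an empty list means the bag-info conforms."""
    errors: List[str] = []
    for label, rules in profile.get("Bag-Info", {}).items():
        values = info.get(label, [])
        if rules.get("required") and not values:
            errors.append(f"missing required field {label}")
        allowed = rules.get("values")
        if allowed:
            errors.extend(f"{label}: {v!r} not in {allowed}" for v in values if v not in allowed)
    return errors
```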

Okay, I got a little lost, but you can see more on GitHub.

Internet Archive Torrent Collections (iaTorrent) – @ruebot

  • see demo

Bookfinder – @TheRealArty & Steven

  • I will write this up later probably as a separate blog post, or maybe journal article

TPL’s Web Services Architecture: Understanding the Big Picture – @waharnum

  • many different systems that don’t easily communicate, and which need specialized knowledge even for basic tasks
  • address the challenges by translation, simplification, standardization
  • Three tiers: Front End Systems (requests to back end) / TPL Web Services (REST) / Back End Systems (responds to front end)
  • Example: TPL Website -> Account Web Services -> Symphony Web Services (Symphony) – and back
  • can add new features and functions
  • helps to solve the challenges mentioned
  • also helps with reusability e.g. in addition to website, build mobile-friendly website, iPhone App
  • Might end up with:
    • Front End (Website, mobile, App)
    • Middle Tier (Account Web Services, ebook Web Services, online payment web services)
    • Back End (symphony, overdrive, payment gateway, accounting systems)
  • other benefits:
    • increase ease of knowledge transfer about how our systems work
    • follow modern best practice approach to building interoperating systems
    • reduce cost and integration time
    • reduce learning time for new staff or consultants
  • metrics: wish we had the resources
  • bolting together a lot of things, not using a lot of custom code
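The middle tier's job (translation, simplification, standardization) can be shown in-process. In TPL's architecture these are REST web services; here the back end is a fake stand-in with an invented record shape (`TITLE`, `DUE_DATE`), not Symphony's API, and all class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class IlsBackend(Protocol):
    def fetch_patron_items(self, barcode: str) -> List[Dict[str, str]]: ...


class FakeIls:
    """Stand-in back end that answers in its own idiosyncratic format."""

    def __init__(self, data: Dict[str, List[Dict[str, str]]]):
        self._data = data

    def fetch_patron_items(self, barcode: str) -> List[Dict[str, str]]:
        return self._data.get(barcode, [])


@dataclass
class Checkout:
    title: str
    due: str  # ISO 8601 date


class AccountService:
    """Middle tier: one clean interface for every front end (website, mobile site, app)."""

    def __init__(self, ils: IlsBackend):
        self._ils = ils

    def checkouts(self, barcode: str) -> List[Checkout]:
        raw = self._ils.fetch_patron_items(barcode)
        return [Checkout(title=r["TITLE"].strip(), due=self._to_iso(r["DUE_DATE"])) for r in raw]

    @staticmethod
    def _to_iso(mmddyyyy: str) -> str:
        month, day, year = mmddyyyy.split("/")
        return f"{year}-{int(month):02d}-{int(day):02d}"
```

Front ends only ever see `Checkout` objects, so replacing the back end means writing another class with `fetch_patron_items`, without touching the website or the app.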

Ladder (aka MyTPL 2) – @mjsuhonos

  • wanted to solve problem: discovery layers suck
  • problems:
    • not scalable
    • inflexible
    • read-only
    • expensive
  • goals:
    • better than open source options (VuFind, Blacklight)
    • cheaper (than proprietary)
    • as scalable as WorldCat
  • design:
    • schema-free/multi-schema (e.g. Dublin Core)
    • horizontally scalable (multi-node)
    • modern OSS components
    • simple data model (RDF)
  • Features:
    • hierarchical relations
    • clustering/de-duplication
    • versioning
    • real-time import & indexing
    • multi-thread/process
    • responsive UI
    • fully multilingual (18/10)
    • dynamic faceting
    • dynamic mapping modification
    • digital content storage (coming soon)
  • built on linked data
  • not a discovery layer; it’s an integration platform

Heritage U of T – @ajmcalorum

  • News Announcement and Promotional Video
  • previously not centralized: hard drives, flickr, etc.
  • need central repository for tri-campus initiative with search & discovery, preservation, long-term access to content and metadata, support for multiple formats (e.g. images, books, documents, video, exhibits)
  • Drupal + Solr (search) + Fedora Commons (collection management, batch ingesting, metadata crosswalk, digital preservation) == Islandora (digital asset management system)
  • pilot: 8 parent collections (by format, by campus)
  • exhibits in Drupal, not through Islandora/Fedora Commons
  • modules: Internet Archive BookReader (OCR on the fly), Galleria, Colorbox
  • official launch: 2 weeks ago

That’s it! Food and drinks time!

Author: Cynthia Ng | Posted on December 13, 2012 (updated October 12, 2024) | Categories: Events, Technology | Tags: code4lib, digital collections, discovery layer, metadata, wayfinding

Cynthia Ng

Technologist, Support Engineer, Librarian, Metadata and Technical Services expert, Educator, Mentor, Web Developer, UXer, Accessibility Advocate, Documentarian
