Code4LibBC 2025: Day 2 lightning talk notes

Notes from the second day of Code4LibBC 2025.

Transforming Unstructured Data in a Knowledge Base: Exploring the Potential of RAG and LLMs at SFU Library

Ian Song, SFU

  • potential of the application of LLMs in the library
  • data challenge: growth has outpaced traditional systems; traditional search is inadequate and struggles with contextual queries
  • 80% of all data is in unstructured formats (text, audio, video, images)
  • unstructured assets include
    • audio (over 2000 digitized cassettes),
    • e-texts (PDF and coded e-text for print disabled),
    • internal documents: policy, instructional, administrative
    • full text: traditional CMS/databases, no deep contextual analysis
  • new opportunities: advances in AI, specifically LLMs, can process and understand human language at scale
  • limitations:
    • knowledge cutoff: trained on historical data
    • hallucinations/inaccuracies: plausible sounding false info
    • privacy/bias: inherited from training datasets
  • RAG:
    • indexing: creating the KB: break docs into chunks, converted to numerical representations (embeddings) and stored in vector db
    • retrieval/generation: answering queries matched (using semantic search) against the vector db to find relevant chunks, fed into LLM to generate a grounded response
    • semantic search/reasoning: factual, auditable, and contextually rich
  • GPT4All open source model
    • privacy/security: full control of data locally, no info leaves the network
    • cost-effective deployment: can run RAG on local machine (CPU, or modest GPU)
    • flexibility/control: choose from many open source LLM options (llama, mistral, etc.), fine-grained control over dev pathway (GUI/CLI) and indexing framework (direct or langchain/llamaindex), controllable chunk size
  • optimizing: pre-processing diverse data
    • challenge 1: digital audio files, transcribe to generate structured format, enhancement: add metadata to improve retrieval
    • challenge 2: complex PDF, OCR, metadata enhancement
    • theses: ideal if each file contains the full text
  • future trend: advanced RAG architectures with knowledge graphs, multimodal, alternative technologies (such as context window expansion)
  • RAG addresses key LLM flaws
  • local deployment is critical
  • data quality is paramount: pre-processing key to high-performance RAG
  • next steps: pilot on smaller defined transcribed audio, develop user interface, explore integration with internal knowledge graph
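
The indexing and retrieval steps in the RAG notes above can be sketched in plain Python. This is a minimal illustration, not the setup shown in the talk: the word-count `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and all function names are made up for this example. The final prompt would be handed to whichever local LLM is in use.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(docs, chunk_size=8):
    """Indexing: chunk every document and store (chunk, vector) pairs."""
    index = []
    for doc in docs:
        for chunk in chunk_text(doc, chunk_size):
            index.append((chunk, embed(chunk)))
    return index


def retrieve(index, query, k=2):
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Generation input: ground the question in the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Chunk size is exposed as a parameter because the notes mention it as one of the controllable settings; smaller chunks give more precise matches but less context per match.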

Consent Not Required: (AI) Technology as Connection

Coco Chen, SFU & Rebecca Ardron, Alexander College

  • lack of consent, understanding, awareness of data collection: epistemic injustice
  • teaching kids in the era of AI: decline in intergenerational bonding, inherited skills, increase in transactional interactions
  • products that are marketed for sale, but are collecting data
  • reduced boundaries: preferred AI in search due to positive bias, human relationship communication, unblurring images, distancing from connections

Building AI Literacy in the Public Library

Jaclyn Fong, West Vancouver Memorial Library

  • research AI programs offered in public libraries across BC and Canada
  • 2024-Oct co-op student developed a 2-part AI course: part 1 more intro/lecture style, part 2 more hands-on
  • 2025-Jan ran first time
  • 2025-Summer co-op student added class about writing prompts to make 3-part course
  • 2025-Fall ran two more times, develop class on AI privacy
  • Understanding, Exploring, Talking (Prompt Creation for Beginners), adding privacy
  • full attendance: more than 90 participants
  • also Tech Talks with guest speakers
  • what’s next: recommended AI resource page on website, more AI-theme tech talks

Defending Library Services Against AI Scrapers

Scott Leslie, BC Libraries Co-op

  • unwanted traffic has always been an issue: web scrapers, malicious
  • robots.txt was introduced, which worked fairly well for years
  • now have AI trainers that are ignoring robots.txt
  • used to be infrequent enough that could do IP banning
  • needed something more automated
  • landed on CrowdSec
  • crowd-sourced IP ranges known to be harvesters or bad actors; also learns from your log files
  • can pay to get better curated list
  • not the only approach: Cloudflare (may still allow AI through if paid), geo-location/blanket range IP blocking
  • limit ports
  • consideration: can the list be challenged/adjusted
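
The two ideas in the notes above (blocking known IP ranges, and learning from your own logs) can be illustrated with the standard library. This is only a sketch of the general technique and says nothing about how CrowdSec works internally; the blocklist uses reserved documentation ranges, and the function names and threshold are assumptions for the example.

```python
import ipaddress
from collections import Counter

# Hypothetical blocklist; these are reserved documentation ranges, not real offenders.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip, blocked=BLOCKED_RANGES):
    """Return True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked)


def noisy_clients(request_ips, threshold):
    """Return IPs that appear more than threshold times in a list of logged requests."""
    counts = Counter(request_ips)
    return sorted(ip for ip, n in counts.items() if n > threshold)
```

Keeping the blocklist as plain data also speaks to the last consideration above: a list that can be inspected is a list that can be challenged or adjusted.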

Break

Time to look for a snack.

[Image: squirrel with head inside a camera]

Working with APIs: Flows and Runner in Postman

Olga Kalachinskaya, Douglas College

  • automatically running multiple API calls
  • options: programming languages, MS Power Automate (works with Excel), API testing/automation tools (like Postman, Insomnia, etc.)
  • using it with Folio
  • use case 1: due to a bug, Course Listing records in Reserves were not auto-deleted when a course was deleted; internal records that become noise in the system
  • use case 2: tech services were preparing to load new Authority records from Backstage and, prior to that, needed to delete records from Folio
  • options:
    • ask vendor to do it (free, easy, quick)
    • use APIs to do it myself
  • decided to use Postman, with free version
  • flows provide drag-and-drop interface for building API workflows to chain multiple requests
  • runner to create collections of API requests and execute them in sequence/parallel
  • can import CSV or JSON files
  • logging
  • limit access token permissions
  • resources: Folio (tickets, community), Postman docs
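
For comparison with the "programming languages" option mentioned above, here is a rough Python equivalent of a runner fed by a CSV file: one DELETE request per row, executed in sequence with a log. It is a sketch under assumptions, not the workflow from the talk: the base URL, the endpoint path, and the `Authorization` header are placeholders (not real Folio values), and the `send` function is injected so nothing touches a live system until you choose to.

```python
import csv
import io
import urllib.request

BASE_URL = "https://folio.example.org"  # hypothetical host
ENDPOINT = "/example-records"           # hypothetical path, not a real Folio endpoint


def build_delete_requests(csv_text, token, id_column="id"):
    """Build one DELETE request per CSV row, skipping rows with an empty ID."""
    requests = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record_id = row[id_column].strip()
        if not record_id:
            continue
        req = urllib.request.Request(
            f"{BASE_URL}{ENDPOINT}/{record_id}",
            method="DELETE",
            # Header name is an assumption; check the API docs for the real one.
            headers={"Authorization": f"Bearer {token}"},
        )
        requests.append(req)
    return requests


def run(requests, send):
    """Execute requests in sequence and log (url, outcome) for each."""
    log = []
    for req in requests:
        try:
            outcome = send(req)
        except Exception as exc:  # keep going so one failure does not stop the batch
            outcome = f"error: {exc}"
        log.append((req.full_url, outcome))
    return log
```

A dry run can pass `send=lambda req: "skipped"` to review the URLs first; a real run would pass a function that opens the request and returns the response status.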

Bulk DOI generation in DSpace with the Super-Duper-App!

Daniel Sifton, VIU

  • hosted DSpace VIURR for IR
  • about 18% of items (roughly 5500) did not have a DOI
  • option 1: can manually edit the records
  • option 2: extract dspace metadata through csv/api, create payload.json, POST, insert DOI in dspace metadata through csv/api
  • searched for scripts online: 3+ and 4+ python scripts, php web app
  • need mechanism: export metadata from dspace > transform to datacite metadata > generate bulk DOI > merge DOI export with dspace metadata > import back to dspace
  • csv merger (select data and merge to new datacite DOI import) > datacite-bulk-doi-creator (from CSV file) > csv merger (merge source field from datacite DOI creator to dspace export/import file)
  • used Flet: Python GUI framework, to get a web app
  • process now through super-duper-app
  • mapping DC to datacite, had to account for variation such as uri, uri[], uri[en]
  • needs improvement: field names logic, secure pasting, source URL match points, year/unknown, duplicate downloads
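
The mapping and merge steps above can be sketched with Python's csv module. This is an illustration only, not code from the super-duper-app: the field mapping, the `dc.identifier.doi` output column, and the function names are assumptions, and the example handles the column-name variation (uri, uri[], uri[en]) by stripping the trailing bracket qualifier before looking up the mapping.

```python
import csv
import io
import re

# Hypothetical mapping from DSpace Dublin Core columns to DataCite-style names.
DC_TO_DATACITE = {
    "dc.title": "title",
    "dc.identifier.uri": "url",
}


def base_field(column):
    """Strip a trailing qualifier: 'dc.title[en]' and 'dc.title[]' become 'dc.title'."""
    return re.sub(r"\[[^\]]*\]$", "", column.strip())


def to_datacite_rows(dspace_csv):
    """Transform a DSpace metadata export into rows keyed by DataCite-style names."""
    rows = []
    for row in csv.DictReader(io.StringIO(dspace_csv)):
        out = {"id": row["id"]}
        for column, value in row.items():
            target = DC_TO_DATACITE.get(base_field(column))
            # Keep the first non-empty value when several variants map to one field.
            if target and value and target not in out:
                out[target] = value
        rows.append(out)
    return rows


def merge_dois(rows, doi_by_url, doi_column="dc.identifier.doi"):
    """Match minted DOIs back to items by source URL, ready for re-import."""
    merged = []
    for row in rows:
        doi = doi_by_url.get(row.get("url", ""))
        if doi:
            merged.append({"id": row["id"], doi_column: doi})
    return merged
```

Matching on the source URL mirrors the "source URL match points" item in the notes; items with no matching DOI are simply left out of the import file.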

Building an assignment planner on Playlab

Joyce Wong, Langara College

  • typical assignment planner: formulaic, dated, no customizations, one way output; no context, just gives a list of tasks
  • Playlab: non-profit AI platform for educators and students, build AI apps but can select to avoid “answer generation”
  • POP
    • Persona: learning strategist
    • Objective: create a schedule
    • Parameters: offer Langara library and writing support, positive tone, no answer generation
  • app takes type of assignment, start date, due date
  • if less than 5 days, prompts to
  • will ask if research required, other specific requirements
  • similar to reference interview with student
  • will ask if certain dates can’t work on assignment
  • revises schedule based on answers
  • can have a conversation
  • workflow can include specific steps
  • guidelines and guardrails
  • can choose different AI models, variability (20%)
  • build tools that students can actually use
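
As a rough illustration of the POP structure and the date inputs described above, the sketch below assembles a Persona/Objective/Parameters prompt and counts the days a student can actually work. It is plain Python with made-up function names, not how Playlab builds or runs apps; the values come from the notes above.

```python
from datetime import date, timedelta


def build_pop_prompt(persona, objective, parameters):
    """Assemble a Persona / Objective / Parameters prompt as plain text."""
    lines = [f"Persona: {persona}", f"Objective: {objective}", "Parameters:"]
    lines += [f"- {p}" for p in parameters]
    return "\n".join(lines)


def available_days(start, due, unavailable=()):
    """List the dates from start to due (inclusive) that are not marked unavailable."""
    blocked = set(unavailable)
    days = []
    current = start
    while current <= due:
        if current not in blocked:
            days.append(current)
        current += timedelta(days=1)
    return days


prompt = build_pop_prompt(
    "learning strategist",
    "create a schedule",
    ["offer Langara library and writing support", "positive tone", "no answer generation"],
)
```

The number of available days is the kind of fact the app would need before proposing or revising a schedule, for example when a student says certain dates are off limits.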

End

That’s all the talks. See you next time!

[Image: cat looking out of a box]