Code4LibBC 2025: Day 2 lightning talk notes

Notes from the second day of Code4LibBC 2025.

Transforming Unstructured Data in a Knowledge Base: Exploring the Potential of RAG and LLMs at SFU Library

Ian Song, SFU

potential of the application of LLMs in the library
data challenge: growth has outpaced traditional systems, traditional search is inadequate and struggles with contextual queries
80% of all data is in unstructured formats (text, audio, video, images)
unstructured assets include
- audio (over 2000 digitized cassettes),
- e-texts (PDF and coded e-text for print disabled),
- internal documents: policy, instructional, administrative
- full text: traditional CMS/databases, no deep contextual analysis
new opportunities: advances in AI, specifically LLMs, can process and understand human language at scale
limitations:
- knowledge cutoff: trained on historical data
- hallucinations/inaccuracies: plausible sounding false info
- privacy/bias: inherited from training datasets
RAG:
- indexing: creating the KB: break docs into chunks, converted to numerical representations (embeddings) and stored in vector db
- retrieval/generation: answering queries matched (using semantic search) against the vector db to find relevant chunks, fed into LLM to generate a grounded response
- semantic search/reasoning: factual, auditable, and contextually rich
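The indexing and retrieval steps above can be sketched in a few lines. This is only a toy illustration, not the SFU implementation: the `embed` function is a bag-of-words counter standing in for a real embedding model (so matching is lexical, not truly semantic), a plain Python list stands in for the vector db, and all function names are made up for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Split a document into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs, chunk_size=8):
    """Indexing: chunk every document and store (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for doc in docs for chunk in chunk_text(doc, chunk_size)]


def retrieve(index, query, k=2):
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Generation input: the retrieved chunks become the grounding context for the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a real setup the prompt returned by `build_prompt` would be passed to the LLM; `chunk_size` here is the kind of knob the "controllable chunk size" note refers to.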
GPT4All open source model
- privacy/security: full control of data locally, no info leaves the network
- cost-effective deployment: can run RAG on local machine (CPU, or modest GPU)
- flexibility/control: choose from many open source LLM options (llama, mistral, etc.), fine-grained control over dev pathway (GUI/CLI) and indexing framework (direct or langchain/llamaindex), controllable chunk size
optimizing: pre-processing diverse data
- challenge 1: digital audio files, transcribe to generate structured format, enhancement: add metadata to improve retrieval
- challenge 2: complex PDF, OCR, metadata enhancement
- theses: if each file contains full text, would be ideal
future trend: advanced RAG architectures with knowledge graphs, multimodal, alternative technologies (such as context window expansion)
RAG addresses key LLM flaws
local deployment is critical
data quality is paramount: pre-processing key to high-performance RAG
next steps: pilot on smaller defined transcribed audio, develop user interface, explore integration with internal knowledge graph

Consent Not Required: (AI) Technology as Connection

Coco Chen, SFU & Rebecca Ardron, Alexander College

lack of consent, understanding, awareness of data collection: epistemic injustice
teaching kids in the era of AI: decline in intergenerational bonding, inherited skills, increase in transactional interactions
products that are marketed to sell, but collecting data
reduced boundaries: preferred AI in search due to positive bias, human relationship communication, unblurring images, distancing from connections

Building AI Literacy in the Public Library

Jaclyn Fong, West Vancouver Memorial Library

research AI programs offered in public libraries across BC and Canada
2024-Oct co-op student developed a 2-part AI course: part 1 more intro/lecture style, part 2 more hands-on
2025-Jan ran first time
2025-Summer co-op student added class about writing prompts to make 3-part course
2025-Fall ran two more times, develop class on AI privacy
Understanding, Exploring, Talking (Prompt Creation for Beginners), adding privacy
full attendance: more than 90 participants
also Tech Talks with guest speakers
what’s next: recommended AI resource page on website, more AI-theme tech talks

Defending Library Services Against AI Scrapers

Scott Leslie, BC Libraries Co-op

unwanted traffic has always been an issue: web scrapers, malicious
robots.txt was introduced, which worked fairly well for years
now have AI trainers that are ignoring robots.txt
used to be infrequent enough that could do IP banning
needed something more automated
landed on CrowdSec
crowd-sourced IP ranges known to be harvesters or bad actors; also learns from your log files
can pay to get better curated list
not the only approach: CloudFlare (may still allow AI through if paid), geo-location/blanket range IP blocking
limit ports
consideration: can the list be challenged/adjusted
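The range-based blocking idea can be sketched with Python's standard `ipaddress` module. This is not how CrowdSec itself works; it only illustrates checking a client address against a list of CIDR ranges, with an allowlist as one possible way to "adjust" a shared list. The function names and the ranges (reserved documentation addresses) are made up for the example.

```python
import ipaddress


def load_networks(cidrs):
    """Parse CIDR strings such as '203.0.113.0/24' into network objects."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def is_blocked(ip, blocklist, allowlist=()):
    """Return True if ip falls in a blocked range and is not explicitly allowed."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in allowlist):
        return False  # local override of the shared list
    return any(addr in net for net in blocklist)


# Example ranges only (reserved documentation blocks, not real harvesters).
blocklist = load_networks(["203.0.113.0/24", "198.51.100.0/24"])
allowlist = load_networks(["203.0.113.42/32"])
```

With these lists, `is_blocked("203.0.113.7", blocklist, allowlist)` is true, while the allowlisted `203.0.113.42` gets through.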

Break

Time to look for a snack.

Working with APIs: Flows and Runner in Postman

Olga Kalachinskaya, Douglas College

automatically running multiple API calls
options: programming languages, MS Power Automate (works with Excel), API testing/automation tools (like Postman, Insomnia, etc.)
using it with Folio
use case 1: due to a bug, Course Listing records in Reserves were not auto deleted when a course was deleted; internal records that become noise in the system
use case 2: tech services were preparing to load new Authority records from Backstage and, prior to that, needed to delete existing records from Folio
options:
- ask vendor to do it (free, easy, quick)
- use APIs to do it myself
decided to use Postman, with free version
flows provide drag-and-drop interface for building API workflows to chain multiple requests
runner to create collections of API requests and execute in sequence/parallel
can import CSV or JSON files
logging
limit access token permissions
resources: Folio (tickets, community), Postman docs
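For comparison, the "programming languages" option for the same kind of job (one API call per row of a CSV file, run in sequence, with logging) might look roughly like the sketch below. It is not the Postman workflow from the talk; the URL template, column name, and function names are hypothetical, and the default is a dry run that sends nothing.

```python
import csv
import io
import urllib.request


def build_requests(csv_text, url_template, headers, method="DELETE"):
    """Build one request per CSV row, filling the URL template from the row's columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        urllib.request.Request(url_template.format(**row), headers=headers, method=method)
        for row in reader
    ]


def run(requests, send=None, log=print):
    """Process requests in sequence and log each one; send=None means dry run."""
    results = []
    for req in requests:
        status = "dry-run" if send is None else send(req)
        log(f"{req.get_method()} {req.full_url} -> {status}")
        results.append((req.full_url, status))
    return results


# Hypothetical endpoint and IDs, for illustration only.
reqs = build_requests(
    "id\nabc\ndef\n",
    "https://folio.example.org/hypothetical-records/{id}",
    {"Accept": "application/json"},
)
```

Calling `run(reqs)` only logs what would be sent. To actually send, a caller could pass something like `send=lambda r: urllib.request.urlopen(r).status`, with the auth headers the target API expects.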

Bulk DOI generation in DSpace with the Super-Duper-App!

Daniel Sifton, VIU

hosted DSpace VIURR for IR
about 18% or 5500 did not have DOI
option 1: can manually edit the records
option 2: extract dspace metadata through csv/api, create payload.json, POST, insert DOI in dspace metadata through csv/api
searched for scripts online: 3+ and 4+ python scripts, php web app
need mechanism: export metadata from dspace > transform to datacite metadata > generate bulk DOI > merge DOI export with dspace metadata > import back to dspace
csv merger (select data and merge to new datacite DOI import) > datacite-bulk-doi-creator (from CSV file) > csv merger (merge source field from datacite DOI creator to dspace export/import file)
used Flet: Python GUI framework, to get a web app
process now through super-duper-app
mapping DC to datacite, had to account for variation such as uri, uri[], uri[en]
needs improvement: field names logic, secure pasting, source URL match points, year/unknown, duplicate downloads
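The column-variation problem (uri, uri[], uri[en]) can be handled by stripping the bracketed suffix before mapping. The sketch below is not the code from the super-duper-app; the mapping table, target field names, and function names are illustrative assumptions.

```python
import re

# Matches a trailing language qualifier such as "[]" or "[en]".
_LANG_SUFFIX = re.compile(r"\[[^\]]*\]$")


def base_field(column):
    """Strip the trailing qualifier: 'dc.identifier.uri[en]' -> 'dc.identifier.uri'."""
    return _LANG_SUFFIX.sub("", column.strip())


def collapse_variants(row):
    """Merge columns that differ only by qualifier, keeping non-empty values."""
    merged = {}
    for column, value in row.items():
        if value and value.strip():
            merged.setdefault(base_field(column), []).append(value.strip())
    return merged


# Illustrative mapping only; a real one would cover every required DataCite field.
DC_TO_DATACITE = {
    "dc.title": "title",
    "dc.identifier.uri": "url",
    "dc.date.issued": "publicationYear",
}


def to_datacite(row):
    """Map one exported row to a flat dict of DataCite-style fields (first value wins)."""
    merged = collapse_variants(row)
    return {target: merged[src][0] for src, target in DC_TO_DATACITE.items() if src in merged}
```

A row with an empty `dc.identifier.uri` column and a filled `dc.identifier.uri[]` column then yields a single `url` value.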

Building an assignment planner on Playlab

Joyce Wong, Langara College

typical assignment planner: formulaic, dated, no customizations, one way output; no context, just gives a list of tasks
Playlab: non-profit AI platform for educators and students, build AI apps but can select to avoid “answer generation”
POP
- Persona: learning strategist
- Objective: create a schedule
- Parameters: offer Langara library and writing support, positive tone, no answer generation
app takes type of assignment, start date, due date
if less than 5 days, prompts to
will ask if research required, other specific requirements
similar to reference interview with student
will ask if certain dates can’t work on assignment
revises schedule based on answers
can have a conversation
workflow can include specific steps
guidelines and guardrails
can choose different AI models, variability (20%)
build tools that students can actually use
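The scheduling part of such a planner (start date, due date, dates the student can't work) can be approximated without an LLM. The sketch below is not how the Playlab app works; it is a simple, deterministic stand-in with made-up function names. The 5-day threshold comes from the notes above, but measuring it from start date to due date is an assumption.

```python
from datetime import date, timedelta


def is_short_timeline(start, due, threshold_days=5):
    """True when there are fewer than threshold_days between start and due."""
    return (due - start).days < threshold_days


def plan(steps, start, due, unavailable=()):
    """Spread steps evenly over the available days from start to due (inclusive)."""
    blocked = set(unavailable)
    days = [start + timedelta(days=i) for i in range((due - start).days + 1)]
    days = [d for d in days if d not in blocked]
    if not days:
        raise ValueError("no available days between start and due date")
    schedule = []
    for i, step in enumerate(steps):
        # Place step i at a proportional position in the list of available days.
        idx = min(len(days) - 1, (i * len(days)) // len(steps))
        schedule.append((days[idx], step))
    return schedule
```

Revising the schedule after the student says a date is unavailable is just a second call to `plan` with that date added to `unavailable`.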

End

That’s all the talks. See you next time!