Digital Odyssey 2013: Big Data, Small World Notes & Takeaways

Big Data

  • 90% of the world’s data was created in the last 2 years
  • can tell us much that other information cannot
  • emphasize the need for analysis and interpretation
  • your data is mined and used to make decisions for you, even more so in the future
  • to prepare, know that big data will affect data management, discovery tools, new jobs, revised skills requirements, and revised infrastructures
  • business success will come down to who has the most data and knows how best to use it

Data Visualization

Patrick Cain

  • examples: homicide map, number of registered organ donors, bedbugs, drunk driving, crowdsourced neighbourhood maps
  • providing information on what is otherwise a closed database
  • problem is analyzing the data and drawing conclusions
  • made people look at their community/neighbourhood in a different way
  • more examples: includes information on free visualization tools
  • a lot of public documents are behind a wall
  • finding the balance between potential harm and making data open
  • the data is imperfect, but helps to tell a story
  • have information on where the data comes from
  • raw data gives you more power, can ask your own questions, but still have the difficulty of interpreting the data

The Changing Face of Toronto Public Library’s Data

Alan Harnum

  • patrons do the usual: borrow, use ref, ask questions, attend programs, use computers, wireless, visit the website, etc.
  • end up with a lot of transactional data, but it’s not all one big system, and not always available right away, in the form we want it, or at all
  • conceptualization and modelling of data
  • usage patterns are changing: digital borrowing is increasing, data needs are changing, policy needs are changing
  • understanding data relationships: what influences what e.g. How many people use the library only for wireless?
  • need for real-time data to improve service planning, responsiveness to inquiries, decision making
  • policy implications
    • privacy – need to balance wanting to collect data with protecting customer privacy (datasets can be deanonymized)
    • service delivery – robust data, but without losing what patrons like
    • evaluation and measurement – how to use effectively while remembering that data is not the only decision making tool
  • most important to remember: not everything that counts is counted, and not everything that is counted counts
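
The wireless-only question above can be framed as a set operation over per-patron transaction records. This is a minimal sketch, assuming a hypothetical list of (patron_id, service) pairs. The field names, service labels, and sample data are illustrative and not taken from Toronto Public Library's systems:

```python
from collections import defaultdict


def patrons_using_only(transactions, service):
    """Return the set of patron IDs whose every transaction is for `service`."""
    services_by_patron = defaultdict(set)
    for patron_id, used in transactions:
        services_by_patron[patron_id].add(used)
    # A patron qualifies only if the one service they ever used is `service`.
    return {p for p, used in services_by_patron.items() if used == {service}}


# Hypothetical sample data, invented for illustration.
transactions = [
    ("p1", "wireless"),
    ("p1", "borrow"),
    ("p2", "wireless"),
    ("p2", "wireless"),
    ("p3", "programs"),
]
wireless_only = patrons_using_only(transactions, "wireless")
```

Answering the question this way means linking transactions from separate systems by a patron identifier, which is the kind of linkage the privacy point above asks us to weigh.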

Lunch Time

Sleepy Time?

Lightning Talks

Big Data in Libraries

MJ Suhonos

  • not actually as big in libraries: LoC: 1.9 million, Europeana: 20 million
  • big data varies depending on the capacity
  • we think of it as really complicated, but it is not actually that complex
  • big data = cumbersome, out of our reach
  • we don’t have to use the old tools, there are new ones
  • have new opportunities
  • cloud is not a magic bullet, just another tool – can do it in a more flexible way
  • less about size and more about freedom and new opportunities because we didn’t have the tools in the past
  • increasing the capacity around you
  • can increase the discoverability of the long tail
  • improving tools a little bit, all over the place, to solve problems we couldn’t before
  • linked data is metadata infrastructure
  • open data is policy infrastructure
  • “The cloud is a lie”

Engagement and Impact of Twitter by Canadian Libraries

Angela Hamilton & Sarah Forbes

  • found mostly analytic tools for marketers and for profit companies
  • used tool to pull tweets: at 38k
  • looking at the content, so no way to automate
  • coding considerations: retweets, mentions, content type, tone, hashtags, links and media
  • should have double coded to be more accurate
  • should have gotten more help
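
Judging tone and content type needed a human coder, but some of the structural items in the coding list (retweets, mentions, hashtags, links) can in principle be pre-filled from the tweet text. This is a rough sketch under that assumption, not the presenters' method. The regular expressions are simplified, and the sample tweet and its handle are made up:

```python
import re

# Simplified patterns; real tweets have edge cases these do not cover
# (for example, an email address would be picked up as a mention).
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
LINK_RE = re.compile(r"https?://\S+")


def structural_features(text):
    """Extract the mechanically detectable coding fields from one tweet."""
    return {
        "is_retweet": text.startswith("RT @"),
        "mentions": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "links": LINK_RE.findall(text),
    }


# Hypothetical tweet, invented for illustration.
sample = "RT @examplelibrary: New hours posted #libraries http://example.org/hours"
features = structural_features(sample)
```

The human coders would then only need to fill in the judgment fields, such as tone and content type.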

On Dentographs

William Denton

  • DDC in checkerboard to visualize depth and breadth of the collection
  • particularly good for collection comparisons
  • LC in mountain version
  • could do animation of how it changes year to year or day to day
  • if using internal data, could do it based on circulation or holds
  • by doing visualization, know what to do next time
  • going from one medium to another, can extend
  • Presentation Write up
  • Code4Lib Journal Article
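
The data behind a checkerboard-style view can be approximated by counting holdings per Dewey class. The sketch below bins call numbers into a 10x10 grid by hundreds and tens digit. That binning is an assumption made for illustration and may not match the layout used in the actual dentographs, and the call numbers are invented:

```python
import re


def ddc_grid(call_numbers):
    """Count holdings in a 10x10 grid: row = DDC hundreds digit, column = tens digit."""
    grid = [[0] * 10 for _ in range(10)]
    for call_number in call_numbers:
        match = re.match(r"\s*(\d)(\d)\d", call_number)
        if match is None:
            continue  # skip anything that does not start with a three-digit class
        grid[int(match.group(1))][int(match.group(2))] += 1
    return grid


# Hypothetical call numbers, invented for illustration.
holdings = ["025.04 DEN", "025.3 SMI", "510 ABC", "519.5 XYZ", "FIC ATW"]
grid = ddc_grid(holdings)
```

Each cell count could then be mapped to a colour or height. Swapping the holdings list for circulation or holds records would give the usage-based variant mentioned above.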

Open Data Policy in Canada

Tracey Lauriault

examples: Open North, Hacking Health, Treaty Process, Residential School Map, municipal quality of life, AAAS remote sensing

where to get data

  • Federal data: Research Data Canada, Canadian International Development Agency
  • Provinces: Ontario, Alberta, Quebec, BC, Saskatchewan
  • Cities: 36 cities right now
  • Community Data Portal

data advocacy

  • Community Data Canada
  • Canadian Council on Social Development
  • Data Liberation Initiative

data policy

  • not a lot of funding that requires data management
  • Canadian Institutes of Health Research – encouraged, but not policy
  • Open Government Resolution by the Office of the Information Commissioner of Canada – unfortunately doesn’t have a lot of power
  • GeoConnections policy primers and guidelines
  • Open Government Partnership – but only federal
  • CIPPIC – does everything in Canada
  • should make public anything that is publicly funded

Privacy by Design: Big Privacy for Big Data

Michelle Chibba

  • Ontario’s Information and Privacy Commissioner
  • philosophy: consultation, cooperation, and collaboration
  • confidentiality not the same as privacy
  • privacy is all about the individual, and individual rights
  • if unique, persistent, and linked to individual then it’s personally identifiable information (PII)
  • people must be able to trust that organizations will manage their information properly
  • forget the content, the metadata is what identifies the person
  • most importantly, the information is persistent
  • if any digitized information can be intercepted and recreated into understandable information, then it is a record
  • actually fairly easy to deanonymize information based on a few data points
  • good data security is not the same thing as privacy
  • most privacy breaches remain unknown
  • need privacy by design
    • proactive
    • default
    • embedded
    • full functionality
    • security
    • visible and transparent
  • can de-identify data using proper techniques based on the objectives/needs: Dispelling the myths surrounding de-identification
  • data comanagement: accountability, minimization (collect little, use central registry), security, access
  • UI design concepts tied to transparency and trust focused on context, awareness, discoverability, comprehension
  • big data touching on privacy, example: connect un/structured data by casinos: card counter, relative of employee, or other relationships with the casino
  • need to be able to use both unstructured and structured data to connect the dots
  • features for next-generation sensemaking systems: full attribution, data tethering, analytics on anonymized data, tamper-resistant audit logs, false negative favouring methods, self-correcting false positives, information transfer accounting
  • your identity is your most valuable possession
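
The point about deanonymizing from a few data points can be illustrated by measuring how many records become unique once a handful of fields are combined. This is a minimal sketch, assuming made-up records and field names. It is a simple uniqueness count, not a formal de-identification method:

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are the only one with their combination of values."""
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


# Hypothetical records with no names attached, invented for illustration.
records = [
    {"postal_prefix": "M5V", "birth_year": 1980, "gender": "F"},
    {"postal_prefix": "M5V", "birth_year": 1980, "gender": "M"},
    {"postal_prefix": "M5V", "birth_year": 1975, "gender": "F"},
    {"postal_prefix": "M4C", "birth_year": 1980, "gender": "F"},
]
one_field = unique_fraction(records, ["postal_prefix"])
all_fields = unique_fraction(records, ["postal_prefix", "birth_year", "gender"])
```

In this toy dataset a quarter of the records are unique on postal prefix alone, and every record is unique once all three fields are combined, even though no name appears anywhere.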

Summary of Talk

via @mjsuhonos

Author: Cynthia

Technologist, Librarian, Metadata and Technical Services expert, Educator, Mentor, Web Developer, UXer, Accessibility Advocate, Documentarian
