Code4lib Day 1: Lightning Talks Notes

Al Cornish – XTF in 300 seconds (Slides in PDF)

  • technology developed and maintained by California Digital Library
  • supports the search/display of digital collections (images, PDFs, etc)
  • fully open source platform, based on Apache Lucene search toolkit
  • Java framework, runs in Tomcat or Jetty servlet engine
  • extensive customization possible through XSLT programming
  • user and developer group communication through Google Groups
  • search interface running on Solr with facets
  • can output in RSS
  • has a debug mode

Makoto Okamoto – saveMLAK (English)

  • Aid activities for the Great East Japan Earthquake through collaboration via wiki
  • input from museum, library, archive, kominkan = MLAK
  • 20,000 data of damaged area
  • Information about places, damages, and relief support
  • Key Lessons
    • build synergy with twitter
    • have offline meet ups & training

Andrew Nagy – Vendors Suck

  • vendors aren’t really that bad
  • used to think vendors suck, and that they don’t know how to solve libraries’ problems
  • but working for a vendor allows to make a greater impact on higher education, more so than from one university (he started to work for SerialsSolution)
  • libraries’ problems aren’t really that unique
  • together with the vendor, a difference can be made
  • call your vendors and talk to the product managers
  • if they blow you off, you’ve selected the wrong vendor
  • sometimes vendor solutions can provide a better fit

Andreas Orphanides – Heat maps

The library needed grad students to teach instructional sessions, but how to set schedule when classes have a very inflexible schedule? So, he used the data of 2 semesters of instructional sessions using date and start time, but there were inconsistent start times and duration. The question is how best to visualize the data.

  • heatmap package from clickheat
  • time of day – x-dimension
  • day of the week – y-dimension
  • could see patterns in way that you can’t in histogram or bar graph
  • heat map needn’t be spatial
  • heat maps can compare histogram-like data along a single dimension or scatter-like plot data to look for high density areas

Gabriel Farrell – ElasticSearch

Nettie Lagace from NISO

  • National Information Standards Organization (NISO)
  • work internationally
  • want to know: What environment or conditions are needed to identify and solve the problem of interoperability problems?

Eric Larson – Finding images in book page images

A lot of free books exist out there, but you can’t have the time to read them all. What if you just wanted to look at the images? Because a lot of books have great images.

He used curl to pull all those images out, then use imagemagick to manage the images. The processing steps:

  1. Convert to greyscale
  2. Contrast boost x8
  3. Covert image to 1px by height
  4. Sharpen image
  5. Heavy-handed grayscaling
  6. Convert to text
  7. Look for long continuous line of black to pull pages with images

Code is on github

Adam Wead – Blacklight at the Rock Hall

  • went live, soft launch about a month ago
  • broken down to the item level
  • find bugs he doesn’t know about for a beer!

Kelley McGrath – Finding Movies with FRBR & Facets

  • users are looking for movies, either particular movie or genre/topic
  • libraries describe publications e.g. date by DVD, not by movie
  • users care about versions e.g. Blu-Ray, language
  • Try the prototyped catalog
  • Hit list provides one result per movie, can filter by different facets

Bohyun Kim – Web Usability in terms of words

  • don’t over rely on the context
  • but context is still necessary for understanding e.g. “mobile” – means on the go, what they want on the go
  • sometimes there is no better term e.g. “Interlibrary Loan”
  • brevity will cost you “tour” vs. “online tour”
  • Time ran out, but check out the rest of the slides

Simon Spero – Restriction Classes, Bitches

OWL:

  • lets you define properties
  • control what the property can apply to
  • control the values the property can take
  • provides an easy way to do this
  • provides a really confusing way to do this

The easy way is usually wrong!

When defining what can apply to and the range, this applies to every use of the property. An alternative is Attempto.

Cynthia Ng – Processing & ProcessingJS

  • Processing: open source visual programming language
  • Processing.js: related project to make processing available through web browsers without plugins
  • While both tend to focus on data visualizations, digital art, and (in the case of PJS) games, there are educational oriented applications.
  • Examples:
    • Kanji Compositing – allows visual breakdown of Japanese kanji characters, interact with parts, and see children.
    • Primer on Bezier Curves – scroll down to see interactive (i.e. if you move points, replots on the fly) and animated graphs.
  • Obvious use might be instructional materials, but how might we apply it in this context? What other applications might we think of in the information organization world?

Since doing the presentation, I have already gotten one response by Dan Chudnov who did a quick re-rendering of newspaper data from OCR data. Still thinking on (best) use in libraries and other information organizations.

It’s over for today, but if you’d like more, do remember that there is a livestream and you can follow on twitter, #c4l12 or IRC.

MediaWiki Image Link Workaround

So, in playing around with my user page and trying to make it look pretty, I found out that you can link an image to an internal or external link like you might normally do instead of the File: page. That’s great, but the problem I found was that except for the basic internal and external link, when you linked the image to something that inherently has a little icon next to it (e.g. mailto link gives you a little e-mail icon), then it would show the icon next to the image (see below left).

So, it turns out that there is a workaround to hide the icon (see above right). You can add this bit of code to the main.css file:

#bodyContent .plainlinks a {
background: none repeat scroll 0 0 transparent !important;
padding: 0 !important;
}

Or if the CSS Extension is installed, you can hack it by using the same code, but you’ll probably want it to be context dependent if you can.

When Basic Tutorials Go Defunct?

Documentation, tutorials, and user guides must evolve and be updated as technology and software move ahead, but when so many web-based applications use the same basic WYSIWYG, are basic tutorials even needed anymore?

This issue was brought up recently with our wiki’s update to the newest version of MediaWiki.  If you use wikipedia at all, you’ve probably been using the new version for quite some time now.  One of the greatest improvements for the end-user is the new toolbar.

MediaWiki 1.16 Toolbar
MediaWiki 1.16 Toolbar

It covers all your basic formatting needs including tables (which is not the easiest for new users to figure out).  The help section is really nice too (since MediaWiki is not a WYSIWYG) showing the user how something will display (of course there’s always the preview button).

After this update, I realized that users will unlikely need as much guidance in editing their wiki pages and the basic tutorials that I created don’t really seem to be needed anymore, or do they? I haven’t exactly polled my users on this issue or anything.  For the moment, I have kept it live and updated as it’s being used as a general help article as well.  Maybe some users need a bit more structure via a linear method of creating pages, but it would be interesting to know…

When Wiki and HTML Formatting Collide

So I’ve been messing around with wiki coding since obviously I’ve been working on developing content on the wiki. One of the things I was trying to do was a hanging indent (here’s another more complex one where you don’t need to set a margin and documentation is better) in order to display citation examples properly. More than that, I wanted to offset the whole citation (i.e. add an indent) in order to make it stand out from the rest of the text.

Template Code (Hanging Indent)
Whether you look at the first or second template, they both modify the CSS in order to make the hanging indent. They essentially change two attributes:

margin-left:2em;
text-indent:-2em; (shifts the first line of a paragraph)

Now in order to indent a line or paragraph, there are usually a couple of ways to do it in wiki, but throw the hanging indent template into it and it didn’t always work out so well.

Add Wiki Code
Usually the best way to do a simple indent in wiki is using a colon, such as

: Indented text

However, I suspect that rather than adding to the margin, the wiki changes the margin for that text, and the hanging indent code overrides it. So, the result is that it does nothing.

Add HTML Code
The other option was to use the <blockquote> tag. As the blockquote does not interfere with the CSS styling, this had the intended effect except that just like in this post, if I use blockquote,

you get spacing before and after the blockquote as you would with a <p> tag

My Solution
Not a very elegant solution, and rather the brute force way, but I just ended up creating a template for citation examples that hard coded the extra margin. I suppose the other solution would have been to add an extra variable to the hanging indent template but I figured that would not be worth the trouble.

PDF2Wiki Conversion Comparison

So, some people may ask, why are you trying to convert PDF to Wiki? PDF is usually the last step in the process, so just use the original document. My response would naturally be, what if you don’t have the original document?

A Two-Step Process
Through my searching and reading on the topic, it seems there is no PDF2Wiki Converter. Every site that I have read explains converting the PDF to one of: DOC, RTF, HTML, XML first then to wiki format.

PDF2HTML
I tried a number of PDF to HTML programs, but none of them worked to my satisfaction. Most of them only converted simple formatting, such as bold and italics.  Adobe has an online conversion tool. It’s better than some of the others I’ve tried as it interprets lists and such. The resulting code is rather ugly and a lot of the code would need to be stripped before using a HTML to Wiki converter. See my previous post on HTML2Wiki for a couple of tools on tidying or stripping HTML code.

PDF2DOC
I found that a much better alternative was converting the PDF to a DOC/RTF file since it’s a lot simpler and some formatting might be lost, but you won’t have a lot of needless code that might mess up your wiki page. There are a lot of online tools that provide a PDF to DOC/RTF service, however, again, they only tend to do basic formatting.  Adobe Acrobat does a really good job, because it will change lists into formatted lists (instead of normal text).  The major downside of course is that Acrobat is a paid program though there is a 30-day trial.

Conclusion
I had a lot of problems in particular with PDF to HTML, so I thought PDF to DOC/RTF is simply. Honestly though, unless you have a PDF file which is really long and has a lot of simple formatting (bold, italics, etc.), if you cannot get your hands on Acrobat, then I suggest simply copy/paste (or alternatively save as a text file) and manually formatting it in the wiki’s editing box. Of course this depends on the wiki you’re using because ones that don’t have a toolbar to help you quickly format might be a bit of a pain. Someone please let me know if you have found a better method!