Recently, there has been a lot of discussion on password strength and how to make a strong password. I think the xkcd comic sums it up pretty nicely:
With only two weeks left and after last night’s meetup, I thought I’d reflect a little on some of the Government of Canada (GC) initiatives I’ve been part of over the term that are outside of my assigned projects, most of which are fairly recent or new.
Young Professionals Network Committees
Admittedly, this is a departmental (not GC) group, but it’s worth a mention.
Many departments (if not most) have a Young Professional Network (even if not by that name). YPN has committees to organize events as well as other work to support staff at the department. I sat on and contributed to:
- Retention and Renewal Report, for which another survey is going out to validate the results
- Student Committee, where we’re currently trying to develop a new orientation guide for students in the department
- Spend a Day with Senior Management, a job shadow event which has been approved by the YPN sponsor ADM
Contributing to the committees has been a good experience. It allowed me to meet other people working in different sectors and has given me a sense of accomplishment and contribution towards the department even if I’m not here to see the results.
Wiki Community of Practice – WikiCoP
My understanding is that my coworker started WikiCoP about a year ago in order to have people in the GC community meet every 1-3 months and share ideas, knowledge, and experience on their internal wikis, as many departments are developing or have them now. Although I only got the chance to attend a couple of meetings, it was great to see what other departments were doing with their wikis and to participate in the discussions. I also got a chance to see a couple of the ways Confluence was being used, which was neat.
The GC wiki, GCPEDIA, is a great place for GC staff to share information GC-wide without making it public. There is a lot of great stuff including draft strategies, guidelines, and start up initiatives surrounding all aspects including social media and web usability. I didn’t actually take part in sharing much information, but I have been helping with maintenance. Most of it is day-to-day stuff like fixing broken/double redirects, categorizing pages/files, and page clean ups, but I have also:
- participated in a wikibee (essentially you do a big clean up as a group in person) for UXWG (User Experience Working Group)
- been helping with the migration to a new and much improved National Inventory of Bridgeable Students [internal link]
Doing wiki maintenance has helped me learn more about the different departments and what goes on in GC. I also got to know a few people through doing wiki maintenance and participating in the 2011 Best User Page Contest. It was lots of fun!
I think that’s one of the things that makes GCPEDIA interesting to work on. The very active (more permanent) people, such as @jesgood and C. Au, have been very encouraging, and people will do little things to increase the sense of community and enjoyment, namely by making fun user boxes. I got a green belt/experienced contributor award (basically it’s a level up system based on how much you contribute to GCPEDIA), the 5th level, which I think is pretty decent for a single summer.
Web 2.0 Practitioners – W2P
It’s kind of funny, because I avoided Twitter for the longest time. I didn’t think I’d have much use for it, and it just seemed like another social media platform. Since I don’t have a phone with internet and lacked a laptop for the longest time, I didn’t see how I’d get involved in any conversation.
I was pushed onto twitter because of work. It helped that I got tweetdeck installed. Regardless, I was somewhat surprised by how much of day-to-day sharing between GC employees involved twitter. I shouldn’t have been, but then I used to work at an agency where you had no internet access.
It’s been a great way to keep up with GC Web/technology news and to find interesting reads and resources. But most of all, #w2p really taught me what a great community can be built through twitter. It’s been a rare experience for me to simply show up and be so welcomed into a group of veritable strangers. Being a little nervous about going by myself to my first #w2p meet, I was encouraged by many #w2p members, most memorably by @spydergrrl (for various reasons including the fact that she was a co-host). At the meet, I ended up chatting mostly with @mhellstern, who introduced me to lots of other people. It was great.
The proof that #w2p can just suck you in (in a good way) is how involved I got. After only two meetups, I ended up co-hosting last night’s meet up. Thanks to @macjudith and her discussions with a friend, the meetup’s theme was to meet the (bridgeable) students of #w2p and I cohosted with @mhellstern (I didn’t even know she was a bridgeable student!). Each student/recent graduate got a couple of minutes to introduce themselves and “sell” themselves just a little bit. We had a great turn out, plus as always, great conversations and stories. I got to finally put a few more faces to twitter nicknames, especially the ones from my department! Not least of all, it meant I got to add another userbox to my GCPEDIA user page (see the fun?).
I will definitely miss #w2p, because unless I get a position in the area in the future… well, it’s no secret that getting a group like this together outside the NCR can be difficult. The NCR is where most GC staff work and where a lot of this type of work is done, since this is where all the “headquarters” are located.
Sense of Contribution, Engagement, Belonging, and Community
I’ve frequently heard people on contract talk about how they don’t feel connected in any way to their department or the government, especially as a student when you may conceivably never return, but I didn’t get that feeling thanks to joining #w2p and other groups. There are of course so many different ways to get involved and to find out what’s going on in the GC world, and these are but a few examples, so I encourage GC staff, especially students, to get involved; it doesn’t matter that it’s only for a short time, and newbies are welcome!
[Update August 20, 2014] – I’ve not actually tried this, but found a new article on How to Create RSS Feeds in Twitter using Google Script.
[Update March 4, 2013] – As of March 5th, 2013, twitter will no longer support unauthenticated feeds of any kind and will be dropping support for RSS altogether (meaning you can only get JSON feeds). Therefore, you will need to make your own (see comments for one suggestion) or use an app to follow specific feeds. Continue reading “Creating a Twitter Search, Hashtag, User, Favorites or List RSS Feed”
Seems like after the upgrade, a number of people have had the problem with randomly missing categories from the admin panel/dashboard. One of mine randomly disappeared after I renamed it. It was no longer a subcategory and was at the bottom of the list (not alphabetical order) on all the filters etc., but refused to show up in the Categories admin panel.
So, I found lots of information on how to fix missing categories for wordpress, but not when you use wordpress.com. After much searching, I found one solution on wordpress.com forums (second last reply), but what’s missing is the key element on how to figure out your category ID. Easiest way then:
Find your category ID by looking at the source code for your blog. You will see, for example, <li class="cat-item cat-item-78954">. The number is your category ID. Similar thing for tags.
You must have categories (or tags) displayed on your blog for this to work obviously.
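If you have a lot of categories, reading the page source by eye gets tedious. Here is a rough sketch in Python of the same lookup, assuming you have saved or copied your blog's HTML into a string. The function name and the sample snippet are made up for illustration; only the "cat-item-NNN" class pattern comes from the example above.

```python
import re
from typing import List

# Matches the numeric part of class names such as "cat-item-78954".
CAT_ID_PATTERN = re.compile(r"\bcat-item-(\d+)\b")


def find_category_ids(html: str) -> List[int]:
    """Return the category IDs found in the HTML, in order, without repeats."""
    found: List[int] = []
    for match in CAT_ID_PATTERN.finditer(html):
        cat_id = int(match.group(1))
        if cat_id not in found:
            found.append(cat_id)
    return found


# Hypothetical snippet shaped like the example above.
sample = '<li class="cat-item cat-item-78954"><a href="#">Example</a></li>'
print(find_category_ids(sample))
```

As with the manual method, this only finds categories that are actually displayed somewhere on the page.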
Summon is Serials Solutions’ web scale discovery tool. I think so far, it looks pretty good. It has all the things you’d want these days in your searches including:
- sidebar with different options to refine search
- clean, easy to use interface
- save citations to folder and export
- advanced search, including ISBN for books
Currently, all records in the catalogue, institutional repository, and journal articles have been included. There’s also a locations refinement category to refine to a specific branch for catalogue materials.
It’ll be interesting to see what our users (including staff) think.
Quick Edit/Add-on: Seems like the major criticism I’ve heard is that it does not do known-item searches (that is, searches where you know what you’re looking for) well, but as my supervisor has explained, that’s not the purpose of a discovery tool. If you want to look for a known item in a library, you use the source that will help you look for that. Some people might say “but look at google, it can do both well”, but even google scholar is unlikely to give you a book if you only enter a couple of words when you’re looking for a book (obviously that’s not true in all cases).
EDIT: I’ve been reminded/informed that this only works in Windows (or MS-DOS anyway) since it uses .bat files. The suggestion if you’re using other OSs is to use php (but really you can use anything) to automate the command.
I’m sure everyone is familiar with Adobe Acrobat (even if they haven’t actually used it). It’s a nice GUI if you want to edit PDFs, but at least as far as I know, it does not do any batch or automation work. For a digital images project, there’s a lot of automation work that needs to be done and for image to image conversion, I was using Photoshop, but then I started dealing with PDFs. Thus, it was only natural to turn to GhostScript.
PDF to Image
So, I don’t really get any credit for this, because it’s already out there and the variables are well explained. So if you want to turn all the pages of your PDF into images, check out this Danzels Internets post. My case was a little different because I only wanted the first page turned into an image as a thumbnail for an entire file and then for an entire folder. I also prefer to do any image modification (even batch) in an image program.
FOR %%Z IN (*.pdf) DO gswin32 -sDEVICE=jpeg -dJPEGQ=95 -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -dDOINTERPOLATE -dFirstPage=1 -dLastPage=1 -sOUTPUTFILE="%%Z.jpg" -dSAFER -dBATCH -dNOPAUSE "%%Z"
So, here the major changes are “gswin32” because I use the Windows version, and the “-dFirstPage=1 -dLastPage=1” so that the first and last page it processes is page 1. You can change the output file name too, so I changed it in such a way that it takes the original file name and adds the .jpg extension.
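For anyone not on Windows, the same first-page thumbnail idea can be sketched in Python. This is only a sketch under a few assumptions: that Ghostscript is installed and callable as "gs" (on Windows you would pass "gswin32" instead), and that the function names here are my own invention. It leaves out -dDOINTERPOLATE to keep things short and spells the output switch as -sOutputFile.

```python
import subprocess
from pathlib import Path


def first_page_command(pdf_path, gs_executable="gs"):
    """Build the Ghostscript command that renders only page 1 of a PDF as a JPEG."""
    pdf_path = Path(pdf_path)
    # Same naming as the batch file: original file name plus ".jpg".
    output = pdf_path.with_name(pdf_path.name + ".jpg")
    return [
        gs_executable,
        "-sDEVICE=jpeg", "-dJPEGQ=95",
        "-dGraphicsAlphaBits=4", "-dTextAlphaBits=4",
        "-dFirstPage=1", "-dLastPage=1",
        f"-sOutputFile={output}",
        "-dSAFER", "-dBATCH", "-dNOPAUSE",
        str(pdf_path),
    ]


def make_thumbnails(folder, gs_executable="gs"):
    """Run the command for every PDF in a folder (needs Ghostscript installed)."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(first_page_command(pdf, gs_executable), check=True)


print(first_page_command("scan.pdf"))
```

Passing the arguments as a list means file names with spaces don't need any extra quoting.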
This is kind of a side note, because I didn’t need this for my project, but I recently downloaded some articles that for some reason had each section in a separate PDF. So, I get no credit for this one either as I got this one from Real’s How-to on Merging PDFs. I put this in here only for possible improvements of what’s presented on that site.
For the merging of PDFs in a directory, for the [merge.bat], you’re supposed to have this code:
gswin32 -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=merged.pdf -dBATCH 1.pdf
FOR %%Z IN (*.pdf) DO IF NOT %%Z==1.pdf IF NOT %%Z==merged.pdf IF NOT %%Z==merged2.pdf call merge2.bat %%Z
Maybe it’s clear to other people, but the “1.pdf” is the name of the first PDF, so change it to match your own file. I found that the subsequent ones will be added in alphanumeric order. Also, if you don’t change “1.pdf” in the code to the name of your first file, it will throw an error and insert a blank page at the beginning.
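A different way to get the same result, sketched in Python: instead of the two batch files, hand Ghostscript every input file in a single call, with the chosen first PDF up front and the rest in sorted order. This is not the method from the site above, just an alternative. The function name is made up, and "gs" as the executable name is an assumption (use "gswin32" on Windows).

```python
from pathlib import Path


def merge_command(folder, first_pdf, output_name="merged.pdf", gs_executable="gs"):
    """Build one Ghostscript command that merges every PDF in a folder.

    first_pdf goes first; the remaining PDFs follow in sorted order.
    The output file itself is skipped so it is never fed back in.
    """
    others = sorted(
        p.name
        for p in Path(folder).glob("*.pdf")
        if p.name not in (first_pdf, output_name)
    )
    return [
        gs_executable,
        "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output_name}",
        first_pdf,
        *others,
    ]
```

To actually run it, something like subprocess.run(merge_command(".", "1.pdf"), cwd=".", check=True) should do, since the file names are relative to the folder.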
I don’t normally post news items, but I was really excited to hear about the new version of evergreen (here’s the list of new features). I have been taking a library automation course, so I have been learning more about ILS, particularly OpenSource (OS) ones. I didn’t know how many OS systems were available, so I was interested in reading and hearing more. I was a bit disappointed when I heard there was no OS ILS suitable for large libraries, but even if the new version of Evergreen doesn’t quite meet those needs, I’m happy to hear that it’s moving in that direction.
So, some people may ask, why are you trying to convert PDF to Wiki? PDF is usually the last step in the process, so just use the original document. My response would naturally be, what if you don’t have the original document?
A Two-Step Process
Through my searching and reading on the topic, it seems there is no PDF2Wiki Converter. Every site that I have read explains converting the PDF to one of: DOC, RTF, HTML, XML first then to wiki format.
I tried a number of PDF to HTML programs, but none of them worked to my satisfaction. Most of them only converted simple formatting, such as bold and italics. Adobe has an online conversion tool. It’s better than some of the others I’ve tried as it interprets lists and such. The resulting code is rather ugly and a lot of the code would need to be stripped before using a HTML to Wiki converter. See my previous post on HTML2Wiki for a couple of tools on tidying or stripping HTML code.
I found that a much better alternative was converting the PDF to a DOC/RTF file since it’s a lot simpler. Some formatting might be lost, but you won’t have a lot of needless code that might mess up your wiki page. There are a lot of online tools that provide a PDF to DOC/RTF service, but again, they tend to do only basic formatting. Adobe Acrobat does a really good job, because it will change lists into formatted lists (instead of normal text). The major downside of course is that Acrobat is a paid program, though there is a 30-day trial.
I had a lot of problems in particular with PDF to HTML, so I think PDF to DOC/RTF is simpler. Honestly though, unless you have a PDF file which is really long and has a lot of simple formatting (bold, italics, etc.), if you cannot get your hands on Acrobat, then I suggest simply copying and pasting (or alternatively saving as a text file) and manually formatting it in the wiki’s editing box. Of course this depends on the wiki you’re using, because ones that don’t have a toolbar to help you quickly format might be a bit of a pain. Someone please let me know if you have found a better method!
So to continue on ways to convert existing documents to wiki code, next are formatted text documents, which are typically Word DOC files, but may also be something like RTF files.
Most sites I found actually just instructed people to use a two-step conversion: from Word to HTML and then to wiki code. While this may work, it’s much less efficient and I can imagine more things are lost in the process. Admittedly, the converters that I have found are all geared towards MediaWiki, so if you’re using a different wiki then these converters may not work so well. Nevertheless, MediaWiki provides a list of Word to Wiki converters, the most basic of which does not seem to be specifically geared to MediaWiki.
OpenOffice Sun Wiki Publisher Plugin (Mac and Windows compatible, not sure about other platforms)
(the wiki converter is built-in, the publishing part of it is optional)
The downside of OpenOffice is that it does not always interpret Word documents very well. Embedded images tend to turn into hex code (ex. ffd8ffe000104a46494600010201 etc.) and tables aren’t always interpreted correctly either. The one I tried turned into overlapping text. So, in part, the usefulness of the outputted wiki code will depend on how well OpenOffice has read the Word DOC itself, but it should handle ODT and RTF just fine.
Word2MediaWikiPlus Macro (Windows Only)
Word is the better choice for documents that OpenOffice can’t seem to handle very well. There is also a Word2MediaWiki Macro which is easier to use, but does not convert tables or deal with images very well.
For the OpenOffice plugin, ‘special characters’ (used loosely here) sometimes turn into weird symbols or random special characters. As with the HTML converters from the last post, something like ’ (not straight apostrophe) gets changed into ‚Äô, or a bullet point (which isn’t recognized to be in a bulleted list) turns into ‚Ä¢.
The Word2MediaWikiPlus (W2MWP) converter is better at dealing with special characters. The macro will simply insert the character as is and at times put a nowiki tag around it, but regardless, it displays just fine.
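The two garbled examples from the OpenOffice plugin match what you get when the UTF-8 bytes of a character are read back as Mac Roman, so a replace-all table can be built rather than typed by hand. A rough Python sketch, assuming that really is the cause in your case; the function names and the choice of characters to cover are mine.

```python
def mojibake_table(characters="\u2019\u2018\u201c\u201d\u2022"):
    """Map each garbled sequence to the character it should have been.

    Covers curly single/double quotes and the bullet by default.
    """
    table = {}
    for ch in characters:
        # Simulate the damage: UTF-8 bytes misread as Mac Roman text.
        garbled = ch.encode("utf-8").decode("mac_roman")
        table[garbled] = ch
    return table


def repair(text, table=None):
    """Do a replace-all for every garbled sequence in the table."""
    if table is None:
        table = mojibake_table()
    for garbled, ch in table.items():
        text = text.replace(garbled, ch)
    return text


# "\u201a\u00c4\u00f4" is the garbled apostrophe sequence described above.
print(repair("isn\u201a\u00c4\u00f4t"))
```

If your output shows different stray symbols, the misreading is probably some other encoding, and a plain find-and-replace in a text editor is the safer bet.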
For some reason, the W2MWP plugin turns text boxes into a single cell table and then repeats the same text again as regular text (not inside a table). The OpenOffice plugin strips the text of formatting and leaves it as regular text in the wiki output.
When tables are interpreted correctly, I think the OpenOffice plugin does a better job overall. The W2MWP macro is better at keeping formatting, such as colours and border style (below right), but the OpenOffice one seems to interpret things inside a table better, such as type of lists (below left). (It’s supposed to be a bulleted list, not a numbered list.)
Needs Good Original Document Formatting
In both cases, the usefulness of the wiki code will depend on how well the original document was formatted. For example, in one of the documents I tested, a number of the number and bullet lists were not formatted as such, but instead, numbers and bullets were just manually added. In both plugins, they were considered to be regular text with a ‘special’ character or number at the beginning of it.
Whether the Word2Wiki or the OpenOffice plugin is better depends on your priorities. OpenOffice seems to interpret lists and text boxes better, and doing a replace all for characters that weren’t interpreted properly is a pretty quick step. W2MWP is better at keeping formatting and interpreting all characters. So, if you like the way your document looks and you want to keep it that way, use the W2MWP macro. The big downside of course is that it doesn’t work on Macs (which I’m using right now, yay for VMware). Nevertheless, my conclusion is that the DOC2Wiki Converters are useful, but may not be the optimal solution depending on how much you’re willing to install and play around with. And if the document isn’t formatted like it should be, then manual wiki formatting might be the way to go.
So, for the past little while on and off, I’ve been looking for and playing around with HTML to Wiki Converters to see which one works best. Most of the ones I’ve found are online and most of them seem to be based on a Perl script created by David Iberri, who provides a web interface as well.
David Iberri has provided a running web interface version for his script for a lot of different wiki dialects. However, I’ve only tested the MediaWiki version for the purposes of my project. I really like the “Fetch from URL” feature which is not available on many others.
Interestingly, I found what looks to be the exact same converter on another site, but it gives me slightly different results. (see below)
Seapine’s HTML to Wiki
This one is really good for basic things and even though it does not have a “Fetch from URL” feature, you can easily copy/paste. However, this converter frequently broke for me when dealing with whole pages because it seemed to stop working when it faced something that it didn’t quite recognize.
Batch/Site HTML to MediaWiki converter
I have not actually tried this one, but I thought it might be a useful resource for later and for other people. This uses the same Perl script in combination with MediaWiki’s PHP importing scripts.
Comparison between HTML2Wiki and the berliOS version
Neither deals with ’ (the non-straight apostrophe) very well for some reason, and I’m guessing it will have problems with some other characters as well. Currently, both give a � in place. However, if it’s always the same character in your wiki document, it’s easy enough to do a replace all.
Both seem to handle tables quite well and one as well as the other, though sometimes the Iberri one seems to forget to put the first line of the table code on a new line, which of course, means the table fails to work.
I would say that overall I like the berliOS version better for links because it can recognize anchor links, whereas the Iberri one will display text. For example (berliOS):
[#reserve Finding Articles on Course Reserve].
The Iberri one does a better job at “oh my god i don’t understand this” by simply stripping the HTML and leaving text. The berliOS one will try to interpret it and end up with odd things at times. However, I think it’s pretty understandable that it doesn’t handle mouse over boxes very well especially when the original script to do that is CSS and not a part of the HTML tag. For example (berliOS):
You CAN find hundreds of thousands of articles through the UBC Library Web. more »
UBC Library subscribes to tens of thousands of magazines, journals and newspapers, in print and in full text online. The UBC Library Catalogue DOES NOT list individual articles by topic. more »
To search for articles by topic, you need to start your search in an index or database. (Instructions follow.) Like the catalogues of most libraries in the world, UBC Library�s catalogue does not contain a listing for each article in each journal in its collection. Search engines like Google DO NOT retrieve most academic articles. But… more »
'''Google Scholar (Beta)''' has begun to reach some academic journals and online archives, but for now, Indexes and Databases are the most complete searchable lists of articles.
Most academic and publicly-funded researchers publish the results of their research in scholarly journals or in online archives, which search engines don�t reach. Most popular magazines do not provide their content for free on the Web. Newspaper articles have a different search guide (right here).
So overall, I like the berliOS one better because it recognizes more elements, but it’s easier to screw things up with it. So I would say the Iberri one is easier to use since it generally just strips what it doesn’t understand.
On a related, footnote sort of note, after converting to wiki code, if there is a lot of HTML code left that seems to be messing up the wiki page, you can try stripping or ‘tidying’ the HTML code. HTML Tidy tries to make the HTML conform to current HTML standards, but depending on how the page is done, it might start creating CSS, which obviously wiki pages don’t understand, so the strip HTML function may work better.
Zubrag’s Strip HTML online tool
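If you would rather strip the leftover HTML locally than paste it into an online tool, Python's standard library can do a basic version. This is a minimal sketch, not a copy of how any of the tools above work; the class and function names are my own.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text between tags, dropping the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the markup with all HTML tags removed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(strip_html("<p>Search <b>articles</b> &amp; more</p>"))
```

Note that this keeps whatever text sits inside script or style tags, so very messy pages may still need a manual clean-up afterwards.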