Semi-Automating Batch Editing MARC Records: Using MarcEdit

This presentation was a lightning talk on batch editing MARC records, given at the Code4LibBC Unconference 2015.

I’ve been hearing for quite some time that people struggle with vendor records, not least because making changes can be very time consuming. I’d like to present one possible method that not only helps fix vendor records, but does so in a semi-automated way.

What do you want to do?

Let’s say you have a set of vendor records. First, what do you want to do with them?

Choose Your Tool

For the purpose of this presentation, MarcEdit is the tool of choice since it’s commonly used and does not require knowledge of a programming language. That said, I’m only using MarcEdit as an example. There’s no reason why you can’t apply what I’ll be talking about with other tools. While I haven’t tried using them myself, other recommended tools include:
* pymarc (Python)
* MARC::Record (Perl)

These will probably be more extensible and would even allow proper automation, but may also have a steeper learning curve.

Operations

Before we can decide, let’s figure out what operations are available to us. We can

  • add (sub)fields
  • delete (sub)fields
  • swap (sub)fields
  • reorder (sub)fields
  • find/replace (sub)fields data
  • edit indicators

All of these operations can also be limited to fields that match selection criteria.

Deciding Tasks

Let’s take a sample MARC record of a print book.

Example:
[code]
=LDR 01249cam a22003857i 4500
=001 39952
=005 20150925220242.0
=008 150501t20152015oncb\\\\\\000\1\eng\
=010 \$a 2014455313
=020 \$a9781552453056 (pbk.)
=035 \$a(OCoLC)ocn897352758
=050 00$aPR9199.3.A365$bF53 2015
=082 04$aC813/.54$223
=100 1\$aAlexis, André,$d1957-$eauthor.
=245 10$aFifteen dogs :$ban apologue /$cAndré Alexis.
=250 \$aFirst edition.
=264 \1$aToronto :$bCoach House Books,$c2015
=300 \$a171 pages :$bmaps ;$c21 cm
=650 \0$aDogs$vFiction$aAnimal intelligence$vFiction$aAllegories.
=650 \0$aIntellect$vFiction.
=650 \0$aConsciousness in animals$vFiction.
=901 \$a39952$b$c39952$tbiblio
=906 \$a7$bcbc$ccopycat$d3$encip$f20$gy-gencatlg
=925 0\$aacquire$b1 shelf copy$xpolicy default
=955 \$bhc08 2015-05-01 z-processor$ihc08 2015-05-05 to BCCD
[/code]

We might want to delete all the local fields, for example, 901, 906, 925, and 955.
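For anyone who prefers a script, the same deletion can be sketched in plain Python. This is only an illustration under some assumptions: it works on records exported as MarcEdit-style mnemonic text lines (two spaces after the tag, `\` for a blank indicator), not on binary MARC, and `LOCAL_TAGS`, `field_tag`, and `delete_fields` are hypothetical names, not part of MarcEdit or any library.

```python
# Hypothetical helpers working on mnemonic text lines such as "=901  \\$a39952".
# The list of local tags is an assumption; adjust it to your own records.
LOCAL_TAGS = {"901", "906", "925", "955"}


def field_tag(line):
    """Return the three-character tag of a mnemonic line, e.g. '245'."""
    return line[1:4]


def delete_fields(lines, tags):
    """Keep every field line whose tag is not in `tags`."""
    return [line for line in lines if field_tag(line) not in tags]


record = [
    r"=245  10$aFifteen dogs :$ban apologue /$cAndré Alexis.",
    r"=901  \\$a39952$b$c39952$tbiblio",
    r"=925  0\$aacquire$b1 shelf copy$xpolicy default",
]

cleaned = delete_fields(record, LOCAL_TAGS)
print("\n".join(cleaned))
```

Only the 245 line survives here, because 901 and 925 are both in the set of local tags.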

Another task might be to move information from one field to another. For example, we want to move the ISBN from 020$a to 534$z. Using the swap function with the “add to existing/create new” option checked, we can move every 020$a into a single 534 field.
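A rough Python equivalent of this move might look like the sketch below. It assumes mnemonic text lines as input, drops the whole 020 field once its $a values have been collected (which is only safe if 020 carries nothing else you need), and uses a hypothetical function name, `move_isbn_to_534`.

```python
import re


def move_isbn_to_534(lines):
    """Move every 020$a value into $z subfields of a single 534 line.

    Assumption: the 020 fields are removed entirely after their $a is copied.
    """
    isbns = []
    kept = []
    for line in lines:
        if line[1:4] == "020":
            isbns.extend(re.findall(r"\$a([^$]*)", line))
        else:
            kept.append(line)
    if not isbns:
        return kept

    z_subfields = "".join("$z" + isbn for isbn in isbns)

    # Add to an existing 534 if there is one ...
    for i, line in enumerate(kept):
        if line[1:4] == "534":
            kept[i] = line + z_subfields
            return kept

    # ... otherwise create a new 534 in tag order (blank indicators).
    new_field = r"=534  \\" + z_subfields
    for i, line in enumerate(kept):
        tag = line[1:4]
        if tag.isdigit() and tag > "534":
            kept.insert(i, new_field)
            return kept
    kept.append(new_field)
    return kept


record = [
    r"=020  \\$a9781552453056 (pbk.)",
    r"=245  10$aFifteen dogs :$ban apologue /$cAndré Alexis.",
    r"=650  \0$aIntellect$vFiction.",
]

moved = move_isbn_to_534(record)
print("\n".join(moved))
```

The two branches mirror the “add to existing/create new” choice: an existing 534 gets the extra $z subfields, and a new 534 is created only when none is present.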

One more example. This example record erroneously has multiple subject headings on one line (the first 650 field), so we need to separate them. Using the “Edit Field Data” function with the “Use Regular Expression” option checked, we can move each subject heading onto its own line:

Find: (\$a[^$]*)
Replace: $+/r

For more information, see the Regular Expression Recursive Replacement in MarcEdit post.
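To make the intended result concrete, here is a small Python sketch of the same split. It is not how MarcEdit implements recursive replacement; it assumes mnemonic text lines and assumes that each $a should keep the subfields that follow it (such as $v) until the next $a. The function name `split_repeated_a` is hypothetical.

```python
import re


def split_repeated_a(line):
    """Split one field line into one line per $a group.

    Each group is a $a plus any following subfields that are not $a.
    Lines with fewer than two $a groups are returned unchanged.
    """
    first = line.find("$")
    if first == -1:
        return [line]
    prefix, body = line[:first], line[first:]   # e.g. "=650  \0" and "$aDogs..."
    groups = re.findall(r"\$a[^$]*(?:\$[^a][^$]*)*", body)
    if len(groups) < 2:
        return [line]
    return [prefix + group for group in groups]


field = r"=650  \0$aDogs$vFiction$aAnimal intelligence$vFiction$aAllegories."
for new_line in split_repeated_a(field):
    print(new_line)
```

For the sample 650 field this prints three lines: one for Dogs, one for Animal intelligence, and one for Allegories, each with the original tag and indicators. Ending punctuation is left as it was, so it may still need a separate clean-up.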

Of course, these are just some examples. What needs to be done will depend on the set of records you regularly work with.

Separating Tasks

Once you’ve decided on your tasks, you will want to separate them into multiple categories.

For example, deleting local fields is something you would always do, moving the ISBN information is something you would do only for audio records, and separating subject headings is needed only for a specific vendor. While you would always want to fix erroneously created subject headings, these errors tend to come from specific sources, so I suggest applying these changes only when necessary.
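One way to picture these categories is as a small lookup that decides which tasks apply to a given batch. The sketch below is purely illustrative: the task names, the `"audio"` format label, and the `"vendor_x"` vendor label are all made up for the example.

```python
# Hypothetical task names, grouped by when they should run.
ALWAYS = ["delete_local_fields"]
BY_FORMAT = {"audio": ["move_isbn_to_534"]}
BY_VENDOR = {"vendor_x": ["split_subject_headings"]}


def tasks_for(record_format, vendor):
    """Return the task names that apply to a batch, in the order they should run."""
    tasks = list(ALWAYS)
    tasks += BY_FORMAT.get(record_format, [])
    tasks += BY_VENDOR.get(vendor, [])
    return tasks


print(tasks_for("audio", "vendor_x"))
print(tasks_for("print", "another_vendor"))
```

A batch of audio records from the problem vendor gets all three tasks, while print records from any other source get only the deletion of local fields.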

Automating Your Tasks

Before going ahead with this part of the process, you will want to test each task you have decided on.

Once you’ve done that, in MarcEdit, all you have to do is create an automated task and add individual tasks to it.

Make sure the tasks are in the right order, since MarcEdit processes them from top to bottom.
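Why the order matters is easy to show with a toy pipeline. The Python sketch below is an assumption-laden illustration, not MarcEdit's task engine: it runs hypothetical task functions over mnemonic text lines from first to last, and shows that deleting 020 before copying the ISBN loses the ISBN.

```python
import re


def add_534_from_020(lines):
    """Append a 534 line holding every 020$a value as $z (simplified)."""
    isbns = [v for line in lines if line[1:4] == "020"
             for v in re.findall(r"\$a([^$]*)", line)]
    if not isbns:
        return list(lines)
    return list(lines) + [r"=534  \\" + "".join("$z" + v for v in isbns)]


def delete_020(lines):
    """Remove every 020 field."""
    return [line for line in lines if line[1:4] != "020"]


def run_tasks(lines, tasks):
    """Apply each task in list order, feeding the result into the next one."""
    for task in tasks:
        lines = task(lines)
    return lines


record = [
    r"=020  \\$a9781552453056 (pbk.)",
    r"=245  10$aFifteen dogs :$ban apologue /$cAndré Alexis.",
]

right_order = run_tasks(record, [add_534_from_020, delete_020])
wrong_order = run_tasks(record, [delete_020, add_534_from_020])
print(right_order)
print(wrong_order)
```

In the first run the ISBN ends up in a 534 field; in the second it is gone, because the 020 was deleted before anything could read it.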

Finally, test your automated tasks on your record sets or on some example records to make sure that everything worked properly.

Every time you run an automated task, it will provide you with a report of what it’s done, though I have personally never found the report particularly useful. Instead, I tend to spot check the records to confirm the task is working as expected.

Take Away

I know that what I’m doing will not fit everyone’s situation, but hopefully it gives you ideas on how to make your record processing more efficient.

It’s not a perfect process and your records won’t be perfect, but the main idea is to save as much time as possible while making records meet your minimum standards.

I would love to hear what other people are doing and if they have ideas on how to do this sort of thing better. I believe we also have a breakout scheduled this afternoon for those interested in discussing this topic further.

Thank you.