Sep 14, 2011
Seems like the Omeka forums get a lot of traffic from people looking to migrate from ContentDM to Omeka. I, personally, get inquiries about this all the time (for some unknown reason). So I figured I may as well share what I know about the process here so I can just send a link or you can find it on Google or whatever.
It’s worth noting at the outset that I don’t know anything about generating ContentDM export files (partially because even the ContentDM documentation is proprietary, or at least hidden behind a login). But I do know that every time someone has sent me a cDM export file, it is in tab-delimited format (UPDATE: here is the Tab-delimited export documentation), which is basically a plain text spreadsheet. I gather these spreadsheets can be produced pretty easily, so I think we can start by assuming you already have the spreadsheet and need to prep it for import into Omeka, which will be done in this tutorial using the CSV Import plugin. If you are not already familiar with how that plugin works, check out the documentation page at Omeka.org before continuing.
You probably need to use Excel
First, open the tab-delimited spreadsheet in Microsoft Excel. You will later save this file in CSV format. If you prefer non-MS spreadsheet software, make sure it has an equivalent of Excel’s ‘Text to Columns…’ feature, which this process relies on. LibreOffice Calc offers one under its Data menu; as far as I know, Apple’s Numbers does not.
Remove unwanted and problematic rows/columns, rename column headers
Once you have your spreadsheet open in Excel, go ahead and remove any ContentDM-specific administrative metadata, or anything else you don’t wish to carry over to Omeka during the migration. At this point, you should probably rename the column headings to something meaningful. This will help with the crosswalk step later.
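If you would rather script this cleanup than do it by hand, here is a minimal Python sketch of the same idea: read the tab-delimited export, drop the columns you don't want, rename the headers, and write CSV. The function name and the column names in the usage example are made up for illustration, and I am assuming the export has a single header row.

```python
import csv
import io


def tsv_to_csv(tsv_text, drop=(), rename=None):
    """Convert tab-delimited text to CSV text, dropping and renaming columns.

    `drop` is a collection of header names to leave out; `rename` maps old
    header names to new ones. Both are supplied by you (hypothetical names).
    """
    rename = rename or {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [name for name in reader.fieldnames if name not in drop]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row, using the new names where a rename was requested.
    writer.writerow([rename.get(name, name) for name in kept])
    for row in reader:
        # Short rows come back with None for missing cells; write "" instead.
        writer.writerow([row[name] or "" for name in kept])
    return out.getvalue()
```

For example, `tsv_to_csv(text, drop={"Admin notes"}, rename={"Subject": "Subjects"})` would return CSV text without the (hypothetical) "Admin notes" column and with "Subject" renamed to "Subjects".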
Breaking out semicolon-delimited values
Next, you will want to check for columns having multiple entries within a single cell. At the very least, this will probably include the Subjects column (because archivists/librarians are never satisfied with assigning just one subject term… subject classification being the Lay’s potato chip of librarianship). By default, these multiple subjects will be separated by a semicolon (e.g. “Librarianship — Potato Chip Analogies; Librarianship — Puns; Librarianship — Personality Disorders;” ). Instead of copying each one of these entries into a new Subjects column, you can just use the “Data > Text to Columns…” feature in Excel. I recommend using a separate worksheet for this step as the new columns will overwrite your existing ones if you are not careful.
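The same split can be sketched in Python for anyone working outside Excel. This is only an illustration of what Text to Columns does here: each semicolon-delimited cell becomes a set of numbered columns. The numbered naming scheme ("Subject 1", "Subject 2") and the function names are my own assumptions, not something the plugin requires.

```python
def split_multivalue(cell, delimiter=";"):
    """Split one cell into a list of trimmed, non-empty values."""
    return [part.strip() for part in (cell or "").split(delimiter) if part.strip()]


def expand_column(rows, column, delimiter=";"):
    """Replace `column` in each row dict with numbered columns.

    Every row gets the same number of new columns (padded with "") so the
    result can still be written out as a rectangular spreadsheet.
    """
    split_rows = [split_multivalue(row.get(column, ""), delimiter) for row in rows]
    width = max((len(values) for values in split_rows), default=0)

    expanded = []
    for row, values in zip(rows, split_rows):
        new_row = {key: value for key, value in row.items() if key != column}
        for i in range(width):
            new_row[f"{column} {i + 1}"] = values[i] if i < len(values) else ""
        expanded.append(new_row)
    return expanded
```

Because the function builds new row dicts rather than editing in place, the original data is left alone, which is the scripted equivalent of working in a separate worksheet.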
Getting the File Path URLs
So you’ve shaped up all the metadata on the spreadsheet. Now you need to define the path to the item file in ContentDM. This is probably the trickiest part to come up with on your own, especially if you are not so familiar with ContentDM. Basically, our starting point will be the cDM “Reference URL.” Those look something like this: http://images.ulib.csuohio.edu/u?/press,59. Assuming you have a whole column of Reference URLs, you need to run a Find and Replace to create your file path (again, I recommend doing this in a separate worksheet so you don’t accidentally overwrite important data). Let’s begin.
UPDATE: in ContentDM version 6+, ShowFile is replaced by GetFile. Adjust the following instructions as needed (i.e. in the first Find and Replace, swap out showfile.exe with getfile.exe).
Start with something like:
… and REPLACE with this:
Next, FIND this:
…and REPLACE with:
So now we have something that looks like:
This is a working file path that can be used by the CSV Import plugin to ingest the item file along with the metadata record.
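For a scripted version of the two Find and Replace passes, here is a rough Python sketch. Big caveat: the target pattern `cgi-bin/showfile.exe?CISOROOT=/<alias>&CISOPTR=<id>` is my assumption about how ContentDM installations of this era build file URLs, so check it against a URL that actually works on your own server before trusting it. The function name is made up.

```python
import re

# A Reference URL looks like http://host/u?/<alias>,<id>
REFERENCE_URL = re.compile(
    r"^(?P<base>https?://[^/]+)/u\?/(?P<alias>[^,/]+),(?P<pointer>\d+)$"
)


def reference_to_file_url(reference_url, script="showfile.exe"):
    """Turn a ContentDM Reference URL into a direct file URL.

    ASSUMPTION: the file URL follows the pattern
    <base>/cgi-bin/<script>?CISOROOT=/<alias>&CISOPTR=<id>.
    Pass script="getfile.exe" for ContentDM 6+ (see the UPDATE above).
    """
    match = REFERENCE_URL.match(reference_url.strip())
    if match is None:
        raise ValueError(f"Not a recognised Reference URL: {reference_url!r}")
    return "{base}/cgi-bin/{script}?CISOROOT=/{alias}&CISOPTR={pointer}".format(
        script=script, **match.groupdict()
    )
```

Raising an error on anything that doesn't look like a Reference URL is deliberate: a blind find and replace would happily mangle a stray comma in the wrong cell.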
In some instances, you might need to tweak this process. For example, if your ContentDM installation includes JP2 or TIF files (or some other unfriendly image format) but you don’t want the hassle of building a custom display wrapper into your Omeka theme, you can append some additional query string parameters to your file URL.
So if you want ContentDM to serve up a JPG instead of a JP2 (or other…) file, add this to your file column
…using this Excel function (where A2 is the first column/cell in need of appending):
Finally, you need to swap showfile with getimage in the file URL above by running one more find and replace in Excel.
So now, your file path looks like:
This will return a JPG file, which is pretty handy.
These additional parameters will vary by installation and file type. I don’t know what all of the parameters are or even what each one does; only that this usually works. Again, this is a case where actual ContentDM documentation would be really handy. UPDATE: Keep in mind that this only works with image file types. For more details, check out the GetImage documentation.
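Here is a small Python sketch of the append-and-swap step, for image files only. The script name `getimage.exe` and the parameter names in the usage note (`DMSCALE`, `DMWIDTH`) are assumptions on my part; substitute whatever your installation's GetImage documentation says. The function name is hypothetical.

```python
from urllib.parse import urlencode


def to_getimage_url(file_url, params, old_script="showfile.exe",
                    new_script="getimage.exe"):
    """Swap the script name in a file URL and append extra query parameters.

    ASSUMPTION: `file_url` already carries a query string, so the extra
    parameters can simply be appended with "&".
    """
    if old_script not in file_url:
        raise ValueError(f"{old_script!r} not found in {file_url!r}")
    swapped = file_url.replace(old_script, new_script, 1)
    # urlencode keeps the dict's insertion order and escapes values safely.
    return swapped + "&" + urlencode(params)
```

Called with something like `to_getimage_url(url, {"DMSCALE": 100, "DMWIDTH": 600})`, it appends `&DMSCALE=100&DMWIDTH=600` after swapping the script name. Those values are placeholders, not recommendations.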
It’s usually a good idea to plan out your metadata crosswalk in advance, especially if you have multiple export files (and you should if your collection is bigger than a few hundred items; more on that later). Remember that Omeka, out of the box, only uses the 15 elements of the simple Dublin Core set. You may need to add a new Item Type or install Dublin Core Extended in order to find/create an appropriate home for your legacy/custom metadata in Omeka.
To avoid server timeouts, you should consider breaking your spreadsheets into manageable batches. I try not to import more than a few hundred items at a time, and even then one of the two servers involved is likely to time out or throw an error or something. Keeping the batches small makes it easier to isolate problems, avoid import errors, and undo problematic imports.
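Splitting a big spreadsheet by hand gets tedious, so here is a minimal Python sketch that cuts CSV text into batches, each with its own copy of the header row. The default of 200 rows is just my reading of "a few hundred"; the function name is made up.

```python
import csv
import io


def split_into_batches(csv_text, batch_size=200):
    """Split CSV text into a list of CSV texts of at most `batch_size` rows.

    Each batch repeats the header row so it can be imported on its own.
    Assumes the input has a header row as its first line.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    batches = []
    for start in range(0, len(rows), batch_size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + batch_size])
        batches.append(out.getvalue())
    return batches
```

Write each returned string to its own file (batch-01.csv, batch-02.csv, and so on) and import them one at a time, so a failed import only costs you one batch.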
Using the CSV Import Plugin
From here, just follow the standard instructions for using the CSV Import plugin.
Bugs, Known Issues, and Limitations
As of version 1.3, there are still some quirks. For example, your file path – that hideous long URL you worked so hard to create – will become the actual name of your imported/migrated file. In some instances, your files may be ingested sans file extension (e.g. .pdf, .jpg, .mp3), which can cause various headaches (though it’s worth noting that these files will generally display inline on your site, due to the way most Omeka themes handle media files, and will only break down when someone tries to download the file, in which case they would need to manually add the file extension). From time to time, you could have an import that hangs indefinitely, never finishing and never failing — and thus not easily “undo-able” (at least, the “Undo Import” button will not be visible). In such a case, you can manually create that button by entering the following URL pattern into your address bar:
http://[PATH TO YOUR OMEKA INSTALLATION]/admin/csv-import/index/undo-import/id/[IMPORT ID]
(This is on the plugin documentation page, by the way, as are several other points in this tutorial.)
One of the biggest limitations of the CSV import strategy is that you will probably have issues migrating compound objects and other multi-file items, primarily because of the way ContentDM formats the export file and serves compound objects online, and partially due to limitations in the way the plugin works with Omeka. Basically, you need all the files for an item to be in the same row as all of the other item-level metadata (e.g. in columns like “File 1,” “File 2,” “File 3,” etc.). And there is currently no way to use the CSV Import plugin to assign file-level metadata. For example, if you had a postcard in your ContentDM collection and it had distinct metadata for each side (say, for front.jpg and verso.jpg), along with general metadata for the object as a whole, something is going to be lost in the migration without some serious elbow grease.
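If you can get your compound objects into a simple list of (item identifier, file URL) pairs, the "all files in one row" layout is easy to build. The sketch below assumes exactly that input shape, which is hypothetical: I don't know what your export actually looks like. It does nothing about file-level metadata, which is still lost. The function name and the "Identifier" column name are also made up.

```python
def files_to_columns(pairs):
    """Group (item_id, file_url) pairs into one row per item.

    Returns a list of dicts with an "Identifier" key plus "File 1",
    "File 2", ... keys, padded with "" so every row has the same columns.
    """
    grouped = {}
    for item_id, file_url in pairs:
        grouped.setdefault(item_id, []).append(file_url)

    width = max((len(urls) for urls in grouped.values()), default=0)
    rows = []
    for item_id, urls in grouped.items():
        row = {"Identifier": item_id}
        for i in range(width):
            row[f"File {i + 1}"] = urls[i] if i < len(urls) else ""
        rows.append(row)
    return rows
```

The resulting rows would still need to be merged with the item-level metadata (matching on the identifier) before import.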
The Omeka Dev forums are the best place to report bugs, inquire about error messages, discuss workarounds, and submit patches. The general Omeka Forums are also great for more basic questions; happily, most questions get answered in fairly short order. Please do not post support questions here. Please do, however, feel free to leave general comments, suggestions for improvement, requests for clarification, etc.
IMAGE NOTE: poorly Photoshopped post image contains assets by multiple artists and designers, including the amazing “Bob” sketch from Matt Haley’s unreleased but totally awesome sounding Twin Peaks: Season 3 graphic novel.