Archive for the ‘Curator’s Workbench’ tag
I’d like to extend special thanks to those who helped us diagnose and resolve several issues on Mac OS. This point release fixes the problems we had launching the software via the “workbench.app” file, so the separate startup script is no longer necessary. It also fixes a problem with File Dialog windows: in certain cases, interacting with the Mac file picker caused the software to crash immediately.
As it turns out, both issues had actually been fixed upstream, in the Eclipse project. I was able to pull updated versions of the Eclipse plugins and rebuild the workbench with the fixes. I tip my hat to the Eclipse Project folks; it is wonderful to be part of such a large and vibrant developer community.
These fixes are in the current stable release, available at the usual download site:
We recently released an updated version of the Curator’s Workbench with a number of significant additions to functionality (available for download here). The new version includes the ability to reuse crosswalks, create data dictionaries, and create mapped metadata ingest forms, among other things. Below is a summary of the major changes.
In previous releases, each crosswalk was a separate effort requiring deep knowledge of both the MODS schema and the user-supplied metadata. Now crosswalks can be copied between projects and reused. They can also draw on common MODS mappings from a shared data dictionary, so not everyone who creates crosswalks needs to be a MODS expert. This also makes building crosswalks far less time-consuming.
Dictionaries streamline the process of mapping custom metadata to objects in crosswalks and forms. Dictionaries can conveniently package up the most common mappings and patterns for MODS elements for a set of users, allowing users and groups to share and reuse those standard patterns without having to build complex crosswalks for each project. Dictionaries can be designed for blocks of metadata and for the crosswalk connections that are used to create them. Dictionaries include labels and descriptive text to guide their use. They can be stored on network drives and shared by teams.
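To make the dictionary idea concrete, here is a minimal Python sketch under stated assumptions: each shared entry bundles a label, guidance text, and a simplified slash-separated MODS target, and a project crosswalk refers to entries by key instead of spelling out MODS itself. The `DictionaryEntry` type, the entry keys, and `resolve_crosswalk` are hypothetical illustrations, not the workbench’s actual dictionary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One reusable mapping in a shared data dictionary (hypothetical format)."""
    label: str      # human-readable name shown to crosswalk authors
    guidance: str   # descriptive text explaining when to use the entry
    mods_path: str  # simplified MODS target, e.g. "titleInfo/title"


# A shared dictionary, e.g. maintained by one team and loaded from a network drive.
SHARED_DICTIONARY = {
    "title": DictionaryEntry("Title", "Main title of the item", "titleInfo/title"),
    "creator": DictionaryEntry("Creator", "Personal name of the creator", "name/namePart"),
    "date": DictionaryEntry("Date created", "Creation date of the item", "originInfo/dateCreated"),
}


def resolve_crosswalk(column_to_key, dictionary):
    """Turn a project crosswalk (source column -> dictionary key) into
    source column -> MODS path, so the author never writes MODS directly."""
    resolved = {}
    for column, key in column_to_key.items():
        if key not in dictionary:
            raise KeyError(f"column {column!r} refers to unknown dictionary entry {key!r}")
        resolved[column] = dictionary[key].mods_path
    return resolved


# Two hypothetical projects reuse the same entries despite different spreadsheet headers.
project_a = {"Photo Title": "title", "Photographer": "creator"}
project_b = {"ItemName": "title", "Shot On": "date"}
```

Because both projects resolve through the same entries, correcting a mapping once in the shared dictionary corrects it for every crosswalk that uses it.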
Deposit Form Designer
The forms feature allows the creation of web deposit forms suitable for a particular content stream. The forms use dictionary and crosswalk mapping components to map the input fields to the MODS schema or dictionary elements. Form designs also include explanatory text and designation of required fields. The forms work in tandem with a server-side form-hosting application, which can be configured to put uploads and MODS records into a folder or to deposit them into the repository via SWORD. The forms feature simplifies the creation of deposit forms, shifting form design from software developers to curators, who have greater familiarity with both the depositor community and with descriptive standards.
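A form design pairs each input with guidance text, a required flag, and a mapping target. The Python sketch below checks a submission against such a design, collecting the labels of blank required fields and the MODS-path values for everything else. `FormField`, `check_submission`, and the sample fields are hypothetical assumptions for illustration, not code from the form-hosting application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormField:
    """One input in a deposit form design (hypothetical structure)."""
    name: str        # key used in the submitted form data
    label: str       # label shown to depositors
    help_text: str   # explanatory text shown next to the input
    mods_path: str   # simplified MODS target, e.g. "titleInfo/title"
    required: bool = False


def check_submission(fields, submitted):
    """Return (mods_values, missing) for one submitted form.

    mods_values maps MODS path -> trimmed value for every non-blank input;
    missing lists the labels of required fields that were left blank.
    """
    mods_values, missing = {}, []
    for field in fields:
        value = (submitted.get(field.name) or "").strip()
        if value:
            mods_values[field.mods_path] = value
        elif field.required:
            missing.append(field.label)
    return mods_values, missing


# Hypothetical two-field design: a required title and an optional abstract.
DESIGN = [
    FormField("title", "Title", "Title of the work", "titleInfo/title", required=True),
    FormField("abstract", "Abstract", "Short summary of the work", "abstract"),
]
```

A non-empty `missing` list is what the server-side application would send back to the depositor before accepting the upload.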
Originals and Drives
We’ve rebuilt the originals part of the workbench to better track drives and maintain the connection between groups of originals and original media. This means we have a clearer interface and better support for removable drives.
When you highlight an image file, either in the originals tree or the arrangement, you can see a preview of the image in the workbench. This saves a lot of time for those working with image collections. The preview window can be arranged and sized according to your workflow needs.
We added export support for several workflow needs we identified. A BagIt ZIP export function delivers project-based packages that meet this widely used standard. The project arrangement can now also be exported as a comma-separated file. This was implemented to support a local EAD authoring scenario, but may be useful to others. You can also now export entire projects to a ZIP file or a shared drive. This was done to support multi-user workflows and projects with a significant wait between the capture step and arrangement or description. Individual files, such as crosswalks and data dictionaries, can also be exported and imported across projects and workbench installations.
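The BagIt layout itself is simple enough to sketch. The following Python example, using only the standard library, writes a ZIP that follows the basic structure described in RFC 8493: a single top-level bag directory holding a `bagit.txt` declaration, a `data/` payload directory, and a SHA-256 payload manifest. It is an illustration under those assumptions rather than the workbench’s exporter; `write_bag_zip` is a hypothetical name, and optional tag files such as `bag-info.txt` are omitted.

```python
import hashlib
import zipfile
from pathlib import Path

BAGIT_TXT = "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"


def write_bag_zip(payload_dir, zip_path, bag_name="bag"):
    """Package every file under payload_dir as a minimal BagIt bag inside a ZIP.

    Assumes file names contain no newline or '%' characters, which RFC 8493
    would require to be percent-encoded in the manifest. Reads each file
    fully into memory, which is fine for a sketch but not for huge files.
    Returns the number of payload files written.
    """
    payload_dir = Path(payload_dir)
    manifest_lines = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
            rel = path.relative_to(payload_dir).as_posix()
            data = path.read_bytes()
            # Payload files live under data/ inside the single top-level bag directory.
            zf.writestr(f"{bag_name}/data/{rel}", data)
            # Manifest line: checksum, whitespace, path relative to the bag root.
            manifest_lines.append(f"{hashlib.sha256(data).hexdigest()}  data/{rel}\n")
        zf.writestr(f"{bag_name}/bagit.txt", BAGIT_TXT)
        zf.writestr(f"{bag_name}/manifest-sha256.txt", "".join(manifest_lines))
    return len(manifest_lines)
```

Writing the manifest from the same bytes that go into the ZIP keeps the checksums consistent with the packaged payload.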
A screencast demonstration of the Curator’s Workbench software tool is now available on our GitHub wiki. The demo takes you through a sample project, staging and capturing targeted folders, creating a MODS crosswalk with tabular metadata, and exporting a submission METS file in XML. You can also follow this link directly to YouTube.
The Curator’s Workbench Guide v2 expands on the existing documentation and provides more specific details on setting up a staging area, creating and matching metadata crosswalks, and wrapping up projects.
The updated guide is available on the download page.
I just finished the first draft of a Prezi presentation on the Curator’s Workbench. It was fun to make, and I hope it will also be fun to present. If you wish to pan and zoom in a world of ideas, follow the link.
I’m pleased to announce that the workbench source code is now hosted at github.com. I’ll be adding more licensing (Apache 2) and build information soon. This is my first git-hosted project, so I am still learning the ropes. However, I hope that git will facilitate community development on the project, especially of repository- or discipline-specific plugins.
The project git page is here:
Before this can be very useful, I’ll need to add more developer documentation. For now I’ll just mention that the build is orchestrated by Maven 3 and the Tycho plugin. This means that even though the project uses the Eclipse framework, it can be built from the Maven command line and in continuous integration environments. A continuous integration server is in the works, and setting it up will help me diagnose any lingering build issues in the trunk. Nightly snapshot and stable builds are also coming soon; I’ll link to them on the download page.
The workbench is designed to update itself and any plugins via update sites. This means that the workbench will detect when newer versions are available and prompt the user to download and install them. The primary update site and the workbench menu options to support updates are in the works.
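Update detection ultimately comes down to comparing an installed plugin version with the one an update site advertises. Eclipse plugins carry OSGi version strings (major.minor.micro.qualifier), where the three numeric parts compare numerically and the qualifier compares as a plain string. The small Python sketch below shows that ordering; `parse_osgi_version` and `needs_update` are hypothetical helpers, and in the workbench the actual check is done by the Eclipse platform’s update machinery, not by code like this.

```python
def parse_osgi_version(text):
    """Parse 'major[.minor[.micro[.qualifier]]]' into a sortable tuple.

    Missing numeric parts default to 0 and a missing qualifier to ''.
    """
    parts = text.strip().split(".", 3)
    numbers = [int(p) for p in parts[:3]]
    numbers += [0] * (3 - len(numbers))
    qualifier = parts[3] if len(parts) == 4 else ""
    return (numbers[0], numbers[1], numbers[2], qualifier)


def needs_update(installed, available):
    """True when the update site offers a strictly newer version."""
    return parse_osgi_version(available) > parse_osgi_version(installed)
```

Comparing the numeric parts as integers is what keeps 1.10.0 ahead of 1.9.0, which a naive string comparison would get wrong.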
If there are any questions about workbench code or functions, please don’t hesitate to post a comment.
We prepared a poster on the Curator’s Workbench for the International Digital Curation Conference (IDCC 2010). At 60 cm by 80 cm, a poster does not leave a lot of room, but we did our best. For more information, come find Erin O’Meara at the poster session. You might also talk to Cal Lee or Helen Tibbo, who both serve on our CDR steering committee.
Here is the poster in PDF.
I am proud to announce this new desktop tool, which is definitely the coolest software I’ve worked on this year. It solves several problems we faced in our submission workflow, and we hope it can dramatically speed up processing of large collections with custom metadata. The features fall into three loosely overlapping categories: capture, rearrangement, and description.
Here are some screenshots of the interface:
This screenshot shows the project tree to the left and a MODS editor on the right. The user is editing the MODS elements for a single folder called “TUCASI”. The attributes of the selected MODS name element are editable in the properties view in the lower right quadrant.
The most novel feature, and the one I most want to highlight, is batch metadata crosswalks. The screenshot above shows a crosswalk editor, which consists of a canvas and a palette of widgets. The end user can construct a pretty sophisticated mapping of custom metadata to MODS by “visual programming”: by dropping widgets on the canvas and linking them together, the user defines how a field becomes an element. Presently the editor supports only tab-separated metadata sources, but as time allows we plan to extend it to any delimited file and to XML sources.
Whenever a crosswalk definition is saved, it is used to generate or regenerate a set of MODS records. These records can be automatically associated with files and folders through a matcher widget on the canvas, which works as long as your custom metadata includes file and folder names. Otherwise, you can drag and drop a MODS record onto the appropriate item in the arrangement.
This visual programming and automation of crosswalks saves a lot of valuable time for curators and programmers, who would otherwise have to write custom scripts for each new metadata format. Since we are collecting data from disparate parts of the university, each collection may come with its own descriptive metadata format, often a manually created spreadsheet or discipline-specific XML. It’s just not resource-efficient to write custom scripts for most incoming collections. The crosswalk feature lets us migrate literally thousands of descriptive records at a time and link them to data objects without new software development.
The last feature to mention today is file staging. I designed the workbench to process large numbers of files and folders in one submission. However, repository ingest happens via a web interface, which is not the most reliable way to transmit thousands of large files, let alone a SIP containing that many. So we needed to stage files in advance. The diagram above shows how data flows from incoming data to staging, archival, and access storage. Individual users have accounts in a staging area within our iRODS grid. Files placed there by the workbench are readable by Fedora at ingest time, when they are copied into archival storage.
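The crosswalk editor is graphical, but the basic transformation it configures can be sketched in a few lines. The Python example below reads tab-separated metadata with a header row and builds one MODS record per row from a flat column-to-element mapping, using the standard MODS v3 namespace. The mapping format, the `row_to_mods` and `crosswalk_tsv` helpers, and the sample columns are assumptions for illustration; the workbench’s widgets support much richer logic than a one-to-one mapping.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def row_to_mods(row, mapping):
    """Build a <mods:mods> element from one metadata row.

    mapping: source column -> slash-separated MODS path, e.g. 'titleInfo/title'.
    """
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for column, path in mapping.items():
        value = (row.get(column) or "").strip()
        if not value:
            continue  # skip empty cells rather than emit empty elements
        parent = mods
        for step in path.split("/"):
            # Each mapped field gets its own branch; a real crosswalk may merge them.
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{step}")
        parent.text = value
    return mods


def crosswalk_tsv(tsv_text, mapping):
    """Yield (row, MODS element) pairs for every data row in a TSV document."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        yield row, row_to_mods(row, mapping)


# Hypothetical sample: a spreadsheet exported as TSV with a header row.
sample = "filename\ttitle\tcreator\nimg001.tif\tCampus photograph\tDoe, Jane\n"
mapping = {"title": "titleInfo/title", "creator": "name/namePart"}
for row, record in crosswalk_tsv(sample, mapping):
    print(row["filename"], ET.tostring(record, encoding="unicode"))
```

Keeping each source row next to its generated record leaves the file name available for matching the record to an item in the arrangement.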
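The matcher depends on a metadata column that names each file or folder. A minimal Python sketch of that idea indexes the arrangement by base name and pairs each generated record with the item whose name matches, leaving the rest for manual drag and drop. The `match_records` function, the `filename` column, and the assumption that names are unique within a project are hypothetical; the workbench’s matcher widget may apply different rules.

```python
from pathlib import PurePosixPath


def match_records(records, arrangement_paths, name_column="filename"):
    """Pair metadata records with arrangement items by file or folder name.

    records: iterable of dicts, each holding an item name in `name_column`.
    arrangement_paths: slash-separated paths of items in the arrangement.
    Returns (matches, unmatched): matches maps path -> record, and unmatched
    lists records that need manual placement. Assumes names are unique.
    """
    by_name = {PurePosixPath(p).name: p for p in arrangement_paths}
    matches, unmatched = {}, []
    for record in records:
        name = (record.get(name_column) or "").strip()
        path = by_name.get(name)
        if path is None:
            unmatched.append(record)  # left for manual drag and drop
        else:
            matches[path] = record
    return matches, unmatched
```

Returning the unmatched records separately mirrors the fallback described above: anything the matcher cannot place is still available to drop onto the arrangement by hand.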
This approach comes with several advantages:
- There are no data transmission failures at submission time
- The transmission of files to staging can be incremental, controlled and “paranoid” with a checksum comparison
- The workbench can inform users of staging issues as they arise, so they can be addressed before submission.
- Files are staged in the background while you work on arrangement and description
- There are efficiencies to be gained at ingest time, when copying from a staging grid location to an archival grid location.
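The incremental, checksum-verified (“paranoid”) staging step can be sketched with the Python standard library. In this sketch an ordinary directory stands in for a user’s iRODS staging area, since the workbench itself talks to iRODS through the Jargon client libraries; `stage_file`, `file_digest`, and the MD5 default are assumptions for illustration. The file is hashed, skipped if an intact copy is already staged, otherwise copied and re-hashed, and a mismatch is reported immediately rather than discovered at ingest.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stage_file(source, staging_dir, algorithm="md5"):
    """Copy one file into the staging area and verify the copy by checksum.

    Returns (staged path, digest). Raises OSError on a mismatch so the problem
    surfaces while the user is still working, before submission.
    """
    source = Path(source)
    target = Path(staging_dir) / source.name
    target.parent.mkdir(parents=True, exist_ok=True)
    expected = file_digest(source, algorithm)
    if target.exists() and file_digest(target, algorithm) == expected:
        return target, expected  # already staged intact; skip the copy
    shutil.copy2(source, target)
    actual = file_digest(target, algorithm)
    if actual != expected:
        target.unlink()  # discard the bad copy so it can be retried
        raise OSError(f"checksum mismatch staging {source}: {expected} != {actual}")
    return target, actual
```

Running calls like this in a background thread, one file at a time, is one way to get the “stage while you arrange and describe” behavior listed above.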
Some Notes on the Software Technology
The workbench is built upon a considerable pile of open source code and standards, including the following:
- Eclipse Rich Client Platform (RCP)
- Eclipse Modeling Framework (EMF) and Graphical Modeling Framework (GMF)
- METS XML for project definition files and submission files
- MODS XML
- iRODS Jargon client libraries
The Eclipse RCP is extensible via the OSGi framework. This means that parts of the tool can be made modular and/or mashable to better fit non-UNC environments. This will require some refactoring that we need to do anyway, but OSGi already provides most of the groundwork.
One module that I’d like to see is a way to integrate Google Refine into workflows. This seems like a natural fit for cleaning up custom metadata and normalizing various sources before crosswalks are applied.
Another modular area would be export for submission. The current implementation transforms our internal METS project definition into a submission METS for ingest into the CDR. This submission METS follows a CDR-specific profile, so a natural extension point would be export modules for other repositories.
The BETA software is available for download, experimentation, and use. We cannot provide any support, but we welcome your comments here, or you can contact us directly. Oh yeah, you may only download and use the software at your own risk. See our download page.
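Inside the workbench such extension points would be OSGi/Eclipse extensions written in Java. The Python sketch below only illustrates the shape of the idea: a registry of export modules keyed by a profile name, where a CDR METS exporter would be one entry among several. The `EXPORTERS` registry, the `exporter` decorator, and both toy exporters are hypothetical.

```python
import csv
import io
from typing import Callable, Dict

# Hypothetical registry: profile name -> function turning a project into a package.
EXPORTERS: Dict[str, Callable[[dict], str]] = {}


def exporter(profile):
    """Decorator that registers an export module under a profile name."""
    def register(func):
        if profile in EXPORTERS:
            raise ValueError(f"exporter for {profile!r} already registered")
        EXPORTERS[profile] = func
        return func
    return register


@exporter("path-listing")
def export_listing(project):
    """Toy exporter: one arrangement path per line."""
    return "\n".join(item["path"] for item in project["items"])


@exporter("csv-arrangement")
def export_csv(project):
    """Toy exporter: the arrangement as comma-separated path,title rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "title"])
    for item in project["items"]:
        writer.writerow([item["path"], item.get("title", "")])
    return buf.getvalue()


def export(project, profile):
    """Run the exporter registered for `profile`."""
    try:
        func = EXPORTERS[profile]
    except KeyError:
        raise ValueError(f"no exporter registered for {profile!r}; "
                         f"available: {sorted(EXPORTERS)}") from None
    return func(project)
```

The core only needs to know the registry; supporting another repository means adding one more registered module rather than changing the existing export code.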