Digitizing video for preservation and access (Cindy McLellan)
The City of Vancouver is an exciting city to be a part of at the moment for many reasons. Celebrate Vancouver 125 events have been taking place all over the city throughout 2011 to mark 125 years since Vancouver’s incorporation in 1886. The City is involved with the open government movement; Vancouver’s open data catalogue, now over one year old, is one source of open information. Having set the goal of being the greenest city by 2020, Vancouver is asking its citizens to contribute their ideas and make lasting changes.
From my position at the City of Vancouver Archives, I mention these broad City initiatives because they have important commonalities: collaboration, innovation, and interdisciplinary participation. Working as a Digital Archivist, I see all three of these characteristics as key to any successful digital archives project. At the Archives we are involved in ICA-AtoM and Archivematica development, both collaborative and innovative projects that require input from multiple disciplines. Multiple memory institutions, archivists, programmers, designers and researchers have all contributed unique and necessary input to these open source projects.
Hybrids: Approach with caution and other smart people
I was hired in March to arrange and describe the Yaletown Productions Inc. fonds, which consists of the records of a local film production company. The records date from company director Michael Collier’s university days in 1969, as a member of the Simon Fraser University Film Workshop, under Stan Fox, to 2001 television show proposals stored on a computer hard drive. In addition to having both paper and digital records (which archivists call a “hybrid” fonds), this donation includes hundreds of hours of film and video material consisting of raw footage, completed productions, and everything in between. It has been my challenge to appraise, digitize, preserve, arrange, describe, and make available to the public these materials in all their various formats. Much of this work involved the straightforward application of archival theory and practice.
In the detailed planning required for digital preservation, no one should labour alone. The atmosphere among those with this responsibility is one of sharing. Because we are a small and active community that stays in touch through wikis, blogs, listservs, and our time together at conferences, digital archivists need not, and should not, work alone. The Digital team here at the Archives meets once a week to discuss issues and work out problems. At yesterday’s meeting, we discussed the digital video preservation metadata that Courtney Mumma drew up, based on existing image and audio metadata written by digital archivist Glenn Dingwall and on specifications from a recent project at the Indiana University Archives. At this meeting we also discussed several logistical issues, such as where the 2.2 terabytes of Yaletown’s Matroska files would be stored and how this location will be reflected in the item-level descriptions.
The digitization specifications for the video materials were arrived at through collaboration with Dave Rice at Audiovisual Preservation Solutions. Proprietary software imposes many barriers to long-term digital preservation. Preserving video in an open source environment is a challenging commitment. Much of the archival community specializing in audiovisual media preservation is using JPEG 2000 encoding in an MXF container for video preservation. Unfortunately, the accompanying open source software with which it can be viewed and manipulated does not yet exist. Until open source tools are available, the Archives has chosen to use lossless FFV1 encoding in a Matroska container as its preservation format (the file extension is MKV). The advantage of this choice is that we have a lossless preservation bitstream in an open source container. FFV1 provides lossless encoding of YUV and RGB video at multiple bitrates, and may be played back in a variety of free media players.
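For readers who want to experiment with this format, here is a minimal sketch of how an FFV1-in-Matroska transcode might be scripted. It assumes the open source FFmpeg tool is installed; this post does not say which tool the Archives uses, and the function name and file names are hypothetical.

```python
from pathlib import Path


def ffv1_mkv_command(source: str, dest_dir: str = ".") -> list[str]:
    """Build an ffmpeg command that encodes video as FFV1 in a Matroska file.

    Assumption: ffmpeg is the transcoding tool; the source names no tool.
    """
    # Keep the source file's base name and give it the .mkv extension.
    out = Path(dest_dir) / (Path(source).stem + ".mkv")
    return [
        "ffmpeg",
        "-i", source,      # input file (hypothetical name supplied by caller)
        "-c:v", "ffv1",    # lossless FFV1 video encoder
        "-level", "3",     # FFV1 version 3
        "-c:a", "copy",    # assumption: pass the audio stream through unchanged
        str(out),
    ]


if __name__ == "__main__":
    cmd = ffv1_mkv_command("capture_0001.mov")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.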
A selection of highlights from the Yaletown Productions Inc. fonds will be screened at Vancity Theatre on Sunday, November 6th as a Celebrate Vancouver 125 event. The show will be curated by Michael Collier and compiled on a Digibeta tape. The company creating the Digibeta tape for us will be able to work faster with a compressed format; for this reason, we are converting the chosen clips from FFV1 to ProRes 422.
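A conversion like this could be scripted roughly as follows. This is only a sketch: it assumes FFmpeg and its prores_ks encoder, neither of which is named in this post, and the function name, file names and audio setting are hypothetical.

```python
def prores422_command(source: str, dest: str) -> list[str]:
    """Build an ffmpeg command that converts a clip to ProRes 422 in a .mov file.

    Assumption: ffmpeg with the prores_ks encoder; the source names no tool.
    """
    return [
        "ffmpeg",
        "-i", source,           # FFV1/Matroska preservation file
        "-c:v", "prores_ks",    # FFmpeg ProRes encoder
        "-profile:v", "2",      # profile 2 is standard ProRes 422
        "-c:a", "pcm_s16le",    # assumption: uncompressed 16-bit PCM audio
        dest,
    ]


if __name__ == "__main__":
    print(" ".join(prores422_command("clip_012.mkv", "clip_012.mov")))
```

The preservation file is only read, never modified, so the lossless FFV1 master stays intact while the compressed copy goes to the tape house.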
Looking at my calendar for Thursday, October 6th, the Day of Digital Archives, I will be meeting with the other organizers of the Yaletown project’s Celebrate Vancouver 125 event to give City Archivist Leslie Mobbs an update on the planning so far and the date we will be able to show him a rough cut of the show. The rest of my day will be filled with creating item-level descriptions of some digitized audio material, setting up three hard drives of digitized video content to be copied to storage, and aiding Courtney Mumma in monitoring the #digitalArchivesDay hashtag and responding on Twitter with our @VanArchives account.
All in the course of a digi-day – Tuesday (Courtney Mumma)
Like Cindy’s, my digital archives work has been based around one group of amassed records: the records of the organizing committee for the 2010 Winter Games. I work hard every day to process over 25 terabytes of digital video, photographs, textual documents and web-harvested material alongside over 200 boxes of architectural drawings, plans, maps and other business records. I’m constantly focused on keeping the archives entrusted to us reliable, authentic and trustworthy over time.
The project is too big to cover everything my job involves in one post, so I’ll attempt to communicate what one day in this job entails. This Tuesday, I started the day by reviewing what kinds of information we should collect about the process when we capture digital video (created with the DV codec) by attaching a donated DVCAM player via FireWire to our open source DV Migrator system. The system operates on the Xubuntu 11.04 Linux platform, and Dave Rice, whom Cindy mentioned above, is contributing DV Analyzer software and some customized code to help us capture the video from tapes to bits on our machines. The digital files will ultimately be easier to preserve, manipulate and access.
The information we collect about the process is important so that we can assure researchers that the digital video we preserve and provide access to is as close as possible to the original.
Testing DV capture involves a lot of trial and error. The process we’re developing involves a variety of open source software, all of which requires practice and needs frequent updating. Over the course of the digital archives development project, I’ve learned more Linux command line language than I ever thought I’d know, all of it towards customizing a system to fulfill the unique needs of protecting and preserving archival records.
Mo’ systems, mo’ problems
Another part of my day on Tuesday was describing the analogue part of the Games records using our trial implementation of ICA-AtoM, which is replacing our current access system in early 2012. With all of the analogue records described in ICA-AtoM, I’ll be better able to attach the digital part of the fonds once it’s been processed in Archivematica. The Archivematica processing will take place largely over November and December this year, so I’m trying to get all of the analogue descriptions completed so that I can attach the descriptions of the digital records, along with their access copies (including photos and video!) to the appropriate series or file.
Glenn Dingwall, another digital archivist on staff, is charting which fields in the new system correspond to the old one so that our users will understand what is in our holdings at least as well as they already do (although we hope and expect that the user experience will be far superior). When I run into problems with ICA-AtoM, I report to a list of users that includes its developers. Since it’s open source software, if my problem is shared with other users or if it’s a bug, the developers act quickly to plan for a fix.
On Tuesday, I looked at a City website because its domain was about to expire and I had to assess whether it had any archival value. Since that site was on the City’s network, I just reviewed the files using our internal server and ultimately decided that its content was not unique enough to justify its permanent preservation. However, in another case, a blog that was part of a generous donation clearly contributed to the functionality of the donor organization, so I selected it for preservation and then captured it using the HTTrack website copier. It will be a while until the blog is processed in Archivematica and described in ICA-AtoM for users to access, but at least we know that, for now, it is safe and backed up at the Archives. Reviewing the web content required more than technical skills; it also required the application of archival appraisal techniques, the theory behind which goes back hundreds of years and is a fundamental part of what any archivist does.
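As an illustration of the capture step, this sketch builds a basic HTTrack command line for mirroring a site into a folder named after its host. The URL, folder layout and function name are hypothetical, and the options we actually used for the blog are not recorded here.

```python
from urllib.parse import urlparse


def httrack_command(url: str, mirror_root: str = "mirrors") -> list[str]:
    """Build an HTTrack command that mirrors `url` into mirror_root/<host>.

    Assumption: a plain mirror with default settings; real captures may
    need extra filters or depth limits.
    """
    host = urlparse(url).netloc
    if not host:
        raise ValueError(f"not an absolute URL: {url!r}")
    # -O sets the output path for the mirrored files and HTTrack's logs.
    return ["httrack", url, "-O", f"{mirror_root}/{host}"]


if __name__ == "__main__":
    print(" ".join(httrack_command("https://blog.example.org/")))
```

Keeping one output folder per host makes it easier to tell captures apart later, when they are queued for processing and description.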
In the in-between times on Tuesday, I consulted with our Digital Archives team to review what digital photo format would be most useful to our researchers, how we might manage privacy, confidentiality and copyright restrictions in the new system and in the Reading Room and how much storage we might need to manage and provide access to our holdings in the coming year. I scheduled a tour with a digital initiatives team from a local university library so we could share our experiences, successes, failures and goals for the future, in hopes that our collaboration results in better processes and, therefore, better public access to valuable historical materials.
This describes some of what I do in a day as a digital archivist. While technical skill is valuable to what I do, the bulk of my work depends upon my training as a good old-fashioned archivist. Whether they are digital or analogue, the archives are sacred and an archivist’s work is a sacred trust. Our digital memory should be as rich as that which we’ve preserved of our paper past, and Cindy and I, along with our Digital Archives team, spend each day working to gather that legacy.