Last fall I was able to attend iPRES, the International Conference on Digital Preservation. The 2018 conference, held in Boston from Sept. 12-16, was the 15th iPRES and the first held in North America since 2015. The 2017 conference was held in Kyoto, and the 2019 conference will be in Amsterdam.
The conference brought together 421 attendees from thirty-two countries. They included scientists, archivists, librarians, and other professionals from disciplines with an interest in preserving digital information over long time spans. This interdisciplinary approach is one of the strengths of iPRES, because digital preservation is not a problem unique to archives. It lets smaller communities, such as archives, learn how larger communities, which often have better resources and larger research budgets, are addressing similar problems.
As with any conference, the program mixed sessions that were directly relevant to my day-to-day work with others that, though less directly relevant, were informative and thought-provoking. The program committee assembled an interesting mix of long and short papers, panels, and workshops. Fortunately, it’s possible to catch up on anything I missed, as all of the conference proceedings are available through the Open Science Framework at https://osf.io/u5w3q/. Looking back at my notes, a couple of sessions took up a disproportionate amount of space in my notebook.
Session 308 addressed the overall feasibility of digital preservation and the costs associated with it. From an ethical perspective, archives should not acquire material that they aren’t able to preserve. Most archives are very good at making this assessment for analog materials. Digital preservation, however, is a (relatively) new activity for archives, and it is often unclear what long-term digital preservation actually means. How long is “long term”? Given the known difficulties of digital preservation, is it even possible to commit to preserving digital material for time spans as short as 20 or 50 years? The presentations in this session examined criteria for deciding whether a given preservation project is feasible, as well as approaches to estimating the costs of preservation activity.
It was encouraging to see research on cost modeling. Whether an institution is able to preserve something depends not only on whether it has the requisite skills and technology, but also on whether it can afford to. The financial sustainability of preservation is often overlooked in preservation planning. Many projects to acquire and ingest born-digital materials or to digitize existing holdings are grant funded, but preservation costs continue after the initial project funds run out. Tools like the digital preservation cost calculator presented by Kate Dohe and David Durden help fill a gap in the existing digital preservation toolkit.
Session 401 examined preservation workflows, including data recovery from obsolete 8″ floppy disks, transferring files from removable media, and appraising large volumes of email. The presentation by Joanne Kaczmarek and Brent West was particularly interesting. Kaczmarek and West discussed using predictive coding and machine learning to train computers to appraise email. Lawyers commonly use software with this feature for e-discovery: identifying digital records relevant to discovery in a legal proceeding. Given examples of what is and isn’t being searched for, the software learns the decision criteria and extrapolates them to a larger set of documents. The results are statistically comparable to those of a human review of the same documents. Although Kaczmarek and West only considered the technique for appraising email for acquisition, it could be applied to digital preservation in many other ways. For example, it could help identify records with sensitive content that need more stringent access controls (e.g., records subject to FOI restrictions), or help researchers find records relevant to their research enquiry.
There was a great deal of excellent content in Boston, and I had to make some difficult choices about which sessions to attend. Other memorable sessions included Session 203, Capacity and Accomplishment, which examined questions like “What is an acceptable level of preservation given institutional constraints and goals?” and “How can outcomes be measured in order to judge success?” Session 302, Minute Madness, was another highlight: poster presenters delivered one-minute summaries to encourage attendees to visit their posters, and the corresponding poster sessions covered a diverse range of topics.
Congratulations to the organizers for putting on a great conference, and to the presenters for providing consistently excellent content. This was my first time attending iPRES. I found it very useful to meet and talk with people from other disciplines and to see the similarities and differences in our respective challenges and approaches to digital preservation. Overall, it was a very positive experience, and I look forward to attending future iPRES conferences.