Digitization 101: Event: First International Workshop on Database Preservation

Tuesday, February 06, 2007

Event: First International Workshop on Database Preservation

I don't normally post information on one-day events, but this one really caught my eye. Those of us who are reading this likely are concentrating most of our efforts on preserving content that has not been created and stored in a database. Yet there is a tremendous amount of data stored in databases. Historically, that content could be "preserved" (and I'm using that term very loosely) by converting it to a homogeneous format that could be read by a different program OR by (heaven forbid) printing the data. But our database structures are now too complex for that, and what we store in these databases can itself be very complex. And so how do we preserve, truly preserve, these complex structures?

I know that several people who read this blog work on campuses where you are dealing with collecting and preserving content from across your institution, including administrative records, which are now stored in complex systems (e.g. PeopleSoft). You are already attacking this problem. The notice of this one-day workshop is a wake-up call for the rest of us to also pay attention.

First International Workshop on Database Preservation (PresDB’07)
----------------------------------------------------------------------
23 March DCC, Edinburgh, UK
http://homepages.inf.ed.ac.uk/hmueller/presdb07/

Most scientific research is now based on digital data resources, and databases are playing an increasingly important role. Much of the data is either impossible to reproduce (e.g. climate and demographic data) or can only be recovered at enormous cost (e.g. data from high energy physics experiments or space flight missions). Nearly every reference manual, dictionary and gazetteer benefits from some form of database management support, and there has been an explosion in the number of curated databases in biology. These databases represent a huge investment of human effort. The need for preservation is self-evident.

While considerable thought has been given in the past to the preservation of fixed "digital objects", the preservation of databases, which have an internal structure and which may change over time, poses new challenges. Typically databases are centrally managed, and their survival depends on the viability of commercial organisations or the continued public funding of data centres. Libraries, the traditional curators of scientific and scholarly reference material, have largely abrogated their archival responsibility to databases.

Database preservation raises new technical, economic and legal issues. For example:

What are the salient features of a database that should be preserved?
What are the different stages in the database preservation life cycle?
How do we keep archived databases readable and usable in the long term (at acceptable cost)?
How do we separate the data from a specific database management environment?
How can we preserve the original data semantics and structure?
How can we preserve data while it continues to evolve?
How can we have efficient preservation frameworks, while retaining the ability to query different database versions?
How can multi-user online access be provided to hundreds of archived databases containing terabytes of data?
Can we move from a centralised model to a distributed, redundant model of database preservation?
What documentation is preserved together with a database, and in what format?
What are the legal encumbrances on database preservation?
What can be learned from traditional archival appraisal for the selection of databases for preservation?
To what extent can the preservation strategies and procedural policies developed by archivists be adapted for databases?

The workshop aims to bring together an interdisciplinary group of researchers and practitioners who will address archival issues associated with databases. All participants’ presentations will be hosted by the workshop site and a short report with the final conclusions of the workshop discussions will be published.

Organization
============
PresDB is an informal workshop organized by a small executive committee. The one-day program of the workshop will consist of oral presentations and brainstorming sessions. Attendance will be mainly by invitation from the executive committee. To stimulate interaction and discussion, participants are also invited to submit short position papers until 02/03/2007 (submissions should be sent by e-mail to Vassilis Christophides, christop@ics.forth.gr).

Timing and Venue
================
The workshop will take place on 23 March at the UK Digital Curation Centre and the Database Group in the School of Informatics, University of Edinburgh.

Executive Committee
===================
Peter Buneman, University of Edinburgh, UK
Vassilis Christophides, University of Crete and FORTH-ICS, Greece (Chair)
Bertram Ludaescher, University of California, Davis, USA
Chris Rusbridge, Digital Curation Center (DCC), UK
Wang-Chiew Tan, University of California, Santa Cruz, USA
Ken Thibodeau, National Archives and Records Administration (NARA), USA

Technorati tag: Digital Preservation
