Lisa Gregory and Jennifer Ricker - State Library of North Carolina
Tasked with preserving state digital publications forever.
Strategies
- Emulation
- Migration - transferring the files to a stable format.
Currently using Archive-It for web harvesting, plus CONTENTdm and OCLC's Digital Archive.
Approach to Migration Testing
- What file formats do they have? ~20 different file formats, including older file formats. Mostly text files. Digitized and born-digital files. Included some corrupted files (they corrupted them on purpose for the test).
- Tools - FFmpeg, Inkscape, PLANETS Testbed, Xena, ArcMap/TerraGo. Not all transformations were successful. Tools were free, open source, documented, supported, provided an audit trail/reporting, easy to use, and versatile.
- Expectations - no visual/auditory loss of content, no loss of metadata, minimal degradation in quality, etc.
- FFmpeg - not so successful with one file format
- Inkscape - some font changes, but acceptable
- PLANETS Testbed - many document file types. Most worked beautifully. Word 95 didn't work. Converting to PDF/A did not work for files from some specific software packages.
- Xena - Many of the same tests that they did with the PLANETS Testbed. Similar results.
- ArcMap and TerraGo are both proprietary software tools. Worked.
- File format observations - Challenges that they expected and found
- Complex and related files - No open source tool that could migrate these and keep the file relationships
- Had trouble with files that had layers (e.g., Adobe Illustrator)
- Proprietary formats that are not widely used (e.g., Microsoft Publisher)
- Surprises
- Audio-video formats have their own complexities
- The files are huge
- Frame rates, compression and codecs, oh my! May want to find someone who already knows this stuff, rather than coming up to speed yourself.
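One way to get a first look at the frame rates and codecs in an A/V file is ffprobe, which ships with FFmpeg. The following is a minimal sketch, not something shown in the talk: the function names `probe_streams` and `summarize_streams` are made up for illustration, and it assumes `ffprobe` is installed and on the PATH.

```python
import json
import subprocess


def probe_streams(path):
    """Run ffprobe on one file and return its JSON report as a dict.

    Assumes the ffprobe binary (part of FFmpeg) is on the PATH.
    """
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def summarize_streams(report):
    """Reduce an ffprobe report to (type, codec, frame rate) per stream."""
    summary = []
    for stream in report.get("streams", []):
        summary.append((
            stream.get("codec_type"),    # e.g. "video" or "audio"
            stream.get("codec_name"),    # e.g. "h264", "flac"
            stream.get("r_frame_rate"),  # a fraction string such as "30000/1001"
        ))
    return summary
```

Calling `summarize_streams(probe_streams("some_file.mov"))` (the file name is a placeholder) gives a short list that can be compared before and after a migration.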
- PDF/A (argh!)
- 1A - the 1B restrictions plus document structure (tagging) requirements. Better accessibility.
- 1B - self-contained, no external references, lower level of compliance, suits digitized materials, metadata required. Could achieve 1B compliance with Adobe Acrobat, but not with open source tools.
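For reference, this is roughly what an open source PDF/A-1b attempt looks like with Ghostscript (one of the tools listed below). It is a sketch only: the helper name `pdfa_command` is hypothetical, the `PDFA_def.ps` file that ships with Ghostscript normally has to be edited to point at an ICC profile first, and, as the notes say, the output may still fail validation, so it needs checking with a separate validator.

```python
import subprocess


def pdfa_command(source, target, pdfa_def="PDFA_def.ps"):
    """Build a Ghostscript command line that attempts a PDF/A-1 conversion.

    pdfa_def is the path to a (locally edited) copy of Ghostscript's
    PDFA_def.ps; the default here is only a placeholder.
    """
    return [
        "gs",
        "-dPDFA=1",                        # request PDF/A-1 output
        "-dBATCH",
        "-dNOPAUSE",
        "-sColorConversionStrategy=RGB",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",     # drop features that violate PDF/A
        f"-sOutputFile={target}",
        pdfa_def,
        source,
    ]


def convert_to_pdfa(source, target, pdfa_def="PDFA_def.ps"):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(pdfa_command(source, target, pdfa_def), check=True)
```

Keeping the command in its own function makes it easy to log exactly what was run for each file.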
- Tools to have
- FFmpeg
- FITS
- FLAC Frontend
- Ghostscript
- Inkscape
- MPEG Streamclip
- PLANETS Testbed (RIP?)
- XENA
- More helpful knowledge
- Free and open source has downsides
- "Free" in upfront costs
- Might be developed by a single person or by hundreds
- Learning curve can be steep
- Documentation can be confusing or nonexistent
- Can you rock the command line?
- Build in time for stops along the road
- Tool installation
- Troubleshooting
- General Googling for assistance
- There are still unknowns
- QA- what should we use / rely on?
- How can we facilitate batch processing?
- On the fly or scheduled bulk migration?
- QA - how much should we do?
- ARC to WARC?
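On the batch processing and QA questions, one simple starting point is a checksum manifest built before and after a bulk migration, which also gives a basic audit trail. The sketch below is an illustration only, not the State Library's workflow; `sha256_of`, `build_manifest`, and the CSV column names are all assumptions.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, manifest_path):
    """Walk root, record one row per file, and write the rows to a CSV.

    Write the manifest outside root so it does not list itself on reruns.
    """
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "file": str(path.relative_to(root)),
                "format": path.suffix.lower().lstrip(".") or "unknown",
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["file", "format", "bytes", "sha256"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Grouping the rows by the `format` column shows which formats dominate a collection, which helps decide between on-the-fly and scheduled bulk migration.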
- Overcoming challenges to production implementation
- Usual culprits - staff time, resources, IT restrictions, programming skills
- Testing Archivematica by Artefactual Systems - OAIS compliant
- Formal workflow descriptions - striving to be OAIS compliant
- Tackling at-riskier files
- Older files
- Older formats
- Obsolete formats
- Databases
- More work on A/V formats