Monday, June 10, 2013

#SLA2013 - The Digital Preservation Network, James Hilton

He is the chief evangelist for the DPN, which is indeed a large network of institutions and people.
"Everything that ever mattered in the world was more complicated" than was stipulated.

Problem: the scholarship that is being produced today is at serious risk of being lost forever to future generations. This is true for both traditional and emerging scholarship. There is less than a 50-50 chance that our intellectual children will have access to our work. The same is true of data: the volume of data is increasing and so are the preservation requirements.
  • The LHC (CERN's super collider) produces 3 million DVDs worth of data every six months.
  • Much of the coming data tsunami will be discarded.
  • Some data can NEVER be thrown away. Some data is a glimpse in time.
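For a rough sense of scale, the LHC figure above can be converted into petabytes. This is only a back-of-the-envelope sketch: it assumes a single-layer DVD holding 4.7 GB (decimal), which the talk did not specify, so treat the result as an order-of-magnitude estimate.

```python
# Rough conversion of "3 million DVDs every six months" into petabytes.
# Assumption (not from the talk): single-layer DVD capacity of 4.7 GB (decimal).
DVD_CAPACITY_GB = 4.7
DVDS_PER_SIX_MONTHS = 3_000_000


def dvds_to_petabytes(dvd_count, capacity_gb=DVD_CAPACITY_GB):
    """Convert a number of DVDs to petabytes (1 PB = 1,000,000 GB)."""
    return dvd_count * capacity_gb / 1_000_000


six_month_pb = dvds_to_petabytes(DVDS_PER_SIX_MONTHS)
print(f"About {six_month_pb:.1f} PB every six months")
print(f"About {2 * six_month_pb:.1f} PB per year")
```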
Only universities - in partnership with others -  are positioned to solve the data problem for the long haul. Why?  They've been around for a long time, hence, likely will continue to be around for a long time.  Yes, you can do partnerships with for-profit organizations, but they want to make money, rather than lose money on a scholarly endeavor.

There are currently lots of digital collections with a smattering of aggregation... e.g., HathiTrust. Most of the emphasis is on current access, with little more than a promissory nod to preservation.  All are susceptible to multiple single points of failure.  Single points of failure come in multiple forms - tech, physical, political, failure of organizational will, etc.  We need to consider not decade-long access/preservation, but century-long access/preservation. [The fourth sentence in this paragraph used the word "value" instead of "failure", due to an iPad that likes to autocorrect.  The word should indeed be "failure".  Thank you, Andrew, for pointing out the flaw.]

We've seen this picture before in networking and it's why we built Internet2.  Internet2 is our high-speed backbone. Internet2 allows us to continue to compete in science.  Internet2 was built to scale and evolve.

  • Lots of one-off solutions
  • Emerging aggregations
  • Multiple single points of failure
  • Many layers to the problems
  • Huge cost advantages  accrue to scaled solutions
  • Commercial solutions come with their own problems
  • Waiting only makes the problem harder and more expensive to solve
DPN is leverage to force the debate on, and the adoption of, standards.  We have momentum to put behind the standards debate.

DPN - eliminate single points of failure by building in replication diversity starting at archive layer.
  • Light archives are better at preservation than dark archives.  However, some things cannot be made "light" (accessible).
  • DPN is a dark archive.
Create a sustainable framework at scale that evolves and adapts to new preservation challenges and allows movement up the preservation stack.

Start with well-understood objects and leverage current efforts.  Evolve and adapt to new forms of scholarship and data, adjusting replicating node architecture if/as needed.
DPN is an ecosystem, not a software project.  It is designed to evolve to address new forms of scholarship, changing formats and the evolution of software and tech platforms. DPN is a federation, not a monopoly. The DPN federation will:
  • Audit and verify to ensure succession
  • Provide grant-based/contract funding to the replicating nodes in a manner that ensures functional independence.
  • Provide a legal framework for holding succession rights
  • Provide a structure for aligning and leveraging  preservation activities/investment.
Currently 56 AAU-like institutions.
Can we afford DPN? Membership is $20,000/year/institution.
  • At $15 million for the first year at full capacity, that is .0005% of the research expenditures of R1 institutions. (I hope that is an accurate paraphrase.)
  • Grant funding
  • Charge for some services
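As a quick check on how the membership figures relate to each other, here is a small sketch using only the numbers recorded in these notes. It assumes that all 56 current members pay the full $20,000 and that the $15 million figure is an annual amount to compare against; neither assumption was confirmed in the talk.

```python
# Figures as recorded in these notes; how they relate is my assumption.
MEMBER_COUNT = 56                    # "Currently 56 AAU-like institutions"
ANNUAL_DUES_USD = 20_000             # membership per year per institution
FIRST_YEAR_FIGURE_USD = 15_000_000   # "$15 million for the first year at full capacity"


def annual_dues_total(members, dues=ANNUAL_DUES_USD):
    """Total yearly dues if every member pays the full amount."""
    return members * dues


current_dues = annual_dues_total(MEMBER_COUNT)
share = current_dues / FIRST_YEAR_FIGURE_USD
print(f"Dues from {MEMBER_COUNT} members: ${current_dues:,} per year")
print(f"Share of the $15 million figure: {share:.1%}")
```

Under those assumptions, dues alone would cover only a small part of the first-year figure, which fits with grants and service charges being listed as additional funding sources.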
The emerging digital stack
  • Access oriented Repositories
  • Preservation oriented repositories
  • DPN backbone
  • Code
Activities related to the stack - Internet, CLOCKSS, Portico, Meta

DPN benefits:
  • Preserve scholarship
  • Ensure continued access
  • Evolve to include new forms of scholarship and data as they emerge
  • Rationalize our collective investment in preservation efforts and leverage diverse funding sources
  • Create a framework against which the academy can retool publication workflows for a digital world
  • Provide a way of planning campus-based cyber infrastructure so that it efficiently feeds preservation efforts
Interesting thought - these days, peer review happens after the fact with "likes", etc.

{Syracuse Univ. is a charter member!}
Inaugural board will be in place fall 2013. To include 3 librarians.
Bylaws and elections 2014.
Have a technology working group.

Where are we today?
DPN members connect to a DPN node.  Deposits are replicated on three other nodes. What continuous auditing needs to occur in order to ensure integrity?
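One common way to audit replicated copies is a fixity check: record a checksum when an object is deposited, then periodically recompute it on every replica and flag any mismatch. The sketch below illustrates that general idea only. The talk did not describe DPN's actual auditing mechanism, and the node names, the choice of SHA-256, and the in-memory byte strings standing in for stored files are all my assumptions.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 checksum of a deposit's bytes."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(expected_digest, replicas):
    """Return the names of nodes whose copy no longer matches the checksum.

    `replicas` maps a (hypothetical) node name to the bytes it holds.
    """
    return [
        name
        for name, content in replicas.items()
        if sha256_digest(content) != expected_digest
    ]


# Checksum recorded at deposit time.
original = b"example deposit"
expected = sha256_digest(original)

# Three replicating nodes (hypothetical names); one copy has been corrupted.
replicas = {
    "node-a": original,
    "node-b": original,
    "node-c": b"example dep0sit",  # simulated bit rot
}

damaged = audit_replicas(expected, replicas)
print(damaged)  # ['node-c']
```

A damaged copy found this way could then be repaired from one of the replicas that still matches, which is the practical payoff of keeping several independent copies.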

Have a succession rights working group.  {Peter Hirtle is a member of that group.}
Also a business model working group.  What is the cost to preserve for 50 years?

  • The perfect
  • The study
  • Inertia 
