Friday, November 10, 2017

#NYLA2017 : Big Question, Big Data, and HathiTrust

Mike Furlough

HathiTrust shows how libraries can collaborate. It has over 130 members, which are academic and research libraries. Member fees support 100% of operational expenses. Fees begin at about $9,500 in 2018. They do not see themselves as a subscription service.

HathiTrust has a portfolio of work:
  • Collection development
  • Preservation
  • Use
  • Rights management
  • Collection management
  • Computational research 
15.8 million digitized items:
  • 7.8 million book titles
  • 430k serials
  • Over 1 million federal government documents
  • 5.96 million open for reading
Some materials are not fully viewable outside of the U.S. because public domain status differs between countries.

Access in a nutshell
Anyone anywhere can search
Anyone can read public domain works
Can engage in text mining

Members can replace lost or damaged works from the collection (Section 108 exemption).
For someone who is print disabled, member institutions can make any work available. There is currently no direct access for students.

Collection Action: Copyright Review
Systematic manual review of copyright registrations to determine the status of portions of the HathiTrust collection, supported by IMLS. Reviewers try to spend 10-15 minutes per item. They have reviewed 700K items over 8 years. Over time, 100+ people at 30 institutions have done this work.

Shared Print Monograph Program
Just launching this year.  Phase 1.
49 retention libraries proposed over 16 million commitments.

U.S. Federal Documents Program 
The goal was to digitize as many as possible.
They are creating a registry of federal documents published since 1776.
They are beginning to do gap analysis and target collections for digitization.
They have set priorities.

“Non-consumptive” Research: The HathiTrust Research Center
Non-consumptive research means text mining or data mining.
Indiana University and University of Illinois are cohosting this center.
Analytics portal
Dataset distribution 

HathiTrust has gone through six stages beginning in 2002. They worked on infrastructure first.  
What is different now?
Membership diversification
Organizational maturity
Mass digitization is assumed and non-controversial 
Legal challenges have ended, but questions remain 

Don’t mess up what you do well.
Keep building the collections and do it faster.
No strong impetus to expand collecting focus. 
Quality is important.
