A great deal of scholarly practice is becoming data- and computationally intensive across all disciplines. Funding agencies are increasingly requiring that data produced as part of a grant be stored, maintained, and often shared (even before the research is completed). This has led to new areas of study, including data curation, eScience and data science, as well as new jobs. However, it is not clear what the real need for data curation experts is, nor is it clear where they will or should reside. Where they reside is an important question, since we don't have a national effort in the U.S. to store research data.
Lynch made an important distinction between two types of research data. First, there is data that can be easily recreated. Do we need to store this data in perpetuity? Perhaps not. Then there is observational data, which can be difficult, if not impossible, to recreate. This data does need to be stored and maintained.
He described two uses of research data. The first is to support the original research as well as natural extensions of that research. The other use is as proxy data, and you cannot predict those uses. For example, data about high tides could be used by the shipping industry, but could also be used by environmentalists. Since you can't predict how data will be used, the whereabouts of data sets need to be known and access needs to be available.
Lynch spent a long time talking about access. When talking about big data, we all assume data that is digital. However, Lynch talked about the tremendous amount of data, including specimens, that is not yet digital and that is very fragile. Who is going to create the digital surrogates? Where will the funding come from?
Universities sit on a tremendous amount of data that might be "hidden" in various departments. The library might not even know where it all is. Lynch believes that at some point various university offices, including the office of risk management and those that audit various processes, will get involved in how data is stored, maintained and shared.
Near the end of the question and answer period, Clifford Lynch made the point that people outside of our research institutions cannot easily get to scholarly material anymore. (He used the phrase "scholarly material" in a broad sense, including data, databases, etc.) For me, that raised the question of how libraries will make scholarly information available to people who are not part of their user base.
Lynch mentioned that the UK had been a poster child for creating systems for storing data, but that funding shifts had harmed those systems. He mentioned both AHDS and JISC. In the U.S., there isn't a natural focal point around which to build systems similar to those in the UK.
Lynch also touched upon:
- The unauthorized sharing of scholarly information.
- Ethical constraints on research data.
- "quantitative measurements of illusive things" - journal impact factors and other bibliometrics.
- Managing nontraditional "publications".
- Open access, self-publishing, and specialized web sites, which have replaced specialized encyclopedias.
- "Report from the Research Data Workforce Summit".
Several of my students attended Lynch's talk. Two mentioned being inspired by it and now being interested in knowing more about eScience. They saw a vision of an optimistic future. One student heard in Lynch an air of pessimism. Lynch, however, would say that he is neither an optimist nor a pessimist, but just a realist.