Tuesday, June 10, 2014

#SLA2014 : Amy Affelt - The Accidental Data Scientist: A new role for librarians and information professionals 

What is big data? You know it when you see it.

McKinsey: amount of data collected will grow by 40% per year.
15 out of 17 industries will have more data than the information stored in the Library of Congress.
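
To get a feel for what 40% growth per year means, here is a minimal sketch of the compound-growth arithmetic. It assumes the rate stays constant and compounds annually, which is my assumption and not something the talk stated. The function names are made up for illustration.

```python
import math

ANNUAL_GROWTH = 0.40  # the 40% per year figure quoted above


def relative_volume(years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Data volume after `years`, relative to a starting volume of 1.0.

    Assumes constant growth, compounded once per year.
    """
    return (1 + rate) ** years


def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:>2} years: {relative_volume(years):.1f}x")
    print(f"doubling time: {doubling_time():.2f} years")
```

Under those assumptions, the amount of data roughly doubles every two years and grows about 29-fold in a decade.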

How is the data different? It is being collected automatically in the background, as well as being user-generated.

Gartner's five V's:
  • Volume
  • Velocity
  • Variety
  • Verification
  • Value
Verification and value are places where information professionals can have a role. Determining the value is challenging, risky, and expensive.

Cool big data applications...

  • Microsoft readmissions manager
  • Stanford drug pairings
  • MyAchoo
  • Street Bump
  • Xerox ExpressLanes
  • Fixed
  • MyMagic+
  • RUWT
  • Qcue
We have the skills to work with big data.  We think about things in a critical way.  We should not say "it is easy", but we should work to ensure that our skills are valued. 

Big data busts:
  • Google flu trends
  • Crimson Tide v. Auburn
  • Target "targeted" coupons
  • Lego - did not use big data methods 
  • Boston Marathon Manhunt - did not take a big data approach
Bad big data advice:
  • Sketchy citation algorithms - what if the citing article states that the citation is junk?
  • Re-use of data - how do you ensure that the recycled data is clean?
  • Global data sharing - garbage in, garbage out.  How do you prevent garbage in?
We can help people find data and make sure that it is authoritative.
Did you consider alternative data sources?
What biases are inherent in the interpretation?

We'll take it from here:
  • Search
  • Discover
  • Analyze
  • Communicate impact
  • Create deliverables
What's in it for me?
  • Look for big data projects in your industry.  How could you fit into those projects?
  • What are the vexing issues?
  • What is our mission?
  • Set the context to build connections between data points.  Patterns v. Predictions, Coincidence v. Causation
  • Embed into IT and Big Data teams to provide point-of-need research
  • Curiosity = high quality
  • Data science v. Data intelligence - not big data but better data
Big data communications framework:
  • Understand the business platforms
  • Determine impact measurements
  • Discover data available
  • Decide which data is most valuable
  • Formulate hypothesis
  • Communicate the results - what's the story?
How do you get hired as a data scientist?  Gigaom.com article.  Also...
  • Core competencies
  • Learn to tell a story
  • Exercise creativity and curiosity/healthy skepticism 
  • Show up and be ready to learn 
New big data roles:
  • Data policy expert 
  • Data release expert  
  • Exit survey on data expert
