Friday, August 21, 2009

Selection criteria, Zipf's law and the Pareto Principle

My blog post on Wednesday talked about using Zipf's law as a way of deciding what to digitize. Walt Crawford says he has argued for using the Pareto Principle to distinguish popular items from exceptional ones. While the Pareto Principle and Zipf's law are very different, you can see how each could be used in creating selection criteria.

Custer - using Zipf's law - focused on trying to satisfy 70% of user needs. Crawford, however, argues that the obscure items are "exactly what needs to be digitized...It's the oddball stuff that will disappear otherwise..." This is very different from trying to satisfy a large number of user requests.

What's your take on this? Do you want to satisfy the most user requests or provide access to important obscure items in your collection? Which would benefit your institution more?

BTW Crawford's article on this appeared in American Libraries, June/July 2001, p. 72 (v. 32, n. 6).


Ben said...

"the oddball stuff that will disappear otherwise..."

I may be misunderstanding this or just reading it out of context, but this philosophy sounds like it's treating digitization as a preservation tool. A lot of what we do in archives is driven, quite simply, by use. For our institution it makes much more sense to use digitization to provide access to the greatest possible number of users rather than focus our efforts on preserving what is most at risk.

Susan D'Entremont said...

This is a tough one. Lucky for me, I don't have to make this decision - we leave it up to our participants! We have organizations using both principles when choosing what to digitize. Although, honestly, most of our participants are small, so they have begun by choosing items that are the easiest to digitize and have no ownership issues. (Is there a law or principle for that approach?)

I didn't read the comment about the "oddball" stuff the same way Ben did. I'm thinking more of the weird stuff that we know about as archivists but that people don't use because they don't know it's there. Even if we make good finding aids, the general public may not even think to LOOK for it.

Digitization, at least so far, seems to be making material much more accessible to the general, browsing, public than other access methods do, so sometimes it's worth taking the risk to digitize things that aren't used much in your repository if you think that people would just love this stuff if they only knew it was there.

Mark Custer said...

Hi Jill,

I just noticed these two blog posts, so I thought that I would quickly respond.

First, thank you for posting information about the Crawford article. Pareto, Bradford, and (perhaps most notably) Richard Trueswell are all names that have been used in library literature to discuss these sorts of 80/20 power law distributions.

And, I'll try to expand later on the reason that I chose to invoke Zipf's name instead, but for now I'm just going to clarify my point about "mass representation."

"Mass representation" is a prioritization strategy that can be enacted, I think, by any institution that has digitization aims (be those large or small). First and foremost, though, information about their collections needs to be made available (which is why I push for the creation of EAD finding aids as soon as a new collection is accessioned). After that, you can collect data about these collections, both in the reading room and on the web. Using that data, you can then pinpoint the most frequently found/requested collections. And, if you prominently display digitized material from these high-use collections, you'll have the LARGEST impact possible for the LEAST amount of work. That's it, really.

This is not to say, though, that we should or shouldn't cater to the majority or to the minority (of our users or of our collections). In fact, it doesn't remove all of the selection decisions involved; it just helps to focus those decisions.
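For readers who want to try the ranking step on their own usage data, here is a minimal Python sketch of the idea described above. It is only an illustration under stated assumptions: the function name `select_high_use`, the collection names, and the request counts are all hypothetical, and the 70% target simply echoes the figure mentioned in the post. It ranks collections by how often they were requested and keeps the smallest set of top-ranked collections that covers the target share of requests.

```python
def select_high_use(request_counts, target_share=0.70):
    """Return the most-requested collections that together cover
    at least `target_share` of all recorded requests.

    `request_counts` maps a collection name to its number of requests
    (reading room plus web). The names and numbers are hypothetical.
    """
    total = sum(request_counts.values())
    if total == 0:
        return []  # no usage data yet, so nothing to prioritize

    # Rank collections from most to least requested.
    ranked = sorted(request_counts.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    covered = 0
    for name, count in ranked:
        if covered / total >= target_share:
            break  # the target share of requests is already covered
        selected.append(name)
        covered += count
    return selected


# Hypothetical usage data: requests per collection.
counts = {
    "Collection A": 50,
    "Collection B": 30,
    "Collection C": 10,
    "Collection D": 6,
    "Collection E": 4,
}
priority = select_high_use(counts)
print(priority)
```

With these made-up numbers, two of the five collections account for 80 of the 100 requests, so they alone pass the 70% target. The output is only a starting list; the selection decisions themselves still have to be made by people.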

Jill Hurst-Wahl said...

Mark, thank you for the clarification. Your conference session sparked email, FriendFeed and blog "conversations" (and those are always a good thing).

What you are advocating is indeed an interesting way of going about the selection process. It is much better than not knowing what to digitize, and more focused than digitizing everything. You are advocating that the right things be digitized and giving a strategy for doing that!