Wednesday, November 30, 2005

Brick and mortar libraries (something fun)

After getting involved in the Librarian Trading Cards Pool, someone told me about the Libraries and Librarians Photo Pool (with nearly 2,000 photos). This "pool" on Flickr is mostly of library buildings, with photos from around the world. If you're thinking of remodeling your library, you might get some ideas here. Otherwise, you might just find it comforting to look at pictures of brick and mortar libraries. Perhaps you should add photos of your library?

How is Google's digitization quality?

When Google announced its project nearly a year ago, I was anxious to hear how they were going to digitize the materials. I soon realized that confidentiality agreements and the air of secrecy were going to keep me (and you) from learning from this project. I know that we'll learn more about copyright because of Google's work, but wouldn't it be wonderful to learn more about how they are going about this effort? Even just some tidbits?

We can learn a bit from looking at the books that Google has digitized. And what we learn is that their quality is not consistently good. If you search through the materials, you'll find items where the images are very crisp and clear, and others that are blurry and (perhaps) sloppily done.

For example, if you flip through this book (from 1908 and in the public domain), you'll see a fingernail, book clamps, obscured pages, pages missing (p. 61), and pages that are crooked. And nearly every page is hard to read. Is this an anomaly? No. Look at this book (from 1916 and in the public domain) and you'll see brown pages (p. 22). What's up with that?!

Without signing in, you can only see a few pages of the newer books. Even without signing in, one quickly senses that the pages are clearer and much easier to read. (Look at this example from 2004.) Is Google doing something different with these so that they are scanned better?

BTW Google will display only snippets of a book where it has not received permission to digitize and display more pages from the book. Here's an example of that. Useless, right?!

Of course, Google would say that they want you to find the books online and not read the books online. To read the full-text, it is hoped that you'll purchase a copy of the book. Fine. But can I purchase a copy of a book published in 1908? Likely I would have to get a copy from my library through interlibrary loan (ILL), if it is available. Even if I have to get a book through ILL, Google has done its job because it has made me aware of a book that I might not have known about otherwise.

So can we overlook the errors and problems because Google is helping us find books? Part of me says "yes", but then I remember that we don't want to be digitizing old books more than once. We want to do it correctly the first time. If these books have to be digitized again to improve the quality of the images, then time and money have been wasted. In addition, the books will have to be handled once more, which I hope is not once too many.

Google needs to do better. The company is leading us down an important path. It needs to do so the right way.

Finally, I found that if you page through a public domain book too quickly, Google senses that and feels that you may be a robot or virus, and thus stops you. You must then type in a code to continue. (This also occurs if you look at a book more than once.)




Tuesday, November 29, 2005

Market for used scanners?

Occasionally I get an e-mail message that lists used scanners that are available for purchase. Today's email was from James River Systems. (It could be that all of the e-mails with used equipment have been from them...I haven't tracked that.) It has never crossed my mind to buy a used scanner. Just like a used car, you wouldn't know what horrors it had been through. But a good used scanner could -- I guess -- help launch a project that couldn't afford new equipment.

It looks like James River Systems will sell equipment on consignment, too, and even make introductions (for a price) between a purchaser and someone who has equipment to sell. And besides giving equipment a second lease on life and providing equipment to a project that couldn't afford new equipment, they are helping to keep equipment out of landfills. An interesting business...

Dublin Core questions asked by a reader. Can you give answers?

A reader has asked about the adoption rate of Dublin Core outside of the library/information profession. Here are his questions, as I understand them. Maybe you can comment with your thoughts and answers, and help me give him the information he desires.
  • Is Dublin Core being supported by common tools? Think not of OPACs and digitization programs, but more common web tools.
  • How well is Dublin Core being adopted outside of the library/information community?
  • Are tools such as RSS using Dublin Core?
In other words, is Dublin Core really something that the non-library community needs to be learning and using?

One millionth image online!

Congratulations to the Library of Congress for putting its one millionth image online. For some institutions, just getting started on a digitization program seems like an insurmountable task. How wonderful to see an institution start and keep going with such great results!

[Actually the image has been selected as the one millionth. The photograph depicts Washington Senators baseball player Herman A. "Germany" Schaefer using a camera during a visit to play the New York Highlanders in April 1911.]

Monday, November 28, 2005

Event: Digital Preservation in State Government: Best Practices Exchange 2006

This will be held March 27th-28th, 2006, in Wilmington, NC. Lots of information is available on the event's web site.

Report: A Textured Sculpture: The Information Needs of End-Users of Digitised Collections of New Zealand Cultural Heritage Resources

I'm late in posting about this, but -- as the saying goes -- better late than never.

According to the NZ web site:
In 2004-05 the National Library continued to support research into library and information studies through commissioning the School of Information Management at Victoria University of Wellington to investigate the needs of end users of digitised cultural heritage collections. The report is presented here in its final version.

Results from the surveys and interviews indicate that digital access is indispensable to cultural heritage research. However, participants also identified a number of barriers to digital access, some that are more generic, and others that are more relevant to scholarly historical research. The importance of New Zealand primary documents for cultural heritage research is repeatedly mentioned by participants, with particular emphasis on image sources (i.e. maps, photographs), newspapers, and all Māori cultural materials.
The full report and an eight-page summary are available. Included in the full report are the survey questions used with the end-users.

Presentations available: European Fedora User Meeting

Look closely and you'll see links in the agenda to the PowerPoint presentations.

Friday, November 25, 2005

Qualified Dublin Core

At a meeting this fall, a colleague implored that we not just talk about (and recommend) Dublin Core, but that we specifically get people to use Qualified Dublin Core. The Dublin Core web site says:

"Qualified Dublin Core" employs additional qualifiers to further refine the meaning of a resource. One use for such qualifiers are to indicate if a metadata value is a compound or structured value, rather than just a string.

Qualifiers allow applications to increase the specificity or precision of the metadata. They may also introduce complexity that could impair the metadata's compatibility with other Dublin Core software applications. With this in mind, designers should only select from the set of approved Dublin Core qualifiers that were developed by the Dublin Core community process.

Unfortunately, qualifiers often introduce additional complexity that can make metadata less interoperable unless approved DC Qualifiers developed within the DCMI are used with such interoperability considerations in mind.


The other version of Dublin Core is referred to as Simple Dublin Core or sometimes Unqualified or Basic Dublin Core. If you are not familiar with Simple and Qualified Dublin Core, go to the web site and read more about it, and then talk to your colleagues about the pros and cons. Hopefully the benefits of using Qualified Dublin Core will outweigh any negatives.
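To make the Simple versus Qualified difference concrete, here is a minimal sketch in Python using only the standard library. It assumes the DCMI namespace URIs and the convention (from DCMI's guidance on expressing Dublin Core in XML) of naming an encoding scheme with `xsi:type`. The `metadata` wrapper element, the function names, and the small refinement table are hypothetical choices for illustration. The `dumb_down` function shows the compatibility concern quoted above: a qualified record can be mapped back to Simple DC, but the extra precision (here, "this date is a creation date in W3CDTF format") is lost.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the DCMI element set and the DCMI terms vocabulary.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

for prefix, uri in (("dc", DC), ("dcterms", DCTERMS), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)

# A few DCMI refinements and the Simple DC element each one refines
# (illustrative subset, not the full list).
REFINES = {
    "created": "date",
    "issued": "date",
    "modified": "date",
    "alternative": "title",
    "abstract": "description",
}


def simple_record(title, date):
    """Simple DC: the date is an unqualified string under dc:date."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}date").text = date
    return root


def qualified_record(title, created):
    """Qualified DC: dcterms:created refines dc:date, and xsi:type names
    W3CDTF as the encoding scheme, so the value is a structured date."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    ET.SubElement(root, f"{{{DC}}}title").text = title
    created_el = ET.SubElement(root, f"{{{DCTERMS}}}created")
    created_el.set(f"{{{XSI}}}type", "dcterms:W3CDTF")
    created_el.text = created
    return root


def dumb_down(root):
    """Map each known refinement back to the Simple DC element it refines
    and drop the encoding-scheme attribute, so a consumer that only
    understands Simple DC can still read the record."""
    simple = ET.Element("metadata")
    for el in root:
        if not el.tag.startswith("{"):
            continue  # skip elements with no namespace at all
        ns, _, local = el.tag[1:].partition("}")
        if ns == DCTERMS and local in REFINES:
            local = REFINES[local]
        elif ns != DC:
            continue  # nothing a Simple DC consumer would recognize
        ET.SubElement(simple, f"{{{DC}}}{local}").text = el.text
    return simple


if __name__ == "__main__":
    record = qualified_record("Digitization 101", "2005-11-25")
    print(ET.tostring(record, encoding="unicode"))
    print(ET.tostring(dumb_down(record), encoding="unicode"))
```

Keeping a "dumb-down" path like this is one way to get the added precision of qualifiers while staying compatible with software that only handles Simple Dublin Core.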




"Bit-level" preservation

I'm reading a document a friend wrote and found the phrase "bit-level" preservation. Bit-level preservation means preserving the file exactly as it was submitted. The Florida Center for Library Automation (FCLA) says that bit-level preservation includes maintaining onsite and offsite backup copies, virus checking, fixity-checking, and periodic refreshment by copying files to new storage media. In other words, the original file is kept intact so that it can be disseminated later.

You can contrast that with full preservation. FCLA says:
Full preservation includes bit-level preservation of the originally submitted files, as well as services intended to ensure that the information content of the files will remain usable into the indefinite future. These services vary according to the file type but may include the creation of normalized forms of the file and/or the reformatting of obsolete formats to reasonably comparable successor formats. It is not guaranteed, however, that normalized or migrated versions of any file will be identical in functionality or in “look and feel” to the original file. Note also that if a logical object is comprised of individual files in both supported and unsupported formats, there is no guarantee that the logical object will remain usable as intended.
The assumption is, of course, that you have defined what file types you want to do full preservation on and why, and that those decisions match your organization's needs.

At any rate, these are both good definitions to keep handy. A Google search doesn't show bit-level preservation to be used widely at the moment, but I'm sure it will be.
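The fixity-checking step in FCLA's description of bit-level preservation can be sketched in a few lines of Python. This is a minimal sketch, not FCLA's procedure: it assumes SHA-256 as the checksum algorithm (FCLA's definition doesn't name one) and a simple in-memory manifest keyed by relative path, and the function names are hypothetical. The idea is to record a checksum for every file when it is ingested, then recompute and compare later; any difference means the bits have changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large master images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root, manifest):
    """Recompute checksums and compare them with the stored manifest.

    Returns (changed, missing): files whose bits no longer match, and
    files listed in the manifest that can no longer be found.
    """
    root = Path(root)
    changed, missing = [], []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            missing.append(rel)
        elif sha256_of(p) != expected:
            changed.append(rel)
    return changed, missing
```

In practice you would run a check like this on a schedule, against the onsite and offsite copies, and after each refresh to new media; a file in either list is the signal to restore it from another copy.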

Wednesday, November 23, 2005

When/where do vendors release new products?

This may seem like a stupid question, but it occurred to me earlier this week, when I heard that a vendor is going to announce its new product at the ALA mid-winter conference. In the information industry, vendors like Dialog, NewsNet (defunct), Dow Jones and even the smaller guys long ago used to introduce their new products at National Online in May in New York City. National Online, hosted by Information Today, was the first conference of the year and a very big deal. People attended National Online because it was where you learned what was new. Vendors would sometimes even announce things that weren't quite ready, just to get the word out at this big event.

BTW vendors would then follow up their appearance at National Online with appearances at other conferences like those held by SLA and ALA. The last conference of the season was -- at that time -- the Online conference in London, UK (which is still a big deal conference). [Marydee Ojala in her blog today notes three vendors who will be making announcements at Online Information this year.]

As the industry grew more diverse and more conferences appeared on the scene, National Online became less important and it no longer exists. The industry changes also brought about a change to when new products are introduced. It may still be true that a vendor will try to time the formal release of a new product with a major conference, and probably a conference where the audience is hoped to be appreciative of its new product. But new products are also released at other times during the year, with whatever fanfare that can be mustered.

Here's the question for the day: Where should digitization-related vendors announce their new products in order to get maximum exposure and generate good word-of-mouth advertising? Is a traditional information industry conference the right place? What about the conference hosted by AIIM? (I've been told that the AIIM conference attracts vendors and not necessarily a lot of end users, even though it sounds like a conference we would be interested in given that AIIM members deal with information and image management.) Is there a digitization-related event that would be more appropriate?

If you're stumped, then join the club! I don't think there is a good place (conference) for these vendors to announce new products. Any conference will have some of their ideal customers there, but there is no one conference to give them the best launch. That's too bad, because that's the conference I'd like to go to!

Working with smaller institutions

In a post yesterday, I wrote that it was sad that many small institutions were not digitizing and are, in essence, being left behind.

Kay Schlumpf -- who is involved with the Digital Past at the North Suburban Library System based in Wheeling, Illinois -- wrote a comment and said that their project:
...is trying to find ways to get more small cultural institutions involved in digitization efforts. We have several instances where local public libraries have formed partnerships with small historical societies to get their items online. We have another library that brought together a group of 4 smaller local museums to digitize some of their materials as well.

We even offer a digitization lab with hands on help and training free to participants. There is a very small fee to participate but most times the library or a donor will step in to cover that cost.

Currently we are focused in northern Illinois, but are willing to form partnerships with others.
Wow! These efforts are wonderful to hear. Anyone else have a success story about working with smaller institutions to get them involved in digitization?

Event: Digital Preservation in State Government: Best Practices Exchange, March 27th-28th, 2006

Received via the Archives discussion list:

The State Library of North Carolina is pleased to announce:

Digital Preservation in State Government: Best Practices Exchange 2006
http://statelibrary.dcr.state.nc.us/digidocs/bestpractices

When: March 27th - 28th, 2006
Where: Wilmington, North Carolina at the Hilton Wilmington Riverside
Registration Fee: $150
Registration Opens: December 5, 2005
Registration Deadline: February 23, 2006

Come join fellow librarians, archivists, records managers, and other information professionals as they share their experiences in managing and preserving digital state government information for public access. Bring examples of your successes, failures, and lessons learned to share with colleagues in facilitated exchange sessions. You will most certainly provide and take away something of value from this experience.

The Best Practices Exchange consists of two facilitated large group sessions (an opening forum and a closing wrap-up), six small group topic-based exchange sessions, and an evening reception.

Exchange Session Topics include:
  • Repository Systems
  • Identification, Selection and Appraisal of Digital Assets
  • Collection of Digital Assets
  • Authentication of Digital Assets
  • Metadata
  • Resources/Workflows for Managing Digital Assets
  • Access to Archived Digital Assets
  • Preservation of Digital Assets
  • Organization (Central versus Federated)

For more information on the Best Practices Exchange, visit:
http://statelibrary.dcr.state.nc.us/digidocs/bestpractices

or, contact Christy Allen at:

Christy E. Allen
Digital State Documents Librarian
Documents Branch, State Library of North Carolina
4643 Mail Service Center
Raleigh, NC 27699-4643
Phone: 919-807-7447
Fax: 919-733-1843
Email: callen@library.dcr.state.nc.us

Tuesday, November 22, 2005

Librarian Trading Cards / Pool (something fun)

Okay, this has NOTHING to do with digitization (although it uses digital photos), but is something fun that librarians are doing. Librarians are creating electronic "trading cards" using Flickr (and other products). You can read about it in Steve Cohen's blog here and here. It's a fun way of making ourselves more visible to each other and maybe those that need our services. (The hard part might be finding a decent photo to use, though.) I did one and Steve hopes that many more librarians will take time to do one, too. If you look through them, I bet you'll find some people you know (or whose blogs you read). I found it nice to put faces with names I already knew.

Library of Congress Plans World Digital Library

From the press release:

Librarian of Congress James H. Billington and Google Co-Founder Sergey Brin announced today that Google is the first private-sector company to contribute to the Library's initiative to develop a plan to begin building a World Digital Library (WDL) for use by other libraries around the globe. The effort would be supported by funds from nonexclusive, public and private partnerships, of which Google is the first.

The concept for the WDL came from a speech that Billington delivered to the newly established U.S. National Commission for UNESCO on June 6, 2005, at Georgetown University. The full text is available at www.loc.gov/about/welcome/speeches.

In his speech, Billington proposed that public research institutions and libraries work with private funders to begin digitizing significant primary materials of different cultures from institutions across the globe. Billington said that the World Digital Library would bring together online "rare and unique cultural materials held in U.S. and Western repositories with those of other great cultures such as those that lie beyond Europe and involve more than 1 billion people: Chinese East Asia, Indian South Asia and the worlds of Islam stretching from Indonesia through Central and West Asia to Africa."

Google Inc. has agreed to donate $3 million as the first partner in this public-private initiative.

And...
To lay the groundwork for the WDL, the Library will develop a plan for identifying technology issues related to digitization and organization of WDL collections. These might include presentation, maintenance, standards and metadata schemas that support both access and preservation. The plan will also identify resources, such as equipment, staffing and funding, required to digitize and launch an online presentation of a WDL collection.
In their commentary on this, Danny Sullivan and Gary Price note:
Over the past year, Google has digitized about 5,000 public domain books from the Library of Congress, material that may ultimately end up in Google Book Search, though it's not currently listed there yet. Google will continue scanning public domain books from the Library of Congress Law Library. Google said it's too early to tell if any of the scanning work it has already done will end up in the WDL.
Now we have several big projects underway. It will be interesting to see how they all fare.

The actual digitization isn't the hard part

Next week I'll be doing a short talk on digitization. I suspect the group expects me to talk about what people's concerns are with digitizing materials, but digitizing is the easy part. It is everything else that causes difficulty. Perhaps this goes along with where we often focus our time when learning about digitization. We tend to focus on the process of digitizing, since that seems so foreign to us, but there is so much more to a digitization program, including project planning, metadata creation, copyright, preservation, marketing, etc. The actual digitization can be learned and is often very rote. Other areas require more thought and more preparation.

What's the most difficult area that needs addressing in a digitization program? I think my answer changes depending on the situation. Clearly every hurdle can be overcome if there is money to solve the problem. But sometimes the hurdle is management's attitude. They don't see the importance of beginning such a program. They don't understand the positive impact it will have on the institution and those it serves.

The saddest part of digitization is that more institutions are not doing it. Many institutions, especially those small ones (e.g., small historical societies) with great collections, are being left behind. A divide is occurring and I don't see anyone riding in on a white horse (the proverbial hero) to solve it. The only way to get these institutions involved in digitization is to create collaborative efforts that they can easily be a part of. These smaller institutions don't need to digitize everything, but they do need to make some materials available online so that people know that they exist and know -- by example -- what they own. This would help those institutions stay visible and help drive visitors to them (both online and to their physical buildings).




Monday, November 21, 2005

Who is reading Digitization 101?

I have two searches that run constantly to see where this blog is being mentioned. I did it to help me understand who is reading this blog and hopefully to be able to target postings more to my readers. The mentions come from places I'd suspect, as well as unexpected places (e.g., blogs published in French and German). Thankfully, I can use Babelfish to give me a rough translation of the pages.

The searches I have running are in Feedster and Google. They are not perfect, but they do deliver interesting results. And they have helped me find other blogs that I find interesting and informative.

In case you're curious, looking at the counter I have on the blog, I'm getting an average of 82 visits per day from around the world (every continent). That does not include all the people who read this blog through a blog reader. I've gotten messages from some of you. It would be wonderful to meet more of you, especially if there is a topic you would like addressed here.

Addendum, 11/23/2005: Guenter Waibel at RLG wrote to remind me of the PubSub features to check Daily Link Counts and Site Statistics. I had actually just started using it and it is an interesting tool.

Addendum, 11/23/2005, 2:44 p.m.: I should also mention that PubSub has a list of librarian blogs and shows them ranked against each other by something called ListRank. Besides seeing how a blog ranks (or not), you'll likely find a few blogs that you didn't know about and perhaps should investigate.

Talking with vendors

This year, I've interacted with many digitization-related vendors through one-on-one meetings, e-mail, and phone calls. Here's what I've learned:
  • Digitization vendors based outside of the U.S. (e.g., India) are reaching out to find potential customers. They use e-mail and phone calls to introduce their services and try to solicit business. Several vendors based in India have contacted me this year (one per month?), but only two U.S. digitization vendors have contacted me without any prompting. None of the digitization-related vendors in my geographic region have ever contacted me. (Yes, I do know who they are and what they do.)
  • Many of the vendors who are reaching out are those that are looking for big projects like digitizing corporate files and working papers (e.g., banking records). Sadly, many that I talk to don't realize that the requirements for digitizing materials from a library, archive or museum are likely to be very different.
  • Vendors are very picky about where they will exhibit their services (e.g., a conference vendor exhibit hall). Everyone wants to exhibit where the big customers will see them. However, they should also exhibit where smaller customers can see them. You never know who knows who, and that small customer could lead to something very big. (Consider, for example, that a smaller organization is likely to find a major organization to partner with in order to create a more successful project.)
  • Vendors would be wise to learn what an organization thinks about when considering a digitization project, so they understand how their services fit into the entire mix.
I do like talking to vendors. Today I talked with two. And next week, I'll visit one of them. I think I can give a vendor the inside scoop (story) on what libraries, archives, museums and even corporations are looking for. And I like hearing what they are doing and who they are working with. Hopefully our conversations are mutual learning experiences.

Saturday, November 19, 2005

My web site is back up! Hallelujah!

If you've never looked at my site, feel free to take a peek and read more about what I do in regards to digitization and in doing business intelligence (BI) research. Although they seem like very different things, in both I help organizations have the information they need to move ahead with decisions and projects.

I also occasionally do workshops (usually digitization, BI or computer-skills related) and this year developed one based on my blogging experience. I offered it -- How to Create a Blog for your Business -- in September (very successful, if I do say so myself) and will be offering it again in January 2006.

Friday, November 18, 2005

Online Book: Creating an Institutional Repository: LEADIRS Workbook

This may be of interest to those building institutional repositories of digital materials. It is 134 pages in length. The introduction says:
The Learning About Digital Institutional Repositories Seminars programme (LEADIRS) aims to describe and illustrate how to build an online institutional repository.

The LEADIRS series of seminars present specialists from the UK and abroad sharing their expertise and experiences in building institutional repositories. This workbook book supplements the seminar presentations and offers practical advice as well as work sheets you can use to get started with your own repository programme. Where possible, we point you to real-world examples of planning aids or presentations used by university library teams in the UK and around the world.

The information in this book is as complete as possible at the time of writing. Because each institutional repository service will be unique to the institution where it is built, this information is meant to be helpful and to provoke discussion and exploration. It is not meant to be prescriptive. We cannot account for or anticipate the unique challenges and resources of your institution.

Larry Lessig -- the "discussion": the morning after

Last night, Larry Lessig participated in a Google Print debate. Today in his blog, he has some "morning after" thoughts about what was said and the implications for Fair Use. From his vantage point, the arguments that the publishers are putting forth will potentially shrink Fair Use. For sure, this topic -- digitizing and making available copyrighted materials -- is changing how we think of Fair Use, and maybe in the long run that will be what we'll all remember about Google's efforts.

Thursday, November 17, 2005

My web site is down

Actually, it has been down for more than a day due to a server that is dying. I have e-mail (thank goodness), but no web presence other than this blog.

When your web site is down, it is like you don't exist. Your "public" face is gone. No one can find you. Maybe Internet hosting services should develop some affordable mirror site ideas, which would guarantee that a web site is always available no matter what. (BTW thank goodness for cached sites in Google and those archived at the Internet Archive. And why does the Wayback Machine not show any pages archived for this year?)

People have asked why my blog is not part of my web site. The simple truth is that I didn't even think about putting it on the same site as my web site when I set it up. And now I'm happy that my blog is someplace else (Blogger) and available when my site is not.

The prognosis is that my site will be back up tomorrow. Let's keep our fingers crossed.

Google's Librarian Center

In case you haven't seen this, Google has launched a site/service geared towards librarians. According to the page, Google is going to produce a quarterly newsletter for librarians, and I'm assuming that they'll do other stuff, too.

Interestingly, they're also looking for pointers (URLs) to lesson plans and other documents that people have used to teach others about Google. Now it looks like they are not looking for this stuff in order to share it more widely, but rather as a way to learn what works when teaching people how to use Google! I wonder if they'll then use this information to create their own materials and put those trainers (librarians) "out of business" (at least for teaching about Google)?




Continuing partners

When Carole Ann Fabian (director of the Univ. of Buffalo's Educational Technology Center) talks about digitization projects, she talks about the need for "continuing partners." Interestingly, this also came up -- in a sense -- at the Statewide Digitization Planners Conference in October. Projects (and programs) that are collaborative in nature are more successful than those that are solo efforts. Why? Here are reasons that come to my mind:
  • When you collaborate, you bring in additional resources and skills to complement those you already have.
  • Two or more partners are less likely to let a project fail.
  • There are more people to pick up a dropped "ball."
  • The additional resources can help to create a better end-product.
  • Responsibilities are distributed, so no one group feels overwhelmed.
The challenge is finding the right collaborators. It seems, though, that going through the trouble of finding the right collaborators is more than worth it if you realize that the project itself will be stronger and better because of it.

Wednesday, November 16, 2005

Google Print — the debate

There will be a debate in NYC this Thursday evening (Nov. 17) at 7 p.m. on Google Print. Larry Lessig's blog gives the details. He also notes today that the debate will be streamed over the Internet at this site (note that there is nothing there now). I'm hoping that the debate is archived on the Internet, so it can be watched later (for those of us who will not be glued to our computers Thursday night).

BTW I hadn't posted this earlier, because it was "just" an event in NYC, but now that it's going to be webcasted...!

FILE FORMAT REGISTRY: new version released by the UK National Archives

I received this press release today:

The UK National Archives has released PRONOM 4, the latest version of its web-based technical registry to support long-term digital preservation. Adrian Brown, Head of Digital Preservation, at The National Archives said: ‘PRONOM 4 incorporates a number of significant enhancements, including an automatic file format identification tool.’

PRONOM 4:

  • Now holds detailed technical information about individual file formats, including links to the full format specifications where available
  • In anticipation of the launch of the PRONOM Unique Identifier scheme, later in 2005, PRONOM 4 now also supports the use of unique identifiers. The scheme will provide persistent unique identifiers for file formats recorded in PRONOM, and has already been adopted as the preferred encoding scheme for describing formats within the e-Government Metadata Standard in the UK
  • Introduces DROID (Digital Record Object Identification) the first in a planned series of tools, which use the content of the registry to provide specialized preservation services. DROID is an automatic file format identification tool, which uses byte signatures stored in PRONOM to identify and report the specific file format versions of digital files

Dr Peter Townsend, Commercial Director of Tessella, said: “The introduction of DROID will allow repositories all over the world to identify the format of the files they need to preserve, and take a first step on the road to long-term preservation. One of the first repositories that will take advantage of this new tool will be the award-winning Digital Archive, developed by Tessella for the UK National Archives.”

Kevin Gell, Managing Director of Tessella, said: “Tessella has built a long-standing relationship with the UK National Archives, which includes the development of all four releases of PRONOM. Projects such as these, and the Electronic Records Archives program for the US National Archives and Records Administration, are demonstrating to the world that the seemingly insurmountable problems of digital preservation are beginning to be solved, and that the benefits of innovative solutions can be shared with the rest of the digital preservation community.”

Adrian Brown continued: “There is an ongoing programme of development for PRONOM, and we very much welcome feedback, including ideas for future enhancements. We are also always interested to hear from anyone who is either using, or would like to use, PRONOM content or services.”

Notes to editors: [edited]
  • The UK National Archives hold one of the largest archival collections in the world, spanning 1,000 years of British history. Launched in 2004, the National Archives brings together the Public Record Office and the Historical Manuscripts Commission, and is responsible for the long-term preservation of, and access to, Government records in an authentic and complete state. Increasingly these records are ‘born digital’ files published by government departments. [www.nationalarchives.gov.uk]
  • The Digital Archive stores important UK Government records, including public enquiries such as the Hutton Inquiry, the websites of Number 10 Downing Street and the Cabinet Office, e-mails, web pages and databases. [www.nationalarchives.gov.uk/preservation/digitalarchive]
  • For further information on PRONOM please visit: www.nationalarchives.gov.uk/PRONOM/ and www.rlg.org/en/page.php?Page_ID=20571&Printable=1&Article_ID=1717
  • For further information on DROID please visit: www.nationalarchives.gov.uk/aboutapps/pronom/droid.htm


Tuesday, November 15, 2005

What would happen if... (a one-act play)

I failed to mention that a short play was performed at the NYLA annual conference "about how library systems provide effective services and what would happen if they were forced to close" (from the flyer). I wish more people had seen it, because it showed very quickly what happens when a system fails and the libraries are no longer "attached" to each other. In the play, a father and daughter are trying to return books and to obtain information she needs for a school project. However, the library can't help them, even though the information is available elsewhere. There is no interlibrary loan and no access to resources outside the library's own four walls. No access to the services that we all take for granted.

If you would like more information on the one-act play (perhaps so you can do something similar), contact Debby Emerson at the Rochester Regional Library Council.

Article: Building an Online Library, One Volume at a Time

I saw mention of this article in a German blog, but couldn't get at the original article (stopped by the subscriber screen). Thanks to digitizationblog for pointing us to the URL for the free article available at WSJ.com. Here we get a peek inside the digitization work occurring at the University of Toronto and see pictures of the scanner they are using to digitize books. The scanner being used requires human intervention and is not a robotic machine.

Now we know, from talking to people at Kirtas and from comments posted in this blog, that the University of Toronto was using a Kirtas scanner, yet this does not look like a Kirtas scanner. So whose scanners are these? And what happened to the Kirtas scanners? We do know from the article that these scanners cost $20,000 to $40,000 each, which is much less than a Kirtas scanner. Is this one of the scanners that Brewster Kahle and the Internet Archive are developing? If anyone has answers, I hope you'll let me know.

BTW here is a good quote from the WSJ article:
Mr. Kahle estimates it costs about 10 cents a page to get a book online, taking into account equipment, labor and the cost of hosting the pages on the Internet Archive's Web servers.
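Kahle's 10-cents-a-page figure makes rough budgeting easy. Here is a minimal Python sketch, assuming that flat per-page rate and hypothetical page counts (the function name is illustrative). Amounts are kept in whole cents so the sum is exact.

```python
# Per-page cost from Kahle's estimate, in whole cents to keep the arithmetic exact.
COST_PER_PAGE_CENTS = 10


def estimate_cost_dollars(page_counts, cost_per_page_cents=COST_PER_PAGE_CENTS):
    """Estimated cost, in dollars, of putting books with these page counts online."""
    total_cents = sum(pages * cost_per_page_cents for pages in page_counts)
    return total_cents / 100


# Hypothetical examples: one 300-page book, then 1,000 books of 300 pages each.
print(estimate_cost_dollars([300]))
print(estimate_cost_dollars([300] * 1000))
```

At that rate a 300-page book comes to about $30, and a collection of 1,000 such books to about $30,000. Because the figure already folds in equipment, labor and hosting, it is a starting point for a budget, not a quote.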

Monday, November 14, 2005

New York Online Virtual Electronic Library (NOVEL) promotional & help materials

New York State has built an online virtual library, which I bet most New Yorkers still don't know about. (As far as I can tell, there hasn't been any mass promotion of NOVEL. No radio, newspaper, or TV ads, for example.) The Rochester Regional Library Council has created materials that others can use to promote NOVEL as well as help people understand what NOVEL is.

By the way, one of the things that NYS has not done is to create a really good front end to NOVEL (which is why the materials created by RRLC are helpful). The existing front ends assume that you know why you're at that page and what is available to you, yet that "what" is not going to be intuitively obvious to most people. Hopefully the State Library will spend some of its money improving the front end and making NOVEL known to every New Yorker.

Peter Drucker, 1909 - 2005

In 2002, Peter Drucker was one of the keynote speakers at the Special Libraries Association conference in Los Angeles. He drew a large and appreciative crowd, and was comfortable just talking about management (no fancy presentation), which kept all of us enthralled. Let's hope that we all can have such long and fruitful lives...and just as important an impact on our professions.

Friday, November 11, 2005

Obtaining copyright permission

I'm working with a client to help them prepare to apply for grants to create a web site that will include digitized materials. The budget at this point is limited, so I'm working as quickly and efficiently as possible to gather the minimum information they will need in order to outsource the entire project when funding becomes available.

One area that will need to be addressed is obtaining copyright permission to digitize some of the materials. Based on the few hours I had with the collection, I've suggested a few items to digitize, including some where permission will be needed. However, I tried to select materials where the copyright holder might easily be persuaded to give permission (or at least that is my hope). I have someone in mind who could do the legwork and contact the copyright holders, which will be tedious work. This week I found a two-page article in Information Outlook (Oct. 2005, pp. 42-43) that talks about obtaining copyright permissions ("Enterprise-Wide Copyright Permissions"). The article provides brief guidelines and an overview of the procedure that I think will be helpful to my client (likely as a refresher for them) and to the others who work on the project. [And a brief article is always easier for people to digest than a long, in-depth one.]

The article points to a document at Washington State University entitled "Getting Permission: Where and How?", which is a guide to obtaining permissions for various types of materials (e.g., music). Besides this guide, the WSU web site has other useful information on copyright, so it is a good resource to bookmark.

When the client completes this project (assuming they do get funding), I hope they will write an article about it, or place lessons learned on the web site. I think people will be interested to know how it worked to outsource everything and how the resultant project impacted the institution.

Thursday, November 10, 2005

How is it supposed to look? (Product testing)

As you build systems and create information products (e.g., digital libraries), you assume that you know how it will look to people who use it. However, do you ever check to see if your assumptions are true?

This came to mind today after I received an electronic document from someone and saw that it used fonts my computer did not recognize. It was easy to rectify, but here someone had made an assumption about what my machine could handle (and what the document would look like) which was not true. This problem can also occur with web sites because of differences in monitors, screen sizes, browsers, etc.

The solution? Testing, testing, testing. Okay, you know to test a web site on multiple browsers and monitors. And you do test your PowerPoint presentations on different machines (and in different room environments) if time permits. But what about those other files you send around and those formatted/stylized e-mails? During the summer, I sent an e-mail to a large group of people, and I did test it ahead of time by sending it to my many e-mail accounts on different e-mail systems. That allowed me to see how the message might look on the receiving end and led me to make several changes to the e-mail message. (Oddly, I found that the best thing to do was to build the HTML e-mail in FrontPage and then copy it into Outlook. That seemed to ensure that the formatting was more correct when people received it.)
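Part of that multiple-account test can be automated. This minimal sketch uses Python's standard `email` and `smtplib` modules to build one HTML message with a plain-text fallback and address a copy to each of your own test accounts. The addresses, the SMTP host, and the function names are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_test_copies(subject, sender, plain_text, html_body, test_addresses):
    """Build one multipart/alternative message per test address."""
    copies = []
    for address in test_addresses:
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = sender
        msg["To"] = address
        # Plain-text part first: shown by clients that cannot render HTML.
        msg.set_content(plain_text)
        # HTML part last: multipart/alternative lists parts from least to most
        # preferred, so HTML-capable clients pick this one.
        msg.add_alternative(html_body, subtype="html")
        copies.append(msg)
    return copies


def send_copies(copies, host="localhost"):
    """Send each copy through an SMTP server (hypothetical host)."""
    with smtplib.SMTP(host) as server:
        for msg in copies:
            server.send_message(msg)
```

Sending to accounts on different e-mail systems still matters, since the point is to see what each client actually does with the message.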

Testing, though, is time-consuming, and it slows us down. We skip testing because we think all will be okay (based on our assumptions), but that is likely when something will go awry. For example, I remember hearing a story about an international non-governmental organization (NGO) that had created an information product to be used by people around the world. The product was housed on the organization's central computers, and people accessed it over the Internet from various parts of the world. Unfortunately, the organization assumed that access would be as fast and cost-effective everywhere as it is in the U.S. It wasn't. Testing early on could have led them in the right direction. Instead, they got the feedback only after the product had launched, and then had to revise the product and disseminate it on CDs.
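A back-of-the-envelope calculation would have flagged the speed assumption before launch. This Python sketch assumes a hypothetical 5 MB product and idealized link speeds. It ignores latency and protocol overhead, so it understates real download times.

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: ignores latency, protocol overhead and congestion."""
    return size_bytes * 8 / link_bits_per_second


PRODUCT_BYTES = 5_000_000  # hypothetical 5 MB information product

# Hypothetical link speeds, in bits per second.
for label, bps in [("56 kbps dial-up", 56_000), ("1.5 Mbps broadband", 1_500_000)]:
    seconds = download_seconds(PRODUCT_BYTES, bps)
    print(f"{label}: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

At 56 kbps the transfer takes about 714 seconds, nearly 12 minutes, compared with under half a minute at 1.5 Mbps, and that gap matters even more to anyone paying for connection time.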

You might say that the product definition phase should have been done better, but testing finds the things that no one thought of. Testing during a product's development ensures that it meets the users' needs and works within their environments.

Yes, testing does cost us time and money, but it can save time, money, aggravation and reputations in the long run.

If you're in the middle of creating something that people will see on their own computers and in various environments, why not take some time to test it now? Make sure your assumptions are true.

eEurope Digitization Week, November 14-17, 02005

The Ten Thousand Year blog has a posting about the eEurope Digitization Week to be held November 14-17, 2005. The press release that the blog quotes says:
Monday 14th November- Thursday 17th November will see a week of events promoting the use of digital collections in museums, libraries and archives across Europe.
The week includes events to promote MICHAEL, the Multilingual Inventory of Cultural Heritage in Europe. According to the MICHAEL web site:
MICHAEL aims to provide simple and quick access to the digital collections of museums, libraries and archives from different European countries. Work began in June 2004, with the focus on implementing an innovative multi-lingual open source platform that will be equipped with a search engine. By 2007, the MICHAEL platform will be capable of retrieving digital collections that are dispersed across Europe. There will be many uses for MICHAEL, for example students and researchers will be able to discover information about European collections that might previously have been difficult to find. The services will also support cultural tourism, the creative industries and other interests.
I think we'll have to learn more about MICHAEL....

Wednesday, November 09, 2005

I'm back and thinking about "accuracy"

Some people might trust what is on paper more than what is in electronic form. They would believe that what has been produced on paper has been researched, proofed, and is accurate. But we know that isn't true. Inaccuracies abound, and some are intentional (like directory publishers who add a fake entry in order to catch people copying their publication).

Some people might trust what is in electronic form, yet it is easier to spread lies and inaccuracies electronically. And an error can easily be replicated. Even Microsoft makes errors in the information it delivers (I'm thinking specifically of Microsoft's calendar for Outlook, which listed Election Day as Nov. 1 instead of the correct date, Nov. 8).

The bottom line? How do people know that the information you're presenting to them in your digital library is correct, accurate, authentic...? Do they trust you because of who you are (or your institution)? Must they already know enough in order to know that the information is right? Do they just need to blindly trust the accuracy? Should you include a "stamp" of approval?

Funny that after a week away, this is the topic that came to mind. Maybe it's from watching a bunch of TV news programs and wondering if I can trust what they are telling me. Or maybe it's from knowing that Microsoft applied the Election Day rule wrong. If they got that wrong, what about all the other information (e.g., holidays) that we trust them on?
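The rule Outlook should have applied is that U.S. federal Election Day falls on the Tuesday after the first Monday in November. One plausible way to arrive at Nov. 1 is to compute the "first Tuesday in November" instead; whether that is what actually happened inside Outlook is an assumption. A minimal Python sketch comparing the two rules (function names are illustrative):

```python
import datetime

MONDAY, TUESDAY = 0, 1  # date.weekday() numbering: Monday=0 ... Sunday=6


def first_weekday_on_or_after(day, weekday):
    """First date on or after `day` that falls on the given weekday."""
    return day + datetime.timedelta(days=(weekday - day.weekday()) % 7)


def naive_first_tuesday(year):
    """The wrong rule: simply the first Tuesday in November."""
    return first_weekday_on_or_after(datetime.date(year, 11, 1), TUESDAY)


def us_election_day(year):
    """The statutory rule: the Tuesday after the first Monday in November."""
    first_monday = first_weekday_on_or_after(datetime.date(year, 11, 1), MONDAY)
    return first_monday + datetime.timedelta(days=1)


print(naive_first_tuesday(2005), us_election_day(2005))
```

The two rules agree in most years (both give Nov. 2 in 2004) and disagree only when November 1 itself is a Tuesday, as it was in 2005, which is exactly the kind of case a quick test against a known date would have caught.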

Presentations online from Strategic Issues in Managing Digital Assets

As posted on sigdl-l:

On October 28, 2005 CNI co-sponsored a symposium on Strategic Issues in Managing Digital Assets, working in partnership with the Association of Research Libraries, the Council on Library and Information Resources, and the Digital Library Federation.

The text of the keynote address by Don Waters of the Andrew Mellon Foundation, and the presentations by the other speakers and panelists, are now available at

http://www.arl.org/forum05/

Tuesday, November 01, 2005

An early November respite

I'm going off-line for about a week (Nov. 2 - 9). A respite of sorts and time to catch up on other things. This will give you time to catch up on reading posts in other blogs (because if you're like me, you are dreadfully behind in your reading).

Report on the project "Chartres: Cathedral of Notre-Dame"

Carole Ann Fabian told me on Friday (at NYLA) that the final report published on this University of Pittsburgh project contained cost information and other data. Okay, no cost details, but an overview of what they spent, how long the project took, goals and accomplishments. The report is 12 pages in length. Although I haven't read the report in detail, I can already see a level of honesty that we all appreciate. For example, they talk about underestimating how many images would be needed to visually document Chartres Cathedral.

But what should be digitized?

I now write an article for each issue of WNYLRC Watch, published bimonthly by the Western NY Library Resources Council. Below is the article I wrote for the November/December issue, which may be of interest to some people outside of the WNY region. If it strikes a chord with you, please leave a comment and tell me your thoughts.

But what should be digitized? This question is in the back of the mind of every person who is considering digitizing materials. Deciding what to digitize is not an easy task, but the answer can be discerned by thinking about the collection from several points of view.

What story does the material - the collection - tell? The materials must relate to something and either tell a story or enhance a story that has already been told. For example, abolitionist journals or diaries from the 1800s would help to document the Abolitionist Movement and give insight into the abolitionists' actions, how they felt about slavery, and how they interacted with slaves. Those diaries, if digitized, would be useful to history students and researchers who would not travel to view the materials. (Researchers often will peruse materials online in order to evaluate the relevance of the material to their research and make a decision about whether or not to travel to view the items in person.)

Who is interested in that story? If you want external funding to help digitize the materials, then having materials that relate to a story of regional, national or international significance is important. You may want to consider collaborating with other organizations in order to build a stronger and more comprehensive collection. Some agencies prefer to fund collaborative projects because the resultant collection is stronger in scope. Incorporating staff input from more than one institution also adds value to the project. Many organizations - including for-profit companies and government agencies - digitize heavily used internal documents that are primarily of interest to the institution itself. This can help them be more efficient in their day-to-day operations as well as help them serve their constituents more effectively. If you are going to digitize materials that are generally only of interest to your own institution, you may need to digitize the materials with your own internal funding, since an external funding agency likely will not be interested in funding the digitization of materials for such a limited audience.

Will people find the materials useful if accessed online? Although it is possible to digitize nearly anything, not everything should be digitized. You want to digitize materials that people will use and understand. One way to assess if the materials will be used online is to look at how frequently they are requested or accessed currently. Materials that are frequently used have an audience that understands them and appreciates them. Those materials will be well-used online. Yes, you can digitize materials that are rarely used, but is doing so the best use of your resources? Likely your institution and funding agency will want to concentrate on materials that it knows will be used and appreciated.
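If your institution keeps any kind of request or retrieval log, tallying it is a quick way to see which materials already have an audience. Here is a minimal Python sketch, assuming the log is simply a list of item identifiers (the identifiers and function names are hypothetical):

```python
from collections import Counter


def most_requested(request_log, top_n=3):
    """Rank item identifiers by how often they appear in the request log."""
    return Counter(request_log).most_common(top_n)


def never_requested(inventory, request_log):
    """Items in the collection inventory that never appear in the log."""
    requested = set(request_log)
    return [item for item in inventory if item not in requested]


# Hypothetical request log and inventory.
log = ["diary-1852", "map-07", "diary-1852", "letters-box3", "diary-1852", "map-07"]
inventory = ["diary-1852", "map-07", "letters-box3", "ledger-1890"]

print(most_requested(log))
print(never_requested(inventory, log))
```

Items at the top of the first list are candidates for digitization; items in the second list may be better served by putting a finding aid or a description of the collection online.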

If it's not going to be useful online, what do you do? There are two things to consider in determining the answer to this question:

  • Document the story that the materials tell (or help to tell) and place that information online. You might include a few digitized items, to illustrate what is in the collection, but the focus is on communicating the story, the history, or the event. There are benefits to doing this: By placing the story online, the institution hopes to develop greater awareness of the collection. Information available on the Internet may attract more researchers interested in utilizing the collection. This documentation will provide the framework upon which the digitization project will be constructed. An interesting example of this is the information online about the George Eastman House in Rochester, NY (http://www.eastmanhouse.org/inc/visit/house.php). The text gives the history of the House and explains what the House and gardens contain. Once this information was online, the curator of the House found herself being contacted by people who had read the descriptions, viewed the few digital photos, and wanted to know more about the contents of the House (not about its photographic exhibits or motion picture archive). These contacts were unexpected, but also very welcome since they demonstrated that people were interested in the Eastman House itself.
  • Digitize the finding aid for the collection before digitizing the actual material. This allows researchers and those interested in the topic to fully understand the scope of the collection. For example, the University at Buffalo has a Finding Aid for the North American New Music Festival Archive (http://ublib.buffalo.edu/libraries/units/music/spcoll/NANMF/). The online finding aid includes a descriptive summary, administrative information, historical note, scope and content note, series description, container list, and related resources.

Is digitizing for you? This is the million dollar question! It is likely that your institution does have materials that researchers would use if they could find the digitized materials or a finding aid about them online. What you need to decide is whether you want to do a project on your own or collaborate with another institution, and when. For now, learn as much as you can about digitization by attending the workshops being offered by WNYLRC. Knowing more about the various areas that are impacted by a project - including copyright - will help you make better decisions about what to do and how to do it, whether the project is collaborative or not.

Article: Microsoft Launches Book Digitization Project—MSN Book Search

I like the last paragraph in this article, written by Barbara Quint, which says:
And what’s in this for Web users? It seems one thing has become clear. All the major search engines, not to mention the world of Web users, now believe that all information should come onto the Web. Google’s mission statement—“to organize the world’s information and make it universally accessible and useful”—seems to have become the mantra for all major Web suppliers. The race has begun to get it all done.

DPC/PADI What's New in Digital Preservation - Issue 11 available

As received on the Digital-Preservation discussion list:

Issue no. 11 (June - September 2005) of the DPC/PADI "What's New in Digital Preservation" bulletin is now available from the Digital Preservation Coalition Web site and the National Library of Australia's PADI Web site:

National Library of Australia:
http://www.nla.gov.au/padi/qdigest/sep2005.html
Digital Preservation Coalition:
http://www.dpconline.org/graphics/whatsnew/

"What's New" is a summary of selected recent activity in the field of digital preservation, compiled by Deb Woodyard-Robinson for the Digital Preservation Coalition (DPC) and Marian Hanley of the National Library of Australia. Items are compiled from the Preserving Access to Digital Information (PADI) Gateway and the digital-preservation and padiforum-l mailing lists, although additional or related items of interest may also be included.

Issue 11 features news on projects funded by the Digital Curation Centre, the Museums, Libraries and Archives Council (UK), Digital Preservation Coalition (DPC), Library of Congress, National Archives and Records Administration (US), Long-Lived Data Collections Task Force (US), Center for International Earth Science Information Network, the Daniel Langlois Foundation for Art, Science and Technology (Canada) and the National Consultation on Access to Scientific Research Data (Canada).

The bulletin also includes summaries of recent publications on the themes of digital preservation research and directions, digital preservation readiness, digital repositories, web archiving, e-prints, preservation metadata, standards, personal archiving, storage media and digital preservation training. Summaries of other selected recent publications and information on past and forthcoming events are also provided.