Friday, June 29, 2007


For breakfast, I went to a local diner near the Greyhound bus station and the Regional Market (think farm-fresh foods). As I meditated over my cup of coffee, a gentleman came in who was Amish. Having grown up near Amish country (and visited areas that are heavily populated by the Amish), I know something about this group. However, not everyone knows who the Amish are or understands even vaguely their lifestyle. As I watched him, I wondered: how could more people learn about the Amish? Through information provided online, of course! We could photograph, digitize, record audio and use other technology to document and explain their lives. We could provide ways for people all over the world to understand and appreciate the Amish.

And here is the ironic part. The Amish shun electricity and thus the digital age. (Yes, I know there are exceptions.) So...everything we created would be things the Amish likely would never see or use because of their religious beliefs. We might never even be able to get feedback in order to know if our documentation was correct. Would they deem our efforts to be culturally insensitive if they saw (or heard) them? We might never know. We might be stuck guessing or getting an opinion from someone who used to live in the Amish community (an opinion that might not be correct).

Cultural sensitivity is something that we (as a global community) are thinking more about these days. Thankfully, some cultures are helping us become more sensitive. For example, the National Museum of the American Indian allows photos to be taken of its exhibits. However, it places "do not photograph" signs around exhibits where it would be inappropriate to take photos (like the ghost dresses which are part of this exhibit). The World Digital Library has publicly stated that it wants to be culturally sensitive as it builds its collections, a task that will not be easy.

Here in Onondaga County (New York), I've heard local librarians talk about displaying Native American artifacts. The problem? They hear from one elder that a specific item cannot be displayed, then hear from another elder that it can. They have learned that being sensitive is not always straightforward or simple. It can be confusing and it may lead them to not display something when they don't have a clear answer. (Sounds a bit like the problems we have in copyright clearing items!)

So what is the point of this rambling? You can find ideas for digitization projects everywhere. However, not everything that can be digitized should be digitized, because of the beliefs of the culture. And sometimes you might be able to digitize materials from a culture, but not be able to get proper feedback from the culture (community) in order to understand if you've interpreted the materials correctly. Finally, this is not a unique problem. Unfortunately, there is not a common solution.

All that from a cup of coffee.


Thursday, June 28, 2007

Booklet: DSpace How-To Guide: Tips and tricks for managing common DSpace chores

This booklet was created as a handout for the tutorial "Making DSpace Your Own" at the Joint Conference on Digital Libraries (JCDL) 2007 in Vancouver, British Columbia, Canada.
The introduction states:
This short booklet is intended to introduce the commonest non-obvious customization related tasks for newcomers to DSpace administration. It has been written against the stable version 1.4.2 of DSpace and the Manakin user interface, version 1.1. We have tried to include instructions for different operating systems as required; most customizations, however, work identically cross-platform.
It's free and distributed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. For those using DSpace, this could be a tremendous help.


Wednesday, June 27, 2007

Article: Librarians: We're still vital in the digital age

I received a phone call yesterday from a reporter who wanted to know the value of libraries now that "everything" is online. I spent a few minutes with her on the phone and she got excited when I outlined the role for libraries today. What I said, as well as information from others, was published in USA Today.

Michael Dowling, director of ALA’s International Relations Office, is quoted as saying,
"There's this idea that with everything available online, there's no reason to continue building libraries. But libraries do so much. They are lifelong learning centers. This is an opportunity for us to reach out." I noted that the challenge "is adapting to the ways people want to access resources." Although she called me because I'm a digitization consultant, I never mentioned digitization in my responses to her questions. Instead I talked about meeting the needs of every generation as well as those who do not have access to technology (e.g., broadband Internet). I did not want her to get sidetracked into a narrow focus for libraries. I wanted her to understand the bigger picture; the bigger role.

You can read the entire article here, which includes information on what librarians did at the American Library Association Annual Conference besides going to sessions, eating and sightseeing. You might be pleasantly surprised!


Institutional Repositories, SPEC Kit 292, July 2006

The Association of Research Libraries periodically has a SPEC Kit created to address best practices in a specific area for libraries. In 2006, Charles W. Bailey, Jr. chaired a group that created a SPEC Kit on "Institutional Repositories." The table of contents and executive summary (9 pages) are available for free, along with information from other SPEC Kits (here).

Institutional repositories "collect and provide access to diverse, locally produced digital materials." For the purpose of the SPEC Kit, "an IR was simply defined as a permanent, institution-wide repository of diverse, locally produced digital works that is available for public use and supports metadata harvesting." The last two points -- public use and metadata harvesting -- mean that the content is open for many to see and use.

In my mind, an institutional repository is a type of knowledge management system, even though the definitions for both are somewhat different. Many people scratch their heads when confronted with these different "content" management systems. How are they different? How are they the same? Will we, at some point, just have systems for managing content without worrying about whether it is an IR, KM, CMS or...? I hope so! For now, though, the differences do matter. However, no matter what system you're working with, the executive summary for "Institutional Repositories" undoubtedly contains information that would be useful to you. It is well-written and concise.


Monday, June 25, 2007

Print on Demand services from BookSurge

Yesterday I posted about a print on demand technology that is installed at the New York Public Library. Today, as I read through more of my RSS feeds, I saw this from Friday, June 22:

BookSurge, an Amazon group and leader in Print on Demand services, and Kirtas Technologies, a world leader in high-quality nondestructive book digitization, today announced a collaboration with universities and public libraries to preserve thousands of rare and inaccessible books from their collections and distribute them via BookSurge’s Print-on-Demand service. This collaboration, which will greatly enhance the selection of rare and historic books for sale on and other retail channels, represents a breakthrough approach to digitization and preservation that will ensure the public will have access to these works indefinitely via Print on Demand. This initiative will also help these institutions fund their mission of preserving these vast literary collections by offering a revenue source from the sales of content these institutions own or that is in the public domain on Kirtas will provide economical, nondestructive scanning technology. In addition to providing funding for their ongoing digitization efforts, this collaboration gives libraries and universities complete control over what is being scanned in their collections.

Emory University, University of Maine, Toronto Public Library, and Cincinnati Public Library are the first organizations to enter into agreements with Kirtas to make their rare-book collections available to a readership that extends far beyond their physical geographies to include an audience of millions of customers. This preservation effort is the only method that allows university and public libraries to preserve books and print them on demand as they are ordered. Participating institutions retain full control over what is digitized, so they now have an economical way to preserve, reproduce and distribute important works that may be disappearing from their shelves.

We have had to keep track of which libraries have aligned with which digitization vendor/project. Now we need to keep track of which print on demand service is aligned with which digitization vendor/project. At some point, will the mass digitization projects finally intersect -- perhaps because they will all need the same limited resource -- and play with each other?


3 -Three- Day UCLA Extension Course in Document Imaging and Document

As posted in the IMAGELIB discussion list. The email came through with some special characters in it, so I've done some clean-up and hope that the clean-up is correct.

3 -Three- Day UCLA Extension Course in Document Imaging and Document Management: Fall 2007

(1) Today, a top of the line (in Korea) Samsung cell phone can store 200 thousand scanned pages (20 file cabinets) on a 1 inch, 10 GigaByte hard drive. Next year cell phones will go solid state with 16 GigaBytes of chip-stacked memory. In 5 years cell phones will have over 50 GigaBytes of solid state memory - enough to record and store 2 High Definition (HD) movies or to store 100 file cabinets -- 800 boxes -- one million scanned pages. Storage cost will disappear as an issue and document and records management will be the focus of organizations.

(2) Microsoft released its first update to Windows and Office (called Vista) in five years, on November 30, 2006 for corporations and on January 30, 2007 for home users. Hooks for document management and workflow have been added to both Windows and Office. We will be talking about the general trends these changes represent in document management in this Fall's course.

(3) All of the printed class materials are available free on the Internet for those who cannot attend the class: []. All of the materials can be downloaded with a single click and then printed with a single click. The materials are in a full text searchable PDF file. All acronyms are spelled out. You can also download the materials as native Microsoft Office files so that you can incorporate these materials in your presentations, publications, or papers. The course is generally offered every quarter.

Course dates

Three days (Fall 2007): Friday, November 30, 2007, 8:00 AM to 5:00 PM, Saturday, December 1, 2007, 8:00 AM to 5:00 PM, and Sunday, December 2, 2007, 8:00 AM to 5:00 PM at UCLA in Los Angeles. Please see below for a detailed course description. To enroll, visit [], click on 'enter keyword', then enter 'document imaging' and click on the 'search' button. Click on first instance of 'view results' on the results screen. Then, click on 'Document Imaging and Document Management'. The course will appear with enrollment instructions, click on the 'add to my study list' button. Please be careful to wait until Fall 2007 enrollment opens on August 8, 2007.

Please see the website for the course description:

Course description

This course is for managers who have been assigned to manage a document imaging system, and must start immediately, but can spend three days to study the subject and its background. This course is designed to assist managers to be more effective in bringing the immediate and long term benefits of document imaging and document management to their organizations and to their organizations' clients, customers, and constituents. Students will gain an understanding of how document imaging can be used and managed in both small and large-scale organizations. Document imaging is the process of scanning paper or microfilm documents. Document imaging moves the documents from their hard-copy format on shelves and in file cabinets to a digital format stored in computer based document repositories. Document management organizes scanned documents, paper documents, and born-digital documents in their native-format, for compliance with records retention requirements, including permanent preservation. This course provides an understanding of the details that there is often no time to review in the rush to implement a system. The course content is intended to be useful to students in their professional work for twenty years into the future and is also intended to be useful for planning to preserve digital documents forever. The course may be too broad for those students seeking to learn a specific software application. Students will learn about the technology of scanning, importing, transmitting, organizing, indexing, storing, protecting, searching, retrieving, viewing, printing, preserving, and authenticating documents for document imaging systems, and archives. 
Image and document formats, metadata, XML (eXtensible Markup Language), multimedia, rich text, PDF (Portable Document Format), GIS (Geographic Information Systems), CAD (Computer Aided Design), VR (Virtual Reality) and GPS (Global Positioning System) indices, image enabled databases, data visualization, finite element analysis models, animations, molecular models, RAM (Random Access Memory) based SQL (Structured Query Language) databases, knowledge management, data warehousing, records inventories, retention schedules, black and white, grayscale, and color scanning, OCR (Optical Character Recognition), multispectral imaging, audio and video digitizing, destructive (lossy) and non-destructive (lossless) compression, digital signatures and seals, encryption, the three components of vision: resolution, color, and motion, the imaging technology of continuous tone, halftoning, dithering, and pixels, RAID (Redundant Array of Inexpensive Disks) fault tolerance, ECCs (Error Correcting Codes for RAID, CD, and DVD), and mirrored site disaster planning will be discussed. System design issues in hardware, software, networking, ergonomics, and workflow will be covered.

Emerging technologies such as the DVD Digital Video Disc, HDTV (High Definition TV), and very high speed Internet, intranet, and extranet links, Internet protocol stacks, and Internet 2 will be presented. The course will include the DVD's role in completing the convergence of the PC and television, the convergence of telephony, cable, and the Internet, the merging of home and office, the merging of business and entertainment, and the management of the resulting document types. Can everything be digitized? The course follows Shakespeare through being (or not to be), love, wisdom, knowledge, information, data, bits, and discernable differences (optical disc pits). Many professionals including records managers, librarians, archivists, and compliance officers work with document management issues every day. While not limited to these professionals, this course builds on the broad range of tools and techniques that exist in these professions. The class content is designed so that students can benefit from each part of the class without fully understanding every technical detail presented. This course is designed for non-technical professionals.

Several system designs will be done based on system requirements provided by the students. System designs are done to provide an understanding of the design process, not to provide guaranteed solutions to specific problems.

There is no hands-on use of scanning equipment. The course is designed to improve the ability of non-technical managers to participate in, and to direct, technical discussions. Instructional techniques include storytelling, iconic objects, and videos. Interaction between students is considered an important part of the learning experience.

The course covers a wide variety of materials and provides a foundation for understanding the many types of document management. However, some people might find the materials presented too broad for their purposes. If, in the course materials, you find a single area of great interest to you, but you have no interest in the other topics, it might be better if you included just a portion of the class in a self-study plan. Because the technology continues to evolve rapidly, and the spread of technology is also occurring rapidly, the course continues to evolve and is different each time it is taught.

Instructor: Steve Gilheany, BA Computer Science, MBA, MLS Specialization in Information Science, CDIA (Certified Document Imaging System Architect), CRM (Certified Records Manager), California Adult Education teaching credential, Sr. Systems Engineer, 25 years of experience in digital document imaging.

Enrollment is limited. Please call the instructor at +1 (310) 937-7000 for questions about the course. Students are encouraged to read the course materials and to speak with the instructor to determine if the course will be suitable for their purposes.

Because there is no charge for making a room reservation, and room costs increase when availability is limited, students are encouraged to make reservations as early as possible. For information on nearby hotels please see: []

The instructor has taught classes similar to this course to document imaging users and managers, in legal records management, to librarians and archivists, and to various industry groups. He has worked in digital document management and document imaging for twenty-five years. His experience in the application of document management and document imaging in industry includes: aerospace, banking, manufacturing, natural resources, petroleum refining, transportation, energy, federal, state, and local government, civil engineering, utilities, entertainment, commercial records centers, archives, non-profit development, education, and administrative, engineering, production, legal, and medical records management. At the same time, he has worked in product management for hypertext, for windows based user interface systems, for computer displays, for engineering drawing, letter size, microform, and color scanning, and for xerographic, photographic, newspaper, engineering drawing, and color printing.

The following is an example of the course materials available at []. There are also several papers that describe various document management topics in prose.

Computer storage requirements for various digitized document types:

1 scanned page (8 1/2 by 11 inches, A4) = 50 KiloBytes (KByte) (on average, black & white, CCITT G4 compressed)

1 file cabinet (4 drawer) (10,000 pages on average) = 500 MegaBytes (MByte) = 1 CD (ROM or WORM)

2 file cabinets = 10 cubic feet = 1,000 MBytes = 1 GigaByte (GByte)

10 file cabinets = 1 DVD (WORM)

1 box (in inches: 15 1/2 long x 12 wide x 10 deep) (2,500 pages)

1 file drawer = 2 linear feet of files = 1 1/4 cubic feet = 125 MBytes

8 boxes = 16 linear feet = 2 file cabinets = 1 GByte
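
The rules of thumb above can be turned into a quick estimator. What follows is only a minimal sketch and is not part of the course materials: the function names are made up for illustration, the units are decimal (1,000 MBytes = 1 GByte, as in the table), and the 50 KByte page is the table's average for black & white, CCITT G4 compressed scans, so color or grayscale collections would need a different figure.

```python
# Rule-of-thumb figures taken from the table above (averages, not guarantees).
KBYTES_PER_PAGE = 50          # black & white, CCITT G4 compressed
PAGES_PER_DRAWER = 2_500      # 1 file drawer = 125 MBytes
DRAWERS_PER_CABINET = 4       # 4-drawer cabinet = 10,000 pages
PAGES_PER_BOX = 2_500         # 1 records box


def storage_estimate(pages: int) -> dict:
    """Estimate storage and physical equivalents for a number of scanned pages.

    Hypothetical helper: decimal units are assumed throughout.
    """
    kbytes = pages * KBYTES_PER_PAGE
    return {
        "pages": pages,
        "mbytes": kbytes / 1_000,
        "gbytes": kbytes / 1_000_000,
        "file_cabinets": pages / (PAGES_PER_DRAWER * DRAWERS_PER_CABINET),
        "boxes": pages / PAGES_PER_BOX,
    }


def pages_that_fit(gbytes: float) -> int:
    """Roughly how many average scanned pages fit in the given capacity."""
    return int(gbytes * 1_000_000 // KBYTES_PER_PAGE)


if __name__ == "__main__":
    # Two file cabinets' worth of paper (20,000 pages).
    print(storage_estimate(20_000))
    # Pages that fit on a 10 GByte drive, using the same per-page average.
    print(pages_that_fit(10))
```

Changing `KBYTES_PER_PAGE` is the main adjustment to make for a different scanning setup; everything else follows from it.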

For the 50th anniversary of the introduction of the first magnetic disk, September 13, 2006, please see:

[] or
[] for more details and references.

Steve Gilheany, CRM, CDIA
(310) 937-7000


Sunday, June 24, 2007

Printing books on demand at NYPL

In 2004, in a speech at the Library of Congress, Brewster Kahle spoke about a bookmobile that could print books on demand for $1.00 apiece. Now the New York Public Library has installed a direct-to-consumer book printing machine in its Science, Industry, and Business Library (SIBL).
The first Espresso Book Machine™ (“the EBM”) was installed and demonstrated today [June 21] at the New York Public Library’s Science, Industry, and Business Library (SIBL). The patented automatic book making machine will revolutionize publishing by printing and delivering physical books within minutes. The EBM is a product of On Demand Books, LLC (“ODB”), the company founded by legendary publishing executive Jason Epstein and business partner Dane Neller, who joined SIBL’s Kristin McDonough for a private event there to speak about the EBM’s potential impact on the future of reading and publishing.

The Espresso Book Machine will be available to the public at SIBL through August, and will operate Monday-Saturday from 1 p.m. to 5 p.m....

Library users will have the opportunity to print free copies of such public domain classics as “The Adventures of Tom Sawyer” by Mark Twain, “Moby Dick” by Herman Melville, “A Christmas Carol” by Charles Dickens and “Songs of Innocence” by William Blake, as well as appropriately themed in-copyright titles as Chris Anderson’s “The Long Tail” and Jason Epstein’s own “Book Business.” The public domain titles were provided by the Open Content Alliance (“OCA”), a non-profit organization with a database of over 200,000 titles. The OCA and ODB are working closely to offer this digital content free of charge to libraries across the country. Both organizations have received partial funding from the Alfred P. Sloan Foundation.

...The EBM’s proprietary software transmits a digital file to the book machine, which automatically prints, binds, and trims the reader’s selection within minutes as a single, library-quality, paperback book, indistinguishable from the factory-made title.
The web site for On Demand Books is "down", so I pulled up some of its web pages using the cache in Google. The cached page (from June 20, 2007) said the machine can:
produce 15 - 20 library quality paperback books per hour, in any language, in quantities of one, without any human intervention. This technology and process will produce one each of ten different books at the same speed and cost as it can produce ten copies of the same book.
Besides the machine at NYPL, there are two other machines installed (World Bank InfoShop in Washington DC, and one at the Library of Alexandria in Egypt).

I wonder if Google's latest partners will get one or more of these machines? I would think it would be a nice complement to what they are doing.

Related Post: The Espresso Book Machine, 9/28/2006


Saturday, June 23, 2007

No matter what you are advocating, your actions must be timely

Whether you're submitting a letter of support for a grant application (e.g., Institute of Museum and Library Services), supporting (or opposing) a legislative action, or advocating something more local/personal, your actions must be timely. For example, when an organization submits a grant application and needs letters of support, often it is asking for those letters at the last minute, which means those letters need to be written and delivered quickly. Lending support to something that is occurring within the government may mean having to contact the appropriate government representatives immediately in order for the support to be effective. Advocating often cannot wait for a convenient time.

This past winter, I worked on a grant application with a team of people. Our call for letters of support went out as soon as we had a firm idea of what people were being asked to support (and after we had some documentation to share). But our supporters did not have weeks to write those letters; they only had days. We needed very quick responses and got them from those who were able to act immediately.

At the Special Libraries Association annual conference, Doug Newcomb explained that SLA had created a public policy platform that allows the organization to decide quickly what to advocate for. Rather than having to poll members of the Public Policy Advisory Council about each issue, Newcomb can use the public policy platform to decide what to support. That is useful, especially when letters of support need to be done quickly (sometimes even instantaneously).

We did not mention the fact that advocacy must be timely when we talked to the Spectrum Scholars yesterday. ALA has a web site of resources to help library advocates, including the Library Advocate's Handbook (which was handed out yesterday). Being timely is mentioned in at least one bullet point, but it should be in big letters. Yes, there are things you need to do all the time, but you also need to be prepared to act quickly when the need arises.

Although geared specifically for libraries, the Library Advocate's Handbook would be useful to any organization. Even new digitization programs would find information here that could help them. Yes, you may have to think a bit about how to change some of the advice to fit your situation, but some of it you could use immediately, like the Shaping the Message Worksheet on page 31.

Finally, if you know that you will need people or other institutions to be advocates for you, educate them ahead of time about what you are going to need and when, as well as why. Get them on board now. Get their questions answered now. Then when you need them to act quickly, they will be prepared to do so.


Friday, June 22, 2007

ALA 2007 Spectrum Leadership Institute

As I noted yesterday, today I was one of the speakers at the American Library Association (ALA) 2007 Spectrum Leadership Institute. Spectrum is "a scholarship program designed to improve library service through the development of an ethnically diverse workforce" sponsored by the ALA Office for Diversity. In 2006, ALA awarded 69 scholarships and I believe all of those scholars were at the Institute.

It was rewarding to be among this group and see their enthusiasm for the profession. It was also heartening to hear the speakers who talked about being librarians outside of the library, the morning's first session. Those speakers were Sandy Littletree (independent contractor), Elisia Johnson (prison librarian), Anne Caputo (corporate director), and Marcia Farabee (orchestra librarian). Adding to the diversity of the already diverse group, Littletree, who is Navajo, introduced herself in the Navajo language as well as in English and Caputo shared proudly that she is Potawatomi. Of the four, Farabee had the job that I had never thought of -- orchestra librarian -- which sounded fascinating.

As a Dialog user since 1981, I found it interesting to hear Anne Caputo talk about going to work at Dialog as their sixth employee (obviously before 1981). Her involvement in the information industry goes back a long way and I suspect that she has forgotten over the years more than many of us will ever know about the inner workings of the industry.

I bet it was interesting for the Scholars to hear a bit about the value of these individuals and their earning power. When Caputo started at Dialog, librarians had the ability to earn more than they thought they could. And I remember in 1983 knowing how much I was worth if I went to work for the federal government, and then being shocked to learn what I was worth in the corporate world. Johnson was quick to impress on the group that being a prison librarian pays well. People with skills in information storage, retrieval, analysis, etc. are valued assets (then as now).

I spoke on a panel with Doug Newcomb from the Special Libraries Association and Jonathan Band, an attorney who is focused on technology law and policy. We were focused on advocacy. Newcomb gave an overview of several policy issues that library organizations are following globally. Band spoke specifically about Orphaned Works and H.R. 1201 (The FAIR USE Act of 2007). Band was asked numerous questions about copyright and other issues, which was good. These are issues that these new librarians will need to understand, so it was good to hear/see them wanting to know more.

I spoke on personal advocacy and used these four rules as the basis for my talk. When I began working on the presentation, I created this slide of qualities, which are all good, but the four rules worked much better in the time that I had.

In all of our careers, we must advocate for ourselves. We must also empower those around us to be our advocates, our supporters. They must be able to tell our stories and pull other support towards us. After today, I think that every Spectrum Scholar will advocate for themselves and also advocate for the profession. If they do that, they will be unstoppable.

It was a wonderful day. Good conversation and energy. I hope this won't be my last time interacting with this group.

6/23/2007: Sorry about the typos I had in the title earlier! I wrote the post at the end of a long day. And thanks K.P.R. for saving me further embarrassment!


Thursday, June 21, 2007

Personal advocacy

Tomorrow I am speaking at the ALA 2007 Spectrum Leadership Institute in Crystal City, VA. The session I'm in is on "Advocacy and Librarianship in the Larger Consciousness" and I'll be talking about personal advocacy. Although we will not be able to use slides, I have created this one slide to share here with my key points on it. (I'll also point the audience to it.) The last point may not be obvious, but being self deprecating can be a great way of disarming people.

(You can click on the image to get a clearer one to view.)


Wednesday, June 20, 2007

Book scanning: Emory Univ. & someone's MPOW

On June 6, Emory University announced that it will be digitizing "about 200,000 of its volumes that are in the public domain and to make the materials available online free or available for purchase as inexpensive print-on-demand volumes through While people would pay for the print-on-demand books, Emory officials said that pricing would be designed just to cover costs, not to earn a profit for the university." (link)

In the actual press release, Martin Halbert, director for digital programs and systems at Emory's Woodruff Library, stated:
We believe that mass digitization and print-on-demand publishing is an important new model for digital scholarship that is going to revolutionize the management of academic materials...Information will no longer be lost in the mists of time when books go out of print. This is a way of opening up the past to the future.
Emory will be using equipment from Kirtas for this project.

Speaking of Kirtas, has a post about the book digitization efforts at his place of work (or as he says "MPOW" or my place of work). He says publicly what many say privately -- the high cost machines are nice, but not for everyone. And some machines aren't for all types of bound materials. For example, he describes the BookDrive by Atiz as a "fully enclosed unit (reminded me of a toaster oven) that turns the pages of the book via an arm with a mild adhesive on it." That's not something you'd use on a priceless book.


Follow-up to "The Google Project continues to grow"

A couple days ago I wrote about the latest partners to the Google book digitization project. Yesterday, Roy Tennant also blogged about this. He noted:
To this point, the only Google partner library to aggressively mount the digitized books in its own repository has been the University of Michigan. Therefore, it surprises no one that the University of Michigan, which had already developed their MBooks platform for its own digitized books, will serve as the central repository for the CIC project.
That was a connection in the story that I had not seen, so thanks Roy for pointing it out.

Later he wrote:
This project raises the bar for the other libraries participating in mass digitization projects. Most of the libraries cooperating with Google are making no effort to mount the resulting files themselves. Some may not even be keeping a copy of the files. I think it is disturbing that we don't even know how true that statement might be.
It is disturbing that these libraries are relying so heavily on Google to digitize the materials and make them available. Libraries have gotten burned in the past by companies/vendors that made bold promises and then didn't keep them. I'm not saying that Google won't be around forever, but is their future really guaranteed? And will the digitized materials always be maintained as these libraries hope they will be? I hope someone at every Google partner institution has considered those questions.


Tuesday, June 19, 2007

Event: Digital Preservation Conference, November 28-29, 2007

As posted on Sigdl-l.

Are you responsible for preserving digital collections?

November 28-29, 2007
Hilton Seattle
Seattle, Washington

A conference on digital preservation presented by the Northeast Document Conservation Center (NEDCC) and co-sponsored by the OCLC Western Service Center.

Taught by a faculty of national experts, this two-day conference on digital longevity provides information about the latest developments in digital preservation to help you with the life-cycle management of your institution's collections.

Planning your fall budget? The conference fee has been set at $350.

Watch NEDCC's Web site for full conference details coming in early August.

Partial funding for this conference is provided by the Institute of Museum and Library Services.

To receive a conference brochure when available, please send post mailing address to:
Julie Martin Carlson,


Blog post: A View of Regional Digitization Centers

Peter Murray (Disruptive Library Technology Jester) has written a long blog post about the way regional digitization centers work. Murray writes:
As a part of work for an OhioLINK strategic task force, I have been exploring the creation and operation of regional/collaborative/shared digitization centers. This is a report of findings to date after an open call for information. The report is structured with questions to be explored when considering a regional digitization center followed by narratives from conversations with the Collaborative Digitization Program (formerly the Colorado Digitization Program), the Mountain West Digital Library, and the Ohio Historical Society.
The creation of regional centers is a topic that is often discussed by various consortia. His post outlines some of the questions that should be asked, which some will find very helpful. And reading about the centers that others have developed (then dissolved in some cases) is quite useful.


Monday, June 18, 2007

The Google Digitization Project continues to grow

In some ways, the larger this project gets, the less news-worthy it becomes. Yes, more institutions have joined the project. Yes, this is wonderful. No, there are no new details about how they are doing it and nothing trickling out of this project that will help smaller projects with technology, metadata, processing, etc. (When that occurs, that will be news.)

Here, though, are a few interesting quotes from the Penn State press release about the agreement between Google and the 12-institution consortium called the Committee on Institutional Cooperation (CIC):
"We haven't identified the specific works to be included yet," said Nancy Eaton, dean of the Penn State University Libraries. "However, the aggregation of large collections is more important than any specific title, as it is the 'critical mass' of large collections that will make Google the place for users to go to search first."
As a part of the agreement, the consortium also will create a first-of-its-kind shared digital repository to collectively archive and manage the full content of public domain works digitized by Google that are held across the CIC libraries.
I would think that as Google enters into more agreements, selecting books to digitize could become more of a headache. They must consider what they have already digitized, what is already in the pipeline, what they have promised to digitize for their existing partners, etc. New partners must bring to the table, I suspect, something unique that can easily be identified upfront.

BTW there must be a massive database within Google that tracks all of this stuff. Wouldn't that be interesting to look at?

The second quote notes that the CIC will create a shared digital repository of public domain works that have been digitized by Google and that these 12 libraries already hold. Notice that the wording does not say that these books will necessarily be digitized from these 12 institutions, but that they "hold" them. So it could be -- if I read this correctly -- that they will build this repository using books already digitized by Google elsewhere that are in the public domain and that already exist in their collections. That could be quite nice and very valuable to students and faculty.


Friday, June 15, 2007


The calendar in the left column of this blog disappeared for a while, but is back. My summer schedule is not as laid back as it looks. Yes, I'm speaking next week at the ALA Spectrum Leadership Institute and then have two workshops scheduled for August on Second Life. What doesn't show is that I'm actually helping with a series of continuing education classes on Second Life that are being hosted by the University of Illinois at Urbana-Champaign and that are being coordinated by the Alliance Library System. These classes are being done "in world" and through their online course management system. (And of course, it doesn't display the project work I'm involved in.)

Does Second Life relate to digitization? Not directly. I find it interesting to see how people are using this three-dimensional digital world and to think about how this technology might impact us in the future. Second Life does relate to the work I do with social networking tools and Web 2.0, so it is not as far off-track for me as you might think. I do hope that some aspects of the Second Life interface will migrate outside of Second Life. I think some of the features of Second Life could be useful to how digital assets are displayed online, but I know that tool/service/?? is quite a ways into the future.

While my summer speaking schedule seems to be focused on Second Life, you'll notice that digitization workshops are scheduled for the fall (and there are several more in the discussion stage).

BTW If you are at any of the same events I'm at, please stop me and say "hello." If you have time for a cup of coffee, let's do it!

Blog post: Does your library make sure that vendors aren't able to track library users' searches?

LibraryLaw Blog asks that question in thinking about online databases and integrated library systems (ILS) that libraries use. But we have the same concern when it comes to searches run in our content management systems (or digital asset management systems or...). Mary Minow notes that you can write something about this into a contract, but that you also need to be sure that there is nothing that logs who has done what. I'm sure this is something we're not thinking about in a big way...


Thursday, June 14, 2007

Site for Kids: Taking the Mystery Out of Copyright

The Library of Congress has launched a copyright site geared for students and teachers. As soon as I saw an email announcing it, I jumped over to the site to check it out. Now I'm disappointed. This is a site built by adults trying to tell students what they need to know. I doubt that it will be of any interest to students (or that they will find it useful). The LOC should have built a site with input from students so it would be something students would truly use.

Students need a way of understanding copyright on all materials -- including those that we digitize. Let's hope that a site is built that is really geared for them (perhaps by the Creative Commons), 'cause this isn't it.


Disadvantaged business

A few months ago, I talked to someone who is using Business Technology Career Opportunities, Inc. (BTCO) as their digitization vendor. The person said nothing about this company's services or quality, so I'll assume that the company provides excellent services and a high level of quality. (In fact, if you look at their web site, you'll see that they do work for an impressive list of clients.) What the person did say was that this was a "disadvantaged business." (The BTCO web site, though, does not use that phrase.)

Since you may hear of companies being described this way, let me tell you what a disadvantaged business is.

A disadvantaged business is a small business in a specific category (e.g., women owned, minority owned) that has not had the same access to contracts as other -- generally larger -- businesses. Governments (e.g., federal and state governments in the U.S.) often set aside a percentage of work for these businesses in order to ensure that they can compete with their larger "advantaged" counterparts. Being designated as a disadvantaged business can be helpful for the business because it can help them obtain work. It does not mean that the business produces lower quality work or inferior products.

What I did notice on the BTCO web site is that it has a diversified labor force. The site says:
Business Technology Career Opportunities, Inc., (BTCO) proudly provides technical careers for people with disabilities and superior customer solutions through a fully [integrated] workforce. Founded by the Cerebral Palsy Research Foundation of Kansas (CPRF), BTCO is an approved NISH/JWOD community rehabilitation provider (CRP) that supports the employment of people with disabilities through computer-based technology services, fully adaptive work environments, and competitive wages with full-fringe benefits.
There are other highly qualified digitization vendors who use a similar labor force (e.g., Out-Source Document Imaging). In addition, UNICOR uses inmates at correctional facilities to do their digitization. Does that mean that the work is lower quality? No. And I should also mention that one company uses monks as their workforce.

The bottom line is that if you hear a company described as "disadvantaged" or words are used to describe their workforce that "make you wonder," ask questions. The word "disadvantaged" as we can see means something very specific. If you take it out of context, it may have the wrong meaning. So be sure you know exactly what the person meant.


Monday, June 11, 2007

The Future of the Past

This is an excerpt from a post I did in the SLA blog. Since this section focuses specifically on digital preservation, I wanted to repeat it here.

Monday afternoon [June 4] there was a session entitled "The Future of the Past" with Victoria McGargar and Peter Johnson. The session was sponsored by the News Division, and the speakers talked about creating institutional repositories. Some institutions are realizing that long-term preservation is tough.
  • McGargar mentioned that one significant digitization project has quietly started to collect pristine paper backups.
  • She noted that at the LA Times, they tested 300,000 JPEG files and found that 10% of them were corrupt.
  • Non-monetized assets are especially at risk. (Small collections, personal papers, etc.)
  • Although we may migrate files successfully, she showed examples of where the resultant files had still become unreadable because things just didn't "translate" well.
  • Files also can become corrupted without provocation.
  • We need a clearer understanding of what "preservation" means.
  • She talked through the core requirements for a trusted digital repository, which can be found here.
  • I have a note that says that the Copyright Clearance Center may start an orphaned works repository. I'm sure she must have thrown that out as an idea or something that is being discussed, but not something that is occurring now.
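
For anyone wondering what routine checking for this kind of silent corruption might look like, here is a minimal Python sketch. It is not the method used at the LA Times (the session did not describe one). The function names, the flat folder of .jpg files and the status labels are all assumptions made for illustration. The idea is to record a checksum for every file once, then later re-check each file against its checksum and also confirm that it still begins and ends with the standard JPEG start and end markers.

```python
import hashlib
from pathlib import Path

JPEG_START = b"\xff\xd8"  # start-of-image marker every JPEG begins with
JPEG_END = b"\xff\xd9"    # end-of-image marker a complete JPEG normally ends with


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_complete_jpeg(path: Path) -> bool:
    """Rough structural test only; it does not decode the image."""
    data = path.read_bytes()
    return data.startswith(JPEG_START) and data.endswith(JPEG_END)


def build_manifest(folder: Path) -> dict:
    """Record a checksum for every .jpg in a folder (hypothetical layout)."""
    return {p.name: sha256_of(p) for p in sorted(folder.glob("*.jpg"))}


def audit(folder: Path, manifest: dict) -> dict:
    """Compare today's files against the checksums recorded earlier."""
    report = {}
    for name, expected in manifest.items():
        path = folder / name
        if not path.exists():
            report[name] = "missing"
        elif sha256_of(path) != expected:
            report[name] = "changed"   # bytes differ from what was recorded
        elif not looks_like_complete_jpeg(path):
            report[name] = "suspect"   # unchanged, but was never a whole JPEG
        else:
            report[name] = "ok"
    return report
```

A repository would run build_manifest once at ingest, store the result somewhere safe, and run audit on a schedule. A checksum can only say that a file has changed since it was recorded, so a file that was already damaged at ingest still needs a structural check like the one above (or a full decode) to be caught.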


Sunday, June 10, 2007

Digitization vendors at the SLA conference, part 3

As I have said before, there were several digitization vendors who exhibited at the SLA Annual Conference this year. This demonstrated that the vendors see special librarians -- those focused on special topics/collections -- as an important market. I've already blogged about some of the vendors, so let me talk about the rest.
  • Google -- Okay, Google wasn't there as a digitization vendor, but given their impact on the industry, it is important to note that they were present. They did have handouts about Google Book Search, one of which talked about the impact on libraries and librarians.
    • "We see our role as complementary to libraries and librarians. Our aim is to help people search and discover all the world's books." Then they talk about helping people find both digitized and non-digitized books.
    • What's in it for Google? "...we hope to provide a better, more comprehensive and useful experience for Google users around the world."
  • Backstage Library Works -- I talked to the booth staff about how they work with organizations and found that they will come on-site if necessary and will either bring all the staff with them that they need or hire (and train) some of the operators locally. So they are very flexible in how they will work with a customer. We also talked about the fact that they are doing more projects outside of the U.S. (The places I remember hearing were all in Europe.)
  • S-T Imaging and eImageData both exhibited microform scanners that can be used by patrons (not just by professional staff). There are several differences in how the machines operate, one of the noticeable ones being that the ST200 (by S-T Imaging) does not place microfilm between glass, while the ScanPro 1000 (by eImageData) does. S-T Imaging reports that by eliminating the glass, it has eliminated the thing that often scratches microfilm. It would be nice to see a side-by-side comparison of the two machines (with software) in order to really see the differences.
  • Indus USA has their 5002C which is an overhead book scanner. Their Walk-up Kiosk System has "touch screen technology for easy use by library patrons who intend to scan or copy pages from bound books or periodicals. It allows patrons to save scanned images to a network drive, USB flash drive, print to a printer or send via e-mail, all with the software used with the touch screen monitor." (From their literature) As I saw, these features are being built into several systems geared towards end-users/patrons.
  • Digital Library Systems Group / Image Access -- They had two scanners in their booth! One was the Knowledge Image Center (KIC) for use by patrons. (This scanner looked very similar to the one by Indus USA.) The other machine they had was the Bookeye 2, which is meant to support interlibrary loan in a library. They did not have the Bookeye 3 in their booth, which is a more high-end scanner for digitization projects. Like the others, it is also an overhead scanner, but it has different features that would make it more appropriate for a digitization project.
  • nextScan -- I must admit that I stopped by the nextScan booth at the end of the day, when everyone was leaving the exhibit hall. So I picked up their literature, but didn't talk to anyone. nextScan provides equipment for fiche and film scanning. Their products are all production machines and are not meant for end-users. For example, the flexScan can scan rollfilm up to 240 pages per minute or microfiche up to 125 images per minute.
    • Long established libraries often have many microforms, so these production machines can be quite useful for those institutions that can copyright clear their microforms for digitization.
  • William S. Hein & Co. -- many people know W.S. Hein as a publisher, but they also do digitization. They provide consulting, production, hosting access/control and preservation services. Their production facility includes a Kirtas APT Bookscan 1200.
  • Northern Micrographics -- Northern Micrographics will digitize and process materials on your behalf. My notes tell me that nothing significant had occurred with this company since I saw them a year ago.
Next year, the SLA Annual Conference will be in Seattle, WA, June 15 - 18. Let's hope that even more digitization-related vendors attend that conference. (Let's also hope that they are indexed in the exhibit hall guide in a way that makes them easy to find!)

Addendum (06/11/2007): I forgot to mention that PTFS exhibited at the SLA conference. I see and talk to PTFS a couple times a year, so I did not stop by their booth. PTFS does digitization for organizations and also markets content management software (ArchivalWare) which often goes head-to-head against ContentDM when organizations are looking at CMS solutions.


Event: Unlocking Audio conference London 26-27 October 2007

As posted on the Digital-Preservation discussion list.

Unlocking Audio: Sharing Experience of Mass Digitisation
26-27 October 2007
The British Library Centre for Conservation, London
First circular

Unlocking Audio
is an international conference exploring the planning and strategies required for the successful execution of large-scale audio digitisation projects, and the technical and practical issues involved. Aimed at actual practitioners, sharing best practice and looking at emerging standards, the event will be held at the British Library in London on Friday 26th and the morning of Saturday 27th October 2007.

Invited speakers include:
  • Kevin Bradley (National Library of Australia)
  • Jonathan Leong (BBC Archives)
  • Pekka Gronow (Finnish Radio archives)
  • David Seubert (University of California, Santa Barbara)
  • Mike Casey (University of Indiana)
  • Jim Lindner (Media Matters)
Provisional programme:

Friday 26th October
0930-1000 registration
1000-1015 welcome
1015-1045 keynote speech
1045-1115 coffee
1115-1245 paper sessions
1245-1500 lunch followed by tours of the Sound Archive technical department
1500-1600 round table discussion
1600-1700 paper session
1845-1930 drinks reception
1930-2200 dinner
Saturday 27th October
0830-1000 pastries and posters
1000-1100 paper sessions
1100-1130 coffee
1130-1230 paper sessions
1230-1300 closing keynote
1300-1330 round-up forum
1330-1430 lunch

Participants may be able to join optional excursions to other London sound archives and studios on Saturday afternoon after the close of the conference. Further details will be available soon.
The event will be held at the new purpose-built facility, The Centre for Conservation, located at the main British Library site in London. The Centre contains the technical department of the British Library Sound Archive and includes 10 new soundproof transfer studios, a recording studio, a small workshop and laboratory.

Call for papers

Offers of papers are invited on the planning and execution of large scale digitisation projects, including case studies, the setting of standards, and technical methods. The offers should be submitted before 2nd July along with a title, abstract of no more than 300 words, your name and institutional affiliation, and an indication of whether the paper is an oral or poster presentation, to the contact address below. Authors will be notified by 16th July whether their paper has been accepted for oral presentation. Space will be available for displaying posters and small exhibits during the pastries and posters session on Saturday morning.

Commercial sponsors and exhibitors

If your business is interested in supporting this event, please contact the address below.

Accommodation and travel

Advice on hotels and getting to the conference venue will be issued with the registration form.


The conference registration fee is £75, which includes the conference abstracts, receptions, lunches, refreshments, and breakfast on Saturday 27th but excludes the dinner on Friday 26th. The charge for the dinner is £35. Please do not send payment now.

In order to register your interest, please send your name, the name and address of your organisation and email address before 30th June 2007 to:
Alison Faraday,
“Unlocking Audio”
The British Library
96 Euston Road
London NW1 2DB
United Kingdom
Fax: + 44 (0)207 412 7777

Places are strictly limited to 50 delegates. If the event is oversubscribed, we may have to limit the numbers attending from each institution. Applicants will be notified by July 6th if they have been successful in gaining a place. The deadline for payment of registration fees and the optional dinner will be 10th August.


Thursday, June 07, 2007

Dublin Core?

DigitalKoans has a post announcing that the Dublin Core standard has been renewed and updated. Interestingly, I heard another attendee at the SLA Annual Conference "report" that Dublin Core is dead. (Unfortunately, I don't remember exactly who said it.) Dead? I wonder what markets have moved away from Dublin Core? I'm seeing and hearing digitization programs talk about and use Dublin Core, so it is not "dead" for digitization programs. Perhaps it is for general Internet uses that something is beginning to replace this standard. If anyone can confirm this (or has heard something different about the status of Dublin Core for specific uses), please leave a comment here. Thanks!


Digitization vendors at the SLA conference, part 2

Von Totanes (The Filipino Librarian) recorded a brief video (30 sec.) of the Kirtas BookScan being demoed at the SLA Annual Conference.

The booth staff did include a person who seemed very watchful of how the machine was doing and who occasionally adjusted the book that was being digitized. Von said in his blog posting, "Note that in the video you can also see recently-scanned pages on the screen next to the machine." The machine is a great attention-grabber and it definitely did that at the conference. As I said earlier, some people were just mesmerized by the machine, even though they were not interested in digitizing books.


Tuesday, June 05, 2007

What's happening at the Environmental Protection Agency (EPA) libraries

Mike Flynn, Deputy Director of the Office of Information Analysis & Access for the Environmental Protection Agency (EPA) spoke to a group of more than 50 people at the Special Libraries Association Annual Conference during a policy update. In the audience was the new national program manager for the national EPA library system (Deborah Balsamo). There were also several EPA librarians and those who work in EPA libraries on a contract basis. I also had the privilege to have lunch with Mike, Deborah and a few others.

So what did I hear?
  • The EPA libraries are alive and well, and it is intended that they remain so.
  • The goal is to improve the library network and make the access and retrieval of important information faster and more flexible.
  • The EPA libraries need to do more with less (like many other libraries) and take advantage of new technologies.
  • Long-term (thinking perhaps 10 years into the future), there will be a national unified data system that will provide access to all types of information for EPA staff, scientists and the community at large. To me, this was very important to hear. What the EPA is trying to do is to modernize its library network. The changes they have in mind will take time to implement and may be painful. To their credit, the EPA is stepping back and reassessing its efforts in order to ensure that they are in line with its communities' needs AND are understood by the communities it serves.
  • In the past year, the EPA has:
    • Placed more focus on electronic delivery of information
    • Digitized more documents, with more than 26,000 documents now available in digital form
    • Closed three libraries to walk-in traffic (but this does not mean that staff at those sites do not have access to information...they do through systems the EPA has put into place)
    • Made it clear that they do not intend to close all of the libraries
    • Still maintained a full suite of library services
  • The EPA recognizes that it will continue to need a core of librarians to help people with their information requests. Many people can satisfy easy requests themselves, so the librarians are needed to help with the more difficult requests.
  • They are making no more changes while they finish making their plans to move forward.
  • They recognize that they must work hard on the people side of their organization (and they know that will not be easy).
  • They are doing a third party review of their digitization plans and will revise the plans as necessary.
  • The EPA continues to seek feedback and input from its librarians, the federal government, scientists, academics, etc.
  • From the libraries that have closed, all unique EPA documents have been digitized and the hardcopies have been retained. Those materials that are not unique will need to be gone through and decisions made about them.

Mike Flynn understands that a lot of misinformation has circulated about the EPA efforts to re-create its library network so that it can provide information more efficiently and effectively now and in the future (especially for those future employees who will want information in electronic form). He is willing to work hard to correct misperceptions. He is also willing to alter their path, if necessary, to ensure that what they do will indeed meet all of their future needs.

Monday, June 04, 2007

Digitization vendors at the SLA conference, part 1

I have been impressed with the digitization vendors who are at the conference this year. There are more vendors that offer digitization services AND several of the vendors brought their hardware to show off. I know that scanners and PCs can be tough to ship to a conference, but attendees appreciate being able to see the equipment up close and personal. Someone at the Kirtas booth said that people were just standing and watching their machine operate; mesmerized, I'm sure.

I'll blog in more detail later about all of the vendors, but for now want to talk about the Copyright Clearance Center (CCC). I visited their booth and asked if they helped organizations do copyright clearance on materials to be digitized. The answer at the moment is "no." The CCC can help organizations -- that have an agreement with it -- to pay the copyright fee on those items where it manages the rights. For many digitization programs, it is those items where the copyright holder is not easily known that can stall a program (and not those where the rights are managed by the CCC). It sounds like the CCC might develop "products" that will help digitization programs track down copyright holders that are not currently part of the CCC universe. I hope that they really do, since having an organization that can be hired to help with that work could really help to move a program forward.
