Digitization 101

Blog post: Commercialising Digitised Content & Interface Design

Alastair Dunning in London (UK) has written a post about the possible impact on a web site when the digitized materials being displayed are for sale. Are commercial sites more inviting because they want people to buy? Should we all make sites so people will peruse as if they are trying to find the perfect item to buy?

Technorati tag:

Marketing

Friday, November 23, 2007

Two interesting quotes from if:book

Actually this first quote is one that if:book quoted from Anthony Grafton's New Yorker piece "Future Reading":

The supposed universal library, then, will be not a seamless mass of books, easily linked and studied together, but a patchwork of interfaces and databases, some open to anyone with a computer and WiFi, others closed to those without access or money. The real challenge now is how to chart the tectonic plates of information that are crashing into one another and then to learn to navigate the new landscapes they are creating. Over time, as more of this material emerges from copyright protection, we’ll be able to learn things about our culture that we could never have known previously. Soon, the present will become overwhelmingly accessible, but a great deal of older material may never coalesce into a single database. Neither Google nor anyone else will fuse the proprietary databases of early books and the local systems created by individual archives into one accessible store of information. Though the distant past will be more available, in a technical sense, than ever before, once it is captured and preserved as a vast, disjointed mosaic it may recede ever more rapidly from our collective attention.

And this from an earlier post in if:book:

We are in the midst of a historic "upload," a frenetic rush to transfer the vast wealth of analog culture to the digital domain. Mass digitization of print, images, sound and film/video proceeds apace through the efforts of actors public and private, and yet it is still barely understood how the media of the past ought to be preserved, presented and interconnected for the future. How might we bring the records of our culture with us in ways that respect the originals but also take advantage of new media technologies to enhance and reinvent them?

Good food for thought...

Wednesday, November 21, 2007

Looking ahead (December & 2008)

My last post about my schedule was early in October, when I mentioned that I was going to need a lot of coffee due to my upcoming travels. On the flight into Monterey, CA, I actually met Libraryman (Michael Porter) whose photo (left) I've been using with permission. Very cool!

Between now and the end of the year, I'll be giving three workshops/presentations on social networking tools for Suffolk Cooperative Library System (Nov. 29), SUNY Cortland (Dec. 4) and the Rochester Regional Library Council (Dec. 11). (See the calendar in the left column for details and links.) I'll also be giving a presentation at the E-Info Global Symposium (Dec. 7) in Huntsville, AL on "Trends in eRepositories." This is a one-day conference with speakers coming from across the U.S. and Canada.

My talk at E-Info will be on institutional repositories in a broader sense than we might usually think about them. eRepositories are a potentially rich source of information, data, images, research, background info, unpublished materials, and cultural materials. Since they are digital, they must be more proactive in being good stewards of the information they contain. They are the archives that will house the information we'll want in the future, and we need to ensure that we build them well.

Events are already on my schedule for 2008, with several others likely to fall into place soon. (See calendar on the left side of this blog.) Among them is an event on January 16 where I'll talk about "Privacy & Security in Our Online, Networked World." I talk about privacy both in my digitization and social networking workshops, and security is always a concern when talking about social networking. At this event, I'll be able to talk about these issues with members of the financial community.

For more information on any of the events on my calendar, please contact the sponsoring organization or me. If you would like me to speak at your event, give me a shout.

Technorati tags:

Second Life,

Social Networking Tools,

Institutional Repository

Yale, Microsoft & Kirtas...and a short rant

Yale University has signed an agreement with Microsoft for the company to digitize 100,000 out-of-copyright books over the next year. University Librarian Alice Prochaska said in an interview that the books Microsoft "scans will be available only on Microsoft’s search engine, the University will receive digital files of all the books that are put online, and the entire digital collection will be linked through the Yale Library Web site and Orbis catalog listings."

The article states that there will be (or is) a non-disclosure agreement, so the financial details will be unknown. Generally, however, Microsoft and Google subsidize the cost of the digitization either in its entirety or in part.

And who is actually doing the digitization? Kirtas, the creator/manufacturer of a high-speed automated book scanner. Kirtas has an "in-house service bureau that employs more than 75 image technicians and operates three shifts- has mastered a proprietary digitization process that guarantees an overall error rate lower than one per 10,000 pages, ensuring quality mass digitization that will meet the highest standards and endure the test of time."

[rant] I continue to find these (Google, Microsoft, OCA) projects to be fascinating to watch for a variety of reasons. However, I also find it sad to think of the non-book content that should be digitized that is not. There are many cultural heritage organizations that need to begin to digitize, but that can't find funding to get them started. Yes, they should collaborate, but do they have what other collaborators would want? They have content, but not money and maybe not manpower. I also know of libraries in the U.S. that have not yet automated their catalogues. I know that digitization is different from retrospective conversion, but...well...I guess retrospective conversions aren't sexy at this point. Okay...I'll get off my soapbox. [/rant]

Technorati tag:

Google,

Microsoft

Article: High-tech scanner to digitize UNC's rare books

Actually, what is important about this article is the cost information.

The University's $100,000 one-year contract with the Internet Archive includes the Scribe scanner and an operator.

What will be digitized?

During the year, the Internet Archive will scan 22,000 Spanish-language dramas, 1,200 American and British travel accounts and a century of "Yackety Yack" yearbook.
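For a rough sense of scale, those figures can be turned into a per-item cost. This is only a back-of-the-envelope sketch: it assumes the whole $100,000 is spread across the two collections that have counts, leaves out the yearbooks because the article gives no volume count, and ignores differences in page count between items. The variable names are illustrative only.

```python
# Rough per-item cost for the UNC / Internet Archive contract.
# Assumption: the yearbooks are excluded (no volume count is given),
# so the real per-item figure would be somewhat lower.
contract_cost_usd = 100_000
spanish_dramas = 22_000
travel_accounts = 1_200

counted_items = spanish_dramas + travel_accounts
cost_per_item = contract_cost_usd / counted_items

print(f"{counted_items:,} items, about ${cost_per_item:.2f} each")
```

Since the contract also covers the scanner and an operator, a figure like this is a starting point for comparison, not a quote for anyone else's project.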

Technorati tag:

Internet Archive

Tuesday, November 20, 2007

Digitizing records and tapes: will some projects do this instead of using a professional service?

Lifehacker gives -- or links to -- advice on a wide variety of topics for everyday people. This one caught my eye because it used the word "digitize." Indeed, here is advice for consumers on digitizing old records and tapes. This advice is meant for consumers who want to create personal copies of their records and place them on an MP3 player. It's not meant for institutions that are trying to digitize old records and tapes, yet I wonder how many institutions will try this instead of having the work done professionally.

Technorati tag:

Digital Audio

Article: French digital library project protects copyright: official

While Google has been criticized for possibly violating copyright in its digitization program, the French National Library (Bibliothèque nationale de France) has made it a point not to violate copyright. In a press release last week, the library announced that it and the French Publishers' Association are "drawing up economic and legal guidelines enabling the release of online editions to the public. The plan, to be unveiled next March, calls for free access to works described as part of the national heritage and payment for access to works under copyright."

Elsewhere the press release states that:

In the last decade, the library...has collected 10 billion documents online, and is currently scanning the pages of 300,000 books into digital format as part of a plan to set up a European virtual library.

Monday, November 19, 2007

Blog post: Options, Embargoes, and Exemptions in Commercial Microfilm Publishing

Can your microfilm project impact a future digitization project? According to Bennett Lovett-Graff, the answer is "yes." The blog post covers:

ROFOs and ROFRs
Cannibalization
Embargoes & Exemptions

If you're considering microfilming part of your collection, you should read this first.

Technorati tag:

microfilm

Saturday, November 17, 2007

Video: Helen Tibbo talking about digital curation and institutional repositories

"Dr. Helen Tibbo, Professor in SILS [School of Information and Library Science, University of North Carolina at Chapel Hill], discusses digital curation and archives and the importance of preserving access as well as objects/bits. The roles of institutional repositories in research universities and issues of curriculum for archives are also discussed."

Technorati tag:

Digital Preservation

Event: ALCTS Metadata and Digital Library Development Workshop, Jan. 9 - 10, 2008

From the Metadata discussion list.

There is limited space available for
ALCTS Metadata and Digital Library Development Workshop
January 9-10, 2008 in Philadelphia, PA

Advance registration ends November 30th

Metadata and Digital Library Development, an ALCTS/Library of Congress Workshop

In an applied, exercise-based context, this two-day workshop introduces practicing catalogers to metadata implementation considerations and processes in a digital library development context. This workshop prepares attendees to serve as metadata specialists in digital library projects.

Session 1: Introduction To Digital Library System Objectives, Functionality, And Metadata

Session 2: Understanding Functional Requirements

Session 3: Metadata And Functionality

Session 4: Metadata Conversion: Enhancement And Mapping

Session 5: Metadata Workflows

Session 6: Digital Library Development Project Exercise

Presenters:

Barrie Howard, Digital Library Federation, Washington, DC
Jennifer Lang, Princeton University Library, Princeton, NJ

Attendance is limited to 35 people.

If the session fills up before you have a chance to register, we'll consider repeating the workshop at a later date.

To learn more about this event or to register, see the ALCTS Web site at: http://www.ala.org/alcts/events or contact Julie Reese, ALCTS Education & Meetings, at 50 E Huron; Chicago, IL 60611; (800) 545-2433 ext. 5034; or jreese@ala.org.

ALCTS is a division of the American Library Association.

Technorati tag:

Digital Asset Management,

Friday, November 16, 2007

Are we engaging in overkill?

In the last month, I have had two conversations with colleagues where each had the same thought -- some of what we're doing with digitization is overkill. No, they weren't talking about the conversion process, but about the software we're using to store our digital assets and the metadata. Each made compelling arguments, which I'll repeat for your edification!

Thinking of the software we're investing in to house our digital assets and metadata, are we spending too much money on it? Are we investing in software that either does too much or doesn't do what we need? Are we jumping on software "bandwagons" because others have jumped on and we think we should follow them? Both people pointed to the same piece of software when talking about this. It's expensive with good features, and a lot of programs are using it. (You get three guesses and the first two don't count.) The problem is that there are programs using this software that really don't need it, but are taking the easy route (for a variety of reasons, I'm sure) and going along with the decision others have made.

As we play follow-the-leader, we're ignoring lots of software including open source options. One open source option that you likely have not seen is Scriblio, which is a project of Plymouth State University. One digitization project that has used Scriblio is Beyond Brown Paper. (I'm only mentioning this to make you aware that it exists.)

Now I must be honest and tell you that I actually do tell people to consider the software that others in their region are using, because they will have people nearby who can be supportive. However, it concerns me that programs may be jumping over the product evaluation stage and making a decision based on the bandwagon.

As for our metadata, the question that arose today was "who is all of this metadata for"? Do our users need it? (Yes, a librarian asked this question.) My answer was that we're creating robust metadata for the future. We're keeping information that may not be important today, but may come in handy in the future (like information on how the materials were converted). It could be many years before we really know if we've created too little or too much metadata for specific programs.

Do our users need all of the metadata we're creating? No, but I'm not sure that is a reason to create less. Rather, perhaps we should show our users less metadata as the default.
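One way to picture "less metadata as the default" is to keep the full record but show only a short list of fields unless the user asks for more. The sketch below is just an illustration of that idea; the record, the field names and the `DEFAULT_FIELDS` list are all made up and are not taken from any particular schema or software product.

```python
# Hypothetical record: the field names and values are illustrative only.
record = {
    "title": "Example photograph",
    "creator": "Unknown",
    "date": "1925",
    "scanner_model": "Example flatbed",
    "capture_resolution_dpi": 600,
    "master_file_format": "TIFF",
}

# Assumption: each program chooses its own short list of default fields.
DEFAULT_FIELDS = ("title", "creator", "date")


def default_view(record, fields=DEFAULT_FIELDS):
    """Return only the fields shown by default, in display order."""
    return {name: record[name] for name in fields if name in record}


def full_view(record):
    """Return a copy of the whole record for users who ask for more."""
    return dict(record)


print(default_view(record))
```

Nothing is thrown away here: the conversion details stay in the record for the future, and only the default display is trimmed.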

BTW I'm not a "cataloguing" librarian, so maybe someone from a cataloguing/metadata background can convince me that the time, effort and expense of creating robust metadata is always worth it.

Am I nuts for admitting that these conversations occur? Hopefully not, but I'll let you decide.

Thursday, November 15, 2007

New book scanner from Atiz geared towards consumers

I just received this press release and have not had time to investigate it. If anyone has seen this equipment, please post a comment and let us know what you think about it. My first reaction is that the term "ripper" is not a great one to use (especially in the headline), since we don't want to rip/destroy books (and that's how people are going to interpret that word)!

BTW This is obviously a manual book scanner. Looks like the cameras are not included. The camera used would impact the quality of the digital images.

First Consumer Book “Ripper” Is Now Available From Atiz Innovation

Former “The Apprentice” Finalist Nick Warnock’s Latest Venture Aims to Revolutionize When, Where and How You Read Books

Los Angeles, California – November 15, 2007 – Consumers, avid readers and family archivists now have a tool to help them preserve and share their personal libraries. Atiz Innovation, Inc., (www.atiz.com), the leaders in automatic content digitization, today announced the release of BookSnap, the first digital book ripper designed for individual consumers. BookSnap transforms printed books into the widely available PDF format, allowing users to access their favorite books on common devices such as a laptop, PDA or eBook reader—anywhere and at any time.

With a price of $1,595, compared to commercially focused products that cost upwards of $20,000, and the ability to digitize books at a rate of 500 pages per hour, the BookSnap is set to transform the way in which consumers digitally convert and access books and other printed material.

Just as the MP3 became the foundation for the digital music industry, the PDF is a prerequisite for the eBook reading device industry to take off. BookSnap represents the first attainable consumer solution to make this happen.

“We designed the BookSnap for people who have always wanted to digitize their personal libraries but haven’t had a viable way to do it – until now,” said Nick Warnock, president, Atiz Innovation. “We sat down and said, ‘Can we innovate reading? How do we take what we are doing with our professional products and make a version tailored to the consumer?’ The result gives archival power to everyone, and changes the way people convert and access their book collections.”

Major features of the BookSnap include:

o Quantity – You don’t have to waste days just to scan a single book. BookSnap digitizes books at a rate of 500 pages per hour, making the scanning process much faster than traditional scanners.

o Quality – The flat images generated by BookSnap look much better than the curled pages generated by flatbed or overhead scanners and yield higher accuracy when converted to text using OCR software.

o Spread the word – BookSnap can output to PDF format, allowing users to read their favorite books on devices such as a notebook, PDA or eBook reader – anywhere, anytime.

Hardware

Featuring a V-shaped book cradle, BookSnap allows for face-up scanning and keeps handling to a minimum, thereby reducing wear and tear on books. The cradle utilizes a unique auto-centering feature, which eliminates “margin crawl” issues that are common with typical scanners. All output files will have consistent margins and borders.

Software

A complete software package is included with BookSnap and completely automates the ripping process, making it as simple as the push of a button. In addition, the BookSnap Editor allows for post-ripping processing, including cropping, page resizing, brightness adjustment and DPI settings. In addition to PDF, BookSnap can output files to JPEG and TIFF formats.

Pricing & Compatibility

The price of BookSnap is $1,595. It is compatible with various models of Canon digital cameras on the market today including Canon Powershot G7, Canon Powershot A640, Canon Powershot A620, Canon Powershot S80, and Canon Powershot S3 IS. BookSnap is available now at www.atiz.com.

“Recently, there has been a lot of candid discussion about the efforts of some very influential organizations (Google, Microsoft and the Library of Congress, to name a few) to create a universal digitalized library, available throughout the world,” said Warnock. “While this is a noble cause, it is one that will take considerable time, effort and money. BookSnap will not only empower individuals to enjoy eBooks, but also have a role in creating and sharing content.”

“Think of the implications,” continued Warnock. “People around the world will now be able to create and share their own digital libraries. Your favorite books and other treasured printed materials will be preserved for generations to come. BookSnap allows you to carry hundreds of your favorite eBooks with you to read anywhere and at any time.”

Wednesday, November 14, 2007

iPRES 2007 presentations are online

The 2007 International Conference on the Preservation of Digital Objects (iPRES2007) was held October 11-12, 2007, at the National Science Library, Chinese Academy of Sciences, Beijing, China. Presentations from that conference are now online at http://ipres.las.ac.cn/program.jsp

Event: Digital Futures: from digitization to delivery 7th - 11th April 2008, London, UK.

From the Digital-Preservation discussion list.

King's College London is pleased to announce the Digital Futures 5-day training event for 2008.
http://www.digitalconsultancy.net/digifutures/

Led by experts of international renown, Digital Futures focuses on the development, delivery and preservation of digital resources from cultural and memory institutions. Lasting five days, Digital Futures is aimed at managers and other practitioners from the library, museum, heritage and cultural sectors looking to understand the strategic and management issues of developing digital resources from digitisation to delivery.

Digital Futures will cover the following core areas:

- Planning and management
- Fund raising and sustainability
- Copyright and IPR
- Visual and image based resource development and delivery
- Metadata - introduction and implementation
- Implementing digital resources
- Digital preservation

There will be visits to 2 institutions, which have previously included the National Gallery, the National Archives and the Imperial War Museum.

The agenda is here:
http://www.digitalconsultancy.net/digifutures/digiprog.htm

Digital Futures aims for no more than 25-30 delegates and every delegate will have the opportunity to also spend one-to-one time with a Digital Futures leader to discuss issues specific to them.

Digital Futures will issue a certificate of achievement to each delegate.

The Digital Futures leaders are:
Simon Tanner - Director of King's Digital Consultancy Services, King's College London
Tom Clareson - Program Director for New Initiatives, PALINET.
Other experts will be invited to speak in their areas of expertise.

What past delegates say about Digital Futures:
- "Excellent - I would recommend DF to anyone anticipating a digitization program"
- "I was very pleased. The team was exceptionally knowledgeable, friendly and personable."
- "Excellent, informative and enjoyable. Thank you."
- "Thanks, it has been an invaluable experience."
- "A really useful course and great fun too!"

Cost: £770 (VAT not charged, excludes accommodation)
Venue: King's College London, London
Dates: 7th - 11th April 2008

To register, go here:
http://www.digitalconsultancy.net/digifutures/digireg.htm

The Digital Futures is run by King's Digital Consultancy Services and the Centre for Computing in the Humanities, King's College London working in co-operation with PALINET, USA.

Technorati tag:

Digital Preservation

Monday, November 12, 2007

Blog post: How open is the Open Content Alliance?

Peter Hirtle found an article that says books digitized by the Open Content Alliance are going to be available for printing and sale. He wonders "if the libraries participating in the OCA would also receive royalties from commercial use of public domain works that they have digitized from their collections." Unfortunately, although some of the Google agreements are available for all of us to read, he couldn't find any of the OCA agreements. Hence, "how open is the Open Content Alliance?" If anyone knows where these agreements are lurking, please let Peter know.

By the way, Peter Hirtle states that he is working on a "manual on copyright and digitization for cultural heritage institutions." I, for one, can't wait to see that!

Technorati tag:

Open Content Alliance,

OCA

Sunday, November 11, 2007

Book: Made to Stick

I am a non-book reading librarian and I readily admit it. I read articles, magazines, reports, web pages, but rarely books. Rather than curling up with a good book, I'm more likely to curl up with a stack of magazines or my Bloglines blogroll. Only a few books can capture and hold my attention. Recently, this one did.

Made to Stick: Why Some Ideas Survive and Others Die by Chip and Dan Heath helps us understand why our users (or our coworkers) can repeat the latest web hoax, but can't remember anything about our projects. What we need to do is to create "sticky messages." Sticky messages are not necessarily creative messages. In fact, there is a formula that the brothers Heath have discovered that will help us to create sticky, memorable messages. That formula is:

S -- Simple
U -- Unexpected
C -- Concrete
C -- Credible
E -- Emotional
S -- Stories

Two things you can do without even reading the book are:

Use the word "you" in your writings. Many of us write in third-person neutral, but it turns out that making the reader think we're writing for them helps them connect with our text. For example:

You will find on this web site...
We can help you research...
By using the advanced search feature, you...

Tell stories. Now once you read the book, you'll realize that you need to tell simple, unexpected, concrete, credible, emotional stories! However, we tend to spout facts and figures, when people actually react better to stories. So find stories about your projects that you can tell, especially stories that tell how your project can help people.

If you want to learn about the entire formula, borrow the book from your library, borrow it from a friend, or order a copy. It is an easy and enjoyable read, with lots of stories and ideas you can begin to employ.

Technorati tag:

Marketing

Updated "Copyright Term and the Public Domain in the United States"

Peter Hirtle has updated his extremely useful chart entitled "Copyright Term and the Public Domain in the United States." I point many people to this chart because of the detail that Hirtle provides. I now insist that students use this chart (and none other) because I have seen students make incorrect decisions when they use a less detailed chart about U.S. copyright terms.

In an email message to the Digital-Copyright discussion list, Peter Hirtle said:

The biggest change is that, at the request of a user, two new sections have been added. The first is on published and unpublished sound recordings, and the second is on architectural works. Other small changes have been made to clarify some of the problems other readers have identified. In order to facilitate printing, a PDF version of the file is available as well.

The URL for this page is the same as before, so you do not have to update your links. However, if you have printed a copy, you'll need to replace it with this updated version.

Friday, November 09, 2007

Future Reading: Digitization and its discontents

This five-page article by Anthony Grafton is about how the computer and the Internet have transformed reading. Even if you skim it, you'll find your eyes drawn to text that will make you think. Two passages that stood out to me were (page 1):

In fact, the Internet will not bring us a universal library, much less an encyclopedic record of human experience. None of the firms now engaged in digitization projects claim that it will create anything of the kind. The hype and rhetoric make it hard to grasp what Google and Microsoft and their partner libraries are actually doing. We have clearly reached a new point in the history of text production. On many fronts, traditional periodicals and books are making way for blogs and other electronic formats. But magazines and books still sell a lot of copies. The rush to digitize the written record is one of a number of critical moments in the long saga of our drive to accumulate, store, and retrieve information efficiently. It will result not in the infotopia that the prophets conjure up but in one in a long series of new information ecologies, all of them challenging, in which readers, writers, and producers of text have learned to survive.

And on page 3:

Poverty, in other words, is embodied in lack of print as well as in lack of food. The Internet will do much to redress this imbalance, by providing Western books for non-Western readers. What it will do for non-Western books is less clear.

Go ahead...read that last blurb again.

We're digitizing materials from rich nations. What about the materials created by poor, less technically advanced nations and cultures? The World Digital Library is digitizing materials from around the world, but others need to join in to ensure that our digital collection is not skewed to one area of the world or to a specific socio-economic class.

Wednesday, November 07, 2007

Congrats to Kenny Crews!

I always tell people that they should take a copyright workshop from Kenny Crews. Crews is highly knowledgeable about copyright and talks about it in a way that keeps the audience engaged. (Okay...what I really say is that he is entertaining.) I learned from him that you need to always go back to the law, something that I repeat every time I talk about copyright.

This week news has circulated that Kenneth D. Crews, J.D., M.L.S., Ph.D. has been appointed Director of Columbia University Libraries’ new Copyright Advisory Office, starting in January 2008. He is currently the "Samuel R. Rosen II Professor in the Indiana University School of Law-Indianapolis and in the IU School of Library and Information Science. He is also Associate Dean of the Faculties for Copyright Management, and in that capacity he directs the Copyright Management Center based at Indiana University-Purdue University Indianapolis (IUPUI)." In 2005, Crews was the first recipient of the "L. Ray Patterson Award: In Support of Users' Rights" given by the American Library Association's (ALA) Office for Information Technology Policy (OITP) Copyright Advisory Committee.

Congratulations to Kenny Crews!

Copyright handout from today's workshop

Today I did a three-hour copyright workshop in Jamestown, NY. This was an outgrowth of the digitization planning work that is occurring here and fit in well with the digitization workshop I did in September. The copyright workshop was an overview with lots of questions and answers. I believe all of the attendees worked in or were associated with public libraries. Most had not taken a previous workshop or course on copyright, although many had done some reading related to copyright.

The two-page handout I used with this workshop is here. In addition, I gave them these resources:

Creative Commons
ALA's Copyright Advisory Network
AALL Model Law Firm Copyright Policy
Additional reading on Deed of Gift forms

CIL2006: Digitization Project Management Essentials (deed of gift forms) (2006)

Deed of Gift forms (2004)

A personal deed of gift experience (2004)

Why is this item important? (A personal deed of gift experience, part 2) (2005) [This is about building context as part of the Deed of Gift]

And I just found this resource that complements Templeton's "10 Big Myths about copyright explained":

Debunking Eight Myths about Copyright

One of the questions I asked today was about notices the libraries have on their photocopiers. Some didn't have notices on all of their copiers. The AALL Model Library page proposes this text:

The U.S. Copyright Law of the United States (Title 17 U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. The person using this equipment is liable for any infringement.

I wish I could recount all of the questions asked today! What the questions clearly demonstrated is that library staff members do have legitimate copyright questions, but don't know who to ask...and so they don't ask them. Most of the questions didn't have quick answers (copyright questions rarely do) and I think they appreciated hearing the thought process that goes into trying to answer them.

Monday, November 05, 2007

Results of a metadata survey

In October, Crystal Knapp, E-Government Librarian at Oregon State Library, did a web survey on content management systems and metadata. She wrote in her original email post that this survey was for "those of you who use content management systems and maintain a search feature for your website." Although this is about the use of metadata on web sites, the results may still be of interest to you. Knapp has posted the results in two formats at:

http://library.state.or.us/metadata_survey_public_results.doc
http://library.state.or.us/metadata_survey_public_results.xls

She notes that the MS Word file is easier to read, but that the spreadsheet allows you to see what each responder said. She did strip out all identifying information.

Mekel Mach IV microfilm scanner at NNYLN

It's nice to see a program showing off its new equipment! Here the Northern New York Library Network (a multi-type library consortium) shows its new microfilm scanner which is rated to do 300 images per minute. If my memory is correct, this is their third microfilm scanner. (I have no idea if they've traded in the other two or not.) Their first was a manual one. Having automated scanners allowed them to work more quickly. To date, they have digitized more than 630,000 newspaper pages.
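To put the rated speed in perspective, here is a quick calculation of how long the pages digitized so far would take at 300 images per minute. It is a sketch under loud assumptions: one image per newspaper page, the scanner running at its rated speed the whole time, and no allowance for loading reels, quality checks or any other handling. The variable names are illustrative only.

```python
# Pure scanning time at the rated speed.
# Assumptions: one image per page, no reel changes, no quality control.
rated_images_per_minute = 300
pages_digitized = 630_000

minutes_at_rated_speed = pages_digitized / rated_images_per_minute
hours_at_rated_speed = minutes_at_rated_speed / 60

print(f"{hours_at_rated_speed:.0f} hours of pure scanning at the rated speed")
```

Real projects take far longer than a number like this suggests, because most of the time goes to everything around the scanning itself.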

More information is available here.

By the way, photos of digitization labs are useful to organizations that are thinking of building one. The photos help people to understand the type of space and environment that is needed.

Friday, November 02, 2007

Press Release: Library of Congress Collaborates with Xerox to Test Format for Digitally Preserving, Accessing Treasured Images

The Library of Congress and Xerox Corporation are:

studying the potential of using the JPEG 2000 format in large repositories of digital cultural heritage materials such as those contained in the Library and other federal agencies. The eventual outcome may be the creation of leaner, faster systems that institutions around the country can use to store their riches and to make their collections widely accessible.

Later the press release states:

The images to be used from the Library's collection are already digitized (primarily in TIFF format), but JPEG 2000, a newer format for representing and compressing images, could make them easier to store, transfer and display. According to Michael Stelmach, manager of Digital Conversion Services in the Library's Office of Strategic Initiatives, JPEG 2000 holds promise in the areas of visual presentation, simplified file management and decreased storage costs. It offers rich and flexible support for metadata, which can describe the image and provide information on the provenance, intellectual property and technical data relating to the image itself.

Xerox scientists will develop the parameters for converting existing TIFF files to JPEG 2000 and will build and test the system, then turn over the specifications and best practices to the Library of Congress. The specific outcome will be development of JPEG 2000 profiles, which describe how to most effectively use JPEG 2000 to represent photographic content as well as content digitized from maps. The Library plans to make the results available on a public Web site.

This is very good news since it will help members of the cultural heritage community (libraries, archives, etc.) understand the JPEG 2000 format.
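To get a feel for the "decreased storage costs" point, here is some back-of-the-envelope arithmetic. It is only a sketch: the image dimensions and the compression ratios are numbers I made up for illustration, not figures from the press release, and real results will depend on the images and on whatever profiles come out of the project.

```python
def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int = 3, bits_per_channel: int = 8) -> int:
    """Size of the raw pixel data only, ignoring file headers and metadata."""
    return width_px * height_px * channels * bits_per_channel // 8


def estimated_compressed_bytes(raw_bytes: int, compression_ratio: float) -> float:
    """Estimated size if the pixel data is compressed at the given ratio."""
    return raw_bytes / compression_ratio


# Hypothetical master image: 6000 x 4000 pixels, 24-bit color.
raw = uncompressed_bytes(6000, 4000)

# Assumed compression ratios, not measured results.
for ratio in (2, 10, 20):
    compressed = estimated_compressed_bytes(raw, ratio)
    print(f"{ratio}:1 -> {compressed / 1_000_000:.1f} MB "
          f"(uncompressed: {raw / 1_000_000:.1f} MB)")
```

Multiply the difference by the number of images in a large repository and it is easy to see why the format is attracting attention.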

By the way, it is a shame that the format was named "JPEG 2000", since it is different from the JPEG format we all know and the "2000" puts a date stamp on the format that we won't appreciate in a few years. It is abbreviated J2K or JP2, and maybe in time we'll just use one of those abbreviations so that the reputation of JPEG isn't attached to JPEG 2000.

Technorati tag:

JPEG2000

IL2007: Searching, Metadata, and my final post about the conference

Danny Sullivan was the Wednesday morning keynote speaker at Internet Librarian. He spoke on the "Future of Search (sort of)." Sullivan talked about what has come true among the predictions he's made in the past and how the major search engines have changed in the last year. I didn't take a lot of notes, but did write down several URLs to articles on his web site that sounded useful:

Yes, some companies are trying to do natural language processing in their search engines (PowerSet and Hakia). This is called natural language search.

And people are trying other types of search like:

Mahalo, "the world's first human-powered search engine"
Search Wikia

Tom Reamy spoke on "Folksonomies & Tagging: Libraries & the Hive Mind." I expect that a copy of his presentation will be here at some point.

Advantages of folksonomies and tagging:

Simple
Lower cost of categorization
Open ended - can respond quickly to changes
Relevance - user's own terms
Support serendipitous form of browsing
Easy to tag any type of object
Better than nothing
Gets people excited about metadata

Disadvantages: (hint: quality)

Don't work well for finding info
No structure, no conceptual relationships
Issues of scale
Limited applicability
Too personal or too popular
It's a skill
Too many word/phrase variations

He talked about specific web sites where people are tagging and showed some analysis (e.g., Flickr, Del.icio.us, etc.). This was very interesting and helped us to understand his point of view.

What I learned was that we're fickle and incomplete taggers. We generally use a few terms, but we should use more (broader as well as more specific terms). We need to be more consistent in our terms, although that is difficult not only for one person but also for a group. If the sites can make tagging easier (offer suggestions, related words, etc.), then maybe we can do better.
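As a rough illustration of what "making tagging easier" might look like, here is a minimal sketch in Python. It is not how Flickr, Del.icio.us, or any other site actually works; the function names and the sample tags are hypothetical. It folds together simple variations (case, hyphens, underscores, stray spaces) and then suggests the most-used existing tags that match what someone has started typing.

```python
from collections import Counter


def normalize_tag(tag: str) -> str:
    """Collapse case, surrounding whitespace, and separator variations."""
    cleaned = tag.strip().lower()
    for separator in ("_", "-"):
        cleaned = cleaned.replace(separator, " ")
    # Collapse runs of whitespace into a single space.
    return " ".join(cleaned.split())


def suggest_tags(existing_tags: list[str], prefix: str, limit: int = 3) -> list[str]:
    """Suggest the most-used normalized tags that start with what was typed."""
    counts = Counter(normalize_tag(t) for t in existing_tags)
    wanted = normalize_tag(prefix)
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(wanted)]
    # Most popular first; ties are broken alphabetically so the order is stable.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in matches[:limit]]


# Hypothetical tags already in use on a site.
tags = ["Digitization", "digitization ", "digital-preservation",
        "Digital Preservation", "metadata"]
print(suggest_tags(tags, "digi"))
```

Even something this simple would nudge a group toward the same spelling of a term, although it does nothing for the harder problems of synonyms and of broader versus narrower terms.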

I entered the session thinking I was good at tagging and left the session realizing what a horrible tagger I am!

Wrap Up: I had not been to an Internet Librarian conference before. Yes, I liked it, even though it took a long time to get there, and my flight out of Monterey was canceled due to fog. The location is beautiful and the area is very walkable.

The conference spaces were good, but there was only free wifi in the Monterey Conference Center and not in the Marriott. Thanks to Information Today for getting us access to the wifi in the Conference Center. We all appreciated it. Really...can you have a conference without wifi? No. (ITI...I don't know what hassles or cost it took to get us wireless, but thank you!)

All of the sessions were excellent save one (great odds) and I came away with lots of useful information. I also came away with new friends and colleagues. And that is always good.

In 2008, Internet Librarian will be on Oct. 20 - 22 in Monterey. If you can get there, go. If you can't get to IL2008, then try to head to Computers in Libraries (CIL) in Crystal City, VA, April 7 - 9, 2008. CIL and IL are different yet similar in content, with CIL being the larger conference (2300 vs. 1500).

Technorati tags:

IL2007

Canadian Digital Information Strategy : draft for comment

You may have already seen this announcement about the Canadian Digital Information Strategy circulating via email. It is good to see that they want feedback from a broad audience. It also may encourage others to think about their strategies.

The objectives and proposed actions outlined briefly in the executive summary are:

Toward strengthening digital content:

mass digitization on a national scale
a conducive digital production environment
improved digital production practices
diversity in digital content production

Toward ensuring digital preservation:

selection and capture of digital content for long-term retention
distributed digital preservation repository network
preservation-related research
new workplace skills
increased public awareness of digital preservation issues

Toward maximizing digital access:

mechanisms for democratic, ubiquitous and equitable access
seamless access and global visibility
more open access to public sector information and data
effective communication and management of copyright
increased user research

I like that they are thinking about those three areas, since each needs the other.

For more information, you can read (or skim) the 63-page document.

Annonce bilingue / Bilingual announcement (English follows)

Nous sommes fiers d’annoncer que l’ébauche de la Stratégie canadienne sur l’information numérique a été publiée afin d’être soumise à l'évaluation du public. Cette stratégie est le fruit d’une série de réunions qui ont eu lieu partout au pays en 2005 et en 2006, et auxquelles ont participé des représentants gouvernementaux, des producteurs et des utilisateurs de contenu numérique. Au cours des débats, plus de 200 organismes sont intervenus afin de faire valoir leurs idées et leurs commentaires, et près d’une centaine de penseurs parmi les plus influents provenant de tous les domaines du milieu de l’information ont pris part à un sommet national en décembre 2006.

Un comité de 24 membres a puisé dans ces contributions pour élaborer une stratégie nationale. Celle-ci répond à certains enjeux importants liées à accès, à la conservation et à la production de l'information numérique, et elle propose diverses mesures destinées à renforcer le milieu de l’information numérique au Canada.

Le comité recevra les commentaires du public sur l'ébauche de cette stratégie à compter du 23 novembre 2007. Pour télécharger le document de la Stratégie canadienne sur l’information numérique, et pour nous faire part de vos commentaires, veuillez vous rendre à l'adresse suivante:http://www.collectionscanada.gc.ca/scin/index-f.html.

Sean Berrigan, Bibliothèque et Archives Canada
Gérard Boismenu, Université de Montréal
Coprésidents du comité d’élaboration de la Stratégie canadienne sur l’information numérique

******************************************

We are pleased to announce that the draft version of the Canadian Digital Information Strategy has been released for public comment. The Strategy results from a series of meetings that took place across the country in 2005 and 2006 to gather views from content producers, users and government officials. In the course of the deliberations, more than 200 stakeholder organizations offered ideas or commentary, and nearly 100 of Canada’s leading thinkers from across the information environment participated in a national summit in December, 2006.

Building on this rich set of input, the strategy has been drafted by a 24 member development committee. It addresses some of the critical issues in digital information production, preservation and access, and proposes a range of actions to strengthen the Canadian digital information environment.

The Committee welcomes public comment on the draft strategy by November 23rd 2007. Please visit http://www.collectionscanada.gc.ca/cdis/index-e.html to download the strategy document and to provide comments.

Sean Berrigan, Library and Archives Canada
Gérard Boismenu, Université de Montréal
Co-chairs, Canadian Digital Information Strategy Development Committee

Technorati tags: