- The Washington Times - Thursday, April 25, 2002

Businesses once stored their classified information and paperwork the old-fashioned way, in stuffy boxes and filing cabinets. The personal computer changed all that, reducing data into tiny bits and bytes, which made some clunky file cabinets obsolete.
Pile up enough bits and bytes, though, and new storage issues arise.
Nowhere is that more apparent than within the halls of the Library of Congress, home to more than 124 million items, including 18.6 million books.
Library of Congress spokesman Guy Lamolinara says the library has embarked on a plan, called the Information Infrastructure and Preservation Program, to store some of its content in Web-accessible digital formats.
The program, granted a $100 million special appropriation from Congress in December 2000, will consult with other libraries on the best ways to archive, preserve and make available content through its National Digital Library. That online resource, to be found at www.loc.gov, has more than 7.5 million items, from manuscripts to sound recordings. More than 1 million Web users hit the site last year.
Mr. Lamolinara says the program, which needs to raise an additional $75 million from nongovernmental sources because it was passed as a public-private partnership, will be brought before Congress sometime this summer before it can continue.
For now, the library will preserve material only available in digital form, he says. "We're trying to preserve the digital material being created now before it becomes lost. Web sites change all the time."
Scanned items will be processed into Portable Document Format files, or PDFs, and standard Web pages and placed on Web-accessible servers so Internet users can reach them.
The digital scanning process can be time-consuming. Some factors affecting the process include whether the item will be scanned in color, black-and-white or gray scale; the size of the object; and the resolution needed to capture the material accurately.
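To give a rough sense of how those factors interact, the sketch below estimates the raw, uncompressed size of a single scanned page. The function names, the letter-size page and the 300-dots-per-inch figure are illustrative assumptions, not numbers from the library, and the estimate ignores file headers, row padding and compression.

```python
# Illustrative estimate only: names, page size and resolution are assumptions.

# Typical bit depths for the three scanning modes mentioned above.
MODES = {"black-and-white": 1, "gray scale": 8, "color": 24}


def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw image size in bytes for one flat scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def estimate_page(width_in=8.5, height_in=11.0, dpi=300):
    """Return the estimated raw size of one page for each scanning mode."""
    return {
        mode: uncompressed_scan_bytes(width_in, height_in, dpi, bits)
        for mode, bits in MODES.items()
    }


if __name__ == "__main__":
    for mode, size in estimate_page().items():
        print(f"{mode}: about {size / 1_000_000:.1f} million bytes")
```

Under these assumptions, a color scan needs 24 times the raw storage of a black-and-white scan of the same page, and doubling the resolution quadruples the size in every mode.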
The Library of Congress began high-resolution digital scanning in 1995.

Another District repository, the Smithsonian Institution Libraries, is working toward similar storage goals.
Martin Kalfatovic, head of the Smithsonian Institution Libraries' New Media Office, says among the items the library targets for digital preservation are rare books and tomes so fragile their repeated use could lead to irreparable damage.
Much of the brittle material is from the library's trade-literature collection. The manuals were meant to be discarded quickly and are aging poorly. The 350,000-piece collection is the largest of its kind in North America, he says.
Digitizing, which Mr. Kalfatovic says is becoming more common in other large libraries, can be laborious work. Delicate items to be transferred into color images require special cameras, and the material must be cradled at a specific angle to be effective. It can take up to 10 minutes a page to capture its digital image, he says.
For each item scanned, the library creates two high-definition computer images in TIFF (Tagged Image File Format, a preferred format for storing high-definition images), which are stored on CD-ROMs (Compact Disc Read-Only Memory). These discs store computer data in the form of graphics and text.
Then, JPEGs (for Joint Photographic Experts Group, computer file images with less information and therefore less picture clarity than TIFFs) are created from them for use on its Web site. An average Web site doesn't demand images as sharp as TIFF images.
The technology isn't the panacea it might seem for storing materials.
That is because, Mr. Kalfatovic says, digital technology hasn't come up with a solution for long-term storage. CD-ROM producers claim their products have up to 100 years of life, but microfilm, he says, remains the best method for storing data for extended periods of time. The medium boasts a shelf life of up to 1,000 years.
The library eventually may commit important material to microfilm, then make digital copies from the microfilm for online distribution.
The library keeps its stored materials in two separate locations to further protect the material.

Some libraries get around storage issues by licensing information from electronic databases, such as Lexis-Nexis, says Abby Smith, director of programs at the Council on Library and Information Resources, a District nonprofit organization that helps preserve and expand access to information for the public.
"Electronic information is so very easy to copy and disseminate without any control at all. Copyright owners are reluctant to ship the files [even to a] trusted entity like a library," Ms. Smith says. "It's too easy for someone to get a copy of something and hit that 'send' button."
So, storing the information at the library isn't an option because it can be copied and distributed without the permission of the company that licenses it. Other libraries constantly migrate their materials, transferring content from older systems to the latest software to stay ahead of obsolescence.
"Whenever there is a change in software or hardware or both, you have to migrate," Ms. Smith says.
No matter the method, the costs can be considerable.
"If [libraries] started charging patrons a preservation or access fee, patrons would be alarmed to find out how much money it costs to keep them going," she says.

Libraries aren't the only institutions with storage demands.
Companies such as Kofax Image Products of Irvine, Calif., and Boston's Iron Mountain offer data-management services to businesses nationwide.
Closer to the District, Arlington's TrueArc produces software that transforms business-critical information, regardless of medium, from e-mail to electronic and paper documents, into retrievable microdata.
TrueArc President Russell Stalters says keeping stewardship of a single paper file forces a company to watch it for its life span. Employees are needed to keep track of the file, retrieve it when needed and ensure its safety. Electronic content eliminates those costs.
Companies use software packages to keep in line with various regulations and back up their information on off-site computer servers.
"There is a legal risk for not managing it properly," Mr. Stalters says. "Enron was a good case."
For government departments, items such as land records can be digitized and made available to the public much more easily than having residents dig through stacks of paperwork.
Mr. Stalters says the company's client roster once was exclusively governmental, but now commercial companies are seeking to digitize their content. Once the year-2000 problem was conquered and firms that might have been leery of electronic storage saw their worries subside, they began seeking out the products in greater number, he says.
His company's latest wrinkle is AutoRecords, which scans documents, analyzes their content and files them according to predetermined algorithms. Each document is stored in the appropriate "bucket," or storage file, he says, which helps people who receive 100 e-mails daily.
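The article does not say how AutoRecords decides where a document belongs. As a loose illustration of filing by predetermined rules, the sketch below routes text into buckets by keyword matching. The bucket names, keywords and function names are all hypothetical and are not drawn from TrueArc's product.

```python
# Hypothetical keyword rules; not TrueArc's actual algorithm.
BUCKET_RULES = [
    ("invoices", ("invoice", "payment due", "purchase order")),
    ("contracts", ("agreement", "contract", "terms and conditions")),
    ("land-records", ("deed", "parcel", "land record")),
]
DEFAULT_BUCKET = "unfiled"


def choose_bucket(text):
    """Return the first bucket whose keywords appear in the text."""
    lowered = text.lower()
    for bucket, keywords in BUCKET_RULES:
        if any(keyword in lowered for keyword in keywords):
            return bucket
    return DEFAULT_BUCKET


def file_documents(documents):
    """Group documents into buckets according to the rules above."""
    buckets = {}
    for doc in documents:
        buckets.setdefault(choose_bucket(doc), []).append(doc)
    return buckets


if __name__ == "__main__":
    inbox = [
        "Invoice #12: payment due in 30 days",
        "Signed contract attached",
        "Lunch on Friday?",
    ]
    for bucket, docs in file_documents(inbox).items():
        print(bucket, docs)
```

Rules are checked in order, so a message matching several buckets lands in the first one listed. A real records system would likely weigh far more than keywords.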
Storage concerns aren't solely a matter of the printed word.
Rahul Simha, a professor with George Washington University's department of computer science, says any kind of multimedia requires large amounts of storage.
"Plain text requires much less space," Mr. Simha says.
The bigger problem, he says, is taking what is on paper and putting it into an electronic format. Optical scanning technology can either read and make sense of paper content or store an image of it as a digital document.
Until recently, the average homeowner didn't have enough information, particularly of the digital sort, that needed management.
That is changing slowly, Mr. Stalters says.
Within the next five years, he envisions Web users managing their banking, credit-card statements and other personal information online via a trustworthy third-party system that allows them to control their affairs at one site.
"It'll all go into a virtual file cabinet," he says. "The lines between business information and personal information will begin to blur."
Homeowners currently save and preserve digital content with floppy disks, which are inexpensive and easy to use, followed by increasingly affordable Zip drives and rewritable CDs.
However, Mr. Stalters says, as the public needs more storage space, newer methods, such as third-party providers, will be needed.
Just don't expect a smooth transition, he warns.
"The problem is that most computer neophytes do not understand how to effectively preserve this data nor have the time to be bothered with it," Mr. Stalters says.
