The Washington Times - Sunday, December 16, 2001

NEW YORK (AP) - Even with all the current digital wizardry, faking the videotape in which Osama bin Laden takes credit for the September 11 attacks would be extremely difficult, experts said.
The biggest hurdle would be mimicking the cadence and rhythm of human speech. Synchronizing a doctored soundtrack with existing video also would be tough, and technology that can synthesize Arabic speech is still in its infancy.
Chi-Lin Shih, a language modeling scientist at Lucent Technologies' Bell Labs, described the process as akin to reassembling a broken vase by gluing together its shards. Close scrutiny would likely reveal the cracks.
Software tools allow for elements of a person's speech to be glued together to put words in their mouths, but such a doctored recording would not sound natural to an expert listener, said Kenneth Stevens, head of the speech research lab at the Massachusetts Institute of Technology.
Some hard-line Islamic militants in Pakistan and the Middle East suggested the tape was fabricated to provide a rationale for U.S. military actions in Afghanistan. President Bush called the charge "preposterous." Administration officials said they intentionally declined to try to enhance the video's sound or picture so as not to give detractors ammunition.
The videotape, which has been seen widely in the United States, was distributed Friday to Arabic media with an Arabic transcript so "people who speak Arabic can watch it in the original language with the subtitles and the sound in Arabic," said State Department spokesman Richard Boucher.
Emerging speech synthesis technology is giving computers the ability to mimic a human voice. The creators of AT&T's Natural Voices software, for example, claim the program can mimic the speech of actors now dead, such as John Wayne. By allowing computers to analyze enough tapes of an actor's voice, the program could synthesize the voice, allowing it to make statements Wayne never said.
Theoretically, the same could be done with bin Laden's voice because recordings of his speech are readily available, said Lynn Shepherd, a vice president of Fonix Corp., a speech synthesis software company in Salt Lake City. "If they had a lot of recordings of bin Laden, they could create some speech that sounded pretty good," Miss Shepherd said Friday.
However, most software requires a dozen or more hours of high-quality studio recordings, for which a speaker is asked to make all of a language's particular combinations of sounds. "It takes engineers months to break down all these voice fragments so that I can reproduce the language," said Bill DeStefanis, who heads speech technology for ScanSoft Inc. of Peabody, Mass.
On the tape, some of bin Laden's words are unintelligible. The tape's poor sound quality theoretically could be used to mask tampering, experts said. But beyond synthesizing a voice, doctored speech would have to be synchronized with the video, another difficult task in which flaws are usually easy to spot.
Digital synchronization of sound and images is a staple of Hollywood filmmaking. When actor Oliver Reed died before shooting ended on the 2000 movie "Gladiator," the filmmakers pieced together several scenes using previously shot footage.
Mr. DeStefanis and others said, however, that fooling the trained eye is difficult. "The human eye and ear are very good at seeing out-of-synch lips," he said.


Copyright © 2019 The Washington Times, LLC.
