The Washington Times - Wednesday, November 16, 2005

The next time a consumer hears a voice thanking him for the call, that voice may be generated by a computer — and the caller might not even know it.

The quality and sound of computer-synthesized voices not only continue to improve, but their uses also are expanding, including helping the physically disabled.

Even some “podcasters” — Internet enthusiasts who broadcast audio via the Web — are turning to this technology to convert text into downloadable bits of audio.

Professor J.P. Auffret, director of George Mason University’s technical management program, recalls not so long ago hearing an artificial voice ringing out at an Atlanta airport. That voice, which he describes as something out of “Star Trek,” sounded far more human during a recent trip through Georgia.

Mr. Auffret says companies are approaching voice synthesis from different angles. Some, such as Microsoft, try to unlock the mysteries of the human voice with algorithms to duplicate how speech is created. Others, including AT&T’s Natural Voices program, break down the elements of language into short snippets of human speech and let powerful computers choose which of these building blocks should be assembled into sentences.

Under this procedure, voice actors read hours of scripts that are recorded, dissected into smaller pieces and mapped to phonemes, the smallest phonetic units in a language that can distinguish meaning. Computers then reassemble the snippets into words and sentences, taking care to vary the inflection based on where a snippet falls in a sentence and whether the sentence is a question.
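In code, the selection step amounts to a lookup over a database of recorded snippets. The Python sketch below uses entirely hypothetical names and toy data; a real system stores many hours of speech and scores candidates with acoustic cost functions, but the basic idea is the same: map words to phonemes, then prefer snippets recorded in the matching sentence position.

```python
# Minimal sketch of snippet-based ("unit selection") synthesis.
# All names and data here are illustrative, not a real system's API.

from dataclasses import dataclass

@dataclass
class Snippet:
    phoneme: str   # phonetic label, e.g. "TH" or "AE"
    position: str  # where it was recorded in a sentence: "medial" or "final"
    audio: bytes   # the recorded waveform fragment

# Toy pronunciation dictionary: word -> phoneme sequence.
LEXICON = {"thank": ["TH", "AE", "NG", "K"], "you": ["Y", "UW"]}

def pick(candidates: list[Snippet], position: str) -> Snippet:
    """Prefer a snippet recorded in the matching sentence position,
    so a phrase-final word keeps its natural falling inflection."""
    same = [s for s in candidates if s.position == position]
    return (same or candidates)[0]

def synthesize(words: list[str], db: dict[str, list[Snippet]]) -> bytes:
    """Concatenate recorded snippets for each phoneme of each word."""
    audio = b""
    for w, word in enumerate(words):
        phones = LEXICON[word]
        for p, ph in enumerate(phones):
            is_last = w == len(words) - 1 and p == len(phones) - 1
            audio += pick(db[ph], "final" if is_last else "medial").audio
    return audio
```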

“Now, it’s possible to have much longer sentences, much more involved text which almost sounds natural,” Mr. Auffret says.

Software supplied by AT&T’s Natural Voices program plays a considerable role in consumer applications such as customer relations, as well as in aiding the handicapped.

Juergen Schroeter, director of speech algorithms and engines at AT&T Labs-Research, says his company’s ties to artificial speech date back to the Voder, a speech-synthesizing machine demonstrated at the 1939 World’s Fair in New York.

Yet Mr. Schroeter admits that as recently as six years ago, the company’s text-to-speech vocals still sounded “unnatural.”

Today, computer voice technology helps a variety of groups, from the disabled to the U.S. military. Enhanced text-to-speech software brings books to life for blind readers, while computer voices help soldiers communicate with locals wherever their platoons are deployed.

What the technology still can’t convincingly capture is raw emotion.

Mr. Schroeter says his company has “closed the gap already for short [computer voice] responses of three or four words. We can perfectly say, ‘thank you,’ for example.”

Longer sentences with varied inflections remain a problem, as does trying to make synthesized voices express a variety of crucial emotions.

Rick Ellis, president of the North Carolina-based NextUp.com, says computer voice technology lets a relatively small company like his offer a variety of services that a decade ago would have seemed impossible.

“We felt things would come down the pipeline that would lead to better voices,” Mr. Ellis says of his 6-year-old company’s early days.

Today, its text-to-speech software not only helps budding podcasters, but tells people the latest weather conditions and helps the blind “read” text.

“People are buying our products and playing around with them,” says Mr. Ellis, whose software packages sell for as little as $60. “It’s still early with this [technology].”

Researchers helping the blind initially invested in computer voice technology, but the science has gone mainstream, he says. Among the newer uses for the technology are online learning classes, English as a second language courses and help for those with dyslexia. Mr. Ellis adds that some writers use these tools to help them proofread their work.

He sees continued progress on the voice-software front simply because of advances in computer technology.

“A lot of it is the power and memory of the computers,” he says. “A word is pronounced differently depending on where it is in a sentence, but the average computer is powerful enough to make those decisions on the fly.”
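A toy illustration of that on-the-fly decision, with hypothetical labels rather than real acoustic processing: the same word gets a different pitch contour depending on where it falls and whether the sentence is a question.

```python
# Hypothetical sketch: choose a prosodic variant for one word based on
# its position in the sentence and the sentence's final punctuation.

def choose_variant(word: str, index: int, sentence: str) -> str:
    """Return an illustrative pitch-contour label for one word."""
    words = sentence.rstrip("?.!").split()
    if index == len(words) - 1:          # phrase-final word
        return "rising" if sentence.endswith("?") else "falling"
    return "neutral"                     # mid-sentence: level pitch

print(choose_variant("record", 4, "Did you play the record?"))   # rising
print(choose_variant("record", 4, "Yes I played the record."))   # falling
```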

Computerized voices are helping some people with Lou Gehrig’s disease (amyotrophic lateral sclerosis, or ALS) communicate with their loved ones again. ALS affects as many as 30,000 people in the United States alone, with 5,000 new cases diagnosed every year, according to the Robert Packard Center for ALS Research at Johns Hopkins University.

Carlos Urroz, assistive technology manager with the local branch of the ALS Association, says the majority of ALS patients lose their ability to speak because of breathing or vocal-cord problems.

“Communication is one of their biggest needs,” Mr. Urroz says.

That’s where digitized speech comes in.

Today’s software, which interprets typed messages from ALS patients and transforms them into computerized speech, boasts improved inflection and pauses that make it sound more lifelike, Mr. Urroz says.

A sample system could feature a smaller-than-traditional keyboard for the patient and an activator key to tell the system to begin “speaking.”
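A minimal sketch of how such a type-and-speak loop might work, with hypothetical names throughout; the speak() function here is a placeholder for a real text-to-speech engine.

```python
# Illustrative type-and-speak loop: the patient types words into a
# buffer, and an "activator" key (ENTER here) tells the system to speak.

def speak(text: str) -> None:
    print(f"[synthesized speech] {text}")  # stand-in for real audio output

def communication_aid() -> None:
    buffer = []
    while True:
        key = input("type a word, ENTER to speak, 'quit' to exit: ")
        if key == "quit":
            break
        if key == "":              # the activator key: speak the buffer
            speak(" ".join(buffer))
            buffer.clear()
        else:
            buffer.append(key)

if __name__ == "__main__":
    communication_aid()
```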

The technology is far from perfect, though. Mr. Urroz says these systems depend on audio speakers to convey sounds, which makes them sound different from normal speech simply because they lack recognizable breathing patterns. These systems also vary in quality depending on the sophistication of the audio system being used.

“A speaker can pick up static… and the sound changes depending on the acoustics of the room,” he says.

Name pronunciation also is problematic, and if the sentences are too long, the words tend to slur at the end, he adds.
