The Washington Times - Thursday, August 8, 2002

Even HAL 9000, the computer with the soothing tones in the 1968 film classic "2001: A Space Odyssey," would be impressed by the current state of computer voice simulation. While computer-generated voices still struggle with the emotional nuances of human speech, technological leaps are allowing mechanized voices to play a larger role in our lives.
Synthesized speech helps the blind interpret their computers, lets weather services instantly transmit information over the radio, and allows companies to give their celebrity spokespeople a break by airing synthesized messages in their stead.
"It's out of its infancy. It's in its adolescence," says Benjamin Tomassetti, director of American University's audio technology program. Mr. Tomassetti pegs the progress to the increase in processor speeds.
"Inexpensive computers get faster and more powerful. That is, ultimately, what allows the increase in fidelity," Mr. Tomassetti says.
Initially, computer voice simulators mimicked the way the body's vocal cords manufacture sound.
"Your vocal cords produce a wave form, which is a fancy way of saying a sound," he says. "It's akin to a square wave."
Human anatomy converts that waveform into speech through the manipulation of the lips, tongue and nasal passages, each of which helps particular frequencies stand out as needed.
"That's hard to synthesize," he says. "That's why those previous synthesizers of 10 years ago would sound very mechanical."
Stuart Patterson, president and CEO of SpeechWorks in Boston, says that approach gave way about four years ago to a newer, simplified method called concatenated speech.
"We can take the signal, the real thing, chop it up into little pieces and reassemble it into real time," says Mr. Patterson, whose company deals in speech recognition and text-to-speech products.
All speech consists of phonemes, its smallest building blocks of sound.
The computer is taking a recording of a person's voice and putting it together quickly and seamlessly, he says of the approach.
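A toy version of that concatenation step, sketched below, shows the idea: a recorded snippet is looked up for each phoneme and the snippets are spliced together, with a short crossfade to hide the seams. The phoneme labels and random placeholder clips are hypothetical; a real system selects units from hours of labeled recordings.

```python
import numpy as np

FS = 16000                     # sample rate (Hz), assumed
XFADE = int(0.005 * FS)        # 5 ms crossfade at each join

def concatenate_units(units, clips):
    """Splice the recorded clip for each unit, crossfading at the joins."""
    out = clips[units[0]].copy()
    fade = np.linspace(0.0, 1.0, XFADE)
    for unit in units[1:]:
        nxt = clips[unit]
        out[-XFADE:] = out[-XFADE:] * (1 - fade) + nxt[:XFADE] * fade
        out = np.concatenate([out, nxt[XFADE:]])
    return out

# Hypothetical usage: real clips would be cut from a voice talent's recordings.
clips = {p: np.random.randn(int(0.08 * FS)) for p in ["HH", "EH", "L", "OW"]}
audio = concatenate_units(["HH", "EH", "L", "OW"], clips)  # roughly "hello"
```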
This opens up all sorts of possibilities.
"You can take different voice talent to make male and female voices," he says. "You can do foreign languages the same way. You can have the person read things with personality."
"It still needs to be tested and tuned," he adds.
The technology isn't foolproof. For starters, such tinkering takes significant effort, and not all applications need the same configuring.
One potential use already in action, having computerized voices read e-mail messages aloud to the blind or to people on the go who can't access their e-mail accounts, requires a system to decode common abbreviations like "pls" (please) and "BTW" (by the way).
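That decoding step amounts to normalizing the text before it is synthesized. A minimal sketch, using a tiny illustrative lookup table rather than any product's real lexicon:

```python
import re

# Illustrative shorthand table; a production system would carry far more.
ABBREVIATIONS = {
    "pls": "please",
    "btw": "by the way",
    "thx": "thanks",
}

def expand_abbreviations(text):
    """Replace known e-mail shorthand with speakable words."""
    def replace(match):
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, text)

print(expand_abbreviations("pls call me back, BTW the meeting moved"))
# -> "please call me back, by the way the meeting moved"
```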
"We don't have a program that says, click these buttons and personality will come into the voice," he says. "There's a lot of hand tuning now when we create a new voice for weather or sports. There might be some specific idioms in, say, sports or weather broadcasts, that you wouldn't find elsewhere."
Inflection and the emotional context of speech, Mr. Tomassetti says, are the most difficult things to capture.
"It's come a long way in terms of fidelity, but it still sounds emotionally flat," he says of simulated speech.

Computer voice technology is playing a role in how blind people work in society.
A friend of Mr. Tomassetti's is blind and uses a voice program on his computer. It reads his e-mails to him, and he talks into the computer, which transcribes his speech through voice recognition software.
Artificial intelligence remains a dream for computer programmers, but today's microprocessors are savvy enough to make minute distinctions in speech.
"In many words you can have an upward or downward inflection," he says. The sound systems often are prepared for those contingencies. Systems such as AT&T;'s Natural Voices can assess the context of a text for instance, whether to read "St." as "street" or "saint" and produce the appropriate sounds.
Bryant Parent, speech technologies vice president with AT&T Labs in Florham Park, N.J., calls the concatenated process "editing on the fly."
"There's all kinds of labels that have to be run and marked up in the database," Mr. Parent says of his company's software.

The technology isn't cheap, though, nor is it aimed at the average consumer.
The Natural Voices package, which debuted in July 2001, retails for $200,000, he says.
But for those companies that can afford the product, the implications are varied.
A business might have a particular voice talent it uses for anything from airline reservations to prescription drug replacement services. Each could use the system to create new voice messages based on existing banks of recorded voices.
One potential use of the new vocal systems is to recreate celebrity voices for posterity.
"They're immortal if the quality of the recordings are good enough," Mr. Parent says of the potential to recreate voices from the past. "We need about 10-20 hours of clean, recorded material in order to recreate a voice."
Or a company like Verizon could program a library of spokesman James Earl Jones' legendary pipes and create new vocal tracks without the two-time Tony winner ever entering the recording studio.
Bell Laboratories began experimenting with this process about 20 years ago, Mr. Parent says, but it took a combination of increased computer speeds and memory levels to make it a realistic use.
Mr. Patterson says the Web will be a major factor in computer voice use. Many of the Internet's functions, from information retrieval to transactions and communications, can be aided by voice technology.
His company also has supplied voice software to banks, which use it in their automated teller machines for the blind.
He says highway safety is another area where the technology will come into play.
In a car, computerized voices can let drivers keep their hands on the wheel while they listen to driving directions generated by the car's computers or listen as their e-mails are read to them.
Computer voices can serve trivial or entertaining markets, but some uses carry life-or-death consequences, as they do for storm warnings.
The National Weather Service's radio system has been around since the 1950s, says Joanne Swanson, a meteorologist and the voice improvement requirement leader with the National Oceanic and Atmospheric Administration.
"We used to have technicians record the broadcasts on eight-track tapes," says Ms. Swanson.
NWS upgraded much of its equipment starting in the late '80s, and among the modifications was the automation of weather radio.
"Our warnings go direct to the air, the same things with observations and forecasts," she says. Using computerized voices, as opposed to live reporters, "allowed us to speed up that dissemination of warnings."
The service's "state of the art" voices sounded mechanical.
"It didn't sound too bad, but it was not human and, as time went on, we continued to get negative feedback about the voice," she says. "It had a robotic tone, it wasn't completely clear. A lot of people said it had a Southern accent but they couldn't agree where it came from. It [also] had trouble with short vowels."
The weather service recently switched to SpeechWorks products to provide clearer, more realistic voices.
The transition isn't easy, though. Each NWS office has to tune the software to handle the geography specific to that area. It takes some time, she says.
"There are so many interesting names across the country," she says.
While the service has employees who can go on the air if needed, focus groups said they wanted credibility in the computer voices.
"They wanted to believe the voice they were listening to knew something about the weather," she says.
