IBM ViaVoice
Voice Recognition Takes Another Step Forward
by Jim Bray
HAL, the conversational
computer, isn't here yet, but it appears he's well on his way.
IBM's ViaVoice, now
in version 7 (several editions of which are available), is the closest step
yet to making your computer respond to your verbal proddings. It still
has its shortcomings (as do we all!), but on the whole it's a pretty
nifty package.
I tried ViaVoice Pro,
Millennium Edition, which is a fully featured version. It works in the
Windows 9x/NT 4 environments and has the easiest setup I've tried to date
for a voice-activated product.
The package comes
with the software on CD-ROM, as well as a manual and a headset that plugs
into your sound card. There's no USB support yet, but it would make
sense for that to come down the road.
There are a lot of
usability features packed into ViaVoice, including wrinkles designed to
let the software learn how you work (and sound) and tailor itself to you.
This is "artificial intelligence" stuff and it works pretty well on the
whole, though it also proves that, no matter how good the software is,
computers are still damn stupid beasts.
ViaVoice is designed
to be more than just a dictation-taker; it's also meant for surfing the
Web, so you can speak to your browser and have it follow your directions.
It also works with Microsoft Office and Lotus 1-2-3 commands. This
is pretty neat: you order the PC to load a particular application, and
then tell it to do something (for instance, "schedule a meeting") and
away it goes, happily obeying your wishes.
I wish kids were as
compliant!
IBM says ViaVoice
can also be configured for multiple users, which is quite a trick since
it has to learn the nuances of more than one voice.
You can also use it
to speak navigation commands for which you'd normally use the mouse.
When you first install
ViaVoice and restart Windows, you're greeted by the system's cutesy helper:
a pencil that looks suspiciously like it was inspired by those horrid
Microsoft Office Assistants, which are the first thing I turn off. Then
you sally forth into the "User Wizard," which sets up the system for you
and trains you and it to use each other. The Wizard does a really good
job of walking you through the ins and outs, including the proper way
to wear the headset so the mic's placed correctly.
Then you test the
sound quality for input and output, and begin reading a series of paragraphs
into the microphone to get the computer used to your voice. This is actually
a pretty interesting process, because IBM has written the paragraphs not
only as a setup routine, but as an introduction to voice recognition technology
itself, which puts what you're doing into context and explains the difficulties
involved in having a mindless computer recognize, and act upon, your
speaking voice.
Once that section's
completed, the little pencil inside the PC hunkers down and analyzes your
speech, which takes a few minutes. Then you can have the system analyze
the documents you've stored on your computer; it pores over them looking
for clues about your writing style and vocabulary that can
help it better match your words to your voice.
When ViaVoice finally
heads into action, it puts a toolbar across the top of your screen and
that darn pencil pops up and gives you a few pointers on using the thing.
It's actually handy advice; I just have trouble getting over the cutesiness,
though I'm sure many people like it.
The first thing I
tried was to load Microsoft Outlook. I said "load Outlook" and ViaVoice
went "Huh?" (figuratively, of course), so I said "open Outlook" and it
said "Huh?" again. So I said "help" and, as if by magic, a help wizard came up.
I talked my way through it (it worked very well) to the point where I
learned to say "What can I say?" to find out what commands were recognized.
The pencil had probably already told me this, if I'd been paying attention,
but I guess artificial intelligence goes both ways.
So I learned you have
to say "Open Program <program manufacturer and name>", as in "open
program Microsoft Word," before the little droid will spring into action,
but once you do that it does, indeed, spring obediently into action.
It takes a while to
get comfortable with the methodology, and for it to get comfortable with
you, but once you're up to speed it's pretty neat. I have to admit, though,
that I can open a lot of programs and type a lot of text in the time it takes
to get up to speed with any of today's voice recognition applications.
Naturally, if you
throw it a curve, ViaVoice may swing and miss. For instance, I dictated
"'Twas brillig, and the slithy toves did gyre and gimble in the wabe" (you
certainly can't blame ViaVoice for missing that!) and it transcribed
"To was ability antislavery toasted dire and dimple in the way." Even
though that's a howler, there's a certain logic to what the software thought
I said.
I used ViaVoice to
set up Microsoft Outlook, and all the configuration Wizards worked well
with my voice commands until the last one, which was a notice that Outlook
had just crashed; I had to close that message with the mouse. Once
Outlook was started and I'd clicked through that damn Office Assistant,
however, it worked like a hot darn.
Dictation works well,
as long as you remember to speak your punctuation and don't throw the
application too many surprises. You can also edit and otherwise make corrections
verbally.
The software really
does appear to learn as it goes, at least to a certain extent, and it
does a good job of using context to tell similar-sounding words apart (like
"through" and "threw"). I also noticed that it has no ego to bruise, because
I called it all sorts of names and told it to do all kinds of rude things
and it never talked back to me.
Speaking of talking
back, another neat feature is called "ViaVoice Outloud," which is kind
of like voice recognition in reverse. It takes text, including a web page,
and reads it to you. It sounds like Stephen Hawking, however, so you probably
won't want to use it for reading bedtime stories to the kids, but it can
be a great tool for visually impaired people who want to surf the net or
have other files read to them.
Overall, I certainly
had my share of stops and starts and frustrations while ViaVoice and I
were getting used to each other, but on the whole I found the suite works
pretty much as advertised, though it also shows, once again, how far
the technology has yet to go. Still, there are lots of applications for
such technology already, some of which I hadn't thought of before IBM
pointed them out to me.
For example, in a
project IBM is doing with Canada's St. Mary's University, they're using
voice technology to make lectures accessible to the hearing (or, since
they're students, listening) impaired. What they do is equip the lecturer
with a wireless microphone and run it through the voice recognition software.
This, in turn, is hooked into a big screen video projector, which displays
the lecture as text almost as it's being said. It's a cool idea.
It's also a wonderful
tool for other students, who can get a transcription of the lecture on
floppy disk or as a download, or even converted into Braille! Now they
don't even need to show up for the lectures anymore!
Not only that, but
a deaf student in a remote location can also take advantage of the lecture,
bringing a whole new functionality to remote conferencing.
I could also see this
being used to take minutes at corporate meetings and the like, if it's
trained to recognize everyone's voice around the table.
As long as the prof
doesn't make any arcane references to Lewis Carroll stories...
ViaVoice is the easiest
to use and most integrated voice recognition application I've used so
far. I still won't throw away my mouse and keyboard, but each generation
of this technology gets more interesting and more attractive, and offers
more help to those who can really make good use of it.
Tell us at TechnoFile what YOU think