State of the Art
By DAVID POGUE
graphic by Stuart Goldenberg
Of all the high-tech fantasies that sci-fi movies tantalize their escapist audiences with, surely that bit about giving your computer spoken orders is one of the most alluring. Ever since “Star Trek,” we’ve dreamed of being able to say, “Computer, display all known sources of dilithium crystals in the Kraxon Nebula!”
So far, the closest we can get is strapping on a headset and dictating, using a program like Dragon NaturallySpeaking to do the typing. This software is great for anyone who can’t type or doesn’t like to. And it lets you speak the names of menu commands and “click” links on a Web page.
But that’s not the same as telling the computer what to do in conversational English.
NaturallySpeaking 10, available Thursday, takes some baby steps in the right direction. It doesn’t turn your computer into the “Star Trek” mainframe; it doesn’t know what you mean by, for example, “Make this document shorter and funnier.” But in its timid, conservative way, it takes voice control unmistakably closer to that holy grail of computing.
NatSpeak’s principal mission, though, is to type out, into any Windows program, whatever you say. And in version 10, its maker, Nuance, claims to have eked out yet another 20 percent accuracy improvement.
I installed the program, donned the included headset and clicked “Skip initial training.” (In the early days of speech recognition, you had to read a 45-minute sample script to train the program to recognize your voice. Today, the software is so good, you can skip the training altogether.)
As a quick test, I read aloud the first 1,000 words of “Freakonomics” into Microsoft Word. Impressively enough, NatSpeak effortlessly transcribed words like “Ku Klux Klan” and “Punic war.” It did, however, mistype seven easier words (“addition” instead of “edition,” for example, and “per trail” instead of “portrayal”). Accuracy tally with no training: 99.3 percent. Not too shabby.
Then I tried a second test: I read one of the five-minute training scripts (a Kennedy speech), which is recommended for even better initial accuracy. I again read the first 1,000 words of “Freakonomics,” and the program mistyped five words. Accuracy this time: 99.5 percent.
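For anyone who wants to check the math, here is a minimal sketch of how those accuracy figures fall out of the error counts. The function name and the simple "words right divided by words read" definition are my own assumptions for illustration; Nuance may well measure accuracy differently.

```python
def word_accuracy(total_words: int, misrecognized: int) -> float:
    """Return the percentage of dictated words that were transcribed correctly.

    Hypothetical helper: assumes accuracy is simply
    (words read - words mistyped) / words read.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= misrecognized <= total_words:
        raise ValueError("misrecognized must be between 0 and total_words")
    return 100.0 * (total_words - misrecognized) / total_words


# The two 1,000-word readings described in the review.
untrained = word_accuracy(1000, 7)  # no training, seven wordos
trained = word_accuracy(1000, 5)    # after the five-minute script, five wordos

print(f"No training:   {untrained:.1f} percent")
print(f"With training: {trained:.1f} percent")
```

By this measure, the five-minute training script removed two errors per thousand words in my test.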
In both cases, the number of spelling mistakes was zero. People who use NaturallySpeaking never make typos, only wordos.
As you correct the mistakes with your voice — a speedy, streamlined procedure — the program learns. Whether you skip initial training or not, accuracy inches toward perfection over time.
One way that Nuance has improved accuracy is by acknowledging, for the first time, that not everyone speaks alike. Version 10 recognizes eight accents: general (none), Australian, British, Indian, Great Lakes (Buffalo to Chicago), Southeast Asian, Southern United States and Spanish. If you don’t specify, the program will identify you automatically.
Isn’t that somehow politically incorrect? Should a software program treat you differently depending on how you sound?
Ah, the heck with it. It’s dictation software. A little stereotyping can go a long way.
Speed is another virtue in version 10. The program still waits for a pause in your talking before it types, so that it can use context to choose, for example, the correct homonym (there/they’re/their). But that waiting period has been halved; text appears almost instantaneously at each pause.
Beyond speed, and here’s where things start to get Star Trekky, the program understands more “natural language” commands.
For example, italicizing something you’ve already typed, say, the phrase “gas prices,” used to require three separate commands. First, “Select gas prices.” Then, “Italicize that.” Finally, to move your insertion point back where you stopped, “Go to end of document.”
In version 10, a single command does the trick: “italicize ‘gas prices.’” The program makes the change and returns to where you stopped, all in a blink. The same trick also works with the verbs “bold,” “underline,” “delete,” “cut” and “copy.” (Yes, “bold” is a verb now.)
You can speak a series of new Search commands, beginning with “Search computer for ...,” “Search the Web for ...,” “Search e-mail for ...” and so on.
For example: “Search maps for Chinese restaurants near Hoboken.” Or “Search Wikipedia for Bay of Pigs.” Or “Search images for Gwyneth Paltrow.” These shortcuts work 100 percent reliably and do truly save you time and typing. Next version: more of them, please.
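To make the pattern behind those Search commands concrete, here is a rough sketch of how a phrase of the form "Search [place] for [something]" could be split into its two parts once the words have already been recognized. This is emphatically not Nuance's code; the regular expression and the names `SearchCommand` and `parse_search` are hypothetical, invented for this illustration.

```python
import re
from typing import NamedTuple, Optional


class SearchCommand(NamedTuple):
    """Hypothetical container for a parsed Search command."""
    target: str  # where to search, e.g. "maps", "web", "e-mail"
    query: str   # what to search for


# Matches "Search <target> for <query>", allowing an optional "the"
# before the target (as in "Search the Web for ...").
_SEARCH_PATTERN = re.compile(
    r"^search\s+(?:the\s+)?(?P<target>[\w-]+)\s+for\s+(?P<query>.+)$",
    re.IGNORECASE,
)


def parse_search(utterance: str) -> Optional[SearchCommand]:
    """Return the target and query of a Search command, or None if the
    utterance does not follow the "Search ... for ..." pattern."""
    match = _SEARCH_PATTERN.match(utterance.strip().rstrip("."))
    if match is None:
        return None
    return SearchCommand(match.group("target").lower(),
                         match.group("query").strip())


print(parse_search("Search maps for Chinese restaurants near Hoboken."))
print(parse_search("Search Wikipedia for Bay of Pigs"))
print(parse_search("Make this document shorter and funnier."))
```

The first two phrases come back as a target plus a query; the third matches nothing, which is roughly the gap between a fixed command grammar and real conversational English.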
And now, the NatSpeak Frequently Asked Questions:
“Does NaturallySpeaking work on a Mac?” Yes, but only when the Mac is running Windows and you’re using a U.S.B. headset adapter. It works fantastically in Boot Camp and fast enough in VMware Fusion, a virtualization program.
Of course, it might be simpler just to buy MacSpeech Dictate, a Mac program that uses the same Dragon recognition technology. The current version is fast and accurate, but it lags behind NatSpeak in features and power; it doesn’t even let you make corrections by voice, and therefore the accuracy never improves. But a 1.2 version, with voice correction and voice spelling, is in testing now.
“Can I transcribe interviews with it?” No. NatSpeak knows only one person’s voice: yours. It also requires a clean audio signal, like the one from a headset mike half an inch from your mouth.
“Can I dictate with a wireless Bluetooth earpiece?” Yes. In fact, version 10 greatly expands the number of compatible earpiece models (18 so far, listed at nuance.com). Accuracy may take a hit, though.
“Can I dictate into a pocket recorder and transcribe it later?” Yes. The setup is more involved, though: only some recorders are compatible, and you have to record 15 minutes of training.
“Doesn’t Windows Vista come with speech recognition?” Yes, and it’s really good — quite similar to NatSpeak, actually. But Nuance says that, oddly enough, Vista has had virtually no effect on NatSpeak sales.
I’m guessing that obscurity is part of the reason; most people aren’t even aware that Vista offers such a feature. Vista doesn’t come with the required headset, either. Nor does the Vista version offer the same accuracy, features or power as NatSpeak, and it isn’t available in other languages (French, Italian, German, Spanish, Dutch and so on).
NatSpeak is available in a number of versions. The Standard edition ($100) has the same accuracy as the others, but it’s just for bare-bones dictation.
To get the more advanced goodies described in this review — the natural-language commands, Bluetooth mikes and recorders — you need the Preferred edition ($200). It also lets you set up voice macros that type out boilerplate text. For example, you can say, “Buzz off,” and it will type: “Thanks for thinking of me! Unfortunately, I’m afraid I’m unable to accept your kind offer at this time.”
There are also medical and legal editions ($1,600 and $1,200, yikes), as well as a Professional edition ($900) for corporate administrators who want to manage many NatSpeak installations from a central server. The Pro version also recognizes natural-language commands for Microsoft Outlook, like “Send e-mail to Mom” or “Schedule a meeting with Barack Obama and John McCain.”
Apart from Vista, NatSpeak really has no competition. Philips has dropped out of the American market. I.B.M.’s own ViaVoice hasn’t been updated since 2003, and its sole distributor is, get this, Nuance.
Maybe that’s why Nuance makes only small, confident changes from one version of NatSpeak to the next. Without any rivals, why add bells and whistles that risk mucking up the program’s virtues?
As a result, existing NaturallySpeaking owners can usually afford to skip a generation between upgrades. Version 10 is a healthy leap ahead of version 8, but version 9 owners shouldn’t feel compelled to upgrade.
And now, if you’ll excuse me, I have some real work to do: “Search maps for dilithium crystals near New York City. ...”
Copyright 2008 The New York Times Company