Speech Recognition - Ready for Prime Time?
by Jarred Walton on April 21, 2006 9:00 AM EST
Processor Utilization - Precise Dictation
Now that we have some idea of the accuracy these solutions offer, what sort of CPU requirements are we looking at? As with our accuracy charts, we've got a separate section looking at the processor utilization of Dragon NaturallySpeaking when transcribing a WAV file. Below are screenshots of Windows Task Manager showing CPU usage during dictation. In retrospect, finding a utility to track average CPU utilization over time would have been more useful, but these screenshots should suffice for our purposes.
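For readers who would rather have a number than a screenshot, here is a minimal sketch of the "average CPU utilization over time" idea mentioned above. It is not the method used for this article. It assumes you can obtain, by some OS-specific means not shown here, cumulative (busy seconds, total seconds) counters for each core at the start and end of a run; the `Snapshot` type and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical snapshot of one core: (busy_seconds, total_seconds),
# both cumulative counters read from the operating system.
Snapshot = Tuple[float, float]


def average_utilization(first: Snapshot, last: Snapshot) -> float:
    """Average utilization (0-100) of one core between two snapshots."""
    busy = last[0] - first[0]
    total = last[1] - first[1]
    if total <= 0:
        raise ValueError("snapshots must span a positive amount of time")
    return 100.0 * busy / total


def per_core_average(first: Sequence[Snapshot],
                     last: Sequence[Snapshot]) -> List[float]:
    """Average utilization of each core over the same interval."""
    return [average_utilization(a, b) for a, b in zip(first, last)]


def overall_average(first: Sequence[Snapshot],
                    last: Sequence[Snapshot]) -> float:
    """Average utilization across all cores, comparable to a single
    whole-system CPU usage figure."""
    busy = sum(b[0] - a[0] for a, b in zip(first, last))
    total = sum(b[1] - a[1] for a, b in zip(first, last))
    if total <= 0:
        raise ValueError("snapshots must span a positive amount of time")
    return 100.0 * busy / total


# Made-up numbers for a 60 second run on a dual-core system:
# core 0 was busy for 54 seconds, core 1 for 3 seconds.
start = [(0.0, 0.0), (0.0, 0.0)]
end = [(54.0, 60.0), (3.0, 60.0)]
cores = per_core_average(start, end)
overall = overall_average(start, end)
```

With these made-up numbers, one core is almost saturated while the whole-system average stays below 50%, which is the pattern to look for when a single-threaded program runs on a dual-core machine.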
Dictation Processor Utilization
[Task Manager screenshots: DNS8 Maximum Accuracy | DNS8 Medium Accuracy | DNS8 Minimum Accuracy | MSWord Maximum Accuracy | MSWord Medium Accuracy | MSWord Minimum Accuracy]
One thing is immediately clear: Dragon NaturallySpeaking requires far more CPU processing time than Microsoft Office. Even at the lowest accuracy setting, Dragon essentially matches the CPU usage of Microsoft's tool at its maximum accuracy setting. However, CPU usage and accuracy are only two aspects of these software packages, and the much harder to describe "user experience" continues to be far preferable to me with Dragon NaturallySpeaking.
The second major point of interest is that having a second processor core does absolutely nothing for these speech recognition packages. (MS might even be able to run without difficulty on a Pentium 3, judging by the CPU usage.) Sure, if you're running multiple applications that are all trying to use the CPU, the second core can be useful. On the other hand, if the only thing you're doing is dictating speech, the current algorithms are clearly single-threaded in nature. Given that accurate speech recognition depends in large part on recognizing the context of sounds -- this is especially true for homophones like their, they're, and there -- there may be some difficulty associated with breaking the task into meaningful, discrete parts. However, difficult does not mean impossible, and with AMD, Intel, and all the other major CPU players moving towards multiple cores, further improvements in accuracy are likely going to require multithreaded algorithms.
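To make the "discrete parts" problem a little more concrete, here is a rough sketch of one generic way to split an audio buffer for parallel work: cut it into chunks that overlap slightly, so that each chunk carries some of its neighbor's context. This is purely illustrative and is not how Dragon NaturallySpeaking or Microsoft's engine works; the function name and parameters are hypothetical.

```python
from typing import List, Tuple


def overlapping_chunks(n_samples: int, chunk: int,
                       overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges covering n_samples.

    Each range is at most `chunk` samples long, and consecutive ranges
    share `overlap` samples so that sounds near a boundary are seen
    together with some of their surrounding context.
    """
    if chunk <= 0 or overlap < 0 or overlap >= chunk:
        raise ValueError("need chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < n_samples:
        end = min(start + chunk, n_samples)
        spans.append((start, end))
        if end == n_samples:
            break  # the last chunk reached the end of the buffer
        start += step
    return spans


# Ten samples, chunks of four, one sample shared between neighbors.
spans = overlapping_chunks(10, 4, 1)
```

Each range could then be handed to a separate worker thread or process. The hard part, which this sketch does not attempt, is reconciling the words recognized in the overlapping regions, and that is exactly where context-dependent choices like their/they're/there get tricky.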
Transcription Processor Utilization
[Task Manager screenshots: DNS8 Maximum Accuracy | DNS8 Medium Accuracy | DNS8 Minimum Accuracy]
As with dictating, transcribing an audio file also fails to benefit from multiple CPU cores. The good news is that processing times are much faster, because a single CPU core can chew through the waveforms as fast as possible. While the maximum accuracy mode didn't seem to do all that well with dictating, it did seem to handle a few phrases better when transcribing. It also takes longer, but if you're in a situation where you can start the transcription process and walk away for a while, that shouldn't matter too much.
38 Comments
Googer - Saturday, April 22, 2006
BMW 7 series speech recognition is about 50-75% accurate (my guess), and some users have more luck with it than others.
Googer - Friday, April 21, 2006
I think you should re-benchmark these on a system that is not overclocked. Overclocking may have contributed to erroneous test results. It is possible that some of the benchmarks could have been better on a normal system. Also, I am surprised this was not tested on an Intel system. Perhaps one of the programs may benefit from the NetBurst architecture, with or without dual core.
Also, I would love to download the Dictation and Normal Voice WAV files, so I can understand the difference between them. Thanks for the article; it came at the perfect time. Someone who is handicapped was asking me about this last night.
JarredWalton - Friday, April 21, 2006
I'll see about putting up some MP3s of the wave files -- of course, that will open the door for all of you to make fun of how I speak. LOL
In case this wasn't entirely clear in the article, this was all done on my system that I use every day for work. It's overclocked, and it's been that way for six months. I run stress tests (Folding at Home -- on both cores) all the time. I would be very surprised if the overclock has done anything to affect accuracy, especially considering that I did run some tests on a couple other systems that were not overclocked, and basically removed them from this article because they would have simply taken more time to put in the article, and they didn't give me any new information.
It's pretty obvious that neither of these algorithms benefits from multiple processing cores -- HyperThreading, dual core, SMP, whatever. I also wasn't sure how much interest there would be from people in this topic, but if a lot of people want to know how this runs on Intel systems I could go back and look at one. One thing worth noting is that SysMark 2004 does include Dragon NaturallySpeaking version 6.5 as one of the tests. Of course, the results are buried in the composite scores.
JarredWalton - Friday, April 21, 2006
MP3 links available: http://www.anandtech.com/multimedia/showdoc.aspx?i...
Note that DNS only uses WAV files (AFAICT), but uploading 45MB WAV files seems pointless. Convert them to WAVs if you want to try them with Dragon.
Googer - Saturday, April 22, 2006
Excellent job on the dictation/wav files, you are a very good reader and have a nice, clear, and concise voice. (ThumbsUp)
stelleg151 - Friday, April 21, 2006
Cool article. I hope that voice recognition continues to improve, for I think it could be incredibly useful for areas like HTPC, or as you said, messaging while doing other things (gaming).
Zerhyn - Friday, April 21, 2006
Have you ever tried out speech recognition and been underwhelmed? To you yearn to play the role of Scotty and call out..?
PrinceGaz - Friday, April 21, 2006
Yes, that was the first thing I noticed before I even started reading the article. Maybe they used speech-recognition software to enter that.
I think they should have an editor (or at least let another contributor read what others have written) who has to approve an article before it goes live as the current number of tyops is unforgiveable ;)
JarredWalton - Friday, April 21, 2006
I'm doing my best to catch typos before anything goes live, but after being up all night trying to finish off this article, I went to post and realized I didn't have a title or intro. So, I put one in using Dragon, but my diction goes to put when I'm tired, as does my eyesight and proofing ability. One typo in a 44 word intro (I didn't proof/edit it at all) isn't too bad for the software. Bad for me? Maybe, but mistakes do happpen. :)
johnsonx - Friday, April 21, 2006
One nice thing about Dragon, despite the high CPU utilization shown in the article, is that it will run quite happily with very lowly systems. I have a customer who uses it all day long on PentiumIII-850's with only 512MB RAM (the max for those particular systems). The heaviest user there recently upgraded to a low-end Sempron64 with a gig of RAM, and he says the overall system is far more responsive (of course), but Dragon's operation isn't radically better; it worked great on the PIII, and works great now.