Monday, April 7, 2008

Speech-Recognition for Interviews

Every day, we get phone calls and e-mails from people hoping to use speech recognition software to transcribe interviews, lectures, or meetings. It seems so reasonable to assume that today's technology could easily take a digital voice recording of two or more people and translate that into an accurate text representation.

Sadly, that is not the case. The supercomputers of the CIA and NSA notwithstanding, today's most popular speech-to-text programs are unable to transcribe the speech of more than one person with any usable degree of accuracy, for several reasons:

  1. Speech recognition software should be trained for the speaker's voice. Although Dragon NaturallySpeaking advertises that "no training is necessary," allowing the software to adapt its algorithms to the speaker's voice, speaking style, and writing style can tremendously improve its accuracy.
  2. Speech recognition software cannot distinguish between speakers in a recorded conversation. To the computer, all sound is simply analyzed for signs of spoken words; it cannot recognize Speaker A one moment and Speaker B the next. It will try to decipher every spoken word using whichever single user profile has been selected for transcription.
  3. Speech recognition software is really a dictation tool rather than a transcription device. That is to say, it works best when the speaker "dictates" to the software. Accurate dictation includes spoken punctuation, spellings of potentially unfamiliar words or names, and complete sentences and phrases. When was the last time you heard two people conversing in complete sentences and speaking their punctuation aloud?

I'm using speech recognition software to write this blog post. Before I dictate, I formulate each sentence mentally, then speak it evenly, punctuation included, into my wireless headset. Almost instantly, the phrase I have just completed appears as text on the screen, with remarkable accuracy. That accuracy comes from having trained my speech recognition software (in this case, MacSpeech Dictate) to understand how I dictate.

Speech recognition software works amazingly well, but only for its intended purpose.

Some of our customers have employed creative workarounds to convert interviews to text without typing. The most common we hear of has the interviewer listen to a recording of the interview in one ear while dictating what they hear into the speech recognition software. This lets them add subject names and formatting instructions that are lacking in the original recording. It also helps clarify passages where both parties were talking at the same time (you can imagine how inaccurate any automated transcription of those would be).

As technology pushes the limits of power and speed in personal computers, we should someday expect to see speech recognition systems that come close to replicating the recognition capabilities of the human brain. Given the enormous capacity and sophistication of our brains, though, I cannot imagine that happening anytime soon.

