Speech processing of audio files

There are a couple of FAQs I often hear about using speech with audio files:

  1. How do I recognize speech that's been recorded to an audio file?
  2. How do I write synthesized speech to an audio file?

This can be done with SAPI, but judging by the frequency of these FAQs in the newsgroups, it's not exactly straightforward.
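
For comparison, here's roughly what the synthesis half looks like against SAPI 5.1 from VB.NET via the SpeechLib COM interop (a reference to the Microsoft Speech Object Library). This is a hedged sketch from memory rather than production code, and the module and method names are just for illustration; the point is that you manage the file stream and its format yourself before you ever get to Speak.

' Sketch only: SAPI 5.1 automation (SpeechLib interop), writing synthesized speech to a wave file.
Imports SpeechLib

Module SapiFileSynthSketch

   Sub SpeakToFile(ByVal path As String, ByVal text As String)
      Dim fileStream As New SpFileStream
      ' Optional: pick the wave format before opening the file for write.
      fileStream.Format.Type = SpeechAudioFormatType.SAFT22kHz16BitMono
      fileStream.Open(path, SpeechStreamFileMode.SSFMCreateForWrite, False)

      Dim voice As New SpVoice
      voice.AudioOutputStream = fileStream          ' route the voice's output into the file stream
      voice.Speak(text, SpeechVoiceSpeakFlags.SVSFDefault)   ' synchronous render

      fileStream.Close()
   End Sub

End Module

The recognition side is a similar (and longer) dance with a recognizer, a reco context, and a dictation grammar, which is exactly the sort of plumbing people ask about.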

By contrast, we've tried to make it really easy with the WinFX speech API.

Here's some sample code that works with the Avalon & Indigo Beta 1 RC1 bits. The form has a couple of text boxes and a couple of buttons:

  1. txtPath, where you type the full path of a file name (e.g. "c:\test.wav").
  2. txtContent, which holds either the recognized text or the text to be synthesized.
  3. btnSynth: when you click it, we tell the synthesizer that its output is the file path from txtPath, then tell it to render the contents of txtContent.
  4. btnReco: when you click it, we tell the recognizer to get its input from the file path in txtPath, load a dictation grammar (a built-in grammar provided by the recognizer that listens for unconstrained speech), then tell the recognizer to do a recognition.

Imports System.Speech.Recognition
Imports System.Speech.Synthesis

Public Class Form1

   Dim WithEvents reco As New SpeechRecognizer
   Dim WithEvents synth As New SpeechSynthesizer

   Private Sub btnReco_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnReco.Click
      ' Recognize speech from the wave file named in txtPath.
      Me.txtContent.Text = ""
      reco.LoadGrammar(New DictationGrammar())
      reco.SetInput(Me.txtPath.Text)
      reco.RecognizeAsync()
   End Sub

   Private Sub reco_SpeechRecognized(ByVal sender As Object, ByVal e As System.Speech.Recognition.RecognitionEventArgs) Handles reco.SpeechRecognized
      ' Show whatever the dictation grammar heard.
      Me.txtContent.Text = "Recognized: " & e.Result.Text
   End Sub

   Private Sub btnSynth_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSynth.Click
      ' Render the contents of txtContent to the wave file named in txtPath.
      synth.SetOutput(Me.txtPath.Text)
      synth.SpeakAsync(Me.txtContent.Text)
   End Sub

   Private Sub synth_SpeakCompleted(ByVal sender As Object, ByVal e As System.Speech.Synthesis.SpeakCompletedEventArgs) Handles synth.SpeakCompleted
      Me.txtContent.Text = "done!"
      ' Dispose releases the output file (see point 3 below).
      synth.Dispose()
   End Sub

End Class

Some other points to note:

  1. You'll need to record an audio file for input to the recognizer. The Sound Recorder app in Windows is fine for this. Recognition accuracy will be best if you train the recognizer (you can do this from the Control Panel) so that it knows what you personally sound like. What Sound Recorder can capture depends on your audio hardware; things will work best if you record at 22 kHz, 16-bit, mono.
  2. My previous code snippets used a recognizer called DesktopRecognizer, which represents the recognition process shared by multiple apps on the system and by the shell itself via the language bar. In this example I use a recognizer called SpeechRecognizer, which represents a private instance of the recognizer, so you have a lot more control over exactly what it's working on, including where it gets its input from (the shared DesktopRecognizer always listens to the default microphone specified in the Control Panel).
  3. In this example I call Dispose() on the synthesizer to reset its state so that it releases the file it was writing to. Beta 2 of the API will have better controls for this. There's a sketch of a slightly tidier way to scope this just after this list.
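
To make point 3 a little more concrete, here's a minimal sketch of the scoping I mean. It assumes these bits also expose a synchronous Speak(String) alongside SpeakAsync (I haven't re-checked that), and the handler name is just for illustration. Wrapping the synthesizer in a VB 2005 Using block means Dispose runs, and the output file is released, as soon as the utterance has been written, without keeping a form-level synthesizer around.

' Hypothetical alternative to the btnSynth_Click / synth_SpeakCompleted pair above.
Private Sub btnSynthToFile_Click(ByVal sender As Object, ByVal e As EventArgs)
   Using localSynth As New SpeechSynthesizer
      localSynth.SetOutput(Me.txtPath.Text)
      localSynth.Speak(Me.txtContent.Text)   ' assumed synchronous: returns once the file is written
   End Using                                 ' Dispose runs here and releases the output file
   Me.txtContent.Text = "done!"
End Sub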