[Speech to Text/Android-Java] How to pass streaming PCM data to AudioConfig.fromStreamInput(AudioInputStream)?

Question

Anonymous

1. We have streaming PCM data as input that we want to convert to text with the Speech SDK (speech:client-sdk:1.12.1).
2. Here is our reference link: https://learn.microsoft.com/zh-tw/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=debian&pivots=programming-language-java
3. We assume we should wrap our streaming PCM data with a WAV header and pass it to AudioConfig.fromStreamInput(AudioInputStream), but we have no idea how to wrap our streaming WAV data with AudioInputStream/PullAudioInputStreamCallback. Is there any Android/Java sample code?

Thanks for your help.

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2020-07-28T11:37:35.8+00:00

@Anonymous I think this sample should help you, along with the link mentioned above for using compressed audio input. It implements streaming from a file with PullAudioInputStreamCallback.

This class also defines the PullAudioInputStreamCallback methods for a WAV stream, for reference.

I also found this interesting issue that might be similar to your problem, although it is a question about using the SDK with C#.

Anonymous

2020-07-29T10:05:46.807+00:00

Thanks for your great help.
But our situation is different from streaming a WAV file with a fixed data size: our input is a streaming byte array (raw PCM data), and we do not know the total size of the data still to come, so we can't create a corresponding WAV header.
1. Should we still use PullAudioInputStreamCallback for our case?

We looked at the sample code (function recognitionWithAudioStreamAsync) in https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechRecognitionSamples.java#L225.
2. The WAV header is only used for checking the audio format and is not passed to the SDK by PullAudioInputStreamCallback, right?

3. Can we just pass our raw PCM data to the SDK through PullAudioInputStreamCallback?

Thanks.
romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2020-07-29T13:19:31.277+00:00

@Anonymous My response to your comment above is posted as an answer, since comments are limited in the number of characters they can hold. Please try to follow it and check whether it works for you.

1 answer

Answer 1

@HsuWeiYuan-8357 Yes, you can use PullAudioInputStreamCallback, but you need to extend it and implement read() and close() for your audio stream before passing it to the SDK. The high-level way to achieve this is documented here. The read and close methods are also available in the WavStream.java class for reference.

For the second question, you need to ensure that the channel count, bits per sample, and samples per second of your PCM data are defined correctly, as expected by the SDK, i.e. through AudioStreamFormat.GetWaveFormatPCM() (getWaveFormatPCM() in Java).

Then pass the callback when defining the AudioConfig, and run recognition with the SDK recognizer.
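
As a rough illustration of the steps above, here is a minimal sketch of the read()/close() logic for raw PCM that arrives in chunks of unknown total size. PcmQueueSource and its write()/endOfStream() methods are hypothetical names, not part of the Speech SDK, and the class is kept free of SDK imports so it can be tried on its own. In an app the same two methods would live in a subclass of PullAudioInputStreamCallback. The wiring shown in the trailing comment, and the 16 kHz, 16-bit, mono format, are assumptions that should be checked against SDK version 1.12.1 and your actual audio.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical helper (not an SDK class): buffers raw PCM chunks from a
 * producer and serves them through read()/close() methods shaped like
 * those of PullAudioInputStreamCallback. No WAV header is involved.
 */
class PcmQueueSource {
    // Empty array used as an end-of-stream marker (compared by identity).
    private static final byte[] END = new byte[0];

    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();
    private byte[] current = END;      // chunk currently being drained
    private int offset = 0;            // read position inside 'current'
    private volatile boolean finished = false;

    /** Producer side: queue one chunk of raw PCM bytes. */
    void write(byte[] pcm) {
        if (pcm.length > 0) {
            chunks.add(pcm.clone());
        }
    }

    /** Producer side: signal that no more audio will arrive. */
    void endOfStream() {
        chunks.add(END);
    }

    /** Blocks until audio is available; returns 0 once the stream has ended. */
    int read(byte[] dataBuffer) {
        if (finished) {
            return 0;
        }
        if (offset == current.length) {
            try {
                current = chunks.take();   // wait for the next chunk
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                current = END;
            }
            offset = 0;
            if (current == END) {
                finished = true;
                return 0;
            }
        }
        int n = Math.min(dataBuffer.length, current.length - offset);
        System.arraycopy(current, offset, dataBuffer, 0, n);
        offset += n;
        return n;
    }

    /** Stops the stream and unblocks a reader that is waiting in read(). */
    void close() {
        finished = true;
        chunks.clear();
        chunks.add(END);
    }
}

// Assumed SDK wiring (untested sketch; verify against your SDK version):
//   PullAudioInputStreamCallback callback = new PullAudioInputStreamCallback() {
//       @Override public int read(byte[] dataBuffer) { return source.read(dataBuffer); }
//       @Override public void close() { source.close(); }
//   };
//   AudioStreamFormat format =
//       AudioStreamFormat.getWaveFormatPCM(16000L, (short) 16, (short) 1);
//   AudioConfig audioConfig =
//       AudioConfig.fromStreamInput(AudioInputStream.createPullStream(callback, format));
```

The producer thread calls write() for each PCM chunk as it arrives and endOfStream() when the audio ends. Returning 0 from read() is how the callback signals the end of the stream, so the total length never needs to be known in advance.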
