Azure Video Indexer - Auto-detect multi language Fails to Process Both Channels in Stereo

Ray Weber
2022-06-15T19:06:26.097+00:00

Azure Video Indexer's Auto-detect multi language option fails in the following scenario: when the input WAV file is 8 kHz stereo with a different person talking in each channel, one channel is completely ignored and the resulting transcript contains the dialogue from only one channel. Interestingly, if only one person is talking into both channels, both channels are picked up. If the file is merged into mono, the entire transcript is picked up. Similarly, if the file is recorded at 44.1 kHz, both channels are picked up. Lastly, if Auto-detect single language is used, both channels are again picked up.
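
For anyone who wants to reproduce the mono case, here is a minimal sketch of the merge step. It assumes 16-bit PCM input and averages the two channels; the function name `downmix_to_mono` and the file names are hypothetical, and it uses only Python's standard `wave` and `array` modules. It says nothing about how Video Indexer handles the audio internally.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average the two channels of a 16-bit PCM stereo WAV into one mono channel."""
    with wave.open(src_path, "rb") as src:
        # Assumption: the input is 16-bit PCM stereo; other formats are rejected.
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM stereo input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Stereo samples are interleaved: L0, R0, L1, R1, ...
    left, right = samples[0::2], samples[1::2]
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # keep the original sample rate (e.g. 8000)
        dst.writeframes(mono.tobytes())
```

Calling `downmix_to_mono("call_stereo.wav", "call_mono.wav")` writes a mono file at the same sample rate, which can then be uploaded in place of the stereo original.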

So, to recap: in Azure Video Indexer, if you select Auto-detect multi language with an 8 kHz stereo WAV file that has a different person talking in each channel, one channel is completely ignored. The vast majority of our input files fit this use case.
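
To find which files in a batch have the format described above, a header check along these lines could be used. This is only a sketch: `is_8khz_stereo` is a hypothetical helper, and it inspects only the WAV header (channel count and sample rate), so it cannot tell whether a different person is talking in each channel.

```python
import wave


def is_8khz_stereo(path, rate_hz=8000):
    """Return True if the WAV header reports two channels at the given sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels() == 2 and wav.getframerate() == rate_hz
```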

Any guidance that you can provide is greatly appreciated!

Azure Media Services