Azure Video Indexer - Auto-detect multi language Fails to Process Both Channels in Stereo
Azure Video Indexer's Auto-detect multi language fails in the following scenario: when the input WAV file is 8 kHz stereo and a different person is talking in each channel, one channel is completely ignored and the resulting transcript contains only the dialogue from the other channel. Interestingly, both channels are picked up in each of these variations: if only one person is talking into both channels; if the file is merged into mono; if the file is recorded at 44.1 kHz; or if Auto-detect single language is used instead.
So, to recap: in Azure Video Indexer, if you select Auto-detect multi language with an 8 kHz stereo WAV file that has a different person talking in each channel, one channel is completely ignored. The vast majority of our input files fit this use case.
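For anyone who wants to reproduce the mono case described above, here is a minimal sketch of how a stereo file could be merged into mono before upload. It uses only the Python standard library and assumes the WAV is 16-bit PCM (the bit depth of our files is not stated above, so treat that as an assumption). The function name `downmix_to_mono` and the file paths are hypothetical. Note that averaging the channels discards the per-speaker channel separation, so this is a stopgap rather than a fix.

```python
import array
import sys
import wave


def downmix_to_mono(src_path: str, dst_path: str) -> None:
    """Average the two channels of a 16-bit PCM stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        # Assumption: 16-bit PCM stereo; other layouts are rejected.
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM stereo input")
        rate = src.getframerate()  # e.g. 8000 for the files in question
        frames = src.readframes(src.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved as L0, R0, L1, R1, ...
    left = samples[0::2]
    right = samples[1::2]
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)  # keep the original sample rate
        dst.writeframes(mono.tobytes())
```

Usage would be something like `downmix_to_mono("call_stereo.wav", "call_mono.wav")`, after which the mono file is uploaded in place of the stereo one.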
Any guidance that you can provide is greatly appreciated!