Change Sampling Rate to 44.1 kHz in Text to Speech via CLI (default is 16 kHz)

Ed Czelada 21 Reputation points
2022-10-19T22:12:51.683+00:00

Is there a way to change the default audio format in the Azure Cognitive Services Text to Speech CLI from 16 kHz to something higher, e.g. 44.1 kHz?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. romungi-MSFT 46,141 Reputation points Microsoft Employee
    2022-10-20T08:48:28.6+00:00

    @Ed Czelada I believe you can set the format of the output file using the --format option with the spx synthesize command.

    spx synthesize [...] --format FORMAT    
    

    To see which formats are supported, run help on the format option:

    spx help synthesize format  
    

    That should point you to the supported format types. Running help on each type then lists its specific formats:

    spx help synthesize wav  
    spx help synthesize mp3  
    spx help synthesize ogg  
    spx help synthesize opus  
    spx help synthesize webm  
    spx help synthesize siren  
    spx help synthesize raw  
    

    Set the required format in the command and check whether it works. For example, the default for wav is riff-16khz-16bit-mono-pcm; the other formats supported for wav are:

    spx help synthesize wav

    USAGE: spx synthesize [...] --format FORMAT

      WHERE: FORMAT is wav
         OR: FORMAT is riff-16khz-16bit-mono-pcm
         OR: FORMAT is riff-24khz-16bit-mono-pcm
         OR: FORMAT is riff-48khz-16bit-mono-pcm
         OR: FORMAT is riff-8khz-16bit-mono-pcm
         OR: FORMAT is riff-8khz-8bit-mono-alaw
         OR: FORMAT is riff-8khz-8bit-mono-mulaw
         OR: FORMAT is riff-16khz-16kbps-mono-siren

      NOTE: The default wav format is riff-16khz-16bit-mono-pcm
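
The wav listing above has no 44.1 kHz entry, so the closest higher rate it offers is 48 kHz. As a rough sketch, assuming the Speech CLI is installed and already configured with a key and region, a full command might look like the following. The sample text and the output file name are placeholders chosen for illustration, and the `--audio output` option is how I recall the CLI naming its output file, so check `spx help synthesize` if it is rejected.

```shell
# Assumed values for illustration only; change them to suit your setup.
FORMAT="riff-48khz-16bit-mono-pcm"   # highest-rate wav format in the help listing
OUTPUT="output-48khz.wav"            # hypothetical output file name

if command -v spx >/dev/null 2>&1; then
  # Synthesize a short phrase to a wav file using the chosen format.
  spx synthesize --text "Sample rate test" --audio output "$OUTPUT" --format "$FORMAT" \
    || echo "spx synthesize failed; check the key/region configuration" >&2
else
  # Speech CLI not installed: just show the command that would be run.
  echo "spx not found; would run: spx synthesize --text \"Sample rate test\" --audio output $OUTPUT --format $FORMAT"
fi
```

Swapping `FORMAT` for any other name from the listing (for example riff-24khz-16bit-mono-pcm) should work the same way.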
    

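To check whether the chosen format actually took effect, one option is to read the sample rate straight from the wav file's header. The sketch below assumes a canonical RIFF/WAVE file, where bytes 24-27 hold the sample rate as a little-endian integer. `wav_sample_rate` is a hypothetical helper name, and output-48khz.wav is a placeholder file name.

```shell
# Hypothetical helper: print the sample rate stored in a wav file's header.
# Assumes the canonical RIFF layout, where bytes 24-27 are the rate (little-endian).
wav_sample_rate() {
  # Read the four bytes as unsigned decimals, then combine them low byte first.
  set -- $(od -An -t u1 -j 24 -N 4 "$1")
  echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Example use (placeholder file name): prints the rate in Hz if the file exists.
if [ -f output-48khz.wav ]; then
  wav_sample_rate output-48khz.wav
fi
```

For a file synthesized with riff-48khz-16bit-mono-pcm this should report 48000, and 16000 for the default format.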