how to configure ssml

Question

Nathalie Froissart 25

Hello, is it possible to use the Azure Cognitive Services Speech service to take text from the command prompt and configure the voice to emphasize the first word in the sentence, or to use a higher pitch after a "?" token? I want to apply if/else statements to the input text that is supposed to be read out loud.

PRADEEPCHEEKATLA 90,641 Reputation points Moderator

2023-02-27T05:44:45.89+00:00

@Nathalie Froissart We received your feedback that the answer provided on the thread was not helpful.

Kindly let us know what we could have done better to improve the answer and your experience on this thread. We are here to help and greatly value your feedback.

Our engineer has provided the detailed follow-up you were looking for.

If you wish, you may re-rate the engagement you received on this thread. Your feedback is very important to us.

We look forward to your reply and appreciate your feedback!

Regards,

PRADEEPCHEEKATLA-MSFT

Accepted answer

Answer 1

VasaviLankipalle-MSFT 18,676 Moderator

Hi @Nathalie Froissart , Thanks for using Microsoft Q&A Platform.

Yes, it is possible to use the Azure Cognitive Services Speech service to take text input from the command prompt and configure the voice to emphasize the first word in a sentence or to use a higher pitch after a "?" token. For how to pass in the text, see the speech synthesis sample code on GitHub.

Scenario 1: Emphasize a word

For example, you can use the following SSML code to emphasize the first word in a sentence:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-GuyNeural">
     <emphasis level="strong">Hello</emphasis> Good Morning.
    </voice>
</speak>

In this example, the word "Hello" is emphasized because it is wrapped in an emphasis tag with level="strong". Wrapping the first word of any sentence in the same tag emphasizes that word.

You can find more information about using the emphasis tag in SSML in the following documentation: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-voice#adjust-emphasis

Scenario 2: Higher pitch after "?"

You can use the Speech Synthesis Markup Language (SSML) to control the prosody of the speech output. SSML tags allow you to specify the pitch, rate, volume, and pronunciation of the speech.

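You can use the following SSML to raise the pitch of the text that follows a "?" token. Azure expects the version, xmlns, and xml:lang attributes on the speak element, and a voice element around the text:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-US-GuyNeural">
     What is your name? <prosody pitch="high">Please tell me.</prosody>
    </voice>
</speak>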

You can use an if-else statement to control the input text that is supposed to be read out loud. For example, if the input text contains a "?" token, you can use the SSML code with the higher pitch, otherwise, you can use the SSML code with the emphasized first word.
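
As a rough sketch of that if-else idea, the helper below builds the SSML string from arbitrary input text. The function name build_ssml, the VOICE constant, and the choice to raise the pitch of everything after the first "?" are assumptions made for illustration; they are not part of the Speech SDK. The input is XML-escaped so that characters such as "&" or "<" do not break the markup.

```python
from xml.sax.saxutils import escape

VOICE = "en-US-GuyNeural"  # assumed voice, same as the Scenario 1 example


def build_ssml(text: str) -> str:
    """Raise the pitch after the first "?", otherwise emphasize the first word."""
    text = text.strip()
    if not text:
        body = ""
    elif "?" in text:
        # Text after the first "?" is wrapped in a high-pitch prosody tag.
        question, _, rest = text.partition("?")
        body = escape(question) + "?"
        if rest.strip():
            body += f' <prosody pitch="high">{escape(rest.strip())}</prosody>'
    else:
        # No question mark: emphasize the first word of the input.
        first, _, rest = text.partition(" ")
        body = f'<emphasis level="strong">{escape(first)}</emphasis>'
        if rest:
            body += " " + escape(rest)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{VOICE}">{body}</voice>'
        "</speak>"
    )
```

The returned string can be passed to the synthesizer's speak_ssml_async method in place of a hand-written SSML document.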

I hope this helps. Let me know if you are looking for any additional information.

Regards,
Vasavi

-Please kindly accept the answer and vote 'Yes' if you feel helpful to support the community, thanks a lot.

Nathalie Froissart 25 Reputation points

2023-02-17T08:55:06.24+00:00
Hello and thanks for the answer.
I have seen these XML files on your webpage. You give examples for text that is already known, but how can you make the voice emphasize the first word in a sentence without knowing what the sentence will be? I would like to write a function in Python that takes the sentence written in the command prompt and applies if/else statements to it, so that I can return the same sentence but with voice configurations applied. Do you have any example of how to write this kind of function?

Example: take the text "Hello how are you" as input from the command prompt and return the following, which is then read out loud by the voice:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-GuyNeural">
     <emphasis level="strong">Hello</emphasis> how are you.
    </voice>
</speak>

VasaviLankipalle-MSFT 18,676 Moderator

Hi @Nathalie Froissart , we noticed your feedback that the above answer was not helpful.
Thank you for taking time to share your feedback. We are here to help you and strive to make your experience better and greatly value your feedback.

Yes, you can write a function in Python that takes a sentence as input and returns the same sentence with voice configurations. Here's an example of how you can write such a function:


import azure.cognitiveservices.speech as speechsdk
from xml.sax.saxutils import escape

def synthesize_text_to_speech(subscription_key, region):
    # Create speech config
    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)

    # Get input text from command prompt
    input_text = input("Enter the text to be spoken: ")

    # Escape &, < and > so the typed text cannot break the SSML markup
    escaped_text = escape(input_text)

    # Split text into sentences
    sentences = escaped_text.split(". ")

    # Add emphasis to first word of first sentence
    if sentences:
        words = sentences[0].split()
        if words:
            words[0] = f"<emphasis level='strong'>{words[0]}</emphasis>"
            sentences[0] = " ".join(words)

    # Combine sentences into output text
    output_text = ". ".join(sentences)

    # Create SSML string with voice configuration
    # (same neural voice as the SSML example in Scenario 1)
    ssml_string = "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" \
                  + "<voice name='en-US-GuyNeural'>" + output_text + "</voice>" \
                  + "</speak>"

    # Create speech synthesizer and synthesize the text to audio
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    result = synthesizer.speak_ssml_async(ssml_string).get()

    # Print results
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech synthesized to speaker for text [{}]".format(input_text))
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))

if __name__ == "__main__":
    synthesize_text_to_speech(subscription_key="replace your key", region="replace your region")

When you run this code, it prompts you in the command prompt for the text you want to synthesize. The text is split into sentences on ". ", and the first word of the first sentence is wrapped in an emphasis tag. The sentences are then joined back into one output text, which is placed inside an SSML string that also sets the voice.

Finally, the script creates a speech synthesizer and synthesizes the text to audio. The result of the synthesis is printed to the console. If the synthesis is successful, the script prints "Speech synthesized to speaker for text [input_text]". If the synthesis is canceled, the script prints "Speech synthesis canceled" and the reason for the cancellation. If the cancellation was due to an error, the error details are also printed.
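
The function above only marks the first word of the first sentence. As a minimal sketch, assuming you want the first word of every sentence emphasized, the SSML body could be built by a separate helper. The name emphasize_sentence_starts and the regex-based sentence splitting are illustrative assumptions, not part of the Speech SDK, and the splitting is naive (abbreviations such as "e.g." would be treated as sentence ends).

```python
import re
from xml.sax.saxutils import escape


def emphasize_sentence_starts(text: str) -> str:
    """Wrap the first word of each sentence in a strong emphasis tag."""
    # Split on whitespace that follows ".", "!" or "?", keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    marked = []
    for sentence in sentences:
        first, _, rest = sentence.partition(" ")
        if not first:
            continue  # skip empty input
        part = f"<emphasis level='strong'>{escape(first)}</emphasis>"
        if rest:
            part += " " + escape(rest)
        marked.append(part)
    return " ".join(marked)
```

The returned string can be used in place of output_text when the SSML string is assembled.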

Please refer to speech synthesis for python GitHub code and SSML documentation.

I hope this helps.

Looking forward to your reply. Much appreciate your feedback!

Regards,
Vasavi

VasaviLankipalle-MSFT 18,676 Reputation points Moderator

2023-02-21T18:45:45.79+00:00

Hi @Nathalie Froissart , did you get a chance to check my response?
VasaviLankipalle-MSFT 18,676 Reputation points Moderator

2023-02-22T23:44:26.19+00:00

Hi @Nathalie Froissart , is there anything else you need help with? Kindly check the response and let us know if you need any further information. Thanks!

Answer 2

Nathalie Froissart 25

Thank you for your response, it helped a lot!
I have a follow-up question. How do I make the SSML voice skip certain words? Can you define a list of words, or certain settings, that it should skip (that is, not say out loud) when the text comes from the command prompt?
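
One possible approach, offered only as a sketch, is to remove the unwanted words from the input text in Python before the SSML string is built, so the voice never receives them. The helper name drop_skipped_words and the example skip list are hypothetical; this is plain string filtering, not a feature of the Speech service.

```python
SKIP_WORDS = {"um", "uh"}  # hypothetical example list; replace with your own words


def drop_skipped_words(text: str, skip_words=SKIP_WORDS) -> str:
    """Return the text without any word that appears in skip_words."""
    kept = []
    for word in text.split():
        # Compare case-insensitively and ignore surrounding punctuation.
        core = word.strip(".,!?;:\"'").lower()
        if core not in skip_words:
            kept.append(word)
    return " ".join(kept)
```

Note that punctuation attached to a skipped word is dropped together with the word, so "Um, hello" becomes "hello".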
