AI text-to-speech is misreading a word in Catalan (tomàquets) but it reads perfectly its singular form (tomàquet), can you fix it?

Ana Laura Rocha Ciappesoni 0 Reputation points
2024-11-15T13:58:16.05+00:00

Hello,

I am using the text-to-speech service with Catalan. The plural form tomàquets is not read properly, whereas the singular tomàquet is: the accent lands in the wrong place when it is read aloud. What can I do to get that fixed?

Thank you,

Azure AI Speech

1 answer

  1. Sina Salam 13,371 Reputation points
    2024-11-15T15:11:54.96+00:00

    Hello Ana Laura Rocha Ciappesoni,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that the text-to-speech voice is misreading the Catalan word tomàquets even though it reads its singular form tomàquet perfectly, and that you are looking for a fix.

    On a first reading this looks like a case for linguistic customization to improve pronunciation accuracy; based on previous projects, I am also adding a few notes on latency in case that turns out to be a factor.

    In CAT 1 (pronunciation accuracy):

    1. Check whether the TTS service you are using supports SSML, IPA input, or custom dictionary entries. Microsoft Azure TTS, for instance, lets you fine-tune the pronunciation of specific words with SSML:
              <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ca-ES">
                <voice name="ca-ES-JoanNeural">
                  Tomàquet <phoneme alphabet="ipa" ph="toˈmakəts">tomàquets</phoneme>.
                </voice>
              </speak>
         
      
    2. If possible, use the phoneme tag to force the correct pronunciation of problematic words; you may need to consult a linguist for the correct IPA transcription (see the Speech SDK sketch after this list for how to submit the SSML).
    3. For Azure or other advanced TTS services, you can often report a pronunciation issue through their support or feedback portal.
    4. Before switching to other services like Narakeet or LOVO AI, ensure your primary use case isn't hindered by changing providers. If alternate services handle "tomàquets" better, use them selectively for specific content pipelines.
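
    To make items 1 and 2 concrete, here is a minimal sketch (Python Speech SDK) showing how the SSML above can be submitted. The YOUR_SPEECH_KEY and YOUR_REGION placeholders are assumptions you would replace with your own resource values, and the voice name is simply carried over from the snippet above.

      # Minimal sketch, assuming the Python Speech SDK is installed:
      #   pip install azure-cognitiveservices-speech
      import azure.cognitiveservices.speech as speechsdk

      # YOUR_SPEECH_KEY / YOUR_REGION are placeholders for your own Speech resource.
      speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")
      synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

      # The SSML from item 1: the phoneme tag wraps the plural so the IPA transcription
      # overrides the default reading.
      ssml = """
      <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ca-ES">
        <voice name="ca-ES-JoanNeural">
          Tomàquet <phoneme alphabet="ipa" ph="toˈmakəts">tomàquets</phoneme>.
        </voice>
      </speak>
      """

      result = synthesizer.speak_ssml_async(ssml).get()
      if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
          print("Synthesis finished.")
      elif result.reason == speechsdk.ResultReason.Canceled:
          print("Synthesis canceled:", result.cancellation_details.error_details)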

    In CAT 2 (latency):

    1. If latency is due to network distance, using a TTS endpoint in a region geographically closer to your users can help. For example, the UK South region endpoint is https://uksouth.tts.speech.microsoft.com/cognitiveservices/v1. Make sure your SDK is configured to use this endpoint if you're based in or near the UK (see the sketch after this list).
    2. Keep SSML inputs concise and minimize long pauses or complex tags; simpler input generally results in faster TTS processing.
    3. Latency can also stem from an unstable internet connection, so check that your application runs on a stable, high-speed network.
    4. If your service serves global users, integrating Azure Front Door or a Content Delivery Network (CDN) could reduce latency by optimizing request routing and providing caching where possible.
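
    To check item 1 empirically, below is a rough sketch (Python Speech SDK) that times a synthesis request against two regions. The key/region pairs are placeholders (a Speech key only works in the region of the resource it belongs to, so each region needs its own resource), and the voice name is carried over from the SSML snippet above.

      # Rough latency check, assuming the Python Speech SDK is installed.
      import time
      import azure.cognitiveservices.speech as speechsdk

      def measure_latency(key, region, text="tomàquets"):
          config = speechsdk.SpeechConfig(subscription=key, region=region)
          config.speech_synthesis_voice_name = "ca-ES-JoanNeural"
          # audio_config=None keeps the audio in the result object instead of playing it,
          # so the timing covers only the service round trip.
          synthesizer = speechsdk.SpeechSynthesizer(speech_config=config, audio_config=None)
          start = time.monotonic()
          synthesizer.speak_text_async(text).get()
          return time.monotonic() - start

      # Placeholder key/region pairs -- substitute resources you actually have.
      for key, region in [("YOUR_UKSOUTH_KEY", "uksouth"), ("YOUR_EASTUS_KEY", "eastus")]:
          print(region, f"{measure_latency(key, region):.2f} s")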

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it was helpful.

