Can I do speech-to-text in .NET MAUI with C#?

Daniel Crotty 6 Reputation points
2022-09-23T20:04:11.46+00:00

I thought I would play with Maui, so why not a speech to text app. I want to capture the text and manipulate it.

Doing TextToSpeech was pretty easy, and it only took a few minutes to get something to work. Doing the reverse, not so much. After a couple of days searching, it's time to ask. I sure don't want to pay Google to be allowed to do it.

I thought that with Maui it might make a useful little app. Maybe, but I sure can't find a way of doing it.

If not Maui, what? .NET Forms? I was thinking of a phone app.

Suggestions or comments?


1 answer

  1. dg2k 1,416 Reputation points
    2022-09-24T08:25:34.727+00:00

    Hi @Daniel Crotty

    What you're asking about is ASR (automatic speech recognition), which is a very challenging programming task (hence why Google charges for it).

    ASR is generally within the realm of cloud services and, yes, Microsoft has an ASR technology that a Maui app can use, under Azure Cognitive Services.

    To answer your question in a general way, it should be quite straightforward for your Maui app to use cloud APIs and implement what you want. The complexity is nicely decoupled between the Maui app on your side and the cloud service, so I suggest you explore the latter to determine what your Maui app needs to do in order to consume Azure Cognitive Services for ASR. I haven't implemented ASR myself, but I have implemented a similar Cognitive Service for Computer Vision (to detect what an image is).

    You need to subscribe to Microsoft Azure, and you may qualify for a free subscription, which is more than enough to assess your requirements. As it is a consumption-based service, even paid-for resources are under your control, so you can keep the charge to a nominal amount per month.

    Just to add a remark on ASR versus text-to-speech (TTS): TTS is easy relative to ASR because a text representation is unambiguous (one-dimensional, if you like), so it takes little effort to convert a predetermined text into whatever type of speech you choose (and this is easy for cloud-based AI nowadays). In contrast, consider ASR. For a start there is the number of languages; then accents, pitch, intonation, and so on, so a given spoken sentence can have a nearly unlimited number of variations. Yes, AI can tackle this, but it is much more complex than TTS, and such technologically demanding tasks often call for a premium service.
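To make the "Maui app on one side, cloud service on the other" split in the answer above a little more concrete, here is a minimal sketch of the app-side half. It is not the answerer's code. It assumes the Azure Speech service's REST endpoint for short audio, an Azure subscription key and region that you supply, and a WAV recording (16 kHz PCM) that the app has already captured by some other means. The class and method names (`SpeechToTextSketch`, `RecognizeAsync`, `ParseDisplayText`) are made up for illustration.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper: posts recorded audio to the cloud and returns the text.
public static class SpeechToTextSketch
{
    // Pulls the recognized text out of the service's JSON reply.
    // Returns null when the reply does not report a successful recognition.
    public static string? ParseDisplayText(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        bool succeeded =
            root.TryGetProperty("RecognitionStatus", out JsonElement status) &&
            status.ValueKind == JsonValueKind.String &&
            status.GetString() == "Success";

        if (succeeded &&
            root.TryGetProperty("DisplayText", out JsonElement text) &&
            text.ValueKind == JsonValueKind.String)
        {
            return text.GetString();
        }

        return null;
    }

    // Sends one short WAV clip for recognition.
    // Assumptions: 'region' and 'subscriptionKey' come from your own Azure
    // Speech resource; 'wavAudio' is 16 kHz PCM WAV recorded elsewhere.
    public static async Task<string?> RecognizeAsync(
        HttpClient http,
        string region,
        string subscriptionKey,
        byte[] wavAudio,
        string language = "en-US")
    {
        string url =
            $"https://{region}.stt.speech.microsoft.com" +
            "/speech/recognition/conversation/cognitiveservices/v1" +
            $"?language={Uri.EscapeDataString(language)}";

        using var request = new HttpRequestMessage(HttpMethod.Post, url);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Headers.Accept.Add(
            new MediaTypeWithQualityHeaderValue("application/json"));

        var content = new ByteArrayContent(wavAudio);
        // The codecs parameter contains '/', which strict header parsing
        // rejects, so the value is added without validation.
        content.Headers.TryAddWithoutValidation(
            "Content-Type", "audio/wav; codecs=audio/pcm; samplerate=16000");
        request.Content = content;

        using HttpResponseMessage response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        string json = await response.Content.ReadAsStringAsync();
        return ParseDisplayText(json);
    }
}
```

From a Maui page this might be called as `string? text = await SpeechToTextSketch.RecognizeAsync(http, region, key, wavBytes);`, after which the returned string can be manipulated like any other. Recording the audio and requesting microphone permission are left out of the sketch. Microsoft also publishes a Speech SDK as a NuGet package, which may be a better fit if you want recognition straight from the microphone.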

